Travel Vendor Finds Millions of Dollars of Revenue
Problem: One of Europe's largest travel vendors had outdated data collection and storage and no data strategy. They were missing tens of millions of dollars of revenue.
Solution: After working closely with the executive team, we mapped out a Data Strategy for the company to modernize their infrastructure using real-time data. We showed them a clear path to $8,000,000 in increased revenue in 16 weeks by targeting more products to their best customers, growing to $47,000,000 in the next year. After a fantastic initial experience, they knew we were the right experts to build their system, so we brought in a world-class team of data scientists, PhDs, and engineers and built an end-to-end, real-time data platform.
Public Company Acquisition
Problem: GameStop was looking to reinvigorate its retail business with new target demographics and offerings, but it didn't know which demographics to target or how to reach them.
Solution: GameStop brought us in as experts with the industry knowledge to handle demographic data research and vertical analysis. After we briefed GameStop executives on our research results, they re-oriented the brand toward millennials and women. In the process, GameStop acquired ThinkGeek in May 2015.
"Pretty much the best [fashion] app ever." – The Gloss
Problem: Build and launch the next generation of Cloth, a popular mobile iOS fashion app with hundreds of thousands of users.
Solution: Within one year, we raised investment capital, built an engineering team, and launched Cloth (clothapp.com). The engineering team consisted of seven contributors with back-end, iOS, and distributed systems experience. We built a close relationship with the Google App Engine team and secured large commitments of resources. The app is massively scalable and has been tested to handle millions of users. Cloth features a photostream, social sharing, group messaging, push notifications, and daily alerts, all built through elegant, design-driven development.
Real-Time Infrastructure for Major Social Game
Problem: QuizUp, a Top 10 mobile game, needed to launch a Version 2.0 with high-volume social data requirements.
Solution: RoboticProfit provided intensive, on-site architecture consulting and prototyping. This included a review of the existing Python/SQL/ElasticSearch codebase, AWS architecture, multi-threaded Python implementation tweaks, and a full load testing framework.
Are You Leaving Money on the Table because of Inefficient Product Pricing?
Problem: A large pharmaceutical vendor was losing millions of dollars a year due to inefficient drug pricing. They didn’t have the tools to analyze prices and predict responses to pricing changes.
Solution: The client had multiple databases of drug sales, pricing information, insurance reimbursement, manufacturing data, and more. With a run rate of almost $200,000,000, they were spending significant capital on obtaining drugs for pharmacies. However, they had no data strategy and used guesswork to determine what drug prices to negotiate. They were such a large purchaser that their pricing could move the entire generic drug market. RoboticProfit was able to quickly research the core issues and create a technical roadmap. We created algorithms and data mining solutions to clean and extract the relevant data, and then simulated negotiation strategies and discounts. We also prototyped a market simulation that predicted how changing prices on various drugs would affect insurance reimbursement rates, overall drug prices, manufacturing volume, and more.
Personalized Recommendation Engine for Educational Videos
Problem: A large educational video vendor wanted to improve the user experience and maximize revenue by building a ranking system and recommendation engine for training materials on their website.
Solution: We analyzed the current state of their data and quickly produced a series of options for proceeding based on their business needs. We created a comprehensive Data Strategy Roadmap presenting the way forward, including a summary of the tools they would need along with customized recommendations. We outlined what they could create with the existing data, where they would run up against problems, and what kind of data they needed to collect in order to optimize their efforts. We also outlined where the recommendation engine could have a positive impact and how they could tailor it to their goals and business model. They then hired a team to implement the engine.
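To make the idea of a recommendation engine concrete, here is a minimal sketch of one common starting point, item co-occurrence ("viewers who watched X also watched Y"). It is an illustration only, not the engine the client built: the function names, the `histories` structure, and the video titles are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of videos appears in the same viewing history.

    `histories` is a hypothetical mapping of user id -> list of watched video ids.
    """
    co = defaultdict(lambda: defaultdict(int))
    for watched in histories.values():
        # Deduplicate so a video watched twice is counted once per user.
        for a, b in combinations(sorted(set(watched)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user, histories, co, k=3):
    """Rank unseen videos by how often they co-occur with the user's history."""
    seen = set(histories.get(user, []))
    scores = defaultdict(int)
    for video in seen:
        for other, count in co[video].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:k]]
```

A production system would typically add weighting by recency, watch time, or revenue; the sketch only shows the ranking idea, and it also illustrates why collecting viewing history matters: without it there is nothing to rank on.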
Building a Real-Time Social Media Analytics Engine
Problem: A company with over $100M in VC funding wanted to run real-time social media analytics for brands. This meant collecting several terabytes of data and millions of documents an hour and then running NLP and sentiment analysis on them.
Solution: We built a real-time social media analytics engine for their core business, powered by Hadoop, HBase, and Solr. The collection engine comprised several modules: web crawlers, RSS readers, API streams, and more. Once the data was collected and stored, we built processes that matched each data source to a parser in order to extract structured data from unstructured sources (for example, determining the author of a blog post). We then integrated NLP and sentiment analysis packages to classify the data. Finally, data was indexed in a Solr cluster and made available for real-time faceted search. As a result, the product became a market leader and Visible was acquired by a large enterprise software vendor.
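The source-to-parser matching step can be sketched as a small registry that dispatches each raw document to the parser registered for its source type. This is a simplified illustration under our own assumptions, not the production code: the `PARSERS` registry, the source type names, and the regex-based author extraction are hypothetical, and a real system would use proper HTML and feed parsers.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping a source type to its parser function.
PARSERS: Dict[str, Callable[[str], dict]] = {}


def parser_for(source_type: str):
    """Decorator that registers a parser for one source type."""
    def register(fn):
        PARSERS[source_type] = fn
        return fn
    return register


@parser_for("blog")
def parse_blog(raw: str) -> dict:
    # Assumes the author is published in an HTML <meta name="author"> tag.
    m = re.search(r'<meta\s+name="author"\s+content="([^"]*)"', raw, re.IGNORECASE)
    return {"author": m.group(1) if m else None}


@parser_for("rss")
def parse_rss(raw: str) -> dict:
    # Assumes the feed item carries an <author> element.
    m = re.search(r"<author>(.*?)</author>", raw, re.IGNORECASE | re.DOTALL)
    return {"author": m.group(1).strip() if m else None}


def extract(source_type: str, raw: str) -> dict:
    """Route a raw document to its parser; flag sources with no parser."""
    parser = PARSERS.get(source_type)
    if parser is None:
        return {"author": None, "unparsed": True}
    record = parser(raw)
    record["source_type"] = source_type
    return record
```

Keeping parsers in a registry means a new source type can be supported by adding one function, without touching the collection modules.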
Government Streaming Social Media Analytics and Collection Engine
Problem: Monitor all social media in real time for terrorist activity and provide actionable analytics.
Solution: Our client had recently become a vendor to the government, which was looking to detect terrorist activity via social media. To respond to and prevent disasters, response time needed to be measured in seconds. We architected a real-time system to analyze this data. We built a streaming collection engine running on Google Cloud to pull data in from Twitter and similar sources. The streamed data is then passed through NLP, entity extraction, and sentiment analysis services. The data is stored in bulk with Google Cloud Datastore and simultaneously indexed and faceted with ElasticSearch. The ElasticSearch cluster can be queried in milliseconds for real-time updates, and a piece of data takes only a few seconds to pass through the entire system. This allows the government to respond to terrorism threats in real time.
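The flow described above (stream in, enrich, then facet) can be illustrated with a toy pipeline built from Python generators. Everything here is a stand-in chosen for illustration: the word lists are placeholder lexicons, hashtag matching is a crude substitute for real entity extraction, and the in-memory `facet_counts` function only mimics what a search cluster's faceting would return.

```python
from collections import Counter

# Placeholder lexicons; a real deployment would call dedicated NLP services.
NEGATIVE = {"attack", "threat"}
POSITIVE = {"safe", "calm"}


def score_sentiment(posts):
    """Stage 1: label each post as positive, negative, or neutral."""
    for post in posts:
        words = post["text"].lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        yield {**post, "sentiment": label}


def tag_entities(posts):
    """Stage 2: collect hashtags as a crude stand-in for entity extraction."""
    for post in posts:
        entities = [w.strip(".,!?") for w in post["text"].split() if w.startswith("#")]
        yield {**post, "entities": entities}


def facet_counts(posts, field):
    """Count distinct values of one field, as a faceted search would."""
    counts = Counter()
    for post in posts:
        value = post[field]
        counts.update(value if isinstance(value, list) else [value])
    return counts
```

Because each stage is a generator, posts flow through one at a time, for example `tag_entities(score_sentiment(stream))`, rather than waiting for a batch to complete, which is the property that keeps end-to-end latency low.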
Data-Heavy Geospatial Analytics: Fixing Query Latency
Problem: Develop a scalable platform for geospatial pathfinding and reduce query latency from minutes to milliseconds.
Solution: Our client was collecting GPS data from several thousand devices an hour. The existing dataset was placed in a sharded Ruby Object Store and experienced query delays of minutes, bringing functionality to a complete halt. We redesigned the storage infrastructure of their platform. Using HBase as the new distributed storage layer, we designed a new indexed geohash format for storing data. We rewrote custom Ruby scripts in Java to clean and process data and port it to Hadoop. As a result, all infrastructure bottlenecks were removed and query latency dropped from 10 minutes to 300 ms. This functionality allowed the company to secure major new clients.
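For readers unfamiliar with geohashes, the sketch below shows the standard public geohash encoding and one way it could be used as the leading part of a storage key. The encoding itself is the well-known algorithm; the `row_key` layout (geohash, then timestamp, then device id) is purely our assumption for illustration and is not the client's actual format.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def row_key(lat, lon, device_id, epoch_ms, precision=7):
    """Hypothetical key layout: geohash first, so nearby points sort together."""
    return f"{geohash_encode(lat, lon, precision)}|{epoch_ms:013d}|{device_id}"
```

Because a longer geohash always extends the shorter one for the same point, locations in the same cell share a key prefix. In a store that keeps rows sorted by key, as HBase does, a lookup for an area becomes a prefix range scan instead of a full scan, which is the kind of change that can move latency from minutes toward milliseconds.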
Social Event Business Intelligence and Oracle Migration
Problem: Our client had hundreds of millions of invitations and users. They wanted to use data to make informed decisions about which aspects of their business to grow, but they couldn't ask the right questions.
Solution: With all their data stored in an Oracle data warehouse, asking the right questions to get solid business answers was practically impossible. Queries took days to run, if they completed at all. We developed a BI solution for their website visitors, migrated their data from the outdated Oracle warehouse into an HBase cluster, and then designed reports using Hive to make it easy for their analysts to make decisions. As a result, their analysts were able to design queries that provided more accurate growth and financial predictions.
Social Gaming Analytics: Millisecond Query Times
Problem: HBase queries were taking 100 times too long, with frequent downtime.
Solution: Our client used HBase to power a real-time retargeting engine. Unfortunately, the queries were taking seconds to run and frequently failed to complete. In addition, thousands of dollars a week were being wasted on Amazon EC2 instances. We tuned their queries and JVM settings for low-latency workloads and were able to reduce the number of EC2 instances. This brought query times from seconds to milliseconds.