It's sometimes said that big data is like teenage sex: everyone talks about it, but few are actually doing it. To which you could add that those who are doing it aren't really sure if they’re getting it right.
In an effort to find out how big data is being used in the real world, we spoke to the heads of three startup companies which are employing it in unique ways to pursue specific business opportunities.
Qbox
BN: Introduce yourself and your company.
MB: I am Mark Brandon, CEO and Co-Founder of StackSearch, the proprietors of the Qbox.io Hosted Elasticsearch platform. We help customers store, process, analyze, search, filter, and otherwise make meaning of their large data sets. I am passionate about data and IT because more often than not, it is felicity with data that has separated the winners from the losers over the last 30-40 years. I love helping customers get the edge, or to catch up. It's an arms race, and it's exciting.
BN: How are the tools you offer different from other options available? What backend sources do your tools have in common?
MB: Qbox provides hosted instances that can be spun up in over a dozen cloud data centers across the globe, whereas our competitors usually offer 1-3. We have 4 officially trained Elasticsearch developers on staff and, of course, like to think our support is better too, though I won’t badmouth our competitors here since I have never provisioned an instance from them. Technologically, Qbox does not have the "noisy neighbor" problem brought about by a container-based deployment. Elasticsearch is a technology where a bulk indexing job can completely hoover up available RAM, and even if you spin up resources immediately, there will be a lag of a few minutes during which your customers will have an awful experience. We learned this the hard way. Container-based deployments run in a shared environment. Our resources are dedicated.
We use the NGINX web server. Our site was built in Ruby on Rails with Bootstrap. The provisioning library and deployment scripts are a combination of Chef, Vagrant, and some from-scratch code. When we build front-end experiences for clients, and for our demos, we often use AngularJS.
BN: What's the most unique application of your services you've come across so far?
MB: We liked a Stack Overflow application that was built a little while back. We bought it and put it up on our site as a demo. The application processes in real time over 17 million tags on Stack Overflow, allowing users to see trends on particular topics. For example, you can put the tag "elasticsearch" in the search box and see the trends, the popular questions, the top answerers, and the top questioners. Then, compare Elasticsearch to "Solr", the incumbent technology that it is quickly eclipsing (the pun is fully intended).
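As a rough illustration of the kind of tag-trend comparison described above, here is a minimal Python sketch. It is not the code behind the Qbox demo, and it does not use Elasticsearch; the function names (`monthly_tag_counts`, `trend`), the input shape, and the sample data are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def monthly_tag_counts(questions):
    """Count how many questions carry each tag per calendar month.

    `questions` is an iterable of (iso_date, tags) pairs such as
    ("2014-01-05", ["elasticsearch", "java"]). This input shape is an
    assumption for illustration only.
    """
    counts = defaultdict(Counter)
    for iso_date, tags in questions:
        month = iso_date[:7]  # keep only "YYYY-MM"
        for tag in set(tags):  # count a tag once per question
            counts[tag][month] += 1
    return counts


def trend(counts, tag):
    """Return [(month, count), ...] for one tag in chronological order."""
    return sorted(counts[tag].items())


# Made-up sample data, not real Stack Overflow figures.
sample = [
    ("2014-01-05", ["elasticsearch", "java"]),
    ("2014-01-20", ["solr"]),
    ("2014-02-02", ["elasticsearch"]),
    ("2014-02-11", ["elasticsearch", "solr"]),
]

counts = monthly_tag_counts(sample)
print("elasticsearch:", trend(counts, "elasticsearch"))
print("solr:", trend(counts, "solr"))
```

A search engine would compute these per-month buckets on the server over millions of documents; the sketch only shows the shape of the result that a trend chart needs.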
BN: Is big data a good catch-all for what you're actually working in? What are some myths/misconceptions that you’d like to see clarified?
MB: What MongoLab is for MongoDB, Cloudant is for CouchDB, and RedisToGo is for Redis, we are for Elasticsearch -- providing managed and hosted instances of the popular open source data exploration and analytics platform. We are working in big data. Elasticsearch is a technology for users whose data set and processing needs require more than one server (a "cluster").
MineWhat
BN: Introduce yourself and your company.
JG: I'm Janakram Ganesan, CEO and Co-Founder of MineWhat. I am a dreamer. A few years back, the grind, the routine, the same old work, food and sleep cycle finally got to me. I decided I wanted to make a difference in people's lives, so I chose entrepreneurship. My interest in data came about during the ecommerce boom around 2010. Focusing on ecommerce, MineWhat uses big data analytics to turn visitors into customers. The company offers an easy-to-navigate solution for ecommerce operators so that they can get precise information about their visitors. By gathering information such as which products sell best in different regions, and even how a competitor is doing, it offers actionable predictions that can be used to increase overall sales.
BN: How are the tools you offer different from other options available? What backend sources do your tools have in common?
JG: As far as backend sources go, our data collection method is quite similar to most other current analytical tools. We collect data with an asynchronous JavaScript tag: customers insert this tag into their web page code, and the tag then sends data to our servers. For data processing, our multi-tenanted architecture stores each customer's data on Cassandra and MongoDB clusters independently. Currently our servers are hosted in AWS, but we've been considering installing location-specific data servers in the EU/APAC regions so we can reduce the response time for our collection scripts.
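The receiving side of such a collection tag can be pictured with a small Python sketch. This is a hedged illustration only, not MineWhat's implementation: the `TenantEventStore` class, the `customer_id` field, and the JSON payload format are assumptions, and an in-memory dictionary stands in for the separate Cassandra and MongoDB clusters mentioned above.

```python
import json
from collections import defaultdict


class TenantEventStore:
    """In-memory stand-in for per-customer storage (hypothetical)."""

    def __init__(self):
        # One independent event list per customer (tenant).
        self._events = defaultdict(list)

    def ingest(self, raw_payload):
        """Parse one JSON payload sent by a page tag and file it by customer."""
        event = json.loads(raw_payload)
        customer_id = event.get("customer_id")  # assumed field name
        if not customer_id:
            raise ValueError("payload is missing customer_id")
        self._events[customer_id].append(event)
        return customer_id

    def events_for(self, customer_id):
        """Return a copy of the events stored for a single customer."""
        return list(self._events.get(customer_id, []))


store = TenantEventStore()
store.ingest('{"customer_id": "shop-a", "type": "product_view", "sku": "123"}')
store.ingest('{"customer_id": "shop-b", "type": "add_to_cart", "sku": "987"}')
store.ingest('{"customer_id": "shop-a", "type": "purchase", "sku": "123"}')

print(len(store.events_for("shop-a")), "events for shop-a")
print(len(store.events_for("shop-b")), "events for shop-b")
```

Keying every write by customer keeps one tenant's data from being mixed into another's, which is the property the multi-tenanted design is after.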
Our system is built so that most actions on the user end are automated. Most analytical tools require the users to write extra code to track any events. An example of this would be if you were to track an in-page dynamic element. On most tools today, you would have to write custom code for that. On MineWhat, this is bypassed. The major differences lie in what we do with the data next. Other tools help with finding out what happened; we help with understanding why.
Say I run an online store and I have to make a decision on how to spread my marketing budget across a few ad platforms (FB, Google, Twitter). What web analytics will give me is how much revenue each of these generates. While I can make a decision based on that alone, the decision would be quite uninformed because I don’t know the "why" of it all -- did the Facebook shoppers see something that didn't appeal to their casual intentions? I also don’t know what I need to do next -- which products should I display to FB shoppers?
That’s where we fit in.
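The gap between the "what" and the "why" in that example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not MineWhat's method: the record fields (`source`, `product`, `amount`), both function names, and the sample figures are invented for the example.

```python
from collections import defaultdict


def revenue_by_source(orders):
    """The 'what': total revenue attributed to each ad platform."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["source"]] += order["amount"]
    return dict(totals)


def conversion_by_source_and_product(views, orders):
    """A step toward the 'why': share of product views that became orders,
    broken down by ad platform and product."""
    viewed = defaultdict(int)
    bought = defaultdict(int)
    for view in views:
        viewed[(view["source"], view["product"])] += 1
    for order in orders:
        bought[(order["source"], order["product"])] += 1
    return {key: bought[key] / n for key, n in viewed.items()}


# Made-up sample data for illustration.
views = (
    [{"source": "facebook", "product": "sneakers"}] * 4
    + [{"source": "facebook", "product": "suit"}] * 4
    + [{"source": "google", "product": "suit"}] * 2
)
orders = [
    {"source": "facebook", "product": "sneakers", "amount": 50.0},
    {"source": "facebook", "product": "sneakers", "amount": 50.0},
    {"source": "google", "product": "suit", "amount": 200.0},
]

print(revenue_by_source(orders))
for key, rate in sorted(conversion_by_source_and_product(views, orders).items()):
    print(key, rate)
```

In this toy data the revenue totals alone favor Google, while the breakdown shows that Facebook visitors do buy sneakers and never buy the suit, which hints at which products to show them.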
BN: What's the most unique application of your services you've come across so far?
JG: Oddly enough, we hear our collaboration feature has helped set up a few dinner dates. One thing we quite hoped for, but weren't sure would happen, was to see the product being used by more than just analysts. We’ve been seeing category managers and the like creating their own custom dashboards to stay on top of their tasks.
BN: Is big data a good catch-all for what you’re actually working in? What are some myths/misconceptions that you'd like to see clarified?
JG: Mining big data is like a never-ending search for hidden gold, and that's all it usually ends up being without a good way to interpret what you find. An ecommerce domain will need to take a very different approach than a finance one to get any real value from big data. We've found that vertically focused analysis is the key to deriving insights from huge volumes of data.
Info Assembly
BN: Introduce yourself and your company.
AG: I am Aditya Goel, Co-Founder of Info Assembly, an intended data discovery platform focusing on market and investment research. Info Assembly takes both structured and unstructured data to give investors better insights on emerging market trends. An investment researcher would traditionally have to spend valuable time making sense out of mountains of unstructured information like government filings, media reports, blogs and social data. With Info Assembly they can easily filter out unreliable information and quickly graph trends to determine the best investment opportunities.
BN: How are the tools you offer different from other options available? What backend sources do your tools have in common?
AG: Info Assembly offers a context-aware visual search and analysis platform with an intuitive interactive visualization that quickly connects and identifies people, organizations, locations and high-level themes and topics in thousands of documents. We aim to speed up the research process through the right mix of machine learning and user interaction experiences so that the end user can get an overview quickly and dig deeper in an interactive manner.
We use D3.js, one of the most robust and flexible JavaScript-based frameworks for rich visualizations. Our backend runs on Stanford NLP and other open source machine learning frameworks in Java and Python. Our server-side application in Node.js ties in well with AngularJS as our front-end framework. We also use Elasticsearch for free-text search.
BN: What's the most unique application of your services you've come across so far?
AG: The most unique application of Info Assembly is how an investment analyst was able to quickly come up with over 50 investment ideas by just pushing in over 40,000 news articles on global macroeconomics and the consumer packaged goods sector. We're working to make the experience much smoother so that we can validate these ideas a lot faster.
BN: Is big data a good catch-all for what you’re actually working in? What are some myths/misconceptions that you'd like to see clarified?
AG: The biggest misconception is that you can plug in lots of information and big data just happens like magic. The most important step in any big data/machine learning application is to get clean data input in the right format. Most of the effort, and a lot of the success, depends on pre-processing. Remember that a smaller amount of data with better quality is always better than "just more data". A really good data scientist is one who knows how to pre-process the data first.
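What "clean data input in the right format" means varies by project, but a minimal Python sketch of a pre-processing pass might look like the following. It is an illustration only, not Info Assembly's pipeline: the `preprocess` function, the `title` and `body` field names, and the three rules it applies (normalize whitespace, drop incomplete records, drop duplicates) are assumptions chosen for the example.

```python
def preprocess(records, required=("title", "body")):
    """Normalize whitespace, then drop incomplete and duplicate records.

    `records` is a list of dicts; `required` names the fields every
    record must have. Both are assumptions for this sketch.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Collapse runs of whitespace and skip fields that are None.
        normalized = {
            key: " ".join(str(value).split())
            for key, value in record.items()
            if value is not None
        }
        # Skip records where a required field is missing or empty.
        if any(not normalized.get(field) for field in required):
            continue
        # Treat records as duplicates if required fields match, ignoring case.
        fingerprint = tuple(normalized[field].lower() for field in required)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Made-up sample input for illustration.
raw = [
    {"title": "Rates  rise", "body": "Central bank lifts rates."},
    {"title": "rates rise", "body": "Central bank lifts rates. "},
    {"title": "No body", "body": None},
    {"title": "", "body": "Orphaned text"},
]

for row in preprocess(raw):
    print(row)
```

Of the four sample records only the first survives: the second is a duplicate once case and whitespace are normalized, and the other two lack a required field.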
Image credit: David Gaylor / Shutterstock