As Big Data becomes more popular and more options become available, selecting a Big Data technology for your business can become a real headache. The number of stacks and tools is huge, ranging from pure Hadoop and Hortonworks to more proprietary solutions from Microsoft, IBM or Google. As if this were not enough, you will also need to choose between an on-premise installation and a cloud solution. The number of proprietary solutions is growing rapidly as well. Here we sum up a few strategies for introducing Big Data in your business.
One of the first questions you will face when looking into using Big Data for your business is whether you should build a generic platform or a solution for specific needs.
Building for specific needs
In many businesses, if you follow internal processes and project frameworks, you will intuitively ask yourself what purpose or use case you want to support with Big Data technology. This approach may seem correct, but unfortunately it has a number of pitfalls.
First of all, by building a platform only for specific needs and specific use cases, you will most likely choose a very limited product that only mimics some of the features of a full-blown implementation. Examples might be classical, old-fashioned analytical platforms such as a Data Warehouse, statistical tools or even a plain old relational database. This will be sufficient for implementing your use case, but as soon as you try to reuse it for another use case, you will run into its limitations. In particular, you need to decide the structure of the stored data before you start collecting it, you need to transform the data to adapt it to each new use case, and you face scale-up issues every time the data volume increases and your Data Warehouse or relational database is unable to keep up with the volume and velocity of the data. In other words, you will severely limit your flexibility and your ability to explore your data.
A solution implemented for specific needs is in practice not really a Big Data solution, although your vendor may insist on calling it one; it is just a Small Data solution. It may still be a viable choice for your business as long as you do not have bigger ambitions or expectations for the future. By introducing more and more solutions like this, however, you will ultimately fragment and disperse your business data across multiple loosely connected systems. The more fragmentation there is, the more difficult it gets to analyze data across your business.
Building a generic platform
Building a generic platform is much harder, but might be the right thing to do. It takes courage, though, to build a solution and start collecting data, often without an adequate use case to begin with. This is often difficult to advocate for; it is a leap of faith, a bet that your business needs to take. However, if you really want to unleash the power of Big Data, this is the strategy that can give you the flexibility to explore your data, conduct experiments, and find new facts, information and ways to use them for your business. This kind of platform, based on open Big Data technology like Hadoop, will also be easier to scale when needed to process increasing volumes and velocity of data.
The second very basic question you will face is where to deploy and establish your platform: in the cloud or on-premise? Although this question may seem unrelated, it is important to be aware of the implications of choosing the right deployment strategy.
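To make the flexibility argument more concrete, here is a minimal Python sketch of the idea of collecting data first and deciding on its structure later, when it is read. It is only an illustration under assumptions: the event records, the field names and the read_with_schema helper are all hypothetical, and a real platform would keep the raw data in a distributed store rather than in a Python list.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"user": "anna", "action": "purchase", "amount": 120.0}',
    '{"user": "ben", "action": "view", "page": "/pricing"}',
    '{"user": "anna", "action": "view", "page": "/home", "device": "mobile"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep only the requested fields,
    using None where an event does not carry a field."""
    for line in raw_lines:
        event = json.loads(line)
        yield {name: event.get(name) for name in fields}


# Use case 1: revenue per user; it only needs "user" and "amount".
revenue = {}
for row in read_with_schema(RAW_EVENTS, ["user", "amount"]):
    if row["amount"] is not None:
        revenue[row["user"]] = revenue.get(row["user"], 0.0) + row["amount"]

# Use case 2, added later: page views per device.
# The stored data is untouched; only the read-time schema changes.
views_by_device = {}
for row in read_with_schema(RAW_EVENTS, ["action", "device"]):
    if row["action"] == "view":
        key = row["device"] or "unknown"
        views_by_device[key] = views_by_device.get(key, 0) + 1

print(revenue)
print(views_by_device)
```

The second use case did not require reloading or restructuring the stored events, which is the kind of reuse a solution built around one fixed structure tends to make difficult.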
On-premise platform
Choosing an on-premise platform seems like a natural choice for many established businesses with established, in-house IT operations. However, as soon as you choose to build a generic platform, you will quickly realize that you need to experiment, since the number of different Big Data stacks, technologies and tools is enormous. You need to be able to switch quickly from one solution to another without too much lead time and waste. It may be hard to change the platform once you have invested heavily in an expensive proprietary on-premise platform like Oracle Big Data Appliance or IBM BigInsights. An on-premise platform also requires people with a rather specific skill set to maintain it.
Cloud platform
A cloud-based Big Data platform such as Amazon EMR, Google Cloud Platform or Microsoft Azure provides the flexibility and agility that are crucial when you start experimenting with Big Data. If you want to focus on what matters most, you will concentrate on the core of your business. Setting up hardware, installing Hadoop and running the basic Big Data infrastructure is not what most businesses need to focus on or should prioritize.
The cloud platform is especially relevant in the first, exploration phase, when you are still unsure what to use the technology for. After that phase, when your solution has stabilized, you may reconsider bringing the operation of your Big Data technologies in-house; in most cases, however, you will still want to keep the flexibility of the cloud.
Summary
All in all, the best strategy is a platform that is open and flexible enough to cover future use cases. Do not build your Big Data solution just for current needs. This is one of the cases where you actually need to concentrate more on technology and capabilities, and not only on current, short-term business needs.
This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.