Big Data in the cloud – avoiding cloud lock-in

In our previous article we looked at different approaches to introducing Big Data technology in your business, either as a generic or a specific solution, deployed on premise or in the cloud. The cloud offers very good flexibility for experimenting in the early stages, when you need quick iterations of trying and failing before you find the use case and solution that fit your business needs best.

Photo: Pexels

Cloud lock-in

However, during your try-and-fail iterations you need to take care not to fall into another pitfall: cloud vendor lock-in, or simply cloud lock-in. By cloud lock-in we mean using vendor-specific implementations that only a particular cloud supplier provides. Good examples are Amazon Kinesis and Google BigQuery. Using such specialized functionality may seem a quick way to implement and deliver business value. However, if your cloud provider chooses to phase out support for that functionality, your business may be forced to reimplement part or all of the system that depends on it. A good strategy against lock-in is particularly important for established businesses. For startups with a relatively thin software stack it is less of a concern, since switching costs are usually still low.

Open source to the rescue

Open source software has a strong track record of providing good solutions that reduce vendor lock-in, and it has helped fight vendor lock-in for decades. Within operating systems in particular, Linux has played an important role. Taking this into the Big Data world, it does not take long to see that automation, and open source automation tools in particular, plays an important role in avoiding cloud lock-in. This could, for instance, be achieved by deploying and running the same complete Big Data stack on premise and in the cloud.
Using automation tools like Chef, Puppet or Ansible Tower is one strategy for avoiding vendor lock-in and moving quickly between cloud providers. Container technologies like Docker or OpenShift Containers also make it possible to deploy the same Big Data stack, whether it is Hortonworks, Cloudera or MapR, across different cloud providers, making it easier to swap providers or even use multiple cloud setups in parallel to diversify operational risk.
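
To make the idea a bit more concrete, here is a minimal sketch in Python of what "the same stack everywhere" can look like in practice. It does not use Chef, Puppet, Ansible or Docker APIs. It only illustrates the principle that the stack description stays provider-neutral while provider-specific settings live in one thin layer. All service names, image names, target names and machine types are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    image: str     # container image, identical on every target
    replicas: int


# Provider-neutral description of the stack (hypothetical names and images).
STACK = [
    Service("namenode", "example/hadoop-namenode:1.0", 1),
    Service("datanode", "example/hadoop-datanode:1.0", 3),
]

# The only provider-specific part: a thin settings layer per target.
# Target names and machine types are made up for this sketch.
TARGETS = {
    "on-premise": {"machine_type": "rack-node"},
    "cloud-a": {"machine_type": "a.large"},
    "cloud-b": {"machine_type": "b-standard-4"},
}


def render_plan(target: str) -> list[dict]:
    """Combine the shared stack with the settings of one target."""
    settings = TARGETS[target]
    return [
        {"service": s.name, "image": s.image, "replicas": s.replicas, **settings}
        for s in STACK
    ]


if __name__ == "__main__":
    # Print the deployment plan for every target.
    for target in TARGETS:
        for step in render_plan(target):
            print(target, step)
```

Switching provider in this sketch means adding or changing one entry in TARGETS, while the stack itself is untouched. A real setup would express the same split in the automation or container tool of your choice.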

What about Open Source lock-in?

Listening to Paul Cormier at Red Hat Forum 2016 (Oslo) last week, one could quickly get the impression that cloud lock-in can be avoided simply by promoting Open Source tools like Ansible Tower or OpenShift Containers. These solutions effectively help turn the IaaS and PaaS resources offered by the Big Three (Amazon, Google and Microsoft), as well as other cloud providers, into a commodity. On the other hand, critics of Open Source could say that by using this kind of solution you actually get into another kind of lock-in. However, the immense success of Open Source software over the last 15 years shows that lock-in to an Open Source system is at most hypothetical. It is easy to find a similar alternative or, in the absolute worst case, to maintain the software yourself. Open Source, by its very nature of being open, brings down the barriers that protect competitive advantage, since new ideas and features can easily be copied by anyone, anywhere, in almost no time.
Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.
