The architect’s dilemma: high, low, or no-code?

Photo: Craiyon

For decades IT architects and developers have dreamed of building fully configurable software that does not require coding. As far back as the 1980s, we heard and dreamt about fifth-generation programming languages and model-based approaches to programming.

The recent arrival of concepts like Composable Business (see bit.ly/3GoJ0O8), which emphasizes implementing business support functions from modular, configurable building blocks, has only accelerated these trends. At the same time, new practices and tools such as automation, cloud, DevOps, agile development, advanced IDEs (Integrated Development Environments), and, last but not least, generative AI (ChatGPT and similar) have made coding much faster and more efficient than it was just a decade ago.

The development of no-code and low-code platforms has only accelerated since then, but the high-code approach has been improved in parallel, and low-code and no-code still have a long way to go before they can replace high-code completely. The question remains: what should you choose, the classical high-code approach, low-code, or no-code? Which is better? As usual, the answer is not trivial. It all depends.

Let’s start clarifying what we mean by low-code/no-code and high-code approaches:

  • Low-code (and no-code) refers to software development platforms that allow users to create applications with minimal or no coding. These platforms often have a visual interface that enables users to build applications by dragging and dropping pre-built components or using predefined templates. Low-code platforms are designed to make it easy for non-technical users or business analysts to create applications without specialized programming knowledge. Examples include ServiceNow, Microsoft Dynamics 365, Salesforce, Microsoft Power Platform, Appian, and similar solutions.
  • High-code, on the other hand, refers to traditional software development approaches that require extensive coding in a programming language. High-code development typically involves writing code from scratch and using libraries and frameworks to build applications. It requires a strong understanding of programming concepts and a higher level of technical expertise. Examples are all the major programming languages, development stacks, and tools: Python, Java, C#, C++, IntelliJ, Eclipse, and thousands of associated tools for development and testing.

As with most areas of human activity, development in IT is a cyclical process. Every now and then a reevaluation happens: techniques and practices considered bleeding edge and modern must give way to approaches once regarded as outdated. Configurable off-the-shelf solutions, whether best-of-suite or best-of-breed, were the obvious no-code/low-code choice just a decade ago. However, a lot has changed around building and implementing IT solutions in the last decade, and the high-code approach has also been improved.

These improvements – such as Agile, CI/CD (continuous integration and continuous deployment), containerization, cloud, IaC (Infrastructure as Code), and DevOps – reduce the unit costs of IT systems development and increase fault tolerance. We can now create high-quality software in small increments that often provide measurable business value, and we can roll back a faulty change at little cost. Modern, ever-shortening business cycles favor fast point solutions, and all of modern IT engineering – from the cloud and production processes to architecture – is suited to such work.

For us architects, this is a world of increasing challenges, but it is now often more profitable to build custom software than to implement ready-made “combine harvesters” that try to do everything. It is often cheaper to change custom software than to buy a configurable solution and maintain the configuration of that solution.

The economics of no-code/low-code platforms is unforgiving. Each configuration option adds an extra dimension of complexity to the solution that must be paid for today, whether or not we ever use it. Compared to a tailored solution built for actual needs and requirements, a configurable system contains a number of economically unnecessary and meaningless functions. Maintaining it also adds an extra dimension of complexity to every analysis, test, and implementation.

Is there, therefore, any point in using no-code/low-code with all these drawbacks?

Low-code and no-code cannot replace coding completely. However, there are still several cases where no-code/low-code platforms are the best fit. These include:

  • Non-technical users or business analysts can use low-code platforms to build applications that automate business processes or improve workflows. This concept is known as citizen development, and it can help organizations quickly respond to changing business needs without relying on IT teams.
  • We can use low-code platforms to build simple applications quickly, which is helpful where rapid application development is needed, for example in startup environments or for small businesses.
  • Low-code platforms allow users to quickly build and test prototypes of an application without the need for extensive coding. Such low-code prototyping can help evaluate an idea’s feasibility or gather feedback from stakeholders.

Low-code and no-code are particularly well suited for implementing standard functionality that is not specific to a particular business domain. It makes sense to use them for systems and platforms supporting functions like sales/CRM (Customer Relationship Management), HR (Human Resources), logistics, IT support, and IT infrastructure. These areas often require a low level of customization. All kinds of enterprise systems, such as CRM, ERP (Enterprise Resource Planning), ITSM (IT Service Management), integration platforms, and basic infrastructure, are often where low-code and no-code are optimal. The same applies to systems and functions that are well standardized and used by several actors in the same industry, e.g.:

  • OSS (Operations Support Systems)
  • BSS (Business Support Systems) solutions
  • OT (Operational Technology) systems for energy actors, travel planner systems for mobility actors
  • and so on.

On the other hand, functionality that is specific to your business and for which no standard solutions exist is often a target for tailored, high-code implementation. Depending on the complexity and scale of the project, low-code and no-code platforms may not be suitable for large-scale or performance-critical applications, as they may not be able to handle the data volumes or processing requirements. High-code approaches may be ideal for these projects, as they allow more flexibility and control in the development process.

Low-code and no-code platforms may be suitable for organizations that do not have access to or cannot afford specialized programming resources, as they allow non-technical users to create applications without coding knowledge. On the other hand, high-code approaches may require a more significant investment in training and development resources, as they require a strong understanding of programming concepts.

Low-code and no-code platforms may also offer limited customization options and may not be able to integrate with other systems or technologies as seamlessly as high-code approaches. This can be a significant drawback for organizations that need to integrate their applications with other systems or have specific customization requirements.

Finally, low-code and no-code platforms may struggle to keep up with rapid technological change and may become outdated or unsupported. High-code approaches may offer more flexibility and adaptability, allowing developers to customize and update their solutions precisely as required.

To summarize: the choice between low-code/no-code and high-code approaches depends on the specific needs and resources of the organization, as well as the complexity and scale of the project. Low-code and no-code platforms are well suited for prototyping and testing ideas, for creating simple applications quickly, and for non-technical users, but they are less useful for complex or highly customized projects that require specialized programming skills. High-code approaches may be ideal for such projects, as they offer more flexibility and control in the development process; however, they require a more significant investment in training and development resources and may not suit organizations without access to specialized programming resources. There is no single answer to the question of what to choose. As usual, it all depends. In most organizations and most cases, however, we will see low-code/no-code and high-code solutions side by side. Many organizations can reduce the use of high-code, tailored solutions to a minimum; driving it all the way to zero may be the dream, but it should never be a goal in itself.

Views expressed are my own.

Capitalizing on the value of your data: get your basics in place first!

It has been a while since The Economist proclaimed that “data is the new oil” following the tremendous surge in profits of FAMGA – Facebook, Apple, Microsoft, Google, and Amazon. Businesses in all kinds of industries, from utilities to retail, followed and embarked on this new trend: they started hoarding vast amounts of data, strengthening their analytical teams, and looking for use cases that make it possible to extract value from data. As it turns out, however, this is not an easy task, especially for companies that are not typical IT companies.

Photo: Shutterstock.com

It does not take long to realize that insights are never better than the underlying data. It is slowly becoming obvious how crucial it is to have sufficient control over data quality and information governance in place.

But first things first: before you can improve data quality, you need to understand what data quality means. Data quality is not a single-dimensional feature. It is a broad term, often described by a number of dimensions (see e.g. the six dimensions of data quality or a data quality worksheet):

  • completeness – data must be as complete as possible (close to 100%)
  • consistency/integrity – there should be no differences in the dataset when comparing two different representations of the same object
  • uniqueness – avoiding duplication of data
  • timeliness – whether information is available when it is expected and needed
  • validity/conformity – data is valid if it conforms to the syntax (format, type, range) of its definition
  • accuracy – how well the data set represents the real world
  • traceability – whether it is possible to track the origin of the data and its changes

You will need to work with all of these dimensions. It isn’t enough to improve the completeness of the data if the data does not conform to the expected format.
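As a minimal illustration (a sketch, not tied to any particular tool), the following Scala snippet computes simple metrics for three of the dimensions – completeness, uniqueness, and validity – over a small, hypothetical set of customer records; the Customer type, the sample data, and the e-mail rule are assumptions made for the example:

// Sketch: measuring a few data quality dimensions over hypothetical records.
case class Customer(id: String, email: Option[String], country: Option[String])

object DataQualityChecks extends App {
  val records = Seq(
    Customer("1", Some("a@example.com"), Some("NO")),
    Customer("2", None, Some("SE")),                  // incomplete record
    Customer("1", Some("a@example.com"), Some("NO")), // duplicate id
    Customer("3", Some("not-an-email"), None)         // invalid e-mail, incomplete
  )

  // completeness: share of records with all fields populated
  val completeness =
    records.count(r => r.email.isDefined && r.country.isDefined).toDouble / records.size

  // uniqueness: share of distinct ids among all records
  val uniqueness = records.map(_.id).distinct.size.toDouble / records.size

  // validity: share of e-mails conforming to a simple syntax rule
  val emailPattern = """^[^@\s]+@[^@\s]+\.[^@\s]+$""".r
  val emails = records.flatMap(_.email)
  val validity = emails.count(e => emailPattern.findFirstIn(e).isDefined).toDouble / emails.size

  println(f"completeness=$completeness%.2f uniqueness=$uniqueness%.2f validity=$validity%.2f")
}

In practice, such checks would run continuously against the real data sources, and the thresholds for acceptable values would be part of the governance principles discussed below.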

Moreover, there are profound implications for your organization. Although data quality is something the whole organization should care about, in practice the focus tends to land on the teams that actively use the data, depend on its quality, and suffer most when it is poor. These are usually the customer-facing channels, user-facing interfaces, or data-warehouse teams, which are the first to observe and detect data quality issues. This is especially the case when an information governance framework is missing or poorly implemented.

Consequently, an information governance framework becomes crucial to ensure sufficient control over data and data quality. Such a framework includes both a set of information governance principles, established and supported by the organization, and new roles to enforce them. Moreover, there is often a need to establish or strengthen the data culture: a focus on data quality and the right mindset, so that quality issues are corrected at their origin and not where they manifest themselves.

Photo: Shutterstock.com

The information governance organization itself can operate under a number of different models, for example:

  • IT driven
    IT takes care of everything: storage, processing, and the processes that secure high quality, structuring, and cataloguing of data
  • business driven
    IT only provides the storage infrastructure; the business is in charge of the processes that secure data quality
  • hybrid model
    IT driven in some domains, business driven where it makes most sense; probably the most pragmatic approach

The process of improving your information architecture and information governance framework is not that complicated, but it requires some effort and a huge amount of patience, as it is primarily an organizational and cultural change.
In order to improve the information governance framework, and as a result improve data quality, you will need to get through at least the following steps:

  1. Get an overview of the information architecture and create/improve data models
    You need to know the current state: what the most central information entities are, and how information is modeled, used, and transferred between different parts of your organization.
  2. Get an overview of the pain points in data quality
    You need to know the actual data-related issues your organization currently experiences. Without proper insight, you cannot improve data quality. Talk to the business and to people across the organization to gain enough insight into, and understanding of, the most critical data-related issues they deal with.
  3. Create an initial set of governance principles
    Establish the initial governance framework, first of all by creating and describing a set of principles for Information Architecture and Enterprise Information Architecture, as well as principles for data analytics and advanced analytics. Get sufficient backing in the organization.
  4. Adjust the organization, creating new roles and responsibilities, including information owners, information stewards, data stewards, data scientists, and other roles (see e.g. the IBM Redbook on IA governance)
  5. Finally, consider and introduce new tools and technologies for managing the information
    Depending on the results of the previous steps and the needs of your organization, you may need to consider new tools for better control of your master and reference data. The most obvious one is a Master Data Management (MDM) system. An MDM system makes it possible to reduce manual operations on master data, coordinate master data between different systems and keep it aligned, and detect any deviation from the data model.

Although it is very tempting to jump in and start implementing new, exciting use cases for AI/Machine Learning, the actual value of this technology is completely dependent on the underlying data quality and other aspects of the information architecture. Data quality and proper information governance are crucial, foundational aspects. Without them, the vast amounts of data that you spend so much effort gathering become not oil, but garbage with little value.

ArchiMate 3.0 – a modern modeling language for the digital age

Many IT architects are not big fans of modeling languages, model-driven development, and modeling tools. MS PowerPoint, Visio, or other drawing tools are far too often used as a surrogate for a more structured approach. However, communicating ideas clearly is crucial for an IT architect: not everything can easily be explained with words, and PowerPoint drawings are often too ambiguous. Creating comprehensive diagrams and models that clearly express ideas remains crucial for IT architects and developers, both for communication within development teams and for efficient communication with other parties, including business stakeholders.

Photo: Shutterstock.com

For enterprise architects, there was for a long time no good alternative to UML. UML is good for low-level software modeling, in particular application architecture, but it is far less useful when communicating with the business. BPMN existed, but it mainly covered process-related modeling and did not cover all the needs related to modeling strategy, tactics, or the wider business architecture.

This was the situation until the arrival of ArchiMate, adopted by The Open Group in 2009. Based on IEEE 1471 and originally developed by a consortium that included ABN AMRO, ArchiMate defines three main layers: Business, Application, and Technology:

  • The Business layer describes business processes, services, functions, and events, as well as the products and services offered to external customers
  • The Application layer describes application services and components
  • The Technology layer describes hardware, communication infrastructure, and system software

Those three layers provide a structured way of bridging the different perspectives from business to technology and infrastructure.

However, the full ArchiMate 3.0 model also brings or enhances another three very useful layers:

  • Strategy and Motivation layers – introduced or enhanced in ArchiMate 3.0 (2016) for modeling the capabilities of an organization and helping to explain the impact of changes on the business (giving a better connection between strategic and tactical planning)
  • Implementation and Migration layer – supports modeling related to project, portfolio, or program management
  • Physical layer – for modeling physical assets like factories

These last three layers are crucial for properly bridging the world of business with software and technology. In that sense, ArchiMate brings a new quality to modeling languages. ArchiMate 3.0 is also closely aligned with TOGAF 9.1, which makes it even more suitable as a state-of-the-art modeling language.

A simple example of Strategy and Motivation layer modeling

Summing up, ArchiMate 3.0 brings several new capabilities and qualities to modeling, which makes it a great tool for the digital age, where we are not only supposed to model the software and technology itself, but where it becomes increasingly important to link business models, strategy, and tactics to the actual business processes, and finally to applications and technology.

Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.

Let it crash

Although the functional programming paradigm has become more broadly recognized, and interest in functional languages (Scala, F#, Erlang, Elixir, Haskell, Clojure, Mathematica, and many others) has increased rapidly over the last few years, it still remains far from the position held by mainstream languages like Java and .NET. Functional languages are predominantly declarative and based on the principles of avoiding mutable state and eliminating side effects. Several of these languages and frameworks, like Scala/Akka and Erlang/OTP, also provide a new approach to handling concurrency: avoiding shared state and promoting messages/events as the means of communication and coordination between processes. As a consequence, they also provide frameworks based on actors and lightweight processes.
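To make the contrast concrete, here is a tiny Scala sketch of that style (the order amounts and the VAT rate are purely illustrative assumptions): data is immutable, and results are produced by pure functions instead of by mutating shared state.

// Sketch: no mutable state, no side effects, data transformed by pure functions.
object FunctionalStyle extends App {
  val orders = Seq(120.0, 45.5, 99.9)

  // Instead of mutating an accumulator, fold over the immutable sequence.
  val total: Double = orders.foldLeft(0.0)(_ + _)

  // A pure function: the output depends only on the input, no shared state is touched.
  def withVat(amount: Double, rate: Double = 0.25): Double = amount * (1 + rate)

  println(f"total with VAT = ${withVat(total)}%.2f")
}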

Fail-fast, on the other hand, is an important system design paradigm that helps avoid flawed processing in mission-critical systems. Fail-fast makes it easier to find the root cause of a failure, but it also requires that the system is built in a fault-tolerant way and is able to recover from failures automatically.

Fail-fast combined with lightweight processes brings us to the “Let it crash” paradigm, which takes fail-fast even further. A “Let it crash” system is not only built to detect and handle errors and exceptions early, but also built on the assumption that the main flow of processing is the one that really counts and the only one that should be fully implemented and handled. There is little point in programming defensively, i.e. attempting to identify all possible fault scenarios upfront. As a programmer, you only need to focus on the most probable scenarios and the most likely exceptional flows. Any other hypothetical flows are not worth spending time on and should lead to a crash and recovery instead. “Let it crash” focuses on the functionality first and in this way supports modern Lean and Agile development paradigms very well.

As Joe Armstrong states in his Ph.D. thesis: if you can’t do what you want to do, die; do not program defensively, program offensively and “let it crash”. Instead of trying to cover all possible fault scenarios – just “let it crash”.

Photo: Pexels

However, recovery from a fault always takes some time (e.g. seconds or even minutes), and not all languages and systems are designed to handle this kind of behavior; in particular, “Let it crash” is hard to achieve in C++ or Java. The recovery needs to be fast and unnoticed by the processes that are not directly involved in it. This is where functional languages and actor frameworks come into the picture. Platforms like Scala/Akka or Erlang/OTP promote the actor model, making it possible to handle many thousands of lightweight processes on a single machine, as opposed to hundreds of OS processes. Thousands of lightweight processes make it possible to isolate the processing related to a single user or subscriber of the system. It is thus cheaper to let a process crash, and it recovers faster as well.

“Let it crash” is also naturally easier to implement in a dynamically typed language such as Erlang. The main reason is error handling and how hard it is to redesign exception handling once it is implemented. Typed languages can be quite constraining when combined with the “Let it crash” paradigm; in particular, it is rather hard to change an unchecked exception into a checked exception, and vice versa, once you have designed your Java classes.

Finally, “Let it crash” also implies that a sufficient recovery framework exists. In particular, Erlang and OTP (Open Telecom Platform) provide the concept of a supervisor and various recovery strategies for whole process trees. This kind of framework makes implementing “Let it crash” much simpler by providing a foolproof, out-of-the-box recovery scheme for your system.
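As a minimal sketch of what this looks like in practice, here is a supervised actor written against the classic Akka API in Scala (the Worker, the Supervisor, and the “boom” message are illustrative assumptions, not a recipe from the thesis): the worker simply throws on unexpected input, and the supervisor’s restart strategy performs the recovery.

// Sketch of "Let it crash" with classic Akka actors: the worker crashes on the
// hypothetical "boom" message; the supervisor restarts only the failed child.
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

class Worker extends Actor {
  def receive = {
    case "boom" => throw new IllegalStateException("unexpected input") // let it crash
    case msg    => println(s"processing $msg")
  }
}

class Supervisor extends Actor {
  // Recovery policy: restart the crashed child, leave its siblings untouched.
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: Exception => Restart
    }
  private val worker = context.actorOf(Props[Worker](), "worker")
  def receive = { case msg => worker forward msg }
}

object LetItCrashDemo extends App {
  val system = ActorSystem("demo")
  val supervisor = system.actorOf(Props[Supervisor](), "supervisor")
  supervisor ! "hello"
  supervisor ! "boom"        // the worker dies here and is restarted
  supervisor ! "hello again" // processing continues after the recovery
  Thread.sleep(500)
  system.terminate()
}

Erlang/OTP supervisors follow the same idea, with restart strategies such as one_for_one and one_for_all declared for whole process trees.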

There are also other benefits of the “Let it crash” approach. As each end user or subscriber of your system is now represented by a single process, you can easily adopt advanced models such as finite state machines. Although not specific to Erlang or Scala, finite state machines are quite useful for understanding what led to a failure once your system fails. Finite state machines combined with a “Let it crash” framework can be very effective for fault analysis and fault correction.

Although very powerful and sophisticated, “Let it crash” has unfortunately not yet gained much attention outside Scala/Akka and Erlang/OTP. The reasons are many: on the one hand the very specific and tough requirements on programming languages and platforms (as explained above), and on the other hand the fact that only mission-critical systems really require this level of fault tolerance. For classic, less critical business systems, the fault tolerance requirements are not significant enough to justify the use of a niche technology like Erlang or Scala/Akka.

“Perfect is the enemy of good,” and mainstream languages like Java or .NET win the game again, even though they are inferior when it comes to fault tolerance and support for the “Let it crash” approach.

Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.

Big Data solution – generic or specific, cloud or on-premise?

As Big Data becomes more and more popular and more and more options become available, selecting Big Data technology for your business can become a real headache. The number of different stacks and tools is huge, ranging from pure Hadoop and Hortonworks to more proprietary solutions from Microsoft, IBM, or Google. As if this were not enough, you will also need to choose between an on-premise installation and a cloud solution, and the number of proprietary solutions keeps growing at a huge rate. Here we sum up a few strategies for introducing Big Data in your business.

One of the first questions you will face when looking into the possibilities of using Big Data in your business is whether you should build a generic platform or a solution for specific needs.

Photo: Vasin Lee/Shutterstock.com

Building for specific needs

In many businesses, if you follow internal processes and project frameworks, you will intuitively ask yourself what purpose or use case you want to support with Big Data technology. This approach may seem correct, but unfortunately there are a number of pitfalls here.

First of all, by building a platform only for specific needs and specific use cases, you will most likely choose a very limited product which only mimics some of the features of a full-blown implementation. Examples might be classical, old-fashioned analytical platforms such as a data warehouse, statistical tools, or even a plain old relational database. This will be sufficient for implementing your use case, but as soon as you try to reuse it for another use case, you will run into the limitations: you need to decide the structure of the stored data before you start collecting it, you need to transform it to adapt it to the new use case, and you face scale-up issues every time the data volume increases and your data warehouse or relational database is unable to keep up with the volume and velocity of the data. In other words, you will largely limit your flexibility and your ability to explore the data.

A solution implemented for specific needs is in practice not really a Big Data solution, even if your vendor insists on calling it Big Data; it is just a Small Data solution. It may still be a viable choice for your business as long as you have no bigger ambitions or expectations for the future. But by introducing more and more solutions like this, you will ultimately fragment and disperse your business data into multiple loosely connected systems. The more fragmentation there is, the more difficult it becomes to analyze data across your business.

Build a generic platform

Building a generic platform is much harder, but it might be the right thing to do. It requires courage to build a solution and start collecting data, often without an adequate use case to begin with. This is often difficult to advocate for; it is a leap of faith, or a bet, that your business needs to take. However, if you really want to unleash the power of Big Data, this is the strategy that can give you both the flexibility to explore your data and the ability to conduct experiments and find new facts, information, and ways to use it for your business. A platform based on open Big Data technology like Hadoop will also be easier to scale when needed to process increasing volumes and velocity of data.
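As a small, hypothetical illustration of that flexibility, the Spark snippet below reads raw JSON events exactly as they were collected (schema-on-read) and answers a new question without any upfront remodeling; the file name events.json and the country field are assumptions made for the example.

// Sketch of schema-on-read exploration with Apache Spark.
import org.apache.spark.sql.SparkSession

object ExploreRawEvents extends App {
  val spark = SparkSession.builder()
    .appName("explore-raw-events")
    .master("local[*]")   // local mode for the sketch; a real cluster in practice
    .getOrCreate()

  // Raw data collected without deciding the schema upfront.
  val events = spark.read.json("events.json")

  events.printSchema()    // the structure is inferred at read time

  // A new question can be asked without reloading or remodeling the data.
  events.groupBy("country").count().show()

  spark.stop()
}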

The second very basic question you will face is where to deploy and run your platform: cloud or on-premise? Although this question may seem unrelated to the previous one, it is important to be aware of the implications of choosing the right deployment strategy.

On-premise platform

An on-premise platform seems like a natural choice for many established businesses with established in-house IT operations. However, as soon as you choose to build a generic platform, you will quickly realize that you need to experiment, since the number of different Big Data stacks, technologies, and tools is extreme. You need to be able to switch quickly from one solution to another without too much lead time and waste. That is hard once you have invested heavily in an expensive proprietary on-premise platform like Oracle Big Data Appliance or IBM BigInsights. Maintaining such a platform also requires people with a rather specific skill set.

Cloud platform

Cloud-based Big Data platforms like Amazon EMR, Google Cloud Platform, or Microsoft Azure provide the flexibility and agility that are crucial when you start experimenting with Big Data. If you want to focus your business on what matters most, you will concentrate on its core. Setting up hardware, installing Hadoop, and running basic Big Data infrastructure is not what most businesses need to focus on or should prioritize.

A cloud platform is especially relevant in the first, exploratory phase, when you are still unsure what to use the technology for. After this phase, when your solution has stabilized, you may reconsider insourcing the operation of your Big Data technologies; in most cases, however, you will want to keep the flexibility of the cloud.

Summary

All in all, the best strategy is a platform that is open and flexible enough to cover future cases; do not build your Big Data solution just for current needs. This is one of the cases where you actually need to concentrate more on technology and capabilities, not only on current, short-term business needs.

Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.

Big Data – quick overview

If you do not have time to dig into all possible variations of Big Data technologies, here is a quick (yet far from complete) overview of Big Data technologies, summarizing on-premise and cloud solutions.

Photo: is am are/Shutterstock.com

Main On-premise Big Data distributions

Hortonworks

Hortonworks was established in 2011 and is the only distribution that uses pure Apache Hadoop without any proprietary tools and components. Hortonworks is also the only purely open-source project of the three distributions.

Cloudera

Cloudera was one of the first Hadoop distributions, established in 2008. Cloudera is based to a large extent on open-source components, but not to the same degree as Hortonworks. Cloudera is easier to install and use than Hortonworks. The most important difference from Hortonworks is the proprietary management stack.

MapR
MapR replaces the HDFS file system with the proprietary MapR-FS, which gives better robustness and redundancy and largely simplifies use. It is most likely the on-premise distribution that offers the best performance, redundancy, and user friendliness. MapR offers extensive documentation, courses, and other materials.
 
Comparison of the most important Hadoop distributions (based on the “Hadoop Buyer’s Guide”):

Data access
  • SQL – Hortonworks: Hive; Cloudera: Impala; MapR: MapR-DB, Hive, Impala, Drill, SparkSQL
  • NoSQL – Hortonworks: HBase, Accumulo, Phoenix; Cloudera: HBase; MapR: HBase
  • Scripting – Hortonworks: Pig; Cloudera: Pig; MapR: Pig
  • Batch – Hortonworks: MapReduce, Spark, Hive; Cloudera: MapReduce, Spark, Pig; MapR: MapReduce
  • Search – Hortonworks: Solr; Cloudera: Solr; MapR: Solr
  • Graph/ML – Hortonworks: n/a; Cloudera: GraphX, MLlib; MapR: Mahout
  • RDBMS – Hortonworks: n/a; Cloudera: Kudu; MapR: MySQL
  • File system access – Hortonworks: limited, non-standard NFS; Cloudera: limited, non-standard NFS; MapR: HDFS, read/write NFS (POSIX)
  • Authentication – Hortonworks: Kerberos; Cloudera: Kerberos; MapR: Kerberos and native
  • Streaming – Hortonworks: Storm; Cloudera: Spark; MapR: Storm, Spark, MapR Streams

Ingestion
  • Ingestion – Hortonworks: Sqoop, Flume, Kafka; Cloudera: Sqoop, Flume, Kafka; MapR: Sqoop, Flume

Operations
  • Scheduling – Hortonworks: Oozie; Cloudera: n/a; MapR: Oozie
  • Data lifecycle – Hortonworks: Falcon, Atlas; Cloudera: Cloudera Navigator; MapR: n/a
  • Resource management – Hortonworks: n/a; Cloudera: YARN; MapR: YARN
  • Coordination – Hortonworks: ZooKeeper; Cloudera: n/a; MapR: ZooKeeper, Sahara, Myriad

Security
  • Security – Hortonworks: n/a; Cloudera: Sentry, RecordService; MapR: Sentry, RecordService

Performance
  • Data ingestion – Hortonworks: batch; Cloudera: batch; MapR: batch and streaming (write)
  • Metadata architecture – Hortonworks: centralized; Cloudera: centralized; MapR: distributed

Redundancy
  • HA – Hortonworks: survives a single fault; Cloudera: survives a single fault; MapR: survives multiple faults (self-healing)
  • MapReduce HA – Hortonworks: restart of jobs; Cloudera: restart of jobs; MapR: continuous, without restart
  • Upgrades – Hortonworks: with planned downtime; Cloudera: rolling upgrades; MapR: rolling upgrades
  • Replication – Hortonworks: data only; Cloudera: data only; MapR: data and metadata
  • Snapshots – Hortonworks: consistent for closed files; Cloudera: consistent for closed files; MapR: consistent for all files and tables
  • Disaster recovery – Hortonworks: none; Cloudera: scheduled file copy; MapR: data mirroring

Management
  • Tools – Hortonworks: Ambari, Cloudbreak; Cloudera: Cloudera Manager; MapR: MapR Control System
  • Heat maps and alarms – Hortonworks: supported; Cloudera: supported; MapR: supported
  • REST API – Hortonworks: supported; Cloudera: supported; MapR: supported
  • Data and job placement – Hortonworks: none; Cloudera: none; MapR: yes

Other on-premise solutions

Oracle Cloudera

Oracle Cloudera is a joint solution from Oracle and Cloudera: Oracle bases its Big Data platform on the Cloudera distribution. It offers some additional, useful tools and solutions that increase performance, in particular Oracle Big Data Appliance, Oracle Big Data Discovery, Oracle NoSQL Database, and Oracle R Enterprise.

Oracle Big Data Appliance is an integrated hardware and software Big Data solution running on a platform based on Oracle Engineered Systems (like Exadata). Oracle adds the Big Data Discovery visualization tools on top of Cloudera/Hadoop, while Oracle R Enterprise includes R, an open-source, advanced statistical analysis tool.

IBM BigInsights
IBM BigInsights for Apache Hadoop is a solution from IBM that also builds on top of Hadoop. In addition to Hadoop, BigInsights offers proprietary analysis tools such as BigSQL, BigSheets, and BigInsights Data Scientist, which includes BigR.
IBM BigInsights for Hadoop also offers the BigInsights Enterprise Management solution and the IBM Spectrum Scale FPO file system as an alternative to HDFS.

Cloud solutions

Amazon EMR

Amazon EMR (Elastic MapReduce) is a Hadoop distribution put together by Amazon and running in the Amazon cloud. Amazon EMR is easier to adopt than on-premise Hadoop. Amazon is by far the biggest cloud provider, but when it comes to Big Data its solution is relatively new compared to Google's.

Google Cloud Platform
Google also offers Big Data cloud services. The most popular are BigQuery (an SQL-like database), Cloud Dataflow (a processing framework), and Cloud Dataproc (Spark and Hadoop services). Google has been working on Big Data technologies for a long time, which gives it a good starting point when it comes to advanced Big Data tools. GCP offers good analysis and visualization tools, as well as an advanced platform for testing solutions (Cloud Datalab).
Microsoft Azure
Microsoft offers three different cloud solutions based on Azure: HDInsight, HDP for Windows, and the Microsoft Analytics Platform System.
 
Comparison of the most important Big Data cloud solutions (AWS = Amazon Web Services, GCP = Google Cloud Platform, Azure = Microsoft Azure HDInsight):

Data access
  • File system storage – AWS: Hadoop; GCP: Cloud Storage; Azure: n/a
  • NoSQL – AWS: HBase; GCP: Cloud Bigtable; Azure: HBase
  • SQL – AWS: Hive, Hue, Presto; GCP: BigQuery, Cloud SQL; Azure: Hive
  • RDBMS – AWS: Phoenix; GCP: Cloud SQL; Azure: n/a
  • Batch – AWS: Pig, Spark; GCP: Cloud Dataflow; Azure: MapReduce, Pig, Spark
  • Streaming – AWS: Spark; GCP: Cloud Pub/Sub; Azure: Storm, Spark
  • Scripting – AWS: n/a; GCP: n/a; Azure: Pig
  • Search – AWS: n/a; GCP: n/a; Azure: Solr

Ingestion
  • Ingestion – AWS: Sqoop; GCP: Cloud Dataflow; Azure: n/a

Visualisation
  • Visualisation – AWS: n/a; GCP: Cloud Datalab; Azure: n/a

Analytics
  • Machine learning – AWS: Mahout; GCP: Cloud Machine Learning, Speech API, Natural Language API, Translate API, Vision API; Azure: R Server, Azure Machine Learning

Operations
  • Logging – AWS: n/a; GCP: Logging, Error Reporting, Trace; Azure: n/a
  • Coordination – AWS: ZooKeeper; GCP: n/a; Azure: n/a
  • Scheduling – AWS: Oozie; GCP: n/a; Azure: n/a
  • Resource management – AWS: HCatalog, Tez; GCP: Cloud Console, Cloud Resource Manager; Azure: n/a
  • Monitoring – AWS: Ganglia; GCP: Monitoring; Azure: n/a
Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.