Let it crash

As the functional programming paradigm becomes more broadly recognized, interest in functional languages (Scala, F#, Erlang, Elixir, Haskell, Clojure, Mathematica and many others) has increased rapidly over the last few years, yet these languages still remain far from the position held by mainstream platforms like Java and .NET. Functional languages are predominantly declarative and based on the principles of avoiding mutable state and eliminating side effects. Several of these languages and frameworks, such as Scala/Akka and Erlang/OTP, also provide a different approach to concurrency: they avoid shared state and promote messaging and events as the means of communication and coordination between processes. As a consequence, they also provide frameworks based on actors and lightweight processes.

Fail-fast, on the other hand, is an important system design paradigm that helps avoid flawed processing in mission-critical systems. Fail-fast makes it easier to find the root cause of a failure, but it also requires that the system be built in a fault-tolerant way and be able to recover from the failure automatically.

Fail-fast combined with lightweight processes brings us to the “Let it crash” paradigm, which takes fail-fast even further. A “Let it crash” system is not only built to detect and handle errors and exceptions early; it is also built on the assumption that the main processing flow is the one that really counts and the only one that must be fully implemented and handled. There is little purpose in programming defensively, i.e. in attempting to identify all possible fault scenarios upfront. As a programmer, you only need to focus on the most probable scenarios and the most likely exceptional flows. Any other hypothetical flow is not worth spending time on and should lead to a crash and recovery instead. “Let it crash” puts functionality first, and in this way fits well with modern Lean and Agile development practices.
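To make the contrast with defensive programming concrete, here is a minimal sketch in Python. The function name parse_charge and the message field amount_cents are hypothetical and chosen only for illustration. The code implements the main flow plus one likely exceptional flow, and deliberately leaves everything else to crash.

```python
def parse_charge(message: dict) -> int:
    """Return the charge amount in cents from an incoming message.

    Only the expected flow is implemented. There are no defensive checks:
    a missing key raises KeyError and a non-numeric value raises ValueError.
    Recovering from such a crash is assumed to be the job of whoever
    supervises this code, not of the function itself.
    """
    amount = int(message["amount_cents"])  # hypothetical field name
    if amount <= 0:
        # A likely, well-understood exceptional flow is still handled explicitly.
        raise ValueError(f"non-positive amount: {amount}")
    return amount
```

The function stays short because it does not try to enumerate every way the input could be wrong. This only pays off if something outside the function notices the crash and restarts the work.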

As Joe Armstrong states in his Ph.D. thesis: if you can’t do what you want to do, die. You should not program defensively; program offensively and “Let it crash”. Instead of trying to cover all possible fault scenarios, just “Let it crash”.


However, recovery from a fault always takes some time (seconds or even minutes). Not all languages and systems are designed to handle this kind of behavior. In particular, “Let it crash” is hard to achieve in C++ or Java. The recovery needs to be fast and go unnoticed by the processes that are not directly involved in it. This is where functional languages and actor frameworks come into the picture. Platforms like Scala/Akka and Erlang/OTP promote the actor model, making it possible to run many thousands of lightweight processes on a single machine, as opposed to hundreds of OS processes. With thousands of lightweight processes, the processing related to a single user or subscriber of the system can be isolated in its own process. Letting such a process crash is therefore cheaper, and it recovers faster as well.
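The isolation idea can be sketched in Python with asyncio tasks standing in for lightweight processes. This is only an approximation: asyncio tasks share memory and a single thread, so they do not give the isolation guarantees of Erlang processes or Akka actors. The names serve_user and serve_all and the shape of the workload are assumptions made for this example.

```python
import asyncio


async def serve_user(user_id: str, requests: list) -> int:
    """Process one user's requests; malformed input simply crashes this task."""
    total = 0
    for raw in requests:
        total += int(raw)  # raises ValueError on malformed input
        await asyncio.sleep(0)  # yield control, as a real handler would on I/O
    return total


async def serve_all(workload: dict) -> dict:
    """Run one task per user and collect either a result or the crash."""
    tasks = {
        user_id: asyncio.create_task(serve_user(user_id, requests))
        for user_id, requests in workload.items()
    }
    # return_exceptions=True keeps one user's crash from cancelling the others.
    outcomes = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return dict(zip(tasks.keys(), outcomes))
```

Calling asyncio.run(serve_all({"alice": ["1", "2"], "bob": ["oops"]})) yields a normal result for alice and a captured ValueError for bob. Only the faulty user's processing is lost.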

“Let it crash” is also naturally easier to implement in a dynamically typed language (e.g. Erlang). The main reason is error handling, and how hard it is to redesign the handling of exceptions once it is implemented. Statically typed languages can be quite constraining when combined with the “Let it crash” paradigm. In particular, it is rather hard to change an unchecked exception into a checked exception, and vice versa, once you have designed your Java class.

Finally, “Let it crash” also implies that a sufficient framework for recovery exists. In particular, Erlang and OTP (Open Telecom Platform) provide the concept of a supervisor, along with various restart strategies for recovering whole process trees. This kind of framework makes implementing “Let it crash” much simpler by providing a ready-made, out-of-the-box recovery scheme for your system.
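The supervisor idea can be illustrated with a very small sketch in Python. It is not OTP's supervisor and does not reproduce its restart strategies or process trees. It only shows the core loop: run a worker, restart it from scratch when it crashes, and give up once a restart budget is exhausted. The names supervise, RestartLimitExceeded and max_restarts are hypothetical.

```python
from typing import Callable, List


class RestartLimitExceeded(Exception):
    """Raised when a worker keeps crashing beyond the allowed restart budget."""


def supervise(worker: Callable[[], str], max_restarts: int = 3) -> str:
    """Run `worker`, restarting it from scratch whenever it crashes.

    The worker contains no recovery logic of its own. After more than
    `max_restarts` crashes the supervisor gives up and escalates.
    """
    crashes: List[Exception] = []
    while True:
        try:
            return worker()
        except Exception as exc:
            crashes.append(exc)
            if len(crashes) > max_restarts:
                raise RestartLimitExceeded(
                    f"gave up after {len(crashes)} crashes"
                ) from exc
            # Otherwise fall through and start the worker again.
```

A worker that fails on a transient fault and succeeds on a later attempt returns its result as if nothing had happened. A worker that always fails is escalated to the caller, much as a supervisor in a process tree would escalate to its own parent.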

There are also other benefits of the “Let it crash” approach. Because each end user or subscriber of your system is now represented by a single process, you can easily adopt more advanced models such as finite state machines. Although not specific to Erlang or Scala, finite state machines are quite useful for understanding what has led to a failure once your system fails. Finite state machines combined with a “Let it crash” framework can potentially be very effective for fault analysis and fault correction.
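A per-subscriber finite state machine might look like the following sketch in Python. The states, events and the SubscriberSession class are invented for illustration (a simplified phone call). Only the expected transitions are listed, and any other combination of state and event crashes with a KeyError. The recorded history then shows which state and event led to the failure.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RINGING = "ringing"
    CONNECTED = "connected"


# Only the expected transitions are listed; anything else is a crash.
TRANSITIONS = {
    (State.IDLE, "dial"): State.RINGING,
    (State.RINGING, "answer"): State.CONNECTED,
    (State.RINGING, "hang_up"): State.IDLE,
    (State.CONNECTED, "hang_up"): State.IDLE,
}


class SubscriberSession:
    """State machine for a single subscriber (hypothetical example)."""

    def __init__(self, subscriber_id: str) -> None:
        self.subscriber_id = subscriber_id
        self.state = State.IDLE
        self.history = []  # (state, event) pairs, kept for fault analysis

    def handle(self, event: str) -> None:
        # Record the event first so the history includes the one that crashed.
        self.history.append((self.state, event))
        # An unexpected (state, event) pair raises KeyError: let it crash.
        self.state = TRANSITIONS[(self.state, event)]
```

When a session crashes, the last entry in history names the exact state and event that had no defined transition, which is usually a good starting point for fault analysis.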

Although very powerful and sophisticated, “Let it crash” has unfortunately not yet gained much attention outside Scala/Akka and Erlang/OTP. The reasons are many. On one side (as explained above) there are the very specific and tough requirements it places on programming languages and platforms. On the other, only mission-critical systems really require this level of fault tolerance. In classic, less critical business systems, the fault tolerance requirements are not significant enough to justify the use of a niche technology like Erlang or Scala/Akka.

“Perfect is the enemy of good”, and mainstream platforms like Java or .NET win the game again, even though they are inferior when it comes to fault tolerance and supporting the “Let it crash” approach.

Creative Commons License

This work excluding photos is licensed under a Creative Commons Attribution 4.0 International License.