For context, Erlang is designed for building highly reliable, fault-tolerant systems: things like telecom switches that may run for many years without rebooting, powering down, or failing. "Fail," like "error" and "fault," has a specific technical meaning when talking about such systems. A fault causes the system to enter a problematic state; that problematic state is an error; and if the problematic state causes the system not to perform its intended function, then the system fails.
What the article is getting at is that adding an error handler that's not in the specification leaves the system in an unspecified state [or transitions the system into a different unspecified state, depending on how you look at it].
Well, I guess there's another piece of context: it's probably best to treat systems as algebraically closed, in that combining two systems produces another system, and so forth [1]. This means that when a specific system crashes (and "crashing" is not a technical term in this context) the larger system may be designed to handle the error. With proper encapsulation, there's no way for the crashed system to "know" what's going on above it or what's best... without a specification.
In other words, the assumption for such systems is that if a system is supposed to handle an error, then the specification would say so. High-reliability, fault-tolerant systems aren't built with cowboy coding. The software expresses the design, and while some systems are such that failing means failing to delight someone with a piano-playing cat, other systems are such that when they fail, 911 calls are missed and people die. In those cases, the author is suggesting: when you don't know, don't guess.
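To make the "the specification would say so" point concrete, here is a toy sketch in plain Python (not Erlang's actual supervisor API; the class names and restart policy are invented for illustration). The worker refuses to guess about an unspecified input and fails loudly; the enclosing "supervisor" is the level whose specification — its restart policy — says what to do about that.

```python
class Worker:
    def run(self, job):
        if job < 0:
            # Unspecified state: don't guess, just fail loudly.
            raise ValueError(f"no specification for job {job}")
        return job * 2


class Supervisor:
    """The restart policy lives here, at the level that has context."""

    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts

    def run(self, jobs):
        results, restarts = [], 0
        worker = Worker()
        for job in jobs:
            try:
                results.append(worker.run(job))
            except ValueError:
                restarts += 1
                if restarts > self.max_restarts:
                    raise  # escalate: this level's spec is exhausted too
                worker = Worker()  # restart with a fresh worker

        return results


print(Supervisor().run([1, -5, 3]))  # -> [2, 6]
```

The point of the sketch is where the decision lives: the worker emits only "I have an error," and whether that means retry, restart, or escalate is written down one level up.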
[1]: 52. Systems have sub-systems and sub-systems have sub-systems and so on ad infinitum - which is why we're always starting over. --Alan J. Perlis
Yes, we agree that adding unspecified handling leaves the system in an unspecified state. It's also good thinking that every system is a sub-system of another system, so at some point a crash is meaningful: it tells the higher-level system that something went wrong, and the higher-level system should know that so it can start to handle the error.
The problem space I'd like to add is the scenario in which the higher-level system is a human.
In a sense, humans are like old code: they do odd things, are unable to specify their expectations, and sometimes prioritize expectations that are utterly wrong. But we as smart programmers, who want to succeed with our programs, cannot be satisfied with "scientifically correct" code. We need to produce code that humans use. Therefore we cannot always crash when entering unspecified states. That's not "cowboy coding"; that's handling the imperfection of the higher-level system. Sometimes the additional error handler is good, because the crash itself is worse than an undefined state.
The idea is not to crash the entire program, but the part of it that has gotten itself into a bad state. Look at the Cowboy web server for instance: it doesn't just fall over in its entirety if a request handler barfs for some reason.
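The Cowboy behavior described above can be sketched in plain Python (this is an illustration of the isolation idea, not Cowboy's actual Erlang implementation; the handler and server names are invented). Each request runs inside its own error boundary, so one handler "barfing" becomes a 500 for that one request while the server keeps serving:

```python
def flaky_handler(request):
    # Simulates a request handler that crashes.
    raise RuntimeError("handler crashed on: " + request)


def hello_handler(request):
    return (200, "hello, " + request)


def serve(handler, request):
    """Error boundary around a single request, not the whole server."""
    try:
        return handler(request)
    except Exception as exc:
        # Only this request dies; callers of serve() carry on.
        return (500, f"internal error: {exc}")


# The second request succeeds even though the first handler crashed.
responses = [serve(flaky_handler, "a"), serve(hello_handler, "b")]
print(responses)
```

The part that crashes is exactly the part that got itself into a bad state — the request — and the surrounding system is designed to survive it.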
You'd want the UI not to crash, and to show a nice error message instead.
Okay, but that's an exception and exception handling, not crashing. Crashing means the whole program dies. I can understand if the term "crash" is defined differently in the Erlang world, but I don't understand why Erlang citizens don't know that the rest of the world calls that an "exception," and that "exception handling" is implemented in many high-level languages by default, doing exactly what the article explains you should be doing. I mean, Erlang people also know other languages, right? I think there are more Erlang experts who know C++ than C++ experts who know Erlang.
A stack trace passed to a Word user is exactly the problem. The error was handled at the lower-level system, which includes a stack, rather than at the higher-level composite system that involves people. If the low-level error hadn't been handled locally, a more user-friendly level of the system could have presented the information about the failure in a more user-friendly way.
Because the kernel-code cowboys ignored the UX city slickers, the system failure is, in a sense, double. Not only does it stop letting the user type words into boxes, but it also fails by presenting the user with a bunch of useless gobbledygook. It would have been better if they had asked the people working on the higher-level abstractions, "Should we handle <this type> of error, and if so, how?" Or, with wonderfully non-technical ambiguity, "What should we do when we crash?"
If the error is handled locally, there's no chance that a higher level can handle it more gracefully. What causes a low-level system programmer to freak out may be nothing if the higher-level system is redundant with respect to the low-level system. A system shouldn't assume it knows what is catastrophic in the large. A message stating "I have an error" is the place to start.
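A minimal Python sketch of that redundancy point (the sensor scenario and all names here are invented for illustration): the low-level reader reports "I have an error" with what it knows and nothing more; the higher level, which knows it has a backup, decides that one failed sensor is no catastrophe at all.

```python
class SensorError(Exception):
    """Minimal upward signal: what failed, not what to do about it."""


def read_sensor(sensor_id, raw):
    if raw is None:
        # Don't decide locally that this is catastrophic;
        # report upward with context and let the caller judge.
        raise SensorError(f"sensor {sensor_id} returned no data")
    return raw / 10  # scale the raw reading


def read_redundant(readings):
    """Higher level: a failed sensor is nothing if another one works."""
    for sensor_id, raw in readings.items():
        try:
            return read_sensor(sensor_id, raw)
        except SensorError:
            continue  # this level's spec says: try the next sensor
    raise SensorError("all sensors failed")  # escalate further up


print(read_redundant({"a": None, "b": 420}))  # -> 42.0
```

Had `read_sensor` logged the error and returned some default on its own, the caller could never distinguish "sensor a is dead, use sensor b" from a real reading — handling it locally destroys the higher level's chance to handle it better.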