!['Modern Operating System' looks as Ford-T in XXI century 'Modern Operating System' looks as Ford-T in XXI century](http://ithare.com/wp-content/uploads/BB_part208_ModernOS_v3-1-640x427.png)
Dude, that’s so 90’s it shits neon windbreakers.
— Urban Dictionary —
![BB_emotion_0027b.png](http://ithare.com/wp-content/uploads/BB_emotion_0027b.png)
BTW, Multics/VAX and later Unix/Windows were extremely good systems for their time, but there is little doubt that quite a few major developments have happened in IT since 60s, both on hardware and software fronts, and probably even more importantly – in the way how we’re using computers in XXI century.
It might mean that some of the premises upon which existing operating systems were designed, no longer apply – and that on the other hand, new considerations might have emerged. In other words – it might be a good idea to take a fresh look at the operating system design (without blindly accepting the way how existing operating systems are built), and see whether current approaches are still optimal.
Of course, as any attempt to disrupt current way of thinking, this article is going to be attacked – and this is perfectly fine; still, while attacking, please try to refrain from the arguments along the lines of “hey, what you’re proposing is different from what is done for last 50 years, so it MUST be wrong”, and “it is obviously impossible to achieve what you’re saying; it is like proposing a perpetuum mobile, so it MUST be wrong without reading the rest” (the latter usually comes because of thinking in terms of existing OS designs, where many of such things are indeed impossible).
What Has Changed Over Last 50 Years?
As noted above, a Damn Lot
![™ ™](https://s.w.org/images/core/emoji/11.2.0/72x72/2122.png)
Changes in the Way We’re Using Computers
The first high-level category of changes is about ways we’re using our computers:
- For such interactive systems (which cover a vast majority of all the real-world MCUs/CPUs), processing incoming events (usually coming in an unspecified order, just like it happens in the real world) becomes an extremely important task. Moreover, the whole concept of “thread of execution” (as in “thread which runs indefinitely”) becomes much less important for interactive programs than for HPC-like purely-computing tasks (if all we have is short event handlers – we don’t really need long-running “threads”, more on it below). Just as one example – all the network protocols I know are described in terms of state machines which handle events – and not in terms of indefinitely-running threads of execution.
- Emergence of Microprocessors and Single-Purpose Boxes. Before the advent of microprocessors in 1970s (we can argue whether it was i8008 which changed the game, or i8080, but for our purposes it doesn’t really matter), everybody was sitting in front of a mainframe terminal, with only a few of such mainframes in existence (as Sir Charles Darwin, head of NPL, has said – “as it is very possible that … one machine would suffice to solve all the problems that are demanded of it from the whole country”); as a result, that mainframe just had to be shared. These days, we have MCUs/CPUs in our cars and fridges, not mentioning phones, watches, etc. etc. etc. This switch from mainframes to microprocessors has two additional important implications which are largely ignored by currently existing OS designs:
- in many real-world cases, preemptive scheduling (which is essentially another form of protection against ill-behaving apps) is no longer a real requirement.
- NB: on the other hand, multi-app deployment scenarios (such as on phones and desktops) are still widely used, so giving up on kernel protection entirely is not feasible.
- Popularization of Networking and Appearance of Distributed Systems. As noted above, back in the times when our existing operating systems were designed, there were very few computers in existence – with many people sharing one computer. In contrast, these days, whole networks of computers are routinely used to perform what can be seen one single task (either computational one, or interactive one).
- This allows to have lots of stuff done – but surprisingly OS-level support for such distributed things is extremely limited (and distributed OS’s are not really used in practice).
- Proliferation of Open-Source Software. By Multics/VAX times in 60s-70s, public domain software (the one coming from academic research) became insufficient – and all the software was on the way of becoming more and more proprietary. However, by 80s-90s several different open-source initiatives have started – and lots of high-quality open-source software was developed.
- From our current perspective, open source makes source-level (API-level) compatibility perfectly acceptable for lots of use cases (opposed to binary ABI-level compatibility required for closed-source software). In other words, shipping programs in the source code (or at least sending them to a 3rd-party compiler) in many cases doesn’t qualify as a showstopper anymore.
Changes in Hardware
Now, let’s take a look at the changes which have happened in hardware:
- Performance Improvements. Since the 60s, vast improvements in CPU speed and memory size (and more generally – storage size) have occurred. These improvements mean that we can afford many of the things which we weren’t able to afford back in 1970. As one example, RAM size has grown from 1M in Multics times (that’s for a multi-million-dollar mainframe(!) and was considered huge at the time) to over 10 Gigabytes right on my desktop (that’s a 10’000x improvement in RAM size with the price dropping another 10’000x).
- In practice, it means that these days, we can quite easily afford to store all the inputs of most desktop apps for several hours even in RAM. This, in turn, enables some highly-desirable features such as recording/replay of production failures to enable post-mortem analysis, as well as fault tolerance (more on it below).
- In addition, importance of the size of the program has diminished greatly. Sure, there are still odd stories of the program code not fitting into caches – but they’re very few and far between. On the other hand, dynamic branch predictors (see below) DO benefit from the same code used in different contexts, being compiled to different addresses in memory.
- Runtime Indirections Became Expensive. In spite of all the components of computers being greatly improved since the 60s, performance improvements weren’t even for different components. As one all-important example (sometimes referred to as “memory gap”): back in 70-80s a typical register-register operation was 1 to 4 CPU cycles, while RAM access time was around 2-4 CPU cycles [Knuth][i8080]; in general, we could say that back at that time, RAM access costs were roughly comparable to R-R ops. In 2019, if speaking of desktops/servers/phones (though not about MCUs) a simple R-R op takes <1 CPU cycle (on average, of course), but non-cached RAM access can easily take 100+ CPU cycles (mind you, it is still faster than RAM access 50 years ago, it is just that R-R improved much more significantly since then). In other words, for quite a few systems out there, RAM access became 100x more expensive than an R-R op – which is a major change from the performance point of view compared to K&R times. This, in turn, undermines certain programming paradigms which are widely used while programming existing operating systems; in particular,
- Thread Context Switches Became Enormously Expensive. To battle the same “memory gap”, multi-level caches were introduced to higher-end CPUs; these days for a CPU supporting multi-gigabyte DRAM, it is common to have three levels of cache. This, in turn, has an extremely important implication: costs of re-populating caches after the thread context switch went through the roof. It is often mentioned that the cost of a thread context switch is around 2000 CPU cycles – however, this is only an explicit cost of the context switch (saving registers, etc.), which doesn’t account for the cost of re-populating caches. And if we take into account these latter costs – we’ll find out that thread context switch can cost up to a million(!) CPU cycles[LiEtAl]– which is a Damn Lot
- Shared-Nothing architectures are to be strongly preferred to Shared-Memory ones, both at app and kernel levels.
- Cooperative multitasking (and Run-to-Completion a.k.a. RTC), where applicable, is preferable to pre-emption. Indeed, if we switch context only when the context doesn’t exist anymore, there is no need to re-populate anything (of course, current caches will still be discarded – but with no need to get them back into caches and to incur relevant costs).
- Advent of NUMA. In 1990s-2000s, NUMA systems were introduced, and quickly demonstrated the superiority of NUMA to FSB-based pure-SMP architectures.
- This means that Shared-Nothing systems tend to have an inherent performance advantage over Shared-Memory ones – even before we take synchronization costs into account. Exceptions do exist but are rather few and far between.
- Interrupts Improvements. Since the 60s, interrupt-based CPUs became ubiquitous (and cost of interrupt – really low). While the concept of an I/O interrupt was introduced in the early 60s, it wasn’t seriously considered in Multics-inspired designs (as the quote above clearly indicates, Multics philosophy was to allow “major processing of information in a… leisurely way”, treating interrupts as a mere annoyance to be handled by some other (non-Multics) systems). Since Multics times, interrupts were made more and more efficient (with interrupt coalescing and DMA being probably the most important things) – but they still feel like stepchildren in the overall OS design (just as one example, in [TanenbaumBos] interrupts are discussed in Chapter 5 – that’s after processes, threads, memory management, and even (drumroll!) file system).
- Longer Pipelines, and Branch Predictors, including Dynamic Branch Predictors. While primitive pipelines were known since times of Ancient Programmers, pipelines started to be more and more sophisticated since about 80s; while in 60s typical pipeline took 3 stages, currently it is more like 8-14 stages. This has increased costs of branch misprediction – which in turn has led to the development of sophisticated branch predictors, including dynamic branch predictors.
- One of the implications of dynamic branch prediction is that often it is good to have the same line of code, when used in two different contexts, to compile into two different instructions, so that dynamic branch predictor can gather different stats for these different instructions. This tends to play very nicely with support for greatly increased program memory sizes.
Changes in Software
Last but certainly not least – let’s take a look at the changes on the software front:
- Static Analysis. Over last 50 years, static analysis tools have improved drastically – currently allowing to prove certain safety aspects of the real-world program; examples include Rust, D1179 [Sutter], and [memory-safe-cpp].
- This opens the door to the revival of language-based protection systems (either as a replacement of current hardware-based protections – which makes it feasible for MCUs, or as an addition to hardware-based protection – improving security).
- Public Crypto/PKI. In the 70s, public cryptography was invented – which has many benefits, including an ability to sign executable on one box and validate the signature on any number of other boxes.
- This, in turn, enables performing language-based checks on one box only – while shipping signed executables to any number of other boxes.
- Support for Asynchronous Programming. Closely related to the previous item, and in probably the most recent development out of all those listed above, support for asynchronous programming was drastically improved in programming languages; in particular, since 2015 await operator has made it into a wide range of different programming languages (ranging from JS and Python to C# and C++).
- This enables reasonably intuitive thread-agnostic programming for interactive programs; BTW, it was argued in [NoBugs] that await-based programming is the best way (surpassing both multi-threaded programming and previous forms of async programming) to write stateful programs whenever an intervening event has to be processed while an operation is outstanding.
To Be Continued…
![BB_emotion_0001b.png](http://ithare.com/wp-content/uploads/BB_emotion_0001b.png)
References
[TanenbaumBos] Andrew S. Tanenbaum, Herbert Bos, “Modern Operating Systems, 4th Edition”, Pearson, 2015
[CorbatóVyssotsky] F. J. Corbató, V. A. Vyssotsky, “Introduction and Overview of the Multics System”
[Knuth] Donald E. Knuth, “The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd Edition”, Addison-Wesley Professional, 1997
[i8080] “Intel 8080 instruction set”
[LiEtAl] Chuanpeng Li, Chen Ding, Kai Shen, “Quantifying The Cost of Context Switch”
[Sutter] Herb Sutter, “Lifetime safety: Preventing common dangling”
[memory-safe-cpp] “memory-safe-cpp”
[GoBlog] “Share Memory By Communicating”
[NoBugs] ‘No Bugs’ Hare, “Eight Ways to Handle Non-Blocking Returns in Message-Passing Programs”, CPPCON2017
Acknowledgement
Cartoons by Sergey Gordeev
![IRL IRL](http://ithare.com/wp-content/uploads/irl-link.png)
P.S.
Don't like this post? Criticize↯
P.P.S.
We've tried to optimize our feed for viewing in your RSS viewer. However, our pages are quite complicated, so if you see any glitches when viewing this page in your RSS viewer, please refer to our original page.