Last modified: 2025-03-13.

Tout croire ou douter de tout sont deux façons différentes d'être également superficiel.

[To believe everything or to doubt everything are two different ways of being equally superficial.]

Henri Poincaré in La science et l'hypothèse (the citation is not verbatim: I [TL] have "coined" it a little; but this is the spirit)

The usefulness of a computer (here, in the sense of some entity able to compute and appearing, to the user, as a single unit) has to be measured by its signal-to-noise ratio: how much real work is done compared to the noise, that is, what it costs, the power it consumes, the time it takes to achieve a useful result, the amount of work needed to administer it, and so on.

A first approach was time sharing: using the time otherwise wasted waiting for some input to do something else; or, instead of having a queue of tasks each waiting for the previous ones to be completely finished, allowing tasks to be partially executed by allocating slices of processing time to different tasks, giving the illusion that they are running at the same time (they are not).

This is a 1 to N approach: one processing unit with N tasks. But there is also the N to 1 problem: a task that could benefit from N subtasks done concurrently to achieve 1 global result. How does one allocate, if needed, N processing units to one task?

When trying to make N processing units work concurrently (at the very same time; no "time sharing") on 1 task, the problem is the communication between the units.

The advent of numerous very tightly coupled cores is changing the means, but should certainly also change the view of the problems. So does the advent of auxiliary processors such as GPUs: even very common hardware is now multicore and inherently NUMA.

Solutions have been designed, including in HPC, for hardware that was different from what is now available. But to believe we know the answers for sure is probably foolish: it is not even proven that we really know the questions...

A different solution, related to cores: application cores, was designed and tried a little more than ten years ago. It was called NIX and was based on Plan9.

The time is ripe to re-explore this way, perhaps leading to new questions and new ways. This is what this document tries, partially, to record.

  1. About the document
    1. Organisation of the document
    2. Typographical conventions
  2. Stating the problem
    1. Timesharing cost
    2. Time is ripe: affordable hardware exists already
  3. Some historical notes
    1. Fran. J. Ballesteros' notes on Nix
    2. History of the 9k kernel
  4. Getting the NIX sources
  5. A tour
    1. Quick tips for now (2025-03-13)
    2. TODO
    3. The organisation of the sources
  6. Various bibliographic references
Nix: a schema

About the document

At present, the editor/writer of the present document is Thierry Laronde (who can be referred to as [TL]). But it is hoped that others will contribute. What follows in this section should also serve as a contribution guide.

I [TL] am not a native English speaker. The document is in some sort of English that one may call "international English": it is ruled by all the faults that non-native speakers agree upon and hence have no difficulty understanding (when two French people insist on speaking "English" together they are generally very proud of their fluency in "English", since they have no difficulty understanding one another). Native English speakers hence have either to adapt, or to send patches :-^

The present document is released under the CC BY 4.0 license, meaning, roughly (but you should always read the license), that this material can be used with the sole restriction that the creator is appropriately credited (the following is an extract; block quotations are rendered this way):

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

The attribution obligation is applied in this very document: besides original content, I [TL] have gathered explanations, discussions etc. found particularly on the 9fans mailing list, and I give attribution to the original author.

Attribution, in this sense, will mean that someone citing or reproducing part of this document should give credit to the creator of the original: if I cite X for some part of the text, X shall be cited, not me—unless one wants to be extra careful about the quotation or the attribution, not knowing if my citation is accurate or not, and in this case this would be the classical "X as cited in [this document] by [the editor]".

Unless stated otherwise, a citation was taken from the 9fans mailing list. If it comes from another source, that source should be stated (I'm not perfect; I may miss some things...).

The license is valid only for this document, meaning this entry point and the sub-parts linked here and found on this very server under the nix-os subdirectory. External resources have and keep their own license and retain whole responsibility for their contents.

Organisation of the document

A course is not a reference or text book: it is a path chosen to present things with some logical thread, with the opportunity, at different moments of the journey, to look at the very same thing but from a different point of view.

This entry document is the main path. But we may wander in the vicinity or we may desire to rest at some special place. These will be put in sub-documents, linked here (and placed under the nix-os subdirectory on this server). So the journey is not a single line, but a graph.

There are two main threads that come in handy to link things together: history and code. These will be used alternately or concurrently.

Typographical conventions

When adding a short clarification (expanding nicknames, for example), the addition is inserted inline in the text, between [square brackets].

A more lengthy remark, reaction... is put in this fashion.

Stating the problem

Timesharing cost

[Paul A. Lalonde] One of the critical tasks of a modern operating system is timesharing: splitting the compute resource among multiple processes, often running at varying priorities. In particular, much of the housekeeping work of the operating system happens on the same cores that are being used for throughput computation. These housekeeping tasks cause perturbation to the running processes. These in turn, though minor on a single compute resource, scale into terrible waste when applied to computing clusters (see F. Petrini, D. J. Kerbyson and S. Pakin, "The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q").
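
To give a feeling for how a perturbation that is minor on one machine becomes a large waste on a cluster, here is a minimal sketch. It is not taken from the cited paper: it assumes a simple model in which an application proceeds in synchronized steps, each node is independently disturbed by housekeeping during a step with probability p_node, and a step is delayed as soon as any one node is disturbed. The value 0.001 is an arbitrary assumption; 8192 is the processor count from the paper's title. The function name is hypothetical.

```python
def delayed_step_probability(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is disturbed in a step.

    Assumption (not from the source): disturbances are independent and
    every node waits for the slowest one at the end of each step.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be a probability")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # A step runs undisturbed only if every single node is undisturbed.
    return 1.0 - (1.0 - p_node) ** nodes


# Arbitrary, assumed per-node disturbance probability for one step.
P_NODE = 0.001

for nodes in (1, 64, 1024, 8192):
    print(f"{nodes:5d} nodes: {delayed_step_probability(P_NODE, nodes):.4f}")
```

Under these assumptions, a disturbance that hits a given node once in a thousand steps delays almost every step once 8192 nodes have to wait for one another.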

Time is ripe: affordable hardware exists already

Products based on RISC-V have been announced, such as Esperanto's 1000 general-purpose 64-bit RISC-V cores; others are based on ARM, such as Ampere's 32 to 128 cores, or up to 192 cores.

Even low-end SoCs are, nowadays, multicore (there are generally at least two cores) and NUMA, because of the GPU. What was very expensive to get is now affordable even for an individual with limited financial means.

Skip Tavakkolian suggested this:

For anyone that might be looking for noisy, power-hungry, but inexpensive x86_64 machines with lots of cores and memory to play around with NIX, you may want to check recyclers like UNIXSurplus, REPC, 3RTechnology, etc.

As an example, about a year ago I bought a Supermicro 1U server from UNIXSurplus (2x Xeon E5-2667 – 16 cores total, 128GB RAM) for $260. It is running linux and I use it as a qemu farm to test and try out different builds and flavors of Plan 9 without messing around with my stable environment. I suspect the prices reflect the fact that some are unpatchable for Meltdown and Spectre vulnerabilities.

But for a few hundred dollars or euros, you can get new hardware with 4 or 8 cores.

Adventures in9 has made a video presenting PXE booting of a Beelink EQR6 mini-PC.

WIP A presentation of the installation of 9front on an AM08PRO, a mini-PC with an 8-core AMD Ryzen 9 (very recent; 399 euros, VAT included) will be added as a leaf document soon.
The hardware comes with a UEFI AMI BIOS (but without a UEFI shell installed), allowing UEFI booting and, in particular, UEFI PXE booting.
This hardware will serve to present tips about UEFI, partitioning in Plan9/9front, installing from a USB key and installing with PXE booting.

Some historical notes

Ron Minnich has provided these commentaries and notes about the NIX history:

Basically, in 2011 or so, we had wrapped up the Blue Gene work, the last Blue Gene systems having been shipped, and jmk [Jim McKie] and I were thinking about what to do; there was still DOE money left. We decided to revive the k10 work from 2005 or so. We had stopped the k10 work in 2006, when Fred Johnson, DOE program manager of FAST-OS, asked the FAST-OS researchers to start focusing on the upcoming petaflop HPC systems, which were not going to be x86 clusters, and (so long ago!) were not going to run Linux. So we went full circle: DOE funded Plan 9 on k10, then we shifted gears in 2006 to Blue Gene (Power PC), then in 2011, it was back to ... K10.

I wrote a note summarizing what jmk and I came up with and sent it out to the lsub folks and jmk on April 21. A lot of it is not what we did :-) but the core idea, of application cores, we did do.

So, below, the note, showing the core idea in April 2011. The "in May" reference is lsub's kind offer to fly me out and give me a place to stay for May 2011. The result was, for me, a chance to work with and learn from very smart researchers! The 'we' mentioned in the note is jmk and me. Note that the idea is very unformed. By the time I got there Nemo [Francisco J Ballesteros] had figured out the core operation of switching between AC and TC, and Charles [Forsyth] had convinced me that, since we're running on a shared memory machine, we might want to take advantage of that fact.

I pushed hard on having only 2M pages, which we later continued on Harvey. A standard HPC noise benchmark (https://github.com/rminnich/ftq) showed this worked very well. Nemo came up with a very nice idea; once the break got above 1G, just use 1G pages, b/c only 1 or 2 programs would need it, but we'd save lots of page table pages in doing this. It worked well.

The README of ftq has this: Plan 9 support is deprecated because nobody cared, and the default Plan 9 timers still suck.

Has some work been done in this area? What is needed?

In retrospect, it's not clear that just having 2M pages is a good idea. 4K seems clearly too small, but 64K seems a better size all around.

What was really incredible was just how little of the kernel we had to change to get it to work, and just how quickly we had the basic system (about 2 weeks). And it worked so well. We made a minor change to exec and to rc to make it trivially easy to schedule processes on an AC.

Finally, why did something like this not ever happen? Because GPUs came along a few years later and that's where all the parallelism in HPC is nowadays. NIX was a nice idea, but it did not survive in the GPU era.

Here is the original note:

I think we came to a good conclusion about what to do in May.

The idea is to base our work on the Plan 9 k8 port for the 9k kernel. I would like to explore the concept of application cores. An application core is a core that only runs user mode programs. Right now there are lots of questions about how to do this, but application cores are seen as a next step in manycore systems. In a system of N^2 cores, vendors are telling me that something like N of them will be able to run a kernel, and that N^2-N will not be able to. Application cores save power, heat, money, and die space.

The idea is that to prototype we can run a full-up Plan 9 kernel on core 0, then have a driver (/dev/apcore) with clone file (/dev/apcore/clone) that a process can open to gain access to a core. The application core process can be assembled by writing to a ctl file to do such operations as allocating and writing memory, setting registers, etc., then launched via write to the ctl file. The application core process talks to the kernel via typed ipc channels such as we have today – it will look kind of like an ioproc but the channels will be highly optimized like the ones in Barrelfish.

All the models we need for this mode exist in Plan 9.

Here's what's neat. You can have application cores, but we can also have core 0 running a traditional time-shared kernel (Plan 9 in this case). That way, if you run out of application cores, the traditional time-shared model is there on core 0. I think this hybrid model is going to be very powerful.

So I will be booting the 9k/k8 kernel to make sure I know how. That way, when I get there, we can get a quick start.

thanks

Ron Minnich later added the following:

Reading old emails is always interesting. Turns out I was in discussions with a company building a CPU that was a very good fit to NIX. I was trying to get that company to ship a research system to lsub.

They were initially very agreeable but, finally, stopped talking about Plan 9 on their system.

What stopped them? The Lucent Public License.

That license blocked a number of potential partnerships with companies back then. I am glad Nokia helped us resolve that problem.

P.S. that CPU never went very far. The GPU tsunami killed it.

Fran. J. Ballesteros' notes on Nix

Fran. J. Ballesteros [nemo] has also written notes and a presentation of the Nix work. The document, worth reading, is here: https://lsub.org/nix/ (the page has references too, which are also given in the present document's bibliography).

History of the 9k kernel

Some of the work done for Nix was incorporated back in Jim McKie's 9k kernel that was used as a starting point to 10k for Nix.

David du Colombier gave these notes and link to the history of the kernel:

Roughly speaking, Nix is based on a version of Jim McKie's 9k from early 2011.

Jim McKie later retrieved some improvements from Nix into his own 9k tree, especially since there were a lot of improvements done in May 2011.

From February 2013 to June 2013, there has been some collective effort to improve 9k further, especially since it was still lacking a proper memory manager.

As far as I know, Jim McKie continued to work on his 9k tree until October 2013, before the disbanding of Bell Labs.

The code history for the 9k kernel [git sources] is on https://github.com/0intro/9k.

9legacy's 9k is based on the last version of Jim McKie's 9k, with some improvements imported from Nix, including its memory manager.

Over the years, there have been multiple variants of 9k and Nix, each with its own specialties (e.g. there have been something like 4 or so completely different implementations of the memory manager).

A small overview of the 9k derivatives is available here: http://9legacy.org/9legacy/doc/9k/history.

Getting the NIX sources

NIX is a small set of modifications on top of 9front. For the moment, it's a branch and not an official component. The master sources are served as a git repository here: https://github.com/rminnich/9front .

A tour

As mentioned in the historical notes above, the kernel from which Nix was started was a version due, for the main part, to Jim McKie [jmk] and named 9k. In the original sources, the Nix kernel was found under k10, which I wrongly interpreted as meaning the next iteration (10, successor of 9) of a kernel; the actual reason is explained by David du Colombier (the original explanations were in French; the translation is mine):

The kernel was named 9k, but k10 isn't a reference to it.

k10 is the port name for amd64 (9k had other ports too, notably for Power arch, in the context of the HARE project).

k10 is a reference to AMD K10 micro-arch.

The port was indeed even named k8 (/sys/src/9k/k8) before being renamed k10 in June 2011. The AMD K8 micro-architecture was the predecessor of the K10 one.

The initial effort on Nix, in 2011, was accomplished, for the bulk of the work (the log shows additional occasional contributions), by the following individuals, here simply listed in alphabetical order of nicknames:

Not surprising: these are the authors of the main articles about the project (see the bibliography in this page).

The result, as it is now, uses 2M pages and supports 1G pages.
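
The gain in page-table pages from larger pages can be estimated with a little arithmetic. The sketch below is not taken from the NIX sources: it assumes standard x86-64 4-level paging (tables of 512 entries, each table occupying one 4K page) and a region starting at the beginning of the address space, and it counts the tables needed at every level to map that region. The name table_pages is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

ENTRIES_PER_TABLE = 512  # x86-64: a 4K table holds 512 8-byte entries

# Number of table levels walked to reach the leaf entry, per page size
# (assumption: classic 4-level paging, PML4 -> PDPT -> PD -> PT).
LEVELS = {4 * KIB: 4, 2 * MIB: 3, 1 * GIB: 2}


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def table_pages(region_bytes: int, page_bytes: int) -> int:
    """Count the page-table pages needed to map a region starting at 0."""
    entries = ceil_div(region_bytes, page_bytes)  # leaf entries needed
    total = 0
    for _ in range(LEVELS[page_bytes]):
        tables = ceil_div(entries, ENTRIES_PER_TABLE)
        total += tables
        entries = tables  # each table needs one entry in the level above
    return total


for page in (4 * KIB, 2 * MIB, 1 * GIB):
    print(f"page size {page:>10d}: {table_pages(4 * GIB, page)} table pages for 4G")
```

With these assumptions, mapping 4G takes 2054 table pages with 4K pages, 6 with 2M pages and 2 with 1G pages.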

Some experimental frameworks were tried too: tubes (removed), and zero-copy (still there).

Quick tips for now (2025-03-13)

Here are some notes about the current (as of the date of the paragraph) state of NIX, from Ron Minnich.

Once you have compiled the kernel from Ron Minnich's 9front NIX branch (the nix branch, which is the default if you have the latest sources), here is a qemu incantation to test it. NIXDIR has to be set to the directory holding the compiled 9pc64; there is no need to provide the ISO, since the boot will stop before reaching userland:
qemu-system-x86_64 -kernel $NIXDIR/9pc64 -smp 4 -nographic -serial mon:stdio -cpu qemu64,+monitor -append console=0 -m 16384 [other_args]
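
When trying several core counts or memory sizes, it can be convenient to generate the command line instead of retyping it. The following Python sketch only rebuilds the incantation above, with the same options; the function name nix_qemu_argv and its parameters are hypothetical, and it merely prints the command rather than running it.

```python
import os
import shlex
from typing import List, Sequence


def nix_qemu_argv(
    nixdir: str,
    cores: int = 4,
    mem_mib: int = 16384,
    other_args: Sequence[str] = (),
) -> List[str]:
    """Build the qemu argument list for testing a compiled NIX 9pc64.

    The options are those of the incantation given in the text; only the
    core count, the memory size and the trailing arguments are parameters.
    """
    kernel = os.path.join(nixdir, "9pc64")
    return [
        "qemu-system-x86_64",
        "-kernel", kernel,
        "-smp", str(cores),
        "-nographic",
        "-serial", "mon:stdio",
        "-cpu", "qemu64,+monitor",
        "-append", "console=0",
        "-m", str(mem_mib),
        *other_args,
    ]


# Fall back to an illustrative directory if NIXDIR is not set.
nixdir = os.environ.get("NIXDIR", "/tmp/nix")
print(shlex.join(nix_qemu_argv(nixdir, cores=8)))
```

The printed line can be pasted into a shell; passing the list to subprocess.run would launch qemu directly, provided it is installed.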

TODO

About the current state and what has to be done next, please refer to the NIX TODO list.

The organisation of the sources

Initially, the Nix modifications came with a whole Plan9 distribution, which was the Bell Labs one at the time. Ron Minnich and, mainly, Paul A. Lalonde have cleaned the sources in order to keep only what is modified or added by Nix proper.

Then Ron Minnich integrated back these modifications (including some additions) on a 9front branch. On this branch, the Nix kernel is the default one, so files are to be found under sys/src/9.

The modifications consist of 26 files: 21 modifications and 5 additions.

In what follows, M.I. means Machine Independent, while M.D. means Machine Dependent.

The modifications can be grouped roughly in the following subsets:

Bootstrapping
Modifications made in order to alter options during the bootstrapping stage.
Kernel
Modifications to the kernel proper: there are M.D. parts relative to the x86 family (either under pc/ or pc64/), and M.I. parts concerning portable stuff: common devices etc.
Interface
The interface with userland code: the system calls.
Administration utilities
The program to launch an Application Core.
Scaffolding
Ad hoc temporary utilities, whether scripts or programs, that are used during development, but are not to remain.

Here is a table with the modifications:

Group | Directory | Filename | MI/MD | Addition/Modification | Description / Commentary
Bootstrapping | sys/src/9/boot | bootrc | M.I. | Modification |
Bootstrapping | sys/src/9/boot | disk.proto | M.I. | Modification | Debugging stuff; temporary.
Kernel | sys/src/9 | l64sipi.s | M.D. | Addition | Related to AC: Start-up request IPI handler. Unused for now?
Kernel | sys/src/9/pc | apic.c | M.D. | Modification | Addition for IOAPIC Redirection Table Entry (RDT); APIC Local Vector Table Entry (LVT); APIC Interrupt Command Register (ICR).
Kernel | sys/src/9/pc | fns.h | M.D. | Modification | Declarations linked to additions in apic.c.
Kernel | sys/src/9/pc | io.h | M.D. | Modification | Addition of enum value VectorIPI and struct ACVctl.
Kernel | sys/src/9/pc64 | acore.c | M.D. | Addition | The AC (Application Core) kernel.
Kernel | sys/src/9/pc64 | dat.h | M.D. | Modification |
Kernel | sys/src/9/pc64 | fns.h | M.D. | Modification |
Kernel | sys/src/9/pc64 | main.c | M.D. | Modification | De dicto: the C entry point. The only modification is to append testing things for now.
Kernel | sys/src/9/pc64 | mkfile | M.D. | Modification |
Kernel | sys/src/9/pc64 | nix.s | M.D. | Addition | De dicto: NIX proper M.D. stuff. The first defines give the offsets that were obtained with the scaffolding struct.c (see below).
Kernel | sys/src/9/pc64 | squidboy.c | M.D. | Modification |
Kernel | sys/src/9/pc64 | tcore.c | M.D. | Addition | TC (Time Sharing) core kernel.
Kernel | sys/src/9/port | lib.h | M.I. | Modification |
Kernel | sys/src/9/port | portdat.h | M.I. | Modification |
Kernel | sys/src/9/port | portfns.h | M.I. | Modification |
Kernel | sys/src/9/port | fault.c | M.I. | Modification |
Kernel | sys/src/9/port | proc.c | M.I. | Modification |
Kernel | sys/src/9/port | segment.c | M.I. | Modification |
Kernel | sys/src/9/port | syscallfmt.c | M.I. | Modification |
Kernel | sys/src/9/port | sysproc.c | M.I. | Modification |
Interface | sys/src/libc/9syscall | sys.h | M.I. | Modification |
Administration | sys/src/cmd | execac.c | M.I. | Addition |
Scaffolding | sys/src/9/pc64 | DIT | M.I. | Addition | A script to put the kernel on a USB key.
Scaffolding | sys/src/9/pc64 | brekky.c | M.I. | Addition | Testing brk.
Scaffolding | sys/src/9/pc64 | struct.c | M.I. | Addition | Searching for the correct offsets in various structures.

Various bibliographic references

Here are some references to resources related to the Nix operating system and to hardware (some were provided by various people, such as Paul Lalonde and Bakul Shah, on the 9fans mailing list).

General benchmarking:

F. Petrini, D. J. Kerbyson and S. Pakin, "The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q"
Abstract: In this paper we describe how we improved the effective performance of ASCI Q, the world's second-fastest supercomputer, to meet our expectations. Using an arsenal of performance-analysis techniques including analytical models, custom microbenchmarks, full applications, and simulators, we succeeded in observing a serious, but previously undetected, performance problem. We identified the source of the problem, eliminated the problem, and "closed the loop" by demonstrating up to a factor of 2 improvement in application performance. We present our methodology and provide insight into performance analysis that is immediately applicable to other large-scale supercomputers.

Nix proper:

Fran. J. Ballesteros: Nix, Get rid of the operating system!
A presentation of the aim and work done on Nix.
Analyzing manycore OS design aspects in NIX
A PDF overview (Abstract) presenting Nix.
Analyzing manycore OS design aspects in NIX (figures)
A PDF with graphics presenting the schematic principles of Nix (KC, TC and AC) and graphics about early results.

Hardware:

Kayvon Fatahalian and Mike Houston: "A closer look at GPUs"
As the line between GPUs and CPUs begins to blur, it's important to understand what makes GPUs tick.
EuroCC National Competence Center Sweden: GPU Programming: When, Why and How?
An introduction to GPUs.
Nvidia: Ampere architecture whitepaper
de dicto.
AMD IOMMU specification
AMD I/O Virtualization Technology. The I/O Memory Management Unit (IOMMU) extends the AMD64 system architecture by adding support for address translation and system memory access protection on DMA transfers from peripheral devices. IOMMU also helps filter and remap interrupts from peripheral devices.

©2022–2025 Thierry Laronde — FRANCE