Whither Plan 9? History and Motivation

Plan 9 is a
research operating system from
Bell Labs. For several
years it was my primary environment and I still use it
regularly. Despite its conceptual and implementation
simplicity, I’ve found that folks often don’t immediately
understand the system’s fundamentals: hence this series of
articles.

When I was a young programmer back in high school my primary
environment was a workstation of some type running either Unix
or VMS and X11. After a while I migrated to
FreeBSD on commodity
hardware using the same X11 setup I’d built on workstations. But
eventually the complexity of Unix in general started to get to me:
it happened when they added
periodic(8)
to FreeBSD in one of the 4.x releases. “Really?” I thought to
myself. “What’s wrong with
cron(8)
and making a crontab?” Unix-like systems were evolving in a way
that I didn’t like and I realized it was time for me to find
another home.

And I wasn’t the only one who felt that way. It
turns out that circa 1985 the 1127 research group at Bell Labs,
the same group that developed Unix and C after Bell Labs pulled
out of the
Multics project, came
to the conclusion that they’d taken Unix about as far as they
could as a research vehicle.

They were looking at the computing landscape of the 1980s and
realized that the computing world was fundamentally
changing.

First, high-bandwidth low-latency local area networks were
becoming ubiquitous.

Second, large time-shared systems were being replaced by
networks of heterogeneous workstations built from commodity
hardware. Relatedly, people were now using machines that had
high-resolution bitmapped graphics displays accompanied by mice
instead of text-only, keyboard-only character terminals.

Third, RISC processors were on the rise and multiprocessor
RISC machines were dramatically outperforming their earlier
uniprocessor CISC ancestors.

Finally, they saw major changes in storage systems: RAID was
gaining traction, tape drives were waning, and optical storage
was looking like it would be a big part of the future. (Of
note, this is one area where they were arguably very, very
wrong. But no one is truly prescient.)

At first, they tried to adapt Unix to this new world, but
they quickly decided this was unworkable. What they wanted was
a Unix built on top of the network; what they found was a
network of small Unix systems, each unique and incompatible with
the rest. Instead of a modern nation state, they had a loose
federation of feudal city states.

It turned out that fundamental design assumptions in their
earlier system made it difficult to gracefully accommodate their
desired changes. For example, the concept of a single
privileged ‘root’ user made it difficult to extend the system to
a network of machines: does having ‘root’ access on one machine
confer it on all machines? Why or why not? Here, an artifact
of a different time was at odds with the new reality.
Similarly, graphics had never been integrated into Unix well:
the system was fundamentally built around the idea of the TTY as
the unit of user interaction and the TTY abstraction permeated
the kernel. Also, the system had been fundamentally designed
assuming a uniprocessor machine; fine-grained locking for
scalability on multiprocessor systems was simply non-existent.
Finally, the filesystem organization made it challenging to
support heterogeneous systems in a coherent manner.

In the end, the amount of work required to bring Unix up to
date was considered not worth the effort. So they decided to
start from scratch and build a new system from the ground up:
this system would become Plan 9.

Plan 9 Fundamentals

To a first-order approximation, the idea behind Plan 9 is to
build a Unix-like timesharing system from the network,
rather than a network of loosely connected time-sharing
Unixes.

To start at the most basic level, a Plan 9 system is a
network of computers that are divided into three classes:

File Servers
This is where your data lives: file servers provide stable
storage to the network.

These are machines with lots of fast secondary storage
(hard disks, RAID arrays, SSDs, or whatever; historically
this meant RAID arrays built from hard disks, since Plan 9
predates SSDs and other commodity-class solid-state storage
technologies).

File server machines have decent if not spectacular
processors, moderate amounts of RAM for caching data from
secondary storage, and a very fast network connection.

They have no user interaction capabilities to speak of:
often one would use a serial console for day-to-day system
administration tasks. Historically, the file server machine
ran a special version of the kernel and didn’t even have a
shell! Rather, there was something akin to a monitor built in,
where the system administrator executed commands to configure
the system, add and remove users, and perform other similar tasks.

More recently, the file server was rewritten so that it
runs as a user-level program executing under the control of a
normal kernel. It is often still run on a dedicated machine,
however.

An unusual innovation at the time was the backup mechanism:
this was built into the file server. Periodically, all
modified blocks on the file server would be written off to a
tertiary storage device (historically, a magneto-optical
jukebox, but now a separate archival service that stores data
on a dedicated RAID array). Of note, historically file
service was suspended while the set of modified blocks was
enumerated, a process that could take on the order of minutes.
Now, the file system is essentially marked copy-on-write while
backups are happening with no interruption in service.

CPU Servers
Shared compute resources.

These are large multiprocessor machines with lots of
fast CPUs and lots of RAM. They have a very fast network
connection to the file server but rarely have stable storage
of their own (read: they are often diskless, except for
occasionally having locally attached storage for scratch space
to cut down on network traffic).

As with file servers, there is no real user-interaction
hardware attached to the computer itself: the idea is
that you will interact with a CPU server through a Plan 9
terminal (discussed below). Often console access for system
administration was provided through a serial line.

These run a standard Plan 9 kernel, but compiled using a
“cpu” configuration. This mostly affects how resources are
partitioned between user processes and the kernel (e.g.,
buffers reserved by the kernel and the like). The modern file
server typically runs on a CPU server kernel.

Terminals
The machines a user sits in front of and interacts with.

Terminals have mediocre amounts of RAM and CPU power and
middling network interfaces but excellent user-interface
features including a nice keyboard, nice 3-button mouse, and a
nice high resolution bitmapped display with a large monitor.
They are usually diskless.

This is where the user actually interacts with the system:
the terminal is a real computer, capable of running arbitrary
programs locally, subject to RAM and other resource
limitations. In particular, the user runs the window system
program on the terminal as well as programs like text editors,
mail clients, and the usual complement of filesystem traversal
and manipulation commands. Users would often run compilers
and other applications locally as well.

The terminal, however, is not meant to be a
particularly powerful computer. When the user needs more
computational power, she is expected to use a CPU server.

A user initiates a session with a Plan 9 network by booting a
terminal machine. Once the kernel comes up, it prompts the user
for her credentials: a login name and password. These are
verified against an authentication server — a program running
somewhere on the network that has access to a database of
secrets shared with the users. After successful authentication,
the user becomes the “hostowner”, the terminal connects to the
file server, constructs an initial namespace and starts an
interactive shell. That shell typically sources a profile file
that further customizes the namespace and starts the window
system. At this point, the user can interact with the entire
network.

Modernization

A question that immediately arises from this
description: why write a new kernel for this? Why not just
implement these things as separate user processes on a mature
Unix kernel?

Over the course of its research lifetime, Unix had acquired a
number of barnacles that were difficult to remove. Assumptions
about the machine environment it was developed on were
fundamental: TTYs were a foundational abstraction. Neither
networking nor graphics had ever really been integrated
gracefully. And finally it was fundamentally oriented towards
uniprocessor CISC machines.

With Plan 9, the opportunity was taken to fix the various
deficiencies listed in the motivation section. In particular,
fine-grained locking was added to protect invariants on kernel
data structures. The TTY abstraction, which was already an
anachronism by the 1980s, was discarded completely: effective
use of the system now required a bitmapped graphical
display and a mouse. The kernel was generally slimmed down and
the vestiges of various experiments that didn’t pan out, or
design decisions that were otherwise obsolete or generally bad,
were removed or replaced.

Device interfaces were rethought and replaced. Networking
and graphics were designed in from the start. The security
model was rethought for this new world.

The result was a significantly more modern and portable
kernel that could target far more hardware than Research Unix
could. Unburdened by the legacy of the past, the system could
evolve more cleanly in the new computing environment.
Ultimately, the same kernel would target MIPS, SPARC, Alpha, x86
and x86_64, ARM, MC68k, PowerPC and i960: all without a
single #ifdef.

The userspace programs that one had come to expect were also
cleaned up. Programs that seemingly made no sense in the new
world were not carried forward: things dealing with the TTY, for
example, were left behind. The window system was rewritten from
scratch to take advantage of the network, various warts on
programs were removed and things were generally polished. New
editors were written or polished for the new system, and the new
Unicode standard for internationalization was embraced through
the freshly designed UTF-8 encoding, which was introduced to the
world through Plan 9.

On the development front, a new compiler/assembler/linker
suite was written which made cross-compilation trivial and made
development of a single system across heterogeneous hardware
vastly easier (dramatically increasing system portability), and
some experimental features were added to the C programming
language to support Plan 9 development. The standard libraries were
rethought and rewritten with a new formatted-printing library,
standard functions, system calls, etc. Threads were facilitated
through the introduction of an rfork primitive that
could create new processes that shared address spaces (but not
stacks).

But what about root?

Plan 9 circumvents the “what about root?” question by simply
doing away with the concept: there is no super-user. Instead,
an ordinary user is designated as the “hostowner” of any
particular computer. This user “owns” the hardware resources of
the machine but is otherwise subject to the normal permission
scheme that users are familiar with from Unix: user, group
and other permissions for read, write and execute.

All machines have hostowners: for terminals this is whoever
logged into the machine when the terminal booted. For CPU and
file servers, these are configured by the system administrator
and stored in some sort of non-volatile memory on the computer
itself (e.g., NVRAM).

On CPU servers, the hostowner can create processes and change
their owner to some other user. This allows a CPU server to
support multiple users simultaneously. But the hostowner cannot
bypass filesystem permissions to inspect a user’s read-protected
files.

This raises the question: if there is no super-user, how are
resources put into places where the user expects them, and how
does the user communicate with the system? The answer is
per-process, mutable namespaces.

Namespaces and resource sharing

One of the, if not the, greatest advances of Plan 9 was an
aggressive adaptation and generalization of the Unix “everything
is a file” philosophy. On Unix “everything” is a file — a named
stream of bytes — except when it’s not: for instance sockets
kinda-sorta look like files but they live in a separate
namespace from other file-like objects (which have familiar
names, like /dev/console or /etc/motd). One does not manipulate
them using the “standard” system calls like
open(2),
creat(2),
etc. One cannot use standard filesystem tools like
ls(1),
cat(1),
or
grep(1)
on sockets since they aren’t visible in the file namespace
(okay, you kinda-sorta can with Unix domain sockets, but even
then there are pretty serious limitations). Or consider the
venerable
ioctl(2)
system call: this is basically a hook for manipulating devices
in some way; the device itself may be represented by a device
node in /dev, but controlling that device uses this weird
in-band mechanism; it’s a hack.

But on Plan 9, everything looks like a file. Or
more precisely everything is a filesystem and there is a single
protocol (called 9P) for interacting with those
filesystems. Most devices are implemented as a small tree of
files including data files for getting access to the
data associated with a device as well as a ctl (nee
“control”) file for controlling the device, setting its
characteristics and so forth. ioctl(2) is gone.

Consider interacting with a UART controlling a serial port.
The UART driver provides a tree that contains a data file for
sending and receiving data over the serial port, as in Unix, but
also a control file. To set the line rate on
a serial port, one echoes a string into
the control file. Similarly, one can put an ethernet interface
into full-duplex mode via the same mechanism. Generalizing the
mechanism so that reading and writing a text file applies to
device control obsoletes ioctl(2) and other similar
mechanisms: the TCP/IP stack is a filesystem, so setting options
on a TCP connection can also be done by echoing a command into a
ctl file.
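
For instance, here is a minimal sketch (the device name eia0
and the connection number 4 are assumptions for illustration;
the control strings follow the conventions of the uart and tcp
drivers):

    % echo -n b9600 > /dev/eia0ctl   # set the serial line to 9600 baud
    % echo hangup > /net/tcp/4/ctl   # tear down TCP conversation number 4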

Further, the system allows process groups to have independent
namespaces: some process may have a particular set of resources,
represented as filesystems, mounted into its namespace while
another process may have another set of resources mounted into a
different namespace. These can be inherited and changed, and
things can be ‘bound’ into different parts of the namespace
using a “bind” primitive, which is kind of like mounting an
existing subtree onto a new mount point, except that one can
create ‘union’ mounts that merge the new subtree with whatever
was already under that mount point. Further, bindings can be
ordered so that one
comes before or after another, a facility used by the shell:
basically, the only thing in $path on Plan 9
is /bin, which is usually a union of all the
various bin directories the user cares about (e.g.,
the system’s architecture-specific bin, the user’s
personal bin, one just for shell scripts, etc).
Note that bind nearly replaces the need for symbolic links; if I
want to create a new name for something, I simply bind it.
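
As a sketch, a profile might assemble /bin like this (the paths
follow the usual Plan 9 conventions; $home/bin/rc is where
personal shell scripts conventionally live):

    % bind /$cputype/bin /bin        # architecture-specific binaries
    % bind -a /rc/bin /bin           # append the system’s shell scripts
    % bind -b $home/bin/rc /bin      # personal scripts, searched first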

All mounts and binds are handled by something in the kernel
called the “mount driver,” and as long as a program can speak 9P
on a file descriptor, the resources it exposes can be mounted
into a namespace, bound into nearly arbitrary configurations,
and manipulated using the standard complement of commands.
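
For example (the dial string and names here are illustrative),
one can post a network connection to a 9P server and then mount
it:

    % srv tcp!fs.example.com!9fs myfs   # dial the server, post the fd in /srv/myfs
    % mount /srv/myfs /n/myfs           # the mount driver attaches its tree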

Since 9P is a protocol it can be carried over the network,
allowing one access to remote resources. One mounts the
resource into one’s namespace and binds it where one wishes.
This is how networked graphics are implemented: there’s no need
for a separate protocol like X11, as one simply connects to a
remote machine, imports the “draw” device (the filesystem for
dealing with the graphics hardware) from one’s terminal, binds
that over /dev/draw (and likewise with the
keyboard and mouse, which are of course represented the same way),
and runs a graphical program, which opens /dev/draw
and writes to it to manipulate the display. Further, all of the
authentication and encryption of the network connection is
handled by whatever provides the network connection;
authorization for opening files is handled by controlling access
to the namespace, and the usual Unix-style permissions for
owner, group and world. There’s no need for
MIT-MAGIC-COOKIE-1s or tunneling over SSH or other
such application-level support: you get all of it for free.
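
Sketched with the import command (the machine name is made up,
and in practice the cpu command automates all of this), run on
the remote machine to pull the terminal’s devices into its
namespace:

    % import myterminal /dev/draw /dev/draw     # the terminal’s display
    % import myterminal /dev/mouse /dev/mouse
    % import myterminal /dev/cons /dev/cons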

Also, since 9P is just a protocol, it is not tied to devices:
any program that can read and write 9P can provide some service.
Again, the window system is implemented as a fileserver:
individual windows provide their own /dev/keyboard, /dev/mouse
and /dev/draw. Note that this implies that the window system can
run itself recursively, which is great if you’re testing a new
version of the window system. As mentioned before, even the
TCP/IP stack is a filesystem.

Finally since mounts and binds are per-process, both
operations are unprivileged: users can mount and
bind things as they like, subject to the permissions of the
resources themselves. Of course, Plan 9 does rely on some
established conventions and programs might make corresponding
assumptions about the shape of the namespace so it’s
not exactly arbitrary in practice but the mechanism is
inherently flexible.

We can see how this simplifies the system by comparing Plan
9’s console mechanism to /dev/tty under Unix.
Under Plan 9, each process can have its
own /dev/cons (taken from the namespace the process
was started in) for interacting with the “console”: it’s not a
special case requiring explicit handling in the kernel
as /dev/tty is under Unix, it’s simply private to
the namespace. Indeed, under the rio window
system, each window has its
own /dev/cons: these are synthesized by the window
system itself and used to multiplex the /dev/cons
that was in the namespace rio was started in.
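
One can see this directly from any rio window (a trivial
sketch; the output appears in the issuing window only):

    % echo hello > /dev/cons    # printed in this window, no other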

Note how this changes the view of the network from the user’s
perspective in contrast to e.g. Unix or VMS: I construct the set
of resources I wish to manipulate and import them into
my namespace: in this sense, they become an extension of my
machine. This is in stark contrast to other systems in which
resources are remotely accessed: I have to carry my use to them. For
example, suppose I want to access the serial port of some
remote computer: perhaps it is connected to some embedded
device I want to manipulate. I do this by importing the serial
port driver, via 9P, from the machine the device is connected
to. I then run some kind of communications program locally, on
my terminal, connecting to the resource as if it were local to
my computer. 9P and the namespace abstraction make this
transparent to me. Under Unix, by contrast, I’d have to login
to the remote machine and run the communications program
there. This is the resource sharing model, as opposed
to the remote access to resources model.
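
Concretely, a hedged sketch (the machine and device names are
hypothetical):

    % import labmachine /dev/eia0 /dev/eia0   # the remote UART’s data file
    % cat /dev/eia0 &                         # watch the device’s output
    % echo reboot > /dev/eia0                 # send it a command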

However, I still can have access to remote resources.
Consider CPU servers: to make use of a CPU server’s resources, I
run a command on my terminal called cpu which
connects me to a remote machine. This is superficially similar
to a remote login program such as ssh with the
critical difference that cpu imports my existing
namespace from my terminal, via 9P, and makes it accessible to
me on the remote machine. Everything on the remote machine is done
within the context of the namespace I set up for myself locally
before accessing the CPU server. So when I run a graphical
program and it opens /dev/draw, it is really
the /dev/draw from my terminal. It is imperfect
in that it relies on well-established convention, but in
practice it works brilliantly.
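
For example (the host name is invented):

    % cpu -h bigiron            # a shell on the CPU server, my namespace intact
    % cpu -h bigiron -c mk all  # or just run a single command there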

The file server revisited

The file server is worth another look, both as an interesting
artifact in its own right as well as an example of an early
component of the system that did not pan out as envisioned at
the outset of the project.

In the first through third editions of Plan 9 the file server
machine ran a special kernel that had the filesystem built in.
This was a traditional block-based filesystem and the blocks
were durably kept on a magneto-optical WORM jukebox. In fact,
the WORM actually held the filesystem structure; magnetic disk
was a cache for data resident on the WORM and could be discarded
and reinitialized. The WORM was treated as being infinite (not
true of course, but it was regarded so conceptually). Since
changing platters was necessarily slow and magneto-optical
drives weren’t exactly “fast”, there was a disk acting as a
cache of frequently-used blocks as well as a write buffer. RAM
on the file server machine also acted as a read cache for blocks
on the hard disk, giving two layers of caching: generally, the
working set of commonly used user programs and so forth all fit
into the RAM cache. The overview
paper
describing the system stated that something less than one
percent of accesses missed the cache and had to go to the
WORM.

To avoid wasting write-once space and for performance, writes
were buffered on disk and automatically sync’ed to the WORM once
a day: at 5am file service was paused and all blocks modified
since the last dump were enumerated and queued for copy. Once
queued, file service resumed. Those blocks were then written to
newly allocated blocks on some platter(s) by a background
process. The resulting daily “dump” was recorded with a known
name and made accessible as a mountable filesystem (via 9P).
Thus, one could ‘cd’ to a particular dump and see a snapshot of
the filesystem as it existed at that moment in time. This was
interesting since, unlike using tape backups on Unix, if you
lost a file you didn’t need anyone to go read it back for you;
you simply cd’d to where it was and
used cp to copy it back to the active filesystem.
Similarly if you wanted to try building a program with an older
version of a library, you could simply bind the older version
from the dump onto the library’s name and build your program;
the linker would automatically use the older library version
because that’s what was bound to the name it expected in its
namespace. There were some helper commands for looking for a
file in the dump and so forth to make navigating the dump
easier.
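
For example (the dates, names, and paths are illustrative):

    % 9fs dump                            # by convention, mounts the dump at /n/dump
    % cd /n/dump/1995/0315/usr/glenda     # the tree as of 15 March 1995
    % cp lost.c $home/lost.c              # restore the file yourself
    % bind /n/dump/1995/0315/386/lib/libc.a /386/lib/libc.a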

A few groups outside of Bell Labs actually had the
magneto-optical jukeboxes, but they were rare. However the file
server could be configured to use a hard disk as a
“pseudo-worm”: that is, the file server could treat a disk or a
disk mirror like a WORM even though it wasn’t truly write-once
at the hardware level. Most sites outside of the labs were
configured to use the pseudo-worm.

In the 4th edition a new associative block storage server
called Venti appeared. Venti isn’t a WORM jukebox; it’s an
associatively-indexed archival storage server. Data is stored
in fixed-size blocks that are allocated from storage “arenas”:
when a user writes data to a venti server the data is split into
blocks, the SHA-1 signature of the block is calculated, a block
of backing store is allocated from an arena, the data is written
there, and the mapping between the signature and the (arena,
offset) pair is written into an index. If one wants the
block back, one looks up its signature in the index to get the
(arena, offset) pair back and then reads that block
from the arena. Naturally, this means that duplicate data is
stored only once in the venti. However, venti arenas can be
replicated for speed and/or reliability.
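
A sketch using the venti client utilities (the score shown is a
placeholder; the server address is taken from the environment):

    % echo hello | venti/write
    0a1b2c…                     # prints the score naming the stored block
    % venti/read 0a1b2c…        # writes the block to standard output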

Arenas are sized such that they can be written onto some kind
of archival media (my vague recollection is that DVDs may have
been popular at the time), but they are stored on hard disks or
some other kind of random-access media (SSDs are popular now).
Venti, however, is not a file server and does not present itself
as one. Rather, it speaks its own protocol and likely
originated out of the observation that magneto-optical jukeboxes
had never quite taken off the way people had initially expected:
they were expensive, slow, big, noisy and power-hungry. Hard disks
were getting so cheap that they were about to pass tape in
storage density versus cost and with RAID they were pretty
reliable.

A filesystem called “fossil” was written that
could be optionally backed by a venti, but it was rather a
different beast than the old file server. In particular, fossil
is just a normal user program that one can run under a normal
Plan 9 kernel (unlike the older file server, which was really a
standalone special-purpose kernel). And unlike the older filesystem, which
lived implicitly on the WORM, fossil has to explicitly maintain
state about the associative store in order to be able to
reconstruct the filesystem structure from the venti.
Regardless, it shares many of the traits of the earlier system
and was clearly influenced by it: there is a dump that is
accessed in the exact same way as the older server’s dump
(including the naming scheme) and backups are automatically
integrated in the same way, but using a copy-on-write scheme
instead of suspending service when snapshotting. The
implementation is radically different, however.

Sounds great; so where is it now?

Sadly, Plan 9 has fallen into disuse over the past decade and
the system as a whole has atrophied. For example, it has been
argued that fossil never attained the level of maturity,
reliability, or polish of the older filesystem and that is
largely a fair assessment. I will discuss this more in part 3
of this series.

Plan 9 is still available today, though it is not actively
developed by Bell Labs anymore. The Labs produced four official
Plan 9 editions; the last in 2003, after which they moved to a
rolling release without fixed editions. However, the Plan 9
group at Bell Labs disbanded several years ago. There are
several forks that have arisen to take up some of the slack:

  • 9legacy: This is a
    patch set for Plan 9 from Bell Labs. It contains most of the
    interesting bits of other distributions while retaining the
    flavor of the Bell Labs distribution.
  • Harvey: Harvey is an
    attempt to modernize the Plan 9 codebase. It discards the
    traditional Plan 9 compiler suite in favor of GCC and Clang.
    It is extremely active and making rapid progress.
  • 9atom: This is a
    distribution maintained by long-time 9fan Erik Quanstrom.
  • 9front: A European fork
    with a distinct culture.
  • Plan 9 from
    Bell Labs
    : The “official” Bell Labs distribution is still
    available, though it has been essentially orphaned.

Further, many of the good ideas in Plan 9 have been brought
into other systems.
The Akaros
operating system has imported not just many of the ideas, but
much of the code as well. Even systems like Linux and FreeBSD
have taken many of the good ideas from Plan 9:
the /proc filesystem on both systems is inspired by
Plan 9, and Linux has implemented a form of per-process
namespaces. FUSE is reminiscent of Plan 9’s userspace filesystem
support.

