+TITLE(«
+
+ Unix is user-friendly. It's just very selective about who
+ its friends are. -- Unknown
+
+», __file__)
+
+SECTION(«History and Philosophy»)
+
+SUBSECTION(«Early Unix History»)
+
<p> Unix was created in 1969 as a successor of Multics, the
<em>MULTiplexed Information and Computing Service</em>, which had been
in use since the mid 1960s as the successor of CTSS, the <em>Compatible
Time-Sharing System</em> of the early 1960s. Multics aimed to get
CTSS right, but failed in this regard and was eventually discontinued
because of its complexity. The Unix approach was very different: it
was sketched out by only three people and then implemented by Ken
Thompson at Bell Laboratories within a few weeks. Unlike its
predecessors it focused on elegance and simplicity. The name was
originally spelt UNICS (<em>UNiplexed Information and Computing
Service</em>) to emphasize the contrast to Multics. </p>
+
<p> The original Unix implementation was written in assembly
language for the 18-bit processor of the PDP-7 "minicomputer", a
device of the size of a wardrobe which was considered small by the
standards of the day. Like all computers of this era, the PDP-7 was
not connected to a video screen. Instead, input had to be typed in on
the <em>console</em>, a device which looked much like an electric
typewriter, and output from the computer was printed on rolls of paper.
Since the assembly instructions could not easily be ported to different
hardware, Dennis Ritchie created the C programming language in the
early 1970s, and by 1973 the complete Unix implementation had been
rewritten in C. The C language became another cornerstone in the
history of Unix and turned out to be very successful: while other
programming languages of that time have long been abandoned or play
merely a niche role, C is still one of the most widely used programming
languages today. The first Unix application was <code>roff</code>,
a typesetting program which is still ubiquitous, as the manual pages
which ship with every Unix system are formatted with roff. </p>
+
+<p> From the beginning, Thompson and his early collaborators encouraged
+close communication between programmers, creating an early form of
+community. Up to the present day, this "hacker culture" has been a
+stimulus for countless improvements. Copies of Unix were distributed
+on tapes, hand-signed with "Love, Ken". Over time many universities
contributed to Unix. By the end of the 1970s, Unix had accumulated
a rich set of utilities that made it a full-fledged operating system
which was also free of any copyright claims. </p>
+
+<p> Despite the primitive hardware of the time, the early Unix was
+remarkably similar to modern Linux systems. For example, the task
+scheduler, the hierarchical filesystem tree and the shell already
+existed back then. </p>
+
+SUBSECTION(«Networking»)
+
<p> The <em>Advanced Research Projects Agency</em> (ARPA) was a
military research unit of the US Department of Defense. It was
established in 1958 with the mandate to create systems that could
survive a nuclear war. The agency created the <em>arpanet</em>,
the predecessor of today's internet, which was designed to stay
operational after subordinate network losses. By the end of the
1960s and the early 1970s, the fundamental networking protocols
were established: telnet for remote login was standardized in 1969,
network email followed in 1971, and the file transfer protocol (FTP)
was standardized in 1973. </p>
+
+<p> By the end of the 1970s many Unix installations existed in
+all parts of the world. However, the arpanet was mostly powered by
+commercial Multics systems because Unix only had rudimentary network
+support (UUCP, the <em>Unix to Unix copy</em>) which could copy files
over telephone lines via modems but not much more. This changed
in 1983 when the arpanet switched to the newly developed TCP/IP
protocol suite. Unix support for TCP/IP was developed at the
University of California at Berkeley, which had become the "Mecca"
of Unix development and had been releasing its own Unix system, BSD
(the <em>Berkeley Software Distribution</em>), since 1977. </p>
+
+SUBSECTION(«Commercialization, POSIX and GNU»)
+
+<p> With excellent networking support and no licensing issues, it
+was only a matter of time until companies became interested in Unix
+in order to make money. Several companies started to commercialize
+Unix by adding features to the common code base but keeping their
+improvements closed, effectively stopping the source code from being
+freely distributable. At the same time Microsoft began to sell their
+DOS operating system, targeting small businesses and the home market.
DOS lacked many features that Unix had already had for a decade, like
+multi-tasking and multi-user support, but it did run on the cheap
+Intel 286 processors that were too weak for Unix. </p>
+
+<p> By 1985 the commercialization of Unix and the success of Microsoft
had damaged the Unix community badly. But the various companies
that sold their own proprietary Unix variants also realized that
too many incompatible Unix implementations would only hurt their
business. That's where the Computer Society of the <em>Institute of
Electrical and Electronics Engineers</em> (IEEE) became involved
in Unix. The Computer Society, which dates back to 1946, aims
to advance the theory, practice, and application of computer
technology. It created POSIX, the <em>Portable Operating
+System Interface for Unix</em>, which is a family of specifications
+for maintaining compatibility between operating systems. The first
+version of POSIX was published in 1988. It covered several command
+line utilities including <code>vi(1)</code> and <code>awk(1)</code>,
+the shell scripting language, application programmer interfaces (APIs)
+for I/O (input/output) and networking, and more. Up to the present
+day POSIX is maintained by the IEEE and new revisions of the POSIX
+standard are published regularly. </p>
+
+<p> In 1983 Richard Stallman launched the GNU project and the Free
+Software Foundation as a reaction to the ongoing commercialization
of Unix. GNU, which is a recursive acronym for "GNU's not Unix",
aimed to keep the Unix source code free, or to replace non-free parts
by open source equivalents. To this end the GNU project created
+the <em>GNU General Public License</em> (GPL), which requires not
+only the source code to stay free, but also that all subsequent
modifications to the code base remain free. By the end of the 1980s,
+the GNU toolset had become a full developer software stack licensed
+under the GPL. This set of software packages was complemented by
+the <em>X window system</em>, which was also released under a free
+license and enabled programmers to build graphical applications for
+desktop systems. Moreover, the first open source scripting language,
+<em>perl</em>, was released in 1987. </p>
+
+SUBSECTION(«Linux»)
+
+<p> In 1985 Intel announced the 386 processor which, unlike its 286
+predecessor, was powerful enough to run Unix. There were efforts to
+port the Unix operating system kernel to this hardware, but these
+efforts were impaired by pending lawsuits about who owns the copyright
+on the BSD source code. Due to the unclear legal situation of the BSD
+code, the major missing piece in the GNU toolset was a free operating
+system kernel. This hole was filled in 1991 when Linus Torvalds,
+a student from Helsinki in Finland, announced the first version of
+his <em>Linux</em> kernel. </p>
+
+<p> Linux did not repeat the licensing problems of the original Unix
+because the Linux source code was written from scratch and licensed
+under the GPL. Due to this difference many developers moved from
Unix to Linux, so Linux grew quickly and soon started to outperform
the commercial Unix kernels in almost every benchmark. The cheap 386
+hardware, the Linux kernel, the GNU toolset and the graphical user
+interface based on the X window system facilitated cheap workstations
+which ran a complete open source software stack. </p>
+
+<p> The success of Linux, or <em>GNU/Linux</em> as some prefer to
+call it for reasons that should now be clear, has only increased
+over time, to the point where commercial Unix systems are mostly
+irrelevant. Today Linux runs on a wide variety of machines ranging
from supercomputers to workstations, smart phones and IoT (internet
of things) devices with very limited resources. </p>
+
+<p> The same companies which almost killed Unix by commercializing it
+in order to maximize their profit make money with Linux today. However,
+they had to adjust their business model in order to comply with the
+GPL. Rather than selling proprietary software, they bundle open source
+software and sell support to paying customers. Some companies also
+sell hardware with Linux pre-installed. </p>
+
+SUBSECTION(«Linux Distributions»)
+
+<p> A <em>Linux Distribution</em> is a conglomeration of free software,
+including the Linux kernel, the GNU toolset and the X window system,
+plus possibly other, proprietary software on top of that. Usually a
+distribution also includes an installer and a package manager to
+make it easy to install and update packages according to the users'
+needs. </p>
+
+<p> There are hundreds of Linux distributions, and new distributions
+are created all the time while others are discontinued. Many
+distributions are backed by companies which target specific
+classes of users or hardware, but there are also non-commercial
+Linux distributions which are solely driven by a community of
+volunteers. </p>
+
<p> One of the most popular company-backed Linux distributions is
<em>Ubuntu</em>, which has been led since 2004 by the UK-based
Canonical Ltd. It targets inexperienced desktop users who would like
to switch away from Microsoft Windows. One reason for the popularity
of Ubuntu is that it is very easy to install on standard desktop
and laptop hardware. A distinguishing feature of Ubuntu is its
strict release cycle: new versions are released in April and October
of each year,
+and every fourth release is a <em>long-term support</em> (LTS) release
+which will be supported for at least five years. Ubuntu also features
+a variant for server hardware which contains a different Linux kernel
+and ships with most desktop packages excluded. </p>
+
+<p> The main community-driven Linux distribution is
+<em>Debian</em>. The Debian project was founded in 1993 and the first
+stable version was released in 1996. Debian is used as the basis for
+many other distributions. In fact, Ubuntu is based on Debian. The
+development of Debian closely follows the Unix culture in that it
is developed openly and distributed freely. About 1000 core
developers work together with countless package maintainers
+according to the Debian Social Contract, the Debian Constitution,
+and the Debian Free Software Guidelines. </p>
+
+EXERCISES()
+
+<ul>
+ <li> Run <code>uname -a</code> on various Unix machines to see the
+ OS type and the kernel version. </li>
+
+ <li> Nice read on the
+ <a href="http://www.catb.org/~esr/writings/taoup/html/ch02s01.html">Origins and
+ History of Unix</a>, 1969-1995. </li>
+
+ <li> Explore the <a
+ href="https://upload.wikimedia.org/wikipedia/commons/7/77/Unix_history-simple.svg">Unix
+ time line</a>. </li>
+
+ <li> Try out the <a
+ href="http://www.gnu.org/cgi-bin/license-quiz.cgi">Free Software
+ licensing quiz</a>. </li>
+
+ <li> Read the <a
+ href="https://www.newyorker.com/business/currency/the-gnu-manifesto-turns-thirty">
+ notes on the 30th anniversary</a> of the GNU Manifesto. </li>
+
+ <li> Read the <a
+ href="http://www.catb.org/~esr/writings/unix-koans/end-user.html">
+ Koan of Master Foo and the End User</a>. </li>
+
+ <li> On a Debian or Ubuntu system, run <code>aptitude search
+ python</code> to list all python-related Ubuntu packages. Run
+ <code>aptitude show python-biopython</code> to see the description
+ of the biopython package. Repeat with different search patterns and
+ packages. </li>
+
+ <li> The Debian Social Contract (DSC) describes the agenda of Debian.
+ Find the DSC online, read it and form your own opinion about the key
+ points stated in this document. </li>
+</ul>
+
+SECTION(«Characteristics of a Unix system»)
+
+<p> After having briefly reviewed the history of Unix, we now look
+closer at the various components which comprise a Unix system and
+which distinguish Unix from other operating systems. We focus on
+general design patterns that have existed since the early Unix days
+and are still present on recent Linux systems. </p>
+
+SUBSECTION(«Single Hierarchy of Files»)
+
+<p> The most striking difference between Unix and Windows is perhaps
+that on Unix the files of all devices are combined to form a single
+hierarchy with no concept of drive letters. When the system boots,
+there is only one device, the <em>root device</em>, which contains the
+<em>root directory</em>. To make the files of other devices visible,
+the <em>mount</em> operation must be employed. This operation attaches
+the file hierarchy of the given device to the existing hierarchy
+at a given location which is then called the <em>mountpoint</em>
+of the device. Mountpoints are thus the locations in the hierarchy
+where the underlying storage device changes. </p>
+
+<p> The root directory contains a couple of well-known subdirectories,
+each of which is supposed to contain files of a certain type or for
a certain purpose. The following list shows a subset: </p>
+
+<ul>
+ <li> <code>/bin</code>: Essential commands for all users </li>
+ <li> <code>/sbin</code>: Essential system binaries </li>
+ <li> <code>/lib</code>: Essential libraries </li>
+ <li> <code>/usr</code>: Non-essential read-only user data </li>
+ <li> <code>/etc</code>: Static configuration files </li>
+ <li> <code>/home</code>: Home directories </li>
+ <li> <code>/tmp</code>: Temporary files </li>
+ <li> <code>/run</code>: Files which describe the state of running programs </li>
+ <li> <code>/var</code>: Log and spool files </li>
+</ul>
+
+<p> The <em>Filesystem Hierarchy Standard</em> describes the various
+subdirectories in more detail. The exercises ask the reader to become
+acquainted with this directory structure. </p>
+
+SUBSECTION(«POSIX Commands and Shell»)
+
+<p> The Filesystem Hierarchy Standard lists <code>/bin</code>
+and <code>/sbin</code> and several other directories for executable
+files. The POSIX standard defines which executables must exist in one
+of these directories for the system to be POSIX-compliant. Well over
+100 <em>POSIX commands</em> are listed in the XCU volume of this
+standard. Besides the names of the commands, the general behaviour
+of each and the set of command line options and their semantics are
+described. POSIX versions are designed with backwards compatibility
+in mind. For example, a new POSIX version might require a command
+to support additional command line options, but existing options are
+never dropped and never change semantics in incompatible ways. The
+target audience of the POSIX document are programmers who implement
and maintain the POSIX commands, and users who want to keep their
+software portable across different Unix flavors. </p>
+
+<p> One of the POSIX commands is the <em>shell</em>,
+<code>/bin/sh</code>, an interpreter that reads input expressed in
+the <em>shell command language</em>, which is also part of POSIX.
+The shell transforms the input in various ways to produce commands
+and then executes these commands. The user may enter shell code
+(i.e., code written in the shell command language) interactively at
+the <em>command prompt</em>, or supply the input for the shell as a
+<em>shell script</em>, a text file which contains shell code. Shell
+scripts which only contain POSIX commands and use only POSIX options
+are portable between different shell implementations and between
+different Unix flavors. They should therefore never cease to work after
+an upgrade. Among the many available POSIX shell implementations,
+<em>GNU bash</em> is one of the more popular choices. Bash is fully
+POSIX compatible and offers many more features on top of what is
+required by POSIX. </p>
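<p> As an illustration, here is a minimal portable shell script (a
sketch; the script's name and purpose are made up for this example).
It restricts itself to POSIX constructs, so it behaves the same under
dash, bash, ksh and other POSIX shells: </p>

```shell
#!/bin/sh
# count_entries: print the number of (non-hidden) entries in a directory.
# Only POSIX constructs are used: parameter expansion with a default
# value, test(1), pathname expansion and arithmetic expansion.
dir=${1:-.}                          # default to the current directory
if [ ! -d "$dir" ]; then
	echo "not a directory: $dir" >&2
	exit 1
fi
n=0
for entry in "$dir"/*; do
	[ -e "$entry" ] && n=$((n + 1))  # skip the unmatched pattern itself
done
echo "$n"
```

<p> Saved as a file and made executable, the script can be run as
<code>./count_entries /etc</code> under any POSIX shell. </p>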
+
+<p> Several implementations of the POSIX commands exist. On Linux
+the GNU implementation is typically installed while FreeBSD, NetBSD
+and MacOS contain the BSD versions. Although all implementations
+are POSIX-compliant, they differ considerably because different
+implementations support different sets of additional features and
+options which are not required by POSIX. These extensions are not
+portable, and should thus be avoided in shell scripts that must work
+on different Unix flavors. </p>
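<p> For example, the <code>-c</code> option of <code>head(1)</code>,
which limits the output to a byte count, is such an extension: the GNU
and BSD implementations support it, but POSIX only specifies
<code>-n</code>. A portable script can obtain the same effect with
<code>dd(1)</code>, which is fully specified by POSIX: </p>

```shell
# Non-portable: -c is a GNU/BSD extension to head(1).
printf 'hello world' | head -c 5                    # prints "hello"

# Portable: dd(1) with a byte count is specified by POSIX.
printf 'hello world' | dd bs=1 count=5 2>/dev/null  # prints "hello"
```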
+
+<p> In addition to the POSIX commands, a typical Unix system might well
+contain thousands of other commands. This illustrates another aspect
that is characteristic of Unix: tools should do only one specific
+task, and do it well. The operating system provides mechanisms
+to combine the simple commands in order to form more powerful
+programs. For example, commands can be <em>chained</em> together so
+that the output of one command becomes the input for the next command
+in the chain. This is the idea behind <em>pipes</em>, a Unix concept
+which dates back to 1973 and which is also covered by POSIX. We shall
+come back to pipes and related concepts in a later section. </p>
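<p> A short example: the pipe operator <code>|</code> of the shell
chains commands, with <code>sort(1)</code> ordering its input and
<code>uniq(1)</code> counting adjacent duplicate lines. Together the
simple commands report the most frequent line of the input: </p>

```shell
# Feed five lines through a pipe: sort groups equal lines together,
# uniq -c prefixes each distinct line with its count, sort -rn puts
# the highest count first, and head keeps only the top entry.
printf 'red\ngreen\nred\nblue\nred\n' |
	sort | uniq -c | sort -rn | head -n 1   # prints "3 red" (with leading blanks)
```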
+
+SUBSECTION(«Multi-User, Multi-Tasking, Isolation»)
+
+<p> From the very beginning Unix was designed to be a multi-user
+and a multi-tasking operating system. That is, it could run multiple
+programs on behalf of different users independently of each other and
+isolated from each other. This design was chosen to improve hardware
+utilization and robustness. In contrast, DOS and early versions
+of Windows were designed for <em>personal computing</em> (PC) and
+had no notion of user accounts, access permissions or isolation.
+This resulted in an unstable system because a single misbehaving
+program was enough to take down the whole system. Therefore these
+features had to be retrofitted later. </p>
+
+<p> While multi-tasking makes all tasks appear to run simultaneously
+even if there are more tasks than CPUs, isolation refers to
+<em>memory protection</em>, a mechanism which prevents applications
+from interfering with each other and with the internals of the
+operating system. A running Unix system maintains two sets of
+running tasks: besides the <em>application tasks</em> there is
+also a set of <em>kernel tasks</em>. Unlike the application tasks,
+the kernel tasks are privileged in that they can access the memory
+of the application tasks while applications tasks can only access
+their own memory. Isolation is achieved by a hardware concept called
+<em>protection domains</em>, which existed already in Multics and thus
+predates Unix. In the simplest case, there are only two protection
+domains: a privileged domain called <em>ring 0</em> for the kernel
+tasks, and an unprivileged domain for application tasks, also called
+<em>user processes</em> in this context. The CPU is always aware
+of the current protection domain as this information is stored in a
special CPU register. </p>
+
+SUBSECTION(«System Calls and the C POSIX Library»)
+
+<p> Only when the CPU is running in the privileged ring 0 domain
+(also known as <em>kernel mode</em>, as opposed to <em>user mode</em>
for application tasks) can it interact directly with hardware and
+memory. If an application wants to access hardware, for example read a
+data block from a storage device, it can not do so by itself. Instead,
+it has to ask the operating system to perform the read operation
+on behalf of the application. This is done by issuing a <em>system
+call</em>. Like function calls, system calls interrupt the current
+program, continue execution at a different address and eventually
+return to the instruction right after the call. However, in addition
+to this, they also cause the CPU to enter kernel mode so that it can
+perform the privileged operation. When the system call has done its
+work and is about to return to the application, the protection domain
+is changed again to let the CPU re-enter user mode. </p>
+
+<p> The system calls thus define the interface between applications
+and the operating system. For backwards compatibility it is of utmost
+importance that system calls never change semantics in an incompatible
+way. Moreover, system calls must never be removed because this would
+again break existing applications. The syntax and the semantics
+of many system calls are specified in POSIX, although POSIX does
+not distinguish between functions and system calls and refers to
+both as <em>system functions</em>. This is because system calls are
+typically not performed by the application directly. Instead, if an
+application calls, for example, <code>read()</code>, it actually calls
+the compatibility wrapper for the <code>read</code> system call which
+is implemented as a function in the <em>C POSIX Library</em> (libc),
+which ships with every Unix system. It is this library which does the
+hard work, like figuring out which system calls are supported on the
+currently running kernel, and how kernel mode must be entered on this
+CPU type. Like the POSIX commands, the system functions described in
+POSIX never change in incompatible ways, so programs which exclusively
+use POSIX system functions are portable between different Unix flavors
+and stay operational after an upgrade. </p>
+
+SUBSECTION(«Multi-Layer Configuration Through Text Files»)
+
+<p> On a multi-user system it becomes necessary to configure programs
+according to each user's personal preferences. The Unix way to
+achieve this is to provide four levels of configuration options
+for each program. First, there are the built-in defaults which are
+provided by the author of the program. Next, there is the system-wide
+configuration that is controlled by the administrator. Third,
+there is the user-defined configuration, and finally there are the
+command line options. Each time the program is executed, the four
+sets of configuration options are applied one after another so
+that the later sets of options override the earlier settings. The
+system-wide configuration is stored in <code>/etc</code> while the
+user-defined configuration is stored in that user's home directory.
Both are simple text files that can be examined and modified with
+any text editor. This makes it easy to compare two configurations
+and to transfer the configuration across different machines or user
+accounts. </p>
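<p> The following sketch illustrates the precedence of the four levels
for a hypothetical program called <code>mytool</code>. The file names
are assumptions for illustration, not the convention of any particular
package: </p>

```shell
# Level 1: built-in default, hard-coded by the author.
color=never
# Level 2: system-wide configuration, if present (hypothetical path).
[ -r /etc/mytool.conf ] && . /etc/mytool.conf
# Level 3: per-user configuration in the home directory, if present.
[ -r "$HOME/.mytoolrc" ] && . "$HOME/.mytoolrc"
# Level 4: command line options override all earlier levels.
for arg in "$@"; do
	case $arg in --color=*) color=${arg#--color=};; esac
done
echo "color=$color"
```

<p> Each level only takes effect if the corresponding file exists,
and later assignments override earlier ones. </p>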
+
+SUBSECTION(«Everything is a File»)
+
+<p> Another mantra which is often heard in connection with Unix is
+<em>everything is a file</em>. This phrase, while certainly catchy,
+is slightly incorrect. A more precise version would be <em>everything
+is controlled by a file descriptor</em>, or, as Ritchie and Thompson
+stated it, Unix has <em>compatible file, device, and inter-process
+I/O</em>. Modern Unix systems have pushed this idea further and employ
+file descriptors also for networking, process management, system
+configuration, and for certain types of events. The file descriptor
+concept is thus an abstraction which hides the differences between the
+objects the file descriptors refer to. It provides a uniform interface
+for the application programmer who does not need to care about whether
+a file descriptor refers to a file, a network connection, a peripheral
+device or something else because the basic I/O operations like open,
+read, write are the same. </p>
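<p> File descriptors can be observed directly at the shell prompt:
the redirection operators act on descriptor numbers, and the same
open, read and close operations happen behind the scenes regardless
of what the descriptor refers to. A small sketch using a scratch
file: </p>

```shell
tmp=$(mktemp)                # create a scratch file
echo 'first line' > "$tmp"   # descriptor 1 (stdout) refers to the file here
exec 3< "$tmp"               # open(2) the file read-only on descriptor 3
read -r line <&3             # read(2) one line from descriptor 3
echo "$line"                 # prints "first line"
exec 3<&-                    # close(2) descriptor 3
rm -f "$tmp"
```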
+
+<p> File descriptors are ubiquitous since every Unix program uses
+them, albeit perhaps implicitly via higher-level interfaces provided
+by a scripting language. We shall return to this topic when we discuss
+processes. </p>
+
+SUBSECTION(«Manual Pages»)
+
+<p> All POSIX commands and most other programs are installed along
+with one or more <em>man pages</em> (short for <em>manual pages</em>),
+which are plain text files that can be formatted and displayed in
+various ways. This concept was introduced in 1971 as part of the
+<em>Unix Programmer's Manual</em>. The characteristic page layout
+and the typical sections (NAME, SYNOPSIS, DESCRIPTION, EXAMPLES,
+SEE ALSO) of a man page have not changed since then. The POSIX
+<code>man</code> command is used to view man pages in a terminal. For
+example, the command <code>man ls</code> opens the man page of the
+<code>ls</code> command, and <code>man man</code> shows the man page
+of the <code>man</code> command itself. Most implementations also
+maintain a database of the existing man pages and provide additional
+commands to query this database. For example, the <code>whatis</code>
+command prints the one-line description of all man pages which match
+a pattern while the <code>apropos</code> command searches the manual
+page names and descriptions. </p>
+
+<p> In addition to the man pages for commands, there are man pages for
+system calls, library functions, configuration files and more. Each
+man page belongs to one of several <em>man sections</em>. For example,
+the aforementioned man pages for <code>ls</code> and <code>man</code>
+are part of section 1 (user commands) while section 2 is reserved for
+system calls and section 8 for administration commands that can only be
+executed by privileged users. By convention, to indicate which section
+a command or a function belongs to, the man section is appended in
parentheses, as in <code>mount(8)</code>. Most Unix systems also offer
+translated man pages for many languages as an optional package. Note
+that the same name may refer to more than one man page. For example
+there is <code>kill(1)</code> for the user command that kills processes
+and also <code>kill(2)</code> which describes the corresponding system
+call. To open the man page of a specific section, one may use a command
+like <code>man 2 kill</code>. The <code>MANSECT</code> environment
+variable can be set to a colon-delimited list of man sections to
+change the order in which the man sections are searched. </p>
+
+<p> Consulting the local man pages rather than searching the web has
+some advantages. Most importantly, the local pages will always give
+correct answers since they always match the installed software while
+there is no such relationship between a particular web documentation
+page and the version of the software package that is installed on the
+local computer. Working with man pages is also faster, works offline
+and helps the user to stay focused on the topic at hand. </p>
+
+EXERCISES()
+
+<ul>
+ <li> Run <code>df</code> on as many systems as possible to see the
+ mount points of each filesystem. Then discuss the pros and cons of
+ a single file hierarchy as opposed to one hierarchy per device. </li>
+
+ <li> Run <code>ls /</code> to list all top-level subdirectories of
+ the root file system and discuss the purpose of each. Consult the
+ Filesystem Hierarchy Standard if in doubt. </li>
+
+ <li> Execute <code>cd / && mc</code> and start surfing at the root
+ directory. </li>
+
+ <li> Compare the list of top-level directories that exist on different
+ Unix systems, for example Linux and MacOS. </li>
+
+ <li> Find out which type of files are supposed to be stored in
+ <code>/usr/local/bin</code>. Run <code>ls /usr/local/bin</code>
+ to list this directory. </li>
+
+ <li> Find out what the term <em>bashism</em> means and learn how to
avoid bashisms. </li>
+
+ <li> Find the POSIX specification of the <code>cp(1)</code> command
+ online and compare the set of options with the options supported by
+ the GNU version of that command, as obtained with <code>man cp</code>
+ on a Linux system. </li>
+
+ <li>
+ <ul>
+ <li> Run <code>time ls /</code> and discuss the meaning of
+ the three time values shown at the end of the output (see
+ <code>bash(1)</code>). </li>
+
+ <li> Guess the user/real and the sys/real ratios for the following
+ commands. Answer, before you run the commands.
+
+ <ul>
+ <li> <code>time head -c 100000000 /dev/urandom > /dev/null</code> </li>
+
+ <li> <code>i=0; time while ((i++ < 1000000)); do :; done</code>
+ </li>
+ </ul>
+ </li>
+
+ <li> Run the above two commands again, this time run
+ <code>htop(1)</code> in parallel on another terminal and observe the
+ difference. </li>
+ </ul>
+ </li>
+
+ <li> On a Linux system, check the list of all system calls in
+ <code>syscalls(8)</code>. </li>
+
+ <li> The <code>strace(1)</code> command prints the system calls that
+ the given command performs. Guess how many system calls the command
+ <code>ls -l</code> will make. Run <code>strace -c ls -l</code> for
+ the answer. Read the <code>strace(1)</code> man page to find suitable
+ command line options to only see the system calls which try to open
+ a file. </li>
+
+ <li> Guess how many man pages a given system has. Run <code>whatis -w
+ '*' | wc -l</code> to see how close your guess was. </li>
+
+ <li> Search the web for "cp(1) manual page" and count how many
+ <em>different</em> manual pages are shown in the first 20 hits. </li>
+</ul>
+
+HOMEWORK(«
+
+Think about printers, sound cards, or displays as a file. Specifically,
+describe what <code>open, read</code>, and <code>write</code> should
+mean for these devices.
+
+», «
+
+Opening would establish a (probably exclusive) connection
+to the device. Reading from the file descriptor returned by
+<code>open(2)</code> could return all kinds of status information,
+like the type, model and capabilities of the device. For example,
+printers could return the number of paper trays, the amount of toner
+left etc. Writing to the file descriptor would cause output on the
+device. This would mean to print the text that is written, play the
+audio samples, or show the given text on the display. The point to
+take away is that the <code>open, read, write</code> interface is a
+generic concept that works for different kinds of devices, not only
+for storing data in a file on a hard disk.
+
+»)
+
+SECTION(«Paths, Files and Directories»)
+
+In this section we look in some detail at paths, at a matching
+language for paths, and at the connection between paths and files. We
+then describe the seven Unix file types and how file metadata are
+stored. We conclude with the characteristics of soft and hard links.
+
+SUBSECTION(«Paths»)
+
+<p> The path concept was introduced in the 1960s with the Multics
+operating system. Paths will be familiar to the reader because
+they are often specified as arguments to commands. Also many
+system calls receive a path argument. A path is a non-empty
+string of <em>path components</em> which are separated by slash
+characters. An <em>absolute path</em> is a path that starts with a
+slash, all other paths are called <em>relative</em>. A relative path
+has to be interpreted within a context that implies the leading
+part of the path. For example, if the implied leading part is
+<code>/foo/bar</code>, the relative path <code>baz/qux</code> is
+equivalent to the absolute path <code>/foo/bar/baz/qux</code>. </p>
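<p> The equivalence of relative and absolute paths is easy to verify
in the shell, where the working directory supplies the implied leading
part. The scratch directory below is created with
<code>mktemp(1)</code> just for the demonstration: </p>

```shell
tmp=$(mktemp -d)             # scratch directory, e.g. /tmp/tmp.XXXXXX
mkdir -p "$tmp/foo/bar/baz"
cd "$tmp/foo/bar"            # the working directory supplies the
                             # implied leading part .../foo/bar
ls -d baz                    # relative path, prints "baz"
ls -d "$tmp/foo/bar/baz"     # the equivalent absolute path
```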
+
+<p> Given a path, there may or may not exist a file or a
+directory that corresponds to the path. <em>Path lookup</em> is the
+operation which determines the answer to this question, taking the
+implied leading part into account in case of relative paths. This
+operation is always performed within the kernel and turns out to
+be surprisingly complex due to concurrency and performance issues.
+Consult <code>path_resolution(7)</code> on a Linux system to learn
+more about how pathnames are resolved to files. </p>
+
+<p> If a path was successfully looked up, each path component up to the
+second-last refers to an existing directory while the last component
+refers to either a file or a directory. In both cases the directory
+identified by the second-last component contains an entry named by the
+last component. We call those paths <em>valid</em>. The valid paths
+give rise to a rooted tree whose interior nodes are directories and
+whose leaf nodes are files or directories. Note that the validity of a
+path depends on the set of existing files, not just on the path itself,
+and that a valid path may become invalid at any time, for example if
+a file is deleted or renamed. Many system calls which receive a path
+argument perform path lookup and fail with the <code>No such file or
+directory</code> error if the lookup operation fails. </p>
+
+<p> It depends on the underlying filesystem whether the path components
+are <em>case-sensitive</em> or <em>case-insensitive</em>. That is,
+whether paths which differ only in capitalization (for example
+<code>foo</code> and <code>Foo</code>) refer to the same file.
+Since the hierarchy of files may be comprised of several filesystems,
+some components of the path may be case-sensitive while others are
+case-insensitive. As a rule of thumb, Unix filesystems are
+case-sensitive while Microsoft filesystems are case-insensitive even when
+mounted on a Unix system. </p>
+
+<p> Path components may contain every character except the Null
+character and the slash. In particular, space and newline characters
+are allowed. However, while dots are allowed in path components if
+they are used together with other characters, the path components
+<code>.</code> and <code>..</code> have a special meaning: every
+directory contains two subdirectories named <code>.</code> and
+<code>..</code> which refer to the directory itself and its parent
+directory, respectively. </p>
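+
+<p> The equivalence of paths containing <code>.</code> and
+<code>..</code> is easy to check with <code>realpath(1)</code>,
+which canonicalizes a path. A minimal sketch, assuming a POSIX shell
+and GNU coreutils (the names <code>foo</code>, <code>bar</code> and
+<code>baz</code> are arbitrary): </p>
+
```shell
# Work in a throwaway scratch directory.
dir=$(mktemp -d) && cd "$dir"
mkdir -p foo/bar
echo hello > foo/bar/baz

# "." refers to a directory itself and ".." to its parent, so all
# three paths below resolve to the same file.
realpath foo/bar/baz
realpath foo/./bar/baz
realpath foo/bar/../bar/baz
```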
+
+SUBSECTION(«Globbing»)
+
+<p> Globbing, also known as <em>pathname expansion</em>, is a pattern
+matching language for paths which was already present in the earliest
+Unix versions. The glob operation generates a set of valid paths from
+a <em>glob pattern</em> by replacing the pattern by all <em>matching
+paths</em>. </p>
+
+<p> Glob patterns may contain special characters called
+<em>wildcards</em>. The wildcard characters are: </p>
+
+ <ul>
+ <li> <code>*</code>: match any string, </li>
+	<li> <code>?</code>: match any single character, </li>
+ <li> <code>[...]</code>: match any of the enclosed characters. </li>
+ </ul>
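+
+<p> A quick shell session illustrates the three wildcards. This is a
+sketch with arbitrary file names, run in an empty scratch directory: </p>
+
```shell
# Create a few files to match against.
dir=$(mktemp -d) && cd "$dir"
touch a1.txt aa.txt ab.txt abc.txt

echo a*.txt       # "*": any string -> all four files
echo a?.txt       # "?": exactly one character -> a1.txt aa.txt ab.txt
echo a[ab].txt    # "[ab]": one of the listed characters -> aa.txt ab.txt
```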
+
+<p> The complete syntax rules for glob patterns and the exact
+semantics for pattern matching are described in POSIX and in
+<code>glob(7)</code>. Any POSIX-compliant shell performs globbing
+to construct the command to be executed from the line entered at
+the prompt. However, POSIX also demands system functions which make
+globbing available to other applications. These are implemented as
+part of libc. </p>
+
+<p> There are a few quirks related to globbing which are worth
+pointing out. First, if no valid path matches the given pattern, the
+expansion of the pattern is, by definition according to POSIX, the
+pattern itself. This can lead to unexpected results. Second, files
+which start with a dot (so-called <em>hidden</em> files) must be
+matched explicitly. For example, <code>rm *</code> does <em>not</em>
+remove these files. Third, the tilde character is <em>not</em> a wildcard,
+although it is also expanded by the shell. See the exercises for more
+examples. </p>
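+
+<p> The first two quirks can be observed directly. A sketch, run in
+an empty scratch directory: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
touch visible .hidden

# Quirk 1: a pattern that matches nothing expands to itself.
echo nomatch-*      # prints: nomatch-*

# Quirk 2: "*" skips hidden files; they must be matched explicitly.
echo *              # prints: visible
echo .h*            # prints: .hidden
```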
+
+<p> POSIX globbing has some limitations. For example, there is no
+glob pattern which matches exactly those files whose name consists of
+one or more <code>a</code> characters. To overcome such limitations,
+some shells extend the matching language by implementing
+<em>extended glob patterns</em> which are not covered by POSIX. For
+example, if the extended globbing feature of <code>bash(1)</code> is
+activated via the <code>extglob</code> option, the extended glob
+pattern <code>+(a)</code> matches exactly this set of files. </p>
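+
+<p> This can be tried out as follows. The sketch runs the pattern
+through an explicit <code>bash -O extglob</code> invocation so that
+the option is already set when the pattern is parsed: </p>
+
```shell
# Scratch directory with names made of "a"s, plus one that is not.
dir=$(mktemp -d) && cd "$dir"
touch a aa aaa ab

# +(a) matches one or more "a" characters and nothing else,
# so "ab" is not matched.
bash -O extglob -c 'echo +(a)'    # prints: a aa aaa
```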
+
+SUBSECTION(«File Types»)
+
+We have seen that all but the last component of a valid path refer
+to directories while the last component may refer to either a file
+or a directory. The first character in the output of <code>ls
+-l</code> indicates the type of the last path component: for
+directories a <code>d</code> character is shown while files (also
+called <em>regular</em> files in this context) get a hyphen character
+(<code>-</code>). Besides directories and regular files, the following
+special file types exist:
+
+<dl>
+ <dt> Soft link (<code>l</code>) </dt>
+
+ <dd> A file which acts as a pointer to another file. We shall cover
+ links in a dedicated subsection below. </dd>
+
+ <dt> Device node (<code>c</code> and <code>b</code>) </dt>
+
+ <dd> Also called <em>device special</em>. These files refer to devices
+ on the local system. Device nodes come in two flavors: character
+ devices (<code>c</code>) and block devices (<code>b</code>). Regardless
+ of the flavor, each device node has a major and a minor number
+ associated with it. The major number indicates the type of the
+ device (e.g. a hard drive, a serial connector, etc.) while the
+ minor number enumerates devices of the same type. On most systems
+ the device nodes are created and deleted on the fly as the set of
+ connected devices changes, for example due to a USB device being
+ added or removed. However, device nodes can also be created manually
+ with the <code>mknod(1)</code> command or the <code>mknod(2)</code>
+ system call. Device nodes do not necessarily correspond to physical
+ devices. In fact, POSIX demands the existence of a couple of
+ <em>virtual devices</em> with certain properties. We look at some of
+ these in the exercises. The access to device nodes which do correspond
+ to physical devices is usually restricted to privileged users. </dd>
+
+ <dt> Socket (<code>s</code>) </dt>
+
+ <dd> Sockets provide an interface between a running program and the
+ network stack of the kernel. They are subdivided into <em>address
+ families</em> which correspond to the various network protocols. For
+ example, the <code>AF_INET</code> and <code>AF_INET6</code> address
+ families are for internet protocols (IP) while <code>AF_LOCAL</code>
+ (also known as <code>AF_UNIX</code>) is used for communication between
+ processes on the same machine. These local sockets are also called
+ <em>Unix domain sockets</em>. They can be bound to a path which
+ refers to a file of type socket. Regardless of the address family,
+ processes can exchange data via sockets in both directions, but
+ the local sockets support additional features, like passing process
+ credentials to other processes. </dd>
+
+ <dt> Fifo (<code>p</code>) </dt>
+
+ <dd> Files of type <em>fifo</em> are also known as <em>named
+ pipes</em>. They associate a path with a kernel object that provides a
+ <em>First In, First Out</em> data channel for user space programs. Data
+ written to the fifo by one program can be read back by another program
+ in the same order. Fifos are created with the <code>mkfifo(1)</code>
+ command or the <code>mkfifo(3)</code> library function. </dd>
+</dl>
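+
+<p> The type of an existing path can be queried with
+<code>stat(1)</code>. The sketch below creates one file of each of
+the types that require no privileges and prints the type recorded in
+the inode; <code>%F</code> is a GNU <code>stat</code> format
+specifier: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
echo hi > regular        # regular file
mkdir directory          # directory
mkfifo fifo              # named pipe (fifo)
ln -s regular symlink    # soft link

# Print the name and the file type of each path.
stat -c '%n: %F' regular directory fifo symlink
```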
+
+<p> Note that the type of a file is never inferred from the path.
+In particular the suffix of the path (everything after the last
+dot) is just a convention and has no strict connection to the file
+type. Also there is no difference between text and binary files. </p>
+
+SUBSECTION(«Metadata and Inodes»)
+
+<p> The <code>stat(2)</code> system call returns the metadata
+of the file or directory that corresponds to the given path. The
+<code>stat(1)</code> command is a simple program which executes this
+system call and prints the resulting metadata in human-readable
+form, including the file type. This is done without looking at the
+contents of the file because metadata are stored in a special area
+called the <em>inode</em>. All types of files (including directories)
+have an associated inode. Besides the file type, the inode stores
+several other properties prescribed by POSIX. For example the file
+size, the owner and group IDs, and the access permissions are all
+stored in the inode. Moreover, POSIX requires three timestamps to
+be maintained in each inode: </p>
+
+<ul>
+ <li> modification time (mtime): time of last content change. </li>
+
+ <li> access time (atime): time of last access. </li>
+
+ <li> status change time (ctime): time of last modification to the
+ inode. </li>
+</ul>
+
+<p> To illustrate the difference between the mtime and the ctime,
+consider the <code>chgrp(1)</code> command which changes the group
+ID of the file or directory identified by its path argument. This
+command sets the ctime to the current time while the mtime is left
+unmodified. On the other hand, commands which modify the contents of
+a file, such as <code>echo foo >> bar</code>, change both the mtime
+and the ctime. </p>
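+
+<p> This behaviour can be verified with <code>stat(1)</code>. The
+sketch below uses <code>chmod(1)</code> instead of
+<code>chgrp(1)</code> (both are metadata-only operations) and reads
+the timestamps as seconds since the epoch via the GNU format
+specifiers <code>%Y</code> (mtime) and <code>%Z</code> (ctime): </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
echo hello > foo

mtime1=$(stat -c %Y foo); ctime1=$(stat -c %Z foo)
sleep 1
chmod 600 foo             # metadata change: bumps ctime only
mtime2=$(stat -c %Y foo); ctime2=$(stat -c %Z foo)
echo world >> foo         # content change: bumps mtime (and ctime)
mtime3=$(stat -c %Y foo)

echo "mtimes: $mtime1 $mtime2 $mtime3"
echo "ctimes: $ctime1 $ctime2"
```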
+
+<p> The inode of each file or directory contains twelve <em>mode
+bits</em>, nine of which are the <em>permission bits</em> which
+control who is allowed to access the file or directory, and how. The
+permission bits are broken up into three classes called <em>user</em>
+(<code>u</code>), <em>group</em> (<code>g</code>) and <em>others</em>
+(<code>o</code>). Some texts refer to the first and last class as
+"owner" and "world" instead, but we won't use this naming to avoid
+confusion. Each class contains three bits. The bits of the "user"
+class apply to the file owner, that is, the user whose ID is stored in
+the inode. The "group" category applies to all non-owners who belong
+to the group whose ID is stored in the inode. The third category
+applies to all remaining users. The three bits of each class refer to
+read/write/execute permission. They are therefore named <code>r</code>,
+<code>w</code> and <code>x</code>, respectively. The permission
+bits mean different things for directories and non-directories,
+as described below. </p>
+
+<table>
+ <tr>
+ <td> </td>
+ <td> <em>Directories</em> </td>
+ <td> <em>Non-directories</em> </td>
+ </tr> <tr>
+ <td> <code> r </code> </td>
+
+ <td> The permission to list the directory contents. More precisely,
+ this bit grants the permission to call <code>opendir(3)</code>
+ to obtain a handle to the directory which can then be passed to
+ <code>readdir(3)</code> to obtain the directory contents. </td>
+
+	<td> If read permission is granted, the <code>open(2)</code> system
+	call does not fail with the <code>permission denied</code> error,
+	provided the file is opened in read-only mode. The system call may
+	fail for other reasons, though. </td>
+
+ </tr> <tr>
+ <td> <code> w </code> </td>
+
+ <td> The permission to add or remove directory entries. That is,
+ to create new files or to remove existing files. Note that write
+ permission is not required for the file that is being removed. </td>
+
+ <td> Permission to open the file in write-only mode in order
+ to perform subsequent operations like <code>write(2)</code>
+ and <code>truncate(2)</code> which change the contents of the
+ file. Non-directories are often opened with the intention to both
+ read and write. Naturally, such opens require both read and write
+ permissions. </td>
+
+ </tr> <tr>
+ <td> <code> x </code> </td>
+
+ <td> The permission to <em>search</em> the directory. Searching
+ a directory means to access its entries, either by retrieving
+ inode information with <code>stat(2)</code> or by calling
+ <code>open(2)</code> on a directory entry. </td>
+
+ <td> Run the file. This applies to <em>binary executables</em> as well
+ as to text files which start with a <em>shebang</em>, <code>#!</code>,
+ followed by the path to an interpreter. We shall cover file execution
+ in more detail below. </td>
+
+ </tr>
+</table>
+
+<p> To run the regular file <code>/foo/bar/baz</code>, search
+permission is needed for both <code>foo</code> and <code>bar</code>,
+and execute permission is needed for <code>baz</code>. Similarly, to
+open the regular file <code>foo/bar</code> for reading, we need execute
+permissions on the current working directory and on <code>foo</code>,
+and read permissions on <code>bar</code>. </p>
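+
+<p> The following sketch prepares such a hierarchy. With mode 711 on
+the directory, non-owners may search <code>foo</code> but not list
+its contents; the owner retains all permissions, so the final
+<code>cat</code> succeeds: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
mkdir foo
echo hi > foo/bar

chmod 711 foo        # owner: rwx, group/others: search (x) only
stat -c %A foo       # prints: drwx--x--x

# Opening foo/bar for reading requires search permission on the
# current directory and on foo, plus read permission on bar.
cat foo/bar
```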
+
+<p> A <em>numeric permission mode</em> is a number of three octal
+digits (0-7), where the digits correspond to the user, group, and other classes
+described above, in that order. The value of each digit is derived by
+adding up the bits with values 4 (read), 2 (write), and 1 (execute).
+The following table lists all eight possibilities for each of the
+three digits. </p>
+
+<table>
+ <tr>
+ <td> octal value </td>
+ <td> symbolic representation </td>
+ <td> meaning </td>
+ </tr> <tr>
+ <td> 0 </td>
+ <td> <code>---</code> </td>
+ <td> no permissions at all </td>
+ </tr> <tr>
+ <td> 1 </td>
+ <td> <code>--x</code> </td>
+ <td> only execute permission </td>
+ </tr> <tr>
+ <td> 2 </td>
+ <td> <code>-w-</code> </td>
+ <td> only write permission </td>
+ </tr> <tr>
+ <td> 3 </td>
+ <td> <code>-wx</code> </td>
+ <td> write and execute permission </td>
+ </tr> <tr>
+ <td> 4 </td>
+ <td> <code>r--</code> </td>
+ <td> only read permission </td>
+ </tr> <tr>
+ <td> 5 </td>
+ <td> <code>r-x</code> </td>
+ <td> read and execute permission </td>
+ </tr> <tr>
+ <td> 6 </td>
+ <td> <code>rw-</code> </td>
+ <td> read and write permission </td>
+ </tr> <tr>
+ <td> 7 </td>
+ <td> <code>rwx</code> </td>
+ <td> read, write and execute permission </td>
+ </tr>
+</table>
+
+<p> The <code>chmod(1)</code> command changes the permission
+bits of the file identified by the path argument. For example,
+<code>chmod 600 foo</code> sets the permissions of <code>foo</code> to
+<code>rw-------</code>. Besides the octal values, <code>chmod(1)</code>
+supports symbolic notation to address the three classes described
+above: <code>u</code> selects the user class, <code>g</code> the
+group class, <code>o</code> the class of other users. The symbolic
+value <code>a</code> selects all three classes. Moreover, the letters
+<code>r</code>, <code>w</code> and <code>x</code> are used to set or
+unset the read, write and execute permission, respectively. The above
+command is equivalent to <code>chmod u=rw,g=,o= foo</code>. The
+<code>+</code> and <code>-</code> characters can be specified instead
+of <code>=</code> to set or unset specific permission bits while
+leaving the remaining bits unchanged. For example <code>chmod go-rw
+foo</code> turns off read and write permissions for non-owners. </p>
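+
+<p> The effect of octal and symbolic modes can be checked with
+<code>stat(1)</code>, which prints the permission bits in octal
+(<code>%a</code>) or symbolic (<code>%A</code>) form; both format
+specifiers are GNU extensions: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
echo hi > foo

chmod 600 foo
stat -c '%a %A' foo       # prints: 600 -rw-------

chmod u=rwx,g=rx,o= foo   # symbolic spelling of mode 750
stat -c '%a %A' foo       # prints: 750 -rwxr-x---

chmod go-x foo            # clear execute for group and others
stat -c '%a %A' foo       # prints: 740 -rwxr-----
```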
+
+<p> Unprivileged users can only change the mode bits of their own
+files or directories while there is no such restriction for the
+superuser. </p>
+
+SUBSECTION(«Hard and Soft Links»)
+
+<p> Links make it possible to refer to identical files through
+different paths. They come in two flavors: hard and soft. Both
+types of links have advantages and disadvantages, and different
+limitations. We start with hard links because these existed already
+in the earliest Unix versions. </p>
+
+<p> A file can have more than one directory entry that points to its
+inode. If two directory entries point to the same inode, they are
+said to be <em> hard links</em> of each other. The two entries are
+equivalent in that they refer to the same file. It is impossible
+to tell which of the two is the "origin" from which the "link"
+was created. Hard links are created with the <code>link(2)</code>
+system call or the <code>ln(1)</code> command. Both take two path
+arguments, one for the existing file and one for the directory entry
+to be created. The filesystem maintains in each inode a link counter
+which keeps track of the number of directory entries which point to the
+inode. The <code>link(2)</code> system call increases the link count
+while <code>unlink(2)</code> decrements the link count and removes
+the directory entry. If the decremented counter remains positive,
+there is still at least one other directory entry which points to
+the inode. Hence the file is still accessible through this other
+directory entry and the file contents must not be released. Otherwise,
+when the link counter reaches zero, the inode and the file contents
+may be deleted (assuming the file is not in use). </p>
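+
+<p> The link count and the inode number appear in the
+<code>%h</code> and <code>%i</code> fields of GNU
+<code>stat(1)</code>, which makes the mechanism easy to observe: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
echo hello > foo
ln foo bar                     # second directory entry, same inode

stat -c '%n: inode %i, %h links' foo bar

rm foo                         # decrements the link count ...
cat bar                        # ... but the content is still reachable
stat -c '%h link(s) left' bar
```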
+
+<p> There are several issues with hard links. For one, hard links
+cannot span filesystems. That is, the two path arguments for
+<code>link(2)</code> have to refer to files which reside on the
+same filesystem. Second, it is problematic to create hard links to
+directories. Early Unix systems allowed this for the superuser,
+but on Linux the attempt to hard-link a directory always fails.
+To address the limitations of hard links, <em>soft links</em>, also
+called <em>symbolic links</em> (or <em>symlinks</em> for short),
+were introduced. A soft link can be imagined as a special text file
+containing a single absolute or relative path, the <em>target</em> of
+the link. For relative paths the implied leading part is the directory
+that contains the link. A soft link is thus a named reference in
+the global hierarchy of files. Unlike hard links, the soft link
+and its target do not need to reside on the same filesystem, and
+there is a clear distinction between the link and its target. Soft
+links are created with <code>symlink(2)</code> or by specifying the
+<code>-s</code> option to the <code>ln(1)</code> command. </p>
+
+<p> A soft link and its target usually point to different inodes. This
+raises the following question: Should system calls which receive a
+path argument that happens to be a soft link operate on the link
+itself, or should they <em>follow</em> (or <em>dereference</em>)
+the link and perform the operation on the target? Most system calls
+follow soft links, but some don't. For example, if the path argument
+to <code>chdir(2)</code> happens to be a soft link, the link is
+dereferenced and the working directory is changed to the target of
+the link instead. The <code>rename(2)</code> system call, however,
+does not follow soft links and renames the link rather than its
+target. Other system calls, including <code>open(2)</code>, allow
+the caller to specify the desired behaviour by passing a flag to the
+system call. For yet others there is a second version of the system
+call to control the behaviour. For example, <code>lstat(2)</code> is
+identical to <code>stat(2)</code>, but does not follow soft links. </p>
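+
+<p> The two behaviours can be compared from the shell: GNU
+<code>stat(1)</code> calls <code>lstat(2)</code> by default and
+follows the link only when given the <code>-L</code> option: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
echo hello > target
ln -s target link

stat -c %F link       # lstat(2): reports on the link itself
stat -L -c %F link    # stat(2): follows the link to its target
```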
+
+<p> It is possible for a soft link to refer to an invalid path. In
+fact, <code>ln(1)</code> and <code>symlink(2)</code> do not consider
+it an error if the target does not exist, and happily create a soft
+link which points to an invalid path. Such soft links are called
+<em>dangling</em> or <em>broken</em>. Dangling soft links also occur
+when the target file is removed or renamed, or after a mount point
+change. </p>
+
+<p> Soft links may refer to other soft links. System calls which
+follow soft links must therefore be prepared to resolve chains of
+soft links to determine the file to operate on. However, this is not
+always possible because soft links can easily introduce loops into the
+hierarchy of files. For example, the commands <code>ln -s foo bar;
+ln -s bar foo</code> create such a loop. System calls detect this
+and fail with the <code>Too many levels of symbolic links</code>
+error when they encounter a loop. </p>
+
+<p> Another issue with both soft and hard links is that there is no
+simple way to find all directory entries which point to the same path
+(soft links) or inode (hard links). The only way to achieve this is
+to traverse the whole hierarchy of files. This may be prohibitive
+for large filesystems, and the result is unreliable anyway unless
+the filesystems are mounted read-only. </p>
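+
+<p> For hard links within a single subtree, such a traversal can be
+sketched with GNU <code>find(1)</code>, whose (non-POSIX)
+<code>-samefile</code> test compares device and inode numbers: </p>
+
```shell
dir=$(mktemp -d) && cd "$dir"
mkdir sub
echo hello > foo
ln foo sub/bar          # hard link in a subdirectory

# Walk the tree and report every entry pointing to the same inode.
find . -samefile foo
```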
+
+EXERCISES()
+
+<ul>
+ <li> A path can lack both slashes and components. Give an example
+ of a path that lacks a slash and another example of a path that has
+ no components. </li>
+
+ <li> Assume <code>foo</code> is an existing directory. Guess what the
+ command <code>mv foo bar</code> will do in each of the following cases:
+ (a) <code>bar</code> does not exist, (b) <code>bar</code> exists and
+ is a regular file, (c) <code>bar</code> exists and is a directory.
+ Verify your guess by running the command. </li>
+
+ <li> Many programs check if a path is valid and act differently
+ according to the result. For example, a shell script might
+ check for the existence of a file with code like <code>if test
+	-e "$file"; then something_with "$file"; fi</code>. Explain
+ why this approach is not bullet-proof. How could this be
+ fixed? </li>
+
+ <li> Run <code>touch file-{1..100}</code> to create 100 files. Guess
+ what the following commands will print. Run each command to confirm.
+
+ <ul>
+ <li> <code>ls file-</code> </li>
+ <li> <code>ls file-*</code> </li>
+ <li> <code>ls file-?</code> </li>
+ <li> <code>ls file-[1-4]</code> </li>
+ <li> <code>ls file-[1,3,5,7,9]*</code> </li>
+ </ul>
+ </li>
+
+ <li> Find an extended glob pattern for <code>bash(1)</code>
+ that matches all valid paths whose last component starts with
+ <code>file-</code>, followed by any number of odd digits (1, 3, 5,
+ 7, or 9). </li>
+
+ <li> Point out the flaw in the following shell code: <code>for
+ f in file-*; do something_with "$f"; done</code>. Hint: Search
+	<code>bash(1)</code> for "nullglob". </li>
+
+ <li> Create a file named <code>-r</code> with <code>echo >
+ -r</code>. Try to remove the file with <code>rm -r</code> and
+ discuss why this doesn't work as expected. Find a way to get rid
+ of the file. Discuss what happens if you run <code>rm *</code> in a
+ directory which contains a file named <code>-r</code>. </li>
+
+ <li> The content of the <code>PATH</code> variable is a
+ colon-separated list of directories in which the shell looks for
+ commands to execute. Discuss the dangers of including the current
+ working directory in this list. </li>
+
+ <li> Run <code>id</code> to determine a group <code>G</code> you
+ belong to but is not your primary group. Consider the following
+ commands <code>mkdir foo; chgrp $G foo; touch foo/bar</code>. What
+ is the group ID of <code>foo/bar</code>? Run the same commands, but
+ insert <code>chmod g+s foo</code> as the second-to-last command. </li>
+
+ <li> Run <code>man null</code> and <code>man zero</code> to learn
+ about the properties of these two character devices. </li>
+
+ <li> Assume the modification time stored in the inode of some file
+ suggests that the file was last modified two years ago. How sure
+ can you be that the file was never changed since then? Hint: See the
+ <code>-d</code> option of <code>touch(1)</code>. </li>
+
+ <li> Run the following commands <code>echo hello > foo</code>,
+ <code>cat foo</code>, <code>chmod 600 foo</code>, <code>echo world >>
+ foo</code>. Check the three timestamps with <code>stat foo</code>
+ after each command. </li>
+
+ <li> Determine the state of the permission bits of your own
+ home directory by running <code>ls -ld ~</code>. Who can
+ access its contents? Also look at the permission bits of
+ other people's home directory. </li>
+
+ <li> A file or directory is called <em>world-writeable</em>
+ if the <code>w</code> bit is set in the <code>others</code>
+ class of the permission bits. Create a world-writable
+ directory with <code>mkdir foo; chmod 777 foo</code>
+ and create a file in the new directory: <code>echo hello
+ > foo/bar</code>. Is a different user allowed to create
+ another file there (<code>echo world > foo/baz</code>)? Can
+ he remove it again (<code>rm foo/baz</code>)? Will he succeed
+ in removing <code>foo/bar</code> although it is owned by you
+ and <em>not</em> writable to him? Try the same with the sticky
+ bit turned on (<code>chmod 1777 foo</code>). </li>
+
+ <li> Translate <code>rw-r--r--</code> into octal, and 755 into
+ <code>rwx</code>-notation. </li>
+
+ <li> Create a <a href="#hello_world">hello world script</a>, make it
+ executable and run it. Create a subdirectory of your home directory
+ and move the script into this directory. Set the permissions of the
+ directory to <code>r--------</code>, check whether you still can
+ list/execute it. Do the same with <code>--x------</code>. </li>
+
+ <li> Create a file with <code>echo hello > foo</code>,
+ create soft and hard links with <code>ln -s foo soft</code>
+ and <code>ln foo hard</code>. Examine the inode numbers
+ and link counts using the command <code>stat foo soft
+ hard</code>. Remove <code>foo</code> and repeat. Try to
+ access the file content through both links. Guess what
+ <code>realpath foo soft hard</code> will print. Run the
+ command to confirm. </li>
+
+ <li> Create a dangling symlink with <code>ln -s /nope foo</code>. Would
+ you expect the commands <code>ls -l foo</code> and <code>cat foo</code>
+ succeed? Run these commands to verify your guess. </li>
+
+ <li> One of the seven Unix file types is symlink. Why is there no
+ file type for hard links? </li>
+</ul>
+
+HOMEWORK(«
+How many paths are there that refer to the same file?
+», «
+Given the path <code>/foo/bar</code>, one may construct different paths
+which refer to the same file by inserting any number of <code>/.</code>
+or <code>../foo</code> after the first component. For example,
+<code>/foo/./bar</code> and <code>/foo/../foo/bar</code> both refer
+to the same file. If relative paths have to be taken into account as
+well, even more paths can be constructed easily. Hence the answer is:
+arbitrarily many.
+
+This illustrates the fundamental difference between a path and a
+file. Paths can be mapped to files, but not the other way around. In
+particular, there is no such thing as "the list of paths which have
+changed since yesterday".
+
+The concept of hard and soft links complicates
+the situation further. This topic is discussed in a <a
+href="#soft_and_hard_links">subsequent section</a>. See the exercises
+therein for more information.
+
+»)
+
+HOMEWORK(«
+Given two paths, how can one tell if they refer to the same file?
+», «
+
+Among other information, the metadata record of each file contains the
+so-called <em>inode number</em>, which uniquely identifies the file
+within the file system that contains the file. Therefore, if both
+paths are known to refer to files stored on the same file system,
+a comparison of the two inode numbers is sufficient to tell whether
+the two paths refer to the same file. The inode number can be obtained
+with the command <code>ls -i</code>.
+
+In the general case one additionally has to check that the
+two <code>device IDs</code> which identify the underlying file
+systems are also identical. Like the inode number, the device ID
+is part of the metadata of the file. It can be obtained by running
+<code>stat(1)</code>.
+
+»)
+
+HOMEWORK(«
+Device nodes come in two flavors: Character and block devices. Explain
+the difference between the two device flavors.
+»)
+
+HOMEWORK(«
+
+<ul>
+ <li> Nine of the 12 mode bits of each file are the permission
+ bits. The remaining three are the <em>sticky</em>, <em>setuid</em>
+ and <em>setgid</em> bits. Explain the purpose of each. </li>
+
+	<li> Run <code>find /usr/bin/ -perm -4000 -ls</code> to see all SUID
+	executables in <code>/usr/bin</code>. Discuss why those programs have
+	the SUID bit set. </li>
+</ul>
+
+»)
+
+HOMEWORK(«
+How many possible permission modes exist for a file or directory on
+a Unix System?
+», «
+There are nine permission bits that can be turned on and off
+independently. Hence we have 2^9=512 possibilities. When taking into
+account the three special bits (sticky, setuid, setgid), the number
+increases to 2^12=4096.
+»)
+
+HOMEWORK(«
+Explain each command of the <a href="«#»symlink_madness">script</a>
+below. Show the arrangement of all files and links in a figure,
+drawing a directory as a circle and a file as a square. How many
+different paths exist for the file <code> a</code>? Discuss whether
+the question "What's the path of a given file?" makes sense.
+
+», «
+
+<div>
+<svg
+ width="150" height="125"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+>
+ <marker
+ id="slm_arrow"
+ viewBox="0 0 10 10" refX="5" refY="5"
+ markerWidth="4" markerHeight="4"
+ orient="auto-start-reverse">
+ <path d="M 0 0 L 10 5 L 0 10 z" />
+ </marker>
+ <circle
+ fill="#ccc"
+ stroke-width="1"
+ stroke="black"
+ r=20
+ cx=51
+ cy=21
+ />
+ <text
+ x="51"
+ y="21"
+ stroke="black"
+ text-anchor="middle"
+ dy="0.3em"
+ >foo</text>
+ <rect
+ fill="#ccc"
+ stroke-width="1"
+ stroke="black"
+ x=1
+ y=81
+ width="40"
+ height="40"
+ rx="5"
+ />
+ <text
+ x="21"
+ y="101"
+ stroke="black"
+ text-anchor="middle"
+ dy="0.3em"
+ >a</text>
+ <ellipse
+ cx=81
+ cy=101
+ rx=30
+ ry=20
+ fill="#ccc"
+ stroke-width="1"
+ stroke="black"
+ />
+ <text
+ x="81"
+ y="101"
+ stroke="black"
+ text-anchor="middle"
+ dy="0.3em"
+ >testdir</text>
+ <line
+ stroke="black"
+ stroke-width="1"
+ x1="41"
+ y1="45"
+ x2="24"
+ y2="75"
+ />
+ <line
+ stroke="black"
+ stroke-width="1"
+ x1="61"
+ y1="45"
+ x2="77"
+ y2="75"
+ />
+ <path
+ d="
+ M 118,101
+ C 150,90 150,30 80,20
+ "
+ stroke-width="2"
+ stroke="black"
+ fill="none"
+ marker-end="url(#slm_arrow)"
+ />
+</svg>
+</div>
+
+Since <code> foo/a</code>, <code> foo/testdir/a</code>, <code>
+foo/testdir/testdir/a </code> etc. all refer to the same file, there
+are infinitely many paths for the file <code> a</code>. Hence the
+question makes no sense: There is no such thing as <em> the </em>
+path to a file.
+
+»)
+
+HOMEWORK(«
+
+Recall that the path component <code>..</code> refers to the
+parent directory. Give an example of a path to a directory where
+"parent directory" and "directory identified by the path obtained by
+removing the last path component" are different. Which of the two
+interpretations of <code>..</code> does bash apply when you type
+<code>cd ..</code> at the bash prompt?
+
+»)
+
+HOMEWORK(«
+
+Is it possible to choose among all possible paths which refer to the
+same file a <em>canonical</em> path? That is, a shortest (counting
+characters) absolute path which does not contain any soft links?
+
+», «
+
+<p> The POSIX standard requires each Unix system library to provide
+the <code>realpath()</code> function which performs the following
+substitutions on the given path: First, the path to the current
+working directory is prepended if the given path is relative
+(does not begin with a slash). Second, symbolic links are replaced
+by their targets. Third, any occurrences of <code>/.</code> and
+<code>foo/..</code> are removed. The path transformed in this way is
+returned by the function as the canonical path. </p>
+
+<p> Although each path can be canonicalized in this way, not all paths
+which refer to the same file give rise to the same canonical path. For
+example, <code>/tmp/foo</code> and <code>/tmp/bar</code> could refer
+to regular files which are hard links of each other. In this case the
+paths refer to the same file, yet the paths are different and already
+canonicalized. The same can happen when a file system (or a subtree
+of it) is <em>bind mounted</em>. That is, the file system tree is
+visible at two or more locations in the global directory tree. </p>
+
+The message of this exercise is to convince the reader that it is
+incorrect to assume that two files are different because their paths
+are different.
+
+»)
+
+SECTION(«Processes»)
+
+<p> A <em>program</em> consists of instructions and data stored in
+a regular file. A <em> user process</em> is an instance of a running
+program. This is in contrast to <em>kernel processes</em> which are
+created directly by the kernel and have no relationship to executable
+files. Since we shall only be concerned with user processes, we will
+refer to these as "processes" from now on. In this section we'll see
+how processes are created and removed. We will then take a closer look
+at the environment of a process and discuss how processes communicate
+with each other. </p>
+
+SUBSECTION(«Process Tree, Zombies and Orphans»)
+
+<p> When the system boots, there is only one process, the
+<em>init</em> process, which is created by the kernel at the end
+of the boot sequence by executing <code>/sbin/init</code>. All
+other processes are created from existing processes by means of
+the <code>fork(2)</code> system call. The process which called
+<code>fork(2)</code> is said to be the <em>parent</em> of the newly
+created <em>child</em>. After <code>fork(2)</code> returns, both
+parent and child are executing independently of each other. Both
+processes may call <code>fork(2)</code> again to create further
+processes. This gives rise to a tree structure where the processes
+are the nodes of the tree with init being the root node. The edges
+of the tree describe the parent-child relationships. </p>
+
+<p> If there are more processes than CPUs, not all processes can
+run simultaneously. It is the mission of the kernel's <em>task
+scheduler</em> to assign processes to CPUs and to perform <em>context
+switches</em>. That is, to take away the CPU from a running process
+in order to give another process the chance to run. The scheduler
+has to choose the duration of each process' time slice and it must
+pick the process to switch to when the time slice of the current
+process has expired or the process gives up the CPU voluntarily, for
+example because it is waiting for an I/O operation to complete. This
+is a non-trivial task at least for modern multi-CPU systems with
+<em>non-uniform memory access</em> (NUMA) where the memory access times
+depend on the memory location and the processor core. Things don't
+get easier if the CPU clock speed can vary and/or scheduling must be
+power-aware to optimize battery time. To make good decisions, some
+information has to be provided by the processes or by a system-wide
+policy. One elementary way to prioritize certain processes over others
+is via <em>nice levels</em> which we shall discuss below. </p>
+
+<p> The normal way for a process to terminate is to call
+<code>exit(3)</code> after it has done its work. This function
+transfers an integer value, the <em>exit status</em>, to the
+kernel. The exit status can only be retrieved by the parent of
+the terminating process. To illustrate this concept, imagine an
+interactive shell which creates one child each time the user enters
+a command. The children are short living while the parent, the shell,
+stays around for much longer. During command execution the parent needs
+to wait for its child to terminate. It may then want to tell whether
+the child has terminated successfully. To achieve this, the parent
+calls one of the <em>wait</em> system calls (<code>wait(2)</code>,
+<code>waitpid(2)</code>, <code>waitid(2)</code>) which block until the
+child has terminated, then return the child's exit status. After the
+child called <code>exit(3)</code> but before the parent has called
+one of the wait functions, the kernel needs to keep at least the
+exit status (and possibly further information) about the terminated
+child. During this time window the child has already terminated but
+a part of it still exists in kernel memory. Processes in this state
+are aptly called <em>zombies</em>. </p>
+
+<p> Unlike in the shell scenario outlined above, a process might well
+have any number of children at the time it terminates. Its children
+then become <em>orphans</em> as they lose their parent. The kernel
+cannot simply remove the terminated process from the process tree
+because this would disconnect its orphaned children from the other
+processes in the tree, destroying the tree structure. To avoid this,
+orphans are <em>reparented to init</em>, that is, made children
+of the init process. This works because the init process never
+terminates. </p>
+
+<p> There are several programs which show information about
+processes. The POSIX <code>ps(1)</code> command prints a list of
+processes. It has many options that control which processes are
+shown and how the output is formatted. Similar programs which are not
+covered by POSIX are <code>pstree(1)</code>, <code>top(1)</code> and
+<code>htop(1)</code>. <code>pstree(1)</code> shows the tree structure
+while the other two provide a dynamic real-time view of the process
+tree. The
+exercises of this section invite the reader to become familiar with
+these essential programs. </p>
+
+SUBSECTION(«File Execution»)
+
+<p> When a process calls <code>fork(2)</code>, the newly created
+child process starts out as a copy of the calling process. However,
+the reason to create a new process is usually to let the child do
+something different than the parent. Therefore, <code>fork(2)</code>
+is often followed by a second system call which replaces the child
+process with a different program. There are several similar system
+calls which do this, with slight semantic differences. We refer to
+this family of system calls as the <em>exec system calls</em>. </p>
+
+<p> All exec system calls receive a path argument from which they
+determine an executable file that contains the program to run. Linux
+and BSD store executables in <em>Executable and Linkable Format</em>
+(ELF). Executables are typically linked <em>dynamically</em>.
+That is, the dependent libraries (at least libc, but possibly many
+more) are loaded at runtime from separate files. This is in contrast
+to <em>static linking</em> which includes all dependencies in the
+executable, making the executable self-contained but also larger
+and harder to maintain. Regardless of the type of linking, when the
+program is loaded, it completely replaces the previously running
+program. Note that there can be more than one process at the same
+time which executes the same program. </p>
+
+<p> Files in ELF format are called <em>native</em> executables because
+they contain machine instructions which can be executed directly
+by the CPU. Another type of executables are <em>scripts</em>,
+also known as <em>interpreter files</em>. Scripts are text files
+which start with the <em>shebang</em> (<code>#!</code>). They can
+not run directly but have to be <em>interpreted</em> at runtime
+by another program, the <em>interpreter</em>. Nevertheless, it is
+possible to execute a script as if it were a native executable:
+by passing the path to one of the exec system calls or by typing
+the path at the shell prompt. The exec system call recognizes that
+the given file is a script by investigating the first line, which
+contains the path to the interpreter after the shebang, optionally
+followed by options to the interpreter. The kernel then executes the
+interpreter rather than the script, passing the path to the script as
+an argument. For example, if <code>/foo/bar</code> is being executed,
+and the first line of this file is <code>#!/bin/sh</code>, the kernel
+executes <code>/bin/sh /foo/bar</code> instead. Popular interpreters
+besides <code>/bin/sh</code> include <code>/bin/bash</code>,
+<code>/usr/bin/perl</code>, <code>/usr/bin/python</code> and
+<code>/usr/bin/awk</code>. </p>
+
+SUBSECTION(«File Descriptions and File Descriptors»)
+
+<p> The kernel must always be aware of the set of all objects which are
+currently in use. This set is often called the <em>system-wide table
+of open files</em> although not all entries refer to files. In fact, an
+entry may refer to any object that supports I/O operations, for example
+a network socket. Each entry is called a <em>file description</em>,
+which is a somewhat unfortunate term that was coined by POSIX. A
+file description records information about the object itself as well
+as the current state of the reference, including the file offset,
+if applicable, and the <em>status flags</em> which affect how future
+I/O operations are going to be performed through this reference. </p>
+
+<p> The kernel maintains for each process an array of pointers to file
+descriptions. Each such pointer is a <em>file descriptor</em>. Unlike
+files and file descriptions, a file descriptor always corresponds
+to a process and is identified by a non-negative number, the index
+into the pointer array of that process. This index is returned by
+system calls like <code>open(2)</code> or <code>socket(2)</code>.
+As far as user space programs are concerned, a file descriptor is
+synonymous with this integer. It can be regarded as an abstract
+<em>handle</em> that must be supplied to subsequent I/O operations
+like <code>read(2)</code> or <code>write(2)</code> to tell the system
+call the target object of the operation. </p>
+
+<p> Each process created by the shell starts out with three open file
+descriptors which are identified by the integers 0, 1 and 2. They are
+called <em>stdin</em>, <em>stdout</em>, and <em>stderr</em>, which is
+short for <em>standard input/output/error</em>. It is possible, and in
+fact common, that all three file descriptors point to the same file
+description: the terminal device. Many command line tools read their
+input from stdin, write normal output to stdout, and error messages
+to stderr. For example, when the POSIX command <code>cat(1)</code>
+is invoked with no arguments, it reads data from stdin and writes
+the same data to stdout. </p>
+
+SUBSECTION(«Signals»)
+
+<p> Signals are another ancient Unix concept that dates back to the
+early 1970s and was standardized in POSIX long ago. This concept
+facilitates a rudimentary form of <em>inter process communication</em>
+(IPC) between unrelated processes. Signals can be regarded as software
+interrupts that transmit a notification event from the sending process
+to the target process. The event is sent <em>asynchronously</em>,
+meaning that the interruption can happen at any location of the code
+flow. </p>
+
+<p> It is fair to say that most non-trivial programs, including
+scripts, have to deal with signals. All major scripting languages
+(bash, python, perl, ruby, etc.) provide an API for signal
+handling. The interpreter of the scripting language ends up calling
+the POSIX system functions, so we will only look at these. </p>
+
+<p> Signals are identified by name or by a numerical ID. For example,
+<code>SIGINT</code> (interrupt from keyboard) is the name for signal
+number 2. POSIX defines 31 <em>standard signals</em> plus at least
+eight <em>real-time signals</em>. The standard signals can be
+subdivided according to the origin of the signal as follows. </p>
+
+<dl>
+ <dt> hardware related signals </dt>
+
+ <dd> These signals originate from <em>hardware traps</em> that force
+ the CPU back into kernel mode. The kernel responds to the trap by
+ generating a signal for the process that caused the trap. For example,
+ a division by zero in a user space program triggers a hardware trap
+ in the <em>floating point unit</em> (FPU) of the CPU. The kernel
+ then generates the <code>SIGFPE</code> (floating-point exception)
+ signal for the process. Another example for a signal that originates
+ from a hardware trap is <code>SIGSEGV</code> (segmentation fault)
+ which occurs when a process attempts to reference a memory address
+ that has not been mapped (i.e., marked as valid) by the <em>memory
+ management unit</em> (MMU) of the CPU. </dd>
+
+ <dt> kernel generated signals </dt>
+
+ <dd> Signals which originate from the kernel rather than from
+ hardware. One example is <code>SIGCHLD</code> (child terminated),
+ which is sent to the parent process when one of its child processes
+ terminates. Another example is <code>SIGWINCH</code> (window resize),
+ which is generated when the geometry of the controlling terminal of
+ a process changes. </dd>
+
+ <dt> user-space generated signals </dt>
+
+ <dd> These signals only ever originate from user space: a process,
+ for example <code>kill(1)</code>, calls <code>raise(3)</code>
+ or <code>kill(2)</code> to instruct the kernel to generate a
+ signal. Examples are <code>SIGTERM</code>, which issues a termination
+ request, and <code>SIGUSR1</code> and <code>SIGUSR2</code> which are
+ reserved for use by application programs. </dd>
+</dl>
+
+<p> The following signals are used frequently and deserve to be
+described explicitly. We refer to <code>signal(7)</code> for the full
+list of signals and their semantics. </p>
+
+<dl>
+ <dt> <code>SIGINT, SIGTERM</code> and <code>SIGKILL</code> </dt>
+
+ <dd> All three signals terminate the process by default.
+ <code>SIGINT</code> is generated for the <em>foreground processes</em>
+ when the <em>interrupt character</em> (CTRL+C) is pressed in a
+ terminal. For example, if CTRL+C is pressed while the shell pipeline
+ <code>find | wc </code> is executing, <code>SIGINT</code> is sent
+ to both processes of the pipeline. <code>SIGTERM</code> is the
+ default signal for the <code>kill(1)</code> command. It requests
+ the target process to run its shutdown procedure, if any, then
+ terminate. <code>SIGKILL</code> instructs the kernel to terminate the
+ target process immediately, without giving the process the chance to
+ clean up. This signal can originate from a process or from the kernel
+ in response to running out of memory. To keep the system working, the
+ kernel invokes the infamous <em>out of memory killer</em> (OOM killer)
+ which terminates one memory-hungry process to free some memory. </dd>
+
+ <dt> <code>SIGSTOP</code>, <code>SIGTSTP</code> and <code>SIGCONT</code> </dt>
+
+ <dd> <code>SIGSTOP</code> instructs the task scheduler of the kernel to
+ no longer assign any CPU time to the target process until the process
+ is woken up by a subsequent <code>SIGCONT</code>. <code>SIGTSTP</code>
+ (stop typed at terminal) stops all foreground processes of a terminal
+ session. It is generated when the <em>stop character</em> (CTRL+Z)
+ is pressed in a terminal. </dd>
+</dl>
+
+<p> Processes may set the <em>signal disposition</em> of most signals
+to control what happens when the signal arrives. When no disposition
+has been set, the signal is left at its <em>default disposition</em> so
+that the <em>default action</em> is performed to deliver the signal.
+For most signals the default action is to terminate the process,
+but for others the default action is to <em>ignore</em> the signal.
+If the signal is neither ignored nor left at its default disposition,
+it is said to be <em>caught</em> by the process. To catch a signal the
+process must tell the kernel the address of a function, the <em>signal
+handler</em>, to call in order to deliver the signal. The set of
+signal dispositions of a process can thus be imagined as an array
+of function pointers with one pointer per possible signal. If the
+process catches the signal, the pointer points to the corresponding
+signal handler. A NULL pointer represents a signal that was left at
+its default disposition while the special value <code>SIG_IGN</code>
+indicates that the process explicitly asked to ignore this signal. </p>
+
+<p> Signals can also be <em>blocked</em> and <em>unblocked</em>. When
+a signal is generated for a process that has it blocked, it remains
+<em>pending</em>. Pending signals cause no action as long as the
+signal remains blocked but the action will be performed once the
+signal gets unblocked. <code>SIGKILL</code> and <code>SIGSTOP</code>
+can not be caught, blocked, or ignored. </p>
+
+<p> Real-time signals are similar to <code>SIGUSR1</code> and
+<code>SIGUSR2</code> in that they have no predefined meaning but
+can be used for any purpose. However, they have different semantics
+than the standard signals and support additional features. Most
+importantly, real-time signals are <em>queued</em>, meaning that in
+contrast to standard signals the same real-time signal can be pending
+multiple times. Also, the sending process may pass an <em>accompanying
+value</em> along with the signal. The target process can obtain this
+value from its signal handler along with additional information like
+the PID and the UID of the process that sent the signal. </p>
+
+<p> Some system calls including <code>read(2)</code> and
+<code>write(2)</code> may block for an indefinite time. For
+example, reading from a network socket blocks until there is data
+available. What should happen when a signal arrives while the process
+is blocked in a system call? There are two reasonable answers: Either
+the system call is <em>restarted</em>, or the call fails with the
+<code>Interrupted system call</code> error. Unfortunately, different
+flavors of Unix handle this case differently by default. However,
+applications may request either behaviour by setting or clearing the
+<code>SA_RESTART</code> flag on a per-signal basis. </p>
+
+SUBSECTION(«Environment of a Process»)
+
+<p> Now that we have a rough understanding of processes we look
+closer at the information the kernel maintains for each process. We
+already discussed the array of file descriptors and the array of
+signal dispositions. Clearly both are process specific properties.
+As we shall see, there is much more that constitutes the environment
+of a process. </p>
+
+<p> Each process is identified by a unique <em>process ID</em>
+(PID), which is a positive integer. The <code>init</code> process is
+identified by PID 1. PIDs are assigned in ascending order, but are
+usually restricted to the range 1..32767. After this many processes
+have been created, PIDs wrap and unused PIDs are recycled for new
+processes. Thus, on a busy system on which processes are created and
+terminated frequently, the same PID is assigned to multiple processes
+over time. </p>
+
+<p> Not all processes call <code>fork(2)</code> to create a child
+process, but each process except the init process has a unique
+parent. As described before, this is either the "real" parent (the
+process which created the process earlier) or the init process that
+"adopted" the orphaned process in case the real parent terminated
+before the child. The process ID of the parent (PPID) is thus
+well-defined. A process can obtain its PID and PPID with the
+<code>getpid(2)</code> and <code>getppid(2)</code> system calls. </p>
+
+<p> Each process runs on behalf of a user (possibly the superuser)
+which is identified by its <em>user ID</em> (UID) and belongs to
+one or more groups, identified by one or more <em>group IDs</em>
+(GIDs). The superuser is identified by UID zero. When we talked
+about the permission bits of files and directories, we said that
+suitable permissions are needed for system calls which operate on
+files (<code>open(2)</code>, <code>stat(2)</code>, etc.). A more
+precise statement is that the <em>process</em> which calls, say,
+<code>open(2)</code> needs to have these permissions. To decide this,
+the kernel needs to take into account the UID and GIDs of the process
+that called <code>open(2)</code>, the UID and the GID stored in the
+inode of the file that is being opened, and the permission bits of
+this file. The UID is also taken into account for <code>kill(2)</code>
+because unprivileged processes (non-zero UID) can only send signals
+to processes which run on behalf of the same user while the superuser
+may target any process. </p>
+
+<p> Each process has a <em>current working directory</em> (CWD)
+associated with it. When the user logs in, the CWD of the login shell
+process is set to their <em>home directory</em>, which should always
+exist and have the read, write and execute permission bits set for
+the user. The CWD can later be changed with <code>chdir(2)</code>
+and be retrieved with <code>getcwd(3)</code>. The CWD is used as the
+starting point for path searches for relative paths. It affects most
+system calls which receive a path argument. For example, if the CWD
+is <code>/foo/bar</code> and the relative path <code>baz/qux</code>
+is passed to <code>open(2)</code>, the kernel will attempt to open
+the file which is identified by <code>/foo/bar/baz/qux</code>. </p>
+
+<p> Many programs accept arguments to control their behavior.
+In addition to the path to the program that is to be executed,
+all variants of the exec system calls receive an array of arguments
+called the <em>argument vector</em>. For example, when the command
+<code>ls -l foo</code> is executed, the argument vector contains
+the two strings <code>"-l"</code> and <code>"foo"</code>. Note that
+the argument vector is not part of the program but is tied to the
+process. It is passed to the main function at startup so that the
+program may evaluate it and act accordingly. </p>
+
+<p> Another way to pass information to a program is via <em>environment
+variables</em>. Environment variables are strings of the form
+<code>name=value</code>. POSIX describes the API to maintain the
+environment variables of a process. Environment variables are set
+with <code>setenv(3)</code> or <code>putenv(3)</code>, the value of a
+variable can be retrieved with <code>getenv(3)</code>, and a variable
+and its value can be deleted with <code>unsetenv(3)</code>. The set of
+environment variables is sometimes called the <em>environment</em>
+of the process, although we use this term in a broader sense to
+describe the entirety of all metadata maintained by the kernel about
+the process, including but not limited to environment variables. </p>
+
+<p> Each process also has about a dozen <em>resource limits</em>
+that can be set and queried with the POSIX <code>setrlimit(2)</code>
+and <code>getrlimit(2)</code> functions. Each limit controls a
+different aspect. For example, <code>RLIMIT_CPU</code> limits the
+CPU time the process is allowed to use and <code>RLIMIT_NOFILE</code>
+controls how many open files it may have at a time. For each resource
+there is a <em>soft</em> and a <em>hard</em> limit. The kernel
+enforces the value stored as the soft limit. This value may be set
+by an unprivileged process to any value between zero and the hard
+limit. Unprivileged processes may also reduce (but not increase) their
+hard limits. Once a hard limit is reduced, it can not be increased
+any more. For <code>RLIMIT_CPU</code> a special convention applies:
+If the soft limit is reached, the kernel sends <code>SIGXCPU</code>
+(CPU time limit exceeded) to notify the process about this fact so
+that it can terminate orderly (e.g., remove temporary files). When
+the hard limit is reached, the kernel terminates the process as if
+it received <code>SIGKILL</code>. </p>
+
+<p> The <em>nice level</em> of a process provides a hint for
+the task scheduler to give the process a lower or higher priority
+relative to other processes. Nice levels range between -20 and 19. A
+high nice level means that the process wants to be nice to other
+processes, that is, should run with reduced priority. Low nice levels
+indicate important processes that should be prioritized over other
+processes. The default nice level is zero. Unprivileged users may
+set the nice level of new processes with the <code>nice(1)</code>
+command to any non-negative value. They can also increase the nice
+level of their existing processes with <code>renice(1)</code>, but
+never decrease it. The superuser, however, may set the nice level
+of any process to an arbitrary value in the valid range. </p>
+
+<p> The bulk of the properties discussed so far are inherited by the
+child after a <code>fork(2)</code>. Specifically, the child gets the
+same array of file descriptors and signal dispositions as its parent,
+runs on behalf of the same user, has the same working directory,
+the same resource limits and nice level, and also the same set
+of environment variables with identical values. The PID and PPID,
+however, are of course different. </p>
+
+<p> After a process has called an exec function to replace itself with
+a new program, its signal handlers no longer exist because they were
+part of the program code which has been replaced. Therefore the exec
+system calls reset the disposition of all signals that were caught to
+the default disposition. Signals that were being ignored keep being
+ignored, however. </p>
+
+SUBSECTION(«The Process Filesystem»)
+
+<p> Although not covered by POSIX, at least Linux, NetBSD and FreeBSD
+provide information about processes via the <em>process filesystem</em>
+(procfs), which is usually mounted on <code>/proc</code>. The process
+filesystem is a <em>pseudo-filesystem</em>, i.e., it has no underlying
+storage device. Files and directories are faked by the kernel as they
+are accessed. Each process is represented by a numerical subdirectory
+of <code>/proc</code> which is named by the PID. For example,
+<code>/proc/1</code> represents the init process. The aforementioned
+process utilities (<code>ps(1)</code>, <code>top(1)</code>, etc.) read
+the contents of the process filesystem in order to do their job. </p>
+
+<p> Each <code>/proc/[pid]</code> directory contains the same set
+of files although this set is different between Unixes. These files
+expose much of the environment of the process to user space. The Linux
+procfs implementation provides text files named <code>environ</code>
+and <code>limits</code> which contain the current environment and
+the resource limits of the process, respectively. Moreover, the
+file descriptor array of each process is exposed in the files of
+the <code>/proc/[pid]/fd</code> directory. Linux and NetBSD (but not
+FreeBSD) also provide a <code>cwd</code> soft link which points to
+the current working directory of the process. </p>
+
+SUBSECTION(«Pipes and Redirections»)
+
+<p> The <code>pipe(2)</code> system call takes no arguments and
+creates two file descriptors for the calling process which are tied
+together as a unidirectional first in, first out data channel that
+works just like a fifo, but without any files being involved. One
+file descriptor is the <em>read end</em> of the pipe, the other is
+the <em>write end</em>. Data written to the write end is buffered by
+the kernel and can be obtained by reading from the read end. </p>
+
+<p> One application of pipes is communication between
+related processes. A process first creates a pipe, then calls
+<code>fork(2)</code> to create a child process. The child inherits
+a copy of both pipe file descriptors. Hence the parent process can
+communicate with the child by writing a message to the write end of
+the pipe for the child to read. </p>
+
+<p> The POSIX <code>dup(2)</code> and <code>dup2(2)</code> system
+calls allow a process to manipulate the entries of its file descriptor
+array. In particular the standard file descriptors 0, 1, and 2 can be
+replaced. By doing so before performing an exec system call, it can
+be arranged that the replacement program starts with, say, its stdout
+file descriptor be redirected to the write end of a pipe. Note that
+the replacement program does not need any modifications for this to
+work, and might not even be aware of the fact that it is not writing
+its output to the terminal as usual. </p>
+
+<p> Shells employ this technique to implement the <code>|</code>
+operator which "pipes" the output of one command "into" another
+command. For example, the pipeline <code>ls | wc</code> works
+as follows: First the shell creates a pipe, then it calls
+<code>fork(2)</code> twice to create two processes which both
+get a copy of the two pipe file descriptors. The first process
+replaces its stdout file descriptor with the write end of the
+pipe and performs an exec system call to replace itself with the
+<code>ls(1)</code> program. The second process replaces its stdin
+file descriptor with the read end of the pipe and replaces itself
+with <code>wc(1)</code>. Since <code>ls(1)</code> writes to stdout
+and <code>wc(1)</code> reads from stdin, <code>wc(1)</code> processes
+the output of <code>ls(1)</code>. </p>
+
+<p> Note that this trick does not work to establish a connection
+between two <em>existing</em> processes because it depends on file
+descriptor inheritance across <code>fork(2)</code>. In the general
+case one has to fall back to sockets or fifos to create the data
+channel. </p>
+
+SUBSECTION(«Stdio»)
+
+<p> The POSIX standard requires a compliant Unix system to provide
+a collection of functions that let applications perform input and
+output by means of operations on <em>streams</em>. This programming
+interface, known as <em>stdio</em> for <em>standard input/output</em>,
+has been part of every Unix system since 1979. Every program which
+contains a <code>printf(3)</code> statement relies on stdio. </p>
+
+<p> The stdio functions are implemented as part of libc on top of the
+<code>open(2)</code>, <code>read(2)</code> and <code>write(2)</code>
+system calls which are implemented in the kernel. Roughly speaking,
+stdio replaces the file descriptor API by a more abstract API
+which centers around streams. A stream is an opaque data structure
+which comprises a file descriptor and an associated data buffer for
+I/O. Each program has three predefined streams which correspond to
+the three standard file descriptors (stdin, stdout and stderr). The
+stdio API contains well over 50 functions to create and maintain
+streams and to perform I/O on streams. These functions take care of
+the characteristics of the underlying file description. For example,
+they automatically try to select the optimal I/O buffer size. </p>
+
+<p> Many applications rely on stdio because of convenience. For
+one, the buffers for <code>read(2)</code> and <code>write(2)</code>
+must be allocated and freed explicitly by the application, and care
+must be taken to not overflow these buffers. With stdio, this task
+is done by the stdio library. Second, <em>formatted</em> I/O is
+much easier to do with the stdio functions because the programmer
+only has to provide a suitable <em>format string</em> to convert
+between the machine representation and the textual representation of
+numbers. For example, by passing the format string <code>"%f"</code>
+to <code>scanf(3)</code>, the programmer tells the stdio library to
+read a floating point number stored in textual representation from the
+specified stream, convert it to machine representation and store it
+in the given variable. The <code>fprintf(3)</code> function works the
+other way round: the value is converted from machine representation
+to text, and this text is written to the stream. Both functions can
+deal with various formats, like scientific notation for floating
+point numbers (e.g., 0.42E-23). With stdio it is easy to customize
+the formatted output, for example add leading zeros or select the
+number of decimal digits in the textual representation of a floating
+point number. </p>
+
+<p> Another reason why many programs rely on stdio is that it performs
+<em>buffered</em> I/O. With buffered I/O not each read/write operation
+results in a system call. Instead, data read from or written to the
+stream is first stored in the user space buffer that is associated
+with the stream. This is a performance improvement for applications
+which perform many small I/O operations because every system call
+incurs some overhead. Buffers may be <em>flushed</em> explicitly by
+calling <code>fflush(3)</code>, or implicitly by the stdio library. The
+criteria which cause an implicit flush depend on the <em>buffering
+type</em> of the stream. Three buffering types exist. </p>
+
+<dl>
+ <dt> unbuffered </dt>
+
+ <dd> The stdio library does not buffer any reads or writes. Instead,
+ each I/O operation results in a <code>read(2)</code> or
+ <code>write(2)</code> system call. By default the stderr stream is
+ unbuffered to display error messages as quickly as possible. </dd>
+
+ <dt> line buffered </dt>
+
+ <dd> System calls are performed when a newline character is
+ encountered. This buffering type applies by default to interactive
+ sessions where the file descriptor of the stream refers to a terminal
+ device (as determined by <code>isatty(3)</code>). </dd>
+
+ <dt> fully buffered </dt>
+
+ <dd> I/O takes place only if the buffer of the stream is empty/full. By
+ default, if the file descriptor refers to a regular file, the
+ stream is fully buffered. POSIX requires that stderr is never fully
+ buffered. </dd>
+</dl>
+
+<p> The exercises on stdio focus on the three different buffering
+types because this is a common source of confusion. </p>
+
+SUBSECTION(«The Virtual Address Space of a Process»)
+
+<p> Isolation refers to the concept that each process gets its own
+<em>virtual address space</em>. A rough understanding of the memory
+management system and the layout of the virtual address space of
+a process helps to locate the source of common problems like the
+infamous <code>segmentation fault</code> error, and to realize that
+putatively simple questions such as "how much memory is my process
+currently using?" are in fact not simple at all, and need to be made
+more precise before they can be answered. </p>
+
+<div>
+
+define(«vas_width», «200»)
+define(«vas_height», «300»)
+define(«vas_vmem_left_margin», «5»)
+define(«vas_vmem_top_margin», «5»)
+define(«vas_mem_width», «20»)
+define(«vas_gap_width», «30»)
+define(«vas_vmem_height», «140»)
+define(«vas_vmem_color», «#34b»)
+define(«vas_pmem_height», «100»)
+define(«vas_pmem_color», «#7e5»)
+define(«vas_vmem_unmapped_color», «#a22»)
+define(«vas_vmem_swapped_color», «yellow»)
+define(«vas_pmem_unavail_color», «orange»)
+define(«vas_disk_gap», «15»)
+define(«vas_disk_height», «20»)
+define(«vas_disk_color», «grey»)
+define(«vas_x1», «vas_vmem_left_margin()»)
+define(«vas_x2», «eval(vas_x1() + vas_mem_width())»)
+define(«vas_x3», «eval(vas_x2() + vas_gap_width())»)
+define(«vas_x4», «eval(vas_x3() + vas_mem_width())»)
+
+define(«vas_membox», «
+ <rect
+ fill="$1" stroke="black" stroke-width="1"
+ x="eval(vas_vmem_left_margin() + $3)"
+ y="vas_vmem_top_margin()"
+ width="vas_mem_width()" height="$2"
+ />
+»)
+define(«vas_vmem_unmapped_box», «
+ <rect
+ fill="vas_vmem_unmapped_color()" stroke="black" stroke-width="1"
+ x="eval(vas_vmem_left_margin())"
+ y="eval(vas_vmem_top_margin() + $1)"
+ width="vas_mem_width()"
+ height="eval($2)"
+ />
+
+»)
+define(«vas_vmem_swapped_box», «
+ <rect
+ fill="vas_vmem_swapped_color()" stroke="black" stroke-width="1"
+ x="eval(vas_vmem_left_margin())"
+ y="eval(vas_vmem_top_margin() + $1)"
+ width="vas_mem_width()"
+ height="eval($2)"
+ />
+
+»)
+define(«vas_pmem_unavail_box», «
+ <rect
+ fill="vas_pmem_unavail_color()" stroke="black" stroke-width="1"
+ x="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width())"
+ y="eval(vas_vmem_top_margin() + $1)"
+ width="vas_mem_width()"
+ height="$2"
+ />
+
+»)
+define(«vas_vmem_hline», «
+ <line
+ x1="vas_vmem_left_margin()"
+ y1="eval(vas_vmem_top_margin() + $1)"
+ x2="eval(vas_vmem_left_margin() + vas_mem_width())"
+ y2="eval(vas_vmem_top_margin() + $1)"
+ stroke-width="1"
+ stroke="black"
+ />
+
+»)
+
+define(«vas_pmem_hline», «
+ «<!-- pmem hline -->»
+ <line
+ x1="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width())"
+ y1="eval(vas_vmem_top_margin() + $1)"
+ x2="eval(vas_vmem_left_margin() + 2 * vas_mem_width() + vas_gap_width())"
+ y2="eval(vas_vmem_top_margin() + $1)"
+ stroke-width="1"
+ stroke="black"
+ />
+
+»)
+define(«vas_arrow», «
+ <line
+ x1="eval(vas_vmem_left_margin() + vas_mem_width())"
+ y1="eval(vas_vmem_top_margin() + $1)"
+ x2="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width() - 2)"
+ y2="eval(vas_vmem_top_margin() + $2)"
+ stroke-width="1"
+ stroke="black"
+ marker-end="url(#arrow)"
+ />
+»)
+define(«vas_disk», «
+ <rect
+ fill="vas_disk_color()" stroke="black" stroke-width="1"
+ x="vas_x3()"
+ y="eval(vas_vmem_top_margin() + vas_pmem_height()
+ + vas_disk_gap())"
+ width="eval(vas_x4() - vas_x3())"
+ height="eval(vas_disk_height())"
+ />
+ <ellipse
+ cx="eval(vas_x3() + vas_mem_width() / 2)"
+ cy="eval(vas_vmem_top_margin() + vas_pmem_height() + vas_disk_gap())"
+ rx="eval(vas_mem_width() / 2)"
+ ry="eval(vas_mem_width() / 4)"
+ fill="vas_disk_color()" stroke="black"
+ />
+ <ellipse
+ cx="eval(vas_x3() + vas_mem_width() / 2)"
+ cy="eval(vas_vmem_top_margin() + vas_pmem_height()
+ + vas_disk_gap() + vas_disk_height())"
+ rx="eval(vas_mem_width() / 2)"
+ ry="eval(vas_mem_width() / 4)"
+ fill="vas_disk_color()" stroke="black"
+ />
+»)
+
+<svg
+ width="vas_width()" height="vas_height()"
+ viewBox="0 0 100 eval(100 * vas_height() / vas_width())"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+>
+ <marker
+ id="arrow"
+ viewBox="0 0 10 10" refX="5" refY="5"
+ markerWidth="4" markerHeight="4"
+ orient="auto-start-reverse">
+ <path d="M 0 0 L 10 5 L 0 10 z" />
+ </marker>
+ vas_membox(«vas_vmem_color()», «vas_vmem_height()», «0»)
+ vas_membox(«vas_pmem_color()», «vas_pmem_height()»,
+ «eval(vas_gap_width() + vas_mem_width())»)
+ vas_vmem_hline(«10»)
+ vas_vmem_hline(«40»)
+ vas_vmem_unmapped_box(«40», «20»)
+ vas_vmem_swapped_box(«60», «60»)
+
+ vas_pmem_unavail_box(«0», «10»)
+ vas_pmem_hline(«20»)
+ vas_pmem_unavail_box(«20», «30»)
+ vas_pmem_hline(«80»)
+
+ vas_arrow(«5», «15»)
+ vas_arrow(«25», «65»)
+ vas_arrow(«130», «90»)
+ vas_disk()
+ vas_arrow(«90», «eval(vas_pmem_height() + vas_disk_gap()
+ + vas_disk_height() / 2)»)
+</svg>
+</div>
+
+<p> Virtual memory is an abstraction of the available memory resources.
+When a process reads from or writes to a memory location, it refers
+to <em>virtual addresses</em> (illustrated as the left box of the
+diagram). Virtual addresses are mapped by the MMU to <em>physical
+addresses</em> which refer to physical memory locations (right
+box). The <em>mapped</em> virtual address space of a process is a
+collection of ranges of virtual addresses which correspond to physical
+memory addresses (blue areas). By storing less frequently-accessed
+chunks of virtual memory (yellow) on the swap area (grey), applications
+can use more memory than is physically available. In this case the
+size of the valid virtual addresses (blue and yellow areas together)
+exceeds the amount of physical memory (orange and green areas). Any
+attempt to access an unmapped memory location (red and yellow areas)
+results in a <em>page fault</em>, a hardware trap which forces the CPU
+back into kernel mode. The kernel then checks whether the address is
+valid (yellow) or invalid (red). If it is invalid, the kernel sends
+<code>SIGSEGV</code>, which usually terminates the process with
+the <code>segmentation fault</code> error. Otherwise it allocates
+a chunk of unused physical memory, copies the chunk from the swap
+area to the newly allocated memory and adjusts the mapping (i.e.,
+a yellow part becomes blue). The virtual memory concept increases
+stability and security because no process can access physical memory
+which belongs to the kernel or to other processes (orange areas). </p>
+
+<p> We've already seen that the <code> fork(2) </code> system call
+creates a new process as a duplicate of the calling process. Since
+the virtual address space of the calling process (a) might be large
+and (b) is likely to be replaced in the child by a subsequent call
+to an exec function, it would be both wasteful and pointless to
+copy the full address space of the parent process to the child. To
+implement <code> fork(2) </code> efficiently, operating systems
+employ an optimization strategy known as <em> Copy on Write </em>
+(CoW). The idea of CoW is that if multiple callers ask for resources
+which are initially indistinguishable, you can give them pointers to
+the same resource. This fiction can be maintained until a caller
+tries to modify its copy of the resource, at which point a true
+private copy is created to prevent the changes from becoming visible
+to everyone else. The primary advantage is that if a caller never
+makes any modifications, no private copy ever needs to be created.
+The <code> fork(2) </code> system call marks the pages of the virtual
+address space of both the parent and the child process as CoW by
+setting a special bit in the <em> page table entry </em> of the MMU
+which describes the mapping between virtual and physical addresses.
+As with invalid memory accesses, an attempt to write to a CoW page results
+in a page fault that puts the CPU back into kernel mode. The kernel
+then allocates a new memory page on behalf of the process, copies
+the contents of the page which caused the fault, changes the page
+table mappings for the process accordingly and returns to user space.
+This all happens transparently to the process. </p>
+
+<div>
+define(«asl_width», «300»)
+define(«asl_height», «400»)
+define(«asl_top_margin», «10»)
+define(«asl_text_width», «35»)
+define(«asl_mem_width», «25»)
+define(«asl_mem_color_env», «#fc8»)
+define(«asl_mem_color_stack», «#8fc»)
+define(«asl_mem_color_empty», «#ccc»)
+define(«asl_mem_color_heap», «#c8f»)
+define(«asl_mem_color_bss», «#8cf»)
+define(«asl_mem_color_data», «#cf8»)
+define(«asl_mem_color_text», «#f8c»)
+define(«asl_font_size», «5»)
+
+define(«asl_arrow», «
+ <line
+ x1="0"
+ y1="$1"
+ x2="eval(asl_text_width() - 2)"
+ y2="$1"
+ stroke-width="1"
+ stroke="black"
+ marker-end="url(#arrow)"
+ />
+»)
+define(«asl_arrow_text», «
+ <text
+ x="0"
+ y="$1"
+ font-size="asl_font_size()"
+ >
+ $2
+ </text>
+»)
+
+dnl $1: y0, $2; height, $3: color, $4: high arrow text
+dnl $5: low arrow text, $6: desc
+
+define(«asl_box», «
+ <rect
+ stroke="black"
+ stroke-width="1"
+ x="asl_text_width()"
+ y="eval($1 + asl_top_margin())"
+ height="$2"
+ fill="$3"
+ width="asl_mem_width()"
+ />
+ ifelse(«$4», «», «», «
+ asl_arrow(«eval($1 + asl_top_margin())»)
+ asl_arrow_text(«eval($1 + asl_top_margin() - 2)», «$4»)
+ »)
+ ifelse(«$5», «», «», «
+ asl_arrow(«eval($1 + $2 + asl_top_margin())»)
+ asl_arrow_text(«eval(asl_top_margin()
+ + $1 + $2 - 2)», «$5»)
+ »)
+ <text
+ x="eval(asl_text_width() + asl_mem_width() + 2)"
+ y="eval($1 + $2 / 2 + asl_top_margin())"
+ dy="0.3em"
+ font-size="asl_font_size()"
+ >
+ $6
+ </text>
+»)
+
+<svg
+ width="asl_width()" height="asl_height()"
+ viewBox="0 0 100 eval(100 * asl_height() / asl_width())"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+>
+ asl_box(«0», «10», «asl_mem_color_env», «2^64 - 1», «»,
+ «Environment»)
+ asl_box(«10», «15», «asl_mem_color_stack», «», «base pointer»,
+ «Stack»)
+ asl_box(«25», «30», «asl_mem_color_empty», «», «break point»,
+ «Empty»)
+ asl_box(«55», «35», «asl_mem_color_heap», «», «», «Heap»)
+ asl_box(«90», «10», «asl_mem_color_bss», «», «», «BSS»)
+ asl_box(«100», «10», «asl_mem_color_data», «», «», «Data»)
+ asl_box(«110», «10», «asl_mem_color_text», «», «0», «Text»)
+</svg>
+</div>
+
+<p> The diagram on the left illustrates the layout of the virtual
+address space of a process. At the top of the address space are the
+argument vector and the environment variables. The <em>stack</em>
+stores the local variables of the functions which are currently
+being called, plus house-keeping data like the return addresses
+of these functions. As more functions are called, the stack grows
+downwards towards the lower addresses. Its current lower end is
+called the <em> base pointer</em>. The other variable area of the
+address space is the <em>heap</em>, which contains the memory that
+has been allocated on behalf of the process, for example with <code>
+malloc(3)</code>. As the process allocates more memory, the heap grows
+upwards, towards the stack. The current end of the heap is called the
+<em> break point</em>. The lower part of the address space contains
+three segments of fixed size. The <em>text</em> segment contains the
+compiled machine instructions of the executable, the <em>data</em>
+segment contains the initialized variables which are already known
+at compile time. Finally, the <em>BSS</em> segment is allocated and
+zeroed at execution time. This segment contains variables which should
+be initialized to zero at startup. Unlike the data segment it is not
+stored in the executable. BSS stands for "Block Started by Symbol",
+which is a historic name coined in the 1950s. It has no relation to
+the real meaning of the segment. </p>
+
+<p> The exercises of this section invite the reader to look at the
+virtual address space of running processes, to learn what happens
+when a dynamically-linked executable is executed and how the
+resulting memory maps affect the virtual address space of the newly
+created process. </p>
+
+EXERCISES()
+
+<ul>
+ <li> Examine your own processes with <code>htop</code>, <code>ps
+ ux</code> and <code>pstree -panuch $LOGNAME</code>. </li>
+
+ <li> Run <code>ls -l /proc/$$</code> and examine the environment of
+ your shell process. </li>
+
+ <li> Run <code>kill -l</code> and discuss the meaning of signals
+ 1-15. Use <code>signal(7)</code> as a reference. </li>
+
+ <li> Create a zombie process: run <code>sleep 100&</code>. From
+ another terminal, send <code>SIGSTOP</code> to the parent process
+ of the sleep process (the shell), then send <code>SIGKILL</code>
+ to the sleep process. Run <code>cat /proc/$PID/status</code> where
+ <code>$PID</code> is the process ID of the sleep process. </li>
+
+ <li> Run <code>echo $$</code> to obtain the PID of an interactive
+ shell that is running in a terminal. Send the <code>SIGSTOP</code>
+ and <code>SIGCONT</code> signals to this PID from another terminal
+ and see what happens when you type in the terminal that contains the
+ stopped shell process. </li>
+
+ <li> The <code>ping(8)</code> utility catches <code>SIGQUIT</code>.
+ In one terminal execute <code>ping localhost</code>. While this
+ command runs in an endless loop, send <code>SIGQUIT</code> to the
+ ping process from another terminal and see what happens. </li>
+
+ <li> Read <code>kill(2)</code> to learn what <code>kill -9 -1</code>
+ does. Run this command if you are brave. </li>
+
+ <li> Why doesn't the <a href="«#»cd_script">cd script</a> work as
+ expected? </li>
+
+ <li> Explain the difference between the two commands <code>X=foo
+ bar</code> and <code>X=foo; bar</code>. </li>
+
+ <li> Run <code>set</code> and examine the environment variables of
+ an interactive shell session. </li>
+
+ <li> Check this <a
+ href="https://public-inbox.org/git/Pine.LNX.4.64.0609141023130.4388@g5.osdl.org/">email</a>
+ from Linus Torvalds about why stdio is not that simple at all. </li>
+
+ <li> Run the command <code>ls / /does-not-exist</code>, redirect
+ stdout and stderr to different files. </li>
+
+ <li> Consider the following shell code which uses stdio to first write
+ to stdout, then to stderr. <code>echo foo; echo bar 1>&2</code>. Which
+ circumstances guarantee that the "foo" line appears before the "bar"
+ line in the output? </li>
+
+ <li> In the pipeline <code> foo | bar</code>, what is the
+ buffering type of the file descriptor which corresponds to
+ the <code> stdin </code> stream of the <code> bar </code>
+ process? </li>
+
+ <li> Assume <code>foo</code> is a log file which increases due to
+ some process appending data to it. Explain why the command <code>
+ tail -f foo | while read; do echo new data; done </code> does not
+ work as expected. Fix the flaw by changing the buffering type with
+ <code>stdbuf(1)</code>. </li>
+
+ <li> Run <code>sleep 100 > /dev/null &</code>, examine open
+ files of the sleep process by looking at suitable files in
+ <code>/proc</code>. Do the same with <code>sleep 100 | head
+ &</code>. </li>
+
+ <li> Run <code>ldd /bin/sh</code> and explain what happens when a
+ shell is executed. </li>
+
+ <li> On a Linux system, run <code>cat /proc/$$/maps</code> or
+ <code>pmap -x $$</code> to see the address space layout of your
+ shell. Check <code>Documentation/filesystems/proc.txt</code>
+ in the linux kernel source tree for the format of
+ <code>/proc/$$/maps</code>. </li>
+
+ <li> Run <code>cat /proc/$$/smaps</code> and examine the values of
+ the heap section. </li>
+
+	<li> Assume some program allocates a lot of memory so that the size
+	of the valid virtual address range is 1 TiB. Assume further that a software
+ bug causes the content of a pointer variable to be overwritten with
+ random garbage. Determine the probability that this pointer variable
+ contains a valid address (assuming a 64 bit system). </li>
+</ul>
+
+HOMEWORK(«
+
+Explain how <code>PR_SET_CHILD_SUBREAPER</code> works and possible
+use-cases for this (Linux) feature.
+
+»)
+
+HOMEWORK(«
+
+Explain in one paragraph of text the purpose of the <em>file
+creation mask</em> (also known as <em>umask</em>) of a process.
+
+»)
+
+HOMEWORK(«
+
+When we said that each process runs on behalf of a user and that the
+ID of this user is part of the process metadata, we were simplifying
+matters. There are actually three different UIDs and three different
+GIDs: the <em>real UID</em>, the <em>effective UID</em>, and the
+<em>saved set-user ID</em>, and analogous for the group IDs. Explain
+the purpose of the three UIDs.
+
+»)
+
+HOMEWORK(«
+
+On a multi-CPU system the performance of a program can be
+enhanced by allowing for multiple flows of control. This is the
+idea behind <em>threads</em>, which are also called <em>lightweight
+processes</em>. Give an overview of threads, summarize the POSIX thread
+API (see <code>pthreads(7)</code>) and explain how the Linux-specific
+<code>clone(2)</code> system call can be used to implement threads.
+
+»)
+
+HOMEWORK(«
+
+Explain what the command <code>find /etc > /dev/null</code> does,
+and why you get some error messages. Assume you'd like to extract
+only those error messages which contain the string "lvm". Explain
+why <code>find /etc > /dev/null | grep lvm</code> does not work as
+expected. Come up with a similar command that works.
+
+», «
+
+The command traverses the <code>/etc</code> directory recursively and
+prints all files and directories it encounters during the traversal to
+stdout. Since stdout is redirected to the NULL device by the <code>>
+/dev/null</code> construct, only the stderr stream containing the error
+messages makes it to the terminal. This includes all subdirectories
+of <code>/etc</code> which cannot be traversed due to insufficient
+permissions (no "r" bit set). The proposed <code>find | grep</code>
+command does not work since the <code>|</code> operator is evaluated
+<em>before</em> any redirections specified by the find command
+take place. More precisely, stdout of the find process is redirected
+<em>twice</em>: First to one end of the pipe due to the <code>|</code>,
+then to the NULL device due to the <code>> /dev/null</code>. The
+last redirection "wins", so the <code>grep</code> process does not
+see any input. The command <code>find /etc 2>&1 > /dev/null | grep
+lvm</code> works. The following four redirections take place: First
+stdout of the <code>find</code> process and stdin of <code>grep</code>
+process are redirected to the two ends of the pipe. Next, due to
+the <code>2>&1</code> the stderr stream of the <code>find</code>
+process is redirected to the current destination of stdout, i.e.,
+to the pipe. Finally the <code>> /dev/null</code> redirects stdout
+of the find process to the NULL device. Hence error messages go to
+the pipe and are processed by <code>grep</code> as desired.
+
+»)
+
+HOMEWORK(«
+Run <code>ulimit -n</code> to see the maximal number of file descriptors you
+are allowed to create. Explain what this limit means with respect
+to multiple processes, multiple logins, and the <code>fork(2)</code> system
+call. Write a program in your language of choice which creates file
+descriptors in a loop until it fails due to the file descriptor
+limit. Then print the number of file descriptors the program was able
+to create.
+», «
+On our systems the limit is set to 1024. This means a single process
+can only have this many files open at any given time. Independent
+processes (like those coming from different login sessions) have no
+common file descriptors, even though they may open the same files. In
+this sense the file descriptor limit is a per-process limit. However,
+when a process calls <code>fork(2)</code> to create a new process, the new
+process inherits all open file descriptors from the parent. This can
+lead to the situation where a newly created process is unable to open
+<em>any</em> files. This property has actually been exploited to break
+computer security. The <code>«O_CLOEXEC»</code> flag was introduced not
+too long ago to deal with this problem. See <code>open(2)</code> for
+details.
+
+C program that opens the maximal possible number of file descriptors:
+
+<pre>
+	#include &lt;fcntl.h&gt;
+	#include &lt;stdio.h&gt;
+	#include &lt;stdlib.h&gt;
+
+	int main(void)
+	{
+		int i;
+
+		/* open(2) fails once the descriptor limit is reached */
+		for (i = 0; open("/dev/null", O_RDONLY) >= 0; i++)
+			;
+		printf("opened %d file descriptors\n", i);
+		exit(0);
+	}
+</pre>
+»)
+
+HOMEWORK(«
+
+Search the web for the document called
+<code>vm/overcommit-accounting</code>. Discuss the pros and cons of
+the three possible overcommit handling modes.
+
+»)
+
+HOMEWORK(«
+
+Read this
+<a
+href="https://utcc.utoronto.ca/~cks/space/blog/unix/MemoryOvercommit">blog
+posting</a> on the virtual memory overcommit issue. Explain the
+catch-22 situation described there in no more than two sentences.
+
+»)
+
+HOMEWORK(«
+
+Describe, in a single paragraph of text, what a virtual dynamic
+shared object (VDSO) is and which type of applications benefit most
+from it.
+
+»)
+
+HOMEWORK(«
+
+Describe the concept of <em> huge pages </em> and the Linux-specific
+implementation of <em> transparent </em> huge pages. Discuss the pros
+and cons of huge pages and explain the workloads which would benefit
+from a setup with huge pages enabled.
+
+»)
+
+HOMEWORK(«
+<ul>
+ <li> Explain the concept of <em>address space layout randomization</em>
+ (ASLR). </li>
+
+ <li> Run <code>bash -c</code> '<code>cat /proc/$$/maps</code>'
+ repeatedly to see address space layout randomization in action. Discuss
+ the pros and cons of ASLR. </li>
+</ul>
+»)
+
+SUPPLEMENTS()
+
+SUBSECTION(«cd_script»)
+
+<pre>
+ #!/bin/sh
+ echo "changing CWD to $1"
+ cd "$1"
+</pre>
+
+SUBSECTION(«hello_world»)
+
+<pre>
+ #!/bin/sh
+ echo "hello world"
+</pre>
+
+SUBSECTION(«symlink_madness»)
+
+<pre>
+ #!/bin/sh
+ mkdir foo
+ touch foo/a
+ ln -s ../foo foo/testdir
+ ls -l foo/a foo/testdir/a foo/testdir/testdir/a
+</pre>
+
+SECTION(«Further Reading»)
+<ul>
+ <li> <a href="https://lwn.net/Articles/411845/">Ghosts of Unix Past:
+ a historical search for design patterns</a>, by Neil Brown. </li>
+
+ <li> W. Richard Stevens: Advanced Programming in the Unix
+ Environment. Addison Wesley. </li>
+
+</ul>