Unix_Concepts.m4

   1 TITLE(«
   2
   3         Unix is user-friendly. It's just very selective about who
   4         its friends are. -- Unknown
   5
   6 », __file__)
   7
   8 SECTION(«History and Philosophy»)
   9
  10 SUBSECTION(«Early Unix History»)
  11
  12 <p> Unix was created in 1969 as a successor of Multics, the
  13 <em>MULTiplexed Information and Computing Service</em>, which had been
  14 in use since the mid 1960s as the successor of CTSS, the <em>Compatible
  15 Time-Sharing System</em> of the early 1960s. Multics aimed to get
  16 CTSS right, but failed in this regard and was eventually discontinued
  17 because of its complexity. The Unix approach was very different as it
  18 was brainstormed by only three people and then implemented by Ken
  19 Thompson at Bell Laboratories in two days. Unlike its predecessors it
  20 focused on elegance and simplicity. The name was originally spelt
  21 UNICS (<em>UNiplexed Information and Computing Service</em>) to
  22 emphasize the contrast to Multics. </p>
  23
  24 <p> The original Unix implementation was written in assembly
  25 language for the 18 bit processor of the PDP-7 "minicomputer", a
  26 device of the size of a wardrobe which was considered small by the
  27 standards of the day. Like all computers of this era, the PDP-7 was
  28 not connected to a video screen. Instead, input had to be typed in on
  29 the <em>console</em>, a device which looked much like an electric
  30 typewriter. Output from the computer was printed on rolls of paper.
  31 Since the assembly instructions could not easily be ported to different
  32 hardware, Dennis Ritchie invented the C programming language in
  33 1971. By 1973 the complete Unix implementation had been rewritten in C.
  34 The C language was another corner stone in the history of Unix which
  35 turned out to be very successful. While other programming languages
  36 of that time have long been abandoned or play merely a niche role, C
  37 is still one of the most widely used programming languages today. The
  38 first Unix application was <code>roff</code>, a typesetting program
  39 which is still ubiquitous as the manual pages which ship with every
  40 Unix system are formatted with roff. </p>
  41
  42 <p> From the beginning, Thompson and his early collaborators encouraged
  43 close communication between programmers, creating an early form of
  44 community. Up to the present day, this "hacker culture" has been a
  45 stimulus for countless improvements. Copies of Unix were distributed
  46 on tapes, hand-signed with "Love, Ken". Over time many universities
  47 contributed to Unix. By the end of the 1970s, Unix accumulated a
  48 whole bunch of utilities that made it a fully flavored operating
  49 system which was also free of any copyright claims. </p>
  50
  51 <p> Despite the primitive hardware of the time, the early Unix was
  52 remarkably similar to modern Linux systems. For example, the task
  53 scheduler, the hierarchical filesystem tree and the shell already
  54 existed back then. </p>
  55
  56 SUBSECTION(«Networking»)
  57
  58 <p> The <em>Advanced Research Projects Agency</em> (ARPA) was a
  59 military research unit that was part of the USA's department of
  60 defence. It was established in the early 1960s with the mandate to
  61 create systems that could survive a nuclear war. The agency created
  62 the <em>arpanet</em>, the predecessor of today's internet, which was
  63 designed to stay operational after subordinate network losses. By
  64 the end of the 1960s and the early 1970s, the fundamental networking
  65 protocols were established: telnet for remote login was standardized
  66 in 1969, email (SMTP) in 1971, and the file transfer protocol (FTP)
  67 in 1973. </p>
  68
  69 <p> By the end of the 1970s many Unix installations existed in
  70 all parts of the world. However, the arpanet was mostly powered by
  71 commercial Multics systems because Unix only had rudimentary network
  72 support (UUCP, the <em>Unix to Unix copy</em>) which could copy files
  73 over telephone lines via modems but not much more. This changed in
  74 1983 when TCP/IP networking was developed by the ARPA to replace the
  75 arpanet. Unix support for TCP/IP was developed at Berkeley University
  76 which had become the "Mecca" for Unix development and started already
  77 in 1977 to release their own Unix system named BSD, the <em>Berkeley
  78 Software Distribution</em>. </p>
  79
  80 SUBSECTION(«Commercialization, POSIX and GNU»)
  81
  82 <p> With excellent networking support and no licensing issues, it
  83 was only a matter of time until companies became interested in Unix
  84 in order to make money. Several companies started to commercialize
  85 Unix by adding features to the common code base but keeping their
  86 improvements closed, effectively stopping the source code from being
  87 freely distributable. At the same time Microsoft began to sell their
  88 DOS operating system, targeting small businesses and the home market.
  89 DOS lacked many features that Unix already had for a decade, like
  90 multi-tasking and multi-user support, but it did run on the cheap
  91 Intel 286 processors that were too weak for Unix. </p>
  92
  93 <p> By 1985 the commercialization of Unix and the success of Microsoft
  94 had damaged the Unix community badly. But also the various companies
  95 that sold their particular proprietary Unix brand realized that
  96 too many incompatible Unix implementations would only hurt their
  97 business. That's where the Computer Society of the <em>Institute of
  98 Electrical and Electronics Engineers</em> (IEEE) became involved
  99 in Unix. The IEEE is an organization which was already founded in
 100 1946 to advance the theory, practice, and application of computer
 101 technology. This organization created POSIX, the <em>Portable Operating
 102 System Interface for Unix</em>, which is a family of specifications
 103 for maintaining compatibility between operating systems. The first
 104 version of POSIX was published in 1988. It covered several command
 105 line utilities including <code>vi(1)</code> and <code>awk(1)</code>,
 106 the shell scripting language, application programmer interfaces (APIs)
 107 for I/O (input/output) and networking, and more. Up to the present
 108 day POSIX is maintained by the IEEE and new revisions of the POSIX
 109 standard are published regularly. </p>
 110
 111 <p> In 1983 Richard Stallman launched the GNU project and the Free
 112 Software Foundation as a reaction to the ongoing commercialization
 113 of Unix. GNU, which is is a recursive acronym for "GNU's not Unix",
 114 aimed to keep the Unix source code free, or to replace non-free parts
 115 by open source equivalents. To this aim the GNU project created
 116 the <em>GNU General Public License</em> (GPL), which requires not
 117 only the source code to stay free, but also that all subsequent
 118 modifications to the code base remain free. By the end of the 80s,
 119 the GNU toolset had become a full developer software stack licensed
 120 under the GPL. This set of software packages was complemented by
 121 the <em>X window system</em>, which was also released under a free
 122 license and enabled programmers to build graphical applications for
 123 desktop systems.  Moreover, the first open source scripting language,
 124 <em>perl</em>, was released in 1987. </p>
 125
 126 SUBSECTION(«Linux»)
 127
 128 <p> In 1985 Intel announced the 386 processor which, unlike its 286
 129 predecessor, was powerful enough to run Unix. There were efforts to
 130 port the Unix operating system kernel to this hardware, but these
 131 efforts were impaired by pending lawsuits about who owns the copyright
 132 on the BSD source code. Due to the unclear legal situation of the BSD
 133 code, the major missing piece in the GNU toolset was a free operating
 134 system kernel. This hole was filled in 1991 when Linus Torvalds,
 135 a student from Helsinki in Finland, announced the first version of
 136 his <em>Linux</em> kernel. </p>
 137
 138 <p> Linux did not repeat the licensing problems of the original Unix
 139 because the Linux source code was written from scratch and licensed
 140 under the GPL. Due to this difference many developers moved from
 141 Unix to Linux, so Linux grew quickly and started soon to outperform
 142 the commercial Unix kernels in almost every benchmark. The cheap 386
 143 hardware, the Linux kernel, the GNU toolset and the graphical user
 144 interface based on the X window system facilitated cheap workstations
 145 which ran a complete open source software stack. </p>
 146
 147 <p> The success of Linux, or <em>GNU/Linux</em> as some prefer to
 148 call it for reasons that should now be clear, has only increased
 149 over time, to the point where commercial Unix systems are mostly
 150 irrelevant. Today Linux runs on a wide variety of machines ranging
 151 from supercomputers to workstations, smart phones and IOT (internet
 152 of things) devices with very limited resources.
 153
 154 <p> The same companies which almost killed Unix by commercializing it
 155 in order to maximize their profit make money with Linux today. However,
 156 they had to adjust their business model in order to comply with the
 157 GPL. Rather than selling proprietary software, they bundle open source
 158 software and sell support to paying customers. Some companies also
 159 sell hardware with Linux pre-installed. </p>
 160
 161 SUBSECTION(«Linux Distributions»)
 162
 163 <p> A <em>Linux Distribution</em> is a conglomeration of free software,
 164 including the Linux kernel, the GNU toolset and the X window system,
 165 plus possibly other, proprietary software on top of that. Usually a
 166 distribution also includes an installer and a package manager to
 167 make it easy to install and update packages according to the users'
 168 needs. </p>
 169
 170 <p> There are hundreds of Linux distributions, and new distributions
 171 are created all the time while others are discontinued. Many
 172 distributions are backed by companies which target specific
 173 classes of users or hardware, but there are also non-commercial
 174 Linux distributions which are solely driven by a community of
 175 volunteers. </p>
 176
 177 <p> One of the most popular company-backed Linux distributions is
 178 <em>Ubuntu</em>, which is led since 2004 by the UK-based Canonical Ltd.
 179 It targets unskilled desktop users which would like to switch away
 180 from Microsoft Windows. One reason for the popularity of Ubuntu
 181 is that it is very easy to install on standard desktop and laptop
 182 hardware. A distinguishing feature of Ubuntu is its strict release
 183 cycles: New versions are released in April and October of each year,
 184 and every fourth release is a <em>long-term support</em> (LTS) release
 185 which will be supported for at least five years. Ubuntu also features
 186 a variant for server hardware which contains a different Linux kernel
 187 and ships with most desktop packages excluded. </p>
 188
 189 <p> The main community-driven Linux distribution is
 190 <em>Debian</em>. The Debian project was founded in 1993 and the first
 191 stable version was released in 1996. Debian is used as the basis for
 192 many other distributions. In fact, Ubuntu is based on Debian. The
 193 development of Debian closely follows the Unix culture in that it
 194 is developed openly and distributed freely. A team of about 1000
 195 core developers work together with countless package maintainers
 196 according to the Debian Social Contract, the Debian Constitution,
 197 and the Debian Free Software Guidelines. </p>
 198
 199 EXERCISES()
 200
 201 <ul>
 202         <li> Run <code>uname -a</code> on various Unix machines to see the
 203         OS type and the kernel version. </li>
 204
 205         <li> Explore the <a
 206         href="https://upload.wikimedia.org/wikipedia/commons/7/77/Unix_history-simple.svg">Unix
 207         time line</a>. </li>
 208
 209         <li> Try out the <a
 210         href="http://www.gnu.org/cgi-bin/license-quiz.cgi">Free Software
 211         licensing quiz</a>. </li>
 212
 213         <li> On a Debian or Ubuntu system, run <code>aptitude search
 214         python</code> to list all python-related Ubuntu packages. Run
 215         <code>aptitude show python-biopython</code> to see the description
 216         of the biopython package. Repeat with different search patterns and
 217         packages. </li>
 218
 219         <li> The Debian Social Contract (DSC) describes the agenda of Debian.
 220         Find the DSC online, read it and form your own opinion about the key
 221         points stated in this document. </li>
 222 </ul>
 223
 224 SECTION(«Characteristics of a Unix system»)
 225
 226 <p> After having briefly reviewed the history of Unix, we now look
 227 closer at the various components which comprise a Unix system and
 228 which distinguish Unix from other operating systems. We focus on
 229 general design patterns that have existed since the early Unix days
 230 and are still present on recent Linux systems. </p>
 231
 232 SUBSECTION(«Single Hierarchy of Files»)
 233
 234 <p> The most striking difference between Unix and Windows is perhaps
 235 that on Unix the files of all devices are combined to form a single
 236 hierarchy with no concept of drive letters. When the system boots,
 237 there is only one device, the <em>root device</em>, which contains the
 238 <em>root directory</em>. To make the files of other devices visible,
 239 the <em>mount</em> operation must be employed. This operation attaches
 240 the file hierarchy of the given device to the existing hierarchy
 241 at a given location which is then called the <em>mountpoint</em>
 242 of the device. Mountpoints are thus the locations in the hierarchy
 243 where the underlying storage device changes. </p>
 244
 245 <p> The root directory contains a couple of well-known subdirectories,
 246 each of which is supposed to contain files of a certain type or for
 247 a certain purpose. The following table lists a subset:
 248
 249 <ul>
 250         <li> <code>/bin</code>: Essential commands for all users </li>
 251         <li> <code>/sbin</code>: Essential system binaries </li>
 252         <li> <code>/lib</code>: Essential libraries </li>
 253         <li> <code>/usr</code>: Non-essential read-only user data </li>
 254         <li> <code>/etc</code>: Static configuration files </li>
 255         <li> <code>/home</code>: Home directories </li>
 256         <li> <code>/tmp</code>: Temporary files </li>
 257         <li> <code>/run</code>: Files which describe the state of running programs </li>
 258         <li> <code>/var</code>: Log and spool files </li>
 259 </ul>
 260
 261 <p> The <em>Filesystem Hierarchy Standard</em> describes the various
 262 subdirectories in more detail. The exercises ask the reader to become
 263 acquainted with this directory structure. </p>
 264
 265 SUBSECTION(«POSIX Commands and Shell»)
 266
 267 <p> The Filesystem Hierarchy Standard lists <code>/bin</code>
 268 and <code>/sbin</code> and several other directories for executable
 269 files. The POSIX standard defines which executables must exist in one
 270 of these directories for the system to be POSIX-compliant.  Well over
 271 100 <em>POSIX commands</em> are listed in the XCU volume of this
 272 standard. Besides the names of the commands, the general behaviour
 273 of each and the set of command line options and their semantics are
 274 described.  POSIX versions are designed with backwards compatibility
 275 in mind. For example, a new POSIX version might require a command
 276 to support additional command line options, but existing options are
 277 never dropped and never change semantics in incompatible ways. The
 278 target audience of the POSIX document are programmers who implement
 279 and maintain the POSIX commands and users which want to keep their
 280 software portable across different Unix flavors. </p>
 281
 282 <p> One of the POSIX commands is the <em>shell</em>,
 283 <code>/bin/sh</code>, an interpreter that reads input expressed in
 284 the <em>shell command language</em>, which is also part of POSIX.
 285 The shell transforms the input in various ways to produce commands
 286 and then executes these commands. The user may enter shell code
 287 (i.e., code written in the shell command language) interactively at
 288 the <em>command prompt</em>, or supply the input for the shell as a
 289 <em>shell script</em>, a text file which contains shell code. Shell
 290 scripts which only contain POSIX commands and use only POSIX options
 291 are portable between different shell implementations and between
 292 different Unix flavors. They should therefore never cease to work after
 293 an upgrade. Among the many available POSIX shell implementations,
 294 <em>GNU bash</em> is one of the more popular choices. Bash is fully
 295 POSIX compatible and offers many more features on top of what is
 296 required by POSIX. </p>
 297
 298 <p> Several implementations of the POSIX commands exist. On Linux
 299 the GNU implementation is typically installed while FreeBSD, NetBSD
 300 and MacOS contain the BSD versions. Although all implementations
 301 are POSIX-compliant, they differ considerably because different
 302 implementations support different sets of additional features and
 303 options which are not required by POSIX. These extensions are not
 304 portable, and should thus be avoided in shell scripts that must work
 305 on different Unix flavors. </p>
 306
 307 <p> In addition to the POSIX commands, a typical Unix system might well
 308 contain thousands of other commands. This illustrates another aspect
 309 that is characteristic for Unix: Tools should do only one specific
 310 task, and do it well. The operating system provides mechanisms
 311 to combine the simple commands in order to form more powerful
 312 programs. For example, commands can be <em>chained</em> together so
 313 that the output of one command becomes the input for the next command
 314 in the chain. This is the idea behind <em>pipes</em>, a Unix concept
 315 which dates back to 1973 and which is also covered by POSIX. We shall
 316 come back to pipes and related concepts in a later section. </p>
 317
 318 SUBSECTION(«Multi-User, Multi-Tasking, Isolation»)
 319
 320 <p> From the very beginning Unix was designed to be a multi-user
 321 and a multi-tasking operating system. That is, it could run multiple
 322 programs on behalf of different users independently of each other and
 323 isolated from each other. This design was chosen to improve hardware
 324 utilization and robustness. In contrast, DOS and early versions
 325 of Windows were designed for <em>personal computing</em> (PC) and
 326 had no notion of user accounts, access permissions or isolation.
 327 This resulted in an unstable system because a single misbehaving
 328 program was enough to take down the whole system. Therefore these
 329 features had to be retrofitted later.  </p>
 330
 331 <p> While multi-tasking makes all tasks appear to run simultaneously
 332 even if there are more tasks than CPUs, isolation refers to
 333 <em>memory protection</em>, a mechanism which prevents applications
 334 from interfering with each other and with the internals of the
 335 operating system. A running Unix system maintains two sets of
 336 running tasks: besides the <em>application tasks</em> there is
 337 also a set of <em>kernel tasks</em>. Unlike the application tasks,
 338 the kernel tasks are privileged in that they can access the memory
 339 of the application tasks while applications tasks can only access
 340 their own memory. Isolation is achieved by a hardware concept called
 341 <em>protection domains</em>, which existed already in Multics and thus
 342 predates Unix. In the simplest case, there are only two protection
 343 domains: a privileged domain called <em>ring 0</em> for the kernel
 344 tasks, and an unprivileged domain for application tasks, also called
 345 <em>user processes</em> in this context. The CPU is always aware
 346 of the current protection domain as this information is stored in a
 347 special CPU register.
 348
 349 SUBSECTION(«System Calls and the C POSIX Library»)
 350
 351 <p> Only when the CPU is running in the privileged ring 0 domain
 352 (also known as <em>kernel mode</em>, as opposed to <em>user mode</em>
 353 for application tasks), it can interact directly with hardware and
 354 memory. If an application wants to access hardware, for example read a
 355 data block from a storage device, it can not do so by itself. Instead,
 356 it has to ask the operating system to perform the read operation
 357 on behalf of the application. This is done by issuing a <em>system
 358 call</em>. Like function calls, system calls interrupt the current
 359 program, continue execution at a different address and eventually
 360 return to the instruction right after the call. However, in addition
 361 to this, they also cause the CPU to enter kernel mode so that it can
 362 perform the privileged operation. When the system call has done its
 363 work and is about to return to the application, the protection domain
 364 is changed again to let the CPU re-enter user mode. </p>
 365
 366 <p> The system calls thus define the interface between applications
 367 and the operating system. For backwards compatibility it is of utmost
 368 importance that system calls never change semantics in an incompatible
 369 way. Moreover, system calls must never be removed because this would
 370 again break existing applications. The syntax and the semantics
 371 of many system calls are specified in POSIX, although POSIX does
 372 not distinguish between functions and system calls and refers to
 373 both as <em>system functions</em>. This is because system calls are
 374 typically not performed by the application directly. Instead, if an
 375 application calls, for example, <code>read()</code>, it actually calls
 376 the compatibility wrapper for the <code>read</code> system call which
 377 is implemented as a function in the <em>C POSIX Library</em> (libc),
 378 which ships with every Unix system. It is this library which does the
 379 hard work, like figuring out which system calls are supported on the
 380 currently running kernel, and how kernel mode must be entered on this
 381 CPU type. Like the POSIX commands, the system functions described in
 382 POSIX never change in incompatible ways, so programs which exclusively
 383 use POSIX system functions are portable between different Unix flavors
 384 and stay operational after an upgrade. </p>
 385
 386 SUBSECTION(«Multi-Layer Configuration Through Text Files»)
 387
 388 <p> On a multi-user system it becomes necessary to configure programs
 389 according to each user's personal preferences. The Unix way to
 390 achieve this is to provide four levels of configuration options
 391 for each program. First, there are the built-in defaults which are
 392 provided by the author of the program. Next, there is the system-wide
 393 configuration that is controlled by the administrator. Third,
 394 there is the user-defined configuration, and finally there are the
 395 command line options. Each time the program is executed, the four
 396 sets of configuration options are applied one after another so
 397 that the later sets of options override the earlier settings. The
 398 system-wide configuration is stored in <code>/etc</code> while the
 399 user-defined configuration is stored in that user's home directory.
 400 Both are are simple text files that can be examined and modified with
 401 any text editor. This makes it easy to compare two configurations
 402 and to transfer the configuration across different machines or user
 403 accounts. </p>
 404
 405 SUBSECTION(«Everything is a File»)
 406
 407 <p> Another mantra which is often heard in connection with Unix is
 408 <em>everything is a file</em>. This phrase, while certainly catchy,
 409 is slightly incorrect. A more precise version would be <em>everything
 410 is controlled by a file descriptor</em>, or, as Ritchie and Thompson
 411 stated it, Unix has <em>compatible file, device, and inter-process
 412 I/O</em>. Modern Unix systems have pushed this idea further and employ
 413 file descriptors also for networking, process management, system
 414 configuration, and for certain types of events. The file descriptor
 415 concept is thus an abstraction which hides the differences between the
 416 objects the file descriptors refer to. It provides a uniform interface
 417 for the application programmer who does not need to care about whether
 418 a file descriptor refers to a file, a network connection, a peripheral
 419 device or something else because the basic I/O operations like open,
 420 read, write are the same. </p>
 421
 422 <p> File descriptors are ubiquitous since every Unix program uses
 423 them, albeit perhaps implicitly via higher-level interfaces provided
 424 by a scripting language. We shall return to this topic when we discuss
 425 processes. </p>
 426
 427 SUBSECTION(«Manual Pages»)
 428
 429 <p> All POSIX commands and most other programs are installed along
 430 with one or more <em>man pages</em> (short for <em>manual pages</em>),
 431 which are plain text files that can be formatted and displayed in
 432 various ways. This concept was introduced in 1971 as part of the
 433 <em>Unix Programmer's Manual</em>.  The characteristic page layout
 434 and the typical sections (NAME, SYNOPSIS, DESCRIPTION, EXAMPLES,
 435 SEE ALSO) of a man page have not changed since then. The POSIX
 436 <code>man</code> command is used to view man pages in a terminal. For
 437 example, the command <code>man ls</code> opens the man page of the
 438 <code>ls</code> command, and <code>man man</code> shows the man page
 439 of the <code>man</code> command itself. Most implementations also
 440 maintain a database of the existing man pages and provide additional
 441 commands to query this database. For example, the <code>whatis</code>
 442 command prints the one-line description of all man pages which match
 443 a pattern while the <code>apropos</code> command searches the manual
 444 page names and descriptions. </p>
 445
 446 <p> In addition to the man pages for commands, there are man pages for
 447 system calls, library functions, configuration files and more. Each
 448 man page belongs to one of several <em>man sections</em>. For example,
 449 the aforementioned man pages for <code>ls</code> and <code>man</code>
 450 are part of section 1 (user commands) while section 2 is reserved for
 451 system calls and section 8 for administration commands that can only be
 452 executed by privileged users. By convention, to indicate which section
 453 a command or a function belongs to, the man section is appended in
 454 parenthesis as in <code>mount(8)</code>. Most Unix systems also offer
 455 translated man pages for many languages as an optional package. Note
 456 that the same name may refer to more than one man page. For example
 457 there is <code>kill(1)</code> for the user command that kills processes
 458 and also <code>kill(2)</code> which describes the corresponding system
 459 call. To open the man page of a specific section, one may use a command
 460 like <code>man 2 kill</code>.  The <code>MANSECT</code> environment
 461 variable can be set to a colon-delimited list of man sections to
 462 change the order in which the man sections are searched. </p>
 463
 464 <p> Consulting the local man pages rather than searching the web has
 465 some advantages. Most importantly, the local pages will always give
 466 correct answers since they always match the installed software while
 467 there is no such relationship between a particular web documentation
 468 page and the version of the software package that is installed on the
 469 local computer. Working with man pages is also faster, works offline
 470 and helps the user to stay focused on the topic at hand. </p>
 471
 472 EXERCISES()
 473
 474 <ul>
 475         <li> Run <code>df</code> on as many systems as possible to see the
 476         mount points of each filesystem. Then discuss the pros and cons of
 477         a single file hierarchy as opposed to one hierarchy per device. </li>
 478
 479         <li> Run <code>ls /</code> to list all top-level subdirectories of
 480         the root file system and discuss the purpose of each. Consult the
 481         Filesystem Hierarchy Standard if in doubt. </li>
 482
 483         <li> Execute <code>cd / && mc</code> and start surfing at the root
 484          directory.  </li>
 485
 486         <li> Compare the list of top-level directories that exist on different
 487         Unix systems, for example Linux and MacOS. </li>
 488
 489         <li> Find out which type of files are supposed to be stored in
 490         <code>/usr/local/bin</code>. Run <code>ls /usr/local/bin</code>
 491         to list this directory.  </li>
 492
 493         <li> Find out what the term <em>bashism</em> means and learn how to
 494         avoid bashishms. </li>
 495
 496         <li> Find the POSIX specification of the <code>cp(1)</code> command
 497         online and compare the set of options with the options supported by
 498         the GNU version of that command, as obtained with <code>man cp</code>
 499         on a Linux system. </li>
 500
 501         <li>&nbsp;
 502                 <ul>
 503                         <li> Run <code>time ls /</code> and discuss the meaning of
 504                         the three time values shown at the end of the output (see
 505                         <code>bash(1)</code>). </li>
 506
 507                         <li> Guess the user/real and the sys/real ratios for the following
 508                         commands. Answer, before you run the commands.
 509
 510                                 <ul>
 511                                         <li> <code>time head -c 100000000 /dev/urandom > /dev/null</code> </li>
 512
 513                                         <li> <code>i=0; time while ((i++ &lt; 1000000)); do :; done</code>
 514                                         </li>
 515                                 </ul>
 516                         </li>
 517
 518                         <li> Run the above two commands again, this time run
 519                         <code>htop(1)</code> in parallel on another terminal and observe the
 520                         difference. </li>
 521                 </ul>
 522         </li>
 523
 524         <li> On a Linux system, check the list of all system calls in
 525         <code>syscalls(8)</code>. </li>
 526
 527         <li> The <code>strace(1)</code> command prints the system calls that
 528         the given command performs. Guess how many system calls the command
 529         <code>ls -l</code> will make.  Run <code>strace -c ls -l</code> for
 530         the answer. Read the <code>strace(1)</code> man page to find suitable
 531         command line options to only see the system calls which try to open
 532         a file. </li>
 533
 534         <li> Guess how many man pages a given system has. Run <code>whatis -w
 535         '*' | wc -l</code> to see how close your guess was. </li>
 536
 537         <li> Search the web for "cp(1) manual page" and count how many
 538         <em>different</em> manual pages are shown in the first 20 hits. </li>
 539 </ul>
 540
 541 HOMEWORK(«
 542
 543 Think about printers, sound cards, or displays as a file. Specifically,
 544 describe what <code>open, read</code>, and <code>write</code> should
 545 mean for these devices.
 546
 547 », «
 548
 549 Opening would establish a (probably exclusive) connection
 550 to the device.  Reading from the file descriptor returned by
 551 <code>open(2)</code> could return all kinds of status information,
 552 like the type, model and capabilities of the device. For example,
 553 printers could return the number of paper trays, the amount of toner
 554 left etc. Writing to the file descriptor would cause output on the
 555 device. This would mean to print the text that is written, play the
 556 audio samples, or show the given text on the display. The point to
 557 take away is that the <code>open, read, write</code> interface is a
 558 generic concept that works for different kinds of devices, not only
 559 for storing data in a file on a hard disk.
 560
 561 »)
 562
 563 SECTION(«Paths, Files and Directories»)
 564
 565 In this section we look in some detail at paths, at a matching
 566 language for paths, and at the connection between paths and files. We
 567 then describe the seven Unix file types and how file metadata are
 568 stored. We conclude with the characteristics of soft and hard links.
 569
 570 SUBSECTION(«Paths»)
 571
 572 <p> The path concept was introduced in the 1960s with the Multics
 573 operating system. Paths will be familiar to the reader because
 574 they are often specified as arguments to commands. Also many
 575 system calls receive a path argument. A path is a non-empty
 576 string of <em>path components</em> which are separated by slash
 577 characters. An <em>absolute path</em> is a path that starts with a
 578 slash, all other paths are called <em>relative</em>. A relative path
 579 has to be interpreted within a context that implies the leading
 580 part of the path. For example, if the implied leading part is
 581 <code>/foo/bar</code>, the relative path <code>baz/qux</code> is
 582 equivalent to the absolute path <code>/foo/bar/baz/qux</code>. </p>
 583
 584 <p> Given a path, there may or may not exist a file or a
 585 directory that corresponds to the path. <em>Path lookup</em> is the
 586 operation which determines the answer to this question, taking the
 587 implied leading part into account in case of relative paths. This
 588 operation is always performed within the kernel and turns out to
 589 be surprisingly complex due to concurrency and performance issues.
 590 Consult <code>path_resolution(7)</code> on a Linux system to learn
 591 more about how pathnames are resolved to files. </p>
 592
 593 <p> If a path was successfully looked up, each path component up to the
 594 second-last refers to an existing directory while the last component
 595 refers to either a file or a directory. In both cases the directory
 596 identified by the second-last component contains an entry named by the
 597 last component. We call those paths <em>valid</em>. The valid paths
 598 give rise to a rooted tree whose interior nodes are directories and
 599 whose leaf nodes are files or directories. Note that the validity of a
 600 path depends on the set of existing files, not just on the path itself,
 601 and that a valid path may become invalid at any time, for example if
 602 a file is deleted or renamed. Many system calls which receive a path
 603 argument perform path lookup and fail with the <code>No such file or
 604 directory</code> error if the lookup operation fails. </p>
 605
 606 <p> It depends on the underlying filesystem whether the path components
 607 are <em>case-sensitive</em> or <em>case-insensitive</em>. That is,
 608 whether paths which differ only in capitalization (for example
 609 <code>foo</code> and <code>Foo</code>) refer to the same file.
 610 Since the hierarchy of files may be comprised of several filesystems,
 611 some components of the path may be case-sensitive while others are
 612 case-insensitive. As a rule of thumb, Unix filesystems are case
 613 sensitive while Microsoft filesystems are case-insensitive even when
 614 mounted on a Unix system. </p>
 615
 616 <p> Path components may contain every character except the Null
 617 character and the slash. In particular, space and newline characters
 618 are allowed. However, while dots are allowed in path components if
 619 they are used together with other characters, the path components
 620 <code>.</code> and <code>..</code> have a special meaning: every
 621 directory contains two subdirectories named <code>.</code> and
 622 <code>..</code> which refer to the directory itself and its parent
 623 directory, respectively. </p>
 624
 625 SUBSECTION(«Globbing»)
 626
 627 <p> Globbing, also known as <em>pathname expansion</em>, is a pattern
 628 matching language for paths which was already present in the earliest
 629 Unix versions. The glob operation generates a set of valid paths from
 630 a <em>glob pattern</em> by replacing the pattern by all <em>matching
 631 paths</em>. </p>
 632
 633 <p> Glob patterns may contain special characters called
 634 <em>wildcards</em>. The wildcard characters are: </p>
 635
 636         <ul>
 637                 <li> <code>*</code>: match any string, </li>
 638                 <li> <code>?</code>: match any simple character, </li>
 639                 <li> <code>[...]</code>: match any of the enclosed characters. </li>
 640         </ul>
 641
 642 <p> The complete syntax rules for glob patterns and the exact
 643 semantics for pattern matching are described in POSIX and in
 644 <code>glob(7)</code>. Any POSIX-compliant shell performs globbing
 645 to construct the command to be executed from the line entered at
 646 the prompt. However, POSIX also demands system functions which make
 647 globbing available to other applications. These are implemented as
 648 part of libc. </p>
 649
 650
 651 <p> There are a few quirks related to globbing which are worth to
 652 point out. First, if no valid path matches the given pattern, the
 653 expansion of the pattern is, by definition according to POSIX, the
 654 pattern itself. This can lead to unexpected results. Second, files
 655 which start with a dot (so-called <em>hidden</em> files) must be
 656 matched explicitly. For example, <code>rm *</code> does <em>not</em>
 657 remove these files. Third, the tilde character is <em>no</em> wildcard,
 658 although it is also expanded by the shell. See the exercises for more
 659 examples. </p>
 660
 661 <p> POSIX globbing has some limitations. For example, there is no
 662 glob pattern which matches exactly those files that start with an
 663 arbitrary number of <code>a</code> characters. To overcome these
 664 limitations, some shells extend the matching language by implementing
 665 <em>extended glob patterns</em> which are not covered by POSIX. For
 666 example, if extended globbing feature of <code>bash(1)</code> is
 667 activated via the <code>extglob</code> option, the extended glob
 668 pattern <code>+(a)</code> matches the above set of files. </p>
 669
 670 SUBSECTION(«File Types»)
 671
 672 We have seen that all but the last component of a valid path refer
 673 to directories while the last component may refer to either a file
 674 or a directory. The first character in the output of <code>ls
 675 -l</code> indicates the type of the last path component: for
 676 directories a <code>d</code> character is shown while files (also
 677 called <em>regular</em> files in this context) get a hyphen character
 678 (<code>-</code>). Besides directories and regular files, the following
 679 special file types exist:
 680
 681 <dl>
 682         <dt> Soft link (<code>l</code>) </dt>
 683
 684         <dd> A file which acts as a pointer to another file. We shall cover
 685         links in a dedicated subsection below. </dd>
 686
 687         <dt> Device node (<code>c</code> and <code>b</code>) </dt>
 688
 689         <dd> Also called <em>device special</em>. These files refer to devices
 690         on the local system. Device nodes come in two flavors: character
 691         devices (<code>c</code>) and block devices (<code>b</code>). Regardless
 692         of the flavor, each device node has a major and a minor number
 693         associated with it. The major number indicates the type of the
 694         device (e.g. a hard drive, a serial connector, etc.) while the
 695         minor number enumerates devices of the same type. On most systems
 696         the device nodes are created and deleted on the fly as the set of
 697         connected devices changes, for example due to a USB device being
 698         added or removed. However, device nodes can also be created manually
 699         with the <code>mknod(1)</code> command or the <code>mknod(2)</code>
 700         system call. Device nodes do not necessarily correspond to physical
 701         devices. In fact, POSIX demands the existence of a couple of
 702         <em>virtual devices</em> with certain properties. We look at some of
 703         these in the exercises. The access to device nodes which do correspond
 704         to physical devices is usually restricted to privileged users. </dd>
 705
 706         <dt> Socket (<code>s</code>) </dt>
 707
 708         <dd> Sockets provide an interface between a running program and the
 709         network stack of the kernel. They are subdivided into <em>address
 710         families</em> which correspond to the various network protocols. For
 711         example, the <code>AF_INET</code> and <code>AF_INET6</code> address
 712         families are for internet protocols (IP) while <code>AF_LOCAL</code>
 713         (also known as <code>AF_UNIX</code>) is used for communication between
 714         processes on the same machine. These local sockets are also called
 715         <em>Unix domain sockets</em>. They can be bound to a path which
 716         refers to a file of type socket. Regardless of the address family,
 717         processes can exchange data via sockets in both directions, but
 718         the local sockets support additional features, like passing process
 719         credentials to other processes. </dd>
 720
 721         <dt> Fifo (<code>p</code>) </dt>
 722
 723         <dd> Files of type <em>fifo</em> are also known as <em>named
 724         pipes</em>. They associate a path with a kernel object that provides a
 725         <em>First In, First Out</em> data channel for user space programs. Data
 726         written to the fifo by one program can be read back by another program
 727         in the same order. Fifos are created with the <code>mkfifo(1)</code>
 728         command or the <code>mkfifo(3)</code> library function. </dd>
 729 </dl>
 730
 731 <p> Note that the type of a file is never inferred from the path.
 732 In particular the suffix of the path (everything after the last
 733 dot) is just a convention and has no strict connection to the file
 734 type. Also there is no difference between text and binary files. </p>
 735
 736 SUBSECTION(«Metadata and Inodes»)
 737
 738 <p> The <code>stat(2)</code> system call returns the metadata
 739 of the file or directory that corresponds to the given path. The
 740 <code>stat(1)</code> command is a simple program which executes this
 741 system call and prints the thusly obtained metadata in human-readable
 742 form, including the file type. This is done without looking at the
 743 contents of the file because metadata are stored in a special area
 744 called the <em>inode</em>. All types of files (including directories)
 745 have an associated inode. Besides the file type, the inode stores
 746 several other properties prescribed by POSIX. For example the file
 747 size, the owner and group IDs, and the access permissions are all
 748 stored in the inode. Moreover, POSIX requires to maintain three
 749 timestamps in each inode: </p>
 750
 751 <ul>
 752         <li> modification time (mtime): time of last content change. </li>
 753
 754         <li> access time (atime): time of last access. </li>
 755
 756         <li> status change time (ctime): time of last modification to the
 757         inode. </li>
 758 </ul>
 759
 760 <p> To illustrate the difference between the mtime and the ctime,
 761 consider the <code>chgrp(1)</code> command which changes the group
 762 ID of the file or directory identified by its path argument. This
 763 command sets the ctime to the current time while the mtime is left
 764 unmodified. On the other hand, commands which modify the contents of
 765 a file, such as <code>echo foo >> bar</code>, change both the mtime
 766 and the ctime. </p>
 767
 768 <p> The inode of each file or directory contains twelve <em>mode
 769 bits</em>, nine of which are the <em>permission bits</em> which
 770 control who is allowed to access the file or directory, and how. The
 771 permission bits are broken up into three classes called <em>user</em>
 772 (<code>u</code>), <em>group</em> (<code>g</code>) and <em>others</em>
 773 (<code>o</code>). Some texts refer to the first and last class as
 774 "owner" and "world" instead, but we won't use this naming to avoid
 775 confusion. Each class contains three bits. The bits of the "user"
 776 class apply to the file owner, that is, the user whose ID is stored in
 777 the inode. The "group" category applies to all non-owners who belong
 778 to the group whose ID is stored in the inode. The third category
 779 applies to all remaining users. The three bits of each class refer to
 780 read/write/execute permission. They are therefore named <code>r</code>,
 781 <code>w</code> and <code>x</code>, respectively. The permission
 782 bits mean different things for directories and non-directories,
 783 as described below. </p>
 784
 785 <table>
 786         <tr>
 787                 <th> &nbsp;&nbsp; </th>
 788                 <th> Directories </th>
 789                 <th> Non-directories </th>
 790         </tr> <tr>
 791                 <th> <code>r</code> </th>
 792
 793                 <td> The permission to list the directory contents. More precisely,
 794                 this bit grants the permission to call <code>opendir(3)</code>
 795                 to obtain a handle to the directory which can then be passed to
 796                 <code>readdir(3)</code> to obtain the directory contents. </td>
 797
 798                 <td> If read permission is granted, the <code>open(2)</code> system
 799                 call does not fail with the <code>permission denied</code> error,
 800                 provided the file is opened in read-only mode. The system call may
 801                 fail for other reasons, though.
 802
 803         </tr> <tr>
 804                 <th> <code>w</code> </th>
 805
 806                 <td> The permission to add or remove directory entries. That is,
 807                 to create new files or to remove existing files. Note that write
 808                 permission is not required for the file that is being removed. </td>
 809
 810                 <td> Permission to open the file in write-only mode in order
 811                 to perform subsequent operations like <code>write(2)</code>
 812                 and <code>truncate(2)</code> which change the contents of the
 813                 file. Non-directories are often opened with the intention to both
 814                 read and write. Naturally, such opens require both read and write
 815                 permissions. </td>
 816
 817         </tr> <tr>
 818                 <th> <code>x</code> </th>
 819
 820                 <td> The permission to <em>search</em> the directory. Searching
 821                 a directory means to access its entries, either by retrieving
 822                 inode information with <code>stat(2)</code> or by calling
 823                 <code>open(2)</code> on a directory entry. </td>
 824
 825                 <td> Run the file. This applies to <em>binary executables</em> as well
 826                 as to text files which start with a <em>shebang</em>, <code>#!</code>,
 827                 followed by the path to an interpreter. We shall cover file execution
 828                 in more detail below. </td>
 829
 830         </tr>
 831 </table>
 832
 833 <p> To run the regular file <code>/foo/bar/baz</code>, search
 834 permission is needed for both <code>foo</code> and <code>bar</code>,
 835 and execute permission is needed for <code>baz</code>. Similarly, to
 836 open the regular file <code>foo/bar</code> for reading, we need execute
 837 permissions on the current working directory and on <code>foo</code>,
 838 and read permissions on <code>bar</code>. </p>
 839
 840 <p> A <em>numeric permission mode</em> is a three octal digit (0-7)
 841 number, where the digits correspond to the user, group, other classes
 842 described above, in that order. The value of each digit is derived by
 843 adding up the bits with values 4 (read), 2 (write), and 1 (execute).
 844 The following table lists all eight possibilities for each of the
 845 three digits. </p>
 846
 847 <table>
 848         <tr>
 849                 <th> Octal Value </th>
 850                 <th> Symbolic Representation </th>
 851                 <th> Meaning </th>
 852         </tr> <tr>
 853                 <td> 0 </td>
 854                 <td> <code>---</code> </td>
 855                 <td> no permissions at all </td>
 856         </tr> <tr>
 857                 <td> 1 </td>
 858                 <td> <code>--x</code> </td>
 859                 <td> only execute permission </td>
 860         </tr> <tr>
 861                 <td> 2 </td>
 862                 <td> <code>-w-</code> </td>
 863                 <td> only write permission </td>
 864         </tr> <tr>
 865                 <td> 3 </td>
 866                 <td> <code>-wx</code> </td>
 867                 <td> write and execute permission </td>
 868         </tr> <tr>
 869                 <td> 4 </td>
 870                 <td> <code>r--</code> </td>
 871                 <td> only read permission </td>
 872         </tr> <tr>
 873                 <td> 5 </td>
 874                 <td> <code>r-x</code> </td>
 875                 <td> read and execute permission </td>
 876         </tr> <tr>
 877                 <td> 6 </td>
 878                 <td> <code>rw-</code> </td>
 879                 <td> read and write permission </td>
 880         </tr> <tr>
 881                 <td> 7 </td>
 882                 <td> <code>rwx</code> </td>
 883                 <td> read, write and execute permission </td>
 884         </tr>
 885 </table>
 886
 887 <p> The <code>chmod(1)</code> command changes the permission
 888 bits of the file identified by the path argument. For example,
 889 <code>chmod 600 foo</code> sets the permissions of <code>foo</code> to
 890 <code>rw-------</code>. Besides the octal values, <code>chmod(1)</code>
 891 supports symbolic notation to address the three classes described
 892 above: <code>u</code> selects the user class, <code>g</code> the
 893 group class, <code>o</code> the class of other users. The symbolic
 894 value <code>a</code> selects all three classes. Moreover, the letters
 895 <code>r</code>, <code>w</code> and <code>x</code> are used to set or
 896 unset the read, write and execute permission, respectively. The above
 897 command is equivalent to <code>chmod u=rw,g=---,o=--- foo</code>. The
 898 <code>+</code> and <code>-</code> characters can be specified instead
 899 of <code>=</code> to set or unset specific permission bits while
 900 leaving the remaining bits unchanged. For example <code>chmod go-rw
 901 foo</code> turns off read and write permissions for non-owners. </p>
 902
 903 <p> Unprivileged users can only change the mode bits of their own
 904 files or directories while there is no such restriction for the
 905 superuser. </p>
 906
 907 SUBSECTION(«Hard and Soft Links»)
 908
 909 <p> Links make it possible to refer to identical files through
 910 different paths. They come in two flavors: hard and soft. Both
 911 types of links have advantages and disadvantages, and different
 912 limitations. We start with hard links because these existed already
 913 in the earliest Unix versions. </p>
 914
 915 <p> A file can have more than one directory entry that points to its
 916 inode. If two directory entries point to the same inode, they are
 917 said to be <em> hard links</em> of each other. The two entries are
 918 equivalent in that they refer to the same file. It is impossible
 919 to tell which of the two is the "origin" from which the "link"
 920 was created. Hard links are created with the <code>link(2)</code>
 921 system call or the <code>ln(1)</code> command. Both take two path
 922 arguments, one for the existing file and one for the directory entry
 923 to be created. The filesystem maintains in each inode a link counter
 924 which keeps track of the number of directory entries which point to the
 925 inode. The <code>link(2)</code> system call increases the link count
 926 while <code>unlink(2)</code> decrements the link count and removes
 927 the directory entry. If the decremented counter remains positive,
 928 there is still at least one other directory entry which points to
 929 the inode. Hence the file is still accessible through this other
 930 directory entry and the file contents must not be released. Otherwise,
 931 when the link counter reached zero, the inode and the file contents
 932 may be deleted (assuming the file is not in use). </p>
 933
 934 <p> There are several issues with hard links. For one, hard links
 935 can not span filesystems. That is, the two path arguments for
 936 <code>link(2)</code> have to refer to files which reside on the
 937 same filesystem. Second, it is problematic to create hard links to
 938 directories. Early Unix systems allowed this for the superuser,
 939 but on Linux the attempt to hard-link a directory always fails.
 940 To address the limitations of hard links, <em>soft links</em>, also
 941 called <em>symbolic links</em> (or <em>symlinks</em> for short),
 942 were introduced. A soft link can be imagined as a special text file
 943 containing a single absolute or relative path, the <em>target</em> of
 944 the link. For relative paths the implied leading part is the directory
 945 that contains the link. A soft link is thus a named reference in
 946 the global hierarchy of files. Unlike hard links, the soft link
 947 and its target do not need to reside on the same filesystem, and
 948 there is a clear distinction between the link and its target. Soft
 949 links are created with <code>symlink(2)</code> or by specifying the
 950 <code>-s</code> option to the <code>ln(1)</code> command. </p>
 951
 952 <p> A soft link and its target usually point to different inodes. This
 953 raises the following question: Should system calls which receive a
 954 path argument that happens to be a soft link operate on the link
 955 itself, or should they <em>follow</em> (or <em>dereference</em>)
 956 the link and perform the operation on the target? Most system calls
 957 follow soft links, but some don't. For example, if the path argument
 958 to <code>chdir(2)</code> happens to be a soft link, the link is
 959 dereferenced and the working directory is changed to the target of
 960 the link instead. The <code>rename(2)</code> system call, however,
 961 does not follow soft links and renames the link rather than its
 962 target. Other system calls, including <code>open(2)</code>, allow
 963 the caller to specify the desired behaviour by passing a flag to the
 964 system call. For yet others there is a second version of the system
 965 call to control the behaviour. For example, <code>lstat(2)</code> is
 966 identical to <code>stat(2)</code>, but does not follow soft links. </p>
 967
 968 <p> It is possible for a soft link to refer to an invalid path. In
 969 fact, <code>ln(1)</code> and <code>symlink(2)</code> do not consider
 970 it an error if the target does not exist, and happily create a soft
 971 link which points to an invalid path. Such soft links are called
 972 <em>dangling</em> or <em>broken</em>. Dangling soft links also occur
 973 when the target file is removed or renamed, or after a mount point
 974 change. </p>
 975
 976 <p> Soft links may refer to other soft links. System calls which
 977 follow soft links must therefore be prepared to resolve chains of
 978 soft links to determine the file to operate on. However, this is not
 979 always possible because soft links can easily introduce loops into the
 980 hierarchy of files. For example, the commands <code>ln -s foo bar;
 981 ln -s bar foo</code> create such a loop. System calls detect this
 982 and fail with the <code>Too many levels of symbolic links</code>
 983 error when they encounter a loop. </p>
 984
 985 <p> Another issue with both soft and hard links is that there is no
 986 simple way to find all directory entries which point to the same path
 987 (soft links) or inode (hard links). The only way to achieve this is
 988 to traverse the whole hierarchy of files. This may be prohibitive
 989 for large filesystems, and the result is unreliable anyway unless
 990 the filesystems are mounted read-only. </p>
 991
 992 EXERCISES()
 993
 994 <ul>
 995         <li> A path can lack both slashes and components. Give an example
 996         of a path that lacks a slash and another example of a path that has
 997         no components.  </li>
 998
 999         <li> Assume <code>foo</code> is an existing directory. Guess what the
1000         command <code>mv foo bar</code> will do in each of the following cases:
1001         (a) <code>bar</code> does not exist, (b) <code>bar</code> exists and
1002         is a regular file, (c) <code>bar</code> exists and is a directory.
1003         Verify your guess by running the command. </li>
1004
1005         <li> Many programs check if a path is valid and act differently
1006         according to the result. For example, a shell script might
1007         check for the existence of a file with code like <code>if test
1008         -e "$file"; do something_with "$file"; fi</code>. Explain
1009         why this approach is not bullet-proof. How could this be
1010         fixed? </li>
1011
1012         <li> Run <code>touch file-{1..100}</code> to create 100 files. Guess
1013         what the following commands will print. Run each command to confirm.
1014
1015         <ul>
1016                 <li> <code>ls file-</code> </li>
1017                 <li> <code>ls file-*</code> </li>
1018                 <li> <code>ls file-?</code> </li>
1019                 <li> <code>ls file-[1-4]</code> </li>
1020                 <li> <code>ls file-[1,3,5,7,9]*</code> </li>
1021         </ul>
1022         </li>
1023
1024         <li> Find an extended glob pattern for <code>bash(1)</code>
1025         that matches all valid paths whose last component starts with
1026         <code>file-</code>, followed by any number of odd digits (1, 3, 5,
1027         7, or 9). </li>
1028
1029         <li> Point out the flaw in the following shell code: <code>for
1030         f in file-*; do something_with "$f"; done</code>. Hint: Search
1031         <code>bash(1)</code> for "nullglob".
1032
1033         <li> Create a file named <code>-r</code> with <code>echo >
1034         -r</code>. Try to remove the file with <code>rm -r</code> and
1035         discuss why this doesn't work as expected. Find a way to get rid
1036         of the file. Discuss what happens if you run <code>rm *</code> in a
1037         directory which contains a file named <code>-r</code>. </li>
1038
1039         <li> The content of the <code>PATH</code> variable is a
1040         colon-separated list of directories in which the shell looks for
1041         commands to execute. Discuss the dangers of including the current
1042         working directory in this list. </li>
1043
1044         <li> Run <code>id</code> to determine a group <code>G</code> you
1045         belong to but is not your primary group. Consider the following
1046         commands <code>mkdir foo; chgrp $G foo; touch foo/bar</code>. What
1047         is the group ID of <code>foo/bar</code>? Run the same commands, but
1048         insert <code>chmod g+s foo</code> as the second-to-last command. </li>
1049
1050         <li> Run <code>man null</code> and <code>man zero</code> to learn
1051         about the properties of these two character devices. </li>
1052
1053         <li> Assume the modification time stored in the inode of some file
1054         suggests that the file was last modified two years ago. How sure
1055         can you be that the file was never changed since then? Hint: See the
1056         <code>-d</code> option of <code>touch(1)</code>. </li>
1057
1058         <li> Run the following commands <code>echo hello > foo</code>,
1059         <code>cat foo</code>, <code>chmod 600 foo</code>, <code>echo world >>
1060         foo</code>. Check the three timestamps with <code>stat foo</code>
1061         after each command. </li>
1062
1063         <li> Determine the state of the permission bits of your own
1064         home directory by running <code>ls -ld ~</code>. Who can
1065         access its contents? Also look at the permission bits of
1066         other people's home directory. </li>
1067
1068         <li> A file or directory is called <em>world-writeable</em>
1069         if the <code>w</code> bit is set in the <code>others</code>
1070         class of the permission bits. Create a world-writable
1071         directory with <code>mkdir foo; chmod 777 foo</code>
1072         and create a file in the new directory: <code>echo hello
1073         > foo/bar</code>. Is a different user allowed to create
1074         another file there (<code>echo world > foo/baz</code>)? Can
1075         he remove it again (<code>rm foo/baz</code>)? Will he succeed
1076         in removing <code>foo/bar</code> although it is owned by you
1077         and <em>not</em> writable to him? Try the same with the sticky
1078         bit turned on (<code>chmod 1777 foo</code>). </li>
1079
1080         <li> Translate <code>rw-r--r--</code> into octal, and 755 into
1081         <code>rwx</code>-notation. </li>
1082
1083         <li> Create a <a href="#hello_world">hello world script</a>, make it
1084         executable and run it. Create a subdirectory of your home directory
1085         and move the script into this directory. Set the permissions of the
1086         directory to <code>r--------</code>, check whether you still can
1087         list/execute it. Do the same with <code>--x------</code>. </li>
1088
1089         <li> Create a file with <code>echo hello > foo</code>,
1090         create soft and hard links with <code>ln -s foo soft</code>
1091         and <code>ln foo hard</code>. Examine the inode numbers
1092         and link counts using the command <code>stat foo soft
1093         hard</code>. Remove <code>foo</code> and repeat. Try to
1094         access the file content through both links. Guess what
1095         <code>realpath foo soft hard</code> will print. Run the
1096         command to confirm. </li>
1097
1098         <li> Create a dangling symlink with <code>ln -s /nope foo</code>. Would
1099         you expect the commands <code>ls -l foo</code> and <code>cat foo</code>
1100         succeed? Run these commands to verify your guess. </li>
1101
1102         <li> One of the seven Unix file types is symlink. Why is there no
1103         file type for hard links? </li>
1104 </ul>
1105
1106 HOMEWORK(«
1107 How many paths are there that refer to the same file?
1108 », «
1109 Given the path <code>/foo/bar</code>, one may construct different paths
1110 which refer to the same file by inserting any number of <code>/.</code>
1111 or <code>../foo</code> after the first component. For example,
1112 <code>/foo/./bar</code> and <code>/foo/../foo/bar</code> both refer
1113 to the same file. If relative paths have to be taken into account as
1114 well, even more paths can be constructed easily. Hence the answer is:
1115 arbitrary many.
1116
1117 This illustrates the fundamental difference between a path and a
1118 file. Paths can be mapped to files, but not the other way around. In
1119 particular, there is no such thing like "the list of paths which have
1120 changed since yesterday".
1121
1122 The concept of hard- and soft links complicates the situation further.
1123 »)
1124
1125 HOMEWORK(«
1126 Given two paths, how can one tell if they refer to the same file?
1127 », «
1128
1129 Among other information, the metadata record of each file contains the
1130 so-called <em>inode number</em>, which uniquely identifies the file
1131 within the file system that contains the file. Therefore, if both
1132 paths are known to refer to files stored on the same file system,
1133 a comparison of the two inode numbers is sufficient to tell whether
1134 the two paths refer to the same file. The inode number can be obtained
1135 with the command <code>ls -i</code>.
1136
1137 In the general case one additionally has to check that the
1138 two <code>device IDs</code> which identify the underlying file
1139 systems are also identical. Like the inode number, the device ID
1140 is part of the metadata of the file. It can be obtained by running
1141 <code>stat(1)</code>.
1142
1143 »)
1144
1145 HOMEWORK(«
1146 Device nodes come in two flavors: Character and block devices. Explain
1147 the difference between the two device flavors.
1148 »)
1149
1150 HOMEWORK(«
1151
1152 <ul>
1153         <li> Nine of the 12 mode bits of each file are the permission
1154         bits. The remaining three are the <em>sticky</em>, <em>setuid</em>
1155         and <em>setgid</em> bits. Explain the purpose of each. </li>
1156
1157         <li> Run <code>find /usr/bin/ -perm -2000 -ls</code> to see all SUID
1158         executables in <code>/usr/bin</code>. Discuss why those programs have
1159         the SUID bit set. </li>
1160 </ul>
1161
1162 »)
1163
1164 HOMEWORK(«
1165 How many possible permission modes exist for a file or directory on
1166 a Unix System?
1167 », «
1168 There are nine permission bits that can be turned on and off
1169 independently. Hence we have 2^9=512 possibilities. When taking into
1170 account the three special bits (sticky, setuid, setgid), the number
1171 increases to 2^12=4096.
1172 »)
1173
1174 HOMEWORK(«
1175 Explain each command of the <a href="«#»symlink_madness">script</a>
1176 below. Show the arrangement of all files and links in a figure,
1177 drawing a directory as a circle and a file as a square.  How many
1178 different paths exist for the file <code> a</code>? Discuss whether
1179 the question "What's the path of a given file?" makes sense.
1180
1181 », «
1182
1183 <div>
1184 <svg
1185         width="150" height="125"
1186         xmlns="http://www.w3.org/2000/svg"
1187         xmlns:xlink="http://www.w3.org/1999/xlink"
1188 >
1189         <marker
1190                 id="slm_arrow"
1191                 viewBox="0 0 10 10" refX="5" refY="5"
1192                 markerWidth="4" markerHeight="4"
1193                 orient="auto-start-reverse">
1194                 <path d="M 0 0 L 10 5 L 0 10 z" />
1195         </marker>
1196         <circle
1197                 fill="#ccc"
1198                 stroke-width="1"
1199                 stroke="black"
1200                 r=20
1201                 cx=51
1202                 cy=21
1203         />
1204         <text
1205                 x="51"
1206                 y="21"
1207                 stroke="black"
1208                 text-anchor="middle"
1209                 dy="0.3em"
1210         >foo</text>
1211         <rect
1212                 fill="#ccc"
1213                 stroke-width="1"
1214                 stroke="black"
1215                 x=1
1216                 y=81
1217                 width="40"
1218                 height="40"
1219                 rx="5"
1220         />
1221         <text
1222                 x="21"
1223                 y="101"
1224                 stroke="black"
1225                 text-anchor="middle"
1226                 dy="0.3em"
1227         >a</text>
1228         <ellipse
1229                 cx=81
1230                 cy=101
1231                 rx=30
1232                 ry=20
1233                 fill="#ccc"
1234                 stroke-width="1"
1235                 stroke="black"
1236         />
1237         <text
1238                 x="81"
1239                 y="101"
1240                 stroke="black"
1241                 text-anchor="middle"
1242                 dy="0.3em"
1243         >testdir</text>
1244         <line
1245                 stroke="black"
1246                 stroke-width="1"
1247                 x1="41"
1248                 y1="45"
1249                 x2="24"
1250                 y2="75"
1251         />
1252         <line
1253                 stroke="black"
1254                 stroke-width="1"
1255                 x1="61"
1256                 y1="45"
1257                 x2="77"
1258                 y2="75"
1259         />
1260         <path
1261                 d="
1262                         M 118,101
1263                         C 150,90 150,30 80,20
1264                 "
1265                 stroke-width="2"
1266                 stroke="black"
1267                 fill="none"
1268                 marker-end="url(#slm_arrow)"
1269         />
1270 </svg>
1271 </div>
1272
1273 Since <code> foo/a</code>, <code> foo/testdir/a</code>, <code>
1274 foo/testdir/testdir/a </code> etc. all refer to the same file, there
1275 are infinitely many paths for the file <code> a</code>. Hence the
1276 question makes no sense: There is no such thing as <em> the </em>
1277 path to a file.
1278
1279 »)
1280
1281 HOMEWORK(«
1282
1283 Recall that the path component <code>..</code> refers to the
1284 parent directory. Give an example of a path to a directory where
1285 "parent directory" and "directory identified by the path obtained by
1286 removing the last path component" are different. Which of the two
1287 interpretations of <code>..</code> does bash apply when you type
1288 <code>cd ..</code> at the bash prompt?
1289
1290 »)
1291
1292 HOMEWORK(«
1293
1294 Is it possible to choose among all possible paths which refer to the
1295 same file a <em>canonical</em> path? That is, a shortest (counting
1296 characters) absolute path which does not contain any soft links?
1297
1298 », «
1299
1300 <p> The POSIX standard requires each Unix system library to provide
1301 the <code>realpath()</code> function which performs the following
1302 substitutions on the given path: First, the path to the current
1303 working directory is prepended if the given path is relative
1304 (does not begin with a slash). Second, symbolic links are replaced
1305 by their targets. Third, any occurrences of <code>/.</code> and
1306 <code>foo/..</code> are removed. The thusly transformed path is
1307 returned by the function as the canonical path. </p>
1308
1309 <p> Although each path can be canonicalized in this way, not all paths
1310 which refer to the same file give rise to the same canonical path. For
1311 example, <code>/tmp/foo</code> and <code>/tmp/bar</code> could refer
1312 to regular files which are hard links of each other. In this case the
1313 paths refer to the same file, yet the paths are different and already
1314 canonicalized. The same can happen when a file system (or a subtree
1315 of it) is <em>bind mounted</em>. That is, the file system tree is
1316 visible at two or more locations in the global directory tree. </p>
1317
1318 The message of this exercise is to convince the reader that it is
1319 incorrect to assume that two files are different because their paths
1320 are different.
1321
1322 »)
1323
1324 SECTION(«Processes»)
1325
1326 <p> A <em>program</em> consists of instructions and data stored in
1327 a regular file. A <em> user process</em> is an instance of a running
1328 program. This is in contrast to <em>kernel processes</em> which are
1329 created directly by the kernel and have no relationship to executable
1330 files. Since we shall only be concerned with user processes, we will
1331 refer to these as "processes" from now on. In this section we'll see
1332 how processes are created and removed. We will then take a closer look
1333 at the enviroment of a process and discuss how processes communicate
1334 with each other. </p>
1335
1336 SUBSECTION(«Process Tree, Zombies and Orphans»)
1337
1338 <p> When the system boots, there is only one process, the
1339 <em>init</em> process, which is created by the kernel at the end
1340 of the boot sequence by executing <code>/sbin/init</code>. All
1341 other processess are created from existing processes by means of
1342 the <code>fork(2)</code> system call. The process which called
1343 <code>fork(2)</code> is said to be the <em>parent</em> of the newly
1344 created <em>child</em>. After <code>fork(2)</code> returns, both
1345 parent and child are executing independently of each other. Both
1346 processes may call <code>fork(2)</code> again to create further
1347 processes. This gives rise to a tree structure where the processes
1348 are the nodes of the tree with init being the root node. The edges
1349 of the tree describe the parent-child relationships. </p>
1350
1351 <p> If there are more processes than CPUs, not all processes can
1352 run simultaneously. It is the mission of the kernel's <em>task
1353 scheduler</em> to assign processes to CPUs and to perform <em>context
1354 switches</em>. That is, to take away the CPU from a running process
1355 in order to give another process the chance to run. The scheduler
1356 has to choose the duration of each process' time slice and it must
1357 pick the process to switch to when the time slice of the current
1358 process has expired or the process gives up the CPU voluntarily, for
1359 example because it is waiting for an I/O operation to complete. This
1360 is a non-trivial task at least for modern multi-CPU systems with
1361 <em>non-uniform memory access</em> (NUMA) where the memory access times
1362 depend on the memory location and the processor core. Things don't
1363 get easier if the CPU clock speed can vary and/or scheduling must be
1364 power-aware to optimize battery time. To make good decisions, some
1365 information has to be provided by the processes or by a system-wide
1366 policy. One elementary way to prioritize certain processes over others
1367 is via <em>nice levels</em> which we shall discuss below. </p>
1368
1369 <p> The normal way for a process to terminate is to call
1370 <code>exit(3)</code> after it has done its work. This function
1371 transfers an integer value, the <em>exit status</em>, to the
1372 kernel. The exit status can only be retrieved by the parent of
1373 the terminating process. To illustrate this concept, imagine an
1374 interactive shell which creates one child each time the user enters
1375 a command. The children are short living while the parent, the shell,
1376 stays around for much longer. During command execution the parent needs
1377 to wait for its child to terminate. It may then want to tell whether
1378 the child has terminated successfully. To achieve this, the parent
1379 calls one of the <em>wait</em> system calls (<code>wait(2)</code>,
1380 <code>waitpid(2)</code>, <code>waitid(2)</code>) which block until the
1381 child has terminated, then return the child's exit status. After the
1382 child called <code>exit(3)</code> but before the parent has called
1383 one of the wait functions, the kernel needs to keep at least the
1384 exit status (and possibly further information) about the terminated
1385 child. During this time window the child has already terminated but
1386 a part of it still exists in kernel memory. Processes in this state
1387 are aptly called <em>zombies</em>. </p>
1388
1389 <p> Unlike in the shell scenario outlined above, a process might well
1390 have any number of children at the time it terminates. Its children
1391 then become <em>orphans</em> as they lose their parent. The kernel
1392 cannot simply remove the terminated process from the process tree
1393 because this would disconnect its orphaned children from the other
1394 processes in the tree, destroying the tree structure. To avoid this,
1395 orphans are <em>reparented to init</em>, that is, made children
1396 of the init process. This works because the init process never
1397 terminates. </p>
1398
1399 <p> There are several programs which show information about
1400 processes. The POSIX <code>ps(1)</code> command prints a list of
1401 processes. It has many options that control which processes are
1402 shown and how the output is formatted. Similar programs which are not
1403 covered by POSIX are <code>pstree(1)</code>, <code>top(1)</code> and
1404 <code>htop(1)</code>. The former shows the tree structure while the
1405 latter two provide a dynamic real-time view of the process tree. The
1406 exercises of this section invite the reader to become familiar with
1407 these essential programs. </p>
1408
1409 SUBSECTION(«File Execution»)
1410
1411 <p> When a process calls <code>fork(2)</code>, the newly created
1412 child process starts out as a copy of the calling process. However,
1413 the reason to create a new process is usually to let the child do
1414 something different than the parent. Therefore, <code>fork(2)</code>
1415 is often followed by a second system call which replaces the child
1416 process with a different program. There are several similar system
1417 calls which do this, with slight semantic differences. We refer to
1418 this family of system calls as the <em>exec system calls</em>. </p>
1419
1420 <p> All exec system calls receive a path argument from which they
1421 determine an executable file that contains the program to run. Linux
1422 and BSD store executables in <em>Executable and Linkable Format</em>
1423 (ELF). Executables are typically linked <em>dynamically</em>.
1424 That is, the dependent libraries (at least libc, but possibly many
1425 more) are loaded at runtime from separate files. This is in contrast
1426 to <em>static linking</em> which includes all dependencies in the
1427 executable, making the executable self-contained but also larger
1428 and harder to maintain. Regardless of the type of linking, when the
1429 program is loaded, it completely replaces the previously running
1430 program. Note that there can be more than one process at the same
1431 time which executes the same program. </p>
1432
1433 <p> Files in ELF format are called <em>native</em> executables because
1434 they contain machine instructions which can be executed directly
1435 by the CPU. Another type of executables are <em>scripts</em>,
1436 also known as <em>interpreter files</em>. Scripts are text files
1437 which start with the <em>shebang</em> (<code>#!</code>). They can
1438 not run directly but have to be <em>interpreted</em> at runtime
1439 by another program, the <em>interpreter</em>. Nevertheless, it is
1440 possible to execute a script as if it were a native executable:
1441 by passing the path to one of the exec system calls or by typing
1442 the path at the shell prompt. The exec system call recognizes that
1443 the given file is a script by investigating the first line, which
1444 contains the path to the interpreter after the shebang, optionally
1445 followed by options to the interpreter. The kernel then executes the
1446 interpreter rather than the script, passing the path to the script as
1447 an argument. For example, if <code>/foo/bar</code> is being executed,
1448 and the first line of this file is <code>#!/bin/sh</code>, the kernel
1449 executes <code>/bin/sh /foo/bar</code> instead. Popular interpreters
1450 besides <code>/bin/sh</code> include <code>/bin/bash</code>,
1451 <code>/usr/bin/perl</code>, <code>/usr/bin/python</code> and
1452 <code>/usr/bin/awk</code>. </p>
1453
1454 SUBSECTION(«File Descriptions and File Descriptors»)
1455
1456 <p> The kernel must always be aware of the set of all objects which are
1457 currently in use. This set is often called the <em>system-wide table
1458 of open files</em> although not all entries refer to files. In fact, an
1459 entry may refer to any object that supports I/O operations, for example
1460 a network socket. Each entry is called a <em>file description</em>,
1461 which is a somewhat unfortunate term that was coined by POSIX. A
1462 file description records information about the object itself as well
1463 as the current state of the reference, including the file offset,
1464 if applicable, and the <em>status flags</em> which affect how future
1465 I/O operations are going to be performed through this reference. </p>
1466
1467 <p> The kernel maintains for each process an array of pointers to file
1468 descriptions. Each such pointer is a <em>file descriptor</em>. Unlike
1469 files and file descriptions, a file descriptor always corresponds
1470 to a process and is identified by a non-negative number, the index
1471 into the pointer array of that process. This index is returned by
1472 system calls like <code>open(2)</code> or <code>socket(2)</code>.
1473 As far as user space programs are concerned, a file descriptor is
1474 synonymous with this integer. It can be regarded as an abstract
1475 <em>handle</em> that must be supplied to subsequent I/O operations
1476 like <code>read(2)</code> or <code>write(2)</code> to tell the system
1477 call the target object of the operation. </p>
1478
1479 <p> The shell automatically creates three file descriptors for each
1480 process which are identified by the integers 0, 1 and 2. They are
1481 called <em>stdin</em>, <em>stdout</em>, and <em>stderr</em>, which is
1482 short for <em>standard input/output/error</em>. It is possible, and in
1483 fact common, that all three file descriptors point to the same file
1484 description: the terminal device. Many command line tools read their
1485 input from stdin, write normal output to stdout, and error messages
1486 to stderr. For example, when the POSIX command <code>cat(1)</code>
1487 is invoked with no arguments, it reads data from stdin and writes
1488 the same data to stdout. </p>
1489
1490 SUBSECTION(«Signals»)
1491
1492 <p> Signals are another ancient Unix concept that dates back to the
1493 early 1970s and was standardized in POSIX long ago. This concept
1494 facilitates a rudimentary form of <em>inter process communication</em>
1495 (IPC) between unrelated processes. Signals can be regarded as software
1496 interrupts that transmit a notification event from the sending process
1497 to the target process. The event is sent <em>asynchronously</em>,
1498 meaning that the interruption can happen at any location of the code
1499 flow. </p>
1500
1501 <p> It is fair to say that most non-trivial programs, including
1502 scripts, have to deal with signals. All major scripting languages
1503 (bash, python, perl, ruby, etc.) provide an API for signal
1504 handling. The interpreter of the scripting language ends up calling
1505 the POSIX system functions, so we will only look at these. </p>
1506
1507 <p> Signals are identified by name or by a numerical ID. For example,
1508 <code>SIGINT</code> (interrupt from keyboard) is the name for signal
1509 number 2. POSIX defines 31 <em>standard signals</em> plus at least
1510 eight <em>real-time signals</em>. The standard signals can be
1511 subdivided according to the origin of the signal as follows. </p>
1512
1513 <dl>
1514         <dt> hardware related signals </dt>
1515
1516         <dd> These signals originate from <em>hardware traps</em> that force
1517         the CPU back into kernel mode. The kernel responds to the trap by
1518         generating a signal for the process that caused the trap. For example,
1519         a division by zero in a user space program triggers a hardware trap
1520         in the <em>floating point unit</em> (FPU) of the CPU. The kernel
1521         then generates the <code>SIGFPE</code> (floating-point exception)
1522         signal for the process. Another example for a signal that originates
1523         from a hardware trap is <code>SIGSEGV</code> (segmentation fault)
1524         which occurs when a process attempts to reference a memory address
1525         that has not been mapped (i.e., marked as valid) by the <em>memory
1526         management unit</em> (MMU) of the CPU. </dd>
1527
1528         <dt> kernel generated signals </dt>
1529
1530         <dd> Signals which originate from the kernel rather than from
1531         hardware. One example is <code>SIGCHLD</code> (child terminated),
1532         which is sent to the parent process when one of its child processes
1533         terminates. Another example is <code>SIGWINCH</code> (window resize),
1534         which is generated when the geometry of the controlling terminal of
1535         a process changes. </dd>
1536
1537         <dt> user-space generated signals </dt>
1538
1539         <dd> These signals can only originate from user space when a process,
1540         for example <code>kill(1)</code>, calls <code>raise(2)</code>
1541         or <code>kill(2)</code> to instruct the kernel to generate a
1542         signal. Examples are <code>SIGTERM</code>, which issues a termination
1543         request, and <code>SIGUSR1</code> and <code>SIGUSR2</code> which are
1544         reserved for use by application programs. </dd>
1545 </dl>
1546
1547 The following signals are used frequently and deserve to the described
1548 explicitly. We refer to <code>signal(7)</code> for the full list of
1549 signals and their semantics.
1550
1551 <dl>
1552         <dt> <code>SIGINT, SIGTERM</code> and <code>SIGKILL</code> </dt>
1553
1554         <dd> All three signals terminate the process by default.
1555         <code>SIGINT</code> is generated for the <em>foreground processes</em>
1556         when the <em>interrupt character</em> (CTRL+C) is pressed in a
1557         terminal. For example, if CTRL+C is pressed while the shell pipeline
1558         <code>find | wc </code> is executing, <code>SIGINT</code> is sent
1559         to both processes of the pipeline. <code>SIGTERM</code> is the
1560         default signal for the <code>kill(1)</code> command. It requests
1561         the target process to run its shutdown procedure, if any, then
1562         terminate. <code>SIGKILL</code> instructs the kernel to terminate the
1563         target process immediately, without giving the process the chance to
1564         clean up. This signal can originate from a process or from the kernel
1565         in response to running out of memory. To keep the system working, the
1566         kernel invokes the infamous <em>out of memory killer</em> (OOM killer)
1567         which terminates one memory-hungry process to free some memory. </dd>
1568
1569         <dt> <code>SIGSTOP</code>, <code>SIGTSTP</code> and <code>SIGCONT</code> </dt>
1570
1571         <dd> <code>SIGSTOP</code> instructs the task scheduler of the kernel to
1572         no longer assign any CPU time to the target process until the process
1573         is woken up by a subsequent <code>SIGCONT</code>. <code>SIGTSTP</code>
1574         (stop typed at terminal) stops all foreground processes of a terminal
1575         session. It is generated when the <em>stop character</em> (CTRL+Z)
1576         is pressed in a terminal. </dd>
1577 </dl>
1578
1579 <p> Processes may set the <em>signal disposition</em> of most signals
1580 to control what happens when the signal arrives. When no disposition
1581 has been set, the signal is left at its <em>default disposition</em> so
1582 that the <em>default action</em> is performed to deliver the signal.
1583 For most signals the default action is to terminate the process,
1584 but for others the default action is to <em>ignore</em> the signal.
1585 If the signal is neither ignored nor left at its default disposition,
1586 it is said to be <em>caught</em> by the process. To catch a signal the
1587 process must tell the kernel the address of a function, the <em>signal
1588 handler</em>, to call in order to deliver the signal. The set of
1589 signal dispositions of a process can thus be imagined as an array
1590 of function pointers with one pointer per possible signal. If the
1591 process catches the signal, the pointer points to the corresponding
1592 signal handler. A NULL pointer represents a signal that was left at
1593 its default disposition while the special value <code>SIG_IGN</code>
1594 indicates that the process explicitly asked to ignore this signal. </p>
1595
1596 <p> Signals can also be <em>blocked</em> and <em>unblocked</em>. When
1597 a signal is generated for a process that has it blocked, it remains
1598 <em>pending</em>. Pending signals cause no action as long as the
1599 signal remains blocked but the action will be performed once the
1600 signal gets unblocked. <code>SIGKILL</code> and <code>SIGSTOP</code>
1601 can not be caught, blocked, or ignored. </p>
1602
1603 <p> Real-time signals are similar to <code>SIGUSR1</code> and
1604 <code>SIGUSR2</code> in that they have no predefined meaning but
1605 can be used for any purpose. However, they have different semantics
1606 than the standard signals and support additional features. Most
1607 importantly, real-time signals are <em>queued</em>, meaning that in
1608 contrast to standard signals the same real-time signal can be pending
1609 multiple times. Also, the sending process may pass an <em>accompanying
1610 value</em> along with the signal. The target process can obtain this
1611 value from its signal handler along with additional information like
1612 the PID and the UID of the process that sent the signal. </p>
1613
1614 <p> Some system calls including <code>read(2)</code> and
1615 <code>write(2)</code> may block for an indefinite time. For
1616 example, reading from a network socket blocks until there is data
1617 available. What should happen when a signal arrives while the process
1618 is blocked in a system call? There are two reasonable answers: Either
1619 the system call is <em>restarted</em>, or the call fails with the
1620 <code>Interrupted system call</code> error. Unfortunately, different
1621 flavors of Unix handle this case differently by default. However,
1622 applications may request either behaviour by setting or clearing the
1623 <code>SA_RESTART</code> flag on a per-signal basis. </p>
1624
1625 SUBSECTION(«Environment of a Process»)
1626
1627 <p> Now that we have a rough understanding of processes we look
1628 closer at the information the kernel maintains for each process. We
1629 already discussed the array of file descriptors and the array of
1630 signal dispositions. Clearly both are process specific properties.
1631 As we shall see, there is much more what constitutes the environment
1632 of a process. </p>
1633
1634 <p> Each process is identified by a unique <em>process ID</em>
1635 (PID), which is a positive integer. The <code>init</code> process is
1636 identified by PID 1. PIDs are assigned in ascending order, but are
1637 usually restricted to the range 1..32767. After this many processes
1638 have been created, PIDs wrap and unused PIDs are recycled for new
1639 processes. Thus, on a busy system on which processes are created and
1640 terminated frequently, the same PID is assigned to multiple processes
1641 over time. </p>
1642
1643 <p> Not all processes call <code>fork(2)</code> to create a child
1644 process, but each process except the init process has a unique
1645 parent. As described before, this is either the "real" parent (the
1646 process which created the process earlier) or the init process that
1647 "adopted" the orphaned process in case the real parent terminated
1648 before the child. The process ID of the parent (PPID) is thus
1649 well-defined. A process can obtain its PID and PPID with the
1650 <code>getpid(2)</code> and <code>getppid(2)</code> system calls. </p>
1651
1652 <p> Each process runs on behalf of a user (possibly the superuser)
1653 which is identified by its <em>user ID</em> (UID) and belongs to
1654 one or more groups, identified by one or more <em>group IDs</em>
1655 (GIDs). The superuser is identified by UID zero. When we talked
1656 about the permission bits of files and directories, we said that
1657 suitable permissions are needed for system calls which operate on
1658 files (<code>open(2)</code>, <code>stat(2)</code>, etc.). A more
1659 precise statement is that the <em>process</em> which calls, say,
1660 <code>open(2)</code> needs to have these permissions. To decide this,
1661 the kernel needs to take into account the UID and GIDs of the process
1662 that called <code>open(2)</code>, the UID and the GID stored in the
1663 inode of the file that is being opened, and the permission bits of
1664 this file. The UID is also taken into account for <code>kill(2)</code>
1665 because unprivileged processes (non-zero UID) can only send signals
1666 to processes which run on behalf of the same user while the superuser
1667 may target any process. </p>
1668
1669 <p> Each process has a <em>current working directory</em> (CWD)
1670 associated with it. When the user logs in, the CWD of the login shell
1671 process is set to his <em>home directory</em>, which should always
1672 exist and have the read, write and execute permission bits set for
1673 the user. The CWD can later be changed with <code>chdir(2)</code>
1674 and be retrieved with <code>getcwd(3)</code>. The CWD is used as the
1675 starting point for path searches for relative paths. It affects most
1676 system calls which receive a path argument. For example, if the CWD
1677 is <code>/foo/bar</code> and the relative path <code>baz/qux</code>
1678 is passed to <code>open(2)</code>, the kernel will attempt to open
1679 the file which is identified by <code>/foo/bar/baz/qux</code>. </p>
1680
1681 <p> Many programs accept arguments to control their behavior.
1682 In addition to the path to the program that is to be executed,
1683 all variants of the exec system calls receive an array of arguments
1684 called the <em>argument vector</em>. For example, when the command
1685 <code>ls -l foo</code> is executed, the argument vector contains
1686 the two strings <code>"-l"</code> and <code>"foo"</code>. Note that
1687 the argument vector is not part of the program but is tied to the
1688 process. It is passed to the main function at startup so that the
1689 program may evaluate it and act accordingly. </p>
1690
1691 <p> Another way to pass information to a program is via <em>environment
1692 variables</em>. Environment variables are strings of the form
1693 <code>name=value</code>. POSIX describes the API to maintain the
1694 environment variables of a process. Environment variables are set
1695 with <code>setenv(3)</code> or <code>putenv(3)</code>, the value of a
1696 variable can be retrieved with <code>getenv(3)</code>, and a variable
1697 and its value can be deleted with <code>unsetenv(3)</code>. The set of
1698 environment variables is sometimes called the <em>environment</em>
1699 of the process, although we use this term in a broader sense to
1700 describe the entirety of all metadata maintained by the kernel about
1701 the process, including but not limited to environment variables. </p>
1702
1703 <p> Each process also has about a dozen <em>resource limits</em>
1704 that can be set and queried with the POSIX <code>setrlimit(2)</code>
1705 and <code>getrlimit(2)</code> functions. Each limit controls a
1706 different aspect. For example, <code>RLIMIT_CPU</code> limits the
1707 CPU time the process is allowed to use and <code>RLIMIT_NOFILE</code>
1708 controls how many open files it may have at a time. For each resource
1709 there is a <em>soft</em> and a <em>hard</em> limit. The kernel
1710 enforces the value stored as the soft limit. This value may be set
1711 by an unprivileged process to any value between zero and the hard
1712 limit. Unprivileged processes may also reduce (but not increase) their
1713 hard limits. Once a hard limit is reduced, it can not be increased
1714 any more. For <code>RLIMIT_CPU</code> a special convention applies:
1715 If the soft limit is reached, the kernel sends <code>SIGXCPU</code>
1716 (CPU time limit exceeded) to notify the process about this fact so
1717 that it can terminate orderly (e.g., remove temporary files). When
1718 the hard limit is reached, the kernel terminates the process as if
1719 it received <code>SIGKILL</code>. </p>
1720
1721 <p> The <em>nice level</em> of a process provides a hint for
1722 the task scheduler to give the process a lower or higher priority
1723 relative to other processes. Nice levels range between -20 and 19. A
1724 high nice level means that the process wants to be nice to other
1725 processes, that is, should run with reduced priority. Low nice levels
1726 indicate important processes that should be prioritized over other
1727 processes. The default nice level is zero. Unprivileged users may
1728 set the nice level of new processes with the <code>nice(1)</code>
1729 command to any non-negative value. They can also increase the nice
1730 level of their existing processes with <code>renice(1)</code>, but
1731 never decrease it. The superuser, however, may set the nice level
1732 of any process to an arbitrary value in the valid range. </p>
1733
1734 <p> The bulk of the properties discussed so far are inherited by the
1735 child after a <code>fork(2)</code>. Specifically, the child gets the
1736 same array of file descriptors and signal dispositions as its parent,
1737 runs on behalf of the same user, has the same working directory,
1738 the same resource limits and nice level, and also the same set
1739 of environment variables with identical values. The PID and PPID,
1740 however, are different of course. </p>
1741
1742 <p> After a process has called an exec function to replace itself with
1743 a new program, its signal handlers no longer exist because they were
1744 part of the program code which has been replaced. Therefore the exec
1745 system calls reset the disposition of all signals that were caught to
1746 the default disposition. Signals that were being ignored keep being
1747 ignored, however. </p>
1748
1749 SUBSECTION(«The Process Filesystem»)
1750
1751 <p> Although not covered by POSIX, at least Linux, NetBSD and FreeBSD
1752 provide information about processes via the <em>process filesystem</em>
1753 (procfs), which is usually mounted on <code>/proc</code>. The process
1754 filesystem is a <em>pseudo-filesystem</em>, i.e., it has no underlying
1755 storage device. Files and directories are faked by the kernel as they
1756 are accessed. Each process is represented by a numerical subdirectory
1757 of <code>/proc</code> which is named by the PID. For example,
1758 <code>/proc/1</code> represents the init process. The aforementioned
1759 process utilities (<code>ps(1)</code>, <code>top(1)</code>, etc.) read
1760 the contents of the process filesystem in order to do their job. <p>
1761
1762 <p> Each <code>/proc/[pid]</code> directory contains the same set
1763 of files although this set is different between Unixes. These files
1764 expose much of the environment of the process to user space. The Linux
1765 procfs implementation provides text files named <code>environ</code>
1766 and <code>limits</code> which contain the current environment and
1767 the resource limits of the process, respectively. Moreover, the
1768 file descriptor array of each process is exposed in the files of
1769 the <code>/proc/[pid]/fd</code> directory. Linux and NetBSD (but not
1770 FreeBSD) also provide a <code>cwd</code> soft link which points to
1771 the current working directory of the process. </p>
1772
1773 SUBSECTION(«Pipes and Redirections»)
1774
1775 <p> The <code>pipe(2)</code> system call takes no arguments and
1776 creates two file descriptors for the calling process which are tied
1777 together as a unidirectional first in, first out data channel that
1778 works just like a fifo, but without any files being involved. One
1779 file descriptor is the <em>read end</em> of the pipe, the other is
1780 the <em>write end</em>. Data written to the write end is buffered by
1781 the kernel and can be obtained by reading from the read end. </p>
1782
1783 <p> One application of pipes is communication between
1784 related processes. A process first creates a pipe, then calls
1785 <code>fork(2)</code> to create a child process. The child inherits
1786 a copy of both pipe file descriptors. Hence the parent process can
1787 communicate with the child by writing a message to the write end of
1788 the pipe for the child to read. </p>
1789
1790 <p> The POSIX <code>dup(2)</code> and <code>dup2(2)</code> system
1791 calls allow a process to manipulate the entries of its file descriptor
1792 array. In particular the standard file descriptors 0, 1, and 2 can be
1793 replaced. By doing so before performing an exec system call, it can
1794 be arranged that the replacement program starts with, say, its stdout
1795 file descriptor be redirected to the write end of a pipe. Note that
1796 the replacement program does not need any modifications for this to
1797 work, and might not even be aware of the fact that it is not writing
1798 its output to the terminal as usual. </p>
1799
1800 <p> Shells employ this technique to implement the <code>|</code>
1801 operator which "pipes" the output of one command "into" another
1802 command. For example, the pipeline <code>ls | wc</code> works
1803 as follows: First the shell creates a pipe, then it calls
1804 <code>fork(2)</code> twice to create two processes which both
1805 get a copy of the two pipe file descriptors. The first process
1806 replaces its stdout file descriptor with the write end of the
1807 pipe and performs an exec system call to replace itself with the
1808 <code>ls(1)</code> program. The second process replaces its stdin
1809 file descriptor with the read end of the pipe and replaces itself
1810 with <code>wc(1)</code>. Since <code>ls(1)</code> writes to stdout
1811 and <code>wc(1)</code> reads from stdin, <code>wc(1)</code> processes
1812 the output of <code>ls(1)</code>. </p>
1813
1814 <p> Note that this trick does not work to establish a connection
1815 between two <em>existing</em> processes because it depends on file
1816 descriptor inheritance across <code>fork(2)</code>. In the general
1817 case one has to fall back to sockets or fifos to create the data
1818 channel. </p>
1819
1820 SUBSECTION(«Stdio»)
1821
1822 <p> The POSIX standard requires a compliant Unix system to provide
1823 a collection of functions that let applications perform input and
1824 output by means of operations on <em>streams</em>. This programming
1825 interface, known as <em>stdio</em> for <em>standard input/output</em>,
1826 is part of every Unix system since 1979. Every program which contains
1827 a <code>printf(3)</code> statement relies on stdio. </p>
1828
1829 <p> The stdio functions are implemented as part of libc on top of the
1830 <code>open(2)</code>, <code>read(2)</code> and <code>write(2)</code>
1831 system calls which are implemented in the kernel. Roughly speaking,
1832 stdio replaces the file descriptor API by a more abstract API
1833 which centers around streams. A stream is an opaque data structure
1834 which comprises a file descriptor and an associated data buffer for
1835 I/O. Each program has three predefined streams which correspond to
1836 the three standard file descriptors (stdin, stdout and stderr). The
1837 stdio API contains well over 50 functions to create and maintain
1838 streams and to perform I/O on streams. These functions take care of
1839 the characteristics of the underlying file description. For example,
1840 they automatically try to select the optimal I/O buffer size. </p>
1841
1842 <p> Many applications rely on stdio because of convenience. For
1843 one, the buffers for <code>read(2)</code> and <code>write(2)</code>
1844 must be allocated and freed explicitly by the application, and care
1845 must be taken to not overflow these buffers. With stdio, this task
1846 is done by the stdio library. Second, <em>formatted</em> I/O is
1847 much easier to do with the stdio functions because the programmer
1848 only has to provide a suitable <em>format string</em> to convert
1849 between the machine representation and the textual representation of
1850 numbers. For example, by passing the format string <code>"%f"</code>
1851 to <code>scanf(3)</code>, the programmer tells the stdio library to
1852 read a floating point number stored in textual representation from the
1853 specified stream, convert it to machine representation and store it
1854 in the given variable. The <code>fprintf(3)</code> function works the
1855 other way round: the value is converted from machine representation
1856 to text, and this text is written to the stream. Both functions can
1857 deal with various formats, like scientific notation for floating
1858 point numbers (e.g., 0.42E-23). With stdio it is easy to customize
1859 the formatted output, for example add leading zeros or select the
1860 number of decimal digits in the textual representation of a floating
1861 point number. </p>
1862
1863 <p> Another reason why many programs rely on stdio is that it performs
1864 <em>buffered</em> I/O. With buffered I/O not each read/write operation
1865 results in a system call. Instead, data read from or written to the
1866 stream is first stored in the user space buffer that is associated
1867 with the stream. This is a performance improvement for applications
1868 which perform many small I/O operations because every system call
1869 incurs some overhead. Buffers may be <em>flushed</em> explicitly by
1870 calling <code>fflush(3)</code>, or implicitly by the stdio library. The
1871 criteria which cause an implicit flush depend on the <em>buffering
1872 type</em> of the stream. Three buffering types exist. </p>
1873
1874 <dl>
1875         <dt> unbuffered </dt>
1876
1877         <dd> The stdio library does not buffer any reads or writes. Instead,
1878         each I/O operation results in a <code>read(2)</code> or
1879         <code>write(2)</code> system call. By default the stderr stream is
1880         unbuffered to display error messages as quickly as possible. </dd>
1881
1882         <dt> line buffered </dt>
1883
1884         <dd> System calls are performed when a newline character is
1885         encountered. This buffering type applies by default to interactive
1886         sessions where the file descriptor of the stream refers to a terminal
1887         device (as determined by <code>isatty(3)</code>). </dd>
1888
1889         <dt> fully buffered </dt>
1890
1891         <dd> I/O takes place only if the buffer of the stream is empty/full. By
1892         default, if the file descriptor refers to a regular file, the
1893         stream is fully buffered. POSIX requires that stderr is never fully
1894         buffered. </dd>
1895 </dl>
1896
1897 <p> The exercises on stdio focus on the three different buffering
1898 types because this is a common source of confusion. </p>
1899
1900 SUBSECTION(«The Virtual Address Space of a Process»)
1901
1902 <p> Isolation refers to the concept that each process gets its own
1903 <em>virtual address space</em>. A rough understanding of the memory
1904 management system and the layout of the virtual address space of
1905 a process helps to locate the source of common problems like the
1906 infamous <code>segmentation fault</code> error, and to realize that
1907 putatively simple questions such as "how much memory is my process
1908 currently using?" are in fact not simple at all, and need to be made
1909 more precise before they can be answered. </p>
1910
1911 <div>
1912
1913 define(«vas_width», «200»)
1914 define(«vas_height», «300»)
1915 define(«vas_vmem_left_margin», «5»)
1916 define(«vas_vmem_top_margin», «5»)
1917 define(«vas_mem_width», «20»)
1918 define(«vas_gap_width», «30»)
1919 define(«vas_vmem_height», «140»)
1920 define(«vas_vmem_color», «#34b»)
1921 define(«vas_pmem_height», «100»)
1922 define(«vas_pmem_color», «#7e5»)
1923 define(«vas_vmem_unmapped_color», «#a22»)
1924 define(«vas_vmem_swapped_color», «yellow»)
1925 define(«vas_pmem_unavail_color», «orange»)
1926 define(«vas_disk_gap», «15»)
1927 define(«vas_disk_height», «20»)
1928 define(«vas_disk_color», «grey»)
1929 define(«vas_x1», «vas_vmem_left_margin()»)
1930 define(«vas_x2», «eval(vas_x1() + vas_mem_width())»)
1931 define(«vas_x3», «eval(vas_x2() + vas_gap_width())»)
1932 define(«vas_x4», «eval(vas_x3() + vas_mem_width())»)
1933
1934 define(«vas_membox», «
1935         <rect
1936                 fill="$1" stroke="black" stroke-width="1"
1937                 x="eval(vas_vmem_left_margin() + $3)"
1938                 y="vas_vmem_top_margin()"
1939                 width="vas_mem_width()" height="$2"
1940         />
1941 »)
1942 define(«vas_vmem_unmapped_box», «
1943         <rect
1944                 fill="vas_vmem_unmapped_color()" stroke="black" stroke-width="1"
1945                 x="eval(vas_vmem_left_margin())"
1946                 y="eval(vas_vmem_top_margin() + $1)"
1947                 width="vas_mem_width()"
1948                 height="eval($2)"
1949         />
1950
1951 »)
1952 define(«vas_vmem_swapped_box», «
1953         <rect
1954                 fill="vas_vmem_swapped_color()" stroke="black" stroke-width="1"
1955                 x="eval(vas_vmem_left_margin())"
1956                 y="eval(vas_vmem_top_margin() + $1)"
1957                 width="vas_mem_width()"
1958                 height="eval($2)"
1959         />
1960
1961 »)
1962 define(«vas_pmem_unavail_box», «
1963         <rect
1964                 fill="vas_pmem_unavail_color()" stroke="black" stroke-width="1"
1965                 x="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width())"
1966                 y="eval(vas_vmem_top_margin() + $1)"
1967                 width="vas_mem_width()"
1968                 height="$2"
1969         />
1970
1971 »)
1972 define(«vas_vmem_hline», «
1973         <line
1974                 x1="vas_vmem_left_margin()"
1975                 y1="eval(vas_vmem_top_margin() + $1)"
1976                 x2="eval(vas_vmem_left_margin() + vas_mem_width())"
1977                 y2="eval(vas_vmem_top_margin() + $1)"
1978                 stroke-width="1"
1979                 stroke="black"
1980         />
1981
1982 »)
1983
1984 define(«vas_pmem_hline», «
1985         «<!-- pmem hline -->»
1986         <line
1987                 x1="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width())"
1988                 y1="eval(vas_vmem_top_margin() + $1)"
1989                 x2="eval(vas_vmem_left_margin() + 2 * vas_mem_width() + vas_gap_width())"
1990                 y2="eval(vas_vmem_top_margin() + $1)"
1991                 stroke-width="1"
1992                 stroke="black"
1993         />
1994
1995 »)
1996 define(«vas_arrow», «
1997         <line
1998                 x1="eval(vas_vmem_left_margin() + vas_mem_width())"
1999                 y1="eval(vas_vmem_top_margin() + $1)"
2000                 x2="eval(vas_vmem_left_margin() + vas_mem_width() + vas_gap_width() - 2)"
2001                 y2="eval(vas_vmem_top_margin() + $2)"
2002                 stroke-width="1"
2003                 stroke="black"
2004                 marker-end="url(#arrow)"
2005         />
2006 »)
2007 define(«vas_disk», «
2008         <rect
2009                 fill="vas_disk_color()" stroke="black" stroke-width="1"
2010                 x="vas_x3()"
2011                 y="eval(vas_vmem_top_margin() + vas_pmem_height()
2012                         + vas_disk_gap())"
2013                 width="eval(vas_x4() - vas_x3())"
2014                 height="eval(vas_disk_height())"
2015         />
2016         <ellipse
2017                 cx="eval(vas_x3() + vas_mem_width() / 2)"
2018                 cy="eval(vas_vmem_top_margin() + vas_pmem_height() + vas_disk_gap())"
2019                 rx="eval(vas_mem_width() / 2)"
2020                 ry="eval(vas_mem_width() / 4)"
2021                 fill="vas_disk_color()" stroke="black"
2022         />
2023         <ellipse
2024                 cx="eval(vas_x3() + vas_mem_width() / 2)"
2025                 cy="eval(vas_vmem_top_margin() + vas_pmem_height()
2026                         + vas_disk_gap() + vas_disk_height())"
2027                 rx="eval(vas_mem_width() / 2)"
2028                 ry="eval(vas_mem_width() / 4)"
2029                 fill="vas_disk_color()" stroke="black"
2030         />
2031 »)
2032
2033 <svg
2034         width="vas_width()" height="vas_height()"
2035         viewBox="0 0 100 eval(100 * vas_height() / vas_width())"
2036         xmlns="http://www.w3.org/2000/svg"
2037         xmlns:xlink="http://www.w3.org/1999/xlink"
2038 >
2039         <marker
2040                 id="arrow"
2041                 viewBox="0 0 10 10" refX="5" refY="5"
2042                 markerWidth="4" markerHeight="4"
2043                 orient="auto-start-reverse">
2044                 <path d="M 0 0 L 10 5 L 0 10 z" />
2045         </marker>
2046         vas_membox(«vas_vmem_color()», «vas_vmem_height()», «0»)
2047         vas_membox(«vas_pmem_color()», «vas_pmem_height()»,
2048                 «eval(vas_gap_width() + vas_mem_width())»)
2049         vas_vmem_hline(«10»)
2050         vas_vmem_hline(«40»)
2051         vas_vmem_unmapped_box(«40», «20»)
2052         vas_vmem_swapped_box(«60», «60»)
2053
2054         vas_pmem_unavail_box(«0», «10»)
2055         vas_pmem_hline(«20»)
2056         vas_pmem_unavail_box(«20», «30»)
2057         vas_pmem_hline(«80»)
2058
2059         vas_arrow(«5», «15»)
2060         vas_arrow(«25», «65»)
2061         vas_arrow(«130», «90»)
2062         vas_disk()
2063         vas_arrow(«90», «eval(vas_pmem_height() + vas_disk_gap()
2064                 + vas_disk_height() / 2)»)
2065 </svg>
2066 </div>
2067
2068 <p> Virtual memory is an abstraction of the available memory resources.
2069 When a process reads from or writes to a memory location, it refers
2070 to <em>virtual addresses</em> (illustrated as the left box of the
2071 diagram). Virtual addresses are mapped by the MMU to <em>physical
2072 addresses</em> which refer to physical memory locations (right
2073 box). The <em>mapped</em> virtual address space of a process is a
2074 collection of ranges of virtual addresses which correspond to physical
2075 memory addresses (blue areas). By storing less frequently-accessed
2076 chunks of virtual memory (yellow) on the swap area (grey), applications
2077 can use more memory than is physically available. In this case the
2078 size of the valid virtual addresses (blue and yellow areas together)
2079 exceeds the amount of physical memory (orange and green areas). Any
2080 attempt to access an unmapped memory location (red and yellow areas)
2081 results in a <em>page fault</em>, a hardware trap which forces the CPU
2082 back into kernel mode. The kernel then checks whether the address is
2083 valid (yellow) or invalid (red). If it is invalid, the kernel sends
2084 <code>SIGSEGV</code>, which usually terminates the process with
2085 the <code>segmentation fault</code> error. Otherwise it allocates
2086 a chunk of unused physical memory, copies the chunk from the swap
2087 area to the newly allocated memory and adjusts the mapping (i.e.,
2088 a yellow part becomes blue). The virtual memory concept increases
2089 stability and security because no process can access physical memory
2090 which belongs to the kernel or to other processes (orange areas). </p>
2091
2092 <p> We've already seen that the <code> fork(2) </code> system call
2093 creates a new process as a duplicate of the calling process. Since
2094 the virtual address space of the calling process (a) might be large
2095 and (b) is likely to be replaced in the child by a subsequent call
2096 to an exec function, it would be both wasteful and pointless to
2097 copy the full address space of the parent process to the child. To
2098 implement <code> fork(2) </code> efficiently, operating systems
2099 employ an optimization strategy known as <em> Copy on Write </em>
2100 (CoW). The idea of CoW is that if multiple callers ask for resources
2101 which are initially indistinguishable, you can give them pointers to
2102 the same resource. This function can be maintained until a caller
2103 tries to modify its copy of the resource, at which point a true
2104 private copy is created to prevent the changes becoming visible to
2105 everyone else. The primary advantage is that if a caller never makes
2106 any modifications, no private copy needs ever be created. The <code>
2107 fork(2) </code> system call marks the pages of the virtual address
2108 space of both the parent and the child process as CoW by setting a
2109 special bit in the <em> page table entry </em> which describes the
2110 mapping between virtual and physical addresses of the MMU. As for
2111 invalid memory accesses, the attempt to write to a CoW page results
2112 in a page fault that puts the CPU back into kernel mode. The kernel
2113 then allocates a new memory page on behalf of the process, copies
2114 the contents of the page which caused the fault, changes the page
2115 table mappings for the process accordingly and returns to user space.
2116 This all happens transparently to the process. </p>
2117
2118 <div>
2119 define(«asl_width», «300»)
2120 define(«asl_height», «400»)
2121 define(«asl_top_margin», «10»)
2122 define(«asl_text_width», «35»)
2123 define(«asl_mem_width», «25»)
2124 define(«asl_mem_color_env», «#fc8»)
2125 define(«asl_mem_color_stack», «#8fc»)
2126 define(«asl_mem_color_empty», «#ccc»)
2127 define(«asl_mem_color_heap», «#c8f»)
2128 define(«asl_mem_color_bss», «#8cf»)
2129 define(«asl_mem_color_data», «#cf8»)
2130 define(«asl_mem_color_text», «#f8c»)
2131 define(«asl_font_size», «5»)
2132
2133 define(«asl_arrow», «
2134         <line
2135                 x1="0"
2136                 y1="$1"
2137                 x2="eval(asl_text_width() - 2)"
2138                 y2="$1"
2139                 stroke-width="1"
2140                 stroke="black"
2141                 marker-end="url(#arrow)"
2142         />
2143 »)
2144 define(«asl_arrow_text», «
2145         <text
2146                 x="0"
2147                 y="$1"
2148                 font-size="asl_font_size()"
2149         >
2150                 $2
2151         </text>
2152 »)
2153
2154 dnl $1: y0, $2; height, $3: color, $4: high arrow text
2155 dnl $5: low arrow text, $6: desc
2156
2157 define(«asl_box», «
2158         <rect
2159                 stroke="black"
2160                 stroke-width="1"
2161                 x="asl_text_width()"
2162                 y="eval($1 + asl_top_margin())"
2163                 height="$2"
2164                 fill="$3"
2165                 width="asl_mem_width()"
2166         />
2167         ifelse(«$4», «», «», «
2168                 asl_arrow(«eval($1 + asl_top_margin())»)
2169                 asl_arrow_text(«eval($1 + asl_top_margin() - 2)», «$4»)
2170         »)
2171         ifelse(«$5», «», «», «
2172                 asl_arrow(«eval($1 + $2 + asl_top_margin())»)
2173                 asl_arrow_text(«eval(asl_top_margin()
2174                         + $1 + $2 - 2)», «$5»)
2175         »)
2176         <text
2177                 x="eval(asl_text_width() + asl_mem_width() + 2)"
2178                 y="eval($1 + $2 / 2 + asl_top_margin())"
2179                 dy="0.3em"
2180                 font-size="asl_font_size()"
2181         >
2182                 $6
2183         </text>
2184 »)
2185
2186 <svg
2187         width="asl_width()" height="asl_height()"
2188         viewBox="0 0 100 eval(100 * asl_height() / asl_width())"
2189         xmlns="http://www.w3.org/2000/svg"
2190         xmlns:xlink="http://www.w3.org/1999/xlink"
2191 >
2192         asl_box(«0», «10», «asl_mem_color_env», «2^64 - 1», «»,
2193                  «Environment»)
2194         asl_box(«10», «15», «asl_mem_color_stack», «», «base pointer»,
2195                  «Stack»)
2196         asl_box(«25», «30», «asl_mem_color_empty», «», «break point»,
2197                  «Empty»)
2198         asl_box(«55», «35», «asl_mem_color_heap», «», «», «Heap»)
2199         asl_box(«90», «10», «asl_mem_color_bss», «», «», «BSS»)
2200         asl_box(«100», «10», «asl_mem_color_data», «», «», «Data»)
2201         asl_box(«110», «10», «asl_mem_color_text», «», «0», «Text»)
2202 </svg>
2203 </div>
2204
2205 <p> The diagram on the left illustrates the layout of the virtual
2206 address space of a process. At the top of the address space are the
2207 argument vector and the environment variables. The <em>stack</em>
2208 stores the local variables of the functions which are currently
2209 being called, plus house-keeping data like the return addresses
2210 of these functions. As more functions are called, the stack grows
2211 downwards towards the lower addresses. Its current lower end is
2212 called the <em> base pointer</em>. The other variable area of the
2213 address space is the <em>heap</em>, which contains the memory that
2214 has been allocated on behalf of the process, for example with <code>
2215 malloc(3)</code>. As the process allocates more memory, the heap grows
2216 upwards, towards the stack. The current end of the heap is called the
2217 <em> break point</em>. The lower part of the address space contains
2218 three segments of fixed size. The <em>text</em> segment contains the
2219 compiled machine instructions of the executable, the <em>data</em>
2220 segment contains the initialized variables which are already known
2221 at compile time. Finally, the <em>BSS</em> segment is allocated and
2222 zeroed at execution time. This segment contains variables which should
2223 initialized to zero at startup. Unlike the data segment it is not
2224 stored in the executable. BSS stands for "Block Started by Symbol",
2225 which is a historic name coined in the 1950s. It has no relation to
2226 the real meaning of the segment. </p>
2227
2228 The exercises of this section invite the reader to look at the virtual
2229 address space of running processes to learn what happens when a
2230 dynamically-linked executable is being executed and how the resulting
2231 memory maps affect the virtual address space of the newly created
2232 process.
2233
2234 EXERCISES()
2235
2236 <ul>
2237         <li> Examine your own processes with <code>htop</code>, <code>ps
2238         ux</code> and <code>pstree -panuch $LOGNAME</code>. </li>
2239
2240         <li> Run <code>ls -l /proc/$$</code> and examine the environment of
2241         your shell process. </li>
2242
2243         <li> Run <code>kill -l</code> and discuss the meaning of signals
2244         1-15. Use <code>signal(7)</code> as a reference. </li>
2245
2246         <li> Create a zombie process: run <code>sleep 100&</code>. From
2247         another terminal, send <code>SIGSTOP</code> to the parent process
2248         of the sleep process (the shell), then send <code>SIGKILL</code>
2249         to the sleep process. Run <code>cat /proc/$PID/status</code> where
2250         <code>$PID</code> is the process ID of the sleep process. </li>
2251
2252         <li> Run <code>echo $$</code> to obtain the PID of an interactive
2253         shell that is running in a terminal. Send the <code>SIGSTOP</code>
2254         and <code>SIGCONT</code> signals to this PID from another terminal
2255         and see what happens when you type in the terminal that contains the
2256         stopped shell process. </li>
2257
2258         <li> The <code>ping(8)</code> utility catches <code>SIGQUIT</code>.
2259         In one terminal execute <code>ping localhost</code>. While this
2260         command runs in an endless loop, send <code>SIGQUIT</code> to the
2261         ping process from another terminal and see what happens. </li>
2262
2263         <li> Read <code>kill(2)</code> to learn what <code>kill -9 -1</code>
2264         does. Run this command if you are brave. </li>
2265
2266         <li> Why doesn't the <a href="«#»cd_script">cd script</a> work as
2267         expected? </li>
2268
2269         <li> Explain the difference between the two commands <code>X=foo
2270         bar</code> and <code>X=foo; bar</code>. </li>
2271
2272         <li> Run <code>set</code> and examine the environment variables of
2273         an interactive shell session. </li>
2274
2275         <li> Check this <a
2276         href="https://public-inbox.org/git/Pine.LNX.4.64.0609141023130.4388@g5.osdl.org/">email</a>
2277         from Linus Torvalds about why stdio is not that simple at all. </li>
2278
2279         <li> Run the command <code>ls / /does-not-exist</code>, redirect
2280         stdout and stderr to different files. </li>
2281
2282         <li> Consider the following shell code which uses stdio to first write
2283         to stdout, then to stderr. <code>echo foo; echo bar 1>&2</code>. Which
2284         circumstances guarantee that the "foo" line appears before the "bar"
2285         line in the output? </li>
2286
2287         <li> In the pipeline <code> foo | bar</code>, what is the
2288         buffering type of the file descriptor which corresponds to
2289         the <code> stdin </code> stream of the <code> bar </code>
2290         process? </li>
2291
2292         <li> Assume <code>foo</code> is a log file which increases due to
2293         some process appending data to it. Explain why the command <code>
2294         tail -f foo | while read; do echo new data; done </code> does not
2295         work as expected. Fix the flaw by changing the buffering type with
2296         <code>stdbuf(1)</code>. </li>
2297
2298         <li> Run <code>sleep 100 > /dev/null &</code>, examine open
2299         files of the sleep process by looking at suitable files in
2300         <code>/proc</code>. Do the same with <code>sleep 100 | head
2301         &</code>. </li>
2302
2303         <li> Run <code>ldd /bin/sh</code> and explain what happens when a
2304         shell is executed. </li>
2305
2306         <li> On a Linux system, run <code>cat /proc/$$/maps</code> or
2307         <code>pmap -x $$</code> to see the address space layout of your
2308         shell. Check <code>Documentation/filesystems/proc.txt</code>
2309         in the linux kernel source tree for the format of
2310         <code>/proc/$$/maps</code>. </li>
2311
2312         <li> Run <code>cat /proc/$$/smaps</code> and examine the values of
2313         the heap section. </li>
2314
2315         <li> Assume some program allocates a lot of memory so that the size of
2316         the valid virtual addresses is 1T large. Assume further that a software
2317         bug causes the content of a pointer variable to be overwritten with
2318         random garbage. Determine the probability that this pointer variable
2319         contains a valid address (assuming a 64 bit system). </li>
2320 </ul>
2321
2322 HOMEWORK(«
2323
2324 Explain how <code>PR_SET_CHILD_SUBREAPER</code> works and possible
2325 use-cases for this (Linux) feature.
2326
2327 »)
2328
2329 HOMEWORK(«
2330
2331 Explain in one paragraph of text the purpose of the <em>file
2332 creation mask</em> (also known as <em>umask</em>) of a process.
2333
2334 »)
2335
2336 HOMEWORK(«
2337
2338 When we said that each process runs on behalf of a user and that the
2339 ID of this user is part of the process metadata, we were simplifying
2340 matters. There are actually three different UIDs and three different
2341 GIDs: the <em>real UID</em>, the <em>effective UID</em>, and the
2342 <em>saved set-user ID</em>, and analogous for the group IDs. Explain
2343 the purpose of the three UIDs.
2344
2345 »)
2346
2347
2348 HOMEWORK(«
2349
2350 On a multi-CPU system the performance of a program can be
2351 enhanced by allowing for multiple flows of control. This is the
2352 idea behind <em>threads</em>, which are also called <em>lightweight
2353 processes</em>. Give an overview of threads, summarize the POSIX thread
2354 API (see <code>pthreads(7)</code>) and explain how the Linux-specific
2355 <code>clone(2)</code> system call can used to implement threads.
2356
2357 »)
2358
2359 HOMEWORK(«
2360
2361 Explain what the command <code>find /etc > /dev/null</code> does,
2362 and why you get some error messages. Assume you'd like to extract
2363 only those error messages which contain the string "lvm". Explain
2364 why <code>find /etc > /dev/null | grep lvm</code> does not work as
2365 expected. Come up with a similiar command that works.
2366
2367 », «
2368
2369 The command traverses the <code>/etc</code> directory recursively and
2370 prints all files and directories it encounters during the traversal to
2371 stdout. Since stdout is redirected to the NULL device by the <code>>
2372 /dev/null</code> construct, only the stderr stream containing the error
2373 messages makes it to the terminal.  This includes all subdirectories
2374 of <code>/etc</code> which cannot be traversed due to insufficient
2375 permissions (no "r" bit set). The proposed <code>find | grep</code>
2376 command does not work since the <code>|</code> operator is evaluated
2377 <em>before</em> any redirections specified by the find command
2378 take place. More precisely, stdout of the find process is redirected
2379 <em>twice</em>: First to one end of the pipe due to the <code>|</code>,
2380 then to the NULL device due to the <code>> /dev/null</code>. The
2381 last redirection "wins", so the <code>grep</code> process does not
2382 see any input. The command <code>find /etc 2>&1 > /dev/null | grep
2383 lvm</code> works. The following four redirections take place: First
2384 stdout of the <code>find</code> process and stdin of <code>grep</code>
2385 process are redirected to the two ends of the pipe. Next, due to
2386 the <code>2>&1</code> the stderr stream of the <code>find</code>
2387 process is redirected to the current destination of stdout, i.e.,
2388 to the pipe. Finally the <code>> /dev/null</code> redirects stdout
2389 of the find process to the NULL device. Hence error messages go to
2390 the pipe and are processed by <code>grep</code> as desired.
2391
2392 »)
2393
2394 HOMEWORK(«
2395 Run <code>ulimit -n</code> to see the maximal number of file descriptors you
2396 are allowed to create. Explain what this limit means with respect
2397 to multiple processes, multiple logins, and the <code>fork(2</code>) system
2398 call. Write a program in your language of choice which creates file
2399 descriptors in a loop until it fails due to the file descriptor
2400 limit. Then print the number of file descriptors the program was able
2401 to create.
2402 », «
2403 On our systems the limit is set to 1024. This means a single process
2404 can only have this many files open at any given time. Independent
2405 processes (like those coming from different login sessions) have no
2406 common file descriptors, even though they may open the same files. In
2407 this sense the file descriptor limit is a per-process limit. However,
2408 when a process calls <code>«fork(</code>») to create a new process, the new
2409 process inherits all open file descriptors from the parent. This can
2410 lead to the situation where a newly created process is unable to open
2411 <em>any</em> files. This property was actually used to break computer
2412 security. The <code>«O_CLOEXEC»</code> flag was introduced not too long
2413 ago to deal with this problem. See <code>open(2</code>) for details.
2414
2415 C program that opens the maximal possible number of file descriptors:
2416
2417 <pre>
2418         int main(void)
2419         {
2420                 int i;
2421
2422                 for (i == 0; open("/dev/null", O_RDONLY) >= 0; i++)
2423                         ;
2424                 printf("opened %d file descriptors\n", i);
2425                 exit(0);
2426         }
2427 </pre>
2428 »)
2429
2430 HOMEWORK(«
2431
2432 Search the web for the document called
2433 <code>vm/overcommit-accounting</code>. Discuss the pros and cons of
2434 the three possible overcommit handling modes.
2435
2436 »)
2437
2438 HOMEWORK(«
2439
2440 Read this
2441 <a
2442 href="https://utcc.utoronto.ca/~cks/space/blog/unix/MemoryOvercommit">blog
2443 posting</a> on the virtual memory overcommit issue. Explain the
2444 catch-22 situation described there in no more than two sentences.
2445
2446 »)
2447
2448 HOMEWORK(«
2449
2450 Describe, in a single paragraph of text, what a virtual dynamic
2451 shared object (VDSO) is and which type of applications benefit most
2452 from it.
2453
2454 »)
2455
2456 HOMEWORK(«
2457
2458 Describe the concept of <em> huge pages </em> and the Linux-specific
2459 implementation of <em> transparent </em> huge pages. Discuss the pros
2460 and cons of huge tables and explain the workloads which would benefit
2461 from a setup with huge pages enabled.
2462
2463 »)
2464
2465 HOMEWORK(«
2466 <ul>
2467         <li> Explain the concept of <em>address space layout randomization</em>
2468         (ASLR). </li>
2469
2470         <li> Run <code>bash -c</code> '<code>cat /proc/$$/maps</code>'
2471         repeatedly to see address space layout randomization in action. Discuss
2472         the pros and cons of ASLR. </li>
2473 </ul>
2474 »)
2475
2476 SUPPLEMENTS()
2477
2478 SUBSECTION(«cd_script»)
2479
2480 <pre>
2481         #!/bin/sh
2482         echo "changing CWD to $1"
2483         cd "$1"
2484 </pre>
2485
2486 SUBSECTION(«hello_world»)
2487
2488 <pre>
2489         #!/bin/sh
2490         echo "hello world"
2491 </pre>
2492
2493 SUBSECTION(«symlink_madness»)
2494
2495 <pre>
2496         #!/bin/sh
2497         mkdir foo
2498         touch foo/a
2499         ln -s ../foo foo/testdir
2500         ls -l foo/a foo/testdir/a foo/testdir/testdir/a
2501 </pre>
2502
2503 SECTION(«Further Reading»)
2504 <ul>
2505         <li> <a
2506         href="http://www.catb.org/~esr/writings/taoup/html/ch02s01.html">Origins
2507         and History of Unix, 1969-1995</a> by Eric Steven Raymond. </li>
2508
2509         <li> <a
2510         href="https://www.newyorker.com/business/currency/the-gnu-manifesto-turns-thirty">
2511         The GNU Manifesto Turns Thirty</a>, by Maria Bustillos. </li>
2512
2513         <li> <a
2514         href="http://www.catb.org/~esr/writings/unix-koans/end-user.html">
2515         The Koan of Master Foo and the End User</a>. </li>
2516
2517         <li> <a href="https://lwn.net/Articles/411845/">Ghosts of Unix Past:
2518         a historical search for design patterns</a>, by Neil Brown. </li>
2519
2520         <li> W. Richard Stevens: Advanced Programming in the Unix
2521         Environment. Addison Wesley. </li>
2522
2523 </ul>