which represents a file or a directory. Dentries contain pointers to
the corresponding inode and to the parent dentry. The vfs maintains
the <em>dentry cache</em>, which is independent of the normal page
-cache that keeps copies of file contents in memory. Dentries are kept
-in hashed lists to make directory lookups fast. </p>
+cache that keeps copies of file contents in memory. Dentries are
+kept in hashed lists to make directory lookups fast. Dentries are
+also reference-counted. As long as there is a reference on a dentry,
+it cannot be pruned from the dentry cache. Unreferenced dentries,
+however, can be evicted from the cache at any time due to memory
+pressure. Each dentry also has a "looked up" flag, which enables the
+VFS to evict dentries that have never been looked up before it evicts
+those that have. </p>
<p> On a busy system the dentry cache changes frequently. For example,
file creation, removal and rename all trigger an update of the dentry
-cache. Moreover, memory pressure can cause dentries to be evicted from
-the cache at any time. Clearly, some sort of coordination is needed to
-keep the dentry cache consistent in view of concurrent changes, like
-a file being deleted on one CPU and looked up on another. A global
-lock would scale very poorly, so a more sophisticated method called
-<em>RCU-walk</em> is employed. With RCU, lookups can be performed
-without taking locks, and read operations can proceed in parallel
-with concurrent writers. </p>
+cache. Clearly, some sort of coordination is needed to keep the dentry
+cache consistent in view of concurrent changes, like a file being
+deleted on one CPU and looked up on another. A global lock would scale
+very poorly, so a more sophisticated method called <em>RCU-walk</em> is
+employed. With RCU, lookups can be performed without taking locks, and
+read operations can proceed in parallel with concurrent writers. </p>
<p> The dentry cache also contains <em>negative</em> entries
which represent nonexistent paths which were recently looked up
unsuccessfully. When a user space program tries to access such a path
again, the <code>ENOENT</code> error can be returned without involving
the filesystem. Since lookups of nonexistent files happen frequently,
-failing such lookups quickly enhances performance. Naturally, negative
-dentries do not point to any inode. </p>
-
-<p> Dentries are reference-counted. As long as there is a reference
-on a dentry, it can not be pruned from the dentry cache. </p>
+failing such lookups quickly enhances performance. For example,
+<code>import</code> statements from interpreted languages like Python
+benefit from the negative entries of the dentry cache because the
+requested files have to be looked up in several directories. Naturally,
+negative dentries do not point to any inode. </p>
SUBSECTION(«File and Inode Objects»)
accounted identically to other blocks, making files appear to use
more data blocks than expected. </p>
+<p> If the system crashes while preallocated post-EOF blocks exist,
+the space is recovered by the normal reclaim mechanism the next time
+the affected file is opened and closed again. </p>
+
SUBSECTION(«Reverse Mapping»)
<p> This feature was implemented in 2018. It adds yet another B-tree to
The client periodically talks to the server to update its leases. </p>
<p> There is a deep interaction between file handles and the dentry
-cache of the vfs. Without nfs, a filesystems can rely on the following
+cache of the vfs. Without nfs, a filesystem can rely on the following
"closure" property: For any positive dentry, all its parent directories
are also positive dentries. This is no longer true if a filesystem
is exported. Therefore the filesystem maps any file handles sent to
guaranteed to see the write operation of the blue client while there
is no such guarantee for the red client. </p>
-SUBSECTION(«Delegations»)
-
-nfs4 introduced a feature called <em>file delegation</em>. A file
-delegation allows the client to treat a file temporarily as if no
-other client is accessing it. Once a file has been delegated to a
-client, the client might cache all write operations for this file,
-and only contact the server when memory pressure forces the client
-to free memory by writing back file contents. The server notifies
-the client if another client attempts to access that file.
+SUBSECTION(«File and Directory Delegations»)
+
+<p> nfs4 introduced a per-file state management feature called
+<em>file delegation</em>. Once a file has been delegated to a client,
+the server blocks write access to the file for other nfs clients and
+for local processes. Therefore the client may assume that the file
+does not change unexpectedly. This cache-coherency guarantee can
+improve performance because the client may cache all write operations
+for this file, and only contact the server when memory pressure forces
+the client to free memory by writing back file contents. </p>
+
+<p> A drawback of file delegations is that they delay conflicting open
+requests by other clients because existing delegations must be recalled
+before the open request completes. This is particularly important if
+an nfs client which is holding a delegation gets disconnected from
+the network. To detect this condition, clients report to the server
+that they are still alive by periodically sending a <code>RENEW</code>
+request. If no such request has arrived for the <em>lease time</em>
+(typically 90 seconds), the server may recall any delegations it
+has granted to the disconnected client. This allows accesses from
+other clients that would normally be prevented because of the
+delegation. </p>
+
+<p> However, the server is not obliged to recall <em>uncontested</em>
+delegations for clients whose lease period has expired. In fact,
+newer Linux NFS server implementations retain the uncontested
+state of unresponsive clients for up to 24 hours. This so-called
+<em>courteous server</em> feature was introduced in Linux-5.19
+(released in 2022). </p>
+
+<p> Let us finally remark that the delegations as discussed above
+work only for regular files. NFS versions up to and including 4.0
+do not grant delegations for directories. With nfs4.1 an nfs client
+may ask the server to be notified whenever changes are made to the
+directory by another client. Among other benefits, this feature allows
+for <em>strong directory cache coherency</em>. However, as of 2022,
+directory delegations are not yet implemented by Linux. </p>
SUBSECTION(«Silly Renames and Stale File Handles»)
to the thusly silly-renamed file is closed, the client removes the
file by issuing an appropriate rpc. </p>
-<p> This approach is not perfect. For one, if the client crashes,
-a stale <code>.nfs12345</code> file remains on the server. Second,
-since silly renames are only known to the nfs client, bad things
-happen if a different client removes the file. </p>
-
+<p> This approach is not perfect. For one, if the client crashes, a
+stale <code>.nfs12345</code> file remains on the server. Second, since
+silly renames are only known to the nfs client, bad things happen if a
+different client removes the file. Finally, if an application running
+on a client removes the last regular file in a directory, and this
+file got silly-renamed because it was still held open, a subsequent
+<code>rmdir</code> will fail unexpectedly with <code>Directory not
+empty</code>. Version 4.1 of the NFS protocol finally got rid of
+silly renames: An NFS4.1 server knows when it is safe to unlink a
+file and communicates this information to the client. </p>
<p> The file handle which an nfs client received through some earlier
-rpc can become invalid at any time due to operations on a different
+rpc can become invalid at any time due to operations on different
hosts. This happens, for example, if the file was deleted on the server
or on a different nfs client, or when the directory that contains
the file is no longer exported by the server due to a configuration
<li> On an nfs server, run <code>collectl -s F -i 5</code> and discuss
the output. </li>
- <li> In an nfs-mounted directory, run <code>cat > foo &</code>. Note
- that the cat process automatically receives the STOP signal.
- Run <code>rm foo; ls -ltra</code>. Read section D2 of the
- <a href="http://nfs.sourceforge.net/">nfs HOWTO</a> for the
- explanation. </li>
+ <li> In an nfs-mounted directory (nfs version 4.0 or earlier), run
+ <code>cat > foo &</code>. Note that the cat process automatically
+ receives the STOP signal. Run <code>rm foo; ls -ltra</code>. Read
+ section D2 of the <a href="https://nfs.sourceforge.net/">nfs HOWTO</a>
+ for the explanation. </li>
<li> In an nfs-mounted directory, run <code>{ while :; do echo; sleep
1; done; } > baz &</code>. What happens if you remove the file on a
<li> Discuss the pros and cons of hard vs. soft mounts. </li>
- <li> Read section A10 of the <a href="http://nfs.sourceforge.net/">nfs
+ <li> Read section A10 of the <a href="https://nfs.sourceforge.net/">nfs
HOWTO</a> to learn about common reasons for stale nfs handles. </li>
<li> Can every local filesystem be exported via nfs? </li>
<ul>
<li> <code>Documentation/filesystems/vfs.txt</code> of the Linux
kernel source. </li>
+ <li> Jonathan Corbet: <a href="https://lwn.net/Articles/419811/">
+ Dcache scalability and RCU-walk</a>. An LWN article which explains
+ the dcache in some more detail. </li>
<li> Dominic Giampaolo: Practical File System Design </li>
<li> Cormen </li>
<li> Darrick Wong: XFS Filesystem Disk Structures </li>
- <li> The <a href="https://xfs.org/index.php/Main_Page">xfs FAQ</a> </li>
<li> Documentation/filesystems/path-lookup.rst </li>
<li> rfc 5531: Remote Procedure Call Protocol, Version 2 (2009) </li>
<li> Birell, A.D. and Nelson, B.J.: Implementing Remote Procedure Calls
(1984) </li>
+ <li> <a href="https://lwn.net/Articles/897917/">NFS: the early
+ years</a> and <a href="https://lwn.net/Articles/898262/">NFS: the new
+ millennium</a>, two articles on the design and history of NFS by Neil
+ Brown. </li>
</ul>