X-Git-Url: http://git.tuebingen.mpg.de/?p=aple.git;a=blobdiff_plain;f=Filesystems.m4;h=1a77ef882e9ebb6c412380add1c2787acc32a291;hp=0e573d3d44ba3cd11499dae6e357f5451503931e;hb=HEAD;hpb=3193680276da2e0997c64cdf9a0f5874218237a5 diff --git a/Filesystems.m4 b/Filesystems.m4 index 0e573d3..40dedd1 100644 --- a/Filesystems.m4 +++ b/Filesystems.m4 @@ -543,30 +543,34 @@ SUBSECTION(«The Dentry Cache») which represents a file or a directory. Dentries contain pointers to the corresponding inode and to the parent dentry. The vfs maintains the dentry cache, which is independent of the normal page -cache that keeps copies of file contents in memory. Dentries are kept -in hashed lists to make directory lookups fast.

+cache that keeps copies of file contents in memory. Dentries are +kept in hashed lists to make directory lookups fast. Dentries are +also reference-counted. As long as there is a reference on a dentry, +it cannot be pruned from the dentry cache. Unreferenced dentries, +however, can be evicted from the cache at any time due to memory +pressure. Each dentry also has a "looked up" flag which enables the +VFS to evict dentries which have never been looked up before it +evicts those which have.
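The eviction rules described here can be illustrated with a small toy model. This is only a sketch: the class `ToyDentryCache` and its methods are hypothetical names chosen for illustration and do not correspond to kernel data structures or functions. The model pins referenced entries and, under simulated memory pressure, evicts entries that were never looked up before those that were.

```python
class ToyDentryCache:
    """Toy model of the eviction policy; illustrative only, not kernel code."""

    def __init__(self):
        # path -> {"refs": reference count, "looked_up": flag}
        self.entries = {}

    def add(self, path):
        """Insert a new, unreferenced entry that was never looked up."""
        self.entries[path] = {"refs": 0, "looked_up": False}

    def lookup(self, path):
        """Return True on a cache hit and set the "looked up" flag."""
        entry = self.entries.get(path)
        if entry is None:
            return False
        entry["looked_up"] = True
        return True

    def get(self, path):
        """Take a reference; a referenced entry cannot be evicted."""
        self.entries[path]["refs"] += 1

    def put(self, path):
        """Drop a reference taken with get()."""
        self.entries[path]["refs"] -= 1

    def shrink(self, count):
        """Simulate memory pressure: evict up to `count` unreferenced
        entries, those never looked up first. Returns the evicted paths."""
        candidates = [p for p, e in self.entries.items() if e["refs"] == 0]
        # False sorts before True, so never-looked-up entries come first.
        candidates.sort(key=lambda p: self.entries[p]["looked_up"])
        evicted = candidates[:count]
        for path in evicted:
            del self.entries[path]
        return evicted
```

In this model an entry survives any number of `shrink()` calls for as long as a reference obtained with `get()` has not been released with `put()`.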

On a busy system the dentry cache changes frequently. For example, file creation, removal and rename all trigger an update of the dentry -cache. Moreover, memory pressure can cause dentries to be evicted from -the cache at any time. Clearly, some sort of coordination is needed to -keep the dentry cache consistent in view of concurrent changes, like -a file being deleted on one CPU and looked up on another. A global -lock would scale very poorly, so a more sophisticated method called -RCU-walk is employed. With RCU, lookups can be performed -without taking locks, and read operations can proceed in parallel -with concurrent writers.

+cache. Clearly, some sort of coordination is needed to keep the dentry +cache consistent in view of concurrent changes, like a file being +deleted on one CPU and looked up on another. A global lock would scale +very poorly, so a more sophisticated method called RCU-walk is +employed. With RCU, lookups can be performed without taking locks, and +read operations can proceed in parallel with concurrent writers.

The dentry cache also contains negative entries which represent nonexistent paths which were recently looked up unsuccessfully. When a user space program tries to access such a path again, the ENOENT error can be returned without involving the filesystem. Since lookups of nonexistent files happen frequently, -failing such lookups quickly enhances performance. Naturally, negative -dentries do not point to any inode.

- -

Dentries are reference-counted. As long as there is a reference -on a dentry, it can not be pruned from the dentry cache.

+failing such lookups quickly enhances performance. For example +import statements from interpreted languages like Python +benefit from the negative entries of the dentry cache because the +requested files have to be looked up in several directories. Naturally, +negative dentries do not point to any inode.
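To see why import-style lookups produce many failing path lookups, consider the following minimal sketch. It is not how the Python import machinery is implemented; the function `probe_search_path` and the suffix list are assumptions made for illustration. It probes each directory of a search path for candidate file names and counts how many probes hit a nonexistent path. Each such miss is a lookup that a negative dentry can answer on subsequent runs.

```python
import os


def probe_search_path(name, search_path, suffixes=(".py", ".so")):
    """Return (found, misses): the first existing candidate path (or None)
    and the number of probed paths that did not exist."""
    misses = 0
    for directory in search_path:
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.exists(candidate):
                return candidate, misses
            misses += 1  # a failed lookup of a nonexistent path
    return None, misses
```

With three directories, two suffixes and the file located in the last directory, four nonexistent paths are probed before the lookup succeeds.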

SUBSECTION(«File and Inode Objects») @@ -951,6 +955,11 @@ file fragmentation, but they can cause confusion because they are accounted identically to other blocks, making files appear to use more data blocks than expected.

+

If the system crashes while preallocated post-EOF blocks exist, +the space will be recovered by the normal reclaim mechanism the next +time the affected file is closed (after it has been opened, of +course).
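The accounting of preallocated blocks can be observed from user space by comparing the apparent size of a file with the space allocated to it. The following is a minimal sketch, assuming Linux, where `st_blocks` counts units of 512 bytes. The helper name `block_usage` is hypothetical. Whether the two numbers differ for a given file depends on the filesystem and its current preallocation state.

```python
import os


def block_usage(path):
    """Return (apparent_size, allocated_bytes) for the given file."""
    st = os.stat(path)
    # Assumption: st_blocks is reported in 512-byte units (true on Linux).
    allocated = st.st_blocks * 512
    return st.st_size, allocated
```

A file whose allocated size is noticeably larger than its apparent size may be holding blocks beyond EOF, although other causes such as block-size rounding lead to smaller differences as well.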

+ SUBSECTION(«Reverse Mapping»)

This feature was implemented in 2018. It adds yet another B-tree to @@ -1195,7 +1204,7 @@ re-used subsequently. File handles are based on leases: The client periodically talks to the server to update its leases.

There is a deep interaction between file handles and the dentry -cache of the vfs. Without nfs, a filesystems can rely on the following +cache of the vfs. Without nfs, a filesystem can rely on the following "closure" property: For any positive dentry, all its parent directories are also positive dentries. This is no longer true if a filesystem is exported. Therefore the filesystem maps any file handles sent to @@ -1307,15 +1316,43 @@ operations. With close-to-open cache consistency the green client is guaranteed to see the write operation of the blue client while there is no such guarantee for the red client.

-SUBSECTION(«Delegations») - -nfs4 introduced a feature called file delegation. A file -delegation allows the client to treat a file temporarily as if no -other client is accessing it. Once a file has been delegated to a -client, the client might cache all write operations for this file, -and only contact the server when memory pressure forces the client -to free memory by writing back file contents. The server notifies -the client if another client attempts to access that file. +SUBSECTION(«File and Directory Delegations») + +

nfs4 introduced a per-file state management feature called +file delegation. Once a file has been delegated to a client, +the server blocks write access to the file for other nfs clients and +for local processes. Therefore the client may assume that the file +does not change unexpectedly. This cache-coherency guarantee can +improve performance because the client may cache all write operations +for this file, and only contact the server when memory pressure forces +the client to free memory by writing back file contents.

+ +

A drawback of file delegations is that they delay conflicting open +requests by other clients because existing delegations must be recalled +before the open request completes. This is particularly important if +an nfs client which is holding a delegation gets disconnected from +the network. To detect this condition, clients report to the server +that they are still alive by periodically sending a RENEW +request. If no such request has arrived for the lease time +(typically 90 seconds), the server may recall any delegations it +has granted to the disconnected client. This allows accesses from +other clients that would normally be prevented because of the +delegation.

+ +

However, the server is not obliged to recall uncontested +delegations for clients whose lease period has expired. In fact, +newer Linux NFS server implementations retain the uncontested +state of unresponsive clients for up to 24 hours. This so-called +courteous server feature was introduced in Linux-5.19 +(released in 2022).

+ +

Let us finally remark that the delegations as discussed above +work only for regular files. NFS versions up to and including 4.0 +do not grant delegations for directories. With nfs4.1 an nfs client +may ask the server to be notified whenever changes are made to the +directory by another client. Among other benefits, this feature allows +for strong directory cache coherency. However, as of 2022, +directory delegations are not yet implemented by Linux.

SUBSECTION(«Silly Renames and Stale File Handles») @@ -1336,14 +1373,19 @@ is still open. Only after the last file descriptor that refers to the thusly silly-renamed file is closed, the client removes the file by issuing an appropriate rpc.

-

This approach is not perfect. For one, if the client crashes, -a stale .nfs12345 file remains on the server. Second, -since silly renames are only known to the nfs client, bad things -happen if a different client removes the file.

- +

This approach is not perfect. For one, if the client crashes, a +stale .nfs12345 file remains on the server. Second, since +silly renames are only known to the nfs client, bad things happen if a +different client removes the file. Finally, if an application running +on a client removes the last regular file in a directory, and this +file got silly-renamed because it was still held open, a subsequent +rmdir will fail unexpectedly with Directory not +empty. Version 4.1 of the NFS protocol finally got rid of +silly renames: An NFS4.1 server knows when it is safe to unlink a +file and communicates this information to the client.
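The rmdir failure described here can be reproduced with a short sketch. The function name `remove_open_file_then_rmdir` and the file names are assumptions made for illustration. On a local POSIX filesystem the rmdir is expected to succeed, while in a directory mounted via nfs version 4.0 or earlier the silly-renamed file is expected to make it fail with `ENOTEMPTY`.

```python
import errno
import os


def remove_open_file_then_rmdir(parent):
    """Unlink a file that is still open, then try to remove its directory.

    Returns "ok" if rmdir succeeded, otherwise the symbolic errno name."""
    directory = os.path.join(parent, "dir")
    os.mkdir(directory)
    path = os.path.join(directory, "file")
    with open(path, "w") as f:
        f.write("data\n")
        os.unlink(path)  # the file is still held open at this point
        try:
            os.rmdir(directory)
            return "ok"
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
```

Calling the function with a path below an nfs mount point versus a local directory shows the difference in behaviour.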

The file handle which an nfs client received through some earlier -rpc can become invalid at any time due to operations on a different +rpc can become invalid at any time due to operations on different hosts. This happens, for example, if the file was deleted on the server or on a different nfs client, or when the directory that contains the file is no longer exported by the server due to a configuration @@ -1387,11 +1429,11 @@ EXERCISES()

  • On an nfs server, run collectl -s F -i 5 and discuss the output.
  • -
  • In an nfs-mounted directory, run cat > foo &. Note - that the cat process automatically receives the STOP signal. - Run rm foo; ls -ltra. Read section D2 of the - nfs HOWTO for the - explanation.
  • +
  • In an nfs-mounted directory (nfs version 4.0 or earlier), run + cat > foo &. Note that the cat process automatically + receives the STOP signal. Run rm foo; ls -ltra. Read + section D2 of the nfs HOWTO + for the explanation.
  • In an nfs-mounted directory, run { while :; do echo; sleep 1; done; } > baz &. What happens if you remove the file on a @@ -1399,7 +1441,7 @@ EXERCISES()
  • Discuss the pros and cons of hard vs. soft mounts.
  • -
  • Read section A10 of the nfs +
  • Read section A10 of the nfs HOWTO to learn about common reasons for stale nfs handles.
  • Can every local filesystem be exported via nfs?
  • @@ -1466,9 +1508,12 @@ SECTION(«Further Reading»)
  • Dominic Giampaolo: Practical File System Design
  • Cormen
  • Darrick Wong: XFS Filesystem Disk Structures
  • -
  • The xfs FAQ
  • Documentation/filesystems/path-lookup.rst
  • rfc 5531: Remote Procedure Call Protocol, Version 2 (2009)
  • Birrell, A.D. and Nelson, B.J.: Implementing Remote Procedure Calls (1984)
  • +
  • NFS: the early + years and NFS: the new + millennium, two articles on the design and history of NFS by Neil + Brown.