[PATCH 3/4] Rework the CacheFS documentation to reflect FS-Cache split

From: David Howells (dhowells_at_redhat.com)
Date: 10/06/04

    To: akpm@osdl.org
    Date:	Wed, 06 Oct 2004 17:24:25 +0100
    
    

    The attached patch reworks the CacheFS documentation to reflect the new split
    between CacheFS and FS-Cache.

    Signed-Off-By: David Howells <dhowells@redhat.com>

    ---
    warthog1>diffstat fscache-docs-269rc3mm2.diff 
     cachefs.txt             |  881 ------------------------------------------------
     caching/backend-api.txt |  317 +++++++++++++++++
     caching/cachefs.txt     |  274 ++++++++++++++
     caching/fscache.txt     |   94 +++++
     caching/netfs-api.txt   |  583 +++++++++++++++++++++++++++++++
     5 files changed, 1268 insertions(+), 881 deletions(-)
    diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/cachefs.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/cachefs.txt
    --- linux-2.6.9-rc3-mm2/Documentation/filesystems/cachefs.txt	2004-10-05 10:38:12.000000000 +0100
    +++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/cachefs.txt	1970-01-01 01:00:00.000000000 +0100
    @@ -1,892 +0,0 @@
    -			  ===========================
    -			  CacheFS: Caching Filesystem
    -			  ===========================
    -
    -========
    -OVERVIEW
    -========
    -
    -CacheFS is a general purpose cache for network filesystems, though it could be
    -used for caching other things such as ISO9660 filesystems too.
    -
    -CacheFS uses a block device directly rather than a bunch of files under an
    -already mounted filesystem. For why this is so, see further on. If necessary,
    -however, a file can be loopback mounted as a cache.
    -
    -CacheFS does not follow the idea of completely loading every netfs file opened
    -into the cache before it can be operated upon, and then serving the pages out
    -of CacheFS rather than the netfs because:
    -
    - (1) It must be practical to operate without a cache.
    -
    - (2) The size of any accessible file must not be limited to the size of the
    -     cache.
    -
    - (3) The combined size of all opened files (this includes mapped libraries)
    -     must not be limited to the size of the cache.
    -
    - (4) The user should not be forced to download an entire file just to do a
    -     one-off access of a small portion of it.
    -
    -It rather serves the cache out in PAGE_SIZE chunks as and when requested by
    -the netfs('s) using it.
    -
    -
    -CacheFS provides the following facilities:
    -
    - (1) More than one block device can be mounted as a cache.
    -
    - (2) Caches can be mounted / unmounted at any time.
    -
    - (3) The netfs is provided with an interface that allows either party to
    -     withdraw caching facilities from a file (required for (2)).
    -
    - (4) The interface to the netfs returns as few errors as possible, preferring
    -     rather to let the netfs remain oblivious.
    -
    - (5) Cookies are used to represent files and indexes to the netfs. The simplest
    -     cookie is just a NULL pointer - indicating nothing cached there.
    -
    - (6) The netfs is allowed to propose - dynamically - any index hierarchy it
    -     desires, though it must be aware that the index search function is
    -     recursive and stack space is limited.
    -
    - (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
    -     that page A is at index B of the data-file represented by cookie C, and
    -     that it should be read or written. CacheFS may or may not start I/O on
    -     that page, but if it does, a netfs callback will be invoked to indicate
    -     completion.
    -
    - (8) Cookies can be "retired" upon release. At this point CacheFS will mark
    -     them as obsolete and the index hierarchy rooted at that point will get
    -     recycled.
    -
    - (9) The netfs provides a "match" function for index searches. In addition to
    -     saying whether a match was made or not, this can also specify that an
    -     entry should be updated or deleted.
    -
    -(10) All metadata modifications (this includes index contents) are performed
    -     as journalled transactions. These are replayed on mounting.
    -
    -
    -=============================================
    -WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
    -=============================================
    -
    -CacheFS is backed by a block device rather than being backed by a bunch of
    -files on a filesystem. This confers several advantages:
    -
    - (1) Performance.
    -
    -     Going directly to a block device means that we can DMA directly to/from
    -     the netfs's pages. If another filesystem was managing the backing
    -     store, everything would have to be copied between pages. Whilst DirectIO
    -     does exist, it doesn't appear easy to make use of in this situation.
    -
    -     New address space or file operations could be added to make it possible to
    -     persuade a backing discfs to generate block I/O directly to/from disc
    -     blocks under its control, but that then means the discfs has to keep track
    -     of I/O requests to pages not under its control.
    -
    -     Furthermore, we only have to do one lot of readahead calculations, not
    -     two; in the discfs backing case, the netfs would do one and the discfs
    -     would do one.
    -
    - (2) Memory.
    -
    -     Using a block device means that we have a lower memory usage - all data
    -     pages belong to the netfs we're backing. If we used a filesystem, we would
    -     have twice as many pages at certain points - one from the netfs and one
    -     from the backing discfs. In the backing discfs model, under situations of
    -     memory pressure, we'd have to allocate or keep around a discfs page to be
    -     able to write out a netfs page; or else we'd need to be able to punch a
    -     hole in the backing file.
    -
    -     Furthermore, whilst we have to keep a CacheFS inode around in memory for
    -     every netfs inode we're backing, a backing discfs would have to keep the
    -     dentry and possibly a file struct too.
    -
    - (3) Holes.
    -
    -     The cache uses holes to indicate to the netfs that it hasn't yet
    -     downloaded the data for that page.
    -
    -     Since CacheFS is its own filesystem, it can support holes in files
    -     trivially. Running on top of another discfs would limit us to using ones
    -     that can support holes.
    -
    -     Furthermore, it would have to be made possible to detect holes in a discfs
    -     file, rather than just seeing zero filled blocks.
    -
    - (4) Data Consistency.
    -
    -     CacheFS uses a pair of journals to keep track of the state of the cache
    -     and all the pages contained therein. This means that it doesn't get into
    -     an inconsistent state in the on-disc cache and it doesn't lose disc space.
    -
    -     CacheFS takes especial care between the allocation of a block and its
    -     splicing into the on-disc pointer tree, and the data having been written
    -     to disc. If power is interrupted and then restored, the journals are
    -     replayed and if it is seen that a block was allocated but not written it
    -     is then punched out. Being backed by a discfs, I'm not certain what will
    -     happen. It may well be possible to mark a discfs's journal, if it has one,
    -     but how does the discfs deal with those marks? This also limits consistent
    -     caching to running on journalled discfs's where there's a function to
    -     write extraordinary marks into the journal.
    -
    -     The alternative would be to keep flags in the superblock, and to
    -     re-initialise the cache if it wasn't cleanly unmounted.
    -
    -     Knowing that your cache is in a good state is vitally important if you,
    -     say, put /usr on AFS. Some organisations put everything barring /etc,
    -     /sbin, /lib and /var on AFS and have an enormous cache on every
    -     computer. Imagine if the power goes out and renders every cache
    -     inconsistent, requiring all the computers to re-initialise their caches
    -     when the power comes back on...
    -
    - (5) Recycling.
    -
    -     Recycling is simple on CacheFS. It can just scan the metadata index to
    -     look for inodes that require reclamation/recycling; and it can also build
    -     up a list of the least recently used inodes so that they can be reclaimed
    -     later to make space.
    -
    -     Doing this on a discfs would require a search going down through a nest
    -     of directories, and would probably have to be done in userspace.
    -
    - (6) Disc Space.
    -
    -     Whilst the block device does set a hard ceiling on the amount of space
    -     available, CacheFS can guarantee that all that space will be available to
    -     the cache. On a discfs-backed cache, the administrator would probably want
    -     to set a cache size limit, but the system wouldn't be able to guarantee that
    -     all that space would be available to the cache - not unless that cache was
    -     on a partition of its own.
    -
    -     Furthermore, with a discfs-backed cache, if the recycler starts to reclaim
    -     cache files to make space, the freed blocks may just be eaten directly by
    -     userspace programs, potentially resulting in the entire cache being
    -     consumed. Alternatively, netfs operations may end up being held up because
    -     the cache can't get blocks on which to store the data.
    -
    - (7) Users.
    -
    -     Users can't so easily go into CacheFS and run amok. The worst they can do
    -     is cause bits of the cache to be recycled early. With a discfs-backed
    -     cache, they can do all sorts of bad things to the files belonging to the
    -     cache, and they can do this quite by accident.
    -
    -
    -On the other hand, there would be some advantages to using a file-based cache
    -rather than a blockdev-based cache:
    -
    - (1) Having to copy to a discfs's page would mean that a netfs could just make
    -     the copy and then assume its own page is ready to go.
    -
    - (2) Backing onto a discfs wouldn't require a committed block device. You would
    -     just nominate a directory and go from there. With CacheFS you have to
    -     repartition or install an extra drive to make use of it in an existing
    -     system (though the loopback device offers a way out).
    -
    - (3) CacheFS requires the netfs to store a key in any pertinent index entry,
    -     and it also permits a limited amount of arbitrary data to be stored there.
    -
    -     A discfs could be requested to store the netfs's data in xattrs, and the
    -     filename could be used to store the key, though the key would have to be
    -     rendered as text not binary. Likewise indexes could be rendered as
    -     directories with xattrs.
    -
    - (4) You could easily make your cache bigger if the discfs has plenty of space,
    -     you could even go across multiple mountpoints.
    -
    -
    -======================
    -GENERAL ON-DISC LAYOUT
    -======================
    -
    -The filesystem is divided into a number of parts:
    -
    -  0	+---------------------------+
    -	|        Superblock         |
    -  1	+---------------------------+
    -	|      Update Journal       |
    -	+---------------------------+
    -	|     Validity Journal      |
    -	+---------------------------+
    -	|    Write-Back Journal     |
    -	+---------------------------+
    -	|                           |
    -	|           Data            |
    -	|                           |
    - END	+---------------------------+
    -
    -The superblock contains the filesystem ID tags and pointers to all the other
    -regions.
    -
    -The update journal consists of a set of entries of sector size that keep track
    -of what changes have been made to the on-disc filesystem, but not yet
    -committed.
    -
    -The validity journal contains records of data blocks that have been allocated
    -but not yet written. Upon journal replay, all these blocks will be detached
    -from their pointers and recycled.
    -
    -The writeback journal keeps track of changes that have been made locally to
    -data blocks, but that have not yet been committed back to the server. This is
    -not yet implemented.
    -
    -The journals are replayed upon mounting to make sure that the cache is in a
    -reasonable state.
    -
    -The data region holds a number of things:
    -
    -  (1) Index Files
    -
    -      These are files of entries used by CacheFS internally and by filesystems
    -      that wish to cache data here (such as AFS) to keep track of what's in
    -      the cache at any given time.
    -
    -      The first index file (inode 1) is special. It holds the CacheFS-specific
    -      metadata for every file in the cache (including direct, single-indirect
    -      and double-indirect block pointers).
    -
    -      The second index file (inode 2) is also special. It has an entry for
    -      each filesystem that's currently holding data in this cache.
    -
    -      Every allocated entry in an index has an inode bound to it. This inode is
    -      either another index file or it is a data file.
    -
    -  (2) Cached Data Files
    -
    -      These are caches of files from remote servers. Holes in these files
    -      represent blocks not yet obtained from the server.
    -
    -  (3) Indirection Blocks
    -
    -      Should a file have more blocks than can be pointed to by the few
    -      pointers in its storage management record, then indirection blocks will
    -      be used to point to further data or indirection blocks.
    -
    -      Two levels of indirection are currently supported:
    -
    -	- single indirection
    -	- double indirection
    -
    -  (4) Allocation Nodes and Free Blocks
    -
    -      The free blocks of the filesystem are kept in two single-branched
    -      "trees". One tree is the blocks that are ready to be allocated, and the
    -      other is the blocks that have just been recycled. When the former tree
    -      becomes empty, the latter tree is decanted across.
    -
    -      Each tree is arranged as a chain of "nodes", each node points to the next
    -      node in the chain (unless it's at the end) and also up to 1022 free
    -      blocks.
    -
    -Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
    -with the superblock at 0. Using 32-bit block pointers, a maximum number of
    -0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB
    -for 4KB pages.
    -
    -
    -========
    -MOUNTING
    -========
    -
    -Since CacheFS is actually a quasi-filesystem, it requires a block device behind
    -it. The way to give it one is to mount it as cachefs type on a directory
    -somewhere. The mounted filesystem will then present the user with a set of
    -directories outlining the index structure resident in the cache. Indexes
    -(directories) and files can be turfed out of the cache by the sysadmin through
    -the use of rmdir and unlink.
    -
    -For instance, if a cache contains AFS data, the user might see the following:
    -
    -	root>mount -t cachefs /dev/hdg9 /cache-hdg9
    -	root>ls -1 /cache-hdg9
    -	afs
    -	root>ls -1 /cache-hdg9/afs
    -	cambridge.redhat.com
    -	root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
    -	root.afs
    -	root.cell
    -
    -However, a block device that's going to be used for a cache must be prepared
    -before it can be mounted initially. This is done very simply by:
    -
    -	echo "cachefs___" >/dev/hdg9
    -
    -During the initial mount, the basic structure will be scribed into the cache,
    -and then a background thread will "recycle" the as-yet unused data blocks.
    -
    -
    -======================
    -NETWORK FILESYSTEM API
    -======================
    -
    -There is, of course, an API by which a network filesystem can make use of the
    -CacheFS facilities. This is based around a number of principles:
    -
    - (1) Every file and index is represented by a cookie. This cookie may or may
    -     not have anything associated with it, but the netfs doesn't need to care.
    -
    - (2) Barring the top-level index (one entry per cached netfs), the index
    -     hierarchy for each netfs is structured according to the whim of the netfs.
    -
    - (3) Any netfs page being backed by the cache must have a small token
    -     associated with it (possibly pointed to by page->private) so that CacheFS
    -     can keep track of it.
    -
    -This API is declared in <linux/cachefs.h>.
    -
    -
    -NETWORK FILESYSTEM DEFINITION
    ------------------------------
    -
    -CacheFS needs a description of the network filesystem. This is specified using
    -a record of the following structure:
    -
    -	struct cachefs_netfs {
    -		const char			*name;
    -		unsigned			version;
    -		struct cachefs_netfs_operations	*ops;
    -		struct cachefs_cookie		*primary_index;
    -		...
    -	};
    -
    -The first three fields should be filled in before registration, and the fourth
    -will be filled in by the registration function; any other fields should just be
    -ignored and are for internal use only.
    -
    -The fields are:
    -
    - (1) The name of the netfs (used as the key in the toplevel index).
    -
    - (2) The version of the netfs (if the name matches but the version doesn't, the
    -     entire on-disc hierarchy for this netfs will be scrapped and begun
    -     afresh).
    -
    - (3) The operations table is defined as follows:
    -
    -	struct cachefs_netfs_operations {
    -		struct cachefs_page *(*get_page_cookie)(struct page *page);
    -	};
    -
    -     The functions here must all be present. Currently the only one is:
    -
    -     (a) get_page_cookie(): Get the token used to bind a page to a block in a
    -         cache. This function should allocate it if it doesn't exist.
    -
    -	 Return -ENOMEM if there's not enough memory and -ENODATA if the page
    -	 just shouldn't be cached.
    -
    -	 Set *_page_cookie to point to the token and return 0 if there is now a
    -	 cookie. Note that the netfs must keep track of the cookie itself (and
    -	 free it later). page->private can be used for this (see below).
    -
    - (4) The cookie representing the primary index will be allocated according to
    -     another parameter passed into the registration function.
    -
    -For example, kAFS (linux/fs/afs/) uses the following definitions to describe
    -itself:
    -
    -	static struct cachefs_netfs_operations afs_cache_ops = {
    -		.get_page_cookie	= afs_cache_get_page_cookie,
    -	};
    -
    -	struct cachefs_netfs afs_cache_netfs = {
    -		.name			= "afs",
    -		.version		= 0,
    -		.ops			= &afs_cache_ops,
    -	};
    -
    -
    -INDEX DEFINITION
    -----------------
    -
    -Indexes are used for two purposes:
    -
    - (1) To speed up the finding of a file based on a series of keys (such as AFS's
    -     "cell", "volume ID", "vnode ID").
    -
    - (2) To make it easier to discard a subset of all the files cached based around
    -     a particular key - for instance to mirror the removal of an AFS volume.
    -
    -However, since it's unlikely that any two netfs's are going to want to define
    -their index hierarchies in quite the same way, CacheFS tries to impose as few
    -restraints as possible on how an index is structured and where it is placed in
    -the tree. The netfs can even mix indexes and data files at the same level, but
    -it's not recommended.
    -
    -There are some limits on indexes:
    -
    - (1) All entries in any given index must be the same size. An array of such
    -     entries needn't fit exactly into a page, but they will not be laid across
    -     a page boundary.
    -
    -     The netfs supplies a blob of data for each index entry, and CacheFS
    -     provides an inode number and a flag.
    -
    - (2) The entries in one index can be of a different size to the entries in
    -     another index.
    -
    - (3) The entry data must be journallable, and thus must be able to fit into an
    -     update journal entry - this limits the maximum size to a little over 400
    -     bytes at present.
    -
    - (4) The index data must start with the key. The layout of the key is described
    -     in the index definition, and this is used to display the key in some
    -     appropriate way.
    -
    - (5) The depth of the index tree should be judged with care as the search
    -     function is recursive. Too many layers will run the kernel out of stack.
    -
    -To define an index, a structure of the following type should be filled out:
    -
    -	struct cachefs_index_def
    -	{
    -		uint8_t name[8];
    -		uint16_t data_size;
    -		struct {
    -			uint8_t type;
    -			uint16_t len;
    -		} keys[4];
    -
    -		cachefs_match_val_t (*match)(void *target_netfs_data,
    -					     const void *entry);
    -
    -		void (*update)(void *source_netfs_data, void *entry);
    -	};
    -
    -This has the following fields:
    -
    - (1) The name of the index (NUL terminated unless all 8 chars are used).
    -
    - (2) The size of the data blob provided by the netfs.
    -
    - (3) A definition of the key(s) at the beginning of the blob. The netfs is
    -     permitted to specify up to four keys. The total length must not exceed the
    -     data size. It is assumed that the keys will be laid end to end in order,
    -     starting at the first byte of the data.
    -
    -     The type field specifies the way the data should be displayed. It can be
    -     one of:
    -
    -	(*) CACHEFS_INDEX_KEYS_NOTUSED	- key field not used
    -	(*) CACHEFS_INDEX_KEYS_BIN	- display byte-by-byte in hex
    -	(*) CACHEFS_INDEX_KEYS_ASCIIZ	- NUL-terminated ASCII
    -	(*) CACHEFS_INDEX_KEYS_IPV4ADDR	- display as IPv4 address
    -	(*) CACHEFS_INDEX_KEYS_IPV6ADDR	- display as IPv6 address
    -
    - (4) A function to compare an in-page-cache index entry blob with the data
    -     passed to the cookie acquisition function. This function can also be used
    -     to extract data from the blob and copy it into the netfs's structures.
    -
    -     The values this function can return are:
    -
    -	(*) CACHEFS_MATCH_FAILED - failed to match
    -	(*) CACHEFS_MATCH_SUCCESS - successful match
    -	(*) CACHEFS_MATCH_SUCCESS_UPDATE - successful match, entry needs update
    -	(*) CACHEFS_MATCH_SUCCESS_DELETE - entry should be deleted
    -
    -     For example, in linux/fs/afs/vnode.c:
    -
    -	static cachefs_match_val_t
    -	afs_vnode_cache_match(void *target, const void *entry)
    -	{
    -		const struct afs_cache_vnode *cvnode = entry;
    -		struct afs_vnode *vnode = target;
    -
    -		if (vnode->fid.vnode != cvnode->vnode_id)
    -			return CACHEFS_MATCH_FAILED;
    -
    -		if (vnode->fid.unique != cvnode->vnode_unique ||
    -		    vnode->status.version != cvnode->data_version)
    -			return CACHEFS_MATCH_SUCCESS_DELETE;
    -
    -		return CACHEFS_MATCH_SUCCESS;
    -	}
    -
    - (5) A function to initialise or update an in-page-cache index entry blob from
    -     netfs data passed to CacheFS by the netfs. This function should not assume
    -     that there's any data yet in the in-page-cache.
    -
    -     Continuing the above example:
    -
    -	static void afs_vnode_cache_update(void *source, void *entry)
    -	{
    -		struct afs_cache_vnode *cvnode = entry;
    -		struct afs_vnode *vnode = source;
    -
    -		cvnode->vnode_id	= vnode->fid.vnode;
    -		cvnode->vnode_unique	= vnode->fid.unique;
    -		cvnode->data_version	= vnode->status.version;
    -	}
    -
    -To finish the above example, the index definition for the "vnode" level is as
    -follows:
    -
    -	struct cachefs_index_def afs_vnode_cache_index_def = {
    -		.name		= "vnode",
    -		.data_size	= sizeof(struct afs_cache_vnode),
    -		.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 4 },
    -		.match		= afs_vnode_cache_match,
    -		.update		= afs_vnode_cache_update,
    -	};
    -
    -The first element of struct afs_cache_vnode is the vnode ID.
    -
    -And for contrast, the cell index definition is:
    -
    -	struct cachefs_index_def afs_cache_cell_index_def = {
    -		.name			= "cell_ix",
    -		.data_size		= sizeof(afs_cell_t),
    -		.keys[0]		= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
    -		.match			= afs_cell_cache_match,
    -		.update			= afs_cell_cache_update,
    -	};
    -
    -The cell index is the primary index for kAFS.
    -
    -
    -NETWORK FILESYSTEM (UN)REGISTRATION
    ------------------------------------
    -
    -The first step is to declare the network filesystem to the cache. This also
    -involves specifying the layout of the primary index (for AFS, this would be the
    -"cell" level).
    -
    -The registration function is:
    -
    -	int cachefs_register_netfs(struct cachefs_netfs *netfs,
    -				   struct cachefs_index_def *primary_idef);
    -
    -It just takes pointers to the netfs definition and the primary index
    -definition. It returns 0 or an error as appropriate.
    -
    -For kAFS, registration is done as follows:
    -
    -	ret = cachefs_register_netfs(&afs_cache_netfs,
    -				     &afs_cache_cell_index_def);
    -
    -The last step is, of course, unregistration:
    -
    -	void cachefs_unregister_netfs(struct cachefs_netfs *netfs);
    -
    -
    -INDEX REGISTRATION
    -------------------
    -
    -The second step is to inform cachefs about part of an index hierarchy that can
    -be used to locate files. This is done by requesting a cookie for each index in
    -the path to the file:
    -
    -	struct cachefs_cookie *
    -	cachefs_acquire_cookie(struct cachefs_cookie *iparent,
    -			       struct cachefs_index_def *idef,
    -			       void *netfs_data);
    -
    -This function creates an index entry in the index represented by iparent,
    -loading the associated blob by calling iparent's update method with the
    -supplied netfs_data.
    -
    -It also creates a new index inode, formatted according to the definition
    -supplied in idef. The new cookie is then returned as the function's result.
    -
    -Note that this function never returns an error - all errors are handled
    -internally. It may also return CACHEFS_NEGATIVE_COOKIE. It is quite acceptable
    -to pass this token back to this function as iparent (or even to the relinquish
    -cookie, read page and write page functions - see below).
    -
    -Note also that no indexes are actually created on disc until a data file needs
    -to be created somewhere down the hierarchy. Furthermore, an index may be
    -created in several different caches independently at different times. This is
    -all handled transparently, and the netfs doesn't see any of it.
    -
    -For example, with AFS, a cell would be added to the primary index. This index
    -entry would have a dependent inode containing a volume location index for the
    -volume mappings within this cell:
    -
    -	cell->cache =
    -		cachefs_acquire_cookie(afs_cache_netfs.primary_index,
    -				       &afs_vlocation_cache_index_def,
    -				       cell);
    -
    -Then when a volume location was accessed, it would be entered into the cell's
    -index and an inode would be allocated that acts as a volume type and hash chain
    -combination:
    -
    -	vlocation->cache =
    -		cachefs_acquire_cookie(cell->cache,
    -				       &afs_volume_cache_index_def,
    -				       vlocation);
    -
    -And then a particular flavour of volume (R/O for example) could be added to
    -that index, creating another index for vnodes (AFS inode equivalents):
    -
    -	volume->cache =
    -		cachefs_acquire_cookie(vlocation->cache,
    -				       &afs_vnode_cache_index_def,
    -				       volume);
    -
    -
    -DATA FILE REGISTRATION
    -----------------------
    -
    -The third step is to request a data file be created in the cache. This is
    -almost identical to index cookie acquisition. The only difference is that a
    -NULL index definition is passed.
    -
    -	vnode->cache =
    -		cachefs_acquire_cookie(volume->cache,
    -				       NULL,
    -				       vnode);
    -
    -
    -
    -PAGE ALLOC/READ/WRITE
    ----------------------
    -
    -And the fourth step is to propose a page be cached. There are two functions
    -that are used to do this.
    -
    -Firstly, the netfs should ask CacheFS to examine the caches and read the
    -contents cached for a particular page of a particular file if present, or else
    -allocate space to store the contents if not:
    -
    -	typedef
    -	void (*cachefs_rw_complete_t)(void *cookie_data,
    -				      struct page *page,
    -				      void *end_io_data,
    -				      int error);
    -
    -	int cachefs_read_or_alloc_page(struct cachefs_cookie *cookie,
    -				       struct page *page,
    -				       cachefs_rw_complete_t end_io_func,
    -				       void *end_io_data,
    -				       unsigned long gfp);
    -
    -The cookie argument must specify a data file cookie, the page specified will
    -have the data loaded into it (and is also used to specify the page number), and
    -the gfp argument is used to control how any memory allocations made are satisfied.
    -
    -If the cookie indicates the inode is not cached:
    -
    - (1) The function will return -ENOBUFS.
    -
    -Else if there's a copy of the page resident on disc:
    -
    - (1) The function will submit a request to read the data off the disc directly
    -     into the page specified.
    -
    - (2) The function will return 0.
    -
    - (3) When the read is complete, end_io_func() will be invoked with:
    -
    -     (*) The netfs data supplied when the cookie was created.
    -
    -     (*) The page descriptor.
    -
    -     (*) The data passed to the above function.
    -
    -     (*) An argument that's 0 on success or negative for an error.
    -
    -     If an error occurs, it should be assumed that the page contains no usable
    -     data.
    -
    -Otherwise, if there's not a copy available on disc:
    -
    - (1) A block may be allocated in the cache and attached to the inode at the
    -     appropriate place.
    -
    - (2) The validity journal will be marked to indicate this page does not yet
    -     contain valid data.
    -
    - (3) The function will return -ENODATA.
    -
    -
    -Secondly, if the netfs changes the contents of the page (either due to an
    -initial download or if a user performs a write), then the page should be
    -written back to the cache:
    -
    -	int cachefs_write_page(struct cachefs_cookie *cookie,
    -			       struct page *page,
    -			       cachefs_rw_complete_t end_io_func,
    -			       void *end_io_data,
    -			       unsigned long gfp);
    -
    -The cookie argument must specify a data file cookie, the page specified should
    -contain the data to be written (and is also used to specify the page number),
    -and the gfp argument is used to control how any memory allocations made are
    -satisfied.
    -
    -If the cookie indicates the inode is not cached then:
    -
    - (1) The function will return -ENOBUFS.
    -
    -Else if there's a block allocated on disc to hold this page:
    -
    - (1) The function will submit a request to write the data to the disc directly
    -     from the page specified.
    -
    - (2) The function will return 0.
    -
    - (3) When the write is complete:
    -
    -     (a) Any associated validity journal entry will be cleared (the block now
    -	 contains valid data as far as CacheFS is concerned).
    -
    -     (b) end_io_func() will be invoked with:
    -
    -	 (*) The netfs data supplied when the cookie was created.
    -
    -	 (*) The page descriptor.
    -
    -	 (*) The data passed to the above function.
    -
    -	 (*) An argument that's 0 on success or negative for an error.
    -
    -	 If an error happens, it can be assumed that the page has been
    -	 discarded from the cache.
    -
    -
    -PAGE UNCACHING
    ---------------
    -
    -To uncache a page, this function should be called:
    -
    -	void cachefs_uncache_page(struct cachefs_cookie *cookie,
    -				  struct page *page);
    -
    -This detaches the page specified from the data file indicated by the cookie and
    -unbinds it from the underlying block.
    -
    -Note that pages can't be explicitly detached from the a data file. The whole
    -data file must be retired (see the relinquish cookie function below).
    -
    -Furthermore, note that this does not cancel the asynchronous read or write
    -operation started by the read/alloc and write functions.
    -
    -
    -INDEX AND DATA FILE UPDATE
    ---------------------------
    -
    -To request an update of the index data for an index or data file, the following
    -function should be called:
    -
    -	void cachefs_update_cookie(struct cachefs_cookie *cookie);
    -
    -This function will refer back to the netfs_data pointer stored in the cookie by
    -the acquisition function to obtain the data to write into each revised index
    -entry. The update method in the parent index definition will be called to
    -transfer the data.
    -
    -
    -INDEX AND DATA FILE UNREGISTRATION
    -----------------------------------
    -
    -To get rid of a cookie, this function should be called.
    -
    -	void cachefs_relinquish_cookie(struct cachefs_cookie *cookie,
    -				       int retire);
    -
    -If retire is non-zero, then the index or file will be marked for recycling, and
    -all copies of it will be removed from all active caches in which it is present.
    -
    -If retire is zero, then the inode may be available again next the the
    -acquisition function is called.
    -
    -One very important note - relinquish must NOT be called unless all "child"
    -indexes, files and pages have been relinquished first.
    -
    -
    -PAGE TOKEN MANAGEMENT
    ----------------------
    -
    -As previously mentioned, the netfs must keep a token associated with each page
    -currently actively backed by the cache. This is used by CacheFS to go from a
    -page to the internal representation of the underlying block and back again. It
    -is particularly important for managing the withdrawal of a cache whilst it is
    -in active service (eg: it got unmounted).
    -
    -The token is this:
    -
    -	struct cachefs_page {
    -		...
    -	};
    -
    -Note that all fields are for internal CacheFS use only.
    -
    -The token only needs to be allocated when CacheFS asks for it. This it will do
    -by calling the get_page_cookie() method in the netfs definition ops table. Once
    -allocated, the same token should be presented every time the method is called
    -again for a particular page.
    -
    -The token should be retained by the netfs, and should be deleted only after the
    -page has been uncached.
    -
    -One way to achieve this is to attach the token to page->private (and set the
    -PG_private bit on the page) once allocated. Shortcut routines are provided by
    -CacheFS to do this. Firstly, to retrieve if present and allocate if not:
    -
    -	struct cachefs_page *cachefs_page_get_private(struct page *page,
    -						      unsigned gfp);
    -
    -Secondly to retrieve if present and BUG if not:
    -
    -	static inline
    -	struct cachefs_page *cachefs_page_grab_private(struct page *page);
    -
    -To clean up the tokens, the netfs inode hosting the page should be provided
    -with address space operations that circumvent the buffer-head operations for a
    -page. For instance:
    -
    -	struct address_space_operations afs_fs_aops = {
    -		...
    -		.sync_page	= block_sync_page,
    -		.set_page_dirty	= __set_page_dirty_nobuffers,
    -		.releasepage	= afs_file_releasepage,
    -		.invalidatepage	= afs_file_invalidatepage,
    -	};
    -
    -	static int afs_file_invalidatepage(struct page *page,
    -					   unsigned long offset)
    -	{
    -		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
    -		int ret = 1;
    -
    -		BUG_ON(!PageLocked(page));
    -		if (!PagePrivate(page))
    -			return 1;
    -		cachefs_uncache_page(vnode->cache,page);
    -		if (offset == 0)
    -			return 1;
    -		BUG_ON(!PageLocked(page));
    -		if (PageWriteback(page))
    -			return 0;
    -		return page->mapping->a_ops->releasepage(page, 0);
    -	}
    -
    -	static int afs_file_releasepage(struct page *page, int gfp_flags)
    -	{
    -		struct cachefs_page *token;
    -		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
    -
    -		if (PagePrivate(page)) {
    -			cachefs_uncache_page(vnode->cache, page);
    -			token = (struct cachefs_page *) page->private;
    -			page->private = 0;
    -			ClearPagePrivate(page);
    -			if (token)
    -				kfree(token);
    -		}
    -		return 0;
    -	}
    -
    -
    -INDEX AND DATA FILE INVALIDATION
    ---------------------------------
    -
    -There is no direct way to invalidate an index subtree or a data file. To do
    -this, the caller should relinquish and retire the cookie they have, and then
    -acquire a new one.
    diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/backend-api.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/backend-api.txt
    --- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/backend-api.txt	1970-01-01 01:00:00.000000000 +0100
    +++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/backend-api.txt	2004-10-05 13:21:09.000000000 +0100
    @@ -0,0 +1,317 @@
    +			  ==========================
    +			  FS-CACHE CACHE BACKEND API
    +			  ==========================
    +
    +The FS-Cache system provides an API by which actual caches can be supplied to
    +FS-Cache for it to then serve out to network filesystems and other interested
    +parties.
    +
    +This API is declared in <linux/fscache-cache.h>.
    +
    +
    +====================================
    +INITIALISING AND REGISTERING A CACHE
    +====================================
    +
    +To start off, a cache definition must be initialised and registered for each
    +cache the backend wants to make available. For instance, CacheFS does this in
    +the fill_super() operation on mounting.
    +
    +The cache definition (struct fscache_cache) should be initialised by calling:
    +
    +	void fscache_init_cache(struct fscache_cache *cache,
    +				struct fscache_cache_ops *ops,
    +				unsigned fsdef_ino,
    +				const char *idfmt,
    +				...)
    +
    +Where:
    +
    + (*) "cache" is a pointer to the cache definition;
    +
    + (*) "ops" is a pointer to the table of operations that the backend supports on
    +     this cache;
    +
    + (*) "fsdef_ino" is the reference number of the FileSystem DEFinition index
    +     (the top-level index), which in CacheFS is its inode number;
    +
    + (*) and a format and printf-style arguments for constructing a label for the
    +     cache.
    +
    +
    +The cache should then be registered with FS-Cache by passing a pointer to the
    +previously initialised cache definition to:
    +
    +	void fscache_add_cache(struct fscache_cache *cache)
    +
    +
    +=====================
    +UNREGISTERING A CACHE
    +=====================
    +
    +A cache can be withdrawn from the system by calling this function with a
    +pointer to the cache definition:
    +
    +	void fscache_withdraw_cache(struct fscache_cache *cache)
    +
    +In CacheFS's case, this is called by put_super().
    +
    +It is possible to check to see if a cache has been withdrawn by calling:
    +
    +	int fscache_is_cache_withdrawn(struct fscache_cache *cache)
    +
    +Which will return non-zero if it has been, zero if it is still active.
    +
    +
    +==================
    +FS-CACHE UTILITIES
    +==================
    +
    +FS-Cache provides some utilities that a cache backend may make use of:
    +
    + (*) Find parent of node.
    +
    +	struct fscache_node *fscache_find_parent_node(struct fscache_node *node)
    +
    +     This allows a backend to find the logical parent of an index or data file
    +     in the cache hierarchy.
    +
    + (*) Allocate a page token.
    +
    +	struct fscache_page *fscache_page_get_private(struct page *page,
    +						      unsigned gfp);
    +
    +     If the page has a page token attached, then this is returned by this
    +     function. If it doesn't have one, then a page token is allocated with the
    +     specified allocation flags and attached to the page's private value. The
    +     error ENOMEM is returned if there's no memory available.
    +
    + (*) Grab an existing page token.
    +
    +	struct fscache_page *fscache_page_grab_private(struct page *page)
    +
    +     This function returns a pointer to the page token attached to the page's
    +     private value if it exists, and BUG's if it does not.
    +
    +
    +========================
    +RELEVANT DATA STRUCTURES
    +========================
    +
    + (*) Index/Data file FS-Cache representation cookie.
    +
    +	struct fscache_cookie {
    +		struct fscache_index_def	*idef;
    +		struct fscache_netfs		*netfs;
    +		void				*netfs_data;
    +		...
    +	};
    +
    +     The fields that might be of use to the backend describe the index
    +     definition (indexes only), the netfs definition and the netfs's data for
    +     this cookie. The index definition contains a number of functions supplied
    +     by the netfs for matching index entries; these are required to provide
    +     some of the cache operations.
    +
    + (*) Cached search result.
    +
    +	struct fscache_search_result {
    +		unsigned			ino;
    +		...
    +	};
    +
    +     This is used by FS-Cache to keep track of what nodes it has found in what
    +     caches. Some of the cache operations set the "cache node number" held
    +     therein.
    +
    + (*) In-cache node representation.
    +
    +	struct fscache_node {
    +		struct fscache_cookie		*cookie;
    +		unsigned long			flags;
    +	#define FSCACHE_NODE_ISINDEX		0
    +		...
    +	};
    +
    +     Structures of this type should be allocated by the cache backend and
    +     passed to FS-Cache when requested by the appropriate cache operation. In
    +     the case of CacheFS, they're embedded in CacheFS's inode structure.
    +
    +     Each node contains a pointer to the cookie that represents the index or
    +     data file it is backing. It also contains a flag that indicates whether
    +     this is an index or not. The node should be initialised by calling
    +     fscache_node_init(node).
    +
    + (*) Filesystem definition (FSDEF) index entry representation.
    +
    +	struct fscache_fsdef_index_entry {
    +		uint8_t		name[24];	/* name of netfs */
    +		uint32_t	version;	/* version of layout */
    +	};
    +
    +     This structure defines the layout of the data in the FSDEF index
    +     maintained by the FS-Cache facility for distinguishing between the caches
    +     for separate netfs's.
    +
    +
    +================
    +CACHE OPERATIONS
    +================
    +
    +The cache backend provides FS-Cache with a table of operations that can be
    +performed on the denizens of the cache. These are held in a structure of type:
    +
    +	struct fscache_cache_ops
    +
    + (*) Name of cache provider [mandatory].
    +
    +	const char *name
    +
    +     This isn't strictly an operation, but should be pointed at a string naming
    +     the backend.
    +
    + (*) Node lookup [mandatory].
    +
    +	struct fscache_node *(*lookup_node)(struct fscache_cache *cache,
    +					    unsigned ino)
    +
    +     This method is used to turn a logical cache node number into a handle on a
    +     representation of that node.
    +
    + (*) Increment node refcount [mandatory].
    +
    +	struct fscache_node *(*grab_node)(struct fscache_node *node)
    +
    +     This method is called to increment the reference count on a node. It may
    +     fail (for instance if the cache is being withdrawn).
    +
    + (*) Lock/Unlock node [mandatory].
    +
    +	void (*lock_node)(struct fscache_node *node)
    +	void (*unlock_node)(struct fscache_node *node)
    +
    +     These methods are used to exclusively lock a node. It must be possible to
    +     schedule with the lock held, so a spinlock isn't sufficient.
    +
    + (*) Unreference node [mandatory].
    +
    +	void (*put_node)(struct fscache_node *node)
    +
    +     This method is used to discard a reference to a node. The node may be
    +     destroyed when all the references held by FS-Cache are released.
    +
    + (*) Search an index [mandatory].
    +
    +	int (*index_search)(struct fscache_node *index,
    +			    struct fscache_cookie *cookie,
    +			    struct fscache_search_result *result)
    +
    +     This method is called to search an index for a node that matches the
    +     criteria attached to the cookie (cookie->netfs_data). This should be
    +     matched by calling index->cookie->idef->match().
    +
    +     The cache backend is responsible for dealing with the match result,
    +     including updating or discarding existing index entries. An index entry
    +     can be updated by calling index->cookie->idef->update().
    +
    +     If the search is successful, the node number should be stored in
    +     result->ino and zero returned. If not successful, error ENOENT should be
    +     returned if no entry was found, or some other error otherwise.
    +
    + (*) Create a new node [mandatory].
    +
    +	int (*index_add)(struct fscache_node *index,
    +			 struct fscache_cookie *cookie,
    +			 struct fscache_search_result *result)
    +
    +     This method is called to create a new node on disc and add an entry for it
    +     to the specified index. The index entry for the new node should be
    +     obtained by calling index->cookie->idef->update() and passing it the
    +     argument cookie.
    +
    +     If successful, the node number should be stored in result->ino and zero
    +     should be returned.
    +
    + (*) Update a node [mandatory].
    +
    +	int (*index_update)(struct fscache_node *index,
    +			    struct fscache_node *node)
    +
    +     This is called to update the on-disc index entry for the specified
    +     node. The new information should be in node->cookie->netfs_data. This can
    +     be obtained by calling index->cookie->idef->update() and passing it
    +     node->cookie.
    +
    + (*) Synchronise a cache to disc [mandatory].
    +
    +	void (*sync)(struct fscache_cache *cache)
    +
    +     This is called to ask the backend to synchronise a cache with disc.
    +
    + (*) Dissociate a cache [mandatory].
    +
    +	void (*dissociate_pages)(struct fscache_cache *cache)
    +
    +     This is called to ask the cache to dissociate all netfs pages from
    +     mappings to disc. It is assumed that the backend cache will have some way
    +     of finding all the page tokens that refer to its own blocks.
    +
    + (*) Request page be read from cache [mandatory].
    +
    +	int (*read_or_alloc_page)(struct fscache_node *node,
    +				  struct page *page,
    +				  struct fscache_page *pageio,
    +				  fscache_rw_complete_t end_io_func,
    +				  void *end_io_data,
    +				  unsigned long gfp)
    +
    +     This is called to attempt to read a netfs page from disc, or to allocate a
    +     backing block if not. FS-Cache will have done as much checking as it can
    +     before calling, but most of the work belongs to the backend.
    +
    +     If there's no page on disc, then -ENODATA should be returned if the
    +     backend managed to allocate a backing block; -ENOBUFS or -ENOMEM if it
    +     didn't.
    +
    +     If there is a page on disc, then a read operation should be queued and 0
    +     returned. When the read finishes, end_io_func() should be called with the
    +     following arguments:
    +
    +	(*end_io_func)(node->cookie->netfs_data,
    +		       page,
    +		       end_io_data,
    +		       error);
    +
    + (*) Request page be written to cache [mandatory].
    +
    +	int (*write_page)(struct fscache_node *node,
    +			  struct page *page,
    +			  struct fscache_page *pageio,
    +			  fscache_rw_complete_t end_io_func,
    +			  void *end_io_data,
    +			  unsigned long gfp)
    +
    +     This is called to write from a page on which there was a previously
    +     successful read_or_alloc_page() call. FS-Cache filters out pages that
    +     don't have mappings.
    +
    +     If there's no block on disc available, then -ENOBUFS should be returned
    +     (or -ENOMEM if there wasn't any memory to be had).
    +
    +     If the write operation could be queued, then 0 should be returned. When
    +     the write completes, end_io_func() should be called with the following
    +     arguments:
    +
    +	(*end_io_func)(node->cookie->netfs_data,
    +		       page,
    +		       end_io_data,
    +		       error);
    +
    + (*) Discard mapping [mandatory].
    +
    +	void (*uncache_page)(struct fscache_node *node,
    +			     struct fscache_page *page_token)
    +
    +     This is called when a page is being booted from the pagecache. The cache
    +     backend needs to break the links between the page token and whatever
    +     internal representations it maintains.
    diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/cachefs.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/cachefs.txt
    --- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/cachefs.txt	1970-01-01 01:00:00.000000000 +0100
    +++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/cachefs.txt	2004-10-05 11:22:27.000000000 +0100
    @@ -0,0 +1,274 @@
    +			  ===========================
    +			  CacheFS: Caching Filesystem
    +			  ===========================
    +
    +========
    +OVERVIEW
    +========
    +
    +CacheFS is a backend for the general filesystem cache facility.
    +
    +CacheFS uses a block device directly rather than a bunch of files under an
    +already mounted filesystem. For why this is so, see further on. If necessary,
    +however, a file can be loopback mounted as a cache.
    +
    +
    +CacheFS provides the following facilities:
    +
    + (1) More than one block device can be mounted as a cache.
    +
    + (2) Caches can be mounted / unmounted at any time.
    +
    + (3) All metadata modifications (this includes index contents) are performed
    +     as journalled transactions. These are replayed on mounting.
    +
    +
    +=============================================
    +WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
    +=============================================
    +
    +CacheFS is backed by a block device rather than being backed by a bunch of
    +files on a filesystem. This confers several advantages:
    +
    + (1) Performance.
    +
    +     Going directly to a block device means that we can DMA directly to/from
    +     the netfs's pages. If another filesystem was managing the backing
    +     store, everything would have to be copied between pages. Whilst DirectIO
    +     does exist, it doesn't appear easy to make use of in this situation.
    +
    +     New address space or file operations could be added to make it possible to
    +     persuade a backing discfs to generate block I/O directly to/from disc
    +     blocks under its control, but that then means the discfs has to keep track
    +     of I/O requests to pages not under its control.
    +
    +     Furthermore, we only have to do one lot of readahead calculations, not
    +     two; in the discfs backing case, the netfs would do one and the discfs
    +     would do one.
    +
    + (2) Memory.
    +
    +     Using a block device means that we have a lower memory usage - all data
    +     pages belong to the netfs we're backing. If we used a filesystem, we would
    +     have twice as many pages at certain points - one from the netfs and one
    +     from the backing discfs. In the backing discfs model, under situations of
    +     memory pressure, we'd have to allocate or keep around a discfs page to be
    +     able to write out a netfs page; or else we'd need to be able to punch a
    +     hole in the backing file.
    +
    +     Furthermore, whilst we have to keep a CacheFS inode around in memory for
    +     every netfs inode we're backing, a backing discfs would have to keep the
    +     dentry and possibly a file struct too.
    +
    + (3) Holes.
    +
    +     The cache uses holes to indicate to the netfs that it hasn't yet
    +     downloaded the data for that page.
    +
    +     Since CacheFS is its own filesystem, it can support holes in files
    +     trivially. Running on top of another discfs would limit us to using ones
    +     that can support holes.
    +
    +     Furthermore, it would have to be made possible to detect holes in a discfs
    +     file, rather than just seeing zero filled blocks.
    +
    + (4) Data Consistency.
    +
    +     CacheFS uses a pair of journals to keep track of the state of the cache
    +     and all the pages contained therein. This means that it doesn't get into
    +     an inconsistent state in the on-disc cache and it doesn't lose disc space.
    +
    +     CacheFS takes especial care between the allocation of a block and its
    +     splicing into the on-disc pointer tree, and the data having been written
    +     to disc. If power is interrupted and then restored, the journals are
    +     replayed and if it is seen that a block was allocated but not written it
    +     is then punched out. If backed by a discfs, I'm not certain what would
    +     happen. It may well be possible to mark a discfs's journal, if it has one,
    +     but how does the discfs deal with those marks? This also limits consistent
    +     caching to running on journalled discfs's where there's a function to
    +     write extraordinary marks into the journal.
    +
    +     The alternative would be to keep flags in the superblock, and to
    +     re-initialise the cache if it wasn't cleanly unmounted.
    +
    +     Knowing that your cache is in a good state is vitally important if you,
    +     say, put /usr on AFS. Some organisations put everything barring /etc,
    +     /sbin, /lib and /var on AFS and have an enormous cache on every
    +     computer. Imagine if the power goes out and renders every cache
    +     inconsistent, requiring all the computers to re-initialise their caches
    +     when the power comes back on...
    +
    + (5) Recycling.
    +
    +     Recycling is simple on CacheFS. It can just scan the metadata index to
    +     look for inodes that require reclamation/recycling; and it can also build
    +     up a list of the least recently used inodes so that they can be reclaimed
    +     later to make space.
    +
    +     Doing this on a discfs would require a search going down through a nest
    +     of directories, and would probably have to be done in userspace.
    +
    + (6) Disc Space.
    +
    +     Whilst the block device does set a hard ceiling on the amount of space
    +     available, CacheFS can guarantee that all that space will be available to
    +     the cache. On a discfs-backed cache, the administrator would probably want
    +     to set a cache size limit, but the system wouldn't be able to guarantee
    +     that all that space would be available to the cache - not unless that
    +     cache was on a partition of its own.
    +
    +     Furthermore, with a discfs-backed cache, if the recycler starts to reclaim
    +     cache files to make space, the freed blocks may just be eaten directly by
    +     userspace programs, potentially resulting in the entire cache being
    +     consumed. Alternatively, netfs operations may end up being held up because
    +     the cache can't get blocks on which to store the data.
    +
    + (7) Users.
    +
    +     Users can't so easily go into CacheFS and run amok. The worst they can do
    +     is cause bits of the cache to be recycled early. With a discfs-backed
    +     cache, they can do all sorts of bad things to the files belonging to the
    +     cache, and they can do this quite by accident.
    +
    +
    +On the other hand, there would be some advantages to using a file-based cache
    +rather than a blockdev-based cache:
    +
    + (1) Having to copy to a discfs's page would mean that a netfs could just make
    +     the copy and then assume its own page is ready to go.
    +
    + (2) Backing onto a discfs wouldn't require a committed block device. You would
    +     just nominate a directory and go from there. With CacheFS you have to
    +     repartition or install an extra drive to make use of it in an existing
    +     system (though the loopback device offers a way out).
    +
    + (3) CacheFS requires the netfs to store a key in any pertinent index entry,
    +     and it also permits a limited amount of arbitrary data to be stored there.
    +
    +     A discfs could be requested to store the netfs's data in xattrs, and the
    +     filename could be used to store the key, though the key would have to be
    +     rendered as text not binary. Likewise indexes could be rendered as
    +     directories with xattrs.
    +
    + (4) You could easily make your cache bigger if the discfs has plenty of space;
    +     you could even go across multiple mountpoints.
    +
    +
    +======================
    +GENERAL ON-DISC LAYOUT
    +======================
    +
    +The filesystem is divided into a number of parts:
    +
    +  0	+---------------------------+
    +	|        Superblock         |
    +  1	+---------------------------+
    +	|      Update Journal       |
    +	+---------------------------+
    +	|     Validity Journal      |
    +	+---------------------------+
    +	|    Write-Back Journal     |
    +	+---------------------------+
    +	|                           |
    +	|           Data            |
    +	|                           |
    + END	+---------------------------+
    +
    +The superblock contains the filesystem ID tags and pointers to all the other
    +regions.
    +
    +The update journal consists of a set of entries of sector size that keep track
    +of what changes have been made to the on-disc filesystem, but not yet
    +committed.
    +
    +The validity journal contains records of data blocks that have been allocated
    +but not yet written. Upon journal replay, all these blocks will be detached
    +from their pointers and recycled.
    +
    +The writeback journal keeps track of changes that have been made locally to
    +data blocks, but that have not yet been committed back to the server. This is
    +not yet implemented.
    +
    +The journals are replayed upon mounting to make sure that the cache is in a
    +reasonable state.
    +
    +The data region holds a number of things:
    +
    +  (1) Index Files
    +
    +      These are files of entries used by CacheFS internally and by filesystems
    +      that wish to cache data here (such as AFS) to keep track of what's in
    +      the cache at any given time.
    +
    +      The first index file (inode 1) is special. It holds the CacheFS-specific
    +      metadata for every file in the cache (including direct, single-indirect
    +      and double-indirect block pointers).
    +
    +      The second index file (inode 2) is also special. It has an entry for
    +      each filesystem that's currently holding data in this cache.
    +
    +      Every allocated entry in an index has an inode bound to it. This inode is
    +      either another index file or it is a data file.
    +
    +  (2) Cached Data Files
    +
    +      These are caches of files from remote servers. Holes in these files
    +      represent blocks not yet obtained from the server.
    +
    +  (3) Indirection Blocks
    +
    +      Should a file have more blocks than can be pointed to by the few
    +      pointers in its storage management record, then indirection blocks will
    +      be used to point to further data or indirection blocks.
    +
    +      Two levels of indirection are currently supported:
    +
    +	- single indirection
    +	- double indirection
    +
    +  (4) Allocation Nodes and Free Blocks
    +
    +      The free blocks of the filesystem are kept in two single-branched
    +      "trees". One tree is the blocks that are ready to be allocated, and the
    +      other is the blocks that have just been recycled. When the former tree
    +      becomes empty, the latter tree is decanted across.
    +
    +      Each tree is arranged as a chain of "nodes"; each node points to the next
    +      node in the chain (unless it's at the end) and also up to 1022 free
    +      blocks.
    +
    +Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
    +with the superblock at 0. Using 32-bit block pointers, a maximum number of
    +0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB
    +for 4KB pages.
    +
    +
    +========
    +MOUNTING
    +========
    +
    +Since CacheFS is actually a quasi-filesystem, it requires a block device behind
    +it. The way to give it one is to mount it as cachefs type on a directory
    +somewhere. The mounted filesystem will then present the user with a set of
    +directories outlining the index structure resident in the cache. Indexes
    +(directories) and files can be turfed out of the cache by the sysadmin through
    +the use of rmdir and unlink.
    +
    +For instance, if a cache contains AFS data, the user might see the following:
    +
    +	root>mount -t cachefs /dev/hdg9 /cache-hdg9
    +	root>ls -1 /cache-hdg9
    +	afs
    +	root>ls -1 /cache-hdg9/afs
    +	cambridge.redhat.com
    +	root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
    +	root.afs
    +	root.cell
    +
    +However, a block device that's going to be used for a cache must be prepared
    +before it can be mounted initially. This is done very simply by:
    +
    +	echo "cachefs___" >/dev/hdg9
    +
    +During the initial mount, the basic structure will be scribed into the cache,
    +and then a background thread will "recycle" the as-yet unused data blocks.
    diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/fscache.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/fscache.txt
    --- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/fscache.txt	1970-01-01 01:00:00.000000000 +0100
    +++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/fscache.txt	2004-10-05 11:22:27.000000000 +0100
    @@ -0,0 +1,94 @@
    +			  ==========================
    +			  General Filesystem Caching
    +			  ==========================
    +
    +========
    +OVERVIEW
    +========
    +
    +This facility is a general purpose cache for network filesystems, though it
    +could be used for caching other things such as ISO9660 filesystems too.
    +
    +FS-Cache mediates between cache backends (such as CacheFS) and network
    +filesystems:
    +
    +	+---------+
    +	|         |                        +-----------+
    +	|   NFS   |--+                     |           |
    +	|         |  |                 +-->|  CacheFS  |
    +	+---------+  |   +----------+  |   | /dev/hda5 |
    +	             |   |          |  |   +-----------+
    +	+---------+  +-->|          |  |
    +	|         |      |          |--+   +-------------+
    +	|   AFS   |----->| FS-Cache |      |             |
    +	|         |      |          |----->| Cache Files |
    +	+---------+  +-->|          |      | /var/cache  |
    +	             |   |          |--+   +-------------+
    +	+---------+  |   +----------+  |
    +	|         |  |                 |   +-------------+
    +	|  ISOFS  |--+                 |   |             |
    +	|         |                    +-->| ReiserCache |
    +	+---------+                        | /           |
    +	                                   +-------------+
    +
    +FS-Cache does not follow the idea of completely loading every netfs file
    +opened in its entirety into a cache before permitting it to be accessed and
    +then serving the pages out of that cache rather than the netfs inode because:
    +
    + (1) It must be practical to operate without a cache.
    +
    + (2) The size of any accessible file must not be limited to the size of the
    +     cache.
    +
    + (3) The combined size of all opened files (this includes mapped libraries)
    +     must not be limited to the size of the cache.
    +
    + (4) The user should not be forced to download an entire file just to do a
    +     one-off access of a small portion of it (such as might be done with the
    +     "file" program).
    +
    +It instead serves the cache out in PAGE_SIZE chunks as and when requested by
    +the netfs('s) using it.
    +
    +
    +FS-Cache provides the following facilities:
    +
    + (1) More than one cache can be used at once.
    +
    + (2) Caches can be added / removed at any time.
    +
    + (3) The netfs is provided with an interface that allows either party to
    +     withdraw caching facilities from a file (required for (2)).
    +
    + (4) The interface to the netfs returns as few errors as possible, preferring
    +     rather to let the netfs remain oblivious.
    +
    + (5) Cookies are used to represent files and indexes to the netfs. The simplest
    +     cookie is just a NULL pointer - indicating nothing cached there.
    +
    + (6) The netfs is allowed to propose - dynamically - any index hierarchy it
    +     desires, though it must be aware that the index search function is
    +     recursive and stack space is limited.
    +
    + (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
    +     that page A is at index B of the data-file represented by cookie C, and
    +     that it should be read or written. The cache backend may or may not start
    +     I/O on that page, but if it does, a netfs callback will be invoked to
    +     indicate completion. The I/O may be either synchronous or asynchronous.
    +
    + (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
    +     them as obsolete and the index hierarchy rooted at that point will get
    +     recycled.
    +
    + (9) The netfs provides a "match" function for index searches. In addition to
    +     saying whether a match was made or not, this can also specify that an
    +     entry should be updated or deleted.
    +
    +
    +The netfs API to FS-Cache can be found in:
    +
    +	Documentation/filesystems/caching/netfs-api.txt
    +
    +The cache backend API to FS-Cache can be found in:
    +
    +	Documentation/filesystems/caching/backend-api.txt
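    A note on facility (9) of fscache.txt above: the way an index search might
    act on the result of the netfs's match function can be modelled in
    userspace C. This is only a rough sketch and not FS-Cache code. The
    model_* names, the entry layout, the use of -1 to stand in for "nothing
    cached" and the decision to stop searching after a delete are all
    assumptions made for illustration; only the four match outcomes come from
    the documentation.

```c
/* Userspace model only: none of this is kernel or FS-Cache code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the FSCACHE_MATCH_* results named in netfs-api.txt.
 * The numeric values are arbitrary in this model. */
typedef enum {
	MODEL_MATCH_FAILED,
	MODEL_MATCH_SUCCESS,
	MODEL_MATCH_SUCCESS_UPDATE,
	MODEL_MATCH_SUCCESS_DELETE,
} model_match_val_t;

/* Hypothetical fixed-size index entry: key first, then auxiliary data. */
struct model_entry {
	unsigned key;
	unsigned version;
	int in_use;
};

/* What the netfs is looking for (hypothetical). */
struct model_target {
	unsigned key;
	unsigned version;
};

/* Netfs-supplied match: same key and version is a hit; same key with a
 * different version asks for the stale entry to be deleted. */
static model_match_val_t model_match(const struct model_target *t,
				     const struct model_entry *e)
{
	if (t->key != e->key)
		return MODEL_MATCH_FAILED;
	if (t->version != e->version)
		return MODEL_MATCH_SUCCESS_DELETE;
	return MODEL_MATCH_SUCCESS;
}

/* Scan an index and return the slot of a usable entry, or -1 if nothing
 * usable is cached. An entry the match function wants deleted is wiped
 * and the search stops (an assumption of this model). */
static int model_index_search(struct model_entry *index, size_t n,
			      const struct model_target *t)
{
	size_t i;

	assert(index != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!index[i].in_use)
			continue;
		switch (model_match(t, &index[i])) {
		case MODEL_MATCH_SUCCESS:
		case MODEL_MATCH_SUCCESS_UPDATE:
			return (int)i;
		case MODEL_MATCH_SUCCESS_DELETE:
			memset(&index[i], 0, sizeof(index[i]));
			return -1;
		case MODEL_MATCH_FAILED:
			break;
		}
	}
	return -1;
}
```

    In this model a lookup with a matching key and version yields the slot
    number, whereas a lookup with a matching key but a different version
    clears the slot and reports that nothing is cached.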
    diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/netfs-api.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/netfs-api.txt
    --- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/netfs-api.txt	1970-01-01 01:00:00.000000000 +0100
    +++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/netfs-api.txt	2004-10-06 13:31:13.000000000 +0100
    @@ -0,0 +1,583 @@
    +			===============================
    +			FS-CACHE NETWORK FILESYSTEM API
    +			===============================
    +
    +There's an API by which a network filesystem can make use of the FS-Cache
    +facilities. This is based around a number of principles:
    +
    + (1) Every file and index is represented by a cookie. This cookie may or may
    +     not have anything associated with it, but the netfs doesn't need to care.
    +
    + (2) Barring the top-level index (one entry per cached netfs), the index
    +     hierarchy for each netfs is structured at the whim of the netfs.
    +
    + (3) Any netfs page being backed by the cache must have a small token
    +     associated with it (possibly pointed to by page->private) so that FS-Cache
    +     can keep track of it.
    +
    +This API is declared in <linux/fscache.h>.
    +
    +
    +=============================
    +NETWORK FILESYSTEM DEFINITION
    +=============================
    +
    +FS-Cache needs a description of the network filesystem. This is specified using
    +a record of the following structure:
    +
    +	struct fscache_netfs {
    +		const char			*name;
    +		unsigned			version;
    +		struct fscache_netfs_operations	*ops;
    +		struct fscache_cookie		*primary_index;
    +		...
    +	};
    +
    +The first three fields should be filled in before registration, and the fourth
    +will be filled in by the registration function; any other fields should just be
    +ignored and are for internal use only.
    +
    +The fields are:
    +
    + (1) The name of the netfs (used as the key in the toplevel index).
    +
    + (2) The version of the netfs (if the name matches but the version doesn't, the
    +     entire on-disc hierarchy for this netfs will be scrapped and begun
    +     afresh).
    +
    + (3) The operations table is defined as follows:
    +
    +	struct fscache_netfs_operations {
    +		struct fscache_page *(*get_page_cookie)(struct page *page);
    +	};
    +
    +     The functions here must all be present. Currently the only one is:
    +
    +     (a) get_page_token(): Get the token used to bind a page to a block in a
    +         cache. This function should allocate it if it doesn't exist.
    +
    +	 Return -ENOMEM if there's not enough memory and -ENODATA if the page
    +	 just shouldn't be cached.
    +
    +	 Set *_page_token to point to the token and return 0 if there is now a
    +	 token. Note that the netfs must keep track of the token itself (and
    +	 free it later). page->private can be used for this (see below).
    +
    + (4) The cookie representing the primary index will be allocated according to
    +     another parameter passed into the registration function.
    +
    +For example, kAFS (linux/fs/afs/) uses the following definitions to describe
    +itself:
    +
    +	static struct fscache_netfs_operations afs_cache_ops = {
    +		.get_page_token	= afs_cache_get_page_token,
    +	};
    +
    +	struct fscache_netfs afs_cache_netfs = {
    +		.name		= "afs",
    +		.version	= 0,
    +		.ops		= &afs_cache_ops,
    +	};
    +
    +
    +================
    +INDEX DEFINITION
    +================
    +
    +Indexes are used for two purposes:
    +
    + (1) To speed up the finding of a file based on a series of keys (such as AFS's
    +     "cell", "volume ID", "vnode ID").
    +
    + (2) To make it easier to discard a subset of all the files cached based around
    +     a particular key - for instance to mirror the removal of an AFS volume.
    +
    +However, since it's unlikely that any two netfs's are going to want to define
    +their index hierarchies in quite the same way, FS-Cache tries to impose as few
    +restraints as possible on how an index is structured and where it is placed in
    +the tree. The netfs can even mix indexes and data files at the same level, but
    +it's not recommended.
    +
    +There are some limits on indexes:
    +
    + (1) All entries in any given index must be the same size. The netfs supplies a
    +     blob of data for each index entry.
    +
    + (2) The entries in one index can be of a different size to the entries in
    +     another index.
    +
    + (3) The entry data must be atomically journallable, so it is limited to 400
    +     bytes at present.
    +
    + (4) The index data must start with the key. The layout of the key is described
    +     in the index definition, and this is used to display the key in some
    +     appropriate way.
    +
    + (5) The depth of the index tree should be judged with care as the search
    +     function is recursive. Too many layers will run the kernel out of stack.
    +
    +To define an index, a structure of the following type should be filled out:
    +
    +	struct fscache_index_def
    +	{
    +		uint8_t name[8];
    +		uint16_t data_size;
    +		struct {
    +			uint8_t type;
    +			uint16_t len;
    +		} keys[4];
    +
    +		fscache_match_val_t (*match)(void *target_netfs_data,
    +					     const void *entry);
    +
    +		void (*update)(void *source_netfs_data, void *entry);
    +	};
    +
    +This has the following fields:
    +
    + (1) The name of the index (NUL terminated unless all 8 chars are used).
    +
    + (2) The size of the data blob provided by the netfs.
    +
    + (3) A definition of the key(s) at the beginning of the blob. The netfs is
    +     permitted to specify up to four keys. The total length must not exceed the
    +     data size. It is assumed that the keys will be laid end to end in order,
    +     starting at the first byte of the data.
    +
    +     The type field specifies the way the data should be displayed. It can be
    +     one of:
    +
    +	(*) FSCACHE_INDEX_KEYS_NOTUSED	- key field not used
    +	(*) FSCACHE_INDEX_KEYS_BIN	- display byte-by-byte in hex
    +	(*) FSCACHE_INDEX_KEYS_BIN_SZ1	- as above, BE size in byte 0
    +	(*) FSCACHE_INDEX_KEYS_BIN_SZ2	- as above, BE size in bytes 0-1
    +	(*) FSCACHE_INDEX_KEYS_BIN_SZ4	- as above, BE size in bytes 0-3
    +	(*) FSCACHE_INDEX_KEYS_ASCIIZ	- NUL-terminated ASCII
    +	(*) FSCACHE_INDEX_KEYS_IPV4ADDR	- display as IPv4 address
    +	(*) FSCACHE_INDEX_KEYS_IPV6ADDR	- display as IPv6 address
    +
    + (4) A function to compare an in-page-cache index entry blob with the data
    +     passed to the cookie acquisition function. This function can also be used
    +     to extract data from the blob and copy it into the netfs's structures.
    +
    +     The values this function can return are:
    +
    +	(*) FSCACHE_MATCH_FAILED - failed to match
    +	(*) FSCACHE_MATCH_SUCCESS - successful match
    +	(*) FSCACHE_MATCH_SUCCESS_UPDATE - successful match, entry needs update
    +	(*) FSCACHE_MATCH_SUCCESS_DELETE - entry should be deleted
    +
    +     For example, in linux/fs/afs/vnode.c:
    +
    +	static fscache_match_val_t
    +	afs_vnode_cache_match(void *target, const void *entry)
    +	{
    +		const struct afs_cache_vnode *cvnode = entry;
    +		struct afs_vnode *vnode = target;
    +
    +		if (vnode->fid.vnode != cvnode->vnode_id)
    +			return FSCACHE_MATCH_FAILED;
    +
    +		if (vnode->fid.unique != cvnode->vnode_unique ||
    +		    vnode->status.version != cvnode->data_version)
    +			return FSCACHE_MATCH_SUCCESS_DELETE;
    +
    +		return FSCACHE_MATCH_SUCCESS;
    +	}
    +
    + (5) A function to initialise or update an in-page-cache index entry blob from
    +     netfs data passed to FS-Cache by the netfs. This function should not assume
    +     that there's any data yet in the in-page-cache.
    +
    +     Continuing the above example:
    +
    +	static void afs_vnode_cache_update(void *source, void *entry)
    +	{
    +		struct afs_cache_vnode *cvnode = entry;
    +		struct afs_vnode *vnode = source;
    +
    +		cvnode->vnode_id	= vnode->fid.vnode;
    +		cvnode->vnode_unique	= vnode->fid.unique;
    +		cvnode->data_version	= vnode->status.version;
    +	}
    +
    +     Any dead space in the index entry should be filled with a pattern defined
    +     by FS-Cache:
    +
    +	FSCACHE_INDEX_DEADFILL_PATTERN
    +
    +To finish the above example, the index definition for the "vnode" level is as
    +follows:
    +
    +	struct fscache_index_def afs_vnode_cache_index_def = {
    +		.name		= "vnode",
    +		.data_size	= sizeof(struct afs_cache_vnode),
    +		.keys[0]	= { FSCACHE_INDEX_KEYS_BIN, 4 },
    +		.match		= afs_vnode_cache_match,
    +		.update		= afs_vnode_cache_update,
    +	};
    +
    +The first element of struct afs_cache_vnode is the vnode ID.
    +
    +And for contrast, the cell index definition is:
    +
    +	struct fscache_index_def afs_cache_cell_index_def = {
    +		.name			= "cell_ix",
    +		.data_size		= sizeof(struct afs_cell),
    +		.keys[0]		= { FSCACHE_INDEX_KEYS_ASCIIZ, 64 },
    +		.match			= afs_cell_cache_match,
    +		.update			= afs_cell_cache_update,
    +	};
    +
    +The cell index is the primary index for kAFS.
    +
    +
    +===================================
    +NETWORK FILESYSTEM (UN)REGISTRATION
    +===================================
    +
    +The first step is to declare the network filesystem to the cache. This also
    +involves specifying the layout of the primary index (for AFS, this would be the
    +"cell" level).
    +
    +The registration function is:
    +
    +	int fscache_register_netfs(struct fscache_netfs *netfs,
    +				   struct fscache_index_def *primary_idef);
    +
    +It just takes pointers to the netfs definition and the primary index
    +definition. It returns 0 or an error as appropriate.
    +
    +For kAFS, registration is done as follows:
    +
    +	ret = fscache_register_netfs(&afs_cache_netfs,
    +				     &afs_cache_cell_index_def);
    +
    +The last step is, of course, unregistration:
    +
    +	void fscache_unregister_netfs(struct fscache_netfs *netfs);
    +
    +
    +==================
    +INDEX REGISTRATION
    +==================
    +
    +The second step is to inform FS-Cache about part of an index hierarchy that can
    +be used to locate files. This is done by requesting a cookie for each index in
    +the path to the file:
    +
    +	struct fscache_cookie *
    +	fscache_acquire_cookie(struct fscache_cookie *iparent,
    +			       struct fscache_index_def *idef,
    +			       void *netfs_data);
    +
    +This function creates an index entry in the index represented by iparent,
    +loading the associated blob by calling iparent's update method with the
    +supplied netfs_data.
    +
    +It also creates a new index inode, formatted according to the definition
    +supplied in idef. The new cookie is then returned to the caller.
    +
    +Note that this function never returns an error - all errors are handled
    +internally. It may also return FSCACHE_NEGATIVE_COOKIE. It is quite acceptable
    +to pass this token back to this function as iparent (or even to the relinquish
    +cookie, read page and write page functions - see below).
    +
    +Note also that no indexes are actually created on disc until a data file needs
    +to be created somewhere down the hierarchy. Furthermore, an index may be
    +created in several different caches independently at different times. This is
    +all handled transparently, and the netfs doesn't see any of it.
    +
    +For example, with AFS, a cell would be added to the primary index. This index
    +entry would have a dependent inode containing a volume location index for the
    +volume mappings within this cell:
    +
    +	cell->cache =
    +		fscache_acquire_cookie(afs_cache_netfs.primary_index,
    +				       &afs_vlocation_cache_index_def,
    +				       cell);
    +
    +Then when a volume location was accessed, it would be entered into the cell's
    +index and an inode would be allocated that acts as a volume type and hash chain
    +combination:
    +
    +	vlocation->cache =
    +		fscache_acquire_cookie(cell->cache,
    +				       &afs_volume_cache_index_def,
    +				       vlocation);
    +
    +And then a particular flavour of volume (R/O for example) could be added to
    +that index, creating another index for vnodes (AFS inode equivalents):
    +
    +	volume->cache =
    +		fscache_acquire_cookie(vlocation->cache,
    +				       &afs_vnode_cache_index_def,
    +				       volume);
    +
    +
    +======================
    +DATA FILE REGISTRATION
    +======================
    +
    +The third step is to request a data file be created in the cache. This is
    +almost identical to index cookie acquisition. The only difference is that a
    +NULL index definition is passed:
    +
    +	vnode->cache =
    +		fscache_acquire_cookie(volume->cache,
    +				       NULL,
    +				       vnode);
    +
    +
    +=====================
    +PAGE ALLOC/READ/WRITE
    +=====================
    +
    +And the fourth step is to propose a page be cached. There are two functions
    +that are used to do this.
    +
    +Firstly, the netfs should ask FS-Cache to examine the caches and read the
    +contents cached for a particular page of a particular file if present, or else
    +allocate space to store the contents if not:
    +
    +	typedef
    +	void (*fscache_rw_complete_t)(void *cookie_data,
    +				      struct page *page,
    +				      void *end_io_data,
    +				      int error);
    +
    +	int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
    +				       struct page *page,
    +				       fscache_rw_complete_t end_io_func,
    +				       void *end_io_data,
    +				       unsigned long gfp);
    +
    +The cookie argument must specify a data file cookie, the page specified will
    +have the data loaded into it (and is also used to specify the page number), and
    +the gfp argument controls how any memory allocations made are satisfied.
    +
    +If the cookie indicates the inode is not cached:
    +
    + (1) The function will return -ENOBUFS.
    +
    +Else if there's a copy of the page resident on disc:
    +
    + (1) The function will submit a request to read the data off the disc directly
    +     into the page specified.
    +
    + (2) The function will return 0.
    +
    + (3) When the read is complete, end_io_func() will be invoked with:
    +
    +     (*) The netfs data supplied when the cookie was created.
    +
    +     (*) The page descriptor.
    +
    +     (*) The data passed to the above function.
    +
    +     (*) An argument that's 0 on success or negative for an error.
    +
    +     If an error occurs, it should be assumed that the page contains no usable
    +     data.
    +
    +Otherwise, if there's not a copy available on disc:
    +
    + (1) A block may be allocated in the cache and attached to the inode at the
    +     appropriate place.
    +
    + (2) The validity journal will be marked to indicate this page does not yet
    +     contain valid data.
    +
    + (3) The function will return -ENODATA.
    +
    +
    +Secondly, if the netfs changes the contents of the page (either due to an
    +initial download or if a user performs a write), then the page should be
    +written back to the cache:
    +
    +	int fscache_write_page(struct fscache_cookie *cookie,
    +			       struct page *page,
    +			       fscache_rw_complete_t end_io_func,
    +			       void *end_io_data,
    +			       unsigned long gfp);
    +
    +The cookie argument must specify a data file cookie, the page specified should
    +contain the data to be written (and is also used to specify the page number),
    +and the gfp argument is used to control how any memory allocations made are
    +satisfied.
    +
    +If the cookie indicates the inode is not cached then:
    +
    + (1) The function will return -ENOBUFS.
    +
    +Else if there's a block allocated on disc to hold this page:
    +
    + (1) The function will submit a request to write the data to the disc directly
    +     from the page specified.
    +
    + (2) The function will return 0.
    +
    + (3) When the write is complete:
    +
    +     (a) Any associated validity journal entry will be cleared (the block now
    +	 contains valid data as far as FS-Cache is concerned).
    +
    +     (b) end_io_func() will be invoked with:
    +
    +	 (*) The netfs data supplied when the cookie was created.
    +
    +	 (*) The page descriptor.
    +
    +	 (*) The data passed to the above function.
    +
    +	 (*) An argument that's 0 on success or negative for an error.
    +
    +	 If an error happens, it can be assumed that the page has been
    +	 discarded from the cache.
    +
    +
    +==============
    +PAGE UNCACHING
    +==============
    +
    +To uncache a page, this function should be called:
    +
    +	void fscache_uncache_page(struct fscache_cookie *cookie,
    +				  struct page *page);
    +
    +This detaches the page specified from the data file indicated by the cookie and
    +unbinds it from the underlying block.
    +
    +Note that pages can't be explicitly detached from a data file. The whole
    +data file must be retired (see the relinquish cookie function below).
    +
    +Furthermore, note that this does not cancel the asynchronous read or write
    +operation started by the read/alloc and write functions.
    +
    +
    +==========================
    +INDEX AND DATA FILE UPDATE
    +==========================
    +
    +To request an update of the index data for an index or data file, the following
    +function should be called:
    +
    +	void fscache_update_cookie(struct fscache_cookie *cookie);
    +
    +This function will refer back to the netfs_data pointer stored in the cookie by
    +the acquisition function to obtain the data to write into each revised index
    +entry. The update method in the parent index definition will be called to
    +transfer the data.
    +
    +
    +==================================
    +INDEX AND DATA FILE UNREGISTRATION
    +==================================
    +
    +To get rid of a cookie, this function should be called:
    +
    +	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
    +				       int retire);
    +
    +If retire is non-zero, then the index or file will be marked for recycling, and
    +all copies of it will be removed from all active caches in which it is present.
    +
    +If retire is zero, then the inode may be available again next time the
    +acquisition function is called.
    +
    +One very important note - relinquish must NOT be called unless all "child"
    +indexes, files and pages have been relinquished first.
    +
    +
    +=====================
    +PAGE TOKEN MANAGEMENT
    +=====================
    +
    +As previously mentioned, the netfs must keep a token associated with each page
    +currently actively backed by the cache. This is used by FS-Cache to go from a
    +page to the internal representation of the underlying block and back again. It
    +is particularly important for managing the withdrawal of a cache whilst it is
    +in active service (eg: it got unmounted).
    +
    +The token is this:
    +
    +	struct fscache_page {
    +		...
    +	};
    +
    +Note that all fields are for internal FS-Cache use only.
    +
    +The token only needs to be allocated when FS-Cache asks for it. This it will do
    +by calling the get_page_cookie() method in the netfs definition ops table. Once
    +allocated, the same token should be presented every time the method is called
    +again for a particular page.
    +
    +The token should be retained by the netfs, and should be deleted only after the
    +page has been uncached.
    +
    +One way to achieve this is to attach the token to page->private (and set the
    +PG_private bit on the page) once allocated. Shortcut routines are provided by
    +FS-Cache to do this. Firstly, to retrieve if present and allocate if not:
    +
    +	struct fscache_page *fscache_page_get_private(struct page *page,
    +						      unsigned gfp);
    +
    +Secondly to retrieve if present and BUG if not:
    +
    +	static inline
    +	struct fscache_page *fscache_page_grab_private(struct page *page);
    +
    +To clean up the tokens, the netfs inode hosting the page should be provided
    +with address space operations that circumvent the buffer-head operations for a
    +page. For instance:
    +
    +	struct address_space_operations afs_fs_aops = {
    +		...
    +		.sync_page	= block_sync_page,
    +		.set_page_dirty	= __set_page_dirty_nobuffers,
    +		.releasepage	= afs_file_releasepage,
    +		.invalidatepage	= afs_file_invalidatepage,
    +	};
    +
    +	static int afs_file_invalidatepage(struct page *page,
    +					   unsigned long offset)
    +	{
    +		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
    +		int ret = 1;
    +
    +		BUG_ON(!PageLocked(page));
    +		if (!PagePrivate(page))
    +			return 1;
    +		fscache_uncache_page(vnode->cache, page);
    +		if (offset != 0)
    +			return 1;
    +		BUG_ON(!PageLocked(page));
    +		if (PageWriteback(page))
    +			return 0;
    +		return page->mapping->a_ops->releasepage(page, 0);
    +	}
    +
    +	static int afs_file_releasepage(struct page *page, int gfp_flags)
    +	{
    +		struct fscache_page *token;
    +		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
    +
    +		if (PagePrivate(page)) {
    +			fscache_uncache_page(vnode->cache, page);
    +			token = (struct fscache_page *) page->private;
    +			page->private = 0;
    +			ClearPagePrivate(page);
    +			if (token)
    +				kfree(token);
    +		}
    +		return 0;
    +	}
    +
    +
    +================================
    +INDEX AND DATA FILE INVALIDATION
    +================================
    +
    +There is no direct way to invalidate an index subtree or a data file. To do
    +this, the caller should relinquish and retire the cookie they have, and then
    +acquire a new one.
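    A note on the PAGE ALLOC/READ/WRITE section of netfs-api.txt above: the
    three documented results of fscache_read_or_alloc_page() suggest a simple
    dispatch in the netfs's page-reading path. The following is a userspace
    sketch of that dispatch, not kernel code. The model_* names are invented
    for illustration, and treating any other error code as "no cache
    available" is an assumption that the documentation does not state.

```c
/* Userspace model only: it classifies the return value of a call such as
 * fscache_read_or_alloc_page() but does not call into FS-Cache. */
#include <assert.h>
#include <errno.h>

/* Hypothetical next steps for a netfs page-reading path. */
enum model_action {
	MODEL_WAIT_FOR_CACHE,	/* read submitted; end_io_func() will be called */
	MODEL_FETCH_AND_STORE,	/* fetch from the server, then write to the cache */
	MODEL_FETCH_ONLY,	/* not cached; fetch from the server only */
};

static enum model_action model_readpage_action(int ret)
{
	switch (ret) {
	case 0:
		/* A copy was on disc and a read into the page was queued. */
		return MODEL_WAIT_FOR_CACHE;
	case -ENODATA:
		/* No copy on disc; a block may have been allocated, so the
		 * page can be written back once it has been downloaded. */
		return MODEL_FETCH_AND_STORE;
	case -ENOBUFS:
		/* The cookie indicates that the inode is not cached. */
		return MODEL_FETCH_ONLY;
	default:
		/* Assumption: carry on without the cache on other errors. */
		return MODEL_FETCH_ONLY;
	}
}
```

    A netfs following this sketch would call fscache_write_page() only in the
    MODEL_FETCH_AND_STORE case, after the download has filled the page.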