buffer and page cache overview
From: sean larsson (infamous42md_at_ERASEMEhotpop.com)
Date: 07/29/04
Date: Thu, 29 Jul 2004 06:10:43 GMT
i'm studying the buffer/page cache. i've read a decent chunk of the code and i understand some of it; with more study i'll probably understand more, but for now i'd like to make sure i have some concepts correct. so i'm going to just blab for a few paragraphs, and if someone could check that i'm blabbing correctly i'd appreciate it.
buffer head structures are used to map blocks on disk to memory. the buffer head structures all reside in a slab cache, while the actual data blocks reside in pages allocated from alloc_pages() or a friend. a page contains one or more data blocks. these blocks all belong to the same device, and furthermore they are sequential on disk. so the lowest-numbered block sits at the lowest address in the page, and higher-numbered blocks sit at higher addresses in the page.
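a minimal user-space sketch of the layout described above, not kernel code. struct sketch_bh is a hypothetical, cut-down stand-in for the kernel's struct buffer_head, the 4096-byte page size is an assumption, and the consecutive block numbering is simply the assumption stated in the paragraph.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical, simplified stand-in for the kernel's struct buffer_head. */
struct sketch_bh {
    unsigned long blocknr;  /* block number on the device */
    unsigned int  size;     /* block size in bytes */
    char         *data;     /* points into the page that holds the block */
};

/*
 * Describe every block held by one page, assuming the blocks are numbered
 * consecutively starting at first_block. Returns the number of entries
 * written to bhs (at most max).
 */
unsigned int sketch_map_page(char *page, unsigned long first_block,
                             unsigned int block_size,
                             struct sketch_bh *bhs, unsigned int max)
{
    unsigned int n, i;

    /* the block size must evenly divide the page size */
    assert(block_size != 0 && SKETCH_PAGE_SIZE % block_size == 0);

    n = SKETCH_PAGE_SIZE / block_size;
    if (n > max)
        n = max;

    for (i = 0; i < n; i++) {
        bhs[i].blocknr = first_block + i;             /* higher block ...   */
        bhs[i].size    = block_size;
        bhs[i].data    = page + (size_t)i * block_size; /* ... higher address */
    }
    return n;
}
```

with 1024-byte blocks a page gets four entries, and entry i points i * 1024 bytes into the page.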
the buffer cache hash table is not used when it comes to regular file i/o. it is only used by the filesystem code when it needs to read inodes/superblocks/etc. in those cases, the FS code uses the bread() function to get the blocks it needs. for file i/o, the hash table has no significance. instead, when accessing a file we go through the page cache. the page cache is simply an array of pointers to struct page (a hash table). each index in the array is a bucket in the hash table, with the pages in a bucket linked through the page->next_hash pointer. pages are hashed using an index, which is the current offset in the file shifted right by PAGE_CACHE_SHIFT (log2 of the page size), giving you the number of the page the block belongs to. in addition to the index, the address_space pointer is also used, because obviously many different files may have the same current offset, so the index alone wouldn't make a very good hash.
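the lookup described above can be sketched in user-space C roughly as follows. this is an illustration only: every sketch_* name is hypothetical, the hash function is a simple mix made up for the example (it is not claimed to be the kernel's), and 4096-byte pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumes 4096-byte pages */
#define SKETCH_HASH_BITS  6
#define SKETCH_HASH_SIZE  (1u << SKETCH_HASH_BITS)

/* Hypothetical stand-in for struct address_space (one per file). */
struct sketch_mapping { int unused; };

/* Hypothetical stand-in for struct page, reduced to the hashing fields. */
struct sketch_page {
    struct sketch_mapping *mapping;   /* which file the page belongs to */
    unsigned long          index;     /* page number within that file   */
    struct sketch_page    *next_hash; /* next page in the same bucket   */
};

/* The "page cache": an array of bucket heads. */
static struct sketch_page *sketch_hash_table[SKETCH_HASH_SIZE];

/* File offset -> page index: drop the offset-within-page bits. */
unsigned long sketch_offset_to_index(unsigned long long pos)
{
    return (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
}

/* Mix the mapping pointer with the index so that equal offsets in
 * different files tend to land in different buckets. */
unsigned int sketch_hashfn(const struct sketch_mapping *m, unsigned long index)
{
    uintptr_t key = (uintptr_t)m / sizeof(struct sketch_mapping) + index;
    return (unsigned int)((key + (key >> SKETCH_HASH_BITS))
                          & (SKETCH_HASH_SIZE - 1));
}

/* Link a page at the head of its bucket. */
void sketch_add_page(struct sketch_page *p)
{
    struct sketch_page **bucket;

    assert(p != NULL);
    bucket = &sketch_hash_table[sketch_hashfn(p->mapping, p->index)];
    p->next_hash = *bucket;
    *bucket = p;
}

/* Walk the bucket; a hit needs both the mapping and the index to match. */
struct sketch_page *sketch_find_page(const struct sketch_mapping *m,
                                     unsigned long index)
{
    struct sketch_page *p = sketch_hash_table[sketch_hashfn(m, index)];

    while (p != NULL && (p->mapping != m || p->index != index))
        p = p->next_hash;
    return p;
}
```

two files that both cache the page at offset 8192 get two separate entries with index 2, and a lookup only returns the one whose mapping matches.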
when reading data from a file, the page that would contain that data is located. if that page exists and contains valid data for that block, then we read ahead some more and return the data to the user. if the page doesn't exist in the cache, or has invalid data, then a new page is allocated, and the ENTIRE page is filled with data blocks via the readpage() method in address_space_operations. so even if you read only the LAST block on the page, all of the previous blocks on the page will be read in anyhow by readpage().
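a toy user-space model of the read path as described above, to make the "whole page gets filled" point concrete. all sk_* names are hypothetical, the in-memory sk_disk array stands in for the device, page and block sizes are assumptions, and read-ahead is left out entirely.

```c
#include <assert.h>
#include <string.h>

#define SK_PAGE_SIZE  4096u   /* assumed page size  */
#define SK_BLOCK_SIZE 1024u   /* assumed block size */
#define SK_NPAGES     4u      /* size of the pretend file, in pages */

/* Hypothetical cached page: a validity flag plus the data. */
struct sk_page {
    int           uptodate;
    unsigned char data[SK_PAGE_SIZE];
};

static unsigned char  sk_disk[SK_NPAGES * SK_PAGE_SIZE]; /* pretend device */
static struct sk_page sk_cache[SK_NPAGES];               /* pretend cache  */
static unsigned int   sk_blocks_read;                    /* i/o counter    */

/* Stand-in for readpage(): fills the ENTIRE page, one block at a time. */
void sk_readpage(unsigned long index)
{
    unsigned int off;

    assert(index < SK_NPAGES);
    for (off = 0; off < SK_PAGE_SIZE; off += SK_BLOCK_SIZE) {
        memcpy(sk_cache[index].data + off,
               sk_disk + index * SK_PAGE_SIZE + off, SK_BLOCK_SIZE);
        sk_blocks_read++;
    }
    sk_cache[index].uptodate = 1;
}

/* Read one byte at file offset pos through the cache. */
unsigned char sk_read_byte(unsigned long pos)
{
    unsigned long index = pos / SK_PAGE_SIZE;

    assert(index < SK_NPAGES);
    if (!sk_cache[index].uptodate)   /* miss, or page holds invalid data */
        sk_readpage(index);
    return sk_cache[index].data[pos % SK_PAGE_SIZE];
}
```

reading the byte at offset 4095 (the last block of page 0) bumps sk_blocks_read by four, and a later read anywhere else in page 0 does no further i/o.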
when writing data, the same procedure is followed to locate a valid page or allocate a new one. after this, before data can be copied from the user, the prepare_write() method is called. in this func, if the file is being extended past EOF, new blocks are allocated by calling the filesystem's get_block() function. also, if the region that we want to write does not start or end on a block boundary, then any partially overwritten blocks first have to be read into the page cache before we can write to them. the reason is that we can't have half of the block contain the data we write and leave the other half with some random data; the rest of that block on disk needs to be read into the cache. after this func, we copy the data from the user into place. after this, the commit_write() func of the filesystem is called. this func marks the buffers uptodate and dirty, and refiles them if they were not already dirty. if any of the buffers were not already dirty, then we have to call the balance_dirty() function.
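the "partial blocks must be read first" rule above can be sketched as a small user-space function. this is only an illustration of the rule as the paragraph states it: sk_blocks_to_read() and its bitmask convention are hypothetical, a 4096-byte page is assumed, and block allocation past EOF is ignored.

```c
#include <assert.h>

#define SK_PAGE_SIZE 4096u   /* assumed page size */

/*
 * For a write covering bytes [from, to) of one page, return a bitmask of
 * the blocks that must be read from disk before the copy: bit i is set if
 * block i is only partly covered by the write and is not already valid in
 * the cache (bit i of uptodate_mask clear).
 */
unsigned int sk_blocks_to_read(unsigned int from, unsigned int to,
                               unsigned int block_size,
                               unsigned int uptodate_mask)
{
    unsigned int mask = 0, i = 0, start;

    assert(block_size != 0 && SK_PAGE_SIZE % block_size == 0);
    assert(SK_PAGE_SIZE / block_size <= 32);   /* must fit in the mask */
    assert(from < to && to <= SK_PAGE_SIZE);

    for (start = 0; start < SK_PAGE_SIZE; start += block_size, i++) {
        unsigned int end = start + block_size;

        if (end <= from || start >= to)
            continue;               /* the write does not touch this block */
        if (uptodate_mask & (1u << i))
            continue;               /* cached copy is already valid        */
        if (start < from || end > to)
            mask |= 1u << i;        /* partly overwritten: read it first   */
    }
    return mask;
}
```

for example, a write to bytes 512..2559 with 1024-byte blocks partly covers blocks 0 and 2 and fully covers block 1, so only blocks 0 and 2 need reading; block 1 is overwritten completely and block 3 is untouched.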
this is something that confuses me now. balance_dirty() wakes up bdflush, and bdflush grabs the first X bhs off the dirty list and initiates a write to disk for them. what i don't get is: we have just added a number of bhs to the front of the dirty list in the previous step, and now we go and take those exact same buffers and write them to disk. i thought that instead we would delay the write of these freshly written buffers until some later point in time, i.e. when the file is sync'd or closed, or when we run low on buffers and need to write some back and reclaim space.
thanks much for any input.
-- -sean