Re: [Ext2-devel] [RFC] [PATCH] Reducing average ext2 fsck time through fs-wide dirty bit



On Mar 24, 2006 13:52 -0700, Matthew Wilcox wrote:
> On Fri, Mar 24, 2006 at 12:28:02PM -0700, Andreas Dilger wrote:
> > Fix for this problem (inode is locked already):
> > - create a modified ext3_free_branches() to do tree walking and call a
> >   method instead of always calling ext3_free_data->ext3_clear_blocks
> > - walk inode {d,t,}indirect blocks in forward direction, count bitmaps and
> >   groups that will be modified (essentially NULL ext3_free_branches method)
> > - try to start a journal handle for this many blocks + 1 (inode) +
> >   1 (super) + quota + EXT3_RESERVE_TRANS_BLOCKS
> > - if journal handle is too large (journal_start() returns -ENOSPC) fall
> >   back to old zero-in-steps method (vast majority of cases will be OK
> >   because number of modified blocks is much fewer)
>
> Could we try a different fallback in this case? For example, attempt to
> truncate only half as much? Is this even allowed?

What you suggest IS essentially the fallback. The current code will start
truncating at the end and grow the truncation until it can't any longer.
In order to make this operation correct w.r.t. recovery, it HAS to
zero out the already-truncated blocks, because the first transaction
may complete and commit, while the second may not. The proposed new
behaviour is only acceptable because it ensures that the whole truncate
can be completed in a single transaction.
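
As a rough illustration of the counting pass in the quoted steps, here is a
minimal Python sketch, not ext3 code. The names truncate_credits() and
choose_strategy() are hypothetical, the constants assume 4kB blocks with
128MB groups, and quota_blocks=4 is an assumption chosen only because it
reproduces the 16518 figure in the estimate below. The list of block numbers
stands in for what a dry-run walk of the {d,t,}indirect tree would visit.

```python
# Illustrative sketch only: these names and constants are assumptions,
# not kernel identifiers.
BLOCKS_PER_GROUP = 32768   # 4kB blocks in a 128MB block group
DESCS_PER_BLOCK = 128      # group descriptors held by one 4kB block


def truncate_credits(block_numbers, quota_blocks=4, reserve=0):
    """Count the metadata blocks a single-transaction truncate would dirty.

    block_numbers: every block the truncate would free, as found by a
    dry-run walk of the indirect tree (the "NULL method" pass).
    quota_blocks and reserve are placeholders for the quota and
    EXT3_RESERVE_TRANS_BLOCKS terms; their values here are assumed.
    """
    groups = {b // BLOCKS_PER_GROUP for b in block_numbers}   # one bitmap each
    desc_blocks = {g // DESCS_PER_BLOCK for g in groups}      # descriptor blocks
    return len(groups) + len(desc_blocks) + 1 + 1 + quota_blocks + reserve


def choose_strategy(block_numbers, max_transaction_blocks):
    """Pick the single-transaction path when the handle would fit,
    otherwise fall back to the old zero-in-steps method."""
    needed = truncate_credits(block_numbers)
    if needed <= max_transaction_blocks:
        return "single-transaction"
    return "zero-in-steps"
```

In the real code the decision would come from journal_start() returning
-ENOSPC rather than from an explicit comparison; the comparison here only
models that outcome.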


For a rough estimate of the allowable size of a "new" truncate
transaction, worst case truncate dirties every group in the filesystem.
A 2TB filesystem has 16384 groups, maximum transaction size:

(16384 bitmaps + (16384 / 128) group desc + inode + super + quota)
= 16518

requiring a journal size of 4x that is about 260MB (default journal
size is 128MB these days for large filesystems). For the worst case 1
block/group this works out to a 64MB file, but in the vast majority of
cases we will have more than a single block per group, and could have
a full file truncate (up to 2TB file size) in the same (or smaller)
transaction size. Best case is about 125MB/group (i.e. per 4kB of journal
transaction size).

With the absolute minimum journal size we could always truncate files
up to 1MB without fallback, and files up to roughly 16GB (at 1/2 group
chunks per "extent") without fallback.
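
The arithmetic behind these estimates can be checked with a short Python
sketch. It assumes 4kB blocks, 128MB groups, 128 group descriptors per
block, a quota term of 4 blocks (inferred from the 16518 total, not stated
above), and a minimum journal of 1024 blocks (also an assumption; it is the
value that reproduces the 1MB and 16GB figures). Like the estimate itself,
the minimum-journal numbers ignore the few fixed inode/super/quota blocks.

```python
BLOCK_SIZE = 4096             # bytes per filesystem block
GROUP_BYTES = 128 * 2**20     # one block group covers 128MB at 4kB blocks
DESCS_PER_BLOCK = 128         # group descriptors per 4kB block
QUOTA_BLOCKS = 4              # assumption: inferred from the 16518 total
MIN_JOURNAL_BLOCKS = 1024     # assumption: minimum journal size in blocks


def worst_case_transaction(fs_bytes):
    """Blocks dirtied if a truncate touches every group in the filesystem."""
    groups = fs_bytes // GROUP_BYTES
    return groups + groups // DESCS_PER_BLOCK + 1 + 1 + QUOTA_BLOCKS


def journal_bytes_needed(transaction_blocks):
    """Journal must be about 4x the largest transaction."""
    return 4 * transaction_blocks * BLOCK_SIZE


# 2TB filesystem, worst case: one freed block in every group.
txn_2tb = worst_case_transaction(2 * 2**40)
journal_2tb = journal_bytes_needed(txn_2tb)
worst_file_2tb = (2 * 2**40 // GROUP_BYTES) * BLOCK_SIZE

# Minimum journal: largest transaction is a quarter of the journal.
min_txn = MIN_JOURNAL_BLOCKS // 4
always_ok_file = min_txn * BLOCK_SIZE             # one block per group
half_group_file = min_txn * (GROUP_BYTES // 2)    # half a group per bitmap

print(txn_2tb, journal_2tb // 2**20, worst_file_2tb // 2**20)
print(always_ok_file // 2**20, half_group_file // 2**30)
```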

The current code needs ~33 4kB blocks per 128MB of file size.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
