Re: O_DIRECT question



Bill Davidsen wrote:
Linus Torvalds wrote:

[]
But what O_DIRECT does right now is _not_ really sensible, and the
O_DIRECT propeller-heads seem to have some problem even admitting that
there _is_ a problem, because they don't care.

You say that as if it were a failing. Currently if you mix access via
O_DIRECT and non-DIRECT you can get unexpected results. You can screw
yourself, mangle your data, or have no problems at all if you avoid
trying to access the same bytes in multiple ways. There are lots of ways
to get or write stale data, not all involve O_DIRECT in any way, and the
people actually using O_DIRECT now are managing very well.

I don't regard it as a system failing that I am allowed to shoot myself
in the foot, it's one of the benefits of Linux over Windows. Using
O_DIRECT now is like being your own lawyer, room for both creativity and
serious error. But what's there appears portable, which is important as
well.

If I got it right (and please someone tell me if I *really* got it right!),
the problem is elsewhere.

Suppose you have a filesystem, not at all related to databases and stuff.
Your usual root filesystem, with your /etc/ /var and so on directories.

Some time ago you edited /etc/shadow, updating it by writing new file and
renaming it to proper place. So you have that old content of your shadow
file (now deleted) somewhere on the disk, but not accessible from the
filesystem.

Now, a bad guy deliberately tries to open some file on this filesystem, using
O_DIRECT flag, ftruncates() it to some huge size (or does seek+write), and
at the same time tries to use O_DIRECT read of the data.

Due to all the races etc, it is possible for him to read that old content of
/etc/shadow file you've deleted before.

I do have one thought, WRT reading uninitialized disk data. I would hope
that sparse files are handled right, and that when doing a write with
O_DIRECT the metadata is not updated until the write is done.

"hope that sparse files are handled right" is a high hope. Exactly because
this very place IS racy.

Again, *IF* I got it correctly.

/mjt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: /usr/bin/cp preserving holes ?
    ... >> supports sparse files, but I don't see that it has anything to do with NFS ... >> some differences in the block count (due to differences in filesystem ... Absent a decent interface to determine hole location, ...
    (comp.unix.solaris)
  • Re: cp: I do not understand ...
    ... but did you try using backup/restore with the source filesystem ... You could try to find the sparse files on the source ... on the new filesystem instead of copying them over. ...
    (comp.unix.aix)
  • Removal of sparse file settings
    ... Does any of You how to remove the 'Sparse file' settings on a jfs2 filesystem? ... We've got a jfs2 filesystem, the sap adm user, and the oracle adm user got 'unlimited' settings, like this: ... I can see that the filesystem is using 'sparse files', but how it got that setting, I don't know.. ...
    (comp.unix.aix)
  • Re: Incorrect file size?
    ... Filesystem Size Used Avail Capacity Mounted on ... /dev/ar0s1a on / (ufs, local) ... This is most likely due to the use of 'sparse files': ...
    (freebsd-stable)
  • Re: Interesting speed benchmarks
    ... I don't think that cpio or pax support ACLs or file ... I thought 'star' handled sparse files and all the extra magic? ... For backups I use vbackup & venti from plan9ports. ... filesystem namespace) ...
    (freebsd-current)