Re: [00/17] Large Blocksize Support V3



Eric W. Biederman wrote:
clameter@xxxxxxx writes:


V2->V3
- More restructuring
- It actually works!
- Add XFS support
- Fix up UP support
- Work out the direct I/O issues
- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
back to constants. Disabled for 32bit and HIGHMEM configurations.
This also allows a gradual migration to the new page cache
inline functions. LARGE_BLOCKSIZE capabilities can be
added gradually and if there is a problem then we can disable
a subsystem.

V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.

This patchset modifies the Linux kernel so that larger block sizes than
page size can be supported. Larger block sizes are handled by using
compound pages of an arbitrary order for the page cache instead of
single pages with order 0.


Huh?

You seem to be mixing two very different concepts.

The page cache has no problems supporting things with a block
size larger then page size. Now the block device layer may not
have the code to do the scatter gather into small pages and it
may not handle buffer heads whose data is split between multiple
pages.

Yeah, this patch is not really large blocksize support (which we normally
think of as block size > page cache size).


But this is not a page cache issue.

And generally larger physical pages are a mistake to use.
Especially as it looks from some of the later comment you don't
date test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general about
using higher order pages. I think it is fundamentally the wrong approach
because of fragmentation and defragmentation costs (similarly to Linus's
take on page colouring).

I think starting with the assumption that we _want_ to use higher order
allocations, and then creating all this complexity around that is not a
good one, and if we start introducing things that _require_ significant
higher order allocations to function then it is a nasty thing for
robustness.


Is it common for hardware that supports large block sizes to not
support splitting those blocks apart during DMA? Unless it is common
the whole premise of this patchset seems broken.

I suspect what needs to be fixed is the page cache block device
interface so that we have helper functions that know how to stuff
a single block into several pages.

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...


That would make the choice of using larger order pages (essentially
increasing PAGE_SIZE) something that can be investigated in parallel.

I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Kernel 2.6 SMP very slow with ServerWorks LE Chipset
    ... in UP config everything is OK in SMP the system slows right down, I've been googling and recompiling my kernel for days looking for the problem option without success, please help. ... Below is some useful output, the hdpram -tT examples below are under the smp kernel, the same commands under a uniprocessor kernel yeld fairly consistent timed cache reads around ... Linux Plug and Play Support v0.97 Adam Belay ... Probing IDE interface ide0... ...
    (Debian-User)
  • RE: Windows 2003 Server throughput vs. Windows XP
    ... "A cache miss is an event that occurs when the CPU ... Microsoft CSS for a case support. ...
    (microsoft.public.win32.programmer.kernel)
  • 2.6.15-mm3 hangs during boot (raid related?)
    ... ACPI: Local APIC address 0xfee00000 ... Enabling unmasked SIMD FPU exception support... ... CPU: L2 cache: 512K ... usb usb1: Product: EHCI Host Controller ...
    (Linux-Kernel)
  • Re: [RFC 0/8] Variable Order Page Cache
    ... This patchset modifies the core VM so that higher order page cache pages ... The higher order page cache pages are compound pages ... it would be good if the VM would support I/O to higher order pages ...
    (Linux-Kernel)
  • RE: Cannot change any control id in vs2005 web project
    ... existing issue of the Visual Studio 2005 IDE. ... And each web project will has a own sub cache directory under it. ... Microsoft CSS for a hotfix against this issue. ... Microsoft MSDN Online Support Lead ...
    (microsoft.public.vsnet.general)