Re: Incorrect comment in leave_mm()?

From: Andi Kleen (ak_at_suse.de)
Date: 03/31/05

  • Next message: Dmitry Torokhov: "Re: 2.6.12-rc1 swsusp broken [Was Re: swsusp not working for me on a PREEMPT 2.6.12-rc1 and 2.6.12-rc1-mm3 kernel]"
    Date:	Thu, 31 Mar 2005 21:20:53 +0200
    To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
    
    

    On Thu, Mar 31, 2005 at 05:14:30PM +0100, Keir Fraser wrote:
    >
    > Hi,
    >
    > I have a question regarding the per-cpu tlbstate logic that is used to
    > lazily switch to the swapper_pgdir when running a process with no
    > mm_struct of its own.
    >
    > There is a comment in arch/i386/kernel/smp.c:leave_mm() that
    > states 'We need to reload %cr3 since the page tables may be going away
    > from under us'. AFAICT this is not true -- the currently-running task
    > holds a reference on the active_mm until it is context-switched off
    > the CPU, at which point the reference is dropped in
    > sched.c:finish_task_switch(). Until that point the pgd cannot be
    > freed and so kernel mappings should remain valid to use.

    The PTE pages get freed earlier. On x86-64 also PMD/PGD.
    The code is needed to prevent a CPU from ever seeing any partially
    freed page tables. After the flush IPI happened the PTE pages
    get freed, and if you dont reload to init_mm the CPU has
    already freed page tables in its TLB.

    Modern x86 do an awful lot of prefetching behind your back, doing
    MMU lookup on adresses you never touched etc. and you have to
    be extremly careful to only ever have fully valid page tables
    in CR3 all the time.

    I had this code disabled on x86-64, but I have several open bugs
    because of this I thin (there were some other bugs in this logic too
    which I only recently fixed, but they were all 64bit specific).

    I can only advise against touching this! It is very easy to break
    and very subtle.

    > Although the corresponding function in arch/x86_64 doesn't include
    > this comment, Andi Kleen recently modified it to switch to the
    > swapper_pg_dir, instead of doing a simple __flush_tlb. Does this mean
    > that I am missing something, and the comment in arch/i386 is in fact
    > correct?

    It is correct.

    -Andi
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Dmitry Torokhov: "Re: 2.6.12-rc1 swsusp broken [Was Re: swsusp not working for me on a PREEMPT 2.6.12-rc1 and 2.6.12-rc1-mm3 kernel]"