Re: CPU boot problem on 2.6.0-test3-bk8

From: Dave Hansen (haveblue_at_us.ibm.com)
Date: 08/21/03

    To: Andrew Theurer <habanero@us.ibm.com>
    Date:	20 Aug 2003 20:42:09 -0700
    
    
    

    On Wed, 2003-08-20 at 18:13, Andrew Theurer wrote:
    > On Wednesday 20 August 2003 20:02, Dave Hansen wrote:
    > > On Wed, 2003-08-20 at 14:58, Andrew Theurer wrote:
    > > > Maybe this is already known, but just in case:
    > > > I cannot fully boot on an x440 system with 2.6.0-test3-bk8. The kernel
    > > > tries to boot more than the 16 logical processors, and after failing (no
    > > > response) on cpus 16, 17, and 18, it still thinks it has 19 cpus total.
    > > > It finally gets stuck at "checking TSC synchronization across 19 CPUs:"
    > > >
    > > > Attached is the boot log. Any ideas? I'll try -test3-bk7 next
    > >
    > > Can you see if it works without HT on? Did it work on plain -test3?
    > > My 16-way x440 with no HT boots fine on test3.
    >
    > I'll try without HT to see what happens. FWIW, it boots fine with HT if I set
    > maxcpus=16. I am wondering if (apicid == BAD_APIC) test is not working in
    > smp_boot_cpus.

    Hmmm. This is looking like fallout from the massive wli-bomb. Here's
    the loop that controls CPU booting, before and after cpumask_t:

    - for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++)
    + for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++)
              apicid = cpu_present_to_apicid(bit);

    "kicked" only gets incremented for CPUs that were successfully booted,
    so it doesn't help terminate the loop much. MAX_APICS is 256 on Summit,
    which is *MUCH* bigger than BITS_PER_LONG.
    cpu_present_to_apicid() reads from cpu_2_logical_apicid[NR_CPUS], but
    the loop now calls it with indices up to MAX_APICS, which is bigger
    than NR_CPUS. Overflow. Bang. garbage != BAD_APICID :)

    Attached patch fixes it. We sure do have a lot of duplicate code in the
    subarches. <sigh>

    -- 
    Dave Hansen
    haveblue@us.ibm.com
    
    

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/



