Re: CPU boot problem on 2.6.0-test3-bk8

From: Andrew Theurer (habanero_at_us.ibm.com)
Date: 08/21/03

    To: Dave Hansen <haveblue@us.ibm.com>
    Date:	Thu, 21 Aug 2003 09:10:07 -0500
    
    
    

    On Wednesday 20 August 2003 22:42, Dave Hansen wrote:
    > On Wed, 2003-08-20 at 18:13, Andrew Theurer wrote:
    > > On Wednesday 20 August 2003 20:02, Dave Hansen wrote:
    > > > On Wed, 2003-08-20 at 14:58, Andrew Theurer wrote:
    > > > > Maybe this is already known, but just in case:
    > > > > I cannot fully boot on an x440 system with 2.6.0-test3-bk8. The
    > > > > kernel tries to boot more than the 16 logical processors, and after
    > > > > failing (no response) on cpus 16, 17, and 18, it still thinks it has
    > > > > 19 cpus total. It finally gets stuck at "checking TSC synchronization
    > > > > across 19 CPUs:"
    > > > >
    > > > > Attached is the boot log. Any ideas? I'll try -test3-bk7 next
    > > >
    > > > Can you see if it works without HT on? Did it work on plain -test3?
    > > > My 16-way x440 with no HT boots fine on test3.
    > >
    > > I'll try without HT to see what happens. FWIW, it boots fine with HT if
    > > I set maxcpus=16. I am wondering if the (apicid == BAD_APICID) test is
    > > not working in smp_boot_cpus.
    >
    > Hmmm. This is looking like fallout from the massive wli-bomb. Here's
    > the loop that controls the cpu booting, before and after cpumask_t:
    >
    > -       for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++)
    > +       for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++)
    >                 apicid = cpu_present_to_apicid(bit);
    >
    > "kicked" only gets incremented for CPUs that were successfully booted,
    > so it doesn't help terminate the loop much. MAX_APICS is 256 on summit,
    > which is *MUCH* bigger than BITS_PER_LONG.
    > cpu_2_logical_apicid[NR_CPUS], which is referenced from
    > cpu_present_to_apicid(), is getting indexed up to MAX_APICS, which is
    > bigger than NR_CPUS. Overflow. Bang. garbage != BAD_APICID :)
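
The overflow described above can be sketched in user space. This is only a simplified model, not the kernel code: NR_CPUS = 32 and BAD_APICID = 0xFF are assumed values for illustration, and the helpers cpu_present_to_apicid_checked() and count_out_of_range_lookups() are hypothetical names that do not exist in the kernel. MAX_APICS = 256 is the summit value quoted above.

```c
/* Simplified user-space model of the lookup table; not kernel code. */
#define NR_CPUS     32      /* assumed for illustration */
#define MAX_APICS   256     /* value quoted in the thread for summit */
#define BAD_APICID  0xFF    /* assumed sentinel value */

/* The table has only NR_CPUS entries. */
static unsigned char cpu_2_logical_apicid[NR_CPUS];

/*
 * Hypothetical bounds-checked lookup: any index outside the table
 * yields BAD_APICID instead of reading whatever memory follows it.
 */
static int cpu_present_to_apicid_checked(int cpu)
{
    if (cpu < 0 || cpu >= NR_CPUS)
        return BAD_APICID;
    return cpu_2_logical_apicid[cpu];
}

/*
 * Count how many indices in [0, loop_limit) fall past the end of the
 * table, i.e. how many unchecked lookups would read garbage.
 */
static int count_out_of_range_lookups(int loop_limit)
{
    int bit, n = 0;

    for (bit = 0; bit < loop_limit; bit++)
        if (bit >= NR_CPUS)
            n++;
    return n;
}
```

Under these assumptions, a loop bounded by MAX_APICS makes 224 lookups past the end of the table, while a loop bounded by 32 makes none.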

    Still looks like we have a problem (see attached boot log). Maybe we should
    change that for loop to:

    for (bit = 0; kicked < num_processors && bit < BITS_PER_LONG; bit++)

    So we only loop for the actual number of processors found in mpparse.c?
    This seems to work for me.
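
A rough user-space sketch of why bounding the loop by num_processors helps. This is a model under stated assumptions, not the real smp_boot_cpus(): it assumes kicked starts at 1 to account for the boot CPU, that every attempted CPU responds, and that BITS_PER_LONG is 32 and BAD_APICID is 0xFF. simulate_boot_loop() and struct boot_result are hypothetical names used only for illustration.

```c
/* Simplified model of the boot loop; not kernel code. */
#define BITS_PER_LONG 32    /* assumed: 32-bit x86 */
#define BAD_APICID    0xFF  /* assumed sentinel value */

struct boot_result {
    int kicked;     /* CPUs counted as started, including the boot CPU */
    int attempts;   /* secondary CPUs the loop tried to boot */
};

/*
 * Walk the APIC ID table the way the proposed loop would, stopping as
 * soon as `limit` CPUs have been kicked. Passing num_processors as the
 * limit models the proposal; passing NR_CPUS models the old bound.
 */
static struct boot_result simulate_boot_loop(const unsigned char *apicid_of,
                                             int table_len, int limit,
                                             int boot_cpu_apicid)
{
    struct boot_result r = { 1, 0 };    /* boot CPU is already running */
    int bit;

    for (bit = 0; r.kicked < limit && bit < BITS_PER_LONG; bit++) {
        int apicid;

        if (bit >= table_len)           /* never read past the table */
            break;
        apicid = apicid_of[bit];
        if (apicid == BAD_APICID || apicid == boot_cpu_apicid)
            continue;
        r.attempts++;
        r.kicked++;                     /* model: every attempt succeeds */
    }
    return r;
}
```

With a 32-entry table holding 16 valid APIC IDs followed by a few garbage entries, a limit of 16 ends the loop before any garbage entry is examined, whereas a limit of 32 lets the model try to boot the garbage entries too.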

    -Andrew Theurer

    
    
    




