Re: CPU boot problem on 2.6.0-test3-bk8
From: Andrew Theurer (habanero_at_us.ibm.com)
Date: 08/21/03
- Previous message: Maciej W. Rozycki: "Re: Input issues - key down with no key up"
- In reply to: Dave Hansen: "Re: CPU boot problem on 2.6.0-test3-bk8"
- Next in thread: Dave Hansen: "Re: CPU boot problem on 2.6.0-test3-bk8"
- Reply: Dave Hansen: "Re: CPU boot problem on 2.6.0-test3-bk8"
To: Dave Hansen <haveblue@us.ibm.com>
Date: Thu, 21 Aug 2003 09:10:07 -0500
On Wednesday 20 August 2003 22:42, Dave Hansen wrote:
> On Wed, 2003-08-20 at 18:13, Andrew Theurer wrote:
> > On Wednesday 20 August 2003 20:02, Dave Hansen wrote:
> > > On Wed, 2003-08-20 at 14:58, Andrew Theurer wrote:
> > > > Maybe this is already known, but just in case:
> > > > I cannot fully boot on an x440 system with 2.6.0-test3-bk8. The
> > > > kernel tries to boot more than the 16 logical processors, and after
> > > > failing (no response) on cpus 16, 17, and 18, it still thinks it has
> > > > 19 cpus total. It finally gets stuck at "checking TSC synchronization
> > > > across 19 CPUs:"
> > > >
> > > > Attached is the boot log. Any ideas? I'll try -test3-bk7 next
> > >
> > > Can you see if it works without HT on? Did it work on plain -test3?
> > > My 16-way x440 with no HT boots fine on test3.
> >
> > I'll try without HT to see what happens. FWIW, it boots fine with HT if
> > I set maxcpus=16. I am wondering if the (apicid == BAD_APICID) test is
> > not working in smp_boot_cpus.
>
> Hmmm. This is looking like fallout from the massive wli-bomb. Here's
> the loop that controls the cpu booting, before and after cpumask_t:
>
> -       for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++)
> +       for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++)
>                 apicid = cpu_present_to_apicid(bit);
>
> "kicked" only gets incremented for CPUs that were successfully booted,
> so it doesn't help terminate the loop much. MAX_APICS is 256 on summit,
> which is *MUCH* bigger than BITS_PER_LONG.
> cpu_2_logical_apicid[NR_CPUS], which is referenced from
> cpu_present_to_apicid(), is getting indexed up to MAX_APICS, which is
> bigger than NR_CPUS. Overflow. Bang. garbage != BAD_APICID :)

Still looks like we have a problem (see attached boot log). Maybe we should
change that for loop to:

        for (bit = 0; kicked < num_processors && bit < BITS_PER_LONG; bit++)

so that we only loop over the actual number of processors found in
mpparse.c? This seems to work for me.
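
To make the loop-bound argument concrete, here is a rough userspace sketch
(not the kernel code) of the three loop variants discussed above. Everything
in it is an assumption for illustration: fake_mem[], fake_setup() and
count_kick_attempts() are hypothetical names, NR_CPUS=32 is an assumed config
value, the 0x5A filler just stands in for "whatever happens to sit past the
end of the table", and the boot CPU is not special-cased the way the real
smp_boot_cpus does it.

```c
#include <assert.h>

#define NR_CPUS        32    /* assumed config value, not taken from the boot log */
#define BITS_PER_LONG  32    /* i386 */
#define MAX_APICS      256   /* summit value quoted above */
#define BAD_APICID     0xFF

/*
 * Hypothetical model of memory: the first NR_CPUS bytes play the role of
 * the real cpu -> apicid table, the remaining bytes stand in for whatever
 * lies past the end of that table.
 */
static unsigned char fake_mem[MAX_APICS];

static void fake_setup(int num_processors)
{
	int i;

	assert(num_processors >= 0 && num_processors <= NR_CPUS);
	/* Present CPUs get a valid id, unused table slots get BAD_APICID. */
	for (i = 0; i < NR_CPUS; i++)
		fake_mem[i] = (i < num_processors) ? (unsigned char)i : BAD_APICID;
	/* Past the table: arbitrary garbage that is not BAD_APICID. */
	for (i = NR_CPUS; i < MAX_APICS; i++)
		fake_mem[i] = 0x5A;
}

/*
 * Count how many boot attempts a loop of the form
 *   for (bit = 0; kicked < kicked_limit && bit < bit_limit; bit++)
 * makes. Only the first num_processors slots "respond" to the kick.
 */
int count_kick_attempts(int kicked_limit, int bit_limit, int num_processors)
{
	int bit, kicked = 0, attempts = 0;

	assert(bit_limit >= 0 && bit_limit <= MAX_APICS);
	fake_setup(num_processors);

	for (bit = 0; kicked < kicked_limit && bit < bit_limit; bit++) {
		unsigned char apicid = fake_mem[bit];

		if (apicid == BAD_APICID)
			continue;          /* empty slot, nothing to boot */
		attempts++;                /* we try to kick this "CPU" */
		if (bit < num_processors)
			kicked++;          /* only real CPUs respond */
	}
	return attempts;
}
```

With 16 processors in this model, count_kick_attempts(NR_CPUS, BITS_PER_LONG,
16) makes 16 attempts, the MAX_APICS bound makes 240 (16 real plus 224 on
garbage), and bounding kicked by the processor count, as in
count_kick_attempts(16, BITS_PER_LONG, 16), is back to 16 because the loop
stops as soon as every real CPU has been kicked.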
-Andrew Theurer
- text/plain attachment: 260test3bk8patch1
- text/x-diff attachment: patch-boot-cpu.260test3bk8