Re: Linux 2.6.30-rc8 [also: VIA Support]



On Thu June 4 2009, Harald Welte wrote:
Dear Linus and others,

On Thu, Jun 04, 2009 at 09:13:15AM -0700, Linus Torvalds wrote:

There have been reports of hangs on various VIA C7 machines going back
a year now. The version of the kernel doesn't seem to matter, but the
version of glibc does. Unfortunately there hasn't been much progress
in getting to the bottom of it.

See here (and other linked reports):
http://bugs.gentoo.org/show_bug.cgi?id=228263

Hmm. That looks like a CPU problem, but hey, it might be that the glibc
version thing is just coincidence, and just changes timings or whatever,
and the problem is in the chipsets.

So at least from that particular report it smells very much
non-kernel-related.

That said, even if it isn't kernel-related, it might be fixable with some
kernel patch that changes the setup of the CPU/chipset. But we'd need VIA
to help with anything like that.

So far, inside VIA there is no well-known issue/bug about such hangs / locks at
all.

I have seen a number (probably between 5 and 10) of sporadic reports from a
number of people on a variety of systems. Some from actual commercial vendors
of VIA+Linux based appliances, and some from the wider community of end users.
So far, to the best of my knowledge, none of those issues has been narrowed
down to a sufficiently easy to reproduce test case. Also, none of the bug
reporters has so far been able to reproduce the problem on a genuine VIA
mainboard, i.e. it could be issues introduced by the actual board hardware or
how the specific BIOS initializes the low-level hardware.

Especially when SMI/SMM based debugging no longer works (i.e. something that
appears to be a bus lockup), the actual bug needs to be reproduced on a
reference board that can be hooked up to a logic/protocol analyzer.

On the other hand, VIA's CPU division (CentaurLabs) is performing extensive
testing on their CPUs with a large codebase of x86 code, AFAIK based on more
than 40 operating systems. Also, there are large quantities of VIA CPU+chipset
systems that run without any problem, especially in 24/7 embedded x86 workloads
on Linux...

I'm more than determined to help resolve those sporadic Linux lock-up
problems. It feels like there is some problem out there, given the fact that
there are a number of independent reporters who talk about some kind of hard
system hang without an oops that even prevents the NMI watchdog from kicking in.

However, unless we can somehow narrow down at least one of those reports into
something that is easier to reproduce, and which can actually be reproduced on
a VIA board, it will be hard to make progress. Triggering in 1-4 hours is
already very good; I have reports where 1 of 30 systems exposes a lock-up once
within 5 days of continuous full application workload.

Sure, third party BIOS/board vendors selling products that randomly produce
locks are obviously also not a particularly great advertisement for VIA...
but debugging on such a board is much more difficult due to the lack of access
to BIOS sources, schematics and hardware debugging interfaces.

In any case, if somebody can ship me a system that exposes one of those
lock-ups, together with a pre-installed test case that exposes the problem
within let's say less than one day, plus the full kernel sources used in
that particular system: I'm happy to spend time to investigate the issue,
try to run the same test case on a VIA board, etc.


I am about at my wits' end with this Everex product -

Give me a couple more weeks at the problem, and if I haven't solved it,
I'll give you this machine if you promise to update LKML with any fix.

Mike
Any additional help is much appreciated.

Regards,




