Re: Serial related oops



(trimmed tie-fei.zang from the CC, added by mistake)
On Mon, Feb 19, 2007 at 02:35:20PM +0000, Russell King wrote:
Neither did I, but by adding printk's throughout the function, we narrowed
the problem down to this part of the code. And removing it makes the problem
go away. We inserted 37 printk's in the function body, and Jose bisected
those until the problem went away.

Well, there's still little clue about why this is causing a NULL pointer
dereference. The only thing I can think of is that somehow performing
this test is causing a power glitch to your CPU, corrupting its registers,
which results in it doing a NULL pointer deref.
That may be the case, indeed.

Are you saying that the NULL pointer occurred while executing this code?
If not, where does the NULL pointer occur?
The thing is, the NULL pointer deref disappeared as soon as we
instrumented (printk'ed) the code. So it seems to be triggered by
the combination of check + timing + hardware.

No, it's only runtime because you can't tell which ports might be
affected, and you might have a mixture of ports which are affected
and those which aren't.
Hmm, ok. And what about a CONFIG_I_KNOW_MY_SERIAL_IS_BROKEN option?

Andrew's said no (in the thread you refer to) and suggested an
alternative, I've said no, how many more 'no's do you need to turn
you away from the wrong approach?
One is usually sufficient once I've understood :). I missed the module
option approach. Is it ok with you? If yes, I'll put up a patch to do
this.

PS: CCing Andrew and Zang Roy-r61911 as they seemed to discuss this in
http://lkml.org/lkml/2006/6/13/21

I don't see any reference to this problem there.

Sorry, my mistake, I got that mixed up with this one:
http://lkml.org/lkml/2006/12/26/63
"probing for UART_BUG_TXEN in 8250 driver leads to weird effects on some
ARM boards"

The "weird effects" were never quantified, so that's one of the reasons
I ignored that report (another being that I stopped being the serial
maintainer a while ago, and serial is now maintainerless.)

The problem appears to be reproducible on Jose's hardware within 2-3 days.
If you can think of other tests to perform, let us know.

Regards,
Frederik
