Re: [PATCH] panic.c: export panic_on_oops



Here's what I get for traveling and not reading mail for two days...

On Mon, 2009-10-12 at 11:45 -0700, Linus Torvalds wrote:

On Mon, 12 Oct 2009, Ingo Molnar wrote:

* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

Perhaps oops_enter() is a good place to mark the start of the log, and
flush it within oops_exit().

Simplest would be to do the last 2K in oops_exit()? That gives the oops,
and the history leading up to it. Since the blocking is 2K, the extra
log output is for free.

I agree, except I don't think it should be fixed to 2k.

We should just dump as much as is "appropriate" for the dump device. It
might be the last 2kB, it might be 8kB, it might be 64kB. We don't know,
we don't care. The device may have its own per-device limits. Any extra
data we get from before the oops is just gravy (often there might be
interesting warning messages leading up to the dump), and if the oops is
too big for the dump device, it's not something we can do anything about
anyway.

I'm working on something different but related - also using the ring
buffer and just getting as much from its tail as I am able to conserve.

The approach in my case is to write a 2D bar code to the screen and have
the user take a picture / submit the picture to kerneloops.org where it
then gets decoded back into the oops message. This is intended for
situations where you don't have access to other storage / network - or
where a picture of the screen is actually the easiest way to get to the
information.

Right now the project is slightly stalled as I am running into an
unexpected project on the decode side, but I'd love to make sure that
the core changes I'm doing integrate cleanly with this project...


So the logic should literally be something like this:

- kernel/printk.c:

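	void dump_kmsg(void)
	{
		/* log_end is a free-running count of bytes ever logged */
		unsigned long len = ACCESS_ONCE(log_end);
		struct dump_device *dump;
		const char *s1, *s2;
		unsigned long l1, l2;

		/* Not wrapped yet: everything is in one section */
		s1 = "";
		l1 = 0;
		s2 = log_buf;
		l2 = len;

		/* Have we rotated around the circular buffer? */
		if (len > log_buf_len) {
			unsigned long pos = (len & LOG_BUF_MASK);

			/* Older part: from the write position to the end */
			s1 = log_buf + pos;
			l1 = log_buf_len - pos;

			/* Newer part: from the start up to the write position */
			s2 = log_buf;
			l2 = pos;
		}

		/* Takes a pointer to the list head (dump_list assumed to be a struct list_head) */
		list_for_each_entry(dump, &dump_list, list) {
			dump->fn(s1, l1, s2, l2);
		}
	}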

ie we just always give the whole buffer (as two "sections", since it's a
circular buffer) to the dumper, and then the dumper can decide how much of
those buffers it is able to dump sanely.
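
To make the "dumper decides how much to take" part concrete, here is a minimal user-space sketch in C. It is not code from this thread or from the kernel: split_log(), dump_tail() and struct log_view are hypothetical names made up for illustration. It assumes the buffer length is a power of two and that the dumper simply wants the newest `limit` bytes, taken from the end of the two sections.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a circular log as two sections, oldest first. */
struct log_view {
	const char *s1;	/* older section (empty if the log has not wrapped) */
	size_t l1;
	const char *s2;	/* newer section */
	size_t l2;
};

/*
 * Split a circular buffer the same way dump_kmsg() above does.
 * log_end is the total number of bytes ever written; buf_len is
 * assumed to be a power of two so that masking gives the position.
 */
static struct log_view split_log(const char *buf, size_t buf_len,
				 size_t log_end)
{
	struct log_view v = { "", 0, buf, log_end };

	assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);

	if (log_end > buf_len) {
		size_t pos = log_end & (buf_len - 1);

		v.s1 = buf + pos;
		v.l1 = buf_len - pos;
		v.s2 = buf;
		v.l2 = pos;
	}
	return v;
}

/*
 * Example dumper policy: copy at most `limit` bytes from the END of
 * the log into `out`, preferring the newest data. Returns the number
 * of bytes written. `out` must have room for `limit` bytes.
 */
static size_t dump_tail(const char *s1, size_t l1,
			const char *s2, size_t l2,
			char *out, size_t limit)
{
	size_t total = l1 + l2;
	size_t want = total < limit ? total : limit;
	size_t from_s2 = want < l2 ? want : l2;	/* newest bytes first */
	size_t from_s1 = want - from_s2;	/* remainder from older part */

	memcpy(out, s1 + (l1 - from_s1), from_s1);
	memcpy(out + from_s1, s2 + (l2 - from_s2), from_s2);
	return want;
}
```

With an 8-byte buffer that has seen 11 bytes "ABCDEFGHIJK", the buffer holds "IJKDEFGH"; the split yields "DEFGH" and "IJK", and a dumper limited to 4 bytes would keep "HIJK". A device with a 2kB or 64kB limit would just pass a different `limit`.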

That's pretty close to what I do - only in my case the information then
doesn't get written to a device but instead gets compressed, encoded and
displayed on the framebuffer...

/D

--
Dirk Hohndel
Intel Open Source Technology Center



