Re: Linux Kernel Dump Summit 2005

From: OBATA Noboru (noboru.obata.ar_at_hitachi.com)
Date: 10/12/05

  • Next message: com bio: "FTP over TLS/SSL (explicit)"
    Date:	Wed, 12 Oct 2005 17:30:43 +0900 (JST)
    To: hyoshiok@miraclelinux.com, akpm@osdl.org
    
    

    On Tue, 11 Oct 2005, Hiro Yoshioka wrote:
    >
    > The reasons are
    > 1) They have to maintain the dump tools and support their users.
    > Many users are still using 2.4 kernels so merging kdump into 2.6
    > kernel does not help them.
    > 2) Commercial Linux Distros (Red Hat/Suse/MIRACLE(Asianux)/Turbo etc) use
    > LKCD/diskdump/netdump etc.
    > Almost no users use a vanilla kernel so kdump does not have users yet.

    Agreed.

    I am testing (or tasting ;-) kdump myself, and find it really
    impressive and promising. Thank you all who have worked on.

    In term of users, however, the majority of commercial users
    still use 2.4 kernels of commercial Linux distributions. This
    is especially true for careful users who have large systems
    because switching to 2.6 kernels without regression is not an
    easy task. So merging kdump into the mainline kernel does not
    directly mean that these users start using it now.

    Rather, merging kdump has much meaning for commercial Linux
    distributors, who should be planning how and when to include
    kdump in their distros.

    > > Is that a correct impression? If so, what shortcoming(s) in kdump are
    > > causing people to be reluctant to use it?
    >
    > I think the way to go is the kdump however it may take time.

    Agreed.

    I'd say commercial users are not reluctant to use kdump, but
    they are just waiting for kdump-ready distros. So in turn, we
    still have some time left for improving kdump further before
    kdump-ready distros are shipped to users, and I would like to be
    involved in such improvement hereafter.

    Thinking about the requirements in enterprise systems,
    challenges of kdump will be:

      - Reliability
        + Hardware-related issues

      - Manageability
        + Easy configuration
        + Automated dump-capture and restart
        + Time and space for capturing dump
        + Handling two kernels

      - Flexibility
        + Hook points before booting the 2nd kernel

    My short impressions follow. I understand that kdump/kexec
    developers are already discussing and working on some issues
    above, and I am grateful if someone tell me about the current
    status, or point me to the past lkml threads.

    Reliability
    -----------

    In terms of reliability, hardware-related issues, such as a
    device reinitialization problem, an ongoing DMA problem, and
    possibly a pending interrupts problem, must be carefully
    resolved.

    Manageability
    -------------

    As for manageability, it is nice if a user can easily setup
    kdump just by writing DEVICE=/dev/sdc6 to one's
    /etc/sysconfig/kdump and start the kdump service, for example.
    It is also desirable that an action taken after capturing a dump
    (halt, reboot, or poweroff) is configurable. I believe these are
    userspace tasks.

    Time and space problem in capturing huge crash dump is raised
    already. The partial dump and dump compress technology must be
    explored.

    One of my worries is that the current kdump requires distinct
    two kernels (one for normal use, and one for capturing dumps) to
    work. And I'm not fully convinced whether a use of two kernels
    is the only solution or not. Well, I heard that this decision
    better solves the ongoing DMA problem (please correct me if
    other reasons are prominent), but from a pure management point
    of view handing one kernel is happier than two kernels.

    Flexibility
    -----------

    To minimize the downtime, a crashed kernel would want to
    communicate with clustering software/firmware to help it detect
    the failure quickly. This can be generalized by making
    appropriate hook points (or notifier lists) in kdump.

    Perhaps these hooks can be used to try reseting devices when
    reinitialization of devices in the 2nd kernel tends to fail.

    Sorry if I'm bringing up the already discussed issues again, but
    I believe that addressing above issues will help the commercial
    users in the future, and so I would like to discuss them again
    how these issues can be addressed.

    Regards,

    -- 
    OBATA Noboru (noboru.obata.ar@hitachi.com)
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    

  • Next message: com bio: "FTP over TLS/SSL (explicit)"

    Relevant Pages

    • Re: [patch] add kdump_after_notifier
      ... I want to use both kdb and kdump, ... The kexec on panic assumption is that the kernel is broken and we better not ... My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, ...
      (Linux-Kernel)
    • Re: oops in :snd_pcm_oss:resample_expand+0x19c/0x1f0
      ... Thanks for trying out kdump. ... For example, the small script you wrote for saving the dump, is already ... "service kdump start" and initrd will be generated and kdump kernel will ... Currently we can use gdb but only for linearly mapped region. ...
      (Linux-Kernel)
    • [PATCH] Documentation update for kdump
      ... We have updated much of the document to reflect the current state of kdump and reformatted the document for readability. ... -Kdump uses kexec to reboot to a second kernel whenever a dump needs to be ... This second kernel is booted with very little memory. ...
      (Linux-Kernel)
    • Re: oops in :snd_pcm_oss:resample_expand+0x19c/0x1f0
      ... A serial console on another PC very much helped in troubleshooting problems (eg. my kdump kernel's initrd was initially too large), but it's not required. ... The distro is the AMD64 Debian etch, with two vanilla 2.6.18 kernels: one "regular" SMP kernel with CONFIG_KEXEC=y, which is what I use, and one UP kernel with CONFIG_PROC_VMCORE=y and a different load address, which is activated on a crash; other than that they are the same. ... This kernel boots on the very same root fs, runs the script below again, this time to save the dump and reboot to the regular kernel. ... _echo "Configuring kdump kernel..." ...
      (Linux-Kernel)
    • Re: [patch] add kdump_after_notifier
      ... I want to use both kdb and kdump, ... The kexec on panic assumption is that the kernel is broken and we better not ... One thing to keep in mind, doing things in second kernel might not be easy ...
      (Linux-Kernel)