Re: [PATCH -mm] kexec jump -v9



On Thursday, 15 of May 2008, Eric W. Biederman wrote:
"Huang, Ying" <ying.huang@xxxxxxxxx> writes:

This is a minimal patch with only the essential features. All
additional features are split out and can be discussed later. I think
it may be easier to get consensus on this minimal patch.

A minimal patch route sounds good.


* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
{
unsigned long page_list[PAGES_NR];
void *control_page;
+ asmlinkage NORET_TYPE void
+ (*relocate_kernel_ptr)(unsigned long indirection_page,
+ unsigned long control_page,
+ unsigned long start_address,
+ unsigned int has_pae) ATTRIB_NORET;

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();

control_page = page_address(image->control_code_page);
- memcpy(control_page, relocate_kernel, PAGE_SIZE);
+ memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
+ KJUMP_MAGIC(control_page) = 0;

+ if (image->preserve_context) {
+ KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
+ if (kexec_jump_save_cpu(control_page)) {
+ image->start = KJUMP_ENTRY(control_page);
+ return;

Tricky, and I expect unnecessary.
We should be able to just have relocate_new_kernel return?

+ }
+ }
+
+ relocate_kernel_ptr = control_page +
+ ((void *)relocate_kernel - (void *)kexec_relocate_page);
page_list[PA_CONTROL_PAGE] = __pa(control_page);
- page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+ page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
page_list[PA_PGD] = __pa(kexec_pgd);
page_list[VA_PGD] = (unsigned long)kexec_pgd;
#ifdef CONFIG_X86_PAE
@@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
page_list[PA_PTE_1] = __pa(kexec_pte1);
page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+ page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);

/* The segment registers are funny things, they have both a
* visible and an invisible part. Whenever the visible part is
@@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
set_idt(phys_to_virt(0),0);

/* now call it */
- relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
- image->start, cpu_has_pae);
+ relocate_kernel_ptr((unsigned long)image->head,
+ (unsigned long)page_list,
+ image->start, cpu_has_pae);
}


--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
* Move into place and start executing a preloaded standalone
* executable. If nothing was preloaded return an error.
*/
-static void kernel_kexec(void)
+static int kernel_kexec(void)
{
+ int ret = -ENOSYS;
#ifdef CONFIG_KEXEC
- struct kimage *image;
- image = xchg(&kexec_image, NULL);
- if (!image)
- return;
- kernel_restart_prepare(NULL);
- printk(KERN_EMERG "Starting new kernel\n");
- machine_shutdown();
- machine_kexec(image);
+ if (xchg(&kexec_lock, 1))
+ return -EBUSY;
+ if (!kexec_image) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ if (!kexec_image->preserve_context) {
+ kernel_restart_prepare(NULL);
+ printk(KERN_EMERG "Starting new kernel\n");
+ machine_shutdown();
+ }
+ ret = kexec_jump(kexec_image);
+unlock:
+ xchg(&kexec_lock, 0);
#endif

Ugh. No. Not sharing the shutdown methods with reboot and
the normal kexec path looks like a recipe for failure to me.

This looks like where we really need to have the conversation.
What methods do we use to shutdown the system.

My take on the situation is this. For proper handling we
need driver device_detach and device_reattach methods.

With the following semantics. The device_detach methods
will disable DMA and place the hardware in a sane state
from which the device driver can reclaim and reinitialize it,
but the hardware will not be touched.

device_reattach reattaches the driver to the hardware.

So looking at this patch I see two very productive directions
we can go.
1) A patch that just fixes up the kexec infrastructure code
so it implements the swap page and provides the kernel
reentry point. And doesn't handle the upper layer
user interface portion.

2) A patch that renames device_shutdown to device_detach.
And starts implementing the driver hooks needed from
a resumable kexec.

Then we have the question what do we do with devices in the
kernel that don't have a device_reattach method, when we
expect to come back from a kexec. The two choices are:
(a) fail the operations before we commit to anything.
(b) hotunplug/hotreplug the device.

With respect to device methods. I don't think any of
the current power saving methods make sense. Certainly
nothing that prepares the way for using weird ACPI states.

I don't think there is not enough difference between
device_detach and device_shutdown for us to maintain two
separate methods, and that seems to place an unreasonable
maintenance burden on device driver developers.

Well, it looks like we do similar things concurrently. Please have a look
here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

Similar patches are in the Greg's tree already.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [PATCH] char/tpm: use msleep(), clean-up timers, fix typo
    ... > This patch is a device driver to enable new hardware. ... Here is a patch that removes all callers of schedule_timeoutas I ... The TPM driver unnecessarily uses timers when it simply ...
    (Linux-Kernel)
  • [PATCH -MM] e1000: rewrite hardware initialization code
    ... This patch contains a major rewrite to the e1000 driver that groups and separates e1000 hardware by chipset family. ... It abstracts the hardware specific code into an API that will allow us to continue to maintain the complex e1000 driver and add new hardware support to it without touching code that affects older chipsets. ...
    (Linux-Kernel)
  • Re: [PATCH -mm] kexec jump -v9
    ... Maybe we can move CPU state saving code into ... need driver device_detach and device_reattach methods. ... Current device PM callback is not suitable for hibernation (kexec ... A patch that renames device_shutdown to device_detach. ...
    (Linux-Kernel)
  • Re: [PATCH] e100 rx: or s and el bits
    ... I wonder if this worked for me because the hardware also spun on the link field being NULL? ... With the original in kernel driver, I had two tests that would always end in horrible memory corruption on a PXA255 based system. ... Even through the original S-bit patch seems broken, we have had days of continuous traffic through it without issue where previously we could never go more than 6 hours. ... I will also run these tests with the new patch and with a smaller receive pool to make the pool run out more often. ...
    (Linux-Kernel)
  • Re: [PATCH -mm] kexec jump -v9
    ... it may be easier to get consensus on this minimal patch. ... A minimal patch route sounds good. ... the normal kexec path looks like a recipe for failure to me. ... need driver device_detach and device_reattach methods. ...
    (Linux-Kernel)