Bug 144668

Summary:	System doesn't reboot even if kernel.panic is > 0 on RHEL-4 Beta-2.
Product:	Red Hat Enterprise Linux 4	Reporter:	Sumit Sharma <ssharma3>
Component:	kernel	Assignee:	Dave Anderson <anderson>
Status:	CLOSED ERRATA	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.2	CC:	davej, linux26port, lockhart, ntachino, riel, rkenna, tburke
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	RHSA-2005-514	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-10-05 12:37:10 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	156322

Description Sumit Sharma 2005-01-10 15:30:11 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040510

Description of problem:
Normally, if kernel.panic is > 0, system reboots on calling panic().
But on x86_64 platform, some conditions have been put for rebooting
the system on calling panic(). Now, if panic() has not been called
from diskdump/netdump specific code, system is halted and no reboot
takes place. Only when panic() has been initiated from
diskdump/netdump code [with the assumption that any of these 2 is
already configured], reboot happens, otherwise system halts. In some
cases, there is an urgent requirement for rebooting the system on
panic, even if we have not configured any diskdump or netdump stuff.
The point here is, there are some drivers which directly call panic()
function without actually creating a panic situation. For those
drivers, there would not be any system reboot, if dump is not
configured and a panic called directly. On i386 and ia64, directly
reboot_warm() is called when panic() is called, but on x86_64, need
for diskdump/netdump is there.

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.648_ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Make a simple module and insert it. Set kernel.panic to some value > 0.
2. Within the module's code, call panic() on some condition.
3. Create that panic condition, so that driver module calls panic().
4. You can see that at this point, your system will not reboot, but
will halt. You will see the following message :-

Rebooting in x seconds...

But there is no reboot, but only system halt.
    

Actual Results:  System reboot doesn't take place on calling panic()
directly.

Expected Results:  System should reboot on calling panic() directly,
when kernel.panic > 0.

Additional info:

There are some cases in drivers, where we have to call panic() on some
situations directly without actually knowing whether dump is
configured or not. Making dump configuration mandatory doesn't allow
you to reboot the system by calling panic() directly. Normally, if we
see the flow of panic(), it calls machine_restart(), which is a
platform_specific code and for x86_64, at the starting of this
function, we check if the panic() has been called with
diskdump/netdump configured, because in that case diskdump_mode or
netdump_mode is set and then we call reboot_warm(), otherwise we call
machine_shutdown() which simply shuts down all the CPUs down without
rebooting the system.

The overhead which we have to take in order to reboot the machine for
x86_64 platform is :-

1. Configure diskdump or network dump.
2. Instead of calling panic() directly from the driver, create panic
situation by NULL assignment or something like this, so that system
calls die() inside do_trap(), which then calls try_crashdump() and if
disk/net(dump) is configured, diskdump_mode or netdump_mode flag is
set accordingly and inside oops_end(), we call panic which then
reboots the machine.

Comment 1 Dave Anderson 2005-01-18 14:28:10 UTC

The kernel patch in question is this:

--- linux-2.6.8/arch/x86_64/kernel/reboot.c.orig
+++ linux-2.6.8/arch/x86_64/kernel/reboot.c
@@ -120,7 +120,8 @@ void machine_shutdown(void)
        /* O.K Now that I'm on the appropriate processor,
         * stop all of the others.
         */
-       smp_send_stop();
+       if (!crashdump_mode())
+               smp_send_stop();
 #endif

        local_irq_disable();
@@ -130,15 +131,16 @@ void machine_shutdown(void)
 #endif

        disable_IO_APIC();
-
-       local_irq_enable();
+       if (!crashdump_mode())
+               local_irq_enable();
 }

 void machine_restart(char * __unused)
 {
        int i;

-       machine_shutdown();
+       if (!crashdump_mode())      /* temporary? */
+               machine_shutdown();

        /* Tell the BIOS if we want cold or warm reboot */
        *((unsigned short *)__va(0x472)) = reboot_mode

Note that before the patch was applied to machine_restart(),
machine_shutdown() would be called unconditionally.  With the
patch, it is only called if we are *not* in diskdump/netdump mode.
So if the netdump/diskdump patch were not applied, machine_shutdown()
would be called, and as you have seen, would prevent the reboot
operation from occurring.  In other words, warm reboots would never
work on x86_64 machines with that 2.6.9 code base, because, as I
recall, the machine would hang in machine_shutdown().  By adding the
netdump/diskdump patch, we avoid machine_restart() entirely 
(since the other cpus have been frozen and won't receive IP
interrupts), and by your workaround, you essentially do the same
thing.

I will check whether the latest RHEL4 x86_64 kernel still is unable
to do a warm reboot successfully.

Comment 2 Dave Anderson 2005-01-18 15:24:30 UTC

Can you confirm that when you call panic() from your kernel module
that it is *not* in interrupt mode?

Comment 3 Dave Anderson 2005-01-18 16:42:38 UTC

Without configuring netdump or diskdump, can you confirm the what the
results are if you:

1. Enter Alt-sysrq-b from the console device
2. Enter "echo b > /proc/sysrq-trigger"

Comment 4 Dave Anderson 2005-01-20 15:46:05 UTC

Never mind the tests requested above, I've found the problem.
It's actually a bug in the upstream x86_64 kernel.  

When /proc/sys/kernel/panic is set, smp_send_stop() gets called twice
when panic() is called -- once at the top of panic() itself, and if
panic_timeout > 0, it gets called a second time via the
machine_restart() to machine_shutdown() path.

The x86_64 version of smp_send_stop() first sends IPI's to kill
the other cpus with smp_really_stop_cpu(): 

void smp_send_stop(void)
{
        int nolock = 0;
        /* Don't deadlock on the call lock in panic */
        if (!spin_trylock(&call_lock)) {
                udelay(100);
                /* ignore locking because we have paniced anyways */
                nolock = 1;
        }
        __smp_call_function(smp_really_stop_cpu, NULL, 1, 0);
        if (!nolock)
                spin_unlock(&call_lock);
        smp_stop_cpu();
}

Note that smp_really_stop_cpu(), when it runs on the other cpus,
calls smp_stop_cpu() before it actually shuts itself down:

static void smp_really_stop_cpu(void *dummy)
{
        smp_stop_cpu();
        for (;;)
                asm("hlt");
}

One of the functions of smp_stop_cpu() is to clear the processor's
bit from the cpu_online_map:

void smp_stop_cpu(void)
{
        /*
         * Remove this CPU:
         */
        cpu_clear(smp_processor_id(), cpu_online_map);
        local_irq_disable();
        disable_local_APIC();
        local_irq_enable();
}

What's unique about x86_64 shutdown procedure, is that in the
smp_send_stop() function above, after it shuts the other cpus down,
it calls smp_stop_cpu() on *itself*, thereby removing its cpu bit
from the cpu_online_map.  That's the bug, because, upon calling
smp_send_stop() the second time, it retries the __smp_call_function()
to kill the other cpus with smp_really_stop_cpu().  That would
normally be OK, but because the cpu_online_map is now equal to zero,
the __smp_call_function() routine hangs because of this:

static void __smp_call_function (void (*func) (void *info), 
        void *info, int nonatomic, int wait)
{
        static struct call_data_struct dumpdata;
        struct call_data_struct normaldata;
        struct call_data_struct *data;
        int cpus = num_online_cpus()-1;

        if (!cpus)
                return;

Normally the calling cpu would have its bit set in the cpu_online_map,
but because it's been cleared, num_online_cpus()-1 returns -1, as if
there are 32 online cpus to send IPI's to.  So it hangs waiting for
them to respond.

We'll fix this in an update.

Comment 9 Sumit Sharma 2005-04-06 04:50:45 UTC

I verified it with latest update (RHEL4 U1), but the problem is still there.
Seems that there is some other thing also involved in this matter. No reboot is
taking place on calling panic explicitly.

Comment 10 Dave Anderson 2005-04-06 13:21:38 UTC

Exactly which kernel version are you testing this with?

Comment 11 Sumit Sharma 2005-04-06 13:26:02 UTC

Kernel version is :- 2.6.9-6.28.ELsmp

Comment 12 Dave Anderson 2005-04-06 13:56:06 UTC

Ok -- the patch that was tested OK here is in 2.6.9-6.28.

That patch prevented the cpu_online map being set to 0 prematurely,
which would cause a hang in smp_call_function().  It's not obvious
what has changed.

Anyway, I'll re-open this case, and re-investigate...

Comment 13 Dave Anderson 2005-04-07 18:35:25 UTC

In this case, we have a different situation than the cpu_online map
issue that was fixed before.  Here, if the panic cpu is not the 
boot_cpu_id cpu, the hang occurs like this:

1. panic on a cpu other than boot_cpu_id.
2. panic() calls smp_send_stop(), which shuts down all other cpus,
   including the boot_cpu_id cpu.
3. panic() (if panic_timeout) then calls machine_restart().
4. machine_restart() calls smp_halt() -- presumably to shut down the
   other cpus.  (In a non-panic() situation, the other cpus would still
   be running.)
5. smp_halt() does this:

#ifdef CONFIG_SMP
static void smp_halt(void)
{
        int cpuid = safe_smp_processor_id();
                static int first_entry = 1;

                if (first_entry) {
                        first_entry = 0;
                        smp_call_function((void *)machine_restart, NULL, 1, 0);
                }

        smp_stop_cpu();

        /* AP calling this. Just halt */
        if (cpuid != boot_cpu_id) {
                for (;;)
                        asm("hlt");
        }

        /* Wait for all other CPUs to have run smp_stop_cpu */
        while (!cpus_empty(cpu_online_map))
                rep_nop();
}
#endif

Since the other cpus have already halted, the "first_entry" smp_call_function()
is a no-op.  Then, the subsequent (cpuid != boot_cpu_id) check causes all but
the boot_cpu_id cpu to halt as well, so there is no return to machine_restart()
to continue the reboot operation.   So, if the panic cpu is not the boot_cpu_id
cpu, then we hang...  (actually all cpus have done a "hlt").

Presuming that the reboot operation can be done by other than the boot_cpu_id
cpu, it looks like the best solution is to avoid calling smp_halt() if all the
other cpus have already been halted, and let the non-boot cpu do the
reboot operation.  To be tested...

Comment 14 Dave Anderson 2005-04-07 21:39:11 UTC

The proposed fix is to have the "first_entry" smp_halt() return
if the number of online cpus is 1, so that the machine_restart()
can proceed:

--- linux-2.6.9/arch/x86_64/kernel/reboot.c.orig        2005-04-07
15:02:02.297123520 -0400
+++ linux-2.6.9/arch/x86_64/kernel/reboot.c     2005-04-07 15:08:29.346283104
-0400
@@ -100,12 +100,15 @@
 static void smp_halt(void)
 {
        int cpuid = safe_smp_processor_id(); 
-               static int first_entry = 1;
+       static int first_entry = 1;
 
-               if (first_entry) { 
-                       first_entry = 0;
-                       smp_call_function((void *)machine_restart, NULL, 1, 0);
-               } 
+       if (first_entry) { 
+               first_entry = 0;
+               /* If nobody's alive, just return to machine_restart */
+               if (num_online_cpus() == 1)
+                       return;
+               smp_call_function((void *)machine_restart, NULL, 1, 0);
+       } 
                        
        smp_stop_cpu(); 

It should also be noted that the kernel that the original patch was applied
to (and which is still valid) did not contain the smp_halt() function above.
smp_halt() replaced the former machine_shutdown() function, which did not have
the (cpuid != boot_cpu_id) check.

Comment 15 Jay Turner 2005-04-16 10:29:30 UTC

The patch in comment 14 isn't included in the latest U1-candidate kernel
(2.6.9-6.39.EL) and although it has been posted to rhkernel-list, hasn't
received any acks, so throwing this back into ASSIGNED.

Comment 21 Sumit Sharma 2005-05-26 15:00:50 UTC

The above patch looks ok to me, but more appropriate patch may be this :-

--- arch/x86_64/kernel/reboot.c.orig    2005-05-26 19:51:50.134178960 +0530
+++ arch/x86_64/kernel/reboot.c 2005-05-26 19:54:19.567461648 +0530
@@ -95,17 +95,18 @@
 static void smp_halt(void)
 {
        int cpuid = safe_smp_processor_id();
-               static int first_entry = 1;
+       static long first_entry;
+       static int restart_cpuid;
+
+       if (!test_and_set_bit(0, &first_entry)) {
+               restart_cpuid = cpuid;
+               smp_call_function((void *)machine_restart, NULL, 1, 0);
+       }

-               if (first_entry) {
-                       first_entry = 0;
-                       smp_call_function((void *)machine_restart, NULL, 1, 0);
-               }
-
        smp_stop_cpu();

        /* AP calling this. Just halt */
-       if (cpuid != boot_cpu_id) {
+       if (cpuid != restart_cpuid) {
                for (;;)
                        asm("hlt");
        }

Because if we apply the patch at comment # 14, then comparison
"if (cpuid != boot_cpu_id)" becomes unnecessary and this comparison forces us to
assume that boot_cpu_id is online when machine_restart() is called.
This suggested patch will remove that assumption also.

Comment 22 Dave Anderson 2005-05-26 15:39:14 UTC

I didn't want to make any assumptions re: the upstream mechanism
which implies that all cpus except the boot_cpu_id should "hlt",
and the boot_cpu_id should handle the machine_restart().  But in
the panic case, we've got no choice but to allow the only remaining
cpu running to attempt the machine_restart().

Comment 32 Red Hat Bugzilla 2005-10-05 12:37:10 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html