Bug 166722 - Kernel panic during system shutdown
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Platform: x86_64 Linux
Priority: medium    Severity: high
Assigned To: David Miller
QA Contact: Brian Brock
Reported: 2005-08-24 17:53 EDT by Rigoberto Corujo
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2005-11-18 15:18:47 EST

Attachments
Photograph of kernel panic (33.17 KB, image/jpeg), 2005-08-24 17:53 EDT, Rigoberto Corujo
Description Rigoberto Corujo 2005-08-24 17:53:54 EDT
We recently received a report of several (2-CPU) nodes on a cluster panicking 
when they were powered off.  The cluster had previously been running an 
application called AMBER, which probably stressed the system pretty hard.

The customer sent us a photograph of the console showing the panic information 
which I will attach shortly.

Basically, the trace shows that the problem occurred at “neigh_destroy+216” 
(linux-2.4.21/net/core/neighbour.c).

Trace:

neigh_destroy+216
dst_destroy+92
dst_run_gc+103

I’ve shown some of the disassembled code below.  Since the trace probably 
shows the return address (the next instruction to execute), not the 
instruction that was actually executing, I assume that we are really 
interested in either “neigh_destroy+209” or “neigh_destroy+211”, which would 
be the “kfree” (shown in the code snippet below).  I’m guessing that 
the “kfree” is the culprit only because if it were the line before, which 
is “atomic_dec_and_test(&hh->hh_refcnt)”, then that would imply a bad “hh” 
and we would have blown up a few lines earlier while dereferencing “hh”.

Relevant code of a disassembled “neigh_destroy”.

0xffffffff80255921 <neigh_destroy+209>: je     0xffffffff80255928 
<neigh_destroy+216>
0xffffffff80255923 <neigh_destroy+211>: callq  0xffffffff8014eb20 <kfree>
0xffffffff80255928 <neigh_destroy+216>: mov    0x68(%rbp),%rdi

Relevant code of a disassembled “dst_destroy”.

0xffffffff80254a34 <dst_destroy+84>:    mov    %rbx,%rdi
0xffffffff80254a37 <dst_destroy+87>:    callq  0xffffffff80255850 
<neigh_destroy>
0xffffffff80254a3c <dst_destroy+92>:    mov    0xc0(%rbp),%rax

Relevant code of a disassembled “dst_run_gc”.

0xffffffff8025471e <dst_run_gc+94>:     mov    %rax,0x0(%rbp)
0xffffffff80254722 <dst_run_gc+98>:     callq  0xffffffff802549e0 <dst_destroy>
0xffffffff80254727 <dst_run_gc+103>:    test   %rax,%rax

FILE: linux-2.4.21/net/core/neighbour.c

void neigh_destroy(struct neighbour *neigh)
{
…
        while ((hh = neigh->hh) != NULL) {
                neigh->hh = hh->hh_next;
                hh->hh_next = NULL;
                write_lock_bh(&hh->hh_lock);
                hh->hh_output = neigh_blackhole;
                write_unlock_bh(&hh->hh_lock);
                if (atomic_dec_and_test(&hh->hh_refcnt))
                        kfree(hh);
        }
…
}

----------------------------------------------------

FILE:  linux-2.4.21/include/net/neighbour.h

static inline void neigh_release(struct neighbour *neigh)
{
        if (atomic_dec_and_test(&neigh->refcnt))
                neigh_destroy(neigh);
}

----------------------------------------------------

FILE:  linux-2.4.21/net/core/dst.c

struct dst_entry *dst_destroy(struct dst_entry * dst)
{
…
        neigh = dst->neighbour;
…
        if (neigh) {
                dst->neighbour = NULL;
                neigh_release(neigh);
        }
…
}

----------------------------------------------------

FILE:  linux-2.4.21/net/core/dst.c

static void dst_run_gc(unsigned long dummy)
{
…
        if (!spin_trylock(&dst_lock)) {
                mod_timer(&dst_gc_timer, jiffies + HZ/10);
                return;
        }
…
        while ((dst = *dstp) != NULL) {
                if (atomic_read(&dst->__refcnt)) {
                        dstp = &dst->next;
                        delayed++;
                        continue;
                }
                *dstp = dst->next;

                dst = dst_destroy(dst);
…
        spin_unlock(&dst_lock);
}

I’m not sure if this implies a locking problem.  It looks as though 
the “write_lock_bh(&hh->hh_lock)”, or some other form of locking, should be 
taken prior to the following line, that is, before “hh” is actually 
dereferenced:

      while ((hh = neigh->hh) != NULL)

These nodes have 2 CPUs each, so I don’t know if this code is well protected 
against multiple kernel threads.  The “dst_run_gc” path, which calls 
“dst_destroy”, which calls “neigh_release”, which calls “neigh_destroy”, does 
appear to take a lock, which may be sufficient in this case.

These machines are running Linux 2.4.21-27 and the processor type is EM64T.

I am trying to see if I can get my hands on the AMBER application that the 
customer ran prior to the kernel panic.  As I understand it, the kernel panics 
during shutdown only started to occur after users began running the AMBER 
application.  So, unfortunately, I have not been able to reproduce it yet and 
do not have a reproducer to provide.

Any help in resolving this problem would be appreciated.

Thank you.

Rigoberto
Comment 1 Rigoberto Corujo 2005-08-24 17:53:54 EDT
Created attachment 118095 [details]
Photograph of kernel panic
Comment 2 Ernie Petrides 2005-08-24 18:22:06 EDT
Could you please try to reproduce this on the latest released kernel
(version 2.4.21-32.0.1.EL), which was released 3 months ago?  There
have been many important (and potentially relevant) fixes since U4.

Also, please try to capture the full console oops output (with serial
console if necessary).  We at least need to see if the kernel is tainted
and what the module list looks like.

Thanks in advance.
Comment 3 Rigoberto Corujo 2005-09-26 10:18:23 EDT
Hello Ernie,

I'm sorry for taking so long to respond to this case as I've been completely 
swamped.

We'll follow your suggestion of moving to the 2.4.21-32 kernel and see if that 
fixes the problem for this particular customer.  As far as capturing console 
logs via the serial port, that might not be easy to do.  The customer has a 
288 node cluster and it is difficult to know which nodes are going to crash.  
These nodes do have a management port that we use to power them on/off, which 
should also have console redirection capability, but we need to figure out 
what the BIOS recipe is to enable console redirection to the management port.

Anyway, I think you can close this case as we need to first try the 2.4.21-32 
kernel before attempting to further troubleshoot this problem.  Should the 
problem persist even after upgrading, I will file a new Bugzilla.

Thank you very much for your support.

BTW, your name sounds very familiar.  Were you a former kernel developer with 
DEC?

Rigoberto
Comment 4 Ernie Petrides 2005-10-10 19:55:24 EDT
Rigoberto, yes, I was a contractor there for many years, and was heavily 
involved with OSF/1 -> Digital UNIX -> Compaq Tru64 UNIX kernel development.

Reverting state to NEEDINFO.
Comment 5 Rigoberto Corujo 2005-11-18 15:18:47 EST
After receiving a second report by a different customer, we were able to obtain 
enough information to determine that the kernel panic was being caused by the 
Infiniband drivers.

The scenario that leads up to the panic is as follows:

1) In a cluster, node "A" is exporting a filesystem, say "/scratch".
2) Node "B" is NFS mounting "/scratch" from node "A" with the "tcp" mount 
option over the Infiniband interconnect.  It is important to note that the 
problem doesn't occur with "udp".
3) Node "B" runs an application that writes to files in "/scratch".
4) A cluster-wide shutdown command is issued and all the nodes begin to stop 
their services.
5) Node "A's" nfs service is stopped during the shutdown and, therefore, is no 
longer exporting "/scratch".
6) Node "B" unloads its Infiniband driver that was being used to 
mount "/scratch" from node "A".
7) Shortly afterwards, node "B" panics.

It should be noted that if node "A" doesn't shut down its nfs service before 
node "B" shuts down, then the panic does not occur.

We reported the incident to Voltaire, who provides the Infiniband drivers, and 
they provided the following explanation:

----

Linux holds a reference counter on network devices; the counter is increased /
decreased during traffic.  There is a kernel implementation problem that 
causes the counter to stay non-zero for a very long time (possibly forever).  
In this case the device un-registration will cause the machine to wait 
forever.  This usually happens during shutdown / reboot under heavy traffic.

During server shutdown / reboot, all services are stopped and all processes 
are killed.  Voltaire IBHOST is a registered service and is therefore stopped 
during the shutdown / reboot event.  This causes the removal of the IPoIB 
interface and also the removal of the IPoIB kernel module, which calls the 
device un-registration routine in the kernel.  This issue can also happen 
with Ethernet drivers; the main difference is that Ethernet drivers are not 
removed during shutdown / reboot (only the interface is brought down) and 
therefore don't call the unregister_device.

----

Voltaire has provided a patch for this problem.  This case can be considered 
closed.

Thank you for your assistance.

Rigoberto
