Bug 166722

Summary: Kernel panic during system shutdown
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Rigoberto Corujo <rigoberto.corujo>
Assignee: David Miller <davem>
QA Contact: Brian Brock <bbrock>
CC: petrides
Doc Type: Bug Fix
Last Closed: 2005-11-18 20:18:47 UTC
Attachments:
Description: Photograph of kernel panic (flags: none)

Description Rigoberto Corujo 2005-08-24 21:53:54 UTC
We recently received a report of several (2-CPU) nodes on a cluster panicking 
when they were powered off.  The cluster had previously been running an 
application called AMBER, which probably stressed the system considerably.

The customer sent us a photograph of the console showing the panic information 
which I will attach shortly.

Basically, the trace shows that the problem occurred at "neigh_destroy+216" 
(linux-2.4.21/net/core/neighbour.c).

Trace:

neigh_destroy+216
dst_destroy+92
dst_run_gc+103

I've shown some of the disassembled code below.  Since the trace is probably 
showing what the next instruction would have been, not the instruction being 
currently executed, I assume that we are really interested in 
either "neigh_destroy+209" or "neigh_destroy+211", which would be the "kfree" 
(shown in the code snippet below).  I'm guessing that the "kfree" is the 
culprit only because if it were the line before, which is 
"atomic_dec_and_test(&hh->hh_refcnt)", then that would imply a bad "hh" and we 
would have blown up a few lines earlier while referencing "hh".

Relevant code of a disassembled "neigh_destroy".

0xffffffff80255921 <neigh_destroy+209>: je     0xffffffff80255928 <neigh_destroy+216>
0xffffffff80255923 <neigh_destroy+211>: callq  0xffffffff8014eb20 <kfree>
0xffffffff80255928 <neigh_destroy+216>: mov    0x68(%rbp),%rdi

Relevant code of a disassembled "dst_destroy".

0xffffffff80254a34 <dst_destroy+84>:    mov    %rbx,%rdi
0xffffffff80254a37 <dst_destroy+87>:    callq  0xffffffff80255850 <neigh_destroy>
0xffffffff80254a3c <dst_destroy+92>:    mov    0xc0(%rbp),%rax

Relevant code of a disassembled "dst_run_gc".

0xffffffff8025471e <dst_run_gc+94>:     mov    %rax,0x0(%rbp)
0xffffffff80254722 <dst_run_gc+98>:     callq  0xffffffff802549e0 <dst_destroy>
0xffffffff80254727 <dst_run_gc+103>:    test   %rax,%rax
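
As a cross-check on the offsets quoted above, the "symbol+offset" values can be 
recomputed from the addresses in the disassembly.  This is only a sketch: 
"sym_offset" is a hypothetical helper, and the function start addresses are 
taken from the "callq" targets shown above (neigh_destroy at 
0xffffffff80255850, dst_destroy at 0xffffffff802549e0).  The start of 
dst_run_gc is not shown in the listing, so it is not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: distance of an instruction address from the
 * start of its function, i.e. the N in "symbol+N" (decimal). */
uint64_t sym_offset(uint64_t insn_addr, uint64_t func_start)
{
    assert(insn_addr >= func_start);
    return insn_addr - func_start;
}
```

Under those assumptions, 0xffffffff80255928 is neigh_destroy+216 (the 
instruction after the "kfree" call), 0xffffffff80255923 is neigh_destroy+211 
(the "kfree" call itself), and 0xffffffff80254a3c is dst_destroy+92 (the 
instruction after the call to neigh_destroy).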

FILE: linux-2.4.21/net/core/neighbour.c

void neigh_destroy(struct neighbour *neigh)
{
...
        while ((hh = neigh->hh) != NULL) {
                neigh->hh = hh->hh_next;
                hh->hh_next = NULL;
                write_lock_bh(&hh->hh_lock);
                hh->hh_output = neigh_blackhole;
                write_unlock_bh(&hh->hh_lock);
                if (atomic_dec_and_test(&hh->hh_refcnt))
                        kfree(hh);
        }
...
}

----------------------------------------------------

FILE:  linux-2.4.21/include/net/neighbour.h

static inline void neigh_release(struct neighbour *neigh)
{
        if (atomic_dec_and_test(&neigh->refcnt))
                neigh_destroy(neigh);
}
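
Both neigh_release and the "hh" loop above follow the same pattern: drop one 
reference, and free the object only if that was the last one.  The following is 
a minimal user-space sketch of that pattern using C11 atomics, not the kernel 
code itself.  The names "struct obj", "dec_and_test", "obj_new" and "obj_put" 
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
struct obj {
    atomic_int refcnt;
};

/* User-space analogue of atomic_dec_and_test(): returns true only for
 * the caller whose decrement brings the count to zero.
 * atomic_fetch_sub() returns the value held before the subtraction. */
bool dec_and_test(atomic_int *cnt)
{
    return atomic_fetch_sub(cnt, 1) == 1;
}

/* Allocate an object that starts with initial_refs references. */
struct obj *obj_new(int initial_refs)
{
    struct obj *o = malloc(sizeof(*o));
    if (o != NULL)
        atomic_init(&o->refcnt, initial_refs);
    return o;
}

/* Drop one reference.  Returns true if this call freed the object,
 * after which the pointer must not be used again. */
bool obj_put(struct obj *o)
{
    assert(o != NULL);
    if (dec_and_test(&o->refcnt)) {
        free(o);
        return true;
    }
    return false;
}
```

With two references held, the first obj_put leaves the object alive and the 
second one frees it.  The pattern is only safe if every holder really took a 
reference; a pointer kept without one can be left dangling after the last put.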

----------------------------------------------------

FILE:  linux-2.4.21/net/core/dst.c

struct dst_entry *dst_destroy(struct dst_entry * dst)
{
...
        neigh = dst->neighbour;
...
        if (neigh) {
                dst->neighbour = NULL;
                neigh_release(neigh);
        }
...
}

----------------------------------------------------

FILE:  linux-2.4.21/net/core/dst.c

static void dst_run_gc(unsigned long dummy)
{
...
        if (!spin_trylock(&dst_lock)) {
                mod_timer(&dst_gc_timer, jiffies + HZ/10);
                return;
        }
...
        while ((dst = *dstp) != NULL) {
                if (atomic_read(&dst->__refcnt)) {
                        dstp = &dst->next;
                        delayed++;
                        continue;
                }
                *dstp = dst->next;

                dst = dst_destroy(dst);
...
        spin_unlock(&dst_lock);
}
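
The "try the lock, otherwise re-arm the timer and return" shape of dst_run_gc 
can be sketched in user space as follows.  This is an illustration only: the 
lock is a C11 atomic_flag rather than a kernel spinlock, the counters stand in 
for mod_timer and for the list walk, and all names ("gc_lock", "gc_trylock", 
"gc_unlock", "gc_run", "gc_rearmed", "gc_completed") are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these are kernel APIs. */
atomic_flag gc_lock = ATOMIC_FLAG_INIT;
int gc_rearmed;    /* how many times a pass was deferred */
int gc_completed;  /* how many times a pass actually ran */

/* Returns true if the lock was free and is now held by the caller. */
bool gc_trylock(void)
{
    return !atomic_flag_test_and_set(&gc_lock);
}

void gc_unlock(void)
{
    atomic_flag_clear(&gc_lock);
}

/* One garbage-collection pass.  Returns true if the pass ran, false if
 * the lock was busy and the pass was deferred. */
bool gc_run(void)
{
    if (!gc_trylock()) {
        gc_rearmed++;   /* stands in for mod_timer(..., jiffies + HZ/10) */
        return false;
    }
    gc_completed++;     /* stands in for walking the list under the lock */
    gc_unlock();
    return true;
}
```

The point of the sketch is that the whole walk, including the call that 
corresponds to dst_destroy, happens with the lock held, and that a busy lock 
defers the pass instead of blocking.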

I'm not sure whether this implies a locking problem.  It would appear that 
the "write_lock_bh(&hh->hh_lock)", or some other type of locking, should have 
taken place prior to the following line, that is, prior to actually 
using "hh":

      while ((hh = neigh->hh) != NULL)

These nodes have 2 CPUs each, so I don't know if this code is well protected 
against multiple kernel threads.  The "dst_run_gc", which calls "dst_destroy", 
which calls "neigh_release", which calls "neigh_destroy", does appear to set a 
lock, which may be sufficient in this case.

These machines are running Linux 2.4.21-27 and the processor type is EM64T.

I am trying to see if I can get my hands on the AMBER application that the 
customer ran prior to the kernel panic.  As I understand it, the kernel panics 
during shutdown only started to occur after users began running the AMBER 
application.  So, unfortunately, I have not been able to reproduce it yet and 
do not have a reproducer to provide.

Any help in resolving this problem would be appreciated.

Thank you.

Rigoberto

Comment 1 Rigoberto Corujo 2005-08-24 21:53:54 UTC
Created attachment 118095
Photograph of kernel panic

Comment 2 Ernie Petrides 2005-08-24 22:22:06 UTC
Could you please try to reproduce this on the latest released kernel
(version 2.4.21-32.0.1.EL), which was released 3 months ago?  There
have been many important (and potentially relevant) fixes since U4.

Also, please try to capture the full console oops output (with serial
console if necessary).  We at least need to see if the kernel is tainted
and what the module list looks like.

Thanks in advance.


Comment 3 Rigoberto Corujo 2005-09-26 14:18:23 UTC
Hello Ernie,

I'm sorry for taking so long to respond to this case as I've been completely 
swamped.

We'll follow your suggestion of moving to the 2.4.21-32 kernel and see if that 
fixes the problem for this particular customer.  As far as capturing console 
logs via the serial port, that might not be easy to do.  The customer has a 
288-node cluster and it is difficult to know which nodes are going to crash.  
These nodes do have a management port that we use to power them on/off, which 
should also have console redirection capability, but we need to figure out 
what the BIOS recipe is to enable console redirection to the management port.

Anyway, I think you can close this case as we need to first try the 2.4.21-32 
kernel before attempting to further troubleshoot this problem.  Should the 
problem persist even after upgrading, I will file a new Bugzilla.

Thank you very much for your support.

BTW, your name sounds very familiar.  Were you a former kernel developer with 
DEC?

Rigoberto

Comment 4 Ernie Petrides 2005-10-10 23:55:24 UTC
Rigoberto, yes, I used to be a contractor for many years there, and was involved
heavily with OSF/1 -> Digital UNIX -> Compaq Tru64 UNIX kernel development.

Reverting state to NEEDINFO.

Comment 5 Rigoberto Corujo 2005-11-18 20:18:47 UTC
After receiving a second report by a different customer, we were able to obtain 
enough information to determine that the kernel panic was being caused by the 
Infiniband drivers.

The scenario that leads up to the panic is as follows:

1) In a cluster, node "A" is exporting a filesystem, say "/scratch".
2) Node "B" is NFS mounting "/scratch" from node "A" with the "tcp" mount 
option over the Infiniband interconnect.  It is important to note that the 
problem doesn't occur with "udp".
3) Node "B" runs an application that writes to files in "/scratch".
4) A cluster-wide shutdown command is issued and all the nodes begin to stop 
their services.
5) Node "A"'s NFS service is stopped during the shutdown and, therefore, is no 
longer exporting "/scratch".
6) Node "B" unloads its Infiniband driver that was being used to 
mount "/scratch" from node "A".
7) Shortly afterwards, node "B" panics.

It should be noted that if node "A" doesn't shut down its NFS service before 
node "B" shuts down, then the panic does not occur.

We reported the incident to Voltaire, who provides the Infiniband drivers, and 
they provided the following explanation:

----

Linux holds a reference counter on network devices; the counter is increased /
decreased during traffic. There is a kernel implementation related problem that
causes the counter to stay non-zero for a very long time (possibly forever). In 
this case the device un-registration will cause the machine to wait forever. 
This usually happens during shutdown / reboot during heavy traffic.

During server shutdown / reboot all services are being stopped and all 
processes are being killed. Voltaire IBHOST is a registered service and is 
therefore stopped during the shutdown / reboot event. This causes the removal 
of the IPoIB interface and also the removal of the IPoIB kernel module, which 
calls the unregister device command (from the kernel).  This issue can also 
happen in Ethernet drivers; the main difference is that Ethernet drivers are 
not removed during shutdown / reboot (only the interface is brought down) and 
therefore don't call the unregister_device.

----
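
To make Voltaire's description concrete, here is a small user-space model of a 
device whose un-registration cannot complete while references are outstanding.  
It is a sketch under stated assumptions, not the kernel's or Voltaire's code: 
"struct fake_netdev", "fake_dev_hold", "fake_dev_put" and "fake_unregister" 
are hypothetical names, and the polling is bounded here only so that the 
example terminates, whereas the wait described above can last forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a network device with a reference count. */
struct fake_netdev {
    atomic_int refcnt;
};

/* Take a reference, e.g. while traffic is using the device. */
void fake_dev_hold(struct fake_netdev *d)
{
    atomic_fetch_add(&d->refcnt, 1);
}

/* Drop a reference once the user is done with the device. */
void fake_dev_put(struct fake_netdev *d)
{
    atomic_fetch_sub(&d->refcnt, 1);
}

/* Poll up to max_polls times for the count to reach zero.  Returns
 * true if un-registration could complete, false if references were
 * still outstanding when the (artificial) bound was reached. */
bool fake_unregister(struct fake_netdev *d, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(&d->refcnt) == 0)
            return true;
    }
    return false;
}
```

In this model a single reference that is never dropped is enough to keep 
fake_unregister from ever succeeding, which corresponds to the counter staying 
non-zero in the explanation above.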

Voltaire has provided a patch for this problem.  This case can be considered 
closed.

Thank you for your assistance.

Rigoberto