Bug 126604
Summary: | recovery panic in dlm_recoverd | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | gfs | Assignee: | David Teigland <teigland> |
Status: | CLOSED WORKSFORME | QA Contact: | Derek Anderson <danderso> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | ccaulfie, djansa |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2005-01-05 22:42:20 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Corey Marthaler
2004-06-23 18:22:26 UTC
AFAICS this was caused by a new debugging statement that referenced a NULL pointer. It was fixed the following day in changeset 1.1682.

Is this bug fixed in CVS? I've upgraded many times since this was "fixed" and continue to see this bug.

This certainly doesn't happen for me (and I don't think for Patrick). Maybe it requires six nodes to show up (I have four, Patrick five). I'm sure it will be simple to fix if we can reproduce it. For now, please add `#define DLM_DEBUG_ALL` after `DLM_DEBUG` in dlm_internal.h and collect the console output from the crash.

Hit it again (July 13 CVS tree, but didn't have DLM_DEBUG_ALL turned on...):

```
------------[ cut here ]------------
kernel BUG at kernel/timer.c:405!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harnesd
CPU:    0
EIP:    0060:[<c0121b10>]    Not tainted
EFLAGS: 00010002   (2.6.7)
EIP is at cascade+0x40/0x50
eax: f73d6290   ebx: c03b59f8   ecx: c03b59f8   edx: c03b59f8
esi: c03b59f0   edi: c03b5180   ebp: 0000000d   esp: c0367f40
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0366000 task=c0312a40)
Stack: 00000000 c03b4ea8 c0367f54 c0367f54 c01220d1 c0367f54 c0367f54 c0122217
       00000001 c03b4ea8 0000000a c0314e24 c011e809 00000046 c0364a00 00000000
       c011e837 00000000 c01077c5 00000000 c0367fac c0314e24 c0366000 00099100
Call Trace:
 [<c01220d1>] run_timer_softirq+0xe1/0x150
 [<c0122217>] do_timer+0xc7/0xd0
 [<c011e809>] __do_softirq+0x79/0x80
 [<c011e837>] do_softirq+0x27/0x30
 [<c01077c5>] do_IRQ+0xd5/0x110
 [<c0105e6c>] common_interrupt+0x18/0x20
 [<c0104053>] default_idle+0x23/0x40
 [<c01040e4>] cpu_idle+0x34/0x40
 [<c03685e2>] start_kernel+0x162/0x1a0
 [<c0368330>] unknown_bootoption+0x0/0x120
Code: 0f 0b 95 01 ea e4 2d c0 eb dd 8d b6 00 00 00 00 56 53 83 ec
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
```

I think I'll need to get a six+ node cluster from somewhere to try this on.
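The debugging suggestion above (adding `DLM_DEBUG_ALL` after `DLM_DEBUG` in dlm_internal.h) would look roughly like the fragment below. Only the two macro names come from this bug report; the `log_all` logging macro around them is a hypothetical stand-in for whatever dlm_internal.h actually gates on `DLM_DEBUG_ALL`, shown here just to illustrate the effect of the extra define.

```c
/* Hypothetical sketch of the requested edit to dlm_internal.h.
 * Only DLM_DEBUG / DLM_DEBUG_ALL are named in the bug report;
 * log_all is an illustrative stand-in, not the real DLM macro. */
#define DLM_DEBUG
#define DLM_DEBUG_ALL        /* <-- the new line requested in the comment */

#ifdef DLM_DEBUG_ALL
/* With DLM_DEBUG_ALL set, verbose recovery messages would go to the
 * console, so state leading up to the dlm_recoverd panic is captured. */
#define log_all(fmt, args...) printk(fmt, ##args)
#else
#define log_all(fmt, args...) do { } while (0)
#endif
```

With this in place, the console output collected at the next crash would carry the extra debug messages the developer asked for.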
I don't think it'll happen on my four-node cluster; at least I've never seen it.

Updating version to the right level in the defects. Sorry for the storm.

Hasn't been seen in almost six months.