Bug 307721 - GFS Kernel Panic - NULL pointer dereference
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Priority: low
Severity: high
Assigned To: Abhijith Das
QA Contact: GFS Bugs
Reported: 2007-09-26 14:52 EDT by Ed McLain
Modified: 2010-10-22 14:59 EDT
CC List: 7 users

Doc Type: Bug Fix
Last Closed: 2009-05-06 11:41:55 EDT

Attachments: None
Description Ed McLain 2007-09-26 14:52:43 EDT

Description of problem:
I woke up to the following error on one of my nodes. A Google search turned up a reference to the same EIP, but no bug had ever been filed for it.


Sep 26 01:29:19 qmail02 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Sep 26 01:29:19 qmail02 kernel:  printing eip:
Sep 26 01:29:19 qmail02 kernel: f8c423a6
Sep 26 01:29:19 qmail02 kernel: *pde = 12214001
Sep 26 01:29:19 qmail02 kernel: Oops: 0000 [#1]
Sep 26 01:29:19 qmail02 kernel: SMP
Sep 26 01:29:19 qmail02 kernel: Modules linked in: iptable_filter ip_tables lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) dell_rbu autofs4 i2c_dev i2c_core md5 ipv6 ipmi_devintf ipmi_si ipmi_msghandler mptctl joydev button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy mptscsih dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mppVhba(U) ata_piix libata mptsas mptfc mptspi mptscsi mptbase megaraid_mbox megaraid_mm mppUpper(U) sg sd_mod scsi_mod
Sep 26 01:29:19 qmail02 kernel: CPU:    0
Sep 26 01:29:19 qmail02 kernel: EIP:    0060:[<f8c423a6>]    Not tainted VLI
Sep 26 01:29:19 qmail02 kernel: EFLAGS: 00010202   (2.6.9-42.0.3.ELsmp)
Sep 26 01:29:19 qmail02 kernel: EIP is at gfs_glock_dq+0xaf/0x16e [gfs]
Sep 26 01:29:19 qmail02 kernel: eax: d9538e8c   ebx: d9538e80   ecx: f7e404ff   edx: 00000000
Sep 26 01:29:19 qmail02 kernel: esi: 00000000   edi: d9538e64   ebp: e25d2a1c   esp: d7a1cf58
Sep 26 01:29:19 qmail02 kernel: ds: 007b   es: 007b   ss: 0068
Sep 26 01:29:19 qmail02 kernel: Process imap (pid: 22920, threadinfo=d7a1c000 task=d60d4230)
Sep 26 01:29:19 qmail02 kernel: Stack: 0129404f f5b3d41c f8c776a0 f8b29000 e25d2a1c e25d2a1c e25d2a04 e25d2a00
Sep 26 01:29:19 qmail02 kernel:        f8c427aa eb6aae80 f8c5745c f7dee6cc eb6aae80 00000000 00000006 00000000
Sep 26 01:29:19 qmail02 kernel:        f8c574d0 f8c5746e 00000000 eb6aae80 c016dd79 f7dee6cc 0000000e bfe0d674
Sep 26 01:29:19 qmail02 kernel: Call Trace:
Sep 26 01:29:19 qmail02 kernel:  [<f8c427aa>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
Sep 26 01:29:19 qmail02 kernel:  [<f8c5745c>] do_unflock+0x4f/0x61 [gfs]
Sep 26 01:29:19 qmail02 kernel:  [<f8c574d0>] gfs_flock+0x62/0x76 [gfs]
Sep 26 01:29:19 qmail02 kernel:  [<f8c5746e>] gfs_flock+0x0/0x76 [gfs]
Sep 26 01:29:19 qmail02 kernel:  [<c016dd79>] sys_flock+0x96/0x119
Sep 26 01:29:19 qmail02 kernel:  [<c02d47cb>] syscall_call+0x7/0xb
Sep 26 01:29:19 qmail02 kernel: Code: f8 ba 57 65 c6 f8 68 2d 62 c6 f8 8b 44 24 14 e8 e0 1e 02 00 59 5b f6 45 15 08 74 06 f0 0f ba 6f 08 04 f6 45 15 04 74 38 8b 57 28 <8b> 02 0f 18 00 90 8d 47 28 39 c2 74 0b ff 04 24 89 54 24 04 8b
Sep 26 01:29:19 qmail02 kernel:  <0>Fatal exception: panic in 5 seconds

Version-Release number of selected component (if applicable):
GFS-6.1.6-1 GFS-kernel-smp-2.6.9-60.3

How reproducible:
Didn't try

Additional info:
We have two identical nodes in our cluster, and this panic occurred on only one of them. The other node logged no errors apart from detecting that the failed node had dropped out of the cluster and fencing it. The hardware is Dell 2850s with 4 GB of RAM, backed by a Dell MD3000 SAS array that holds the GFS file systems.

Comment 1 Vincent Riquer 2008-10-13 10:23:21 EDT
Not sure whether this is exactly the same bug.
We run a three-machine cluster (Dell 1850s) with an iSCSI backend.

We are seeing a similar problem, currently on two machines. The third machine hit the same problem some time ago, which left us with a corrupt file system, possibly because of the hard reboots needed to bring the machines back up.

The kernel does not crash, but any process that was accessing the GFS mount at the time of the oops hangs. The cluster is a web server running Apache.

On the first machine, dmesg contains:
 <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
f8c20451
*pde = 29557001
*pte = 00000000
Oops: 0000 [#4]
SMP 
Modules linked in: ip_vs_rr linear md_mod lock_dlm dlm gfs lock_harness cman dm_round_robin dm_emc ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi scsi_transport_iscsi xt_state xt_limit xt_tcpudp iptable_mangle iptable_nat ip_nat ipmi_devintf ipmi_si ipmi_msghandler tun 8021q ipv6 bonding ext2 dm_snapshot dm_mirror ip_vs_wlc ip_vs_wrr iptable_filter ip_tables ipt_LOG x_tables ip_conntrack_ftp ip_conntrack nfnetlink ip_vs ide_floppy floppy e752x_edac rtc pcspkr psmouse shpchp pci_hotplug serio_raw edac_mc evdev joydev sg dm_multipath dm_mod ext3 jbd mbcache ide_cd cdrom sd_mod usbhid piix siimage megaraid_mbox scsi_mod megaraid_mm e1000 ehci_hcd generic ide_core uhci_hcd usbcore thermal processor fan
CPU:    3
EIP:    0060:[<f8c20451>]    Not tainted VLI
EFLAGS: 00010203   (2.6.18-5-686-bigmem #1) 
EIP is at gfs_glock_dq+0x93/0x12e [gfs]
eax: ea39b9cc   ebx: ea39b9a8   ecx: 00000000   edx: 00000000
esi: ef94a758   edi: 0001925e   ebp: d19d3ed8   esp: f74dbefc
ds: 007b   es: 007b   ss: 0068
Process apache (pid: 14285, ti=f74da000 task=dffb7aa0 task.ti=f74da000)
Stack: f8c504c0 ef94a758 ef94a740 00000001 dfcdc380 f8c206a5 ef94a758 f8c2ed67 
       f498eddc 00000007 dfcdc380 f0e45368 dfcdc380 f74dbf30 f74dbf30 00000000 
       dffb7aa0 00000003 00000200 00000000 00000042 00000000 00000000 00000000 
Call Trace:
 [<f8c206a5>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8c2ed67>] gfs_flock+0x9b/0x1e1 [gfs]
 [<c01599db>] vfs_read+0x101/0x141
 [<f8c2eccc>] gfs_flock+0x0/0x1e1 [gfs]
 [<c016c037>] sys_flock+0x114/0x147
 [<c0102c77>] syscall_call+0x7/0xb
Code: a7 ec c3 f8 e8 09 c5 01 00 5f 5d f6 46 15 08 74 06 f0 0f ba 6b 08 04 f6 46 15 04 74 2b 8b 4b 24 31 ed 31 ff eb 05 89 cd 47 89 d1 <8b> 11 0f 18 02 90 8d 43 24 39 c1 75 ee 39 f5 75 0c 4f 75 09 31
EIP: [<f8c20451>] gfs_glock_dq+0x93/0x12e [gfs] SS:ESP 0068:f74dbefc

On the second machine:
 <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
f8c0d451
*pde = 09b44001
*pte = 00000000
Oops: 0000 [#3]
SMP 
Modules linked in: linear md_mod lock_dlm dlm gfs lock_harness cman dm_round_robin dm_emc ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi scsi_transport_iscsi iptable_mangle iptable_nat ip_nat xt_tcpudp xt_state ip_conntrack nfnetlink xt_limit iptable_filter ip_tables x_tables ipmi_devintf ipmi_si ipmi_msghandler 8021q ipv6 bonding ext2 dm_snapshot dm_mirror ip_vs ide_floppy rtc psmouse floppy serio_raw pcspkr e752x_edac edac_mc shpchp pci_hotplug evdev tsdev sg dm_multipath dm_mod ext3 jbd mbcache ide_cd cdrom sd_mod usbhid siimage piix megaraid_mbox scsi_mod megaraid_mm generic ehci_hcd uhci_hcd e1000 ide_core usbcore thermal processor fan
CPU:    3
EIP:    0060:[<f8c0d451>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-5-686-bigmem #1) 
EIP is at gfs_glock_dq+0x93/0x12e [gfs]
eax: e14cf294   ebx: e14cf270   ecx: 00000000   edx: 00000000
esi: db945ed8   edi: 000ea7a2   ebp: f1d02158   esp: e71c1efc
ds: 007b   es: 007b   ss: 0068
Process apache (pid: 14324, ti=e71c0000 task=e0d1f000 task.ti=e71c0000)
Stack: f8c3d4c0 db945ed8 db945ec0 00000001 dff4b480 f8c0d6a5 db945ed8 f8c1bd67 
       f586117c 00000007 dff4b480 dcb8b690 dff4b480 e71c1f30 e71c1f30 00000000 
       e0d1f000 00000003 00000200 00000000 00000042 00000000 00000000 00000000 
Call Trace:
 [<f8c0d6a5>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8c1bd67>] gfs_flock+0x9b/0x1e1 [gfs]
 [<c01599db>] vfs_read+0x101/0x141
 [<f8c1bccc>] gfs_flock+0x0/0x1e1 [gfs]
 [<c016c037>] sys_flock+0x114/0x147
 [<c0102c0d>] sysenter_past_esp+0x56/0x79
Code: a7 bc c2 f8 e8 09 c5 01 00 5f 5d f6 46 15 08 74 06 f0 0f ba 6b 08 04 f6 46 15 04 74 2b 8b 4b 24 31 ed 31 ff eb 05 89 cd 47 89 d1 <8b> 11 0f 18 02 90 8d 43 24 39 c1 75 ee 39 f5 75 0c 4f 75 09 31
EIP: [<f8c0d451>] gfs_glock_dq+0x93/0x12e [gfs] SS:ESP 0068:e71c1efc

These systems run Debian 4.0 with kernel 2.6.18-5-686-bigmem and redhat-cluster-modules-2.6.18-5-686-bigmem version 2.6.18+1.03.00-7+etch3.

I could not find a way to reproduce this: creating and deleting files in directories that Apache processes were accessing did not trigger it.

Comment 4 Steve Whitehouse 2009-05-06 11:41:55 EDT
Closing this one, as it has been resolved according to comment #3. Please reopen if there are any outstanding issues.
