Bug 170893

Summary: bug in checkpoint.c causes system panic - __journal_remove_checkpoint
Product: Red Hat Enterprise Linux 4
Reporter: Sean Plaice <splaice>
Component: kernel
Assignee: Eric Sandeen <esandeen>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 4.0
CC: bugzilla, jbaron, jwest, rwheeler, sct
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-06-07 04:52:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sean Plaice 2005-10-15 04:40:49 UTC
This system is scheduled to be updated to the latest RHEL4-U2 kernel, but the
listed fixes for kernel-2.6.9-22 do not include anything that appears to address
this.

I found some other information on the net about others experiencing this
problem with similar call traces and similar usage (data=journal mounts). See
this post to ext3-users specifically:
https://www.redhat.com/archives/ext3-users/2005-February/msg00045.html

If this has been addressed in the latest U2 kernel, please let me know. If not,
please advise on status. This is a production system, so I am not able to do any
intrusive tests.

Thanks.

Application:
kernel-2.6.9-11.ELsmp
Raid controller driver megaraid_mbox 
qmail mail system

Hardware:
Dell PowerEdge 2850
Perc4e/Di RAID Controller (LSI MegaRAID) - configured in RAID10

Filesystem Configuration:
/dev/sda3 on / type ext3 (rw)
/dev/sda2 on /boot type ext3 (rw)
/dev/sda8 on /var type ext3 (rw,noatime,nodiratime)
/dev/sda9 on /var/qmail/queue type ext3 (rw,noatime,nodiratime,data=journal)
/dev/sda5 on /var/scanner type ext3 (rw,noatime,nodiratime)
/dev/sda6 on /usr/vpopmail/domains type ext3 (rw,noatime,nodiratime)
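
To check whether another machine uses the same journaling mode, the mount list can be filtered for ext3 filesystems mounted with data=journal. The following is a minimal sketch, not part of the original report; the function name `journal_data_mounts` and the `SAMPLE` text are illustrative assumptions, and it only inspects the options shown by `mount` (a filesystem using the default ordered mode shows no data= option at all).

```python
import re

# One line of `mount` output, in the form quoted above:
#   <device> on <mountpoint> type <fstype> (<opt>,<opt>,...)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def journal_data_mounts(mount_output):
    """Return (device, mountpoint) pairs for ext3 mounts using data=journal.

    Hypothetical helper for illustration; it parses text only and does not
    query the kernel.
    """
    hits = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        device, mountpoint, fstype, opts = m.groups()
        if fstype == "ext3" and "data=journal" in opts.split(","):
            hits.append((device, mountpoint))
    return hits


# Three of the lines from the configuration listed above.
SAMPLE = """\
/dev/sda3 on / type ext3 (rw)
/dev/sda8 on /var type ext3 (rw,noatime,nodiratime)
/dev/sda9 on /var/qmail/queue type ext3 (rw,noatime,nodiratime,data=journal)
"""

if __name__ == "__main__":
    print(journal_data_mounts(SAMPLE))
```

On the configuration above, only the qmail queue filesystem matches.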

System message log from crash:
Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:613: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
SMP
Modules linked in: md5 ipv6 parport_pc lp parport dcdipm(U) dcdbas(U) autofs4 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables joydev dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f8833320>]    Tainted: P      VLI
EFLAGS: 00010212   (2.6.9-11.ELsmp)
EIP is at __journal_drop_transaction+0x114/0x2a0 [jbd]
eax: 00000071   ebx: c2267480   ecx: f752edc8   edx: f8836d4c
esi: c21f7400   edi: cce9aaac   ebp: c2267a80   esp: f752edc4
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1653, threadinfo=f752e000 task=f769ecb0)
Stack: f8836d4c f8835f40 f8836d38 00000265 f8836ef9 c2267480 c21f7400 f8833158
       c3e9796c e85a66fc f8831fb9 f7f74680 00000000 00000f6c cca81094 00000000
       00000000 cfa95e0c c21f7400 cf3728fc 000010a7 00000000 f769ecb0 c011f6ee
Call Trace:
 [<f8833158>] __journal_remove_checkpoint+0x4d/0x65 [jbd]
 [<f8831fb9>] journal_commit_transaction+0xdae/0xfb1 [jbd]
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<f885ef19>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<c0217301>] elv_next_request+0xc7/0xce
 [<c0235428>] ide_do_request+0x63/0x2a6
 [<f8833e6d>] kjournald+0xc7/0x213 [jbd]
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<c011cb39>] schedule_tail+0x12/0x55
 [<f8833da0>] commit_timeout+0x0/0x5 [jbd]
 [<f8833da6>] kjournald+0x0/0x213 [jbd]
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Code: 38 6d 83 f8 83 c4 14 83 7b 24 00 74 29 68 f9 6e 83 f8 68 65 02 00 00 68 38 6d 83 f8 68 40 5f 83 f8 68 4c 6d 83 f8 e8 b6 e6 8e c7 <0f> 0b 65 02 38 6d 83 f8 83 c4 14 83 7b 2c 00 74 29 68 17 6f 83
 <0>Fatal exception: panic in 5 seconds

System DMESG:
etected 2993.010 MHz processor.
Using hpet for high-res timesource
Calibrating delay loop... 5931.00 BogoMIPS (lpj=2965504)
Security Scaffold v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
There is already a security framework initialized, register_security failed.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000
CPU: After vendor identify, caps:  bfebfbff 20100000 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:        bfebfbff 20100000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 3.00GHz stepping 03
per-CPU timeslice cutoff: 1749.71 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c03dc000 soft=c03bc000
Initializing CPU#1
Calibrating delay loop... 5980.16 BogoMIPS (lpj=2990080)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000
CPU: After vendor identify, caps:  bfebfbff 20100000 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:        bfebfbff 20100000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Xeon(TM) CPU 3.00GHz stepping 03
Total of 2 processors activated (11911.16 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
zapping low mappings.
checking if image is initramfs... it is
Freeing initrd memory: 466k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfbf0e, last bus=11
PCI: Using MMCONFIG
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PALO._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PALO.DOBA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PALO.DOBB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PBLO._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PBHI._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PBHI.PXB1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PBHI.PXB2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.VPR1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.VPR1.PXC1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.VPR1.PXC2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PICH._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12)
ACPI: PCI Interrupt Link [LNKB] (IRQs *3 4 5 6 7 10 11 12)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 *7 10 11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 *11 12)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 *10 11 12)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 10 11 12)
Linux Plug and Play Support v0.97 (c) Adam Belay
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
ACPI: PCI interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 177
ACPI: PCI interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 185
ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 193
ACPI: PCI interrupt 0000:00:1f.1[A]: no GSI
ACPI: PCI interrupt 0000:02:0e.0[A] -> GSI 46 (level, low) -> IRQ 201
ACPI: PCI interrupt 0000:06:07.0[A] -> GSI 64 (level, low) -> IRQ 209
ACPI: PCI interrupt 0000:07:08.0[A] -> GSI 65 (level, low) -> IRQ 217
ACPI: PCI interrupt 0000:0b:05.0[A] -> GSI 20 (level, low) -> IRQ 225
ACPI: PCI interrupt 0000:0b:05.1[B] -> GSI 21 (level, low) -> IRQ 233
ACPI: PCI interrupt 0000:0b:06.0[A] -> GSI 23 (level, low) -> IRQ 193
ACPI: PCI interrupt 0000:0b:0d.0[A] -> GSI 18 (level, low) -> IRQ 185
apm: BIOS not found.
audit: initializing netlink socket (disabled)
audit(1129327958.614:0): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key D67B3E6B1ED6FEC7
- User ID: Red Hat, Inc. (Kernel Module GPG key)
Intel E7520/7320/7525 detected.<6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Processor [CPU0] (supports C1)
ACPI: Processor [CPU1] (supports C1)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
Failed to disable AUX port, but continuing anyway... Is this a SiS?
If AUX port is really absent please use the 'i8042.noaux' option.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a NS16550A
ACPI: PCI interrupt 0000:0b:05.1[B] -> GSI 21 (level, low) -> IRQ 233
ttyS4 at I/O 0xcc80 (irq = 233) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A]: no GSI
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: TEAC CD-ROM CD-224E, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
SiI680: IDE controller at PCI slot 0000:0b:06.0
ACPI: PCI interrupt 0000:0b:06.0[A] -> GSI 23 (level, low) -> IRQ 193
SiI680: chipset revision 2
SiI680: BASE CLOCK == 133
SiI680: 100% native mode on irq 193
    ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
    ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hde: VIRTUALFLOPPY DRIVE Floppy, ATAPI FLOPPY drive
hdf: VIRTUALCDROM DRIVE, ATAPI CD/DVD-ROM drive
hde: set_drive_speed_status: status=0x40 { DriveReady }
hdf: set_drive_speed_status: status=0x40 { DriveReady }
ide2 at 0xf8804c80-0xf8804c87,0xf8804c8a on irq 193
Probing IDE interface ide3...
Probing IDE interface ide1...
Probing IDE interface ide3...
Probing IDE interface ide4...
Probing IDE interface ide5...
hda: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
ide-floppy: Can't get floppy parameters
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: PS/2 Generic Mouse on isa0060/serio1
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP: Hash tables configured (established 262144 bind 43690)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S0 S4 S5)
ACPI wakeup devices:
PCI0 PALO PBLO PBHI VPR1 PICH
Freeing unused kernel memory: 176k freed
SCSI subsystem initialized
megaraid cmm: 2.20.2.5 (Release Date: Fri Jan 21 00:01:03 EST 2005)
megaraid: 2.20.4.5 (Release Date: Thu Feb 03 12:27:22 EST 2005)
megaraid: probe new device 0x1028:0x0013:0x1028:0x016d: bus 2:slot 14:func 0
ACPI: PCI interrupt 0000:02:0e.0[A] -> GSI 46 (level, low) -> IRQ 201
megaraid: fw version:[521S] bios version:[H430]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
  Vendor: PE/PV     Model: 1x6 SCSI BP       Rev: 1.0
  Type:   Processor                          ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[0]: scanning scsi channel 2 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD 0 RAID1  279G  Rev: 521S
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 573030400 512-byte hdwr sectors (293392 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
Attached scsi disk sda at scsi0, channel 2, id 0, lun 0
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
Attached scsi generic sg0 at scsi0, channel 0, id 6, lun 0,  type 3
Attached scsi generic sg1 at scsi0, channel 2, id 0, lun 0,  type 0
inserting floppy driver for 2.6.9-11.ELsmp
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Intel(R) PRO/1000 Network Driver - version 5.6.10.1-k2-NAPI
Copyright (c) 1999-2004 Intel Corporation.
ACPI: PCI interrupt 0000:06:07.0[A] -> GSI 64 (level, low) -> IRQ 209
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt 0000:07:08.0[A] -> GSI 65 (level, low) -> IRQ 217
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
hw_random: RNG not detected
ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 193
ehci_hcd 0000:00:1d.7: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: irq 193, pci mem f881c000
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
USB Universal Host Controller Interface driver v2.2
ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
uhci_hcd 0000:00:1d.0: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: irq 169, io base 0000bce0
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 177
uhci_hcd 0000:00:1d.1: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: irq 177, io base 0000bcc0
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 185
uhci_hcd 0000:00:1d.2: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: irq 185, io base 0000bca0
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
usb 1-3: new high speed USB device using address 3
hub 1-3:1.0: USB hub found
hub 1-3:1.0: 2 ports detected
ACPI: Power Button (FF) [PWRF]
usb 2-1: new full speed USB device using address 2
input: USB HID v1.10 Keyboard [Dell DRAC4] on usb-0000:00:1d.0-1
input: USB HID v1.10 Mouse [Dell DRAC4] on usb-0000:00:1d.0-1
EXT3 FS on sda3, internal journal
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm.com
cdrom: open failed.
ide-floppy: Can't get floppy parameters
cdrom: open failed.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda9, internal journal
EXT3-fs: mounted filesystem with journal data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 4192924k swap on /dev/sda7.  Priority:-1 extents:1
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (8192 buckets, 65536 max) - 340 bytes per conntrack
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
dcdbas: module license 'unspecified' taints kernel.
lp: driver loaded but no devices found
NET: Registered protocol family 10
Disabled Privacy Extensions on device c03356c0(lo)
IPv6 over IPv4 tunneling driver
divert: not allocating divert_blk for non-ethernet device sit0
eth1: no IPv6 routers present
eth0: no IPv6 routers present


How reproducible:
Cannot reproduce, production system.


Possible Steps to Reproduce:
1. Simulate qmail's fsync() behaviour on data=journal mounted fs.
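
A rough way to approximate that step on a test machine is a loop that creates many small files, calls fsync() on each, renames it into place, and then deletes it. The sketch below is an assumption-laden illustration, not qmail's actual queue algorithm and not a confirmed reproducer for this panic; the names `deliver_one` and `run`, the file naming scheme, and the default counts are all hypothetical.

```python
import os
import tempfile


def deliver_one(queue_dir, seq, payload):
    """Write one small file, fsync it, then rename it into place.

    This only approximates a mail queue's write pattern (assumption);
    it does not reproduce qmail's real queue layout.
    """
    tmp_path = os.path.join(queue_dir, "tmp.%d" % seq)
    final_path = os.path.join(queue_dir, "msg.%d" % seq)
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # block until the file's data has reached stable storage
    finally:
        os.close(fd)
    os.rename(tmp_path, final_path)
    return final_path


def run(queue_dir, count=1000, size=4096):
    """Create, sync, and delete `count` small files; return how many were handled."""
    payload = b"x" * size
    done = 0
    for seq in range(count):
        path = deliver_one(queue_dir, seq, payload)
        os.unlink(path)  # a delivered message is removed from the queue
        done += 1
    return done


if __name__ == "__main__":
    # For a real test, replace the temporary directory with a scratch
    # directory on a data=journal ext3 mount, on a non-production machine.
    with tempfile.TemporaryDirectory() as scratch:
        print(run(scratch, count=100))
```

Running several copies in parallel against the same filesystem would come closer to a busy mail queue, but whether that triggers the assertion is untested.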

Comment 1 Suzanne Hillman 2005-10-17 15:21:31 UTC
This looks like something for which you would be best off going through our
support. In order to do this, please either contact Red Hat's Technical
Support line at 888-GO-REDHAT or file a web ticket at
http://www.redhat.com/apps/support/.  Bugzilla is not an official support
channel, has no response guarantees, and may not route your issue to the
correct area to assist you.  Using the official support channels above will
guarantee that your issue is handled appropriately and routed to the
individual or group which can best assist you with this issue and will also
allow Red Hat to track the issue, ensuring that any applicable bug fix is
included in all releases and is not dropped from a future update or major
release.

Comment 2 Sean Plaice 2005-10-27 23:01:35 UTC
I have filed a support request to follow up on the problem via that channel.

Comment 3 Sean Plaice 2005-10-27 23:03:34 UTC
This problem appears to still be occurring with the 2.6.9-22 kernel, though it
fails to provide the same details in the system messages log.

The server is in a remote location, and will not have a serial console till next
week to capture the complete panic log from the console.

Comment 7 Sander 2010-02-03 11:30:38 UTC
We _seem_ to be affected by the same problem on a different architecture (i386). The following error is logged:

Feb  3 04:53:51 ebsdb kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Feb  3 04:53:51 ebsdb kernel: in_atomic():0[expected: 0], irqs_disabled():1
Feb  3 04:53:51 ebsdb kernel:  [<02120c1d>] __might_sleep+0x7d/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0215796c>] rw_vm+0xe4/0x29c
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02157de3>] get_user_size+0x30/0x57
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<0211b5c4>] __is_prefetch+0x1d5/0x2ba
Feb  3 04:53:51 ebsdb kernel:  [<02138ba8>] search_module_extables+0x5d/0x64
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<0211b9f9>] do_page_fault+0x350/0x5f7
Feb  3 04:53:51 ebsdb kernel:  [<022d43d9>] __cond_resched+0x14/0x39
Feb  3 04:53:51 ebsdb kernel:  [<021442b9>] rmqueue_bulk+0x5b/0x65
Feb  3 04:53:51 ebsdb kernel:  [<02144648>] buffered_rmqueue+0x17d/0x1a5
Feb  3 04:53:51 ebsdb kernel:  [<0211b6a9>] do_page_fault+0x0/0x5f7
Feb  3 04:53:51 ebsdb kernel:  [<02131675>] find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel:  [<02131803>] find_task_by_pid_type+0x8/0x1d
Feb  3 04:53:51 ebsdb kernel:  [<0211e04c>] sched_exit+0x1d/0xbc
Feb  3 04:53:51 ebsdb kernel:  [<021241ca>] release_task+0xb6/0xfa
Feb  3 04:53:51 ebsdb kernel:  [<02125d5c>] wait_task_zombie+0x475/0x48b
Feb  3 04:53:51 ebsdb kernel:  [<021262fd>] do_wait+0x183/0x3b8
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<0212dfb9>] sys_rt_sigaction+0x73/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<021265c5>] sys_wait4+0x27/0x2a
Feb  3 04:53:51 ebsdb kernel:  [<021265db>] sys_waitpid+0x13/0x17
Feb  3 04:53:51 ebsdb kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb  3 04:53:51 ebsdb kernel:  printing eip:
Feb  3 04:53:51 ebsdb kernel: 02131675
Feb  3 04:53:51 ebsdb kernel: *pde = 00004001
Feb  3 04:53:51 ebsdb kernel: Oops: 0000 [#1]
Feb  3 04:53:51 ebsdb kernel: SMP 
Feb  3 04:53:51 ebsdb kernel: Modules linked in: mptctl mptbase hpilo(U) nfsd exportfs autofs4 nfs lockd nfs_acl sunrpc 8021q dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 bonding(U) floppy sg ext3 jbd dm_mod cciss sd_mod qla2xxx(U) scsi_mod qla2xxx_conf(U)
Feb  3 04:53:51 ebsdb kernel: CPU:    0
Feb  3 04:53:51 ebsdb kernel: EIP:    0060:[<02131675>]    Not tainted VLI
Feb  3 04:53:51 ebsdb kernel: EFLAGS: 00010086   (2.6.9-89.0.18.ELhugemem) 
Feb  3 04:53:51 ebsdb kernel: EIP is at find_pid+0x26/0x3a
Feb  3 04:53:51 ebsdb kernel: eax: 0f1e1000   ebx: 00002fcf   ecx: 00000000   edx: c1e7586c
Feb  3 04:53:51 ebsdb kernel: esi: f3581430   edi: 00000000   ebp: c1259ed0   esp: c1259eac
Feb  3 04:53:51 ebsdb kernel: ds: 007b   es: 007b   ss: 0068
Feb  3 04:53:51 ebsdb kernel: Process hpetfe (pid: 12239, threadinfo=c1259000 task=c1e757b0)
Feb  3 04:53:51 ebsdb kernel: Stack: 00000000 02131803 f3581430 0211e04c f3581430 f3581430 f3581430 f3581430 
Feb  3 04:53:51 ebsdb kernel:        00000000 00000000 021241ca f3581430 00002fd3 00000000 00000000 02125d5c 
Feb  3 04:53:51 ebsdb kernel:        03000000 00000000 00000003 00000000 a0ff8080 0011a6e2 39e805b0 c1e757b0 
Feb  3 04:53:51 ebsdb kernel: Call Trace:
Feb  3 04:53:51 ebsdb kernel:  [<02131803>] find_task_by_pid_type+0x8/0x1d
Feb  3 04:53:51 ebsdb kernel:  [<0211e04c>] sched_exit+0x1d/0xbc
Feb  3 04:53:51 ebsdb kernel:  [<021241ca>] release_task+0xb6/0xfa
Feb  3 04:53:51 ebsdb kernel:  [<02125d5c>] wait_task_zombie+0x475/0x48b
Feb  3 04:53:51 ebsdb kernel:  [<021262fd>] do_wait+0x183/0x3b8
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<0212dfb9>] sys_rt_sigaction+0x73/0x88
Feb  3 04:53:51 ebsdb kernel:  [<0211f28b>] default_wake_function+0x0/0xc
Feb  3 04:53:51 ebsdb kernel:  [<021265c5>] sys_wait4+0x27/0x2a
Feb  3 04:53:51 ebsdb kernel:  [<021265db>] sys_waitpid+0x13/0x17
Feb  3 04:53:51 ebsdb kernel: Code: c8 ff 5b 5e c3 53 b9 20 00 00 00 8b 04 85 84 fe 43 02 2b 0d 94 fe 43 02 89 d3 69 d2 01 00 37 9e d3 ea 8b 14 90 85 d2 74 12 8b 0a <0f> 18 01 90 39 5a fc 8d 42 fc 74 06 89 ca eb ea 31 c0 5b c3 55 
Feb  3 04:53:51 ebsdb kernel:  <0>Fatal exception: panic in 5 seconds

This issue has occurred several times in the past months (since the server was re-installed with RHEL4 instead of RHEL3).

This is a HP Proliant DL585 G1 server (RHEL 4 update 9) with the following kernel:

Linux ebsdb 2.6.9-89.0.18.ELhugemem #1 SMP Wed Nov 25 06:13:02 EST 2009 i686 athlon i386 GNU/Linux

Further specs:

2 dual-core AMD Opteron 848 processors, 24GB memory (24 * 1GB dimms, ECC).

Comment 8 Sander 2010-02-03 12:44:32 UTC
Please ignore the comment above; I meant to attach it to https://bugzilla.redhat.com/show_bug.cgi?id=175189.