Bug 161617 - RHEL4 Panics at smp_apic_timer_interrupt
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: David Woodhouse
Brian Brock
Depends On:
Blocks: 168429
Reported: 2005-06-24 16:35 EDT by GV Govindasamy
Modified: 2007-11-30 17:07 EST (History)

See Also:
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 14:12:26 EST

Attachments
console screen shot of panic/oops (1.70 MB, image/jpeg)
2005-06-30 09:52 EDT, Chuck Lever
Another console screen (83.06 KB, image/jpeg)
2005-07-21 19:56 EDT, GV Govindasamy
Debugging patch, for reference. (971 bytes, patch)
2005-07-27 08:36 EDT, David Woodhouse

Description GV Govindasamy 2005-06-24 16:35:26 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
Periodically, a couple of systems (same kind of hardware) panicked. I was able to capture only the following call trace from the console.

Call tree:
run_timer_softirq
__do_softirq
do_softirq
-------------
-------------
smp_apic_timer_interrupt
apic_timer_interrupt
mwait_idle
cpu_idle
start_kernel


Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL

How reproducible:
Didn't try


Additional info:
Comment 1 Jason Baron 2005-06-28 15:31:55 EDT
we really need a better panic trace to start on this one. If you can reproduce
this, perhaps you could attach serial console output? thanks.
Comment 2 Michael Waite 2005-06-29 12:51:47 EDT
Any word from the folks at NetApp on getting this data to us?
I spoke with Charles Lever and this is a pretty hot topic that we would like to
find a resolution for.
Comment 3 Chuck Lever 2005-06-30 09:52:15 EDT
Created attachment 116169
console screen shot of panic/oops
Comment 4 Jason Merrill 2005-07-07 17:55:47 EDT
I don't think you really want me working on this bug.
Comment 5 Dave Jones 2005-07-07 20:49:53 EDT
whoops, I picked the wrong Jason :)
Comment 6 Chuck Lever 2005-07-20 10:37:00 EDT
dmesg and lspci output on a typical system that sees this problem follows.  any
update on when this might be addressed?

> Dmesg:
> ====================================
> Linux version 2.6.9-11.ELsmp 
> (bhcompile@decompose.build.redhat.com) (gcc version 3.4.3 
> 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri May 20 18:26:27 EDT 2005
> BIOS-provided physical RAM map:
>  BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
>  BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000ea070 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 00000000dffc0000 (usable)
>  BIOS-e820: 00000000dffc0000 - 00000000dffcf000 (ACPI data)
>  BIOS-e820: 00000000dffcf000 - 00000000dfff0000 (ACPI NVS)
>  BIOS-e820: 00000000dfff0000 - 00000000e0000000 (reserved)
>  BIOS-e820: 00000000fec00000 - 00000000fec86000 (reserved)
>  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
>  BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
> 3712MB HIGHMEM available.
> 896MB LOWMEM available.
> found SMP MP-table at 000ff780
> On node 0 totalpages: 1179648
>   DMA zone: 4096 pages, LIFO batch:1
>   Normal zone: 225280 pages, LIFO batch:16
>   HighMem zone: 950272 pages, LIFO batch:16
> DMI 2.3 present.
> Using APIC driver default
> ACPI: RSDP (v000 ACPIAM                                ) @ 0x000f7710
> ACPI: RSDT (v001 A M I  OEMRSDT  0x09000424 MSFT 0x00000097) 
> @ 0xdffc0000
> ACPI: FADT (v002 A M I  OEMFACP  0x09000424 MSFT 0x00000097) 
> @ 0xdffc0200
> ACPI: MADT (v001 A M I  OEMAPIC  0x09000424 MSFT 0x00000097) 
> @ 0xdffc0390
> ACPI: OEMB (v001 A M I  AMI_OEM  0x09000424 MSFT 0x00000097) 
> @ 0xdffcf040
> ACPI: DSDT (v001  0ABDI 0ABDI007 0x00000007 INTL 0x02002026) 
> @ 0x00000000
> ACPI: PM-Timer IO Port: 0x408
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> Processor #0 15:3 APIC version 20
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
> Processor #6 15:3 APIC version 20
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
> Processor #1 15:3 APIC version 20
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
> Processor #7 15:3 APIC version 20
> Enabling APIC mode:  Flat.  Using 0 I/O APICs
> ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
> ACPI: IOAPIC (id[0x09] address[0xfec10000] gsi_base[24])
> IOAPIC[1]: apic_id 9, version 32, address 0xfec10000, GSI 24-47
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ9 used by override.
> Using ACPI (MADT) for SMP configuration information
> Built 1 zonelists
> Kernel command line: auto BOOT_IMAGE=2.6.9-11.ELsmp ro 
> BOOT_FILE=/boot/vmlinuz-2.6.9-11.ELsmp rhgb quiet root=LABEL=/
> Initializing CPU#0
> CPU 0 irqstacks, hard=c03db000 soft=c03bb000
> PID hash table entries: 4096 (order: 12, 65536 bytes)
> Detected 2801.842 MHz processor.
> Using tsc for high-res timesource
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> Memory: 4150216k/4718592k available (1824k kernel code, 
> 42940k reserved, 744k data, 176k init, 3276544k highmem)
> Calibrating delay loop... 5521.40 BogoMIPS (lpj=2760704)
> Security Scaffold v1.0.0 initialized
> SELinux:  Initializing.
> SELinux:  Starting in permissive mode
> There is already a security framework initialized, 
> register_security failed.
> selinux_register_security:  Registering secondary module capability
> Capability LSM initialized as secondary
> Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
> CPU: After generic identify, caps: bfebfbff 20000000 00000000 00000000
> CPU: After vendor identify, caps:  bfebfbff 20000000 00000000 00000000
> monitor/mwait feature present.
> using mwait in idle threads.
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 1024K
> CPU: Physical Processor ID: 0
> CPU: After all inits, caps:        bfebf3ff 20000000 00000000 00000080
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
> CPU0: Thermal monitoring enabled
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> CPU0: Intel(R) Xeon(TM) CPU 3.00GHz stepping 04
> per-CPU timeslice cutoff: 2926.40 usecs.
> task migration cache decay timeout: 3 msecs.
> Booting processor 1/1 eip 3000
> CPU 1 irqstacks, hard=c03dc000 soft=c03bc000
> Initializing CPU#1
> Calibrating delay loop... 5586.94 BogoMIPS (lpj=2793472)
> CPU: After generic identify, caps: bfebfbff 20000000 00000000 00000000
> CPU: After vendor identify, caps:  bfebfbff 20000000 00000000 00000000
> monitor/mwait feature present.
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 1024K
> CPU: Physical Processor ID: 0
> CPU: After all inits, caps:        bfebf3ff 20000000 00000000 00000080
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#1.
> CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
> CPU1: Thermal monitoring enabled
> CPU1: Intel(R) Xeon(TM) CPU 3.00GHz stepping 04
> Booting processor 2/6 eip 3000
> CPU 2 irqstacks, hard=c03dd000 soft=c03bd000
> Initializing CPU#2
> Calibrating delay loop... 5586.94 BogoMIPS (lpj=2793472)
> CPU: After generic identify, caps: bfebfbff 20000000 00000000 00000000
> CPU: After vendor identify, caps:  bfebfbff 20000000 00000000 00000000
> monitor/mwait feature present.
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 1024K
> CPU: Physical Processor ID: 3
> CPU: After all inits, caps:        bfebf3ff 20000000 00000000 00000080
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#2.
> CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
> CPU2: Thermal monitoring enabled
> CPU2: Intel(R) Xeon(TM) CPU 3.00GHz stepping 04
> Booting processor 3/7 eip 3000
> CPU 3 irqstacks, hard=c03de000 soft=c03be000
> Initializing CPU#3
> Calibrating delay loop... 5586.94 BogoMIPS (lpj=2793472)
> CPU: After generic identify, caps: bfebfbff 20000000 00000000 00000000
> CPU: After vendor identify, caps:  bfebfbff 20000000 00000000 00000000
> monitor/mwait feature present.
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 1024K
> CPU: Physical Processor ID: 3
> CPU: After all inits, caps:        bfebf3ff 20000000 00000000 00000080
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#3.
> CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
> CPU3: Thermal monitoring enabled
> CPU3: Intel(R) Xeon(TM) CPU 3.00GHz stepping 04
> Total of 4 processors activated (22282.24 BogoMIPS).
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 pin1=2 pin2=-1
> checking TSC synchronization across 4 CPUs: passed.
> Brought up 4 CPUs
> zapping low mappings.
> checking if image is initramfs... it is
> Freeing initrd memory: 472k freed
> NET: Registered protocol family 16
> PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
> PCI: Using configuration type 1
> mtrr: v2.0 (20020519)
> ACPI: Subsystem revision 20040816
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (00:00)
> PCI: Probing PCI hardware (bus 00)
> PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2
> PCI: Transparent bridge - 0000:00:1e.0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EPA0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0PC._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *7 10 11 12 14 15)
> ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 
> 15) *0, disabled.
> ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 
> 15) *0, disabled.
> ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 
> 15) *0, disabled.
> ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 10 11 12 14 15)
> Linux Plug and Play Support v0.97 (c) Adam Belay
> usbcore: registered new driver usbfs
> usbcore: registered new driver hub
> PCI: Using ACPI for IRQ routing
> ACPI: PCI interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 177
> ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 185
> ACPI: PCI interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 193
> ACPI: PCI interrupt 0000:00:1f.3[B] -> GSI 17 (level, low) -> IRQ 201
> ACPI: PCI interrupt 0000:03:04.0[A] -> GSI 18 (level, low) -> IRQ 193
> ACPI: PCI interrupt 0000:03:05.0[A] -> GSI 17 (level, low) -> IRQ 201
> apm: BIOS not found.
> audit: initializing netlink socket (disabled)
> audit(1121768818.922:0): initialized
> highmem bounce pool size: 64 pages
> Total HugeTLB memory allocated, 0
> VFS: Disk quotas dquot_6.5.1
> Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> SELinux:  Registering netfilter hooks
> Initializing Cryptographic API
> ksign: Installing public key data
> Loading keyring
> - Added public key D67B3E6B1ED6FEC7
> - User ID: Red Hat, Inc. (Kernel Module GPG key)
> pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> ACPI: Processor [CPU1] (supports C1, 8 throttling states)
> ACPI: Processor [CPU2] (supports C1)
> ACPI: Processor [CPU3] (supports C1)
> ACPI: Processor [CPU4] (supports C1)
> Real Time Clock Driver v1.12
> Linux agpgart interface v0.100 (c) Dave Jones
> serio: i8042 AUX port at 0x60,0x64 irq 12
> serio: i8042 KBD port at 0x60,0x64 irq 1
> Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ 
> sharing enabled
> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
> divert: not allocating divert_blk for non-ethernet device lo
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override 
> with idebus=xx
> Probing IDE interface ide0...
> hda: CD-224E, ATAPI CD/DVD-ROM drive
> ide1: I/O resource 0x170-0x177 not free.
> ide1: ports already in use, skipping probe
> Probing IDE interface ide2...
> Probing IDE interface ide3...
> Probing IDE interface ide4...
> Probing IDE interface ide5...
> Using cfq io scheduler
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> hda: ATAPI 24X CD-ROM drive, 128kB Cache
> Uniform CD-ROM driver Revision: 3.20
> ide-floppy driver 0.99.newide
> usbcore: registered new driver hiddev
> usbcore: registered new driver usbhid
> drivers/usb/input/hid-core.c: v2.0:USB HID core driver
> mice: PS/2 mouse device common for all mice
> input: AT Translated Set 2 keyboard on isa0060/serio0
> input: ImPS/2 Logitech Wheel Mouse on isa0060/serio1
> md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
> NET: Registered protocol family 2
> IP: routing cache hash table of 32768 buckets, 512Kbytes
> TCP: Hash tables configured (established 262144 bind 43690)
> Initializing IPsec netlink socket
> NET: Registered protocol family 1
> NET: Registered protocol family 17
> ACPI: (supports S0 S1 S3 S4 S4bios S5)
> ACPI wakeup devices:
> EPA0 EPA1 EPB0 EPB1 EPC0 P0P1 MC97 USB1 USB2 EUSB PS2K PS2M P0PC SLPB
> Freeing unused kernel memory: 176k freed
> SCSI subsystem initialized
> libata version 1.10 loaded.
> ata_piix version 1.03
> ata_piix: combined mode detected
> ACPI: PCI interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 193
> ata: 0x1f0 IDE port busy
> PCI: Setting latency timer of device 0000:00:1f.2 to 64
> ata1: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xFC08 irq 15
> ata1: dev 1 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 
> 86:3c01 87:4003 88:207f
> ata1: dev 1 ATA, max UDMA/133, 156301488 sectors: lba48
> ata1: dev 1 configured for UDMA/133
> scsi0 : ata_piix
>   Vendor: ATA       Model: ST380013AS        Rev: 3.19
>   Type:   Direct-Access                      ANSI SCSI revision: 05
> SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> SCSI device sda: drive cache: write back
>  sda: sda1 sda2
> Attached scsi disk sda at scsi0, channel 0, id 1, lun 0
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> SELinux:  Disabled at runtime.
> SELinux:  Unregistering netfilter hooks
> inserting floppy driver for 2.6.9-11.ELsmp
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> Intel(R) PRO/1000 Network Driver - version 5.6.10.1-k2-NAPI
> Copyright (c) 1999-2004 Intel Corporation.
> ACPI: PCI interrupt 0000:03:04.0[A] -> GSI 18 (level, low) -> IRQ 193
> divert: allocating divert_blk for eth0
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> hw_random: RNG not detected
> ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 185
> ehci_hcd 0000:00:1d.7: EHCI Host Controller
> PCI: Setting latency timer of device 0000:00:1d.7 to 64
> ehci_hcd 0000:00:1d.7: irq 185, pci mem f8806c00
> ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
> PCI: cache line size of 128 is not supported by device 0000:00:1d.7
> ehci_hcd 0000:00:1d.7: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 4 ports detected
> USB Universal Host Controller Interface driver v2.2
> ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
> uhci_hcd 0000:00:1d.0: UHCI Host Controller
> PCI: Setting latency timer of device 0000:00:1d.0 to 64
> uhci_hcd 0000:00:1d.0: irq 169, io base 0000e800
> uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 2 ports detected
> ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 177
> uhci_hcd 0000:00:1d.1: UHCI Host Controller
> PCI: Setting latency timer of device 0000:00:1d.1 to 64
> uhci_hcd 0000:00:1d.1: irq 177, io base 0000ec00
> uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
> hub 3-0:1.0: USB hub found
> hub 3-0:1.0: 2 ports detected
> ip_tables: (C) 2000-2002 Netfilter core team
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> NET: Registered protocol family 10
> Disabled Privacy Extensions on device c03356c0(lo)
> IPv6 over IPv4 tunneling driver
> divert: not allocating divert_blk for non-ethernet device sit0
> ip_tables: (C) 2000-2002 Netfilter core team
> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> mtrr: type mismatch for fd000000,800000 old: uncachable new: 
> write-combining
> mtrr: type mismatch for fd000000,800000 old: uncachable new: 
> write-combining
> e1000: eth0: e1000_watchdog: NIC Link is Down
> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> e1000: eth0: e1000_watchdog: NIC Link is Down
> ACPI: Power Button (FF) [PWRF]
> ACPI: Sleep Button (CM) [SLPB]
> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> eth0: no IPv6 routers present
> EXT3 FS on sda1, internal journal
> device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm@uk.sistina.com
> cdrom: open failed.
> Adding 16707592k swap on /dev/sda2.  Priority:-1 extents:1
> ip_tables: (C) 2000-2002 Netfilter core team
> ip_tables: (C) 2000-2002 Netfilter core team
> i2c /dev entries driver
> mtrr: type mismatch for fd000000,800000 old: uncachable new: 
> write-combining
> mtrr: type mismatch for fd000000,800000 old: uncachable new: 
> write-combining

> Lspci -v
> ====================================
> [root@sdgsim-c14 ~]# lspci -v
> 00:00.0 Host bridge: Intel Corporation E7320 Memory 
> Controller Hub (rev 0a)
>         Subsystem: Intel Corporation: Unknown device 0000
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [40] Vendor Specific Information
> 
> 00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI 
> Express Port A (rev 0a) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         Capabilities: [50] Power Management version 2
>         Capabilities: [58] Message Signalled Interrupts: 
> 64bit- Queue=0/1 Enable-
>         Capabilities: [64] Express Root Port (Slot-) IRQ 0
> 
> 00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X 
> Bridge (rev 02) (prog-if 00 [Normal decode])
>         Flags: bus master, 66Mhz, fast devsel, latency 32
>         Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
>         Capabilities: [50] PCI-X bridge device.
> 
> 00:1d.0 USB Controller: Intel Corporation 6300ESB USB 
> Universal Host Controller (rev 02) (prog-if 00 [UHCI])
>         Subsystem: Intel Corporation: Unknown device 24d0
>         Flags: bus master, medium devsel, latency 0, IRQ 169
>         I/O ports at e800 [size=32]
> 
> 00:1d.1 USB Controller: Intel Corporation 6300ESB USB 
> Universal Host Controller (rev 02) (prog-if 00 [UHCI])
>         Subsystem: Intel Corporation: Unknown device 24d0
>         Flags: bus master, medium devsel, latency 0, IRQ 177
>         I/O ports at ec00 [size=32]
> 
> 00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog 
> Timer (rev 02)
>         Flags: medium devsel
>         Memory at febff800 (32-bit, non-prefetchable) [size=16]
> 
> 00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced 
> Programmable Interrupt Controller (rev 02) (prog-if 20 [IO(X)-APIC])
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [50] PCI-X non-bridge device.
> 
> 00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 
> Enhanced Host Controller (rev 02) (prog-if 20 [EHCI])
>         Subsystem: Intel Corporation: Unknown device 24d0
>         Flags: bus master, medium devsel, latency 0, IRQ 185
>         Memory at febffc00 (32-bit, non-prefetchable) [size=1K]
>         Capabilities: [50] Power Management version 2
>         Capabilities: [58] Debug port
> 
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 
> 0a) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
>         I/O behind bridge: 0000d000-0000dfff
>         Memory behind bridge: fca00000-feafffff
>         Prefetchable memory behind bridge: fc800000-fc8fffff
> 
> 00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface 
> Controller (rev 02)
>         Flags: bus master, medium devsel, latency 0
> 
> 00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage 
> Controller (rev 02) (prog-if 8a [Master SecP PriP])
>         Subsystem: Intel Corporation 6300ESB SATA Storage Controller
>         Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 193
>         I/O ports at <unassigned>
>         I/O ports at <unassigned>
>         I/O ports at <unassigned>
>         I/O ports at <unassigned>
>         I/O ports at fc00 [size=16]
> 
> 00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
>         Subsystem: Intel Corporation: Unknown device 24d0
>         Flags: medium devsel, IRQ 201
>         I/O ports at 0540 [size=32]
> 
> 03:04.0 Ethernet controller: Intel Corporation 82541GI/PI 
> Gigabit Ethernet Controller
>         Subsystem: Super Micro Computer Inc: Unknown device 1076
>         Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 193
>         Memory at feaa0000 (64-bit, non-prefetchable) [size=128K]
>         I/O ports at dc00 [size=64]
>         Capabilities: [dc] Power Management version 2
>         Capabilities: [e4] PCI-X non-bridge device.
> 
> 03:05.0 VGA compatible controller: ATI Technologies Inc Rage 
> XL (rev 27) (prog-if 00 [VGA])
>         Subsystem: ATI Technologies Inc Rage XL
>         Flags: bus master, stepping, medium devsel, latency 
> 32, IRQ 201
>         Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
>         I/O ports at d800 [size=256]
>         Memory at feafe000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at fea60000 [disabled] [size=128K]
>         Capabilities: [5c] Power Management version 2
Comment 7 GV Govindasamy 2005-07-20 13:06:12 EDT
I am back from vacation and ready to help fix this bug as soon as possible.
This bug is really impacting us, as we can't get consistent test runs due to
random and intermittent Linux client panics. Increasing priority to meet our
business requirements.
Comment 8 GV Govindasamy 2005-07-21 19:56:04 EDT
Created attachment 117044
Another console screen

Another console screen of same panic with different stack trace
Comment 9 GV Govindasamy 2005-07-21 20:31:24 EDT
As another note, I am seeing a lot of messages like the following on the console:

kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

and a few like this:

kernel: nfs_proc_symlink: TEST/SYMLINK_EEXIST_link already exists??

and, just before this panic:

kernel: RPC: error 5 connecting to server XXX
Comment 10 Tim Burke 2005-07-21 21:23:52 EDT
Chuck suggests that comment #9 above does not actually pertain to this bug.
Comment 14 David Woodhouse 2005-07-27 08:34:53 EDT
We are hitting the BUG_ON() at line 420 of kernel/timer.c, which happens when
timers are being moved from one list to another and one of the timers doesn't
actually claim to be on the list it was found on.
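
The check described above can be pictured with a small model. This is only a Python sketch, not the code from kernel/timer.c: the names `TimerBase`, `Timer`, `add_timer`, `cascade` and `TimerListCorruption` are made up for illustration, and it assumes the check compares the owner recorded in each timer against the owner of the list being walked.

```python
class TimerListCorruption(Exception):
    """Raised where the real kernel would hit BUG_ON() (illustrative only)."""


class TimerBase:
    """Hypothetical per-CPU timer base holding one list of pending timers."""

    def __init__(self, name):
        self.name = name
        self.pending = []


class Timer:
    """Hypothetical timer that records which base it believes it is queued on."""

    def __init__(self, label):
        self.label = label
        self.base = None


def add_timer(base, timer):
    """Queue a timer and record its owner so both views agree."""
    timer.base = base
    base.pending.append(timer)


def cascade(base):
    """Move every pending timer off the list, checking ownership on the way."""
    moved = []
    for timer in base.pending:
        # Model of the consistency check: the timer must claim this base.
        if timer.base is not base:
            owner = timer.base.name if timer.base else "no base"
            raise TimerListCorruption(
                f"{timer.label} found on {base.name} but claims {owner}"
            )
        moved.append(timer)
    base.pending = []
    return moved
```

Calling `cascade()` on a base whose list holds a timer with a different `base` raises the exception, which corresponds to the situation the BUG_ON() reports. The model says nothing about how a timer gets into that state.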

At http://people.redhat.com/dwmw2/.bz161617/ there is a kernel (i686 SMP) with
some extra debugging checks which should shed some light on the cause. Please
could you run with this and report the output? You may need to either use a
serial console or change the console font so that more text fits on screen,
because the extra information will be before the stack trace.

Alternatively, a netdump would be useful.
Comment 15 David Woodhouse 2005-07-27 08:36:18 EDT
Created attachment 117187
Debugging patch, for reference.
Comment 16 Chuck Lever 2005-07-27 08:44:35 EDT
we're hitting two other bugs pretty hard as well.  i have a custom-built
2.6.9-11.EL kernel with two extra patches to fix the bugs.  would you mind if i
took your debugging patch and applied it to this kernel for use in our testing?
Comment 17 David Woodhouse 2005-07-27 08:49:51 EDT
That'll be fine, as long as you capture the output -- the two screen shots
attached to this bug would have missed it, just as they did the "kernel BUG at
kernel/timer.c line 420" output which had scrolled off the top of the screen.
Comment 18 Chuck Lever 2005-08-08 11:05:04 EDT
running a 2.6.9-11.EL kernel with patches for 164298 and 163738.  we have not
seen any more panics.  it is likely that 164298 is the same as this bug, and
that the patch attached to that report addresses this panic as well.

Comment 19 David Woodhouse 2005-08-08 11:29:16 EDT
Yeah, it's possible that this was just another symptom of bug #164298. How
reproducible was it before? Is the fact that you haven't reproduced it since
then actually meaningful?

If you're running custom kernels anyway, perhaps it makes sense to keep the
debugging patch in it just in case? It would be unamusing to find that you can
actually reproduce it after all, when you've already taken out the patch.
Comment 20 Chuck Lever 2005-08-08 11:46:30 EDT
as far as i know Gv was hitting the SMP panic at least several times a week, and
we haven't seen this panic since we started running the kernel with the fix for
164298.
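
Whether a quiet period is meaningful depends on how often the panic used to occur and how long the fixed kernel has been running. Below is a rough sketch, assuming panics arrive independently at a constant average rate (a Poisson model, which is an assumption and not something established in this report). The rates and durations are illustrative only, since the thread says "several times a week" and does not state the length of the test period.

```python
import math


def prob_no_panic(rate_per_week, days_observed):
    """Chance of seeing zero panics if the bug were still present.

    Under a Poisson model, P(0 events) = exp(-rate * time).
    """
    weeks = days_observed / 7.0
    return math.exp(-rate_per_week * weeks)


# Illustrative inputs, not measurements from this bug report.
for rate in (1.0, 3.0):
    for days in (7, 14):
        p = prob_no_panic(rate, days)
        print(f"{rate:.0f}/week, {days} quiet days: P(no panic) = {p:.4f}")
```

Under these assumptions, at 3 panics a week a quiet fortnight would happen by chance about 0.25% of the time, while at 1 panic a week a single quiet week would happen by chance about 37% of the time.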

if you think it is reasonable to include the debugging patch in our hotfix, then
let's go ahead and do it.  no harm at all in that.  (IT 76661).
Comment 22 David Woodhouse 2005-09-27 09:10:33 EDT
Closing as duplicate of bug #164298 according to info in comment #20.

*** This bug has been marked as a duplicate of 164298 ***
Comment 26 Red Hat Bugzilla 2006-03-07 14:12:27 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
