Bug 113133
Summary: | Kernel interrupt problems removing ethernet cable | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | David Yerger <davidy> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 9 | CC: | riel |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | athlon | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2004-09-30 15:41:46 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
David Yerger
2004-01-08 19:16:36 UTC
Same kind of problem with the kernel released today (2.4.20-30.9). I'm pasting in sections from /var/log/messages, since I thought the timestamps might be useful.

Pulling the Ethernet cable and plugging it back in was OK in runlevel 1. In runlevel 3, I got:

    Feb 18 19:34:56 cache kernel: scsi0: PCI error Interrupt at seqaddr = 0x8
    Feb 18 19:34:56 cache kernel: scsi0: Data Parity Error Detected during address or write data phase
    Feb 18 19:34:56 cache kernel: scsi0: PCI error Interrupt at seqaddr = 0x9
    Feb 18 19:34:56 cache kernel: scsi0: Data Parity Error Detected during address or write data phase
    Feb 18 19:37:26 cache ntpd[1715]: kernel time discipline status change 41
    Feb 18 19:38:30 cache ntpd[1715]: kernel time discipline status change 1
    Feb 18 19:40:00 cache CROND[2114]: (root) CMD (/usr/lib/sa/sa1 1 1)

Then I shut down my database, and as it was releasing locks I got:

    Feb 18 19:40:18 cache kernel: swap_free: Bad swap file entry 00010052
    Feb 18 19:40:18 cache kernel: swap_free: Unused swap offset entry 00c10000
    Feb 18 19:40:18 cache kernel: swap_free: Unused swap offset entry 00010000
    Feb 18 19:40:18 cache last message repeated 3 times

Then, after the database (InterSystems' Caché 5.0.4) was all the way down, an Oops:

    Feb 18 19:40:38 cache kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
    Feb 18 19:40:38 cache kernel: printing eip:
    Feb 18 19:40:38 cache kernel: c0142000
    Feb 18 19:40:38 cache kernel: *pde = 00000000
    Feb 18 19:40:38 cache kernel: Oops: 0000
    Feb 18 19:40:38 cache kernel: i2c-isa it87 i2c-proc i2c-core tg3 reiserfs ext3 jbd raid1 3w-xxxx aic7xxx sd_mod scsi_mod
    Feb 18 19:40:38 cache kernel: CPU: 0
    Feb 18 19:40:38 cache kernel: EIP: 0060:[<c0142000>] Not tainted
    Feb 18 19:40:38 cache kernel: EFLAGS: 00010202
    Feb 18 19:40:38 cache kernel:
    Feb 18 19:40:38 cache kernel: EIP is at page_referenced [kernel] 0x210 (2.4.20-30.9)
    Feb 18 19:40:38 cache kernel: eax: c1825ea8 ebx: 0000001c ecx: 00000000 edx: 00000001
    Feb 18 19:40:38 cache kernel: esi: 0000000e edi: ee5d7840 ebp: 00000000 esp: c46cdf84
    Feb 18 19:40:38 cache kernel: ds: 0068 es: 0068 ss: 0068
    Feb 18 19:40:38 cache kernel: Process kscand/HighMem (pid: 8, stackpage=c46cd000)
    Feb 18 19:40:38 cache kernel: Stack: c46cdfa0 00000000 00000000 c46cdfb4 c1f01ac0 c1f01ac0 c030a0f4 c1f01b14
    Feb 18 19:40:38 cache kernel: 00000000 c013a984 c46cc000 c01254e0 00000001 00000000 c46cc000 c0309f80
    Feb 18 19:40:38 cache kernel: c46cc000 c013bb34 c0309f80 00000000 00000000 c025b760 000009c4 c013ba40
    Feb 18 19:40:38 cache kernel: Call Trace: [<c013a984>] scan_active_list [kernel] 0x34 (0xc46cdfa8))
    Feb 18 19:40:38 cache kernel: [<c01254e0>] process_timeout [kernel] 0x0 (0xc46cdfb0))
    Feb 18 19:40:38 cache kernel: [<c013bb34>] kscand [kernel] 0xf4 (0xc46cdfc8))
    Feb 18 19:40:39 cache kernel: [<c013ba40>] kscand [kernel] 0x0 (0xc46cdfe0))
    Feb 18 19:40:39 cache kernel: [<c010727d>] kernel_thread_helper [kernel] 0x5 (0xc46cdff0))
    Feb 18 19:40:39 cache kernel:
    Feb 18 19:40:39 cache kernel:
    Feb 18 19:40:39 cache kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e

Then, while I was writing the above (not realizing it was being written to disk), I got:

    Feb 18 20:03:53 cache kernel: <6>NETDEV WATCHDOG: eth0: transmit timed out
    Feb 18 20:03:53 cache kernel: tg3: eth0: transmit timed out, resetting
    Feb 18 20:03:53 cache kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
    Feb 18 20:03:53 cache kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
    Feb 18 20:03:53 cache kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
    Feb 18 20:03:53 cache kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

I tried the same thing with 2.4.20-20.9-XFS: no problem.

I should probably mention that, while running 2.4.20-20.9-XFS, I had done:

    # up2date --force kernel kernel-source
    # reboot

The system hung on "Turning Off Swap", so after rebooting (under 2.4.20-30.9) I did:

    # swapoff -a
    # mkswap /dev/md3
    # swapon -a

I also tried vanilla 2.4.25, and it also seems OK.
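The faulting address in the Oops can be cross-checked against the register dump. What follows is a minimal sketch, assuming (as in 2.4-era i386 Oops output) that the `Code:` bytes begin at EIP. Under that assumption the first three bytes, `8b 41 74`, decode as `mov 0x74(%ecx),%eax`, and because ecx is 00000000 the load reads address 0x74, which is the "virtual address 00000074" in the first line of the Oops. The helper `decode_mov_disp8` is hypothetical and handles only this one instruction form (opcode 0x8B with a register-plus-8-bit-displacement ModRM byte); it does not tell you which structure field sits at offset 0x74.

```python
# Hypothetical helper, not part of any kernel tool: decode the first
# instruction of the Oops "Code:" line and recompute the faulting address.
# Only the form seen here is handled: opcode 0x8B (MOV r32, r/m32) with a
# ModRM byte whose mod field is 01, i.e. [base register + 8-bit displacement].

# i386 32-bit register numbering used by the ModRM reg and r/m fields.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Return (dest_reg, base_reg, disp8) for an '8B /r' instruction with mod=01."""
    if code[0] != 0x8B:
        raise ValueError("not MOV r32, r/m32 (opcode 0x8B)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:  # rm=100 would mean a SIB byte follows
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    disp8 = code[2] - 0x100 if code[2] >= 0x80 else code[2]  # sign-extend
    return REGS[reg], REGS[rm], disp8


# Values copied from the Oops above.
code = bytes.fromhex("8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e")
regs = {
    "eax": 0xC1825EA8, "ebx": 0x0000001C, "ecx": 0x00000000, "edx": 0x00000001,
    "esi": 0x0000000E, "edi": 0xEE5D7840, "ebp": 0x00000000, "esp": 0xC46CDF84,
}

dest, base, disp = decode_mov_disp8(code)
fault_addr = (regs[base] + disp) & 0xFFFFFFFF  # 32-bit address wrap-around
print(f"mov {disp:#x}(%{base}),%{dest}  ->  address {fault_addr:#010x}")
```

Running it prints `mov 0x74(%ecx),%eax  ->  address 0x00000074`, i.e. a field access through a NULL pointer held in ecx.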
Question: I noticed the following in /usr/src/linux-2.4.25/drivers/scsi/aic7xxx/README-aic7xxx:

    Option: pci_parity
    Definition: Toggles the detection of PCI parity errors. On many motherboards with VIA chipsets, PCI parity is not generated correctly on the PCI bus. It is impossible for the hardware to differentiate between these "spurious" parity errors and real parity errors. The symptom of this problem is a stream of the message: "scsi0: Data Parity Error Detected during address or write data phase" output by the driver.
    Possible Values: This option is a toggle
    Default Value: PCI Parity Error reporting is disabled

Is this enabled by default in recent Red Hat kernels? I *do* have a VIA chipset (KT400). Could this lead to the Oops?

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/