Bug 208488
Summary: ext3: oops in ext3_clear_inode
Product: Fedora
Component: kernel
Version: 6
Hardware: i386
OS: Linux
Status: CLOSED CANTFIX
Severity: high
Priority: medium
Reporter: Jeremy Fitzhardinge <jeremy>
Assignee: Eric Sandeen <esandeen>
QA Contact: Brian Brock <bbrock>
CC: davej, hanwen, ncunning, wtogami
Doc Type: Bug Fix
Last Closed: 2006-12-14 14:18:45 UTC
Description
Jeremy Fitzhardinge
2006-09-28 20:32:22 UTC
Created attachment 137343
Oops output
Also, I forced a complete check of the filesystems on reboot and there was no damage, so this looks like a purely in-core problem.

I don't suppose by any chance it's reproducible without the madwifi stuff loaded? I noticed you posted this upstream too, so let's see if anything useful comes out of that. Our ext3 doesn't differ much from 2.6.18's right now. The biggest changes are the inode-diet patches from -mm (now also in Linus' tree for .19) and some work to make ext3/jbd safe for 16TB volumes. I don't think either is a likely candidate for this bug, but maybe Eric will spot something.

Hm, OK, this is getting interesting. Bug #207658 (which I closed CANTFIX due to the tainted kernel...) looks almost exactly the same:

Sep 22 12:22:35 localhost kernel: BUG: unable to handle kernel paging request at virtual address 756e6547   <--- look familiar?
Sep 22 12:22:35 localhost kernel: EIP is at ext3_clear_inode+0x52/0x8b [ext3]

But... it also had modules forced in, 3rd-party Intel wireless stuff. Curious... I wonder if they share the same ieee80211 code? Hm, perhaps not. Well, at least now we know "Genu" probably didn't come from the kernel code you were compiling, but from somewhere else... The other bug noted a suspend within the past hour; anything like that in your case?

Yes. I had resumed from a suspend-to-RAM not long before the oops. The machine had been up for a while and had gone through a number of suspend/resume cycles.

Hm, just for fun: Google turns up one other person who has tried to use that "memory address":
http://lists.pld-linux.org/mailman/pipermail/pld-installer/2002-January.txt
and someone else, with proprietary modules, who had that address on their stack:
https://www.redhat.com/archives/fedora-test-list/2003-October/msg00979.html
but those are old.
There's only one "Genu*" string in the i386 kernel, in intel.c:

static struct cpu_dev intel_cpu_dev __cpuinitdata = {
        .c_vendor       = "Intel",
        .c_ident        = { "GenuineIntel" },

There are none in the ath_pci driver. The cpuid instruction puts that value into %ebx when run with %eax==0, but in both this bug and 207658 it's in %edx. The nvidia one has clearly just executed cpuid, and the crash is deep inside the nvidia driver, so that's pretty clearly not it. And the Polish one omits so much detail that it's hard to tell whether it's comparable. The other crash (ThinkPad) is with:

[hanwen@haring root]$ ls -l /root/wireless/
totaal 200
-rw-r--r-- 1 root root 68832 aug 27 11:30 ieee80211-1.2.15.tgz
-rw-r--r-- 1 root root 57929 aug 27 11:28 ipw3945d-1.7.18.tgz
-rw-r--r-- 1 root root 61175 aug 27 11:28 ipw3945-ucode-1.13.tgz

These sources (and the .o files) don't contain the string "Genu", though.

From the disassembly, it looks like we died here in ext3_clear_inode():

0000af09 <ext3_clear_inode>:
...
    af5b:       f0 ff 0a                lock decl (%edx)   <--- + 0x52

which should correspond to the second posix_acl_release() call, I think. Now I've got to run, but hey, you found an interesting one :) Good eyes on the "Genu" thing; that'll be a good hint.

*** Bug 207658 has been marked as a duplicate of this bug. ***

If either of you can reproduce this, obtaining a dump in some manner might be helpful.

I just got a repro: same backtrace, different address. Again with madwifi loaded, unfortunately. Kernel: kernel-2.6.18-1.2849.fc6

Created attachment 141829
Second oops with the same backtrace
BTW, the system was under some disk load at the time: I was running a Mercurial "hg status" on a kernel source tree while browsing in Firefox. The oops happened, and the machine locked up shortly afterwards, forcing a reboot. Fortunately the oops got saved to syslog.

Since this has only ever been reproduced (three times now) with tainted kernels, I'm going to have to close it CANTFIX. If you ever get an oops with a clean kernel, please reopen with as many details as possible; a kernel dump would be great.

Thanks,
-Eric