Bug 108948
Summary: File system complaints ultimately trigger networking BUG
Product: [Retired] Red Hat Linux
Component: kernel
Version: 7.2
Hardware: i586
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Reporter: Richard Schaal <rschaal_95135>
Assignee: Stephen Tweedie <sct>
Doc Type: Bug Fix
Last Closed: 2004-09-10 14:12:00 UTC
Description
Richard Schaal
2003-11-03 16:54:55 UTC
Created attachment 95680
Dmesg output showing file system complaints and BUG output
This is the data I was able to collect from the system.
That's a corruption of something. Without more data, it's impossible to say what: whether it's bad memory, a bad disk/controller/cable, or a kernel fault in a driver or filesystem or in the VM core.

The BUG() looks like it was triggered by previous failed IO. The kernel should respond more gracefully to that, but this doesn't tell us what caused the corruption in the first place. We need to know if it is reproducible, and how to reproduce it.

BUG() appears to be in ext2_get_branch() -> sb_bread(). We've got to bh = sb_bread(sb, le32_to_cpu(p->key)); which looks up an indirect block, but gets one which is both not uptodate and not mapped. bread() gets the BUG when it tries to read in the buffer. So a previous IO error has left behind an unmapped buffer, but one that _is_ hashed. Odd. Sounds like one for Al to check out.

We're looking at it, but the kernel log is seriously truncated and it's hard for us to identify what the *first* thing that went wrong is. We may find a route to the BUG(), but without more information there's almost no chance that we'll be able to diagnose whether the initial problem here was due to hardware or software.

Do you want me to set up a serial console and catch all the messages? Most likely, we could catch the first complaint... Will setting debug on the kernel boot help?

Were you able to capture any more here?

These problems still look more like hardware than anything else, and they don't match any other footprint I've been looking at. Please reopen if you can still reproduce against a current kernel.

I've updated the system to 2.4.27 now, and have not seen the issue in 21 days of uptime. I hadn't gotten a response on my queries in comment#5, so haven't set up the serial console recording.

debug on the kernel boot won't help here. A full debug kernel build might (enable things like slab poisoning in .config).
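
For readers following the ext2_get_branch() analysis above, here is a minimal userspace sketch of the buffer state it describes: a buffer that is still hashed (so a cache lookup finds it) but is neither uptodate nor mapped, so there is nothing valid to return and no block address to read from. This is a toy model, not kernel code. All names (demo_buffer, demo_bread, demo_fail_io, the DEMO_* flags) are hypothetical, and the idea that a failed read clears both bits is an assumption taken from the comment's guess, not something confirmed in this report.

```c
/* Hypothetical state bits for a toy buffer. These are illustrative
 * names only, not the kernel's own buffer_head flag definitions. */
enum demo_state {
    DEMO_UPTODATE = 1u << 0,  /* contents are valid */
    DEMO_MAPPED   = 1u << 1,  /* buffer has an on-disk block address */
    DEMO_HASHED   = 1u << 2   /* buffer can be found by a cache lookup */
};

enum demo_result {
    DEMO_CACHE_HIT,    /* data already valid, no I/O needed */
    DEMO_SUBMIT_READ,  /* stale but mapped: a read can be issued */
    DEMO_WOULD_BUG     /* neither valid nor mapped: nowhere to read from */
};

struct demo_buffer {
    unsigned int state;
};

/* Toy stand-in for the read path: decide what to do with a buffer
 * that the cache lookup has just returned. */
enum demo_result demo_bread(const struct demo_buffer *bh)
{
    if (bh->state & DEMO_UPTODATE)
        return DEMO_CACHE_HIT;
    if (bh->state & DEMO_MAPPED)
        return DEMO_SUBMIT_READ;
    return DEMO_WOULD_BUG;
}

/* Assumed effect of an earlier failed read: the buffer loses its
 * uptodate and mapped bits but is left in the hash. */
void demo_fail_io(struct demo_buffer *bh)
{
    bh->state &= ~(unsigned int)(DEMO_UPTODATE | DEMO_MAPPED);
}
```

Starting from a buffer with DEMO_HASHED | DEMO_MAPPED set, demo_bread() reports DEMO_SUBMIT_READ; after demo_fail_io() the same buffer is still hashed but demo_bread() reports DEMO_WOULD_BUG, which is the "odd" combination the comment points out.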