Bug 78312
Summary: | Kernel hang (possibly from disk or network load) | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Need Real Name <mjeffery> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.3 | CC: | sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2003-03-01 03:05:40 UTC | Type: | --- |
Attachments: |
Description
Need Real Name
2002-11-21 03:33:33 UTC
Created attachment 85824 [details]
Syslog output containing sysreq-T/P from 1st hang (duplicated from bug 77508)
Created attachment 85825 [details]
Serial console log of sysreq-T/P from 2nd hang
Created attachment 85826 [details]
Diff between sysreq-T/P attempts for 2nd hang
In the second case, there is nothing suspicious at all in the logs. There are just a few processes waiting for disk IO. There are absolutely no deadlocked processes. So, all we know is that a disk IO went missing --- we scheduled it to disk, but it never completed, and one by one a bunch of kernel processes started waiting uninterruptibly for that lost IO.

In conjunction with the first hang's trace, which indicates being locked in the MegaServ process, this looks even more like a driver-level problem --- either a driver bug, a firmware problem or a hardware problem causing lost IOs. Are there any other IO messages in the logs?

There are absolutely no other messages that seem connected to the crash --- the only things close in time are a few securelog messages recording my use of ssh and su. I see no evidence that a single sector was written to the disk once it hung (for the 2nd hang), and the last thing logged to the console before the sysreqs is the boot messages. I've upped sys.printk to 15. Is there anything else I can do to get more debugging info out of this system?

Upgrading to 2.4.18-19.7.xbigmem fixed this. We've had over 45 days of uptime and no freezes since.
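
For anyone chasing a similar lost-IO hang: the analysis above rests on spotting processes stuck in uninterruptible sleep (state `D`). On a machine that is still partly responsive, the same information can be read from `/proc/<pid>/stat` without a sysreq-T dump. The following is only a minimal sketch and was not part of this bug's investigation; the function names `parse_stat_line` and `uninterruptible_tasks` are hypothetical, and it assumes the usual `pid (comm) state ...` layout of the stat file.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so take the first '(' and the last ')' as its boundaries.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `uninterruptible_tasks()` a few seconds apart and comparing the results gives a rough view of whether the set of blocked processes is growing, which is the pattern described for the 2nd hang. It cannot help once the machine is fully frozen; in that case the serial console and sysreq output remain the only source.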