Bug 397521
| Field | Value |
|---|---|
| Summary | Disk goes offline and refuses to come back |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 8 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | low |
| Reporter | Daniel C Weeks <daniel.c.weeks> |
| Assignee | Ingo Molnar <mingo> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | alan, chris.brown |
| Fixed In Version | 2.6.24.3-12 |
| Doc Type | Bug Fix |
| Last Closed | 2008-03-10 05:29:48 UTC |
Description
Daniel C Weeks
2007-11-24 00:42:58 UTC
Created attachment 267841 [details]
dmesg and lspci -vvv output
This appears to be the result of the tickless timer that was recently added to the kernel. This is fixed with the kernel options: clocksource=acpi_pm nohz=off highres=off

I assume this is still an issue and that this is just a workaround, but I'm not sure how to proceed with this report.

Created attachment 274401 [details]
Kernel messages for Nov 30 with failures.
I guess I jumped the gun on that diagnosis. I'm still experiencing the failure of the IDE devices, but more sporadically. For a short period after I boot, the devices work fine, but errors about failed reads slowly start to appear:
FAT: Directory bread(block 1404516) failed
With the changed clocksource, the system doesn't freeze (or hasn't yet).

If you just use "nohz=off", is that sufficient, or do you need both that and acpi_pm selected?

I spent a few days testing different combinations of "nohz=off" and "clocksource=acpi_pm", and the following is what I observed:
"nohz=off" alone: The system boots and I am able to use the IDE devices, but after some period of time (usually about 30 minutes to an hour) device errors appear in the log and the devices become unusable.
"clocksource=acpi_pm" alone: The system boots, and within a few minutes device errors occur and the devices become unusable.
"nohz=off" and "clocksource=acpi_pm" together: This seems to be the most stable. I've been able to run for many hours without device errors, but they do eventually show up.
These are just my personal observations, as the amount of time it takes before the errors appear seems to vary significantly. During this time I noticed something else that makes me think this is more than just an issue with the IDE devices: my external USB drive also fails with these errors at the same time the DVD and IDE hard drive fail.

Any ideas, Ingo?

I recently decided to install the debug kernel to see if I could get any more information about this problem, and I ran into something interesting. If I run with the debug kernel, I don't have any errors at all and the devices function normally. However, if I boot the standard kernel, the problems occur regularly. Is there any significant difference between the standard kernel and the debug kernel, other than optimization and debugging information?
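
When comparing boot-option combinations like the ones tested above, it can help to confirm which options the running kernel was actually booted with and which clocksource is in use. The following is a minimal sketch, not something taken from this report: the helper names (`parse_cmdline`, `read_text`) are hypothetical, and it assumes the standard `/proc/cmdline` file and the sysfs `current_clocksource` file are readable on the system.

```python
from pathlib import Path

# Timer-related boot options discussed in this report.
WORKAROUND_OPTIONS = ("nohz", "clocksource", "highres")

# Assumed standard locations; either may be missing on some systems.
CMDLINE_PATH = "/proc/cmdline"
CLOCKSOURCE_PATH = (
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def parse_cmdline(cmdline: str) -> dict:
    """Return the timer-related options found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in WORKAROUND_OPTIONS:
            # Options given without "=value" are recorded as None.
            found[key] = value if sep else None
    return found


def read_text(path: str):
    """Read a small procfs/sysfs file, or return None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text(CMDLINE_PATH) or ""
    print("timer-related boot options:", parse_cmdline(cmdline))
    print("clocksource in use:", read_text(CLOCKSOURCE_PATH))
```

Recording this output alongside the time of the first device error makes it easier to tell which combination was really active for a given test run.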
The following bug reports are potentially related:
https://bugzilla.redhat.com/show_bug.cgi?id=411001
https://bugzilla.redhat.com/show_bug.cgi?id=397191
https://bugzilla.redhat.com/show_bug.cgi?id=250349

Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.
http://fedoraproject.org/wiki/KernelBugTriage
I am CC'ing myself to this bug and will try to assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists, please close this bug; otherwise I'll do so in a few weeks if no additional information is lodged.

I recently downloaded the 2.6.24 vanilla kernel and this problem appears to be fixed. I don't know what changed, but I've been running for over a week now without any errors.

Hi Daniel, okay, thanks for testing. There should be a 2.6.24 kernel in the Fedora repositories sometime in the next week or so (check updates-testing), so if you could test with that it would be greatly appreciated. Then, if everything's okay, we can close out this bug.

Closing. Fixed in kernel 2.6.24.3-12.