Bug 166357
| Summary: | kernel places disks to sleep on swsusp, then fails to write pages to swap on lvm on raid1 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexandre Oliva <oliva> |
| Component: | kernel | Assignee: | Dave Jones <davej> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5 | CC: | agk, katzj, ncunning, pfrields, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | NeedsRetesting | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-12-10 22:54:01 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Alexandre Oliva
2005-08-19 18:57:54 UTC
I suspect putting the disks to sleep might be the right thing to do, such that the saved state has them turned off as expected (?), but I've narrowed the problem down to saving to raid. If I switch to a raw disk device, the box suspends and resumes fine most of the time. It fails if I attempt to suspend in the middle of a raid resync, though. In fact, it fails in such a way that it comes back to life after the attempted suspend, but with the raid subsystem completely hosed, such that any further attempts to access /proc/mdstat or any filesystems mounted from the raid devices will hang. In particular, further attempts to suspend will hang as well. This is what happened in the 'when I insisted' case above.

There's still an oddity in coming back up after a suspend to a raw partition in 1.1502_FC5: when the box boots up, it says one of the raid devices (not holding swap partitions) needs a resync. When stopping all tasks for resuming, it stops and checkpoints the resync, but when the resume completes, there's no ongoing resync. I imagine resuming completes whatever I/O was ongoing in the raid 1 devices that left one of the devices in need of a resync, but I'd feel much safer if swsusp actually got all raid members stable before completing the suspend, so that it wouldn't depend on resuming to avoid a complete resync.

Hmm, if the disks need a resync, we should probably not resume until that has completed. Starting a background rebuild, and then resuming (to a system that won't be rebuilding) sounds terrifying. Jeremy, any ideas?

Erhm... My theory is the system didn't *really* need a resync; it just had some pending raid I/O that the raid subsystem would complete to bring the array back in sync right after resume. I may be totally off, though; if the system doesn't actually complete the I/O after resuming, that's big trouble, but ideally it shouldn't leave any such pending I/O, such that the resync doesn't start and stop before the resume.

I just filed a separate bug for the raid-needs-resync problem, bug 166453. Let's leave this one for saving swap on lvm on raid alone, although the problems might actually be related.

A new kernel update has been released (version 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in Bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

Closing as per previous comment.
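The safeguard discussed above, not suspending while an array still needs a resync, can be approximated from userspace by checking /proc/mdstat before initiating a suspend. The sketch below is purely illustrative and was never part of the kernels in this bug; the script and the idea of wiring it in as a pre-suspend hook are assumptions for the example. It relies only on the fact that md's /proc/mdstat progress line contains words like "resync" or "recovery" while an array is rebuilding.

```python
#!/usr/bin/env python3
"""Hypothetical pre-suspend guard: refuse to suspend during an md resync.

A minimal sketch of the check discussed in the comments above; it is not
part of any Fedora kernel or package.
"""
import sys

# Keywords md prints on the /proc/mdstat progress line while an array
# is being resynced, recovered, reshaped, or checked.
ACTIVITY = ("resync", "recovery", "reshape", "check")


def md_busy(mdstat="/proc/mdstat"):
    try:
        with open(mdstat) as f:
            status = f.read()
    except FileNotFoundError:
        return False  # md not in use on this system
    return any(word in status for word in ACTIVITY)


if __name__ == "__main__":
    if md_busy():
        sys.exit("md resync/recovery in progress; refusing to suspend")
    print("no md activity reported; proceeding with suspend")
```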
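As for the duplicate-package check requested in the closing notice, `rpm -q` prints one line per installed copy of a package, so two or more lines for device-mapper or lvm2 indicate the situation described in bug 207474. A small illustrative wrapper follows; the rpm invocation is standard, but scripting it this way is just an assumption for the example.

```python
#!/usr/bin/env python3
"""Illustrative duplicate-package check for the FC4->FC5 note (bug 207474)."""
import subprocess

for pkg in ("device-mapper", "lvm2"):
    # `rpm -q PKG` prints one name-version-release line per installed copy.
    result = subprocess.run(["rpm", "-q", pkg], capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0:
        print(f"{pkg}: not installed")
    elif len(lines) > 1:
        print(f"{pkg}: multiple versions installed: {', '.join(lines)}")
    else:
        print(f"{pkg}: ok ({lines[0]})")
```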