Bug 166453 - swsusp doesn't wait until raid devices are stable
Summary: swsusp doesn't wait until raid devices are stable
Alias: None
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: rawhide
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Peter Jones
QA Contact: David Lawrence
Reported: 2005-08-21 20:22 UTC by Alexandre Oliva
Modified: 2007-11-30 22:11 UTC

Doc Type: Bug Fix
Last Closed: 2006-02-27 16:54:16 UTC


Description Alexandre Oliva 2005-08-21 20:22:46 UTC
Booting up after swsusp on a system with root on LVM on raid 1 will almost
always claim that the raid 1 device holding the root filesystem needs
resyncing.  It starts the resync, but the resume stops it and never restarts
it.  It's not clear whether that's because the saved state contains
information about the incomplete I/O and proceeds to complete it, or whether
it's entirely lost.  In any case, raid devices ought to be stable at the time
of the swsusp; otherwise the saved state in a raided swap device (yeah, bad
idea, but still...) may not even be readable correctly (consider degraded
raid 5, for one).  swsusp should place all raid devices in immediate-safe
mode, like shutdown does shortly before powering off.

Comment 1 Dave Jones 2005-08-26 05:10:01 UTC
We can't do that: how would we reactivate them when we resume?
We suspend with dirty state; we just need to handle that better on resume.

The only thing we can do here is detect at boot that we have a suspend
partition *before* we activate raid sets, and do the resume before we even
load the raid modules.

This is probably going to be a really messy problem to fix up.
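A sketch of what that detection might look like.  The assumption here is that
swsusp replaces the swap-header signature (normally "SWAPSPACE2", stored in
the last 10 bytes of the device's first page) with "S1SUSPEND" while an image
is present, so an initrd could test for that before running raidstart.  The
4 KiB page size is also an assumption; a real implementation would handle
other page sizes and older swap-header versions.

```python
PAGE_SIZE = 4096             # assumed; the real page size is arch-dependent
SIG_OFFSET = PAGE_SIZE - 10  # swap signature lives in the last 10 bytes of page 0
SWSUSP_SIG = b"S1SUSPEND"    # magic swsusp writes in place of SWAPSPACE2

def has_suspend_image(device_path: str) -> bool:
    """Roughly check whether a swap device holds a swsusp image.

    Sketch only: reads the swap-header signature field and compares
    it against the swsusp magic.
    """
    with open(device_path, "rb") as dev:
        dev.seek(SIG_OFFSET)
        sig = dev.read(len(SWSUSP_SIG))
    return sig == SWSUSP_SIG
```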

Comment 2 Alexandre Oliva 2005-08-26 13:53:50 UTC
We have to start raid before LVM, since PVs may be on raid, and swap devices
may be on LVM or on raid.

Couldn't raid devices move to immediate-safe mode at suspend time, and back to
normal operation mode on resume?  That sounds like the right thing to do to
me.  Suspending with dirty state means you can't rely on any data in the raid
devices, meaning you could even be loading broken kernel or initrd images (if
they're on raid), never mind the suspend image itself.  Flushing the raid
devices to make them stable is just as important as flushing the hard disk
caches, IMHO.

Comment 3 Dave Jones 2005-08-26 16:35:47 UTC
If the resume partition is on raid, you're hosed.  That's just not a
configuration we can support.  It's a chicken-and-egg problem.  We *can not*
touch raid partitions that were active at the time of suspend.

They can't "go back to normal operation mode on resume".  We don't run any
magic 'fix things up' code when we resume; we load the state of the memory,
restore the registers, and jmp back to where the EIP was before we suspended.

You *can* rely on that data on resume, because it's in exactly the same state
it was in before you suspended (as long as you haven't raidstart'd it before
you do the resume).

This isn't a flushing issue, it's a state issue.  Resuming from swap restores
the exact state things were in before we suspended.

Comment 4 Alexandre Oliva 2005-08-26 22:51:04 UTC
How about switching to immediate-safe mode after saving the state, before
shutting the system down?

Hmm...  This could still corrupt raid 5, ugh.

It has to be even more complex than that, but I think it can be done.  Just as
we put disks to sleep after we stop all tasks and before we take the memory
snapshot, we could switch raid to immediate-safe mode, flush everything,
switch back to normal mode (which won't cause I/O, since all tasks are
stopped), then snapshot the memory and save it to disk, then switch back to
immediate-safe mode before shutdown.  This mirrors exactly what happens to the
IDE disk subsystem, which shuts disks down, brings them back up to save state,
then shuts them down again before the host powers off.

If we do this, the snapshot is taken with raid in regular operation mode but
stable, the system state saved on raid will be stable as well, and nothing
special has to happen when the system is brought back up.

Of course, I'm speaking as someone not sufficiently familiar with the
implementation of the raid subsystem to tell whether what I propose is easily
doable, but I think it's probably the most reasonable path to take, since it
mirrors the working behavior we already have for block devices in disk
partitions.
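The ordering proposed above can be sketched as pseudo-steps.  Every name below
is a hypothetical placeholder, not a real kernel API; the point is only the
ordering, which keeps raid stable around the snapshot so the saved image and
the on-disk state agree.

```python
def suspend_with_stable_raid(md_devices, log):
    """Sketch of the proposed suspend ordering.

    Appends each step to `log` in the order it would run, so the
    sequence can be inspected; a real implementation would be
    kernel-side hooks, not Python.
    """
    log.append("freeze tasks")
    for md in md_devices:
        log.append(f"{md}: immediate-safe mode, flush")
    for md in md_devices:
        log.append(f"{md}: back to normal mode")  # no new I/O: tasks are frozen
    log.append("snapshot memory")
    log.append("write image to swap")
    for md in md_devices:
        log.append(f"{md}: immediate-safe mode")  # stable again before power-off
    log.append("power off")
    return log
```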

Comment 5 Dave Jones 2005-08-26 23:55:56 UTC
I've run out of ways to say "we don't run any magic 'fix things up' code when
we resume".

Cleaning the state before we suspend won't do any good at all.  We'd then
resume with clean state and an active raidset.  How well is that going to
work?

Comment 6 Alexandre Oliva 2005-08-27 02:05:21 UTC
Can you explain which part of my last posting gave you the impression that we'd
have to fix anything up after resume?  I must not have been clear, because I
explicitly addressed this point.

As for resuming with the active raidset, it would be active but without any
pending I/O, so it should hopefully be fine, no?
