Bug 730811 - hibernation often fails to resume and forces fsck
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: John Feeney
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 637248
Reported: 2011-08-15 19:58 UTC by Matthew Mosesohn
Modified: 2013-01-10 13:06 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-11-01 14:42:38 UTC
Target Upstream Version:


Description Matthew Mosesohn 2011-08-15 19:58:10 UTC
Description of problem:
Hibernating in RHEL 6.1 and 6.2 pre-beta seems to cause crashes when resuming.  Then on next boot the system prompts for root password to run a manual fsck

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Boot system and log in
2. Start some applications, such as OpenOffice, Firefox, Thunderbird, Evince, Rhythmbox
3. Hibernate system
4. Resume from hibernation
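
On RHEL 6, step 3 is typically triggered with pm-hibernate from pm-utils. A minimal sketch (an illustration, not the reporter's exact commands) that first checks whether the kernel advertises suspend-to-disk support:

```shell
# Verify the kernel supports suspend-to-disk before hibernating.
# /sys/power/state lists the sleep states the kernel supports.
state_file=/sys/power/state
if grep -qw disk "$state_file" 2>/dev/null; then
    hib_supported=yes
else
    hib_supported=no
fi
echo "suspend-to-disk supported: $hib_supported"
# To actually hibernate (step 3; requires root, RHEL 6 ships pm-utils):
#   pm-hibernate
```
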
Actual results:
About 30% of the time, the system kernel panics when resuming.

Expected results:
Normal resume from hibernate

Additional info:

Comment 2 Ric Wheeler 2011-08-16 07:21:11 UTC
Can you please add in details on the file system used (ext4 I guess?), the IO stack and type of storage?


Comment 3 Matthew Mosesohn 2011-08-16 12:44:55 UTC

ext4 on LVM with full-disk LUKS encryption, on a local 500 GB SATA disk (ThinkPad T520 laptop)

Comment 4 Ric Wheeler 2011-08-16 13:33:16 UTC
Sounds like LUKS might be losing the write barrier/flush requests?

Comment 5 Jeff Moyer 2011-08-16 14:52:07 UTC
Tough to say without more debugging.  I think I'd start by assigning this to a device-mapper developer.

Comment 6 Eric Sandeen 2011-08-16 14:54:50 UTC
If you're up for more testing and have some hardware to do it, I'd start with a very simple storage stack and then add layers one at a time, testing along the way, until you can see which layer/component causes the problem.

If it's ext4 on a plain partition, I'll perk up.  :)

Comment 7 Milan Broz 2011-08-16 16:24:21 UTC
It could be that a FLUSH is lost somewhere, the ordering is wrong, or a flush is missing for a workqueue (dm-crypt uses internal threads, but the DM core should send a flush only when there is no IO in flight).

This seems to need more debugging. If the flush support is correctly backported, I do not think the problem is in dm-crypt. (It simply forwards the flush to the underlying device, the same as the linear target does. The DM core should wait for previous IOs, so the flush is sent only once dm-crypt's encryption queues are empty.)

Is the hibernation code properly fixed to send a flush when saving the memory image to encrypted swap?

What is corrupted first: the memory image loaded from swap during resume, or the filesystem?
(I would hibernate and then, instead of resuming, run fsck from a live CD. If the filesystem is not corrupted, the memory image in swap is corrupted and the filesystem corruption is just a consequence.)
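
The read-only fsck check suggested above can be sketched as follows. The scratch image here only stands in for the real root LV so the check is demonstrable; the device names in the comments are examples, not the reporter's actual layout:

```shell
# Demonstrate a non-destructive filesystem check on a scratch ext4 image.
# e2fsck -f forces a full check; -n answers "no" to all repair prompts,
# so the filesystem is only inspected, never modified.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=64 2>/dev/null
mke2fs -q -F -t ext4 "$img"
if e2fsck -fn "$img" >/dev/null 2>&1; then fs_clean=yes; else fs_clean=no; fi
echo "filesystem clean: $fs_clean"
rm -f "$img"
# On the hibernated machine, booted from a live CD, the equivalent would be
# (example device names for a LUKS + LVM layout; adjust to the real one):
#   cryptsetup luksOpen /dev/sda2 cryptroot
#   vgchange -ay
#   e2fsck -fn /dev/mapper/vg_root-lv_root
```
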

Comment 8 Matthew Mosesohn 2011-08-16 16:31:40 UTC

Are you requesting I try to reproduce that?

Comment 10 Matthew Mosesohn 2011-08-29 16:40:04 UTC
Is there any update on this request?

Comment 12 Matthew Garrett 2011-08-30 14:54:11 UTC

Can you attach the backtrace you get on resume?

Comment 18 Matthew Garrett 2011-10-04 15:42:16 UTC
Which kernel are you testing 6.2 with? Make sure that it's -199 or later.
