Bug 730811
Summary: | hibernation often fails to resume and forces fsck | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Matthew Mosesohn <mmosesoh> |
Component: | kernel | Assignee: | John Feeney <jfeeney> |
Status: | CLOSED WORKSFORME | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.3 | CC: | chellwig, dchinner, esandeen, jfeeney, jmoyer, lczerner, mbroz, msanders, msnitzer, rwheeler, vgoyal |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-11-01 14:42:38 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 637248 |
Description
Matthew Mosesohn
2011-08-15 19:58:10 UTC
Can you please add in details on the file system used (ext4 I guess?), the IO stack and type of storage? Thanks!

Ric, ext4 on LVM with full-disk LUKS encryption on a local 500 GB SATA disk (on a ThinkPad T520 laptop).

Sounds like LUKS might be losing the write barrier/flush requests? Tough to say without more debugging. I think I'd start by assigning this to a device-mapper developer. If you're in for more testing and have some hardware to do it, I'd start with a very simple storage stack, and then add things to it, testing along the way, until you can see which layer/component seems to cause the problem. If it's ext4 on a plain partition, I'll perk up. :)

It could be that FLUSH is lost somewhere, the order is wrong, or there is a missing flush for the workqueue (dmcrypt uses internal threads, but DM core should send flush only if there is no IO in flight). Seems to need more debugging.

If flush is correctly backported, I do not think the problem is in dmcrypt. (It simply forwards flush to the underlying device, the same as the linear target. DM core should wait for previous IOs, so flush is sent when dmcrypt has empty encryption queues.) Is the hibernation code properly fixed to send flush when saving the memory image to encrypted swap? What is corrupted first: the memory image loaded from swap during resume, or the filesystem? (I would try to hibernate and, instead of resuming, run fsck from a live CD. If the fs is not corrupted, the memory image in swap is corrupted and the fs corruption is just a consequence.)

Milan, are you requesting that I try to reproduce that?

Is there any update on this request?

Matthew, can you attach the backtrace you get on resume? Which kernel are you testing 6.2 with? Make sure that it's -199 or later.
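
For anyone repeating the test, the "-199 or later" check in the last comment can be scripted. This is only a minimal sketch. It assumes that "-199" refers to the build number that follows the base version in the string printed by `uname -r` (for example `2.6.32-199.el6.x86_64`), which the thread does not spell out. The names `kernel_build`, `is_new_enough` and `MIN_BUILD` are made up for this illustration.

```python
import re

# Assumption: "-199" in the comment above is the build number after the
# base kernel version, e.g. "2.6.32-199.el6.x86_64" -> 199.
MIN_BUILD = 199


def kernel_build(release: str) -> int:
    """Return the build number from a string shaped like `uname -r` output."""
    match = re.match(r"^\d+\.\d+\.\d+-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release string: %r" % release)
    return int(match.group(1))


def is_new_enough(release: str, min_build: int = MIN_BUILD) -> bool:
    """True if the build number is at least min_build."""
    return kernel_build(release) >= min_build
```

On the machine under test, pass in the output of `uname -r` (or `platform.release()` from the Python standard library). Only the first number after the dash is compared, so a release such as `2.6.32-220.4.1.el6.x86_64` is treated as build 220.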