Bug 540199 - Random memory/filesystem corruption and crashes after resuming from suspend
Summary: Random memory/filesystem corruption and crashes after resuming from suspend
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-11-22 17:56 UTC by Vadim Zeitlin
Modified: 2010-12-04 02:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-04 02:57:56 UTC
Type: ---
Embargoed:


Description Vadim Zeitlin 2009-11-22 17:56:32 UTC
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 (.NET CLR 3.5.30729)

The system is a Gigabyte GA-P55-UD5 with i7 860, 8GB (2*4) of RAM and GeForce 9600GT video card. Using fresh installation of Fedora 12 without any special options, but I also tried disabling SELinux and KMS (and hence nouveau) and this didn't change anything.

Reproducible: Always

Steps to Reproduce:
1. Open a session in gdm.
2. Choose suspend from the menu.
3. Resume the machine by pressing the power button.
Actual Results:  
After resuming from suspend, various programs start segfaulting. Initially I suspected a memory problem, but memtest didn't find any problems over several hours, so this seems unlikely, as the crashes are perfectly reproducible. Now I think that there is a bug that corrupts filesystem data in memory, because I see that various files have random junk inserted into or appended to them when I view them after resuming from suspend. Rebooting (if it manages to reboot; sometimes it crashes hard before this, even if it's the first command I type after resume) and examining them again shows that the files are not actually corrupted on disk (although sometimes it seems that something does get corrupted, see http://bugzilla.kernel.org/show_bug.cgi?id=14639).

As an example, I'm using "make -j8 -B" on a relatively big project to test the system stability. It fails 100% reliably after resume, the last time because one of the .d files left from a previous build got a part of some completely different binary file inserted into its middle.
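
One way to tell whether the junk is in cached data or on disk, sketched here in Python under stated assumptions: hash every file in the build tree before suspending, hash again after resume, and once more after a reboot. The helper names (`sha256_of`, `snapshot`, `changed_files`) are made up for this illustration and are not part of any tool mentioned in this report.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each regular file under root (by relative path) to its digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                result[os.path.relpath(path, root)] = sha256_of(path)
    return result


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one snapshot."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Files that `changed_files` lists right after resume but not after a reboot would be consistent with corruption of cached data in memory rather than on disk. This is only a diagnostic aid and says nothing about the cause.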

I'm not sure if all crashes can be explained by this but it's unfortunately impossible to debug them as gdb crashes itself more often than not.

I've tested different systems on the same hardware (all in their amd64 versions). Windows 7 works flawlessly. Debian testing has the same problem: although it doesn't use nouveau and hence the screen remains off after resume from suspend (it's great to see that nouveau fixes this BTW), the machine is still accessible via ssh and presents exactly the same symptoms, i.e. random (but frequent) crashes and file corruption. I also tested 2.6.32-rc5 from git using the same .config as F12 uses, as well as a different, custom, slimmed-down configuration, but with exactly the same results. Finally, I tried using s2ram from uswsusp instead of kernel suspend but this didn't change anything. Ah, and I also tried suspending from single user mode without anything much running, but this still didn't help.

I have no more ideas and plan to install a different OS soon because desktop is unusable for me without suspend but I'd be glad to do any other tests if I can somehow help fixing this. I would need some inspiration though as I'm flat out of ideas by now. Please let me know if you can think of anything.

TIA!

Expected Results:  
"make -j8" should continue to work after resume from suspend.

Comment 1 Bill McGonigle 2009-11-23 18:48:30 UTC
My eeePC 1000HE (i386 F12) also is segfaulting all over itself whenever it comes out of 'hibernate' (KDE terminology - suspend to disk).  Regular 'sleep' (suspend to RAM) seems to be OK.

I may have seen this once or twice over the course of running F11 but with F12 it's 'always'.

Comment 2 Hamidou Dia 2009-11-27 17:34:53 UTC
Hi Vadim, Hi Bill.

I was triaging tickets when I came across this Bugzilla ticket while looking for possible duplicates of:

https://bugzilla.redhat.com/show_bug.cgi?id=508106

That bug reported difficulties (crashes) after resume from suspend/hibernate, which were worked around by disabling the "Desktop Effects".

Could you please check whether you have Desktop Effects enabled, try the same steps with them disabled beforehand, and report the result here?

Thanks in advance.

Regards.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Nerijus Baliūnas 2009-11-27 17:55:43 UTC
Vadim already tried suspending from single user mode, so this will not help him. It may help Bill, but if so, then his is a different issue.

Comment 4 Hamidou Dia 2009-11-27 19:59:07 UTC
Yes, indeed, you are right. It may help Bill, but this bug and BZ#508106 (reported 2009-06-25) are two different issues.

Regards.


Comment 5 Bill McGonigle 2009-11-27 20:04:43 UTC
Hi, Hamidou,

No Desktop Effects here - DE provokes other crashes in user switching for me so I leave it off.

Thanks for triaging.

Comment 6 Vadim Zeitlin 2009-11-27 21:01:01 UTC
Sorry for the delay but I can confirm that this is unrelated to desktop effects as I don't use them at all (in fact I couldn't even if I wanted to, nouveau doesn't support 3D AFAICS).

Comment 7 Stefan Becker 2010-02-20 17:55:46 UTC
Any progress in this matter?


I've now updated my mediabox from F11 to F12 and now after a few resumes from suspend-to-disk (hibernate) I start getting strange filesystem errors on "/" (root), but not on any other mount. First I thought it was that the ext3 FS got somehow corrupted.

But a few days ago I dump'ed the FS, reformatted / as ext4, and restore'd the dump. A few minutes ago the machine resumed from hibernation and voilà, I get this:

Feb 20 17:47:13 mediabox kernel: EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 19: 12142 blocks in bitmap, 12133 in gd
Feb 20 17:47:14 mediabox kernel: JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Feb 20 17:47:14 mediabox kernel: EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 22: 6489 blocks in bitmap, 6398 in gd
Feb 20 17:47:14 mediabox kernel: EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 24: 17553 blocks in bitmap, 17367 in gd
Feb 20 17:47:14 mediabox kernel: JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
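
To see at a glance how far the two counters in each `ext4_mb_generate_buddy` line above disagree, the lines can be parsed with a small script. This is a minimal sketch, assuming the message format is exactly as quoted in this comment; the names `MISMATCH` and `bitmap_mismatches` are invented for the example.

```python
import re

# Matches the "ext4_mb_generate_buddy" error lines as quoted in this comment.
MISMATCH = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): ext4_mb_generate_buddy: "
    r"EXT4-fs: group (?P<group>\d+): (?P<bitmap>\d+) blocks in bitmap, "
    r"(?P<gd>\d+) in gd"
)


def bitmap_mismatches(lines):
    """Yield (device, group, bitmap_count, gd_count, difference) per matching line."""
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            bitmap = int(m.group("bitmap"))
            gd = int(m.group("gd"))
            yield m.group("dev"), int(m.group("group")), bitmap, gd, bitmap - gd
```

Fed the three error lines above, this reports differences of 9, 91 and 186 blocks for groups 19, 22 and 24 on dm-0; the JBD warnings do not match the pattern and are skipped.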

After a reboot fsck was executed on / and immediately triggered another reboot. After that the FS was fine again. Current kernel:

Linux mediabox 2.6.31.12-174.2.19.fc12.i686 #1 SMP Thu Feb 11 07:39:11 UTC 2010 i686 athlon i386 GNU/Linux

Comment 8 Stefan Becker 2010-03-04 19:26:34 UTC
Retested with 2.6.32.9-67.fc12 from updates-testing: same problem. After a few resumes from hibernate file system errors are reported on the root partition.

Comment 9 Bug Zapper 2010-11-04 05:42:44 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Bug Zapper 2010-12-04 02:57:56 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

