Created attachment 549883 [details]
kernel panic backtrace

Description of problem:
After resuming from hibernation, various symptoms of file system corruption occur, including kernel panics.

Version-Release number of selected component (if applicable):
kernel-3.1.6-1.fc16.i686

How reproducible:
Intermittent

Steps to Reproduce:
1. Hibernate the system (e.g. using the hibernate button in the logout dialog in Xfce).
2. Resume the system by powering on.
3. Perform file-system-intensive activity (e.g. update an rpm package).

Actual results:
Failing mkdir, kernel panics.

Expected results:
Normal file system operation.

Additional info:
See the attached panic log.
Is this 100% reproducible for you? Can you reproduce when booting with the i915.modeset=0 kernel parameter?
(In reply to comment #1)
> Is this 100% reproducible for you? Can you reproduce when booting with the
> i915.modeset=0 kernel parameter?

The corruption occurs about 1 out of 4 times that I hibernate and resume. I've tried reproducing the problem with i915.modeset=0, and have not been able to, even after about 10 hibernate/resume cycles.
The modesetting datapoint is a useful one. This is a duplicate of bug 744275, but let's keep this open for now to focus on that, as it sounds like modesetting causes memory corruption when we hibernate.

*** This bug has been marked as a duplicate of bug 744275 ***
Derp. I never meant to dupe this. Fixing.
*** Bug 797478 has been marked as a duplicate of this bug. ***
Created attachment 571236 [details]
Very similar panic, which does not seem to occur when i915 is not loaded
Pretty sure this was a dupe of 744275. Please reopen if you can reproduce this with a current kernel from updates.
Just tried hibernating and resuming with kernel 3.3.7-1.fc17.i686 on the same machine I reported the bug for (Asus Eee PC 900HA). I now get:

EXT4-fs error (device dm-0): ext4_mb_generate_buddy:739: group 8, 25068 clusters in bitmap, 25067 in gd
JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
I'm still having ext4 corruption after hibernation in F17 with kernel 3.6.3-1.fc17. i915.modeset=0 seems to prevent it. Should I file a separate bug for F17? Anything else I can do to move this along?
Can you reproduce the corruption with the test_hib.sh script and the checkmem.c program, as described here: https://bugzilla.redhat.com/show_bug.cgi?id=701857#c24 ? If so, what hardware do you have (lspci -nnvv output for the VGA controller)?
I ran about 30 hibernate cycles with the test_hib.sh script. No errors reported, but a couple of times it hung during the reboot. A hard reset got it going again.
Ran test_hib.sh for 31 cycles, no corruption detected.
(In reply to comment #11)
> No errors
> reported, but a couple of times it hung during the reboot.

Not sure if this is corruption related; perhaps this is some suspend/resume bug. Let's try this overnight:

while true; do
    echo "0" > /sys/class/rtc/rtc0/wakealarm
    echo "+120" > /sys/class/rtc/rtc0/wakealarm
    sync
    echo 1 > /sys/power/pm_trace
    pm-suspend
    sleep 60
done

The script suspends and resumes indefinitely with error detection (pm_trace) enabled. If a suspend or resume fails, the system will reboot, and dmesg should then contain information about which driver is responsible for the failure, so attach dmesg here (restarting the system again will erase that information). Note that on failure pm_trace overwrites your hardware clock, so you will need to set it again in the BIOS or with "date" followed by "hwclock --systohc".
I've never seen filesystem corruption after suspending, only after hibernating (and pretty much every time I hibernate). Are you sure it's useful to run this script that uses pm-suspend rather than pm-hibernate? I don't mean to hijack this ticket, happy to file a separate one if that's indicated, but so far my symptoms and Thomas Quinn's seem consistent.
The instructions in comment 13 were intended to help Thomas diagnose his hibernate reboot problems. I just realized that the in-kernel corruption detection works only on the -debug kernel variant. Did you run test_hib.sh on kernel-debug? If not, please retest after installing and booting that kernel (I'm sorry for not informing you about that).
I was not previously using a debug kernel, but I tried again with it, and test_hib.sh still didn't detect any corruption after running overnight.
The script from comment 13 ran for about 12 hours, until I interrupted it.
I also repeated the script with a -debug kernel. 61 cycles and no corruption detected.
Hmm, so it looks like the file system corruption is not caused by memory corruption from i915 or another driver. How does the filesystem corruption manifest itself on your systems?
ext4 errors in /var/log/messages when I resume after hibernation, and again whenever I mount that filesystem until I've fscked it. Only hibernating to disk, not suspending to RAM, causes these errors. When I booted with i915.modeset=0 and then hibernated, the corruption didn't occur, but since I've only tried once I'm not confident saying that that fixes it for sure.
Ok, let's look at those errors. Please boot the system, then hibernate and resume, and attach dmesg here if the errors are present; if not, repeat the hibernate/resume cycle (perhaps using a script) until they are.
No script needed, corruption happens reliably every time I hibernate. Attaching dmesg. See "EXT4-fs error" near the end.
Created attachment 654714 [details] dmesg
Before the ext4 errors, there are:

[  500.926215] end_request: I/O error, dev dm-0, sector 1953458048
[  500.926220] Buffer I/O error on device dm-0, logical block 244182256
[  500.926236] end_request: I/O error, dev dm-0, sector 1953458048
[  500.926238] Buffer I/O error on device dm-0, logical block 244182256

which indicates a data read (or write) problem on the disk. This could be a hardware malfunction or a device driver problem; it is not related to the bug originally reported here. Please open a new bug report after making sure this is not a hardware issue.

The i915 bug originally reported here is fixed, so closing.
Created attachment 654999 [details]
dmesg with i915.modeset=0

As I mentioned in my first comment, setting i915.modeset=0 makes my problem go away. I'm attaching dmesg from after booting with i915.modeset=0 and then hibernating and resuming four or five times. Neither the I/O errors nor the filesystem errors occur as they do without the modeset=0 kernel option.

I'm quite confident that this is not just chance. In the course of testing I have hibernated with the normal settings dozens of times and seen filesystem errors every single time, and I've hibernated about six times with i915.modeset=0 and not seen any filesystem errors.
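(For anyone landing here with the same symptoms: the workaround being tested above can be made persistent by adding the parameter to the kernel command line. On a GRUB 2 Fedora install this would look roughly like the fragment below; the file path and the grub2-mkconfig invocation are the usual Fedora defaults, not something stated in this report.)

```shell
# /etc/default/grub: append i915.modeset=0 to the existing options
GRUB_CMDLINE_LINUX="rd.lvm.lv=vg/root rhgb quiet i915.modeset=0"

# then regenerate the grub configuration and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg
```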
So this must be related to traffic on the PCIe bus: the i915 device with modeset=1 does something that breaks the disk controller. But this is not related to the issue reported here. Please open a separate bug report, provide "lspci -vnn" output, and link to the information you already provided here.
Filed bug 882232