Bug 744275 - (unlock_new_inode) [abrt] kernel: [180522.072085] WARNING: at fs/inode.c:884 unlock_new_inode+0x34/0x59()
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
abrt_hash:99411f6da281ec046f0345b3a6f...
Duplicates: 742692 744973 748008 755315 758656 758774 758775 758952 758953 758959 759046 759369 759372 759753 760332 760794 760797 760902 760932 761002 761122 766188 766462 768412 768480 769664 770831 772174 772471 781277 781618 781655 781656 782996 783711 799715
Blocks: kernel_hibernate
Reported: 2011-10-07 13:50 EDT by Luya Tshimbalanga
Modified: 2012-05-16 05:18 EDT
CC List: 69 users

Doc Type: Bug Fix
Last Closed: 2012-05-14 15:40:05 EDT


Attachments
bug during running X session (15.68 KB, text/plain)
2011-12-05 03:07 EST, Marcela Mašláňová
inode bug after hibernation (33.08 KB, text/plain)
2011-12-05 03:10 EST, Marcela Mašláňová
output of stp script (4.64 KB, application/x-xz)
2011-12-07 11:36 EST, David Juran
stap script to trace failures (539 bytes, text/plain)
2011-12-07 12:52 EST, Eric Sandeen
message.log extract kernel-3.3.0-4, x86_64 sandbox on (42.74 KB, text/plain)
2012-03-26 13:30 EDT, Jacek Pawlyta


External Trackers
Tracker ID Priority Status Summary Last Updated
Linux Kernel 42723 None None None Never

Description Luya Tshimbalanga 2011-10-07 13:50:57 EDT
libreport version: 2.0.6
abrt_version:   2.0.4.981
cmdline:        BOOT_IMAGE=/vmlinuz-3.1.0-0.rc8.git0.0.fc16.x86_64 root=/dev/mapper/vg_muamba-lv_root ro quiet rhgb SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us-acentos
comment:        Crash happened during resume session
kernel:         undefined
reason:         [180522.072085] WARNING: at fs/inode.c:884 unlock_new_inode+0x34/0x59()
time:           Sun Oct  2 03:25:36 2011

backtrace:
:[180522.072085] WARNING: at fs/inode.c:884 unlock_new_inode+0x34/0x59()
:[180522.072088] Hardware name: Satellite C650D
:[180522.072091] Modules linked in: tcp_lp ppdev parport_pc lp parport fuse fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd arc4 rtl8192ce rtl8192c_common rtlwifi mac80211 sp5100_tco uvcvideo sparse_keymap videodev uinput media v4l2_compat_ioctl32 k10temp soundcore snd_page_alloc cfg80211 shpchp i2c_piix4 atl1c microcode rfkill uas ums_realtek usb_storage video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
:[180522.072157] Pid: 830, comm: abrt-dump-oops Tainted: G        W   3.1.0-0.rc6.git0.3.fc16.x86_64 #1
:[180522.072160] Call Trace:
:[180522.072166]  [<ffffffff81057b32>] warn_slowpath_common+0x83/0x9b
:[180522.072172]  [<ffffffff81057b64>] warn_slowpath_null+0x1a/0x1c
:[180522.072177]  [<ffffffff8113c5b4>] unlock_new_inode+0x34/0x59
:[180522.072182]  [<ffffffff81188bd8>] ext4_new_inode+0xc6d/0xd02
:[180522.072189]  [<ffffffff811925b6>] ext4_mkdir+0x108/0x32c
:[180522.072195]  [<ffffffff811342a2>] vfs_mkdir+0x5f/0x9b
:[180522.072200]  [<ffffffff811358cd>] sys_mkdirat+0x6b/0xaa
:[180522.072205]  [<ffffffff81135924>] sys_mkdir+0x18/0x1a
:[180522.072211]  [<ffffffff814bbfc2>] system_call_fastpath+0x16/0x1b

event_log:
:2011-10-07-10:49:32> Smolt profile successfully saved
:2011-10-07-10:49:50> Submitting oops report to http://submit.kerneloops.org/submitoops.php
:2011-10-07-10:50:54  Kernel oops has not been sent due to Couldn't connect to server
:2011-10-07-10:50:54* (exited with 1)

smolt_data:
:
:
:General
:=================================
:UUID: 70a2209b-0fd3-4093-9211-c2b0628dc892
:OS: Fedora release 16 (Verne)
:Default run level: Unknown
:Language: en_CA.utf8
:Platform: x86_64
:BogoMIPS: 3192.03
:CPU Vendor: AuthenticAMD
:CPU Model: AMD E-350 Processor
:CPU Stepping: 0
:CPU Family: 20
:CPU Model Num: 1
:Number of CPUs: 2
:CPU Speed: 1600
:System Memory: 3553
:System Swap: 5599
:Vendor: TOSHIBA
:System: Satellite C650D PSC0YC-007003
:Form factor: Notebook
:Kernel: 3.1.0-0.rc8.git0.0.fc16.x86_64
:SELinux Enabled: 1
:SELinux Policy: targeted
:SELinux Enforce: Permissive
:MythTV Remote: Unknown
:MythTV Role: Unknown
:MythTV Theme: Unknown
:MythTV Plugin: 
:MythTV Tuner: -1
:
:
:Devices
:=================================
:(4098:17296:4473:65310) pci, ahci, STORAGE, SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
:(4130:5392:4473:65310) pci, None, HOST/PCI, Family 14h Processor Root Complex
:(4098:38914:4473:65000) pci, radeon, VIDEO, AMD Radeon HD 6310 GraphicsATI
:(4130:5912:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 6
:(4130:5892:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 4
:(4130:5913:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 7
:(4130:5910:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 5
:(4130:5889:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 1
:(4130:5888:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 0
:(4130:5891:0:0) pci, k10temp, HOST/PCI, Family 12h/14h Processor Function 3
:(4130:5890:0:0) pci, None, HOST/PCI, Family 12h/14h Processor Function 2
:(4098:17302:4473:65310) pci, ehci_hcd, USB, SB7x0/SB8x0/SB9x0 USB EHCI Controller
:(4098:17303:4473:65310) pci, ohci_hcd, USB, SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
:(4098:17312:4098:0) pci, pcieport, PCI/PCI, SB700/SB800 PCI to PCI bridge (PCIE port 0)
:(4098:17313:4098:0) pci, pcieport, PCI/PCI, SB700/SB800 PCI to PCI bridge (PCIE port 1)
:(4098:17285:4473:65310) pci, piix4_smbus, SERIAL, SBx00 SMBus Controller
:(4098:17309:4473:65310) pci, None, PCI/ISA, SB7x0/SB8x0/SB9x0 LPC host controller
:(4098:17283:4473:65310) pci, snd_hda_intel, MULTIMEDIA, SBx00 Azalia (Intel HDA)
:(4098:17284:0:0) pci, None, PCI/PCI, SBx00 PCI to PCI Bridge
:(6505:8290:4473:65310) pci, atl1c, ETHERNET, AR8152 v2.0 Fast Ethernet
:(4098:17302:4473:65310) pci, ehci_hcd, USB, SB7x0/SB8x0/SB9x0 USB EHCI Controller
:(4098:17303:4473:65310) pci, ohci_hcd, USB, SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
:(4098:17302:4473:65310) pci, ehci_hcd, USB, SB7x0/SB8x0/SB9x0 USB EHCI Controller
:(4098:17303:4473:65310) pci, ohci_hcd, USB, SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
:(4332:33142:4332:33153) pci, rtl8192ce, NETWORK, RTL8188CE 802.11b/g/n WiFi Adapter
:
:
:Filesystem Information
:=================================
:device mtpt type bsize frsize blocks bfree bavail file ffree favail
:-------------------------------------------------------------------
:/dev/mapper/vg_muamba-lv_root / ext4 4096 4096 12901535 11460370 11329350 3276800 3131746 3131746
:/dev/mapper/vg_muamba-lv_home /home ext4 4096 4096 26149682 19619274 18290942 6643712 6601118 6601118
:/dev/sda3 /boot ext4 1024 1024 495844 414283 388683 128016 127785 127785
:
Comment 1 Chuck Ebbert 2011-10-11 20:28:40 EDT
*** Bug 744973 has been marked as a duplicate of this bug. ***
Comment 2 Chuck Ebbert 2011-10-11 20:29:04 EDT
*** Bug 742692 has been marked as a duplicate of this bug. ***
Comment 3 Juan Carlos Watts 2011-11-15 15:55:44 EST
Package: kernel
Architecture: i686
OS Release: Fedora release 15 (Lovelock)

Comment
-----
I don't know.
It just appears.
Comment 4 Juan Carlos Watts 2011-11-18 16:47:19 EST
Package: kernel
Architecture: i686
OS Release: Fedora release 15 (Lovelock)

Comment
-----
It just appears.
Comment 5 Michael Ekstrand 2011-11-24 08:45:19 EST
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)

Comment
-----
Got this after waking my computer from hibernate.
Comment 6 Mark Wielaard 2011-11-24 17:29:55 EST
Got a couple of these while running yum upgrade on my laptop (kernel 3.1.1-2.fc16.x86_64).

All call traces look similar; there are two variants:

[140792.490463] Call Trace:
[140792.490467]  [<ffffffff81057a1e>] warn_slowpath_common+0x83/0x9b
[140792.490469]  [<ffffffff81057a50>] warn_slowpath_null+0x1a/0x1c
[140792.490471]  [<ffffffff8113c264>] unlock_new_inode+0x34/0x59
[140792.490473]  [<ffffffff81188d5c>] ext4_new_inode+0xc6d/0xd02
[140792.490476]  [<ffffffff81193b78>] ext4_symlink+0x14a/0x2ea
[140792.490479]  [<ffffffff81133e02>] vfs_symlink+0x54/0x74
[140792.490481]  [<ffffffff81133b78>] ? user_path_create+0x4d/0x57
[140792.490483]  [<ffffffff8113569e>] sys_symlinkat+0x6b/0xb2
[140792.490485]  [<ffffffff811356fb>] sys_symlink+0x16/0x18
[140792.490489]  [<ffffffff814bd902>] system_call_fastpath+0x16/0x1b
[140792.490490] ---[ end trace 0d681c6246944c8e ]---


[140890.147376] Call Trace:
[140890.147381]  [<ffffffff81057a1e>] warn_slowpath_common+0x83/0x9b
[140890.147383]  [<ffffffff81057a50>] warn_slowpath_null+0x1a/0x1c
[140890.147385]  [<ffffffff8113c264>] unlock_new_inode+0x34/0x59
[140890.147387]  [<ffffffff81188d5c>] ext4_new_inode+0xc6d/0xd02
[140890.147390]  [<ffffffff8119255b>] ext4_create+0xbc/0x13e
[140890.147393]  [<ffffffff81133fed>] vfs_create+0x6c/0x8d
[140890.147395]  [<ffffffff81134ab3>] do_last+0x248/0x5ad
[140890.147397]  [<ffffffff81134f1a>] path_openat+0xcf/0x310
[140890.147399]  [<ffffffff8113dba6>] ? notify_change+0x25a/0x270
[140890.147401]  [<ffffffff81135258>] do_filp_open+0x38/0x86
[140890.147404]  [<ffffffff8113e810>] ? alloc_fd+0x72/0x11d
[140890.147407]  [<ffffffff811286bc>] do_sys_open+0x6e/0x100
[140890.147410]  [<ffffffff810a37c4>] ? audit_syscall_entry+0x145/0x171
[140890.147412]  [<ffffffff8112876e>] sys_open+0x20/0x22
[140890.147415]  [<ffffffff814bd902>] system_call_fastpath+0x16/0x1b
[140890.147417] ---[ end trace 0d681c6246944c93 ]---
Comment 7 Alex 2011-11-27 12:01:55 EST
This error leaves the file system corrupted, and it happens mostly after resuming from disk.
Comment 8 Josh Boyer 2011-11-28 11:44:48 EST
*** Bug 755315 has been marked as a duplicate of this bug. ***
Comment 9 Josh Boyer 2011-11-30 07:46:50 EST
*** Bug 758656 has been marked as a duplicate of this bug. ***
Comment 10 Dave Jones 2011-11-30 13:04:37 EST
*** Bug 758775 has been marked as a duplicate of this bug. ***
Comment 11 Dave Jones 2011-11-30 13:04:47 EST
*** Bug 758774 has been marked as a duplicate of this bug. ***
Comment 12 Josh Boyer 2011-12-01 07:44:50 EST
*** Bug 759046 has been marked as a duplicate of this bug. ***
Comment 13 Josh Boyer 2011-12-01 07:45:04 EST
*** Bug 758952 has been marked as a duplicate of this bug. ***
Comment 14 Josh Boyer 2011-12-01 07:45:27 EST
*** Bug 758953 has been marked as a duplicate of this bug. ***
Comment 15 Josh Boyer 2011-12-01 07:50:00 EST
*** Bug 758959 has been marked as a duplicate of this bug. ***
Comment 16 Eric Sandeen 2011-12-01 17:30:21 EST
Sorry I haven't had a chance to look at this yet. :(

Also being discussed upstream at:
http://thread.gmane.org/gmane.comp.file-systems.ext4/29377/focus=29378

-Eric
Comment 17 Eric Sandeen 2011-12-01 18:00:07 EST
As Ted pointed out, we are doing:

        if (insert_inode_locked(inode) < 0) {
                err = -EINVAL;
                goto fail_drop;
        }

...

fail_drop:
        dquot_drop(inode);
        inode->i_flags |= S_NOQUOTA;
        clear_nlink(inode);
        unlock_new_inode(inode);        /* <--- BOOM */

That goto in the error case is wrong: we are trying to unlock an inode that was never locked and is not new.

And (as Ted said) we should figure out why insert_inode_locked() failed.
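To make the error-path problem concrete, here is a minimal C sketch. It is a user-space model under stated assumptions, not kernel code and not the fix that was eventually applied: `model_inode`, `model_insert_inode_locked()`, `model_unlock_new_inode()`, `model_new_inode()`, the `collides`/`later_error` switches and the `out_*` labels are all hypothetical. It assumes, as comment 27 explains, that insert_inode_locked() sets I_NEW only when it succeeds, and it shows an error path that skips the unlock when the insert failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define I_NEW 0x1u  /* stand-in for the kernel's I_NEW state bit */

/* Hypothetical, heavily simplified stand-in for struct inode. */
struct model_inode {
    unsigned long i_ino;
    unsigned int i_state;
};

static int warn_count;  /* counts what would be WARN_ON() splats */

/* Like unlock_new_inode(): complains if the inode is not I_NEW, then clears it. */
static void model_unlock_new_inode(struct model_inode *inode)
{
    if (!(inode->i_state & I_NEW)) {
        warn_count++;
        fprintf(stderr, "WARNING: inode %lu unlocked without I_NEW\n", inode->i_ino);
    }
    inode->i_state &= ~I_NEW;
}

/* Assumed post-250df6ed behavior: I_NEW is set only on success. */
static int model_insert_inode_locked(struct model_inode *inode, bool collides)
{
    if (collides)
        return -EBUSY;
    inode->i_state |= I_NEW;
    return 0;
}

/* Sketch of an allocation path whose error labels only undo what was done.
 * On success the inode is returned still I_NEW; the caller unlocks it later. */
static int model_new_inode(struct model_inode *inode, bool collides, bool later_error)
{
    assert(inode != NULL);

    if (model_insert_inode_locked(inode, collides) < 0)
        goto out_not_locked;    /* never became I_NEW: do not unlock */
    if (later_error)
        goto out_unlock;        /* I_NEW is set, so unlocking is correct */
    return 0;

out_unlock:
    model_unlock_new_inode(inode);
out_not_locked:
    return -EINVAL;
}
```

Calling `model_unlock_new_inode()` on an inode whose insert failed increments `warn_count`, which is the model's counterpart of the WARNING at fs/inode.c:884; routing that case to a label that skips the unlock avoids it.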

Also:

(Almost?) all of the cloned bugs include "Tainted: G        W " in the backtrace, i.e. a warning has previously been issued by the kernel.

Can someone attach a full dmesg leading up to this?

-Eric
Comment 18 Josh Boyer 2011-12-01 20:31:27 EST
(In reply to comment #17)
> Also:
> 
> In (almost?) all of the cloned bugs include "Tainted: G        W " in the
> backtrace, i.e. a warning has previously been issued by the kernel.

So taints B and D aren't set, so we haven't seen a bug and aren't in the process of dying.  I thought W was set because of the unlock_new_inode warning itself.  If you want no taints at all, bug 755315 has a trace without it.
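For reading the taint field discussed above, a minimal C sketch that only checks which flag letters are present. `taint_has()` is a hypothetical helper and deliberately ignores the fixed column layout the kernel prints. In kernels of this era, B means a bad page was referenced, D means the kernel has died (oops or BUG), and W means a warning has been issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the taint string printed after "Tainted: "
 * contains the given flag letter. Column positions are ignored on purpose. */
static bool taint_has(const char *taint, char flag)
{
    assert(taint != NULL);
    /* strchr() would match the terminating NUL for flag == '\0', so reject it. */
    return flag != '\0' && strchr(taint, flag) != NULL;
}
```

For the string quoted in comment 17, `taint_has("G        W ", 'W')` is true while the checks for 'B' and 'D' are false.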
Comment 19 Josh Boyer 2011-12-02 09:01:21 EST
*** Bug 759369 has been marked as a duplicate of this bug. ***
Comment 20 Josh Boyer 2011-12-02 09:01:31 EST
*** Bug 759372 has been marked as a duplicate of this bug. ***
Comment 21 Josh Boyer 2011-12-03 11:08:12 EST
*** Bug 759753 has been marked as a duplicate of this bug. ***
Comment 22 Marcela Mašláňová 2011-12-05 03:07:36 EST
Created attachment 540674 [details]
bug during running X session
Comment 23 Marcela Mašláňová 2011-12-05 03:10:06 EST
Created attachment 540675 [details]
inode bug after hibernation
Comment 24 Marcela Mašláňová 2011-12-05 03:13:52 EST
I hope my attachments will help. I'm seeing this bug very often after unsuccessful hibernation. The untainted log was produced after the "inode bug", which occurred during work (yum update).
Comment 26 Josh Boyer 2011-12-05 16:25:09 EST
*** Bug 760332 has been marked as a duplicate of this bug. ***
Comment 27 Eric Sandeen 2011-12-05 17:37:56 EST
I've been unable to repro this either in a vm or on metal so far.  :(

I may see if someone who can is willing to run some debugging.

Note that the reason for the warning is this upstream commit:

250df6ed274d767da844a5d9f05720b804240197 fs: protect inode->i_state with inode->i_lock

because insert_inode_locked() now returns without I_NEW set on failure, which it didn't use to do.  That's easy enough to fix up; I'm still not sure why it's failing, though.
Comment 28 David Juran 2011-12-06 03:35:15 EST
I see this after resuming from hibernation, so please let me know what debugging you'd need.
Comment 29 Dave Jones 2011-12-06 10:11:09 EST
Hibernation seems to be a common factor.  Is anyone seeing this error without using hibernation?
Comment 30 Eric Sandeen 2011-12-06 11:37:04 EST
I've been running pm-hibernate on a bare metal box with a 3.2-ish kernel, I can't hit it (I hit other bugs though!).

I'll work on a patch to fix up the I_NEW state confusion, but the question remains: why is insert_inode_locked failing?

Since I can't hit it, I'll try to devise some debug steps that others can take.
Comment 31 Eric Sandeen 2011-12-06 14:40:21 EST
I'd like to know for sure that it is insert_inode_locked which is failing.

For those who can reproduce, this command will print out information about the inode passed to insert_inode_locked() if it returns an error:

# stap -e 'probe kernel.function("insert_inode_locked").return { if ($return != 0) println($$parms$) }'

To use it you need to install the systemtap package as well as the kernel-debuginfo packages.

It's not 100% clear to me whether the errors happen during the resume process or some time after it; hopefully it is some time after, and you'll have a chance to run the above command before the problem reproduces.

Any takers?

Thanks,
-Eric
Comment 32 Eric Sandeen 2011-12-06 14:57:42 EST
Actually stay tuned, I'll have a better stap script in a few minutes.
Comment 33 Eric Sandeen 2011-12-06 15:59:00 EST
This might do a little better: put this in a file, e.g. print-vars-on-failure.stp

#! /usr/bin/env stap

global in_ext4_new_inode
probe kernel.function("ext4_new_inode").call {
	in_ext4_new_inode[tid()] = 1
}

global bad_insert_inode_locked
probe kernel.function("insert_inode_locked").return {
	if (in_ext4_new_inode[tid()] && $return == 0) {
		println($$parms$);
		bad_insert_inode_locked[tid()] = 1;
	}
}

probe kernel.function("ext4_new_inode").return {
	if (bad_insert_inode_locked[tid()]) {
		println ($$parms$);
		println ($$locals);
		delete in_ext4_new_inode[tid()];
	}
	delete bad_insert_inode_locked[tid()];
}

and then do:

# stap -DMAXSTRINGLEN=1024 print-vars-on-failure.stp

(after installing systemtap and kernel debuginfo rpms)

Then hibernate, resume, and wait for the bug to show up; hopefully we'll get more information about what was happening from the systemtap script.

Thanks,
-Eric
Comment 34 Josh Boyer 2011-12-06 20:26:48 EST
*** Bug 760797 has been marked as a duplicate of this bug. ***
Comment 35 Josh Boyer 2011-12-06 20:26:52 EST
*** Bug 760794 has been marked as a duplicate of this bug. ***
Comment 36 Josh Boyer 2011-12-07 09:02:11 EST
*** Bug 760932 has been marked as a duplicate of this bug. ***
Comment 37 Josh Boyer 2011-12-07 09:02:21 EST
*** Bug 760902 has been marked as a duplicate of this bug. ***
Comment 38 Josh Boyer 2011-12-07 10:44:45 EST
*** Bug 761002 has been marked as a duplicate of this bug. ***
Comment 39 David Juran 2011-12-07 11:35:03 EST
/me sacrifices his file system for the common good... Attached is the output of the stp script. Note that it started spewing out text even before hibernating.
Also, the file system I logged to is the one that got corrupted, so I hope the file is complete.

And now for fsck...
Comment 40 David Juran 2011-12-07 11:36:25 EST
Created attachment 542037 [details]
output of stp script
Comment 41 Eric Sandeen 2011-12-07 12:44:14 EST
David, thanks.  I'll see if I can make something of it.  I see now that I should have written something to produce a more readable output.  ;)

Did you also happen to save the results of the fsck?  I'm interested in what has gone wrong.

Are you on a standard LVM root?  Encrypted or not?

Thanks,
-Eric
Comment 42 Eric Sandeen 2011-12-07 12:45:31 EST
Crud, the local vars weren't printed quite right.  Here's an update for the stap script; it should be $$locals$.  I'll still see what I can make of the output.

#! /usr/bin/env stap

global in_ext4_new_inode
probe kernel.function("ext4_new_inode").call {
 in_ext4_new_inode[tid()] = 1
}

global bad_insert_inode_locked
probe kernel.function("insert_inode_locked").return {
 if (in_ext4_new_inode[tid()] && $return == 0) {
  println($$parms$);
  bad_insert_inode_locked[tid()] = 1;
 }
}

probe kernel.function("ext4_new_inode").return {
 if (bad_insert_inode_locked[tid()]) {
  println ($$parms$);
  println ($$locals$);
  delete in_ext4_new_inode[tid()];
 }
 delete bad_insert_inode_locked[tid()];
}
Comment 43 Eric Sandeen 2011-12-07 12:50:15 EST
And I was a complete idiot yesterday: I was testing successful return values to make sure it printed info, since I can't reproduce, and what I pasted was printing only -successful- calls, not the failures.  Sigh.  I will attach a fixed version rather than pasting, and then go have a cup of coffee to re-engage my brain.
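The intended filtering logic of the stap script can be modeled in C as follows. This is a hypothetical sketch, not SystemTap: trace events are fed in as an array, thread ids are assumed to be below `MAX_TID`, and a failure is a non-zero return from insert_inode_locked(), which returns 0 on success. As a design choice, the sketch also clears both per-thread flags on every ext4_new_inode() return, so state from one call cannot leak into the next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum event_kind { NEW_INODE_CALL, INSERT_RETURN, NEW_INODE_RETURN };

struct event {
    int tid;              /* thread id, assumed to be < MAX_TID */
    enum event_kind kind;
    int retval;           /* only meaningful for INSERT_RETURN */
};

#define MAX_TID 64

/* Returns how many ext4_new_inode() calls saw insert_inode_locked() fail. */
static int count_failed_inserts(const struct event *ev, size_t n)
{
    bool in_new_inode[MAX_TID] = { false };
    bool bad_insert[MAX_TID] = { false };
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        int t = ev[i].tid;
        assert(t >= 0 && t < MAX_TID);
        switch (ev[i].kind) {
        case NEW_INODE_CALL:
            in_new_inode[t] = true;
            break;
        case INSERT_RETURN:
            /* Failure is a non-zero (negative errno) return, not == 0. */
            if (in_new_inode[t] && ev[i].retval != 0)
                bad_insert[t] = true;
            break;
        case NEW_INODE_RETURN:
            if (bad_insert[t])
                failures++;
            /* Reset on every return, successful or not. */
            in_new_inode[t] = false;
            bad_insert[t] = false;
            break;
        }
    }
    return failures;
}
```

Inserts that fail outside ext4_new_inode() on a thread are ignored, matching the script's use of the per-thread "inside ext4_new_inode" flag.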
Comment 44 Eric Sandeen 2011-12-07 12:52:20 EST
Created attachment 542090 [details]
stap script to trace failures

Install systemtap and kernel-debuginfo rpms, then:

# stap -DMAXSTRINGLEN=1024 print-vars-on-failure-module.stp

then hibernate and resume.

Very sorry for the confusion & mistakes on my end.  :(
Comment 45 Frank Ch. Eigler 2011-12-07 12:57:21 EST
If the variable dumps are truncated, you could up -DMAXSTRINGLEN=2048 or more.
Comment 46 Josh Boyer 2011-12-07 12:57:49 EST
*** Bug 761122 has been marked as a duplicate of this bug. ***
Comment 47 Eric Sandeen 2011-12-07 17:24:38 EST
Another option for folks willing to try: running the kernel-debug variant of the kernel you are currently using might teach us something; if we have memory corruption, for example, we might catch it earlier.

But I have a sneaking suspicion that we might not be syncing everything properly to disk on hibernate, and if that's the root cause then -debug might not help.  Worth a shot if anyone is game, though.

Thanks,
-Eric
Comment 48 Eric Sandeen 2011-12-07 17:41:03 EST
This may actually be fruitful.

I can't hit the WARN_ON others are hitting but I did just hit:

[  660.831613] =============================================================================
[  660.831654] BUG shmem_inode_cache (Not tainted): Padding overwritten. 0xffff88041ce2fc00-0xffff88041ce2fc1f
[  660.831694] -----------------------------------------------------------------------------
[  660.831695] 
[  660.831737] INFO: Slab 0xffffea0010738a00 objects=20 used=20 fp=0x          (null) flags=0x40000000004081
...

Will keep poking at it.
Comment 49 David Juran 2011-12-08 09:05:13 EST
I read your question right after I had run the fsck, so I didn't save its output, but I was able to scroll back, and the parts I could still see were something along the lines of:

inode count wrong for group #81 (28957, counted 28966)
inode count wrong 15205063 counted 15205072

There was more, but I missed that. Also, it's always only the root file system that corrupts. Might be a coincidence though...

Do you have enough to work on with the stuff you found from the debug kernel or should I torture my FS further with the updated systemtap script?
Comment 50 Eric Sandeen 2011-12-08 11:49:40 EST
David -

It's not clear that they are the same bug ... though they might be.  If you are willing to re-torture, it might be useful, both to gather the stap output (probably with the 2048 max string length as fche suggested) and save the fsck output as well.

If you can't risk it, I understand...

The fact that it is the root fs probably is relevant; I've seen some other clues that the root fs might not be properly syncing on remount,ro - though I'm not sure if that's what pm-suspend does or not.

-Eric
Comment 51 David Juran 2011-12-08 14:52:49 EST
Hello Eric.

The tests with the VM didn't work. It booted fine and hibernated OK, but when the VM was about to resume from hibernation, right after it printed that it had unsuspended the console, it just quit without any warning.

So it's back to testing with the laptop and hoping for the best. Anyhow, I just had a look at your stap script and noticed the function names mention ext4. But at least in my case, it's an ext3 FS that breaks. Or are the same functions used for ext3 as well?
Comment 52 Eric Sandeen 2011-12-08 15:58:10 EST
The ext3 functions will just be s/ext3/ext4.

But I'm thinking along other lines now: I think that perhaps we are missing drive write cache flushes after hibernate does its writes.  I'm going to test that now and see if -my- problems go away.

If you want, you could try disabling the drive write cache with hdparm, hibernate, and see if you survive...

-Eric
Comment 53 Boricua 2011-12-09 07:45:52 EST
Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

Comment
-----
Not sure how it happened, but it seems related to launching Totem or Yumex.
Comment 54 David Juran 2011-12-09 12:51:53 EST
Easier said than done; my laptop seems to refuse to turn write-cache off...

[djuran@localhost ~]$ sudo hdparm -W0 /dev/sda

/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  1 (on)
[djuran@localhost ~]$ sudo hdparm -W /dev/sda

/dev/sda:
 write-caching =  1 (on)
Comment 55 Frank Ch. Eigler 2011-12-09 12:58:27 EST
Why would one suspect a hard drive write caching issue, if a
poorly timed power loss is not being suspected?  How long is the minimum
delay between hibernation data writing and the BIOS/ACPI power-down?
Comment 56 Eric Sandeen 2011-12-09 15:53:07 EST
There seems to be nothing which syncs the suspend snapshot after it is written; if things reside only in write cache and get powered off, it could corrupt the suspend file ... Drive caches could hold data indefinitely, certainly longer than the write/poweroff window.  At that point of course they stop holding it ;)
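As a user-space analogy only (the hibernation image is written by kernel code, which this does not model), the usual way to make sure data leaves volatile caches before power can drop is to write it and then fsync() it. `write_durably()` is a hypothetical helper; on Linux with write barriers enabled (the ext4 default), fsync() typically also sends a cache flush to the drive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and return only after fsync()
 * reports the data durable. Returns 0 on success, -1 on error. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Without this step the data may still sit in the page cache or in the
     * drive's volatile write cache when power is cut. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The point of the analogy is the ordering: nothing is considered safe until the flush has completed, which is the step comment 56 suspects is missing after the snapshot is written.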
Comment 57 Josh Boyer 2011-12-11 10:22:26 EST
*** Bug 766188 has been marked as a duplicate of this bug. ***
Comment 58 Josh Boyer 2011-12-12 07:36:00 EST
*** Bug 766462 has been marked as a duplicate of this bug. ***
Comment 59 Pekka Savola 2011-12-12 15:39:16 EST
Happens with me too when resuming from hibernate.
Disk access is usually hosed, e.g. opening PDFs with Firefox fails because it can't store them in /tmp. Yum upgrades cause lots of errors.
Comment 60 Eric Sandeen 2011-12-13 11:19:29 EST
Is anyone testing on a desktop, or are these all laptops with the problem?
Comment 61 Patrick Dubois 2011-12-13 12:13:35 EST
(In reply to comment #60)
> Is anyone testing on a desktop, or are these all laptops with the problem?

That's a very good question.  I don't have a desktop but testing in a vm could provide insight as well.
Comment 62 Eric Sandeen 2011-12-13 12:26:03 EST
I've not had luck in a VM either.  I wonder if slower disks in a laptop could be influencing behavior.

Ooh, I don't suppose everyone is running with laptop mode enabled?  Can anyone confirm that they saw the bug when laptop mode was -not- enabled?
Comment 63 Boricua 2011-12-13 12:39:12 EST
(In reply to comment #60)
> Is anyone testing on a desktop, or are these all laptops with the problem?

I have seen this bug in two desktops and one laptop. Same symptoms in all of them.
Comment 64 Eric Sandeen 2011-12-13 12:53:13 EST
Ok, thanks.  Perhaps I will try more desktops!
Comment 65 Boricua 2011-12-13 13:00:05 EST
No problem. I saw all problems disappear in all three by switching to the previous kernel.
Comment 66 Patrick Dubois 2011-12-13 14:57:11 EST
(In reply to comment #65)
> No problem. I saw all problems disappear in all three by switching to the
> previous kernel.

Boricua, which kernel would that be exactly?
Comment 67 Boricua 2011-12-13 17:00:15 EST
(In reply to comment #66)
> (In reply to comment #65)
> > No problem. I saw all problems disappear in all three by switching to the
> > previous kernel.
> 
> Boricua - which kernel would that be exactly ?

That would be kernel 3.1.2-1.fc16.x86_64. I'm getting equally good results with current kernel 3.1.5-1.fc16.x86_64.  The troubling kernel is 3.1.4-1.fc16.x86_64.
Comment 68 Eric Sandeen 2011-12-14 13:07:15 EST
On a lark - 

Could someone who can reproduce quite reliably try this?

# echo 0 > /sys/power/pm_async

(as root)

Then try the hibernate and see if things go better...
Comment 69 Josh Boyer 2011-12-16 10:47:14 EST
*** Bug 768412 has been marked as a duplicate of this bug. ***
Comment 70 Eric Sandeen 2011-12-16 11:03:11 EST
Seems everyone but me can hit this.  Could everyone please shoot me an email about the type of hardware you're using, and/or your smolt profile?  If it's a laptop, laptop model and total memory probably suffice; for desktops, maybe describe CPU, memory, and disk.  Whether or not you're using LVM and/or dm-crypt might also be useful.  I'll compile the answers and put them in one attachment for posterity, but I want to see if there's a pattern here, and maybe find some hardware I can use to reproduce!

Again, the WARN_ON is a known bug, but the reason we went down that path in the first place (likely fs corruption) still has a mysterious root cause.

Thanks,
-Eric
Comment 71 Alex 2011-12-16 11:13:18 EST
I'm using laptop Thinkpad T420s. HW:
> lspci 
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 6 Series/C200 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM67 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
05:00.0 System peripheral: Ricoh Co Ltd MMC/SD Host Controller (rev 07)
0d:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)

> cat /proc/cpuinfo|grep 'model name'
model name      : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz

> free -mt
.....
Total:        3909
Comment 72 Alex 2011-12-16 11:15:29 EST
With "echo 0 > /sys/power/pm_async" I can't hibernate
Comment 73 Eric Sandeen 2011-12-16 11:26:08 EST
(In reply to comment #72)
> With "echo 0 > /sys/power/pm_async" I can't hibernate

Huh.  That's odd.  How does it fail?  I could hibernate here on my desktop with that change...  Oh well.
Comment 74 Alex 2011-12-16 11:33:00 EST
From dmesg, around hibernate time:

........
[24197.016226] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[24197.016461] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[24197.195052] PM: Marking nosave pages: 000000000009d000 - 0000000000100000
[24197.195057] PM: Marking nosave pages: 00000000da89f000 - 00000000dafff000
[24197.195086] PM: Marking nosave pages: 00000000db000000 - 0000000100000000
[24197.195579] PM: Basic memory bitmaps created
[24197.195581] PM: Syncing filesystems ... done.
[24197.460776] Freezing user space processes ... (elapsed 0.01 seconds) done.
[24197.471912] PM: Preallocating image memory... 
[24198.749520] Restarting tasks ... done.
[24198.750097] PM: Basic memory bitmaps freed
[24198.750103] video LNXVIDEO:00: Restoring backlight state
[24207.535461] eth0: no IPv6 routers present
[24217.702530] usb 1-1.4: new full-speed USB device number 5 using ehci_hcd
[24217.742843] iwlwifi 0000:03:00.0: L1 Disabled; Enabling L0S
[24217.742977] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[24217.851906] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[24218.190755] e1000e 0000:00:19.0: irq 50 for MSI/MSI-X
[24218.241825] e1000e 0000:00:19.0: irq 50 for MSI/MSI-X
[24218.242600] ADDRCONF(NETDEV_UP): eth0: link is not ready
[24218.707369] e1000e 0000:00:19.0: irq 50 for MSI/MSI-X
[24218.760108] e1000e 0000:00:19.0: irq 50 for MSI/MSI-X
.......

It seems it just restarted tasks without powering off.
Comment 75 Josh Boyer 2011-12-16 14:23:31 EST
*** Bug 768480 has been marked as a duplicate of this bug. ***
Comment 76 hsirig+redhat 2011-12-19 13:58:37 EST
Seen immediately after closing VMware Player.

rating: (null)
Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)
Comment 77 Josh Boyer 2011-12-21 14:38:13 EST
*** Bug 769664 has been marked as a duplicate of this bug. ***
Comment 78 graeme_moss 2011-12-21 16:21:28 EST
I always hibernate, but this occurred long after waking up from hibernation, and I haven't seen it before today.  I've had it several times today already.

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 79 Eric Sandeen 2011-12-21 16:26:51 EST
I switched to testing with dm-crypt here; still no luck.
Comment 80 lane.babuder 2011-12-25 10:58:27 EST
I logged in. Seriously, that was it; I had just booted up my Acer Aspire One (first generation so "Legacy"), and logged into my computer and it provided this kernel error.

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 81 lane.babuder 2011-12-25 11:00:25 EST
While I was reporting a bug, this bug showed up; not sure where the hell it came from.

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 82 Dennis Appelon Nielsen 2011-12-26 16:17:07 EST
The problem persists in the latest kernel release, 3.1.6-1.fc16.x86_64.
Comment 83 Nils Philippsen 2011-12-27 08:03:11 EST
(In reply to comment #82)
> The Problem persist in latest kernel release 3.1.6-1.fc16.x86_64

I can confirm this, here's my smolt profile:

http://www.smolts.org/client/show/pub_8a313c29-a5ec-48e5-8b60-3d1e2999a6be
Comment 84 Dennis Appelon Nielsen 2011-12-28 16:09:47 EST
Try booting your kernel with hpet=disable; it took care of the problem for me.

But I don't know what HPET is; I'll have to Google for an answer.
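For anyone trying this workaround: `hpet=disable` turns off the kernel's use of the High Precision Event Timer (HPET). A minimal Python sketch to confirm after a reboot that the parameter actually reached the running kernel, assuming you feed it the contents of `/proc/cmdline` (the helper name `has_kernel_param` is hypothetical, not an existing tool):

```python
def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole whitespace-separated token.

    `cmdline` is expected to be the contents of /proc/cmdline, e.g.
    "BOOT_IMAGE=/vmlinuz-3.1.6-1.fc16.x86_64 ro quiet hpet=disable".
    """
    # Comparing whole tokens avoids false positives such as "hpet=disable2".
    return param in cmdline.split()
```

Usage: `has_kernel_param(open("/proc/cmdline").read(), "hpet=disable")`.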
Comment 85 Dave Jones 2011-12-29 12:59:52 EST
*** Bug 770831 has been marked as a duplicate of this bug. ***
Comment 86 YANG Xudong 2012-01-01 21:31:04 EST
Recovered from hibernation and logged in. One second later, this shows up.

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 87 YANG Xudong 2012-01-01 21:37:18 EST
Oddly enough, libreport-plugin-bugzilla itself also crashed while sending this report. That's why the comment above provides so little data. Also, if it helps, the kernel crash happens repeatedly in intervals of a few minutes.
Comment 88 Eric Malloy 2012-01-02 01:54:07 EST
Coming back from hibernation, then running yum install.

rating: (null)
Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)
Comment 89 cazcazn 2012-01-02 09:28:40 EST
Opened Rhythmbox, clicked on the Last.fm plus arrow (under "Library"), clicked on "Pulp radio" and it crashed.

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 90 cazcazn 2012-01-02 09:29:17 EST
This time it happened whilst reporting an issue with Rhythmbox...

rating: (null)
Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)
Comment 91 Dave Jones 2012-01-03 11:05:13 EST
*** Bug 770754 has been marked as a duplicate of this bug. ***
Comment 92 Dave Jones 2012-01-03 11:07:02 EST
Bug 744275 contains a possibly useful data point: this might be caused (at least in some cases) by the i915 modesetting driver.
Comment 93 Nils Philippsen 2012-01-03 12:23:08 EST
(In reply to comment #92)
> That this might be being caused (at least in some cases) by the i915
> modesetting driver.

I've encountered this bug on three machines now, one with Intel graphics, but two without.
Comment 94 Nils Philippsen 2012-01-03 12:24:50 EST
(In reply to comment #93)
> ... but two without.

Both have Radeon graphics adapters FWIW. Shouldn't submit so quickly.
Comment 95 Dennis Appelon Nielsen 2012-01-04 15:52:16 EST
The latest kernel-3.1.7-1.fc16 has solved my problem with not resuming from suspend.

Thanks to Josh Boyer and everyone else who helped me find the root of the sudden resume problem :-)

I'll be sure to give it lots of positive karma +1
Comment 96 Andrew Duggan 2012-01-04 16:53:25 EST
kernel-3.1.7-1.fc16 does NOT fix this for me.  After resume from hibernate with 3.1.7-1.fc16 I still get the same symptoms as before.

[  993.685751] ------------[ cut here ]------------
[  993.685769] WARNING: at fs/inode.c:884 unlock_new_inode+0x42/0x50()
[  993.685774] Hardware name: MM061
[a_codec mac8]211 snd_hwdep dell_laptop snd_seq snd_seq_device snd_pcm cfg80211 bcma iTCO_wdt rfkill iTCO_vendor_suppore_snd_timer snd r852 b44 sm_common nand nand_ids dell_wmi nand_ecc ssb r592 uinput i2c_i801 soundcore mii dcdbas mtd spahde_keymap joydev snd_page_alloc microcode memstick firewire_ohci sdhci_pci firewire_core sdhci crc_itu_t mmc_core wmi it 5 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[  993.685886] Pid: 1322, comm: gconfd-2 Not tainted 3.1.7-1.fc16.i686.PAE #1
[  993.685890] Call Trace:
[  993.685900]  [<c0920d5f>] ? printk+0x2d/0x2f
[  993.685909]  [<c045b4a2>] warn_slowpath_common+0x72/0xa0
[  993.685917]  [<c054eb52>] ? unlock_new_inode+0x42/0x50
[  993.685924]  [<c054eb52>] ? unlock_new_inode+0x42/0x50
[  993.685930]  [<c045b4f2>] warn_slowpath_null+0x22/0x30
[  993.685937]  [<c054eb52>] unlock_new_inode+0x42/0x50
[  993.685946]  [<c059f59a>] ext4_new_inode+0xa3a/0xf30
[  993.685954]  [<c05d9293>] ? jbd2__journal_start+0xb3/0xf0
[  993.685962]  [<c05aa841>] ext4_create+0xb1/0x120
[  993.685969]  [<c0544a9b>] vfs_create+0x9b/0x100
[  993.685975]  [<c054688f>] do_last+0x5bf/0x820
[  993.685982]  [<c0546bd4>] path_openat+0xa4/0x350
[  993.686000]  [<c0552f0d>] ? mntput_no_expire+0x1d/0xe0
[  993.686004]  [<c0546f91>] do_filp_open+0x31/0x80
[  993.686008]  [<c0655b78>] ? strncpy_from_user+0x38/0x70
[  993.686027]  [<c055147c>] ? alloc_fd+0x3c/0xe0
[  993.686031]  [<c0538be6>] do_sys_open+0xe6/0x1b0
[  993.686035]  [<c0546012>] ? user_path_create+0x42/0x60
[  993.686039]  [<c0538cde>] sys_open+0x2e/0x40
[  993.686048]  [<c093159f>] sysenter_do_call+0x12/0x28
[  993.686057]  [<c0920000>] ? kvm_para_has_feature+0x2b/0x45
[  993.686065] ---[ end trace da4de3c1d4fb843c ]---
Comment 97 Eric Sandeen 2012-01-04 18:07:47 EST
Er, unless we get confirmation from a few more of the bazillion people cc'd on this bug let's not set it as MODIFIED ;)

I do have a fix for the warning itself, but the root cause is likely a corruption that I want to know about, and the warning is helpful for that, so I've not pushed the warning fix just yet, sorry.

My kingdom for a reproducer.  :(
Comment 98 Josh Boyer 2012-01-04 20:16:58 EST
(In reply to comment #96)
> kernel-3.1.7-1.fc16 does NOT fix this for me.  After resume from hibernate with
> 3.1.7-1.fc16 I still get the same symptoms as before.

Yeah, I wouldn't expect 3.1.7-1.fc16 to fix this particular bug.  It resolves a different suspend/resume issue that prevented people from resuming at all.  This is clearly different.
Comment 99 Pekka Savola 2012-01-09 03:26:05 EST
On 3.1.6-1.fc16 I get this with laptop_mode=0 as well.
Comment 100 Josh Boyer 2012-01-09 09:23:05 EST
*** Bug 772471 has been marked as a duplicate of this bug. ***
Comment 101 Zahir Toufie 2012-01-10 18:18:33 EST
Getting this error too with kernel-2.6.41.4-1.fc15.x86_64, but only after updating these packages: dhcp-libs, dhcp-common, perl-Pod-Escapes, perl-libs, perl-Pod-Simple, perl-Module-Pluggable, perl, wine-core, wine-common, perl-Git, git, perl-Digest-SHA, perl-ExtUtils-ParseXS, perl-devel, perl-Test-Harness, perl-ExtUtils-MakeMaker, perl-CPAN, wine-courier-fonts, wine-marlett-fonts, wine-tahoma-fonts, wine-symbol-fonts, wine-systemd, wine-desktop, wine-small-fonts, wine-system-fonts, wine-ms-sans-serif-fonts, wine-fonts, gitk, dhclient, ffmpeg-libs, libical, jasper-libs, ethtool, mtr, espeak, polkit-qt, wine-openal, wine-alsa, wine-pulseaudio, wine-ldap, wine-cms, wine-twain, wine-wow, wine-capi, wine, jasper-libs. 

The update may or may not be related to the warnings.


Jan  4 23:01:14 zahir-acer kernel: [30896.458526] ------------[ cut here ]------------
Jan  4 23:01:14 zahir-acer kernel: [30896.458547] WARNING: at fs/inode.c:884 unlock_new_inode+0x34/0x59()
Jan  4 23:01:14 zahir-acer kernel: [30896.458552] Hardware name: Aspire 8950G
Jan  4 23:01:14 zahir-acer kernel: [30896.458556] Modules linked in: tcp_lp usb_storage uas tun hidp michael_mic arc4 ppdev parport_pc lp parport ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc cpufreq_ondemand acpi_cpufreq mperf rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack fuse btusb fglrx(P) i2c_i801 snd_hda_codec_hdmi microcode snd_hda_codec_realtek lib80211_crypt_tkip snd_hda_intel snd_seq snd_hda_codec snd_seq_device snd_hwdep snd_pcm wl(P) xhci_hcd uvcvideo snd_timer videodev snd lib80211 media v4l2_compat_ioctl32 soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support r8169 mii joydev acer_wmi sparse_keymap serio_raw bluetooth rfkill virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt wmi video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Jan  4 23:01:14 zahir-acer kernel: [30896.458707] Pid: 1395, comm: abrt-dump-oops Tainted: P        W   2.6.41.4-1.fc15.x86_64 #1
Jan  4 23:01:14 zahir-acer kernel: [30896.458713] Call Trace:
Jan  4 23:01:14 zahir-acer kernel: [30896.458726]  [<ffffffff81057a26>] warn_slowpath_common+0x83/0x9b
Jan  4 23:01:14 zahir-acer kernel: [30896.458734]  [<ffffffff81057a58>] warn_slowpath_null+0x1a/0x1c
Jan  4 23:01:14 zahir-acer kernel: [30896.458742]  [<ffffffff8113cbd4>] unlock_new_inode+0x34/0x59
Jan  4 23:01:14 zahir-acer kernel: [30896.458752]  [<ffffffff811a3cc4>] ext4_new_inode+0xc6d/0xd02
Jan  4 23:01:14 zahir-acer kernel: [30896.458761]  [<ffffffff811ad6aa>] ext4_mkdir+0x108/0x32c
Jan  4 23:01:14 zahir-acer kernel: [30896.458772]  [<ffffffff811347f1>] vfs_mkdir+0x5f/0x9b
Jan  4 23:01:14 zahir-acer kernel: [30896.458781]  [<ffffffff81135eef>] sys_mkdirat+0x6b/0xaa
Jan  4 23:01:14 zahir-acer kernel: [30896.458789]  [<ffffffff81135f46>] sys_mkdir+0x18/0x1a
Jan  4 23:01:14 zahir-acer kernel: [30896.458801]  [<ffffffff814a31c2>] system_call_fastpath+0x16/0x1b
Jan  4 23:01:14 zahir-acer kernel: [30896.458807] ---[ end trace 2d3bc05fb5254534 ]---
Comment 102 Josh Boyer 2012-01-13 16:08:25 EST
*** Bug 781618 has been marked as a duplicate of this bug. ***
Comment 103 Eric Sandeen 2012-01-13 16:40:27 EST
Can everyone who is experiencing this check to see if the 'sandbox' service is on?  Or, for whatever device is mounted on root, see if it is mounted elsewhere as well:

# grep sda9 /proc/mounts 
/dev/sda9 / ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sda9 /tmp ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sda9 /var/tmp ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sda9 /home ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0

I'd like to know 2 things:

1) Is anyone experiencing this WITHOUT the sandbox service active?
2) If you are experiencing this and sandbox is active, can you turn it off and see if it goes away?
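The mount check above can be scripted instead of grepping for one specific partition name. A minimal sketch, assuming the standard `/proc/mounts` layout of whitespace-separated `device mountpoint fstype options dump pass` fields; `multiply_mounted` is a hypothetical helper, not an existing tool:

```python
from collections import defaultdict


def multiply_mounted(mounts_text: str) -> dict:
    """Return {device: [mountpoints]} for block devices mounted more than once.

    `mounts_text` is the contents of /proc/mounts. Spaces inside mount
    points are escaped as \\040 there, so a plain split() is safe.
    """
    seen = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mountpoint = fields[0], fields[1]
        # Only real block devices (including /dev/mapper/... LVM volumes);
        # pseudo-filesystems such as proc or tmpfs are ignored.
        if device.startswith("/dev/"):
            seen[device].append(mountpoint)
    return {dev: mps for dev, mps in seen.items() if len(mps) > 1}
```

Run it as `multiply_mounted(open("/proc/mounts").read())`; with the output quoted above it would report `/dev/sda9` mounted on `/`, `/tmp`, `/var/tmp` and `/home`.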

Thanks,
-Eric
Comment 104 Josh Boyer 2012-01-13 22:27:11 EST
*** Bug 781655 has been marked as a duplicate of this bug. ***
Comment 105 Josh Boyer 2012-01-13 22:27:19 EST
*** Bug 781656 has been marked as a duplicate of this bug. ***
Comment 106 Pierre-Antoine Roiron 2012-01-16 03:28:01 EST
Hi Eric,
I use a ThinkPad Edge laptop. I can reproduce the bug each time I wake it up from hibernation. The root filesystem (/dev/vg_thinkpad/lv_root) gets corrupted each time, and yum reports errors after the bug appears. When I shut down and boot again after the bug, the boot stops and offers to let me repair the filesystem in a terminal. Running the following fixes things:
# fsck.ext4 -y /dev/vg_thinkpad/lv_root

For the sandbox question, here's what I've done :
1 fresh boot with cleaned FS, kernel 3.1.9 (had in 3.1.7 too and maybe before)
2 # grep sda9 /proc/mounts gives no output
3 # service sandbox status -> sandbox is running
4 # service sandbox stop -> Stopping sandbox (via systemctl): [  OK  ]
5 hibernate
6 waking up
7 get the bug
8 # service sandbox status -> sandbox is running 
Shouldn't sandbox be still down?

I wish to help, so do not hesitate to post other questions, thanks for your work!
Comment 107 Pierre-Antoine Roiron 2012-01-16 03:31:32 EST
My smolt profile : http://www.smolts.org/client/show/pub_d71977cf-98d1-4e60-b997-d34eb7583d02
Comment 108 Pierre-Antoine Roiron 2012-01-16 03:46:45 EST
I made an error about my kernel version. I was sure I was running 3.1.7... I checked the dates and here's what happened:
When doing the sandbox test, I hibernated in 3.1.7, and when it woke up, the new 3.1.9 appeared and was booted by default. I did get the bug, but only once instead of 3 or 4 times in a row. I did have to fix the root FS.

Shutting down the laptop, booting 3.1.9, hibernating and waking up did not reproduce the bug. It seems fixed with 3.1.9?
Comment 109 Eric Sandeen 2012-01-16 12:47:59 EST
(In reply to comment #106)

> For the sandbox question, here's what I've done :
> 1 fresh boot with cleaned FS, kernel 3.1.9 (had in 3.1.7 too and maybe before)
> 2 # grep sda9 /proc/mounts gives no ansers
> 3 # service sandbox status -> sandbox is running
> 4 # service sandbox stop -> Stopping sandbox (via systemctl): [  OK  ]
> 5 hibernate
> 6 waking up
> 7 get the bug
> 8 # service sandbox status -> sandbox is running 
> Shouldn't sandbox be still down?

Rather than just stopping sandbox you might need to actually disable it entirely; whatever the systemd equivalent of "chkconfig --del sandbox" is.

And then the equivalent of "chkconfig --list sandbox" should show that it's not enabled.

It's a long shot but maybe worth testing.

-Eric
Comment 110 Marcela Mašláňová 2012-01-17 05:03:19 EST
Thank you. Switching off sandbox fixed it, I can hibernate again. I wonder why it was switched on by default.

My smolt profile:
http://www.smolts.org/client/show/pub_8808fc12-c68b-4b59-ac87-c2f78b9ed6e6
I have an SSD.
Comment 111 Nils Philippsen 2012-01-17 09:34:05 EST
I'll try this out with kernel-3.1.9-1.fc16 and sandbox disabled (which was active on one machine, but seemed inactive on another).
Comment 112 Patrick Dubois 2012-01-17 10:58:54 EST
New behaviour for me with kernel 3.1.9-1 and sandbox disabled : 

--
[10782.856854] iwlwifi 0000:04:00.0: L1 Enabled; Disabling L0S
[10782.859868] iwlwifi 0000:04:00.0: Radio type=0x1-0x2-0x0
[10782.900576] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[10782.965978] tg3 0000:08:00.0: irq 50 for MSI/MSI-X
[10783.011227] EXT4-fs error (device dm-0): ext4_mb_generate_buddy:738: group 133, 22390 blocks in bitmap, 22382 in gd
[10783.011242] JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[10783.069304] ADDRCONF(NETDEV_UP): p5p1: link is not ready
[10784.775855] tg3 0000:08:00.0: p5p1: Link is up at 100 Mbps, full duplex
[10784.775859] tg3 0000:08:00.0: p5p1: Flow control is off for TX and off for RX
[10784.776278] ADDRCONF(NETDEV_CHANGE): p5p1: link becomes ready
[10796.884450] EXT4-fs error (device dm-0) in ext4_new_inode:1077: IO failure
--
dm-0 is my root filesystem, LV but not encrypted. 

My /home is LV and encrypted but problem always seemed to be associated with '/'.

Hardware is a Dell Studio XPS; the problem occurs within 1 minute of returning from hibernation.
Comment 113 Eric Sandeen 2012-01-17 11:15:14 EST
(In reply to comment #112)
> New behaviour for me with kernel 3.1.9-1 and sandbox disabled : 

hrm, conflicting reports of whether sandbox matters.  It might be making it more or less likely but not 100% ...

> [10783.011227] EXT4-fs error (device dm-0): ext4_mb_generate_buddy:738: group
> 133, 22390 blocks in bitmap, 22382 in gd
> [10783.011242] JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0).
> There's a risk of filesystem corruption in case of system crash.

ugh.  -maybe- latent on-disk corruption?

> [10796.884450] EXT4-fs error (device dm-0) in ext4_new_inode:1077: IO failure

"IO failure" has the same root cause as the previous WARNING: message, it's just more succinct now that the bitmap error is being more properly handled.

-Eric
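For readers puzzling over the `ext4_mb_generate_buddy` message above: roughly, it is printed when the number of free clusters counted in a block group's on-disk bitmap disagrees with the free count recorded in that group's descriptor ("gd"); 3.1 kernels word it in blocks, later ones in clusters. The following is a simplified Python model of that consistency check, not the kernel's code; the function names and toy bitmap are illustrative assumptions. ext4 bitmaps use little-endian bit order, so cluster *i* is bit `i % 8` of byte `i // 8`, and a set bit means "in use".

```python
from typing import Optional


def count_free(bitmap: bytes, clusters_in_group: int) -> int:
    """Count clear (free) bits in an ext4-style block bitmap."""
    free = 0
    for i in range(clusters_in_group):
        # Little-endian bit order: bit i lives in byte i // 8, position i % 8.
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            free += 1
    return free


def check_group(group: int, bitmap: bytes, clusters_in_group: int,
                gd_free: int) -> Optional[str]:
    """Return a kernel-style mismatch message, or None if bitmap and gd agree."""
    free = count_free(bitmap, clusters_in_group)
    if free != gd_free:
        return f"group {group}, {free} clusters in bitmap, {gd_free} in gd"
    return None
```

A non-None result means the two on-disk records of free space disagree, which is why this message points at bitmap corruption rather than at the device itself.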
Comment 114 Sebastian Trebitz 2012-01-18 02:13:59 EST
Could it be possible that it boils down to whether you use LVM or not?

I had this issue on my machine running Fedora 16 (installed as F16-beta) using LVM on top of an encrypted volume (root-fs and swap were logical volumes). I faced this issue with all kernels mentioned above until 3.1.7-1.fc16.x86_64.

Then I reinstalled Fedora 16 on the same machine without using LVM but encrypted volumes; one for swap and one for the root-fs.

I hibernated the system without facing this issue any more with the following kernel releases (sandbox enabled - default install):

3.1.7-1.fc16.x86_64
3.1.8-2.fc16.x86_64
3.1.9-1.fc16.x86_64

-Sebastian
Comment 115 Pekka Savola 2012-01-18 02:35:53 EST
FWIW, I'm not using sandbox but I do use LVM (root fs). I saw the hibernate issue in 3.1.5-6 era, but I haven't seen it recently. I've not thoroughly tested every release. There might be multiple issues here and I suspect at least one case has been resolved recently.
Comment 116 Pierre-Antoine Roiron 2012-01-18 03:15:59 EST
Since 3.1.9, I can't reproduce it anymore. But I lost part of my media keys on the keyboard... but that's another problem.
Comment 117 Marcela Mašláňová 2012-01-18 03:33:06 EST
(In reply to comment #110)
> Thank you. Switching off sandbox fixed it, I can hibernate again. I wonder why
> it was switched on by default.
> 
> My smolt profile:
> http://www.smolts.org/client/show/pub_8808fc12-c68b-4b59-ac87-c2f78b9ed6e6
> I have ssd disc.

I spoke too soon. The second hibernation failed, and after reboot abrt caught the same warning as usual.

I also have LVM.
Comment 118 Patrick Dubois 2012-01-18 11:04:37 EST
(In reply to comment #113)
> (In reply to comment #112)
> > New behaviour for me with kernel 3.1.9-1 and sandbox disabled : 
> 
> hrm, conflicting reports of whether sandbox matters.  It might be making it
> more or less likely but not 100% ...
> 
> > [10783.011227] EXT4-fs error (device dm-0): ext4_mb_generate_buddy:738: group
> > 133, 22390 blocks in bitmap, 22382 in gd
> > [10783.011242] JBD: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0).
> > There's a risk of filesystem corruption in case of system crash.
> 
> ugh.  -maybe- latent on-disk corruption?

That's not impossible.  I'll try a fresh install and restore from backups for further tests.  That should exclude latent corruption.
> 
> > [10796.884450] EXT4-fs error (device dm-0) in ext4_new_inode:1077: IO failure
> 
> "IO failure" has the same root cause as the previous WARNING: message, it's
> just more succinct now that the bitmap error is being more properly handled.
> 
> -Eric

Yup, included just to be complete.
Comment 119 Josh Boyer 2012-01-18 21:02:53 EST
*** Bug 782996 has been marked as a duplicate of this bug. ***
Comment 120 Vox 2012-01-21 21:18:59 EST
Installing Software Updates

Package: kernel
OS Release: Fedora release 16 (Verne)
Comment 121 Josh Boyer 2012-01-23 10:54:17 EST
*** Bug 783711 has been marked as a duplicate of this bug. ***
Comment 122 Nils Philippsen 2012-01-25 17:44:23 EST
Reproduced with 3.2.1-3.fc16, sandbox service disabled, after resuming from hibernation:

[37523.131117] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[37523.143547] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[37556.169794] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 65, 142 clusters in bitmap, 103 in gd
[37576.680227] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[37576.692211] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[37655.926148] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 33, 1207 clusters in bitmap, 885 in gd
Comment 123 Dave Jones 2012-01-30 13:46:21 EST
*** Bug 781277 has been marked as a duplicate of this bug. ***
Comment 124 Vox 2012-02-02 04:28:13 EST
Pops up while updating software.

Package: kernel
OS Release: Fedora release 16 (Verne)
Comment 125 Vox 2012-02-02 04:31:43 EST
Error occurred after logging out and logging back in.

Package: kernel
OS Release: Fedora release 16 (Verne)
Comment 126 Jacek Pawlyta 2012-02-17 07:36:27 EST
*** Bug 772174 has been marked as a duplicate of this bug. ***
Comment 127 Mike Eddy 2012-02-21 10:42:31 EST
uname -a
Linux <hostname>.com 2.6.42.3-2.fc15.x86_64 #1 SMP Thu Feb 9 01:42:06 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Motherboard - ASUS P8H67-M PRO/CSM  LGA 1155 Corporate Stable Model Intel H67 DDR3 1333
Processor - Intel Quad Core. i5-2500K Processor (6M Cache, 3.30 GHz) LGA1155
Memory - Mushkin Enhanced Silverline 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10666) Model 996770
Hard Drive 2, sdb (data) - Western Digital WD20EARS 2Tb SATA

# no LVM

# system running normally ... no hibernation

Feb 20 00:36:49 <hostname> kernel: [397424.938767] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:739: group 593, 6144 clusters in bitmap, 32768 in gd
Feb 20 00:37:05 <hostname> kernel: [397441.059963] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:739: group 594, 6144 clusters in bitmap, 32768 in gd
Feb 20 00:37:21 <hostname> kernel: [397457.161828] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:739: group 595, 6144 clusters in bitmap, 32768 in gd
.
.
.
Feb 20 00:38:35 <hostname> kernel: [397531.429534] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:739: group 621, 0 clusters in bitmap, 32768 in gd


# a day later

Feb 21 00:37:55 <hostname> kernel: [483820.904300] EXT4-fs (sdb1): error count: 28
Feb 21 00:37:55 <hostname> kernel: [483820.904303] EXT4-fs (sdb1): initial error at 1329716209: ext4_mb_generate_buddy:739
Feb 21 00:37:55 <hostname> kernel: [483820.904305] EXT4-fs (sdb1): last error at 1329716315: ext4_mb_generate_buddy:739
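The "initial error at" and "last error at" values in this summary are Unix epoch seconds. A quick Python conversion, assuming (as an assumption) that the machine's syslog clock was US Eastern standard time, UTC-5, shows they line up with the Feb 20 burst of errors above:

```python
from datetime import datetime, timedelta, timezone


def ext4_error_time(epoch_seconds: int, utc_offset_hours: int = 0) -> datetime:
    """Convert an ext4 error timestamp (Unix seconds) to an aware datetime.

    utc_offset_hours is an assumption about the local zone of the syslog
    being compared against; -5 corresponds to US Eastern standard time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


print(ext4_error_time(1329716209, -5))  # 2012-02-20 00:36:49-05:00
print(ext4_error_time(1329716315, -5))  # 2012-02-20 00:38:35-05:00
```

With the offset left at 0, the same values come out as 05:36:49 and 05:38:35 UTC.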
Comment 128 Josh Boyer 2012-03-05 10:39:03 EST
*** Bug 799715 has been marked as a duplicate of this bug. ***
Comment 129 Dave Jones 2012-03-22 13:15:07 EDT
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
Comment 132 Jacek Pawlyta 2012-03-26 13:27:34 EDT
Unfortunately kernel-3.3.0-4.fc16 is no better; please find attached an extract from my message.log.
* I hibernated (pm-suspend-hybrid) on low power
* resumed, and after a while
* started a kernel update with yum from local files.

The X server disappeared after a while and all the local consoles became unreachable; I had to log in to the system from an external machine and reboot it.

The message log has tons of ext4 error messages.
Comment 133 Jacek Pawlyta 2012-03-26 13:30:44 EDT
Created attachment 572819 [details]
message.log extract kernel-3.3.0-4, x86_64 sandbox on
Comment 134 Josh Boyer 2012-03-28 14:02:37 EDT
[Mass hibernate bug update]

Dave Airlied has found an issue causing some corruption in the i915 fbdev after a resume from hibernate.  I have included his patch in this scratch build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3940545

This will probably not solve all of the issues being tracked at the moment, but it is worth testing when the build completes.  If this seems to clear up the issues you see with hibernate, please report your results in the bug.
Comment 135 Pierre-Antoine Roiron 2012-03-29 03:52:08 EDT
Thanks Josh, I'll report any remaining bugs once the new build hits the repos.
Comment 136 Jacek Pawlyta 2012-03-29 04:26:02 EDT
I installed kernel 3.3.0-7.1.fc16.x86_64
I use i915 in my laptop.

First hibernation was OK, and after the second one I got this:

[ 1991.321660] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 228, 25399 clusters in bitmap, 25398 in gd
[ 1991.321678] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 1991.422508] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 226, 18328 clusters in bitmap, 18327 in gd
[ 1991.422525] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 2001.554890] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
Comment 137 Marcela Mašláňová 2012-03-29 07:38:11 EDT
I have 3.3.0-4.fc16.x86_64 and it's okay. I tried hibernating 4 times, with and without the stap script posted above, and didn't get any errors or crashes.

I have lvm, dm-crypt and my smolt profile is:
http://www.smolts.org/client/show/pub_8808fc12-c68b-4b59-ac87-c2f78b9ed6e6
Comment 138 Josh Boyer 2012-03-29 07:41:05 EDT
(In reply to comment #135)
> Thanks Josh, I'll report any remaining bug once the new build will hit the
> repos.

It won't hit any repos.  It's a scratch-build, so you need to download it from the link provided.
Comment 139 Jacek Pawlyta 2012-03-29 07:57:28 EDT
(In reply to comment #136)
the smolt profile:
http://www.smolts.org/client/show/pub_503fa95d-8ef3-46f5-b1ae-014b47bd254d
Comment 140 Pierre-Antoine Roiron 2012-03-29 09:04:37 EDT
Thanks Josh. I'm not very familiar with all that. It's installed, time will tell.
Comment 141 Eric Sandeen 2012-03-29 10:55:09 EDT
(In reply to comment #136)

> [ 1991.321660] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group
> 228, 25399 clusters in bitmap, 25398 in gd
> [ 1991.321678] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0).
> There's a risk of filesystem corruption in case of system crash.
> [ 1991.422508] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group
> 226, 18328 clusters in bitmap, 18327 in gd
> [ 1991.422525] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0).
> There's a risk of filesystem corruption in case of system crash.
> [ 2001.554890] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure

Crud, looks like the same sorts of problems.
Comment 142 Pierre-Antoine Roiron 2012-03-30 03:18:01 EDT
Four successful hibernations, then this morning, while waking up from hibernation (the laptop, not me), a few lines appeared reporting a node problem on an ext4 filesystem. I can't find where these lines were logged: not in pm-suspend.log, not in messages, not in boot.log... Where should I look? Thanks for any help.
Comment 143 Pierre-Antoine Roiron 2012-03-30 03:19:28 EDT
I forgot to mention that I use kernel 3.3.0-7.1.fc16.x86_64 from Josh Boyer's link.
Comment 144 Jacek Pawlyta 2012-04-02 14:47:59 EDT
Kernel 3.3.0-8
After resuming from hibernation I got the following, plus corrupted fonts in GTK apps (I use KDE). Maybe there really is something wrong with i915?

Is there any way not to use i915 and then try to hibernate? 


[153870.568870] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153879.234380] generic-bluetooth xxxxxxx: unknown main item tag 0x0
[153879.234508] input: Logitech Bluetooth Mouse M555b as /devices/pci0000:00/0000:00:1d.1/usb7/7-2/7-2:1.0/bluetooth/hci0/hci0:12/input24
[153879.235635] generic-bluetooth xxxxxxx: input,hidraw0: BLUETOOTH HID v4.16 Mouse [Logitech Bluetooth Mouse M555b] on xxxxxx
[153882.719235] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 228, 26894 clusters in bitmap, 26878 in gd
[153882.719254] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[153884.704323] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.488597] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.569640] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.652351] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.700876] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.750978] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.799361] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.907135] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153949.956060] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[153950.006030] EXT4-fs error (device dm-1) in ext4_new_inode:941: IO failure
[154022.348634] EXT4-fs error (device dm-1): ext4_free_inode:289: comm chrome: bit already cleared for inode 1840003
[154026.974883] EXT4-fs (dm-1): pa ffff88007fca1208: logic 9, phys. 7439388, len 55
[154026.974889] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3655: group 227, free 55, pa_free 28
[154026.974961] EXT4-fs error (device dm-1): ext4_free_inode:289: comm Chrome_HistoryT: bit already cleared for inode 1843736
[154026.975099] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[154026.976936] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[154027.096450] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 229, block 7520914:freeing already freed block (bit 17042)
[154027.096460] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 229, block 7520915:freeing already freed block (bit 17043)
[154027.096466] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 229, block 7520918:freeing already freed block (bit 17046)
[154027.096471] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 229, block 7520919:freeing already freed block (bit 17047)
[154027.096475] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 229, block 7520920:freeing already freed block (bit 17048)
[154027.096481] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 228, block 7487734:freeing already freed block (bit 16630)
[154027.096485] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 228, block 7487735:freeing already freed block (bit 16631)
[154027.096490] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 228, block 7487736:freeing already freed block (bit 16632)
[154027.096495] EXT4-fs error (device dm-1): mb_free_blocks:1348: group 228, block 7487737:freeing already freed block (bit 16633)
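With this many errors it helps to tally them by reporting function instead of reading line by line. A minimal Python sketch, assuming dmesg or messages text as input; the regular expression covers only the two message forms seen in this bug (`EXT4-fs error (device X): func:line:` and `EXT4-fs error (device X) in func:line:`) and is not a complete parser for every ext4 message:

```python
import re
from collections import Counter

# Matches "EXT4-fs error (device dm-1): func:123: ..." and
# "EXT4-fs error (device dm-1) in func:123: ..." as seen in this bug.
EXT4_ERR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\)(?::| in) (?P<func>\w+):\d+"
)


def summarize_ext4_errors(log_text: str) -> Counter:
    """Count EXT4-fs error lines per (device, function) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EXT4_ERR.search(line)
        if m:
            counts[(m.group("dev"), m.group("func"))] += 1
    return counts
```

Run over the log above, it would group the errors on `dm-1` under `ext4_new_inode`, `ext4_mb_generate_buddy`, `ext4_free_inode`, `ext4_mb_release_inode_pa` and `mb_free_blocks`; lines such as the JBD2 warnings are not counted.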
Comment 145 Jacek Pawlyta 2012-04-05 04:16:37 EDT
kernel-3.3.1-2: the same symptoms as with Josh's 3.3.0-7. The first hibernation was OK, and the second one caused a mass disaster in the filesystem.
Comment 146 Jacek Pawlyta 2012-04-12 13:29:34 EDT
kernel-3.3.1-5.fc16 the problem is still here.

I just have a question about swap. I have upgraded RAM from 3 to 6 GB and my swap size wasn't increased, maybe this gives the problem?
Comment 147 Jacek Pawlyta 2012-04-17 06:46:01 EDT
kernel 3.4.0-0.rc3.git0.1.fc18.x86_64

after second hibernation (first on a fresh reboot was OK)

EXT4-fs error (device dm-1) in ext4_new_inode:895: IO failure
[ 2221.576527] EXT4-fs error (device dm-1) in ext4_new_inode:895: IO failure
[ 2223.880090] EXT4-fs error (device dm-1) in ext4_new_inode:895: IO failure
[ 2224.397003] EXT4-fs error (device dm-1) in ext4_new_inode:895: IO failure
[ 2235.593121] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:741: group 276, 14392 clusters in bitmap, 14378 in gd
[ 2235.593143] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Comment 148 Bojan Smojver 2012-04-24 17:31:34 EDT
(In reply to comment #146)
> kernel-3.3.1-5.fc16 the problem is still here.
> 
> I just have a question about swap. I have upgraded RAM from 3 to 6 GB and my
> swap size wasn't increased, maybe this gives the problem?

Should not. If you don't have enough swap to write hibernation image, the hibernation will simply fail.
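To sanity-check this before hibernating, one rough approach is to compare free swap with the kernel's target image size. A minimal Python sketch, assuming the inputs come from `/proc/meminfo` (values in kB) and `/sys/power/image_size` (a byte count). The kernel treats `image_size` as an upper limit it tries to honour and falls back to the smallest image it can make, so this is a heuristic, not a guarantee; the helper names are hypothetical:

```python
def meminfo_kb(meminfo_text: str, key: str) -> int:
    """Return the kB value for `key` (e.g. 'SwapFree') from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0])  # /proc/meminfo reports these in kB
    raise KeyError(key)


def swap_fits_image(meminfo_text: str, image_size_bytes: int) -> bool:
    """Rough check: is free swap at least the target hibernation image size?"""
    return meminfo_kb(meminfo_text, "SwapFree") * 1024 >= image_size_bytes
```

Usage: `swap_fits_image(open("/proc/meminfo").read(), int(open("/sys/power/image_size").read()))`.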
Comment 149 Bojan Smojver 2012-04-24 17:49:24 EDT
That kernel bug I pointed to may be similar to comment #147.
Comment 150 Jacek Pawlyta 2012-04-27 14:33:33 EDT
kernel-3.4.0-0.rc4.git2.1.fc18: six hibernation/resume cycles and our bug was not visible.

kernel-3.4.0-0.rc4.git3.1.fc18 two hibernation/resume cycles - so far so good.
Comment 151 Jacek Pawlyta 2012-05-02 04:55:42 EDT
Sorry to inform you, but eventually I had the same symptoms on kernel-3.4.0-0.rc4.git3.1.fc18 built from src.rpm --with release --with baseonly :(

I also built kernel-3.4.0-0.rc5.git2.1 the same way and have the same problem after the second hibernate/resume cycle:

EXT4-fs error (device dm-1): ext4_mb_generate_buddy:741: group 264, 17705 clusters in bitmap, 17689 in gd
[ 2036.746659] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 2036.748046] EXT4-fs error (device dm-1) in ext4_new_inode:895: IO failure
[ 2042.143050] usbcore: registered new interface driver btusb
[ 2044.560534] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:741: group 261, 11251 clusters in bitmap, 11250 in gd
[ 2044.560554] JBD2: Spotted dirty metadata buffer (dev = dm-1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Comment 152 Dave Jones 2012-05-07 14:51:36 EDT
*** Bug 748008 has been marked as a duplicate of this bug. ***
Comment 153 Dave Jones 2012-05-14 14:10:45 EDT
I've noticed we've stopped getting new reports of the original bug, so I'm wondering if the problem Jacek (and anyone else) is seeing is unrelated.
The original reports were caused by memory corruption, but these other reports may actually be genuine I/O errors.
Comment 154 Pekka Savola 2012-05-14 14:30:28 EDT
FWIW, it has been months since I've seen an oops, but I still usually get some errors like this one (the symptoms are similar; I have to reboot and fsck, either automatically or manually):

EXT4-fs error (device dm-0): mb_free_blocks:1348: group 447, block 14657679: freeing already freed block (bit 10383)

This is with 3.3.5-2.fc16.i686
Comment 155 Eric Sandeen 2012-05-14 15:00:19 EDT
So, bitmap corruption, for some as yet unknown reason...
Comment 156 Jonas Thiem 2012-05-14 15:16:19 EDT
> I've noticed we've stopped getting new reports of the original bug

Brand-new ThinkPad, Fedora 17, kernel 3.3.4-5.fc17.x86_64. The system was freshly installed, and the hardware is not second-hand or anything.

I got the "EXT4-fs error: ext4_mb_generate_buddy" message and subsequent I/O errors immediately after my first and only hibernation, with parts of / going read-only until the next reboot. I really doubt it is a hardware problem.

I'm not sure whether that is actually the original bug in this report, but if it is, it definitely still occurs.
Comment 157 Dave Jones 2012-05-14 15:40:05 EDT
The original report was the "WARNING: at fs/inode.c:884 unlock_new_inode" trace as shown in comment 1.

That seems to be fixed now.

Given that there seem to be multiple other issues in this report, and that it's getting hard to follow with over 150 comments, I suggest filing new bugs if you're still seeing ext*-related problems; that way we can get a better picture of what is actually still a problem.
Comment 158 Jacek Pawlyta 2012-05-16 05:18:27 EDT
A new bug has been filed here: https://bugzilla.redhat.com/show_bug.cgi?id=822071