Created attachment 458138 [details]
script to mkfs and mount in parallel

Description of problem:
Running instances of mkfs.btrfs/mkfs.xfs on 24 disks and mounting the resultant file systems in parallel blows up the kernel.

Version-Release number of selected component (if applicable):
kernel-2.6.35.6-48.fc14.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run python format.py as root.

Actual results:
Kernel explodes.

Expected results:
Formatting and mounting should work.

Additional info:
Latest 2.6.32 in FC12 also has this issue. I'm still trying to get a kdump, but I'm not sure if the kernel crashes, so kdump might not be running.
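The attached format.py is not reproduced in this report, so the following is only a minimal sketch of what "mkfs and mount in parallel" could look like. The device names, mount points, and helper names are hypothetical, and the sketch defaults to a dry run that only builds the command lines without touching any disk.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(device, mountpoint, fstype="btrfs"):
    """Return the mkfs and mount argv lists for one disk."""
    mkfs = ["mkfs." + fstype, device]
    mount = ["mount", "-t", fstype, device, mountpoint]
    return [mkfs, mount]


def format_and_mount(device, mountpoint, fstype="btrfs", dry_run=True):
    """Format one disk and mount it; with dry_run=True nothing is executed."""
    commands = build_commands(device, mountpoint, fstype)
    if not dry_run:
        for argv in commands:
            subprocess.check_call(argv)
    return commands


def run_parallel(jobs, fstype="btrfs", dry_run=True):
    """Run one worker per disk so that all mkfs/mount calls overlap."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [
            pool.submit(format_and_mount, dev, mnt, fstype, dry_run)
            for dev, mnt in jobs
        ]
        return [f.result() for f in futures]


# Hypothetical devices and mount points; the real script used 24 disks.
jobs = [("/dev/sd" + c, "/mnt/disk_" + c) for c in "bcd"]
plan = run_parallel(jobs)  # dry run: returns the planned commands only
```

Passing dry_run=False would actually run the commands and destroy data on the listed devices, so that only makes sense as root on a scratch machine.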
One crash just looked like this:

Nov 5 17:18:38 kernel: [ 625.742226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
Nov 5 17:18:38 kernel: [ 625.757777] IP: [<ffffffffa0579010>] btrfs_test_super+0x10/0x26 [btrfs]
Nov 5 17:18:38 kernel: [ 625.766032] PGD 0
Nov 5 17:18:38 kernel: [ 625.775730] Oops: 0000 [#1] SMP
Nov 5 17:18:38 kernel: [ 625.780134] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host8/port-8:0/expander-8:0/port-8:0:21/end_device-8:0:21/target8:0:21/8:0:21:0/block/sdw/dev
Nov 5 17:18:38 kernel: [ 625.800616] CPU 12
Nov 5 17:18:38 kernel: [ 625.803354] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs nls_utf8 hfsplus hfs vfat fat ext2 ipv6 mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb i7core_edac i2c_i801 i2c_core ses iTCO_wdt joydev edac_core serio_raw iTCO_vendor_support ioatdma enclosure dca microcode usb_storage mptsas mptscsih mptbase scsi_transport_sas [last unloaded: scsi_wait_scan]
Nov 5 17:18:38 kernel: [ 625.867024]
Nov 5 17:18:38 kernel: [ 625.867952] Pid: 2801, comm: mount Not tainted 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270 /SUN BLADE X6270 SERVER MODULE
Nov 5 17:18:38 kernel: [ 625.895335] RIP: 0010:[<ffffffffa0579010>] [<ffffffffa0579010>] btrfs_test_super+0x10/0x26 [btrfs]
Nov 5 17:18:38 kernel: [ 625.903367] RSP: 0018:ffff88036e911cd8 EFLAGS: 00010287
Nov 5 17:18:38 kernel: [ 625.918565] RAX: 0000000000000000 RBX: ffff880376b3e000 RCX: ffff8801f5d7d140
Nov 5 17:18:38 kernel: [ 625.925409] RDX: 0000000000000788 RSI: ffff8801f5d7d140 RDI: ffff880376b3e000
Nov 5 17:18:38 kernel: [ 625.941752] RBP: ffff88036e911cd8 R08: ffff8801f5d7d1b8 R09: 0000000000000002
Nov 5 17:18:38 kernel: [ 625.960929] R10: ffff88036e911b68 R11: ffff8801f5d7d140 R12: ffffffffa05d03d0
Nov 5 17:18:38 kernel: [ 625.969841] R13: ffffffffa0579000 RNov 5 17:29:34 kernel: imklog 4.6.3, log source = /proc/kmsg started.

Crash happens with at least mkfs.btrfs AND mkfs.xfs.
Created attachment 458145 [details] Stacktrace soon after crash
Created attachment 458601 [details] Crash immediately after running format.py
Created attachment 458602 [details] More crashing one minute later
Seems this only happens when some btrfs volumes are involved. My script wasn't properly reformatting all the disks as XFS because mkfs.xfs didn't like my labels.
Created attachment 459463 [details] Patch - fix race in btrfs_get_sb() Attempt to fix a race when obtaining the super block during rapid mounting.
I'm not sure if this will fix the problem here but could you give a kernel with the above patch a try please. You can find one at: http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261.1
(In reply to comment #6) > Created attachment 459463 [details] > Patch - fix race in btrfs_get_sb() > > Attempt to fix a race when obtaining the super block during > rapid mounting. OK, the comment should say "may not be complete" not "may be complete". I'll fix that later.
Will test a bit later today.
(In reply to comment #4) > Created attachment 458602 [details] > More crashing one minute later It doesn't look like there should be a problem here. This code, in the VFS, is very frequently used so the problem might be caused by the previous invalid access trying to get a super block for the mount. We need to try this with the above patch, see if it helps getting a super block and then see what happens after that.
(In reply to comment #9) > Will test a bit later today. Great, whenever you get a chance is fine, thanks.
Created attachment 459776 [details] Crash dump Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64 No immediate crash anymore, but things aren't quite right yet: The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about 60 seconds and then the BUG: soft lockup stuff starts printing out.
(In reply to comment #12)
> Created attachment 459776 [details]
> Crash dump
>
> Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64
>
> No immediate crash anymore, but things aren't quite right yet:
>
> The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about
> 60 seconds and then the BUG: soft lockup stuff starts printing out.

Right, that looks ugly.
Looks like there is more than one problem with btrfs super block creation. I'll have a look at this too.
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 459776 [details]
> > Crash dump
> >
> > Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64
> >
> > No immediate crash anymore, but things aren't quite right yet:
> >
> > The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about
> > 60 seconds and then the BUG: soft lockup stuff starts printing out.
>
> Right, that looks ugly.
> Looks like there is more than one problem with btrfs super block
> creation. I have a look at this too.

I don't think the problem is nearly as simple as I originally thought.

After looking more closely, I don't think that a super block can get onto the list of file system super block instances without the btrfs root being set, so I don't think the problem can be happening at mount time, as I originally thought.

I see now that the btrfs root in a super block can be cleared before the super block is removed from this list during umount, so now I think that may be the problem.

All I can do is update my patch and build another test kernel so we can see if that is the problem.
Created attachment 459987 [details] Patch - fix race in btrfs_get_sb() (2nd attempt)
Can you give this one a try please: http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261.2
Created attachment 460058 [details]
2.6.35.6-52.bz650261.2.fc14.x86_64 crash dump

The instant crash is back with 2.6.35.6-52.bz650261.2.fc14.x86_64.

[ 187.394266] [<ffffffff81118df6>] sget+0x54/0x367
[ 187.408162] [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
[ 187.415878] [<ffffffffa016882d>] btrfs_get_sb+0x108/0x3eb [btrfs]
[ 187.428172] [<ffffffff810ffd5e>] ? alloc_pages_current+0xb2/0xc3
[ 187.432555] [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
[ 187.450176] [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
[ 187.456343] [<ffffffff8112e45a>] do_mount+0x700/0x75d
[ 187.468170] [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
[ 187.473033] [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
(In reply to comment #17)
> Created attachment 460058 [details]
> 2.6.35.6-52.bz650261.2.fc14.x86_64 crash dump
>
> The instant crash is back with 2.6.35.6-52.bz650261.2.fc14.x86_64.
>
> [ 187.394266] [<ffffffff81118df6>] sget+0x54/0x367
> [ 187.408162] [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> [ 187.415878] [<ffffffffa016882d>] btrfs_get_sb+0x108/0x3eb [btrfs]
> [ 187.428172] [<ffffffff810ffd5e>] ? alloc_pages_current+0xb2/0xc3
> [ 187.432555] [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> [ 187.450176] [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> [ 187.456343] [<ffffffff8112e45a>] do_mount+0x700/0x75d
> [ 187.468170] [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> [ 187.473033] [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b

Aaaah, every time I have to work through the super block handling in the VFS I have to re-work out what is going on. It's fairly complex.

The gotcha now is that, even if I'm right about the btrfs root not being set early enough, I can't use the method I previously tried. It violates lock ordering, and taking a mutex might sleep, so it can't be done while holding the list spin lock.

Oh well, I'll need to keep working on it. I'll get back when I have another kernel to test.

Ian
Heh, so I sat down and figured out the problem, came to put it in here, and then realized you had already figured out what the problem is. We need to use a btrfs_set_super instead of set_anon_super, have btrfs_set_super call set_anon_super, and set up the root in there first. Of course we don't have a root until we do the open_ctree, so for now I think the best approach is to just set up a skeleton root+fs_info and use those in open_ctree. I'll try and come up with something like that.
Created attachment 460563 [details] patch to fix the problem Possible fix, please let me know if this fixes the problem for you.
(In reply to comment #20) > Created attachment 460563 [details] > patch to fix the problem > > Possible fix, please let me know if this fixes the problem for you. Hello I don't have a build environment ready to go on the machine. Is there any chance you can build a kernel RPM like Ian did? Regards Albert
(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 460563 [details]
> > patch to fix the problem
> >
> > Possible fix, please let me know if this fixes the problem for you.
>
> Hello
>
> I don't have a build environment ready to go on the machine. Is there any
> chance you can build a kernel RPM like Ian did?

Let me have a look through Josef's patch, then I'll do a scratch build for you to test.

It might be worth waiting to see if Chris Mason has an opinion as well, so let's see if he replies to the message I posted to him (cc the btrfs list). It would probably be worth posting this patch in response to that list message, Josef.

It's late for me now anyway, so I'll get to this tomorrow.

Ian
Created attachment 460736 [details] Patch - fix error handling in btrfs_get_sb I think it's wise to include Josef's upstream error handling patch. It allows the subsequent setup blank root to apply a little more cleanly. This patch is included in 2.6.36 and will need to be dropped when f14 moves to 2.6.36. Can you see any problem with this, Josef?
Created attachment 460737 [details] Patch - setup blank root and fs_info for mount time (against f14 2.6.35) This is just Josef's patch "patch to fix the problem" against f14. I found a typo in btrfs_set_super(): sb needs to be s. I also removed the blank line.
Created attachment 460738 [details] Patch - fix compile error setup blank root As the title says, should be folded into your "setup blank root" patch.
Created attachment 460739 [details] Patch - fix memory leak on finding existing super Josef, please have a look at this and if you agree this is needed fold it into your "setup blank root" patch.
Created attachment 460740 [details] Patch - fix memory leak in close_ctree() Again, could you have a look at this and if you agree it is needed fold it into your "setup blank root" patch.
Created attachment 460741 [details] Patch - fix race between btrfs_get_sb() and umount I know you didn't think this was a problem but could you please have another look, I'm fairly sure it's a bug and think it's worth fixing.
Created attachment 460745 [details] Patch - fix lock order in blkdev_get and blkdev_put() I think this will fix the deadlock we see in the traces of processes waiting on the BKL. This patch isn't needed in 2.6.36 since the BKL usage has been removed from these functions.
(In reply to comment #21) > (In reply to comment #20) > > Created attachment 460563 [details] [details] > > patch to fix the problem > > > > Possible fix, please let me know if this fixes the problem for you. > > Hello > > I don't have a build environment ready to go on the machine. Is there any > chance you can build a kernel RPM like Ian did? Just for your info., these patches are not working yet. Ha, what have I broken, ;) Ian
Comment on attachment 460740 [details] Patch - fix memory leak in close_ctree() Freeing the btrfs tree root is a much more sophisticated activity. Freeing it in btrfs_put_super() is just plain wrong. Dropping this patch.
Can you give this one a try please: http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261 Rapid mount and umount will very likely break the /etc/mtab locking. That has nothing to do with this problem and I'm not interested in hearing about it, so don't report it.
There were two kernel packages, so I tested 2.6.35.6-52.bz650261.4.fc14.x86_64.

It survived my tests. Thanks! :-)

I did notice warnings like:

[ 492.692031] Warning: dev (pts0) tty->count(6) != #fd's(5) in tty_release_dev
[ 527.287606] Warning: dev (pts0) tty->count(5) != #fd's(4) in tty_release_dev

but these are probably due to another bug.

Also, if you could point me to any discussions about the /etc/mtab locking problems you mentioned, that would be greatly appreciated. I'm guessing you're referring to this message and script on the linux-btrfs list from Li Zefan?

http://www.spinics.net/lists/linux-btrfs/msg06932.html

We're doing rapid mounting, but not really rapid unmounting. Can this cause problems?
(In reply to comment #33)
> There were two kernel packages, so I tested 2.6.35.6-52.bz650261.4.fc14.x86_64.
>
> It survived my tests. Thanks! :-)

Great, now to work out who to speak to to get the needed patches into a release kernel, mmm ....

> I did notice warnings like:
>
> [ 492.692031] Warning: dev (pts0) tty->count(6) != #fd's(5) in tty_release_dev
> [ 527.287606] Warning: dev (pts0) tty->count(5) != #fd's(4) in tty_release_dev
>
> but these are probably due to another bug.

Yep, doesn't look like something for us to concern ourselves with.

> Also, if you could point me to any discussions about the /etc/mtab locking
> problems you mentioned, that would be greatly appreciated. I'm guessing you're
> referring to this message and script on the linux-btrfs list from Li Zefan?
>
> http://www.spinics.net/lists/linux-btrfs/msg06932.html

I did use his suggestion to test and noticed the mtab locking was broken.

The reason I said this is because this has been a problem for many years and I'm not interested in working on it or discussing it any more, ever!

> We're doing rapid mounting, but not really rapid unmounting. Can this cause
> problems?

Yes.
All I can suggest is investigate symlinking /proc/mounts to /etc/mtab but take care.

Ian
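As a small aid for the symlink suggestion, the sketch below reports whether an mtab path is a regular file or a symlink. It assumes the suggestion means making /etc/mtab a symlink that points at /proc/mounts, which is my reading and not something the comment spells out. The helper name is hypothetical, and the demonstration works on a temporary directory rather than on the real /etc/mtab.

```python
import os
import tempfile


def mtab_status(mtab_path="/etc/mtab"):
    """Describe what mtab_path currently is, without modifying anything."""
    if os.path.islink(mtab_path):
        return "symlink -> " + os.readlink(mtab_path)
    if os.path.exists(mtab_path):
        return "regular file"
    return "missing"


# Demonstration on a scratch directory (assumed layout, not the real /etc).
with tempfile.TemporaryDirectory() as scratch:
    fake_mtab = os.path.join(scratch, "mtab")
    os.symlink("/proc/mounts", fake_mtab)
    status = mtab_status(fake_mtab)
```

Calling mtab_status() with no argument inspects the real /etc/mtab. If it reports a regular file, mount and umount are still rewriting that file and its locking is in play.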
*** Bug 656465 has been marked as a duplicate of this bug. ***
Can you make a release for rawhide kernels as well please?
(In reply to comment #36) > Can you make a release for rawhide kernels as well please? This isn't a Rawhide bug. It's difficult enough to keep track of the patches here for f14 but adding another series for Rawhide as well would make this bug a complete mess. Log a bug against Rawhide and I should be able to post a patch series against the Rawhide kernel and build a scratch build for you.
Thanks Ian, I reopened my bug for rawhide and assigned it to you. Thanks for your support.
This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX. (Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.) Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version. Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping