650261 – mkfs.btrfs and mount on 24 disks in parallel blows up

Summary: mkfs.btrfs and mount on 24 disks in parallel blows up

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	14
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-11-05 16:04 UTC by Albert Strasheim
Modified:	2012-08-16 21:55 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-08-16 21:55:02 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
script to mkfs and mount in parallel (737 bytes, text/plain) 2010-11-05 16:04 UTC, Albert Strasheim
Stacktrace soon after crash (91.92 KB, image/jpg) 2010-11-05 16:18 UTC, Albert Strasheim
Crash immediately after running format.py (3.36 KB, text/plain) 2010-11-08 06:44 UTC, Albert Strasheim
More crashing one minute later (52.95 KB, text/plain) 2010-11-08 06:45 UTC, Albert Strasheim
Patch - fix race in btrfs_get_sb() (1.08 KB, patch) 2010-11-10 14:41 UTC, Ian Kent
Crash dump (53.53 KB, text/plain) 2010-11-11 14:59 UTC, Albert Strasheim
Patch - fix race in btrfs_get_sb() (2nd attempt) (1011 bytes, patch) 2010-11-12 09:32 UTC, Ian Kent
2.6.35.6-52.bz650261.2.fc14.x86_64 crash dump (3.49 KB, text/plain) 2010-11-12 14:16 UTC, Albert Strasheim
patch to fix the problem (3.99 KB, patch) 2010-11-15 15:58 UTC, Josef Bacik
Patch - fix error handling in btrfs_get_sb (1.78 KB, patch) 2010-11-16 06:11 UTC, Ian Kent
Patch - setup blank root and fs_info for mount time (against f14 2.6.35) (3.82 KB, patch) 2010-11-16 06:14 UTC, Ian Kent
Patch - fix compile error setup blank root (775 bytes, patch) 2010-11-16 06:16 UTC, Ian Kent
Patch - fix memory leak on finding existing super (737 bytes, patch) 2010-11-16 06:19 UTC, Ian Kent
Patch - fix memory leak in close_ctree() (798 bytes, patch) 2010-11-16 06:21 UTC, Ian Kent
Patch - fix race between btrfs_get_sb() and umount (1.06 KB, patch) 2010-11-16 06:23 UTC, Ian Kent
Patch - fix lock order in blkdev_get and blkdev_put() (1.10 KB, patch) 2010-11-16 06:28 UTC, Ian Kent

Description Albert Strasheim 2010-11-05 16:04:49 UTC

Created attachment 458138 [details]
script to mkfs and mount in parallel

Description of problem:

Running instances of mkfs.btrfs/mkfs.xfs on 24 disks and mounting the resultant file systems in parallel blows up the kernel.

Version-Release number of selected component (if applicable):

kernel-2.6.35.6-48.fc14.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Run python format.py as root.
  
Actual results:

Kernel explodes.

Expected results:

Formatting and mounting should work.

Additional info:

Latest 2.6.32 in FC12 also has this issue.

I'm still trying to get a kdump, but I'm not sure if the kernel crashes, so kdump might not be running.
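
The attached format.py is not reproduced in this report. As a rough sketch of the kind of workload described above, a parallel mkfs-and-mount run might look something like the following. The helper names (`build_commands`, `format_and_mount`, `format_all`), the device paths, and the mount points are all illustrative assumptions, not taken from the attachment, and the mount points are assumed to exist already.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(device, mount_point):
    """Return the mkfs and mount commands for one disk (hypothetical layout)."""
    return [
        ["mkfs.btrfs", device],
        ["mount", "-t", "btrfs", device, mount_point],
    ]


def format_and_mount(device, mount_point, run=subprocess.check_call):
    """Format one device, then mount it; `run` is injectable for dry runs."""
    for cmd in build_commands(device, mount_point):
        run(cmd)
    return device


def format_all(pairs, run=subprocess.check_call):
    """Run format_and_mount for every (device, mount_point) pair in parallel."""
    pairs = list(pairs)
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        futures = [
            pool.submit(format_and_mount, dev, mnt, run) for dev, mnt in pairs
        ]
        # result() re-raises any failure from the worker thread.
        return [f.result() for f in futures]
```

Running this for real destroys the data on every listed device, so passing `run=print` first to see the commands is the safer way to try it.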

Comment 1 Albert Strasheim 2010-11-05 16:13:11 UTC

One crash just looked like this:

Nov  5 17:18:38 kernel: [  625.742226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
Nov  5 17:18:38 kernel: [  625.757777] IP: [<ffffffffa0579010>] btrfs_test_super+0x10/0x26 [btrfs]
Nov  5 17:18:38 kernel: [  625.766032] PGD 0
Nov  5 17:18:38 kernel: [  625.775730] Oops: 0000 [#1] SMP
Nov  5 17:18:38 kernel: [  625.780134] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host8/port-8:0/expander-8:0/port-8:0:21/end_device-8:0:21/target8:0:21/8:0:21:0/block/sdw/dev
Nov  5 17:18:38 kernel: [  625.800616] CPU 12
Nov  5 17:18:38 kernel: [  625.803354] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs nls_utf8 hfsplus hfs vfat fat ext2 ipv6 mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb i7core_edac i2c_i801 i2c_core ses iTCO_wdt joydev edac_core serio_raw iTCO_vendor_support ioatdma enclosure dca microcode usb_storage mptsas mptscsih mptbase scsi_transport_sas [last unloaded: scsi_wait_scan]
Nov  5 17:18:38 kernel: [  625.867024]
Nov  5 17:18:38 kernel: [  625.867952] Pid: 2801, comm: mount Not tainted 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270 SERVER MODULE
Nov  5 17:18:38 kernel: [  625.895335] RIP: 0010:[<ffffffffa0579010>]  [<ffffffffa0579010>] btrfs_test_super+0x10/0x26 [btrfs]
Nov  5 17:18:38 kernel: [  625.903367] RSP: 0018:ffff88036e911cd8  EFLAGS: 00010287
Nov  5 17:18:38 kernel: [  625.918565] RAX: 0000000000000000 RBX: ffff880376b3e000 RCX: ffff8801f5d7d140
Nov  5 17:18:38 kernel: [  625.925409] RDX: 0000000000000788 RSI: ffff8801f5d7d140 RDI: ffff880376b3e000
Nov  5 17:18:38 kernel: [  625.941752] RBP: ffff88036e911cd8 R08: ffff8801f5d7d1b8 R09: 0000000000000002
Nov  5 17:18:38 kernel: [  625.960929] R10: ffff88036e911b68 R11: ffff8801f5d7d140 R12: ffffffffa05d03d0
Nov  5 17:18:38 kernel: [  625.969841] R13: ffffffffa0579000 R
Nov  5 17:29:34 kernel: imklog 4.6.3, log source = /proc/kmsg started.

Crash happens with at least mkfs.btrfs AND mkfs.xfs.

Comment 2 Albert Strasheim 2010-11-05 16:18:46 UTC

Created attachment 458145 [details]
Stacktrace soon after crash

Comment 3 Albert Strasheim 2010-11-08 06:44:11 UTC

Created attachment 458601 [details]
Crash immediately after running format.py

Comment 4 Albert Strasheim 2010-11-08 06:45:07 UTC

Created attachment 458602 [details]
More crashing one minute later

Comment 5 Albert Strasheim 2010-11-08 07:17:30 UTC

Seems this only happens when some btrfs volumes are involved.

My script wasn't properly reformatting all the disks as XFS because mkfs.xfs didn't like my labels.

Comment 6 Ian Kent 2010-11-10 14:41:09 UTC

Created attachment 459463 [details]
Patch - fix race in btrfs_get_sb()

Attempt to fix a race when obtaining the super block during
rapid mounting.

Comment 7 Ian Kent 2010-11-10 14:44:53 UTC

I'm not sure if this will fix the problem here but could you
give a kernel with the above patch a try please.

You can find one at:
http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261.1

Comment 8 Ian Kent 2010-11-10 14:47:21 UTC

(In reply to comment #6)
> Created attachment 459463 [details]
> Patch - fix race in btrfs_get_sb()
> 
> Attempt to fix a race when obtaining the super block during
> rapid mounting.

OK, the comment should say "may not be complete" not
"may be complete". I'll fix that later.

Comment 9 Albert Strasheim 2010-11-11 14:28:50 UTC

Will test a bit later today.

Comment 10 Ian Kent 2010-11-11 14:33:36 UTC

(In reply to comment #4)
> Created attachment 458602 [details]
> More crashing one minute later

It doesn't look like there should be a problem here.
This code, in the VFS, is very frequently used so the problem might
be caused by the previous invalid access trying to get a super block
for the mount. We need to try this with the above patch, see if it
helps getting a super block and then see what happens after that.

Comment 11 Ian Kent 2010-11-11 14:34:15 UTC

(In reply to comment #9)
> Will test a bit later today.

Great, whenever you get a chance is fine, thanks.

Comment 12 Albert Strasheim 2010-11-11 14:59:11 UTC

Created attachment 459776 [details]
Crash dump

Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64

No immediate crash anymore, but things aren't quite right yet:

The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about 60 seconds and then the BUG: soft lockup stuff starts printing out.

Comment 13 Ian Kent 2010-11-11 15:46:49 UTC

(In reply to comment #12)
> Created attachment 459776 [details]
> Crash dump
> 
> Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64
> 
> No immediate crash anymore, but things aren't quite right yet:
> 
> The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about
> 60 seconds and then the BUG: soft lockup stuff starts printing out.

Right, that looks ugly.
Looks like there is more than one problem with btrfs super block
creation. I'll have a look at this too.

Comment 14 Ian Kent 2010-11-12 07:23:02 UTC

(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 459776 [details]
> > Crash dump
> > 
> > Tested with 2.6.35.6-52.bz650261.1.fc14.x86_64
> > 
> > No immediate crash anymore, but things aren't quite right yet:
> > 
> > The system manages to mkfs about 5 btrfs filesystems. Then it hangs for about
> > 60 seconds and then the BUG: soft lockup stuff starts printing out.
> 
> Right, that looks ugly.
> Looks like there is more than one problem with btrfs super block
> creation. I'll have a look at this too.

I don't think the problem is nearly as simple as I originally
thought.

After looking more closely I don't think that a super block can
get onto the list of file system super block instances without
the btrfs root being set so I don't think the problem can be
happening at mount time, as I originally thought. I see now
that the btrfs root in a super block can be cleared before
being removed from this list during umount so now I think
that may be the problem.

All I can do is update my patch and build another test kernel
so we can see if that is the problem.

Comment 15 Ian Kent 2010-11-12 09:32:00 UTC

Created attachment 459987 [details]
Patch - fix race in btrfs_get_sb() (2nd attempt)

Comment 16 Ian Kent 2010-11-12 12:27:35 UTC

Can you give this one a try please:
http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261.2

Comment 17 Albert Strasheim 2010-11-12 14:16:08 UTC

Created attachment 460058 [details]
2.6.35.6-52.bz650261.2.fc14.x86_64 crash dump

The instant crash is back with 2.6.35.6-52.bz650261.2.fc14.x86_64.

[  187.394266]  [<ffffffff81118df6>] sget+0x54/0x367
[  187.408162]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
[  187.415878]  [<ffffffffa016882d>] btrfs_get_sb+0x108/0x3eb [btrfs]
[  187.428172]  [<ffffffff810ffd5e>] ? alloc_pages_current+0xb2/0xc3
[  187.432555]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
[  187.450176]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
[  187.456343]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
[  187.468170]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
[  187.473033]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b

Comment 18 Ian Kent 2010-11-12 16:33:48 UTC

(In reply to comment #17)
> Created attachment 460058 [details]
> 2.6.35.6-52.bz650261.2.fc14.x86_64 crash dump
> 
> The instant crash is back with 2.6.35.6-52.bz650261.2.fc14.x86_64.
> 
> [  187.394266]  [<ffffffff81118df6>] sget+0x54/0x367
> [  187.408162]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> [  187.415878]  [<ffffffffa016882d>] btrfs_get_sb+0x108/0x3eb [btrfs]
> [  187.428172]  [<ffffffff810ffd5e>] ? alloc_pages_current+0xb2/0xc3
> [  187.432555]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> [  187.450176]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> [  187.456343]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> [  187.468170]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> [  187.473033]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b

Aaaah, every time I have to work through the super block
handling in the VFS I have to re-work out what is going on.
It's fairly complex.

The gotcha now is that, even if I'm right about the btrfs
root not being set early enough I can't use the method I
previously tried. It violates lock ordering and taking a
mutex might sleep so it can't be done while holding the
list spin lock.

Oh well, I'll need to keep working on it.
I'll get back when I have another kernel to test.

Ian

Comment 19 Josef Bacik 2010-11-15 15:37:42 UTC

Heh, so I sat down and figured out the problem, came to put it in here, and then realized you had already figured out what the problem is.  We need to use a btrfs_set_super instead of set_anon_super, have btrfs_set_super call set_anon_super, and set up the root in there first.  Of course we don't have a root until we do the open_ctree, so for now I think the best approach is to set up a skeleton root+fs_info and use those in open_ctree.  I'll try to come up with something like that.

Comment 20 Josef Bacik 2010-11-15 15:58:51 UTC

Created attachment 460563 [details]
patch to fix the problem

Possible fix, please let me know if this fixes the problem for you.

Comment 21 Albert Strasheim 2010-11-15 16:04:58 UTC

(In reply to comment #20)
> Created attachment 460563 [details]
> patch to fix the problem
> 
> Possible fix, please let me know if this fixes the problem for you.

Hello

I don't have a build environment ready to go on the machine. Is there any chance you can build a kernel RPM like Ian did?

Regards

Albert

Comment 22 Ian Kent 2010-11-15 16:37:02 UTC

(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 460563 [details]
> > patch to fix the problem
> > 
> > Possible fix, please let me know if this fixes the problem for you.
> 
> Hello
> 
> I don't have a build environment ready to go on the machine. Is there any
> chance you can build a kernel RPM like Ian did?

Let me have a look through Josef's patch, then I'll do a scratch
build for you to test.

It might be worth waiting to see if Chris Mason has an opinion as
well, so let's see if he replies to the message I posted to him (cc
the btrfs list). It would probably be worth posting this patch in
response to that list message, Josef.

It's late for me now anyway, so I'll get to this tomorrow.

Ian

Comment 23 Ian Kent 2010-11-16 06:11:29 UTC

Created attachment 460736 [details]
Patch - fix error handling in btrfs_get_sb

I think it's wise to include Josef's upstream error handling
patch. It allows the subsequent "setup blank root" patch to apply
a little more cleanly. This patch is included in 2.6.36 and
will need to be dropped when f14 moves to 2.6.36.

Can you see any problem with this Josef?

Comment 24 Ian Kent 2010-11-16 06:14:55 UTC

Created attachment 460737 [details]
Patch - setup blank root and fs_info for mount time (against f14 2.6.35)

This is just Josef's patch "patch to fix the problem" against
f14.

I found a typo in btrfs_set_super() (sb needs to be s) and
removed the blank line.

Comment 25 Ian Kent 2010-11-16 06:16:54 UTC

Created attachment 460738 [details]
Patch - fix compile error setup blank root

As the title says, this should be folded into your "setup
blank root" patch.

Comment 26 Ian Kent 2010-11-16 06:19:33 UTC

Created attachment 460739 [details]
Patch - fix memory leak on finding existing super

Josef, please have a look at this and if you agree this
is needed fold it into your "setup blank root" patch.

Comment 27 Ian Kent 2010-11-16 06:21:33 UTC

Created attachment 460740 [details]
Patch - fix memory leak in close_ctree()

Again, could you have a look at this and, if you agree it
is needed, fold it into your "setup blank root" patch.

Comment 28 Ian Kent 2010-11-16 06:23:46 UTC

Created attachment 460741 [details]
Patch - fix race between btrfs_get_sb() and umount

I know you didn't think this was a problem but could you
please have another look, I'm fairly sure it's a bug and
think it's worth fixing.

Comment 29 Ian Kent 2010-11-16 06:28:15 UTC

Created attachment 460745 [details]
Patch - fix lock order in blkdev_get and blkdev_put()

I think this will fix the deadlock we see in the traces of
processes waiting on the BKL.

This patch isn't needed in 2.6.36 since the BKL usage has
been removed from these functions.
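
The patch itself is not reproduced here. As a generic illustration of what a lock order fix means, and not a description of the blkdev_get()/blkdev_put() change, the Python sketch below has every caller take the same pair of locks in one agreed order, whatever order it asks for them in. The `acquire_in_order` helper and the numeric ranks are hypothetical.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_in_order(*ranked_locks):
    """Acquire (rank, lock) pairs by ascending rank; release in reverse."""
    ordered = sorted(ranked_locks, key=lambda pair: pair[0])
    taken = []
    try:
        for _rank, lock in ordered:
            lock.acquire()
            taken.append(lock)
        # Report the order actually used, lowest rank first.
        yield [rank for rank, _lock in ordered]
    finally:
        for lock in reversed(taken):
            lock.release()
```

Two threads that name the locks in opposite orders still acquire them in rank order, so neither can end up holding one lock while waiting for the other.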

Comment 30 Ian Kent 2010-11-16 13:16:33 UTC

(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 460563 [details]
> > patch to fix the problem
> > 
> > Possible fix, please let me know if this fixes the problem for you.
> 
> Hello
> 
> I don't have a build environment ready to go on the machine. Is there any
> chance you can build a kernel RPM like Ian did?

Just for your info., these patches are not working yet.
Ha, what have I broken, ;)
Ian

Comment 31 Ian Kent 2010-11-17 00:31:20 UTC

Comment on attachment 460740 [details]
Patch - fix memory leak in close_ctree()

Freeing the btrfs tree root is a much more sophisticated activity. Freeing it in btrfs_put_super() is just plain wrong. Dropping this patch.

Comment 32 Ian Kent 2010-11-17 00:45:52 UTC

Can you give this one a try please:
http://people.redhat.com/~ikent/kernel-2.6.35.6-52.bz650261

Rapid mount and umount will very likely break the /etc/mtab
locking. That is nothing to do with this problem and I'm not
interested in hearing about it so don't report it.

Comment 33 Albert Strasheim 2010-11-17 04:44:03 UTC

There were two kernel packages, so I tested 2.6.35.6-52.bz650261.4.fc14.x86_64.

It survived my tests. Thanks! :-)

I did notice warnings like:

[  492.692031] Warning: dev (pts0) tty->count(6) != #fd's(5) in tty_release_dev
[  527.287606] Warning: dev (pts0) tty->count(5) != #fd's(4) in tty_release_dev

but these are probably due to another bug.

Also, if you could point me to any discussions about the /etc/mtab locking problems you mentioned, that would be greatly appreciated. I'm guessing you're referring to this message and script on the linux-btrfs list from Li Zefan?

http://www.spinics.net/lists/linux-btrfs/msg06932.html

We're doing rapid mounting, but not really rapid unmounting. Can this cause problems?

Comment 34 Ian Kent 2010-11-17 05:40:37 UTC

(In reply to comment #33)
> There were two kernel packages, so I tested 2.6.35.6-52.bz650261.4.fc14.x86_64.
> 
> It survived my tests. Thanks! :-)

Great, now to work out who to speak to to get the needed patches
into a release kernel, mmm ....

> 
> I did notice warnings like:
> 
> [  492.692031] Warning: dev (pts0) tty->count(6) != #fd's(5) in tty_release_dev
> [  527.287606] Warning: dev (pts0) tty->count(5) != #fd's(4) in tty_release_dev
> 
> but these are probably due to another bug.

Yep, doesn't look like something for us to concern ourselves
with.

> 
> Also, if you could point me to any discussions about the /etc/mtab locking
> problems you mentioned, that would be greatly appreciated. I'm guessing you're
> referring to this message and script on the linux-btrfs list from Li Zefan?
> 
> http://www.spinics.net/lists/linux-btrfs/msg06932.html

I did use his suggestion to test and noticed the mtab locking
was broken. The reason I said this is because this has been a
problem for many years and I'm not interested in working on it
or discussing it any more, ever!

> 
> We're doing rapid mounting, but not really rapid unmounting. Can this cause
> problems?

Yes.
All I can suggest is investigate symlinking /proc/mounts to
/etc/mtab but take care.

Ian
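
For scripts that only need to know what is currently mounted, one option that avoids /etc/mtab altogether is to read the kernel's own table from /proc/mounts. The sketch below is a minimal parser written under that assumption; `parse_mounts` and `read_mounts` are illustrative names, not part of any existing tool.

```python
import re


def _unescape(field):
    """Decode the octal escapes (for example \\040 for a space) in a field."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text):
    """Parse /proc/mounts-style text into a list of dicts, one per mount."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        entries.append({
            "device": _unescape(device),
            "mount_point": _unescape(mount_point),
            "fstype": fstype,
            "options": options.split(","),
        })
    return entries


def read_mounts(path="/proc/mounts"):
    """Read and parse the kernel's mount table."""
    with open(path) as handle:
        return parse_mounts(handle.read())
```

For example, `[m["mount_point"] for m in read_mounts() if m["fstype"] == "btrfs"]` lists the btrfs mount points the kernel knows about, without going through the /etc/mtab lock.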

Comment 35 Michal Schmidt 2010-11-24 06:24:03 UTC

*** Bug 656465 has been marked as a duplicate of this bug. ***

Comment 36 Vaclav "sHINOBI" Misek 2010-11-24 07:50:13 UTC

Can you make a release for rawhide kernels as well please?

Comment 37 Ian Kent 2010-11-26 02:39:07 UTC

(In reply to comment #36)
> Can you make a release for rawhide kernels as well please?

This isn't a Rawhide bug.

It's difficult enough to keep track of the patches here for
f14 but adding another series for Rawhide as well would make
this bug a complete mess.

Log a bug against Rawhide and I should be able to post a patch
series against the Rawhide kernel and build a scratch build for
you.

Comment 38 Vaclav "sHINOBI" Misek 2010-11-27 21:51:17 UTC

Thanks Ian, I reopened my bug for rawhide and assigned it to you. Thanks for your support.

Comment 39 Fedora End Of Life 2012-08-16 21:55:05 UTC

This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
