Description of problem:
While umounting a GFS2 file system after running tests, one node of three panicked.

[254340.210802] dlm: brawl0: dlm_recover 5
[254340.214572] dlm: brawl0: dlm_clear_toss 355 done
[254340.216621] dlm: brawl0: remove member 1
[254340.219278] dlm: brawl0: dlm_recover_members 2 nodes
[254340.221899] dlm: brawl0: generation 4 slots 2 2:2 3:3
[254340.223105] dlm: brawl0: dlm_recover_directory
[254340.229759] dlm: brawl0: dlm_recover_directory 389 in 384 new
[254340.253346] dlm: brawl0: dlm_recover_directory 28 out 1 messages
[254340.255896] dlm: brawl0: dlm_recover_purge 5 locks for 1 nodes
[254340.257740] dlm: brawl0: dlm_recover_masters
[254340.258998] dlm: brawl0: dlm_recover_masters 83 of 138
[254340.269351] dlm: brawl0: dlm_recover_locks 0 out
[254340.314169] dlm: brawl0: dlm_recover_locks 45 in
[254340.315572] dlm: brawl0: dlm_recover_rsbs 133 done
[254340.350792] dlm: brawl0: dlm_recover 5 generation 4 done: 127 ms
[254340.352843] gfs2: fsid=STSRHTS20160:brawl0.1: recover generation 4 done
[254340.386302] gfs2: fsid=STSRHTS20160:brawl0.1: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
[254340.386302]   function = inode_go_inval, file = fs/gfs2/glops.c, line = 257
[254340.390276] Kernel panic - not syncing: GFS2: fsid=STSRHTS20160:brawl0.1: panic requested
[254340.390276]
[254340.390951] CPU: 0 PID: 32702 Comm: kworker/0:0H Not tainted 4.3.0+ #20
[254340.390951] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[254340.390951] Workqueue: glock_workqueue glock_work_func [gfs2]
[254340.390951]  ffffffffa0475c00 ffff88003ca47b50 ffffffff8130751c ffffffffa047abe0
[254340.390951]  ffff88003ca47bc8 ffffffff8117e3fe ffff880000000010 ffff88003ca47bd8
[254340.390951]  ffff88003ca47b78 ffffffff8117f0a7 ffff88003dbaf328 0000000000000018
[254340.390951] Call Trace:
[254340.390951]  [<ffffffff8130751c>] dump_stack+0x44/0x58
[254340.390951]  [<ffffffff8117e3fe>] panic+0xc8/0x205
[254340.390951]  [<ffffffff8117f0a7>] ? printk+0x4d/0x4f
[254340.390951]  [<ffffffffa047041e>] gfs2_lm_withdraw+0xae/0x150 [gfs2]
[254340.390951]  [<ffffffffa04704e4>] gfs2_assert_withdraw_i+0x24/0x40 [gfs2]
[254340.390951]  [<ffffffffa0454f84>] inode_go_inval+0xe4/0xf0 [gfs2]
[254340.390951]  [<ffffffffa04539a3>] do_xmote+0x173/0x270 [gfs2]
[254340.390951]  [<ffffffffa0453bee>] run_queue+0x14e/0x280 [gfs2]
[254340.390951]  [<ffffffffa0453d79>] glock_work_func+0x59/0x120 [gfs2]
[254340.390951]  [<ffffffff81096c2e>] process_one_work+0x1be/0x4b0
[254340.390951]  [<ffffffff81096bc5>] ? process_one_work+0x155/0x4b0
[254340.390951]  [<ffffffff81097034>] worker_thread+0x114/0x450
[254340.390951]  [<ffffffff81096f20>] ? process_one_work+0x4b0/0x4b0
[254340.390951]  [<ffffffff8109cda7>] kthread+0xe7/0x100
[254340.390951]  [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
[254340.390951]  [<ffffffff8164bf5f>] ret_from_fork+0x3f/0x70
[254340.390951]  [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
[254340.390951] Kernel Offset: disabled

Version-Release number of selected component (if applicable):
upstream kernel 4.3.0 + patch for bug 1277692

How reproducible:
Unknown

Steps to Reproduce:
1. run load
2. ???
3. umount

Actual results:

Expected results:

Additional info:
I did hit this a second time, but not on purpose.

gfs2: fsid=STSRHTS20160:brawl0.0: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
  function = inode_go_inval, file = fs/gfs2/glops.c, line = 257
Kernel panic - not syncing: GFS2: fsid=STSRHTS20160:brawl0.0: panic requested
CPU: 0 PID: 32045 Comm: kworker/0:1H Not tainted 4.3.0+ #20
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Workqueue: glock_workqueue glock_work_func [gfs2]
 ffffffffa049ac00 ffff88003654fb50 ffffffff8130751c ffffffffa049fbe0
 ffff88003654fbc8 ffffffff8117e3fe ffff880000000010 ffff88003654fbd8
 ffff88003654fb78 ffffffff8117f0a7 ffff88003c0e3328 0000000000000018
Call Trace:
 [<ffffffff8130751c>] dump_stack+0x44/0x58
 [<ffffffff8117e3fe>] panic+0xc8/0x205
 [<ffffffff8117f0a7>] ? printk+0x4d/0x4f
 [<ffffffffa049541e>] gfs2_lm_withdraw+0xae/0x150 [gfs2]
 [<ffffffffa04954e4>] gfs2_assert_withdraw_i+0x24/0x40 [gfs2]
 [<ffffffffa0479f84>] inode_go_inval+0xe4/0xf0 [gfs2]
 [<ffffffffa04789a3>] do_xmote+0x173/0x270 [gfs2]
 [<ffffffffa0478bee>] run_queue+0x14e/0x280 [gfs2]
 [<ffffffffa0478d79>] glock_work_func+0x59/0x120 [gfs2]
 [<ffffffff81096c2e>] process_one_work+0x1be/0x4b0
 [<ffffffff81096bc5>] ? process_one_work+0x155/0x4b0
 [<ffffffff81097034>] worker_thread+0x114/0x450
 [<ffffffff81096f20>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff8109cda7>] kthread+0xe7/0x100
 [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
 [<ffffffff8164bf5f>] ret_from_fork+0x3f/0x70
 [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
Kernel Offset: disabled
I've been able to reproduce this a few more times, and every time the inode in question is the statfs inode, 99313.

gfs2: fsid=STSRHTS3863:brawl0.1: inode_go_inval with ail count: ino 99313 cnt 1
I ran a git bisect and found this commit to be the first where we hit the statfs ref count issue on umount.

commit 24972557b12ce8fd5b6c6847d0e2ee1837ddc13b
Author: Benjamin Marzinski <bmarzins>
Date:   Thu May 1 22:26:55 2014 -0500

    GFS2: remove transaction glock

    GFS2 has a transaction glock, which must be grabbed for every
    transaction, whose purpose is to deal with freezing the filesystem.
    Aside from this involving a large amount of locking, it is very easy
    to make the current fsfreeze code hang on unfreezing.

    This patch rewrites how gfs2 handles freezing the filesystem. The
    transaction glock is removed. In its place is a freeze glock, which
    is cached (but not held) in a shared state by every node in the
    cluster when the filesystem is mounted. This lock only needs to be
    grabbed on freezing, and actions which need to be safe from
    freezing, like recovery.

    When a node wants to freeze the filesystem, it grabs this glock
    exclusively. When the freeze glock state changes on the nodes
    (either from shared to unlocked, or shared to exclusive), the
    filesystem does a special log flush. gfs2_log_flush() does all the
    work for flushing out and shutting down the incore log, and then it
    tries to grab the freeze glock in a shared state again. Since the
    filesystem is stuck in gfs2_log_flush, no new transaction can start,
    and nothing can be written to disk. Unfreezing the filesystem simply
    involves dropping the freeze glock, allowing gfs2_log_flush() to
    grab and then release the shared lock, so it is cached for next
    time.

    However, in order for the unfreezing ioctl to occur, gfs2 needs to
    get a shared lock on the filesystem root directory inode to check
    permissions. If that glock has already been grabbed exclusively,
    fsfreeze will be unable to get the shared lock and unfreeze the
    filesystem. In order to allow the unfreeze, this patch makes gfs2
    grab a shared lock on the filesystem root directory during the
    freeze, and hold it until it unfreezes the filesystem. The functions
    which need to grab a shared lock in order to allow the unfreeze
    ioctl to be issued now use the lock grabbed by the freeze code
    instead. The freeze and unfreeze code take care to make sure that
    this shared lock will not be dropped while another process is using
    it.

    Signed-off-by: Benjamin Marzinski <bmarzins>
    Signed-off-by: Steven Whitehouse <swhiteho>
It looks like this is my bug then, unless you really want to keep it, Bob.
The load I'm running is from multiple nodes:

1. mount -t gfs2 -o errors=panic,data=ordered /dev/foo /mnt/gfs2
2. I/O load
3. df /mnt/gfs2
4. umount /mnt/gfs2

I'll see if I can reduce the reproducer more.
I wasn't able to reproduce this with one node. With two nodes it took a lot longer. I think the more nodes the better.
*** Bug 1284566 has been marked as a duplicate of this bug. ***
I've been trying to reproduce this bug with a script that looks like:

#####
#!/bin/bash
MAINNODE="gfs-i8c-01.mpc.lab.eng.bos.redhat.com"
NODELIST="gfs-i8c-01.mpc.lab.eng.bos.redhat.com gfs-i8c-02.mpc.lab.eng.bos.redhat.com gfs-i8c-03.mpc.lab.eng.bos.redhat.com gfs-i8c-04.mpc.lab.eng.bos.redhat.com"

qarsh root@$MAINNODE mkfs.gfs2 -p lock_dlm -t bobsrhel7:myfs -j4 /dev/clariioni/bmarzins
while true; do
    for NODE in $NODELIST; do
        qarsh root@$NODE mount -o errors=panic,data=ordered /dev/clariioni/bmarzins /mnt/gfs2
    done
    /usr/tests/sts-rhel7.2/gfs/bin/dd_io -m /mnt/gfs2 -l /usr/tests/sts-rhel7.2 -r /usr/tests/sts-rhel7.2/ -R bobsrhel7.xml -S QUICK
    for NODE in $NODELIST; do
        qarsh root@$NODE df \; umount /mnt/gfs2 &
    done
    sleep 2
    for NODE in $NODELIST; do
        qarsh root@$NODE umount /mnt/gfs2 &
    done
    sleep 5
    for NODE in $NODELIST; do
        qarsh root@$NODE df \| grep gfs2
    done
done
#####

However, I've had no luck at all. I've tried remaking the filesystem every loop. I've tried different scenarios with dd_io. I've tried different ways of running the df and umount commands, to see if that triggers it. But it's been running in various configurations for days now without hitting this. I've been running on a recent commit to the master branch of linux-gfs2: df6d04308666c887a42f85f27b1d1c38102ca4a1. I can try running on 24972557b12ce8fd5b6c6847d0e2ee1837ddc13b, but I just want to check to see if this seems sensible, and if you can hit the issue with a similar script, Nate.
The main difference is that I'm running the mount loop on each node separately, so the nodes aren't all mounting and umounting at the same time. I'm also using a different load which touches more files.

I gave the scenario in bug 1284566 a try and was surprised to see the node fail in just a few quick iterations with only one node involved.

$ dd if=/dev/zero of=/var/tmp/gfs2.img bs=1M count=256
$ mkfs.gfs2 -O -p lock_nolock /var/tmp/gfs2.img
$ mkdir -p /mnt/gfs2/
$ for i in `seq 1000`; do mount -o loop,errors=panic /var/tmp/gfs2.img /mnt/gfs2/; echo foo > /mnt/gfs2/foo-$i; umount /mnt/gfs2; done
This hits it for me. I'm looking into it now.
Created attachment 1103777 [details]
patch to move where the SDF_JOURNAL_LIVE bit is cleared

When gfs2 was unmounting filesystems or changing them to read-only, it was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This caused a race: if an inode glock got demoted in the gap between clearing the bit and the shutdown flush, it would be unable to reserve log space to clear out the active items list in inode_go_sync, causing an error in inode_go_inval because the glock was still dirty.

To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the shutdown log flush. This means that, because of the locking on the log blocks, either inode_go_sync will be able to reserve space to clean the glock before the shutdown flush, or the shutdown flush will clean the glock itself before inode_go_sync fails to reserve the space. Either way, the glock will be clean before inode_go_inval runs.
Looking good, thanks.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
This is fixed upstream: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=400ac52e805bb6852e743817bc05a136e85042a9