Bug 1279622 - GFS2: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
Summary: GFS2: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ben Marzinski
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Duplicates: 1284566 (view as bug list)
Depends On:
Blocks:
 
Reported: 2015-11-09 21:23 UTC by Nate Straz
Modified: 2016-05-11 11:39 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-11 11:39:09 UTC
Type: Bug
Embargoed:


Attachments
patch to move where the SDF_JOURNAL_LIVE bit is cleared (2.18 KB, patch)
2015-12-09 03:10 UTC, Ben Marzinski

Description Nate Straz 2015-11-09 21:23:47 UTC
Description of problem:

While unmounting a GFS2 file system after running tests, one of three nodes panicked.


[254340.210802] dlm: brawl0: dlm_recover 5
[254340.214572] dlm: brawl0: dlm_clear_toss 355 done
[254340.216621] dlm: brawl0: remove member 1
[254340.219278] dlm: brawl0: dlm_recover_members 2 nodes
[254340.221899] dlm: brawl0: generation 4 slots 2 2:2 3:3
[254340.223105] dlm: brawl0: dlm_recover_directory
[254340.229759] dlm: brawl0: dlm_recover_directory 389 in 384 new
[254340.253346] dlm: brawl0: dlm_recover_directory 28 out 1 messages
[254340.255896] dlm: brawl0: dlm_recover_purge 5 locks for 1 nodes
[254340.257740] dlm: brawl0: dlm_recover_masters
[254340.258998] dlm: brawl0: dlm_recover_masters 83 of 138
[254340.269351] dlm: brawl0: dlm_recover_locks 0 out
[254340.314169] dlm: brawl0: dlm_recover_locks 45 in
[254340.315572] dlm: brawl0: dlm_recover_rsbs 133 done
[254340.350792] dlm: brawl0: dlm_recover 5 generation 4 done: 127 ms
[254340.352843] gfs2: fsid=STSRHTS20160:brawl0.1: recover generation 4 done
[254340.386302] gfs2: fsid=STSRHTS20160:brawl0.1: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
[254340.386302]    function = inode_go_inval, file = fs/gfs2/glops.c, line = 257
[254340.390276] Kernel panic - not syncing: GFS2: fsid=STSRHTS20160:brawl0.1: panic requested
[254340.390276]
[254340.390951] CPU: 0 PID: 32702 Comm: kworker/0:0H Not tainted 4.3.0+ #20
[254340.390951] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[254340.390951] Workqueue: glock_workqueue glock_work_func [gfs2]
[254340.390951]  ffffffffa0475c00 ffff88003ca47b50 ffffffff8130751c ffffffffa047abe0
[254340.390951]  ffff88003ca47bc8 ffffffff8117e3fe ffff880000000010 ffff88003ca47bd8
[254340.390951]  ffff88003ca47b78 ffffffff8117f0a7 ffff88003dbaf328 0000000000000018
[254340.390951] Call Trace:
[254340.390951]  [<ffffffff8130751c>] dump_stack+0x44/0x58
[254340.390951]  [<ffffffff8117e3fe>] panic+0xc8/0x205
[254340.390951]  [<ffffffff8117f0a7>] ? printk+0x4d/0x4f
[254340.390951]  [<ffffffffa047041e>] gfs2_lm_withdraw+0xae/0x150 [gfs2]
[254340.390951]  [<ffffffffa04704e4>] gfs2_assert_withdraw_i+0x24/0x40 [gfs2]
[254340.390951]  [<ffffffffa0454f84>] inode_go_inval+0xe4/0xf0 [gfs2]
[254340.390951]  [<ffffffffa04539a3>] do_xmote+0x173/0x270 [gfs2]
[254340.390951]  [<ffffffffa0453bee>] run_queue+0x14e/0x280 [gfs2]
[254340.390951]  [<ffffffffa0453d79>] glock_work_func+0x59/0x120 [gfs2]
[254340.390951]  [<ffffffff81096c2e>] process_one_work+0x1be/0x4b0
[254340.390951]  [<ffffffff81096bc5>] ? process_one_work+0x155/0x4b0
[254340.390951]  [<ffffffff81097034>] worker_thread+0x114/0x450
[254340.390951]  [<ffffffff81096f20>] ? process_one_work+0x4b0/0x4b0
[254340.390951]  [<ffffffff8109cda7>] kthread+0xe7/0x100
[254340.390951]  [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
[254340.390951]  [<ffffffff8164bf5f>] ret_from_fork+0x3f/0x70
[254340.390951]  [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
[254340.390951] Kernel Offset: disabled


Version-Release number of selected component (if applicable):
upstream kernel 4.3.0 + patch for bug 1277692


How reproducible:
Unknown


Steps to Reproduce:
1. run load
2. ???
3. umount

Actual results:


Expected results:


Additional info:

Comment 1 Nate Straz 2015-11-10 14:33:45 UTC
I did hit this a second time, but not on purpose.  

gfs2: fsid=STSRHTS20160:brawl0.0: fatal: assertion "!atomic_read(&gl->gl_ail_count)" failed
   function = inode_go_inval, file = fs/gfs2/glops.c, line = 257
Kernel panic - not syncing: GFS2: fsid=STSRHTS20160:brawl0.0: panic requested

CPU: 0 PID: 32045 Comm: kworker/0:1H Not tainted 4.3.0+ #20
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Workqueue: glock_workqueue glock_work_func [gfs2]
 ffffffffa049ac00 ffff88003654fb50 ffffffff8130751c ffffffffa049fbe0
 ffff88003654fbc8 ffffffff8117e3fe ffff880000000010 ffff88003654fbd8
 ffff88003654fb78 ffffffff8117f0a7 ffff88003c0e3328 0000000000000018
Call Trace:
 [<ffffffff8130751c>] dump_stack+0x44/0x58
 [<ffffffff8117e3fe>] panic+0xc8/0x205
 [<ffffffff8117f0a7>] ? printk+0x4d/0x4f
 [<ffffffffa049541e>] gfs2_lm_withdraw+0xae/0x150 [gfs2]
 [<ffffffffa04954e4>] gfs2_assert_withdraw_i+0x24/0x40 [gfs2]
 [<ffffffffa0479f84>] inode_go_inval+0xe4/0xf0 [gfs2]
 [<ffffffffa04789a3>] do_xmote+0x173/0x270 [gfs2]
 [<ffffffffa0478bee>] run_queue+0x14e/0x280 [gfs2]
 [<ffffffffa0478d79>] glock_work_func+0x59/0x120 [gfs2]
 [<ffffffff81096c2e>] process_one_work+0x1be/0x4b0
 [<ffffffff81096bc5>] ? process_one_work+0x155/0x4b0
 [<ffffffff81097034>] worker_thread+0x114/0x450
 [<ffffffff81096f20>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff8109cda7>] kthread+0xe7/0x100
 [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
 [<ffffffff8164bf5f>] ret_from_fork+0x3f/0x70
 [<ffffffff8109ccc0>] ? kthread_create_on_node+0x220/0x220
Kernel Offset: disabled

Comment 2 Nate Straz 2015-11-11 19:53:07 UTC
I've been able to reproduce this a few more times and every time the inode in question is the statfs inode, 99313.

 gfs2: fsid=STSRHTS3863:brawl0.1: inode_go_inval with ail count: ino 99313 cnt 1

Comment 3 Nate Straz 2015-11-20 14:36:28 UTC
I ran a git bisect and found this commit to be the first where we hit the statfs ref count issue on umount.

commit 24972557b12ce8fd5b6c6847d0e2ee1837ddc13b
Author: Benjamin Marzinski <bmarzins>
Date:   Thu May 1 22:26:55 2014 -0500

    GFS2: remove transaction glock
    
    GFS2 has a transaction glock, which must be grabbed for every
    transaction, whose purpose is to deal with freezing the filesystem.
    Aside from this involving a large amount of locking, it is very easy to
    make the current fsfreeze code hang on unfreezing.
    
    This patch rewrites how gfs2 handles freezing the filesystem. The
    transaction glock is removed. In its place is a freeze glock, which is
    cached (but not held) in a shared state by every node in the cluster
    when the filesystem is mounted. This lock only needs to be grabbed on
    freezing, and actions which need to be safe from freezing, like
    recovery.
    
    When a node wants to freeze the filesystem, it grabs this glock
    exclusively.  When the freeze glock state changes on the nodes (either
    from shared to unlocked, or shared to exclusive), the filesystem does a
    special log flush.  gfs2_log_flush() does all the work for flushing out
    and shutting down the incore log, and then it tries to grab the
    freeze glock in a shared state again.  Since the filesystem is stuck in
    gfs2_log_flush, no new transaction can start, and nothing can be written
    to disk. Unfreezing the filesystem simply involves dropping the freeze
    glock, allowing gfs2_log_flush() to grab and then release the shared
    lock, so it is cached for next time.
    
    However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
    shared lock on the filesystem root directory inode to check permissions.
    If that glock has already been grabbed exclusively, fsfreeze will be
    unable to get the shared lock and unfreeze the filesystem.
    
    In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
    on the filesystem root directory during the freeze, and hold it until it
    unfreezes the filesystem.  The functions which need to grab a shared
    lock in order to allow the unfreeze ioctl to be issued now use the lock
    grabbed by the freeze code instead.
    
    The freeze and unfreeze code take care to make sure that this shared
    lock will not be dropped while another process is using it.
    
    Signed-off-by: Benjamin Marzinski <bmarzins>
    Signed-off-by: Steven Whitehouse <swhiteho>
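
A minimal user-space model of the freeze-glock scheme that commit describes
(not GFS2 code): a pthread rwlock stands in for the cluster-wide freeze
glock, readers for the nodes caching it shared, and the writer for the node
doing the freeze; the function names below are invented for illustration.

#include <pthread.h>
#include <stdio.h>

/* Stand-in for the cluster-wide freeze glock: cached shared by every
 * node at mount time, taken exclusively by the node doing the freeze. */
static pthread_rwlock_t freeze_glock = PTHREAD_RWLOCK_INITIALIZER;

/* What a node does when the freeze glock is demoted away from it:
 * flush and shut down the incore log, then block trying to re-take
 * the lock shared until the freezing node lets go. */
static void special_log_flush(const char *node)
{
    printf("%s: flushing and shutting down the incore log\n", node);
    pthread_rwlock_rdlock(&freeze_glock);   /* blocks while frozen */
    pthread_rwlock_unlock(&freeze_glock);   /* shared lock re-cached */
    printf("%s: flush done, transactions may resume\n", node);
}

static void freeze_fs(void)
{
    pthread_rwlock_wrlock(&freeze_glock);   /* grab the glock exclusively */
    printf("freezer: filesystem frozen\n");
}

static void unfreeze_fs(void)
{
    pthread_rwlock_unlock(&freeze_glock);   /* unblocks the log flushes */
    printf("freezer: filesystem thawed\n");
}

int main(void)
{
    freeze_fs();
    /* In a cluster, special_log_flush() on the other nodes would now
     * block inside rdlock() until unfreeze_fs() runs. */
    unfreeze_fs();
    special_log_flush("node0");
    return 0;
}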

Comment 4 Ben Marzinski 2015-11-20 15:22:45 UTC
It looks like this is my bug then, unless you really want to keep it, Bob.

Comment 5 Nate Straz 2015-11-20 15:51:37 UTC
The load I'm running is from multiple nodes:

1. mount -t gfs2 -o errors=panic,data=ordered /dev/foo /mnt/gfs2
2. I/O load
3. df /mnt/gfs2
4. umount /mnt/gfs2

I'll see if I can reduce the reproducer more.

Comment 6 Nate Straz 2015-11-20 21:19:55 UTC
I wasn't able to reproduce this with one node.  With two nodes it took a lot longer.  I think the more nodes the better.

Comment 7 Andreas Gruenbacher 2015-11-23 15:44:38 UTC
*** Bug 1284566 has been marked as a duplicate of this bug. ***

Comment 8 Ben Marzinski 2015-12-02 18:48:05 UTC
I've been trying to reproduce this bug with a script that looks like:

#####

#!/bin/bash

MAINNODE="gfs-i8c-01.mpc.lab.eng.bos.redhat.com"
NODELIST="gfs-i8c-01.mpc.lab.eng.bos.redhat.com gfs-i8c-02.mpc.lab.eng.bos.redhat.com gfs-i8c-03.mpc.lab.eng.bos.redhat.com gfs-i8c-04.mpc.lab.eng.bos.redhat.com"

# Make the filesystem once, then loop: mount everywhere, run the load,
# then race df against umount on every node.
qarsh root@$MAINNODE mkfs.gfs2 -p lock_dlm -t bobsrhel7:myfs -j4 /dev/clariioni/bmarzins
while true; do
        # Mount on every node.
        for NODE in $NODELIST; do
                qarsh root@$NODE mount -o errors=panic,data=ordered /dev/clariioni/bmarzins /mnt/gfs2
        done

        # I/O load.
        /usr/tests/sts-rhel7.2/gfs/bin/dd_io -m /mnt/gfs2 -l /usr/tests/sts-rhel7.2 -r /usr/tests/sts-rhel7.2/ -R bobsrhel7.xml -S QUICK

        # Run df followed by umount on each node, in the background.
        for NODE in $NODELIST; do
                qarsh root@$NODE df \; umount /mnt/gfs2 &
        done

        # A second umount pass on each node.
        sleep 2
        for NODE in $NODELIST; do
                qarsh root@$NODE umount /mnt/gfs2 &
        done

        # Check whether anything is still mounted.
        sleep 5
        for NODE in $NODELIST; do
                qarsh root@$NODE df \| grep gfs2
        done
done

#####

However, I've had no luck at all. I've tried remaking the filesystem every loop. I've tried different scenarios with dd_io. I've tried different ways of running the df and unmount commands, to see if that triggers it.  But it's been running in various configurations for days now, without hitting this.

I've been running on a recent commit to the master branch of linux-gfs2:
df6d04308666c887a42f85f27b1d1c38102ca4a1

I can try running on 24972557b12ce8fd5b6c6847d0e2ee1837ddc13b, but I just want to check to see if this seems sensible, and if you can hit the issue with a similar script, Nate.

Comment 9 Nate Straz 2015-12-02 20:45:50 UTC
The main difference is that I'm running the mount loop on each node separately, so the nodes aren't all mounting and unmounting at the same time.  I'm also using a different load that touches more files.

I gave the scenario in bug 1284566 a try and was surprised to see the node fail in just a few quick iterations with only one node involved.

$ dd if=/dev/zero of=/var/tmp/gfs2.img bs=1M count=256
$ mkfs.gfs2 -O -p lock_nolock /var/tmp/gfs2.img
$ mkdir -p /mnt/gfs2/
$ for i in `seq 1000`; do mount -o loop,errors=panic /var/tmp/gfs2.img /mnt/gfs2/; echo foo > /mnt/gfs2/foo-$i; umount /mnt/gfs2; done

Comment 10 Ben Marzinski 2015-12-03 19:26:21 UTC
This hits it for me. I'm looking into it now.

Comment 11 Ben Marzinski 2015-12-09 03:10:34 UTC
Created attachment 1103777 [details]
patch to move where the SDF_JOURNAL_LIVE bit is cleared

When gfs2 was unmounting filesystems or changing them to read-only it
was clearing the SDF_JOURNAL_LIVE bit before the final log flush.  This
caused a race.  If an inode glock got demoted in the gap between
clearing the bit and the shutdown flush, it would be unable to reserve
log space to clear out the active items list in inode_go_sync, causing an
error in inode_go_inval because the glock was still dirty.

To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
shutdown log flush.  This means that, because of the locking on the log
blocks, either inode_go_sync will be able to reserve space to clean the
glock before the shutdown flush, or the shutdown flush will clean the
glock itself, before inode_go_sync fails to reserve the space. Either
way, the glock will be clean before inode_go_inval.
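
A minimal user-space sketch of that ordering change (not the attached
patch): journal_live stands in for SDF_JOURNAL_LIVE, log_lock for the
locking around the log blocks, and the function names are invented; only
the relative order of clearing the bit and taking the lock mirrors the fix.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER; /* "log blocks" lock */
static bool journal_live = true;                             /* SDF_JOURNAL_LIVE  */

/* Models the inode_go_sync() side: the glock's active items list can only
 * be cleaned while the journal is still live, since cleaning it needs a
 * log-space reservation. */
static bool sync_glock_ail(void)
{
    bool cleaned;

    pthread_mutex_lock(&log_lock);
    cleaned = journal_live;
    pthread_mutex_unlock(&log_lock);
    return cleaned;
}

/* Old shutdown order: the bit was cleared before the shutdown log flush,
 * so a glock demoted in between could neither reserve log space nor be
 * cleaned by the flush -- the inode_go_inval assertion then fires. */
static void make_fs_ro_old(void)
{
    journal_live = false;            /* race window opens here */
    pthread_mutex_lock(&log_lock);   /* shutdown log flush */
    pthread_mutex_unlock(&log_lock);
}

/* Fixed shutdown order: the bit is cleared inside the shutdown flush, so a
 * concurrent sync either takes the lock first (journal still live) or runs
 * after the flush has already cleaned the glock. */
static void make_fs_ro_fixed(void)
{
    pthread_mutex_lock(&log_lock);   /* shutdown log flush */
    journal_live = false;
    pthread_mutex_unlock(&log_lock);
}

int main(void)
{
    printf("glock cleanable before shutdown: %d\n", sync_glock_ail());
    make_fs_ro_fixed();
    printf("glock cleanable after shutdown:  %d\n", sync_glock_ail());
    (void)make_fs_ro_old;            /* kept only to show the old order */
    return 0;
}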

Comment 12 Andreas Gruenbacher 2015-12-09 14:02:53 UTC
Looking good, thanks.

Comment 13 Mike McCune 2016-03-28 22:21:14 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

