Bug 523264

Summary: possible circular locking dependency detected during LVM snapshot stress test
Product: Red Hat Enterprise Linux 6 Reporter: Mike Gahagan <mgahagan>
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.0   
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-06 18:51:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mike Gahagan 2009-09-14 16:25:28 UTC
Description of problem:
device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-0.174.rc7.git2.fc12.x86_64 #1
-------------------------------------------------------
dd/24068 is trying to acquire lock:
 (ksnaphd){+.+...}, at: [<ffffffff8107d3c9>] flush_workqueue+0x0/0xd2

but task is already holding lock:
 (&s->lock){+.+.-.}, at: [<ffffffff8140b8a9>] origin_map+0x21a/0x3f2

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&s->lock){+.+.-.}:
       [<ffffffff81098960>] __lock_acquire+0xa79/0xc0e
       [<ffffffff81098be3>] lock_acquire+0xee/0x12e
       [<ffffffff8150353b>] down_write+0x5b/0xa4
       [<ffffffff8140c204>] snapshot_map+0x9b/0x3e6
       [<ffffffff81402966>] __map_bio+0x4c/0x121
       [<ffffffff81403da7>] __split_and_process_bio+0x278/0x537
       [<ffffffff81404493>] dm_request+0x20b/0x234
       [<ffffffff8125dfe5>] generic_make_request+0x29e/0x2fc
       [<ffffffff8125e126>] submit_bio+0xe3/0x100
       [<ffffffff811681ed>] submit_bh+0x103/0x13c
       [<ffffffff81169df6>] ll_rw_block+0xba/0x10c
       [<ffffffff8116a3ea>] __block_prepare_write+0x200/0x2b8
       [<ffffffff8116a684>] block_write_begin+0x98/0xfe
       [<ffffffff8116ee71>] blkdev_write_begin+0x39/0x4f
       [<ffffffff810fba9e>] generic_file_buffered_write+0x140/0x33c
       [<ffffffff810fc109>] __generic_file_aio_write_nolock+0x287/0x2d3
       [<ffffffff810fc374>] generic_file_aio_write_nolock+0x50/0xb8
       [<ffffffff811413bd>] do_sync_write+0xfa/0x14b
       [<ffffffff81141a71>] vfs_write+0xbd/0x12e
       [<ffffffff81141bd4>] sys_write+0x59/0x91
       [<ffffffff81012f42>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #2 (&md->io_lock){++++--}:
       [<ffffffff81098960>] __lock_acquire+0xa79/0xc0e
       [<ffffffff81098be3>] lock_acquire+0xee/0x12e
       [<ffffffff815035e2>] down_read+0x5e/0xa7
       [<ffffffff814042ee>] dm_request+0x66/0x234
       [<ffffffff8125dfe5>] generic_make_request+0x29e/0x2fc
       [<ffffffff8125e126>] submit_bio+0xe3/0x100
       [<ffffffff81409ca7>] dispatch_io+0x1a0/0x1e7
       [<ffffffff81409ecb>] dm_io+0x14f/0x26a
       [<ffffffff8140da81>] do_metadata+0x38/0x56
       [<ffffffff8107c58c>] worker_thread+0x222/0x33e
       [<ffffffff81082194>] kthread+0xac/0xb4
       [<ffffffff8101412a>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&req.work){+.+...}:
       [<ffffffff81098960>] __lock_acquire+0xa79/0xc0e
       [<ffffffff81098be3>] lock_acquire+0xee/0x12e
       [<ffffffff8107c586>] worker_thread+0x21c/0x33e
       [<ffffffff81082194>] kthread+0xac/0xb4
       [<ffffffff8101412a>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (ksnaphd){+.+...}:
       [<ffffffff8109883d>] __lock_acquire+0x956/0xc0e
       [<ffffffff81098be3>] lock_acquire+0xee/0x12e
       [<ffffffff8107d439>] flush_workqueue+0x70/0xd2
       [<ffffffff8140dc2d>] chunk_io+0xf5/0x118
       [<ffffffff8140dcb8>] write_header+0x68/0x7e
       [<ffffffff8140dcfa>] persistent_drop_snapshot+0x2c/0x54
       [<ffffffff8140b3ad>] __invalidate_snapshot+0x5d/0x95
       [<ffffffff8140b935>] origin_map+0x2a6/0x3f2
       [<ffffffff81402966>] __map_bio+0x4c/0x121
       [<ffffffff81403da7>] __split_and_process_bio+0x278/0x537
       [<ffffffff81404493>] dm_request+0x20b/0x234
       [<ffffffff8125dfe5>] generic_make_request+0x29e/0x2fc
       [<ffffffff8125e126>] submit_bio+0xe3/0x100
       [<ffffffff811681ed>] submit_bh+0x103/0x13c
       [<ffffffff8116ae91>] __block_write_full_page+0x1e5/0x2e3
       [<ffffffff8116b027>] block_write_full_page_endio+0x98/0xbb
       [<ffffffff8116b072>] block_write_full_page+0x28/0x3e
       [<ffffffff8116eef3>] blkdev_writepage+0x2b/0x41
       [<ffffffff811029e8>] __writepage+0x29/0x5f
       [<ffffffff81103239>] write_cache_pages+0x27b/0x415
       [<ffffffff8110340b>] generic_writepages+0x38/0x4e
       [<ffffffff8110345e>] do_writepages+0x3d/0x63
       [<ffffffff810fb8bc>] __filemap_fdatawrite_range+0x60/0x76
       [<ffffffff810fc1cc>] filemap_fdatawrite+0x32/0x48
       [<ffffffff810fc215>] filemap_write_and_wait+0x33/0x64
       [<ffffffff8116f2b6>] __sync_blockdev+0x43/0x59
       [<ffffffff8116f2f2>] sync_blockdev+0x26/0x3c
       [<ffffffff8116f3bc>] __blkdev_put+0x6c/0x161
       [<ffffffff8116f4d4>] blkdev_put+0x23/0x39
       [<ffffffff8116f53b>] blkdev_close+0x51/0x6c
       [<ffffffff81142f75>] __fput+0x137/0x200
       [<ffffffff8114306b>] fput+0x2d/0x43
       [<ffffffff8113f250>] filp_close+0x77/0x97
       [<ffffffff8113f330>] sys_close+0xc0/0x110
       [<ffffffff81012f42>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

4 locks held by dd/24068:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8116f398>] __blkdev_put+0x48/0x161
 #1:  (&md->io_lock){++++--}, at: [<ffffffff814042ee>] dm_request+0x66/0x234
 #2:  (&_origins_lock){++++..}, at: [<ffffffff8140b6f6>] origin_map+0x67/0x3f2
 #3:  (&s->lock){+.+.-.}, at: [<ffffffff8140b8a9>] origin_map+0x21a/0x3f2

stack backtrace:
Pid: 24068, comm: dd Not tainted 2.6.31-0.174.rc7.git2.fc12.x86_64 #1
Call Trace:
 [<ffffffff81097b20>] print_circular_bug_tail+0x80/0x9f
 [<ffffffff81097a68>] ? check_noncircular+0xb0/0xe8
 [<ffffffff8109883d>] __lock_acquire+0x956/0xc0e
 [<ffffffff81098be3>] lock_acquire+0xee/0x12e
 [<ffffffff8107d3c9>] ? flush_workqueue+0x0/0xd2
 [<ffffffff8107d3c9>] ? flush_workqueue+0x0/0xd2
 [<ffffffff8107d439>] flush_workqueue+0x70/0xd2
 [<ffffffff8107d3c9>] ? flush_workqueue+0x0/0xd2
 [<ffffffff8107d754>] ? queue_work_on+0x59/0x74
 [<ffffffff8140dc2d>] chunk_io+0xf5/0x118
 [<ffffffff8140da49>] ? do_metadata+0x0/0x56
 [<ffffffff8140dcb8>] write_header+0x68/0x7e
 [<ffffffff8140dcfa>] persistent_drop_snapshot+0x2c/0x54
 [<ffffffff8140b3ad>] __invalidate_snapshot+0x5d/0x95
 [<ffffffff8140b935>] origin_map+0x2a6/0x3f2
 [<ffffffff81402966>] __map_bio+0x4c/0x121
 [<ffffffff81403da7>] __split_and_process_bio+0x278/0x537
 [<ffffffff810956f1>] ? lock_release_holdtime+0x3f/0x146
 [<ffffffff814042ee>] ? dm_request+0x66/0x234
 [<ffffffff81404493>] dm_request+0x20b/0x234
 [<ffffffff8125dfe5>] generic_make_request+0x29e/0x2fc
 [<ffffffff8125e126>] submit_bio+0xe3/0x100
 [<ffffffff811681ed>] submit_bh+0x103/0x13c
 [<ffffffff8116ae91>] __block_write_full_page+0x1e5/0x2e3
 [<ffffffff8116dd5d>] ? blkdev_get_block+0x0/0x8d
 [<ffffffff8116b765>] ? end_buffer_async_write+0x0/0x175
 [<ffffffff8116b765>] ? end_buffer_async_write+0x0/0x175
 [<ffffffff8116dd5d>] ? blkdev_get_block+0x0/0x8d
 [<ffffffff8116b027>] block_write_full_page_endio+0x98/0xbb
 [<ffffffff8116b072>] block_write_full_page+0x28/0x3e
 [<ffffffff8116eef3>] blkdev_writepage+0x2b/0x41
 [<ffffffff811029e8>] __writepage+0x29/0x5f
 [<ffffffff81103239>] write_cache_pages+0x27b/0x415
 [<ffffffff811029bf>] ? __writepage+0x0/0x5f
 [<ffffffff8110340b>] generic_writepages+0x38/0x4e
 [<ffffffff8110345e>] do_writepages+0x3d/0x63
 [<ffffffff810fb8bc>] __filemap_fdatawrite_range+0x60/0x76
 [<ffffffff810fc1cc>] filemap_fdatawrite+0x32/0x48
 [<ffffffff810fc215>] filemap_write_and_wait+0x33/0x64
 [<ffffffff8116f2b6>] __sync_blockdev+0x43/0x59
 [<ffffffff8116f2f2>] sync_blockdev+0x26/0x3c
 [<ffffffff8116f3bc>] __blkdev_put+0x6c/0x161
 [<ffffffff8116f4d4>] blkdev_put+0x23/0x39
 [<ffffffff8116f53b>] blkdev_close+0x51/0x6c
 [<ffffffff81142f75>] __fput+0x137/0x200
 [<ffffffff8114306b>] fput+0x2d/0x43
 [<ffffffff8113f250>] filp_close+0x77/0x97
 [<ffffffff8113f330>] sys_close+0xc0/0x110
 [<ffffffff81012f42>] system_call_fastpath+0x16/0x1b


Version-Release number of selected component (if applicable):
kernel-2.6.31-0.174.rc7.git2.fc12
  RHEL6.0-20090909.7  	

How reproducible:
appears to be always, have not tried on any arch other than x86_64 due to installation issues

Steps to Reproduce:
1.Start an RHTS job with test 
2./kernel/storage/lvm/snapshot_remove
3.
  
Actual results:
see above as well as:
https://rhts.redhat.com/testlogs/2009/09/88907/259139/2109459/9959401-console_dmesg--kernel-storage-lvm-snapshot_remove.log

Expected results:
device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
should be the only kernel messages present while this test runs.

Additional info:

also noted on one run that lvcreate seg-faulted, I'm not sure if it is related or a separate issue (it occured much later than the circular lock dependancy warning)

Need someone familiar with LVM to determine if this should be a blocker or not. Milan Broz wrote the original test case that became this RHTS test.

Comment 1 RHEL Program Management 2009-09-14 16:37:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 2 Mike Gahagan 2009-09-14 16:45:19 UTC
bz 523267 opened for the lvmcreate segfault issue, seems to have hit a NULL
pointer bug in the kernel.

Comment 3 Mike Gahagan 2010-07-06 18:51:37 UTC
Unable to reproduce this on the current shapshot 7 candidate tree (-42 kernel)