Bug 353211

Summary: deadlock during mirror creation (:dm_mod:dm_table_unplug_all)
Product: Red Hat Enterprise Linux 4
Reporter: Corey Marthaler <cmarthal>
Component: kernel
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: low
Docs Contact:
Priority: low
Version: 4.0
CC: agk, jbrassow, mbroz, mpatocka
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-07-24 16:32:30 UTC
Type: ---
Attachments:
Description                 Flags
device mapper info          none
backtraces from link-02     none
backtraces from link-07     none
backtraces from link-08     none
backtraces from grant-01    none
backtraces from grant-02    none
backtraces from grant-03    none

Description Corey Marthaler 2007-10-25 20:35:05 UTC
Description of problem:
This appears to be strangely similar to bz 195392 which mysteriously was never
seen again. I saw this deadlock while attempting to create a mirror. This was
the 30th volume (all linears, stripes, and mirrors) to be created.

lvcreate -m 1 -n mirror5 -L 200M activator2

lvcreate      D 0000010121a84c88     0  6303   6379                     (NOTLB)
0000010116bdfbb8 0000000000000006 0000000000000000 ffffffff8025a595
000001021b87e980 0000000000000000 000001021b87e980 0000000180250fd4
0000010118be1030 000000000000e05f
Call Trace:
<ffffffff8025a595>{cfq_next_request+59}
<ffffffff802526fb>{generic_unplug_device+2
<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8030fb36>{io_schedule+38}
<ffffffff8019d386>{__blockdev_direct_IO+2819}
<ffffffff80181968>{blkdev_direct_IO+48}
<ffffffff801818bd>{blkdev_get_blocks+0}
<ffffffff8015cb7a>{generic_file_direct_IO+78}
<ffffffff8015cc00>{generic_file_direct_w
<ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
<ffffffff8015d21f>{generic_file_aio_write_nolock+32}
<ffffffff8015d3ed>{generic_file_write_nolock+158}
<ffffffff8015d719>{generic_file_read
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff80195990>{dnotify_parent+34}
<ffffffff80182804>{blkdev_file_write+26}
<ffffffff8017af0e>{vfs_write+207}
<ffffffff8017aff6>{sys_write+69}
<ffffffff8011026a>{system_call+126}

More info to come...


Version-Release number of selected component (if applicable):
2.6.9-63.ELsmp
lvm2-2.02.27-2.el4
lvm2-cluster-2.02.27-2.el4
cmirror-1.0.1-1
device-mapper-1.02.21-1.el4
cmirror-kernel-smp-2.6.9-38.3

Comment 1 Corey Marthaler 2007-10-25 20:38:11 UTC
[root@grant-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-03
   3    1    6   M   link-07
   4    1    6   M   grant-01
   5    1    6   M   grant-02
   6    1    6   M   link-08
[root@grant-02 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clustered_log"                     5   4 run       -
[3 2 1 4 5 6]



Comment 2 Corey Marthaler 2007-10-25 20:41:00 UTC
Created attachment 237971 [details]
device mapper info

Comment 3 Corey Marthaler 2007-10-25 20:50:59 UTC
Created attachment 238001 [details]
backtraces from link-02

Comment 4 Corey Marthaler 2007-10-25 21:09:03 UTC
Created attachment 238031 [details]
backtraces from link-07

Comment 5 Corey Marthaler 2007-10-25 21:09:26 UTC
Created attachment 238041 [details]
backtraces from link-08

Comment 6 Corey Marthaler 2007-10-25 21:10:17 UTC
Created attachment 238051 [details]
backtraces from grant-01

Comment 7 Corey Marthaler 2007-10-25 21:10:42 UTC
Created attachment 238061 [details]
backtraces from grant-02

Comment 8 Corey Marthaler 2007-10-25 21:11:56 UTC
Created attachment 238071 [details]
backtraces from grant-03

Comment 9 Corey Marthaler 2007-11-06 15:22:46 UTC
I was able to reproduce this bug, again while running activator. The following
command resulted in a backtrace very similar to the one in comment #0:

lvcreate -m 1 -n mirror5 -L 200M activator4

lvcreate      D 00000101218c3488     0 16373  17292                     (NOTLB)
        000001020ee6fbb8 0000000000000002 0000010112949520 0000000000000008
        000001020ff85400 ffffffff80253c9d ffffffff80136020 000000020ee6fb20
        0000010113f37030 0000000000003dd0
Call Trace:
<ffffffff80253c9d>{generic_make_request+355}
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff802526fb>{generic_unplug_device+24}
<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8030fb36>{io_schedule+38}
<ffffffff8019d386>{__blockdev_direct_IO+2819}
<ffffffff80181968>{blkdev_direct_IO+48}
<ffffffff801818bd>{blkdev_get_blocks+0}
<ffffffff8015cb7a>{generic_file_direct_IO+78}
<ffffffff8015cc00>{generic_file_direct_write+96}
<ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
<ffffffff8015d21f>{generic_file_aio_write_nolock+32}
<ffffffff8015d3ed>{generic_file_write_nolock+158}
<ffffffff8015d719>{generic_file_read+187}
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff80195990>{dnotify_parent+34}
<ffffffff80182804>{blkdev_file_write+26}
<ffffffff8017af0e>{vfs_write+207}
<ffffffff8017aff6>{sys_write+69}
<ffffffff8011026a>{system_call+126}
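
For comparing traces like this one against the one in comment #0, here is a minimal sketch (not part of the original report) that extracts the frames and finds the portion two stacks share at the syscall end. The helper names `parse_frames` and `common_suffix` are hypothetical, and the regular expression assumes the `<address>{symbol+offset}` layout printed by this 2.6.9 kernel. Frames that were cut off in the paste, with no closing brace, are simply skipped.

```python
import re

# Matches frames in the trace format shown above, e.g.
# "<ffffffff8030fb36>{io_schedule+38}". A leading ":module:" is optional.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(trace_text):
    """Return (module, symbol, offset) tuples; truncated frames are skipped."""
    frames = []
    for _addr, module, symbol, offset in FRAME_RE.findall(trace_text):
        # Frames without a module prefix belong to the core kernel image.
        frames.append((module or "kernel", symbol, int(offset)))
    return frames


def common_suffix(a, b):
    """Return the frames two stacks share at the bottom (syscall end)."""
    shared = []
    for frame_a, frame_b in zip(reversed(a), reversed(b)):
        if frame_a != frame_b:
            break
        shared.append(frame_a)
    return list(reversed(shared))


if __name__ == "__main__":
    first = parse_frames(
        "<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}\n"
        "<ffffffff8030fb36>{io_schedule+38}\n"
    )
    second = parse_frames(
        "<ffffffff80253c9d>{generic_make_request+355}\n"
        "<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}\n"
        "<ffffffff8030fb36>{io_schedule+38}\n"
    )
    for module, symbol, offset in common_suffix(first, second):
        print(f"{module}:{symbol}+{offset}")
```

Comparing on (module, symbol, offset) rather than on raw addresses keeps the comparison meaningful when a module is loaded at a different address on another node.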


Comment 10 Corey Marthaler 2007-12-10 20:22:37 UTC
Just a note that this bz is still present on the new 4.6.z lvm rpms.

lvm2-2.02.27-2.el4_6.1/lvm2-cluster-2.02.27-2.el4_6.1

000001020150fbb8 0000000000000006 00000101e7a3ee80 0000000000000008
00000101fe3cc800 ffffffff80253c9d ffffffff80136020 000000030150fb20
00000101f51e8030 0000000000005963
Call Trace:
        <ffffffff80253c9d>{generic_make_request+355}
        <ffffffff80136020>{autoremove_wake_function+0}
        <ffffffff802526fb>{generic_unplug_device+24}
        <ffffffffa003fc89>{:dm_mod:dm_table_unplug_all+49}
        <ffffffff8030fb36>{io_schedule+38}
        <ffffffff8019d386>{__blockdev_direct_IO+2819}
        <ffffffff80181968>{blkdev_direct_IO+48}
        <ffffffff801818bd>{blkdev_get_blocks+0}
        <ffffffff8015cb7a>{generic_file_direct_IO+78}
        <ffffffff8015cc00>{generic_file_direct_write+96}
        <ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
        <ffffffff8015d21f>{generic_file_aio_write_nolock+32}
        <ffffffff8015d3ed>{generic_file_write_nolock+158}
        <ffffffff8015d719>{generic_file_read+187}
        <ffffffff80136020>{autoremove_wake_function+0}
        <ffffffff80195990>{dnotify_parent+34}
        <ffffffff80182804>{blkdev_file_write+26}
        <ffffffff8017af0e>{vfs_write+207}
        <ffffffff8017aff6>{sys_write+69}
        <ffffffff8011026a>{system_call+126}


Comment 12 Jonathan Earl Brassow 2008-03-20 16:13:47 UTC
Was there anything in /var/log/messages that would suggest that the cmirror code
is stuck?


Comment 13 Jonathan Earl Brassow 2008-03-26 18:07:30 UTC
The following post on dm-devel seems strangely similar to this bug (but has
nothing to do with mirroring).

https://www.redhat.com/archives/dm-devel/2008-March/msg00136.html


Comment 14 Jonathan Earl Brassow 2008-04-01 15:32:09 UTC
The problem in the dm-devel message was produced with the crypt target, but the
cause is likely below that layer, as this issue is affecting mirror too.

Comment 17 Corey Marthaler 2008-07-01 15:23:09 UTC
Have not seen any cmirror creation deadlocks lately. Marking verified.

Comment 18 Chris Feist 2008-07-24 16:32:30 UTC
Closing as the latest code has been released in 4.7.