Bug 353211 - deadlock during mirror creation (:dm_mod:dm_table_unplug_all)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: low
Severity: low
Assigned To: LVM and device-mapper development team
QA Contact: Cluster QE
Reported: 2007-10-25 16:35 EDT by Corey Marthaler
Modified: 2008-07-24 12:32 EDT
Doc Type: Bug Fix
Last Closed: 2008-07-24 12:32:30 EDT


Attachments
device mapper info (21.58 KB, text/plain)
2007-10-25 16:41 EDT, Corey Marthaler
backtraces from link-02 (16.67 KB, text/plain)
2007-10-25 16:50 EDT, Corey Marthaler
backtraces from link-07 (17.90 KB, text/plain)
2007-10-25 17:09 EDT, Corey Marthaler
backtraces from link-08 (17.04 KB, text/plain)
2007-10-25 17:09 EDT, Corey Marthaler
backtraces from grant-01 (10.78 KB, text/plain)
2007-10-25 17:10 EDT, Corey Marthaler
backtraces from grant-02 (11.79 KB, text/plain)
2007-10-25 17:10 EDT, Corey Marthaler
backtraces from grant-03 (9.04 KB, text/plain)
2007-10-25 17:11 EDT, Corey Marthaler

Description Corey Marthaler 2007-10-25 16:35:05 EDT
Description of problem:
This appears to be strangely similar to bz 195392, which mysteriously was never
seen again. I saw this deadlock while attempting to create a mirror. It was
the 30th volume (a mix of linears, stripes, and mirrors) to be created.

lvcreate -m 1 -n mirror5 -L 200M activator2

lvcreate      D 0000010121a84c88     0  6303   6379                     (NOTLB)
0000010116bdfbb8 0000000000000006 0000000000000000 ffffffff8025a595
000001021b87e980 0000000000000000 000001021b87e980 0000000180250fd4
0000010118be1030 000000000000e05f
Call Trace:
<ffffffff8025a595>{cfq_next_request+59}
<ffffffff802526fb>{generic_unplug_device+2
<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8030fb36>{io_schedule+38}
<ffffffff8019d386>{__blockdev_direct_IO+2819}
<ffffffff80181968>{blkdev_direct_IO+48}
<ffffffff801818bd>{blkdev_get_blocks+0}
<ffffffff8015cb7a>{generic_file_direct_IO+78}
<ffffffff8015cc00>{generic_file_direct_w
<ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
<ffffffff8015d21f>{generic_file_aio_write_nolock+32}
<ffffffff8015d3ed>{generic_file_write_nolock+158}
<ffffffff8015d719>{generic_file_read
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff80195990>{dnotify_parent+34}
<ffffffff80182804>{blkdev_file_write+26}
<ffffffff8017af0e>{vfs_write+207}
<ffffffff8017aff6>{sys_write+69}
<ffffffff8011026a>{system_call+126}
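
The `D` in the task line above marks lvcreate as being in uninterruptible sleep. As a minimal sketch (not part of the original report), the same state can be read from `/proc/<pid>/stat` on a live node. The helper names `parse_stat` and `uninterruptible_tasks` are hypothetical, chosen only for this illustration.

```python
import os


def parse_stat(stat_line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' rather than splitting on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            yield pid, comm
```

Calling `list(uninterruptible_tasks())` while a hang is in progress should include the stuck lvcreate; a task that stays in the list across repeated calls is a candidate for a backtrace dump.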

More info to come...


Version-Release number of selected component (if applicable):
2.6.9-63.ELsmp
lvm2-2.02.27-2.el4
lvm2-cluster-2.02.27-2.el4
cmirror-1.0.1-1
device-mapper-1.02.21-1.el4
cmirror-kernel-smp-2.6.9-38.3
Comment 1 Corey Marthaler 2007-10-25 16:38:11 EDT
[root@grant-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-03
   3    1    6   M   link-07
   4    1    6   M   grant-01
   5    1    6   M   grant-02
   6    1    6   M   link-08
[root@grant-02 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clustered_log"                     5   4 run       -
[3 2 1 4 5 6]

Comment 2 Corey Marthaler 2007-10-25 16:41:00 EDT
Created attachment 237971
device mapper info
Comment 3 Corey Marthaler 2007-10-25 16:50:59 EDT
Created attachment 238001
backtraces from link-02
Comment 4 Corey Marthaler 2007-10-25 17:09:03 EDT
Created attachment 238031
backtraces from link-07
Comment 5 Corey Marthaler 2007-10-25 17:09:26 EDT
Created attachment 238041
backtraces from link-08
Comment 6 Corey Marthaler 2007-10-25 17:10:17 EDT
Created attachment 238051
backtraces from grant-01
Comment 7 Corey Marthaler 2007-10-25 17:10:42 EDT
Created attachment 238061
backtraces from grant-02
Comment 8 Corey Marthaler 2007-10-25 17:11:56 EDT
Created attachment 238071
backtraces from grant-03
Comment 9 Corey Marthaler 2007-11-06 10:22:46 EST
I was able to reproduce this bug, again while running activator. The following
command produced a backtrace very similar to the one in comment #0:

lvcreate -m 1 -n mirror5 -L 200M activator4

lvcreate      D 00000101218c3488     0 16373  17292                     (NOTLB)
        000001020ee6fbb8 0000000000000002 0000010112949520 0000000000000008
        000001020ff85400 ffffffff80253c9d ffffffff80136020 000000020ee6fb20
        0000010113f37030 0000000000003dd0
Call Trace:
<ffffffff80253c9d>{generic_make_request+355}
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff802526fb>{generic_unplug_device+24}
<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8030fb36>{io_schedule+38}
<ffffffff8019d386>{__blockdev_direct_IO+2819}
<ffffffff80181968>{blkdev_direct_IO+48}
<ffffffff801818bd>{blkdev_get_blocks+0}
<ffffffff8015cb7a>{generic_file_direct_IO+78}
<ffffffff8015cc00>{generic_file_direct_write+96}
<ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
<ffffffff8015d21f>{generic_file_aio_write_nolock+32}
<ffffffff8015d3ed>{generic_file_write_nolock+158}
<ffffffff8015d719>{generic_file_read+187}
<ffffffff80136020>{autoremove_wake_function+0}
<ffffffff80195990>{dnotify_parent+34}
<ffffffff80182804>{blkdev_file_write+26}
<ffffffff8017af0e>{vfs_write+207}
<ffffffff8017aff6>{sys_write+69}
<ffffffff8011026a>{system_call+126}
Comment 10 Corey Marthaler 2007-12-10 15:22:37 EST
Just a note that this bz is still present on the new 4.6.z lvm rpms.

lvm2-2.02.27-2.el4_6.1/lvm2-cluster-2.02.27-2.el4_6.1

000001020150fbb8 0000000000000006 00000101e7a3ee80 0000000000000008
00000101fe3cc800 ffffffff80253c9d ffffffff80136020 000000030150fb20
00000101f51e8030 0000000000005963
Call Trace:
        <ffffffff80253c9d>{generic_make_request+355}
        <ffffffff80136020>{autoremove_wake_function+0}
        <ffffffff802526fb>{generic_unplug_device+24}
        <ffffffffa003fc89>{:dm_mod:dm_table_unplug_all+49}
        <ffffffff8030fb36>{io_schedule+38}
        <ffffffff8019d386>{__blockdev_direct_IO+2819}
        <ffffffff80181968>{blkdev_direct_IO+48}
        <ffffffff801818bd>{blkdev_get_blocks+0}
        <ffffffff8015cb7a>{generic_file_direct_IO+78}
        <ffffffff8015cc00>{generic_file_direct_write+96}
        <ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
        <ffffffff8015d21f>{generic_file_aio_write_nolock+32}
        <ffffffff8015d3ed>{generic_file_write_nolock+158}
        <ffffffff8015d719>{generic_file_read+187}
        <ffffffff80136020>{autoremove_wake_function+0}
        <ffffffff80195990>{dnotify_parent+34}
        <ffffffff80182804>{blkdev_file_write+26}
        <ffffffff8017af0e>{vfs_write+207}
        <ffffffff8017aff6>{sys_write+69}
        <ffffffff8011026a>{system_call+126}
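
The traces posted in this bug can be compared frame by frame rather than by eye. The following is a rough sketch, assuming the `<address>{symbol+offset}` layout seen in the pastes above; `FRAME_RE`, `parse_frames`, `symbols`, and `common_suffix` are hypothetical names introduced only for this illustration.

```python
import re

# Matches entries such as "<ffffffff8030fb36>{io_schedule+38}" and
# module-qualified ones such as
# "<ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}".
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(trace_text):
    """Return (address, module, symbol, offset) for each complete frame."""
    frames = []
    for addr, module, symbol, offset in FRAME_RE.findall(trace_text):
        frames.append((int(addr, 16), module or None, symbol, int(offset)))
    return frames


def symbols(trace_text):
    """Return just the symbol names, in call-trace order."""
    return [symbol for _, _, symbol, _ in parse_frames(trace_text)]


def common_suffix(a, b):
    """Longest run of identical items shared at the end of two lists."""
    shared = []
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        shared.append(x)
    return list(reversed(shared))
```

Passing the output of `symbols()` for two pasted traces to `common_suffix()` shows how much of the stack they share from `system_call` upward. Frames whose closing brace was cut off in the paste, as in comment #0, do not match the pattern and are skipped.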
Comment 12 Jonathan Earl Brassow 2008-03-20 12:13:47 EDT
Was there anything in /var/log/messages that would suggest that the cmirror code
is stuck?
Comment 13 Jonathan Earl Brassow 2008-03-26 14:07:30 EDT
The following post on dm-devel seems strangely similar to this bug (but has
nothing to do with mirroring).

https://www.redhat.com/archives/dm-devel/2008-March/msg00136.html
Comment 14 Jonathan Earl Brassow 2008-04-01 11:32:09 EDT
The report on dm-devel was produced with the crypt target, but the problem is
likely in a layer below that, as this issue is affecting mirror too.
Comment 17 Corey Marthaler 2008-07-01 11:23:09 EDT
Have not seen any cmirror creation deadlocks lately. Marking verified.
Comment 18 Chris Feist 2008-07-24 12:32:30 EDT
Closing, as the latest code has been released in 4.7.
