Description of problem:
This appears to be strangely similar to bz 195392 which mysteriously was never seen again. I saw this deadlock while attempting to create a mirror. This was the 30th volume (all linears, stripes, and mirrors) to be created.

lvcreate -m 1 -n mirror5 -L 200M activator2

lvcreate      D 0000010121a84c88     0  6303   6379                     (NOTLB)
0000010116bdfbb8 0000000000000006 0000000000000000 ffffffff8025a595
       000001021b87e980 0000000000000000 000001021b87e980 0000000180250fd4
       0000010118be1030 000000000000e05f
Call Trace:
 <ffffffff8025a595>{cfq_next_request+59}
 <ffffffff802526fb>{generic_unplug_device+2
 <ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
 <ffffffff8030fb36>{io_schedule+38}
 <ffffffff8019d386>{__blockdev_direct_IO+2819}
 <ffffffff80181968>{blkdev_direct_IO+48}
 <ffffffff801818bd>{blkdev_get_blocks+0}
 <ffffffff8015cb7a>{generic_file_direct_IO+78}
 <ffffffff8015cc00>{generic_file_direct_w
 <ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
 <ffffffff8015d21f>{generic_file_aio_write_nolock+32}
 <ffffffff8015d3ed>{generic_file_write_nolock+158}
 <ffffffff8015d719>{generic_file_read
 <ffffffff80136020>{autoremove_wake_function+0}
 <ffffffff80195990>{dnotify_parent+34}
 <ffffffff80182804>{blkdev_file_write+26}
 <ffffffff8017af0e>{vfs_write+207}
 <ffffffff8017aff6>{sys_write+69}
 <ffffffff8011026a>{system_call+126}

More info to come...

Version-Release number of selected component (if applicable):
2.6.9-63.ELsmp
lvm2-2.02.27-2.el4
lvm2-cluster-2.02.27-2.el4
cmirror-1.0.1-1
device-mapper-1.02.21-1.el4
cmirror-kernel-smp-2.6.9-38.3
[root@grant-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-03
   3    1    6   M   link-07
   4    1    6   M   grant-01
   5    1    6   M   grant-02
   6    1    6   M   link-08

[root@grant-02 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[3 1 4 5 6 2]

DLM Lock Space:  "clustered_log"                     5   4 run       -
[3 2 1 4 5 6]
Created attachment 237971: device mapper info
Created attachment 238001: backtraces from link-02
Created attachment 238031: backtraces from link-07
Created attachment 238041: backtraces from link-08
Created attachment 238051: backtraces from grant-01
Created attachment 238061: backtraces from grant-02
Created attachment 238071: backtraces from grant-03
I was able to reproduce this bug, again while running activator. The following cmd resulted in a very similar backtrace to the one in comment #0:

lvcreate -m 1 -n mirror5 -L 200M activator4

lvcreate      D 00000101218c3488     0 16373  17292                     (NOTLB)
000001020ee6fbb8 0000000000000002 0000010112949520 0000000000000008
       000001020ff85400 ffffffff80253c9d ffffffff80136020 000000020ee6fb20
       0000010113f37030 0000000000003dd0
Call Trace:
 <ffffffff80253c9d>{generic_make_request+355}
 <ffffffff80136020>{autoremove_wake_function+0}
 <ffffffff802526fb>{generic_unplug_device+24}
 <ffffffffa00bcc89>{:dm_mod:dm_table_unplug_all+49}
 <ffffffff8030fb36>{io_schedule+38}
 <ffffffff8019d386>{__blockdev_direct_IO+2819}
 <ffffffff80181968>{blkdev_direct_IO+48}
 <ffffffff801818bd>{blkdev_get_blocks+0}
 <ffffffff8015cb7a>{generic_file_direct_IO+78}
 <ffffffff8015cc00>{generic_file_direct_write+96}
 <ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
 <ffffffff8015d21f>{generic_file_aio_write_nolock+32}
 <ffffffff8015d3ed>{generic_file_write_nolock+158}
 <ffffffff8015d719>{generic_file_read+187}
 <ffffffff80136020>{autoremove_wake_function+0}
 <ffffffff80195990>{dnotify_parent+34}
 <ffffffff80182804>{blkdev_file_write+26}
 <ffffffff8017af0e>{vfs_write+207}
 <ffffffff8017aff6>{sys_write+69}
 <ffffffff8011026a>{system_call+126}
Just a note that this bz is still present on the new 4.6.z lvm rpms.

lvm2-2.02.27-2.el4_6.1/lvm2-cluster-2.02.27-2.el4_6.1

000001020150fbb8 0000000000000006 00000101e7a3ee80 0000000000000008
       00000101fe3cc800 ffffffff80253c9d ffffffff80136020 000000030150fb20
       00000101f51e8030 0000000000005963
Call Trace:
 <ffffffff80253c9d>{generic_make_request+355}
 <ffffffff80136020>{autoremove_wake_function+0}
 <ffffffff802526fb>{generic_unplug_device+24}
 <ffffffffa003fc89>{:dm_mod:dm_table_unplug_all+49}
 <ffffffff8030fb36>{io_schedule+38}
 <ffffffff8019d386>{__blockdev_direct_IO+2819}
 <ffffffff80181968>{blkdev_direct_IO+48}
 <ffffffff801818bd>{blkdev_get_blocks+0}
 <ffffffff8015cb7a>{generic_file_direct_IO+78}
 <ffffffff8015cc00>{generic_file_direct_write+96}
 <ffffffff8015cf3c>{__generic_file_aio_write_nolock+662}
 <ffffffff8015d21f>{generic_file_aio_write_nolock+32}
 <ffffffff8015d3ed>{generic_file_write_nolock+158}
 <ffffffff8015d719>{generic_file_read+187}
 <ffffffff80136020>{autoremove_wake_function+0}
 <ffffffff80195990>{dnotify_parent+34}
 <ffffffff80182804>{blkdev_file_write+26}
 <ffffffff8017af0e>{vfs_write+207}
 <ffffffff8017aff6>{sys_write+69}
 <ffffffff8011026a>{system_call+126}
Was there anything in /var/log/messages that would suggest that the cmirror code is stuck?
The following post on dm-devel seems strangely similar to this bug (but has nothing to do with mirroring): https://www.redhat.com/archives/dm-devel/2008-March/msg00136.html
The dm-devel report was produced with the crypt target, but the problem is likely below that layer, since this issue is affecting mirror too.
Have not seen any cmirror creation deadlocks lately. Marking verified.
Closing, as the latest code has been released in 4.7.