Bug 195392

Summary: mirror creation cmd hangs
Product: [Retired] Red Hat Cluster Suite Reporter: Corey Marthaler <cmarthal>
Component: cmirrorAssignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 4CC: agk, cfeist, dwysocha, mbroz
Target Milestone: ---Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-08-05 21:31:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 180185, 180281    

Description Corey Marthaler 2006-06-14 23:02:22 UTC
Description of problem:
[root@taft-04 ~]# lvcreate -m 1 -L 800M -n mirror3 corey
[HANG]

It gets stuck at 99.5% because the sync finish count is off by one:

mirror3 corey mwi-a- 800.00M                    mirror3_mlog  99.50

corey-mirror3: 0 1638400 mirror 2 253:11 253:12 1599/1600 1 AA 3 clustered_disk
253:10 A

Version-Release number of selected component (if applicable):
[root@taft-02 tmp]# rpm -q cmirror
cmirror-1.0.0-5
[root@taft-02 tmp]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-5.4


How reproducible:
often

Comment 2 Jonathan Earl Brassow 2006-06-21 21:28:47 UTC
please retest with kernel > 39.EL

Comment 3 Jonathan Earl Brassow 2006-06-27 20:40:10 UTC
This bug also manifests itself when a device dies in a cluster mirror.  The
cluster log improperly decrements the sync count - leaving other nodes to hang
and give I/O errors.


Comment 4 Corey Marthaler 2006-06-30 16:31:56 UTC
Maybe this isn't just due to an 'off by one error' with the sync count.

I supposedly have the lastest cmirror images and even though the mirror status
appears to show that the mirror has been completed, the lvcreate cmd hangs.



[root@taft-03 ~]# lvcreate -n mirror4 -L 10G -m 1 corey
[HANG]


[...]
Jun 30 05:21:04 taft-02 kernel: device-mapper: 175 out-of-sync regions remaining
for FYOOXN2.
Jun 30 05:21:04 taft-02 kernel: device-mapper: 150 out-of-sync regions remaining
for FYOOXN2.
Jun 30 05:21:04 taft-02 kernel: device-mapper: 125 out-of-sync regions remaining
for FYOOXN2.
Jun 30 05:21:05 taft-02 kernel: device-mapper: 100 out-of-sync regions remaining
for FYOOXN2.
Jun 30 05:21:05 taft-02 kernel: device-mapper: 75 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:05 taft-02 kernel: device-mapper: 50 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 25 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 5 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 4 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 3 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 2 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 1 out-of-sync regions remaining
for FYOOXN2T.
Jun 30 05:21:06 taft-02 kernel: device-mapper: 0 out-of-sync regions remaining
for FYOOXN2T.


[root@taft-04 ~]# lvs
  LV      VG    Attr   LSize   Origin Snap%  Move Log          Copy%
  mirror1 corey mwi-a-  10.00G                    mirror1_mlog 100.00
  mirror3 corey mwi-a- 100.00G                    mirror2_mlog 100.00
  mirror4 corey mwi-a-  10.00G                    mirror4_mlog 100.00


[root@taft-04 ~]# dmsetup status
corey-mirror1_mlog: 0 8192 linear
corey-mirror4: 0 20971520 mirror 2 253:11 253:12 20480/20480 1 AA 3
clustered_disk 253:10 A
corey-mirror4_mimage_1: 0 20971520 linear
corey-mirror2_mlog: 0 8192 linear
corey-mirror3: 0 209715200 mirror 2 253:7 253:8 204800/204800 1 AA 3
clustered_disk 253:6 A
corey-mirror4_mimage_0: 0 20971520 linear
corey-mirror2_mimage_1: 0 209715200 linear
corey-mirror1: 0 20971520 mirror 2 253:3 253:4 20480/20480 1 AA 3 clustered_disk
253:2 A
corey-mirror2_mimage_0: 0 209715200 linear
corey-mirror4_mlog: 0 8192 linear
corey-mirror1_mimage_1: 0 20971520 linear
corey-mirror1_mimage_0: 0 20971520 linear


[root@taft-01 ~]# uname -ar
Linux taft-01 2.6.9-39.1.ELsmp #1 SMP Fri Jun 16 16:47:43 EDT 2006 x86_64 x86_64
x86_64 GNU/Linux

[root@taft-01 ~]# rpm -q device-mapper
device-mapper-1.02.07-2.0.RHEL4
[root@taft-01 ~]# rpm -q lvm2
lvm2-2.02.06-3.0.RHEL4
[root@taft-01 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-1.0.RHEL4
[root@taft-01 ~]# rpm -q cmirror
cmirror-1.0.0-5
[root@taft-01 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-8.0



Comment 5 Corey Marthaler 2006-07-05 16:20:41 UTC
I've verified that this isn't caused by an off by one issue. I was testing the
fix for bz 193596 to see if the sync restarts correctly after recovery or server
relocation and the fix does appear to work, however, even though the volume
appears to be finished, the original lvcreate command continues to hang.



4 D root      4675  4120  0  77   0 - 15217 blockd 05:05 pts/0    00:00:00
lvcreate -L 1G -n coreymirror2 -m 1 corey



Jul  5 05:18:47 taft-03 kernel: clvmd         X 000001020cd1df18     0  4363  
4289          4668       (L-TLB)
Jul  5 05:18:47 taft-03 kernel: 000001020cd1def8 0000000000000046
00000102179d83c0 00000102159767f0
Jul  5 05:18:47 taft-03 kernel:        0000000000000046 ffffffff80142a3f
0000000000000011 0000000300040001
Jul  5 05:18:47 taft-03 kernel:        00000102159767f0 000000000011e5f9
Jul  5 05:18:47 taft-03 kernel: Call
Trace:<ffffffff80142a3f>{do_notify_parent+382} <ffffffff80178d80>{filp_close+103}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80309f08>{__cond_resched+23}
<ffffffff8013a745>{do_exit+3137}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8013a856>{sys_exit_group+0}
<ffffffff8011024a>{system_call+126}
Jul  5 05:18:47 taft-03 kernel:
Jul  5 05:18:47 taft-03 kernel: cluster_log_s S ffffffff803094a8     0  4628   
  1          4676  4289 (L-TLB)
Jul  5 05:18:47 taft-03 kernel: 000001020cb3b988 0000000000000046
ffffffff803d4000 0000000000000246
Jul  5 05:18:47 taft-03 kernel:        0000000000000001 0000000900000000
000000f800001000 0000000006aee000
Jul  5 05:18:47 taft-03 kernel:        000001020a8a67f0 00000000000000e8
Jul  5 05:18:47 taft-03 kernel: Call
Trace:<ffffffff8030a830>{schedule_timeout+224}
<ffffffff802afe87>{dev_queue_xmit+525}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8013560a>{prepare_to_wait_exclusive+21}
<ffffffff802acbc5>{skb_recv_datagram+373}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0}
<ffffffff802cdee9>{ip_push_pending_frames+855}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0} <ffffffff802e91ec>{udp_recvmsg+118}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff802aa6dd>{sock_common_recvmsg+48} <ffffffff802a7204>{sock_recvmsg+284}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80133d5a>{__wake_up_common+67}
<ffffffff80133db0>{__wake_up+54}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0} <ffffffff8013fc14>{__mod_timer+293}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffffa026d0a4>{:dm_cmirror:my_recvmsg+265}
<ffffffffa026cf8c>{:dm_cmirror:set_sigusr1+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffffa026cf8c>{:dm_cmirror:set_sigusr1+0}
<ffffffffa026c528>{:dm_cmirror:cluster_log_serverd+735}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff80133d09>{default_wake_function+0} <ffffffff80110f27>{child_rip+8}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffffa026c249>{:dm_cmirror:cluster_log_serverd+0}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80110f1f>{child_rip+0}
Jul  5 05:18:47 taft-03 kernel: kcopyd        S ffffffffa0094c84     0  4667   
 10                  88 (L-TLB)
Jul  5 05:18:47 taft-03 kernel: 000001020a751e68 0000000000000046
000001020a751de8 ffffffff80133db0
Jul  5 05:18:47 taft-03 kernel:        0000000000000000 000001020b32e748
0000000000000000 000000010b0ca310
Jul  5 05:18:47 taft-03 kernel:        0000010037c68030 0000000000000319
Jul  5 05:18:47 taft-03 kernel: Call Trace:<ffffffff80133db0>{__wake_up+54}
<ffffffffa0094c84>{:dm_mod:do_work+0}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff801476f1>{worker_thread+226}
<ffffffff80133d09>{default_wake_function+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff80133d09>{default_wake_function+0}
<ffffffff8014b450>{keventd_create_kthread+0}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8014760f>{worker_thread+0}
<ffffffff8014b450>{keventd_create_kthread+0}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8014b427>{kthread+200}
<ffffffff80110f27>{child_rip+8}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8014b450>{keventd_create_kthread+0} <ffffffff8014b35f>{kthread+0}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80110f1f>{child_rip+0}
Jul  5 05:18:47 taft-03 kernel: clvmd         S 0000000000000010     0  4668  
4289          4669  4363 (NOTLB)
Jul  5 05:18:47 taft-03 kernel: 000001020aec5d78 0000000000000006
0000010006937030 0000001900000078
Jul  5 05:18:47 taft-03 kernel:        0000010212942030 0000000000000078
000000d001079a40 0000000100000246
Jul  5 05:18:47 taft-03 kernel:        000001020a9367f0 000000000000048a
Jul  5 05:18:47 taft-03 kernel: Call Trace:<ffffffff8013fc14>{__mod_timer+293}
<ffffffff8030a8bf>{schedule_timeout+367}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8014063e>{process_timeout+0}
<ffffffff8018b73b>{do_select+939}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8018b2d5>{__pollwait+0}
<ffffffff8018baba>{sys_select+820}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80193210>{dnotify_parent+34}
<ffffffff8011024a>{system_call+126}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8011024a>{system_call+126}
Jul  5 05:18:47 taft-03 kernel: clvmd         S ffffffffa0093074     0  4669  
4289          4714  4668 (NOTLB)
Jul  5 05:18:47 taft-03 kernel: 000001020d2d1cf8 0000000000000006
0000010214d2bf4c ffffffff803050ff
Jul  5 05:18:47 taft-03 kernel:        0000000000200200 0000000000000000
000001020d2d1cc8 0000000000000000
Jul  5 05:18:47 taft-03 kernel:        00000102175a2030 00000000000145be
Jul  5 05:18:47 taft-03 kernel: Call
Trace:<ffffffff803050ff>{unix_dgram_sendmsg+1059}
<ffffffffa0093074>{:dm_mod:dev_wait+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffffa00906ce>{:dm_mod:dm_wait_event+151}
<ffffffff801356b2>{autoremove_wake_function+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0} <ffffffff801ea679>{__up_read+16}
Jul  5 05:18:47 taft-03 kernel:        <ffffffffa009316b>{:dm_mod:dev_wait+247}
<ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80181c97>{sys_newstat+32}
<ffffffff8018ad41>{sys_ioctl+853}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8011024a>{system_call+126}
Jul  5 05:18:47 taft-03 kernel: clvmd         S ffffffffa0093074     0  4714  
4289                4669 (NOTLB)
Jul  5 05:18:47 taft-03 kernel: 000001020b7a1cf8 0000000000000006
0000010214d2bf4c ffffffff803050ff
Jul  5 05:18:47 taft-03 kernel:        0000000000200200 0000000000000000
000001020b7a1cc8 0000000000000000
Jul  5 05:18:47 taft-03 kernel:        000001020d6f7030 0000000000011d57
Jul  5 05:18:47 taft-03 kernel: Call
Trace:<ffffffff803050ff>{unix_dgram_sendmsg+1059}
<ffffffffa0093074>{:dm_mod:dev_wait+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffffa00906ce>{:dm_mod:dm_wait_event+151}
<ffffffff801356b2>{autoremove_wake_function+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0} <ffffffff801ea679>{__up_read+16}
Jul  5 05:18:47 taft-03 kernel:        <ffffffffa009316b>{:dm_mod:dev_wait+247}
<ffffffffa0093c34>{:dm_mod:ctl_ioctl+600}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80181c97>{sys_newstat+32}
<ffffffff8018ad41>{sys_ioctl+853}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8011024a>{system_call+126}
Jul  5 05:18:47 taft-03 kernel: lvcreate      D 00000102179c7488     0  4675  
4120                     (NOTLB)
Jul  5 05:18:47 taft-03 kernel: 000001020cd77bb8 0000000000000006
00000101fffb47e0 0000000000000008
Jul  5 05:18:47 taft-03 kernel:        000001020bbc7700 ffffffff8024f7d2
ffffffff801356b2 000000030cd77b20
Jul  5 05:18:47 taft-03 kernel:        0000010213ea1030 0000000000005312
Jul  5 05:18:47 taft-03 kernel: Call
Trace:<ffffffff8024f7d2>{generic_make_request+355}
<ffffffff801356b2>{autoremove_wake_function+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8024e1f5>{generic_unplug_device+24}
<ffffffffa00916bc>{:dm_mod:dm_table_unplug_all+49}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8030a713>{io_schedule+38}
<ffffffff8019abd3>{__blockdev_direct_IO+3005}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff8017fe24>{blkdev_direct_IO+48}
<ffffffff8017fd79>{blkdev_get_blocks+0}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8015bb1e>{generic_file_direct_IO+78}
<ffffffff8015bbab>{generic_file_direct_write+103}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8015bec8>{__generic_file_aio_write_nolock+662}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8015c1ab>{generic_file_aio_write_nolock+32}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff8015c379>{generic_file_write_nolock+158}
<ffffffff8015c6ae>{generic_file_read+187}
Jul  5 05:18:47 taft-03 kernel:       
<ffffffff801356b2>{autoremove_wake_function+0} <ffffffff80193210>{dnotify_parent+34}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff80180cc0>{blkdev_file_write+26}
<ffffffff801795f8>{vfs_write+207}
Jul  5 05:18:47 taft-03 kernel:        <ffffffff801796e0>{sys_write+69}
<ffffffff8011024a>{system_call+126}


Comment 6 Nate Straz 2006-07-05 22:00:49 UTC
I'm trying to create a mirror in system-config-lvm and I ran into the same
problem.  In my case I wasn't doing clustered mirrors.  After the mirror sync
completed, lvcreate did not return.

 7401 ?        SLsl   0:00 /usr/sbin/lvcreate -n mir -l 2171 -m1 vg /dev/sdb1 /d
 7464 pts/0    R+     0:00 ps ax
[root@tank-01 ~]# strace -p 7401
Process 7401 attached - interrupt to quit
select(11, [10], NULL, NULL, {0, 340000}) = 0 (Timeout)
select(11, [10], NULL, NULL, {1, 0})    = 0 (Timeout)
select(11, [10], NULL, NULL, {1, 0})    = 0 (Timeout)
select(11, [10], NULL, NULL, {1, 0} <unfinished ...>
Process 7401 detached
[root@tank-01 ~]# lsof -p 7401
COMMAND   PID USER   FD   TYPE     DEVICE    SIZE    NODE NAME
lvcreate 7401 root  cwd    DIR      253,0    4096       2 /
lvcreate 7401 root  rtd    DIR      253,0    4096       2 /
lvcreate 7401 root  txt    REG      253,0  513680 4092763 /usr/sbin/lvm
lvcreate 7401 root  mem    REG      253,0    6824 2976199
/lib/libdevmapper-event-lvm2mirror.so.2.02
lvcreate 7401 root  mem    REG      253,0  487432 4092493
/usr/lib/liblvm2cmd.so.2.02
lvcreate 7401 root  mem    REG      253,0  106397 2977177 /lib/ld-2.3.4.so
lvcreate 7401 root  mem    REG      253,0 1454802 2977178 /lib/tls/libc-2.3.4.so
lvcreate 7401 root  mem    REG      253,0   15324 2977180 /lib/libdl-2.3.4.so
lvcreate 7401 root  mem    REG      253,0   53736 2976157 /lib/libsepol.so.1
lvcreate 7401 root  mem    REG      253,0   59876 2977197 /lib/libdevmapper.so.1.02
lvcreate 7401 root  mem    REG      253,0   21960 2976183
/lib/libdevmapper-event.so.1.02
lvcreate 7401 root  mem    REG      253,0   93985 2977181
/lib/tls/libpthread-2.3.4.so
lvcreate 7401 root  mem    REG      253,0  170320 4095373
/usr/lib/libreadline.so.4.3
lvcreate 7401 root  mem    REG      253,0   56328 2976144 /lib/libselinux.so.1
lvcreate 7401 root  mem    REG      253,0  879961 4092483 /usr/lib/libncurses.so.5.4
lvcreate 7401 root    0u   CHR      136,0               2 /dev/pts/0
lvcreate 7401 root    1w  FIFO        0,7           20978 pipe
lvcreate 7401 root    2w  FIFO        0,7           20979 pipe
lvcreate 7401 root    3u  FIFO      253,0         3272290 /var/run/dmeventd-server
lvcreate 7401 root    4u   REG      253,0       0 3270738 /var/lock/lvm/V_vg
(deleted)
lvcreate 7401 root    5u   BLK        8,1             952 /dev/sda1
lvcreate 7401 root    6u   BLK       8,17            1001 /dev/sdb1
lvcreate 7401 root    7u   BLK       8,33            1050 /dev/sdc1
lvcreate 7401 root    8uW  REG      253,0       5 3272300 /var/run/dmeventd.pid
lvcreate 7401 root    9w  FIFO      253,0         3272290 /var/run/dmeventd-server
lvcreate 7401 root   10u  FIFO      253,0         3270744 /var/run/dmeventd-client
lvcreate 7401 root   11u  unix 0xf6989500           21164 socket
lvcreate 7401 root   12u   CHR      10,63            1060 /dev/mapper/control

/var/log/messages
Jul  5 16:58:29 tank-01 lvm[7401]: Monitoring mirror device, vg-mir for events 
Jul  5 17:06:09 tank-01 lvm[7401]: vg-mir is now in-sync 


Comment 7 Jonathan Earl Brassow 2006-07-06 03:34:36 UTC
this no longer looks like a kernel issue, but it does smell like some issues I was seeing when dmeventd 
was not responding to requests.  If so, this has been fixed under the guise of another bug as well.

Marking modified...

Comment 8 Corey Marthaler 2006-07-06 22:28:19 UTC
Still seeing this, though not as freguently anymore. 
Instead of failing 100% of the time, it now seems to only fail around 10% of the
time.

Comment 9 Jonathan Earl Brassow 2006-07-13 14:30:49 UTC
Is it possible to get a verbose output from the create command?


Comment 11 Jonathan Earl Brassow 2006-08-07 20:01:56 UTC
moving to modified to draw more attention.

Comment 14 Corey Marthaler 2006-11-02 23:14:59 UTC
Have not seen this issue with the latest build.

Comment 15 RHEL Program Management 2006-12-12 17:17:00 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being marked as a blocker for this release.  

Please resolve ASAP.

Comment 16 RHEL Program Management 2006-12-12 17:17:00 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being marked as a blocker for this release.  

Please resolve ASAP.

Comment 17 Chris Feist 2008-08-05 21:31:50 UTC
Closing as this has been fixed in the current (4.7) release.