Bug 195392
Summary: | mirror creation cmd hangs | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4 | CC: | agk, cfeist, dwysocha, mbroz |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-08-05 21:31:50 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 180185, 180281 |
Description
Corey Marthaler
2006-06-14 23:02:22 UTC
please retest with kernel > 39.EL This bug also manifests itself when a device dies in a cluster mirror. The cluster log improperly decrements the sync count - leaving other nodes to hang and give I/O errors. Maybe this isn't just due to an 'off by one error' with the sync count. I supposedly have the lastest cmirror images and even though the mirror status appears to show that the mirror has been completed, the lvcreate cmd hangs. [root@taft-03 ~]# lvcreate -n mirror4 -L 10G -m 1 corey [HANG] [...] Jun 30 05:21:04 taft-02 kernel: device-mapper: 175 out-of-sync regions remaining for FYOOXN2. Jun 30 05:21:04 taft-02 kernel: device-mapper: 150 out-of-sync regions remaining for FYOOXN2. Jun 30 05:21:04 taft-02 kernel: device-mapper: 125 out-of-sync regions remaining for FYOOXN2. Jun 30 05:21:05 taft-02 kernel: device-mapper: 100 out-of-sync regions remaining for FYOOXN2. Jun 30 05:21:05 taft-02 kernel: device-mapper: 75 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:05 taft-02 kernel: device-mapper: 50 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 25 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 5 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 4 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 3 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 2 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 1 out-of-sync regions remaining for FYOOXN2T. Jun 30 05:21:06 taft-02 kernel: device-mapper: 0 out-of-sync regions remaining for FYOOXN2T. [root@taft-04 ~]# lvs LV VG Attr LSize Origin Snap% Move Log Copy% mirror1 corey mwi-a- 10.00G mirror1_mlog 100.00 mirror3 corey mwi-a- 100.00G mirror2_mlog 100.00 mirror4 corey mwi-a- 10.00G mirror4_mlog 100.00 [root@taft-04 ~]# dmsetup status corey-mirror1_mlog: 0 8192 linear corey-mirror4: 0 20971520 mirror 2 253:11 253:12 20480/20480 1 AA 3 clustered_disk 253:10 A corey-mirror4_mimage_1: 0 20971520 linear corey-mirror2_mlog: 0 8192 linear corey-mirror3: 0 209715200 mirror 2 253:7 253:8 204800/204800 1 AA 3 clustered_disk 253:6 A corey-mirror4_mimage_0: 0 20971520 linear corey-mirror2_mimage_1: 0 209715200 linear corey-mirror1: 0 20971520 mirror 2 253:3 253:4 20480/20480 1 AA 3 clustered_disk 253:2 A corey-mirror2_mimage_0: 0 209715200 linear corey-mirror4_mlog: 0 8192 linear corey-mirror1_mimage_1: 0 20971520 linear corey-mirror1_mimage_0: 0 20971520 linear [root@taft-01 ~]# uname -ar Linux taft-01 2.6.9-39.1.ELsmp #1 SMP Fri Jun 16 16:47:43 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux [root@taft-01 ~]# rpm -q device-mapper device-mapper-1.02.07-2.0.RHEL4 [root@taft-01 ~]# rpm -q lvm2 lvm2-2.02.06-3.0.RHEL4 [root@taft-01 ~]# rpm -q lvm2-cluster lvm2-cluster-2.02.06-1.0.RHEL4 [root@taft-01 ~]# rpm -q cmirror cmirror-1.0.0-5 [root@taft-01 ~]# rpm -q cmirror-kernel cmirror-kernel-2.6.9-8.0 I've verified that this isn't caused by an off by one issue. I was testing the fix for bz 193596 to see if the sync restarts correctly after recovery or server relocation and the fix does appear to work, however, even though the volume appears to be finished, the original lvcreate command continues to hang. 4 D root 4675 4120 0 77 0 - 15217 blockd 05:05 pts/0 00:00:00 lvcreate -L 1G -n coreymirror2 -m 1 corey Jul 5 05:18:47 taft-03 kernel: clvmd X 000001020cd1df18 0 4363 4289 4668 (L-TLB) Jul 5 05:18:47 taft-03 kernel: 000001020cd1def8 0000000000000046 00000102179d83c0 00000102159767f0 Jul 5 05:18:47 taft-03 kernel: 0000000000000046 ffffffff80142a3f 0000000000000011 0000000300040001 Jul 5 05:18:47 taft-03 kernel: 00000102159767f0 000000000011e5f9 Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff80142a3f>{do_notify_parent+382} <ffffffff80178d80>{filp_close+103} Jul 5 05:18:47 taft-03 kernel: <ffffffff80309f08>{__cond_resched+23} <ffffffff8013a745>{do_exit+3137} Jul 5 05:18:47 taft-03 kernel: <ffffffff8013a856>{sys_exit_group+0} <ffffffff8011024a>{system_call+126} Jul 5 05:18:47 taft-03 kernel: Jul 5 05:18:47 taft-03 kernel: cluster_log_s S ffffffff803094a8 0 4628 1 4676 4289 (L-TLB) Jul 5 05:18:47 taft-03 kernel: 000001020cb3b988 0000000000000046 ffffffff803d4000 0000000000000246 Jul 5 05:18:47 taft-03 kernel: 0000000000000001 0000000900000000 000000f800001000 0000000006aee000 Jul 5 05:18:47 taft-03 kernel: 000001020a8a67f0 00000000000000e8 Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff8030a830>{schedule_timeout+224} <ffffffff802afe87>{dev_queue_xmit+525} Jul 5 05:18:47 taft-03 kernel: <ffffffff8013560a>{prepare_to_wait_exclusive+21} <ffffffff802acbc5>{skb_recv_datagram+373} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff802cdee9>{ip_push_pending_frames+855} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff802e91ec>{udp_recvmsg+118} Jul 5 05:18:47 taft-03 kernel: <ffffffff802aa6dd>{sock_common_recvmsg+48} <ffffffff802a7204>{sock_recvmsg+284} Jul 5 05:18:47 taft-03 kernel: <ffffffff80133d5a>{__wake_up_common+67} <ffffffff80133db0>{__wake_up+54} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff8013fc14>{__mod_timer+293} Jul 5 05:18:47 taft-03 kernel: <ffffffffa026d0a4>{:dm_cmirror:my_recvmsg+265} <ffffffffa026cf8c>{:dm_cmirror:set_sigusr1+0} Jul 5 05:18:47 taft-03 kernel: <ffffffffa026cf8c>{:dm_cmirror:set_sigusr1+0} <ffffffffa026c528>{:dm_cmirror:cluster_log_serverd+735} Jul 5 05:18:47 taft-03 kernel: <ffffffff80133d09>{default_wake_function+0} <ffffffff80110f27>{child_rip+8} Jul 5 05:18:47 taft-03 kernel: <ffffffffa026c249>{:dm_cmirror:cluster_log_serverd+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff80110f1f>{child_rip+0} Jul 5 05:18:47 taft-03 kernel: kcopyd S ffffffffa0094c84 0 4667 10 88 (L-TLB) Jul 5 05:18:47 taft-03 kernel: 000001020a751e68 0000000000000046 000001020a751de8 ffffffff80133db0 Jul 5 05:18:47 taft-03 kernel: 0000000000000000 000001020b32e748 0000000000000000 000000010b0ca310 Jul 5 05:18:47 taft-03 kernel: 0000010037c68030 0000000000000319 Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff80133db0>{__wake_up+54} <ffffffffa0094c84>{:dm_mod:do_work+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff801476f1>{worker_thread+226} <ffffffff80133d09>{default_wake_function+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff80133d09>{default_wake_function+0} <ffffffff8014b450>{keventd_create_kthread+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff8014760f>{worker_thread+0} <ffffffff8014b450>{keventd_create_kthread+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff8014b427>{kthread+200} <ffffffff80110f27>{child_rip+8} Jul 5 05:18:47 taft-03 kernel: <ffffffff8014b450>{keventd_create_kthread+0} <ffffffff8014b35f>{kthread+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff80110f1f>{child_rip+0} Jul 5 05:18:47 taft-03 kernel: clvmd S 0000000000000010 0 4668 4289 4669 4363 (NOTLB) Jul 5 05:18:47 taft-03 kernel: 000001020aec5d78 0000000000000006 0000010006937030 0000001900000078 Jul 5 05:18:47 taft-03 kernel: 0000010212942030 0000000000000078 000000d001079a40 0000000100000246 Jul 5 05:18:47 taft-03 kernel: 000001020a9367f0 000000000000048a Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff8013fc14>{__mod_timer+293} <ffffffff8030a8bf>{schedule_timeout+367} Jul 5 05:18:47 taft-03 kernel: <ffffffff8014063e>{process_timeout+0} <ffffffff8018b73b>{do_select+939} Jul 5 05:18:47 taft-03 kernel: <ffffffff8018b2d5>{__pollwait+0} <ffffffff8018baba>{sys_select+820} Jul 5 05:18:47 taft-03 kernel: <ffffffff80193210>{dnotify_parent+34} <ffffffff8011024a>{system_call+126} Jul 5 05:18:47 taft-03 kernel: <ffffffff8011024a>{system_call+126} Jul 5 05:18:47 taft-03 kernel: clvmd S ffffffffa0093074 0 4669 4289 4714 4668 (NOTLB) Jul 5 05:18:47 taft-03 kernel: 000001020d2d1cf8 0000000000000006 0000010214d2bf4c ffffffff803050ff Jul 5 05:18:47 taft-03 kernel: 0000000000200200 0000000000000000 000001020d2d1cc8 0000000000000000 Jul 5 05:18:47 taft-03 kernel: 00000102175a2030 00000000000145be Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff803050ff>{unix_dgram_sendmsg+1059} <ffffffffa0093074>{:dm_mod:dev_wait+0} Jul 5 05:18:47 taft-03 kernel: <ffffffffa00906ce>{:dm_mod:dm_wait_event+151} <ffffffff801356b2>{autoremove_wake_function+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff801ea679>{__up_read+16} Jul 5 05:18:47 taft-03 kernel: <ffffffffa009316b>{:dm_mod:dev_wait+247} <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} Jul 5 05:18:47 taft-03 kernel: <ffffffff80181c97>{sys_newstat+32} <ffffffff8018ad41>{sys_ioctl+853} Jul 5 05:18:47 taft-03 kernel: <ffffffff8011024a>{system_call+126} Jul 5 05:18:47 taft-03 kernel: clvmd S ffffffffa0093074 0 4714 4289 4669 (NOTLB) Jul 5 05:18:47 taft-03 kernel: 000001020b7a1cf8 0000000000000006 0000010214d2bf4c ffffffff803050ff Jul 5 05:18:47 taft-03 kernel: 0000000000200200 0000000000000000 000001020b7a1cc8 0000000000000000 Jul 5 05:18:47 taft-03 kernel: 000001020d6f7030 0000000000011d57 Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff803050ff>{unix_dgram_sendmsg+1059} <ffffffffa0093074>{:dm_mod:dev_wait+0} Jul 5 05:18:47 taft-03 kernel: <ffffffffa00906ce>{:dm_mod:dm_wait_event+151} <ffffffff801356b2>{autoremove_wake_function+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff801ea679>{__up_read+16} Jul 5 05:18:47 taft-03 kernel: <ffffffffa009316b>{:dm_mod:dev_wait+247} <ffffffffa0093c34>{:dm_mod:ctl_ioctl+600} Jul 5 05:18:47 taft-03 kernel: <ffffffff80181c97>{sys_newstat+32} <ffffffff8018ad41>{sys_ioctl+853} Jul 5 05:18:47 taft-03 kernel: <ffffffff8011024a>{system_call+126} Jul 5 05:18:47 taft-03 kernel: lvcreate D 00000102179c7488 0 4675 4120 (NOTLB) Jul 5 05:18:47 taft-03 kernel: 000001020cd77bb8 0000000000000006 00000101fffb47e0 0000000000000008 Jul 5 05:18:47 taft-03 kernel: 000001020bbc7700 ffffffff8024f7d2 ffffffff801356b2 000000030cd77b20 Jul 5 05:18:47 taft-03 kernel: 0000010213ea1030 0000000000005312 Jul 5 05:18:47 taft-03 kernel: Call Trace:<ffffffff8024f7d2>{generic_make_request+355} <ffffffff801356b2>{autoremove_wake_function+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff8024e1f5>{generic_unplug_device+24} <ffffffffa00916bc>{:dm_mod:dm_table_unplug_all+49} Jul 5 05:18:47 taft-03 kernel: <ffffffff8030a713>{io_schedule+38} <ffffffff8019abd3>{__blockdev_direct_IO+3005} Jul 5 05:18:47 taft-03 kernel: <ffffffff8017fe24>{blkdev_direct_IO+48} <ffffffff8017fd79>{blkdev_get_blocks+0} Jul 5 05:18:47 taft-03 kernel: <ffffffff8015bb1e>{generic_file_direct_IO+78} <ffffffff8015bbab>{generic_file_direct_write+103} Jul 5 05:18:47 taft-03 kernel: <ffffffff8015bec8>{__generic_file_aio_write_nolock+662} Jul 5 05:18:47 taft-03 kernel: <ffffffff8015c1ab>{generic_file_aio_write_nolock+32} Jul 5 05:18:47 taft-03 kernel: <ffffffff8015c379>{generic_file_write_nolock+158} <ffffffff8015c6ae>{generic_file_read+187} Jul 5 05:18:47 taft-03 kernel: <ffffffff801356b2>{autoremove_wake_function+0} <ffffffff80193210>{dnotify_parent+34} Jul 5 05:18:47 taft-03 kernel: <ffffffff80180cc0>{blkdev_file_write+26} <ffffffff801795f8>{vfs_write+207} Jul 5 05:18:47 taft-03 kernel: <ffffffff801796e0>{sys_write+69} <ffffffff8011024a>{system_call+126} I'm trying to create a mirror in system-config-lvm and I ran into the same problem. In my case I wasn't doing clustered mirrors. After the mirror sync completed, lvcreate did not return. 7401 ? SLsl 0:00 /usr/sbin/lvcreate -n mir -l 2171 -m1 vg /dev/sdb1 /d 7464 pts/0 R+ 0:00 ps ax [root@tank-01 ~]# strace -p 7401 Process 7401 attached - interrupt to quit select(11, [10], NULL, NULL, {0, 340000}) = 0 (Timeout) select(11, [10], NULL, NULL, {1, 0}) = 0 (Timeout) select(11, [10], NULL, NULL, {1, 0}) = 0 (Timeout) select(11, [10], NULL, NULL, {1, 0} <unfinished ...> Process 7401 detached [root@tank-01 ~]# lsof -p 7401 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME lvcreate 7401 root cwd DIR 253,0 4096 2 / lvcreate 7401 root rtd DIR 253,0 4096 2 / lvcreate 7401 root txt REG 253,0 513680 4092763 /usr/sbin/lvm lvcreate 7401 root mem REG 253,0 6824 2976199 /lib/libdevmapper-event-lvm2mirror.so.2.02 lvcreate 7401 root mem REG 253,0 487432 4092493 /usr/lib/liblvm2cmd.so.2.02 lvcreate 7401 root mem REG 253,0 106397 2977177 /lib/ld-2.3.4.so lvcreate 7401 root mem REG 253,0 1454802 2977178 /lib/tls/libc-2.3.4.so lvcreate 7401 root mem REG 253,0 15324 2977180 /lib/libdl-2.3.4.so lvcreate 7401 root mem REG 253,0 53736 2976157 /lib/libsepol.so.1 lvcreate 7401 root mem REG 253,0 59876 2977197 /lib/libdevmapper.so.1.02 lvcreate 7401 root mem REG 253,0 21960 2976183 /lib/libdevmapper-event.so.1.02 lvcreate 7401 root mem REG 253,0 93985 2977181 /lib/tls/libpthread-2.3.4.so lvcreate 7401 root mem REG 253,0 170320 4095373 /usr/lib/libreadline.so.4.3 lvcreate 7401 root mem REG 253,0 56328 2976144 /lib/libselinux.so.1 lvcreate 7401 root mem REG 253,0 879961 4092483 /usr/lib/libncurses.so.5.4 lvcreate 7401 root 0u CHR 136,0 2 /dev/pts/0 lvcreate 7401 root 1w FIFO 0,7 20978 pipe lvcreate 7401 root 2w FIFO 0,7 20979 pipe lvcreate 7401 root 3u FIFO 253,0 3272290 /var/run/dmeventd-server lvcreate 7401 root 4u REG 253,0 0 3270738 /var/lock/lvm/V_vg (deleted) lvcreate 7401 root 5u BLK 8,1 952 /dev/sda1 lvcreate 7401 root 6u BLK 8,17 1001 /dev/sdb1 lvcreate 7401 root 7u BLK 8,33 1050 /dev/sdc1 lvcreate 7401 root 8uW REG 253,0 5 3272300 /var/run/dmeventd.pid lvcreate 7401 root 9w FIFO 253,0 3272290 /var/run/dmeventd-server lvcreate 7401 root 10u FIFO 253,0 3270744 /var/run/dmeventd-client lvcreate 7401 root 11u unix 0xf6989500 21164 socket lvcreate 7401 root 12u CHR 10,63 1060 /dev/mapper/control /var/log/messages Jul 5 16:58:29 tank-01 lvm[7401]: Monitoring mirror device, vg-mir for events Jul 5 17:06:09 tank-01 lvm[7401]: vg-mir is now in-sync this no longer looks like a kernel issue, but it does smell like some issues I was seeing when dmeventd was not responding to requests. If so, this has been fixed under the guise of another bug as well. Marking modified... Still seeing this, though not as freguently anymore. Instead of failing 100% of the time, it now seems to only fail around 10% of the time. Is it possible to get a verbose output from the create command? moving to modified to draw more attention. Have not seen this issue with the latest build. This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being marked as a blocker for this release. Please resolve ASAP. This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being marked as a blocker for this release. Please resolve ASAP. Closing as this has been fixed in the current (4.7) release. |