+++ This bug was initially created as a clone of Bug #290821 +++

Description of problem:
This looks like a setup similar to many of the cmirror deadlock bugs; however,
I couldn't find anything that failed during the cmirror leg recovery. The down
convert worked properly, device-mapper shows everything as normal, and the
cluster services all appear fine. The issue is that a sync attempt after the
down convert is hung.

================================================================================
Iteration 2.5 started at Fri Sep 14 04:29:15 CDT 2007
================================================================================
Senario: Kill secondary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:    fail_secondary_synced_3_legs_3_mirrors
* sync:    1
* mirrors: 3
* disklog: 1
* failpv:  /dev/sdb1
* legs:    3
* pvs:     /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_1 -L 800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_2 -L 800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_3 -L 800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150

Waiting for mirror(s) to sync
Verifying fully syncd mirror(s), currently at
  ...1.50%  ...8.00%  ...14.00% ...20.00% ...26.00% ...32.50% ...38.50% ...44.00%
  ...50.00% ...56.00% ...61.50% ...67.00% ...73.00% ...79.00% ...85.50% ...91.50%
  ...98.50% ...100.00%

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...
Writing verification files (checkit) to mirror(s) on...
link-02:
checkit starting with: CREATE  Num files: 100  Random Seed: 14271
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 14280
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 14284
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit

link-07:
checkit starting with: CREATE  Num files: 100  Random Seed: 16157
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 16161
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 16165
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit

link-08:
checkit starting with: CREATE  Num files: 100  Random Seed: 12543
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 12547
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with: CREATE  Num files: 100  Random Seed: 12556
  Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
  Working dir: /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure
Disabling device sdb on link-02
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out

# Here is where that 'sync' is attempted and got deadlocked.

Link-02:
[root@link-02 ~]# cman_tool services
Service          Name              GID    LID    State    Code
Fence Domain:    "default"           2      2    run      -
[1 2 3]
DLM Lock Space:  "clvmd"          2675   1132    run      -
[2 1 3]
DLM Lock Space:  "clustered_log"  2746   1165    run      -
[1 2 3]
DLM Lock Space:  "gfs1"           2748   1166    run      -
[2 1 3]
DLM Lock Space:  "gfs2"           2752   1168    run      -
[2 1 3]
DLM Lock Space:  "gfs3"           2756   1170    run      -
[2 1 3]
GFS Mount Group: "gfs1"           2750   1167    run      -
[2 1 3]
GFS Mount Group: "gfs2"           2754   1169    run      -
[2 1 3]
GFS Mount Group: "gfs3"           2758   1171    run      -
[2 1 3]

[root@link-02 ~]# dmsetup status
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2: 0 1638400 linear
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0: 0 1638400 linear
VolGroup00-LogVol00: 0 73728000 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3: 0 1638400 mirror 2 253:13 253:15 1600/1600 1 AA 3 clustered_disk 253:12 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2: 0 1638400 mirror 2 253:8 253:10 1600/1600 1 AA 3 clustered_disk 253:7 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1: 0 1638400 mirror 2 253:3 253:5 1600/1600 1 AA 3 clustered_disk 253:2 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog: 0 8192 linear

[root@link-02 ~]# dmsetup info
Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 7  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVI3qQH0U9UjhwKZHUzzgPN7R1ZQW45GGJ

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 12  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV25JE4xYwJVn0llu3kqOaIEpX8tdHPenW

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 15  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVR6EfGtaXjwxaoQGnSfnC66VSW0hBhjyE

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 10  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVTyiSLeh2fY2EIpOEyzl2u6k4x6qfklLv

Name: VolGroup00-LogVol01
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 1  Number of targets: 1
  UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcYYhWLYsKSE0wcbNdieVGH93CfCLFc0k2

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 13  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVj4bTmA43jRm4R0K1aUra31NlyjNqgb4a

Name: VolGroup00-LogVol00
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 0  Number of targets: 1
  UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcVVEtseAZ98h5LgRR3eX7h8BC4SRKAyfs

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 15
  Major, minor: 253, 16  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVAO3QPlqt90O6nXsOPaxU9olaJDxqhuun

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 5  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV7d3ysXmWYErnxQwG41JD8ALrGN5WvUsB

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 8  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVljFethBA69K96UT1TYWd1ELlAXBy4aip

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 71
  Major, minor: 253, 11  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV2a6mclvdud8zV0JMRsJfE05A947cn0X9

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 65
  Major, minor: 253, 6  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV0sJSIYsAxbuekl6atsx1SuyZInr5IUx3

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 3  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVpdKOIcTwn7t9Z56Juus0PoHrTgjspGIN

Name: helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog
  State: ACTIVE  Tables present: LIVE  Open count: 1  Event number: 0
  Major, minor: 253, 2  Number of targets: 1
  UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVrZUgNOjo5xBDeUYcO6BthcjwlMheAX9f

This is the stuck sync process:

Sep 14 08:54:34 link-02 kernel: sync          R  running task     0 14341 14340 (NOTLB)
Sep 14 08:54:34 link-02 kernel: kmirrord      S 0000010037e0bf48 0 14342     6 14228 (L-TLB)
Sep 14 08:54:34 link-02 kernel: 0000010014633e68 0000000000000046 000001000ed217f0 0000010000000073
Sep 14 08:54:34 link-02 kernel:        0000000000000046 000000003eb84760 000001000101ba80 0000000137e0be00
Sep 14 08:54:34 link-02 kernel:        000001003a7f17f0 00000000000002cb
Sep 14 08:54:34 link-02 kernel: Call Trace:<ffffffffa00d06a2>{:dm_mirror:do_work+0} <ffffffff80148091>{worker_thread+226}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8013459d>{default_wake_function+0} <ffffffff801345ee>{__wake_up_common+67}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8013459d>{default_wake_function+0} <ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80147faf>{worker_thread+0} <ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8014be77>{kthread+200} <ffffffff80110f47>{child_rip+8}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8014bea0>{keventd_create_kthread+0} <ffffffff8014bdaf>{kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}

I'll attach the rest of the logs.

Version-Release number of selected component (if applicable):
2.6.9-56.ELsmp
cmirror-1.0.1-1
cmirror-kernel-2.6.9-34.1

How reproducible:
only once so far

-- Additional comment from cmarthal on 2007-09-14 14:33 EST --

Created an attachment (id=196081)
stack traces from link-07

-- Additional comment from cmarthal on 2007-09-14 15:02 EST --

Created an attachment (id=196101)
stack traces from link-08

-- Additional comment from cmarthal on 2007-09-14 15:05 EST --

Created an attachment (id=196121)
stack traces from link-02

-- Additional comment from cmarthal on 2007-09-25 09:41 EST --

I appear to have reproduced this bug, and it may be related to bz 238629.
The test hung in the same spot, during a sync, right after a leg failure:

================================================================================
Iteration 0.7 started at Mon Sep 24 17:45:04 CDT 2007
================================================================================
Senario: Kill secondary leg of non synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:    fail_secondary_non_synced_2_legs_1_mirrors
* sync:    0
* mirrors: 1
* disklog: 1
* failpv:  /dev/sde1
* legs:    2
* pvs:     /dev/sda1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_secondary_non_synced_2_legs_1_mirrors_1 -L 800M helter_skelter /dev/sda1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150

mirror(s) currently at 5.50% synced right now

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...
Writing verification files (checkit) to mirror(s) on...
link-02:
checkit starting with: CREATE  Num files: 100  Random Seed: 12193
  Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
  Working dir: /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
link-07:
checkit starting with: CREATE  Num files: 100  Random Seed: 11603
  Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
  Working dir: /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
link-08:
checkit starting with: CREATE  Num files: 100  Random Seed: 12456
  Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
  Working dir: /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure
Disabling device sde on link-02
Disabling device sde on link-07
Disabling device sde on link-08
Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out

[Here's where the sync hangs]

The reason I think this is bz 238629 is the flood of "conflict" messages.

[...]
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change: 62/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 127 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change: 63/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 46 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change: 748/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: scsi0 (0:5): rejecting I/O to offline device
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:04 link-02 last message repeated 2 times
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Notifying server(1) of sync change: 749/Q0c5dDtC
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:05 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:06 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
[...]

-- Additional comment from cmarthal on 2007-09-25 09:46 EST --

More debugging from the only node not completely hung...

[root@link-02 tmp]# ./lvm_backtraces.pl
Backtrace for lvs-a-o+devices (489):
#1  0x000000000044fde3 in pool_labeller_create ()
#2  0x00000000004500e3 in pool_labeller_create ()
#3  0x000000000043cea6 in check_lvm1_vg_inactive ()
#4  0x000000000043d0b9 in lock_vol ()
#5  0x000000000041f2ef in process_each_lv ()
#6  0x000000000041ceaa in pvscan ()
#7  0x0000000000414c42 in lvm_run_command ()
#8  0x00000000004155f2 in lvm2_main ()
#9  0x00000031b871c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c84a in _start ()
#11 0x0000007fbffffa38 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000004 in ?? ()
#14 0x0000007fbffffc26 in ?? ()
#15 0x0000007fbffffc2a in ?? ()
#16 0x0000007fbffffc2d in ?? ()
#17 0x0000007fbffffc30 in ?? ()

Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:36 link-02 last message repeated 3 times
Sep 25 08:38:36 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 55 times
Sep 25 08:38:39 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 3 times
Sep 25 08:38:39 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:41 link-02 last message repeated 39 times
Sep 25 08:38:41 link-02 kernel: dm-cmirror: LOG INFO:
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid: LVM-vEL8KkKGwA4X8JFaykzpGjepOwn908GGm3jPYhF61PD5tuohdgtzNmEIQ0c5dDtC
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   log type    : disk
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   region_count: 1600
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   sync_count  : 1079
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   sync_search : 0
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   in_sync     : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   suspended   : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_id   : [[link-02]]
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_valid: YES
Sep 25 08:38:41 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 15 times
Sep 25 08:38:42 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 3 times
Sep 25 08:38:42 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:45 link-02 last message repeated 55 times
Sep 25 08:38:45 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:45 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC

DLM lockspace 'clvmd'

Resource 000001002fbcfe88 (parent 0000000000000000). Name (len=64)
"vEL8KkKGwA4X8JFaykzpGjepOwn908GG19cPvkWWQOJsav7yEvp20ydUxz2F8SZP"
Local Copy, Master is node [[link-08]]
Granted Queue
000103b6 CR 8657 Master:     00010201
Conversion Queue
Waiting Queue

Resource 0000010032a4d6e8 (parent 0000000000000000). Name (len=16) "V_helter_skelter"
Local Copy, Master is node [[link-07]]
Granted Queue
000103d0 PW 12221 Master:     00010007
Conversion Queue
Waiting Queue

Resource 000001002e7db9d8 (parent 0000000000000000). Name (len=12) "V_VolGroup00"
Master Copy
Granted Queue
000100a9 PR 490
Conversion Queue
Waiting Queue

-- Additional comment from cmarthal on 2007-09-25 09:49 EST --

This was with the latest code:
2.6.9-59.ELsmp
cmirror-kernel-smp-2.6.9-36.2

-- Additional comment from jbrassow on 2007-09-27 16:31 EST --

It's a good bet this was an earlier screw-up. We can't allow writes to regions
that have not been recovered yet; we check whether they have been recovered by
looking at the 'sync_bits'. In some device failure cases, regions must be
marked 'out-of-sync'. This was causing a subsequent write to block because the
code thought the region had not yet been recovered, when in fact it had just
been put out-of-sync due to the failing device.

new -> post

-- Additional comment from cmarthal on 2007-10-12 14:49 EST --

This bug appears to still very much exist. The sync hung once again right
after a mirror leg failure, with three mirrors each having three legs. I
appear to see this more often in the multiple-mirror scenarios.
There is a constant stream of the following messages:

dm-cmirror: Priority recovery region: 18446744073709551615/Pn10cLPV

The stack trace for the sync is exactly the same:

Oct 12 11:42:31 link-02 kernel: sync          R  running task     0  7539  7538 (NOTLB)
Oct 12 11:42:31 link-02 kernel: kmirrord      S 000001003e8baf48 0  7540     6  7558  7355 (L-TLB)
Oct 12 11:42:31 link-02 kernel: 0000010022de7e68 0000000000000046 000001002e327260 0000010000000073
Oct 12 11:42:31 link-02 kernel:        0000010022de7e68 000000003e8bae18 000001000101ba80 000000013e8bae00
Oct 12 11:42:31 link-02 kernel:        0000010027b007f0 0000000000000840
Oct 12 11:42:31 link-02 kernel: Call Trace:<ffffffffa00cd6a2>{:dm_mirror:do_work+0} <ffffffff80148119>{worker_thread+226}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8013461d>{default_wake_function+0} <ffffffff8013466e>{__wake_up_common+67}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8013461d>{default_wake_function+0} <ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80148037>{worker_thread+0} <ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8014beff>{kthread+200} <ffffffff80110f47>{child_rip+8}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8014bf28>{keventd_create_kthread+0} <ffffffff8014be37>{kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}

-- Additional comment from cmarthal on 2007-10-12 14:52 EST --

This doesn't cause future I/O to block, and clvmd is not deadlocked here
either. The leg failure recovery properly takes place.
[root@link-08 syncd_primary_3legs_3]# lvs -a -o +devices
  /dev/sda1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap% Move Log                        Copy%  Devices
  LogVol00                         VolGroup00     -wi-ao  72.38G                                                     /dev/hda2(0)
  LogVol01                         VolGroup00     -wi-ao   1.94G                                                     /dev/hda2(2316)
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M                   syncd_primary_3legs_1_mlog 100.00 syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M                                                     /dev/sdf1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M                                                     /dev/sdc1(0)
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                     /dev/sdg1(0)
  syncd_primary_3legs_2            helter_skelter mwi-ao 800.00M                   syncd_primary_3legs_2_mlog 100.00 syncd_primary_3legs_2_mimage_2(0),syncd_primary_3legs_2_mimage_1(0)
  [syncd_primary_3legs_2_mimage_1] helter_skelter iwi-ao 800.00M                                                     /dev/sdf1(200)
  [syncd_primary_3legs_2_mimage_2] helter_skelter iwi-ao 800.00M                                                     /dev/sdc1(200)
  [syncd_primary_3legs_2_mlog]     helter_skelter lwi-ao   4.00M                                                     /dev/sdg1(1)
  syncd_primary_3legs_3            helter_skelter mwi-ao 800.00M                   syncd_primary_3legs_3_mlog 100.00 syncd_primary_3legs_3_mimage_2(0),syncd_primary_3legs_3_mimage_1(0)
  [syncd_primary_3legs_3_mimage_1] helter_skelter iwi-ao 800.00M                                                     /dev/sdf1(400)
  [syncd_primary_3legs_3_mimage_2] helter_skelter iwi-ao 800.00M                                                     /dev/sdc1(400)
  [syncd_primary_3legs_3_mlog]     helter_skelter lwi-ao   4.00M                                                     /dev/sdg1(2)

-- Additional comment from jbrassow on 2007-10-24 22:17 EST --

This is very strange. I can't seem to make things "hang", but things do become
abysmally slow for certain processes. Perhaps your type of workload backs up
enough I/O that when the sync happens, it appears to hang (because it is so
slow).

I'm not sure where the slowdown is coming from. cmirror network calls take
virtually no time. However, I have noticed considerable delay between (core)
mirror 'do_writes' calls.
From SystemTap:

Entering: do_writes(time_between=879,consult_calls=1,consult_time=0)
Entering: do_writes(time_between=201,consult_calls=5,consult_time=1)
Entering: do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering: do_writes(time_between=31,consult_calls=3,consult_time=0)
Entering: do_writes(time_between=200,consult_calls=1,consult_time=0)
Entering: do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering: do_writes(time_between=28,consult_calls=3,consult_time=0)
Entering: do_writes(time_between=4998,consult_calls=1,consult_time=0)

We can see that the time between calls (in ms) can be as high as 5 seconds...

Also, during my tests I've seen processes slow down on mirrors which have not
suffered a failure (after another mirror has recovered from a failure). I'll
have to test whether the slowdown occurs without a failure.

-- Additional comment from jbrassow on 2007-10-24 23:10 EST --

The problem could be occurring in the core mirroring code. From
drivers/md/dm-raid1.c:

static void do_work(void *data)
{
        while (do_mirror(data)) {
                set_current_state(TASK_INTERRUPTIBLE);
                schedule_timeout(HZ/5);
        }
}

'wake(ms)' is the usual call that triggers 'do_work'. However, do_mirror will
return 1 if there is still work to be done, causing a 200ms delay before it is
run again. Normally this happens very infrequently, such as when a failure
occurs. In those cases, we may _want_ to delay and wait for a suspend.
However, it also happens in response to the normal write queue having data on
it. That queue can be filled by nominal I/O, or by I/O placed back on the
queue due to remote recovery conflicts (from cluster mirror). In the event of
remote recovery conflicts we may want the delay; however, we never want the
delay when the write queue is filled by normal means... and that seems to be
exactly what is happening.
This is probably happening with single-machine mirroring too, but to a lesser
extent, because more delay is introduced by the network calls cmirror makes,
allowing more time to add something to the nominal queue before 'do_mirror'
completes.

Will test theory.

-- Additional comment from jbrassow on 2007-10-25 00:54 EST --

That's it.

-- Additional comment from jbrassow on 2007-10-30 12:16 EST --

Summary:
The small sleep in drivers/md/dm-raid1.c:do_work() was designed to:
1) allow the userspace process to more easily proceed to reconfigure a mirror
   in the event of a failure.
2) allow remote machines (cluster mirror) to finish recovery of a region
   before resubmitting the I/O.

However, the sleep is having adverse effects on nominal I/O, sometimes slowing
processes down by greater than a factor of 10. This can make applications
appear hung.

This problem occurs most frequently in cluster mirrors under heavy I/O (or
during failure scenarios). It is possible in single-machine mirroring, but
hasn't been seen yet.

This bug does not corrupt data, but it certainly affects AVAILABILITY, which
is especially bad considering mirrors are used precisely to increase
availability.
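The actual patch is not quoted in this report, but the analysis above suggests the shape of a fix: back off only when writes were requeued because of a remote recovery conflict, and reschedule immediately for nominal queued writes. The following is kernel-style pseudocode only; 'requeued_for_recovery()' and the 'ms' state it inspects are invented names for illustration, not the real RHEL change.

```c
/* Hypothetical sketch, not the shipped fix: make the HZ/5 nap
 * conditional so nominal writes never pay the 200 ms penalty. */
static void do_work(void *data)
{
	while (do_mirror(data)) {
		if (requeued_for_recovery(data)) {
			/* give a remote node time to finish recovering
			 * the region before resubmitting the I/O */
			set_current_state(TASK_INTERRUPTIBLE);
			schedule_timeout(HZ/5);
		}
		/* otherwise loop straight back in: the queue was filled
		 * by normal means and should be drained immediately */
	}
}
```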
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
In 2.6.18-62.el5. You can download this test kernel from
http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report if the solution
does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html