Bug 290821

Summary: cmirror write path appears deadlocked after recovery is successful
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Status: CLOSED ERRATA
Reporter: Corey Marthaler <cmarthal>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: jbaron, lwang, rkenna
Keywords: ZStream
Target Milestone: ---
Target Release: ---
Fixed In Version: RHBA-2007-0991
Doc Type: Bug Fix
Last Closed: 2007-11-21 21:15:23 UTC
Bug Blocks: 358881, 381081, 422551, 430698
Attachments (flags: none):
  stack traces from link-07
  stack traces from link-08
  stack traces from link-02

Description Corey Marthaler 2007-09-14 14:24:50 UTC
Description of problem:
This seems like a setup similar to many other cmirror deadlock bugs; however, I
couldn't find anything that had failed during the cmirror leg recovery. The down
convert worked properly, device-mapper shows everything as normal, and the
cluster services all appear fine. The issue, however, is that a sync attempt
after the down convert hangs.

================================================================================
Iteration 2.5 started at Fri Sep 14 04:29:15 CDT 2007
================================================================================
Scenario: Kill secondary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      fail_secondary_synced_3_legs_3_mirrors
* sync:      1
* mirrors:   3
* disklog:   1
* failpv:    /dev/sdb1
* legs:      3
* pvs:       /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_1 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_2 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_3 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150

Waiting for mirror(s) to sync
Verifying fully syncd mirror(s), currently at
 ...1.50% ...8.00% ...14.00% ...20.00% ...26.00% ...32.50% ...38.50% ...44.00%
...50.00% ...56.00% ...61.50% ...67.00% ...73.00% ...79.00% ...85.50% ...91.50%
...98.50% ...100.00%

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...

Writing verification files (checkit) to mirror(s) on...
        link-02:
checkit starting with:
CREATE
Num files:          100
Random Seed:        14271
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        14280
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        14284
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit
        link-07:
checkit starting with:
CREATE
Num files:          100
Random Seed:        16157
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        16161
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        16165
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit
        link-08:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12543
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        12547
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        12556
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure

Disabling device sdb on link-02
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out

# Here is where that 'sync' is attempted and deadlocks.


Link-02:
[root@link-02 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 2 3]

DLM Lock Space:  "clvmd"                           2675 1132 run       -
[2 1 3]

DLM Lock Space:  "clustered_log"                   2746 1165 run       -
[1 2 3]

DLM Lock Space:  "gfs1"                            2748 1166 run       -
[2 1 3]

DLM Lock Space:  "gfs2"                            2752 1168 run       -
[2 1 3]

DLM Lock Space:  "gfs3"                            2756 1170 run       -
[2 1 3]

GFS Mount Group: "gfs1"                            2750 1167 run       -
[2 1 3]

GFS Mount Group: "gfs2"                            2754 1169 run       -
[2 1 3]

GFS Mount Group: "gfs3"                            2758 1171 run       -
[2 1 3]


[root@link-02 ~]# dmsetup status
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2: 0 1638400 linear
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0: 0 1638400 linear
VolGroup00-LogVol00: 0 73728000 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3: 0 1638400 mirror 2
253:13 253:15 1600/1600 1 AA 3 clustered_disk 253:12 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2: 0 1638400 mirror 2
253:8 253:10 1600/1600 1 AA 3 clustered_disk 253:7 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1: 0 1638400 mirror 2
253:3 253:5 1600/1600 1 AA 3 clustered_disk 253:2 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog: 0 8192 linear


[root@link-02 ~]# dmsetup info
Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVI3qQH0U9UjhwKZHUzzgPN7R1ZQW45GGJ

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV25JE4xYwJVn0llu3kqOaIEpX8tdHPenW

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 15
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVR6EfGtaXjwxaoQGnSfnC66VSW0hBhjyE

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVTyiSLeh2fY2EIpOEyzl2u6k4x6qfklLv

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcYYhWLYsKSE0wcbNdieVGH93CfCLFc0k2

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 13
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVj4bTmA43jRm4R0K1aUra31NlyjNqgb4a

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcVVEtseAZ98h5LgRR3eX7h8BC4SRKAyfs

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      15
Major, minor:      253, 16
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVAO3QPlqt90O6nXsOPaxU9olaJDxqhuun

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV7d3ysXmWYErnxQwG41JD8ALrGN5WvUsB

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVljFethBA69K96UT1TYWd1ELlAXBy4aip

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      71
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV2a6mclvdud8zV0JMRsJfE05A947cn0X9

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      65
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV0sJSIYsAxbuekl6atsx1SuyZInr5IUx3

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVpdKOIcTwn7t9Z56Juus0PoHrTgjspGIN

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVrZUgNOjo5xBDeUYcO6BthcjwlMheAX9f


This is the stuck sync process:
Sep 14 08:54:34 link-02 kernel: sync          R  running task       0 14341 
14340                     (NOTLB)
Sep 14 08:54:34 link-02 kernel: kmirrord      S 0000010037e0bf48     0 14342   
  6               14228 (L-TLB)
Sep 14 08:54:34 link-02 kernel: 0000010014633e68 0000000000000046
000001000ed217f0 0000010000000073
Sep 14 08:54:34 link-02 kernel:        0000000000000046 000000003eb84760
000001000101ba80 0000000137e0be00
Sep 14 08:54:34 link-02 kernel:        000001003a7f17f0 00000000000002cb
Sep 14 08:54:34 link-02 kernel: Call
Trace:<ffffffffa00d06a2>{:dm_mirror:do_work+0} <ffffffff80148091>{worker_thread+226}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8013459d>{default_wake_function+0} <ffffffff801345ee>{__wake_up_common+67}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8013459d>{default_wake_function+0}
<ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80147faf>{worker_thread+0}
<ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8014be77>{kthread+200}
<ffffffff80110f47>{child_rip+8}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8014bea0>{keventd_create_kthread+0} <ffffffff8014bdaf>{kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}

I'll attach the rest of the logs.

Version-Release number of selected component (if applicable):
2.6.9-56.ELsmp
cmirror-1.0.1-1
cmirror-kernel-2.6.9-34.1

How reproducible:
only once so far

Comment 1 Corey Marthaler 2007-09-14 18:33:55 UTC
Created attachment 196081 [details]
stack traces from link-07

Comment 2 Corey Marthaler 2007-09-14 19:02:05 UTC
Created attachment 196101 [details]
stack traces from link-08

Comment 3 Corey Marthaler 2007-09-14 19:05:21 UTC
Created attachment 196121 [details]
stack traces from link-02

Comment 4 Corey Marthaler 2007-09-25 13:41:07 UTC
I appear to have reproduced this bug and this may be related to bz 238629.

The test hung in the same spot, during a sync, right after a leg failure:
================================================================================
Iteration 0.7 started at Mon Sep 24 17:45:04 CDT 2007
================================================================================
Scenario: Kill secondary leg of non synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      fail_secondary_non_synced_2_legs_1_mirrors
* sync:      0
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      2
* pvs:       /dev/sda1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_secondary_non_synced_2_legs_1_mirrors_1
-L 800M helter_skelter /dev/sda1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150

mirror(s) currently at 5.50% synced right now

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...

Writing verification files (checkit) to mirror(s) on...
        link-02:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12193
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
        link-07:
checkit starting with:
CREATE
Num files:          100
Random Seed:        11603
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
        link-08:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12456
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure

Disabling device sde on link-02
Disabling device sde on link-07
Disabling device sde on link-08

Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out
[Here's where the sync hangs]

The reason I think this is bz 238629 is the flood of "conflict" messages.


[...]
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
62/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 127 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
63/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 46 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
748/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: scsi0 (0:5): rejecting I/O to offline device
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:04 link-02 last message repeated 2 times
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
749/Q0c5dDtC
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:05 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:06 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
[...]


Comment 5 Corey Marthaler 2007-09-25 13:46:29 UTC
More debugging from the only node not completely hung...

[root@link-02 tmp]# ./lvm_backtraces.pl
Backtrace for lvs-a-o+devices (489):
#1  0x000000000044fde3 in pool_labeller_create ()
#2  0x00000000004500e3 in pool_labeller_create ()
#3  0x000000000043cea6 in check_lvm1_vg_inactive ()
#4  0x000000000043d0b9 in lock_vol ()
#5  0x000000000041f2ef in process_each_lv ()
#6  0x000000000041ceaa in pvscan ()
#7  0x0000000000414c42 in lvm_run_command ()
#8  0x00000000004155f2 in lvm2_main ()
#9  0x00000031b871c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c84a in _start ()
#11 0x0000007fbffffa38 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000004 in ?? ()
#14 0x0000007fbffffc26 in ?? ()
#15 0x0000007fbffffc2a in ?? ()
#16 0x0000007fbffffc2d in ?? ()
#17 0x0000007fbffffc30 in ?? ()

Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:36 link-02 last message repeated 3 times
Sep 25 08:38:36 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 55 times
Sep 25 08:38:39 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 3 times
Sep 25 08:38:39 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:41 link-02 last message repeated 39 times
Sep 25 08:38:41 link-02 kernel: dm-cmirror: LOG INFO:
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid:
LVM-vEL8KkKGwA4X8JFaykzpGjepOwn908GGm3jPYhF61PD5tuohdgtzNmEIQ0c5dDtC
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   log type    : disk
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?region_count: 1600
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?sync_count  : 1079
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?sync_search : 0
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   in_sync     : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   suspended   : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_id   : [[link-02]]
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_valid: YES
Sep 25 08:38:41 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 15 times
Sep 25 08:38:42 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 3 times
Sep 25 08:38:42 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:45 link-02 last message repeated 55 times
Sep 25 08:38:45 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:45 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
DLM lockspace 'clvmd'

Resource 000001002fbcfe88 (parent 0000000000000000). Name (len=64)
"vEL8KkKGwA4X8JFaykzpGjepOwn908GG19cPvkWWQOJsav7yEvp20ydUxz2F8SZP"
Local Copy, Master is node [[link-08]]
Granted Queue
000103b6 CR 8657 Master:     00010201
Conversion Queue
Waiting Queue

Resource 0000010032a4d6e8 (parent 0000000000000000). Name (len=16)
"V_helter_skelter"
Local Copy, Master is node [[link-07]]
Granted Queue
000103d0 PW 12221 Master:     00010007
Conversion Queue
Waiting Queue

Resource 000001002e7db9d8 (parent 0000000000000000). Name (len=12) "V_VolGroup00"
Master Copy
Granted Queue
000100a9 PR 490
Conversion Queue
Waiting Queue





Comment 6 Corey Marthaler 2007-09-25 13:49:06 UTC
This was with the latest code:
2.6.9-59.ELsmp
cmirror-kernel-smp-2.6.9-36.2



Comment 7 Jonathan Earl Brassow 2007-09-27 20:31:34 UTC
It's a good bet this was an earlier screw-up.  We can't allow writes to regions
that have not been recovered yet.  We check whether they have been recovered by
looking at the 'sync_bits'.

In some device failure cases, regions must be marked 'out-of-sync' - this was
causing a subsequent write to block because the code thought the region had not
yet been recovered, when in fact it had just been put out-of-sync due to the
failing device.

new -> post


Comment 8 Corey Marthaler 2007-10-12 18:49:36 UTC
This bug appears to still very much exist.

The sync hung once again right after a mirror leg failure to three mirrors,
each having three legs.  I appear to see this more often with the
multiple-mirror scenarios.


There is a constant stream of the following messages:
dm-cmirror: Priority recovery region: 18446744073709551615/Pn10cLPV


The stack trace for the sync is exactly the same:
Oct 12 11:42:31 link-02 kernel: sync          R  running task       0  7539  
7538                     (NOTLB)
Oct 12 11:42:31 link-02 kernel: kmirrord      S 000001003e8baf48     0  7540   
  6          7558  7355 (L-TLB)
Oct 12 11:42:31 link-02 kernel: 0000010022de7e68 0000000000000046
000001002e327260 0000010000000073
Oct 12 11:42:31 link-02 kernel:        0000010022de7e68 000000003e8bae18
000001000101ba80 000000013e8bae00
Oct 12 11:42:31 link-02 kernel:        0000010027b007f0 0000000000000840
Oct 12 11:42:31 link-02 kernel: Call
Trace:<ffffffffa00cd6a2>{:dm_mirror:do_work+0} <ffffffff80148119>{worker_thread+226}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8013461d>{default_wake_function+0} <ffffffff8013466e>{__wake_up_common+67}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8013461d>{default_wake_function+0}
<ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80148037>{worker_thread+0}
<ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8014beff>{kthread+200}
<ffffffff80110f47>{child_rip+8}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8014bf28>{keventd_create_kthread+0} <ffffffff8014be37>{kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}



Comment 9 Corey Marthaler 2007-10-12 18:52:33 UTC
This doesn't cause future I/O to block, and clvmd is not deadlocked here either.
The leg failure recovery properly takes place.

[root@link-08 syncd_primary_3legs_3]# lvs -a -o +devices
  /dev/sda1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap% 
Move Log                        Copy%  Devices            
  LogVol00                         VolGroup00     -wi-ao  72.38G               
                                      /dev/hda2(0)       
  LogVol01                         VolGroup00     -wi-ao   1.94G               
                                      /dev/hda2(2316)    
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_1_mlog 100.00
syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(0)       
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(0)       
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(0)       
  syncd_primary_3legs_2            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_2_mlog 100.00
syncd_primary_3legs_2_mimage_2(0),syncd_primary_3legs_2_mimage_1(0)
  [syncd_primary_3legs_2_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(200)     
  [syncd_primary_3legs_2_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(200)     
  [syncd_primary_3legs_2_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(1)       
  syncd_primary_3legs_3            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_3_mlog 100.00
syncd_primary_3legs_3_mimage_2(0),syncd_primary_3legs_3_mimage_1(0)
  [syncd_primary_3legs_3_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(400)     
  [syncd_primary_3legs_3_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(400)     
  [syncd_primary_3legs_3_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(2)

Comment 10 Jonathan Earl Brassow 2007-10-25 02:17:13 UTC
This is very strange.  I can't seem to make things "hang", but things do become
abysmally slow for certain processes.  Perhaps your type of workload backs up
enough I/O that when the sync happens, it appears to hang (because it is so
slow).

I'm not sure where the slowdown is coming from.  cmirror network calls take
virtually no time.  However, I have noticed considerable delay between (core)
mirror 'do_writes' calls.  From SystemTap:

Entering:      do_writes(time_between=879,consult_calls=1,consult_time=0)
Entering:      do_writes(time_between=201,consult_calls=5,consult_time=1)
Entering:      do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=31,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=200,consult_calls=1,consult_time=0)
Entering:      do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=28,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=4998,consult_calls=1,consult_time=0)

We can see that the time between calls (in ms) can be as high as 5 seconds...

Also during my tests, I've seen processes slow down on mirrors that have not
suffered a failure (after another mirror has recovered from a failure).

I'll have to test whether the slowdown occurs without a failure.
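
For reference, a rough in-kernel analogue of what that SystemTap probe is
measuring - the gap between successive 'do_writes' calls.  This is an
illustrative sketch only, not instrumentation from the bug:

#include <linux/jiffies.h>
#include <linux/kernel.h>

/* Illustrative only: report the time between successive do_writes() calls. */
static unsigned long last_do_writes;	/* jiffies at the previous entry */

static void note_do_writes_entry(void)
{
	unsigned long now = jiffies;

	if (last_do_writes)
		printk(KERN_DEBUG "do_writes: time_between=%u ms\n",
		       jiffies_to_msecs(now - last_do_writes));
	last_do_writes = now;
}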

Comment 11 Jonathan Earl Brassow 2007-10-25 03:10:26 UTC
The problem could be occurring in the core mirroring code.  From
drivers/md/dm-raid1.c:

static void do_work(void *data)
{
	while (do_mirror(data)) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_timeout(HZ/5);
	}
}

'wake(ms)' is the usual call that triggers 'do_work'.  However, do_mirror will
return 1 if there is still work to be done - causing a 200ms delay before it is
run again.  Normally, this happens very infrequently - like when a failure
occurs.  In those cases, we may _want_ to delay and wait for a suspend. 
However, it also happens in response to the normal write queue having data on
it.  This can be filled by nominal I/O or by I/O placed back on the queue due to
remote recovery conflicts (from cluster mirror).  In the event of remote
recovery conflicts, we may want the delay.  However, we never want the delay
when the write queue is filled by normal means... and that seems to be exactly
what is happening.

This is probably happening to single-machine mirroring too, but to a lesser
extent, because the network calls cmirror makes introduce more delay - allowing
more time to add something to the nominal queue before 'do_mirror' completes.

Will test theory.
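
A sketch of the direction that reasoning points in: only back off when the
leftover work came from a failure or a remote recovery conflict, never when
the write queue was refilled by nominal I/O.  The helpers 'errors_pending' and
'writes_requeued_by_conflict' are assumed names for illustration, not the real
dm-raid1/dm-cmirror fields, and this is not the actual patch:

static void do_work(void *data)
{
	struct mirror_set *ms = data;

	while (do_mirror(ms)) {
		/*
		 * Hypothetical condition: delay the retry only when the
		 * remaining work is due to a failure or a remote recovery
		 * conflict, so nominal writes queued in the meantime do
		 * not pay a 200ms penalty on every pass.
		 */
		if (errors_pending(ms) || writes_requeued_by_conflict(ms)) {
			set_current_state(TASK_INTERRUPTIBLE);
			schedule_timeout(HZ/5);
		}
	}
}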

Comment 12 Jonathan Earl Brassow 2007-10-25 04:54:29 UTC
that's it.


Comment 13 Jonathan Earl Brassow 2007-10-30 16:16:12 UTC
Summary:

The small sleep in drivers/md/dm-raid1.c:do_work() was designed to:
1) allow the userspace process to more easily proceed to reconfigure a mirror in
the event of a failure.
2) allow remote machines (cluster mirror) to finish recovery of a region before
resubmitting the I/O.

However, the sleep is having adverse effects on nominal I/O - sometimes slowing
processes down by more than a factor of 10.  This can make applications appear
hung.

This problem occurs most frequently in cluster mirrors under heavy I/O (or
during failure scenarios).  It is possible in single-machine mirroring, but
hasn't been seen there yet.

This bug does not corrupt data, but it certainly affects "AVAILABILITY" - which
is especially bad considering mirrors are used precisely to increase
availability.


Comment 18 Corey Marthaler 2007-10-31 14:46:41 UTC
Just a quick note that the only issue I'm seeing with this fix is bz 359341, and
so far that only happens when there are simultaneous failures of three
three-legged mirrors.  Otherwise, I haven't been able to reproduce this sync
"hang" anymore.

Comment 21 Corey Marthaler 2007-11-06 15:06:17 UTC
Just a note that I ran the single-node lvm tests with this patch and didn't see
any issues.  That said, I wasn't doing the extensive failure-scenario testing
that is done on cluster mirrors.

Comment 24 errata-xmlrpc 2007-11-21 21:15:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0991.html