Bug 358881 - dm-mirror: schedule_timeout() call causing big process slowdowns - primarily w/ cluster mirrors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On: 290821
Blocks:
 
Reported: 2007-10-30 18:46 UTC by Jonathan Earl Brassow
Modified: 2008-05-21 14:59 UTC (History)
0 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 14:59:54 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2008:0314 (normal, SHIPPED_LIVE): Updated kernel
packages for Red Hat Enterprise Linux 5.2. Last updated: 2008-05-20 18:43:34 UTC

Description Jonathan Earl Brassow 2007-10-30 18:46:38 UTC
+++ This bug was initially created as a clone of Bug #290821 +++

Description of problem:
This seems like a setup similar to many cmirror deadlock bugs; however, I
couldn't find anything that had failed during the cmirror leg recovery. The
down-convert worked properly, device-mapper shows everything as normal, and the
cluster services all appear fine. The issue, however, is that a sync attempt
after the down-convert is hung.

================================================================================
Iteration 2.5 started at Fri Sep 14 04:29:15 CDT 2007
================================================================================
Senario: Kill secondary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      fail_secondary_synced_3_legs_3_mirrors
* sync:      1
* mirrors:   3
* disklog:   1
* failpv:    /dev/sdb1
* legs:      3
* pvs:       /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_1 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_2 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150
qarsh root@link-02 lvcreate -m 2 -n fail_secondary_synced_3_legs_3_mirrors_3 -L
800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-1000
/dev/sdf1:0-150

Waiting for mirror(s) to sync
Verifying fully syncd mirror(s), currently at
 ...1.50% ...8.00% ...14.00% ...20.00% ...26.00% ...32.50% ...38.50% ...44.00%
...50.00% ...56.00% ...61.50% ...67.00% ...73.00% ...79.00% ...85.50% ...91.50%
...98.50% ...100.00%

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...

Writing verification files (checkit) to mirror(s) on...
        link-02:
checkit starting with:
CREATE
Num files:          100
Random Seed:        14271
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        14280
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        14284
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit
        link-07:
checkit starting with:
CREATE
Num files:          100
Random Seed:        16157
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        16161
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        16165
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit
        link-08:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12543
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_1
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        12547
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_2
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        12556
Verify XIOR Stream: /tmp/checkit_fail_secondary_synced_3_legs_3_mirrors_3
Working dir:        /mnt/fail_secondary_synced_3_legs_3_mirrors_3/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure

Disabling device sdb on link-02
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out

# Here is where that 'sync' is attempted and got deadlocked.


Link-02:
[root@link-02 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 2 3]

DLM Lock Space:  "clvmd"                           2675 1132 run       -
[2 1 3]

DLM Lock Space:  "clustered_log"                   2746 1165 run       -
[1 2 3]

DLM Lock Space:  "gfs1"                            2748 1166 run       -
[2 1 3]

DLM Lock Space:  "gfs2"                            2752 1168 run       -
[2 1 3]

DLM Lock Space:  "gfs3"                            2756 1170 run       -
[2 1 3]

GFS Mount Group: "gfs1"                            2750 1167 run       -
[2 1 3]

GFS Mount Group: "gfs2"                            2754 1169 run       -
[2 1 3]

GFS Mount Group: "gfs3"                            2758 1171 run       -
[2 1 3]


[root@link-02 ~]# dmsetup status
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog: 0 8192 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2: 0 1638400 linear
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0: 0 1638400 linear
VolGroup00-LogVol00: 0 73728000 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3: 0 1638400 mirror 2
253:13 253:15 1600/1600 1 AA 3 clustered_disk 253:12 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2: 0 1638400 mirror 2
253:8 253:10 1600/1600 1 AA 3 clustered_disk 253:7 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1: 0 1638400 mirror 2
253:3 253:5 1600/1600 1 AA 3 clustered_disk 253:2 A
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0: 0 1638400 linear
helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog: 0 8192 linear


[root@link-02 ~]# dmsetup info
Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVI3qQH0U9UjhwKZHUzzgPN7R1ZQW45GGJ

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV25JE4xYwJVn0llu3kqOaIEpX8tdHPenW

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 15
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVR6EfGtaXjwxaoQGnSfnC66VSW0hBhjyE

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVTyiSLeh2fY2EIpOEyzl2u6k4x6qfklLv

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcYYhWLYsKSE0wcbNdieVGH93CfCLFc0k2

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 13
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVj4bTmA43jRm4R0K1aUra31NlyjNqgb4a

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-0mpjJpcho7U2NPluVVHwoJpLj1Z6a2VcVVEtseAZ98h5LgRR3eX7h8BC4SRKAyfs

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_3
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      15
Major, minor:      253, 16
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVAO3QPlqt90O6nXsOPaxU9olaJDxqhuun

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV7d3ysXmWYErnxQwG41JD8ALrGN5WvUsB

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVljFethBA69K96UT1TYWd1ELlAXBy4aip

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      71
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV2a6mclvdud8zV0JMRsJfE05A947cn0X9

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      65
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJV0sJSIYsAxbuekl6atsx1SuyZInr5IUx3

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVpdKOIcTwn7t9Z56Juus0PoHrTgjspGIN

Name:              helter_skelter-fail_secondary_synced_3_legs_3_mirrors_1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-9EgNFBfwHXtwnmVbmH73ASDpv2OZeKJVrZUgNOjo5xBDeUYcO6BthcjwlMheAX9f


This is the stuck sync process:
Sep 14 08:54:34 link-02 kernel: sync          R  running task       0 14341 
14340                     (NOTLB)
Sep 14 08:54:34 link-02 kernel: kmirrord      S 0000010037e0bf48     0 14342   
  6               14228 (L-TLB)
Sep 14 08:54:34 link-02 kernel: 0000010014633e68 0000000000000046
000001000ed217f0 0000010000000073
Sep 14 08:54:34 link-02 kernel:        0000000000000046 000000003eb84760
000001000101ba80 0000000137e0be00
Sep 14 08:54:34 link-02 kernel:        000001003a7f17f0 00000000000002cb
Sep 14 08:54:34 link-02 kernel: Call
Trace:<ffffffffa00d06a2>{:dm_mirror:do_work+0} <ffffffff80148091>{worker_thread+226}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8013459d>{default_wake_function+0} <ffffffff801345ee>{__wake_up_common+67}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8013459d>{default_wake_function+0}
<ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80147faf>{worker_thread+0}
<ffffffff8014bea0>{keventd_create_kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff8014be77>{kthread+200}
<ffffffff80110f47>{child_rip+8}
Sep 14 08:54:34 link-02 kernel:       
<ffffffff8014bea0>{keventd_create_kthread+0} <ffffffff8014bdaf>{kthread+0}
Sep 14 08:54:34 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}

I'll attach the rest of the logs.

Version-Release number of selected component (if applicable):
2.6.9-56.ELsmp
cmirror-1.0.1-1
cmirror-kernel-2.6.9-34.1

How reproducible:
only once so far

-- Additional comment from cmarthal on 2007-09-14 14:33 EST --
Created an attachment (id=196081)
stack traces from link-07


-- Additional comment from cmarthal on 2007-09-14 15:02 EST --
Created an attachment (id=196101)
stack traces from link-08


-- Additional comment from cmarthal on 2007-09-14 15:05 EST --
Created an attachment (id=196121)
stack traces from link-02


-- Additional comment from cmarthal on 2007-09-25 09:41 EST --
I appear to have reproduced this bug and this may be related to bz 238629.

The test hung in the same spot, during a sync, right after a leg failure:
================================================================================
Iteration 0.7 started at Mon Sep 24 17:45:04 CDT 2007
================================================================================
Senario: Kill secondary leg of non synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      fail_secondary_non_synced_2_legs_1_mirrors
* sync:      0
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      2
* pvs:       /dev/sda1 /dev/sde1 /dev/sdf1
************************************************

Creating mirror(s) on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_secondary_non_synced_2_legs_1_mirrors_1
-L 800M helter_skelter /dev/sda1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150

mirror(s) currently at 5.50% synced right now

Creating gfs on top of mirror(s) on link-02...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...

Writing verification files (checkit) to mirror(s) on...
        link-02:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12193
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
        link-07:
checkit starting with:
CREATE
Num files:          100
Random Seed:        11603
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit
        link-08:
checkit starting with:
CREATE
Num files:          100
Random Seed:        12456
Verify XIOR Stream: /tmp/checkit_fail_secondary_non_synced_2_legs_1_mirrors_1
Working dir:        /mnt/fail_secondary_non_synced_2_legs_1_mirrors_1/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure

Disabling device sde on link-02
Disabling device sde on link-07
Disabling device sde on link-08

Attempting I/O to cause mirror down conversion(s) on link-02
10+0 records in
10+0 records out
[Here's where the sync hangs]

The reason I think this is bz 238629 is the flood of "conflict" messages.


[...]
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
62/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 127 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
63/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 46 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
748/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:03 link-02 last message repeated 2 times
Sep 24 17:39:03 link-02 kernel: scsi0 (0:5): rejecting I/O to offline device
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:04 link-02 last message repeated 2 times
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Notifying server(1) of sync change:
749/Q0c5dDtC
Sep 24 17:39:04 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:05 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 24 17:39:06 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
[...]


-- Additional comment from cmarthal on 2007-09-25 09:46 EST --
More debugging from the only node not completely hung...

[root@link-02 tmp]# ./lvm_backtraces.pl
Backtrace for lvs-a-o+devices (489):
#1  0x000000000044fde3 in pool_labeller_create ()
#2  0x00000000004500e3 in pool_labeller_create ()
#3  0x000000000043cea6 in check_lvm1_vg_inactive ()
#4  0x000000000043d0b9 in lock_vol ()
#5  0x000000000041f2ef in process_each_lv ()
#6  0x000000000041ceaa in pvscan ()
#7  0x0000000000414c42 in lvm_run_command ()
#8  0x00000000004155f2 in lvm2_main ()
#9  0x00000031b871c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c84a in _start ()
#11 0x0000007fbffffa38 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000004 in ?? ()
#14 0x0000007fbffffc26 in ?? ()
#15 0x0000007fbffffc2a in ?? ()
#16 0x0000007fbffffc2d in ?? ()
#17 0x0000007fbffffc30 in ?? ()

Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:36 link-02 last message repeated 3 times
Sep 25 08:38:36 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:36 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 55 times
Sep 25 08:38:39 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:39 link-02 last message repeated 3 times
Sep 25 08:38:39 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:39 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:41 link-02 last message repeated 39 times
Sep 25 08:38:41 link-02 kernel: dm-cmirror: LOG INFO:
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid:
LVM-vEL8KkKGwA4X8JFaykzpGjepOwn908GGm3jPYhF61PD5tuohdgtzNmEIQ0c5dDtC
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   log type    : disk
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?region_count: 1600
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?sync_count  : 1079
Sep 25 08:38:41 link-02 kernel: dm-cmirror:  ?sync_search : 0
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   in_sync     : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   suspended   : NO
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_id   : [[link-02]]
Sep 25 08:38:41 link-02 kernel: dm-cmirror:   server_valid: YES
Sep 25 08:38:41 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 15 times
Sep 25 08:38:42 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:42 link-02 last message repeated 3 times
Sep 25 08:38:42 link-02 qarshd[12229]: Nothing to do
Sep 25 08:38:42 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
Sep 25 08:38:45 link-02 last message repeated 55 times
Sep 25 08:38:45 link-02 qarshd[12199]: Nothing to do
Sep 25 08:38:45 link-02 kernel: dm-cmirror: Remote recovery conflict: 0/Q0c5dDtC
DLM lockspace 'clvmd'

Resource 000001002fbcfe88 (parent 0000000000000000). Name (len=64)
"vEL8KkKGwA4X8JFaykzpGjepOwn908GG19cPvkWWQOJsav7yEvp20ydUxz2F8SZP"
Local Copy, Master is node [[link-08]]
Granted Queue
000103b6 CR 8657 Master:     00010201
Conversion Queue
Waiting Queue

Resource 0000010032a4d6e8 (parent 0000000000000000). Name (len=16)
"V_helter_skelter"
Local Copy, Master is node [[link-07]]
Granted Queue
000103d0 PW 12221 Master:     00010007
Conversion Queue
Waiting Queue

Resource 000001002e7db9d8 (parent 0000000000000000). Name (len=12) "V_VolGroup00"
Master Copy
Granted Queue
000100a9 PR 490
Conversion Queue
Waiting Queue





-- Additional comment from cmarthal on 2007-09-25 09:49 EST --
This was with the latest code:
2.6.9-59.ELsmp
cmirror-kernel-smp-2.6.9-36.2



-- Additional comment from jbrassow on 2007-09-27 16:31 EST --
It's a good bet this was an earlier screw-up.  We can't allow writes to regions
that have not been recovered yet.  We see whether they have been recovered by
looking at the 'sync_bits'.

In some device failure cases, regions must be marked 'out-of-sync' - this was
causing a following write to block because it thought the region had not yet
been recovered - when in fact, it had just been put out-of-sync due to the
failing device.
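
As a rough illustration of the check described above (hypothetical code for
illustration only, not the actual dm-raid1/dm-cmirror source; the function name
is made up): the write path only lets a write through when the region's bit is
set in sync_bits, and a bit cleared because its device failed looks exactly
like a bit that recovery simply hasn't reached yet.

static int region_in_sync(const unsigned long *sync_bits, unsigned long region)
{
	/*
	 * Illustration only: a return of 0 means the write gets held back,
	 * whether the region is genuinely unrecovered or was merely marked
	 * out-of-sync because its device failed.
	 */
	return test_bit(region, sync_bits);
}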

new -> post


-- Additional comment from cmarthal on 2007-10-12 14:49 EST --
This bug appears to still very much exist.

The sync hung once again right after a mirror leg failure, this time with three
mirrors, each having three legs. I appear to see this more often in the
multiple-mirror scenarios.


There is a constant stream of the following messages:
dm-cmirror: Priority recovery region: 18446744073709551615/Pn10cLPV


The stack trace for the sync is exactly the same:
Oct 12 11:42:31 link-02 kernel: sync          R  running task       0  7539  
7538                     (NOTLB)
Oct 12 11:42:31 link-02 kernel: kmirrord      S 000001003e8baf48     0  7540   
  6          7558  7355 (L-TLB)
Oct 12 11:42:31 link-02 kernel: 0000010022de7e68 0000000000000046
000001002e327260 0000010000000073
Oct 12 11:42:31 link-02 kernel:        0000010022de7e68 000000003e8bae18
000001000101ba80 000000013e8bae00
Oct 12 11:42:31 link-02 kernel:        0000010027b007f0 0000000000000840
Oct 12 11:42:31 link-02 kernel: Call
Trace:<ffffffffa00cd6a2>{:dm_mirror:do_work+0} <ffffffff80148119>{worker_thread+226}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8013461d>{default_wake_function+0} <ffffffff8013466e>{__wake_up_common+67}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8013461d>{default_wake_function+0}
<ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80148037>{worker_thread+0}
<ffffffff8014bf28>{keventd_create_kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff8014beff>{kthread+200}
<ffffffff80110f47>{child_rip+8}
Oct 12 11:42:31 link-02 kernel:       
<ffffffff8014bf28>{keventd_create_kthread+0} <ffffffff8014be37>{kthread+0}
Oct 12 11:42:31 link-02 kernel:        <ffffffff80110f3f>{child_rip+0}



-- Additional comment from cmarthal on 2007-10-12 14:52 EST --
This doesn't cause future I/O to block, and clvmd is not deadlocked here either.
The leg failure recovery properly takes place.

[root@link-08 syncd_primary_3legs_3]# lvs -a -o +devices
  /dev/sda1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap% 
Move Log                        Copy%  Devices            
  LogVol00                         VolGroup00     -wi-ao  72.38G               
                                      /dev/hda2(0)       
  LogVol01                         VolGroup00     -wi-ao   1.94G               
                                      /dev/hda2(2316)    
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_1_mlog 100.00
syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(0)       
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(0)       
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(0)       
  syncd_primary_3legs_2            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_2_mlog 100.00
syncd_primary_3legs_2_mimage_2(0),syncd_primary_3legs_2_mimage_1(0)
  [syncd_primary_3legs_2_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(200)     
  [syncd_primary_3legs_2_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(200)     
  [syncd_primary_3legs_2_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(1)       
  syncd_primary_3legs_3            helter_skelter mwi-ao 800.00M               
    syncd_primary_3legs_3_mlog 100.00
syncd_primary_3legs_3_mimage_2(0),syncd_primary_3legs_3_mimage_1(0)
  [syncd_primary_3legs_3_mimage_1] helter_skelter iwi-ao 800.00M               
                                      /dev/sdf1(400)     
  [syncd_primary_3legs_3_mimage_2] helter_skelter iwi-ao 800.00M               
                                      /dev/sdc1(400)     
  [syncd_primary_3legs_3_mlog]     helter_skelter lwi-ao   4.00M               
                                      /dev/sdg1(2)

-- Additional comment from jbrassow on 2007-10-24 22:17 EST --
This is very strange.  I can't seem to make things "hang", but things do become
abysmally slow for certain processes.  Perhaps your type of workload backs up
enough I/O that when the sync happens, it appears to hang (because it is so slow).

I'm not sure where the slowdown is coming from.  cmirror network calls take
virtually no time.  However, I have noticed considerable delay between (core)
mirror 'do_writes' calls.  From SystemTap:

Entering:      do_writes(time_between=879,consult_calls=1,consult_time=0)
Entering:      do_writes(time_between=201,consult_calls=5,consult_time=1)
Entering:      do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=31,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=200,consult_calls=1,consult_time=0)
Entering:      do_writes(time_between=0,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=28,consult_calls=3,consult_time=0)
Entering:      do_writes(time_between=4998,consult_calls=1,consult_time=0)

We can see that the time between calls (in ms) can be as high as 5 seconds...

Also during my tests, I've seen processes slow down on mirrors which have not
suffered a failure (after another mirror has recovered from a failure).

I'll have to test whether the slowdown occurs without a failure.

-- Additional comment from jbrassow on 2007-10-24 23:10 EST --
The problem could be occurring in the core mirroring code.  From
drivers/md/dm-raid1.c:

static void do_work(void *data)
{
	while (do_mirror(data)) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_timeout(HZ/5);
	}
}

'wake(ms)' is the usual call that triggers 'do_work'.  However, do_mirror will
return 1 if there is still work to be done - causing a 200ms delay before it is
run again.  Normally, this happens very infrequently - like when a failure
occurs.  In those cases, we may _want_ to delay and wait for a suspend. 
However, it also happens in response to the normal write queue having data on
it.  This can be filled by nominal I/O or by I/O placed back on the queue due to
remote recovery conflicts (from cluster mirror).  In the event of remote
recovery conflicts, we may want the delay.  However, we never want the delay
when the write queue is filled by normal means... and that seems to be exactly
what is happening.

This is probably happening with single-machine mirroring too, but to a lesser
extent: the network calls cmirror makes introduce more delay, allowing more
time for something to be added to the nominal queue before 'do_mirror'
completes.
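
A minimal sketch of the idea above (illustration only, not the fix that
actually shipped): take the 200ms nap only when the leftover work came from a
failure or a remote recovery conflict, and loop straight back into do_mirror()
when the write queue was refilled by nominal I/O.  The helper
writes_requeued_for_recovery() is hypothetical - it stands in for whatever
state the real driver would have to track to tell the two cases apart.

static void do_work(void *data)
{
	/* Sketch only - not the shipped fix. */
	while (do_mirror(data)) {
		/*
		 * Hypothetical helper: did the leftover work come from a
		 * failure or a remote recovery conflict?
		 */
		if (!writes_requeued_for_recovery(data))
			continue;	/* nominal I/O: service it immediately */

		/* Recovery-related requeue: the 200ms back-off is wanted. */
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_timeout(HZ/5);
	}
}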

Will test theory.

-- Additional comment from jbrassow on 2007-10-25 00:54 EST --
that's it.


-- Additional comment from jbrassow on 2007-10-30 12:16 EST --
Summary:

The small sleep in drivers/md/dm-raid1.c:do_work() was designed to:
1) allow the userspace process to more easily proceed to reconfigure a mirror in
the event of a failure.
2) allow remote machines (cluster mirror) to finish recovery of a region before
resubmitting the I/O.

However, the sleep is having adverse effects on nominal I/O - sometimes slowing
processes down by more than a factor of 10.  This can make applications
appear hung.

This problem occurs most frequently in cluster mirrors under heavy I/O (or
during failure scenarios).  It is possible in single machine mirroring, but
hasn't been seen yet.

This bug does not corrupt data, but it certainly affects "AVAILABILITY" - which
is especially bad considering mirrors are used when you want to increase
availability.

Comment 1 RHEL Program Management 2007-10-30 18:55:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Don Zickus 2007-12-21 20:18:23 UTC
in 2.6.18-62.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 5 errata-xmlrpc 2008-05-21 14:59:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


