Bug 431755 - RHEL5 cmirror tracker: server can't handle log device failure
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: TestBlocker
Depends On:
Blocks: 430797
 
Reported: 2008-02-06 14:41 EST by Corey Marthaler
Modified: 2010-04-27 11:03 EDT

Doc Type: Bug Fix
Last Closed: 2010-04-27 11:03:03 EDT


Attachments: None

Description Corey Marthaler 2008-02-06 14:41:39 EST
Description of problem:
Senario: Kill disk log of synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_log_2legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      2
* pvs:       /dev/sdd1 /dev/sdh1 /dev/sde1
************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M helter_skelter /dev/sdd1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=0.00% )
        0/1 mirror(s) are fully synced: ( 1=32.50% )
        0/1 mirror(s) are fully synced: ( 1=64.50% )
        0/1 mirror(s) are fully synced: ( 1=96.50% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11588
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10811
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11268
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11254
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit


<start name="taft-01_1" pid="4798" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-02_1" pid="4800" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-03_1" pid="4802" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-04_1" pid="4804" time="Wed Feb  6 11:48:24 2008" type="cmd" />

Disabling device sde on taft-01
Disabling device sde on taft-02
Disabling device sde on taft-03
Disabling device sde on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
[DEADLOCK]
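
The "Disabling device sde" step above corresponds to the command that qarshd records in the server log further down: echo offline > /sys/block/sde/device/state. A minimal Python sketch of that failure injection follows. It assumes root privileges and a SCSI disk; the function name set_scsi_device_state and the sysfs_root parameter are hypothetical and exist only so the sketch can be tried against a scratch directory. This is not the test harness's actual code.

```python
import os

# States this sketch accepts; "offline" is the value used in this report,
# "running" is the value normally written to bring a SCSI device back.
VALID_STATES = ("offline", "running")

def set_scsi_device_state(device, state, sysfs_root="/sys/block"):
    """Write a state to <sysfs_root>/<device>/device/state and return the path.

    Equivalent to: echo <state> > /sys/block/<device>/device/state
    """
    if state not in VALID_STATES:
        raise ValueError("unsupported state: %r" % (state,))
    path = os.path.join(sysfs_root, device, "device", "state")
    with open(path, "w") as f:
        f.write(state + "\n")
    return path
```

On a real node, set_scsi_device_state("sde", "offline") would fail the log device as in this scenario, and set_scsi_device_state("sde", "running") would re-enable it afterwards.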


As a result, clogd on the server (taft-01) repeatedly fails to write to the disk log:
Feb  6 11:48:08 taft-01 qarshd[11594]: Running cmdline: echo offline > /sys/block/sde/device/state
Feb  6 11:48:08 taft-01 xinetd[6233]: EXIT: qarsh status=0 pid=11594 duration=0(sec)
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
[...]


The clients can no longer communicate with the server; every request fails with -5 (EIO):
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_MARK_REGION]: -5
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_RESYNC_WORK]: -5
Feb  6 11:48:46 taft-03 last message repeated 2865 times
Feb  6 11:49:47 taft-03 last message repeated 6254 times
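
To gauge how fast the clients are retrying, the client log can be tallied per request type, crediting each syslog "last message repeated N times" line to the error that precedes it. The Python sketch below does this for the taft-03 excerpt above. The function name count_server_errors is hypothetical, and the regular expression is an assumption based only on the message format shown in this report.

```python
import re

# Either a dm-log-clustered server error (possibly wrapped across two lines)
# or a syslog "last message repeated N times" summary line.
EVENT_RE = re.compile(
    r"Server error\s+while processing request \[(?P<req>\w+)\]: (?P<err>-?\d+)"
    r"|last message repeated (?P<n>\d+) times"
)

def count_server_errors(text):
    """Return {request_name: count}; repeats are added to the preceding error."""
    counts = {}
    last = None
    for m in EVENT_RE.finditer(text):
        if m.group("req"):
            last = m.group("req")
            counts[last] = counts.get(last, 0) + 1
        elif last is not None:
            counts[last] += int(m.group("n"))
    return counts

SAMPLE = """\
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error
while processing request [DM_CLOG_MARK_REGION]: -5
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error
while processing request [DM_CLOG_GET_RESYNC_WORK]: -5
Feb  6 11:48:46 taft-03 last message repeated 2865 times
Feb  6 11:49:47 taft-03 last message repeated 6254 times
"""

print(count_server_errors(SAMPLE))
# {'DM_CLOG_MARK_REGION': 1, 'DM_CLOG_GET_RESYNC_WORK': 9120}
```

For this excerpt the tally is 9120 failed DM_CLOG_GET_RESYNC_WORK requests in roughly a minute and a half, which suggests the client retries without backing off.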


Version-Release number of selected component (if applicable):
cmirror-1.1.11-1.el5
kmod-cmirror-0.1.5-2.el5
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5
Comment 1 RHEL Product and Program Management 2008-02-06 14:47:04 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 2 Corey Marthaler 2008-02-06 16:22:57 EST
This is reproducible every time. Marking as a TestBlocker.
Comment 3 Corey Marthaler 2008-02-08 11:59:29 EST
This bug is verified fixed in cmirror-1.1.13-1.el5/kmod-cmirror-0.1.6-1.el5.
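
A quick way to check whether a node's packages are at or past the versions named in this comment is sketched below in Python. The helper names (numeric_key, has_fix) are hypothetical, and the comparison is a deliberate simplification that compares only the numeric fields of the version-release string. It is not RPM's own version comparison algorithm, so treat it as adequate only for version strings shaped like the ones in this report.

```python
import re

# Versions that this comment reports as containing the fix.
FIXED_IN = {"cmirror": "1.1.13-1.el5", "kmod-cmirror": "0.1.6-1.el5"}

def numeric_key(version_release):
    """Turn "1.1.13-1.el5" into (1, 1, 13, 1, 5). Simplified; not rpmvercmp."""
    return tuple(int(n) for n in re.findall(r"\d+", version_release))

def has_fix(package, version_release):
    """True if version_release is at or after the fixed version for package."""
    return numeric_key(version_release) >= numeric_key(FIXED_IN[package])

# The versions from the original description predate the fix.
print(has_fix("cmirror", "1.1.11-1.el5"))       # False
print(has_fix("kmod-cmirror", "0.1.5-2.el5"))   # False
```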
Comment 4 RHEL Product and Program Management 2008-03-11 15:36:37 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 6 Alasdair Kergon 2010-04-27 11:03:03 EDT
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.
