Bug 431755 - RHEL5 cmirror tracker: server can't handle log device failure
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Keywords: TestBlocker
Depends On:
Blocks: 430797
 
Reported: 2008-02-06 19:41 UTC by Corey Marthaler
Modified: 2010-04-27 15:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-04-27 15:03:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---



Description Corey Marthaler 2008-02-06 19:41:39 UTC
Description of problem:
Scenario: Kill disk log of synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_log_2legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      2
* pvs:       /dev/sdd1 /dev/sdh1 /dev/sde1
************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M helter_skelter /dev/sdd1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
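
For anyone scripting this scenario, the lvcreate line above can be derived from the mirror hash values. The following is only a sketch: build_lvcreate_argv is a hypothetical helper, not part of the test harness, and it assumes the PV list is ordered as one range per leg followed by the log device, as in this scenario.

```python
def build_lvcreate_argv(name, vg, size, legs, pv_ranges):
    """Return an lvcreate argument list for a mirror with `legs` legs.

    pv_ranges holds "PV:start-end" strings. Assumption: one entry per
    leg, followed by one entry for the disk log device.
    """
    if legs < 2:
        raise ValueError("a mirror needs at least two legs")
    if len(pv_ranges) != legs + 1:
        raise ValueError("expected one PV range per leg plus one for the disk log")
    # -m counts the additional copies, so a 2-leg mirror is "-m 1".
    return ["lvcreate", "-m", str(legs - 1), "-n", name, "-L", size, vg, *pv_ranges]


argv = build_lvcreate_argv(
    name="syncd_log_2legs_1",
    vg="helter_skelter",
    size="800M",
    legs=2,
    pv_ranges=["/dev/sdd1:0-1000", "/dev/sdh1:0-1000", "/dev/sde1:0-150"],
)
print(" ".join(argv))
```

The function only builds the argument list and does not run it, so it can be checked on a machine without LVM.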

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=0.00% )
        0/1 mirror(s) are fully synced: ( 1=32.50% )
        0/1 mirror(s) are fully synced: ( 1=64.50% )
        0/1 mirror(s) are fully synced: ( 1=96.50% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )
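
The sync wait above can be approximated outside the harness by polling the copy_percent field of lvs. This is a rough sketch, not the harness's own code: it assumes the text being parsed came from something like `lvs --noheadings -o copy_percent helter_skelter`, with one whitespace-padded number per mirror, and the function names are made up for illustration.

```python
def parse_copy_percent(lvs_output):
    """Extract one float per non-empty line of the assumed lvs output."""
    values = []
    for line in lvs_output.splitlines():
        field = line.strip()
        if field:
            values.append(float(field))
    return values


def all_synced(lvs_output):
    """True only if at least one mirror is listed and all are at 100%."""
    values = parse_copy_percent(lvs_output)
    return bool(values) and all(v >= 100.0 for v in values)


# Sample strings stand in for real lvs output (assumed shape, values from the run above).
print(all_synced("  96.50\n"))
print(all_synced("  100.00\n"))
```

An empty result is treated as "not synced" so that a failed or empty lvs call does not let the test proceed early.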

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11588
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10811
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11268
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        11254
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit


<start name="taft-01_1" pid="4798" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-02_1" pid="4800" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-03_1" pid="4802" time="Wed Feb  6 11:48:24 2008" type="cmd" />
<start name="taft-04_1" pid="4804" time="Wed Feb  6 11:48:24 2008" type="cmd" />

Disabling device sde on taft-01
Disabling device sde on taft-02
Disabling device sde on taft-03
Disabling device sde on taft-04
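
The taft-01 log in this report shows the device being disabled by writing "offline" to /sys/block/sde/device/state. A minimal sketch of the same step follows; set_scsi_device_state and its sysfs_root parameter are hypothetical, added so the code can be exercised against a scratch directory rather than the real /sys. Writing "running" to the same file is the usual way to bring a SCSI device back.

```python
import os
import tempfile


def set_scsi_device_state(device, state, sysfs_root="/sys"):
    """Write `state` to <sysfs_root>/block/<device>/device/state and return the path."""
    if state not in ("offline", "running"):
        raise ValueError(f"unsupported state: {state!r}")
    path = os.path.join(sysfs_root, "block", device, "device", "state")
    with open(path, "w") as f:
        f.write(state + "\n")
    return path


# Dry run against a throwaway directory tree instead of the real /sys.
with tempfile.TemporaryDirectory() as fake_sysfs:
    os.makedirs(os.path.join(fake_sysfs, "block", "sde", "device"))
    state_path = set_scsi_device_state("sde", "offline", sysfs_root=fake_sysfs)
    with open(state_path) as f:
        written = f.read().strip()

print(written)
```

On a real node this needs root and the default sysfs_root, and it has to be repeated on every cluster member that sees the shared device, as the harness does above.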

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
[DEADLOCK]


The result is that the log server (clogd) on taft-01 floods the log with disk log write failures:
Feb  6 11:48:08 taft-01 qarshd[11594]: Running cmdline: echo offline > /sys/block/sde/device/state
Feb  6 11:48:08 taft-01 xinetd[6233]: EXIT: qarsh status=0 pid=11594 duration=0(sec)
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
Feb  6 11:48:08 taft-01 kernel: sd 1:0:0:4: rejecting I/O to offline device
Feb  6 11:48:08 taft-01 clogd[6761]: rw_log:  write failure: Input/output error
Feb  6 11:48:08 taft-01 clogd[6761]: Error writing to disk log
[...]


And the clients can no longer communicate:
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_MARK_REGION]: -5
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_RESYNC_WORK]: -5
Feb  6 11:48:46 taft-03 last message repeated 2865 times
Feb  6 11:49:47 taft-03 last message repeated 6254 times
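
The -5 in the client messages lines up with the "Input/output error" reported by clogd, assuming the usual kernel convention of returning negated errno values (5 is EIO). The sketch below decodes the code and totals the syslog repeat counters from the taft-03 excerpt; the regular expressions and the summarize helper are illustrative, not an existing tool.

```python
import errno
import re

# taft-03 excerpt from this report, with each syslog entry on one line.
LOG = """\
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_MARK_REGION]: -5
Feb  6 11:48:15 taft-03 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_RESYNC_WORK]: -5
Feb  6 11:48:46 taft-03 last message repeated 2865 times
Feb  6 11:49:47 taft-03 last message repeated 6254 times
"""

ERROR_RE = re.compile(r"Server error while processing request \[(\w+)\]: (-?\d+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def summarize(log_text):
    """Map each failed request type to an errno name and sum the repeat counts."""
    requests = {}
    repeats = 0
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            code = -int(m.group(2))  # assumed negated errno: -5 becomes 5
            requests[m.group(1)] = errno.errorcode.get(code, str(code))
            continue
        m = REPEAT_RE.search(line)
        if m:
            repeats += int(m.group(1))
    return requests, repeats


requests, repeats = summarize(LOG)
print(requests)
print(repeats)
```

In this excerpt both request types decode to EIO and the repeat counters add up to 9119 within roughly 90 seconds, which suggests the client keeps retrying against a server that can no longer write its log.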


Version-Release number of selected component (if applicable):
cmirror-1.1.11-1.el5
kmod-cmirror-0.1.5-2.el5
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5

Comment 1 RHEL Product and Program Management 2008-02-06 19:47:04 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Corey Marthaler 2008-02-06 21:22:57 UTC
This is reproducible every time. Marking as a TestBlocker.

Comment 3 Corey Marthaler 2008-02-08 16:59:29 UTC
This bug is verified fixed in cmirror-1.1.13-1.el5/kmod-cmirror-0.1.6-1.el5.

Comment 4 RHEL Product and Program Management 2008-03-11 19:36:37 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 6 Alasdair Kergon 2010-04-27 15:03:03 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.

