432064 – RHEL5 cmirror tracker: log device failure can leave mirror write path super slow

Bug 432064 - RHEL5 cmirror tracker: log device failure can leave mirror write path super slow

Summary: RHEL5 cmirror tracker: log device failure can leave mirror write path super slow

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	cmirror
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Jonathan Earl Brassow
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2008-02-08 17:36 UTC by Corey Marthaler
Modified:	2010-01-12 02:06 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 21:27:09 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2009:0158	0	normal	SHIPPED_LIVE	new package: cmirror	2009-01-20 16:05:16 UTC

Description Corey Marthaler 2008-02-08 17:36:52 UTC

Description of problem:
This may very well be related to or even the same as bz 431696. Just like in bz
 431696, this involved a failure to a 3-way mirror. At first I though it was
deadlocked just like in 431696, but then I realized that it was just extremely
slow. It took a 1/2 hour after the log failure until the I/O started again. Even
then, the mirror syncing was so slow that the mirror still wasn't 100% synced
yet. I'll have to re-verify now that bz 431696 is actually stuck and not just
really slow too. 


Senario: Kill disk log of non synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      nonsyncd_log_3legs
* sync:      0
* mirrors:   1
* disklog:   1
* failpv:    /dev/sde1
* legs:      3
* pvs:       /dev/sdg1 /dev/sdd1 /dev/sdh1 /dev/sde1
************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 2 -n nonsyncd_log_3legs_1 -L 800M helter_skelter
/dev/sdg1:0-1000 /dev/sdd1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150

Continuing on without fully syncd mirrors, currently at...
        ( 1=0.00% )

Creating gfs on top of mirror(s) on taft-04...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        8654
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        8501
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        8764
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        8401
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit


<start name="taft-01_1" pid="5141" time="Fri Feb  8 11:01:53 2008" type="cmd" />
<start name="taft-02_1" pid="5143" time="Fri Feb  8 11:01:53 2008" type="cmd" />
<start name="taft-03_1" pid="5145" time="Fri Feb  8 11:01:53 2008" type="cmd" />
<start name="taft-04_1" pid="5147" time="Fri Feb  8 11:01:53 2008" type="cmd" />

Disabling device sde on taft-01
Disabling device sde on taft-02
Disabling device sde on taft-03
Disabling device sde on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-04
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.081922 seconds, 512 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
  /dev/sde1: open failed: No such device or address

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
checkit starting with:

[ *** stuck for 30 min before finishing properly *** ]

VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-02 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-03 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-04 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

Enabling device sde on taft-01
Enabling device sde on taft-02
Enabling device sde on taft-03
Enabling device sde on taft-04

Recreating PV /dev/sde1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PV back into VG helter_skelter
Since we can't yet up convert existing mirrors, down converting to linear(s)
on taft-04 before re-converting back to original mirror(s)
Up converting linear(s) back to mirror(s) on taft-04...
taft-04: lvconvert -m 2 -b helter_skelter/nonsyncd_log_3legs_1 /dev/sdg1:0-1000
/dev/sdd1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
Verifying the up conversion from linear(s) to mirror(s)
Verifying device /dev/sdg1 *IS* in the mirror(s)
Verifying device /dev/sdd1 *IS* in the mirror(s)
Verifying device /dev/sdh1 *IS* in the mirror(s)
Verifying device /dev/sde1 *IS* in the mirror(s)

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=45.50% )
        0/1 mirror(s) are fully synced: ( 1=71.50% )
        0/1 mirror(s) are fully synced: ( 1=96.50% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-02 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-03 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

        ---- taft-04 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_nonsyncd_log_3legs_1
Working dir:        /mnt/nonsyncd_log_3legs_1/checkit

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
Unmounting gfs and removing mnt point on taft-02...
Unmounting gfs and removing mnt point on taft-03...
Unmounting gfs and removing mnt point on taft-04...
Deactivating and removing mirror(s)
Checking for left over mirror dm devices


Version-Release number of selected component (if applicable):
cmirror-1.1.13-1.el5
kmod-cmirror-0.1.6-1.el5

Comment 1 RHEL Program Management 2008-02-08 21:28:02 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Jonathan Earl Brassow 2008-05-02 14:27:38 UTC

On Feb-08-2008, this bug was filed.  Later, this commit was made:
commit 172f68f3b807636434071c6855e4d4157bcbc96b
Author: Jonathan Brassow <jbrassow>
Date:   Wed Feb 13 15:06:23 2008 +0000

    - change the way 'is_remote_recovering' works to improve overall
      performance.

    Before a mirror issues a write, it must call 'is_remote_recovering'
    to ensure that another machine will not be recovering the region during
    the write.  This function can dramatically slow things down.  One way to
    increase performance is to note when the mirror is in-sync - then
    is_remote_recovering can return 0 without having to send the request
    around the cluster.  (This has already been done.)  This greatly speeds up
    I/O during nominal mirror operation.  However, I/O during mirror resyncing
    is still greatly reduced.  The problem is that the cluster network is
    consumed with handling 'is_remote_recovering' calls that it becomes hard
    to actually do the recovery.  The fix is to only allow one
    is_remote_recovering call to go to the cluster every 1/4 sec.  When the
    call goes up to userspace, it also retrieves info about how far along
    the resync is.  If a request is determined to already be in sync by that
    info, then the region is not recovering and can safely be answered without
    having to send the request on to the cluster.  This approach has greatly
    improved both the recovery and nominal throughput.

Comment 4 Corey Marthaler 2008-08-25 22:29:02 UTC

I don't appear to be seeing this anymore. Will reopen if seen again.

Comment 6 errata-xmlrpc 2009-01-20 21:27:09 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html

Note You need to log in before you can comment on or make changes to this bug.