Bug 441370 - single machine mirror log failure caused machine to 'lockup'
Summary: single machine mirror log failure caused machine to 'lockup'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.7
Hardware: All
OS: Linux
low
low
Target Milestone: rc
: ---
Assignee: LVM and device-mapper development team
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-04-07 19:01 UTC by Corey Marthaler
Modified: 2010-05-17 11:50 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-17 11:50:11 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Corey Marthaler 2008-04-07 19:01:06 UTC
Description of problem:
My machine locked up (i couldn't log in or anything) due to all the I/O error
messages when the log device was failed. Because of that, I don't have a lot of
info to report.

 Senario: Kill disk log of synced 2 leg mirror(s)

 ****** Mirror hash info for this scenario ******
 * name:      syncd_log_2legs
 * sync:      1
 * mirrors:   2
 * disklog:   1
 * failpv:    /dev/sde1
 * legs:      2
 * pvs:       /dev/sdh1 /dev/sdg1 /dev/sde1
 ************************************************

 Creating mirror(s) on taft-01...
 taft-01: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M manson /dev/sdh1:0-1000
/dev/sdg1:0-1000 /dev/sde1:0-150
 taft-01: lvcreate -m 1 -n syncd_log_2legs_2 -L 800M manson /dev/sdh1:0-1000
/dev/sdg1:0-1000 /dev/sde1:0-150

 Waiting until all mirrors become fully syncd...
         0/2 mirror(s) are fully synced: ( 1=2.50% 2=0.00% )
         0/2 mirror(s) are fully synced: ( 1=22.50% 2=19.00% )
         0/2 mirror(s) are fully synced: ( 1=43.00% 2=38.50% )
         0/2 mirror(s) are fully synced: ( 1=63.00% 2=57.50% )
         0/2 mirror(s) are fully synced: ( 1=83.00% 2=76.50% )
         1/2 mirror(s) are fully synced: ( 1=100.00% 2=97.50% )
         2/2 mirror(s) are fully synced: ( 1=100.00% 2=100.00% )

 Creating ext on top of mirror(s) on taft-01...
 mke2fs 1.35 (28-Feb-2004)
 mke2fs 1.35 (28-Feb-2004)
 Mounting mirrored ext filesystems on taft-01...

 Writing verification files (checkit) to mirror(s) on...
         ---- taft-01 ----
 checkit starting with:
 CREATE
 Num files:          100
 Random Seed:        27860
 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
 Working dir:        /mnt/syncd_log_2legs_1/checkit
 checkit starting with:
 CREATE
 Num files:          100
 Random Seed:        27864
 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
 Working dir:        /mnt/syncd_log_2legs_2/checkit

 <start name="taft-01_1" pid="6097" time="Mon Apr  7 12:50:18 2008" type="cmd" />
 <start name="taft-01_2" pid="6099" time="Mon Apr  7 12:50:18 2008" type="cmd" />
 Sleeping 12 seconds to get some outsanding I/O locks before the failure

 Disabling device sde on taft-01

 Attempting I/O to cause mirror down conversion(s) on taft-01
 10+0 records in
 10+0 records out

 Didn't receive heartbeat for 120 seconds


Syslog:
Apr  7 12:48:58 taft-01 qarshd[27875]: Running cmdline: echo offline >
/sys/block/sde/device/state
Apr  7 12:49:03 taft-01 qarshd[27877]: Talking to peer 10.15.80.47:39760
Apr  7 12:49:03 taft-01 qarshd[27877]: Running cmdline: dd if=/dev/zero
of=/mnt/syncd_log_2legs_1/ddfile count=10 bs=4M
Apr  7 12:49:03 taft-01 qarshd[27879]: Talking to peer 10.15.80.47:39762
Apr  7 12:49:03 taft-01 qarshd[27879]: Running cmdline: sync
Apr  7 12:49:03 taft-01 lvm[6987]: Log device, 253:6, has failed.
Apr  7 12:49:03 taft-01 lvm[6987]: Device failure in manson-syncd_log_2legs_2
Apr  7 12:49:03 taft-01 lvm[6987]: WARNING: dev_open(/etc/lvm/lvm.conf) called
while suspended
Apr  7 12:49:03 taft-01 lvm[6987]: Another thread is handling an event.  Waiting...
Apr  7 12:49:03 taft-01 lvm[6987]: WARNING: dev_open(/dev/ramdisk) called while
suspended
Apr  7 12:49:03 taft-01 last message repeated 2 times
Apr  7 12:49:03 taft-01 lvm[6987]: WARNING: dev_open(/dev/cdrom) called while
suspended
Apr  7 12:49:03 taft-01 kernel: scsi1 (0:4): rejecting I/O to offline device
Apr  7 12:49:03 taft-01 kernel: device-mapper: Failing mirror log device 253:6.
Apr  7 12:49:03 taft-01 kernel: scsi1 (0:4): rejecting I/O to offline device
Apr  7 12:49:03 taft-01 last message repeated 6 times
Apr  7 12:49:03 taft-01 kernel: device-mapper: Failing mirror log device 253:2.
Apr  7 12:49:03 taft-01 kernel: scsi1 (0:4): rejecting I/O to offline device
Apr  7 12:49:03 taft-01 last message repeated 12 times
Apr  7 12:49:03 taft-01 lvm[6987]: WARNING: dev_open(/dev/sda) called while
suspended
Apr  7 12:49:03 taft-01 kernel: scsi1 (0:4): rejecting I/O to offline device
Apr  7 12:49:03 taft-01 last message repeated 5 times
Apr  7 12:49:03 taft-01 lvm[6987]: WARNING: dev_open(/dev/sda) called while
suspended
Apr  7 12:49:03 taft-01 kernel: scsi1 (0:4): rejecting I/O to offline device
Apr  7 12:49:03 taft-01 last message repeated 11 times
[...]


Version-Release number of selected component (if applicable):
2.6.9-68.26.ELsmp
lvm2-2.02.27-2.el4_6.2
device-mapper-1.02.21-1.el4

Comment 1 Milan Broz 2010-05-17 11:50:11 UTC
There were many changes in code since the reported version (reported lvm2 version is from RHEL4.6) which should help in this situation.
Please reopen if it is reproducible with current build.


Note You need to log in before you can comment on or make changes to this bug.