Bug 222502 - failures with dmevent cause cmirror issues
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror
Version: 4
Hardware: All Linux
Priority: high
Severity: high
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: Reopened, TestBlocker
Reported: 2007-01-12 18:56 EST by Corey Marthaler
Modified: 2010-01-11 21:01 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-03-02 08:36:21 EST

Description Corey Marthaler 2007-01-12 18:56:01 EST
Description of problem:
I've tried this case many times with single-node mirrors and it always appears
to work, but when I tried it on a cmirror with the latest rpms, it failed to
down convert the cmirror to a linear volume. This causes the volume group and
mirror to appear corrupted and also causes I/O errors when attempting I/O to
the mirror.

[root@link-02 ~]# lvs -a -o +devices
  LV                VG   Attr   LSize   Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-a- 100.00M                    mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] vg   iwi-ao 100.00M                                       /dev/sdh1(0)
  [mirror_mimage_1] vg   iwi-ao 100.00M                                       /dev/sda1(0)
  [mirror_mlog]     vg   lwi-ao   4.00M                                       /dev/sdb1(0)

# Here I turned off /dev/sdh

[root@link-02 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 104792064: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 1296954228736: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 1024 at 259384082432: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 512 at 259384082432: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 512 at 259384082432: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 1024 at 259384082432: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  Volume group "vg" not found


# I then attempted a write to the mirror to force the down convert, but it
didn't work, so I then attempted a write from all nodes in the cluster. LVM
still reports this volume group and mirror as corrupted.




Version-Release number of selected component (if applicable):
2.6.9-43.ELsmp
lvm2-cluster-2.02.18-1.el4
lvm2-2.02.18-1.el4
device-mapper-1.02.14-2.el4
cmirror-1.0.1-0
cmirror-kernel-smp-2.6.9-18.6
Comment 1 Corey Marthaler 2007-01-15 11:25:04 EST
I tried that again with gfs on top of the 2 cmirrors and had I/O going to them
from all nodes in the cluster. After the primary leg failure, I again saw this
issue and all the I/O to those mirrors ended up deadlocking.

This bug needs to be on the blocker list for cmirrors. 
Comment 2 Kiersten (Kerri) Anderson 2007-01-15 11:33:13 EST
Devel ACK for blocker-beta for cluster 4.5
Comment 3 Jonathan Earl Brassow 2007-01-15 11:43:01 EST
Was the cluster mirror in-sync?

Did you get any core dumps?  (There have been changes made to dmeventd - check
to ensure that it is running.)
Comment 4 Corey Marthaler 2007-01-15 12:13:57 EST
The mirrors have always been in-sync before attempting the failure case and I
didn't see any core dumps and have no reason to believe that dmeventd wasn't
running.
Comment 5 Corey Marthaler 2007-01-15 13:34:22 EST
It appears that dmeventd is running right up until a write is attempted to the
failed mirror, at which time it stops for some reason.
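A quick way to confirm whether the daemon is still alive before and after the write is to look it up by name. The following is a minimal sketch, assuming `pgrep` (from procps) is installed; the helper name `is_running` is hypothetical and not part of any LVM tooling.

```shell
# Sketch: report whether a process with the exact given name exists.
# "is_running" is an illustrative helper, not an LVM or device-mapper command.
is_running() {
    # -x requires the process name to match exactly
    pgrep -x "$1" >/dev/null 2>&1
}

if is_running dmeventd; then
    echo "dmeventd is running"
else
    echo "dmeventd is not running"
fi
```

Running this once before the write and once after it narrows down when the daemon goes away.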
Comment 6 Jonathan Earl Brassow 2007-01-15 14:07:56 EST
where "for some reason" == seg fault

try:

sysctl kernel.core_pattern=/tmp/core

when you start up. Then your core files will appear as /tmp/core.*
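Note that a core file is only written if the core size limit of the crashing process is non-zero, so it can help to check that alongside the pattern. The sketch below assumes a Linux system with /proc mounted; the helper name `core_limit_ok` is hypothetical.

```shell
# Sketch: check the two settings that commonly decide whether a core file
# is written. "core_limit_ok" is an illustrative helper name.
core_limit_ok() {
    # $1: a value as printed by `ulimit -c` ("unlimited" or a number)
    [ "$1" = "unlimited" ] || [ "$1" -gt 0 ] 2>/dev/null
}

echo "core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"

if core_limit_ok "$(ulimit -c)"; then
    echo "core size limit allows dumps in this shell"
else
    echo "core size limit is 0; try 'ulimit -c unlimited' before starting the daemon"
fi
```

The limit is inherited from the environment that starts the daemon, so it has to be raised in that shell or init script, not only in an interactive session.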

Comment 7 Corey Marthaler 2007-01-15 15:55:59 EST
I am also seeing cmirror creation issues now due to what also appears to be
dmeventd failures. Changing title of this bug and bumping priority.

[root@link-08 ~]# lvcreate -m 1 -L 1G -n mirror vg /dev/sda1:0-2000 /dev/sdb1:0-2000 /dev/sdh1:0-100
  Error locking on node link-07: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-04: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-02: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-08: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Failed to activate new LV.
  Failed to activate new LV.
Comment 8 Corey Marthaler 2007-01-29 15:39:18 EST
Just a note that QA is still seeing this issue and that this is blocking any
extensive cmirror testing. 

dmeventd:
[...]
select(6, [5], NULL, NULL, {1, 0})      = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0})      = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0})      = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0})      = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0})      = ? ERESTARTNOHAND (To be restarted)
PANIC: attached pid 3107 exited
Process 3107 detached


lvm2-cluster-2.02.20-1.el4
lvm2-2.02.20-1.el4
device-mapper-1.02.16-1.el4
2.6.9-43.ELsmp
Comment 9 Corey Marthaler 2007-01-29 18:24:52 EST
The locking model in lvm.conf changed. 
Woulda been nice to know 10 days ago. :)
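For anyone hitting the same symptoms, it may be worth checking which locking setting each node is actually using. The comment above does not say which value changed, so the sketch below only reads the `locking_type` setting from an lvm.conf-style file and prints it; the helper name `lvm_locking_type` and the default path are assumptions.

```shell
# Sketch: print the numeric locking_type value(s) set in an lvm.conf-style file.
# "lvm_locking_type" is an illustrative helper; commented-out lines are ignored
# because the pattern only matches lines that begin with the setting name.
lvm_locking_type() {
    sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        "${1:-/etc/lvm/lvm.conf}"
}

if [ -r /etc/lvm/lvm.conf ]; then
    echo "locking_type on $(hostname): $(lvm_locking_type)"
fi
```

Running it on every cluster node and comparing the output is a cheap way to spot a node whose configuration was not updated along with the packages.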
