Bug 194125

Summary: device lookup error causes mirror creation locking failure
Product: Red Hat Enterprise Linux 4 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 4.0CC: agk, djansa, heinzm, jbrassow, mbroz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2006-0504 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-10 21:48:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 180185    
Attachments:
Description Flags
First attempt at creating cluster mirror volume after create/remove
none
second attempt after create/remove none

Description Corey Marthaler 2006-06-05 21:26:26 UTC
Description of problem:
I had been having realatively good luck creating and deleting cluster mirrors
until my last attempt when I saw a failure due to a "Device lookup failure".

Creation attempt on taft-04:
[root@taft-04 ~]# lvcreate -L 400M -n coreymirror -m 1 mirror_1
  Error locking on node taft-03: Internal lvm error, check syslog
  Failed to activate new LV.

Syslog from taft-03:
Jun  5 09:50:02 taft-03 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jun  5 09:50:02 taft-03 kernel: device-mapper: dm-mirror: Device lookup failure
Jun  5 09:50:02 taft-03 kernel: device-mapper: error adding target to table

taft-02 was the mirror server:
Jun  5 09:50:44 taft-02 kernel: device-mapper: I'm the cluster mirror log server
for LVM-0I9prx81Ohg7GIZe3tsGkG
g4CTqjED4y4tqJXKFcB6PIyVabCnRXUlqJ2U70OxCh - reading disk
Jun  5 09:50:44 taft-02 kernel: device-mapper: Disk Resume::
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Live nodes        :: 1
Jun  5 09:50:44 taft-02 kernel: device-mapper:   In-Use Regions    :: 0
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Good IUR's        :: 0
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Bad IUR's         :: 0
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Sync count        :: 0
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Disk Region count ::
18446744073709551615
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Region count      :: 800
Jun  5 09:50:44 taft-02 kernel: device-mapper:   NOTE:  Mapping has changed.
Jun  5 09:50:44 taft-02 kernel: device-mapper: Marked regions::
Jun  5 09:50:44 taft-02 kernel: device-mapper:   0 - 800
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Total = 800
Jun  5 09:50:44 taft-02 kernel: device-mapper: Out-of-sync regions::
Jun  5 09:50:44 taft-02 kernel: device-mapper:   0 - 800
Jun  5 09:50:44 taft-02 kernel: device-mapper:   Total = 800
Jun  5 09:50:45 taft-02 lvm[4648]: Monitoring mirror device,
mirror_1-coreymirror for events
Jun  5 09:51:04 taft-02 lvm[4648]: mirror_1-coreymirror is now in-sync


taft-03 did have the proper module loaded:
[root@taft-03 lvm]# lsmod
Module                  Size  Used by
dm_cmirror             32924  0
dlm                   130308  3
cman                  137248  10 dm_cmirror,dlm
md5                     5697  1
ipv6                  282657  24
parport_pc             29185  0
lp                     15089  0
parport                43981  2 parport_pc,lp
autofs4                24009  0
i2c_dev                13633  0
i2c_core               28609  1 i2c_dev
sunrpc                175545  1
ds                     21449  0
yenta_socket           22977  0
pcmcia_core            69329  2 ds,yenta_socket
button                  9057  0
battery                11209  0
ac                      6729  0
uhci_hcd               34665  0
ehci_hcd               33605  0
hw_random               7137  0
shpchp                 93833  0
e1000                 120505  0
floppy                 65809  0
sg                     42617  0
dm_snapshot            19072  0
dm_zero                 3649  0
dm_mirror              32209  1 dm_cmirror
ext3                  137809  2
jbd                    69105  1 ext3
dm_mod                 68097  14 dm_cmirror,dm_snapshot,dm_zero,dm_mirror
qla2300               127425  0
qla2xxx               183201  4 qla2300
scsi_transport_fc      11841  1 qla2xxx
megaraid_mbox          40017  2
megaraid_mm            15881  1 megaraid_mbox
sd_mod                 19393  8
scsi_mod              141457  5 sg,qla2xxx,scsi_transport_fc,megaraid_mbox,

[root@taft-03 lvm]# dmsetup status
mirror_1-coreymirror_mimage_1: 0 925696 linear
mirror_1-coreymirror_mlog: 0 8192 linear
mirror_1-coreymirror_mimage_0: 0 925696 linear
VolGroup00-LogVol01: 0 4063232 linear
VolGroup00-LogVol00: 0 40960000 linear
mirror_1-coreymirror:


[root@taft-03 lvm]# lvs
  LV          VG       Attr   LSize   Origin Snap%  Move Log              Copy%
  coreymirror mirror_1 mwi-d- 400.00M                    coreymirror_mlog   0.00


[root@taft-04 ~]# lvs
  LV          VG       Attr   LSize   Origin Snap%  Move Log              Copy%
  coreymirror mirror_1 mwi-a- 400.00M                    coreymirror_mlog 100.00


Version-Release number of selected component (if applicable):
[root@taft-03 lvm]# rpm -qa | grep cmirror
cmirror-kernel-smp-2.6.9-5.0
cmirror-kernel-2.6.9-5.0
cmirror-kernel-debuginfo-2.6.9-5.0
cmirror-1.0.0-5
cmirror-kernel-largesmp-2.6.9-5.0
cmirror-debuginfo-1.0.0-5

How reproducible:
I'll try and reproduce and grab more info...

Comment 2 Jonathan Earl Brassow 2006-07-06 04:10:09 UTC
I haven't seen this.  Has it been reproduced?  Is it a blocker?

I have found something similar to this that I haven't been able to track down yet...  It involves the 
following steps:
1) fail mirror device
2) failure is properly handled, leaving a linear device
3) add new device to the volume group
4) lvconvert fails (or even removal of the volume and an attempt to create a mirror)

It fails because kernel device-mapper can not find a device.  This problem sounds similar, but I haven't 
been able to find out why it happens yet.


Comment 3 Corey Marthaler 2006-07-07 15:25:59 UTC
I have been able to reproduce again this by just creating and deleting cluster
mirrors in a loop.

while true; do lvcreate -m 1 -n cmirror1 -L 100M B; lvremove -f /dev/B/cmirror1;
done

This probably should be a blocker.

 Error locking on node taft-03: Internal lvm error, check syslog
  Failed to activate new LV.
  Logical volume "cmirror1" successfully removed
  Error locking on node taft-01: Command timed out
  Failed to activate new LV.

device-mapper: Couldn't register clustered_log service
device-mapper: Unable to connect to cluster infrastructure.
device-mapper: dm-mirror: Error creating mirror dirty log
device-mapper: error adding target to table


Comment 4 Jonathan Earl Brassow 2006-07-07 15:33:53 UTC
Comment #3 doesn't look like it's the same issue as this bug.  This bug is about
the kernel device-mapper not being able to find a device.  Comment #3 is about a
machine not being able to register a service with CMAN.


Comment 5 Corey Marthaler 2006-07-07 16:00:57 UTC
Filed bz 197952 for the issue in comment #3. Please disregard that comment. 

Comment 6 Jonathan Earl Brassow 2006-07-07 18:25:18 UTC
Created attachment 132076 [details]
First attempt at creating cluster mirror volume after create/remove

Comment 7 Jonathan Earl Brassow 2006-07-07 18:27:15 UTC
Created attachment 132077 [details]
second attempt after create/remove

Comment 8 Jonathan Earl Brassow 2006-07-07 18:30:15 UTC
1) create 5 mirrors
2) remove mirrors
3) create a mirror with a name like one just removed (comment #6 attachment)
4) remove log device left over from failure
5) reattempt create (comment #7 attachment)
Results:
Jul  7 14:17:13 tng1-1 lvm[2148]: device-mapper: create ioctl failed: Device or
resource busy
Jul  7 14:17:13 tng1-1 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jul  7 14:17:13 tng1-1 lvm[2148]: device-mapper: reload ioctl failed: No such
device or address
Jul  7 14:17:13 tng1-1 kernel: device-mapper: dm-mirror: Device lookup failure
Jul  7 14:17:13 tng1-1 kernel: device-mapper: error adding target to table


Comment 9 Jonathan Earl Brassow 2006-07-07 18:44:25 UTC
insert 'vgscan' on all nodes between steps 2 and 3 as work around?  testing...


Comment 10 Jonathan Earl Brassow 2006-07-07 19:02:10 UTC
nope.


Comment 11 Jonathan Earl Brassow 2006-07-07 19:50:35 UTC
Big clue:

[root@tng1-1 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)
[root@tng1-1 ~]# lvcreate -m1 -L 500M -n lv1 vg
  Logical volume "lv1" created
[root@tng1-1 ~]# lvremove -ff vg
  Logical volume "lv1" successfully removed
[root@tng1-1 ~]# lvs
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Devices
  LogVol00 VolGroup00 -wi-ao 13.19G                               /dev/hda2(32)
  LogVol01 VolGroup00 -wi-ao  1.00G                               /dev/hda2(0)
[root@tng1-1 ~]# dmsetup ls
vg-lv1_mlog     (253, 2)
vg-lv1_mimage_1 (253, 4)
vg-lv1_mimage_0 (253, 3)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)


Comment 12 Jonathan Earl Brassow 2006-07-07 20:14:08 UTC
Affects single machine mirroring too.

Comment 13 Jonathan Earl Brassow 2006-07-07 21:59:20 UTC
bogus data coming from kernel regarding open count on underlying devices (when
the parent is removed, the underlying devices still send back an open of 1).

Needs to be fixed in kernel, but a simple sleep(1) in the device-mapper
userspace code for dm devices with children works fine too - allows the open
count time to settle... better than nothing.


Comment 14 Jonathan Earl Brassow 2006-07-10 21:36:16 UTC
patches made, will leave agk to mark as modified when he commits the patches.


Comment 18 Corey Marthaler 2006-07-19 21:49:12 UTC
We hit this bug doing mirror testing today. This is not very reproducable as
I've only ever seen it a few times.


[root@taft-04 ~]# lvcreate -m 1 -n mirror -L 1G vg
  Error locking on node taft-03: Internal lvm error, check syslog
  Failed to activate new LV.
[root@taft-04 ~]# dmsetup ls
vg-mirror_mimage_1      (253, 4)
vg-mirror_mimage_0      (253, 3)
vg-mirror       (253, 5)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
vg-mirror_mlog  (253, 2)


Jul 19 10:39:45 taft-03 kernel: device-mapper: Cluster mirror log server is
shutting down.
Jul 19 10:39:45 taft-03 kernel: device-mapper: dm-mirror: Device lookup failure
Jul 19 10:39:45 taft-03 kernel: device-mapper: error adding target to table
Jul 19 10:39:45 taft-03 dmeventd[4426]: Monitoring mirror device, vg-mirror for
events


[root@taft-03 ~]# uname -ar
Linux taft-03 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64
x86_64 GNU/Linux
[root@taft-03 ~]# rpm -q lvm2
lvm2-2.02.06-6.0.RHEL4
[root@taft-03 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-6.0.RHEL4
[root@taft-03 ~]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@taft-03 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-03 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.2



Comment 20 Corey Marthaler 2006-07-21 15:51:03 UTC
Hit this with single mirroring as well.

[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-6.0.RHEL4
[root@taft-04 ~]# uname -ar
Linux taft-04 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64
x86_64 GNU/Linux


Comment 22 Dean Jansa 2006-07-31 16:03:08 UTC
Not a release blocker.  Issue is addressed in KBase and bug text.

Comment 24 Red Hat Bugzilla 2006-08-10 21:48:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0504.html