Bug 194125
Summary: device lookup error causes mirror creation locking failure
Product: Red Hat Enterprise Linux 4
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 4.0
CC: agk, djansa, heinzm, jbrassow, mbroz
Hardware: All
OS: Linux
Fixed In Version: RHBA-2006-0504
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:48:40 UTC
Bug Blocks: 180185
Description
Corey Marthaler
2006-06-05 21:26:26 UTC
I haven't seen this. Has it been reproduced? Is it a blocker?

I have found something similar that I haven't been able to track down yet. It involves the following steps:
1) fail a mirror device
2) the failure is properly handled, leaving a linear device
3) add a new device to the volume group
4) lvconvert fails (as does removing the volume and then attempting to create a mirror)
It fails because kernel device-mapper cannot find a device. This problem sounds similar, but I haven't been able to find out why it happens yet.

I have been able to reproduce this again just by creating and deleting cluster mirrors in a loop:

while true; do lvcreate -m 1 -n cmirror1 -L 100M B; lvremove -f /dev/B/cmirror1; done

This probably should be a blocker.

Error locking on node taft-03: Internal lvm error, check syslog
Failed to activate new LV.
Logical volume "cmirror1" successfully removed
Error locking on node taft-01: Command timed out
Failed to activate new LV.

device-mapper: Couldn't register clustered_log service
device-mapper: Unable to connect to cluster infrastructure.
device-mapper: dm-mirror: Error creating mirror dirty log
device-mapper: error adding target to table

Comment #3 doesn't look like the same issue as this bug. This bug is about the kernel device-mapper not being able to find a device; comment #3 is about a machine not being able to register a service with CMAN. Filed bz 197952 for the issue in comment #3. Please disregard that comment.

Created attachment 132076
First attempt at creating cluster mirror volume after create/remove
Created attachment 132077
Second attempt after create/remove
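The create/remove loop above runs until interrupted, so a failure can scroll past unnoticed. A bounded variant that stops at the first iteration where lvcreate fails, or where lvremove leaves device-mapper devices behind, makes the failure easier to catch. This is a minimal sketch, not a tool from this report: the function names are hypothetical, it needs root, and it assumes dm device names of the form `<vg>-<lv>` and `<vg>-<lv>_<suffix>`, which holds only when neither name contains a hyphen (device-mapper doubles hyphens inside names).

```shell
#!/bin/sh
# Hypothetical helper: print any device-mapper devices still present for
# LV "$2" in VG "$1". Assumes neither name contains a hyphen.
leftover_dm_devices() {
    # dmsetup ls prints "name<TAB>(major, minor)" per device, or
    # "No devices found" when there are none.
    dmsetup ls | awk -v pfx="$1-$2" '$1 == pfx || index($1, pfx "_") == 1 { print $1 }'
}

# Hypothetical reproducer: create and remove a mirror up to $3 times
# (default 100), stopping at the first iteration where lvcreate fails or
# lvremove leaves device-mapper devices behind.
# Uses plain globals because POSIX sh has no "local".
repro_mirror_leak() {
    vg=$1 lv=$2 max=${3:-100}
    i=1
    while [ "$i" -le "$max" ]; do
        if ! lvcreate -m 1 -n "$lv" -L 100M "$vg"; then
            echo "iteration $i: lvcreate failed" >&2
            return 1
        fi
        lvremove -f "/dev/$vg/$lv"
        left=$(leftover_dm_devices "$vg" "$lv")
        if [ -n "$left" ]; then
            echo "iteration $i: leftover devices:" >&2
            echo "$left" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, `repro_mirror_leak B cmirror1 50` runs at most 50 iterations against the volume group used in the loop above. Checking `dmsetup ls` after every removal catches the leftover `_mlog` and `_mimage_N` devices described later in this report, even when lvremove itself reports success.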
1) create 5 mirrors
2) remove the mirrors
3) create a mirror with a name matching one just removed (comment #6 attachment)
4) remove the log device left over from the failure
5) reattempt the create (comment #7 attachment)

Results:
Jul 7 14:17:13 tng1-1 lvm[2148]: device-mapper: create ioctl failed: Device or resource busy
Jul 7 14:17:13 tng1-1 kernel: device-mapper: Cluster mirror log server is shutting down.
Jul 7 14:17:13 tng1-1 lvm[2148]: device-mapper: reload ioctl failed: No such device or address
Jul 7 14:17:13 tng1-1 kernel: device-mapper: dm-mirror: Device lookup failure
Jul 7 14:17:13 tng1-1 kernel: device-mapper: error adding target to table

Insert 'vgscan' on all nodes between steps 2 and 3 as a workaround? Testing... nope.

Big clue:
[root@tng1-1 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)
[root@tng1-1 ~]# lvcreate -m1 -L 500M -n lv1 vg
  Logical volume "lv1" created
[root@tng1-1 ~]# lvremove -ff vg
  Logical volume "lv1" successfully removed
[root@tng1-1 ~]# lvs
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Devices
  LogVol00 VolGroup00 -wi-ao 13.19G                               /dev/hda2(32)
  LogVol01 VolGroup00 -wi-ao  1.00G                               /dev/hda2(0)
[root@tng1-1 ~]# dmsetup ls
vg-lv1_mlog             (253, 2)
vg-lv1_mimage_1         (253, 4)
vg-lv1_mimage_0         (253, 3)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)

Affects single-machine mirroring too.

The kernel is returning bogus data about the open count on the underlying devices: when the parent is removed, the underlying devices still report an open count of 1. This needs to be fixed in the kernel, but a simple sleep(1) in the device-mapper userspace code for dm devices with children also works, because it gives the open count time to settle. Better than nothing.

Patches made; I will leave agk to mark this as MODIFIED when he commits them.

We hit this bug during mirror testing today. It is not very reproducible; I've only ever seen it a few times.
[root@taft-04 ~]# lvcreate -m 1 -n mirror -L 1G vg
  Error locking on node taft-03: Internal lvm error, check syslog
  Failed to activate new LV.
[root@taft-04 ~]# dmsetup ls
vg-mirror_mimage_1      (253, 4)
vg-mirror_mimage_0      (253, 3)
vg-mirror               (253, 5)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
vg-mirror_mlog          (253, 2)

Jul 19 10:39:45 taft-03 kernel: device-mapper: Cluster mirror log server is shutting down.
Jul 19 10:39:45 taft-03 kernel: device-mapper: dm-mirror: Device lookup failure
Jul 19 10:39:45 taft-03 kernel: device-mapper: error adding target to table
Jul 19 10:39:45 taft-03 dmeventd[4426]: Monitoring mirror device, vg-mirror for events

[root@taft-03 ~]# uname -ar
Linux taft-03 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@taft-03 ~]# rpm -q lvm2
lvm2-2.02.06-6.0.RHEL4
[root@taft-03 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.06-6.0.RHEL4
[root@taft-03 ~]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@taft-03 ~]# rpm -q cmirror
cmirror-1.0.1-0
[root@taft-03 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-10.2

Hit this with single-machine mirroring as well.
[root@taft-04 ~]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@taft-04 ~]# rpm -q lvm2
lvm2-2.02.06-6.0.RHEL4
[root@taft-04 ~]# uname -ar
Linux taft-04 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

Not a release blocker. The issue is addressed in the KBase and in the bug text.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0504.html
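The userspace workaround mentioned in this report was a sleep(1) in the device-mapper code for devices with children, so that the kernel's open count on the underlying devices has time to settle after the parent is removed. A script can approximate the same idea by polling the open count before acting on a child device. This is a minimal sketch, not the committed patch: the function name, the default of 5 tries and the one-second interval are assumptions, and it relies on the `open` column of `dmsetup info -c` as provided by current dmsetup releases.

```shell
#!/bin/sh
# Hypothetical helper: wait until device-mapper device "$1" reports an
# open count of 0, polling once per second for at most "$2" tries
# (default 5). Returns 0 once the count is 0, and 1 if it never settles
# or the device cannot be queried.
wait_open_count_zero() {
    dev=$1 tries=${2:-5}
    while [ "$tries" -gt 0 ]; do
        # -c selects column output; -o open prints only the open count.
        # Padding is stripped so the value can be compared as a string.
        count=$(dmsetup info -c --noheadings -o open "$dev" | tr -d '[:space:]')
        [ "$count" = "0" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

For example, `wait_open_count_zero vg-lv1_mimage_0 && dmsetup remove vg-lv1_mimage_0` would only attempt to remove a leftover image device once nothing holds it open. Polling with an upper bound, rather than sleeping a fixed second, avoids waiting when the count has already settled and avoids hanging forever when it never does.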