Bug 1320836
| Summary: | when vol-create-as a volume with invalid name in a disk pool, libvirt produced error, but parted still created a partition and multipathd didn't generate symbolic link in /dev/mapper | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yisun |
| Component: | libvirt | Assignee: | John Ferlan <jferlan> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | dyuan, pzhang, rbalakri, xuzhang, yanyang, yisun |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.3.5-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 18:40:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
**Description** yisun 2016-03-24 06:25:29 UTC
The failure path for this code is a "best case" type scenario... As part of the fix for commit id '290ffcf' (bug 1138516), I did add the call to virStorageBackendDiskDeleteVol when virStorageBackendDiskCreateVol fails. That DeleteVol code has separate paths for a devmapper device and a "regular" disk backend. That logic was introduced by commit id 'df1011ca8'.

I'd have to do some on-system debugging in order to dig any deeper, as I don't have a multipath device-mapper setup on my laptop/system. So if you can provide me access to one, I can continue to look deeper. My assumption is that the dmsetup command being called is failing, but we're not checking that in the caller; the caller assumes that the command succeeds if it's run. Since 'badName' is probably not expected by dmsetup, I'm assuming it fails as well, but I would have to see it in action and then generate a fix. I have some ideas already coded up, so it shouldn't take too long.

As for the duplicate entries in the pool, I see what the issue is... There are a few places that miss clearing all vols in the pool prior to calling refreshPool. I can send a separate patch for those.

The new test disk couldn't handle 50M, but 10M worked... I also found that a "parted /dev/mapper/mpathc rm --script 2" after step 6 would allow the vol-list, pool-refresh, etc. to work.

So, my initial assumption wasn't true... the "dmsetup remove --force" is working, but it seems it's (still) the wrong solution, especially since the partition isn't deleted by dmsetup remove, and that is what is ultimately causing the issue.
It's too bad that kind of detail wasn't found in the patches that led to commit id 'df1011ca8':

http://www.redhat.com/archives/libvir-list/2011-February/msg00386.html (v3)
http://www.redhat.com/archives/libvir-list/2011-February/msg00184.html (v2)
http://www.redhat.com/archives/libvir-list/2011-January/msg01270.html (v1)

What you'll note in the examples provided in those solutions, though, is that the "p#" (partition #) appended onto the 'base' name is what is really causing the issue. So rather than shedding the "p" from the "p1", the solution was to use dmsetup remove, which, while working in one sense, I'm still not convinced ever worked properly. It would certainly remove the volume from the libvirt pool, but the dm device would still have that partition, as far as I understand the code at least.

Of course, I think this only comes to light now that the part_separator='no' support was added (e.g. commit id '020135dc' for bz 1265694). It might be interesting to alter the setup to remove the part_separator from the pool definition and to not have user_friendly_names set. Of interest would be: using the same process, what happens after the failure to create the volume? I suspect the mpathcp2 volume would successfully have the 'dmsetup remove' run; however, running parted afterwards would show that partition 2 still exists.

In any case, that's history, and now it's time to find a solution for both instances. I now understand what needs to be done and will work on some patches. I do have a fix for the current situation (part_separator='no'), but I would like to really fix the other situation where the "p#" is used. So if you could configure perhaps another volume that doesn't use user_friendly_names, I can make sure I have a more functionally complete solution.

Hi John, your assumption is correct. If a volume's multipath name ends with a number, a 'p' is required to be appended when creating a new partition. The 'dmsetup remove' works well when vol-create-as fails.
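To make the naming convention above concrete, here is a small Python sketch of the rule being discussed (this is a hypothetical helper of my own, not libvirt's actual code, and the part_separator handling is simplified): device mapper inserts a 'p' separator before the partition number when the base device name ends in a digit, so 'mpathc' becomes 'mpathc1' but '5554344432' becomes '5554344432p1', and the pool's part_separator attribute can override that default.

```python
def partition_device_name(base, part_number, part_separator=None):
    """Hypothetical sketch of the device-mapper partition naming rule.

    A 'p' separator is added before the partition number when the base
    name ends in a digit; part_separator='no' or 'yes' (from the pool's
    <device part_separator=...> attribute) forces the behavior.
    """
    if part_separator == "no":
        sep = ""
    elif part_separator == "yes":
        sep = "p"
    else:
        # Default: a trailing digit in the base name requires 'p',
        # otherwise the number would merge into the base name.
        sep = "p" if base[-1].isdigit() else ""
    return "%s%s%d" % (base, sep, part_number)
```

For example, `partition_device_name("mpathc", 1)` yields `mpathc1`, matching the "expected 'mpathc1'" error text later in this bug, while `partition_device_name("5554344432", 1)` yields `5554344432p1`.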
Now I have a new vhba.

```
# virsh nodedev-dumpxml scsi_host6
<device>
  <name>scsi_host6</name>
  <path>/sys/devices/pci0000:00/0000:00:09.0/0000:18:00.1/host4/vport-4:0-1/host6</path>
  <parent>scsi_host4</parent>
  <capability type='scsi_host'>
    <host>6</host>
    <unique_id>6</unique_id>
    <capability type='fc_host'>
      <wwnn>21000024ff370145</wwnn>
      <wwpn>2101001b32a90001</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
  </capability>
</device>

# lsscsi | grep 6:
[6:0:0:0]  disk  IBM  1726-4xx FAStT  0617  /dev/sdj
[6:0:0:1]  disk  IBM  1726-4xx FAStT  0617  /dev/sdk
[6:0:1:0]  disk  IBM  1726-4xx FAStT  0617  /dev/sdl
[6:0:1:1]  disk  IBM  1726-4xx FAStT  0617  /dev/sdm

# cat /etc/multipath/bindings
...
mpathd 3600a0b80005b10ca00005e115729093f
```

Since the wwid ends with 'f', which is not a number, a 'p' will not be required by multipathd. So I didn't set "user_friendly_names no" in /etc/multipath.conf, but instead used an alias ending in a number, as follows:

```
# vim /etc/multipath.conf
multipaths {
    multipath {
        wwid 3600a0b80005b10ca00005e115729093f
        alias 5554344432
    }
}

# service multipathd restart
Redirecting to /bin/systemctl restart multipathd.service

# cat pool.xml
<pool type='disk'>
  <name>bugtest</name>
  <source>
    <device path='/dev/mapper/5554344432'/>
    <format type='gpt'/>
  </source>
  <target>
    <path>/dev/disk/by-path/</path>
  </target>
</pool>

# virsh pool-define pool.xml; virsh pool-build bugtest --overwrite; virsh pool-start bugtest
Pool bugtest defined from pool.xml
Pool bugtest built
Pool bugtest started

# virsh vol-create-as --pool bugtest --name badName --capacity 50M
error: Failed to create vol badName
error: invalid argument: invalid partition name 'badName', expected '5554344432p1'

# parted /dev/mapper/5554344432 print
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/5554344432: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags    <=== no new partition here

# service multipathd restart; ll /dev/mapper/ | grep 555
Redirecting to /bin/systemctl restart multipathd.service
lrwxrwxrwx. 1 root root 7 May  9 20:23 5554344432 -> ../dm-6    <=== no new partition here

# virsh pool-refresh bugtest
Pool bugtest refreshed    <=== no error here
```

I left the machine turned on; if the vhba is deleted by someone else, you can use the nodedev XMLs in this comment and comment 2 to recreate them yourself. Thanks.

I think we were working at about the same time on the same machine on two different things. I believe you were running my code when you tried this, since "the right thing" happened. If you were running the 1.3.2 (or upstream) version, then the deletion would have failed. Additionally, if you had a successful creation using the right name, the "virsh vol-delete $volname $poolname" would have failed, e.g.:

```
# virsh vol-list bugtest
 Name                 Path
------------------------------------------------------------------------------
 5554344432p1         /dev/mapper/5554344432p1

# virsh vol-delete 5554344432p1 bugtest
error: Failed to delete vol 5554344432p1
error: Storage volume not found: no storage vol with matching path '/dev/mapper/5554344432p1'

# ll /dev/mapper/ | grep 555
lrwxrwxrwx. 1 root root 7 May  9 22:57 5554344432 -> ../dm-6

# parted /dev/mapper/5554344432 print
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/5554344432: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      31.0MB  41.0MB  10.0MB               primary
```

The reason is that the deletion code shouldn't be doing the 'dmsetup remove' at all. Rather, the fix should have handled the partition separator "p" properly. In any case, the good news is my changes fix both issues ;-). Thanks for setting up the second device so I can test with both configs at the same time (I used pool name bug1test for the "original" issue).
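The fix direction described above can be sketched briefly: instead of tearing down only the device-mapper node with 'dmsetup remove --force' (which leaves the partition on disk), delete the partition itself with parted and let the resulting udev change event allow multipathd to remove the corresponding dm-* device. The Python helper below is illustrative only (the function name and `run` flag are mine; libvirt's real implementation is C code in the disk storage backend):

```python
import subprocess

def delete_dm_partition(disk_path, part_number, run=False):
    """Build (and optionally execute) the 'parted rm' command that removes
    a partition from a device-mapper disk.

    Removing the partition, rather than force-removing the dm node,
    triggers udev change handling so multipathd can clean up the
    dm-* device for the partition on its own.
    """
    cmd = ["parted", disk_path, "rm", "--script", str(part_number)]
    if run:
        # Raises CalledProcessError if parted fails, so callers cannot
        # silently assume success (the gap noted earlier in this bug).
        subprocess.check_call(cmd)
    return cmd
```

With `run=False` the helper only returns the argv list, which makes the sketch easy to exercise without a real multipath device.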
I'm going to add some more context to bz 1265694 as well, since after actually using things for a while it seems the code there isn't quite right.

Patch is pushed to resolve this:

```
commit 8cdff0b93f732256df8d4a5c4eb418f37c707291
Author: John Ferlan <jferlan>
Date:   Tue Apr 26 08:53:57 2016 -0400

    storage: Fix virStorageBackendDiskDeleteVol for device mapper

    Commit id 'df1011ca8' modified virStorageBackendDiskDeleteVol to use
    "dmsetup remove --force" to remove the volume, but left things in an
    inconsistent state since the partition still existed on the disk and
    only the device mapper device (/dev/dm-#) was removed.

    Prior to commit '1895b421' (or '1ffd82bb' and '471e1c4e'), this could
    go unnoticed since virStorageBackendDiskRefreshPool wasn't called.
    However, the pool would be unusable since the /dev/dm-# device would
    be removed even though the partition was not removed unless a
    multipathd restart reset the link. That would of course make the
    volume appear again in the pool after a refresh or pool start after
    libvirt reload.

    This patch removes the 'dmsetup' logic and re-implements the partition
    deletion logic for device mapper devices. The removal of the partition
    via 'parted rm --script #' will cause udev device change logic to
    allow multipathd to handle removing the dm-* device associated with
    the partition.

$ git describe 8cdff0b93f732256df8d4a5c4eb418f37c707291
v1.3.4-211-g8cdff0b
$
```

Verified versions:
- libvirt-2.0.0-6.el7.x86_64
- qemu-kvm-rhev-2.6.0-22.el7.x86_64
- device-mapper-multipath-0.4.9-96.el7.x86_64

Steps:

Scenario 1: source device ends with a letter.
1. Create and build a multipath device pool as follows:

```
# virsh pool-list
 Name     State    Autostart
-------------------------------------------
 test1    active   no

# virsh pool-dumpxml test1
<pool type='disk'>
  <name>test1</name>
  <uuid>a508da43-5fa2-46f3-b9b9-0c2193ff26b4</uuid>
  <capacity unit='bytes'>10737401344</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>10737383936</available>
  <source>
    <device path='/dev/mapper/mpathc' part_separator='no'>
      <freeExtent start='17408' end='10737401344'/>
    </device>
    <format type='gpt'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>
```

2. Create a volume with an invalid name; it should report an error:

```
# virsh vol-create-as --pool test1 --name vol1 --capacity 50M
error: Failed to create vol vol1
error: invalid argument: invalid partition name 'vol1', expected 'mpathc1'
```

3. Check in /dev/mapper; there is no extra partition:

```
# ll /dev/mapper/ | grep mpathc
lrwxrwxrwx. 1 root root 7 Sep  6 13:25 mpathc -> ../dm-6

# parted /dev/mapper/mpathc print
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/mpathc: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags
```

4. Create a volume with a valid name, and check that it can be deleted:

```
# virsh vol-create-as --pool test1 --name mpathc1 --capacity 50M
Vol mpathc1 created

# virsh vol-list --pool test1
 Name       Path
------------------------------------------------------------------------------
 mpathc1    /dev/mapper/mpathc1

# virsh vol-delete mpathc1 --pool test1
Vol mpathc1 deleted
```

Scenario 2: source device ends with a number.

1. Check current multipath devices and change the alias:

```
# cat /etc/multipath/bindings
......
mpathc 3600a0b80005b10ca00005e115729093f
mpathd 3600a0b80005b0acc00004f875728fe8e

# vim /etc/multipath.conf
multipaths {
    multipath {
        wwid 3600a0b80005b10ca00005e115729093f
        alias 5554344432
    }
}

# systemctl restart multipathd
```
2. Define and build a multipath pool like the following, using the new alias:

```
# virsh pool-dumpxml test
<pool type='disk'>
  <name>test</name>
  <uuid>8511548b-3959-4ebd-b293-5eb8975169d5</uuid>
  <capacity unit='bytes'>10737401344</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>10737383936</available>
  <source>
    <device path='/dev/mapper/5554344432'>
      <freeExtent start='17408' end='10737401344'/>
    </device>
    <format type='gpt'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>
```

3. Create a volume with an invalid name; it should report an error:

```
# virsh vol-create-as --pool test --name vol1 --capacity 50M
error: Failed to create vol vol1
error: invalid argument: invalid partition name 'vol1', expected '5554344432p1'

# virsh vol-list test
 Name   Path
------------------------------------------------------------------------------
```

4. Check in /dev/mapper; no extra partition:

```
# ls /dev/mapper/ -al | grep 555
lrwxrwxrwx. 1 root root 7 Sep  6 12:49 5554344432 -> ../dm-6

# parted /dev/mapper/5554344432 print
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/5554344432: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags
```

5. Create a volume using a valid name and delete it:

```
# virsh vol-create-as --pool test --name 5554344432p1 --capacity 50M
Vol 5554344432p1 created

# virsh vol-list test
 Name            Path
------------------------------------------------------------------------------
 5554344432p1    /dev/mapper/5554344432p1

# virsh vol-delete 5554344432p1 test
Vol 5554344432p1 deleted

# virsh vol-list test
 Name   Path
------------------------------------------------------------------------------
```

As above, moving this bug to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html