Bug 1293804

Summary: libvirt fails to unlink the image disks after creating VMs using virt-install: cannot unlink file 'FOO': Success
Product: Red Hat Enterprise Linux 7
Reporter: Naoya Hashimoto <nhashimo>
Component: libvirt
Assignee: John Ferlan <jferlan>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: 7.2
CC: aanm90, amaresh_pathak, crobinso, dyuan, freyes, jferlan, jose_de_la_rosa, ldelossa.ld, lud.janvier, optak, rbalakri, stephan.duehr, tnurlygayanov, tzheng, xarses, yisun
Target Milestone: rc
Flags: yisun: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-1.3.3-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 18:49:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Naoya Hashimoto 2015-12-23 07:00:11 UTC
Description of problem:
virsh vol-delete fails to delete (unlink) the disk volumes after creating VMs using virt-install.

Version-Release number of selected component (if applicable):
- kernel: 3.10.0-229.20.1.el7.x86_64
- libvirt daemon: libvirt-daemon-1.2.17-13.el7.x86_64
- libvirt client: libvirt-client-1.2.17-13.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a new VM with an additional disk using virt-install
2. Remove the VM image disks after destroying and undefining the VM.

Actual results:
libvirt fails to delete (unlink) the disks.
However, the deletion succeeds after dumping the volume XML (virsh vol-dumpxml).

Expected results:
The VM disk image should be deleted.

Additional info:
The detailed steps to reproduce the issue are as follows.

1. Create a new VM with an additional disk using virt-install
* The KVM Guest Image is the one last modified 2015-11-19 at the following link:
  <https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.2/x86_64/product-software>

hostname="ose31-master01"
virt_mem_size="8196"
virt_vcpu_num="2"
virt_disk_format="qcow2"
virt_additional_disk="docker"
virt_additional_disk_size="20"
virt_os_variant="rhel7.1"
virt_networks_public="ose-public"
mac_addr="52:54:00:1f:f0:21"

virt-install \
--ram ${virt_mem_size} \
--vcpus ${virt_vcpu_num} \
--disk path=${virt_dir_images}/${hostname}.${virt_disk_format},device=disk,bus=virtio,format=${virt_disk_format} \
--disk path=${virt_dir_images}/${hostname}_${virt_additional_disk}.${virt_disk_format},device=disk,bus=virtio,format=${virt_disk_format},size=${virt_additional_disk_size} \
--os-variant ${virt_os_variant} \
--network network=${virt_networks_public},mac=${mac_addr} \
--import \
--noautoconsole \
--vnc \
--autostart \
--name ${hostname}

2. Remove the VM image disks after destroying and undefining the VM.
[root@ep7node01 images]# vol=ose31-master01.qcow2
[root@ep7node01 images]# virsh --debug 0 vol-delete --pool default $vol 
vol-delete: pool(optdata): default
vol-delete: vol(optdata): ose31-master01.qcow2
vol-delete: found option <pool>: default
vol-delete: <pool> trying as pool NAME
vol-delete: found option <vol>: ose31-master01.qcow2
vol-delete: <vol> trying as vol name
error: Failed to delete vol ose31-master01.qcow2
error: cannot unlink file '/var/lib/libvirt/images/ose31-master01.qcow2': Success

After dumping the volume XML, deleting the disk succeeds.
[root@ep7node01 images]# virsh vol-dumpxml --pool default $vol
<volume type='file'>
  <name>ose31-node01.qcow2</name>
  <key>/var/lib/libvirt/images/ose31-node01.qcow2</key>
  <source>
  </source>
  <capacity unit='bytes'>32212254720</capacity>
  <allocation unit='bytes'>70782976</allocation>
  <target>
    <path>/var/lib/libvirt/images/ose31-node01.qcow2</path>
    <format type='qcow2'/>
    <permissions>
      <mode>0644</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:virt_image_t:s0</label>
    </permissions>
    <timestamps>
      <atime>1450846869.584226514</atime>
      <mtime>1450846869.561226454</mtime>
      <ctime>1450846869.979227526</ctime>
    </timestamps>
    <compat>1.1</compat>
    <features/>
  </target>
  <backingStore>
    <path>/var/lib/libvirt/images/rhel7.2-guest-ose.qcow2</path>
    <format type='qcow2'/>
    <permissions>
      <mode>0644</mode>
      <owner>107</owner>
      <group>107</group>
      <label>system_u:object_r:virt_content_t:s0</label>
    </permissions>
    <timestamps>
      <atime>1450846718.359838840</atime>
      <mtime>1447978567.941102422</mtime>
      <ctime>1450846718.332838771</ctime>
    </timestamps>
  </backingStore>
</volume>

[root@ep7node01 images]# virsh --debug 0 vol-delete --pool default $vol 
vol-delete: pool(optdata): default
vol-delete: vol(optdata): ose31-master01.qcow2
vol-delete: found option <pool>: default
vol-delete: <pool> trying as pool NAME
vol-delete: found option <vol>: ose31-master01.qcow2
vol-delete: <vol> trying as vol name
Vol ose31-master01.qcow2 deleted

Comment 2 André Martins 2016-01-17 05:42:14 UTC
Possible duplicate:
https://bugzilla.redhat.com/show_bug.cgi?id=1289327

I'm having the exact same problem:

# virsh vol-delete fooimage.img default
error: Failed to delete vol fooimage.img
error: cannot unlink file '/var/lib/libvirt/images/fooimage.img': Success

Comment 3 Louis DeLosSantos 2016-01-18 09:24:26 UTC
Same problem here, 

Any ideas? 

Errors encountered while removing certain storage devices.

cannot unlink file '/var/lib/libvirt/images/test01.qcow2': Success
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/delete.py", line 182, in _async_delete
    self._async_delete_path(conn, path, meter)
  File "/usr/share/virt-manager/virtManager/delete.py", line 228, in _async_delete_path
    vol.delete(0)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 3260, in delete
    if ret == -1: raise libvirtError ('virStorageVolDelete() failed', vol=self)
libvirtError: cannot unlink file '/var/lib/libvirt/images/test01.qcow2': Success


Every time.

Comment 4 Louis DeLosSantos 2016-01-18 23:17:13 UTC
Hey guys,

Workaround (at least for me):

chmod -R 666 /var/lib/libvirt/images/*

Seems to fix the problem pretty consistently. About to just add it to rc.local or something.

Comment 5 John Ferlan 2016-01-19 00:29:04 UTC
From just a quick read this might be related (in a round-about way) to bz 1233003 and/or 1282288.  It seems (especially from comment 4) to be related to not creating a file with the correct permissions or not deleting it properly - add to that issues around how qemu-img uses default mode of 0644 on creation.

In any case, there are a few libvirt upstream commit ids which could solve this - see 'ce6506b0b' and the sequence of changes from '77346f27' to '4cd7d220c' (specifically '9345c2bf'). Each of these is already in upstream libvirt 1.3.0.

During that work I found that using 'unlink' directly didn't always work as planned due to some changes related to removing the default protections and privileges from the XML. The calls in the code to virFileRemove will do the right thing.
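
For anyone checking their own build, a quick way to see whether a local clone of upstream libvirt already contains those commits (just a sketch; run it from the checkout directory) is:

for c in ce6506b0b 77346f27 4cd7d220c 9345c2bf; do
    git merge-base --is-ancestor "$c" HEAD && echo "$c: present" || echo "$c: missing"
done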

Comment 6 Jose De la Rosa 2016-01-19 16:37:04 UTC
Louis,

Your workaround in comment #4 is not a great one, as it allows any user on your host to delete your VM images. In any case, even after manually removing the VM image file, the volume still can't be deleted:

# rm /var/lib/libvirt/images/todelete

# virsh vol-delete --pool dir todelete
error: Failed to delete vol todelete
error: cannot unlink file '/var/lib/libvirt/images/todelete': Success

OR

# virsh vol-delete --pool dir /var/lib/libvirt/images/todelete
error: Failed to delete vol /var/lib/libvirt/images/todelete
error: cannot unlink file '/var/lib/libvirt/images/todelete': Success

Comment 7 John Ferlan 2016-01-19 20:06:00 UTC
I will agree that setting 666 on the entire contents is a bit dangerous, although it is a bit better than 777.  

It would perhaps be helpful to do an 'ls -al' on the volume in question and its containing directory. It would also help to know what the 'default' pool permissions/mode are set to. I'm assuming things are being run as root. Is NFS involved by any chance? That is, is your target /var/lib/libvirt/images really pointing at some NFS root-squash server directory?
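
Something like the following would collect that information (paths are just the defaults mentioned in this report, so adjust as needed; <volume> is a placeholder):

# ls -alZ /var/lib/libvirt/images /var/lib/libvirt/images/<volume>
# virsh pool-dumpxml default | grep -A4 '<permissions>'
# findmnt -T /var/lib/libvirt/images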


I had to make some "localized" adjustments to the script in the description:

hostname="bz1293804"
virt_mem_size="8196"
virt_vcpu_num="2"
virt_dir_images="/home/vm-images"
virt_disk_format="qcow2"
virt_disk_size="15"
virt_additional_disk="docker"
virt_additional_disk_size="10"
virt_os_variant="rhel7"
virt_networks_public="default"
mac_addr="52:54:00:1f:f0:21"

virt-install \
--ram ${virt_mem_size} \
--vcpus ${virt_vcpu_num} \
--disk path=${virt_dir_images}/${hostname}.${virt_disk_format},device=disk,bus=virtio,format=${virt_disk_format},size=${virt_disk_size} \
--disk path=${virt_dir_images}/${hostname}_${virt_additional_disk}.${virt_disk_format},device=disk,bus=virtio,format=${virt_disk_format},size=${virt_additional_disk_size} \
--os-variant ${virt_os_variant} \
--network network=${virt_networks_public},mac=${mac_addr} \
--import \
--noautoconsole \
--vnc \
--autostart \
--name ${hostname}

If I "run" that, then I get:


Starting install...
Allocating 'bz1293804.qcow2'                                                                                                                        |  15 GB  00:00:00     
Allocating 'bz1293804_docker.qcow2'                                                                                                                 |  10 GB  00:00:00     
Creating domain...                                                                                                                                  |    0 B  00:00:00     
Domain creation completed.


# virsh vol-list default
 Name                 Path                                    
------------------------------------------------------------------------------
 bz1293804.qcow2      /home/vm-images/bz1293804.qcow2         
 bz1293804_docker.qcow2 /home/vm-images/bz1293804_docker.qcow2  
...

# ls -al /home/vm-images/bz1293804*.qcow2
-rw-------. 1 root root 10739318784 Jan 19 14:08 /home/vm-images/bz1293804_docker.qcow2
-rw-------. 1 root root 16108814336 Jan 19 14:08 /home/vm-images/bz1293804.qcow2

# virsh destroy bz1293804
Domain bz1293804 destroyed

# virsh undefine bz1293804
Domain bz1293804 has been undefined

# virsh vol-list default
 Name                 Path                                    
------------------------------------------------------------------------------
 bz1293804.qcow2      /home/vm-images/bz1293804.qcow2         
 bz1293804_docker.qcow2 /home/vm-images/bz1293804_docker.qcow2  
...

# virsh vol-delete bz1293804.qcow2 default
Vol bz1293804.qcow2 deleted

# virsh vol-delete bz1293804_docker.qcow2 default
Vol bz1293804_docker.qcow2 deleted

# ls -al /home/vm-images/bz1293804*.qcow2
ls: cannot access /home/vm-images/bz1293804*.qcow2: No such file or directory

So this is not reproducible with upstream libvirt:

# virsh version
Compiled against library: libvirt 1.3.1
Using library: libvirt 1.3.1
Using API: QEMU 1.3.1
Running hypervisor: QEMU 2.4.1


NB: I reran my steps, except prior to the vol-delete commands I did:

rm -rf /home/vm-images/bz1293804*.qcow2

and still had no issue...

I have a few more things to check regarding history of changes, but figured I'd get this out there to see if any more data/details could be provided in the failing environment(s).

Comment 8 John Ferlan 2016-01-19 20:09:49 UTC
Forgot to note - "a" difference between what I did and the original script:

virt_dir_images="/home/vm-images"

I do this because I don't keep my default pool on the root file system...

In the description, ${virt_dir_images} is never defined - so originally those files got created in "/", which is obviously different from the expected /var/lib/libvirt/images/.
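
As a quick illustration (not from the original report), an empty ${virt_dir_images} makes the --disk path expand into the root directory:

$ virt_dir_images=""
$ hostname="ose31-master01"; virt_disk_format="qcow2"
$ echo "--disk path=${virt_dir_images}/${hostname}.${virt_disk_format}"
--disk path=/ose31-master01.qcow2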

Comment 9 Timur 2016-01-20 11:34:24 UTC
This issue is reproducible in my environment with RHEL 7.2.

Comment 10 Cole Robinson 2016-01-20 15:34:33 UTC
As mentioned above, there's Fedora bug 1289327. That reporter is using btrfs. Is any other reproducer here using btrfs?
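
One quick way to answer that is to check the filesystem type backing the pool path (command suggestion only, not taken from the original reports):

# stat -f -c %T /var/lib/libvirt/images
# or: df -T /var/lib/libvirt/images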

Comment 11 Andrew Woodward 2016-02-04 00:41:37 UTC
I can confirm the issue is present with:

$ virsh version
Compiled against library: libvirt 1.3.1
Using library: libvirt 1.3.1
Using API: QEMU 1.3.1
Running hypervisor: QEMU 2.5.0

Filesystem: xfs
Debian sid

non-root user

I create instances using the Python libvirt bindings with the XML interfaces [1][2].

[1] https://github.com/openstack/fuel-devops/blob/master/devops/driver/libvirt/libvirt_driver.py#L314
[2] https://github.com/openstack/fuel-devops/blob/master/devops/driver/libvirt/libvirt_driver.py#L743

In my case using a root user will result in the volume deleting without error.

attempting to delete the volume results in:

$ virsh vol-delete /var/lib/libvirt/images/9-Kilo_admin-iso
error: Failed to delete vol /var/lib/libvirt/images/9-Kilo_admin-iso
error: cannot unlink file '/var/lib/libvirt/images/9-Kilo_admin-iso': Permission denied

As reported, inspecting the volume with vol-dumpxml allows the non-root user to delete it.

Comment 12 John Ferlan 2016-02-04 15:46:29 UTC
A few extra details and command history may have helped here; otherwise, I have to assume a few things...

non-root user. (check)

runs 'virt-install'  (assume)

Unknown? Are you requested to provide credentials? (e.g. sudo or su or some other way to provide credentials indicating you are allowed to do privileged things). I'm mostly interested in the connection used - a non-root user would typically have a session-based virsh connection, while root would be system-based (e.g. the difference between virsh -c qemu:///session vs. virsh -c qemu:///system). Also, do you have any polkit-type auth set up?
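
For example (just a sketch using the standard polkit rule directories), something like this would show which connection is actually in use and whether any libvirt-related polkit rules are installed:

$ virsh uri
$ virsh -c qemu:///session uri
$ virsh -c qemu:///system uri
$ ls /etc/polkit-1/rules.d/ /usr/share/polkit-1/rules.d/ 2>/dev/null | grep -i libvirt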

Unknown - do you destroy the VM as non-root user?  Are you asked to provide credentials?

Then the following steps:

non-root user runs virsh vol-delete (as shown), fails
non-root user runs virsh vol-dumpxml, succeeds
non-root user runs virsh vol-delete, succeeds

I'd be curious to know what an 'ls -alZ /var/lib/libvirt/images/9-Kilo_admin-iso' shows before the first vol-delete and then after it.  And of course a similar output after the vol-dumpxml.

tks -

Comment 13 Andrew Woodward 2016-02-10 17:42:41 UTC
volumes and domains are created via python bindings and XML

connection:
  declare -x LIBVIRT_DEFAULT_URI="qemu:///system"

Permission is handled by membership in the libvirt group, which is some magic provided by the Debian package. The user/script is always non-root and is never asked for credentials.
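
A quick sanity check of that setup (sketch only) would be:

$ echo $LIBVIRT_DEFAULT_URI
qemu:///system
$ id -nG | grep -wo libvirt
libvirt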

Hmm, going back and attempting to re-reproduce: dumpxml does nothing for me; the first volume reported stuck is still having problems. All of the other defined volumes delete fine.

my workflow [1]

we can see that running domains have their owner switched to libvirt-qemu, but 9-Kilo_admin-iso never comes back even though its domain is gone.

The only difference that I'm aware of with that volume vs. the others (which all appear to delete fine) is that this one is uploaded data [2].

[1] http://paste.openstack.org/show/486593/
[2] https://github.com/openstack/fuel-devops/blob/master/devops/driver/libvirt/libvirt_driver.py#L770-L778

Comment 14 Cole Robinson 2016-02-11 19:25:00 UTC
*** Bug 1306564 has been marked as a duplicate of this bug. ***

Comment 15 John Ferlan 2016-02-11 22:27:31 UTC
Very strange indeed...  Something is very odd.

Your first ls -alZ shows the following owned by libvirt-qemu:libvirt-qemu:

-rw------- 1 libvirt-qemu libvirt-qemu ?    70647808 Feb 10 07:45 9-Kilo_slave-01-cinder
-rw------- 1 libvirt-qemu libvirt-qemu ?    92536832 Feb 10 07:45 9-Kilo_slave-01-swift
-rw------- 1 libvirt-qemu libvirt-qemu ?  5693308928 Feb 10 08:40 9-Kilo_slave-01-system
-rw------- 1 libvirt-qemu libvirt-qemu ?    56688640 Feb  8 12:11 9-Kilo_slave-02-cinder
-rw------- 1 libvirt-qemu libvirt-qemu ?   113246208 Feb  9 12:33 9-Kilo_slave-02-swift
-rw------- 1 libvirt-qemu libvirt-qemu ?  4589158400 Feb 10 08:40 9-Kilo_slave-02-system
...
and the remainder owned by root:root

then you run "dos.py erase 9-Kilo", each getting a failure to delete '/var/lib/libvirt/images/9-Kilo_admin-iso' (once for each of the 10 domains). I assume that's because it's a 'file' being used as a 'cdrom' for each of those domains, e.g. something like the following in each guest:

    <disk type='file' device='cdrom'>
      <source file='/var/lib/libvirt/images/9-Kilo_admin-iso'/>
      <target dev='sda' bus='scsi'/>
    </disk>

You follow that up with another ls -alZ and I see:

-rw------- 1 root         root         ?    70647808 Feb 10 07:45 9-Kilo_slave-01-cinder
-rw------- 1 root         root         ?    92536832 Feb 10 07:45 9-Kilo_slave-01-swift
-rw------- 1 root         root         ?  5693308928 Feb 10 08:41 9-Kilo_slave-01-system
-rw------- 1 root         root         ?    56688640 Feb  8 12:11 9-Kilo_slave-02-cinder
-rw------- 1 root         root         ?   113246208 Feb  9 12:33 9-Kilo_slave-02-swift
-rw------- 1 root         root         ?  4589158400 Feb 10 08:41 9-Kilo_slave-02-system
...

NOTE: They've all changed from libvirt-qemu:libvirt-qemu to root:root.  Is that something that 'dos.py erase $prefix' does?

When you go to run (as root) 'virsh vol-delete $each', the 'root:root' owned volumes will be deleted, but the other one won't because it's owned by libvirt-qemu:libvirt-qemu - that's not unexpected.

So, is it possible that dos.py "knows" not to try and change the protections on the ISO file because it doesn't want someone to delete it by mistake while it's assigned to some existing domain? Also, you only showed me the '9-K' domains; is there another domain that could have that file defined as a cdrom?

I'm not sure what dos.py erase does, but my suspicion is that it's a decision point made in the code to not change the owner:group on the 'cdrom' file; however, using the same logic there shouldn't be an attempt to vol-delete the file by the dos.py erase code. Once the code knows it has removed the last domain that the 'cdrom' is in, then it could change the protections to root:root and do the deletion.  Again I have some assumptions here because I don't know the code being used.

Comment 17 John Ferlan 2016-03-04 19:26:05 UTC
Something that was discussed in the last week on libvir-list while pondering a possible solution...

It would be *really helpful* for me to get output from the following:

virsh vol-dumpxml $volume $pool

ls -alZ $volume-path

What I really need is the permissions block from the volume, e.g.

    <permissions>
      <mode>0600</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:unlabeled_t:s0</label>
    </permissions>

compared to the owner, group, and permissions on the volume-path (e.g. the <path> element from the vol-dumpxml).

Then, does a 'virsh pool-refresh $pool' (I assume $pool == default in this case) followed by a 'virsh vol-dumpxml $volume $pool' show different values for mode, owner, group?  If so, I assume you then can follow that up with a 'virsh vol-delete $volume $pool'.
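
Put together as one pass (same placeholders as above; $volume_path stands for the <path> element from the vol-dumpxml output):

# virsh vol-dumpxml $volume $pool | grep -A5 '<permissions>'
# ls -alZ $volume_path
# virsh pool-refresh $pool
# virsh vol-dumpxml $volume $pool | grep -A5 '<permissions>'
# virsh vol-delete $volume $pool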

In order for the proper fix to be generated the assumption we're working under needs to be verified.

Comment 18 Ondřej Pták 2016-03-08 09:44:04 UTC
Sorry for the delay; the system was being used for testing, so I waited just in case, so that testing would not be affected. I hope this info will help. If not, I'm prepared to provide more.

$ vagrant box remove test # images are created and used by vagrant from sclo-vagrant1 (collection in CentOS)
Removing box 'test' (v0) with provider 'libvirt'...
Vagrant-libvirt plugin removed box only from you LOCAL ~/.vagrant/boxes directory
From libvirt storage pool you have to delete image manually(virsh, virt-manager or by any other tool)

# virsh vol-delete test_vagrant_box_image_0.img --pool default
error: Failed to delete vol test_vagrant_box_image_0.img
error: cannot unlink file '/var/lib/libvirt/images/test_vagrant_box_image_0.img': Success

# virsh vol-dumpxml test_vagrant_box_image_0.img --pool default
(...)
    <permissions>
      <mode>0744</mode>
      <owner>107</owner>
      <group>107</group>
      <label>system_u:object_r:virt_content_t:s0</label>
    </permissions>
(...)

# virsh vol-delete test_vagrant_box_image_0.img --pool default
error: Failed to delete vol test_vagrant_box_image_0.img
error: cannot unlink file '/var/lib/libvirt/images/test_vagrant_box_image_0.img': Success

# ls -alZ /var/lib/libvirt/images/test_vagrant_box_image_0.img
-rwxr--r--. qemu qemu system_u:object_r:virt_content_t:s0 /var/lib/libvirt/images/test_vagrant_box_image_0.img
# virsh pool-refresh default
Pool default refreshed

# virsh vol-dumpxml test_vagrant_box_image_0.img --pool default
(...)
      <mode>0744</mode>
      <owner>107</owner>
      <group>107</group>
      <label>system_u:object_r:virt_content_t:s0</label>
    </permissions>
(...)

# virsh vol-delete test_vagrant_box_image_0.img --pool default
error: Failed to delete vol test_vagrant_box_image_0.img
error: cannot unlink file '/var/lib/libvirt/images/test_vagrant_box_image_0.img': Success

Comment 19 Ondřej Pták 2016-03-08 09:46:19 UTC
I forgot to mention an interesting detail in my last comment: the volume which I couldn't delete disappeared after about 10 hours during which nobody did anything with virsh or vagrant.

Comment 20 Ondřej Pták 2016-03-09 09:20:33 UTC
It seems that the qemu owner was the problem. This solved it for me:

chown root:root /var/lib/libvirt/images/test_vagrant_box_image_0.img
virsh vol-dumpxml test_vagrant_box_image_0.img --pool default
virsh vol-delete test_vagrant_box_image_0.img --pool default

But it didn't work without vol-dumpxml.

Comment 21 John Ferlan 2016-03-09 16:07:57 UTC
Is there a reason you're making those comments private?  

In any case, it seems you are confirming what's been speculated on libvir-list regarding what the issue might be. Furthermore, you're correct in noticing that the volume is owned by qemu:qemu in the 'default' pool and that chowning it to root:root allows the volume to be deleted. In fact, that's the key in this case!

So the next question is what causes the "qemu:qemu" ownership. Let's go back a few steps. Things exist somehow - I assume there was a command that created that guest, assigned it storage, etc.  Was the storage created/carved out prior to running some sort of vagrant box create command (I don't use vagrant) or does the vagrant processing create it? As root, if I created a volume in the default pool, then its ownership would be root:root.  As any user, using 'virsh -c qemu:///system vol-create-as ...' the same thing happens (once I'm authenticated).

But the issue most likely is that the creation of the storage volume could have supplied the owner/group as qemu (e.g. 107:107).  It's perfectly fine for root to create storage and change its ownership.  Of course, given the current algorithm, that means the ownership would also need to be set back afterwards to something that root could delete. For example, if as root I supplied XML to a vol-create type command that listed "qemu:qemu" as the owner, such as:

<volume type='file'>
  <name>test.img</name>
  <source>
  </source>
  <capacity unit='bytes'>10485760</capacity>
  <allocation unit='bytes'>10485760</allocation>
  <target>
    <path>/home/vm-images/test.img</path>
    <format type='raw'/>
    <permissions>
      <mode>0600</mode>
      <owner>107</owner>
      <group>107</group>
      <label>system_u:object_r:unlabeled_t:s0</label>
    </permissions>
  </target>
</volume>

# virsh vol-create default vol.xml (assuming the above is vol.xml)

Then 

#  virsh vol-delete test.img default
error: Failed to delete vol test.img
error: cannot unlink file '/home/vm-images/test.img': Permission denied

fails.  Setting ownership back to root:root still fails

# chown root:root /home/vm-images/test.img
# virsh vol-delete test.img default
error: Failed to delete vol test.img
error: cannot unlink file '/home/vm-images/test.img': Permission denied

still fails because the XML still has qemu:qemu (i.e., no one's told libvirt, and libvirt doesn't save the XML on disk, only in memory).

So, running either '# virsh pool-refresh default' or '# virsh vol-dumpxml test.img default' will reset the XML back to having "root:root" which then allows the deletion.
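
In other words, after the chown the only missing step is getting libvirt to re-read the on-disk ownership, e.g. (continuing the example above):

# virsh pool-refresh default
# virsh vol-delete test.img default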

Now that what's probably happening is clearer - let's see what type of solution I can come up with...

Comment 22 Cole Robinson 2016-03-09 16:14:51 UTC
FWIW I have patches I'm just testing now, I'll post within the hour

Comment 23 Cole Robinson 2016-03-09 17:33:35 UTC
*** Bug 1260356 has been marked as a duplicate of this bug. ***

Comment 24 Andrew Woodward 2016-03-09 23:02:13 UTC
Re-ran; other than using my dos.py to set up the env and create the 'empty' snapshot, I converted all the commands to bash so they can be clearly understood.

I ran through what jferlan commented. I have confirmed that, for whatever reason, it occurs on a) volumes attached as cdrom (and maybe also having had a snapshot made of them), and b) volumes for domains that had previously been running (regardless of creating a snapshot of them in the running state).

Having a snapshot of a domain that was never started had no impact on deleting the volume.

In the case of the running instances (excluding the cdrom), they initially didn't erase; after the pool-refresh they deleted.

In the case of the cdrom, pool-refresh alone didn't work; only after resetting the permissions and another refresh was virsh able to remove the volume.
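
Spelled out as commands (a sketch based on the description above, assuming the same chown-to-root reset from comment 20 and the 'default' pool), the cdrom case was roughly:

# chown root:root /var/lib/libvirt/images/9-Kilo_admin-iso
# virsh pool-refresh default
# virsh vol-delete /var/lib/libvirt/images/9-Kilo_admin-iso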

http://paste.openstack.org/show/489920/

Comment 25 Ondřej Pták 2016-03-10 13:33:16 UTC
I opened most of my comments, but the first should probably stay private, because it contains info about a not-yet-released product.

I created the VM with "vagrant up", which uses libvirt via sclo-vagrant1-vagrant-libvirt-0.0.32-1.el7.noarch (the vagrant plugin for libvirt support). So probably vagrant is responsible for the qemu ownership; I'm not sure. All vagrant commands are executed by an ordinary user, with this configuration allowing access:

cp /opt/rh/sclo-vagrant1/root/usr/share/vagrant/gems/doc/vagrant-libvirt-*/polkit/10-vagrant-libvirt.rules /etc/polkit-1/rules.d
and adding the user to the vagrant group

that file contains:
/*
 * Allow users in vagrant group to manage libvirt without authentication.
 * Copy this file to /usr/share/polkit-1/rules.d/ to activate.
 */
polkit.addRule(function(action, subject) {
  if ((action.id == "org.libvirt.unix.manage"
    || action.id == "org.libvirt.unix.monitor")
    && subject.isInGroup("vagrant")) {
    return polkit.Result.YES;
  }
});

Comment 26 yisun 2016-03-15 02:55:10 UTC
The QE team tried to reproduce this with pure virsh commands as follows; just recording the steps here.

# qemu-img create -f qcow2 alphaGO.qcow2 1G
Formatting 'alphaGO.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ll | grep alpha
-rw-r--r--. 1 root root      197120 Mar 15 10:45 alphaGO.qcow2

# cat disk.xml
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/alphaGO.qcow2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>


# virsh attach-device test disk.xml
Device attached successfully

# ll | grep alpha
-rw-r--r--. 1 qemu qemu      197120 Mar 15 10:45 alphaGO.qcow2

# virsh pool-refresh default
Pool default refreshed


# virsh vol-delete alphaGO.qcow2 default
error: Failed to delete vol alphaGO.qcow2
error: cannot unlink file '/var/lib/libvirt/images/alphaGO.qcow2': Success


# virsh detach-device test disk.xml
Device detached successfully


# ll | grep alpha
-rw-r--r--. 1 root root      197120 Mar 15 10:45 alphaGO.qcow2

# virsh vol-delete alphaGO.qcow2 default
error: Failed to delete vol alphaGO.qcow2
error: cannot unlink file '/var/lib/libvirt/images/alphaGO.qcow2': Success


# virsh pool-refresh default
Pool default refreshed

# virsh vol-delete alphaGO.qcow2 default
Vol alphaGO.qcow2 deleted

Comment 27 John Ferlan 2016-03-16 10:52:10 UTC
The steps in comment 26 work with the patch submitted by cole applied.



git show adefc561
commit adefc561cc4c6a007529769c3df286f2ed461684
Author: Cole Robinson <crobinso>
Date:   Wed Mar 9 12:20:37 2016 -0500

    util: virfile: Only setuid for virFileRemove if on NFS
    
    NFS with root-squash is the only reason we need to do setuid/setgid
    crazyness in virFileRemove, so limit that behavior to the NFS case.

git describe adefc561
v1.3.2-98-gadefc56


Moving this to POST

Comment 28 Mike McCune 2016-03-28 22:46:32 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 30 yisun 2016-04-15 06:33:04 UTC
Hi John, 
Per commit adefc561cc4c6a007529769c3df286f2ed461684, the img file's owner:group won't be changed to qemu:qemu when it's used by a running guest. And this commit makes it possible to vol-delete a volume even if it's used by a running guest. And the user won't be aware of that. 

So before the bug fixed, the bug itself provides a mechanism to provide volume being deleted accidentally by vol-delete if it's being used by a VM, but after the fix, we can vol-delete the volume anytime.

Do you think this should be considered as a regression?

Comment 31 yisun 2016-04-15 06:35:33 UTC
(In reply to yisun from comment #30)
> Hi John, 
> Per commit adefc561cc4c6a007529769c3df286f2ed461684, the img file's
> owner:group won't be changed to qemu:qemu when it's used by a running guest.
> And this commit makes it possible to vol-delete a volume even if it's used
> by a running guest. And the user won't be aware of that. 
> 
> So before the bug fixed, the bug itself provides a mechanism to provide
> volume being deleted accidentally by vol-delete if it's being used by a VM,
> but after the fix, we can vol-delete the volume anytime.
> 
> Do you think this should be considered as a regression?

... So before the bug fixed, the bug itself provides a mechanism to s/provide/prevent/...

Comment 32 Cole Robinson 2016-04-15 10:47:06 UTC
(In reply to yisun from comment #31)
> (In reply to yisun from comment #30)
> > Hi John, 
> > Per commit adefc561cc4c6a007529769c3df286f2ed461684, the img file's
> > owner:group won't be changed to qemu:qemu when it's used by a running guest.
> > And this commit makes it possible to vol-delete a volume even if it's used
> > by a running guest. And the user won't be aware of that. 
> > 
> > So before the bug fixed, the bug itself provides a mechanism to provide
> > volume being deleted accidentally by vol-delete if it's being used by a VM,
> > but after the fix, we can vol-delete the volume anytime.
> > 
> > Do you think this should be considered as a regression?
> 
> ... So before the bug fixed, the bug itself provides a mechanism to
> s/provide/prevent/...

Personally I don't think that counts as a regression... preventing deletion of a running VMs disk has never been something we aimed to explicitly support. The admin could always go behind libvirt's back and 'rm' the disk anyways

Comment 33 freyes 2016-04-15 19:42:07 UTC
Hi,

I tested commit adefc561 on an Ubuntu machine where I was having this problem, and now I can delete volumes properly; so far I haven't noticed any regressions.

Cole, thanks for fixing this, I really appreciate it.

Best,

Comment 34 yisun 2016-04-25 09:03:04 UTC
Verified on libvirt-1.3.3-2.el7.x86_64

Steps:

# pwd
/img


# virsh pool-dumpxml testpool
<pool type='dir'>
  <name>testpool</name>
  <uuid>c2fd2db7-9985-4b69-8073-0077c028c715</uuid>
  <capacity unit='bytes'>53660876800</capacity>
  <allocation unit='bytes'>18462621696</allocation>
  <available unit='bytes'>35198255104</available>
  <source>
  </source>
  <target>
    <path>/img</path>
    <permissions>
      <mode>0755</mode>
      <owner>0</owner>
      <group>0</group>
      <label>unconfined_u:object_r:default_t:s0</label>
    </permissions>
  </target>
</pool>


# cat disk.xml 
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/img/testattach'/>
      <target dev='vdb' bus='virtio'/>
    </disk>

# virsh vol-list testpool
 Name                 Path                                    
------------------------------------------------------------------------------
...                     
 testattach           /img/testattach         

# virsh list
 Id    Name                           State
----------------------------------------------------
 35    generic                        running


# virsh attach-device generic disk.xml
Device attached successfully


# virsh destroy generic   <=== due to bz1318181, use destroy instead of detach
Domain generic destroyed

# virsh vol-delete  testattach --pool testpool
Vol testattach deleted    <=== volume can be deleted successfully

Comment 36 errata-xmlrpc 2016-11-03 18:49:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html