Bug 1227475 - storage: transient pools cannot be undefined if source dir is deleted and libvirtd is restarted
Summary: storage: transient pools cannot be undefined if source dir is deleted and libvirtd is restarted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard: LibvirtFirstBug
Depends On:
Blocks:
 
Reported: 2015-06-02 18:19 UTC by Richard W.M. Jones
Modified: 2017-09-14 20:33 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-14 20:33:11 UTC
Embargoed:



Description Richard W.M. Jones 2015-06-02 18:19:31 UTC
Description of problem:

I don't know how I got into this state, but I now have a pool
which is inactive but cannot be undefined:

$ virsh pool-destroy test-v2v-libvirt
error: Failed to destroy pool test-v2v-libvirt
error: Requested operation is not valid: storage pool 'test-v2v-libvirt' is not active

$ virsh pool-undefine test-v2v-libvirt
error: Failed to undefine pool test-v2v-libvirt
error: internal error: no config file for test-v2v-libvirt

$ virsh pool-dumpxml test-v2v-libvirt
<pool type='dir'>
  <name>test-v2v-libvirt</name>
  <uuid>34f8bdbf-02cb-4bc8-832c-9db7f48e6ecf</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
  </source>
  <target>
    <path>/tmp/goaljobstmpfb2510405693e8e5c8273dc371f676f4/libguestfs-1.29.44/v2v/test-v2v-o-libvirt.d</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

There are two problems here. First, the "internal error" is
not actionable - what config file is it looking for?  Second,
how can I delete the pool?

Version-Release number of selected component (if applicable):

libvirt-1.2.15-2.fc23.x86_64

How reproducible:

100%

Steps to Reproduce:

Unknown.

Comment 1 Richard W.M. Jones 2015-06-02 18:20:16 UTC
Also the directory doesn't exist:

$ ls /tmp/goaljobstmpfb2510405693e8e5c8273dc371f676f4
ls: cannot access /tmp/goaljobstmpfb2510405693e8e5c8273dc371f676f4: No such file or directory

which may or may not be the problem.

Comment 2 Richard W.M. Jones 2015-06-02 18:21:47 UTC
Ah, so there we go.  The way to destroy the pool is:

(1) Create the pool directory.

(2) At this point, 'virsh pool-destroy' will both delete the
directory and destroy the pool ('virsh pool-undefine' seems to
be unnecessary after this).
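
In shell terms (the path below is a stand-in for whatever target the
stuck pool points at), the workaround amounts to:

$ mkdir -p /path/to/missing/pool/target
$ virsh pool-destroy test-v2v-libvirt

Once the target directory exists again, pool-destroy succeeds and the
transient pool disappears from pool-list.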

Comment 3 Erik Skultety 2015-06-03 08:20:52 UTC
(In reply to Richard W.M. Jones from comment #2)
> (2) At this point, 'virsh pool-destroy' will both delete the
> directory and destroy the pool ('virsh pool-undefine' seems to
> be unnecessary after this).

pool-destroy won't touch the data or the pool definition itself; it only stops the pool (marks it as inactive). If the pool was originally created as transient, no config file was ever written, so on pool-destroy the pool object is deallocated and the next pool-list won't include the pool in its output (see the command breakdown after this comment).
However, what you probably meant is pool-delete, which also removes, in this case, the directory. Still, this is a very strange scenario which I haven't been able to reproduce so far in any way.
Any chance you have a daemon log you could attach?
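
For reference, this is how the work divides between the three commands
on a pool in a normal state, using a hypothetical directory pool named
'example':

$ virsh pool-destroy example    # stop the pool; data and config untouched
$ virsh pool-delete example     # remove the underlying source (here, the directory)
$ virsh pool-undefine example   # remove the persistent config; n/a for transient pools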

Comment 4 Richard W.M. Jones 2015-09-03 15:38:44 UTC
I guess we can close this because I don't know how the machine
got into this state, and it didn't reoccur.

Comment 5 Richard W.M. Jones 2015-10-09 07:42:59 UTC
I had this bug happen again, and again the only way to get rid
of the pool was to recreate the pool directory, then run pool-destroy.

See shell output below.

$ virsh pool-list --all
 Name                 State      Autostart 
-------------------------------------------
 bz                   inactive   yes       
 default              inactive   yes       
 test-v2v-libvirt     inactive   no        
 virt-builder-rhel-images active     yes       
 website              active     yes       

$ virsh pool-destroy test-v2v-libvirt
error: Failed to destroy pool test-v2v-libvirt
error: Requested operation is not valid: storage pool 'test-v2v-libvirt' is not active

$ virsh pool-undefine test-v2v-libvirt
error: Failed to undefine pool test-v2v-libvirt
error: internal error: no config file for test-v2v-libvirt

$ virsh pool-dumpxml test-v2v-libvirt
<pool type='dir'>
  <name>test-v2v-libvirt</name>
  <uuid>74a508fd-8b7a-4145-ac45-d433cc4b06b9</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
  </source>
  <target>
    <path>/tmp/goaljobstmp8361f4fad42eedb3c97d95ba153a137f/libguestfs-1.31.14/v2v/test-v2v-o-libvirt.d</path>
  </target>
</pool>

$ rm -rf /tmp/goaljobstmp8361f4fad42eedb3c97d95ba153a137f
$ virsh pool-dumpxml test-v2v-libvirt
<pool type='dir'>
  <name>test-v2v-libvirt</name>
  <uuid>74a508fd-8b7a-4145-ac45-d433cc4b06b9</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
  </source>
  <target>
    <path>/tmp/goaljobstmp8361f4fad42eedb3c97d95ba153a137f/libguestfs-1.31.14/v2v/test-v2v-o-libvirt.d</path>
  </target>
</pool>

$ virsh pool-undefine test-v2v-libvirt
error: Failed to undefine pool test-v2v-libvirt
error: internal error: no config file for test-v2v-libvirt

$ virsh pool-destroy test-v2v-libvirt
error: Failed to destroy pool test-v2v-libvirt
error: Requested operation is not valid: storage pool 'test-v2v-libvirt' is not active

$ mkdir -p /tmp/goaljobstmp8361f4fad42eedb3c97d95ba153a137f/libguestfs-1.31.14/v2v/test-v2v-o-libvirt.d
$ virsh pool-destroy test-v2v-libvirt
Pool test-v2v-libvirt destroyed

$ virsh pool-list --all
 Name                 State      Autostart 
-------------------------------------------
 bz                   inactive   yes       
 default              inactive   yes       
 virt-builder-rhel-images active     yes       
 website              active     yes

Comment 6 Richard W.M. Jones 2016-01-22 15:19:15 UTC
Still happens in libvirt-1.3.0-1.fc24.

Comment 7 Cole Robinson 2016-04-12 18:36:38 UTC
Here's a reproducer:

$ mkdir /tmp/pool
$ sudo virsh pool-create-as --name trans-tmp --target /tmp/pool --type dir
$ sudo systemctl stop libvirtd
$ rm -rf /tmp/pool
$ sudo systemctl start libvirtd
$ sudo virsh pool-info trans-tmp
Name:           trans-tmp
UUID:           8bd3a9dd-41c8-40db-8d4e-9f9f64b81239
State:          inactive
Persistent:     no
Autostart:      no

The driver autostart bits, and several other areas in storage_driver.c, don't clean up transient pools correctly if their state is set to inactive.

I'd like to save this bug for a potential GSoC student on the storage events project, since it will be a decent introduction to interacting with the storage subsystem.

This particular issue is in storage_driver.c:storagePoolUpdateState: the code handling a refreshPool failure doesn't take pool->configFile == NULL (a transient pool) into account and remove the pool when it fails to start. In fact, storagePoolUpdateAllState, which calls storagePoolUpdateState, should be checking for failure and ensuring the pool is removed from the object list.

Also, many other places that call backend->stopPool in cleanup paths are suspect if they don't look at both pool->configFile and pool->newDef. The only place that gets it right is storagePoolDestroy, so that function's logic should be split out into a shareable helper and used in the various cleanup paths.
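
A minimal sketch of the kind of transient-pool check being described
here. pool->configFile and pool->newDef come from the analysis above;
virStoragePoolObjIsActive, virStoragePoolObjRemove and the driver's
pool list are assumptions based on libvirt-1.3-era internals. This is
illustrative only, not the patch that eventually landed:

    /* Sketch only: in storagePoolUpdateState(), after refreshPool or
     * startPool fails, a transient pool has no config file on disk and
     * no persistent definition, so nothing can ever reference it again.
     * Drop it from the object list instead of leaving an inactive ghost. */
    if (!virStoragePoolObjIsActive(pool) &&
        !pool->configFile &&      /* transient: never written to disk */
        !pool->newDef) {          /* no pending persistent definition */
        virStoragePoolObjRemove(&driver->pools, pool);
        pool = NULL;
    }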

Comment 8 Richard W.M. Jones 2016-05-23 18:42:09 UTC
Still happens with libvirt-1.3.4-2.fc25.x86_64.

Comment 9 Cole Robinson 2016-06-23 00:12:46 UTC
Patches posted

http://www.redhat.com/archives/libvir-list/2016-June/msg01600.html

Comment 10 Cole Robinson 2017-09-14 20:33:11 UTC
A different set of patches eventually landed upstream, for example:


commit f3a8e80c130513c2b488df5a561c788133148685
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 30 13:47:45 2017 +0200

    storage: driver: Remove unavailable transient pools after restart

