Bug 1894359
| Summary: | virt-install race: Storage pool not found: no storage pool with matching name <unrelated pool> | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Martin Pitt <mpitt> | ||||||
| Component: | virt-manager | Assignee: | Cole Robinson <crobinso> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 32 | CC: | berrange, crobinso, jkonecny, mkolman, philip.wyett, rvykydal | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2020-11-24 12:51:55 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Martin Pitt
2020-11-04 06:22:02 UTC
> virt-install-3.1.0-1.fc33.noarch
Sorry, that was a lie, I looked in the wrong place. This happened with
virt-install-2.2.1-3.fc32.noarch
The fc33 version is what I have locally, but unfortunately Fedora 33's docker images don't work any more on a RHEL 7 host, so I have to use a F32 image to run the massively parallel test.
I am trying to get a very small virt-install invocation that causes this effect, without having to endure an actual long installation.
My first naïve attempt was
qemu-img create -f qcow2 test1.img 50M
virt-install --network user --memory 128 --pxe --network user --os-variant alpinelinux3.8 --disk path=test1.img,cache=unsafe --name test1
which essentially just goes into a PXE wait loop -- but that's fine, it defines the domain. But it defines just a single storage pool that's named after the host name ("kstest"):
$ virsh pool-list --all
Name State Autostart
------------------------------
kstest active yes
and running the same thing with s/test1/test2/ also does not create a storage pool. Then I thought maybe it's because the real invocation has two disks, the CD-ROM. But
touch boot.iso
virt-install --network user --memory 128 --pxe --network user --os-variant alpinelinux3.8 --disk path=test1.img,cache=unsafe --disk path=boot.iso,device=cdrom,readonly=on,shareable=on --name test1
but that doesn't create a new pool either. Neither does appending ",bus=sata".
The only reference to pools that I find in https://github.com/rhinstaller/kickstart-tests is scripts/launcher/lib/virtual_controller.py `def destroy()`, which removes a test's own storage pool. But that's not what this bug is about.
So I'm missing something. How does virt-install go into that "create separate storage pool" mode?
Oh, got it -- it's --location! And that requires an actual boot.iso, so: wget https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Server/x86_64/os/images/boot.iso mkdir test1 ln ./boot.iso test1/ qemu-img create -f qcow2 test1.img 50M virt-install --network user --memory 128 --network user --disk path=test1.img,cache=unsafe --name test1 --location test1/boot.iso,kernel=isolinux/vmlinuz,initrd=isolinux/initrd.img This fails, but it's enough to create a "test1" pool. I'll see to building a reproducer out of that, but that's something for tomorrow. Created attachment 1726822 [details]
initial reproducer
I have a first working reproducer. It creates four virt-installs in a loop (test{1..4}), which closely resembles what happens in our tests. After a few minutes, one of them fails like this:
$ ./reproducer.sh
ERROR Error: --disk path=test1/disk.img,cache=unsafe: Storage pool not found: no storage pool with matching uuid '06cae576-f1eb-44b9-a4b3-6fae9302202c' (test3)
This is exactly this bug (test1 erroring on storage pool test3 getting removed underneath).
You can watch the creation/destruction while the script runs with
watch -n0.2 'virsh pool-list --all; virsh list --all'
Due to its shell/sloppy nature, one failing subprocess won't stop the entire reproducer. Stop the test with `pkill -e reproducer`.
I know this isn't ideal yet, I'll still look for a better one. But at least this confirms that it's not a bug in our kickstart-tests, is _some_ standalone reproducer, and it also happens with virt-install-3.1.0-1.fc33.noarch.
Created attachment 1726828 [details]
better reproducer
I have a better reproducer now, which greatly improves the chance of hitting the race. It creates 100 storage pools, and removes them in the background while trying to do a *single* virt-install run.
I ran it a few times, and get a 100% "success" (i.e. hit the race) rate:
$ ./reproducer2.sh
ERROR Error: --disk path=test1/disk.img,cache=unsafe: Storage pool not found: no storage pool with matching uuid '03957f51-0151-42e5-bf41-3852e27a8654' (pool16)
sometimes it's pool17 or similar, that depends on the timing. But this seems to be reliable enough.
Now that virt-install doesn't run a thousand times, replacing its --quiet with --debug is insightful:
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:204) Launched with command line: /usr/bin/virt-install --name test1 --debug --network user --memory 128 --disk path=test1/disk.img,cache=unsafe --location test1/boot.iso,kernel=isolinux/vmlinuz,initrd=isolinux/initrd.img
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (virtinstall:212) Distilled --network options: ['user']
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (virtinstall:144) Distilled --disk options: ['path=test1/disk.img,cache=unsafe']
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:216) Requesting libvirt URI default
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:219) Received libvirt URI qemu:///session
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:1514) Exception parsing inst=<DeviceDisk disk 139935836089552> optstr=path=test1/disk.img,cache=unsafe
Traceback (most recent call last):
File "/usr/share/virt-manager/virtinst/cli.py", line 1501, in parse
objs = self._parse(inst is None and self.guest or inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 3308, in _parse
super()._parse(inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 1471, in _parse
param.parse_param(self, inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 1135, in parse_param
xmlutil.set_prop_path(inst, self.propname, self.val)
File "/usr/share/virt-manager/virtinst/xmlutil.py", line 66, in set_prop_path
return setattr(parent, pieces[-1], value)
File "/usr/share/virt-manager/virtinst/devices/disk.py", line 337, in _set_path
(vol_object, parent_pool) = diskbackend.manage_path(self.conn, newpath)
File "/usr/share/virt-manager/virtinst/diskbackend.py", line 137, in manage_path
if not path_is_url(path) and not path_is_network_vol(conn, path):
File "/usr/share/virt-manager/virtinst/diskbackend.py", line 171, in path_is_network_vol
for volxml in conn.fetch_all_vols():
File "/usr/share/virt-manager/virtinst/connection.py", line 273, in fetch_all_vols
return self._fetch_helper(
File "/usr/share/virt-manager/virtinst/connection.py", line 179, in _fetch_helper
self._fetch_cache[key] = raw_cb()
File "/usr/share/virt-manager/virtinst/connection.py", line 222, in _fetch_all_vols_raw
for poolxmlobj in self.fetch_all_pools():
File "/usr/share/virt-manager/virtinst/connection.py", line 264, in fetch_all_pools
return self._fetch_helper(
File "/usr/share/virt-manager/virtinst/connection.py", line 179, in _fetch_helper
self._fetch_cache[key] = raw_cb()
File "/usr/share/virt-manager/virtinst/connection.py", line 195, in _fetch_all_pools_raw
return [self._build_pool_raw(poolobj) for poolobj in ret]
File "/usr/share/virt-manager/virtinst/connection.py", line 195, in <listcomp>
return [self._build_pool_raw(poolobj) for poolobj in ret]
File "/usr/share/virt-manager/virtinst/connection.py", line 190, in _build_pool_raw
parsexml=poolobj.XMLDesc(0))
File "/usr/lib64/python3.9/site-packages/libvirt.py", line 3502, in XMLDesc
if ret is None: raise libvirtError ('virStoragePoolGetXMLDesc() failed', pool=self)
libvirt.libvirtError: Storage pool not found: no storage pool with matching uuid '3d85ec50-f69f-45af-9aa5-dd3151773269' (pool15)
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:255) File "/usr/bin/virt-install", line 8, in <module>
virtinstall.runcli()
File "/usr/share/virt-manager/virtinst/virtinstall.py", line 1158, in runcli
sys.exit(main())
File "/usr/share/virt-manager/virtinst/virtinstall.py", line 1145, in main
guest, installer = build_guest_instance(conn, options)
File "/usr/share/virt-manager/virtinst/virtinstall.py", line 580, in build_guest_instance
guest = _build_options_guest(conn, options)
File "/usr/share/virt-manager/virtinst/virtinstall.py", line 561, in _build_options_guest
cli.parse_option_strings(options, guest, None)
File "/usr/share/virt-manager/virtinst/cli.py", line 4391, in parse_option_strings
parseret = parserobj.parse(inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 1516, in parse
fail(_("Error: %(cli_flag_name)s %(options)s: %(err)s") %
File "/usr/share/virt-manager/virtinst/cli.py", line 255, in fail
log.debug("".join(traceback.format_stack()))
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] ERROR (cli:256) Error: --disk path=test1/disk.img,cache=unsafe: Storage pool not found: no storage pool with matching uuid '3d85ec50-f69f-45af-9aa5-dd3151773269' (pool15)
[Thu, 05 Nov 2020 11:02:50 virt-install 41734] DEBUG (cli:258)
Traceback (most recent call last):
File "/usr/share/virt-manager/virtinst/cli.py", line 1501, in parse
objs = self._parse(inst is None and self.guest or inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 3308, in _parse
super()._parse(inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 1471, in _parse
param.parse_param(self, inst)
File "/usr/share/virt-manager/virtinst/cli.py", line 1135, in parse_param
xmlutil.set_prop_path(inst, self.propname, self.val)
File "/usr/share/virt-manager/virtinst/xmlutil.py", line 66, in set_prop_path
return setattr(parent, pieces[-1], value)
File "/usr/share/virt-manager/virtinst/devices/disk.py", line 337, in _set_path
(vol_object, parent_pool) = diskbackend.manage_path(self.conn, newpath)
File "/usr/share/virt-manager/virtinst/diskbackend.py", line 137, in manage_path
if not path_is_url(path) and not path_is_network_vol(conn, path):
File "/usr/share/virt-manager/virtinst/diskbackend.py", line 171, in path_is_network_vol
for volxml in conn.fetch_all_vols():
File "/usr/share/virt-manager/virtinst/connection.py", line 273, in fetch_all_vols
return self._fetch_helper(
File "/usr/share/virt-manager/virtinst/connection.py", line 179, in _fetch_helper
self._fetch_cache[key] = raw_cb()
File "/usr/share/virt-manager/virtinst/connection.py", line 222, in _fetch_all_vols_raw
for poolxmlobj in self.fetch_all_pools():
File "/usr/share/virt-manager/virtinst/connection.py", line 264, in fetch_all_pools
return self._fetch_helper(
File "/usr/share/virt-manager/virtinst/connection.py", line 179, in _fetch_helper
self._fetch_cache[key] = raw_cb()
File "/usr/share/virt-manager/virtinst/connection.py", line 195, in _fetch_all_pools_raw
return [self._build_pool_raw(poolobj) for poolobj in ret]
File "/usr/share/virt-manager/virtinst/connection.py", line 195, in <listcomp>
return [self._build_pool_raw(poolobj) for poolobj in ret]
File "/usr/share/virt-manager/virtinst/connection.py", line 190, in _build_pool_raw
parsexml=poolobj.XMLDesc(0))
File "/usr/lib64/python3.9/site-packages/libvirt.py", line 3502, in XMLDesc
if ret is None: raise libvirtError ('virStoragePoolGetXMLDesc() failed', pool=self)
libvirt.libvirtError: Storage pool not found: no storage pool with matching uuid '3d85ec50-f69f-45af-9aa5-dd3151773269' (pool15)
So this is a TOCTOU again between _fetch_all_pools_raw() and introspecting a particular pool wiht the .XMLDesc() function.
I sent a fix to upstream: https://github.com/virt-manager/virt-manager/pull/179 Martin's fixed were applied upstream now The fix got released upstream in v3.2.0, and delivered into F33 in https://bodhi.fedoraproject.org/updates/FEDORA-2020-0782be7dfb I ran the reproducer again, and it is stable now. So this is fixed in F33, and for F32 we can backport this locally. Thanks! |