
Bug 2177837

Summary: When provisioning multiple VMs with ISOs from Satellite, deployment fails because of storage pool name conflicts
Product: Red Hat Enterprise Linux 8    Reporter: Allie DeVolder <adevolder>
Component: virt-manager    Assignee: Jonathon Jongsma <jjongsma>
Status: CLOSED WONTFIX    QA Contact: Hongzhou Liu <hongzliu>
Severity: high    Docs Contact:
Priority: medium
Version: 8.7    CC: crobinso, gveitmic, hongzliu, jsuchane, juzhou, lmen, tyan, tzheng, virt-maint
Target Milestone: rc    Keywords: Triaged
Target Release: ---    Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-22 12:38:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2177835    

Description Allie DeVolder 2023-03-13 16:46:50 UTC
Description of problem: 

When provisioning multiple VMs with ISOs from Satellite, deployment fails because of storage pool name conflicts

Version-Release number of selected component (if applicable):
libvirt-8.0.0-10.1.module+el8.7.0+17192+cbc2449b.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Provision multiple VMs from Satellite using ISOs for boot media

Actual results:

VMs fail to be created because libvirt creates a storage pool for the provided ISO using a name based on the ISO specified, so other VMs fail because that storage pool name is already in use.

Expected results:

Multiple VMs provisioned as expected from Satellite.

Additional info:

This can be worked around by implementing a time delay between VMs, but that is a severe hindrance in a production environment.

Comment 1 Peter Krempa 2023-03-13 18:57:19 UTC
Libvirt is not creating any storage pools automatically/on behalf of users. Each storage pool must be defined explicitly with a name controlled by the entity creating the storage pool. What management tool are you using? What error message are you seeing?

Comment 2 Meina Li 2023-03-14 06:05:18 UTC
I also tried to test this, but could not reproduce it. I'm not sure whether my test scenario is reasonable; it's only for reference.

Test Version:
libvirt-8.0.0-18.module+el8.8.0+18287+112bb4e6.x86_64
qemu-kvm-6.2.0-31.module+el8.8.0+18188+901de023.x86_64

Test Steps:
1. Create a vm by using virt-install and iso file.
# virt-install -v --name test --ram 3072 --vcpus=2 --boot hd --disk path=/tmp/boot.iso,device=cdrom --network bridge:virbr0 --graphics vnc --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console pty,target_type=serial --disk path=/dev/sdb,device=disk,bus=virtio --disk path=/dev/sdc,device=disk,bus=virtio
WARNING  --os-type is deprecated and does nothing. Please stop using it.

Starting install...
Creating domain...                                             |         00:00:00     
Running graphical console command: virt-viewer --connect qemu:///system --wait test

Domain is still running. Installation may be in progress.
Waiting for the installation to complete.

2. Create another vm by using the same iso.
# virt-install -v --name test1 --ram 3072 --vcpus=2 --boot hd --disk path=/tmp/boot.iso,device=cdrom --network bridge:virbr0 --graphics vnc --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console pty,target_type=serial --disk path=/dev/sdd,device=disk,bus=virtio --disk path=/dev/sde,device=disk,bus=virtio
WARNING  --os-type is deprecated and does nothing. Please stop using it.

Starting install...
Creating domain...                                             |         00:00:00     
Running graphical console command: virt-viewer --connect qemu:///system --wait test1

Domain is still running. Installation may be in progress.
Waiting for the installation to complete.

I can't get that error.

Comment 3 Peter Krempa 2023-03-14 07:18:39 UTC
Ah, so in the linked case it's done via 'virt-install':

virt-install -v --name as64_47774 --ram 3072 --vcpus=2 --boot hd --disk path=/tmp/as64_47774/ISO/seed.iso,device=cdrom --network bridge:virbrapp --graphics none --wait -1 --os-type linux --os-variant=rhel7.0 ${gpu_list} --accelerate --console pty,target_type=serial --disk path=/dev/dom0.virt/as64_47774_OS_VM,device=disk,bus=virtio --disk path=/dev/dom0.virt/as64_47774_LOG_LVM,device=disk,bus=virtio '"
="++ virt-install -v --name as64_47774 --ram 3072 --vcpus=2 --boot hd --disk path=/tmp/as64_47774/ISO/seed.iso,device=cdrom --network bridge:virbrapp --graphics none --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console pty,target_type=serial --disk path=/dev/dom0.virt/as64_47774_OS_VM,device=disk,bus=virtio --disk path=/dev/dom0.virt/as64_47774_LOG_LVM,device=disk,bus=virtio"
ERROR Error: --disk path=/tmp/as64_47774/ISO/seed.iso,device=cdrom: Could not define storage pool: operation failed: pool 'ISO-973' already exists with uuid 42f1b413-4957-469c-b706-b60fcf8cd841

Moving to the 'virt-manager' component.

Comment 4 Germano Veit Michel 2023-03-16 23:18:18 UTC
# mkdir -p /tmp/test1/disks
# mkdir -p /tmp/test2/disks
# qemu-img create -f qcow2 /tmp/test1/disks/disk.qcow2 1G
# qemu-img create -f qcow2 /tmp/test2/disks/disk.qcow2 1G

virt-install -d -v --name test1 --ram 1024 --vcpus=1 --boot hd --disk path=/tmp/test1/disks/disk.qcow2 --graphics none --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console pty,target_type=serial
virt-install -d -v --name test2 --ram 1024 --vcpus=1 --boot hd --disk path=/tmp/test2/disks/disk.qcow2 --graphics none --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console pty,target_type=serial

# virsh pool-list
 Name      State    Autostart
-------------------------------
 disks     active   yes
 disks-1   active   yes

It uses the last child directory name (i.e. 'disks' if the path of the volume is /tmp/test2/disks/disk.qcow2) to name the storage pool.
If the pool already exists, it appends a numeric suffix.

The problem seems to be a race condition when doing 2+ VMs at a time, so it tries to create 2+ pools with the same name (a +1 bump from the previous suffix).

Maybe using the full path (i.e. tmp_test2_disks) would fix this more easily than dealing with the race condition.
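The naming behavior described above can be sketched in Python. This is a hypothetical simplification for illustration (the function derive_pool_name is made up here, not virt-install's actual code):

```python
import os

def derive_pool_name(volume_path, existing_pools):
    # Use the last directory component of the volume path as the base
    # name, then append -1, -2, ... until the name is free.
    base = os.path.basename(os.path.dirname(volume_path))
    name, n = base, 0
    while name in existing_pools:
        n += 1
        name = f"{base}-{n}"
    return name

# /tmp/test1/disks/disk.qcow2 -> pool "disks"; a second VM whose volume
# also lives under a directory called "disks" gets "disks-1".
```

With this logic, two concurrent callers that both see the same set of existing pools will both compute the same "free" suffix, which is exactly the collision observed here.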

Comment 5 Germano Veit Michel 2023-03-16 23:22:01 UTC
(In reply to Meina Li from comment #2)
> I also tried to test it, but it did not reproduce.I'm not sure if my test
> scenario is reasonable. Only for reference.
> 
> Test Version:
> libvirt-8.0.0-18.module+el8.8.0+18287+112bb4e6.x86_64
> qemu-kvm-6.2.0-31.module+el8.8.0+18188+901de023.x86_64
> 
> Test Steps:
> 1. Create a vm by using virt-install and iso file.
> # virt-install -v --name test --ram 3072 --vcpus=2 --boot hd --disk
> path=/tmp/boot.iso,device=cdrom --network bridge:virbr0 --graphics vnc
> --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console
> pty,target_type=serial --disk path=/dev/sdb,device=disk,bus=virtio --disk
> path=/dev/sdc,device=disk,bus=virtio
> WARNING  --os-type is deprecated and does nothing. Please stop using it.
> 
> Starting install...
> Creating domain...                                             |        
> 00:00:00     
> Running graphical console command: virt-viewer --connect qemu:///system
> --wait test
> 
> Domain is still running. Installation may be in progress.
> Waiting for the installation to complete.
> 
> 2. Create another vm by using the same iso.
> # virt-install -v --name test1 --ram 3072 --vcpus=2 --boot hd --disk
> path=/tmp/boot.iso,device=cdrom --network bridge:virbr0 --graphics vnc
> --wait -1 --os-type linux --os-variant=rhel7.0 --accelerate --console
> pty,target_type=serial --disk path=/dev/sdd,device=disk,bus=virtio --disk
> path=/dev/sde,device=disk,bus=virtio
> WARNING  --os-type is deprecated and does nothing. Please stop using it.
> 
> Starting install...
> Creating domain...                                             |        
> 00:00:00     
> Running graphical console command: virt-viewer --connect qemu:///system
> --wait test1
> 
> Domain is still running. Installation may be in progress.
> Waiting for the installation to complete.
> 
> Can't get that error.

That will not work: both VMs will use the same pool, named tmp, because you used the same path on both VMs:
path=/tmp/boot.iso
path=/tmp/boot.iso

You need different paths to hit it, so that the 2nd VM's path is not already a storage pool and virt-install creates a new pool with a similar name, triggering the suffix logic:
path=/tmp/vm1/iso/boot.iso
path=/tmp/vm2/iso/boot.iso

This should create an ISO and an ISO-1 pool, one for each. Then loop this quickly and eventually 2 processes will try to create a pool with the same name (i.e. ISO-xyz).

Comment 7 Hongzhou Liu 2023-03-20 07:23:17 UTC
I can reproduce this problem on my machine

packages:
libvirt-8.0.0-10.3.module+el8.7.0+18207+556790c8.x86_64
virt-install-3.2.0-4.el8.noarch


my cmd:

>-------------------------
# virt-install --name vm1 --location /home//iso/RHEL-9.2.0-20230205.12-x86_64-dvd1.iso --boot hd --osinfo detect=on --memory 4096 --disk /home/iso/vm1.img,size=20 & virt-install --name vm2 --location /home//iso/RHEL-9.2.0-20230205.12-x86_64-dvd1.iso --boot hd --osinfo detect=on --memory 4096 --disk /home/iso/vm2.img,size=20 & virt-install --name vm3 --location /home//iso/RHEL-9.2.0-20230205.12-x86_64-dvd1.iso --boot hd --osinfo detect=on --memory 4096 --disk /home/iso/vm3.img,size=20 & virt-install --name vm4 --location /home//iso/RHEL-9.2.0-20230205.12-x86_64-dvd1.iso --boot hd --osinfo detect=on --memory 4096 --disk /home/iso/vm4.img,size=20 & virt-install --location /home//iso/RHEL-9.2.0-20230205.12-x86_64-dvd1.iso --boot hd --osinfo detect=on --memory 4096 --disk /home/iso/vm5.img,size=20 --name vm5

[1] 21735
[2] 21736
[3] 21737
[4] 21738
Error setting up logfile: Could not create directory /root/.cache/virt-manager: [Errno 17] File exists: '/root/.cache/virt-manager'
ERROR    Error: --disk /home/iso/vm1.img,size=20: Could not define storage pool: operation failed: pool 'iso' already exists with uuid 0b13ad8c-fa24-40d1-976a-1791b445d3e2
ERROR    Error: --disk /home/iso/vm2.img,size=20: Could not define storage pool: operation failed: pool 'iso' already exists with uuid 0b13ad8c-fa24-40d1-976a-1791b445d3e2
ERROR    Error: --disk /home/iso/vm3.img,size=20: Could not define storage pool: operation failed: pool 'iso' already exists with uuid 0b13ad8c-fa24-40d1-976a-1791b445d3e2
WARNING  Graphics requested but DISPLAY is not set. Not running virt-viewer.
WARNING  No console to launch for the guest, defaulting to --wait -1

Starting install...
WARNING  Graphics requested but DISPLAY is not set. Not running virt-viewer.
WARNING  No console to launch for the guest, defaulting to --wait -1

>--------------------------


In my experience, not all installations fail. The number of failed cases differs each time; sometimes all cases pass.

Comment 8 Jonathon Jongsma 2023-03-21 19:23:30 UTC
So, (In reply to Germano Veit Michel from comment #5)
> That will not work as both VMs will use the same pool named tmp, you used on
> both VMs:
> path=/tmp/boot.iso
> path=/tmp/boot.iso
> 
> You need different paths to hit it, so it tries to create twice as the 2nd
> VM path is not a storage pool, and names it with a similar name, triggering
> the suffix logic:
> path=/tmp/vm1/iso/boot.iso
> path=/tmp/vm2/iso/boot.iso
> 
> This should create an ISO and ISO-1 pool, one for each. Then you loop this
> quickly and eventually 2 processes will try to create a pool with same name
> (i.e. ISO-xyz).

Judging by comment #7, this can still happen when the paths point to the same location, provided that the pool did not exist when the batch operation started.

(In reply to Germano Veit Michel from comment #4)
> Maybe using the full path (i.e. tmp_test2_disks) would fix this in an easier
> way than dealing with a racing condition.

This would seem to solve the specific use case that triggered this bug, i.e. the case where multiple virt-install instances are started with disks located in different /tmp/$PER_GUEST_DIR/ISO directories.

But the underlying race would still exist as shown in comment #7. For example, I could imagine a management app like Satellite triggering a batch installation process where the ISOs were located in a batch-specific directory rather than a guest-specific directory. For example, installing 100 guests from the same iso located at /tmp/batch-xyz/disk.iso. Then each virt-install instance would try to create a pool named tmp_batch-xyz, and we'd run into the same error.

The root issue is in the way virt-install deals with generating a pool name. When virt-install wants to auto-create a pool named $POOLNAME, it first queries the list of storage pools and caches that list. It iterates that list to determine whether a pool with that name exists. If it does, it checks for $POOLNAME-1, then $POOLNAME-2, and so on. When it finally finds a $POOLNAME-$N that does not exist, it attempts to create a new pool with that name. But since it is operating on a cached list of pools, any pools that were created by other clients between the time that it first queried the list of pools and the time when it tried to create the new pool would not be checked.
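The cached-list race described above can be simulated in a few lines of Python. pick_name and the two "processes" below are illustrative stand-ins, not virt-install internals:

```python
def pick_name(cached_pools, base="ISO"):
    # Scan a *cached* snapshot of pool names for the first free suffix.
    name, n = base, 0
    while name in cached_pools:
        n += 1
        name = f"{base}-{n}"
    return name

# Two virt-install processes snapshot the pool list at the same moment:
server_pools = {"ISO"}        # pools that actually exist on the host
cache_a = set(server_pools)   # process A's cached list
cache_b = set(server_pools)   # process B's cached list

name_a = pick_name(cache_a)   # "ISO-1"
server_pools.add(name_a)      # A creates its pool

name_b = pick_name(cache_b)   # also "ISO-1"; B never saw A's new pool
collision = name_b in server_pools  # True: B's create would fail
```

Because each process decides on a name from its own stale snapshot, the second create collides, matching the "pool 'ISO-973' already exists" error from comment #3.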

Maybe using the full file path is good enough, though. It will certainly reduce the scenarios where the race would occur.

Cole, do you have any opinions on how much we should do in virt-install to accommodate this sort of parallel execution approach?

Comment 10 Germano Veit Michel 2023-03-21 21:32:09 UTC
(In reply to Jonathon Jongsma from comment #8)
> Judging by comment #7, this can still happen when the paths are to the same
> location, provided that the pool did not exist when the batch operation
> started.

True. But in that case the race condition has a start and an end: it can only happen until the Nth process starts seeing the new pool, after which it just re-uses it instead of creating new ones.
With the different directories it is much easier to hit, as virt-install is always creating new pools.
 
> But the underlying race would still exist as shown in comment #7. For
> example, I could imagine a management app like Satellite triggering a batch
> installation process where the ISOs were located in a batch-specific
> directory rather than a guest-specific directory. For example, installing
> 100 guests from the same iso located at /tmp/batch-xyz/disk.iso. Then each
> virt-install instance would try to create a pool named tmp_batch-xyz, and
> we'd run into the same error.

Indeed, but once the pool is created, the next batch of 100 has zero chance of hitting it.
So in this case the customer may hit it once, and then not again. Less annoying.

> The root issue is in the way virt-install deals with generating a pool name.
> When virt-install wants to auto-create a pool named $POOLNAME, it first
> queries the list of storage pools and caches that list. It iterates that
> list to determine whether a pool with that name exists. If it does, it
> checks for $POOLNAME-1, then $POOLNAME-2, and so on. When it finally finds a
> $POOLNAME-$N that does not exist, it attempts to create a new pool with that
> name. But since it is operating on a cached list of pools, any pools that
> were created by other clients between the time that it first queried the
> list of pools and the time when it tried to create the new pool would not be
> checked.
> 
> Maybe using the full file path is good enough, though. It will certainly
> reduce the scenarios where the race would occur.

I think so, as it resolves the customer use case, which is the non-stop creation of more and more pools.
Note this appears to be the use case of our own product, Satellite.

Allie, do you want to offer the customer the RPMs from comment #9 as test packages to try? I think it would be a good idea to make sure they solve the problem.
Note these are test packages: not supported, and not a hotfix.

Comment 11 Jonathon Jongsma 2023-03-21 22:04:28 UTC
(In reply to Germano Veit Michel from comment #10)
> (In reply to Jonathon Jongsma from comment #8)
> > Judging by comment #7, this can still happen when the paths are to the same
> > location, provided that the pool did not exist when the batch operation
> > started.
> 
> True. But in that case the race condition has a start and end date, it can
> only happen until the Nth process starts seeing the new pool, then it will
> just re-use it instead of creating new ones.
> With the different directories it is much easier to hit it, as its always
> creating new pools.

Indeed, but any failure would be a significant customer issue, I would think.


> > But the underlying race would still exist as shown in comment #7. For
> > example, I could imagine a management app like Satellite triggering a batch
> > installation process where the ISOs were located in a batch-specific
> > directory rather than a guest-specific directory. For example, installing
> > 100 guests from the same iso located at /tmp/batch-xyz/disk.iso. Then each
> > virt-install instance would try to create a pool named tmp_batch-xyz, and
> > we'd run into the same error.
> 
> Indeed, but once the pool is created, the next 100 batch has zero chances of
> hitting it.
> So in this case the customer may hit it once, and then not again. Less
> annoying.

Well, in my (imaginary future) scenario, the next batch might use ISOs located in a different batch-specific directory (e.g. `/tmp/batch-ABC/`), and then we'd run into the same error for the first few virt-install invocations of that batch. And then the next batch might use /tmp/batch-DEF/ and so on. So it's not necessarily a "once-and-never-again" situation. It could be a "once-per-batch" situation. Again, this is all hypothetical; I'm just illustrating that using the full path for the pool name is not fully fixing the root problem. And depending on the design of the management app, this could pop up again in the future. 

Out of curiosity, how many parallel jobs does satellite trigger? It seems to me that this sort of parallel usage is a bit beyond the anticipated use case for virt-install, so I'm quite interested to get Cole's take on the issue.

Comment 13 Germano Veit Michel 2023-03-21 22:12:30 UTC
(In reply to Jonathon Jongsma from comment #11)
> Well, in my (imaginary future) scenario, the next batch might use ISOs
> located in a different batch-specific directory

Fair enough. My imaginary future was a bit different :)

> Out of curiosity, how many parallel jobs does satellite trigger? It seems to
> me that this sort of parallel usage is a bit beyond the anticipated use case
> for virt-install, so I'm quite interested to get Cole's take on the issue.

I guess it depends on how many VMs the customer tells satellite to provision.

At the end of the day it sounds like a very complicated proposition to resolve the actual race condition, with N processes talking to libvirt.
I'm not a libvirt or virt-install dev, but I imagine it will be hard to make the whole thing atomic. Or maybe use random names; I'm unsure.
As long as we solve the customer use case I think it's OK. I believe this problem has been there since day 0 and we are only seeing it now, so it should be a rare use case.

Comment 21 Cole Robinson 2023-03-28 17:19:15 UTC
@jjongsma IMO your suggestion may create new problems. My first thought is filename length vs PATH_MAX: libvirt has an implicit max object name length of 255 chars, so this would need extra care. Also virsh output and UI tools would likely be uglier cramming full path into the object name. And then whatever other issues it might trigger. I'm not in favor of that type of change without a lot more testing.


Handling parallel execution like this is difficult. virt-install fills in defaults and checks conflicts for a number of things, and there are many possible TOCTTOU issues. Consider even the case where the same ISO /tmp/ISODIR/my.iso is used for 2 installs:

* virt-install 1 caches pool list
* virt-install 2 caches pool list
* virt-install 1 sees /tmp/ISODIR/ isn't a pool in the cache, creates it
* virt-install 2 sees /tmp/ISODIR/ isn't a pool in the cache, creates it -> error: ISODIR pool already exists

We could rearchitect virt-install to not attempt to generate a non-conflicting pool name ahead of time, and instead rely on the `XML define` step to tell us whether the name is free; if it isn't, try the next generated name, and so on. But then every collision would generate a 'pool name in use' error in libvirt logs, which users don't like. That was one of the motivations for doing the collision check the way we do it now.
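That rearchitecture could look roughly like the following sketch, where the define step itself arbitrates the name (PoolExists, define_pool, and create_with_retry are hypothetical stand-ins for the libvirt calls, not real APIs):

```python
class PoolExists(Exception):
    pass

def define_pool(server_pools, name):
    # Stand-in for libvirt's define step: the server itself rejects a
    # duplicate name, so no client-side cache of pool names is needed.
    if name in server_pools:
        raise PoolExists(name)
    server_pools.add(name)
    return name

def create_with_retry(server_pools, base="ISO", attempts=10):
    # On a collision, bump the suffix and try again. Each collision
    # would still log a 'pool name in use' error on the libvirt side,
    # which is the drawback noted above.
    for n in range(attempts):
        name = base if n == 0 else f"{base}-{n}"
        try:
            return define_pool(server_pools, name)
        except PoolExists:
            continue
    raise RuntimeError("no free pool name found")
```

The trade-off is visible in the code: correctness no longer depends on a stale cache, but every taken name costs one failed define.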


Can someone clarify: is virt-install invoked by Satellite, or is this happening in customer scripts?
My feeling is this is simplest to fix on the user or Satellite side:

* Add a `sleep 5` between each virt-install kickoff. That's probably enough if they are invoked truly serially in the same process. Compared to network download time and VM install time it probably doesn't add much overhead.
* Create each pool ahead of time before the virt-install kickoff. Then virt-install doesn't need to auto create any pool here: sudo virsh pool-create-as --type dir --name "$(basename $(dirname $ISO))" --source-path "$(dirname $ISO)"
* Rearrange where ISOs are generated, like drop them in the tmpdir and not $tmpdir/seed, to trick virt-install into using the already unique tmpdir as the pool name
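The pre-creation option above could be scripted like this. pool_create_cmd is a hypothetical helper that only builds the virsh command string from comment 21, which you would then run before each virt-install kickoff:

```python
import os
import shlex

def pool_create_cmd(iso_path):
    # Derive the pool name from the ISO's parent directory, matching
    # the name virt-install would pick, and build the virsh command.
    src = os.path.dirname(iso_path)
    name = os.path.basename(src)
    return ("virsh pool-create-as --type dir "
            f"--name {shlex.quote(name)} --source-path {shlex.quote(src)}")
```

Creating the pool up front and serially removes the need for virt-install to auto-create it, sidestepping the race entirely.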

Thoughts?

Comment 22 Jonathon Jongsma 2023-03-28 18:57:51 UTC
(In reply to Cole Robinson from comment #21)
> Handling parallel execution like this is difficult. virt-install fills in
> defaults and checks conflicts for a number of things, and there's many
> possible TOCTTOU issues. Consider even the case where the same ISO
> /tmp/ISODIR/my.iso being used for 2 installs:
> 
> * virt-install 1 caches pool list
> * virt-install 2 caches pool list
> * virt-install 1 sees /tmp/ISODIR/ isn't a pool in the cache, creates it
> * virt-install 2 sees /tmp/ISODIR/ isn't a pool in the cache, creates it ->
> error: ISODIR pool already exists

Yeah, that's basically the scenario in comment #7 that we discussed above.

> We could rearchitect virt-install to not attempt to generate a
> non-conflicting pool name ahead of time, and instead rely on the `XML
> define` step to tell us if the name is free or not, and if it isn't, try the
> next generate name, etc. But then every collision hit would generate a 'pool
> name in use' error in libvirt logs, which users don't like. That was one of
> the motivations for doing the collision check the way we do it now.

The one other 'rearchitecture' solution that I considered was making virt-install listen for libvirt pool lifecycle events to keep its cache up to date. That probably doesn't totally eliminate the race, but it might narrow it significantly.

But I see virt-install as a relatively simple command-line utility that runs to completion and exits. By adding event listeners like this, it seems we would be turning it into something a bit different than what it is designed to be. That's why I started by considering the low-effort approach. If something more is needed, I would start to question whether virt-install is really the right tool for the job.

Comment 23 Hongzhou Liu 2023-03-29 01:40:26 UTC
(In reply to Cole Robinson from comment #21)
> @jjongsma IMO your suggestion may create new problems. My first thought is
> filename length vs PATH_MAX: libvirt has an implicit max object name length
> of 255 chars, so this would need extra care. Also virsh output and UI tools
> would likely be uglier cramming full path into the object name. And then
> whatever other issues it might trigger. I'm not in favor of that type of
> change without a lot more testing.
> 
> 
> Handling parallel execution like this is difficult. virt-install fills in
> defaults and checks conflicts for a number of things, and there's many
> possible TOCTTOU issues. Consider even the case where the same ISO
> /tmp/ISODIR/my.iso being used for 2 installs:
> 
> * virt-install 1 caches pool list
> * virt-install 2 caches pool list
> * virt-install 1 sees /tmp/ISODIR/ isn't a pool in the cache, creates it
> * virt-install 2 sees /tmp/ISODIR/ isn't a pool in the cache, creates it ->
> error: ISODIR pool already exists
> 
> We could rearchitect virt-install to not attempt to generate a
> non-conflicting pool name ahead of time, and instead rely on the `XML
> define` step to tell us if the name is free or not, and if it isn't, try the
> next generate name, etc. But then every collision hit would generate a 'pool
> name in use' error in libvirt logs, which users don't like. That was one of
> the motivations for doing the collision check the way we do it now.
> 
> 
> Can someone clarify, is virt-install invoked by satellite, or is happening
> in customer scripts?
> My feeling is this is simplest to fix on the user or satellite side:
> 
> * Add a `sleep 5` between each virt-install kickoff. That's probably enough
> if they are invoked truly serially in the same process. Compared to network
> download time and VM install time it probably doesn't add much overhead.

The reporter said a time delay would be a hindrance in a production environment, so this solution may not work.
 
> * Create each pool ahead of time before the virt-install kickoff. Then
> virt-install doesn't need to auto create any pool here: sudo virsh
> pool-create-as --type dir --name "$(basename $(dirname $ISO))" --source-path
> "$(dirname $ISO)"

I think this might be the simplest way to fix the problem. @adevolder, could you check out this solution and see if it meets your needs?


Comment 25 Germano Veit Michel 2023-04-05 04:45:59 UTC
(In reply to Jonathon Jongsma from comment #22)
> But I see virt-install as a relatively simple command-line utility that runs
> to completion and exits. By adding event listeners like this, it seems we
> would be turning it into something a bit different than what it is designed
> to be. That's why I started by considering the low-effort approach. If
> something more is needed, I would start to question whether virt-install is
> really the right tool for the job.

Definitely agreed. 

Maybe just a random usleep and retry approach on the virt-install side, up to about 3 times?
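A minimal sketch of that sleep-and-retry idea, assuming a hypothetical with_retries helper (virt-install does not currently do this):

```python
import random
import time

def with_retries(operation, attempts=3, max_sleep=0.5):
    # Retry a callable that raises on a name collision, sleeping a
    # random interval between tries so parallel callers desynchronize.
    for i in range(attempts):
        try:
            return operation()
        except RuntimeError:
            if i == attempts - 1:
                raise
            time.sleep(random.uniform(0, max_sleep))

# Example: an operation that collides twice before succeeding.
calls = {"n": 0}

def flaky_create():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("pool name in use")
    return "created"
```

The random jitter matters more than the retry count: it makes it unlikely that two processes re-collide on the same name on their next attempt.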

If this requires a lot of changes for the use case, then it may be more of an RFE for parallel use than a bug.

And apparently this is not how Satellite would usually provision VMs on RHEL KVM; it's some custom way of doing it, so it's a corner-case bug.

Comment 27 Red Hat Bugzilla 2023-12-21 04:25:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days