Bug 2114689

Summary: volume api test detach is not stable
Product: Red Hat OpenStack
Component: openstack-tempest
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Reporter: Attila Fazekas <afazekas>
Assignee: Chandan Kumar <chkumar>
QA Contact: Martin Kopec <mkopec>
CC: alifshit, apevec, bgibizer, fhubik, lhh, slinaber, spower, udesale
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: TestBlocker, Triaged
Target Milestone: ga
Target Release: 17.0
Fixed In Version: openstack-tempest-31.1.0-0.20220719160757.56d259d.el9ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-09-21 12:24:42 UTC

Description Attila Fazekas 2022-08-03 06:18:32 UTC
PCI(e) device removal requires a cooperating guest operating system.
Typically the guest needs to learn about the plugged device in one of two ways:
 - hotplug handling is initialized at plug time, or
 - an explicit PCI rescan happens after the plug (e.g. from an init script).
The guest must also have hotplug support enabled (e.g. ACPI hotplug) in order for a device to be removable.
I wonder whether qemu/libvirt/nova repeats the remove request over time; is it expected to be repeated?
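The explicit rescan path mentioned above can be sketched as a guest-side snippet. This is a minimal sketch: writing 1 to /sys/bus/pci/rescan is the standard sysfs trigger for re-enumerating PCI buses, but the wrapper function and its messages are illustrative, and the write requires root inside the guest.

```shell
#!/bin/sh
# Guest-side: ask the PCI core to re-enumerate all buses so that a
# hot-plugged device becomes visible. Cloud images sometimes run this
# from an init or udev hook when hotplug events are not delivered.
rescan_pci() {
    if [ -w /sys/bus/pci/rescan ]; then
        echo 1 > /sys/bus/pci/rescan && echo "pci rescan triggered"
    else
        # Not root, or sysfs not available (e.g. in a container).
        echo "cannot write /sys/bus/pci/rescan (need root?)" >&2
        return 1
    fi
}

rescan_pci || echo "rescan skipped"
```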

In order to create a clean situation and to know that the VM was at least able to boot and
initialize the hotplug subsystem, we should do instance validation. At the moment, all known images
initialize hotplug before they allow an SSH connection. (We might do more than an auth test if needed.)
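As a rough illustration of the kind of validation meant here (a hypothetical helper, not tempest's actual RemoteClient; it only checks that sshd answers with a banner, which per the paragraph above implies the guest got far enough to initialize hotplug):

```python
import socket


def guest_ssh_ready(host, port=22, timeout=10.0):
    """Return True if the guest answers with an SSH banner on `port`.

    A cheap proxy for "the guest booted far enough to start sshd";
    for the images discussed above that also implies the hotplug
    subsystem was initialized before the detach is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # RFC 4253: the server sends "SSH-..." as its first bytes.
            return sock.recv(255).startswith(b"SSH-")
    except OSError:
        return False
```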

2022-08-02 17:54:01.713 322369 DEBUG tempest [-] validation.run_validation      = True log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2615

The test log did not include an SSH attempt.

tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_snapshot_create_offline_delete_online[compute,id-5210a1de-85a0-11e6-bb21-641c676a5d61] (from full) 
tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_snapshot_create_delete_with_volume_in_use 

How reproducible:
Nondeterministic (timing-dependent); in some test jobs it happens almost every run.

Additional info:
cirros-0.5.2-x86_64-disk.img

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 317, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume 2153277e-0a49-44b6-a498-cf9f2414ac7a failed to reach available status (current in-use) within the required time (300 s).
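The failing waiter is essentially a polling loop like the following sketch (simplified; names, arguments, and defaults are illustrative, see tempest/common/waiters.py for the real implementation):

```python
import time


class TimeoutException(Exception):
    pass


def wait_for_volume_status(show_volume, volume_id, status,
                           timeout=300, interval=1):
    """Poll `show_volume(volume_id)` until the volume reaches `status`.

    Raises TimeoutException after `timeout` seconds; with a stuck
    detach the volume never leaves 'in-use', producing the exception
    shown in the traceback above.
    """
    start = time.monotonic()
    while True:
        current = show_volume(volume_id)["status"]
        if current == status:
            return
        if time.monotonic() - start >= timeout:
            raise TimeoutException(
                "volume %s failed to reach %s status (current %s) "
                "within the required time (%s s)."
                % (volume_id, status, current, timeout))
        time.sleep(interval)
```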

Comment 3 Filip Hubík 2022-08-04 09:53:51 UTC
Note this issue seems to me like it could have the same or a closely related root cause to https://bugzilla.redhat.com/show_bug.cgi?id=2012096, which branched into several subcomponents (qemu, libvirt, nova, ...).

Anyway, until there is consensus on whether this will be addressed by those subcomponents or not, I suggest considering any "Tempest waiters" changes a workaround and tracking them accordingly (https://bugzilla.redhat.com/show_bug.cgi?id=2012096#c39).

Comment 7 Martin Kopec 2022-08-16 13:14:57 UTC
The fix (https://review.opendev.org/c/openstack/tempest/+/852030/) is part of the 'Fixed In Version' package (openstack-tempest-31.1.0-0.20220719160757.56d259d.el9ost) which is available since RHOS-17.0-RHEL-9-20220811.n.0. Therefore I'm marking this as VERIFIED.

Comment 12 errata-xmlrpc 2022-09-21 12:24:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543