Bug 2114689

Summary: volume api test detach is not stable
Product: Red Hat OpenStack
Component: openstack-tempest
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Reporter: Attila Fazekas <afazekas>
Assignee: Chandan Kumar <chkumar>
QA Contact: Martin Kopec <mkopec>
CC: alifshit, apevec, bgibizer, fhubik, lhh, slinaber, spower, udesale
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: TestBlocker, Triaged
Target Milestone: ga
Target Release: 17.0
Fixed In Version: openstack-tempest-31.1.0-0.20220719160757.56d259d.el9ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-09-21 12:24:42 UTC

Description Attila Fazekas 2022-08-03 06:18:32 UTC
PCI(e) device removal requires a cooperating guest operating system.
Typically the guest needs to learn about the plugged device in one of two ways:
 - hotplug handling is initialized at plug time, or
 - an explicit PCI rescan happens after the plug (e.g. from an init script).
The guest must also have hotplug support enabled (e.g. ACPI hotplug) in order for a device to be removable.
I wonder whether qemu/libvirt/nova repeats the remove request over time; is it expected to be repeated?
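The explicit rescan path mentioned above can be sketched as a guest-side snippet. This is a minimal sketch: writing 1 to /sys/bus/pci/rescan is the standard sysfs trigger for re-enumerating PCI buses, but the wrapper function and its messages are illustrative, and the write requires root inside the guest.

```shell
#!/bin/sh
# Guest-side: ask the PCI core to re-enumerate all buses so that a
# hot-plugged device becomes visible. Cloud images sometimes run this
# from an init or udev hook when hotplug events are not delivered.
rescan_pci() {
    if [ -w /sys/bus/pci/rescan ]; then
        echo 1 > /sys/bus/pci/rescan && echo "pci rescan triggered"
    else
        # Not root, or sysfs not available (e.g. in a container).
        echo "cannot write /sys/bus/pci/rescan (need root?)" >&2
        return 1
    fi
}

rescan_pci || echo "rescan skipped"
```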

In order to create a clean situation and to know that the VM was at least able to boot and
initialize the hotplug subsystem, we should do instance validation. At the moment, all known images
initialize hotplug before they allow an SSH connection. (We might do more than an auth test if needed.)
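As a rough illustration of the kind of validation meant here (a hypothetical helper, not tempest's actual RemoteClient; it only checks that sshd answers with a banner, which per the paragraph above implies the guest got far enough to initialize hotplug):

```python
import socket


def guest_ssh_ready(host, port=22, timeout=10.0):
    """Return True if the guest answers with an SSH banner on `port`.

    A cheap proxy for "the guest booted far enough to start sshd";
    for the images discussed above that also implies the hotplug
    subsystem was initialized before the detach is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # RFC 4253: the server sends "SSH-..." as its first bytes.
            return sock.recv(255).startswith(b"SSH-")
    except OSError:
        return False
```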

2022-08-02 17:54:01.713 322369 DEBUG tempest [-] validation.run_validation      = True log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2615

The test log did not include an SSH attempt.

tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_snapshot_create_offline_delete_online[compute,id-5210a1de-85a0-11e6-bb21-641c676a5d61] (from full) 
tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_snapshot_create_delete_with_volume_in_use 

How reproducible:
Nondeterministic (timing-dependent); in some test jobs it happens almost every run.

Additional info:
cirros-0.5.2-x86_64-disk.img

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 317, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume 2153277e-0a49-44b6-a498-cf9f2414ac7a failed to reach available status (current in-use) within the required time (300 s).
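The failing waiter is essentially a polling loop like the following sketch (simplified; names, arguments, and defaults are illustrative, see tempest/common/waiters.py for the real implementation):

```python
import time


class TimeoutException(Exception):
    pass


def wait_for_volume_status(show_volume, volume_id, status,
                           timeout=300, interval=1):
    """Poll `show_volume(volume_id)` until the volume reaches `status`.

    Raises TimeoutException after `timeout` seconds; with a stuck
    detach the volume never leaves 'in-use', producing the exception
    shown in the traceback above.
    """
    start = time.monotonic()
    while True:
        current = show_volume(volume_id)["status"]
        if current == status:
            return
        if time.monotonic() - start >= timeout:
            raise TimeoutException(
                "volume %s failed to reach %s status (current %s) "
                "within the required time (%s s)."
                % (volume_id, status, current, timeout))
        time.sleep(interval)
```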

Comment 3 Filip Hubík 2022-08-04 09:53:51 UTC
Note this issue seems to me like it could have the same or a closely related root cause to https://bugzilla.redhat.com/show_bug.cgi?id=2012096, which branched into several subcomponents (qemu, libvirt, nova, ...).

Anyway, until there is consensus on whether this will be addressed by those subcomponents or not, I suggest considering any "Tempest waiters" changes a workaround and tracking them accordingly (https://bugzilla.redhat.com/show_bug.cgi?id=2012096#c39).

Comment 7 Martin Kopec 2022-08-16 13:14:57 UTC
The fix (https://review.opendev.org/c/openstack/tempest/+/852030/) is part of the 'Fixed In Version' package (openstack-tempest-31.1.0-0.20220719160757.56d259d.el9ost) which is available since RHOS-17.0-RHEL-9-20220811.n.0. Therefore I'm marking this as VERIFIED.

Comment 12 errata-xmlrpc 2022-09-21 12:24:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543