Work item: Improve output upon error. From the email thread:

>> 2- Missing error messages
>>
>> There were times when the refresh command didn't output any errors on
>> the terminal though it had failed.
>>
>> This happened for example when we ran into the issue described in the
>> previous point, where the exit code from the script would be 1 but there
>> would be no message printed on the screen.
>>
>> This led the support team to believe during the weekend session that the
>> command did run successfully, so they didn't understand why the proposed
>> steps in our document would not work.
>
> yep, that was mentioned by Melanie or gibi as well I think on Monday. That definitely can be improved.

When I heard this I thought maybe the output was being captured to a log file instead of going to the console. But I googled again and found this: https://stackoverflow.com/questions/55325145/python-not-printing-output

Maybe we just need to flush stdout before exiting nova-manage? I'm surprised we never ran into this before, though. We need to understand what's going on here - whether it's just a matter of calling flush() like Melanie said, or whether we need to add calls to LOG.error in some places.
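A minimal sketch of the flush-before-exit idea, with a hypothetical `do_refresh()` standing in for the real command body (none of these names are the actual nova-manage code):

```python
import sys


def do_refresh(volume_id):
    # Hypothetical stand-in for the real refresh logic; it always fails
    # here so the error path is exercised.
    raise RuntimeError('connection_info refresh failed for %s' % volume_id)


def refresh_command(volume_id):
    try:
        do_refresh(volume_id)
    except Exception as exc:
        # Make the failure visible on the terminal (and/or via LOG.error)
        # instead of silently returning 1, and flush in case stdout is
        # block-buffered (e.g. when output is redirected to a file).
        print('Refresh failed: %s' % exc)
        sys.stdout.flush()
        return 1
    return 0


if __name__ == '__main__':
    sys.exit(refresh_command('vol-1234'))
```

Whether the real fix is the flush, the missing print/LOG.error, or both, is exactly what we need to confirm.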
Work item: fix handling of instance locking. From the email thread:

>> 3- VMs in locked state
>>
>> This may be by design, but I'll say it here and let the compute team
>> decide on the correct behavior.
>>
>> On some failures, like the one from step #1, the refresh script leaves
>> the instance in a locked state instead of clearing it.
>
> ya, that is kind of a bug.
> We put it in the locked state to make sure the end user cannot take any action, like hard rebooting the instance,
> while we are messing with the db. That is also why we require the vm to be off, so that they can't power it off
> by sshing in.
>
> Regardless of success or failure, the refresh command should restore the lock state:
> if it was locked before, leave it locked, and if it was unlocked, leave it unlocked.
> So this sounds like a bug in our error handling and cleanup.

+1
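A rough sketch of the save/restore behaviour being asked for, with a plain dict and a hypothetical `refresh_fn` standing in for the real nova instance object and refresh logic:

```python
def refresh_with_lock_restore(instance, refresh_fn):
    """Lock the instance while we touch the DB, then put the lock state
    back to whatever it was before, whether the refresh worked or not."""
    was_locked = instance.get('locked', False)  # remember pre-existing state
    instance['locked'] = True                   # block user actions meanwhile
    try:
        refresh_fn(instance)
    finally:
        # If it was locked before leave it locked; if it was unlocked,
        # unlock it again - even when refresh_fn raised.
        instance['locked'] = was_locked


# Example: a failing refresh no longer leaves the instance locked.
instance = {'uuid': 'fake-uuid', 'locked': False}

def failing_refresh(inst):
    raise RuntimeError('boom')

try:
    refresh_with_lock_restore(instance, failing_refresh)
except RuntimeError:
    pass

assert instance['locked'] is False
```

The key point is the try/finally: the restore has to run on the error path too, which is where the current cleanup apparently bails out.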
Work item: disconnecting the volume from the correct host. From the email thread:

>> 5- Disconnecting from the wrong host
>>
>> There were cases where the instance was said to live on compute#1 but the
>> connection_info in the BDM record was for compute#2, and when the script
>> called `remove_volume_connection` nova would call os-brick on
>> compute#1 (the wrong node) and try to detach it.
>>
>> In some cases os-brick would mistakenly think that the volume was
>> attached (because the target and lun matched an existing volume on the
>> host) and would try to disconnect it, resulting in errors in the compute
>> logs.
>>
>> It wasn't a problem (besides creating some confusion and noise) because
>> the removal of the multipath failed since it was in use by an instance.
>>
>> I believe it may be necessary to change the code here:
>>
>>     compute_rpcapi.remove_volume_connection(
>>         cctxt, instance, volume_id, instance.host)
>>
>> to use the "host" from the connector properties in the
>> bdm.connection_info if it is present.
>
> ya, that also sounds like a clear bug
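A hedged sketch of the proposed change: prefer the host recorded in the connector properties of the BDM's connection_info and only fall back to instance.host when it is absent. Plain dicts stand in for the real nova objects, and the RPC call itself is only referenced in a comment:

```python
def pick_disconnect_host(instance, connection_info):
    """Return the host that should be asked to disconnect the volume."""
    connector = (connection_info or {}).get('connector') or {}
    # Prefer the host the volume was actually connected from, as recorded
    # in the connector properties of the BDM's connection_info; fall back
    # to instance.host when no connector host is present.
    return connector.get('host') or instance['host']


# Example: the BDM says the connection was made from compute#2, so the
# disconnect should be sent there, not to compute#1 from the instance record.
instance = {'host': 'compute1'}
connection_info = {'connector': {'host': 'compute2'}}
assert pick_disconnect_host(instance, connection_info) == 'compute2'

# The selected host would then replace instance.host in the
# compute_rpcapi.remove_volume_connection(cctxt, instance, volume_id, host)
# call quoted above.
```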