Bug 2161733

Summary: Make sure errors of nova-manage attachment refresh command are shown
Product: Red Hat OpenStack Reporter: Artom Lifshitz <alifshit>
Component: openstack-novaAssignee: Amit Uniyal <auniyal>
Status: CLOSED ERRATA QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: low Docs Contact:
Priority: medium    
Version: 16.2 (Train)CC: dasmith, eglynn, jhakimra, kchamart, sbauza, sgordon, udesale, vromanso
Target Milestone: z6Keywords: Patch, Triaged
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-20.6.2-2.20230713165111.8a24acd.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-08 19:18:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 2 Artom Lifshitz 2023-01-17 19:51:57 UTC
Work item: Improve output upon error. From the email thread:

>> 2- Missing error messages
>>
>> There were times when the refresh command didn't output any errors on
>> the terminal though it had failed.
>>
>> This happened for example when we run into the issue described in the
>> previous point, where exit code from the script would be 1 but there
>> would be no message printed on the screen.
>>
>> This lead support team to believe during the weekend session that the
>> command did run successfully, so they didn't understand why the proposed
>> steps in our document would not work.
> yep that was mentioned by melanie or gibi as well i think on monday that defently can be improved

When I heard this I thought maybe the output was being captured to a log
file instead of going to the console. But I googled again and found this:

https://stackoverflow.com/questions/55325145/python-not-printing-output

Maybe we just need to flush stdout before exiting nova-manage? I'm
surprised we never ran into this before though.

We need to understand what's going on here - whether it's a matter of just calling flush() like Melanie said, or do we need to add called to LOG.error in some places.

Comment 3 Artom Lifshitz 2023-01-17 19:52:52 UTC
Work item: fix handling of instance locking. From the email thread:

>> 3- VMs in locked state
>>
>> This may be by design, but I'll say it here and let the compute team
>> decide on the correct behavior.
>>
>> On some failures, like the one from step #1, the refresh script leaves
>> the instance in a locked state instead of clearing it.
> 
> ya that kind of a bug.
> we put it in the locked state to make sure the end user cannot make any action like hard rebooting the instace
> while we are messing with the db. that is also why we require the vm to be off so that they cant power it off
> by sshing in.
> 
> regardless of the success or failure the reshsh command shoudl restore the lock state
> 
> so if it was locked before leave it locked and if it was unlocked leave it unlocked.
> so this sound like a bug in our error handeling and clean up

+1

Comment 4 Artom Lifshitz 2023-01-17 19:54:14 UTC
Work item: disconnecting the volume from the correct host. From the email thread:

>> 5- Disconnecting from the wrong host
>>
>> There were cases where the instance said to live in compute#1 but the
>> connection_info in the BDM record was for compute#2, and when the script
>> called `remote_volume_connection` then nova would call os-brick on
>> compute#1 (the wrong node) and try to detach it.
>>
>> In some case os-brick would mistakenly think that the volume was
>> attached (because the target and lun matched an existing volume on the
>> host) and would try to disconnect, resulting in errors on the compute
>> logs.
>>
>> It wasn't a problem (besides creating some confusion and noise) because
>> the removal of the multipath failed since it was in use by an instance.
>>
>> I believe it may be necessary to change the code here:
>>
>>                  compute_rpcapi.remove_volume_connection(
>>                      cctxt, instance, volume_id, instance.host)
>>
>> To use the "host" from the connector properties in the
>> bdb.connection_info if it is present.
> 
> ya that also sound like a clear bug

Comment 15 errata-xmlrpc 2023-11-08 19:18:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307