Bug 2161733 - Make sure errors of nova-manage attachment refresh command are shown
Summary: Make sure errors of nova-manage attachment refresh command are shown
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Amit Uniyal
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-17 18:03 UTC by Artom Lifshitz
Modified: 2023-07-27 16:09 UTC
CC List: 8 users

Fixed In Version: openstack-nova-20.6.2-2.20230713165111.8a24acd.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker OSP-21512 (last updated 2023-01-17 18:04:48 UTC)

Comment 2 Artom Lifshitz 2023-01-17 19:51:57 UTC
Work item: Improve output upon error. From the email thread:

>> 2- Missing error messages
>>
>> There were times when the refresh command didn't output any errors on
>> the terminal though it had failed.
>>
>> This happened for example when we run into the issue described in the
>> previous point, where exit code from the script would be 1 but there
>> would be no message printed on the screen.
>>
>> This led the support team to believe during the weekend session that the
>> command had run successfully, so they didn't understand why the proposed
>> steps in our document would not work.
> yep that was mentioned by melanie or gibi as well i think on monday, that definitely can be improved

When I heard this I thought maybe the output was being captured to a log
file instead of going to the console. But I googled again and found this:

https://stackoverflow.com/questions/55325145/python-not-printing-output

Maybe we just need to flush stdout before exiting nova-manage? I'm
surprised we never ran into this before though.

We need to understand what's going on here - whether it's a matter of just calling flush() like Melanie said, or whether we need to add calls to LOG.error in some places.
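To illustrate the flush theory: if stdout is block-buffered (which Python does when output is redirected to a pipe or file) and the process exits without flushing, the error text can silently vanish even though the exit code is 1. A minimal sketch of the fix, assuming a hypothetical `exit_with_error` helper (not nova-manage's actual code):

```python
import io
import sys


def exit_with_error(msg, code=1):
    """Print an error and flush so it reaches the terminal before exit.

    When stdout/stderr are block-buffered and the process terminates
    abruptly, buffered output can be lost; flushing explicitly before
    returning the exit code avoids that.
    """
    print(f"ERROR: {msg}", file=sys.stderr)
    sys.stderr.flush()
    sys.stdout.flush()
    return code


# Capture stderr to show the message actually lands in the stream.
buf = io.StringIO()
real_stderr = sys.stderr
sys.stderr = buf
try:
    rc = exit_with_error("attachment refresh failed")
finally:
    sys.stderr = real_stderr
```

An alternative is `print(..., flush=True)` at each error site, but a single flush just before exiting covers every code path that got buffered along the way.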

Comment 3 Artom Lifshitz 2023-01-17 19:52:52 UTC
Work item: fix handling of instance locking. From the email thread:

>> 3- VMs in locked state
>>
>> This may be by design, but I'll say it here and let the compute team
>> decide on the correct behavior.
>>
>> On some failures, like the one from step #1, the refresh script leaves
>> the instance in a locked state instead of clearing it.
> 
> ya that's kind of a bug.
> we put it in the locked state to make sure the end user cannot take any action, like hard rebooting the instance,
> while we are messing with the db. that is also why we require the vm to be off, so that they can't power it off
> by sshing in.
> 
> regardless of success or failure, the refresh command should restore the lock state.
> 
> so if it was locked before, leave it locked, and if it was unlocked, leave it unlocked.
> so this sounds like a bug in our error handling and cleanup

+1
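The "restore the prior lock state regardless of outcome" behavior described above maps naturally onto a try/finally. A minimal sketch with a stand-in `Instance` class (illustrative only, not nova's object model):

```python
class Instance:
    """Minimal stand-in for a Nova instance's lock flag (illustrative)."""

    def __init__(self, locked):
        self.locked = locked


def refresh_attachment(instance, do_refresh):
    """Lock the instance while touching the DB, then restore the
    *original* lock state whether the refresh succeeds or fails."""
    was_locked = instance.locked
    instance.locked = True            # lock so users can't e.g. hard-reboot
    try:
        do_refresh()
    finally:
        instance.locked = was_locked  # restore prior state even on error


def failing_refresh():
    raise RuntimeError("refresh blew up")


# Unlocked instance + failed refresh: must end up unlocked again.
inst = Instance(locked=False)
try:
    refresh_attachment(inst, failing_refresh)
except RuntimeError:
    pass

# Already-locked instance + successful refresh: must stay locked.
already_locked = Instance(locked=True)
refresh_attachment(already_locked, lambda: None)
```

The key point is that the cleanup lives in `finally`, so the error path from Comment 2 can't skip it.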

Comment 4 Artom Lifshitz 2023-01-17 19:54:14 UTC
Work item: disconnecting the volume from the correct host. From the email thread:

>> 5- Disconnecting from the wrong host
>>
>> There were cases where the instance said to live in compute#1 but the
>> connection_info in the BDM record was for compute#2, and when the script
>> called `remove_volume_connection`, nova would call os-brick on
>> compute#1 (the wrong node) and try to detach it.
>>
>> In some cases os-brick would mistakenly think that the volume was
>> attached (because the target and lun matched an existing volume on the
>> host) and would try to disconnect, resulting in errors on the compute
>> logs.
>>
>> It wasn't a problem (besides creating some confusion and noise) because
>> the removal of the multipath failed since it was in use by an instance.
>>
>> I believe it may be necessary to change the code here:
>>
>>                  compute_rpcapi.remove_volume_connection(
>>                      cctxt, instance, volume_id, instance.host)
>>
>> To use the "host" from the connector properties in the
>> bdm.connection_info if it is present.
> 
> ya that also sounds like a clear bug

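The proposed fix above amounts to preferring the host recorded in the BDM's connection_info connector over `instance.host`, with a fallback when the connector carries no host. A hypothetical helper sketching that selection (the `connector`/`host` key layout is an assumption about the stored connection_info, not nova's actual code):

```python
def pick_connection_host(instance_host, connection_info):
    """Hypothetical helper: prefer the host from the connector properties
    stored in the BDM's connection_info (the node the volume was actually
    attached from), falling back to instance.host when absent."""
    connector = (connection_info or {}).get("connector") or {}
    return connector.get("host") or instance_host


# Stale record: the instance now lives on compute#1 but the volume was
# attached from compute#2, so os-brick must be driven on compute#2.
stale_host = pick_connection_host(
    "compute1", {"connector": {"host": "compute2"}})

# No connector recorded: fall back to the instance's current host.
fallback_host = pick_connection_host("compute1", {})
```

With something like this, `remove_volume_connection` would be targeted at the node that actually holds the stale attachment instead of the wrong compute.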
