Bug 1898578

Summary: [OSP 16.1] n-cpu raising MessageUndeliverable when replying to RPC call
Product: Red Hat OpenStack
Reporter: Andre <afariasa>
Component: python-oslo-messaging
Assignee: Hervé Beraud <hberaud>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: high
Priority: high
Version: 16.1 (Train)
CC: apevec, athomas, bdobreli, dabarzil, dasmith, ebarrera, eglynn, fsoppels, geguileo, gkadam, hberaud, jeckersb, jhakimra, kchamart, lhh, lmiccini, lyarwood, mvalsecc, rhayakaw, sbauza, sgordon, tcarlin, vromanso
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Flags: afariasa: needinfo?
Hardware: x86_64
OS: Linux
Fixed In Version: python-oslo-messaging-10.2.1-1.20201114001303.el8ost
Last Closed: 2021-05-26 13:49:37 UTC
Type: Bug

Description Andre 2020-11-17 14:45:01 UTC
Description of problem:
Note: The customer is using Dell EMC VNX as the Cinder backend driver, but we'd like to make sure this issue is not raised from the OpenStack side.

Volume attachment is failing: Cinder shows the volume as available, but Nova shows it as attached.
I'll post more information and logs in the next comment as private, since they may contain customer-sensitive information.

We're expecting an RCA for this issue.

Version-Release number of selected component (if applicable):
dellemc/openstack-cinder-volume-dellemc-rhosp16:latest
rhosp-rhel8/openstack-cinder-scheduler:16.1-49
rhosp-rhel8/openstack-cinder-api:16.1-49

rhosp-rhel8/openstack-nova-compute:16.1-52.1602000860
rhosp-rhel8/openstack-nova-libvirt:16.1-56.1602000855
rhosp-rhel8/openstack-nova-api:16.1-55
rhosp-rhel8/openstack-nova-novncproxy:16.1-54
rhosp-rhel8/openstack-nova-scheduler:16.1-53
rhosp-rhel8/openstack-nova-conductor:16.1-53

Additional info:
sosreports are available on supportshell under /cases/02791747

Comment 7 Andre 2020-11-23 10:02:56 UTC
Do we currently have any workaround for this issue, or at least a way to mitigate it, like increasing some timeout?
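
To be concrete, I assume the relevant knob would be the client-side RPC reply timeout in nova.conf; the value below is just the default, shown for illustration:

    [DEFAULT]
    # How long an RPC client waits for a reply before raising
    # MessagingTimeout. Raising this might help with slow replies, but
    # presumably not with replies the broker can no longer route at all.
    rpc_response_timeout = 60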

Comment 9 Lee Yarwood 2020-11-23 17:00:12 UTC
(In reply to Andre from comment #7)
> Do we currently have any workaround for this issue, or at least a way to
> mitigate it, like increasing some timeout?

There's no workaround at present. We have talked for some time about removing this initial RPC call entirely and creating the BDM record in the database directly from the API, to avoid the fallout we are seeing here from the timeout.

That said, the underlying issue here seems to be more of an RPC issue, so let's address that here; I might spawn a separate bug to track the additional rework/refactor in openstack-nova.
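
To make the failure mode concrete: the reply is published with the AMQP mandatory flag set, so when the caller has already timed out and its reply queue has expired, the broker returns the message and the server side surfaces MessageUndeliverable. A minimal standalone sketch with pika that reproduces the same broker behaviour (an illustration only, not oslo.messaging's actual code; the queue name is made up):

    import pika

    # Publish a "reply" with mandatory=True to a reply queue name that no
    # longer exists; RabbitMQ returns the message instead of dropping it.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.confirm_delivery()  # make basic_publish raise on returned messages

    try:
        channel.basic_publish(
            exchange="",                 # default exchange: routes by queue name
            routing_key="reply_q_gone",  # hypothetical expired reply queue
            body=b"rpc reply payload",
            mandatory=True,              # have the broker return unroutable messages
        )
    except pika.exceptions.UnroutableError:
        # oslo.messaging surfaces this same condition as MessageUndeliverable.
        print("reply unroutable: the caller's reply queue is gone")
    finally:
        conn.close()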

Comment 10 Lee Yarwood 2020-11-24 11:46:31 UTC
(In reply to Lee Yarwood from comment #9)
> I might spawn a separate bug to track the additional rework/refactor in openstack-nova.

Apologies, I had already done this in bug #1899581.

Comment 28 Hervé Beraud 2020-12-02 13:46:00 UTC
Fix submitted on master upstream:

https://review.opendev.org/c/openstack/oslo.messaging/+/764776
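
Roughly, the idea is to stop letting an undeliverable reply escape the server and instead retry for a bounded time, in case the caller re-declares its reply queue. A simplified sketch of that pattern — my paraphrase, not the patch itself; everything except MessageUndeliverable is a placeholder for the driver internals, and the review above is authoritative:

    import time

    from oslo_messaging.exceptions import MessageUndeliverable

    def send_reply_with_retry(send_reply, reply, attempts=10, delay=0.25):
        """Retry a reply whose destination queue is (temporarily) missing.

        send_reply and reply stand in for the real driver internals.
        """
        for _ in range(attempts):
            try:
                send_reply(reply)
                return True
            except MessageUndeliverable:
                # The caller may still be alive and about to re-declare its
                # reply queue; back off briefly before trying again.
                time.sleep(delay)
        # Give up: the caller has most likely timed out and gone away.
        return False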

Comment 58 Hervé Beraud 2021-03-18 10:40:27 UTC
All the fixes have been cherry-picked from upstream stable/train into OSP 16.1.

Build completed successfully: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35535009

Generated version: python-oslo-messaging-10.2.1-1.20201114001303.el8ost

Comment 71 errata-xmlrpc 2021-05-26 13:49:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097