Bug 1869346

Summary: OS-Brick: VxFlexOS Volume-attach failed with KeyError: 'config_group'
Product: Red Hat OpenStack Reporter: Alan Bishop <abishop>
Component: python-os-brick Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA QA Contact: Tzach Shefi <tshefi>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: abishop, amcleod, apevec, ariveral, arkady_kanevsky, bcao, ccopello, dcadzow, dcain, dmaley, gael_rehault, gcharot, geguileo, hrivero, Ivan.Pchelintsev, jamsmith, jeokim, jhardee, jjoyce, jko, jschluet, jvisser, kholtz, kurt_hey, lhh, mburns, michele, mkim, morazi, munna, nchandek, pcaruana, pgrist, rajini.karthik, ravsingh, sam.wan, slinaber, sputhenp, sroza, tkajinam, tshefi, vladislav.belogrudov
Target Milestone: z2 Keywords: OtherQA, TestOnly, Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-os-brick-2.10.4-0.20200624084658.12d252d.el8ost Doc Type: Bug Fix
Doc Text:
This update fixes an incompatibility that caused VxFlex volume detachment attempts to fail. A recent change in VxFlex cinder volume credentialing methods was not backward compatible with pre-existing volume attachments. If a VxFlex volume attachment was made before the credentialing method change, attempts to detach the volume failed. Now the detachments do not fail.
Story Points: ---
Clone Of: 1862213
Clones: 1908366 Environment:
Last Closed: 2020-10-28 15:39:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1595325, 1715964, 1908366    

Description Alan Bishop 2020-08-17 16:03:10 UTC
+++ This bug was initially created as a clone of Bug #1862213 +++

Description of problem:

This BZ covers the os-brick portion of the problem reported in bug #1862213. I am including relevant comments from the original BZ, below.

Additional info:

--- Additional comment from Takashi Kajinami on 2020-08-13 03:07:57 UTC ---

I've checked the pasted patch here, but I'm afraid that the fix in cinder might not be sufficient.

If I understand correctly, the current error is raised because the "config_group" key doesn't exist
in the connection_properties stored in the nova bdm record.
This means that even if we apply the cinder fix to make it expose the config_group parameter in its response,
it won't resolve the attach failure for existing instances that were created before the update to 16.1;
it will only work for new attachments created after the update.
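
The failure mode described above can be illustrated with a minimal sketch. The dictionaries are hypothetical stand-ins for the connection_properties saved in a nova bdm record; every key and value other than config_group, and both helper functions, are assumptions made for illustration and are not the actual os-brick or nova code.

```python
# Hypothetical connection_properties saved before the update to 16.1:
# the record was written without a 'config_group' key.
old_bdm_properties = {
    "serverIP": "192.0.2.10",      # illustrative value
    "serverPassword": "secret",    # illustrative value
}

# Hypothetical properties for an attachment created after the update.
new_bdm_properties = {
    "serverIP": "192.0.2.10",      # illustrative value
    "config_group": "vxflexos",    # illustrative backend name
}


def get_config_group(connection_properties):
    """Strict lookup: raises KeyError when the key was never stored."""
    return connection_properties["config_group"]


def try_lookup(connection_properties):
    """Return the config group, or a description of the missing key."""
    try:
        return get_config_group(connection_properties)
    except KeyError as exc:
        return "KeyError: %r" % exc.args[0]


print(try_lookup(old_bdm_properties))
print(try_lookup(new_bdm_properties))
```

Because the old record is read back as it was stored, changing what cinder returns for new attachments does not add the missing key to it.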

Please correct me if I'm wrong. I might have missed some logic about attachment management on the nova side,
if something has changed since Queens (or something special is implemented in scaleio).

--- Additional comment from Gorka Eguileor on 2020-08-13 10:41:39 UTC ---

Hi Takashi-san,

I believe you are correct, this fix will only work for new attachments and won't help with any already attached volumes.
Old attachments should already have the password in the connection properties, so we need to make the ScaleIO connector in OS-Brick backward compatible with that information.
There is no patch to fix this upstream, so we would have to write a new one.
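
A backward-compatible lookup of this kind could look roughly like the sketch below. It is not the submitted patch: the function name, the serverPassword key, and the load_password_from_config callable are all hypothetical and only illustrate the idea of preferring a password stored with an old attachment and otherwise falling back to the config group.

```python
def resolve_password(connection_properties, load_password_from_config):
    """Pick a password source that works for old and new attachments.

    load_password_from_config is a hypothetical callable that maps a
    config group name to a password read from local configuration.
    """
    # Old attachments: the password was stored with the attachment itself.
    if "serverPassword" in connection_properties:
        return connection_properties["serverPassword"]

    # New attachments: only the config group name is stored.
    config_group = connection_properties.get("config_group")
    if config_group is None:
        raise ValueError("no stored password and no config_group available")
    return load_password_from_config(config_group)


# Illustrative usage with an in-memory stand-in for the local configuration.
configured_passwords = {"vxflexos": "from-config"}
legacy = resolve_password({"serverPassword": "legacy"},
                          configured_passwords.__getitem__)
current = resolve_password({"config_group": "vxflexos"},
                           configured_passwords.__getitem__)
print(legacy, current)
```

Using .get() for config_group instead of indexing avoids the KeyError for records that predate the key.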

I'm going on PTO today for 2 weeks, but I'll try to write a patch today to fix this, and then one of my colleagues will handle the rest (babysitting the upstream patch, backporting upstream and downstream, etc.).

Regards,
Gorka.

--- Additional comment from Gorka Eguileor on 2020-08-13 11:24:07 UTC ---

I have submitted patch https://review.opendev.org/#/c/746109 for review.
It should fix the backward compatibility issue mentioned by Takashi-san in comment #16.

Comment 2 Alan Bishop 2020-08-17 17:59:43 UTC
Patch merged on upstream master, and will need to be backported to upstream stable branches. But due to urgency, we'll backport downstream for OSP-16.1 in parallel.

Comment 16 Rajini Karthik 2020-08-25 15:08:58 UTC
The validation of this fix has completed successfully.

	3-nodes RHOSP 16.1 with RHEL 8.2
[root@elabrhosp85ctl0 ~]# more /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)
[root@elabrhosp85ctl0 ~]# more /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.0 GA (Train)

	PowerFlex Version 3.5.0 / Build 365

With the HF and workaround below applied:
	python3-os-brick-2.10.4-0.20200624084658, fix for https://bugzilla.redhat.com/show_bug.cgi?id=1869346
To apply the fix, use 'podman cp' to copy the rpm to the cinder-volume/cinder-backup/nova-compute containers and install it there.
Please check the attached ‘xxx-install-fix.log’ for more details.

Comment 17 John Visser 2020-08-25 17:46:56 UTC
@tkajinam will you deliver the HF to the customer now that Dell EMC has verified it?

Comment 18 Takashi Kajinami 2020-08-26 00:21:18 UTC
(In reply to John Visser from comment #17)
> @tkajinam will you deliver the HF to the customer now that
> Dell EMC has verified it?

Yes. We'll provide the hotfix to the customer.
Thanks again for your help.

Comment 30 errata-xmlrpc 2020-10-28 15:39:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284