Bug 1888417

Summary: ElementOS/SolidFire cinder driver may fail operations with xNotPrimary error when ElementOS system is upgrading
Product: Red Hat OpenStack Reporter: Fernando Ferraz <sfernand>
Component: openstack-cinder Assignee: Pablo Caruana <pcaruana>
Status: CLOSED ERRATA QA Contact: Tzach Shefi <tshefi>
Severity: medium Docs Contact: Andy Stillman <astillma>
Priority: medium    
Version: 13.0 (Queens) CC: gfidente, gregraka, ltoscano, pcaruana, slinaber, spower
Target Milestone: z16 Keywords: OtherQA, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-cinder-12.0.10-23.el7ost Doc Type: Bug Fix
Doc Text:
Before this update, API calls to the NetApp SolidFire back end for the Block Storage service (cinder) could fail with an `xNotPrimary` error. This error occurred when an operation was performed on a volume at the same time that SolidFire automatically moved connections to rebalance the cluster workload. With this update, a SolidFire driver patch adds the `xNotPrimary` exception to the list of exceptions that can be retried.
Story Points: ---
Clone Of:
Clones: 1947474 Environment:
Last Closed: 2021-06-16 10:58:54 UTC Type: Bug
Bug Depends On: 1947474    
Bug Blocks:    

Description Fernando Ferraz 2020-10-14 19:33:28 UTC
Hi, we have a customer reporting the following behavior in Cinder, using RHOSP 13 (Queens) and the SolidFire driver:

When SolidFire is under heavy load or being upgraded, the
SolidFire cluster may automatically move connections from primary
to secondary nodes, in order to rebalance cluster workload.

Although this operation occurs very quickly, if an operation is
performed on a volume at the same time it is being moved, API calls
such as create snapshot might fail with an xNotPrimary error.
Normally the operation succeeds when it is retried.
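The retry behavior the fix relies on (per the Doc Text, adding `xNotPrimary` to the driver's list of retryable exceptions) can be illustrated with a minimal Python sketch. This is not the actual cinder SolidFire driver code: `SolidFireAPIError`, `RETRYABLE_ERRORS`, `retry_on_transient`, and `create_snapshot` are hypothetical names, and the sketch assumes the Element API error code is available as a string such as `"xNotPrimary"`.

```python
import functools
import time


class SolidFireAPIError(Exception):
    """Hypothetical exception carrying an Element API error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code


# Hypothetical set of transient error codes that are safe to retry.
# The real driver keeps its own list; the fix adds "xNotPrimary" to it.
RETRYABLE_ERRORS = {"xNotPrimary"}


def retry_on_transient(retries=3, backoff=0.5):
    """Retry the wrapped call while it raises a retryable error code."""
    if retries < 1:
        raise ValueError("retries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except SolidFireAPIError as exc:
                    # Give up on non-transient errors, or once the
                    # retry budget is exhausted.
                    if exc.code not in RETRYABLE_ERRORS or attempt == retries:
                        raise
                    # Linear backoff gives the cluster time to finish
                    # moving the connection before the next attempt.
                    time.sleep(backoff * attempt)
        return wrapper
    return decorator


# Hypothetical API call used for illustration: it fails with xNotPrimary
# on the first two calls, as if the volume's connection were being moved.
calls = {"n": 0}


@retry_on_transient(retries=3, backoff=0.0)
def create_snapshot(volume_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise SolidFireAPIError("xNotPrimary", "volume is being moved")
    return {"volumeID": volume_id, "snapshotID": 1}
```

Calling `create_snapshot(42)` once returns the snapshot dictionary after two retried `xNotPrimary` failures. An error code outside `RETRYABLE_ERRORS` is re-raised on the first attempt, so only transient conditions such as a connection move are retried.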


I've already submitted a patch upstream addressing this issue and I'm working to get it merged. After we have it in stable/queens, when might this become available for them to update their RHOSP environment?

Upstream patch:
https://review.opendev.org/#/c/755373/5

Launchpad bug:
https://bugs.launchpad.net/cinder/+bug/1891914

Comment 1 Luigi Toscano 2020-10-14 21:27:50 UTC
(In reply to Fernando Ferraz from comment #0)

> 
> I've already submitted a patch upstream addressing this issue and I'm
> working to get it merged. After we have it in stable/queens, when might
> this become available for them to update their RHOSP environment?

I can't comment yet on the next OSP13 updates, so I can't say whether the fix will be a direct import or will require some internal cherry-picking.

I don't think I'm incorrect if I say that having that patch (thanks for it!) merged in stable/queens is a necessary, but not sufficient, condition for having the fix in OSP13 (unless it were a patch that couldn't be backported to stable branches, but it looks like a normal bug fix). So please continue the bugfix-and-backport process.

Comment 24 errata-xmlrpc 2021-06-16 10:58:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2385