Bug 1941954 - SolidFire driver can fail to clone due to timeout
Summary: SolidFire driver can fail to clone due to timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ga
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Pablo Caruana
QA Contact: Tzach Shefi
Docs Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks: 1888469 1939394 1941957
Reported: 2021-03-23 09:45 UTC by Pablo Caruana
Modified: 2023-10-25 06:16 UTC (History)
CC List: 6 users

Fixed In Version: openstack-cinder-15.5.0-2.20210409044947.a75f863.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1888469
Clones: 1941957
Environment:
Last Closed: 2021-09-15 07:13:08 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 764940 | 0 | None | MERGED | NetApp SolidFire: Fix clone and request timeout issues | 2021-03-31 13:21:51 UTC
Red Hat Issue Tracker | OSP-1662 | 0 | None | None | None | 2023-10-25 06:16:00 UTC
Red Hat Product Errata | RHEA-2021:3483 | 0 | None | None | None | 2021-09-15 07:13:32 UTC

Description Pablo Caruana 2021-03-23 09:45:27 UTC
+++ This bug was initially created as a clone of Bug #1888469 +++

Hi folks, some of our customers are experiencing timeouts in cloning operations, and occasionally during API calls, when using the SolidFire driver in OSP 13 (Queens), mostly when dealing with very large volumes or poor network performance. The current timeout for API calls is too short for certain environments, and the cloning operation has a hard-coded timeout that does not work for all customers.

I've submitted a patch upstream (not merged yet) that addresses this issue by adding two parameters to cinder.conf, so that users can configure timeout values to suit their environment.
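
As a rough illustration of the idea (not the code from the patch), two timeout options could be read from an INI-style file such as cinder.conf with fallbacks to built-in defaults. The option names, the section name, and the API timeout default below are placeholders chosen for this sketch; they are not the names or values introduced by the patch.

```python
import configparser

# Placeholder option names and defaults, used only for illustration.
# 600 mirrors the hardcoded retry count mentioned in this report; the
# API request default of 30 is an assumption made for this sketch.
DEFAULTS = {
    "example_api_request_timeout": 30,
    "example_volume_clone_timeout": 600,
}


def load_timeouts(conf_text, section="solidfire"):
    """Return timeout settings from INI text, falling back to DEFAULTS.

    The section name "solidfire" is a hypothetical backend section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        name: parser.getint(section, name, fallback=default)
        for name, default in DEFAULTS.items()
    }


if __name__ == "__main__":
    sample = "[solidfire]\nexample_volume_clone_timeout = 1800\n"
    print(load_timeouts(sample))
```

With this shape, an operator who sets only the clone timeout keeps the default for the other option, so existing deployments behave as before unless they opt in.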

I expect this fix to be backported to stable/queens soon, and my understanding is that the safest way for customers to get it is through an OSP 13 update. Could you folks evaluate the possibility of including this fix in the next release cycle?

See below the Launchpad bug description:

In solidfire.py, the volume cloning path uses a method "_get_model_info" that contains a hardcoded retry count of 600. Customers are facing timeout issues when volumes are too big (i.e. multi-terabyte volumes), or because of poor networks or upgrade issues that revolve around the ElementOS cluster. A viable solution is to make this value configurable in cinder.conf, so that users can set it according to their environment.
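
To make the proposal concrete, here is a minimal sketch of a polling loop whose limit is passed in by the caller instead of being hardcoded. It is not the driver's actual implementation: the function name, the exception class, and the "is_ready" callback are hypothetical, and the limit is expressed as a time budget in seconds rather than a retry count.

```python
import time


class CloneTimeout(Exception):
    """Raised when the clone is not ready within the configured time."""


def wait_for_clone(is_ready, timeout_s, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns the number of polls performed. The clock and sleep
    parameters exist only so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if is_ready():
            return attempts
        if clock() >= deadline:
            raise CloneTimeout(
                "clone not ready after %s seconds" % timeout_s)
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated backend that reports the clone as ready on the third poll.
    polls = iter([False, False, True])
    print(wait_for_clone(lambda: next(polls), timeout_s=5, interval_s=0.01))
```

An operator with multi-terabyte volumes or a slow network would raise the timeout value in configuration, while other deployments keep the existing default.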


Upstream patch:
https://review.opendev.org/#/c/756130/


Launchpad issue:
https://bugs.launchpad.net/cinder/+bug/1898587

Comment 9 errata-xmlrpc 2021-09-15 07:13:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

