Bug 1930806

Summary: Tune cinder wsgi/httpd timeout
Product: Red Hat OpenStack
Reporter: Andreas Karis <akaris>
Component: openstack-tripleo-heat-templates
Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: abishop, aschultz, ebarrera, gkadam, gregraka, mburns, ndeevy
Target Milestone: z7
Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210705103304.29a02c1.el8ost
Doc Type: Enhancement
Doc Text:
This enhancement adds the new `CinderRpcResponseTimeout` and `CinderApiWsgiTimeout` parameters to support tuning RPC and API WSGI timeouts in the Block Storage service (cinder). Default timeout values might not be adequate for large deployments and in situations where transactions might be delayed due to system load. It is now possible to tune the RPC and API WSGI timeouts to prevent transactions from timing out prematurely.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-09 20:18:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andreas Karis 2021-02-19 15:50:11 UTC
Description of problem:
It's currently impossible to tune the cinder wsgi/httpd timeout.

We can:
* tune haproxy timeouts
* tune cinder RPC timeouts

When detaching volumes, nova calls into haproxy, which calls into cinder-api. cinder-api then calls cinder-volume via rabbitmq. The default timeouts here are 1 minute for RPC, 1 minute for httpd/wsgi and 2 minutes for haproxy.

Under heavy load and/or depending on the backend, detach calls might take longer than 1 minute. In our customer's case, they take about 2 minutes.

We tweaked haproxy and rpc calls, but we still have issues with:
~~~
/var/log/containers/httpd/cinder-api/cinder_wsgi_error.log:[Wed Feb 17 12:28:45.685321 2021] [wsgi:error] [pid 10529] [client 10.133.0.136:35866] Timeout when reading response headers from daemon process 'cinder-api': /var/www/cgi-bin/cinder/cinder-api
~~~



Comment 2 Alan Bishop 2021-02-24 18:04:25 UTC
Until a proper fix is developed, here is a workaround. Add this to any deployment environment file:

parameter_defaults:
  ControllerExtraConfig:
    cinder::wsgi::apache::vhost_custom_fragment: 'Timeout 300'

The value will appear in cinder's /etc/httpd/conf.d/10-cinder_wsgi.conf file (not in /etc/httpd/conf/httpd.conf).

Comment 3 Andreas Karis 2021-02-24 18:52:16 UTC
Nice, thanks! We'll try that!

Comment 4 Alan Bishop 2021-02-24 22:02:47 UTC
You can also control the RPC response timeout:

parameter_defaults:
  ControllerExtraConfig:
    cinder::rpc_response_timeout: 120
    cinder::wsgi::apache::vhost_custom_fragment: 'Timeout 300'

The upstream patch I just proposed would add the following two THT parameters:

CinderRpcResponseTimeout
CinderApiWsgiTimeout
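
If the patch lands as proposed, the two parameters could replace the ControllerExtraConfig workaround. A minimal sketch, assuming both parameters take a number of seconds; the values simply mirror the workaround above and are illustrative, not recommendations:

```yaml
parameter_defaults:
  # Assumption: maps to cinder's rpc_response_timeout, in seconds
  CinderRpcResponseTimeout: 120
  # Assumption: maps to the httpd Timeout in the cinder-api vhost, in seconds
  CinderApiWsgiTimeout: 300
```

As with the workaround, the WSGI timeout is kept larger than the RPC timeout so that httpd does not give up before cinder-api has had a chance to receive the RPC reply. The haproxy timeout is tuned separately.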

Comment 15 Tzach Shefi 2021-08-01 12:32:22 UTC
Verified on:
openstack-tripleo-heat-templates-11.3.2-1.20210720153309.29a02c1.el8ost.noarch

Used this yaml:
(overcloud) [stack@undercloud-0 ~]$ cat virt/extra_templates.yaml 
parameter_defaults:
    ControllerExtraConfig:
        cinder::rpc_response_timeout: 120
        cinder::wsgi::apache::vhost_custom_fragment: Timeout 300


Resulting in an overcloud deployment with both required settings:

[root@controller-0 ~]# grep rpc_res /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf 
#rpc_response_timeout = 60
rpc_response_timeout=120



[root@controller-0 ~]# cat /var/lib/config-data/puppet-generated/cinder/etc/httpd/conf.d/10-cinder_wsgi.conf 
# ************************************
# Vhost template in module puppetlabs-apache
# Managed by Puppet
...
...
  ## Custom fragment
  Timeout 300
</VirtualHost>

Good to verify.

Comment 26 errata-xmlrpc 2021-12-09 20:18:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762