Bug 1930806

Summary:	Tune cinder wsgi/httpd timeout
Product:	Red Hat OpenStack	Reporter:	Andreas Karis <akaris>
Component:	openstack-tripleo-heat-templates	Assignee:	Alan Bishop <abishop>
Status:	CLOSED ERRATA	QA Contact:	Tzach Shefi <tshefi>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	16.1 (Train)	CC:	abishop, aschultz, ebarrera, gkadam, gregraka, mburns, ndeevy
Target Milestone:	z7	Keywords:	Triaged, ZStream
Target Release:	16.1 (Train on RHEL 8.2)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-11.3.2-1.20210705103304.29a02c1.el8ost	Doc Type:	Enhancement
Doc Text:	This enhancement adds the new `CinderRpcResponseTimeout` and `CinderApiWsgiTimeout` parameters to support tuning RPC and API WSGI timeouts in the Block Storage service (cinder). Default timeout values might not be adequate for large deployments and in situations where transactions might be delayed due to system load. It is now possible to tune the RPC and API WSGI timeouts to prevent transactions from timing out prematurely.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-12-09 20:18:00 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Andreas Karis 2021-02-19 15:50:11 UTC

Description of problem:
It's currently impossible to tune the cinder wsgi/httpd timeout.

We can:
* tune haproxy timeouts
* tune cinder RPC timeouts

When detaching volumes, nova calls into haproxy, which calls into cinder-api. cinder-api calls cinder-volume via rabbitmq. The default timeouts here are 1 minute for RPC, 1 minute for httpd/wsgi, and 2 minutes for haproxy.

Under heavy load and/or depending on the backend, detach calls might take longer than 1 minute. In our customer case, they took ca. 2 minutes.

We tweaked the haproxy and RPC timeouts, but we still have issues with:
~~~
/var/log/containers/httpd/cinder-api/cinder_wsgi_error.log:[Wed Feb 17 12:28:45.685321 2021] [wsgi:error] [pid 10529] [client 10.133.0.136:35866] Timeout when reading response headers from daemon process 'cinder-api': /var/www/cgi-bin/cinder/cinder-api
~~~



Comment 2 Alan Bishop 2021-02-24 18:04:25 UTC

Until a proper fix is developed, here is a workaround. Add this to any environment file used in the deployment:

parameter_defaults:
  ControllerExtraConfig:
    cinder::wsgi::apache::vhost_custom_fragment: 'Timeout 300'

The value will appear in cinder's /etc/httpd/conf.d/10-cinder_wsgi.conf file (not in /etc/httpd/conf/httpd.conf).

Comment 3 Andreas Karis 2021-02-24 18:52:16 UTC

Nice, thanks! We'll try that!

Comment 4 Alan Bishop 2021-02-24 22:02:47 UTC

You can also control the RPC response timeout:

parameter_defaults:
  ControllerExtraConfig:
    cinder::rpc_response_timeout: 120
    cinder::wsgi::apache::vhost_custom_fragment: 'Timeout 300'

The upstream patch I just proposed would add the following two THT parameters:

CinderRpcResponseTimeout
CinderApiWsgiTimeout
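
Assuming the patch lands as proposed, an environment file using the two parameters might look like the sketch below. The file name and the values are illustrative only (the values mirror the workaround above), and the units are assumed to be seconds, matching cinder's rpc_response_timeout option and Apache's Timeout directive.

```yaml
# Hypothetical environment file, e.g. cinder-timeouts.yaml (the name is an assumption).
parameter_defaults:
  # RPC response timeout; assumed to be in seconds, like rpc_response_timeout in cinder.conf.
  CinderRpcResponseTimeout: 120
  # httpd/WSGI timeout for cinder-api; assumed to be in seconds, like Apache's Timeout directive.
  CinderApiWsgiTimeout: 300
```

Such a file would be passed to the deployment with an additional `-e` argument to `openstack overcloud deploy`, in the same way as the ControllerExtraConfig workaround.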

Comment 15 Tzach Shefi 2021-08-01 12:32:22 UTC

Verified on:
openstack-tripleo-heat-templates-11.3.2-1.20210720153309.29a02c1.el8ost.noarch

Used this yaml:
(overcloud) [stack@undercloud-0 ~]$ cat virt/extra_templates.yaml 
parameter_defaults:
    ControllerExtraConfig:
        cinder::rpc_response_timeout: 120
        cinder::wsgi::apache::vhost_custom_fragment: Timeout 300


Resulting in an overcloud deployment with both required settings:

[root@controller-0 ~]# grep rpc_res /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf 
#rpc_response_timeout = 60
rpc_response_timeout=120



[root@controller-0 ~]# cat /var/lib/config-data/puppet-generated/cinder/etc/httpd/conf.d/10-cinder_wsgi.conf 
# ************************************
# Vhost template in module puppetlabs-apache
# Managed by Puppet
...
...
  ## Custom fragment
  Timeout 300
</VirtualHost>

Good to verify.

Comment 26 errata-xmlrpc 2021-12-09 20:18:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762