Bug 1566520

Summary: Response from stack update is "504 Gateway Timeout"
Product: Red Hat OpenStack Reporter: Dariusz Wojewódzki <dwojewod>
Component: openstack-heat Assignee: Zane Bitter <zbitter>
Status: CLOSED WORKSFORME QA Contact: Ronnie Rasouli <rrasouli>
Severity: medium Docs Contact:
Priority: medium    
Version: 10.0 (Newton) CC: aschultz, dwojewod, mburns, pneedle, rhel-osp-director-maint, saime, sbaker, shardy, srevivo, therve, zbitter
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-29 10:02:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
rhbz1566520.yaml none

Comment 2 Zane Bitter 2018-04-12 13:54:28 UTC
Yes, this sounds like the same issue as bug 1394920. 504 is not something that Heat ever returns. It's something that HAProxy returns if it doesn't get a reply back before its timeout. So the first thing to check is the HAProxy timeout - heat-api's timeout when passing requests to the engine is 600s on the undercloud, so it makes sense to set the HAProxy timeout to 600s as well.

Comment 4 Zane Bitter 2018-04-12 16:27:42 UTC
The puppet setting for this is called haproxy_default_timeout. The default value is:

  [ 'http-request 10s', 'queue 2m', 'connect 10s', 'client 2m', 'server 2m', 'check 10s' ]

I'd advise changing the 2m to 10m across the board:

  [ 'http-request 10s', 'queue 10m', 'connect 10s', 'client 10m', 'server 10m', 'check 10s' ]

IIUC you want to change this in the undercloud hieradata and then re-run puppet, otherwise it is liable to get overwritten in the future.
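
For illustration, the advised values might look like this as hieradata. This is a sketch only: the fully qualified key tripleo::haproxy::haproxy_default_timeout is an assumption (only the short parameter name is given above), so check it against the installed puppet-tripleo before relying on it.

```yaml
# Hypothetical undercloud hieradata entry.
# Assumption: the setting is exposed as a parameter of the
# tripleo::haproxy puppet class; verify the key name locally.
tripleo::haproxy::haproxy_default_timeout:
  - 'http-request 10s'
  - 'queue 10m'      # raised from 2m
  - 'connect 10s'
  - 'client 10m'     # raised from 2m
  - 'server 10m'     # raised from 2m; matches heat-api's 600s engine timeout
  - 'check 10s'
```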

Comment 5 Zane Bitter 2018-04-12 16:38:31 UTC
With the default configuration on OSP10, you'll only be using 4 heat-engine workers (number of CPUs / 2). If you haven't already, you probably also want to bump this to 8 so that you're using all of your CPU capacity for CPU-bound operations. (This is the default in every release except OSP10.)
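
As a rough sketch, the worker count could be raised through hieradata along these lines. The key heat::engine::num_engine_workers is an assumption based on the puppet-heat module and is not confirmed anywhere in this bug; the value 8 is the one suggested above.

```yaml
# Hypothetical undercloud hieradata entry.
# Assumption: heat-engine's worker count is controlled by the
# num_engine_workers parameter of the heat::engine puppet class.
heat::engine::num_engine_workers: 8
```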

The bug that most closely resembles this is actually bug 1393802.

FWIW, the validation performance is considerably improved in OSP11 and later.

Comment 6 Thomas Hervé 2018-04-19 13:56:47 UTC
Did you push the new timeout values successfully? Thanks.

Comment 9 Alex Schultz 2018-05-18 16:06:57 UTC
Created attachment 1438735
rhbz1566520.yaml

So you don't need to update the puppet-stack-config.yaml. You can use the hieradata_override option in the undercloud.conf. An example of this setting can be found in Appendix G, "Security Enhancements", of the OSP10 documentation [0].

I have attached an example of the contents of the hieradata override file that can be used. Simply set "hieradata_override = rhbz1566520.yaml" in the undercloud.conf and run an undercloud update. This should apply the override values. Feel free to rename this file if this naming convention does not work for you.

[0] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/appe-security_enhancements