Yes, this sounds like the same issue as bug 1394920. A 504 (Gateway Timeout) is never returned by Heat itself. HAProxy returns it when the backend doesn't reply before HAProxy's timeout expires. So the first thing to check is the HAProxy timeout. heat-api's timeout when passing requests to the engine is 600s on the undercloud, so it makes sense to set the HAProxy timeout to 600s as well.
The puppet setting for this is called haproxy_default_timeout. The default value is:
[ 'http-request 10s', 'queue 2m', 'connect 10s', 'client 2m', 'server 2m', 'check 10s' ]
I'd advise changing each 2m value to 10m:
[ 'http-request 10s', 'queue 10m', 'connect 10s', 'client 10m', 'server 10m', 'check 10s' ]
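To make the comparison concrete, here is a minimal Python sketch that converts the entries above to seconds and lists which of them would expire before heat-api's 600s. The helper names are hypothetical, and checking only queue, client and server is an assumption: those are the three values proposed for change, not an exhaustive account of HAProxy's timeout semantics.

```python
# Hypothetical helpers (not part of Heat, HAProxy or puppet-tripleo).
UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "h": 3600, "d": 86400}

# heat-api's timeout for requests to heat-engine on the undercloud.
HEAT_API_TIMEOUT = 600

DEFAULT = ["http-request 10s", "queue 2m", "connect 10s",
           "client 2m", "server 2m", "check 10s"]
PROPOSED = ["http-request 10s", "queue 10m", "connect 10s",
            "client 10m", "server 10m", "check 10s"]


def to_seconds(value: str) -> float:
    """Convert an HAProxy time value such as '10s' or '2m' to seconds."""
    # Try two-letter suffixes first so '500ms' is not treated as seconds.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * UNITS[suffix]
    # HAProxy interprets a bare number as milliseconds.
    return float(value) / 1000


def parse_timeouts(entries):
    """Turn ['queue 2m', ...] into {'queue': 120.0, ...}."""
    return {name: to_seconds(val)
            for name, val in (e.rsplit(" ", 1) for e in entries)}


def below_heat_timeout(entries, limit=HEAT_API_TIMEOUT):
    """List which of queue/client/server would expire before heat-api gives up."""
    t = parse_timeouts(entries)
    return sorted(k for k in ("queue", "client", "server") if t[k] < limit)


if __name__ == "__main__":
    print(below_heat_timeout(DEFAULT))   # ['client', 'queue', 'server']
    print(below_heat_timeout(PROPOSED))  # []
```

With the defaults, all three values are 120s and fall short of 600s. With 10m they match heat-api exactly.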
IIUC, you want to change this in the undercloud hieradata and then re-run puppet. Otherwise it is liable to be overwritten in the future.
With the default configuration on OSP10, you'll only be using 4 heat-engine workers (number of CPUs / 2). If you haven't already, you probably also want to bump this to 8 so that you're using all of your CPU capacity for CPU-bound operations. (This is the default in every release except OSP10.)
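The worker arithmetic, as a minimal Python sketch. The function name is hypothetical. On an 8-CPU undercloud, which the "4 workers" figure implies, it gives 4 for the OSP10 default and 8 for the recommended setting. If you apply the change through hieradata, the puppet-heat parameter is, as far as I know, heat::engine::num_engine_workers. Treat that key name as an assumption and check it against your installed puppet-heat.

```python
import os


def heat_engine_workers(cpus: int, osp10_default: bool = True) -> int:
    """Hypothetical helper mirroring the worker counts described above."""
    if osp10_default:
        # OSP10 default: half the CPUs. Clamping to at least one worker
        # is an assumption added here, not taken from the comment.
        return max(cpus // 2, 1)
    # Every other release (and the recommended OSP10 override): one per CPU.
    return cpus


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    print("OSP10 default:", heat_engine_workers(cpus))
    print("recommended:  ", heat_engine_workers(cpus, osp10_default=False))
```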
The bug that most closely resembles this is actually bug 1393802.
FWIW, the validation performance is considerably improved in OSP11 and later.
Did you push the new timeout values successfully? Thanks.
Created attachment 1438735
So you don't need to update puppet-stack-config.yaml directly. Instead, you can use the hieradata_override option in undercloud.conf. An example of this setting can be found in Appendix G, "Security Enhancements", of the OSP10 documentation.
I have attached an example of the contents of a hieradata override file that can be used. Set "hieradata_override = rhbz1566520.yaml" in undercloud.conf and run an undercloud update. This should apply the override values. Feel free to rename the file if this naming convention doesn't work for you.
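The attachment itself isn't reproduced here. As a rough guess at what such an override file might contain, this minimal Python sketch renders one from the 10m values suggested earlier. The hiera key tripleo::haproxy::haproxy_default_timeout is an assumption based on puppet-tripleo's haproxy class, and render_override is a hypothetical helper.

```python
from typing import List

# Assumed hiera key: the haproxy_default_timeout parameter of puppet-tripleo's
# tripleo::haproxy class. Verify against your installed puppet-tripleo.
HIERA_KEY = "tripleo::haproxy::haproxy_default_timeout"

# The 10m values suggested earlier in this bug.
TIMEOUTS = ["http-request 10s", "queue 10m", "connect 10s",
            "client 10m", "server 10m", "check 10s"]


def render_override(key: str, values: List[str]) -> str:
    """Render a hieradata YAML fragment mapping `key` to a list of strings."""
    lines = [f"{key}:"]
    # Quote each entry so YAML keeps values like 'queue 10m' as plain strings.
    lines += [f"  - '{v}'" for v in values]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect stdout to the file named by hieradata_override.
    print(render_override(HIERA_KEY, TIMEOUTS), end="")
```

Redirecting the output to rhbz1566520.yaml yields a file that hieradata_override can point at. A hand-written YAML file with the same seven lines works just as well.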