Description of problem:
Listing resources times out on a stack with 350 nodes. The stack structure is not very complex (one level of nested stacks). The template definitions are here: https://github.com/redhat-openstack/openshift-on-openstack

Version-Release number of selected component (if applicable):
python-heatclient-1.4.0-0.20160831084943.fb7802e.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
puppet-heat-9.2.0-0.20160901072004.4d7b5be.el7ost.noarch
openstack-heat-common-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-7.0.0-0.20160907124808.21e49dc.el7ost.noarch

How reproducible:
I understand this is hard to reproduce because of the hardware resources required; in this case the openshift-on-openstack templates were deployed with 350 OpenShift compute nodes. I don't think this is specific to the openshift-on-openstack templates, so any big stack that includes nested stacks should work as a reproducer. My guess is that the resources of a nested stack hold no reference to the top-level stack, so Heat has to do many selects (one per nested stack) to get all resources.

Actual results:
$ time openstack stack resource list --nested-depth=2 test
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
ERROR: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

real 1m1.721s
user 0m0.617s
sys 0m0.133s

Expected results:
The list of resources is returned in a short (user-friendly) time.
Yes, I think your guess is correct. I think the problem is that we're loading the whole stack in memory in order to figure out which of its resources are nested stacks. If we had a way to load everything from the database with a single query and return that data directly to the user (without instantiating all of Heat's data structures) that'd be much better. It's a big project though. Thomas has a patch up that should have some impact in the short term: https://review.openstack.org/#/c/398476/ It's being tracked in bug 1393802.
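To illustrate the difference between one select per nested stack and a single query, here is a minimal sketch using an in-memory SQLite database. It is not Heat's actual schema or code: the `stack` and `resource` tables, the `root_id` and `nested_stack_id` columns, and all function names are hypothetical, chosen only to show the query pattern.

```python
import sqlite3


def build_db(n_nested, per_stack):
    """Create a toy DB: one top-level stack (id 1) with n_nested child stacks.

    Hypothetical schema for illustration only, not Heat's real tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stack (id INTEGER PRIMARY KEY, root_id INTEGER NOT NULL);
        CREATE TABLE resource (
            id INTEGER PRIMARY KEY,
            stack_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            nested_stack_id INTEGER  -- NULL unless the resource is itself a stack
        );
        """
    )
    conn.execute("INSERT INTO stack (id, root_id) VALUES (1, 1)")
    for i in range(n_nested):
        child_id = 2 + i
        # Assumption: every nested stack records the id of its top-level stack.
        conn.execute("INSERT INTO stack (id, root_id) VALUES (?, 1)", (child_id,))
        conn.execute(
            "INSERT INTO resource (stack_id, name, nested_stack_id) VALUES (1, ?, ?)",
            (f"node-{i}", child_id),
        )
        for j in range(per_stack):
            conn.execute(
                "INSERT INTO resource (stack_id, name, nested_stack_id) "
                "VALUES (?, ?, NULL)",
                (child_id, f"node-{i}-res-{j}"),
            )
    return conn


def list_per_stack(conn, stack_id):
    """Walk the tree with one SELECT per stack. Returns (names, query_count)."""
    rows = conn.execute(
        "SELECT name, nested_stack_id FROM resource WHERE stack_id = ?",
        (stack_id,),
    ).fetchall()
    names, queries = [], 1
    for name, nested_id in rows:
        names.append(name)
        if nested_id is not None:
            child_names, child_queries = list_per_stack(conn, nested_id)
            names.extend(child_names)
            queries += child_queries
    return names, queries


def list_single_query(conn, root_id):
    """Fetch every resource under a top-level stack with a single SELECT."""
    rows = conn.execute(
        "SELECT r.name FROM resource r "
        "JOIN stack s ON s.id = r.stack_id WHERE s.root_id = ?",
        (root_id,),
    ).fetchall()
    return [name for (name,) in rows], 1
```

With `build_db(350, 3)`, the per-stack walk issues 351 selects (one for the top-level stack plus one per nested stack), while the `root_id` lookup issues one. The sketch only counts queries; it does not model the cost of instantiating Heat's in-memory data structures, which is the other half of the problem described above.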
The above patch merged in Ocata, so it will be fixed in OSP 11. It was also backported to stable/newton, so I'll clone this bug to OSP 10 as well.
Actually, it's already in OSP 10, so I'm just going to close this as a dup.

*** This bug has been marked as a duplicate of bug 1393802 ***