Bug 2209391

Summary: After updating to 16.2.5, heat stack-show on the undercloud takes close to 7 minutes and breaks everything
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: openstack-heat
Assignee: OSP Team <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: David Rosenfeld <drosenfe>
Severity: urgent
Priority: unspecified    
Version: 16.2 (Train)   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Unspecified   
Last Closed: 2023-05-23 20:45:54 UTC
Type: Bug

Description David Hill 2023-05-23 18:42:05 UTC
After updating to 16.2.5, heat stack-show/environment-show on the undercloud takes close to 7 minutes and breaks everything, starting with the ansible inventory generation in tripleoclient, which calls heatclient and times out after 30s.

Command executed:
stack     642186  0.2  0.0 6753600 111472 pts/2  S+   17:43   0:06  |   \_ /usr/bin/python3 /usr/bin/openstack --debug --verbose overcloud external-update run --stack overcloud --tags container_image_prepare                              
It times out after 30 seconds ...

I had to modify the heat client to time out after 600s:
~~~
    def get(self, stack_id, resolve_outputs=True):
        """Get the metadata for a specific stack.

        :param stack_id: Stack ID or name to lookup
        :param resolve_outputs: If True, then outputs for this
               stack will be resolved
        """
        kwargs = {}
        if not resolve_outputs:
            kwargs['params'] = {"resolve_outputs": False}
        resp = self.client.get('/stacks/%s' % stack_id, **kwargs, timeout=600)
        body = utils.get_response_body(resp)
        return Stack(self, body.get('stack'), loaded=True)
~~~
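To check whether output resolution is what makes the call slow, the `get` method above could be timed with and without `resolve_outputs`. This is only a rough sketch: `timed`, `compare_resolve_outputs` and `fake_get` are hypothetical names, and `fake_get` is a stand-in that sleeps instead of talking to heat. On a real undercloud, the callable passed in would be the stack manager's `get` shown above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_resolve_outputs(get_stack, stack_id):
    """Time get_stack with and without output resolution.

    get_stack is any callable accepting (stack_id, resolve_outputs=...),
    like the get() method quoted above.
    """
    _, with_outputs = timed(get_stack, stack_id, resolve_outputs=True)
    _, without_outputs = timed(get_stack, stack_id, resolve_outputs=False)
    return {
        "resolve_outputs=True": with_outputs,
        "resolve_outputs=False": without_outputs,
    }


def fake_get(stack_id, resolve_outputs=True):
    """Hypothetical stand-in for the real call; it only sleeps."""
    time.sleep(0.05 if resolve_outputs else 0.01)
    return {"id": stack_id}


if __name__ == "__main__":
    for label, seconds in compare_resolve_outputs(fake_get, "overcloud").items():
        print("%s: %.3fs" % (label, seconds))
```

If the `resolve_outputs=False` call returns quickly while the default call takes minutes, that would point at output resolution rather than the stack lookup itself.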

heat show takes a long time to return the heat stack, which is not normal either.



[dhostname] [05:18:13 PM]
✘-2 ~/overcloud [master ↓·32|●2✚ 4…1]
17:18 $ time /usr/bin/python3 -s /usr/bin/tripleo-ansible-inventory  --debug --os-cloud undercloud --stack overcloud --undercloud-key-file /var/lib/mistral/.ssh/tripleo-admin-rsa --ansible_ssh_user tripleo-admin --undercloud-connection ssh --static-yaml-inventory /home/stack/tripleo-ansible-inventory.yaml

real    6m20.432s
user    0m1.688s
sys     0m0.143s
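
For scale, the `real` time above can be set against the 30-second client timeout mentioned earlier. A small sketch of that arithmetic follows; `parse_real_time` is a hypothetical helper, not part of any tool used here.

```python
import re


def parse_real_time(value):
    """Convert a `time` value such as '6m20.432s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


CLIENT_TIMEOUT_SECONDS = 30  # the timeout reported in the description

elapsed = parse_real_time("6m20.432s")
ratio = elapsed / CLIENT_TIMEOUT_SECONDS

if __name__ == "__main__":
    print("elapsed: %.3fs, %.1fx the client timeout" % (elapsed, ratio))
```

The inventory generation therefore needs roughly 380 seconds, more than twelve times what the unmodified client allows.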


config_download took ~40 minutes

now node_update is being executed:

(undercloud) [stack@director:~]$ mistral task-list | grep -v SUCCESS
+--------------------------------------+--------------------------+---------------------------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| ID                                   | Name                     | Workflow name                               | Workflow namespace | Workflow Execution ID                | State   | State info                   | Created at          | Updated at          |
+--------------------------------------+--------------------------+---------------------------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| da761abd-04e4-46cc-b75e-959228000e6a | get_deployment_status    | tripleo.deployment.v1.get_deployment_status |                    | d442c32d-a743-4429-a40b-992bb15a41df | ERROR   | Failed to handle action c... | 2023-05-23 08:43:19 | 2023-05-23 09:48:38 |
| 3e4f1e48-047d-48a4-92ac-b2ee30f24afc | node_update              | tripleo.package_update.v1.update_nodes      |                    | cbd0fd17-b496-4c6f-a38d-8470b0142ec2 | RUNNING | None                         | 2023-05-23 18:29:24 | 2023-05-23 18:29:24 |
+--------------------------------------+--------------------------+---------------------------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
this one completed in 30 seconds.


Even in debug mode, heat-engine logs very little about what it is doing.

The MySQL database is 9GB (which isn't the end of the world either) and the environment has ~220 computes.

Is this slowness expected?