Bug 1973952
Summary: | Network config is not updated on stack update | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Sergey Bekkerman <sbekkerm> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Harald Jensås <hjensas> | |
Status: | CLOSED ERRATA | QA Contact: | Joe H. Rahme <jhakimra> | |
Severity: | urgent | Docs Contact: | ||
Priority: | medium | |||
Version: | 16.2 (Train) | CC: | ahyder, apetrich, bdobreli, ccamposr, hjensas, mburns, mkrcmari, owalsh, pbabbar, sbaker, shrjoshi, spower | |
Target Milestone: | --- | Keywords: | Regression, Triaged | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-tripleo-heat-templates-11.5.1-2.20210603174816.el8ost.4 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1975084 (view as bug list) | Environment: | ||
Last Closed: | 2021-09-15 07:16:23 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1958293, 1975084, 1975346 |
Comment 1
Bernard Cafarelli
2021-06-22 10:09:34 UTC
Did you set "NetworkDeploymentActions: ['CREATE','UPDATE']" when updating the deployment?

See: https://access.redhat.com/solutions/2213711

Due to a bug, the network config on the nodes has been updating on every update since OSP16. We recently backported the fixes to close that gap, see: https://bugzilla.redhat.com/show_bug.cgi?id=1958293. Fixing this bug is likely the reason you are now seeing your job failing.

Bernard: Yes, the wait is caused by nova_wait_for_compute_service. It blocks until the compute service starts, which is failing because it cannot connect to rabbitmq:

    2021-06-22 10:00:06.558 8 ERROR oslo.messaging._drivers.impl_rabbit [req-8a38c79c-ff6d-4540-a206-a81c2fbe603d - - - - -] Connection failed: timed out (retrying in 0 seconds): socket.timeout: timed out

The dcn3 compute can't ping the controllers on internalapi:

    [root@dcn3-compute3-0 heat-admin]# ping central-controller0-0.internalapi.redhat.local
    PING central-controller0-0.internalapi.redhat.local (172.25.1.164) 56(84) bytes of data.
    ^C
    --- central-controller0-0.internalapi.redhat.local ping statistics ---
    13 packets transmitted, 0 received, 100% packet loss, time 12285ms

The controller can ping the compute, but that is through the default gateway, not the spine routers:

    [heat-admin@central-controller0-0 ~]$ ping 172.25.4.169
    PING 172.25.4.169 (172.25.4.169) 56(84) bytes of data.
    64 bytes from 172.25.4.169: icmp_seq=1 ttl=63 time=51.1 ms
    64 bytes from 172.25.4.169: icmp_seq=2 ttl=63 time=50.4 ms
    ^C
    --- 172.25.4.169 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 50.426/50.741/51.056/0.315 ms
    [heat-admin@central-controller0-0 ~]$ traceroute 172.25.4.169
    traceroute to 172.25.4.169 (172.25.4.169), 30 hops max, 60 byte packets
    1 _gateway (10.0.10.1) 0.320 ms 0.287 ms 0.277 ms

Route works for a dcn2 compute:

    [heat-admin@central-controller0-0 ~]$ traceroute 172.25.3.169
    traceroute to 172.25.3.169 (172.25.3.169), 30 hops max, 60 byte packets
    1 172.25.1.254 (172.25.1.254) 0.294 ms 0.258 ms 0.245 ms
    2 172.25.1.254 (172.25.1.254) 3005.468 ms !H 3005.456 ms !H 3005.444 ms !H

Harald: ack thanks, I'll check that.

(In reply to Harald Jensås from comment #2)
> Did you set "NetworkDeploymentActions: ['CREATE','UPDATE']" when updating the
> deployment?
>
> See: https://access.redhat.com/solutions/2213711
>
>
> Due to a bug, the network config on the nodes has been updating on every
> update since OSP16. We recently backported the fixes to close that gap, see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1958293. Fixing this bug is
> likely the reason you are now seeing your job failing.

Looks like the cause alright: os-net-config isn't being run on the controllers on a stack update. However, adding UPDATE to NetworkDeploymentActions didn't work for me.
Actions are set correctly AFAICT:

    [root@site-undercloud-0 central]# pwd
    /var/lib/mistral/central
    [root@site-undercloud-0 central]# grep -r tripleo_network_config_network_deployment_actions *
    deploy_steps_playbook.yaml: tripleo_network_config_network_deployment_actions: "{{ network_deployment_actions }}"
    [root@site-undercloud-0 central]# grep -A2 network_deployment_actions group_vars/Controller0
    network_deployment_actions:
    - - CREATE
      - UPDATE

But the tasks are still skipped:

    [root@site-undercloud-0 central]# grep 'Run NetworkConfig script' ansible.log
    2021-06-22 11:09:08,623 p=56749 u=mistral n=ansible | 2021-06-22 11:09:08.622924 | 5254007e-ba68-f825-908b-000000000053 | TASK | Run NetworkConfig script
    2021-06-22 11:09:08,670 p=56749 u=mistral n=ansible | 2021-06-22 11:09:08.670745 | 5254007e-ba68-f825-908b-000000000053 | SKIPPED | Run NetworkConfig script | central-controller0-0
    2021-06-22 11:09:08,865 p=56749 u=mistral n=ansible | 2021-06-22 11:09:08.864752 | 5254007e-ba68-f825-908b-000000000053 | TASK | Run NetworkConfig script
    2021-06-22 11:09:08,936 p=56749 u=mistral n=ansible | 2021-06-22 11:09:08.935907 | 5254007e-ba68-f825-908b-000000000053 | TASK | Run NetworkConfig script
    2021-06-22 11:09:08,945 p=56749 u=mistral n=ansible | 2021-06-22 11:09:08.945319 | 5254007e-ba68-f825-908b-000000000053 | SKIPPED | Run NetworkConfig script | central-controller0-1
    2021-06-22 11:09:09,057 p=56749 u=mistral n=ansible | 2021-06-22 11:09:09.056948 | 5254007e-ba68-f825-908b-000000000053 | SKIPPED | Run NetworkConfig script | central-controller0-2
    2021-06-22 11:09:09,488 p=56749 u=mistral n=ansible | 2021-06-22 11:09:09.488562 | 5254007e-ba68-f825-908b-000000000053 | TASK | Run NetworkConfig script
    2021-06-22 11:09:09,575 p=56749 u=mistral n=ansible | 2021-06-22 11:09:09.575345 | 5254007e-ba68-f825-908b-000000000053 | SKIPPED | Run NetworkConfig script | central-compute0-0
    2021-06-22 11:09:09,725 p=56749 u=mistral n=ansible | 2021-06-22 11:09:09.725638 | 5254007e-ba68-f825-908b-000000000053 | TASK | Run NetworkConfig script
    2021-06-22 11:09:09,801 p=56749 u=mistral n=ansible | 2021-06-22 11:09:09.800612 | 5254007e-ba68-f825-908b-000000000053 | SKIPPED | Run NetworkConfig script | central-compute0-1

Adding some debug tasks:

    - name: Debug
      fail:
        msg: "{{ 'UPDATE' in tripleo_network_config_network_deployment_actions }}"

    2021-06-22 12:02:02.278372 | 5254007e-ba68-7bde-1e69-000000000050 | FATAL | Debug | central-controller0-0 | error={"changed": false, "msg": false}

    - name: Debug
      fail:
        msg: "{{ tripleo_network_config_network_deployment_actions }}"

    2021-06-22 12:06:37.220712 | 5254007e-ba68-b3ef-d670-000000000050 | FATAL | Debug | central-controller0-2 | error={"changed": false, "msg": [["CREATE", "UPDATE"]]}

This is the problem:

    network_deployment_actions:
    - - CREATE
      - UPDATE

It's a list of lists, should be a list.

I see the problem, testing a fix now

    network_deployment_actions:
    - - CREATE
      - UPDATE
    ^^

The list is nested, it should just be a list of strings. What you want is:

    network_deployment_actions:
    - CREATE
    - UPDATE

Not sure how you ended up with a nested list.

Confirmed it works with the t-h-t fix.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483
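
The nested-list failure diagnosed in the debug output above can be reproduced outside Ansible. The following is only an illustrative sketch in plain Python, not the actual tripleo-heat-templates fix: Jinja2's `in` test follows Python containment semantics, so a string is never a member of a list whose only element is another list. The names `action_enabled` and `flatten_actions` are hypothetical helpers introduced for this example.

```python
def action_enabled(action, actions):
    """Mimic the membership check used to decide whether to run the network config."""
    return action in actions


def flatten_actions(actions):
    """Return a flat list of strings from an arbitrarily nested list (hypothetical helper)."""
    flat = []
    for item in actions:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_actions(item))
        else:
            flat.append(item)
    return flat


# Value seen in group_vars/Controller0: a list containing one list.
nested_actions = [["CREATE", "UPDATE"]]
# Value the role expects: a list of strings.
flat_actions = ["CREATE", "UPDATE"]

print(action_enabled("UPDATE", nested_actions))                   # False, so the task is skipped
print(action_enabled("UPDATE", flat_actions))                     # True
print(action_enabled("UPDATE", flatten_actions(nested_actions)))  # True
```

This matches the `"msg": false` result of the first debug task: the check itself was correct, but the variable it received had one level of nesting too many.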