Description of problem:
OSP 16.2 z6: a minor update from 16.1 to 16.2.6 occasionally fails in paunch. The failure happens during the Overcloud Update stage, while the controller nodes are being updated.

2023-10-17 18:29:52 | 2023-10-17 18:29:41.003725 | 52540010-1760-51e9-ef94-000000000179 | TIMING | Start containers for step {{ step }} using paunch | ctrl-2-16-1 | 2:08:03.810767 | 0.92s
2023-10-17 18:29:52 | 2023-10-17 18:29:41.030836 | 52540010-1760-51e9-ef94-00000000017a | TASK | Wait for containers to start for step 1 using paunch
2023-10-17 18:29:52 | 2023-10-17 18:29:41.599594 | 52540010-1760-51e9-ef94-00000000017a | WAITING | Wait for containers to start for step 1 using paunch | ctrl-2-16-1 | 360 retries left
2023-10-17 18:29:52 | 2023-10-17 18:29:51.956164 | 52540010-1760-51e9-ef94-00000000017a | WAITING | Wait for containers to start for step 1 using paunch | ctrl-2-16-1 | 359 retries left
2023-10-17 18:30:03 | 2023-10-17 18:30:02.876441 | 52540010-1760-51e9-ef94-00000000017a | FATAL | Wait for containers to start for step 1 using paunch | ctrl-2-16-1 | error={"ansible_job_id": "55440942512.555016", "attempts": 3, "changed": false, "cmd": "/home/tripleo-admin/.ansible/tmp/ansible-tmp-1697567380.2127461-8950-110259694849210/AnsiballZ_paunch.py", "data": "", "finished": 1, "msg": "Traceback (most recent call last):\n File \"/tmp/ansible_async_wrapper_payload_kjf528ot/ansible_async_wrapper_payload.zip/ansible/modules/utilities/logic/async_wrapper.py\", line 166, in _run_module\n File remove_container", " systemd.service_delete(container=container, log=self.log)", " File \"/usr/lib/python3.6/site-packages/paunch/utils/systemd.py\", line 153, in service_delete", " systemctl.stop(sysd_f)", " File \"/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py\", line 42, in stop", " systemctl(['stop', service], log)", " File \"/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py\", line 34, in systemctl", " raise SystemctlException(str(err))", "paunch.utils.systemctl.SystemctlException: Command '['systemctl', 'stop', 'tripleo_metrics_qdr.service']' returned non-zero exit status 1."]}
2023-10-17 18:30:06 | 2023-10-17 18:30:02.898738 | 52540010-1760-51e9-ef94-00000000017a | TIMING | Wait for containers to start for step {{ step }} using paunch | ctrl-2-16-1 | 2:08:25.705744 | 21.87s
2023-10-17 18:30:06 | NO MORE HOSTS LEFT

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes.

Steps to Reproduce:
1. Update OSP 16.1 to OSP 16.2.6.

Actual results:
The overcloud update fails.

Expected results:
The overcloud update succeeds.

Additional info:
The issue is related to collectd-sensubility running out of memory.
This issue is documented at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index#assembly_preparing-for-a-minor-update_keeping-updated in the section "Known issues that might block an update", under the entry: "Minor update from 16.1 to 16.2.6 occasionally fails on paunch not being able to start container."
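The last frames of the traceback show paunch calling `systemctl(['stop', service], log)` and then executing `raise SystemctlException(str(err))`. The exact wording of the final error is consistent with `err` being a `subprocess.CalledProcessError` from a `systemctl stop` that exited with status 1. The sketch below reproduces that message. It rests on one assumption: that paunch wraps `CalledProcessError` this way. The `SystemctlException` class and the `systemctl`/`stop` functions here are hypothetical stand-ins, not paunch's actual source. The injectable `run` parameter and `failing_run` are illustrative helpers so the example can run without systemd.

```python
import subprocess


class SystemctlException(Exception):
    """Hypothetical stand-in for paunch.utils.systemctl.SystemctlException."""


def systemctl(args, run=subprocess.check_call):
    """Run `systemctl <args>` and turn a non-zero exit into SystemctlException.

    Sketch only: mirrors the `raise SystemctlException(str(err))` frame seen in
    the traceback; this is an assumption, not paunch's real implementation.
    """
    cmd = ["systemctl"] + list(args)
    try:
        run(cmd)
    except subprocess.CalledProcessError as err:
        # str(CalledProcessError) reads "Command '<cmd>' returned non-zero exit status N."
        raise SystemctlException(str(err))


def stop(service, run=subprocess.check_call):
    # Hypothetical helper matching the `systemctl(['stop', service], log)` frame.
    systemctl(["stop", service], run=run)


def failing_run(cmd):
    # Simulates `systemctl stop` exiting with status 1, as in the failed update.
    raise subprocess.CalledProcessError(1, cmd)


if __name__ == "__main__":
    try:
        stop("tripleo_metrics_qdr.service", run=failing_run)
    except SystemctlException as exc:
        print(exc)
```

When run as a script, it prints the same text as the last line of the FATAL log entry: `Command '['systemctl', 'stop', 'tripleo_metrics_qdr.service']' returned non-zero exit status 1.` This suggests the paunch error only reports that the stop of `tripleo_metrics_qdr.service` failed. The underlying cause noted above, collectd-sensubility running out of memory, has to be confirmed separately on the controller.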