Bug 2253891
| Summary: | [FFU] RHEL host upgrade of HCI nodes fails on inability to start a pacemaker cluster | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marian Krcmarik <mkrcmari> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Lukas Bezdicka <lbezdick> |
| Status: | CLOSED ERRATA | QA Contact: | Marian Krcmarik <mkrcmari> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 17.1 (Wallaby) | CC: | lbezdick, mariel, mburns, mciecier, prgutier, ramishra, ushkalim, yatanaka |
| Target Milestone: | z2 | Keywords: | Regression, Triaged |
| Target Release: | 17.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-14.3.1-17.1.20231103010826.el9ost | Doc Type: | If docs needed, set a value |
| Last Closed: | 2024-01-16 14:31:50 UTC | Type: | Bug |
I've actually observed the same behavior on the networker nodes too:
2023-12-05 21:50:25 | 2023-12-05 21:50:25.225274 | 525400c3-b998-5b88-81bc-00000000086d | FATAL | Start pacemaker cluster after reboot | networker-0 | error={"changed": false, "msg": "Command execution failed.\nCommand: `pcs cluster start`\nError: Error: cluster is not currently configured on this node\n"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0209

*** Bug 2237659 has been marked as a duplicate of this bug. ***
Description of problem:
The RHEL host upgrade of HCI nodes fails with the following error:

FATAL | Start pacemaker cluster after reboot | dcn2-computehci2-0 | error={"changed": false, "msg": "Command execution failed.\nCommand: `pcs cluster start`\nError: Error: cluster is not currently configured on this node\n"}

Even though no pacemaker cluster is configured on the node, and none should be, the host upgrade procedure tries to start one. This happens in step 5 of the upgrade, while executing the task from this commit:
https://opendev.org/openstack/tripleo-heat-templates/commit/a4185f80d2158560a546d0f46e8d4caab9ff6e43
which is part of:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/deployment/podman/podman-baremetal-ansible.yaml#L231

The problem seems to be that HCI nodes have the pcs package installed for some reason (I do not know what the reason is). While the package is (I assume) not needed on the HCI nodes, it got pulled in at some point, so it would probably be better to adjust the check: instead of only checking whether pcs is present, also check whether any cluster is running/configured on the node. Based on the 16.2 deployments with HCI nodes that I checked, pcs seems to be always present, so customers may already have it installed on their HCI nodes. The same goes for my environment, where pcs was already installed on 16.2 before the FFU.

The host upgrade was executed with the following command:

openstack overcloud upgrade run --yes \
    --stack dcn2 \
    --tags system_upgrade \
    --limit dcn2-computehci2-0

And the log from step 5 of the procedure:

PLAY [Upgrade tasks for step 5] ************************************************
[WARNING]: conditional statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: '{{ playbook_dir }}/{{ _task_file_path }}' is exists
2023-12-09 01:20:45.501524 | 5254009f-6055-e05f-9966-000000000030 | TIMING | include_tasks | dcn2-computehci2-0 | 0:31:03.476240 | 0.07s
2023-12-09 01:20:45.606261 | 64500b99-dc31-4b7a-9c1c-7e750878536d | INCLUDED | /home/stack/overcloud-deploy/dcn2/config-download/dcn2/ComputeHCI2/upgrade_tasks_step5.yaml | dcn2-computehci2-0
2023-12-09 01:20:45.648112 | 5254009f-6055-e05f-9966-00000000034a | TASK | Create cinder image conversion directory
2023-12-09 01:20:45.703463 | 5254009f-6055-e05f-9966-00000000034a | SKIPPED | Create cinder image conversion directory | dcn2-computehci2-0
2023-12-09 01:20:45.704772 | 5254009f-6055-e05f-9966-00000000034a | TIMING | Create cinder image conversion directory | dcn2-computehci2-0 | 0:31:03.679508 | 0.05s
2023-12-09 01:20:45.734265 | 5254009f-6055-e05f-9966-00000000034b | TASK | Mount cinder's image conversion NFS share
2023-12-09 01:20:45.788972 | 5254009f-6055-e05f-9966-00000000034b | SKIPPED | Mount cinder's image conversion NFS share | dcn2-computehci2-0
2023-12-09 01:20:45.790543 | 5254009f-6055-e05f-9966-00000000034b | TIMING | Mount cinder's image conversion NFS share | dcn2-computehci2-0 | 0:31:03.765278 | 0.05s
2023-12-09 01:20:45.822249 | 5254009f-6055-e05f-9966-00000000034d | TASK | Mount Nova NFS Share
2023-12-09 01:20:45.873551 | 5254009f-6055-e05f-9966-00000000034d | SKIPPED | Mount Nova NFS Share | dcn2-computehci2-0
2023-12-09 01:20:45.874921 | 5254009f-6055-e05f-9966-00000000034d | TIMING | Mount Nova NFS Share | dcn2-computehci2-0 | 0:31:03.849658 | 0.05s
2023-12-09 01:20:45.904007 | 5254009f-6055-e05f-9966-00000000034f | TASK | Check if pcs is present
2023-12-09 01:20:46.958806 | 5254009f-6055-e05f-9966-00000000034f | OK | Check if pcs is present | dcn2-computehci2-0
2023-12-09 01:20:46.960144 | 5254009f-6055-e05f-9966-00000000034f | TIMING | Check if pcs is present | dcn2-computehci2-0 | 0:31:04.934880 | 1.05s
2023-12-09 01:20:46.993346 | 5254009f-6055-e05f-9966-000000000350 | TASK | Start pacemaker cluster after reboot
2023-12-09 01:20:48.776939 | 5254009f-6055-e05f-9966-000000000350 | FATAL | Start pacemaker cluster after reboot | dcn2-computehci2-0 | error={"changed": false, "msg": "Command execution failed.\nCommand: `pcs cluster start`\nError: Error: cluster is not currently configured on this node\n"}
2023-12-09 01:20:48.778363 | 5254009f-6055-e05f-9966-000000000350 | TIMING | Start pacemaker cluster after reboot | dcn2-computehci2-0 | 0:31:06.753091 | 1.78s

Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-15.4.1-17.1.20230927010819.el9ost.noarch
puppet-tripleo-14.2.3-17.1.20231102190827.40278e1.el9ost.noarch
ansible-tripleo-ipsec-11.0.1-17.1.20230620172008.b5559c8.el9ost.noarch
ansible-tripleo-ipa-0.3.1-17.1.20230627190951.8d29d9e.el9ost.noarch
ansible-role-tripleo-modify-image-1.5.1-17.1.20230621064242.b6eedb6.el9ost.noarch
python3-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
openstack-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
tripleo-ansible-3.3.1-17.1.20231101230823.4d015bf.el9ost.noarch
openstack-tripleo-heat-templates-14.3.1-17.1.20231103010823.el9ost.noarch
openstack-tripleo-validations-14.3.2-17.1.20231026020815.2b526f8.el9ost.noarch
python3-tripleoclient-16.5.1-17.1.20230927000827.f3599d0.el9ost.noarch
openstack-tripleo-image-elements-13.1.3-17.1.20230621111410.a641940.el9ost.noarch
openstack-tripleo-puppet-elements-14.1.3-17.1.20230810141019.b4e0cbd.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Perform a system host upgrade of HCI nodes from RHEL 8.4 to RHEL 9.2 as part of the FFU procedure from 16.2 to 17.1.

Additional info:
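The adjusted check proposed in the description (start the cluster only when pcs is present and a cluster is actually configured on the node) could look roughly like the sketch below. It is only an illustration of the decision logic, not the change that shipped in the advisory. The helper names and both default paths (/usr/sbin/pcs for the binary, /etc/corosync/corosync.conf as the sign of a configured cluster) are assumptions and may differ on a given system.

```python
import os

# Assumed default locations; adjust for the actual system.
PCS_BINARY = "/usr/sbin/pcs"
COROSYNC_CONF = "/etc/corosync/corosync.conf"


def cluster_is_configured(corosync_conf=COROSYNC_CONF):
    """Treat the node as a cluster member only if a corosync config file exists.

    Assumption: a missing corosync.conf corresponds to the
    "cluster is not currently configured on this node" error.
    """
    return os.path.isfile(corosync_conf)


def should_start_cluster(pcs_binary=PCS_BINARY, corosync_conf=COROSYNC_CONF):
    """Return True only when pcs is installed AND a cluster is configured.

    Checking for the pcs binary alone is what lets the task run on HCI
    and networker nodes that merely have the package installed.
    """
    return os.path.isfile(pcs_binary) and cluster_is_configured(corosync_conf)


if __name__ == "__main__":
    if should_start_cluster():
        print("pcs present and cluster configured: 'pcs cluster start' would run")
    else:
        print("skipping 'pcs cluster start' on this node")
```

With this logic, a node that has pcs installed but no cluster configuration skips the start step instead of failing the upgrade.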