Bug 2082601
| Summary: | Ironic-conductor resetting a machine that was removed from hub cluster | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alex Krzos <akrzos> |
| Component: | Bare Metal Hardware Provisioning | Assignee: | Tomas Sedovic <tsedovic> |
| Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | Amit Ugol <augol> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | derekh, imiller, rpittau |
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-06-02 08:02:49 UTC | Type: | Bug |
Hi Alex, could you provide a must-gather with the logs? We'd mainly be interested in seeing the entire ironic and baremetal-operator logs.

Closing this as we don't have enough information to move forward with troubleshooting or to reproduce the issue. Please re-open this BZ, or open a new one, if you encounter the same issue again.
Description of problem:
While provisioning many OCP SNO clusters using Zero Touch Provisioning (ZTP) with ACM, I attempted to have failed clusters re-provisioned by removing them from the hub cluster. While they were removed, I powered down the machines; however, it seems ironic-conductor or another metal3 component kept powering the machines back on even though all references to them had been removed.

Version-Release number of selected component (if applicable):
Hub cluster 4.10.8
SNO clusters 4.9.26
ACM - 2.5.0-DOWNSTREAM-2022-05-04-04-34-55

How reproducible:
Unclear. I have attempted to re-provision failed clusters in only one test so far.

Steps to Reproduce:
1.
2.
3.

Actual results:
After the SNO definition was removed from ZTP, gitops resynced to the hub cluster, removing the namespace, bmh, infraenv, agentclusterinstall, nmstateconfig, and the finalizer in order to allow the namespace to finish terminating. After the namespace terminated, I noticed that the VMs referenced by the bmh objects were turned on. I powered them down, but moments later they were powered on again. Watching the sushy-emulator logs, we saw that something from the hub cluster was resetting the power on those machines:

[root@f35-h17-000-r640 ~]# virsh destroy sno00045
Domain sno00045 destroyed

[root@f35-h17-000-r640 ~]# journalctl -f | grep post -i
May 05 20:03:58 f35-h17-000-r640.rdu2.scalelab.redhat.com sushy-emulator[279756]: fc00:1000::5 - - [05/May/2022 20:03:58] "POST /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset HTTP/1.1" 204 -

In the above snippet, you can see the VM was powered off (destroy operation) and a reset was subsequently issued to the Redfish API to power the VM back on.

Expected results:
Once an SNO definition is removed via ZTP, Ironic should not interact with the bare metal host for that SNO machine at all.

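The `journalctl -f | grep post -i` check above can be made a little more specific. The following is a minimal sketch, not part of the original report, that picks out only `ComputerSystem.Reset` POSTs from sushy-emulator journal lines and records the client address, time, system ID and HTTP status. The names `RESET_RE` and `find_resets` are hypothetical, and the pattern assumes the access-log layout shown in the snippet above.

```python
import re

# Matches the access-log part of a sushy-emulator journal line, assuming the
# layout seen in the report:
#   <client> - - [<date time>] "POST /redfish/v1/Systems/<id>/Actions/ComputerSystem.Reset HTTP/1.1" <status> -
RESET_RE = re.compile(
    r'(?P<client>\S+) - - \[(?P<when>[^\]]+)\] '
    r'"POST /redfish/v1/Systems/(?P<system>[0-9a-f-]+)'
    r'/Actions/ComputerSystem\.Reset HTTP/[\d.]+" (?P<status>\d{3})'
)


def find_resets(lines):
    """Return (client, when, system, status) for every Reset POST found."""
    hits = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            hits.append((m["client"], m["when"], m["system"], int(m["status"])))
    return hits


# The journal line quoted in the report, split only for readability.
SAMPLE = (
    'May 05 20:03:58 f35-h17-000-r640.rdu2.scalelab.redhat.com '
    'sushy-emulator[279756]: fc00:1000::5 - - [05/May/2022 20:03:58] '
    '"POST /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e'
    '/Actions/ComputerSystem.Reset HTTP/1.1" 204 -'
)

if __name__ == "__main__":
    for hit in find_resets([SAMPLE]):
        print(hit)
```

Grouping the hits by system ID and client address would show which hosts are still being reset and from which address the requests originate.
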
Additional info:
Forcing the metal3 pod with ironic-conductor running to be recreated resolved the issue.

oc delete po -n openshift-machine-api -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state --all

Logs that seemed to show the sno/vm being powered back on:

2022-05-05 20:03:57.587 1 DEBUG sushy.resources.base [req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.OFF: 'Off'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
2022-05-05 20:03:57.699 1 DEBUG sushy.resources.base [req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.OFF: 'Off'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
2022-05-05 20:03:57.812 1 DEBUG sushy.resources.base [req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.OFF: 'Off'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
2022-05-05 20:03:57.935 1 DEBUG sushy.resources.base [req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.OFF: 'Off'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
2022-05-05 20:03:58.060 1 DEBUG sushy.resources.base [req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.OFF: 'Off'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
2022-05-05 20:04:00.041 1 DEBUG sushy.resources.base [-] Received representation of System /redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: {'_actions': {'reset': {'allowed_values': ['On', 'ForceOff', 'GracefulShutdown', 'GracefulRestart', 'ForceRestart', 'Nmi', 'ForceOn'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e/Actions/ComputerSystem.Reset'}}, '_oem_vendors': None, 'asset_tag': None, 'bios_version': None, 'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': <BootSourceOverrideMode.UEFI: 'UEFI'>, 'target': <BootSource.HDD: 'Hdd'>}, 'description': None, 'hostname': None, 'identity': 'fd1f3a27-b58d-582d-b234-a3f786af806e', 'indicator_led': <IndicatorLED.LIT: 'Lit'>, 'links': {'oem_vendors': None}, 'maintenance_window': None, 'manufacturer': 'Sushy Emulator', 'memory_summary': {'health': None, 'size_gib': 18}, 'name': 'sno00045', 'part_number': None, 'power_state': <PowerState.ON: 'On'>, 'serial_number': None, 'sku': None, 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.ENABLED: 'Enabled'>}, 'system_type': None, 'uuid': 'fd1f3a27-b58d-582d-b234-a3f786af806e'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:656
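
In the log excerpt above, the reported `power_state` is `Off` up to 20:03:58.060 and `On` at 20:04:00.041, which brackets the Reset POST seen by sushy-emulator at 20:03:58. The sketch below is an illustration only, not part of the original report: it pulls the timestamp and `power_state` out of each `sushy.resources.base` debug entry and lists the points where the state changes. `ENTRY_RE`, `power_states` and `transitions` are hypothetical names, the pattern assumes the entry layout shown above, and the sample entries are abbreviated with `...` to keep the block short.

```python
import re

# One match per "Received representation of System" debug entry. DOTALL lets
# the pattern tolerate entries that were wrapped across several lines.
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \d+ DEBUG "
    r"sushy\.resources\.base \[(?P<req>[^\]]*)\] "
    r"Received representation of System (?P<path>\S+): "
    r".*?'power_state': <PowerState\.\w+: '(?P<power>\w+)'>",
    re.DOTALL,
)


def power_states(text):
    """Return [(timestamp, power_state), ...] in log order."""
    return [(m["ts"], m["power"]) for m in ENTRY_RE.finditer(text)]


def transitions(states):
    """Return (prev_ts, ts, prev_state, state) wherever the state changes."""
    changes = []
    for (prev_ts, prev), (ts, cur) in zip(states, states[1:]):
        if prev != cur:
            changes.append((prev_ts, ts, prev, cur))
    return changes


# Abbreviated copies of the last two entries in the report ("..." marks
# fields dropped here for brevity; the real entries are much longer).
SAMPLE = (
    "2022-05-05 20:03:58.060 1 DEBUG sushy.resources.base "
    "[req-c9ea00cb-8829-480d-8c59-c8adb60bb61b - - - - -] Received "
    "representation of System "
    "/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: "
    "{... 'name': 'sno00045', 'power_state': <PowerState.OFF: 'Off'>, ...}\n"
    "2022-05-05 20:04:00.041 1 DEBUG sushy.resources.base [-] Received "
    "representation of System "
    "/redfish/v1/Systems/fd1f3a27-b58d-582d-b234-a3f786af806e: "
    "{... 'name': 'sno00045', 'power_state': <PowerState.ON: 'On'>, ...}\n"
)

if __name__ == "__main__":
    for change in transitions(power_states(SAMPLE)):
        print(change)
```

Run against a full ironic-conductor log, this narrows the window in which to look for whatever issued the power-on.
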