Bug 1403185
Summary: | Takeover does not work in case of pxe boot | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | VIKRANT <vaggarwa> |
Component: | openstack-ironic | Assignee: | Dmitry Tantsur <dtantsur> |
Status: | CLOSED ERRATA | QA Contact: | Dan Yasny <dyasny> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 9.0 (Mitaka) | CC: | achernet, bfournie, dtantsur, dyasny, jjoyce, jschluet, lmartins, mburns, mlammon, rhel-osp-director-maint, sclewis, slinaber, srevivo |
Target Milestone: | Upstream M2 | Keywords: | Triaged |
Target Release: | 13.0 (Queens) | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | openstack-ironic-10.1.2-2.el7ost | Doc Type: | If docs needed, set a value |
Last Closed: | 2018-06-27 13:29:16 UTC | Type: | Bug |
Bug Blocks: | 1473267 |
Description
VIKRANT
2016-12-09 11:30:53 UTC
Hi! The patch for the bug has merged upstream. We're looking into the possibility of backporting it. As a workaround, use local boot with bare metal instances.

Backported to Newton, pending OSP 10 rebase. Lucas is looking into the possibility of a backport to Mitaka (OSP 9).

Hi VIKRANT,

Thanks for reporting this. I was looking at the effort required for backporting this fix all the way down to Mitaka and it's not trivial. A couple of methods in the pxe.py module have been rewritten since and the fix no longer merges correctly, so we would need to rewrite part of the fix.

Therefore, for now we won't consider this backport to OSP-9. I'm changing the target of this bug to OSP-10, where the fix has already been backported. Please let us know if it's OK with you.

Cheers,
Lucas

Hi Dan,

Here are the steps which I got from the customer (Cu).

~~~
The problem happened when I do not use local boot in ironic (it will net boot every time the baremetal reboots).

The steps:
1. Boot a baremetal instance (notice the instance is not local boot).
2. Check which ironic-conductor it belongs to, then shut down the physical node of that ironic-conductor (it is one of the controllers).
3. Reboot the baremetal instance.
~~~

The BZ failed QA. It looks like takeover doesn't work in any case, not just PXE.

The flow employed:
- have a BM node under OSP10 managed by ironic
- try to power the ironic node off (or on); the idea is to catch the moment when the node's "reservation" field is populated by the current conductor
- kill the controller node that holds the reservation

Results:
- ironic node-show remains stuck with "reservation | overcloud-controller-2.localdomain"
- any attempt to power the node on or off fails:

[stack@undercloud-0 ~]$ ironic node-set-power-state ironic-1 on
Node f723a2cd-4f8d-4dd7-aad5-cb15db6e932d is locked by host overcloud-controller-2.localdomain, please retry after the current operation is completed. (HTTP 409)
[stack@undercloud-0 ~]$ ironic node-set-power-state ironic-1 off
Node f723a2cd-4f8d-4dd7-aad5-cb15db6e932d is locked by host overcloud-controller-2.localdomain, please retry after the current operation is completed. (HTTP 409)

I'm setting this back to ASSIGNED, as it failed QA.

Dan, how much time did you wait? Cleaning up reservations is certainly not instant.

(In reply to Dmitry Tantsur from comment #18)
> Dan, how much time did you wait? Cleaning up reservations is certainly not
> instant.

16 hours so far - still stuck

The bug Dan hit was fixed in OSP 10 in https://github.com/openstack/ironic/commit/d52077f4fe8c668b258702e8298a4beaa19476d8. However, there is one missing change for proper takeover, attaching it.

And one more change to complete the picture.

It looks like all patches are in stable/queens. Moving to POST.

I would like https://review.openstack.org/#/c/546273/ to also get in as part of this work, so moving back to ON_DEV for now.

Sorry for not updating earlier. As https://review.openstack.org/#/c/554202/ has landed, which is the backport for https://review.openstack.org/#/c/546273/, moving to POST.

Install latest OSP 13 puddle: 2018-05-10.3

Step 1)
(overcloud) [stack@undercloud-0 ~]$ ironic node-list
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| e5b6f81f-857b-4867-a7f5-729769609d93 | ironic-0 | 82010342-9c04-421f-8cb0-1ab2277786b3 | power on    | active             | False       |
| 4f0ad22a-a246-40b4-8656-04031b3630cb | ironic-1 | None                                 | power off   | available          | False       |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+

Step 2)
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| dbec5044-73ad-4301-b5b9-f7a12194216c | ceph-0       | f7a92aa8-f094-4e34-b6c2-5511173472bd | power on    | active             | False       |
| 2047d948-5619-4e44-ab3d-ce62c29f469e | ceph-1       | c25e6d66-f671-43a7-a7f8-e7e79d1ff963 | power on    | active             | False       |
| 47f9dc3c-dd9f-48f0-bd60-611e56ca5d91 | ceph-2       | a3182ebd-b469-4b5f-bf39-298015ff8ae7 | power on    | active             | False       |
| 3c4e1fe8-90a5-4999-be51-c90cf6cbf40a | compute-0    | e18cae11-2f9a-446f-ba55-e708494e0f7d | power on    | active             | False       |
| 58168f45-2080-4a0d-aec2-41a23977840f | controller-0 | f578b76f-3ca4-4566-b2ac-d2d81795aae2 | power on    | active             | False       |
| 8d14cf19-dba8-4194-a711-0454f728d2eb | controller-1 | 8e6a507a-9c4a-464f-8bcc-08ecf9ed059e | power on    | active             | False       |
| b70d02c2-96a0-4de1-a56b-5529baf62f42 | controller-2 | 61eaeb8f-843a-4e29-817e-a6256de5b2dc | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

Step 3)
(overcloud) [stack@undercloud-0 ~]$ ironic node-set-power-state ironic-0 off
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
(overcloud) [stack@undercloud-0 ~]$ ironic node-show ironic-0
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+------------------------+--------------------------------------------------------------------------+
| Property               | Value                                                                    |
+------------------------+--------------------------------------------------------------------------+
| boot_interface         | None                                                                     |
| chassis_uuid           | None                                                                     |
| clean_step             | {}                                                                       |
| console_enabled        | False                                                                    |
| console_interface      | None                                                                     |
| created_at             | 2018-05-15T17:13:01+00:00                                                |
| deploy_interface       | None                                                                     |
| driver                 | pxe_ipmitool                                                             |
| driver_info            | {u'ipmi_port': u'6234', u'ipmi_username': u'admin', u'deploy_kernel':    |
|                        | u'90e6217f-0839-4833-ae12-76d7a70d3866', u'ipmi_address': u'172.16.0.1', |
|                        | u'deploy_ramdisk': u'4bc40ea6-33fe-407c-bb54-e485b9a7f0e3',              |
|                        | u'ipmi_password': u'******'}                                             |
| driver_internal_info   | {u'agent_cached_clean_steps_refreshed': u'2018-05-15 17:14:29.562653',   |
|                        | u'agent_cached_clean_steps': {u'deploy': [{u'priority': 99,              |
|                        | u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, |
|                        | u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface':     |
|                        | u'deploy', u'reboot_requested': False, u'abortable': True, u'step':      |
|                        | u'erase_devices'}]}, u'clean_steps': None, u'hardware_manager_version':  |
|                        | {u'generic_hardware_manager': u'1.1'}, u'is_whole_disk_image': False,    |
|                        | u'agent_continue_if_ata_erase_failed': False,                            |
|                        | u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize':    |
|                        | True, u'root_uuid_or_disk_id': u'0d0b8fbf-db98-4612-b551-81fb39aacaec',  |
|                        | u'agent_version': u'3.2.1.dev2', u'agent_url':                           |
|                        | u'http://192.168.24.44:9999'}                                            |
| extra                  | {}                                                                       |
| inspect_interface      | None                                                                     |
| inspection_finished_at | None                                                                     |
| inspection_started_at  | None                                                                     |
| instance_info          | {u'root_gb': u'20', u'display_name': u'instance2', u'image_source':      |
|                        | u'b852a157-dc53-4e94-9515-2ce4772f04a6', u'memory_mb': u'1024',          |
|                        | u'vcpus': u'1', u'local_gb': u'40', u'configdrive': u'******',           |
|                        | u'swap_mb': u'0', u'nova_host_id': u'overcloud-                          |
|                        | controller-2.localdomain'}                                               |
| instance_uuid          | 82010342-9c04-421f-8cb0-1ab2277786b3                                     |
| last_error             | None                                                                     |
| maintenance            | False                                                                    |
| maintenance_reason     | None                                                                     |
| management_interface   | None                                                                     |
| name                   | ironic-0                                                                 |
| network_interface      | flat                                                                     |
| power_interface        | None                                                                     |
| power_state            | power on                                                                 |
| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'40',      |
|                        | u'cpus': u'4', u'capabilities': u'boot_option:local'}                    |
| provision_state        | active                                                                   |
| provision_updated_at   | 2018-05-15T17:24:10+00:00                                                |
| raid_config            | {}                                                                       |
| raid_interface         | None                                                                     |
| reservation            | overcloud-controller-2.localdomain                                       |
| resource_class         | None                                                                     |
| storage_interface      | noop                                                                     |
| target_power_state     | power off                                                                |
| target_provision_state | None                                                                     |
| target_raid_config     | {}                                                                       |
| traits                 |                                                                          |
| updated_at             | 2018-05-15T21:57:25+00:00                                                |
| uuid                   | e5b6f81f-857b-4867-a7f5-729769609d93                                     |
| vendor_interface       | None                                                                     |
+------------------------+--------------------------------------------------------------------------+

As soon as I saw the reservation from the Step 3 power-off on ironic-0, I issued the following in Step 4.

Step 4)
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node reboot b70d02c2-96a0-4de1-a56b-5529baf62f42
(overcloud) [stack@undercloud-0 ~]$ ironic node-show ironic-0 | grep reservation
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
| reservation | None

I repeated this a couple of times. I did not see any hang-up with the reservation. I was able to power off/on several more times. Will check with dtantsur if this is sufficient for verification.

Yeah, it seems that the problem from comment 17 is gone.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
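
For anyone re-running the verification, the manual "node-show | grep reservation" check can be scripted. What follows is only a minimal sketch, not part of the fix or of ironic itself. It assumes the `openstack baremetal` CLI is installed and the right credentials are sourced, and that an unlocked node prints either `None` or an empty string for the `reservation` column. The helper names (`get_reservation`, `wait_for_release`) and the default timeout and interval are hypothetical choices for illustration.

```python
import subprocess
import time


def get_reservation(node):
    """Return the conductor host holding the node lock, or None if unlocked.

    Assumption: an unlocked node prints "None" (or nothing) for this column.
    """
    out = subprocess.check_output(
        ["openstack", "baremetal", "node", "show", node,
         "-f", "value", "-c", "reservation"],
        universal_newlines=True,
    ).strip()
    return None if out in ("", "None") else out


def wait_for_release(node, timeout=300.0, interval=5.0,
                     fetch=get_reservation, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll until the reservation clears.

    Returns True once the node is unlocked, False if `timeout` seconds
    pass first. `fetch`, `sleep` and `clock` are injectable so the loop
    can be exercised without a real deployment.
    """
    deadline = clock() + timeout
    while True:
        if fetch(node) is None:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_release("ironic-0")` right after killing the controller that holds the lock would return False in the failure mode from comment 17 (reservation stuck on the dead conductor) and True once takeover has cleared it. The timeout needs to be long enough for the surviving conductors to notice the dead one, since cleaning up reservations is not instant.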