Bug 2104675
| Summary: | OSP17 Compute replacement fails with error Refusing to proceed can't find hostname | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Rosenfeld <drosenfe> |
| Component: | openstack-tripleo-common | Assignee: | Adriano Petrich <apetrich> |
| Status: | CLOSED DUPLICATE | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 17.0 (Wallaby) | CC: | bshephar, jslagle, mburns, ramishra, slinaber |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2022-07-07 03:19:59 UTC | Type: | Bug |
Hey, so in the baremetal_deployment.yaml file, it looks like we are trying to remove compute-0:
- name: Compute
  count: 2
  hostname_format: compute-%index%
  defaults:
    profile: compute
    network_config:
      template: /home/stack/composable_roles/network/nic-configs/compute.j2
    networks:
    - network: ctlplane
      vif: true
    - network: internal_api
    - network: tenant
    - network: storage
  instances:
  - hostname: compute-0
    name: compute-1
    provisioned: false
So the node with hostname compute-0 in the overcloud matches compute-1 in Ironic? This looks like it might be a mistake, since compute-0 is still deployed and you tried to scale down compute-1. Did we mean to have it like this instead?
- name: Compute
  count: 2
  hostname_format: compute-%index%
  defaults:
    profile: compute
    network_config:
      template: /home/stack/composable_roles/network/nic-configs/compute.j2
    networks:
    - network: ctlplane
      vif: true
    - network: internal_api
    - network: tenant
    - network: storage
  instances:
  - hostname: compute-1
    name: compute-1
    provisioned: false
This is metalsmith list:

(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+---------------+--------+------------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname      | State  | IP Addresses           |
+--------------------------------------+--------------+--------------------------------------+---------------+--------+------------------------+
| 96a3bfb0-1a5e-4f81-b671-343409700708 | ceph-0       | 1f4d52a7-fdb8-4044-9c8a-8d429f6dedb0 | cephstorage-2 | ACTIVE | ctlplane=192.168.24.22 |
| 70085593-04c2-43a4-8ef1-75a4503141b6 | ceph-1       | 5fcc6079-340c-439e-8c74-ba9ae1a00c1c | cephstorage-1 | ACTIVE | ctlplane=192.168.24.41 |
| 5f2c50f8-0f1e-4f1f-a345-a2f9da5d7192 | ceph-2       | 3b53df04-b57d-46bb-a73b-ee64e0ddcedd | cephstorage-0 | ACTIVE | ctlplane=192.168.24.35 |
| 815d1771-60a4-418d-8a6d-6462ddf20886 | compute-0    | ad26596a-54d2-4079-a4b1-8a05ba9da28c | compute-1     | ACTIVE | ctlplane=192.168.24.51 |
| 627170d0-0807-469d-9223-f8762169ffa2 | compute-1    | f9ae797c-a51f-4b9e-8ea4-4db45abbbbe5 | compute-0     | ACTIVE | ctlplane=192.168.24.27 |
| 16dede8b-016a-4e57-b600-1206fa958f96 | controller-0 | 9ea719ca-12b3-4c90-8491-672ae92c7d3c | controller-1  | ACTIVE | ctlplane=192.168.24.34 |
| eea417c5-d1b8-4bdd-9e83-834d015445c4 | controller-1 | 96a845c6-5e0c-45e6-b094-cc933e2fe4e3 | controller-0  | ACTIVE | ctlplane=192.168.24.38 |
| c496a066-ea50-492b-ac86-6138ed26d263 | controller-2 | 7279e9ca-f8b3-4aa3-9acd-a6e35070832b | controller-2  | ACTIVE | ctlplane=192.168.24.8  |
| fcddf6bd-c743-46df-8db5-9f31631bcdb0 | database-0   | 42926208-8547-454b-8daa-803b4b66b2a5 | database-0    | ACTIVE | ctlplane=192.168.24.9  |
| 90ce7155-cbb5-4b5b-a93e-649af77f2e64 | database-1   | 1c908502-c12a-457d-9fc3-479d62ab62de | database-1    | ACTIVE | ctlplane=192.168.24.31 |
| ba119d83-43cd-4efe-a703-cb30e7d8aa45 | database-2   | 32b39c2b-85f2-4676-86a5-44fd1883a6a7 | database-2    | ACTIVE | ctlplane=192.168.24.45 |
| 6155b66b-e0ec-48cf-b08d-1eed283ee397 | messaging-0  | 99eff70d-e669-4119-8aa6-2ed0012734bb | messaging-2   | ACTIVE | ctlplane=192.168.24.12 |
| 261584fc-42ac-4907-89d5-578c4e9ce6b7 | messaging-1  | c725e257-14ce-46df-8384-933679eff4c1 | messaging-1   | ACTIVE | ctlplane=192.168.24.24 |
| 85016e70-bbcc-4ad8-9536-fd524c154dcc | messaging-2  | f82d3856-9b0e-4362-b1d9-ac36a9b2d059 | messaging-0   | ACTIVE | ctlplane=192.168.24.53 |
| da472ada-943c-45f6-b082-e3b1422362d7 | networker-0  | 30f14784-37e5-4240-9aa1-bc08e8a2c5ea | networker-1   | ACTIVE | ctlplane=192.168.24.14 |
| ed17e123-abc7-459c-bc18-427e9ad90664 | networker-1  | 2cdfd562-a5cb-4e21-b28b-3584d7b54021 | networker-0   | ACTIVE | ctlplane=192.168.24.10 |
+--------------------------------------+--------------+--------------------------------------+---------------+--------+------------------------+

When deployed there is no guarantee that the hostname and the node name match.

Also the documentation: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#deploying-the-overcloud says the instances entry contains:
- The name of the baremetal node to remove from the overcloud
- The hostname which is assigned to that node

The baremetal_deployment.yaml file in the job matches the documentation and the deployment.

*** This bug has been marked as a duplicate of bug 2092444 ***
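The point about node names and hostnames not matching can be checked mechanically against the two instances entries discussed above. This is a minimal sketch in Python, assuming the node name to hostname pairs are copied by hand from the compute rows of the metalsmith list output. The helper `entry_matches_deployment` is hypothetical and is not part of metalsmith or TripleO.

```python
# Node name -> hostname pairs, copied by hand from the compute rows of
# `metalsmith list` in this bug. Nothing here queries Ironic.
DEPLOYED = {
    "compute-0": "compute-1",
    "compute-1": "compute-0",
}


def entry_matches_deployment(entry, deployed):
    """Return True if the entry pairs a node name with the hostname assigned to it.

    Hypothetical helper for illustration only.
    """
    return deployed.get(entry["name"]) == entry["hostname"]


# Entry used by the job (node compute-1 carries hostname compute-0).
job_entry = {"hostname": "compute-0", "name": "compute-1", "provisioned": False}
# Entry suggested in the first comment.
suggested_entry = {"hostname": "compute-1", "name": "compute-1", "provisioned": False}

print(entry_matches_deployment(job_entry, DEPLOYED))        # True
print(entry_matches_deployment(suggested_entry, DEPLOYED))  # False
```

Under that assumption, the entry from the job is consistent with the deployment, while the suggested entry pairs node compute-1 with a hostname that belongs to a different node.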
Description of problem:
A two compute deployment is performed. The computes are compute-0 and compute-1. compute-1 is scaled down and then a scale up of a node named compute-2 is attempted. When a stack update is performed to add compute-2 it fails with this error message:

2022-07-06 19:56:09.708256 | 525400e2-e4ac-47fe-b253-00000000000d | FATAL | Find existing instances | localhost | error={"changed": false, "msg": "Requested hostname compute-0 was not found, but the deployed node 815d1771-60a4-418d-8a6d-6462ddf20886 has a matching name. Refusing to proceed to avoid confusing results. Please either rename the node or use a different hostname"}

This is seen in openstack baremetal list after deployment:
| 815d1771-60a4-418d-8a6d-6462ddf20886 | compute-0 | ad26596a-54d2-4079-a4b1-8a05ba9da28c | power on  | active    | False |
| 627170d0-0807-469d-9223-f8762169ffa2 | compute-1 | f9ae797c-a51f-4b9e-8ea4-4db45abbbbe5 | power on  | active    | False |

This is seen in openstack baremetal list after the scale down:
| 815d1771-60a4-418d-8a6d-6462ddf20886 | compute-0 | ad26596a-54d2-4079-a4b1-8a05ba9da28c | power on  | active    | False |
| 627170d0-0807-469d-9223-f8762169ffa2 | compute-1 | None                                 | power off | available | False |

This is seen in openstack baremetal node list after the scale up is attempted and fails:
| 815d1771-60a4-418d-8a6d-6462ddf20886 | compute-0 | ad26596a-54d2-4079-a4b1-8a05ba9da28c | power on  | active    | False |
| 627170d0-0807-469d-9223-f8762169ffa2 | compute-1 | 5988a9de-f6d9-4522-9c51-d6aa6bf670af | power on  | active    | False |
| 75432c61-aa8b-4763-be97-b05d0bc06560 | compute-2 | None                                 | power off | available | False |

The node named compute-2 was available and it should have been used for the scale up, but it was not.

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220701.n.1

How reproducible:
Every time

Steps to Reproduce:
1. Execute this job in Jenkins: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-17.0-virsh-3cont_3db_3msg_2net_2comp_3ceph-blacklist-2networker-compute-replacement/
Note: the scale down infrared update is in progress and hasn't been committed yet.

Actual results:
Compute replacement fails with error above

Expected results:
Compute replacement is successful.

Additional info:
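The wording of the error suggests a guard of roughly this shape: a requested hostname with no existing allocation is rejected when a deployed node's name equals that hostname. The following is a minimal sketch in Python that assumes this reading of the message. The function `find_conflicts` and the dictionary layout are hypothetical and are not the actual "Find existing instances" implementation.

```python
def find_conflicts(requested_hostnames, deployed_nodes):
    """Return (hostname, node_uuid) pairs that would trigger the refusal.

    Hypothetical reading of the error message: a hostname conflicts when no
    deployed node holds it as a hostname, but a deployed node has it as a name.
    """
    allocated = {node["hostname"] for node in deployed_nodes}
    uuid_by_name = {node["name"]: node["uuid"] for node in deployed_nodes}
    return [
        (hostname, uuid_by_name[hostname])
        for hostname in requested_hostnames
        if hostname not in allocated and hostname in uuid_by_name
    ]


# State after the scale down, taken from the listings in this bug:
# the remaining deployed compute is the node named compute-0, which was
# assigned hostname compute-1.
deployed = [
    {
        "uuid": "815d1771-60a4-418d-8a6d-6462ddf20886",
        "name": "compute-0",
        "hostname": "compute-1",
    }
]

print(find_conflicts(["compute-0"], deployed))
# [('compute-0', '815d1771-60a4-418d-8a6d-6462ddf20886')]
print(find_conflicts(["compute-1"], deployed))
# []
```

With that input, requesting hostname compute-0 reproduces the situation named in the error: no allocation has that hostname, yet deployed node 815d1771-60a4-418d-8a6d-6462ddf20886 is named compute-0.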