Bug 2155917
| Summary: | RHOSP 17.0 EDGE: Overcloud Multi-stack Spine Leaf deployment failed with FileNotFoundError: [Errno 2] No such file or directory: '/root/overcloud-deploy/central/central-passwords.yaml' | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sree <skovili> |
| Component: | openstack-tripleo | Assignee: | Brendan Shephard <bshephar> |
| Status: | CLOSED NOTABUG | QA Contact: | Sree <skovili> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 17.0 (Wallaby) | CC: | bshephar, mburns |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-17 22:40:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Hey Sree,

Can we re-run that command without sudo? I don't think there is anything that needs privilege escalation there now in OSP17. So just:

```
openstack overcloud export --force-overwrite --stack central --output-file /home/stack/central-export.yaml
```

as the stack user. Does that work? If so, we'll just need to adjust those DCN jobs to remove the sudo --preserve-env.
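A quick sanity check of where the stack's working directory ended up (this assumes the default ~/overcloud-deploy/<stack>/ layout; adjust if the job overrides it):

```bash
# The passwords file the export needs, under the stack user's home
ls -l /home/stack/overcloud-deploy/central/central-passwords.yaml

# The failing run resolved the working directory under /root instead,
# which is why it hit FileNotFoundError there
sudo ls -l /root/overcloud-deploy/central/ 2>/dev/null || echo "no working dir under /root"
```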
Hey Sree,

This is an entirely different error now. It's unrelated to the initial error reported on this BZ. It would be best to raise a new BZ for this new problem.
So the new error is:
```
2023-01-17 02:31:49.543837 | 525400d8-78eb-88de-f506-000000000100 | TASK | Nova: Manage aggregate and availability zone and add hosts to the zone
2023-01-17 02:31:52.353463 | 525400d8-78eb-88de-f506-000000000100 | FATAL | Nova: Manage aggregate and availability zone and add hosts to the zone | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "Compute host dcn1-compute-1.redhat.local could not be found.", "response": "{\"itemNotFound\": {\"code\": 404, \"message\": \"Compute host dcn1-compute-1.redhat.local could not be found.\"}}"}, "msg": "ResourceNotFound: 404: Client Error for url: https://overcloud.redhat.local:13774/v2.1/os-aggregates/5/action, Compute host dcn1-compute-1.redhat.local could not be found."}
2023-01-17 02:31:52.357508 | 525400d8-78eb-88de-f506-000000000100 | TIMING | Nova: Manage aggregate and availability zone and add hosts to the zone | undercloud | 0:21:07.150875 | 2.81s
```
So we can set the new BZ component to tripleo-heat-templates:
https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/nova/nova-az-config.yaml#L71-L87
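To confirm from the Nova side, checks along these lines from the undercloud would show whether the host ever registered (the centralrc file name is an assumption, and the aggregate ID 5 is taken from the failing os-aggregates/5 call above):

```bash
# Credentials for the central overcloud (file name may differ in this job)
source ~/centralrc

# Is dcn1-compute-1.redhat.local listed as a nova-compute service?
openstack compute service list --service nova-compute

# Which hosts does the aggregate the task was updating actually contain?
openstack aggregate show 5
```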
I don't see any dcn1-compute1 in the environment, by the way. These are the two nodes the job collected logs from:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-edge-deployment-17.0-rhel-virthost-ipv4-3cont-2comp-2leafs-x-2comp-tls_everywhere-routed_provider_nets-ovn-naz/29/site-compute-0/etc/hostname.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-edge-deployment-17.0-rhel-virthost-ipv4-3cont-2comp-2leafs-x-2comp-tls_everywhere-routed_provider_nets-ovn-naz/29/site-compute-1/etc/hostname.gz

But on the hypervisor, I can see there are other nodes that exist. I'm not sure why no logs from those nodes were collected by that job:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-edge-deployment-17.0-rhel-virthost-ipv4-3cont-2comp-2leafs-x-2comp-tls_everywhere-routed_provider_nets-ovn-naz/29/hypervisor/var/log/extra/virsh-list.txt.gz

The central site seems to deploy fine:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-edge-deployment-17.0-rhel-virthost-ipv4-3cont-2comp-2leafs-x-2comp-tls_everywhere-routed_provider_nets-ovn-naz/29/site-undercloud-0/home/stack/overcloud_install.log.gz

```
PLAY RECAP *********************************************************************
central-compute0-0 : ok=476 changed=191 unreachable=0 failed=0 skipped=213 rescued=0 ignored=1
central-compute0-1 : ok=473 changed=191 unreachable=0 failed=0 skipped=213 rescued=0 ignored=1
central-controller0-0 : ok=656 changed=262 unreachable=0 failed=0 skipped=227 rescued=0 ignored=1
central-controller0-1 : ok=655 changed=255 unreachable=0 failed=0 skipped=228 rescued=0 ignored=1
central-controller0-2 : ok=655 changed=255 unreachable=0 failed=0 skipped=228 rescued=0 ignored=1
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
undercloud : ok=938 changed=338 unreachable=0 failed=0 skipped=223 rescued=64 ignored=1
```

So we really need to collect the logs from those other nodes, since that's when the failure occurs:

```
2023-01-17 02:31:49,544 p=200645 u=stack n=ansible | 2023-01-17 02:31:49.543837 | 525400d8-78eb-88de-f506-000000000100 | TASK | Nova: Manage aggregate and availability zone and add hosts to the zone
2023-01-17 02:31:52,356 p=200645 u=stack n=ansible | 2023-01-17 02:31:52.353463 | 525400d8-78eb-88de-f506-000000000100 | FATAL | Nova: Manage aggregate and availability zone and add hosts to the zone | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "Compute host dcn1-compute-1.redhat.local could not be found.", "response": "{\"itemNotFound\": {\"code\": 404, \"message\": \"Compute host dcn1-compute-1.redhat.local could not be found.\"}}"}, "msg": "ResourceNotFound: 404: Client Error for url: https://overcloud.redhat.local:13774/v2.1/os-aggregates/5/action, Compute host dcn1-compute-1.redhat.local could not be found."}
2023-01-17 02:31:52,365 p=200645 u=stack n=ansible | NO MORE HOSTS LEFT *************************************************************
2023-01-17 02:31:52,367 p=200645 u=stack n=ansible | PLAY RECAP *********************************************************************
2023-01-17 02:31:52,368 p=200645 u=stack n=ansible | dcn1-compute-0 : ok=476 changed=191 unreachable=0 failed=0 skipped=213 rescued=0 ignored=1
2023-01-17 02:31:52,368 p=200645 u=stack n=ansible | dcn1-compute-1 : ok=473 changed=191 unreachable=0 failed=0 skipped=213 rescued=0 ignored=1
2023-01-17 02:31:52,369 p=200645 u=stack n=ansible | dcn1-network-0 : ok=399 changed=156 unreachable=0 failed=0 skipped=190 rescued=0 ignored=1
2023-01-17 02:31:52,369 p=200645 u=stack n=ansible | dcn1-network-1 : ok=399 changed=156 unreachable=0 failed=0 skipped=190 rescued=0 ignored=1
2023-01-17 02:31:52,370 p=200645 u=stack n=ansible | localhost : ok=0 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
2023-01-17 02:31:52,370 p=200645 u=stack n=ansible | undercloud : ok=690 changed=138 unreachable=0 failed=1 skipped=279 rescued=36 ignored=1
```

The error means that dcn1-compute-1.redhat.local isn't registered with Nova, so we can't assign it to the aggregate and availability zone. To understand why, we would need logs from the node to see if nova_compute is running and working fine. Let's collect that info and put it all on a new BZ to avoid creating confusion on this one.
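For the new BZ, the node-side checks would look something like this (a sketch; the tripleo-admin SSH user, container name, and log path are the usual OSP 17 defaults and may differ in this environment):

```bash
# From the undercloud, reach the affected DCN compute node
ssh tripleo-admin@dcn1-compute-1.redhat.local

# Is the nova_compute container running?
sudo podman ps --filter name=nova_compute

# Recent nova-compute output, from the container and from the host log file
sudo podman logs --tail 100 nova_compute
sudo tail -n 100 /var/log/containers/nova/nova-compute.log
```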
Hey,

I'm not sure where that file lives: jobs/DFG/edge/stages/overcloud_deploy_spine_leaf_multistack.groovy.inc

But something is clearly still trying to use sudo, or maybe Ansible using become: true or --become, since it's trying to access a file in the /root directory.

Description of problem:
Edge deployment failed at the Overcloud Multi-stack Spine Leaf deployment step; the export command failed with the error below:

FileNotFoundError: [Errno 2] No such file or directory: '/root/overcloud-deploy/central/central-passwords.yaml'

Command executed:

```
sudo --preserve-env openstack overcloud export --force-overwrite --stack central --output-file /home/stack/central-export.yaml
```

Version-Release number of selected component (if applicable):
17.0

How reproducible:
100% reproducible

Steps to Reproduce:
1. Deploy an RHOSP 17.0 multistack deployment with Controller: 3, Compute: 2, FreeIPA: 1, and TLS everywhere, extending to 2 DCN nodes.
2. Network protocol IPv4, no external storage.

Actual results:

```
hypervisor | FAILED | rc=1 >>
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.9/site-packages/cliff/command.py", line 186, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/overcloud_export.py", line 105, in take_action
    data = export.export_overcloud(
  File "/usr/lib/python3.9/site-packages/tripleoclient/export.py", line 254, in export_overcloud
    data = export_passwords(working_dir, stack, excludes)
  File "/usr/lib/python3.9/site-packages/tripleoclient/export.py", line 54, in export_passwords
    with open(passwords_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/overcloud-deploy/central/central-passwords.yaml'
[Errno 2] No such file or directory: '/root/overcloud-deploy/central/central-passwords.yaml'
non-zero return code
```

Expected results:
Overcloud multistack deployment successful

Additional info:
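A minimal illustration of the path difference behind the failure, assuming the client resolves its working directory from the invoking user's home (both commands are the ones discussed above):

```bash
# Failing form: in this run the export looked for
# /root/overcloud-deploy/central/central-passwords.yaml
sudo --preserve-env openstack overcloud export --force-overwrite \
    --stack central --output-file /home/stack/central-export.yaml

# Suggested form: as the stack user the same file should resolve under
# /home/stack/overcloud-deploy/central/
openstack overcloud export --force-overwrite \
    --stack central --output-file /home/stack/central-export.yaml
```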