Bug 1904220
Summary: | openstack overcloud deploy fails during config-download phase | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Itai Levy <itailev> |
Component: | openstack-tripleo-common | Assignee: | Rabi Mishra <ramishra> |
Status: | CLOSED ERRATA | QA Contact: | David Rosenfeld <drosenfe> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 16.1 (Train) | CC: | ahleihel, cjeanner, drosenfe, hakhande, jhajyahy, jjoyce, jschluet, mburns, ramishra, slinaber, tvignaud |
Target Milestone: | z6 | Keywords: | Reopened, Triaged |
Target Release: | 16.1 (Train on RHEL 8.2) | | |
Hardware: | Unspecified | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | openstack-tripleo-common-11.4.1-1.20210310124600.75bd92a.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-05-26 13:49:37 UTC | Type: | Bug |
Description
Itai Levy 2020-12-03 20:38:23 UTC
Please let me know if a remote troubleshooting session is required to speed up investigation and resolution.

```
cat /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.2 GA (Train)
```

It seems the config-download working directory "overcloud" under /var/lib/mistral/, which should hold the ansible configs, is missing. Not sure why it's not created by the director...

Good point! If you create it and run again, does it work? Could you check that the file deployment/mistral/mistral-engine-container-puppet.yaml is there under tht and whether that directory is in the file (around line 128)?

Just to clarify, /var/lib/mistral/ is there and it includes the "ansible_fact_cache" directory, but the "overcloud" directory is missing. I already tried creating the "overcloud" directory under /var/lib/mistral/ and chowning it to the mistral user, then repeating overcloud deploy --stack-only followed by overcloud deploy --config-download-only, but it didn't help... I will try deleting the overcloud stack and recreating it.

The file deployment/mistral/mistral-engine-container-puppet.yaml is there, and it includes the mistral_engine container volume, although it is mounted read-only:

```yaml
docker_config:
  step_4:
    mistral_engine:
      image: {get_param: ContainerMistralEngineImage}
      net: host
      privileged: false
      restart: always
      healthcheck: {get_attr: [ContainersCommon, healthcheck_rpc_port]}
      volumes:
        list_concat:
          - {get_attr: [ContainersCommon, volumes]}
          -
            - /run:/run
            - /var/lib/kolla/config_files/mistral_engine.json:/var/lib/kolla/config_files/config.json:ro
            - /var/lib/config-data/puppet-generated/mistral:/var/lib/kolla/config_files/src:ro
            - /var/log/containers/mistral:/var/log/mistral:z
            - /var/lib/mistral:/var/lib/mistral:ro
            - /usr/share/ansible/:/usr/share/ansible/:ro
            - /usr/share/openstack-tripleo-validations:/usr/share/openstack-tripleo-validations:ro
```

Who should create the "overcloud" directory under /var/lib/mistral and place the ansible config files there?

Adriano, creating the directory didn't help. The only directory that is populated with per-node ansible files is /var/lib/mistral/ansible_fact_cache. Who should create the "overcloud" directory under /var/lib/mistral and place the ansible config files there? Is it the mistral engine container? How should we proceed? Is the error I get related to the OS::TripleO::SoftwareDeployment resource?

This is failing when downloading playbooks either from [1] or [2], as it's looking for a config with id None, so you won't have the config-downloaded playbooks yet. Is your heat stack in a good state? The details in this BZ are not enough to troubleshoot; there is surely something messed up. Is this a customer reported issue? I don't see a support case linked. Can you provide the undercloud heat db dump and sosreport to investigate?

[1] https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/utils/config.py#L231
[2] https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/utils/config.py#L94

Hi Rabi, this is not a customer reported issue; we see it in our lab.
As you can see, the stack is in CREATE_COMPLETE state and there are no stack failures:

```
(undercloud) [stack@rhosp-director ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| b41bdfb3-a893-46b5-becf-462af4b89098 | overcloud  | 109aa1ef23ec4e8091da354d6c465e24 | CREATE_COMPLETE | 2020-12-10T11:10:10Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
(undercloud) [stack@rhosp-director ~]$ openstack stack failures list overcloud
(undercloud) [stack@rhosp-director ~]$
```

See the attached archive, which includes:
- deployment yaml files
- sosreport
- db dumps

https://drive.google.com/file/d/1hl8fKDedN9_sT1MO2A9nSaHjsVWIz1Yz/view?usp=sharing

Your assistance is appreciated. Thanks, Itai

I don't see any issue with heat. I suspect it's probably some issue with swift (I remember something similar reported earlier), as I see the below in the mistral logs. Maybe you can check if swift is working on the undercloud? Try to clean up the overcloud-config container and run the deploy again.

```
2020-12-10 11:32:22.947 7 INFO workflow_trace [req-f23ba456-c129-441a-aeb0-f72f9c1ffa45 80ac31972a594c3aa353cf98e599b0e8 109aa1ef23ec4e8091da354d6c465e24 - default default] Workflow 'tripleo.swift.v1.container_exists' [RUNNING -> ERROR, msg=None] (execution_id=9dadcee0-be3b-42d3-acc3-2ca853c8832c)
2020-12-10 11:32:22.994 7 INFO mistral.engine.engine_server [req-f23ba456-c129-441a-aeb0-f72f9c1ffa45 80ac31972a594c3aa353cf98e599b0e8 109aa1ef23ec4e8091da354d6c465e24 - default default] Received RPC request 'on_action_complete'[action_ex_id=9dadcee0-be3b-42d3-acc3-2ca853c8832c, result=Result [data=None, error=Failed subworkflow [execution_id=9dadcee0-be3b-42d3-acc3-2ca853c8832c], cancel=False]]
2020-12-10 11:32:23.010 7 INFO workflow_trace [req-f23ba456-c129-441a-aeb0-f72f9c1ffa45 80ac31972a594c3aa353cf98e599b0e8 109aa1ef23ec4e8091da354d6c465e24 - default default] Task 'verify_container_doesnt_exist' (0b0746cb-c289-4db5-8888-f159b88828f2) [RUNNING -> ERROR, msg=None] (execution_id=79c0525c-6bd9-4c94-acf8-fae8045f300a)
```

Here is an updated link for the file download: https://drive.google.com/file/d/19mCMW7UJB-FxnhkNwg7ys_S3ZMLFNLl6/view?usp=sharing

The Swift containers seem to be up:

```
podman ps -a | grep -i swift
60441dea88e4  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-ironic-conductor:16.1    /usr/bin/bootstra...  5 hours ago  Exited (0) 5 hours ago  create_swift_temp_url_key
756a4b93d47b  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-proxy-server:16.1  kolla_start           5 hours ago  Up 5 hours ago          swift_proxy
24faceb843c5  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-object:16.1        kolla_start           5 hours ago  Up 5 hours ago          swift_rsync
ec6a7ce45493  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-object:16.1        kolla_start           5 hours ago  Up 5 hours ago          swift_object_updater
b1bbe51517cb  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-object:16.1        kolla_start           5 hours ago  Up 5 hours ago          swift_object_server
927fae3111ee  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-proxy-server:16.1  kolla_start           5 hours ago  Up 5 hours ago          swift_object_expirer
4403b0f33c29  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-container:16.1     kolla_start           5 hours ago  Up 5 hours ago          swift_container_updater
614329e519e4  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-container:16.1     kolla_start           5 hours ago  Up 5 hours ago          swift_container_server
d5391ca594ad  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-account:16.1       kolla_start           5 hours ago  Up 5 hours ago          swift_account_server
d444054a9d33  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-account:16.1       kolla_start           5 hours ago  Up 5 hours ago          swift_account_reaper
4edcdabb95be  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-account:16.1       chown -R swift: /...  5 hours ago  Exited (0) 5 hours ago  swift_setup_srv
721fc1463a02  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-object:16.1        /bin/bash -c sed ...  5 hours ago  Exited (0) 5 hours ago  swift_rsync_fix
96f40d1f1fbe  rhosp-director.ctlplane.localdomain:8787/rhosp-rhel8/openstack-swift-proxy-server:16.1  /bin/bash -c cp -...  5 hours ago  Exited (0) 5 hours ago  swift_copy_rings
```

I already tried:
- deleting the undercloud stack and re-deploying
- deleting the undercloud packages as advised in https://access.redhat.com/solutions/2210421m and reinstalling from scratch
- reinstalling 16.0 undercloud + overcloud instead of 16.1

Nothing helped, same error. I have a feeling that I am missing something basic here... Any idea how to proceed?

Itai

As I used the RHEL 8.2 DVD iso for the undercloud OS installation, the initial baremetal node introspection was failing, and I had to figure out that I needed to stop/disable the libvirtd service, which was occupying port 67 and preventing the ironic_inspector_dnsmasq container from coming up... Maybe there is another in-box RHEL 8.2 service that is interfering with the undercloud containers and preventing proper functionality?
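(Note: a minimal way to script the check and cleanup suggested a few comments up, verifying that Swift on the undercloud answers and recreating the overcloud-config container, is sketched below using openstacksdk. The cloud name "undercloud" and the decision to delete the container are illustrative assumptions, not steps actually taken in this report.)

```python
import openstack

# Connect with the undercloud credentials; a clouds.yaml entry named
# 'undercloud' is assumed here -- adjust to match the environment.
conn = openstack.connect(cloud='undercloud')

# List Swift containers to confirm the object store responds at all.
for container in conn.object_store.containers():
    print(container.name)

# If the overcloud-config container looks stale, empty it and remove it so
# the next 'openstack overcloud deploy' run recreates it. Swift refuses to
# delete a non-empty container, hence the per-object loop. Destructive:
# only run this against the undercloud's overcloud-config container on purpose.
for obj in conn.object_store.objects('overcloud-config'):
    conn.object_store.delete_object(obj, container='overcloud-config')
conn.object_store.delete_container('overcloud-config')
```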
```
[root@rhosp-director stack]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)

[root@rhosp-director stack]# iptables -L -n | grep swift
ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  multiport dports 8080 state NEW /* 100 swift_proxy_server_haproxy ipv4 */
ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  multiport dports 13808 state NEW /* 100 swift_proxy_server_haproxy_ssl ipv4 */
ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  multiport dports 8080,13808 state NEW /* 122 swift proxy ipv4 */
ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  multiport dports 873,6000,6001,6002 state NEW /* 123 swift storage ipv4 */

[root@rhosp-director stack]# netstat -pna | grep LISTEN | grep "8080\|13808\|873\|6000\|6001\|6002"
tcp  0  0  192.168.24.1:8080   0.0.0.0:*  LISTEN  730873/python3
tcp  0  0  192.168.24.1:6000   0.0.0.0:*  LISTEN  728194/python3
tcp  0  0  192.168.24.3:8080   0.0.0.0:*  LISTEN  704947/haproxy
tcp  0  0  192.168.24.2:13808  0.0.0.0:*  LISTEN  704947/haproxy
tcp  0  0  192.168.24.1:6001   0.0.0.0:*  LISTEN  727568/python3
tcp  0  0  192.168.24.1:6002   0.0.0.0:*  LISTEN  727285/python3
tcp  0  0  192.168.24.1:873    0.0.0.0:*  LISTEN  728651/rsync
```

Ah, I should have checked your templates earlier. The network templates are wrong:

```
[ramishra@ramishra-laptop deploy_yamls]$ cat controller.yaml
....
outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value:

[ramishra@ramishra-laptop deploy_yamls]$ cat computesriov.yaml
....
outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value:
```

They are missing the last line of the output value:

```yaml
outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value:
      get_resource: OsNetConfigImpl   # << Missing
```

So the network config ids are coming through as None.

It would be nice if the tool could say which key is missing rather than just crashing :)

> It would be nice if the tool could say which key is missing rather than just crashing :)
The key is there, but the value is empty, which is kind of valid for a template. Though it's very difficult to add checks for these kinds of mistakes in custom config templates, we'll put in a fix to check that the network config id is not None.
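(For illustration, the guard described above would amount to something like the minimal sketch below. The function name and call site are hypothetical, not the actual tripleo-common patch, but the message matches the error reported in the verification comment that follows.)

```python
def validate_network_config_id(config_id, role_name):
    """Hypothetical sketch of the check described above.

    config_id is the value of the role's network config deployment output
    (OS::stack_id); it comes back as None when the template leaves the
    output value empty, as in the controller.yaml/computesriov.yaml above.
    """
    if config_id is None:
        raise ValueError(
            'Invalid network config for role %s. Please check the network '
            'config templates used' % role_name)
    return config_id
```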
Thanks! Removed get_resource: OsNetConfigImpl from compute.yaml, deployed, and got the below error:

```
Invalid network config for role Compute. Please check the network config templates used
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097