Description of problem:

When stacking OSP 16.2 on top of RHEL 8.3 as both the host and the container operating system, the overcloud deployment fails every time.

The "top level" overcloud deployment log says:

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud Endpoint: http://10.0.0.125:5000
Overcloud Horizon Dashboard URL: http://10.0.0.125:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed with error

The tail of the above-mentioned ansible.log shows:

2020-08-20 11:57:44,903 p=940 u=mistral n=ansible | TASK [Start containers for step 2 using paunch] ********************************
2020-08-20 11:57:44,904 p=940 u=mistral n=ansible | Thursday 20 August 2020 11:57:44 +0000 (0:00:00.375) 0:36:10.638 *******
2020-08-20 11:57:45,988 p=940 u=mistral n=ansible | changed: [controller-0] => {"ansible_job_id": "801936624929.47900", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/801936624929.47900", "started": 1}
2020-08-20 11:57:46,142 p=940 u=mistral n=ansible | changed: [ceph-0] => {"ansible_job_id": "703076992934.22717", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/703076992934.22717", "started": 1}
2020-08-20 11:57:46,195 p=940 u=mistral n=ansible | changed: [compute-0] => {"ansible_job_id": "547505552882.23026", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/547505552882.23026", "started": 1}
2020-08-20 11:57:46,322 p=940 u=mistral n=ansible | TASK [Wait for containers to start for step 2 using paunch] ********************
2020-08-20 11:57:46,323 p=940 u=mistral n=ansible | Thursday 20 August 2020 11:57:46 +0000 (0:00:01.418) 0:36:12.057 *******
2020-08-20 11:57:46,959 p=940 u=mistral n=ansible | WAITING FOR COMPLETION: Wait for containers to start for step 2 using paunch (1200 retries left).
2020-08-20 11:57:47,062 p=940 u=mistral n=ansible | ok: [ceph-0] => {"action": [], "ansible_job_id": "703076992934.22717", "attempts": 1, "changed": false, "finished": 1}
...
2020-08-20 11:58:32,283 p=940 u=mistral n=ansible | ok: [compute-0] => {"action": ["Applying config_id tripleo_step2"], "ansible_job_id": "547505552882.23026", "attempts": 14, "changed": false, "finished": 1, "rc": 0, "stderr": "$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat. ...
...
2020-08-20 12:40:07,690 p=940 u=mistral n=ansible | WAITING FOR COMPLETION: Wait for containers to start for step 2 using paunch (469 retries left).
2020-08-20 12:40:19,976 p=940 u=mistral n=ansible | fatal: [controller-0]: FAILED! => {"ansible_job_id": "801936624929.47900", "attempts": 733, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 6, "stderr": "$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ct ...
2020-08-20 12:40:21,965 p=940 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************
2020-08-20 12:40:21,970 p=940 u=mistral n=ansible | PLAY RECAP *********************************************************************
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | ceph-0 : ok=175 changed=95 unreachable=0 failed=0 skipped=297 rescued=0 ignored=0
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | compute-0 : ok=204 changed=119 unreachable=0 failed=0 skipped=266 rescued=0 ignored=0
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | controller-0 : ok=244 changed=151 unreachable=0 failed=1 skipped=263 rescued=0 ignored=0
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | undercloud : ok=67 changed=28 unreachable=0 failed=0 skipped=62 rescued=0 ignored=0
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | Thursday 20 August 2020 12:40:21 +0000 (0:42:35.648) 1:18:47.706 *******
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | ===============================================================================
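To narrow down which of the step 2 containers actually failed on controller-0, the paunch-started containers can be inspected directly with podman on the controller. A minimal check, assuming the standard OSP 16 container names (mysql_init_bundle is the container that turned out to be relevant here; names may differ per deployment):

[root@controller-0 ~]# podman ps -a --filter 'name=init_bundle' --format '{{.Names}} {{.Status}}'
[root@controller-0 ~]# podman inspect mysql_init_bundle --format 'exit code: {{.State.ExitCode}}'
[root@controller-0 ~]# podman logs mysql_init_bundle 2>&1 | tail -n 100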
While it's not clear from the above log files what exactly the problem is (many puppet tasks report warnings and errors, but most of them seem to be a 'regular' thing: compared with a successful OSP 16.1/EL 8.2 deployment, they exist there too), after checking the mysql_init_bundle container it turns out it exits with code 1 (whereas a successful deployment exits with 0), and its logs show:

Debug: Stored state in 0.01 seconds
Changes:
            Total: 4
Events:
          Failure: 2
          Success: 4
            Total: 6
Resources:
           Failed: 2
          Changed: 4
          Skipped: 59
      Out of sync: 6
            Total: 67
Time:
        File line: 0.00
             File: 0.05
         Last run: 1597925222
 Config retrieval: 3.67
    Pcmk property: 492.60
            Total: 493.27
Version:
           Config: 1597924725
           Puppet: 5.5.10
Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Above this summary and error there are many repeated blocks like this one:

Debug: Sleeping for 10 seconds between tries
Debug: backup_cib: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 returned
Debug: try 20/20: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true
Debug: Error: Error: unable to set attribute galera-role
crm_attribute: Connection to local file '/var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3' failed: Update does not conform to the configured schema
Error connecting to the CIB manager: Update does not conform to the configured schema

finishing with:

Debug: Pacemaker::Property[galera-role-controller-0]: Resource is being skipped, unscheduling all events
Debug: Pacemaker::Resource::Bundle[galera-bundle]: Resource is being skipped, unscheduling all events
Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true failed: Error: unable to set attribute galera-role. Too many tries
Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Property[galera-role-controller-0]/Pcmk_property[property-controller-0-galera-role]/ensure: change from 'absent' to 'present' failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true failed: Error: unable to set attribute galera-role. Too many tries
Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Resource::Bundle[galera-bundle]/Pcmk_bundle[galera-bundle]: Dependency Pcmk_property[property-controller-0-galera-role] has failures: true
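The "Update does not conform to the configured schema" message indicates that the CIB update generated from inside the mysql_init_bundle container is being rejected by pacemaker's schema validation. A few checks that may help confirm whether this is a pacemaker schema/version mismatch on the RHEL 8.3 controller (a diagnostic sketch only; output will vary per environment):

[root@controller-0 ~]# rpm -q pacemaker pcs
[root@controller-0 ~]# grep -o 'validate-with="[^"]*"' /var/lib/pacemaker/cib/cib.xml
[root@controller-0 ~]# pcs cluster cib | grep -o 'validate-with="[^"]*"'
[root@controller-0 ~]# crm_verify --live-check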
Version-Release number of selected component (if applicable):

puddle RHOS_TRUNK-16.2-RHEL-8-20200811.n.0
OSP 16.2
RHEL 8.3

[root@controller-0 ~]# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 Beta (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:beta"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3 Beta"
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
Red Hat OpenStack Platform release 16.2.0 Beta (Train)
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)

How reproducible:
100%

Steps to Reproduce:
1. Deploy the OSP 16.2 puddle (RHOS_TRUNK-16.2-RHEL-8-20200811.n.0) on RHEL 8.3 hosts and containers.
2. Run the overcloud deployment.
3.

Actual results:
The overcloud deployment fails during "Start containers for step 2 using paunch" on controller-0; the mysql_init_bundle container exits with code 1.

Expected results:
The overcloud deploys successfully.

Additional info:
I currently have a machine where this OSP 16.2 problem is reproduced; it is available for troubleshooting for the next few days.
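One more data point that may be worth collecting on that machine is whether the pacemaker/pcs versions shipped inside the 16.2 container images match the ones on the RHEL 8.3 host. A sketch only: the image reference on the last line is a placeholder for whatever podman images reports for the mariadb image, and some of the queried packages may legitimately be absent on one side or the other:

[root@controller-0 ~]# rpm -q pacemaker pacemaker-remote pcs
[root@controller-0 ~]# podman images --format '{{.Repository}}:{{.Tag}}' | grep mariadb
[root@controller-0 ~]# podman run --rm <mariadb image from the previous command> rpm -q pacemaker pacemaker-remote pcs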
*** This bug has been marked as a duplicate of bug 1869379 ***