Bug 1871195 - overcloud deployment fails, mysql_init_bundle container exits 1 with: Error: unable to set attribute galera-role
Summary: overcloud deployment fails, mysql_init_bundle container exits 1 with: Error: unable to set attribute galera-role
Keywords:
Status: CLOSED DUPLICATE of bug 1869379
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: mariadb-galera
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Damien Ciabrini
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-21 14:43 UTC by Waldemar Znoinski
Modified: 2020-08-21 15:07 UTC (History)
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-21 15:06:51 UTC
Target Upstream Version:
Embargoed:



Description Waldemar Znoinski 2020-08-21 14:43:00 UTC
Description of problem:
When stacking osp16.2 on top of a rhel8.3 host and container operating system, the overcloud deployment fails every time.

The "top level" overcloud deployment log says:
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud Endpoint: http://10.0.0.125:5000
Overcloud Horizon Dashboard URL: http://10.0.0.125:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed with error
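
For reference, the failing task can be located in that log with plain grep (a minimal sketch, using the log path reported above):

# on the undercloud, where the deployment was driven from
grep -n 'fatal:' /var/lib/mistral/overcloud/ansible.log | tail -n 5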


The tail of the above-mentioned ansible.log shows:
2020-08-20 11:57:44,903 p=940 u=mistral n=ansible | TASK [Start containers for step 2 using paunch] ********************************
2020-08-20 11:57:44,904 p=940 u=mistral n=ansible | Thursday 20 August 2020  11:57:44 +0000 (0:00:00.375)       0:36:10.638 ******* 
2020-08-20 11:57:45,988 p=940 u=mistral n=ansible | changed: [controller-0] => {"ansible_job_id": "801936624929.47900", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/801936624929.47900", "started": 1}
2020-08-20 11:57:46,142 p=940 u=mistral n=ansible | changed: [ceph-0] => {"ansible_job_id": "703076992934.22717", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/703076992934.22717", "started": 1}
2020-08-20 11:57:46,195 p=940 u=mistral n=ansible | changed: [compute-0] => {"ansible_job_id": "547505552882.23026", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/547505552882.23026", "started": 1}
2020-08-20 11:57:46,322 p=940 u=mistral n=ansible | TASK [Wait for containers to start for step 2 using paunch] ********************
2020-08-20 11:57:46,323 p=940 u=mistral n=ansible | Thursday 20 August 2020  11:57:46 +0000 (0:00:01.418)       0:36:12.057 ******* 
2020-08-20 11:57:46,959 p=940 u=mistral n=ansible | WAITING FOR COMPLETION: Wait for containers to start for step 2 using paunch (1200 retries left).
2020-08-20 11:57:47,062 p=940 u=mistral n=ansible | ok: [ceph-0] => {"action": [], "ansible_job_id": "703076992934.22717", "attempts": 1, "changed": false, "finished": 1}

...

2020-08-20 11:58:32,283 p=940 u=mistral n=ansible | ok: [compute-0] => {"action": ["Applying config_id tripleo_step2"], "ansible_job_id": "547505552882.23026", "attempts": 14, "changed": false, "finished": 1, "rc": 0, "stderr": "$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.

...

2020-08-20 12:40:07,690 p=940 u=mistral n=ansible | WAITING FOR COMPLETION: Wait for containers to start for step 2 using paunch (469 retries left).
2020-08-20 12:40:19,976 p=940 u=mistral n=ansible | fatal: [controller-0]: FAILED! => {"ansible_job_id": "801936624929.47900", "attempts": 733, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 6, "stderr": "$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-api:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-scheduler:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-glance-api:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:16.2_20200811.1\nb''\nb''\n$ podman image exists undercloud-0.ct

...

2020-08-20 12:40:21,965 p=940 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************                                                                              
2020-08-20 12:40:21,970 p=940 u=mistral n=ansible | PLAY RECAP *********************************************************************                                                                              
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | ceph-0                     : ok=175  changed=95   unreachable=0    failed=0    skipped=297  rescued=0    ignored=0                                            
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | compute-0                  : ok=204  changed=119  unreachable=0    failed=0    skipped=266  rescued=0    ignored=0                                            
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | controller-0               : ok=244  changed=151  unreachable=0    failed=1    skipped=263  rescued=0    ignored=0                                            
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | undercloud                 : ok=67   changed=28   unreachable=0    failed=0    skipped=62   rescued=0    ignored=0                                            
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | Thursday 20 August 2020  12:40:21 +0000 (0:42:35.648)       1:18:47.706 ******* 
2020-08-20 12:40:21,971 p=940 u=mistral n=ansible | =============================================================================== 


While it's not clear from the above log files what exactly the problem is (many puppet tasks complain with warnings and errors, but most of them seem to be a 'regular' thing; compared with a successful osp16.1/el8.2 deployment, they exist there too), checking the mysql_init_bundle container shows it exits with code 1 (where 0 is expected, as in a successful deployment), and its logs show:


Debug: Stored state in 0.01 seconds
Changes:
            Total: 4
Events:
          Failure: 2
          Success: 4
            Total: 6
Resources:
           Failed: 2
          Changed: 4
          Skipped: 59
      Out of sync: 6
            Total: 67
Time:
        File line: 0.00
             File: 0.05
         Last run: 1597925222
   Config retrieval: 3.67
    Pcmk property: 492.60
            Total: 493.27
Version:
           Config: 1597924725
           Puppet: 5.5.10
Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
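
For reference, the exit code and log above can be read back with standard podman commands along these lines (a sketch; the container name is the one paunch reports, run as root on controller-0):

podman ps -a --filter name=mysql_init_bundle --format '{{.Names}} {{.Status}}'
podman inspect mysql_init_bundle --format '{{.State.ExitCode}}'    # 0 on a successful deployment, 1 here
podman logs mysql_init_bundle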


Above this summary and error there are many lines like these:

Debug: Sleeping for 10 seconds between tries
Debug: backup_cib: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 returned 
Debug: try 20/20: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true
Debug: Error: Error: unable to set attribute galera-role
crm_attribute: Connection to local file '/var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3' failed: Update does not conform to the configured schema
Error connecting to the CIB manager: Update does not conform to the configured schema


The log finishes with:

Debug: Pacemaker::Property[galera-role-controller-0]: Resource is being skipped, unscheduling all events
Debug: Pacemaker::Resource::Bundle[galera-bundle]: Resource is being skipped, unscheduling all events
Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true failed: Error: unable to set attribute galera-role. Too many tries
Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Property[galera-role-controller-0]/Pcmk_property[property-controller-0-galera-role]/ensure: change from 'absent' to 'present' failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200820-9-1nlkxj3 node attribute controller-0 galera-role=true failed: Error: unable to set attribute galera-role. Too many tries
Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Resource::Bundle[galera-bundle]/Pcmk_bundle[galera-bundle]: Dependency Pcmk_property[property-controller-0-galera-role] has failures: true





Version-Release number of selected component (if applicable):
puddle RHOS_TRUNK-16.2-RHEL-8-20200811.n.0
osp16.2
rhel8.3

[root@controller-0 ~]# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 Beta (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:beta"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3 Beta"
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
Red Hat OpenStack Platform release 16.2.0 Beta (Train)
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
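
Since the failure is a CIB schema validation error, comparing the pacemaker stack on the rhel8.3 host with the one inside the container images may help (a sketch; the mariadb image name/tag is assumed from the puddle above, not copied from a log line):

# host-side versions on controller-0
rpm -q pacemaker pacemaker-remote pcs

# same query inside the galera/mariadb container image
# (package set inside the image may differ; bundles run pacemaker-remoted inside)
podman run --rm \
  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-mariadb:16.2_20200811.1 \
  rpm -q pacemaker pacemaker-remote pcs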



How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I currently have a machine exhibiting this osp16.2 problem; it's available for troubleshooting for the next few days.

Comment 1 Waldemar Znoinski 2020-08-21 15:06:51 UTC

*** This bug has been marked as a duplicate of bug 1869379 ***

Comment 2 Waldemar Znoinski 2020-08-21 15:07:35 UTC

*** This bug has been marked as a duplicate of bug 1869379 ***

