Description of problem:
Deployment currently fails with:

2016-08-10 11:28:30 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-08-10 11:28:31 [0]: SIGNAL_COMPLETE Unknown
2016-08-10 11:28:31 [overcloud-ControllerNodesPostDeployment-aa2zt557lizs-ControllerServicesBaseDeployment_Step2-ualpuau5qu3e]: CREATE_FAILED Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6

pcs status shows:

 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-ctrl-0 overcloud-ctrl-1 overcloud-ctrl-2 ]
 ip-10.122.4.8 (ocf::heartbeat:IPaddr2): Started overcloud-ctrl-1
 ip-10.122.4.9 (ocf::heartbeat:IPaddr2): Started overcloud-ctrl-1
 Clone Set: openstack-core-clone [openstack-core]
     Started: [ overcloud-ctrl-0 overcloud-ctrl-1 overcloud-ctrl-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-ctrl-2 ]
     Slaves: [ overcloud-ctrl-0 overcloud-ctrl-1 ]
 Master/Slave Set: galera-master [galera]
     galera (ocf::heartbeat:galera): FAILED Master overcloud-ctrl-2 (unmanaged)
     galera (ocf::heartbeat:galera): FAILED Master overcloud-ctrl-0 (unmanaged)
     Masters: [ overcloud-ctrl-1 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-ctrl-0 overcloud-ctrl-1 overcloud-ctrl-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-ctrl-0 overcloud-ctrl-1 overcloud-ctrl-2 ]

Failed Actions:
* galera_promote_0 on overcloud-ctrl-2 'unknown error' (1): call=68, status=complete, exitreason='MySQL server failed to start (pid=8941) (rc=0), please check your installation',
    last-rc-change='Wed Aug 10 12:40:13 2016', queued=0ms, exec=38480ms
* ip-10.122.4.9_monitor_10000 on overcloud-ctrl-1 'unknown' (189): call=76, status=Error, exitreason='none',
    last-rc-change='Wed Aug 10 12:38:33 2016', queued=0ms, exec=0ms
* ip-10.122.4.8_monitor_10000 on overcloud-ctrl-1 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
    last-rc-change='Wed Aug 10 12:39:53 2016', queued=0ms, exec=0ms
* openstack-core_monitor_10000 on overcloud-ctrl-1 'unknown' (189): call=81, status=Error, exitreason='none',
    last-rc-change='Wed Aug 10 12:38:33 2016', queued=0ms, exec=0ms
* galera_promote_0 on overcloud-ctrl-0 'unknown error' (1): call=70, status=complete, exitreason='MySQL server failed to start (pid=26882) (rc=0), please check your installation',
    last-rc-change='Wed Aug 10 12:40:13 2016', queued=0ms, exec=38507ms

PCSD Status:
  overcloud-ctrl-0: Online
  overcloud-ctrl-1: Online
  overcloud-ctrl-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

mysql logs show:

2016-08-10 12:40:49 139724652165248 [Note] WSREP: view((empty))
2016-08-10 12:40:49 139724652165248 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():162
2016-08-10 12:40:49 139724652165248 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():206: Failed to open backend connection: -110 (Connection timed out)
2016-08-10 12:40:49 139724652165248 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1379: Failed to open channel 'galera_cluster' at 'gcomm://overcloud-ctrl-exeter-0,overcloud-ctrl-exeter-1,overcloud-ctrl-exeter-2': -110 (Connection timed out)
2016-08-10 12:40:49 139724652165248 [ERROR] WSREP: gcs connect failed: Connection timed out
2016-08-10 12:40:49 139724652165248 [ERROR] WSREP: wsrep::connect(gcomm://overcloud-ctrl-exeter-0,overcloud-ctrl-exeter-1,overcloud-ctrl-exeter-2) failed: 7
2016-08-10 12:40:49 139724652165248 [ERROR] Aborting

Version-Release number of selected component (if applicable):
This uses the current Mitaka stable images. Nightly Delorean images do not exhibit this problem.

How reproducible:
Always

Steps to Reproduce:
1. Build images as follows:
   Edit /usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_image.py and add rdo-release so the images are built from stable.
   # mkdir ~/images
   # cd ~/images
   # export RDO_RELEASE=mitaka
   # openstack overcloud image build --all
   # openstack overcloud image upload --update-existing
2. Deploy overcloud
I've experienced the same problem, and it looks like different versions of the resource-agents package produce different results. Technically this should be moved under the resource-agents component, which might mean moving it out of the RDO product and into CentOS, but I'll get someone to confirm.
I have an environment that reproduces this 100% of the time. Looking closer, this might not be a resource-agents issue; perhaps it is an issue with the mariadb or galera packages themselves.
Created attachment 1189856: sosreports from controller nodes
Sosreports from the controller nodes experiencing the problem are attached.
OK, I have narrowed down the issue. It looks like the version of galera we should be using is galera-25.3.5-6.el7.x86_64, which is provided by the openstack-mitaka repo. If you have EPEL enabled on the machine, you will instead get galera-25.3.12-2.el7.x86_64, which evidently has an issue with the version of mysql-server-galera we are using. Basically, you need to make absolutely sure that EPEL is not enabled on your overcloud images, and that the version of galera you are using is the one we ship as part of RDO. This is a problem because python-openstackclient forces EPEL to be enabled even though it shouldn't (the reason has something to do with diskimage-builder).
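The two conditions described in this comment (EPEL enabled, and a galera build other than the RDO one) can be checked with a couple of shell helpers. This is a minimal sketch, assuming a CentOS 7 host or image with yum and rpm available; the function names epel_enabled and galera_ok are hypothetical, and the expected build string is simply the one named in this comment. Both helpers read command output on stdin, so they can also be tried against saved output from a node.

```shell
#!/bin/bash
# Hypothetical helpers; both read command output on stdin.
# Assumption: the "good" build is the galera build named in this comment.
EXPECTED_GALERA="galera-25.3.5-6.el7.x86_64"

# Succeeds if `yum repolist enabled` output lists an EPEL repo id such as
# "epel/x86_64" or "epel-testing/x86_64" (optionally prefixed by ! or *).
epel_enabled() {
    grep -Eq '^[!*]?epel([-/[:space:]]|$)'
}

# Reads `rpm -qa` output (NAME-VERSION-RELEASE.ARCH, one per line).
# Prints any galera package other than EXPECTED_GALERA and fails if one
# was found; succeeds when galera is absent or matches exactly.
galera_ok() {
    local bad
    bad=$(grep -E '^galera-[0-9]' | grep -vxF "$EXPECTED_GALERA" || true)
    if [ -n "$bad" ]; then
        printf 'unexpected galera package: %s\n' "$bad"
        return 1
    fi
    return 0
}

# Usage on a controller node:
#   yum repolist enabled | epel_enabled && echo "EPEL is enabled"
#   rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | galera_ok
```

Reading from stdin rather than calling yum and rpm directly keeps the checks usable on output collected from another machine, for example from a sosreport.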
No, the issue is that you're using standalone galera instead of MariaDB 10.1 directly. The standalone Galera package has been deprecated by MariaDB 10.1, which now includes Galera. We've been shipping it since the Mitaka release: http://cbs.centos.org/koji/buildinfo?buildID=10246
In short, do not use standalone galera for Mitaka and newer releases.
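Following this comment's advice, the check on a node becomes: no standalone galera package installed, and a MariaDB 10.1 build present. This is a minimal sketch, assuming MariaDB packages whose names start with "mariadb" and whose version begins with 10.1, as listed by `rpm -qa --qf '%{NAME} %{VERSION}\n'`; the helper name check_mariadb_galera is hypothetical, and the exact package names in the CBS build should be confirmed before relying on it.

```shell
#!/bin/bash
# Hypothetical helper. Reads "NAME VERSION" lines on stdin, e.g. from
#   rpm -qa --qf '%{NAME} %{VERSION}\n'
# Assumption: MariaDB 10.1 packages are named mariadb* with version 10.1.x.
# Prints a one-line verdict; returns 0 only if MariaDB 10.1 is installed
# and no standalone "galera" package is.
check_mariadb_galera() {
    awk '
        $1 == "galera"                          { standalone = $2 }
        $1 ~ /^mariadb/ && $2 ~ /^10\.1(\.|$)/  { mariadb101 = $2 }
        END {
            if (standalone != "") { print "standalone galera " standalone " installed"; exit 1 }
            if (mariadb101 == "")  { print "MariaDB 10.1 not found"; exit 1 }
            print "ok: MariaDB " mariadb101 " (bundled Galera)"
        }'
}

# Usage on a controller node:
#   rpm -qa --qf '%{NAME} %{VERSION}\n' | check_mariadb_galera
```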
(In reply to Haïkel Guémar from comment #6)
> No, the issue is that you're using galera instead of mariadb 10.1 directly.
>
> Galera (standalone version) has been deprecated by MariaDB 10.1 which now
> includes the former. We've been shipping it since Mitaka release.
> http://cbs.centos.org/koji/buildinfo?buildID=10246
>
> In short, do not use standalone galera for Mitaka and newer releases.

But isn't that what gets rolled into the image when we build it? "We" are not requesting Galera at any point.
(In reply to Haïkel Guémar from comment #6)
> No, the issue is that you're using galera instead of mariadb 10.1 directly.
>
> Galera (standalone version) has been deprecated by MariaDB 10.1 which now
> includes the former. We've been shipping it since Mitaka release.
> http://cbs.centos.org/koji/buildinfo?buildID=10246
>
> In short, do not use standalone galera for Mitaka and newer releases.

Yes, we understand that. The problem is that the TripleO image building process (openstack overcloud image build --all) pulls in EPEL and the bad galera package, and we have no way of avoiding that. The process itself is broken: "we" aren't doing anything, TripleO is. This slipped through because the way images are built in CI and in CBS is different from how users are expected to build them.
Hello, I'm closing this. I'm not sure whether the general QA issue with image building has been addressed, but I ended up just using the CBS images. I will re-open this on the next RDO project if the issue persists.