Bug 1483643 - [OSP12] overcloud deployment fails on Bare-metal due to "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use"
Summary: [OSP12] overcloud deployment fails on Bare-metal due to "ERROR 1047 (08S01) a...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
high
high
Target Milestone: beta
: 12.0 (Pike)
Assignee: Dan Prince
QA Contact: Omri Hochman
Andrew Burden
URL:
Whiteboard:
: 1492314 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-08-21 14:56 UTC by Omri Hochman
Modified: 2018-02-05 19:12 UTC (History)
14 users (show)

Fixed In Version: openstack-tripleo-heat-templates-7.0.3-0.20171023134947.8da5e1f.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:53:35 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1717250 0 None None None 2017-09-14 13:04:23 UTC
OpenStack gerrit 489277 0 'None' MERGED MariaDB: create clustercheck user at container bootstrap 2020-09-11 12:09:49 UTC
OpenStack gerrit 505663 0 'None' MERGED Disable all uses of wsrep-provider in mysql_bootstrap container 2020-09-11 12:09:48 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Omri Hochman 2017-08-21 14:56:41 UTC
[OSP12] overcloud deployment fails on Bare-metal due to "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use" 


Environment (puddle is 2017-08-18.3, images TAG=2017-08-18.2)
--------------------------------------------------------------
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch


steps :
-------
attempt deployment on BM :  3 controler , 1 compute  
 

results :
-----------
-  overcloud deployment fails:  

| AllNodesDeploySteps                          | e79423e8-dba6-4c0e-b350-0ed7869fd225                                                                                                                                                 | OS::TripleO::PostDeploySteps                                                                                                   | CREATE_FAILED   | 2017-06-13T16:10:47Z | overcloud                                                                                                                                                |
| ControllerDeployment_Step1                   | aa764d47-2a8b-4377-ad9f-4bfe7428527a                                                                                                                                                 | OS::Heat::StructuredDeploymentGroup                                                                                            | CREATE_FAILED   | 2017-06-13T16:38:43Z | overcloud-AllNodesDeploySteps-3rqma3wcyrum                                                                                                               |
| 2                                            | 7c4c33eb-b4db-466f-b2b8-ac222f0e861a                                                                                                                                                 | OS::Heat::StructuredDeployment                                                                                                 | CREATE_FAILED   | 2017-06-13T16:41:48Z | overcloud-AllNodesDeploySteps-3rqma3wcyrum-ControllerDeployment_Step1-r3rbji33anuj                                                                       |

--------------
failure view : 
--------------
2:36:00 2017-08-09 21:23:44Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:45Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:45Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:46Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:46Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 
02:36:00  Stack overcloud CREATE_FAILED 
02:36:00 
02:36:00 overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.1:
02:36:00   resource_type: OS::Heat::StructuredDeployment
02:36:00   physical_resource_id: d8769363-2a8f-4cab-a78a-77ea3e92dd85
02:36:00   status: CREATE_FAILED
02:36:00   status_reason: |
02:36:00     Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00   deploy_stdout: |
02:36:00     ...
02:36:00         ], 
02:36:00         "changed": false, 
02:36:00         "failed": true, 
02:36:00         "failed_when_result": true
02:36:00     }
02:36:00     	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/1f2a23d0-5e97-4e7a-9a3a-4fdaf0ba918d_playbook.retry
02:36:00     
02:36:00     PLAY RECAP *********************************************************************
02:36:00     localhost                  : ok=8    changed=1    unreachable=0    failed=1   
02:36:00     
02:36:00     (truncated, view all with --long)
02:36:00   deploy_stderr: |
02:36:00 
02:36:00 overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
02:36:00   resource_type: OS::Heat::StructuredDeployment
02:36:00   physical_resource_id: de764d1f-1256-4446-ad8c-f5e0b2a62a7a
02:36:00   status: CREATE_FAILED
02:36:00   status_reason: |
02:36:00     Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00   deploy_stdout: |
02:36:00     ...
02:36:00         ], 
02:36:00         "changed": false, 
02:36:00         "failed": true, 
02:36:00         "failed_when_result": true
02:36:00     }
02:36:00     	to retry, use: --limit @/var/lib/heaHeat Stack create failed.
02:36:00 Heat Stack create failed.
02:36:00 t-config/heat-config-ansible/34fd2f6f-de26-4db7-993b-f1b9765ac2fa_playbook.retry
02:36:00     
02:36:00     PLAY RECAP *********************************************************************
02:36:00     localhost                  : ok=8    changed=1    unreachable=0    failed=1   
02:36:00     
02:36:00     (truncated, view all with --long)
02:36:00   deploy_stderr: |
02:36:00 
02:36:00 Build step 'Virtualenv Builder' marked build as failure
02:36:01 Build step 'Groovy Postbuild' marked build as failure
02:36:01 Build step 'Groovy Postbuild' marked build as failure
02:36:01 [BFA] Scanning build for known causes...
02:36:02 .[BFA] Found failure cause(s):
02:36:02 [BFA] no-such-file-or-directory
02:36:02 [BFA] command not found from category shell
02:36:02 [BFA] Done. 1s
02:36:02 Started calculate disk usage of build
02:36:02 Finished Calculation of disk usage of build in 0 seconds
02:36:02 Started calculate disk usage of workspace
02:36:02 Finished Calculation of disk usage of workspace in 0 seconds
02:36:03 Finished: FAILURE




(undercloud) [stack@undercloud ~]$ heat deployment-show 7c4c33eb-b4db-466f-b2b8-ac222f0e861a | less
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "10d956cc-9468-426c-b309-d078e081be90",
  "config_id": "8a882786-c197-4b73-b981-ea6339ddcdc3",
  "output_values": {
    "deploy_stdout": "\nPLAY [localhost] ***************************************************************\n\nTASK [Gathering Facts] *********************************************************\nok: [localhost]\n\nTASK [Write the config_step hieradata] *****************************************\nchanged: [localhost]\n\nTASK [Run puppet host configuration for step 1] ********************************\nok: [localhost]\n\nTASK [debug] *******************************************************************\nok: [localhost] => {\n    \"(outputs.stderr|default('')).split('\\n')|union(outputs.stdout_lines|default([]))\": [\n        \"exception: connect failed\", \n        \"Warning: Undefined variable 'deploy_config_name'; \", \n        \"   (file & line not available)\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 54]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 55]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 56]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 66]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 68]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ntp/manifests/init.pp\\\", 89]:[\\\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\\\", 29]\", \n        \"Warning: ModuleLoader: module 'ssh' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\", \n        \"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\\\"/etc/puppet/modules/ssh/manifests/server.pp\\\", 12]:[\\\"/var/lib/tripleo-config/puppet_step_config.pp\\\", 31]\", \n        \"Warning: ModuleLoader: module 'timezone' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\", \n        \"Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend\", \n        \"Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\", \n        \"Notice: Compiled catalog for overcloud-controller-2.localdomain in environment production in 4.78 seconds\", \n        \"Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-options]/returns: executed successfully\", \n        \"Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-registry]/returns: executed successfully\", \n



--------------------------------------------------------------------

connecting to controller-2 :
---------------------------


[heat-admin@overcloud-controller-2 ~]$ sudo journalctl -u os-collect-config | grep ERROR
Aug 21 01:56:47 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:56:47,258] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:57:11 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:57:11,117] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:57:32 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:57:32,312] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:58:08 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:58:08,319] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:58:51 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:58:51,792] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:01:12 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:01:12,922] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:02:30 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:02:30,453] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:03:18 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:03:18,549] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:04:32 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:04:32,321] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:06:21 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:06:21,841] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:16:11 overcloud-controller-2 os-collect-config[3368]: , \n        \"OK, successfully used password, moving on...\\r\", \n        \"Setting the root password ensures that nobody can log into the MariaDB\\r\", \n        \"root user without the proper authorisation.\\r\", \n        \"Set root password? [Y/n] y\\r\", \n        \"New password: \\r\", \n        \"Re-enter new password: \\r\", \n        \"ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\\r\", \n        \"Password update failed!\\r\", \n        \"Cleaning up...\\r\", \n        \"Failed to get 'Remove anonymous users?' prompt\", \n        \"stderr: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json\", \n        \"INFO:__main__:Validating config file\", \n        \"INFO:__main__:Kolla config strategy set to: COPY_ALWAYS\", \n        \"INFO:__main__:Copying service configuration files\", \n        \"INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets\", \n        \"INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets\", \n        \"INFO:__main__:Deleting /etc/my.cnf.d/galera.cnf\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/galera.cnf to /etc/my.cnf.d/galera.cnf\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sysconfig/clustercheck to /etc/sysconfig/clustercheck\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/root/.my.cnf to /root/.my.cnf\", \n        \"INFO:__main__:Writing out command to execute\", \n        \"170821  2:15:59 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295\", \n        \"170821  2:16:01 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295\", \n        \"stdout: 1aadbb2545d1701665c7a1516d5dd60469779a4b67e5b4473038ffd3c3a7de48\"\n    ], \n    \"changed\": false, \n    \"failed\": true, \n    \"failed_when_result\": true\n}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8a882786-c197-4b73-b981-ea6339ddcdc3_p
Aug 21 02:16:12 overcloud-controller-2 os-collect-config[3368]: "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r",
Aug 21 02:16:12 overcloud-controller-2 os-collect-config[3368]: [2017-08-21 02:16:11,806] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/8a882786-c197-4b73-b981-ea6339ddcdc3_playbook.yaml. [2]
Aug 21 02:16:13 overcloud-controller-2 os-collect-config[3368]: [2017-08-21 02:16:13,816] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None

Comment 1 Omri Hochman 2017-08-21 15:00:21 UTC
Adding notes from IRC : 
------------------------

(1) <bandini> ohochman, mandre: I think the reason is we are missing https://github.com/openstack/tripleo-heat-templates/commit/7968f37f6e65ed542340437f23b6c267b8b1f2ce
from this tht


(2) [10:32:26]  <tvignaud>	ohochman: ahrechan dciabrin|pto  https://code.engineering.redhat.com/gerrit/115956 MariaDB: create clustercheck user at container bootstrap

Comment 4 Jon Schlueter 2017-09-13 22:25:18 UTC
above mentioned patch 489277 is in downstream build openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost

Comment 5 Omri Hochman 2017-09-13 22:28:42 UTC
looks like the issue reproduced with:
 openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost.noarch


  \"In order to log into MariaDB to secure it, we'll need the current\\r\", \n        \"password for the root user.  If you've just installed MariaDB, and\\r\", \n        \"you haven't set the root password yet, the password will be blank,\\r\", \n        \"so you should just press enter here.\\r\", \n        \"Enter current password for root (enter for none): \\r\", \n        \"OK, successfully used password, moving on...\\r\", \n        \"Setting the root password ensures that nobody can log into the MariaDB\\r\", \n        \"root user without the proper authorisation.\\r\", \n        \"Set root password? [Y/n] y\\r\", \n        \"New password: \\r\", \n        \"Re-enter new password: \\r\", \n        \"ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\\r\", \n        \"Password update failed!\\r\", \n        \"Cleaning up...\\r\", \n        \"Failed to get 'Remove anonymous users?' prompt\", \n        \"stderr: + '[' -e /var/lib/mysql/mysql ']'\", \n        \"+ kolla_start\", \n        \"INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json\", \n        \"INFO:__main__:Validating config file\", \n        \"INFO:__main__:Kolla config strategy set to: COPY_ALWAYS\", \n        \"INFO:__main__:Copying service configuration files\", \n        \"INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets\", \n        \"INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets\", \n        \"INFO:__main__:Deleting /etc/my.cnf.d/galera.cnf\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/galera.cnf to /etc/my.cnf.d/galera.cnf\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sysconfig/clustercheck to /etc/sysconfig/clustercheck\", \n        \"INFO:__main__:Copying /var/lib/kolla/config_files/src/root/.my.cnf to /root/.my.cnf\", \n        \"INFO:__main__:Writing out command to execute\", \n        \"170913 17:51:25 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295\", \n        \"170913 17:51:27 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295\", \n        \"stdout: 386224818d266b6f82532fad1ac3fe0ea45ecd04c1db754cb5f80dd8a7474a19\"\n    ], \n    \"changed\": false, \n    \"failed\": true, \n    \"failed_when_result\": true\n}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/2de16e6c-e601-4c18-bd1b-ce5a267accf8_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost                  : ok=8    changed=1    unreachable=0    failed=1   \n\n",

Comment 7 Michele Baldessari 2017-09-14 05:55:16 UTC
controller-2 failed in mysql_bootstrap:
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Error running ['docker', 'run', '--name', 'mysql_bootstrap', '--label', 'config_id=tripleo_step1', '--label', 'container_name=mysql_bootstrap', '--label', 'managed_by=paunch', '--label', 'config_data={\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"KOLLA_BOOTSTRAP=True\", \"KOLLA_KUBERNETES=True\", \"DB_MAX_TIMEOUT=60\", \"DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj\", \"DB_ROOT_PASSWORD=kZlnL8BpMl\"], \"start_order\": 1, \"command\": [\"bash\", \"-ecx\", \"if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\\\\nkolla_start\\\\nmysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &\\\\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \\'until mysqladmin -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" ping 2>/dev/null; do sleep 1; done\\'\\\\nmysql -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" -e \\\\\"CREATE USER \\'clustercheck\\'@\\'localhost\\' IDENTIFIED BY \\'${DB_CLUSTERCHECK_PASSWORD}\\';\\\\\"\\\\nmysql -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" -e \\\\\"GRANT PROCESS ON *.* TO \\'clustercheck\\'@\\'localhost\\' WITH GRANT OPTION;\\\\\"\\\\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" shutdown\"], \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json\", \"/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/mysql:/var/lib/mysql\"], \"image\": \"192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3\", \"detach\": false, \"net\": \"host\"}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=KOLLA_BOOTSTRAP=True', '--env=KOL
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: LA_KUBERNETES=True', '--env=DB_MAX_TIMEOUT=60', '--env=DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj', '--env=DB_ROOT_PASSWORD=kZlnL8BpMl', '--net=host', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json', '--volume=/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/mysql:/var/lib/mysql', '192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3', 'bash', '-ecx', 'if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\\nkolla_start\\nmysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &\\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \\'until mysqladmin -uroot -p\"${DB_ROOT_PASSWORD}\" ping 2>/dev/null; do sleep 1; done\\'\\nmysql -uroot -p\"${DB_ROOT_PASSWORD}\" -e \"CREATE USER \\'clustercheck\\'@\\'localhost\\' IDENTIFIED BY \\'${DB_CLUSTERCHECK_PASSWORD}\\';\"\\nmysql -uroot -p\"${DB_ROOT_PASSWORD}\" -e \"GRANT PROCESS ON *.* TO \\'clustercheck\\'@\\'localhost\\' WITH GRANT OPTION;\"\\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p\"${DB_ROOT_PASSWORD}\" shutdown']. [1]",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "stdout: Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "OK",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Filling help tables...",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "To start mysqld at boot time you have to copy",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "support-files/mysql.server to the right place for your system",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "To do so, start the server, then issue the following commands:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysqladmin' -u root password 'new-password'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysqladmin' -u root -h overcloud-controller-2 password 'new-password'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Alternatively you can run:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysql_secure_installation'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "which will also give you the option of removing the test",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "databases and anonymous user created by default.  This is",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "strongly recommended for production servers.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "See the MariaDB Knowledgebase at http://mariadb.com/kb or the",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "MySQL manual for more instructions.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can start the MariaDB daemon with:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "cd '/usr' ; /usr/bin/mysqld_safe --datadir='/var/lib/mysql'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can test the MariaDB daemon with mysql-test-run.pl",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "cd '/usr/mysql-test' ; perl mysql-test-run.pl",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Please report any problems at http://mariadb.org/jira",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "The latest information about MariaDB is available at http://mariadb.org/.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can find additional information about the MySQL part at:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "http://dev.mysql.com",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Support MariaDB development by buying support/new features from MariaDB",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Corporation Ab. You can contact us about this at sales.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Alternatively consider joining our community based development effort:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "http://mariadb.com/kb/en/contributing-to-the-mariadb-project/",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.xYeqx3' --pid-file='/var/lib/mysql/overcloud-controller-2-recover.pid'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:32 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "spawn mysql_secure_installation\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "/usr/bin/mysql_secure_installation: line 379: find_mysql_client: command not found\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "In order to log into MariaDB to secure it, we'll need the current\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "password for the root user.  If you've just installed MariaDB, and\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "you haven't set the root password yet, the password will be blank,\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "so you should just press enter here.\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Enter current password for root (enter for none): \r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "OK, successfully used password, moving on...\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Setting the root password ensures that nobody can log into the MariaDB\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "root user without the proper authorisation.\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Set root password? [Y/n] y\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "New password: \r",                                                                                                                                                                                                                                          
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Re-enter new password: \r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Password update failed!\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Cleaning up...\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Failed to get 'Remove anonymous users?' prompt",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "stderr: + '[' -e /var/lib/mysql/mysql ']'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "+ kolla_start",

Interestingly enough it did work on controller-0:
11c5320a7b9c        192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3     "bash -ecx 'if [ -e /"   11 hours ago        Exited (0) 11 hours ago                       mysql_bootstrap
 
Same on controller-1:
4c8be359a5db        192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3     "bash -ecx 'if [ -e /"   11 hours ago        Exited (0) 11 hours ago                       mysql_bootstrap
 
Only contorller-2 failed. If I start the container by hand I get the following:
[root@overcloud-controller-2 ~]# docker run --rm -it  -e "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS"  -e "KOLLA_BOOTSTRAP=True"  -e "KOLLA_KUBERNETES=True"  -e "DB_MAX_TIMEOUT=60"  -e "DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj"  -e "DB_ROOT_PASSWORD=kZlnL8BpMl"  -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"  -e "container=oci"  -e "KOLLA_BASE_DISTRO=rhel"  -e "KOLLA_INSTALL_TYPE=binary"  -e "KOLLA_INSTALL_METATYPE=rhos"  -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ "   -v /dev/log:/dev/log -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /var/lib/mysql:/var/lib/mysql -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/puppet-generated/mysql:/var/lib/kolla/config_files/src:ro -v /etc/hosts:/etc/hosts:ro -v /etc/puppet:/etc/puppet:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json -v /etc/localtime:/etc/localtime:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -ti 192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3 /bin/bash
()[root@overcloud-controller-2 /root]$ id
uid=42434(mysql) gid=42434(mysql) groups=42434(mysql),42400(kolla)
()[root@overcloud-controller-2 /root]$ export TERM=xterm
 
Now if I run the same command that THT runs:
()[root@overcloud-controller-2 /root]$ mysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &
[1] 10
()[root@overcloud-controller-2 /root]$ 170914 05:50:58 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
170914 05:50:58 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170914 05:50:58 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.G83dzc' --pid-file='/var/lib/mysql/003dbc6362fb-recover.pid'
170914 05:51:00 mysqld_safe WSREP: Recovered position 2db1852d-98ac-11e7-93b0-824a02ed3674:0
 
()[root@overcloud-controller-2 /root]$ more /var/log/mariadb/mariadb.log
170914 05:50:58 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170914 05:50:58 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.G83dzc' --pid-file='/var/lib/mysql/003dbc6362fb-recover.pid'
nohup: ignoring input
170914 05:51:00 mysqld_safe WSREP: Recovered position 2db1852d-98ac-11e7-93b0-824a02ed3674:0
170914  5:51:00 [Note] WSREP: wsrep_start_position var submitted: '2db1852d-98ac-11e7-93b0-824a02ed3674:0'
170914  5:51:00 [Warning] You need to use --log-bin to make --binlog-format work.
170914  5:51:00 InnoDB: The InnoDB memory heap is disabled
170914  5:51:00 InnoDB: Mutexes and rw_locks use GCC atomic builtins
170914  5:51:00 InnoDB: Compressed tables use zlib 1.2.7
170914  5:51:00 InnoDB: Using Linux native AIO
170914  5:51:00 InnoDB: Initializing buffer pool, size = 128.0M
170914  5:51:00 InnoDB: Completed initialization of buffer pool
170914  5:51:00 InnoDB: highest supported file format is Barracuda.
170914  5:51:00  InnoDB: Waiting for the background threads to start
170914  5:51:01 Percona XtraDB (http://www.percona.com) 5.5.41-MariaDB-37.0 started; log sequence number 1598283
170914  5:51:01 [Note] Plugin 'FEEDBACK' is disabled.
170914  5:51:01 [Note] Event Scheduler: Loaded 0 events
170914  5:51:01 [Note] WSREP: Initial position: 2db1852d-98ac-11e7-93b0-824a02ed3674:0
170914  5:51:01 [Note] WSREP: wsrep_load(): loading provider library 'none'
170914  5:51:01 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.42-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 0  MariaDB Server, wsrep_25.11.r4026
 
What I don't get is is this WSREP recovering position stuff is normal. Will chat with Damien once he is up. (leaving needinfo on me)

Comment 8 Damien Ciabrini 2017-09-14 08:48:30 UTC
From the logs, this seems to happen during the regular kolla bootstrap shell script, so _before_ we add the clustercheck user.

I'm currently looking at the kolla steps in detail to find out why it would randomly fail.

Comment 9 Damien Ciabrini 2017-09-14 13:06:53 UTC
Omri,

So it looks to me that some of the calls to msqld_safe were still happening with WSREP enabled while they shouldn't during the bootstrap of the DB.

Could you maybe try to apply this patch from https://review.openstack.org/#/c/504049/ and see if it fixes your issues?

Comment 10 Omri Hochman 2017-09-14 21:24:22 UTC
Hi Damien  thanks for your help , 
I saw the deployment fails again on the BM env after applying the patch.

I assume you're still reviewing the issue,  just wanted to show the output I'm getting on the controller :


[root@overcloud-controller-0 ~]# journalctl -u os-collect-config | grep ERROR
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: O:__main__:Copying service configuration files\", \n        \"INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets\", \n        \"INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets\", \n        \"ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/src/* file is not found\", \n        \"stdout: 4b35cbd91e0a094b5c07be5a1d156914d89ba2cab1f13b737b2e6f4f567bc936\"\n    ], \n    \"changed\": false, \n    \"failed\": true, \n    \"failed_when_result\": true\n}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost                  : ok=8    changed=1    unreachable=0    failed=1   \n\n", "deploy_stderr": "", "deploy_status_code": 2}
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: "ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/src/* file is not found",
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: [2017-09-14 19:34:12,671] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.yaml. [2]
 

not sure why this playbook fails : 

[root@overcloud-controller-0 ~]# less /var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.yaml

http://pastebin.test.redhat.com/516337

Comment 11 Damien Ciabrini 2017-09-19 06:18:53 UTC
Yes, the version you tested was still under review. I've fixed that error and did a deploy on a downstream env to validate it.

The review upstream has all the +2 needed, I'm waiting for the jobs to pass green.

Comment 12 Martin André 2017-09-20 13:32:11 UTC
*** Bug 1492314 has been marked as a duplicate of this bug. ***

Comment 13 Damien Ciabrini 2017-09-21 07:05:42 UTC
fix merged in Master and Pike upstream

Comment 15 Omri Hochman 2017-11-03 16:09:25 UTC
Unable to reproduce with : OpenStack-12.0-RHEL-7 Puddle: 2017-10-31.1

Comment 19 errata-xmlrpc 2017-12-13 21:53:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462


Note You need to log in before you can comment on or make changes to this bug.