Bug 1132155
| Field | Value |
|---|---|
| Summary | Rubygem-Staypuft: HA-nova deployment fails - puppet run completed with errors: Could not prefetch keystone_endpoint provider 'keystone': Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.9:35357/v2.0/ endpoint-list' returned 1: Unable to e… |
| Product | Red Hat OpenStack |
| Component | openstack-foreman-installer |
| Version | Foreman (RHEL 6) |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | z1 |
| Target Release | Installer |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Crag Wolfe <cwolfe> |
| QA Contact | Leonid Natapov <lnatapov> |
| CC | aberezin, augol, breeler, cwolfe, mburns, morazi, ohochman, racedoro, rhos-maint, sclewis, yeylon |
| Keywords | TestOnly, Triaged |
| Type | Bug |
| Doc Type | Bug Fix |
| Last Closed | 2014-10-02 12:56:29 UTC |

Doc Text:

Previously, Puppet ran before the database was ready across all nodes. As a result, services would try to start up before the database was ready, causing intermittent failures. This has been fixed by tighter orchestration with respect to the database deployment: services now start only once the database is ready.
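To make the doc text concrete, the sketch below shows the intended ordering on a controller node. This is a minimal illustration of the fix's intent, not the installer's actual manifest logic: it assumes the standard galera clustercheck helper at /usr/bin/clustercheck and a service named openstack-keystone.

```bash
# Minimal sketch of the fix's intent -- NOT the installer's code.
# Block until the local galera node reports synced; clustercheck
# exits 0 only when the node is ready to serve queries.
until /usr/bin/clustercheck >/dev/null 2>&1; do
    sleep 5
done

# Only once the database is ready: populate the schema, then start
# the service (the ordering this bug's fix enforces).
keystone-manage db_sync
systemctl start openstack-keystone
```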
Description (Alexander Chuzhoy, 2014-08-20 19:55:07 UTC)

Created attachment 928935: tarball of /var/log/messages and /var/log/mariadb/mariadb.log from the 3 nodes
Based on /var/log/messages from 10.8.30.210, it looks like galera was up and running until it was added as a pacemaker resource, which resulted in "systemd: Stopped MariaDB database server." and in turn caused the keystone db_sync to fail:

```
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Galera/Exec[all-galera-nodes-are-up]/returns) executed successfully
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Keystone_config[DEFAULT/max_retries]/ensure) created
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Firewall::Keystone/Firewall[001 keystone incoming]/ensure) created
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_monitor_0 (call=45, rc=0, cib-update=63, confirmed=true) ok
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 systemd: Stopping MariaDB database server...
Aug 20 17:20:17 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_stop_0 (call=46, rc=0, cib-update=64, confirmed=true) ok
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 systemd: Starting MariaDB database server...
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_start_0 (call=47, rc=7, cib-update=65, confirmed=true) not running
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mysqld (INFINITY)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 221: fail-count-mysqld=INFINITY
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mysqld (1408555219)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 224: last-failure-mysqld=1408555219
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mysqld (INFINITY)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 227: fail-count-mysqld=INFINITY
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mysqld (1408555219)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 230: last-failure-mysqld=1408555219
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_monitor_0 (call=51, rc=7, cib-update=66, confirmed=true) not running
Aug 20 17:20:19 maca25400702875 systemd: Reloading.
Aug 20 17:20:19 maca25400702875 systemd: Starting MariaDB database server...
Aug 20 17:20:19 maca25400702875 systemd: Stopping MariaDB database server...
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_stop_0 (call=53, rc=0, cib-update=67, confirmed=true) ok
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: Adding inet address 192.168.0.95/32 with broadcast address 192.168.0.255 to device eth2
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: Bringing device eth2 up
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.95 eth2 192.168.0.95 auto not_used not_used
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_start_0 (call=52, rc=0, cib-update=68, confirmed=true) ok
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_monitor_30000 (call=54, rc=0, cib-update=69, confirmed=false) ok
Aug 20 17:20:21 maca25400702875 mysqld_safe: 140820 17:20:21 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended
Aug 20 17:20:21 maca25400702875 systemd: Stopped MariaDB database server.
Aug 20 17:20:25 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip]/returns) executed successfully
Aug 20 17:20:26 maca25400702875 puppet-agent[3169]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
```

A subsequent re-deployment got past the HA controllers and failed on compute:

```
Execution of '/usr/bin/nova-manage network create novanetwork 192.168.32.0/23 2 --vlan 10' returned 1: Command failed, please check log for more info
```

This happened to me when I attempted an HA-Nova deployment using Staypuft: puppet failed to run 'keystone-manage db_sync', and because of that the deployment failed. When I ran 'keystone-manage db_sync' manually on the controllers it worked fine; I then re-ran the puppet agent and the deployment finished successfully.
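In shell terms, that manual workaround is simply the following, run on the affected controller(s) (shown here for reference only):

```bash
# Workaround from the comment above: run the failed sync by hand,
# then re-trigger a puppet run so the deployment converges.
keystone-manage db_sync
puppet agent --test
```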
With openstack-puppet-modules-2014.1-20.2.el6ost.noarch, /var/log/messages shows:

```
Aug 21 08:57:53 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Quickstack::Firewall::Keystone/Firewall[001 keystone incoming]/ensure) created
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: /tmp/ha-all-in-one-util.bash i_am_vip 192.168.0.95 || /tmp/ha-all-in-one-util.bash property_exists keystone returned 1 instead of one of [0]
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip]/returns) change from notrun to 0 failed: /tmp/ha-all-in-one-util.bash i_am_vip 192.168.0.95 || /tmp/ha-all-in-one-util.bash property_exists keystone returned 1 instead of one of [0]
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Skipping because of failed dependencies
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) keystone-manage db_sync returned 1 instead of one of [0]
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Skipping because of failed dependencies
Aug 21 09:59:45 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Triggered 'refresh' from 27 events
Aug 21 09:59:45 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Service[keystone]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
```

Omri, more than likely one of the db nodes was down when this keystone db_sync was attempted. We would need to see /var/log/messages from all three nodes to verify, however (and /var/log/mariadb/mariadb.log would help, assuming it is a db issue). My guess is that mariadb was restarted by pacemaker and the keystone db_sync was attempted before that restart was complete. But without the logs, it is just speculation.

Reproduced with HA-neutron (vlan networking type).

Patch posted: https://github.com/redhat-openstack/astapor/pull/359

Alternate approach here: https://github.com/redhat-openstack/astapor/pull/360

It's not an alternate approach to PR 359 (though perhaps "alternate" was intended to mean an aspect of the PR), afaict. From the comments in PR 359: "For the initial puppet run, once galera is fully bootstrapped, shut it down before we add it as a pacemaker resource which will start it back up." and "Make sure the galera cluster is up with one last clustercheck." PR 360 is a pure puppet ordering change; PR 359 is not.
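Going only by the PR 359 comments quoted above, the approach amounts to something like this sketch (the pcs invocation and resource name are illustrative assumptions, not taken from the patch):

```bash
# Sketch of the PR 359 approach, per its quoted comments -- illustrative only.

# "Make sure the galera cluster is up with one last clustercheck."
/usr/bin/clustercheck || exit 1

# "Once galera is fully bootstrapped, shut it down before we add it as
# a pacemaker resource which will start it back up."
systemctl stop mariadb

# Pacemaker now owns mysqld; adding the resource starts it back up.
# (Resource name and options are assumptions, not from the patch.)
pcs resource create mysqld systemd:mariadb op monitor interval=30s --clone
```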
Reproduced with puddle 2014-08-27.4:

```
openstack-foreman-installer-2.0.22-1.el6ost.noarch
rhel-osp-installer-0.1.10-2.el6ost.noarch
openstack-puppet-modules-2014.1-21.4.el6ost.noarch
```

Steps: attempt to deploy HA-nova (3 x controller, 1 x compute).

journalctl -u puppet:

```
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: Could not prefetch keystone_endpoint provider 'keystone': Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.7:35357/v2.0/ endpoint-list' returned 1: Unable to establish connection to http://192.168.0.7:35357/v2.0/endpoints
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: (/Stage[main]/Heat::Keystone::Auth/Keystone_endpoint[RegionOne/heat]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: (/Stage[main]/Heat::Keystone::Auth/Keystone_endpoint[RegionOne/heat]) Skipping because of failed dependencies
```

Please attach /var/log/messages and /var/log/mariadb/mariadb.log for the 3 nodes so we can try to debug this.

Tried to reproduce a couple of times with:

```
rhel-osp-installer-0.1.10-2.el6ost.noarch
openstack-foreman-installer-2.0.22-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-21.7.el6ost.noarch
```

The reported issue did not reproduce. Instead, the issue reported in comment #3 reproduced; created BZ #1138407.

With:

```
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-21.8.el6ost.noarch
rhel-osp-installer-0.3.5-1.el6ost.noarch
openstack-foreman-installer-2.0.24-1.el6ost.noarch
```

it does not reproduce either.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1350.html

Closed; no need for needinfo.