Bug 1132155
| Field | Value |
|---|---|
| Summary | Rubygem-Staypuft: HA-nova deployment fails - puppet run completed with errors: Could not prefetch keystone_endpoint provider 'keystone': Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.9:35357/v2.0/ endpoint-list' returned 1: Unable to e… |
| Product | Red Hat OpenStack |
| Component | openstack-foreman-installer |
| Version | Foreman (RHEL 6) |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | z1 |
| Target Release | Installer |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Crag Wolfe <cwolfe> |
| QA Contact | Leonid Natapov <lnatapov> |
| CC | aberezin, augol, breeler, cwolfe, mburns, morazi, ohochman, racedoro, rhos-maint, sclewis, yeylon |
| Keywords | TestOnly, Triaged |
| Type | Bug |
| Doc Type | Bug Fix |
| Last Closed | 2014-10-02 12:56:29 UTC |

Doc Text:

Previously, Puppet ran before the database was ready across all nodes. As a result, services would try to start up before the database was ready, causing intermittent failures. This has been fixed by tighter orchestration with respect to the database deployment: services now start only once the database is ready.
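To make the doc text concrete, the sketch below shows the intended ordering on a controller node. This is a minimal illustration of the fix's intent, not the installer's actual manifest logic: it assumes the standard galera clustercheck helper at /usr/bin/clustercheck and a service named openstack-keystone.

```bash
# Minimal sketch of the fix's intent -- NOT the installer's code.
# Block until the local galera node reports synced; clustercheck
# exits 0 only when the node is ready to serve queries.
until /usr/bin/clustercheck >/dev/null 2>&1; do
    sleep 5
done

# Only once the database is ready: populate the schema, then start
# the service (the ordering this bug's fix enforces).
keystone-manage db_sync
systemctl start openstack-keystone
```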
Description (Alexander Chuzhoy, 2014-08-20 19:55:07 UTC)

Created attachment 928935: tarball of /var/log/messages and /var/log/mariadb/mariadb.log from the 3 nodes
Based on /var/log/messages from 10.8.30.210, it looks like galera was up and running until it was added as a pacemaker resource, which resulted in "systemd: Stopped MariaDB database server." and in turn caused the keystone db_sync to fail:

```
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Galera/Exec[all-galera-nodes-are-up]/returns) executed successfully
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Keystone_config[DEFAULT/max_retries]/ensure) created
Aug 20 17:20:14 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Firewall::Keystone/Firewall[001 keystone incoming]/ensure) created
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_monitor_0 (call=45, rc=0, cib-update=63, confirmed=true) ok
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 systemd: Stopping MariaDB database server...
Aug 20 17:20:17 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_stop_0 (call=46, rc=0, cib-update=64, confirmed=true) ok
Aug 20 17:20:17 maca25400702875 systemd: Reloading.
Aug 20 17:20:17 maca25400702875 systemd: Starting MariaDB database server...
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_start_0 (call=47, rc=7, cib-update=65, confirmed=true) not running
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mysqld (INFINITY)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 221: fail-count-mysqld=INFINITY
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mysqld (1408555219)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 224: last-failure-mysqld=1408555219
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-mysqld (INFINITY)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 227: fail-count-mysqld=INFINITY
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_cs_dispatch: Update relayed from maca25400702876.example.com
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-mysqld (1408555219)
Aug 20 17:20:19 maca25400702875 attrd[12405]: notice: attrd_perform_update: Sent update 230: last-failure-mysqld=1408555219
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_monitor_0 (call=51, rc=7, cib-update=66, confirmed=true) not running
Aug 20 17:20:19 maca25400702875 systemd: Reloading.
Aug 20 17:20:19 maca25400702875 systemd: Starting MariaDB database server...
Aug 20 17:20:19 maca25400702875 systemd: Stopping MariaDB database server...
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation mysqld_stop_0 (call=53, rc=0, cib-update=67, confirmed=true) ok
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: Adding inet address 192.168.0.95/32 with broadcast address 192.168.0.255 to device eth2
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: Bringing device eth2 up
Aug 20 17:20:19 maca25400702875 IPaddr2(ip-192.168.0.95)[17398]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.95 eth2 192.168.0.95 auto not_used not_used
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_start_0 (call=52, rc=0, cib-update=68, confirmed=true) ok
Aug 20 17:20:19 maca25400702875 crmd[12407]: notice: process_lrm_event: LRM operation ip-192.168.0.95_monitor_30000 (call=54, rc=0, cib-update=69, confirmed=false) ok
Aug 20 17:20:21 maca25400702875 mysqld_safe: 140820 17:20:21 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended
Aug 20 17:20:21 maca25400702875 systemd: Stopped MariaDB database server.
Aug 20 17:20:25 maca25400702875 puppet-agent[3169]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip]/returns) executed successfully
Aug 20 17:20:26 maca25400702875 puppet-agent[3169]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
```

A subsequent re-deployment got past the HA controllers and failed on compute:

```
Execution of '/usr/bin/nova-manage network create novanetwork 192.168.32.0/23 2 --vlan 10' returned 1: Command failed, please check log for more info
```

This happened to me when I attempted an HA-Nova deployment using Staypuft: puppet failed to run 'keystone-manage db_sync', and because of that the deployment failed. When I ran 'keystone-manage db_sync' manually on the controllers it worked fine; I then re-ran the puppet agent and the deployment finished successfully.
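In shell terms, that manual workaround is simply the following, run on the affected controller(s) (shown here for reference only):

```bash
# Workaround from the comment above: run the failed sync by hand,
# then re-trigger a puppet run so the deployment converges.
keystone-manage db_sync
puppet agent --test
```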
With openstack-puppet-modules-2014.1-20.2.el6ost.noarch, /var/log/messages shows:

```
Aug 21 08:57:53 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Quickstack::Firewall::Keystone/Firewall[001 keystone incoming]/ensure) created
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: /tmp/ha-all-in-one-util.bash i_am_vip 192.168.0.95 || /tmp/ha-all-in-one-util.bash property_exists keystone returned 1 instead of one of [0]
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Quickstack::Pacemaker::Keystone/Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip]/returns) change from notrun to 0 failed: /tmp/ha-all-in-one-util.bash i_am_vip 192.168.0.95 || /tmp/ha-all-in-one-util.bash property_exists keystone returned 1 instead of one of [0]
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 21 09:59:43 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Skipping because of failed dependencies
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) keystone-manage db_sync returned 1 instead of one of [0]
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 21 09:59:44 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Skipping because of failed dependencies
Aug 21 09:59:45 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Exec[keystone-manage pki_setup]) Triggered 'refresh' from 27 events
Aug 21 09:59:45 maca25400702877.example.com puppet-agent[3861]: (/Stage[main]/Keystone/Service[keystone]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
```

Omri, more than likely one of the db nodes was down when this keystone db_sync was attempted. We would need to see /var/log/messages from all three nodes to verify, however (and /var/log/mariadb/mariadb.log would help, assuming it is a db issue). My guess is that mariadb was restarted by pacemaker and the keystone db_sync was attempted before that restart was complete. But without the logs, it is just speculation.

Reproduced with HA-neutron (vlan networking type).

Patch posted: https://github.com/redhat-openstack/astapor/pull/359

Alternate approach here: https://github.com/redhat-openstack/astapor/pull/360

It's not an alternate approach to PR 359 (though perhaps "alternate" was intended to mean an aspect of the PR), afaict. From the comments in PR 359: "For the initial puppet run, once galera is fully bootstrapped, shut it down before we add it as a pacemaker resource which will start it back up." and "Make sure the galera cluster is up with one last clustercheck." PR 360 is a pure puppet ordering change; PR 359 is not.
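Going only by the PR 359 comments quoted above, the approach amounts to something like this sketch (the pcs invocation and resource name are illustrative assumptions, not taken from the patch):

```bash
# Sketch of the PR 359 approach, per its quoted comments -- illustrative only.

# "Make sure the galera cluster is up with one last clustercheck."
/usr/bin/clustercheck || exit 1

# "Once galera is fully bootstrapped, shut it down before we add it as
# a pacemaker resource which will start it back up."
systemctl stop mariadb

# Pacemaker now owns mysqld; adding the resource starts it back up.
# (Resource name and options are assumptions, not from the patch.)
pcs resource create mysqld systemd:mariadb op monitor interval=30s --clone
```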
Reproduced with puddle 2014-08-27.4:

```
openstack-foreman-installer-2.0.22-1.el6ost.noarch
rhel-osp-installer-0.1.10-2.el6ost.noarch
openstack-puppet-modules-2014.1-21.4.el6ost.noarch
```

Steps: attempt to deploy HA-nova (3 x controller, 1 x compute).

journalctl -u puppet:

```
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: Could not prefetch keystone_endpoint provider 'keystone': Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.7:35357/v2.0/ endpoint-list' returned 1: Unable to establish connection to http://192.168.0.7:35357/v2.0/endpoints
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: (/Stage[main]/Heat::Keystone::Auth/Keystone_endpoint[RegionOne/heat]) Dependency Exec[i-am-keystone-vip-OR-keystone-is-up-on-vip] has failures: true
Aug 28 10:44:40 maca25400702875.example.com puppet-agent[3443]: (/Stage[main]/Heat::Keystone::Auth/Keystone_endpoint[RegionOne/heat]) Skipping because of failed dependencies
```

Please attach /var/log/messages and /var/log/mariadb/mariadb.log for the 3 nodes so we can try to debug this.

Tried to reproduce a couple of times with:

```
rhel-osp-installer-0.1.10-2.el6ost.noarch
openstack-foreman-installer-2.0.22-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-21.7.el6ost.noarch
```

The reported issue did not reproduce. Instead, the issue reported in comment #3 reproduced; created BZ #1138407.

With:

```
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-21.8.el6ost.noarch
rhel-osp-installer-0.3.5-1.el6ost.noarch
openstack-foreman-installer-2.0.24-1.el6ost.noarch
```

it does not reproduce either.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1350.html

Closed; no need for needinfo.