Bug 1426253
| Field | Value |
|---|---|
| Summary | Failed Services during Upgrade from OSP 8 - OSP 9 Step - major-upgrade-pacemaker-converge.yaml |
| Product | Red Hat OpenStack |
| Component | rhosp-director |
| Version | 9.0 (Mitaka) |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | unspecified |
| Hardware | x86_64 |
| OS | Unspecified |
| Type | Bug |
| Reporter | Randy Perryman <randy_perryman> |
| Assignee | Angus Thomas <athomas> |
| QA Contact | Amit Ugol <augol> |
| CC | arkady_kanevsky, aschultz, audra_cooper, cdevine, christopher_dearborn, dbecker, dcain, John_walsh, kurt_hey, mburns, michele, morazi, randy_perryman, rhel-osp-director-maint, smerrow, sreichar |
| Last Closed | 2017-02-24 17:59:23 UTC |
| Bug Blocks | 1305654 |

Created attachment 1256950
sosreport from one of the controllers

Created attachment 1256951
sosreport from one of the controllers part b

So the issue with the failed services is that they can't connect to galera.
I am not yet sure why, for example, heat-engine claims 'Started' while its logs still show failures to connect to the DB:

    2017-02-22 23:14:25.280 48930 WARNING oslo_db.sqlalchemy.engines [req-84063df0-eb92-491a-a63f-cae43f42a85d - - - - -] SQL connection failed. 5 attempts left.

This seems to me to be some manifestation of server.cnf being set with bind-address set to localhost again in the mysql config?

    # grep -ir bind etc/my.cnf.d/
    etc/my.cnf.d/galera.cnf:bind-address = overcloud-controller-0
    etc/my.cnf.d/galera.cnf.rpmnew:# Override bind-address
    etc/my.cnf.d/galera.cnf.rpmnew:# In some systems bind-address defaults to 127.0.0.1, and with mysqldump SST
    etc/my.cnf.d/galera.cnf.rpmnew:bind-address=0.0.0.0
    etc/my.cnf.d/server.cnf:bind-address = 127.0.0.1

I'll check with Sofer shortly.

(In reply to Michele Baldessari from comment #3)

We realized we didn't have all of the patches applied. After applying them, the upgrade is now successful and all services are Started.

Thanks Audra, I'll close as duplicate of the other one so we can also track the proper errata release there.

*** This bug has been marked as a duplicate of bug 1413686 ***

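The bind-address check from the comment above can be repeated across an unpacked sosreport with a few lines of Python. This is only a sketch, not something used in this bug: the function names are hypothetical, and it assumes that only files ending in `.cnf` are active configuration (so `galera.cnf.rpmnew` is skipped) and that a loopback value is the thing worth flagging.

```python
import ipaddress
import re
from pathlib import Path

# Matches an uncommented "bind-address = value" (or bind_address) line.
BIND_RE = re.compile(r"^\s*bind[-_]address\s*=\s*(\S+)", re.IGNORECASE)


def find_bind_addresses(conf_dir):
    """Return (file name, value) for each active bind-address setting.

    Assumption: only *.cnf files count as live configuration, so
    leftovers such as galera.cnf.rpmnew are ignored.
    """
    hits = []
    for path in sorted(Path(conf_dir).glob("*.cnf")):
        for line in path.read_text().splitlines():
            match = BIND_RE.match(line)
            if match:
                hits.append((path.name, match.group(1)))
    return hits


def is_loopback(value):
    """True for 'localhost' or a loopback IP literal such as 127.0.0.1."""
    if value.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a hostname like overcloud-controller-0.
        return False


def loopback_binds(conf_dir):
    """Return only the settings that bind the server to loopback."""
    return [
        (name, value)
        for name, value in find_bind_addresses(conf_dir)
        if is_loopback(value)
    ]
```

Calling `loopback_binds("etc/my.cnf.d")` from the root of the unpacked sosreport would list any file that pins the database to loopback, which is the condition suspected above for server.cnf.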
Description of problem:
After running the major-upgrade-pacemaker-converge.yaml step of the major upgrade, many services failed to start on the controllers. Prior to running the step they were all running cleanly.

Version-Release number of selected component (if applicable):
Upgrade from OSP 8 to 9

How reproducible:
Seen this one time so far

Steps to Reproduce:
1. Install OSP 8
2. Complete Minor Update
3. Begin Major Upgrade to OSP 9

Actual results:

    Cluster name: tripleo_cluster
    Stack: corosync
    Current DC: overcloud-controller-0 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
    Last updated: Wed Feb 22 23:33:24 2017
    Last change: Wed Feb 22 22:22:56 2017 by hacluster via crmd on overcloud-controller-2

    3 nodes and 130 resources configured

    Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

    Full list of resources:

    ip-192.168.190.183 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
    ip-192.168.120.184 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
    Clone Set: haproxy-clone [haproxy]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    ip-192.168.170.20 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
    ip-192.168.120.185 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
    ip-192.168.140.21 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
    ip-192.168.140.20 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
    Master/Slave Set: redis-master [redis]
        Masters: [ overcloud-controller-1 ]
        Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
    Master/Slave Set: galera-master [galera]
        Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: mongod-clone [mongod]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: rabbitmq-clone [rabbitmq]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: memcached-clone [memcached]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-heat-api-clone [openstack-heat-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-nova-api-clone [openstack-nova-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-glance-api-clone [openstack-glance-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: delay-clone [delay]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: neutron-server-clone [neutron-server]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: httpd-clone [httpd]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
    Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
        Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    overcloud-controller-1-ipmi (stonith:fence_ipmilan): Started overcloud-controller-2
    overcloud-controller-0-ipmi (stonith:fence_ipmilan): Started overcloud-controller-1
    overcloud-controller-2-ipmi (stonith:fence_ipmilan): Started overcloud-controller-0
    Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-core-clone [openstack-core]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-sahara-api-clone [openstack-sahara-api]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine]
        Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd]
        openstack-gnocchi-statsd (systemd:openstack-gnocchi-statsd): FAILED overcloud-controller-0
        openstack-gnocchi-statsd (systemd:openstack-gnocchi-statsd): FAILED overcloud-controller-2
        openstack-gnocchi-statsd (systemd:openstack-gnocchi-statsd): FAILED overcloud-controller-1

    Failed Actions:
    * openstack-cinder-scheduler_monitor_60000 on overcloud-controller-0 'not running' (7): call=729, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:47 2017', queued=0ms, exec=0ms
    * openstack-nova-scheduler_start_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=319, status=Timed Out, exitreason='none', last-rc-change='Wed Feb 22 22:10:20 2017', queued=0ms, exec=199987ms
    * openstack-cinder-volume_monitor_60000 on overcloud-controller-0 'not running' (7): call=378, status=complete, exitreason='none', last-rc-change='Wed Feb 22 22:23:10 2017', queued=0ms, exec=0ms
    * openstack-cinder-api_monitor_60000 on overcloud-controller-0 'not running' (7): call=727, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:44 2017', queued=0ms, exec=0ms
    * openstack-gnocchi-statsd_monitor_60000 on overcloud-controller-0 'not running' (7): call=735, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:33:23 2017', queued=0ms, exec=0ms
    * neutron-server_start_0 on overcloud-controller-0 'not running' (7): call=394, status=complete, exitreason='none', last-rc-change='Wed Feb 22 22:22:57 2017', queued=0ms, exec=90163ms
    * openstack-nova-scheduler_start_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=311, status=Timed Out, exitreason='none', last-rc-change='Wed Feb 22 22:10:20 2017', queued=0ms, exec=199987ms
    * openstack-cinder-scheduler_monitor_60000 on overcloud-controller-2 'not running' (7): call=627, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:47 2017', queued=0ms, exec=0ms
    * openstack-cinder-api_monitor_60000 on overcloud-controller-2 'not running' (7): call=625, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:44 2017', queued=0ms, exec=0ms
    * openstack-gnocchi-statsd_monitor_60000 on overcloud-controller-2 'not running' (7): call=631, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:33:23 2017', queued=0ms, exec=0ms
    * neutron-server_start_0 on overcloud-controller-2 'not running' (7): call=374, status=complete, exitreason='none', last-rc-change='Wed Feb 22 22:22:57 2017', queued=0ms, exec=90161ms
    * openstack-nova-scheduler_start_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=316, status=Timed Out, exitreason='none', last-rc-change='Wed Feb 22 22:10:20 2017', queued=0ms, exec=199988ms
    * openstack-cinder-scheduler_monitor_60000 on overcloud-controller-1 'not running' (7): call=632, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:47 2017', queued=0ms, exec=0ms
    * openstack-cinder-api_monitor_60000 on overcloud-controller-1 'not running' (7): call=630, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:30:44 2017', queued=0ms, exec=0ms
    * openstack-gnocchi-statsd_monitor_60000 on overcloud-controller-1 'not running' (7): call=636, status=complete, exitreason='none', last-rc-change='Wed Feb 22 23:33:23 2017', queued=0ms, exec=0ms
    * neutron-server_start_0 on overcloud-controller-1 'not running' (7): call=379, status=complete, exitreason='none', last-rc-change='Wed Feb 22 22:22:57 2017', queued=0ms, exec=92161ms

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@overcloud-controller-0 ~]#

Expected results:
No Stopped Services

Additional info:
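
With 130 resources in the status output, the stopped and failed ones are easy to miss by eye. The following sketch pulls them out of saved `pcs status` text. It was not part of the original report: the function name and the regular expressions are assumptions based only on the output layout shown above, and other pcs versions may print a different layout.

```python
import re

# Patterns are assumptions derived from the output pasted in this bug.
SET_RE = re.compile(r"^\s*(?:Clone|Master/Slave) Set: \S+ \[(\S+)\]")
STOPPED_RE = re.compile(r"^\s*Stopped: \[ (.*?) \]")
FAILED_RE = re.compile(r"^\s*(\S+)\s+\([^)]*\):\s+FAILED\s+(\S+)")
ACTION_RE = re.compile(
    r"^\s*\* (\S+)_([a-z]+)_\d+ on (\S+) '([^']+)' \((\d+)\)"
)


def summarize(status_text):
    """Collect stopped sets, FAILED instances and failed actions."""
    stopped = {}   # resource -> nodes on which the set is Stopped
    failed = {}    # resource -> nodes on which an instance is FAILED
    actions = []   # (resource, operation, node, reason, return code)
    current = None
    for line in status_text.splitlines():
        match = SET_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = STOPPED_RE.match(line)
        if match and current:
            stopped[current] = match.group(1).split()
            continue
        match = FAILED_RE.match(line)
        if match:
            failed.setdefault(match.group(1), []).append(match.group(2))
            continue
        match = ACTION_RE.match(line)
        if match:
            resource, operation, node, reason, code = match.groups()
            actions.append((resource, operation, node, reason, int(code)))
    return {"stopped": stopped, "failed": failed, "actions": actions}
```

Feeding it the text captured from `pcs status` on a controller gives one dictionary per run, so the result before and after the converge step can be compared directly.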