Bug 1174389
| Field | Value |
|---|---|
| Summary | rubygem-staypuft: Failed actions in pcs status report after a successful deployment |
| Product | Red Hat OpenStack |
| Component | openstack-foreman-installer |
| Version | unspecified |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Keywords | ZStream |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Jason Guiditta <jguiditt> |
| QA Contact | yeylon <yeylon> |
| CC | cwolfe, eglynn, fdinitto, lnatapov, mburns, morazi, rhos-maint, srevivo, yeylon |
| Target Milestone | --- |
| Target Release | Installer |
| Hardware | x86_64 |
| OS | Linux |
| Type | Bug |
| Doc Type | Bug Fix |
| Bug Blocks | 1177026 |
| Last Closed | 2016-04-04 15:45:33 UTC |
Description
Alexander Chuzhoy, 2014-12-15 19:09:56 UTC
Created attachment 969236 [details]
attaching logs from a host where pcs reports failure
Full output from pcs status:

```
[root@maca25400702875 ~]# pcs status
Cluster name: openstack
Last updated: Mon Dec 15 14:16:47 2014
Last change: Mon Dec 15 11:34:13 2014 via crmd on pcmk-maca25400702875
Stack: corosync
Current DC: pcmk-maca25400702875 (2) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
3 Nodes configured
124 Resources configured

Online: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]

Full list of resources:

 ip-192.168.0.4 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.3 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.29 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 ip-192.168.0.31 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.2 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.32 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 ip-192.168.0.23 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.24 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.25 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 ip-192.168.0.36 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 Clone Set: memcached-clone [memcached]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: rabbitmq-server-clone [rabbitmq-server]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.30 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 Clone Set: haproxy-clone [haproxy]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.13 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 Master/Slave Set: galera-master [galera]
     Masters: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.26 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.28 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.27 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.14 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.16 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.15 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.35 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.33 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.34 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.6 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.5 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.12 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started pcmk-maca25400702875
 Clone Set: neutron-server-clone [neutron-server]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 ip-192.168.0.17 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.18 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 ip-192.168.0.19 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 ip-192.168.0.20 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
 ip-192.168.0.22 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
 ip-192.168.0.21 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Resource Group: neutron-agents
     neutron-openvswitch-agent (systemd:neutron-openvswitch-agent): Started pcmk-maca25400702876
     neutron-dhcp-agent (systemd:neutron-dhcp-agent): Started pcmk-maca25400702876
     neutron-l3-agent (systemd:neutron-l3-agent): Started pcmk-maca25400702876
     neutron-metadata-agent (systemd:neutron-metadata-agent): Started pcmk-maca25400702876
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Resource Group: heat
     openstack-heat-engine (systemd:openstack-heat-engine): Started pcmk-maca25400702877
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: httpd-clone [httpd]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: mongod-clone [mongod]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: ceilometer-delay-clone [ceilometer-delay]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started pcmk-maca25400702875
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]

Failed actions:
    openstack-ceilometer-central_monitor_30000 on pcmk-maca25400702875 'not running' (7): call=571, status=complete, last-rc-change='Mon Dec 15 13:50:33 2014', queued=0ms, exec=0ms
    neutron-server_monitor_30000 on pcmk-maca25400702875 'OCF_PENDING' (196): call=527, status=complete, last-rc-change='Mon Dec 15 13:49:29 2014', queued=0ms, exec=0ms

PCSD Status:
  pcmk-maca25400702876: Online
  pcmk-maca25400702875: Online
  pcmk-maca25400702877: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Created attachment 969276 [details]
logs/conf from the controller reporting issues
Created attachment 969278 [details]
Logs from the second controller
Created attachment 969280 [details]
Logs from the third controller
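For reference: once the affected services are confirmed healthy, stale entries under "Failed actions" can typically be cleared so pcs status reports clean again. A minimal sketch, assuming the resource names from the status output above and the pcs 0.9.x CLI shipped with RHEL 7:

```
# Re-probe the resources and clear their recorded failures on all nodes
pcs resource cleanup openstack-ceilometer-central
pcs resource cleanup neutron-server-clone
```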
/var/log/ceilometer/central.log on c1 includes the following output (multiple times):

```
2014-12-15 11:20:12.409 23864 ERROR ceilometer.nova_client [-] The request you have made requires authentication. (HTTP 401)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client Traceback (most recent call last):
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/ceilometer/nova_client.py", line 51, in with_logging
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     return func(*args, **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/ceilometer/nova_client.py", line 155, in instance_get_all
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     search_opts=search_opts)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/v1_1/servers.py", line 603, in list
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     return self._list("/servers%s%s" % (detail, query_string), "servers")
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 67, in _list
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     _resp, body = self.api.client.get(url)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 487, in get
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     return self._cs_request(url, 'GET', **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 446, in _cs_request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     self.authenticate()
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 586, in authenticate
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     auth_url = self._v2_auth(auth_url)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 677, in _v2_auth
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     return self._authenticate(url, body)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 690, in _authenticate
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 439, in _time_request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     resp, body = self.request(url, method, **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 433, in request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client     raise exceptions.from_response(resp, body, url, method)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client Unauthorized: The request you have made requires authentication. (HTTP 401)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client
```

(and similar log lines for ceilometer.agent instead of ceilometer.nova_client). to be continued...

*** Bug 1174784 has been marked as a duplicate of this bug. ***

*** Bug 1174800 has been marked as a duplicate of this bug. ***
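As an aside on the HTTP 401 trace above: a quick reachability check of the Keystone endpoint that ceilometer authenticates against can rule out a dead service (the endpoint address is taken from the ceilometer.conf in the attached logs, discussed in the next comment). A sketch, not part of the original triage:

```
# Probe the Keystone v2.0 admin endpoint; a running Keystone answers this
# unauthenticated request with version metadata (HTTP 200), so a connection
# error or timeout here points at the service rather than at credentials.
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.0.26:35357/v2.0/
```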
Looking at the cN-logs tarballs attached, I noticed that the configured auth_url differs between the services; specifically, both ceilometer and neutron use 192.168.0.26:35357, whereas nova uses 192.168.0.28:35357 and glance uses localhost:5000, e.g.

```
$ find . -name "*.conf" | xargs grep auth_url | grep -v '#'
./etc/neutron/neutron.conf:nova_admin_auth_url =http://192.168.0.26:35357/v2.0
./etc/ceilometer/ceilometer.conf:os_auth_url=http://192.168.0.26:35357/v2.0
./etc/nova/nova.conf:admin_auth_url=http://192.168.0.28:35357/v2.0
./etc/glance/glance-cache.conf:auth_url = http://localhost:5000/v2.0
```

Do we expect that spread of keystone addresses for this kind of deployment? Is the keystone instance addressable at 192.168.0.26 actually running cleanly?

An older version is probably the issue with ceilometer. Looking at ceilometer.conf from an uploaded log:

```
$ grep rabbit_host c1/etc/ceilometer/ceilometer.conf
#rabbit_host=localhost
rabbit_host=127.0.0.1
#rabbit_hosts=$rabbit_host:$rabbit_port
rabbit_hosts=127.0.0.1:5672
```

Instead of 127.0.0.1, it should be the rabbit VIP, which was fixed in: https://bugzilla.redhat.com/show_bug.cgi?id=1173217

Regarding the auth_url question, given a haproxy.cfg that looks like:

```
listen keystone-admin
  bind 192.168.0.28:35357
  bind 192.168.0.27:35357
  bind 192.168.0.26:35357
  mode tcp
  option tcplog
  server pcmk-maca25400702876 192.168.0.8:35357 check inter 1s
  server pcmk-maca25400702875 192.168.0.7:35357 check inter 1s
  server pcmk-maca25400702877 192.168.0.10:35357 check inter 1s
```

either the .26 or .28 url should work. The glance-cache.conf auth_url looks broken, however.
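For reference, after the fix from bug 1173217 the rabbit settings would be expected to point at the RabbitMQ VIP rather than loopback. A minimal sketch of the corrected stanza (the address below is illustrative, not taken from this deployment; substitute the real VIP):

```
# /etc/ceilometer/ceilometer.conf (excerpt)
# 192.168.0.30 stands in for the RabbitMQ VIP of the deployment
rabbit_host=192.168.0.30
rabbit_hosts=192.168.0.30:5672
```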
(In reply to Crag Wolfe from comment #12)
> either the .26 or .28 url should work. The glance-cache.conf auth_url looks
> broken, however.

This part was fixed as part of the openstack-foreman-installer-3.0.8-1.el7ost build.

Regarding the neutron-server_monitor failed action: pacemaker makes a number of attempts at starting neutron-server:

```
$ grep 'Recover neutron-server' messages
Dec 15 11:26:19 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:1 (Started pcmk-maca25400702876)
Dec 15 11:32:00 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 11:48:07 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 11:48:10 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 12:49:34 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 13:49:32 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 13:49:35 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
```

However, I'm not finding the smoking gun that caused pacemaker to log that. And there is evidence of neutron-server being restarted periodically:

```
$ grep 'Config paste file' neutron/server.log
2014-12-15 11:08:04.147 3858 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:08:58.880 5028 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:25:21.637 1708 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:31:58.455 14468 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:32:07.814 14662 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:48:07.838 8659 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:48:15.858 8810 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:19:42.713 25450 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:49:25.953 6900 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:49:41.749 7483 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:19:33.439 21549 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:49:30.586 3604 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:49:38.705 3752 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 14:19:35.128 18144 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
```

But it does not appear that the neutron-server process was actually dead before these restarts took place. I am not seeing any of these restarts or failed actions in my development environment, which makes this difficult to debug. Going back to the well and seeing if David has any additional insights (thanks David!).

Did you all increase the start timeout for neutron-server yet? I saw something similar to this last night and recommended setting 'op start timeout=90s' for neutron-server:

```
pcs resource create neutron-server systemd:neutron-server op start timeout=90s --clone
```

or, to update a running deployment:

```
pcs resource op add neutron-server start timeout=90s
```

-- David
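As a follow-up to the suggestion above, the change can be confirmed from any cluster node. A minimal sketch, assuming the pcs 0.9.x syntax shipped with RHEL 7 (the operation id shown in the output is generated by pcs and will vary):

```
# Show the resource configuration, including its operations
pcs resource show neutron-server
# Expect a line similar to:
#   Operations: start interval=0s timeout=90s (neutron-server-start-...)
```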
(In reply to David Vossel from comment #15)
> Did you all increase the start timeout for neutron-server yet? I saw
> something similar to this last night and recommended setting 'op start
> timeout=90s' for neutron-server.
>
> pcs resource create neutron-server systemd:neutron-server op start
> timeout=90s --clone
>
> or to update a running deployment
>
> pcs resource op add neutron-server start timeout=90s
>
> -- David

We will be adding this as part of our effort to align with the newly released HA ref arch for OSP 6. Fabio, the arch has a timeout of 60 seconds, and both this BZ and https://bugzilla.redhat.com/show_bug.cgi?id=1175525 suggest 90; which should we be using?

(In reply to Jason Guiditta from comment #16)
> We will be adding this as part of our effort to align with the newly
> released HA ref arch for OSP 6. Fabio, the arch has a timeout of 60
> seconds, and both this BZ and
> https://bugzilla.redhat.com/show_bug.cgi?id=1175525 suggest 90, which should
> we be using?

90 seconds is fine. I'll fix the doc.

*** Bug 1179493 has been marked as a duplicate of this bug. ***