Bug 1174389
| Summary: | rubygem-staypuft: Failed actions in pcs status report after a successful deployment. |
|---|---|
| Product: | Red Hat OpenStack |
| Component: | openstack-foreman-installer |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | high |
| Priority: | high |
| Version: | unspecified |
| Hardware: | x86_64 |
| OS: | Linux |
| Reporter: | Alexander Chuzhoy <sasha> |
| Assignee: | Jason Guiditta <jguiditt> |
| QA Contact: | yeylon <yeylon> |
| CC: | cwolfe, eglynn, fdinitto, lnatapov, mburns, morazi, rhos-maint, srevivo, yeylon |
| Keywords: | ZStream |
| Target Milestone: | --- |
| Target Release: | Installer |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2016-04-04 15:45:33 UTC |
| Bug Blocks: | 1177026 |
Created attachment 969236 [details]
attaching logs from a host where pcs reports failure
Full output from pcs status:
[root@maca25400702875 ~]# pcs status
Cluster name: openstack
Last updated: Mon Dec 15 14:16:47 2014
Last change: Mon Dec 15 11:34:13 2014 via crmd on pcmk-maca25400702875
Stack: corosync
Current DC: pcmk-maca25400702875 (2) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
3 Nodes configured
124 Resources configured
Online: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Full list of resources:
ip-192.168.0.4 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.3 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.29 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
ip-192.168.0.31 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.2 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.32 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
ip-192.168.0.23 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.24 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.25 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
ip-192.168.0.36 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
Clone Set: memcached-clone [memcached]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: rabbitmq-server-clone [rabbitmq-server]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.30 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
Clone Set: haproxy-clone [haproxy]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.13 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
Master/Slave Set: galera-master [galera]
Masters: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.26 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.28 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.27 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
Clone Set: openstack-keystone-clone [openstack-keystone]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.14 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.16 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.15 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-glance-api-clone [openstack-glance-api]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.35 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.33 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.34 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-nova-api-clone [openstack-nova-api]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.6 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.5 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.12 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
openstack-cinder-volume (systemd:openstack-cinder-volume): Started pcmk-maca25400702875
Clone Set: neutron-server-clone [neutron-server]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
ip-192.168.0.17 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.18 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
ip-192.168.0.19 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
ip-192.168.0.20 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702876
ip-192.168.0.22 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702877
ip-192.168.0.21 (ocf::heartbeat:IPaddr2): Started pcmk-maca25400702875
Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Resource Group: neutron-agents
neutron-openvswitch-agent (systemd:neutron-openvswitch-agent): Started pcmk-maca25400702876
neutron-dhcp-agent (systemd:neutron-dhcp-agent): Started pcmk-maca25400702876
neutron-l3-agent (systemd:neutron-l3-agent): Started pcmk-maca25400702876
neutron-metadata-agent (systemd:neutron-metadata-agent): Started pcmk-maca25400702876
Clone Set: openstack-heat-api-clone [openstack-heat-api]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Resource Group: heat
openstack-heat-engine (systemd:openstack-heat-engine): Started pcmk-maca25400702877
Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: httpd-clone [httpd]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: mongod-clone [mongod]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: ceilometer-delay-clone [ceilometer-delay]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started pcmk-maca25400702875
Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
Started: [ pcmk-maca25400702875 pcmk-maca25400702876 pcmk-maca25400702877 ]
Failed actions:
openstack-ceilometer-central_monitor_30000 on pcmk-maca25400702875 'not running' (7): call=571, status=complete, last-rc-change='Mon Dec 15 13:50:33 2014', queued=0ms, exec=0ms
neutron-server_monitor_30000 on pcmk-maca25400702875 'OCF_PENDING' (196): call=527, status=complete, last-rc-change='Mon Dec 15 13:49:29 2014', queued=0ms, exec=0ms
PCSD Status:
pcmk-maca25400702876: Online
pcmk-maca25400702875: Online
pcmk-maca25400702877: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
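For scripted post-deployment checks, the resources named under "Failed actions:" can be extracted from saved `pcs status` output. A minimal sketch, using an abridged copy of the output above as a stand-in for running `pcs status` on a cluster node:

```shell
# Extract resource names listed under "Failed actions:" in pcs status output.
# The here-doc below is a sample standing in for real `pcs status` output.
pcs_output=$(cat <<'EOF'
Failed actions:
    openstack-ceilometer-central_monitor_30000 on pcmk-maca25400702875 'not running' (7): call=571
    neutron-server_monitor_30000 on pcmk-maca25400702875 'OCF_PENDING' (196): call=527
EOF
)

# Keep everything after the "Failed actions:" header; the resource name is
# the first field minus its trailing "_monitor_<interval-ms>" suffix.
failed=$(printf '%s\n' "$pcs_output" \
  | sed -n '/^Failed actions:/,$p' | tail -n +2 \
  | awk '{print $1}' | sed 's/_monitor_[0-9]*$//' | sort -u)

echo "$failed"
```

An empty `$failed` would indicate a clean status, which is what a successful deployment is expected to produce.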
Created attachment 969276 [details]
logs/conf from a controller reported with issues
Created attachment 969278 [details]
Logs from the second controller
Created attachment 969280 [details]
Logs from the third controller
/var/log/ceilometer/central.log on c1 includes the following output (multiple times):
2014-12-15 11:20:12.409 23864 ERROR ceilometer.nova_client [-] The request you have made requires authentication. (HTTP 401)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client Traceback (most recent call last):
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/ceilometer/nova_client.py", line 51, in with_logging
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client return func(*args, **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/ceilometer/nova_client.py", line 155, in instance_get_all
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client search_opts=search_opts)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/v1_1/servers.py", line 603, in list
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client return self._list("/servers%s%s" % (detail, query_string), "servers")
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 67, in _list
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client _resp, body = self.api.client.get(url)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 487, in get
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client return self._cs_request(url, 'GET', **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 446, in _cs_request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client self.authenticate()
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 586, in authenticate
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client auth_url = self._v2_auth(auth_url)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 677, in _v2_auth
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client return self._authenticate(url, body)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 690, in _authenticate
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 439, in _time_request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client resp, body = self.request(url, method, **kwargs)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 433, in request
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client raise exceptions.from_response(resp, body, url, method)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client Unauthorized: The request you have made requires authentication. (HTTP 401)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client
(and similar log lines for ceilometer.agent instead of ceilometer.nova_client).
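A quick way to gauge how often the agent hits this 401 is to count the ERROR lines in the log (the TRACE lines repeat the same message). A sketch, with a sample fragment standing in for /var/log/ceilometer/central.log:

```shell
# Count authentication failures in the ceilometer central agent log.
# The sample fragment below stands in for /var/log/ceilometer/central.log;
# the second timestamp is illustrative, not taken from the report.
log=$(mktemp)
cat > "$log" <<'EOF'
2014-12-15 11:20:12.409 23864 ERROR ceilometer.nova_client [-] The request you have made requires authentication. (HTTP 401)
2014-12-15 11:20:12.409 23864 TRACE ceilometer.nova_client Traceback (most recent call last):
2014-12-15 11:50:14.118 23864 ERROR ceilometer.nova_client [-] The request you have made requires authentication. (HTTP 401)
EOF

# Count only ERROR lines so each failure is counted once, not per TRACE line.
count=$(grep -c 'ERROR .* (HTTP 401)' "$log")
echo "$count 401 errors"
```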
to be continued...
*** Bug 1174784 has been marked as a duplicate of this bug. ***

*** Bug 1174800 has been marked as a duplicate of this bug. ***

Looking at the cN-logs tarballs attached, I noticed that the configured auth_url differs between the services: both ceilometer and neutron use 192.168.0.26:35357, whereas nova uses 192.168.0.28:35357 and glance uses localhost:5000, e.g.

$ find . -name "*.conf" | xargs grep auth_url | grep -v '#'
./etc/neutron/neutron.conf:nova_admin_auth_url =http://192.168.0.26:35357/v2.0
./etc/ceilometer/ceilometer.conf:os_auth_url=http://192.168.0.26:35357/v2.0
./etc/nova/nova.conf:admin_auth_url=http://192.168.0.28:35357/v2.0
./etc/glance/glance-cache.conf:auth_url = http://localhost:5000/v2.0

Do we expect that spread of keystone addresses for this kind of deployment? Is the keystone instance addressable at 192.168.0.26 actually running cleanly?

An older version is probably the issue with ceilometer. Looking at ceilometer.conf from an uploaded log:

$ grep rabbit_host c1/etc/ceilometer/ceilometer.conf
#rabbit_host=localhost
rabbit_host=127.0.0.1
#rabbit_hosts=$rabbit_host:$rabbit_port
rabbit_hosts=127.0.0.1:5672

Instead of 127.0.0.1, it should be the rabbit VIP, which was fixed in:
https://bugzilla.redhat.com/show_bug.cgi?id=1173217

Regarding the auth_url question, given an haproxy.cfg that looks like:

listen keystone-admin
    bind 192.168.0.28:35357
    bind 192.168.0.27:35357
    bind 192.168.0.26:35357
    mode tcp
    option tcplog
    server pcmk-maca25400702876 192.168.0.8:35357 check inter 1s
    server pcmk-maca25400702875 192.168.0.7:35357 check inter 1s
    server pcmk-maca25400702877 192.168.0.10:35357 check inter 1s

either the .26 or the .28 url should work. The glance-cache.conf auth_url looks broken, however.

(In reply to Crag Wolfe from comment #12)
> either the .26 or the .28 url should work. The glance-cache.conf auth_url looks broken, however.
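The 127.0.0.1 rabbit_host noted above would be repointed at the RabbitMQ VIP by the linked fix. A minimal sketch of that edit with GNU sed, assuming 192.168.0.30 as a placeholder VIP (the report does not identify which IPaddr2 resource fronts rabbit) and a temp copy standing in for /etc/ceilometer/ceilometer.conf:

```shell
# Repoint rabbit_host/rabbit_hosts at the RabbitMQ VIP.
# VIP is a hypothetical placeholder -- substitute the deployment's actual VIP.
VIP=192.168.0.30
conf=$(mktemp)
cat > "$conf" <<'EOF'
[DEFAULT]
rabbit_host=127.0.0.1
rabbit_hosts=127.0.0.1:5672
EOF

# GNU sed in-place edit of the two rabbit settings.
sed -i \
  -e "s/^rabbit_host=.*/rabbit_host=${VIP}/" \
  -e "s/^rabbit_hosts=.*/rabbit_hosts=${VIP}:5672/" "$conf"

grep '^rabbit' "$conf"
```

The ceilometer-central service (or its pacemaker resource) would need a restart afterwards for the change to take effect.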
This part was fixed as part of the openstack-foreman-installer-3.0.8-1.el7ost build.

Regarding the neutron-server_monitor failed action -- pacemaker makes a number of attempts at starting neutron-server:

$ grep 'Recover neutron-server' messages
Dec 15 11:26:19 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:1 (Started pcmk-maca25400702876)
Dec 15 11:32:00 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 11:48:07 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 11:48:10 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 12:49:34 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 13:49:32 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)
Dec 15 13:49:35 maca25400702875 pengine[15268]: notice: LogActions: Recover neutron-server:0 (Started pcmk-maca25400702875)

However, I'm not finding the smoking gun that caused pacemaker to log that.
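To check whether a given host already carries the fix, the installed openstack-foreman-installer version can be compared against 3.0.8-1 using version sort. A sketch, assuming GNU sort; on a real host the installed string would come from rpm -q --qf '%{VERSION}-%{RELEASE}\n' openstack-foreman-installer:

```shell
# Compare a version-release string against the build carrying the fix.
fixed="3.0.8-1"

is_fixed() {
  # Prints "yes" when $1 >= $fixed in version order (GNU sort -V).
  newest=$(printf '%s\n%s\n' "$1" "$fixed" | sort -V | tail -n1)
  if [ "$newest" = "$1" ]; then echo yes; else echo no; fi
}

# Sample values: the version from this report, and the fixed build itself.
is_fixed "3.0.5-1"
is_fixed "3.0.8-1"
```

Note that sort -V is an approximation of RPM's version comparison; rpmdev-vercmp would be the authoritative check where available.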
And there is evidence of neutron-server being restarted periodically:

$ grep 'Config paste file' neutron/server.log
2014-12-15 11:08:04.147 3858 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:08:58.880 5028 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:25:21.637 1708 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:31:58.455 14468 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:32:07.814 14662 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:48:07.838 8659 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 11:48:15.858 8810 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:19:42.713 25450 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:49:25.953 6900 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 12:49:41.749 7483 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:19:33.439 21549 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:49:30.586 3604 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 13:49:38.705 3752 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini
2014-12-15 14:19:35.128 18144 INFO neutron.common.config [-] Config paste file: /usr/share/neutron/api-paste.ini

But it does not appear that the neutron-server process was actually dead before these restarts took place. I am not seeing any of these restarts or failed actions in my development environment, making this difficult to debug. Going back to the well and seeing if David has any additional insights (thanks David!).
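The cadence of the later restarts can be made explicit by diffing consecutive timestamps from the log excerpt above. A sketch over the last few quoted lines, assuming GNU date for epoch conversion:

```shell
# Gaps in minutes between consecutive neutron-server restarts, taken from
# the 'Config paste file' timestamps quoted above (later entries only).
restarts=$(cat <<'EOF'
2014-12-15 12:19:42
2014-12-15 12:49:25
2014-12-15 13:19:33
2014-12-15 13:49:30
2014-12-15 14:19:35
EOF
)

# GNU date parses "YYYY-MM-DD HH:MM:SS" with -d; output seconds since epoch.
epochs=$(printf '%s\n' "$restarts" | while read -r d t; do
  date -d "$d $t" +%s
done)

# Difference between successive epochs, rounded to whole minutes.
gaps=$(printf '%s\n' "$epochs" | awk 'NR>1 {printf "%.0f\n", ($1-prev)/60} {prev=$1}')
echo "$gaps"
```

The roughly even half-hour spacing is consistent with a recurring monitor-driven recovery rather than random crashes, which fits the observation that the process was not actually dead.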
Did you all increase the start timeout for neutron-server yet? I saw something similar to this last night and recommended setting 'op start timeout=90s' for neutron-server.

pcs resource create neutron-server systemd:neutron-server op start timeout=90s --clone

or, to update a running deployment:

pcs resource op add neutron-server start timeout=90s

-- David

(In reply to David Vossel from comment #15)
> Did you all increase the start timeout for neutron-server yet? [...]
> pcs resource op add neutron-server start timeout=90s

We will be adding this as part of our effort to align with the newly released HA ref arch for OSP 6. Fabio, the arch has a timeout of 60 seconds, and both this BZ and https://bugzilla.redhat.com/show_bug.cgi?id=1175525 suggest 90; which should we be using?

(In reply to Jason Guiditta from comment #16)
> Fabio, the arch has a timeout of 60 seconds, and both this BZ and
> https://bugzilla.redhat.com/show_bug.cgi?id=1175525 suggest 90; which should we be using?

90 seconds is fine. I'll fix the doc.

*** Bug 1179493 has been marked as a duplicate of this bug. ***
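Once the timeout is in place and the services are confirmed running, the stale entries under 'Failed actions' can be cleared per resource with `pcs resource cleanup`. A sketch over the two resources named in this report, printing the commands rather than running them so it works off-cluster:

```shell
# Build the cleanup commands for the two resources that reported failed
# actions. Printing instead of executing keeps this runnable anywhere;
# on a cluster node, pipe to sh or call pcs directly.
cmds=$(for res in openstack-ceilometer-central neutron-server; do
  echo "pcs resource cleanup $res"
done)

printf '%s\n' "$cmds"
```

Cleanup only clears the recorded failures; if the underlying cause (the auth_url/rabbit_host misconfiguration or the short start timeout) is not fixed, the entries will reappear on the next failed monitor.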
rubygem-staypuft: Failed actions in pcs status report after a successful deployment.

Environment:
openstack-foreman-installer-3.0.5-1.el7ost.noarch
ruby193-rubygem-staypuft-0.5.5-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
rhel-osp-installer-client-0.5.2-2.el7ost.noarch
openstack-puppet-modules-2014.2.7-2.el7ost.noarch
rhel-osp-installer-0.5.2-2.el7ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer.
2. Complete a successful deployment of HA neutron (3 controllers + 1 compute).
3. Check for failed actions in the output of 'pcs status'.

Result:
Failed actions:
    openstack-ceilometer-central_monitor_30000 on pcmk-maca25400702875 'not running' (7): call=571, status=complete, last-rc-change='Mon Dec 15 13:50:33 2014', queued=0ms, exec=0ms
    neutron-server_monitor_30000 on pcmk-maca25400702875 'OCF_PENDING' (196): call=527, status=complete, last-rc-change='Mon Dec 15 13:49:29 2014', queued=0ms, exec=0ms

Expected result:
The 'Failed actions' section shouldn't report these.