Description of problem:

I deployed 3 nodes in an HA nova network cluster using o-f-i/astapor. When I rebooted a node, the cluster did not start up on it. Here you can see a pcs status, a reboot, and then another attempted pcs status.

[root@ospha2 ~]# pcs status
Cluster name: openstack
Last updated: Thu Jul 10 14:38:34 2014
Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
Stack: corosync
Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
Version: 1.1.10-31.el7_0-368c726
3 Nodes configured
74 Resources configured

Online: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]

Full list of resources:

 stonith-ipmilan-10.19.143.63 (stonith:fence_ipmilan): Started ospha1.cloud.lab.eng.bos.redhat.com
 stonith-ipmilan-10.19.143.62 (stonith:fence_ipmilan): Started ospha3.cloud.lab.eng.bos.redhat.com
 stonith-ipmilan-10.19.143.61 (stonith:fence_ipmilan): Started ospha2.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.18 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.3 (ocf::heartbeat:IPaddr2): Started ospha2.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.19 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: memcached-clone [memcached]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: rabbitmq-server-clone [rabbitmq-server]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: haproxy-clone [haproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.2 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: mysqld-clone [mysqld]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.4 (ocf::heartbeat:IPaddr2): Started ospha2.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.5 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.7 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.6 (ocf::heartbeat:IPaddr2): Started ospha2.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-volume-clone [openstack-cinder-volume]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.10 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.17 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Resource Group: heat
     openstack-heat-engine (systemd:openstack-heat-engine): Started ospha2.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: httpd-clone [httpd]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@ospha2 ~]# reboot
Connection to ospha2 closed by remote host.
Connection to ospha2 closed.

[sreichar@se-users ~]$ ssh -Y root@ospha2
Last login: Thu Jul 10 10:37:52 2014 from se-users.cloud.lab.eng.bos.redhat.com
Kickstarted on 2014-07-08
[root@ospha2 ~]# pcs status
Error: cluster is not currently running on this node
[root@ospha2 ~]#
[root@ospha2 ~]# pcs status
Error: cluster is not currently running on this node
[root@ospha2 ~]# ps -ef | grep -e pace -e coro -e pcs
root      1546     1  0 14:42 ?        00:00:00 /bin/sh /usr/lib/pcsd/pcsd start
root      1584  1546  0 14:42 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root      1586  1584  0 14:42 ?        00:00:00 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root      8141  7282  0 14:47 pts/0    00:00:00 grep --color=auto -e pace -e coro -e pcs
[root@ospha2 ~]#

I've tarred up /var/log/ and will attach it.

Version-Release number of selected component (if applicable):

[root@ospha-inst manifests]# yum list installed | grep -e foreman -e puppet
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
foreman.noarch                                     1.6.0.15-1.el6sat      @OpenStack-Foreman-Puddle
foreman-installer.noarch                           1:1.5.0-0.4.RC2.el6ost
foreman-mysql2.noarch                              1.6.0.15-1.el6sat      @OpenStack-Foreman-Puddle
foreman-proxy.noarch                               1.6.0.8-1.el6sat       @OpenStack-Foreman-Puddle
foreman-selinux.noarch                             1.6.0-2.el6sat         @OpenStack-Foreman-Puddle
openstack-foreman-installer.noarch                 2.0.12-2.el6ost        @/openstack-foreman-installer-2.0.12-2.el6ost.noarch
openstack-puppet-modules.noarch                    2014.1-18.2.el7ost     @/openstack-puppet-modules-2014.1-18.2.el7ost.noarch
puppet.noarch                                      3.6.2-1.el6            @OpenStack-Foreman-Puddle
puppet-server.noarch                               3.6.2-1.el6            @OpenStack-Foreman-Puddle
ruby193-rubygem-foreman_openstack_simplify.noarch
rubygem-foreman_api.noarch                         0.1.11-4.el6sat        @OpenStack-Foreman-Puddle
rubygem-hammer_cli_foreman.noarch                  0.1.0-6.el6sat         @OpenStack-Foreman-Puddle
rubygem-hammer_cli_foreman-doc.noarch
[root@ospha-inst manifests]#

How reproducible:
Any node I rebooted.

Steps to Reproduce:
1. Deploy a 3-node HA nova network cluster with o-f-i/astapor.
2. Reboot one of the controller nodes.
3. Once the node is back up, run "pcs status" on it.

Actual results:
pcs reports "Error: cluster is not currently running on this node"; only pcsd is running, corosync and pacemaker are not.

Expected results:
corosync and pacemaker start at boot and the rebooted node rejoins the cluster.

Additional info:
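For anyone hitting the same symptom, here is a minimal diagnostic sketch for the rebooted node (assuming the default pcs/systemd layout on RHEL 7; the exact unit states on this deployment were not captured):

# Are the cluster services set to start at boot?
systemctl is-enabled corosync pacemaker pcsd
# Why did they not come up? (journal for the current boot)
journalctl -b -u corosync -u pacemaker
# Bring the node back into the cluster by hand, then re-check
pcs cluster start
pcs status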
Could not attach the tarball; it is too big. Using a URL for now, but it may disappear at any time:
http://refarch.cloud.lab.eng.bos.redhat.com/pub/tmp/ospha2-var-log.tgz

Also, since I only rebooted ospha2, here is the output of pcs status from the other 2 cluster members. (Surprised by the result; why are the results different?)

[root@ospha1 ~]# pcs status
Cluster name: openstack
Last updated: Thu Jul 10 15:28:37 2014
Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
Stack: corosync
Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
Version: 1.1.10-31.el7_0-368c726
3 Nodes configured
74 Resources configured

Online: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
OFFLINE: [ ospha2.cloud.lab.eng.bos.redhat.com ]

Full list of resources:

 stonith-ipmilan-10.19.143.63 (stonith:fence_ipmilan): Started ospha1.cloud.lab.eng.bos.redhat.com
 stonith-ipmilan-10.19.143.62 (stonith:fence_ipmilan): Started ospha3.cloud.lab.eng.bos.redhat.com
 stonith-ipmilan-10.19.143.61 (stonith:fence_ipmilan): Started ospha3.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.18 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.3 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.19 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: memcached-clone [memcached]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: rabbitmq-server-clone [rabbitmq-server]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: haproxy-clone [haproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.2 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: mysqld-clone [mysqld]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.4 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.5 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.7 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.6 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-volume-clone [openstack-cinder-volume]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.10 (ocf::heartbeat:IPaddr2): Started ospha3.cloud.lab.eng.bos.redhat.com
 ip-10.19.139.17 (ocf::heartbeat:IPaddr2): Started ospha1.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Resource Group: heat
     openstack-heat-engine (systemd:openstack-heat-engine): Started ospha3.cloud.lab.eng.bos.redhat.com
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: httpd-clone [httpd]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@ospha1 ~]#

[root@ospha3 ~]# pcs cluster status
Cluster Status:
 Last updated: Thu Jul 10 15:28:13 2014
 Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
 Stack: corosync
 Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
 Version: 1.1.10-31.el7_0-368c726
 3 Nodes configured
 74 Resources configured

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online

[root@ospha3 ~]#
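A possible manual recovery sketch, untested on this cluster (assumes the pcs CLI shipped with RHEL 7.0):

# From a surviving member, start the cluster stack on the rebooted node
pcs cluster start ospha2.cloud.lab.eng.bos.redhat.com
# Or, if corosync/pacemaker should start automatically on every boot
pcs cluster enable --all
# Verify the node rejoined
pcs status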
Updating the dependency to be on the 7.0.z clone.
Steve, any chance this still reproduces with staypuft and the 7.0.z fix?
Since the 7.0.z bug is done, setting this to TestOnly for further testing.
I've changed my testbed to test another config; I will not be able to test until after I return from PTO (8/25?).
I just rebooted my cluster to see what happens, and it seems fine:

$ pcs status
Cluster name: openstack
Last updated: Wed Aug 13 14:08:59 2014
Last change: Wed Aug 13 14:08:47 2014 via cibadmin on c1a2.example.com
Stack: corosync
Current DC: c1a1.example.com (1) - partition with quorum
Version: 1.1.10-32.el7_0-368c726
3 Nodes configured
101 Resources configured

Online: [ c1a1.example.com c1a2.example.com c1a3.example.com ]
...
PCSD Status:
  192.168.200.10: Online
  192.168.200.20: Online
  192.168.200.30: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

My nodes appear to have: systemd-208-11.el7_0.2.x86_64

I am calling this verified, unless anyone raises an issue.
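For completeness, a hypothetical per-node check that could be rerun after each reboot; the short host names below (c1a1, c1a2, c1a3) are illustrative, not taken from the deployment above:

# Confirm the cluster services are enabled and the node list is clean on each member
for n in c1a1 c1a2 c1a3; do
    ssh root@"$n" 'hostname; systemctl is-enabled corosync pacemaker pcsd; pcs status nodes'
done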
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1090.html