Bug 1118464

Summary: pacemaker not restarting on reboot after o-f-i install
Product: Red Hat OpenStack
Component: openstack-foreman-installer
Version: 5.0 (RHEL 7)
Reporter: Steve Reichard <sreichar>
Assignee: Jason Guiditta <jguiditt>
QA Contact: Leonid Natapov <lnatapov>
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: ga
Target Release: Installer
Keywords: TestOnly
Hardware: Unspecified
OS: Unspecified
CC: acathrow, dcbw, hbrock, mburns, morazi, rhos-maint, sreichar, yeylon
Doc Type: Bug Fix
Clones: 1144062 (view as bug list)
Type: Bug
Last Closed: 2014-08-21 18:05:15 UTC
Bug Depends On: 1121650    
Bug Blocks: 1144062    

Description Steve Reichard 2014-07-10 19:18:15 UTC
Description of problem:

I deployed a 3-node HA nova network cluster using o-f-i/astapor.

When I rebooted a node, the cluster did not start back up on it.

Below is a pcs status, the reboot, then another attempted pcs status.


[root@ospha2 ~]# pcs status
Cluster name: openstack
Last updated: Thu Jul 10 14:38:34 2014
Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
Stack: corosync
Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
Version: 1.1.10-31.el7_0-368c726
3 Nodes configured
74 Resources configured


Online: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]

Full list of resources:

 stonith-ipmilan-10.19.143.63	(stonith:fence_ipmilan):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 stonith-ipmilan-10.19.143.62	(stonith:fence_ipmilan):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 stonith-ipmilan-10.19.143.61	(stonith:fence_ipmilan):	Started ospha2.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.18	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.3	(ocf::heartbeat:IPaddr2):	Started ospha2.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.19	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: memcached-clone [memcached]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: rabbitmq-server-clone [rabbitmq-server]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: haproxy-clone [haproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.2	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: mysqld-clone [mysqld]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.4	(ocf::heartbeat:IPaddr2):	Started ospha2.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.5	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.7	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.6	(ocf::heartbeat:IPaddr2):	Started ospha2.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-volume-clone [openstack-cinder-volume]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.10	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.17	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Resource Group: heat
     openstack-heat-engine	(systemd:openstack-heat-engine):	Started ospha2.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
 Clone Set: httpd-clone [httpd]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha2.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@ospha2 ~]# reboot
Connection to ospha2 closed by remote host.
Connection to ospha2 closed.
[sreichar@se-users ~]$  ssh -Y root@ospha2
Last login: Thu Jul 10 10:37:52 2014 from se-users.cloud.lab.eng.bos.redhat.com
Kickstarted on 2014-07-08
[root@ospha2 ~]# pcs status
Error: cluster is not currently running on this node
[root@ospha2 ~]# 

[root@ospha2 ~]# pcs status
Error: cluster is not currently running on this node
[root@ospha2 ~]# ps -ef | grep -e pace -e coro -e pcs
root      1546     1  0 14:42 ?        00:00:00 /bin/sh /usr/lib/pcsd/pcsd start
root      1584  1546  0 14:42 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root      1586  1584  0 14:42 ?        00:00:00 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root      8141  7282  0 14:47 pts/0    00:00:00 grep --color=auto -e pace -e coro -e pcs
[root@ospha2 ~]#


I've tarred up /var/log/ and will attach it.
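For reference, a quick check on the rebooted node, assuming the installer simply left the cluster services disabled at boot (an assumption, not confirmed from these logs), would be:

systemctl is-enabled corosync
systemctl is-enabled pacemaker
systemctl is-enabled pcsd     # pcsd evidently is enabled, since it came back on its own after the reboot

If corosync and pacemaker report "disabled", enabling cluster startup on every node, for example with

pcs cluster enable --all

should let the node rejoin the cluster automatically after a reboot.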






Version-Release number of selected component (if applicable):

[root@ospha-inst manifests]# yum list installed | grep -e foreman -e puppet 
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
foreman.noarch                      1.6.0.15-1.el6sat  @OpenStack-Foreman-Puddle
foreman-installer.noarch            1:1.5.0-0.4.RC2.el6ost
foreman-mysql2.noarch               1.6.0.15-1.el6sat  @OpenStack-Foreman-Puddle
foreman-proxy.noarch                1.6.0.8-1.el6sat   @OpenStack-Foreman-Puddle
foreman-selinux.noarch              1.6.0-2.el6sat     @OpenStack-Foreman-Puddle
openstack-foreman-installer.noarch  2.0.12-2.el6ost    @/openstack-foreman-installer-2.0.12-2.el6ost.noarch
openstack-puppet-modules.noarch     2014.1-18.2.el7ost @/openstack-puppet-modules-2014.1-18.2.el7ost.noarch
puppet.noarch                       3.6.2-1.el6        @OpenStack-Foreman-Puddle
puppet-server.noarch                3.6.2-1.el6        @OpenStack-Foreman-Puddle
ruby193-rubygem-foreman_openstack_simplify.noarch
rubygem-foreman_api.noarch          0.1.11-4.el6sat    @OpenStack-Foreman-Puddle
rubygem-hammer_cli_foreman.noarch   0.1.0-6.el6sat     @OpenStack-Foreman-Puddle
rubygem-hammer_cli_foreman-doc.noarch
[root@ospha-inst manifests]# 
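Note that the listing above is from the installer host; for the cluster nodes themselves, the relevant versions could be gathered with something along these lines (standard RHEL 7 HA package names, assumed rather than taken from this deployment):

rpm -q pacemaker corosync pcs resource-agents systemd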



How reproducible:


Every time, on any node I rebooted.


Steps to Reproduce:
1. Deploy a 3-node HA nova network cluster with o-f-i/astapor.
2. Reboot one of the cluster nodes.
3. When the node comes back up, run pcs status on it.

Actual results:

pcs status on the rebooted node reports "Error: cluster is not currently running on this node"; only pcsd is running, corosync and pacemaker are not.

Expected results:

The cluster services start automatically at boot and the rebooted node rejoins the cluster, with its resources returning to Started.
Additional info:

Comment 2 Steve Reichard 2014-07-10 19:30:39 UTC
Could not attach the tarball; it is too big.

Using a URL for now, but it may disappear at any time.

http://refarch.cloud.lab.eng.bos.redhat.com/pub/tmp/ospha2-var-log.tgz



Also, since I only rebooted ospha2, I decided to include the cluster status output from the other two cluster members. (I was surprised by the result: why are the results different?)


[root@ospha1 ~]# pcs status
Cluster name: openstack
Last updated: Thu Jul 10 15:28:37 2014
Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
Stack: corosync
Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
Version: 1.1.10-31.el7_0-368c726
3 Nodes configured
74 Resources configured


Online: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
OFFLINE: [ ospha2.cloud.lab.eng.bos.redhat.com ]

Full list of resources:

 stonith-ipmilan-10.19.143.63	(stonith:fence_ipmilan):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 stonith-ipmilan-10.19.143.62	(stonith:fence_ipmilan):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 stonith-ipmilan-10.19.143.61	(stonith:fence_ipmilan):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.18	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.3	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.19	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: memcached-clone [memcached]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: rabbitmq-server-clone [rabbitmq-server]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: haproxy-clone [haproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.2	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: mysqld-clone [mysqld]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.4	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: fs-varlibglanceimages-clone [fs-varlibglanceimages]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.5	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.7	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.6	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-cinder-volume-clone [openstack-cinder-volume]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 ip-10.19.139.10	(ocf::heartbeat:IPaddr2):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 ip-10.19.139.17	(ocf::heartbeat:IPaddr2):	Started ospha1.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Resource Group: heat
     openstack-heat-engine	(systemd:openstack-heat-engine):	Started ospha3.cloud.lab.eng.bos.redhat.com 
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]
 Clone Set: httpd-clone [httpd]
     Started: [ ospha1.cloud.lab.eng.bos.redhat.com ospha3.cloud.lab.eng.bos.redhat.com ]
     Stopped: [ ospha2.cloud.lab.eng.bos.redhat.com ]

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@ospha1 ~]# 



[root@ospha3 ~]# pcs cluster status
Cluster Status:
 Last updated: Thu Jul 10 15:28:13 2014
 Last change: Thu Jul 10 14:37:06 2014 via crmd on ospha1.cloud.lab.eng.bos.redhat.com
 Stack: corosync
 Current DC: ospha3.cloud.lab.eng.bos.redhat.com (3) - partition with quorum
 Version: 1.1.10-31.el7_0-368c726
 3 Nodes configured
 74 Resources configured

PCSD Status:
  10.19.139.31: Online
  10.19.139.32: Online
  10.19.139.33: Online
[root@ospha3 ~]#
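A possible explanation for the difference between the two outputs above: ospha1 was queried with pcs status, which lists node membership and resources, while ospha3 was queried with pcs cluster status, which only prints the cluster summary and PCSD status. The PCSD Status section reports 10.19.139.32 as Online in both cases because pcsd runs independently of corosync/pacemaker and, as the ps output earlier shows, it did start on ospha2 after the reboot. For a like-for-like comparison one would run the same command on both nodes:

[root@ospha3 ~]# pcs status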

Comment 5 Andrew Cathrow 2014-07-30 14:16:37 UTC
Updating the dependency to be on the 7.0.z clone.

Comment 7 Mike Burns 2014-08-12 18:41:04 UTC
Steve, any chance this still reproduces with staypuft and the 7.0.z fix?

Comment 8 Mike Burns 2014-08-12 18:42:15 UTC
Since the 7.0.z bug is done, setting this to TestOnly for further testing.

Comment 9 Steve Reichard 2014-08-12 19:34:46 UTC
I've changed my testbed to test another config; I will not be able to test this until after I return from PTO (8/25?).

Comment 11 Jason Guiditta 2014-08-13 21:12:48 UTC
I just rebooted my cluster to see what happens, and it seems fine:

$ pcs status
Cluster name: openstack
Last updated: Wed Aug 13 14:08:59 2014
Last change: Wed Aug 13 14:08:47 2014 via cibadmin on c1a2.example.com
Stack: corosync
Current DC: c1a1.example.com (1) - partition with quorum
Version: 1.1.10-32.el7_0-368c726
3 Nodes configured
101 Resources configured


Online: [ c1a1.example.com c1a2.example.com c1a3.example.com ]

...

PCSD Status:
  192.168.200.10: Online
  192.168.200.20: Online
  192.168.200.30: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


My nodes appear to have:
systemd-208-11.el7_0.2.x86_64


I am calling this verified unless anyone raises an issue.
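For anyone re-testing on their own deployment, a minimal per-node re-check, assuming the 7.0.z fix works by enabling the cluster services at boot (an assumption on my part), might look like:

systemctl is-enabled corosync    # expect "enabled"
systemctl is-enabled pacemaker   # expect "enabled"
reboot
pcs status                       # after the node is back, it should be Online with resources Started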

Comment 12 errata-xmlrpc 2014-08-21 18:05:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html