Bug 1104196 - Fence_ipmilan not working with A4
Summary: Fence_ipmilan not working with A4
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
: Installer
Assignee: Crag Wolfe
QA Contact: Ami Jeain
URL:
Whiteboard:
Depends On:
Blocks: 1040649
 
Reported: 2014-06-03 13:28 UTC by Steve Reichard
Modified: 2015-05-06 13:33 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-06 13:33:23 UTC
Target Upstream Version:
Embargoed:


Attachments
parameters (30.29 KB, text/plain), 2014-06-03 13:29 UTC, Steve Reichard
puppet output from first pass (427.44 KB, application/octet-stream), 2014-06-03 13:29 UTC, Steve Reichard
puppet output from second pass (1.01 MB, application/octet-stream), 2014-06-03 13:30 UTC, Steve Reichard

Description Steve Reichard 2014-06-03 13:28:10 UTC
Description of problem:


After A4 was pushed, I attempted to deploy an HA configuration (I was able to do this with engineering builds prior to A4).

This configuration is blocked if I use IPMI fencing.

1. On the first pass, the fence devices are stopped:


[root@ospha1 ~]# pcs status
Cluster name: openstack
Last updated: Mon Jun  2 16:29:33 2014
Last change: Mon Jun  2 16:25:52 2014 via cibadmin on 10.19.139.31
Stack: cman
Current DC: 10.19.139.32 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured
17 Resources configured


Online: [ 10.19.139.31 10.19.139.32 10.19.139.33 ]

Full list of resources:

 stonith-ipmilan-10.19.143.62	(stonith:fence_ipmilan):	Stopped 
 stonith-ipmilan-10.19.143.61	(stonith:fence_ipmilan):	Stopped 
 Resource Group: db
     fs-varlibmysql	(ocf::heartbeat:Filesystem):	Started 10.19.139.33 
     mysql-ostk-mysql	(ocf::heartbeat:mysql):	Started 10.19.139.33 
 stonith-ipmilan-10.19.143.63	(stonith:fence_ipmilan):	Stopped 
 Clone Set: lsb-memcached-clone [lsb-memcached]
     Started: [ 10.19.139.31 10.19.139.32 10.19.139.33 ]
 ip-10.19.139.2	(ocf::heartbeat:IPaddr2):	Started 10.19.139.31 
 ip-10.19.139.3	(ocf::heartbeat:IPaddr2):	Started 10.19.139.32 
 Clone Set: lsb-qpidd-clone [lsb-qpidd]
     Started: [ 10.19.139.31 10.19.139.32 10.19.139.33 ]
 ip-10.19.139.18	(ocf::heartbeat:IPaddr2):	Started 10.19.139.31 
 Clone Set: lsb-haproxy-clone [lsb-haproxy]
     Started: [ 10.19.139.31 10.19.139.32 10.19.139.33 ]

Failed actions:
    stonith-ipmilan-10.19.143.61_start_0 on 10.19.139.31 'unknown error' (1): call=23, status=Error, last-rc-change='Mon Jun  2 16:24:52 2014', queued=1080ms, exec=0ms
    stonith-ipmilan-10.19.143.62_start_0 on 10.19.139.31 'unknown error' (1): call=8, status=Error, last-rc-change='Mon Jun  2 16:24:50 2014', queued=2076ms, exec=0ms
    stonith-ipmilan-10.19.143.63_start_0 on 10.19.139.31 'unknown error' (1): call=34, status=Error, last-rc-change='Mon Jun  2 16:25:04 2014', queued=1155ms, exec=0ms
    stonith-ipmilan-10.19.143.61_start_0 on 10.19.139.33 'unknown error' (1): call=35, status=Error, last-rc-change='Mon Jun  2 16:25:07 2014', queued=1025ms, exec=0ms
    stonith-ipmilan-10.19.143.62_start_0 on 10.19.139.33 'unknown error' (1): call=28, status=Error, last-rc-change='Mon Jun  2 16:25:04 2014', queued=2024ms, exec=0ms
    stonith-ipmilan-10.19.143.63_start_0 on 10.19.139.33 'unknown error' (1): call=41, status=Error, last-rc-change='Mon Jun  2 16:25:08 2014', queued=1055ms, exec=0ms
    stonith-ipmilan-10.19.143.61_start_0 on 10.19.139.32 'unknown error' (1): call=29, status=Error, last-rc-change='Mon Jun  2 16:25:04 2014', queued=1054ms, exec=0ms
    stonith-ipmilan-10.19.143.62_start_0 on 10.19.139.32 'unknown error' (1): call=18, status=Error, last-rc-change='Mon Jun  2 16:24:53 2014', queued=10274ms, exec=0ms
    stonith-ipmilan-10.19.143.63_start_0 on 10.19.139.32 'unknown error' (1): call=35, status=Error, last-rc-change='Mon Jun  2 16:25:07 2014', queued=1065ms, exec=0ms



[root@ospha1 ~]#
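The fence agent can also be exercised by hand, outside pacemaker, to rule out basic IPMI connectivity. A minimal sketch against one of the management processors, assuming the same credentials the stonith resources use (exact fence_ipmilan flags vary between fence-agents versions):

# query power status through the fence agent over IPMI lanplus
fence_ipmilan -a 10.19.143.61 -l root -p '<password>' -P -o status

# or ask the BMC directly with ipmitool
ipmitool -I lanplus -H 10.19.143.61 -U root -P '<password>' chassis power status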



2. On the second pass, there is an error when puppet apparently attempts to add the fence devices again:




Notice: /Stage[main]/Quickstack::Pacemaker::Common/Exec[pcs-resource-default]/returns: executed successfully
Debug: /Stage[main]/Quickstack::Pacemaker::Common/Exec[pcs-resource-default]: The container Class[Quickstack::Pacemaker::Common] will propagate my refresh event
Debug: Exec[Enable STONITH](provider=posix): Executing check '/usr/sbin/pcs property show stonith-enabled | grep 'stonith-enabled: false''
Debug: Executing '/usr/sbin/pcs property show stonith-enabled | grep 'stonith-enabled: false''
Debug: /Stage[main]/Pacemaker::Stonith/Exec[Enable STONITH]/onlyif: Error: unable to get crm_config
Debug: /Stage[main]/Pacemaker::Stonith/Exec[Enable STONITH]/onlyif: Call cib_query failed (-62): Timer expired
Debug: Exec[Creating stonith::ipmilan](provider=posix): Executing check '/usr/sbin/pcs stonith show stonith-ipmilan-10.19.143.61 > /dev/null 2>&1'
Debug: Executing '/usr/sbin/pcs stonith show stonith-ipmilan-10.19.143.61 > /dev/null 2>&1'
Debug: Exec[Creating stonith::ipmilan](provider=posix): Executing '/usr/sbin/pcs stonith create stonith-ipmilan-10.19.143.61 fence_ipmilan pcmk_host_list="$(/usr/sbin/crm_node -n)" ipaddr=10.19.143.61 login="root" passwd="4score&7" lanplus="" op monitor interval=60s'
Debug: Executing '/usr/sbin/pcs stonith create stonith-ipmilan-10.19.143.61 fence_ipmilan pcmk_host_list="$(/usr/sbin/crm_node -n)" ipaddr=10.19.143.61 login="root" passwd="4score&7" lanplus="" op monitor interval=60s'
Notice: /Stage[main]/Quickstack::Pacemaker::Stonith::Ipmilan/Exec[Creating stonith::ipmilan]/returns: Error: Unable to create resource/fence device
Notice: /Stage[main]/Quickstack::Pacemaker::Stonith::Ipmilan/Exec[Creating stonith::ipmilan]/returns: Call cib_create failed (-62): Timer expired
Error: /usr/sbin/pcs stonith create stonith-ipmilan-10.19.143.61 fence_ipmilan pcmk_host_list="$(/usr/sbin/crm_node -n)" ipaddr=10.19.143.61 login="root" passwd="4score&7" lanplus="" op monitor interval=60s returned 1 instead of one of [0]
/usr/lib/ruby/site_ruby/1.8/puppet/util/errors.rb:96:in `fail'
/usr/lib/ruby/site_ruby/1.8/puppet/type/exec.rb:125:in `sync'
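
The "Timer expired (-62)" failures on cib_query and cib_create suggest the CIB itself is not answering, rather than a problem specific to the fence agent. A few diagnostic commands that may help confirm that (a sketch only, not part of the installer run):

# does a raw CIB query also time out?
cibadmin --query > /dev/null; echo $?

# sanity-check the live configuration and overall cluster state
crm_verify --live-check -V
pcs status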




I will attach the puppet output from the different passes on the node selected as cluster_controller_ip, as well as the YAML of the parameters used.



I will try disabling fencing to see if I can make any further progress.
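
For anyone else hitting this, taking fencing out of the picture and clearing the stopped devices would look something like the following (a sketch only; resource names are taken from the pcs status output above):

pcs property set stonith-enabled=false
pcs stonith delete stonith-ipmilan-10.19.143.61
pcs stonith delete stonith-ipmilan-10.19.143.62
pcs stonith delete stonith-ipmilan-10.19.143.63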


Version-Release number of selected component (if applicable):


[root@ospha-foreman ~]# yum list installed | grep -i -e foreman -e puppet
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
foreman.noarch                    1.3.0.4-1.el6sat   @rhel-x86_64-server-6-ost-4
foreman-installer.noarch          1:1.3.0-1.el6sat   @rhel-x86_64-server-6-ost-4
foreman-mysql.noarch              1.3.0.4-1.el6sat   @rhel-x86_64-server-6-ost-4
foreman-mysql2.noarch             1.3.0.4-1.el6sat   @rhel-x86_64-server-6-ost-4
foreman-proxy.noarch              1.3.0-3.el6sat     @rhel-x86_64-server-6-ost-4
foreman-selinux.noarch            1.3.0-1.el6sat     @rhel-x86_64-server-6-ost-4
openstack-foreman-installer.noarch
openstack-puppet-modules.noarch   2013.2-9.1.el6ost  @rhel-x86_64-server-6-ost-4
puppet.noarch                     3.2.4-3.el6_5      @rhel-x86_64-server-6-ost-4
puppet-server.noarch              3.2.4-3.el6_5      @rhel-x86_64-server-6-ost-4
rubygem-foreman_api.noarch        0.1.6-1.el6sat     @rhel-x86_64-server-6-ost-4
[root@ospha-foreman ~]# 





How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Steve Reichard 2014-06-03 13:29:23 UTC
Created attachment 901789 [details]
parameters

Comment 2 Steve Reichard 2014-06-03 13:29:53 UTC
Created attachment 901790 [details]
puppet output from first pass

Comment 3 Steve Reichard 2014-06-03 13:30:27 UTC
Created attachment 901791 [details]
puppet output from second pass

Comment 4 jliberma@redhat.com 2014-06-05 02:32:59 UTC
Steve -- it worked for me; remember to update the IP address from 143 -> 139 and the root password from old to new after our lab move.

I also needed to update my VIPs, as they were on the lab network (10.16 -> 10.19).

Jacob

Comment 5 Steve Reichard 2014-06-06 10:26:58 UTC
Jacob - Foreman-deployed fencing worked for you?

That is strange since Jay knows about the issue.

Also, all management processors are on the 143 (not 139) address space.

I did update all my VIPs.

I ran into the problem all last week before the password was updated.

Comment 6 jliberma@redhat.com 2014-06-06 13:35:57 UTC
I did not use Foreman.  

I could not tell if this only applied to Foreman or A4 generally.

I tested A4 without Foreman to see if that worked.

Comment 7 Jason Guiditta 2014-06-13 19:10:39 UTC
Crag, I believe this is fixed in OSP5, right?  Maybe you could add the commits where you fixed it to this BZ, and then we can backport when time allows for the next OSP4 release?

Comment 8 Crag Wolfe 2014-06-13 23:11:47 UTC
I believe the root of the problem is that the version of pacemaker used here errors out on a call to "pcs stonith show ...".  I have confirmed this is not a problem on el7, and I did not see it previously on el6 either.  We need to confirm whether the version of pacemaker in this bug ("1.1.10-14.el6_5.3-368c726") needs to be supported; if so, we can add puppet code to choose between "pcs resource show" and "pcs stonith show" at run-time based on the version of pacemaker.
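
To illustrate the run-time selection described above, a rough shell-level sketch (the version cut-off, variable names, and shell form are assumptions for illustration; the actual change would live in the puppet module):

# pick the existence check based on the installed pacemaker version
pcmk_ver=$(rpm -q --qf '%{VERSION}-%{RELEASE}' pacemaker)
case "$pcmk_ver" in
  1.1.10-14.el6*) show_cmd="pcs resource show" ;;   # assumed cut-off: builds that error on "pcs stonith show"
  *)              show_cmd="pcs stonith show" ;;
esac
$show_cmd stonith-ipmilan-10.19.143.61 > /dev/null 2>&1 || \
  echo "fence device not defined yet, safe to create it"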

If we have an environment now that exhibits this error, we can send a crm_report to Fabio, David and Chris and ask them to narrow down which versions of pacemaker this pertains to.  (Note: email discussing this was sent outside of the bug on 6/9 and 6/10.)
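
For completeness, a report covering the failure window could be generated along these lines (destination path is arbitrary; exact options may differ by pacemaker version):

crm_report -f "2014-06-02 16:00:00" /tmp/fence-ipmilan-bz1104196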

Comment 9 Mike Orazi 2014-09-12 17:36:54 UTC
Let's reverify that this is no longer reproducible.

