Bug 1155997 - unable to deploy fencing in ha-neutron
Summary: unable to deploy fencing in ha-neutron
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rubygem-staypuft
Version: 5.0 (RHEL 6)
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: z2
: Installer
Assignee: Scott Seago
QA Contact: Asaf Hirshberg
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-10-23 11:22 UTC by Asaf Hirshberg
Modified: 2014-11-04 17:03 UTC
CC List: 10 users

Fixed In Version: ruby193-rubygem-staypuft-0.4.14-1.el6ost
Doc Type: Bug Fix
Doc Text:
Cause: The addition of the fencing UI caused reboots to be attempted using the BMC instead of the discovery image's foreman-proxy. The BMC is not configured to work correctly and may not even be accessible from the Installer.
Consequence: Hosts would not reboot when a deployment was started.
Fix: Hosts that are in discovery mode are now always rebooted using the discovery image's foreman-proxy.
Result: Hosts reboot correctly during deployments.
Clone Of:
Environment:
Last Closed: 2014-11-04 17:03:52 UTC
Target Upstream Version:


Attachments
The requested output from staypuft. (108.55 KB, text/plain)
2014-10-23 16:52 UTC, Alexander Chuzhoy
logs from the staypuft machine. (899.26 KB, application/x-gzip)
2014-10-31 22:19 UTC, Alexander Chuzhoy


Links
System: Red Hat Product Errata
ID: RHBA-2014:1800
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform Installer Bug Fix Advisory
Last Updated: 2014-11-04 22:00:19 UTC

Description Asaf Hirshberg 2014-10-23 11:22:16 UTC
Description of problem:
The deployment failed at 5% after configuring the controllers to use fencing.
Nothing ran on the server after I pressed Deploy; the progress bar turned red immediately and stayed stuck at 5%.


Version-Release number of selected component (if applicable):
rhel-osp-installer-0.4.5-1.el6ost.noarch


How reproducible:
4/4

Steps to Reproduce:
1. Create an HA deployment and assign 3 controllers.
2. Configure the networks of the controllers.
3. Enable and configure fencing.
4. Press Deploy.

Actual results:
The deployment gets stuck at 5%.

Expected results:
The deployment should run normally.


Additional info:
Setup information:
RHEL 6.6 Staypuft server on bare metal
4 bare-metal servers: 3 controllers and 1 compute node

Comment 3 Lukas Zapletal 2014-10-23 12:21:24 UTC
Please provide an sosreport from the Foreman node and the output of the following commands from the discovered node (use SSH and kernel parameters to get access):

echo "* JOURNAL (last 500 lines) *"
journalctl | tail -n500

echo "* PROXY LOG *"
tail -n 200 /var/log/foreman-proxy/proxy.log

echo "* KERNEL COMMAND LINE *"
cat /proc/cmdline

echo "* FACTER *"
FACTERLIB=/usr/share/fdi/facts/ facter

echo "* IP ADDRESSES *"
ip addr show

echo "* ROUTES *"
ip route

echo "* DHCP LEASES *"
cat /var/lib/NetworkManager/dhclient-*.lease

echo "* DNS *"
cat /etc/resolv.conf

Comment 4 Lukas Zapletal 2014-10-23 12:46:33 UTC
Here are the new instructions for reporting bugs against discovery image v1.x:

echo "* KERNEL LOG *"
dmesg
tail -n 200 /var/log/messages

echo "* PROXY LOG *"
tail -n 200 /var/log/foreman-proxy/proxy.log

echo "* KERNEL COMMAND LINE *"
cat /proc/cmdline

echo "* FACTER *"
FACTERLIB=/usr/share/ovirt-node-plugin-foreman facter

echo "* IP ADDRESSES *"
ip addr show

echo "* ROUTES *"
ip route

echo "* DHCP LEASES *"
cat /var/lib/dhclient/*.lease

echo "* DNS *"
cat /etc/resolv.conf

Comment 5 Alexander Chuzhoy 2014-10-23 16:52:01 UTC
Created attachment 950027
The requested output from staypuft.

Comment 6 Alexander Chuzhoy 2014-10-23 16:54:36 UTC
Reproduced on another bare-metal setup.

I was able to run a deployment, but after configuring fencing for a node and starting another deployment, it got stuck with the following error:

Foreman::Exception: ERF42-1518 [Foreman::Exception]: Unable to find a proxy with BMC feature


Checking the consoles of the nodes showed that they didn't even reboot.

Comment 7 Alexander Chuzhoy 2014-10-24 18:16:45 UTC
After configuring fencing, the BMC tab of the host shows the following error:
Failure: ERF42-1518 [Foreman::Exception]: Unable to find a proxy with BMC feature

Comment 11 Lukas Zapletal 2014-10-29 15:06:31 UTC
Added a few more people to CC. If they don't know the answer right away, I'm afraid I can't help right now. We need to reproduce this on a Staypuft setup, investigate the database, and find a workaround.

Comment 12 Lukas Zapletal 2014-10-29 15:48:15 UTC
As per the IRC discussion, this is an issue in the Staypuft code: I've verified that Foreman nightly with the discovery plugin works fine when a BMC NIC is configured.

Hint: https://gist.github.com/lzap/2e0bd9e5fb26243dd994

You should see some info-level messages in production.log.

Comment 13 Mike Burns 2014-10-29 22:30:33 UTC
Possible fix

https://github.com/theforeman/staypuft/pull/369

Comment 14 Scott Seago 2014-10-29 22:35:09 UTC
Essentially, entering a username and password on the BMC NIC in the fencing UI is enough for Foreman to consider the host bmc_enabled, so host.power returns an instance of PowerManager::BMC (which attempts to use the foreman-proxy BMC feature to reboot the host).
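A minimal, hypothetical Ruby model of the mechanism described above. FencingHost, BmcNic, BmcPowerManager and BmcProxyNotFound are invented stand-ins, not Foreman's real Host, Nic::BMC or PowerManager::BMC classes. The assumption that the BMC power manager looks up a proxy with the BMC feature as soon as it is built (and raises if there is none) is ours; it is consistent with the "Unable to find a proxy with BMC feature" error quoted in this bug.

```ruby
# Hypothetical model: credentials on the BMC NIC make the host "BMC enabled",
# and host.power then builds a BMC power manager that needs a BMC proxy.
# None of these classes are Foreman's real implementation.

class BmcProxyNotFound < StandardError; end

BmcNic = Struct.new(:username, :password)

class BmcPowerManager
  attr_reader :proxy

  # Assumption: the proxy lookup happens at construction time.
  def initialize(proxies)
    @proxy = proxies.find { |p| p[:features].include?("BMC") }
    raise BmcProxyNotFound, "Unable to find a proxy with BMC feature" if @proxy.nil?
  end
end

class FencingHost
  def initialize(bmc_nic: nil, proxies: [])
    @bmc_nic = bmc_nic
    @proxies = proxies
  end

  # Filling in username and password (as the fencing UI does) is all it takes.
  def bmc_enabled?
    return false if @bmc_nic.nil?

    !@bmc_nic.username.to_s.empty? && !@bmc_nic.password.to_s.empty?
  end

  # nil when BMC is not enabled, otherwise a BMC power manager.
  def power
    bmc_enabled? ? BmcPowerManager.new(@proxies) : nil
  end
end

# Example: credentials entered, but the only registered proxy lacks BMC.
tftp_only = [{ name: "installer-proxy", features: ["TFTP"] }]
fenced = FencingHost.new(bmc_nic: BmcNic.new("admin", "secret"), proxies: tftp_only)
begin
  fenced.power
rescue BmcProxyNotFound => e
  puts e.message # prints "Unable to find a proxy with BMC feature"
end
```

In this model, a host without credentials simply gets nil from power, which is why deployments worked before fencing was configured.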

On top of this, the Staypuft-specific orchestration code uses the power manager returned by @host.power if one exists; otherwise it uses the discovery image proxy to reboot.

In this particular case, configuring fencing just says "we want BMC configured post-provisioning for use with Pacemaker fencing", but this shouldn't affect how we reboot the host for initial provisioning, so I've modified the "reboot for deployment" code to ignore BMC power management if the host is still in the Discovery environment.
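A hypothetical sketch of the reboot decision before and after this change. DeployHost, the "discovery" environment name and the two reboot_method_* functions are illustrative, not Staypuft's actual orchestration code; they only report which reboot path would be taken.

```ruby
# Hypothetical sketch of the "reboot for deployment" choice.
# DeployHost and the method names are illustrative, not Staypuft's code.

DISCOVERY_ENVIRONMENT = "discovery" # assumed name of the discovery environment

# bmc_power: true when host.power would return a BMC power manager
DeployHost = Struct.new(:name, :environment, :bmc_power) do
  def in_discovery?
    environment == DISCOVERY_ENVIRONMENT
  end
end

# Before: use the host's power manager whenever one exists,
# otherwise fall back to the discovery image's foreman-proxy.
def reboot_method_before_fix(host)
  host.bmc_power ? :bmc : :discovery_proxy
end

# After: hosts still in the discovery environment ignore BMC power
# management and are always rebooted through the discovery proxy.
def reboot_method_after_fix(host)
  return :discovery_proxy if host.in_discovery?

  reboot_method_before_fix(host)
end

fenced = DeployHost.new("node1", DISCOVERY_ENVIRONMENT, true)
p reboot_method_before_fix(fenced) # prints :bmc
p reboot_method_after_fix(fenced)  # prints :discovery_proxy
```

Hosts that have already left the discovery environment still take the BMC path in this sketch, which mirrors the limitation noted in the next paragraph.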

We're still leaving the Foreman BMC proxy unconfigured, so this won't help in cases where we want to power-cycle a host that's no longer in the discovery environment, but I don't think that case is relevant to the current problem.

Comment 16 Alexander Chuzhoy 2014-10-30 18:24:43 UTC
Verified: FailedQA

Environment:
rhel-osp-installer-0.4.6-1.el6ost.noarch
ruby193-rubygem-staypuft-0.4.12-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-foreman-installer-2.0.32-1.el6ost.noarch
openstack-puppet-modules-2014.1-24.el6ost.noarch


Clicking "Deploy" after fencing was configured:
the deployment gets stuck at 5% and then fails after some time.

The Errors tab of the deployment shows the following exception:
Foreman::Exception: ERF42-1518 [Foreman::Exception]: Unable to find a proxy with BMC feature

Comment 18 Mike Burns 2014-10-30 19:30:42 UTC
https://github.com/theforeman/staypuft/pull/370

Comment 20 Alexander Chuzhoy 2014-10-31 15:31:57 UTC
Verified: FailedQA

rhel-osp-installer-0.4.7-1.el6ost.noarch
ruby193-rubygem-staypuft-0.4.13-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-foreman-installer-2.0.32-1.el6ost.noarch
openstack-puppet-modules-2014.1-24.el6ost.noarch


Clicking "Deploy" after fencing was configured:
the deployment gets stuck at 5% and then fails after some time.

The Errors tab of the deployment shows the following exception:

Foreman::Exception: ERF42-1518 [Foreman::Exception]: Unable to find a proxy with BMC feature

Comment 21 Alexander Chuzhoy 2014-10-31 22:19:22 UTC
Created attachment 952620
logs from the staypuft machine.

Comment 22 Alexander Chuzhoy 2014-10-31 22:24:04 UTC
The same issue reproduces without checking the "Enable Fencing" option:
I just selected the fencing type (IPMI) and entered the username/password.
Logs are attached in comment #21.

Comment 23 Scott Seago 2014-11-01 04:09:55 UTC
Ahh, I see what's going on now. The last commit removed the call that used BMC power management, but it left some of the setup code intact (i.e. determining the power management situation for the host). The logs show it failing in this no-longer-needed code. I'm going to submit another patch that removes it.
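A hypothetical sketch of why the leftover setup code still broke deployments: a lookup whose result is never used can still raise. DiscoveredHost, MissingBmcProxy and both reboot functions are illustrative stand-ins, not Staypuft's real code, and the assumption that merely determining the power manager raises when no BMC proxy exists is taken from the failure described above rather than from the actual source.

```ruby
# Hypothetical sketch: an unused "setup" step still raised.
# All names are illustrative stand-ins, not Staypuft's orchestration code.

class MissingBmcProxy < StandardError; end

class DiscoveredHost
  attr_reader :name

  def initialize(name, bmc_credentials:)
    @name = name
    @bmc_credentials = bmc_credentials
  end

  # Assumption: with BMC credentials entered but no proxy offering the BMC
  # feature, merely determining the power manager raises.
  def power
    return nil unless @bmc_credentials

    raise MissingBmcProxy, "Unable to find a proxy with BMC feature"
  end

  def reboot_via_discovery_proxy
    "#{name}: reboot requested through the discovery image's foreman-proxy"
  end
end

# Previous patch: the BMC reboot call is gone, but the power management
# situation is still determined first, and that lookup raises.
def reboot_for_deployment_with_leftover_setup(host)
  host.power # leftover setup; the result is no longer used
  host.reboot_via_discovery_proxy
end

# Follow-up patch: the unused setup is removed, so the lookup never runs.
def reboot_for_deployment(host)
  host.reboot_via_discovery_proxy
end

host = DiscoveredHost.new("node1", bmc_credentials: true)
begin
  reboot_for_deployment_with_leftover_setup(host)
rescue MissingBmcProxy => e
  puts "leftover setup: #{e.message}"
end
puts reboot_for_deployment(host)
```

In this model only the BMC credentials matter, not whether fencing is enabled, which is consistent with the reproduction in comment #22.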

Comment 24 Scott Seago 2014-11-01 04:15:24 UTC
https://github.com/theforeman/staypuft/pull/371

I've manually applied this change in Sasha's test environment so it can be confirmed on Sunday or Monday before pushing the fix.

Comment 25 Mike Burns 2014-11-01 12:42:05 UTC
For confirmation, I applied this patch in my environment: hosts with BMC configured rebooted successfully. With the patch reverted, the hosts did not reboot.

Comment 27 Leonid Natapov 2014-11-02 14:14:38 UTC
ruby193-rubygem-staypuft-0.4.14-1.el6ost

I was able to deploy HA+Neutron with fencing defined. Fencing was properly configured:

 stonith-ipmilan-10.35.160.174	(stonith:fence_ipmilan):	Started mac848f69fbc493.example.com 
 stonith-ipmilan-10.35.160.172	(stonith:fence_ipmilan):	Started macf04da2732fb1.example.com 
 stonith-ipmilan-10.35.160.192	(stonith:fence_ipmilan):	Started macf04da2732fb1.example.com

Comment 29 errata-xmlrpc 2014-11-04 17:03:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1800.html

