Description of problem: deployment failed at 5% after configuring the controllers to use fencing. nothing have been ran on the server after I pressed deploy. the bar turned red immediately and stuck at 5%. Version-Release number of selected component (if applicable): rhel-osp-installer-0.4.5-1.el6ost.noarch How reproducible: 4/4 Steps to Reproduce: 1.create an ha deployment and assign 3 controllers 2.configure the networks of the controllers 3.enable fencing and configure it 4. press deploy Actual results: deployment stuck at 5% Expected results: deployment should running normally Additional info: setup information: rhel 6.6 staypuft server on bm 4 bare metal server: 3 controllers and one compute
Please provide me with sosreport from the Foreman node and this output from the discovered node (use ssh and kernel parameters to get access): echo "* JOURNAL (last 500 lines) *" journalctl | tail -n500 echo "* PROXY LOG *" tail -n 200 /var/log/foreman-proxy/proxy.log echo "* KERNEL COMMAND LINE *" cat /proc/cmdline echo "* FACTER *" FACTERLIB=/usr/share/fdi/facts/ facter echo "* IP ADDRESSES *" ip addr show echo "* ROUTES *" ip route echo "* DHCP LEASES *" cat /var/lib/NetworkManager/dhclient-*.lease echo "* DNS *" cat /etc/resolv.conf
Here are new instructions how to report bugs for discovery image v1.x: echo "* KERNEL LOG *" dmesg tail -n 200 /var/log/messages echo "* PROXY LOG *" tail -n 200 /var/log/foreman-proxy/proxy.log echo "* KERNEL COMMAND LINE *" cat /proc/cmdline echo "* FACTER *" FACTERLIB=/usr/share/ovirt-node-plugin-foreman facter echo "* IP ADDRESSES *" ip addr show echo "* ROUTES *" ip route echo "* DHCP LEASES *" cat /var/lib/dhclient/*.lease echo "* DNS *" cat /etc/resolv.conf
Created attachment 950027 [details] The requested output from staypuft.
Reproduced in another BM setup. I was able to run a deployment, but after configuring the fencing for node and starting another deployment - it got stuck with error. Foreman::Exception: ERF42-1518 [Foreman::Exception]: No se ha encontrado ningún proxy con la característica BCM Checking the console of the nodes - they didn't even reboot.
After configuring the fencing, going to the BMC tab of the host - there's the following error: Failure: ERF42-1518 [Foreman::Exception]: Unable to find a proxy with BMC feature
Added few more folks to CC. If they do not know right away, I am afraid I can't help you right now. We need to reproduce on staypuft setup, investigate the database and find a workaround.
As per IRC discussion this is something in the staypuft code, I've verified that Foreman nightly with discovery plugin works fine when BMC NIC is configured. Hint: https://gist.github.com/lzap/2e0bd9e5fb26243dd994 You should see some info logs in the production.log.
Possible fix https://github.com/theforeman/staypuft/pull/369
Essentially the fact that in the fencing UI we enter username and password on the bmc nic is enough for foreman to consider the host bmc_enabled, and thus host.power returns an instance of PowerManager::BMC (which attempts to use the foreman proxy BMC feature to reboot the host). ON top of this, the staypuft-specific orchestration code uses the power manager returned by @host.power if one exists, otherwise it uses the discovery image proxy to reboot. In this particular case, configuring fencing just says "we want bmc configured post-provisioning for use with pacemaker fencing", but this shouldn't affect how we reboot the host for initial provisioning, so I've modified the "reboot for deployment" code to ignore BMC power management if the host is still in the Discovery environment. We're still leaving the Foreman BMC proxy unconfigured, so this won't help for cases where we want to power-cycle a host that's no longer in the discovery env, but I think that is not a case relevant to the current problem.
Verified: FailedQA Environment: rhel-osp-installer-0.4.6-1.el6ost.noarch ruby193-rubygem-staypuft-0.4.12-1.el6ost.noarch ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch openstack-foreman-installer-2.0.32-1.el6ost.noarch openstack-puppet-modules-2014.1-24.el6ost.noarch Clicking on "deploy", after the fencing was configured: The deployment gets stuck on 5% and then after some time it fails. Checking the errors tab of the deployment, get the following exception: Foreman::Exception: ERF42-1518 [Foreman::Exception]: Impossibile trovare un proxy con una funzione BMC
https://github.com/theforeman/staypuft/pull/370
Verified: FailedQA rhel-osp-installer-0.4.7-1.el6ost.noarch ruby193-rubygem-staypuft-0.4.13-1.el6ost.noarch ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch openstack-foreman-installer-2.0.32-1.el6ost.noarch openstack-puppet-modules-2014.1-24.el6ost.noarch Clicking on "deploy", after the fencing was configured: The deployment gets stuck on 5% and then after some time it fails. Checking the errors tab of the deployment, get the following exception: Foreman::Exception: ERF42-1518 [Foreman::Exception]: BMC 기능이 있는 프록시를 찾을 수 없습니다
Created attachment 952620 [details] logs from the staypuft machine.
The same issue reproduced without checking the "Enable Fencing" button. Just selected the Type of the fencing (IPMI), entered the user/pass. Logs are attached in comment #21
Ahh, I see what's going on now. The last commit removed the call that used the BMC power management, but it left some of the setup code intact (i.e. determining the power management situation for the host). The logs show it failing in this (no longer needed) code. I'm going to submit another patch that removes this no-longer-used code here.
https://github.com/theforeman/staypuft/pull/371 I've manually made this change in sasha's test env so this can be confirmed Sunday or Monday prior to pushing this fix
For confirmation, I applied this patch in my environment. Hosts with BMC configured would reboot successfully. Reverted the patch and hosts would not reboot successfully.
ruby193-rubygem-staypuft-0.4.14-1.el6ost Was able to deploy HA+Neutron with fencing defined. Fencing was properly configured stonith-ipmilan-10.35.160.174 (stonith:fence_ipmilan): Started mac848f69fbc493.example.com stonith-ipmilan-10.35.160.172 (stonith:fence_ipmilan): Started macf04da2732fb1.example.com stonith-ipmilan-10.35.160.192 (stonith:fence_ipmilan): Started macf04da2732fb1.example.com
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2014-1800.html