Description: sefi litmanovich
2014-01-16 14:45:13 UTC
Created attachment 851125
engine log
Description of problem:
This issue is either a problem with Jenkins host provisioning or with the SSH authentication feature. So far it has occurred only with Puma hosts on hostbroker, during automated testing of soft fencing.
Test case: stop the vdsmd service on an installed host in state Up to invoke soft fencing. This test was working fine less than a month ago:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/RhevmCore/view/3.3/job/3.3-git-rhevmCore-infra_soft_fencing-rest/11/
It recently stopped working with the following error (see the full engine log attached):
2014-01-08 15:47:35,960 ERROR [org.ovirt.engine.core.bll.SshSoftFencingCommand] (pool-5-thread-48) [9dbec40] SSH connection to host puma19.scl.lab.tlv.redhat.com failed: javax.naming.AuthenticationException: SSH authentication to 'root.lab.tlv.redhat.com' failed. Please verify provided credentials. Make sure key is authorized at host: javax.naming.AuthenticationException: SSH authentication to 'root.lab.tlv.redhat.com' failed. Please verify provided credentials. Make sure key is authorized at host.
1. After the test failed, I reserved the host and installed it on my engine; the installation worked. I then reproduced the test case manually and got the same issue. After that, I removed the host, installed it again, and tried re-installing it with the SSH authentication method, with the same result.
2. I made sure that the ovirt-engine entry in the host's .ssh/authorized_keys is the same as the one seen on the engine.
3. I tried to reproduce this issue on one of my own hosts and did not get the same result. I then compared the ovirt-engine entry in .ssh/authorized_keys on my host and on the puma host, and they were identical.
4. When I checked with Lukas Bednar whether this might be a plugin issue, he suggested it might be caused by the Puppet plugin, which cleans SSH keys from hostbroker hosts. He then disabled this option in the plugin, but the issue occurred again.
5. I ran the command ssh -i /etc/pki/ovirt-engine/keys/engine_id_rsa root@host from the engine to the puma host, and it worked, so the key itself seems to be fine.
I have run out of ideas as to whether this is a problem with the SSH key or with the hosts in hostbroker.
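Steps 2 and 3 compare authorized_keys entries by eye. A minimal sketch of doing that comparison programmatically, assuming you have the engine's public key line and the host's authorized_keys contents as text (the function names here are hypothetical, not part of oVirt): sshd matches on the key type and base64 key blob, so this ignores any leading options and the trailing comment such as "ovirt-engine".

```python
import base64


def parse_key(line):
    """Return (key_type, base64_blob) from an authorized_keys or .pub line, or None.

    authorized_keys lines may begin with options (e.g. no-pty), so scan for the
    first field that looks like a key type followed by valid base64.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    for i in range(len(fields) - 1):
        key_type, blob = fields[i], fields[i + 1]
        if key_type.startswith(("ssh-", "ecdsa-")):
            try:
                base64.b64decode(blob, validate=True)
            except ValueError:  # binascii.Error is a subclass of ValueError
                continue
            return key_type, blob
    return None


def key_is_authorized(public_key_line, authorized_keys_text):
    """True if the public key appears in authorized_keys, ignoring options and comments."""
    wanted = parse_key(public_key_line)
    if wanted is None:
        raise ValueError("not a recognizable public key line")
    return any(parse_key(line) == wanted for line in authorized_keys_text.splitlines())
```

Comparing only the type and blob avoids false mismatches when the comment differs between copies, while still catching a truncated or replaced key.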
How reproducible:
This does not reproduce manually (at least so far). It is reproducible when building this Jenkins job (test case name: SoftFencingPassedWithoutPM):
http://jenkins.qa.lab.tlv.redhat.com:8080/view/RhevmCore/view/3.3/job/3.3-git-rhevmCore-infra_soft_fencing-rest/
Actual results:
The engine is unable to connect to the host and soft fence it; the host becomes non-responsive.
Expected results:
The engine connects to the host via SSH and performs soft fencing; the host state returns to Up.
Additional info:
The test was passing on IS29 but stopped working on IS30.
You need to stop the automation when this happens so we can see it in practice.
There were similar reports in the past; all turned out to be false alarms, as some component on the host had modified the authorized keys.
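One way to check for that kind of modification from the automation side is to record a digest of root's authorized_keys right after host installation and compare it when soft fencing fails. A minimal sketch, assuming the default sshd location (/root/.ssh/authorized_keys for root) and hypothetical helper names; a changed digest only shows that the file changed, so keeping a copy of the baseline file for diffing would also help.

```python
import hashlib
from pathlib import Path
from typing import Optional


def digest(data: bytes) -> str:
    """SHA-256 hex digest of raw file contents."""
    return hashlib.sha256(data).hexdigest()


def snapshot(path) -> Optional[str]:
    """Digest of the file at path, or None if the file does not exist."""
    try:
        return digest(Path(path).read_bytes())
    except FileNotFoundError:
        return None


def keys_changed(path, baseline: Optional[str]) -> bool:
    """True if the file's current digest differs from the recorded baseline."""
    return snapshot(path) != baseline
```

In a test run this would mean calling snapshot("/root/.ssh/authorized_keys") on the host after installation, storing the result, and calling keys_changed with that value before stopping the automation on a soft fencing failure.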