Bug 927162

Summary: bootstrap: incorrect progress to next state if phase1 of vdsm-bootstrap fails
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED CURRENTRELEASE QA Contact: Pavel Stehlik <pstehlik>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.3 CC: acathrow, dyasny, hateya, iheim, lpeer, mgoldboi, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: sf2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 875528    
Bug Blocks:    
Attachments:
Description Flags
log none

Description Dafna Ron 2013-03-25 09:08:30 UTC
Created attachment 715924
log

Description of problem:

This has happened to me more than once now: when I try to install two hosts at the same time, they fail at 'Downloading certificate request from Host'.
After I reinstall them, the install succeeds.
This is in 3.1.3.

Version-Release number of selected component (if applicable):

3.1.3

How reproducible:

100%

Steps to Reproduce:
1. Install two hosts at the same time.
2. After the install fails at 'Downloading certificate request from Host', reinstall the hosts.
  
Actual results:

The install fails at the step 'Downloading certificate request from Host' when more than one host is installed at the same time.
If we reinstall the host, the step succeeds and the install continues.

Expected results:

The install should not fail.

Additional info: log

Comment 1 Alon Bar-Lev 2013-03-27 14:24:25 UTC
As far as I can see, these are all timeout-related issues.

2013-03-25 10:50:22,028 ERROR [org.ovirt.engine.core.utils.hostinstall.VdsInstallerSSH] (pool-3-thread-46) SSH error running command cougar01.scl.lab.tlv.redhat.com:'umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar -C "${MYTMP}" --no-same-permissions -o -x && "${MYTMP}"/setup -c 'ssl=true;management_port=54321' -O 'RedHat' -t 2013-03-25T08:40:22 -f /tmp/firewall.conf.6d69c622-a4e6-4a50-ac4e-5f33765c342f -S /tmp/ovirt-id_rsa_6d69c622-a4e6-4a50-ac4e-5f33765c342f -p 80 -b  -B rhevm  http://dafna-31.scl.lab.tlv.redhat.com:80/Components/vds/ http://dafna-31.scl.lab.tlv.redhat.com:80/Components/vds/ cougar01.scl.lab.tlv.redhat.com 6d69c622-a4e6-4a50-ac4e-5f33765c342f False': javax.naming.TimeLimitExceededException: SSH session hard timeout host 'cougar01.scl.lab.tlv.redhat.com:22'
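For triage, a log entry like the one above can be checked for the timeout marker mechanically. The following is only a minimal sketch: the regular expression is an assumption inferred from the shape of this single entry, and the names LOG_RE, TIMEOUT_MARKER and classify are hypothetical, not part of ovirt-engine.

```python
import re

# Assumed layout of an engine log entry, inferred from the single entry above:
# timestamp, level, [logger], (thread), message.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\] \((?P<thread>[^)]+)\) (?P<msg>.*)$",
    re.DOTALL,
)

# Marker text copied from the log entry in this comment.
TIMEOUT_MARKER = "SSH session hard timeout"


def classify(line):
    """Return (level, logger, is_timeout) for one log entry, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return m.group("level"), m.group("logger"), TIMEOUT_MARKER in m.group("msg")
```

Run against the entry above, classify would report an ERROR from VdsInstallerSSH with the timeout flag set; entries that do not follow the assumed layout yield None.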

The problem is that even after this timeout the engine still tries to download the certificate request.

I see where the error is in the code. Is it worth fixing? This whole code is dead in the next version.
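The behaviour described in this comment (a failed bootstrap phase must not be followed by the certificate-request download) can be illustrated with a small sketch. This is not the engine's code: the phase names other than the certificate-request step, the PhaseError exception and the run_install function are all hypothetical, chosen only to show the guard.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names; only the certificate-request step is named in this report.
    RUN_BOOTSTRAP_PHASE1 = 1
    DOWNLOAD_CERTIFICATE_REQUEST = 2
    DONE = 3
    FAILED = 4


class PhaseError(Exception):
    """Raised by a phase action that did not complete (for example, an SSH timeout)."""


def run_install(actions):
    """Run the phase actions in order and stop at the first failure.

    `actions` maps a Phase to a zero-argument callable. Returns the final
    Phase and the list of phases that were actually attempted.
    """
    attempted = []
    for phase in (Phase.RUN_BOOTSTRAP_PHASE1, Phase.DOWNLOAD_CERTIFICATE_REQUEST):
        attempted.append(phase)
        try:
            actions[phase]()
        except PhaseError:
            # Do not advance: a failed phase ends the flow here.
            return Phase.FAILED, attempted
    return Phase.DONE, attempted
```

With this guard, a timeout in the first phase returns Phase.FAILED and the certificate-request action is never called, which is the progression the bug summary asks for.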

Comment 8 Itamar Heim 2013-06-11 08:41:57 UTC
3.2 has been released
