Bug 701445

Summary: Oz 5 second timeout can cause EC2 build failure in some cases
Product: [Retired] CloudForms Cloud Engine
Reporter: Justin Clift <jclift>
Component: imagefactory
Assignee: Chris Lalancette <clalance>
Status: CLOSED CURRENTRELEASE
QA Contact: wes hayutin <whayutin>
Severity: unspecified
Priority: unspecified
Version: 0.3.1
CC: clalance, dajohnso, deltacloud-maint, kwade, ssachdev
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Doc Type: Bug Fix
Bug Blocks: 684278, 697919

Description Justin Clift 2011-05-02 21:07:48 UTC
Description of problem:

  Oz has (at time of writing) a hard-coded 5 second connection timeout for ssh connections.

  This can occasionally cause failures when pushing an image to EC2 (i.e. during periods of general network slowness).


Workaround:

  An effective workaround is to raise the hard-coded 5 second timeout to a higher value.

  The value appears in two places in the Oz RedHat.py source file:

    /usr/lib/python2.7/site-packages/oz/RedHat.py

  Searching the file for "ConnectTimeout=" will locate both.  Make the desired change
  (e.g. from 5 to 120), then restart the Image Factory service for it to take effect:

  $ sudo service imagefactory restart
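
  The manual edit above could also be scripted. The following is only a minimal sketch: the helper name bump_connect_timeout and the sample lines are hypothetical, and the real code surrounding "ConnectTimeout=" in RedHat.py is not shown in this report, so the sample is a stand-in rather than a copy of Oz source.

```python
import re
from typing import Tuple

# Matches "ConnectTimeout=<digits>", the string the workaround searches for.
_TIMEOUT_RE = re.compile(r"(ConnectTimeout=)\d+")


def bump_connect_timeout(source: str, seconds: int) -> Tuple[str, int]:
    """Return (patched_source, number_of_replacements).

    Hypothetical helper: replaces every ConnectTimeout=<n> in the given
    text with ConnectTimeout=<seconds>.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # A callable replacement avoids ambiguity between the group
    # reference and the digits that follow it.
    return _TIMEOUT_RE.subn(lambda m: f"{m.group(1)}{seconds}", source)


if __name__ == "__main__":
    # Stand-in text only; NOT the actual contents of RedHat.py.
    sample = (
        'ssh -o ConnectTimeout=5 root@guest "true"\n'
        'scp -o ConnectTimeout=5 file root@guest:/tmp\n'
    )
    patched, count = bump_connect_timeout(sample, 120)
    print(count)
    print(patched, end="")
```

  To apply this to the real file, one would read RedHat.py, check that the replacement count is 2 (the number of occurrences mentioned above), keep a backup copy, and write the patched text back. The Image Factory restart is still needed afterwards.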

Comment 1 Chris Lalancette 2011-05-03 13:49:50 UTC
On Oz git HEAD, I've increased the default timeout to 10 seconds.  Unfortunately I haven't yet made it configurable, since I haven't figured out a good way to expose it to the user.  That's a future work item.

Chris Lalancette

Comment 2 wes hayutin 2011-06-16 19:01:04 UTC
Is this issue ready for QE?

Comment 3 Chris Lalancette 2011-06-16 19:53:07 UTC
Not yet.  I don't think the change made it into a package yet; once we update the oz package to 0.4.0 (or 0.5.0, once it is out), it should be ready to test.

Chris Lalancette

Comment 4 Chris Lalancette 2011-06-23 14:19:12 UTC
We updated the testing repository to oz 0.4.0, so we should be good to test this out now.

Chris Lalancette

Comment 5 wes hayutin 2011-09-28 14:51:11 UTC
Built and started EC2 and vSphere instances. Have not hit this issue in a very long time.
Verified.


[root@unused ~]# rpm -qa | grep aeolus
aeolus-conductor-0.4.0-0.20110926211730git1cc372b.fc15.noarch
aeolus-conductor-daemons-0.4.0-0.20110926211730git1cc372b.fc15.noarch
aeolus-all-0.4.0-0.20110926211730git1cc372b.fc15.noarch
aeolus-conductor-doc-0.4.0-0.20110926211730git1cc372b.fc15.noarch
aeolus-conductor-devel-0.4.0-0.20110926211730git1cc372b.fc15.noarch
rubygem-aeolus-image-0.1.0-3.20110919115936gitd1d24b4.fc15.noarch
aeolus-configure-2.0.2-4.20110926142838git5044e56.fc15.noarch

Comment 7 wes hayutin 2011-12-08 14:03:28 UTC
closing out old bugs

Comment 8 wes hayutin 2011-12-08 14:15:13 UTC
perm close