Bug 800251

Summary: Instance launched with http service (through configserver) doesn't allow ssh into the ec2 instance
Product: [Retired] CloudForms Cloud Engine
Reporter: Shveta <ssachdev>
Component: aeolus-audrey-agent
Assignee: Dan Radez <dradez>
Status: CLOSED ERRATA
QA Contact: wes hayutin <whayutin>
Severity: medium
Priority: high
Version: 1.0.0
CC: akarol, cpelland, deltacloud-maint, dgao, dradez, hbrock, jrd, redakkan, ssachdev, whayutin
Target Milestone: beta5
Keywords: Reopened, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2012-05-15 18:44:48 UTC
Attachments:
Description         Flags
template            none
with_services       none
without_services    none

Description Shveta 2012-03-06 06:43:57 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Added a configserver for ec2
2. Launched an instance with it, downloaded the key and tried to ssh: successful
3. Then I edited the XML of the Blueprint and added services:

<?xml version="1.0"?>
<deployable version="1.0" name="RHEL6_2 configserver">
  <description/>
  <assemblies>
    <assembly hwp="small" name="RHEL6-2-configserver">
      <image id="d4f331b2-674f-11e1-807d-00215e203092"/>
      <services>
        <service name="http">
          <executable url="http://radez.fedorapeople.org/wordpress-http.sh"/>
          <parameters>
            <parameter name="wp_name" type="scalar">
              <value>wordpress</value>
            </parameter>
            <parameter name="wp_user" type="scalar">
              <value>wordpress</value>
            </parameter>
            <parameter name="wp_pw" type="scalar">
              <value>wordpress</value>
            </parameter>
            <parameter name="mysql_ip" type="scalar">
              <reference assembly="mysql" parameter="ipaddress"/>
            </parameter>
            <parameter name="mysql_hostname" type="scalar">
              <reference assembly="mysql" parameter="hostname"/>
            </parameter>
            <parameter name="mysql_dbup" type="scalar">
              <reference assembly="mysql" parameter="dbup"/>
            </parameter>
          </parameters>
        </service>
      </services>
      <returns>
        <return name="hostname"/>
        <return name="ipaddress"/>
      </returns>
    </assembly>
  </assemblies>
</deployable>


4. Launched an instance, downloaded the key and tried to ssh; I could not ssh in.
ssh: connect to host ec2-107-22-97-254.compute-1.amazonaws.com port 22: Connection refused

=============================================================

Used this Audrey template to build the image:

<template>
  <name>RHEL6_2 configserver</name>
  <os>
    <name>RHEL-6</name>
    <version>2</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://download.devel.redhat.com/released/RHEL-6/6.2/Server/x86_64/os/</url>
    </install>
    <rootpw>dog8code</rootpw>
  </os>
  <repositories>
    <repository name="rhel">
      <url>http://download.devel.redhat.com/released/RHEL-6/6.2/Server/x86_64/os/</url>
    </repository>
    <repository name="aeolus">
      <url>http://repos.fedorapeople.org/repos/aeolus/conductor/testing/6Server/x86_64/</url>
    </repository>
  </repositories>
  <packages>
    <package name="aeolus-audrey-agent"/>
  </packages>
  <description>RHEL 6.2 w/ Audrey Client</description>
</template>
  
Actual results:


Expected results:


Additional info:


rpm -qa|grep aeolus
aeolus-conductor-daemons-0.8.0-40.el6.noarch
aeolus-configure-2.5.0-17.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-0.8.0-40.el6.noarch
rubygem-aeolus-cli-0.3.0-12.el6.noarch
aeolus-all-0.8.0-40.el6.noarch
aeolus-conductor-doc-0.8.0-40.el6.noarch

Comment 1 dgao 2012-03-06 16:30:47 UTC
Just a note, I can only reproduce this bug going from one service to multiple services. If going from multiple services to more services, ssh works fine.

Comment 2 dgao 2012-03-08 17:40:24 UTC
After some initial testing, this looks like a timing issue. There are a few observed stages after an instance launch is initiated in conductor:

1) The instance is powered on but not completely up (few or no SysV services have started). At this stage, both the EC2 console and conductor report the instance as "running". This is probably a little premature.
2) The instance is fully up and all of the SysV services have started. EC2 marks this as "2/2 status check complete".

Any ssh attempt after stage #1 will result in a hang, but an ssh attempt after stage #2 should work fine.

However, we have observed cases where EC2 reports "2/2 status check complete" yet we get:

ssh: connect to host ec2-184-72-93-44.compute-1.amazonaws.com port 22: Connection refused

This error is consistent with what Shveta ran into in the description. After waiting a few more minutes, further ssh attempts work properly.

So overall, I would recommend that the conductor and deltacloud devs make a change so that conductor does not flip the status to running until it receives a "2/2 status check complete" from EC2.
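
For anyone scripting around this in the meantime, here is a minimal sketch of waiting until sshd actually accepts connections instead of trusting the "running" state. It does not query EC2 status checks at all; it only probes the TCP port. The function name `wait_for_port` and its parameters are hypothetical and not part of conductor or deltacloud.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0, connect_timeout=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Covers "Connection refused" as well as connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Usage would be something like `wait_for_port("ec2-184-72-93-44.compute-1.amazonaws.com")` before attempting ssh. Note that a successful TCP connect only shows that the port is open, not that key-based login will succeed.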

Comment 3 wes hayutin 2012-03-08 17:45:32 UTC
Waiting for ec2 instances to be ready for ssh access is a known issue w/ ec2 itself. We're simply waiting for the ssh daemon to launch even though the various ec2 tools report it's running.

The defect is w/ ec2.

Comment 4 Shveta 2012-03-12 14:00:55 UTC
This is seen with vsphere also; re-opening the bug.

Comment 5 Rehana 2012-03-12 14:20:47 UTC
Steps to reproduce on vsphere:

1. Added configserver for vsphere
2. Built and pushed an Audrey-agent-enabled template to vsphere (see attachment: template)
3. Created two deployables, one with the "wordpress" services and the other without those services (see attachments: with_services, without_services)

Actual behaviour:

Observed that I was unable to ssh to the instance that has the services:

ssh root@10.10.77.107
ssh: connect to host 10.10.77.107 port 22: Connection refused

whereas I was able to ssh to the machine that doesn't have the services:

ssh root@10.10.77.117
The authenticity of host '10.10.77.117 (10.10.77.117)' can't be established.
RSA key fingerprint is 87:9d:e8:c6:51:b7:96:6d:a5:3f:eb:1a:e9:b8:8e:c4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.77.117' (RSA) to the list of known hosts.
root@10.10.77.117's password: 
[root@dhcp77-117 ~]# hostname
dhcp77-117.rhndev.redhat.com

Comment 6 Rehana 2012-03-12 14:21:28 UTC
Created attachment 569417 [details]
template

Comment 7 Rehana 2012-03-12 14:22:11 UTC
Created attachment 569418 [details]
with_services

Comment 8 Rehana 2012-03-12 14:22:41 UTC
Created attachment 569419 [details]
without_services

Comment 9 wes hayutin 2012-03-13 15:23:52 UTC
I tried this scenario and it works for me. Closing as not a bug.

Comment 10 wes hayutin 2012-03-13 15:36:55 UTC
Sorry, this is an issue.
The IP is available, but you cannot ssh in.

This is a bug; tracking it down.

Comment 11 wes hayutin 2012-03-13 15:47:49 UTC
Root cause here:

[root@dhcp77-144 rc.d]# cd rc3.d/
[root@dhcp77-144 rc3.d]# ls
K01smartd       K89rdisc         S13cpuspeed    S26udev-post  S90crond
K10psacct       S01sysstat       S13irqbalance  S50vmtoolsd   S95atd
K10saslauthd    S02lvm2-monitor  S15mdmonitor   S55audrey     S97rhnsd
K50netconsole   S08ip6tables     S20kdump       S55sshd       S97rhsmcertd
K74ntpd         S08iptables      S22messagebus  S80postfix    S99local
K75ntpdate      S10network       S25netfs       S82abrt-ccpp
K75quota_nld    S11auditd        S26acpid       S82abrtd
K87restorecond  S12rsyslog       S26haldaemon   S82abrt-oops
[root@dhcp77-144 rc3.d]#

Comment 12 wes hayutin 2012-03-13 15:49:47 UTC
S55audrey and S55sshd have the same start value. sshd needs to start before audrey.
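
To make the collision visible, here is a small sketch that groups the rc3.d start links from comment 11 by priority. It assumes the init system runs the S* links in sorted name order, so that within one priority the alphabetically earlier name runs first; the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Link names copied from the rc3.d listing in comment 11.
RC3_ENTRIES = """
K01smartd K89rdisc S13cpuspeed S26udev-post S90crond
K10psacct S01sysstat S13irqbalance S50vmtoolsd S95atd
K10saslauthd S02lvm2-monitor S15mdmonitor S55audrey S97rhnsd
K50netconsole S08ip6tables S20kdump S55sshd S97rhsmcertd
K74ntpd S08iptables S22messagebus S80postfix S99local
K75ntpdate S10network S25netfs S82abrt-ccpp
K75quota_nld S11auditd S26acpid S82abrtd
K87restorecond S12rsyslog S26haldaemon S82abrt-oops
""".split()


def start_order(entries):
    """Return the start (S*) links in sorted name order (assumed run order)."""
    return sorted(e for e in entries if e.startswith("S"))


def priority_collisions(entries):
    """Map each start priority shared by several services to their names."""
    groups = defaultdict(list)
    for link in start_order(entries):
        match = re.match(r"S(\d{2})(.+)", link)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return {prio: names for prio, names in groups.items() if len(names) > 1}


order = start_order(RC3_ENTRIES)
collisions = priority_collisions(RC3_ENTRIES)
# audrey sorts ahead of sshd at priority 55, so it would be started first.
print(collisions[55], order.index("S55audrey") < order.index("S55sshd"))
```

Under that ordering assumption, audrey is started before sshd, which matches the observed "Connection refused" while the services are still being configured.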

Comment 13 Greg Blomquist 2012-03-13 15:54:14 UTC
Thanks for tracking this down, Wes.

Assigning to Dan.

Comment 14 Dan Radez 2012-03-13 16:41:10 UTC
https://brewweb.devel.redhat.com/taskinfo?taskID=4147874
https://brewweb.devel.redhat.com/taskinfo?taskID=4147881

package version aeolus-audrey-agent-0.4.4-4

Comment 17 dgao 2012-03-14 18:41:23 UTC
[root@dhcp77-109 ~]# rpm -qa | grep "audrey"
aeolus-audrey-agent-0.4.4-4.el6.noarch
[root@dhcp77-109 ~]# head /etc/init.d/audrey 
#! /bin/sh
#
# chkconfig: 345 99 55
# description: The audrey agent.
# processname: audrey

# Source function library.
. /etc/init.d/functions

# Check that networking is up.
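
A quick way to check the new header programmatically is sketched below. The `# chkconfig:` line lists the runlevels, then the start priority, then the stop priority. The helper name `parse_chkconfig` is hypothetical, and the sshd start priority of 55 is taken from the S55sshd link in comment 11.

```python
import re

# "# chkconfig: <runlevels> <start priority> <stop priority>"
CHKCONFIG_RE = re.compile(r"^#\s*chkconfig:\s*(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_chkconfig(script_text):
    """Return (runlevels, start, stop) from an init script header, or None."""
    for line in script_text.splitlines():
        match = CHKCONFIG_RE.match(line)
        if match:
            return match.group(1), int(match.group(2)), int(match.group(3))
    return None


# First lines of /etc/init.d/audrey as shown in the output above.
AUDREY_HEADER = """#! /bin/sh
#
# chkconfig: 345 99 55
# description: The audrey agent.
# processname: audrey
"""

SSHD_START = 55  # assumption: unchanged from the S55sshd link in comment 11

runlevels, start, stop = parse_chkconfig(AUDREY_HEADER)
# With start priority 99, audrey is ordered after sshd.
print(runlevels, start, stop, start > SSHD_START)
```

With aeolus-audrey-agent-0.4.4-4 the start priority is 99, so audrey is ordered after sshd rather than sharing priority 55 with it.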

Comment 18 errata-xmlrpc 2012-05-15 18:44:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0669.html