Bug 816104 - Unable to ssh to the wordpress Apache machine
Status: CLOSED CURRENTRELEASE
Product: CloudForms Cloud Engine
Classification: Red Hat
Component: aeolus-audrey-agent
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Dan Radez
QA Contact: Rehana
Keywords: ZStream
Depends On:
Blocks: 824615
Reported: 2012-04-25 05:46 EDT by Rehana
Modified: 2012-12-13 14:49 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
User-supplied tooling scripts can render an instance unusable if they hang and never return. To help avoid this, fully test all user tooling scripts before including them in a deployable. If you cannot ssh into an instance being launched on EC2, the user tooling may have hung and never completed. In that case, test the user tooling script by hand in a manually launched EC2 instance. For more information, refer to Bug 816104 - Unable to ssh to the wordpress Apache machine.
Story Points: ---
Clone Of:
Clones: 824615
Environment:
Last Closed: 2012-12-13 14:49:23 EST
Type: Bug
Description Rehana 2012-04-25 05:46:26 EDT
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Launch instances using the wordpress application blueprint.
2. Observe that ssh to the Apache machine fails, while ssh to the mysql machine works without any issue.

log:
ssh -v -i wrdpress-multi-ec2_apache_1335345356_key_70284478636280.pem root@ec2-23-20-71-201.compute-1.amazonaws.com
OpenSSH_5.8p2, OpenSSL 1.0.0h-fips 12 Mar 2012
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to ec2-23-20-71-201.compute-1.amazonaws.com [23.20.71.201] port 22.
debug1: Connection established.
debug1: identity file wrdpress-multi-ec2_apache_1335345356_key_70284478636280.pem type -1
debug1: identity file wrdpress-multi-ec2_apache_1335345356_key_70284478636280.pem-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.8
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Server host key: RSA 26:ac:5f:63:4e:a4:87:96:8f:fd:1f:ef:59:fe:38:3f
debug1: Host 'ec2-23-20-71-201.compute-1.amazonaws.com' is known and matches the RSA host key.
debug1: Found key in /home/rehana/.ssh/known_hosts:118
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: Roaming not allowed by server
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found

debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found

debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Next authentication method: publickey
debug1: Trying private key: wrdpress-multi-ec2_apache_1335345356_key_70284478636280.pem
debug1: read PEM private key done: type RSA
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: No more authentication methods to try.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
  
Actual results:
Unable to ssh to the Apache machine

Expected results:
Should be able to ssh to Apache machine

Additional info:

 rpm -qa | grep aeolus
aeolus-conductor-0.8.13-1.el6_2.noarch
aeolus-configure-2.5.3-1.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
rubygem-aeolus-cli-0.3.1-1.el6.noarch
aeolus-all-0.8.13-1.el6_2.noarch
aeolus-conductor-doc-0.8.13-1.el6_2.noarch
aeolus-conductor-daemons-0.8.13-1.el6_2.noarch
Comment 1 Rehana 2012-04-25 05:56:07 EDT
adding mysql ssh details:
ssh -v -i rehana_mysql_1335346667_key_70284477760600.pem root@ec2-23-22-31-163.compute-1.amazonaws.com
OpenSSH_5.8p2, OpenSSL 1.0.0h-fips 12 Mar 2012
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to ec2-23-22-31-163.compute-1.amazonaws.com [23.22.31.163] port 22.
debug1: Connection established.
debug1: identity file rehana_mysql_1335346667_key_70284477760600.pem type -1
debug1: identity file rehana_mysql_1335346667_key_70284477760600.pem-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.8
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Server host key: RSA ea:f9:4b:19:16:90:90:f6:20:4c:c7:30:42:19:b9:a6
debug1: Host 'ec2-23-22-31-163.compute-1.amazonaws.com' is known and matches the RSA host key.
debug1: Found key in /home/rehana/.ssh/known_hosts:121
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: Roaming not allowed by server
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found

debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found

debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Next authentication method: publickey
debug1: Trying private key: rehana_mysql_1335346667_key_70284477760600.pem
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
Authenticated to ec2-23-22-31-163.compute-1.amazonaws.com ([23.22.31.163]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env XMODIFIERS = @im=none
debug1: Sending env LANG = en_US.UTF-8
Last login: Wed Apr 25 05:41:35 2012 from 122.167.170.240
[root@ip-10-72-129-161 ~]#
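
The two traces above are identical up to the publickey step: both offer the key ("Trying private key"), but only the mysql host answers "Authentication succeeded", while the Apache host ends with "Permission denied". A minimal sketch of a helper that makes the same comparison on saved `ssh -v` output is shown here. The function name and its return labels are hypothetical and not part of any aeolus or OpenSSH tool; it only matches the message prefixes that appear in the logs above.

```python
def classify_ssh_debug(log):
    """Classify saved `ssh -v` output by the messages seen in this bug.

    Returns one of (labels are illustrative, not an OpenSSH concept):
      'authenticated'  - an "Authentication succeeded" line was seen
      'key_rejected'   - a private key was tried, then "Permission denied"
      'no_key_offered' - "Permission denied" without any key being tried
      'unknown'        - none of the above messages were found
    """
    offered_key = False
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("debug1: Trying private key:"):
            offered_key = True
        elif line.startswith("debug1: Authentication succeeded"):
            return "authenticated"
        elif line.startswith("Permission denied"):
            return "key_rejected" if offered_key else "no_key_offered"
    return "unknown"
```

A 'key_rejected' result for a key that conductor itself generated suggests the key was never installed on the instance, rather than a problem on the client side.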
Comment 2 dgao 2012-04-26 17:20:08 EDT
The problem here is that the deployable you were using requires a parameter value that was never set, so one of the instances sat waiting forever.

This caused an ssh issue in EC2 because audrey is an init script with the lowest priority, while the code that places the .pem key (the same one you download from conductor and use to access EC2 instances) lives in rc.local. rc.local always executes after all the services have started, so if the audrey script stalls while waiting for a parameter value to be sent over from configserver, rc.local never runs, the key is never installed, and the instance cannot be ssh'd into.

Potential solutions:
1) Revert the audrey change so that it executes as the last item in rc.local.
   - The problem with this solution is that systemd doesn't use rc.local anymore.
2) If a parameter value is not provided after a certain amount of time, audrey logs something and fails.
   - The problem with this solution is that it could create a false positive if the instance that provides the value takes a long time to run through its own executable script.
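
Option 2 can be sketched roughly as follows. This is not the audrey-agent code; it is a minimal illustration that assumes a hypothetical `fetch(name)` callable which returns the parameter value, or None while the configserver has nothing for it yet. The parameter name, timeout, and polling interval are likewise made up for the example.

```python
import time


class ParameterTimeout(Exception):
    """Raised when a required parameter is not provided in time."""


def wait_for_parameter(fetch, name, timeout=300.0, interval=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll fetch(name) until it returns a value or the timeout elapses.

    fetch is a hypothetical callable standing in for the request to the
    configserver; it must return None while the value is not available.
    """
    deadline = clock() + timeout
    while True:
        value = fetch(name)
        if value is not None:
            return value
        if clock() >= deadline:
            # Fail loudly instead of blocking the rest of boot forever.
            raise ParameterTimeout(
                f"parameter {name!r} not provided within {timeout:.0f}s")
        sleep(interval)
```

The false-positive concern raised above corresponds to choosing `timeout` too small: a provider instance that is merely slow would be treated the same as one that never sends the value.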
Comment 3 jrd 2012-04-27 09:57:26 EDT
Per our morning call, jvlcek is going to write down some relnotes for this, in consultation with dgao and gblomqui and dradez.  Stand by.
Comment 4 Joe Vlcek 2012-04-27 10:28:25 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
User-supplied tooling scripts can render an instance
unusable if they hang and never return.

To help avoid this, fully test all user tooling scripts
before including them in a deployable.

If you cannot ssh into an instance being launched on
EC2, the user tooling may have hung and never
completed.

In that case, test the user tooling script by hand in a
manually launched EC2 instance.

For more information refer to:
Bug 816104 - Unable to ssh to the wordpress Apache machine
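
When testing a tooling script by hand as recommended above, a hang is easier to spot if the script is run under a time limit. The following is a minimal sketch using Python's standard `subprocess` module; the helper name, the status labels, and the example script path in the usage note are assumptions for illustration, not part of audrey.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd and report whether it finished, failed, or hung.

    Returns (status, exit_code) where status is 'ok', 'failed' or 'hung'.
    exit_code is None when the command was killed for exceeding timeout.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return ("hung", None)
    status = "ok" if result.returncode == 0 else "failed"
    return (status, result.returncode)
```

For example, `run_with_timeout(["/bin/sh", "./start"], 600)` would report 'hung' for a script that is still running after ten minutes, where `./start` stands for whatever tooling script the deployable supplies.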
Comment 8 Dan Radez 2012-05-08 10:50:53 EDT
brewed as aeolus-audrey-agent-0.4.6-1
pushed to git in b9d6bec8e80c0e3c14da514cfc1dee2be8bdc752
Comment 10 dgao 2012-09-20 09:59:56 EDT
This is no longer an issue, since startup of audrey-agent has been changed to a systemd/systemV script. A failure during audrey-agent startup no longer blocks the ssh service from launching.

Closing
