Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 872199

Summary: RHEL-5.7 ec2 instances fail to generate /root/.ssh/authorized_keys
Product: [Retired] CloudForms Cloud Engine
Reporter: James Laska <jlaska>
Component: aeolus-configserver
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED WONTFIX
QA Contact: Rehana <aeolus-qa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.1.0
CC: athomas, aweiteka, cpelland, dajohnso, jgreguske, jturner, juwu, lbrindle, mitch, morazi
Target Milestone: 1.1.2
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text: See comment#4 for proposed release note
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-19 16:00:46 UTC
Type: Bug
Attachments:
Description Flags
aeolus-debug-20121101094655.tar.gz none

Description James Laska 2012-11-01 14:28:14 UTC
Created attachment 636688
aeolus-debug-20121101094655.tar.gz

Description of problem:

Apologies, imagefactory may not be the most appropriate component for this bug.

RHEL 5.7 ec2 instances do not set up /root/.ssh/authorized_keys.  As a result, I am unable to ssh into any RHEL 5.7 ec2 instance using a valid key provided by Cloud Engine.

Version-Release number of selected component (if applicable):
 * aeolus-conductor-0.13.22-1.el6cf.src.rpm
 * aeolus-configure-2.8.11-1.el6cf.src.rpm
 * imagefactory-1.0.2-1.el6cf.src.rpm
 * iwhd-1.5-2.el6.src.rpm
 * oz-0.8.0-6.el6cf.src.rpm

How reproducible:


Steps to Reproduce:
1. Create an image from a RHEL-5.7 system template ()
2. Build and push the image
3. Launch an instance of the image to ec2
  
Actual results:

> ssh -i Downloads/jlaska_rhel-i386-5.7_1351775886_key_70332254551740.pem     ec2-23-21-21-49.compute-1.amazonaws.com
> Warning: Permanently added 'ec2-23-21-21-49.compute-1.amazonaws.com' (RSA) to the list of known hosts.
> Permission denied (publickey).

Expected results:

> # ssh -i Downloads/rhel-i386-5-8-Production_rhel-i386-5.8_1351704228_key_70332250830260.pem     ec2-184-72-90-38.compute-1.amazonaws.com
> Warning: Permanently added 'ec2-184-72-90-38.compute-1.amazonaws.com' (RSA) to the list of known hosts.
> Last login: Thu Nov  1 09:20:25 2012 from 66.187.233.202
> [root@domU-12-31-39-09-4D-E1 ~]#

Additional info:

 * The problem is _not_ that my ssh key has the wrong permissions.  I've confirmed the key has proper permissions.  I've also tested from different user accounts on different systems to rule out any $HOME/.ssh/config involvement.

> ls -l Downloads/jlaska_rhel-i386-5.7_1351775886_key_70332254551740.pem
> -r--------. 1 jlaska jlaska 1672 Nov  1 09:19 Downloads/jlaska_rhel-i386-5.7_1351775886_key_70332254551740.pem

 * When I stop the 5.7 ec2 instance and attach its root volume to a working 5.8 ec2 instance, I can see that the /root/.ssh/authorized_keys file was not created.  Without this file, SSH key authentication will not work.  I can provide access to the volume if needed.
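
The volume inspection described above could be scripted roughly as follows. This is only a sketch: the function name, the mount point argument and the specific checks are illustrative assumptions, not part of any Aeolus or CloudForms tooling. It only looks at whether the file exists, is non-empty and is not group/world-writable.

```python
import os
import stat


def check_authorized_keys(mount_point, home="root"):
    """Return a list of problems found with <mount_point>/<home>/.ssh/authorized_keys.

    mount_point is wherever the stopped instance's root volume was mounted
    (hypothetical, e.g. /mnt/rhel57-root). An empty list means no problem
    was detected by these basic checks.
    """
    ssh_dir = os.path.join(mount_point, home, ".ssh")
    key_file = os.path.join(ssh_dir, "authorized_keys")
    problems = []

    if not os.path.isdir(ssh_dir):
        problems.append("missing directory: " + ssh_dir)
        return problems
    if not os.path.isfile(key_file):
        problems.append("missing file: " + key_file)
        return problems

    if os.path.getsize(key_file) == 0:
        problems.append("empty file: " + key_file)

    # A key file writable by group or others is a common reason for sshd
    # to refuse public key authentication.
    mode = stat.S_IMODE(os.stat(key_file).st_mode)
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("group/world-writable: " + key_file)

    return problems
```

For the volume described in this report, a call such as check_authorized_keys("/mnt/rhel57-root") (mount point assumed) would be expected to report a missing directory or file rather than an empty list.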

Comment 1 James Laska 2012-11-01 14:30:06 UTC
Snippet from /var/log/secure on the 5.7 instance ...

> Nov  1 08:26:35 domU-12-31-39-00-4D-C7 sshd[2044]: Received disconnect from 66.187.233.206: 11: disconnected by user
> Nov  1 08:26:35 domU-12-31-39-00-4D-C7 sshd[2044]: pam_unix(sshd:session): session closed for user root
> Nov  1 08:26:36 domU-12-31-39-00-4D-C7 sshd[2059]: Accepted publickey for root from 66.187.233.206 port 39706 ssh2
> Nov  1 08:26:36 domU-12-31-39-00-4D-C7 sshd[2059]: pam_unix(sshd:session): session opened for user root by (uid=0)
> Nov  1 08:26:36 domU-12-31-39-00-4D-C7 sshd[2059]: Received disconnect from 66.187.233.206: 11: disconnected by user
> Nov  1 08:26:36 domU-12-31-39-00-4D-C7 sshd[2059]: pam_unix(sshd:session): session closed for user root
> Nov  1 08:26:36 domU-12-31-39-00-4D-C7 sshd[2075]: Accepted publickey for root from 66.187.233.206 port 39812 ssh2
> Nov  1 08:26:37 domU-12-31-39-00-4D-C7 sshd[2075]: pam_unix(sshd:session): session opened for user root by (uid=0)
> Nov  1 08:26:37 domU-12-31-39-00-4D-C7 sshd[2075]: Received disconnect from 66.187.233.206: 11: disconnected by user
> Nov  1 08:26:37 domU-12-31-39-00-4D-C7 sshd[2075]: pam_unix(sshd:session): session closed for user root

Comment 3 James Laska 2012-11-01 18:46:37 UTC
Update: it appears the problem ...
 - occurs when you launch a 5.7 app with <services> in the blueprint
 - doesn't occur if you launch a 5.7 app with *no* <services> defined in the blueprint

Talking to blomquist, it seems there is a difference in the audrey initscripts between RHEL-5.7 (doesn't work) and RHEL-5.8 (works).

== RHEL-5.7 (doesn't work) ==
 * Installed: aeolus-audrey-agent-0.4.5-1.el5.noarch
 * which provides the file '/etc/init.d/audrey'

== RHEL-5.8 (works) ==

 * Installed: aeolus-audrey-agent-0.4.10-1.el5.noarch
 * which provides the file '/etc/init.d/ZZaudrey'

It seems the name of the audrey initscript impacts whether system startup works correctly.
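
A rough illustration of why the script name could matter, under assumptions not confirmed in this bug: that both audrey and rc.local are started through links at the same priority (99 is assumed here, with rc.local linked as S99local), and that the rc script's glob sorts names case-insensitively, as a typical en_US locale collation would. The helper names are hypothetical.

```python
def start_order(script_names, priority=99):
    """Return start-link names in the order an (assumed) case-insensitive
    glob sort would produce for links sharing one priority."""
    links = ["S%02d%s" % (priority, name) for name in script_names]
    # Assumption: locale collation ignores case. A plain code-point sort
    # would instead place "ZZaudrey" before "audrey", because uppercase
    # letters sort before lowercase ones.
    return sorted(links, key=str.lower)


def runs_after_rc_local(script_name, priority=99):
    """True if script_name would start after rc.local under the assumptions above."""
    order = start_order([script_name, "local"], priority)
    own_link = "S%02d%s" % (priority, script_name)
    local_link = "S%02dlocal" % priority
    return order.index(own_link) > order.index(local_link)


if __name__ == "__main__":
    for name in ("audrey", "ZZaudrey"):
        print(name, "runs after rc.local:", runs_after_rc_local(name))
```

Under these assumptions "audrey" sorts ahead of "local" while "ZZaudrey" sorts behind it. The real ordering on a given image depends on the chkconfig priorities and the locale in effect at boot.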

Comment 4 James Laska 2012-11-01 20:49:57 UTC
5.7 deployments do *not* work if you have any <services> included in your application blueprint.  At this time, CloudForms cannot deploy 5.7 applications that require service orchestration.  Customers requiring this support must include the aeolus-audrey-agent-0.4.10-1.el5 (or newer) package when building RHEL 5.7 images.

Comment 5 Mitch 2012-11-02 13:39:15 UTC
5.7 was supposed to be one of the tested platforms for CF 1.0.  Can we confirm this did work with CF 1.0?  I'm curious if this used to work and now it doesn't (what happened) or if it never worked and we're noticing it now.

In either case, I do not see this as a blocker but instead a strong candidate for a z-stream fix.

Comment 6 Mike Orazi 2012-11-02 16:11:26 UTC
Moving to z based on the above comment.

Comment 7 Greg Blomquist 2012-11-05 14:25:41 UTC
I believe this bug is occurring for three reasons:

1)  the configuration script was responsible for creating a "return parameter" that would be transmitted back to the config server, but the configuration script had an error in it and was never creating the return parameter (essentially, a facter variable)

2)  the installed version of audrey-agent (0.4.5-1) has no mechanism for failing when it encounters a configuration script failure

3)  the installed version of audrey-agent runs as a start-up script in /etc/init.d/audrey, which happens to run before the rc.local script executes

The version of Audrey in RHEL 5.8 (0.4.10-1) solves these problems by moving the /etc/init.d script to "ZZaudrey".  It's a ridiculous hack, but it works, in that it allows rc.local to execute first.  And, rc.local is the piece that lays down the ssh keys in EC2 instances.

James, can you confirm this is the case?

James, can you also confirm that you tested audrey-agent-0.4.10-1 in RHEL-5.7?

If the RHEL-5.8 RPM has been tested in RHEL-5.7, it seems the easiest path forward is to backport the audrey-agent-0.4.10-1 RPM to RHEL-5.7, right?

Comment 8 Greg Blomquist 2012-11-06 19:31:33 UTC
Based on conversations in IRC with jlaska and jgreguske regarding this issue, I believe this is purely a doc issue.

A later package (available in rhel5.8 repos and later) fixes the problem discussed.  The issue at hand is that a template that attempts to build a rhel5.7 image will receive an outdated package for audrey-agent.

I believe the fix for this bug is to document that the audrey-agent package in rhel5.7 has (at least) this known bug, and that the workaround is to include the updated (0.4.10-1) audrey-agent package when building a rhel5.7 guest.

I believe that jlaska is currently testing out this scenario to make sure it works correctly.

Once we have some positive feedback for that scenario, we can offer up some guidance on how to construct the templates correctly.

Comment 9 James Laska 2012-11-12 15:01:00 UTC
Confirmed.  Building a RHEL-5.7 instance with access to aeolus-audrey-agent-0.4.10-1 allows service orchestration to proceed as directed.

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 5.7 (Tikanga)
> 
> # rpm -q aeolus-audrey-agent
> aeolus-audrey-agent-0.4.10-1.el5

Can we get a release note for this issue please?  Once release noted, I believe we can mark this issue as CLOSED NEXTRELEASE as the aeolus-audrey-agent shipped with RHEL-5.8 or newer does not exhibit the problem.
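
A minimal sketch of how the `rpm -q` output above might be checked against the 0.4.10 minimum named in comment 4. The function names are hypothetical, and the comparison is deliberately simplified: it looks only at the dotted numeric version and ignores release and epoch, so it is not rpm's own version comparison.

```python
def parse_nvr(package):
    """Split 'name-version-release' into (name, version tuple, release string).

    Assumes a purely numeric dotted version, which holds for the package
    strings quoted in this bug.
    """
    name, version, release = package.rsplit("-", 2)
    return name, tuple(int(part) for part in version.split(".")), release


def has_fixed_agent(package, minimum=(0, 4, 10)):
    """True if package is an aeolus-audrey-agent at or above the minimum version."""
    name, version, _release = parse_nvr(package)
    return name == "aeolus-audrey-agent" and version >= minimum


if __name__ == "__main__":
    for pkg in ("aeolus-audrey-agent-0.4.5-1.el5.noarch",
                "aeolus-audrey-agent-0.4.10-1.el5"):
        print(pkg, "->", has_fixed_agent(pkg))
```

Comparing integer tuples matters here: as plain strings, "0.4.5" compares greater than "0.4.10", which would wrongly treat the older package as the newer one.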

Comment 10 James Laska 2012-11-12 15:16:24 UTC
Adjusting the severity as this is a known issue with an easy workaround.