When doing overcloud deployments through automated means (e.g. Jenkins) where the user's home directory is overridden, or where we want to keep SSH keys in a directory other than home (maybe in a subdirectory managed by git), we see the error:

    ssh-keygen: /var/lib/jenkins/.ssh/known_hosts: No such file or directory
    ERROR: openstack Command '['ssh-keygen', '-R', u'192.168.1.60']' returned non-zero exit status 255

This is because /usr/lib/python2.7/site-packages/rdomanager_oscplugin/utils.py contains:

    def remove_known_hosts(overcloud_ip):
        """For a given IP address remove SSH keys from the known_hosts file"""

        known_hosts = os.path.expanduser("~/.ssh/known_hosts")

        if os.path.exists(known_hosts):
            command = ['ssh-keygen', '-R', overcloud_ip]
            subprocess.check_call(command)

I think the subprocess command is not correctly inheriting the shell environment from the deploy command. Ideally it should look in $PWD/.ssh first, then fall back to the home directory, so that people who want to run and keep all their configuration in a local directory can do so. The subprocess call should also inherit the environment the deploy command was run in.

Regards, Graeme
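The lookup order proposed above could look roughly like this. It is a minimal sketch, not the plugin's code: the helper names `find_known_hosts` and `build_remove_command` are hypothetical, and it assumes that handing the chosen file to `ssh-keygen` explicitly with `-f` is acceptable, so that `ssh-keygen` edits the same file the Python code checked instead of resolving a home directory on its own.

```python
import os
import subprocess


def find_known_hosts(search_dirs=None):
    """Return the first existing .ssh/known_hosts under search_dirs, or None.

    Hypothetical helper. By default it checks the current working directory
    first, then the home directory, which os.path.expanduser takes from
    $HOME when that variable is set.
    """
    if search_dirs is None:
        search_dirs = [os.getcwd(), os.path.expanduser("~")]
    for base in search_dirs:
        candidate = os.path.join(base, ".ssh", "known_hosts")
        if os.path.exists(candidate):
            return candidate
    return None


def build_remove_command(known_hosts, overcloud_ip):
    """Build an ssh-keygen call that names the known_hosts file explicitly."""
    return ["ssh-keygen", "-f", known_hosts, "-R", overcloud_ip]


def remove_known_hosts(overcloud_ip, search_dirs=None):
    """For a given IP address remove SSH keys from the known_hosts file."""
    known_hosts = find_known_hosts(search_dirs)
    if known_hosts is not None:
        subprocess.check_call(build_remove_command(known_hosts, overcloud_ip))
```

Making the search order a parameter lets a CI job point the lookup at its own workspace without touching the real home directory.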
The subprocess does inherit the environment. Is there a specific environment variable that is relevant here?
Oh, fair enough. When I run an overcloud deploy from Jenkins (I am doing CD), I set the environment variable HOME in the Jenkins job to /var/lib/jenkins/jobs/overcloud\ deploy/workspace, and I still get the error:

    ssh-keygen: /var/lib/jenkins/.ssh/known_hosts: No such file or directory
    ERROR: openstack Command '['ssh-keygen', '-R', u'192.168.1.60']' returned non-zero exit status 255

It's picking up the normal Jenkins home directory instead of the one in the HOME environment variable. Now that I think about it, though, maybe this isn't a problem with our code but with ssh-keygen.

Also, I'm not sure whether we want to rethink how we manipulate files in the .ssh directory. Operators in high-security environments typically keep SSH configuration everywhere under very strict configuration management control, and it may be locked down in a way that they don't want external tools modifying.

Regards, Graeme
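The behavior described above is consistent with there being two notions of "home": `os.path.expanduser` honors `$HOME`, while the OpenSSH tools appear to look up the home directory in the password database instead. A small sketch that shows both values for the current user (the function names are illustrative, not from the plugin):

```python
import os
import pwd


def home_from_environment():
    """Home directory as os.path.expanduser sees it (honors $HOME)."""
    return os.path.expanduser("~")


def home_from_passwd():
    """Home directory recorded for the current UID in the password database.

    Returns None if the UID has no passwd entry (possible in some containers).
    """
    try:
        return pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        return None
```

With HOME overridden in a Jenkins job, the first function would return the override while the second would still return the account's passwd entry, which matches the /var/lib/jenkins path in the error.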
Honoring $HOME does seem like a good idea; I can look into this. From this standpoint, though, we aren't an external tool. :-) We need to be able to SSH into the hosts to set them up when deploying, so not managing this isn't really an option.
Fix posted. https://review.gerrithub.io/#/c/243728/
I created a directory /home/stack/stack and moved the .ssh directory into it. I exported a new HOME variable pointing to the new directory and ran the deploy command. I got:

    ERROR: openstack No section: 'auth'

Tested with python-rdomanager-oscplugin-0.0.10-1.el7ost.noarch from poodle 2015-09-05.2.
This moves the whole home directory, which means that undercloud-passwords.conf also needs to be moved. instackenv.json, however, still needs to be in the current directory, so /home/stack has to be the current directory. That should probably change too, but no bug had been filed for it, so it didn't happen before this release.
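The "No section: 'auth'" message is what Python's ConfigParser raises when a requested section is missing, and `read()` silently skips files it cannot open, so a passwords file that was not moved along with HOME surfaces this way rather than as a "file not found" error. A minimal sketch, assuming the plugin reads `undercloud-passwords.conf` from `$HOME` with ConfigParser; the helper name and the key `undercloud_admin_password` are illustrative assumptions, and it uses Python 3's `configparser` rather than the Python 2.7 module the plugin runs on:

```python
import configparser
import os


def read_undercloud_password(home, key="undercloud_admin_password"):
    """Read a password from <home>/undercloud-passwords.conf.

    Hypothetical helper: the file name comes from this bug, the key name is
    an assumption. ConfigParser.read() ignores files that cannot be opened,
    so a missing file only shows up as NoSectionError when get() is called.
    """
    path = os.path.join(home, "undercloud-passwords.conf")
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser.get("auth", key)
```

Checking the list returned by `parser.read()` and raising an explicit error when it is empty would make a relocated or missing file much easier to diagnose.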
I tried to verify the bug like this:

1) I created a directory called /tmp/home/stack
2) I copied the contents of /home/stack to the new /tmp/home/stack
3) I set $HOME to point to the new location
4) To be absolutely certain, I temporarily renamed /home/stack to /home/stack-bkp, so that if the path is hard-coded anywhere it won't be found

I deployed from my new home directory (which had instackenv.json, .ssh/ and all the other files) and got this error:

    Could not create directory '/home/stack/.ssh'.
    Failed to add the host to the list of known hosts (/home/stack/.ssh/known_hosts).
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
    ERROR: openstack Command '['ssh', '-oStrictHostKeyChecking=no', '-t', '-l', 'heat-admin', u'172.16.0.22', 'sudo', 'keystone-manage', 'pki_setup', '--keystone-user', "$(getent passwd | grep '^keystone' | cut -d: -f1)", '--keystone-group', "$(getent group | grep '^keystone' | cut -d: -f1)"]' returned non-zero exit status 255

Did I test correctly? From the error, it seems that something somewhere still relies on my home directory being /home/stack. If this needs to be tested differently, please provide a detailed description of how to test.
Yeah, it seems like something else also assumes that /home/stack/.ssh exists. I didn't try renaming the home directory; I will look into that.
I get completely different errors, but I think it's safe to say that we can't support renaming the user's actual home directory. However, when I tested by copying instackenv.json and .ssh to /tmp/home/stack and renaming the originals, I got the same error as you. So something inside os_cloud_config also looks in /home/stack/.ssh despite $HOME being set, which means this patch alone is not enough to fix the bug.
The ssh command ignores $HOME, so this can be fixed, but only by both changing /tmp/home/stack/config and updating os_cloud_config.
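Since `ssh` does not take its files from `$HOME`, one way to make the calls follow a relocated home is to pass every relevant path on the command line. A minimal sketch, not the actual os_cloud_config change: the helper name `build_ssh_command` is hypothetical, the key location `$HOME/.ssh/id_rsa` is an assumption, and only standard OpenSSH client options (`-F`, `-i`, `-o UserKnownHostsFile=...`) are used. The remaining arguments mirror the failing command from the log in this bug.

```python
import os


def build_ssh_command(ip, remote_args, home=None, user="heat-admin"):
    """Build an ssh invocation whose files all come from <home>/.ssh.

    Hypothetical helper. ssh does not derive ~/.ssh from $HOME, so each
    file is passed explicitly instead.
    """
    if home is None:
        home = os.path.expanduser("~")  # honors $HOME
    ssh_dir = os.path.join(home, ".ssh")
    command = ["ssh"]
    config = os.path.join(ssh_dir, "config")
    if os.path.exists(config):
        # ssh aborts if -F names a missing file, so only pass it when present.
        command += ["-F", config]
    command += [
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=" + os.path.join(ssh_dir, "known_hosts"),
        "-i", os.path.join(ssh_dir, "id_rsa"),  # assumed key file name
        "-t", "-l", user, ip,
    ]
    return command + list(remote_args)
```

The resulting list can be handed to `subprocess.check_call` just like the existing command.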
Also fixed the PKI initialization to use --root
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Is this bug still relevant to fix in OSP 11, now that post-deployment configuration is handled by Puppet?
I'm happy to close this bug as no longer relevant, since the use case we had for it no longer applies (we are doing things differently now).
Closing per Graeme Gillies feedback.