Description of problem:
The overcloud deployment fails because the SSH key cannot be generated:

Stack overcloud/88f21287-7329-4e8f-b80c-1dec19038c18 CREATE_COMPLETE
Saving key "/tmp/tmpvmlQbw/id_rsa" failed: Permission denied
Generating public/private rsa key pair.
Command '['ssh-keygen', '-N', '', '-t', 'rsa', '-b', '4096', '-f', '/tmp/tmpvmlQbw/id_rsa', '-C', 'TripleO split stack short term key']' returned non-zero exit status 1

Version-Release number of selected component (if applicable):

How reproducible:
Intermittent

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
It looks like we're failing here:

    function generate_short_term_keys {
        local tmpdir=$(mktemp -d)
        ssh-keygen -N '' -t rsa -b 4096 -f "$tmpdir/id_rsa" -C "$SHORT_TERM_KEY_COMMENT" > /dev/null
        echo "$tmpdir"
    }
The bug report is incomplete. Please describe how to reproduce the issue, which version you are using, and whether you performed any manual actions on the SSH keys beforehand.
I'm running the following script [1] with the following templates [2], using the latest RHEL 7.6 KVM guest image for the undercloud and the latest puddles. The overcloud itself deploys successfully, as the heat stack is CREATE_COMPLETE:

(undercloud) [stack@undercloud-0-rhosp14 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time | project                          |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| 7c88e9f9-7e36-4335-97ea-47fb45f28216 | overcloud  | CREATE_COMPLETE | 2018-12-31T17:17:27Z | None         | 94fd1d4c095d44ecb0a85f7b51241a25 |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+

but the run still fails here:

Deploying overcloud error (2272s)
This means "openstack overcloud deploy" didn't return 0. These are the last output from the "openstack overcloud deploy" command:
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.BlockStorageDeployment_Step5]: CREATE_IN_PROGRESS state changed
Saving key "/tmp/tmpldn4bS/id_rsa" failed: Permission denied
Generating public/private rsa key pair.
Command '['ssh-keygen', '-N', '', '-t', 'rsa', '-b', '4096', '-f', '/tmp/tmpldn4bS/id_rsa', '-C', 'TripleO split stack short term key']' returned non-zero exit status 1
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.ObjectStorageDeployment_Step5]: CREATE_COMPLETE state changed
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.BlockStorageDeployment_Step5]: CREATE_COMPLETE state changed
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5]: CREATE_COMPLETE state changed
2018-12-31 17:44:12Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step5]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:13Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step5]: CREATE_COMPLETE state changed
2018-12-31 17:44:13Z [overcloud.AllNodesDeploySteps.CephStorageDeployment_Step5]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:13Z [overcloud.AllNodesDeploySteps.CephStorageDeployment_Step5]: CREATE_COMPLETE state changed
2018-12-31 17:44:14Z [overcloud.AllNodesDeploySteps.ComputeExtraConfigPost]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:14Z [overcloud.AllNodesDeploySteps.ObjectStorageExtraConfigPost]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:14Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:14Z [overcloud.AllNodesDeploySteps.BlockStorageExtraConfigPost]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:14Z [overcloud.AllNodesDeploySteps.CephStorageExtraConfigPost]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:17Z [overcloud.AllNodesDeploySteps.ComputeExtraConfigPost]: CREATE_COMPLETE state changed
2018-12-31 17:44:17Z [overcloud.AllNodesDeploySteps.ObjectStorageExtraConfigPost]: CREATE_COMPLETE state changed
2018-12-31 17:44:17Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_COMPLETE state changed
2018-12-31 17:44:18Z [overcloud.AllNodesDeploySteps.BlockStorageExtraConfigPost]: CREATE_COMPLETE state changed
2018-12-31 17:44:19Z [overcloud.AllNodesDeploySteps.CephStorageExtraConfigPost]: CREATE_COMPLETE state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_COMPLETE state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_COMPLETE state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_IN_PROGRESS state changed
2018-12-31 17:44:20Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE state changed
2018-12-31 17:44:21Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2018-12-31 17:44:21Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2018-12-31 17:44:21Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud/7c88e9f9-7e36-4335-97ea-47fb45f28216 CREATE_COMPLETE
Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts: 192.0.2.15 192.0.2.8 192.0.2.12 192.0.2.22
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
Removing short term keys locally

I'm using the following templates:

(undercloud) [stack@undercloud-0-rhosp14 ~]$ rpm -qa | grep heat-templ
openstack-tripleo-heat-templates-9.0.1-0.20181013060906.el7ost.noarch

[1] https://github.com/david-hill/cloud/blob/14.0/create_undercloud.sh
[2] https://github.com/david-hill/rhosp14/tree/14.0-internal
If I re-run it, it completes successfully and returns 0, which is what I'd expect to happen on the first run.
[root@undercloud-0-rhosp14 audit]# grep denied *
[root@undercloud-0-rhosp14 audit]#
There's quite a bit of obfuscation going on in the script; it would help to have the actual overcloud deploy command that reproduces this issue. Are there any unusual permissions on /tmp on the undercloud? Alternatively, can you capture the permissions on $tmpdir before it is deleted (or temporarily comment out the deletion step [0]) on an occasion where ssh-keygen fails?

[0] https://github.com/openstack/tripleo-heat-templates/blob/a0b72fa415d57171621144e104bac561cf9ef211/deployed-server/scripts/enable-ssh-admin.sh#L93
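One way to collect that evidence is to make the key-generation step report its own context when it fails. The sketch below is a hypothetical diagnostic variant of generate_short_term_keys, not the code TripleO ships: on failure it prints the calling process's SELinux context and the directory's mode, owner and label, then cleans up. The function name generate_short_term_keys_checked and the SSH_KEYGEN override variable are assumptions, added only so the failure path can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic variant of generate_short_term_keys (not the upstream code).
# The default comment is copied from the error message in this report.
SHORT_TERM_KEY_COMMENT=${SHORT_TERM_KEY_COMMENT:-"TripleO split stack short term key"}
# SSH_KEYGEN is an assumed override hook so the function can be tested with a stub.
SSH_KEYGEN=${SSH_KEYGEN:-ssh-keygen}

generate_short_term_keys_checked() {
    local tmpdir
    # Assign separately from 'local' so a mktemp failure is not hidden by local's exit status.
    tmpdir=$(mktemp -d) || return 1

    if ! "$SSH_KEYGEN" -N '' -t rsa -b 4096 -f "$tmpdir/id_rsa" \
            -C "$SHORT_TERM_KEY_COMMENT" > /dev/null; then
        # Record the evidence before the directory is removed.
        echo "ssh-keygen failed in $tmpdir" >&2
        echo "process context: $(id -Z 2>/dev/null || echo unavailable)" >&2
        # -Z adds the SELinux label; fall back to plain ls where -Z is unsupported.
        ls -ldZ "$tmpdir" >&2 2>/dev/null || ls -ld "$tmpdir" >&2
        rm -rf "$tmpdir"
        return 1
    fi
    echo "$tmpdir"
}
```

Declaring `local tmpdir` on its own line also sidesteps a subtle issue in the original function: `local tmpdir=$(mktemp -d)` returns the exit status of `local`, not of `mktemp`, so a failing mktemp would go unnoticed.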
It looks like it might be an SELinux issue after all: I noticed the behavior changed recently and SELinux was no longer in permissive mode. I got the following AVC denials:

audit.log.2:type=AVC msg=audit(1546897000.464:230073): avc: denied { write } for pid=332637 comm="ssh-keygen" name="tmpi54FXC" dev="vda1" ino=102719051 scontext=system_u:system_r:ssh_keygen_t:s0 tcontext=system_u:object_r:initrc_tmp_t:s0 tclass=dir
audit.log.2:type=AVC msg=audit(1546897000.464:230073): avc: denied { add_name } for pid=332637 comm="ssh-keygen" name="id_rsa" scontext=system_u:system_r:ssh_keygen_t:s0 tcontext=system_u:object_r:initrc_tmp_t:s0 tclass=dir
audit.log.2:type=AVC msg=audit(1546897000.464:230073): avc: denied { create } for pid=332637 comm="ssh-keygen" name="id_rsa" scontext=system_u:system_r:ssh_keygen_t:s0 tcontext=system_u:object_r:initrc_tmp_t:s0 tclass=file

Previously I was setting SELinux to permissive using this:

puppet-stack-config/os-apply-config/etc/puppet/hieradata/CentOS.yaml:tripleo::selinux::mode: permissive
puppet-stack-config/os-apply-config/etc/puppet/hieradata/RedHat.yaml:tripleo::selinux::mode: permissive

but for some reason it no longer seems to take effect. I added a custom hiera_data.yaml file to undercloud.conf containing:

tripleo::selinux::mode: permissive

and redeployed the undercloud, then the overcloud...
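When comparing runs, it can help to reduce AVC records to the few fields that matter here. The sketch below is a hypothetical helper (summarize_avc is not an existing tool) that reads audit lines on stdin and prints the command, permission, source type, target type and object class of each denial. It assumes the usual `user:role:type:level` layout of scontext and tcontext and ignores non-AVC lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise SELinux AVC denials read from stdin as
#   comm: permission source_type -> target_type (class)
summarize_avc() {
    # Capture groups: 1 = permission(s), 2 = comm, 3 = source type, 4 = target type, 5 = class.
    local re='.*avc: +denied +\{ ([^}]*) \}.*comm="([^"]*)".*scontext=[^:]*:[^:]*:([^:]*):[^ ]* tcontext=[^:]*:[^:]*:([^:]*):[^ ]* tclass=([^ ]*).*'
    # -n plus the p flag prints only lines where the substitution matched.
    sed -n -E "s/${re}/\\2: \\1 \\3 -> \\4 (\\5)/p"
}
```

For example, `grep -h 'avc:' /var/log/audit/audit.log* | summarize_avc` would list the three denials above, each showing ssh_keygen_t acting on an initrc_tmp_t object.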
So it was an SELinux problem, as mentioned above: after getting SELinux back into permissive mode, I successfully deployed an overcloud and tested it:

[jenkins@zappa linux-stable-new]$ bash reproduce_rhosp14.sh
Fetching image done (97s)
Copying base image done (144s)
Resizing base disk done (0s)
Customizing image done (110s)
Waiting for VM to come up done (18s)
Waiting for SSH to come up done (265s)
Creating VMs for control done (58s)
Creating VMs for compute done (10s)
Creating VMs for ceph done (40s)
Resuming stopped vbmc engines done (20s)
Waiting for VM to reboot done (729s)
Copying instackenv to 192.168.122.2 done (1s)
Sending overcloud images to undercloud done (47s)
Waiting for undercloud deployment done (3957s)
Getting new images done (1391s)
Uploading RHEL image done (11s)
Waiting for introspection done (480s)
Waiting for overcloud deployment done (7231s)
Waiting for overcloud test done (810s)
Reproduce 0
Hello,

It's odd: the AVC points to a type (initrc_tmp_t) that differs from the one shown by "ls -laZd" in your output above (tmp_t). Could you provide the versions of the SELinux-related packages? I'll try to reproduce it in my lab and dig into this odd type.

Cheers,
C.
Hello,

I'm trying to reproduce this on RHEL 7.6 with SELinux enforcing, but so far I can't trigger the error. Here's the relevant information from my environment:

container-selinux-2.74-1.el7.noarch
libselinux-2.5-14.1.el7.x86_64
libselinux-python-2.5-14.1.el7.x86_64
libselinux-ruby-2.5-14.1.el7.x86_64
libselinux-utils-2.5-14.1.el7.x86_64
openstack-selinux-0.8.15-1.el7ost.noarch
openvswitch-selinux-extra-policy-1.0-9.el7fdp.noarch
selinux-policy-3.13.1-229.el7_6.6.noarch
selinux-policy-targeted-3.13.1-229.el7_6.6.noarch
python-tripleoclient-10.6.1-0.20181010222413.8c8f259.el7ost.noarch

Relevant part of the log:

Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - COMPLETE.
Removing TripleO short term key from 192.168.24.7
Warning: Permanently added '192.168.24.7' (ECDSA) to the list of known hosts.
Removing TripleO short term key from 192.168.24.14
Warning: Permanently added '192.168.24.14' (ECDSA) to the list of known hosts.
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Waiting for messages on queue 'tripleo' with no timeout.
Config downloaded at /var/lib/mistral/overcloud
Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
Running ansible playbook at /var/lib/mistral/overcloud/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress.
...
Using /var/lib/mistral/overcloud/ansible.cfg as config file
PLAY [Gather facts from undercloud] ********************************************

Could you share some information about your versions?

Cheers,
C.
Hello,

You'll only be able to reproduce this if the deployment is started at boot from an init script launched via systemd. I'm pretty sure I'm the only customer doing that, so I wouldn't spend much time on it, but if you are willing to allow that SELinux AVC in the policy, it would save me from messing with SELinux modes...

Thank you very much,
David Hill
Hello David,

Interesting use case indeed. Could you share your systemd unit? I'd be interested in its content; I think there may be a way to set the proper SELinux settings directly in there.

I'm afraid allowing ssh_keygen_t to write to initrc_tmp_t is too broad, especially for a one-off use case like this. We'd better find a solution in the unit file, or maybe in some kind of wrapper.

Cheers,
C.
Hey Cedric,

I've looked into that myself but couldn't find anything, though perhaps my search queries weren't the right ones. Here is the service file I'm using [1].

Thanks,
Dave

[1] https://github.com/david-hill/cloud/blob/master/customize.service
Or maybe I simply need to move my /etc/rc.d files to another location that is unconfined?
Hey, maybe you could try moving the script to /usr/local/bin, where it gets the standard confinement?

Cheers,
C.
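A minimal sketch of that suggestion, assuming the script currently lives under /etc/rc.d and the unit's ExecStart= can simply be pointed at the new path. install_boot_script is a hypothetical helper, not part of TripleO. It copies rather than moves the script, because mv within the same filesystem keeps the file's existing SELinux label, and it runs restorecon (which needs root) only when SELinux is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a boot-time script into a standard bin directory
# and give it the default SELinux label for its new location.
install_boot_script() {
    local src=$1
    local dest_dir=${2:-/usr/local/bin}   # optional second argument overrides the target directory
    local dest
    dest="$dest_dir/$(basename "$src")"

    # install(1) copies the file and sets mode 0755 in one step; a newly created
    # file normally takes its label from the target directory, unlike a moved one.
    install -m 0755 "$src" "$dest" || return 1

    # Re-apply the default label explicitly when SELinux is enabled.
    # restorecon's report goes to stderr so stdout carries only the new path.
    if command -v selinuxenabled > /dev/null 2>&1 && selinuxenabled \
            && command -v restorecon > /dev/null 2>&1; then
        restorecon -v "$dest" >&2
    fi
    echo "$dest"
}
```

After updating ExecStart= in the unit, running `ps -eZ` while the boot-time script is active shows which domain it actually runs in. On RHEL 7 we would expect a bin_t script started by systemd to run as unconfined_service_t rather than initrc_t, so files it creates in /tmp should no longer be labeled initrc_tmp_t; this is an assumption that has not been confirmed in this thread.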
Hello David, Any news on that? Cheers, C.
Hey Cedric,

I haven't had time to try moving the files to /usr/local/bin yet, since I found a workaround (SELinux in permissive mode). I'll do it as soon as I can find time between two cases.

Thank you very much,
David Hill