Bug 1535420

Summary: despite regenerating new ssh keys, the user has to factory-reinstall the affected machines if setup fails or is aborted, due to possible stale ssh keys
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: gluster-colonizer
Assignee: Dustin Black <dblack>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: japplewh, rcyriac, rhs-bugs, rreddy
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.3.1 Async
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: gluster-colonizer-1.0.3-1.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-12 12:06:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1542835
Attachments:
ssh key regen prompt (flags: none)

Description Nag Pavan Chilakam 2018-01-17 10:54:24 UTC
Created attachment 1382347 [details]
ssh key regen prompt

Description of problem:
=========================
When we use the colonizer to deploy gluster on a fresh setup, we are asked whether we want to regenerate the ssh keys.
If we accept that and proceed, and the setup then fails or is aborted for some reason,
rerunning the colonizer fails with ssh key errors, even though we again tell it to regenerate the ssh keys.

This means we have to reinstall all the affected machines again, which is very tedious and time-consuming.

This bug impacts the time taken to deploy, as a factory reinstall takes a long time.
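
A minimal sketch of the kind of mismatch suspected here (hypothetical host name, paths, and flow; not colonizer code): if the first, aborted run already distributed a key pair to the peers, a fresh key pair generated on a rerun will not match what the peers still hold, so key-based ssh from the colonizer node fails.

# Hypothetical illustration only -- paths, host name and flow are assumptions,
# not taken from gluster-colonizer.
import subprocess

def local_pubkey(path="/root/.ssh/id_rsa.pub"):
    """Read the public key that the current (re)run would use."""
    with open(path) as f:
        return f.read().strip()

def peer_still_authorizes(host, pubkey):
    """Check whether the peer's authorized_keys still contains this public key.

    Needs some working login (e.g. password auth); only meant to show where
    the stale-key mismatch lives.
    """
    result = subprocess.run(
        ["ssh", host, "cat", "~/.ssh/authorized_keys"],
        capture_output=True, text=True, check=True,
    )
    return pubkey in result.stdout

if not peer_still_authorizes("peer-node.example.com", local_pubkey()):
    print("Peer still holds the key from the aborted run; key-based ssh will fail")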



How reproducible:
====================
Hit this a few times; 100% reproducible.

Steps to Reproduce:
1. Factory-install all machines.
2. Start the colonizer on one machine to install the media server.
3. You are prompted to regenerate the ssh keys; accept it (the attached screenshot shows the prompt).
4. Make the script fail during the later steps for some reason, or abort it.
5. You are told to reboot to restart deployment through the colonizer.
6. Reboot and restart the colonizer to deploy the media server.
7. Accept ssh key regeneration again (the attached screenshot shows the prompt).
 


Actual results:
============
Even after regeneration of new keys, the deployment script fails due to ssh key problems.

Expected results:
=================
Once we have accepted regeneration of new keys, the script should not fail due to ssh key issues.

Additional info:
The only way to recover is to factory-reinstall all affected machines, which is very tedious.

Comment 2 Dustin Black 2018-01-31 18:28:13 UTC
I can mitigate this problem by pushing the ssh key regeneration to the phase of the script just before the main playbook runs. At this point, all abort conditions based on user input are past and only failures in the main playbook could cause this problem. This should make it dramatically less likely, but I don't think I can easily solve the problem completely.
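
A rough sketch of the reordering described above, with assumed function names and playbook file name (the real colonizer flow may differ): all interactive prompts and abort points run first, and the keys are regenerated only immediately before the main playbook is invoked.

# Assumed flow for illustration; "site.yml" and the function names are
# placeholders, not the actual gluster-colonizer code.
import os
import subprocess

def gather_user_input():
    # All interactive prompts -- and therefore all user-driven abort points --
    # happen here, before any ssh keys are touched.
    return {"confirmed": True}

def regenerate_ssh_keys(key_path="/root/.ssh/id_rsa"):
    # Remove any existing key pair so ssh-keygen does not prompt to overwrite.
    for path in (key_path, key_path + ".pub"):
        if os.path.exists(path):
            os.remove(path)
    subprocess.run(
        ["ssh-keygen", "-t", "rsa", "-N", "", "-f", key_path],
        check=True,
    )

def run_main_playbook():
    subprocess.run(["ansible-playbook", "site.yml"], check=True)

answers = gather_user_input()      # any abort here leaves the old keys intact
if answers["confirmed"]:
    regenerate_ssh_keys()          # moved to just before the playbook run
    run_main_playbook()            # only a playbook failure can now strand
                                   # the nodes with a half-distributed key

With this ordering, an abort at any prompt no longer leaves the peers holding a key pair that a later rerun would replace.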

Comment 3 Dustin Black 2018-01-31 18:33:50 UTC
Completed as described in comment #2 with upstream merge commit to master d46cf79239daa8ba5c25acea58fb2faf8b8e9b95

Comment 4 Nag Pavan Chilakam 2018-03-01 10:21:11 UTC
Verification:
I have tested this on gluster-colonizer-1.0.3-1.el7rhgs.
I see that ssh key generation proceeds even if the keys already existed.
Also, as per comment #3, the ssh key generation has been moved to a later stage.

Hence moving to verified, mainly based on what was fixed as part of comment #3.

Comment 7 errata-xmlrpc 2018-03-12 12:06:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0477