Description of problem:

The playbook fails with:

-------
TASK [etcd_server_certificates : Sign and create the peer crt] *****************
changed: [atom0011.example.com -> atom0010.example.com]

TASK [etcd_server_certificates : file] *****************************************
fatal: [atom0011.example.com -> atom0010.example.com] FAILED! => {
    "changed": false,
    "dest": "/etc/etcd/generated_certs/etcd-atom0011.example.com/ca.crt",
    "failed": true,
    "gid": 0,
    "group": "root",
    "mode": "0644",
    "owner": "root",
    "secontext": "unconfined_u:object_r:etc_t:s0",
    "size": 1895,
    "src": "/etc/etcd/ca/ca.crt",
    "state": "file",
    "uid": 0
}

MSG:

Cannot link, file exists at destination
-------

This leaves the etcd cluster in a down/broken state.

etcd nodes:
atom0010
atom0011
atom0015

Version-Release number of selected component (if applicable):
openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch
Atomic Host 7.4.1

How reproducible:
Unconfirmed
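For context, the failing task creates a hard link with Ansible's file module, which refuses to link over an existing regular file unless force: yes is set. The sketch below shows that failing pattern only in rough shape; the paths, loop, and variable names are assumptions, not the actual openshift-ansible task:

-------
# Minimal sketch of the failing pattern; names and paths are illustrative.
- name: Hard-link the CA cert into each per-host generated_certs dir
  file:
    src: /etc/etcd/ca/ca.crt
    dest: "/etc/etcd/generated_certs/etcd-{{ item }}/ca.crt"
    state: hard
  delegate_to: "{{ etcd_ca_host }}"   # assumed variable for the CA host
  with_items: "{{ groups['etcd'] }}"
  # Without force: yes, re-running against an existing generated_certs
  # directory fails with "Cannot link, file exists at destination".
-------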
Can you get `ls -la /etc/etcd/generated_certs/etcd-atom0011.example.com/` for us? I imagine they've manually created some symlinks in there?
I think that's what happened; the customer restored from a snapshot and the issue is no longer present. Closing with INSUFFICIENT DATA. Sorry for the inconvenience!
After discussion with Andrew, we believe this is a real issue, so re-opening it. The problem occurs when the generated_certs directory already exists for the hosts we're scaling up; this particular customer will work around that by removing those directories prior to the v2 to v3 migration (a sketch of that workaround follows below).
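For reference, the workaround can be expressed as a small play run against the CA host before the migration. This is a sketch under assumptions taken from the failure output above (the generated_certs directories live on the first etcd host), not a supported procedure:

-------
# Pre-migration cleanup sketch; paths assumed from the error output.
- hosts: etcd[0]
  tasks:
    - name: Find previously generated per-host certificate directories
      find:
        paths: /etc/etcd/generated_certs
        file_type: directory
        patterns: 'etcd-*'
      register: stale_cert_dirs

    - name: Remove them so the scaleup tasks can regenerate cleanly
      file:
        path: "{{ item.path }}"
        state: absent
      with_items: "{{ stale_cert_dirs.files }}"
-------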
We've had no other reported cases of this, and the customer in this case was able to isolate the root cause to local modifications they'd made to their certificate file hierarchy that would be unlikely to occur in other scenarios.
I reviewed the case, and there's no indication that the problem encountered there matches the behavior described in this bug, other than that the etcd migration failed. If I were to guess why the migration failed in that case, the only thing I can come up with is that there's a proxy configured for the user ansible is running as; please see https://bugzilla.redhat.com/show_bug.cgi?id=1515667 for more details on that. Since I believe this particular bug is related to localized problems with the environment in which it was originally observed, and that customer has remedied those, I'm closing this again.
Hit the same error with the playbook; reopening the bug.

openshift-ansible-3.6.173.0.48-1.git.0.1609d30.el7.noarch
ansible-2.4.0.0-5.el7.noarch

ansible 2.4.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

fatal: [osm2.abc.com -> osm1.abc.com]: FAILED! => {
    "changed": false,
    "dest": "/etc/etcd/generated_certs/etcd-osm2.abc.com/ca.crt",
    "failed": true,
    "gid": 0,
    "group": "root",
    "mode": "0600",
    "owner": "root",
    "secontext": "unconfined_u:object_r:etc_t:s0",
    "size": 1895,
    "src": "/etc/etcd/ca/ca.crt",
    "state": "file",
    "uid": 0
}

MSG:

Cannot link, file exists at destination
We intend to fix this by replacing the call to the existing etcd scaleup playbook with only the specific tasks required to scale the cluster back up during a v2 to v3 migration. The existing scaleup playbook generates certificates and performs several other tasks that are not necessary in this scenario.
This should be fixed by https://github.com/openshift/openshift-ansible/pull/7226. The scaleup playbook will no longer be called, so this task won't be executed during the migration.
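For illustration, the general shape of that change might look like the sketch below; the playbook path and task file name are hypothetical, so refer to the PR above for the actual diff:

-------
# Before (roughly): the migration imported the full scaleup playbook,
# which pulls in the etcd_server_certificates role and its hard-link task.
#   - include: ../openshift-etcd/scaleup.yml   # hypothetical path
#
# After (roughly): the migration runs only the member re-add steps,
# so no certificate generation happens at all.
- name: Scale etcd back up during the v2 to v3 migration
  hosts: etcd
  tasks:
    - name: Rejoin existing cluster members
      include_role:
        name: etcd
        tasks_from: migrate_member_readd   # hypothetical task file
-------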
The fix is available in openshift-ansible-3.6.173.0.104-1-4-g76aa5371e.
The fix for this issue is not yet released; sorry for the noise.
The PR is not in the latest 3.6 build (openshift-ansible-3.6.173.0.104-1-4-g76aa5371e).
In openshift-ansible-3.6.173.0.105-1
Fixed. Verified with openshift-ansible-3.6.173.0.110-1.git.0.ca81843.el7.noarch on Red Hat Enterprise Linux Atomic Host 7.4.1; no errors found during migration.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1106