Description of problem:

The Ansible hosts file lists 6 hosts, but they are actually 3 multihomed hosts. m1 == e1 (and likewise for the other pairs); each pair is the same machine with different IP addresses.

    [masters]
    m1
    m2
    m3

    [etcd]
    e1
    e2
    e3

In this case, 2 facts processes are executed on the same host and race with each other, and the following error occurs in the save_local_facts() method:

    TASK [openshift_facts : Gather Cluster facts and set is_containerized if needed]
    Traceback (most recent call last):
      File "/tmp/ansible_0cs8t6/ansible_module_openshift_facts.py", line 1331, in save_local_facts
        "Could not create fact file: %s, error: %s" % (filename, ex)
    __main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Both processes see os.path.exists() return False; one then succeeds in os.makedirs() and the other fails, which makes the whole installation fail. We could probably retry the exists/makedirs sequence if it fails.

In /usr/share/ansible/openshift-ansible/roles/openshift_facts/library/openshift_facts.py:

    def save_local_facts(filename, facts):
        """ Save local facts

            Args:
                filename (str): local facts file
                facts (dict): facts to set
        """
        try:
            fact_dir = os.path.dirname(filename)
            if not os.path.exists(fact_dir):
                os.makedirs(fact_dir)
            with open(filename, 'w') as fact_file:
                fact_file.write(module.jsonify(facts))
            os.chmod(filename, 0o600)
        except (IOError, OSError) as ex:
            raise OpenShiftFactsFileWriteError(
                "Could not create fact file: %s, error: %s" % (filename, ex)
            )

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.3

How reproducible:
Rare, timing issue

Steps to Reproduce:
1. Perform an install to hosts with multihomed master/etcd.

Actual results:
Sometimes fails with:

    __main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Expected results:
No failure

Additional info:
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/07113bc31ffa60a5fc3f34b392576d4639474485
Fix race condtion in openshift_facts

If, for some reason, two facts processes were run simultaneously on the same host, creating the directory could cause an exception. This should help with that.

Fixes Bug 1385449
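For readers following along, here is a minimal sketch of one common way to tolerate this kind of directory-creation race. It is not necessarily what the commit above does. It assumes the check-then-create sequence is replaced by a direct os.makedirs() call that ignores "already exists". json.dumps stands in for module.jsonify so the snippet runs outside an Ansible module, and the OpenShiftFactsFileWriteError wrapping is left out for brevity.

```python
import errno
import json
import os


def save_local_facts(filename, facts):
    """Write facts to filename, tolerating a concurrent creation of its directory."""
    fact_dir = os.path.dirname(filename)
    try:
        # Create the directory directly instead of testing os.path.exists()
        # first; the gap between the test and the creation is what races.
        os.makedirs(fact_dir)
    except OSError as ex:
        # EEXIST on an existing directory means another process (or an
        # earlier run) created it already, which is fine. Anything else
        # is a real error and is re-raised.
        if ex.errno != errno.EEXIST or not os.path.isdir(fact_dir):
            raise
    with open(filename, 'w') as fact_file:
        # json.dumps is a stand-in (assumption) for module.jsonify.
        fact_file.write(json.dumps(facts))
    os.chmod(filename, 0o600)
```

On Python 3.2 and later, os.makedirs(fact_dir, exist_ok=True) gives the same tolerance without the explicit errno check.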
Verified with openshift-ansible-3.4.16-1.git.0.c846018.el7.noarch

1. Create 3 instances, each with 2 interfaces: one for master and node traffic, the other for etcd traffic.

    # cat hosts
    <--snip-->
    [masters]
    ghuang-1385449-ocp-master-0.test.com
    ghuang-1385449-ocp-master-1.test.com
    ghuang-1385449-ocp-master-2.test.com

    [nodes]
    ghuang-1385449-ocp-master-0.test.com
    ghuang-1385449-ocp-master-1.test.com
    ghuang-1385449-ocp-master-2.test.com

    [etcd]
    ghuang-1385449-ocp-etcd-0.test.com
    ghuang-1385449-ocp-etcd-1.test.com
    ghuang-1385449-ocp-etcd-2.test.com

2. Trigger the installation.

The installation succeeded. /etc/ansible/facts.d/openshift.fact was created successfully on each instance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066