Bug 1385449

Summary: openshift_facts.py failed due to race condition on /etc/ansible/facts.d/ directory
Product: OpenShift Container Platform Reporter: Takayoshi Kimura <tkimura>
Component: Installer Assignee: Samuel Munilla <smunilla>
Status: CLOSED ERRATA QA Contact: Gan Huang <ghuang>
Severity: medium Docs Contact:
Priority: low    
Version: 3.3.0 CC: aos-bugs, ghuang, jokerman, mmccomas, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, if the advanced installation inventory defined multiple inventory names for the same host, the installer could fail with an error when creating /etc/ansible/facts.d. This race condition has been resolved, so the error no longer occurs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-18 12:43:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Takayoshi Kimura 2016-10-17 05:00:32 UTC
Description of problem:

The Ansible hosts file lists 6 hosts, but they are actually 3 multihomed hosts. For example, m1 and e1 are the same machine, reached through different IP addresses.

[masters]
m1
m2
m3

[etcd]
e1
e2
e3

In this case, two facts processes run on the same host at the same time and race with each other, and the following error occurs in the save_local_facts() function:

TASK [openshift_facts : Gather Cluster facts and set is_containerized if needed]
Traceback (most recent call last):
  File "/tmp/ansible_0cs8t6/ansible_module_openshift_facts.py", line 1331, in save_local_facts
    "Could not create fact file: %s, error: %s" % (filename, ex)
__main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Both processes see os.path.exists() return False for the directory. One os.makedirs() call then succeeds and the other fails, which causes the whole installation to fail.
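
The check-then-create race can be forced deterministically in a small sketch. This is only an illustration, not code from openshift_facts.py: it assumes Python 3, uses two threads in place of the two facts processes, and the names racy_makedirs and demonstrate_race are hypothetical. A barrier holds both threads after the existence check, so both have already seen the directory as missing before either creates it.

```python
import errno
import os
import shutil
import tempfile
import threading


def racy_makedirs(path, barrier, errors):
    """Check-then-create, as in the reported code path."""
    try:
        if not os.path.exists(path):
            # Both threads wait here, so neither creates the directory
            # until both have passed the existence check.
            barrier.wait()
            os.makedirs(path)
    except OSError as ex:
        errors.append(ex)


def demonstrate_race():
    """Run two racing creators and return the exceptions they raised."""
    base = tempfile.mkdtemp()
    # Stand-in for /etc/ansible/facts.d (hypothetical test location).
    fact_dir = os.path.join(base, "facts.d")
    barrier = threading.Barrier(2)
    errors = []
    threads = [
        threading.Thread(target=racy_makedirs, args=(fact_dir, barrier, errors))
        for _ in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    shutil.rmtree(base)
    return errors


if __name__ == "__main__":
    for error in demonstrate_race():
        print(error.errno == errno.EEXIST, error)
```

Under these assumptions exactly one of the two os.makedirs() calls should fail with EEXIST (Errno 17), which matches the error in the traceback above.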

We could probably retry the exists/makedirs step if it fails.

In /usr/share/ansible/openshift-ansible/roles/openshift_facts/library/openshift_facts.py:

def save_local_facts(filename, facts):
    """ Save local facts

        Args:
            filename (str): local facts file
            facts (dict): facts to set
    """
    try:
        fact_dir = os.path.dirname(filename)
        if not os.path.exists(fact_dir):
            os.makedirs(fact_dir)
        with open(filename, 'w') as fact_file:
            fact_file.write(module.jsonify(facts))
        os.chmod(filename, 0o600)
    except (IOError, OSError) as ex:
        raise OpenShiftFactsFileWriteError(
            "Could not create fact file: %s, error: %s" % (filename, ex)
        )
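
One way to make the directory creation tolerate the race is to skip the existence check, attempt the creation, and treat "already exists" as success. The following is a minimal sketch under stated assumptions, not necessarily the change made in the commit referenced in this bug: ensure_dir and save_local_facts_sketch are hypothetical names, json.dumps stands in for module.jsonify, and the original OpenShiftFactsFileWriteError wrapping is omitted to keep the example self-contained.

```python
import errno
import json
import os


def ensure_dir(path):
    """Create path, treating an already existing directory as success."""
    try:
        os.makedirs(path)
    except OSError as ex:
        # Another process may have created the directory first. Re-raise
        # only if the error is something else, or the path exists but is
        # not a directory.
        if ex.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def save_local_facts_sketch(filename, facts):
    """Write facts as JSON to filename, creating its directory if needed."""
    ensure_dir(os.path.dirname(filename))
    with open(filename, "w") as fact_file:
        # Assumption: json.dumps replaces module.jsonify from the original.
        fact_file.write(json.dumps(facts))
    os.chmod(filename, 0o600)
```

With this shape, two concurrent callers can both reach os.makedirs() and the loser of the race continues instead of failing. On Python 3 only, os.makedirs(path, exist_ok=True) has a similar effect.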

Version-Release number of selected component (if applicable):

OpenShift Container Platform 3.3

How reproducible:

Rare, timing issue

Steps to Reproduce:
1. Perform an install to hosts with multihomed master/etcd

Actual results:

Sometimes fails with:
__main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Expected results:

No failure

Additional info:

Comment 1 openshift-github-bot 2016-10-26 18:07:24 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/07113bc31ffa60a5fc3f34b392576d4639474485
Fix race condtion in openshift_facts

If, for some reason, two facts processes were run simultaneously
on the same host, creating the directory could cause an exception.
This should help with that.

Fixes Bug 1385449

Comment 3 Gan Huang 2016-11-03 06:41:57 UTC
Verified with openshift-ansible-3.4.16-1.git.0.c846018.el7.noarch

1. Create 3 instances, each bound to 2 interfaces: one for master and node traffic, the other for etcd traffic.

# cat hosts
<--snip-->
[masters]
ghuang-1385449-ocp-master-0.test.com 
ghuang-1385449-ocp-master-1.test.com 
ghuang-1385449-ocp-master-2.test.com 

[nodes]
ghuang-1385449-ocp-master-0.test.com 
ghuang-1385449-ocp-master-1.test.com 
ghuang-1385449-ocp-master-2.test.com 

[etcd]
ghuang-1385449-ocp-etcd-0.test.com 
ghuang-1385449-ocp-etcd-1.test.com 
ghuang-1385449-ocp-etcd-2.test.com 

2. Trigger the installation
Installation succeeded. /etc/ansible/facts.d/openshift.fact was created successfully on each instance.

Comment 5 errata-xmlrpc 2017-01-18 12:43:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066