Bug 1385449 - openshift_facts.py failed due to race condition on /etc/ansible/facts.d/ directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Samuel Munilla
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-17 05:00 UTC by Takayoshi Kimura
Modified: 2017-03-08 18:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if the advanced installation inventory defined multiple inventory names for the same host, the installer could fail with an error when creating /etc/ansible/facts.d. This race condition has been resolved, so the failure no longer occurs.
Clone Of:
Environment:
Last Closed: 2017-01-18 12:43:03 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:0066 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.4 RPM Release Advisory | 2017-01-18 17:23:26 UTC

Description Takayoshi Kimura 2016-10-17 05:00:32 UTC
Description of problem:

The Ansible hosts file lists 6 hosts, but they are actually 3 multihomed hosts. For example, m1 and e1 are the same machine, reached through different IP addresses.

[masters]
m1
m2
m3

[etcd]
e1
e2
e3

In this case, two facts processes run on the same host and race with each other, and the following error occurs in the save_local_facts() function:

TASK [openshift_facts : Gather Cluster facts and set is_containerized if needed]
Traceback (most recent call last):
  File "/tmp/ansible_0cs8t6/ansible_module_openshift_facts.py", line 1331, in save_local_facts
    "Could not create fact file: %s, error: %s" % (filename, ex)
__main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Both processes see os.path.exists() return False. One then succeeds in os.makedirs() and the other fails, which makes the whole installation fail.
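The interleaving described above can be forced deterministically in a small standalone script. This is only a sketch, assuming Python 3 and using threads in place of the two Ansible module processes; racy_makedirs and reproduce_race are hypothetical names, and the barrier exists solely to make both workers pass the exists check before either one creates the directory.

```python
import os
import shutil
import tempfile
import threading


def racy_makedirs(path, barrier, errors):
    """Check-then-create, as in save_local_facts(), with a forced interleaving."""
    try:
        if not os.path.exists(path):
            # Both workers reach this point before either creates the directory.
            barrier.wait()
            # Only one of the two makedirs() calls can succeed.
            os.makedirs(path)
    except OSError as ex:
        errors.append(ex)


def reproduce_race():
    """Run two workers against a fresh directory and return the errors raised."""
    base = tempfile.mkdtemp()
    fact_dir = os.path.join(base, "facts.d")
    barrier = threading.Barrier(2)
    errors = []
    workers = [
        threading.Thread(target=racy_makedirs, args=(fact_dir, barrier, errors))
        for _ in range(2)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    shutil.rmtree(base)
    return errors
```

Calling reproduce_race() should return a single OSError with errno 17 (EEXIST), the same error code reported in the traceback.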

We could probably retry the exists/makedirs step if it fails.

In /usr/share/ansible/openshift-ansible/roles/openshift_facts/library/openshift_facts.py:

def save_local_facts(filename, facts):
    """ Save local facts

        Args:
            filename (str): local facts file
            facts (dict): facts to set
    """
    try:
        fact_dir = os.path.dirname(filename)
        if not os.path.exists(fact_dir):
            os.makedirs(fact_dir)
        with open(filename, 'w') as fact_file:
            fact_file.write(module.jsonify(facts))
        os.chmod(filename, 0o600)
    except (IOError, OSError) as ex:
        raise OpenShiftFactsFileWriteError(
            "Could not create fact file: %s, error: %s" % (filename, ex)
        )
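One way to make the directory creation in the function above tolerate a concurrent creator is to drop the exists check and treat "already exists" as success. The following is a minimal sketch under that assumption, not necessarily the change that was eventually merged; ensure_dir is a hypothetical helper name that does not appear in openshift_facts.py.

```python
import errno
import os


def ensure_dir(path):
    """Create path if it is missing; tolerate another process creating it first."""
    try:
        os.makedirs(path)
    except OSError as ex:
        # EEXIST on a directory means another process won the race, which is fine.
        # Any other error, or EEXIST on a non-directory, is a real failure.
        if ex.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

With such a helper, save_local_facts() could call ensure_dir(fact_dir) in place of the exists/makedirs pair. Real failures would still propagate as OSError and be wrapped by the existing except clause.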

Version-Release number of selected component (if applicable):

OpenShift Container Platform 3.3

How reproducible:

Rare, timing issue

Steps to Reproduce:
1. Perform an installation on hosts with multihomed master/etcd.

Actual results:

The installation sometimes fails with:
__main__.OpenShiftFactsFileWriteError: Could not create fact file: /etc/ansible/facts.d/openshift.fact, error: [Errno 17] File exists: '/etc/ansible/facts.d'

Expected results:

No failure

Additional info:

Comment 1 openshift-github-bot 2016-10-26 18:07:24 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/07113bc31ffa60a5fc3f34b392576d4639474485
Fix race condtion in openshift_facts

If, for some reason, two facts processes were run simultaneously
on the same host, creating the directory could cause an exception.
This should help with that.

Fixes Bug 1385449

Comment 3 Gan Huang 2016-11-03 06:41:57 UTC
Verified with openshift-ansible-3.4.16-1.git.0.c846018.el7.noarch

1. Create 3 instances, each bound to 2 interfaces: one for master and node traffic, the other for etcd traffic.

#cat hosts
<--snip-->
[masters]
ghuang-1385449-ocp-master-0.test.com 
ghuang-1385449-ocp-master-1.test.com 
ghuang-1385449-ocp-master-2.test.com 

[nodes]
ghuang-1385449-ocp-master-0.test.com 
ghuang-1385449-ocp-master-1.test.com 
ghuang-1385449-ocp-master-2.test.com 

[etcd]
ghuang-1385449-ocp-etcd-0.test.com 
ghuang-1385449-ocp-etcd-1.test.com 
ghuang-1385449-ocp-etcd-2.test.com 

2. Trigger the installation
Installation succeeded. /etc/ansible/facts.d/openshift.fact was created successfully on each instance.

Comment 5 errata-xmlrpc 2017-01-18 12:43:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
