Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1592010

Summary: ansible installer fails to create "/etc/cni/net.d/80-openshift-network.conf" during installation
Product: OpenShift Container Platform Reporter: Preetesh Sharma <prsharma>
Component: Networking Assignee: Casey Callendrello <cdc>
Status: CLOSED WORKSFORME QA Contact: Meng Bo <bmeng>
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.9.0 CC: aos-bugs, bbennett, gkeegan, jokerman, ktadimar, mmccomas, pasik, szobair, tdudgeon.ml
Target Milestone: ---   
Target Release: 3.9.z   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-22 20:55:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Installation logs none

Description Preetesh Sharma 2018-06-16 05:48:48 UTC
Created attachment 1452098
Installation logs

Description of problem: While installing OpenShift on OpenStack, the Ansible installer failed to create "/etc/cni/net.d/80-openshift-network.conf" on the node, and the node logged the following error:

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
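
For triage, the status line above can be split into its fields. The following is a minimal Python sketch and not part of the original report: the helper name `parse_network_not_ready` is hypothetical, and the pattern assumes the exact `NetworkReady=... reason:... message:...` layout seen in this one log line.

```python
import re

# Assumed layout, taken from the log line quoted in this report:
#   "... NetworkReady=<bool> reason:<Reason> message:<free text>"
_PATTERN = re.compile(r"NetworkReady=(\w+)\s+reason:(\w+)\s+message:(.*)")


def parse_network_not_ready(line):
    """Return the fields of a network-readiness log line, or None.

    Hypothetical helper for illustration; it is not part of OpenShift.
    """
    match = _PATTERN.search(line)
    if match is None:
        return None
    ready, reason, message = match.groups()
    return {
        "ready": ready == "true",
        "reason": reason,
        "message": message.strip(),
    }
```

Lines in other formats return None, so the helper can be applied to every line of a node log to pick out the ones that report this condition.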

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-docs-3.9.30-1.git.7.46f8678.el7.noarch
openshift-ansible-playbooks-3.9.30-1.git.7.46f8678.el7.noarch
openshift-ansible-3.9.30-1.git.7.46f8678.el7.noarch
openshift-ansible-roles-3.9.30-1.git.7.46f8678.el7.noarch

rpm -q ansible
ansible --version 

How reproducible:

Steps to Reproduce:

1. Install OpenShift 3.9 on OpenStack with the following options in the inventory file:

openshift_release=v3.9
openshift_pkg_version=-3.9.27
openshift_image_tag=v3.9.27


Actual results: Installation fails with the following message:
TASK [Set Node install 'Complete'] **********************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/config.yml:32
ok: [] => {
    "ansible_stats": {
        "aggregate": true, 
        "data": {
            "installer_phase_node": {
                "end": "20180615225123Z", 
                "status": "Complete"
            }
        }, 
        "per_host": false
    }, 
    "changed": false, 
    "failed": false
}
META: ran handlers
META: ran handlers
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-node/config.retry


Please include the entire output from the last TASK line through the end of output if an error is generated

Additional info: An upstream issue for this problem has already been filed:
https://github.com/openshift/openshift-ansible/issues/7967

Verbose installer output is attached to this bug.

Comment 1 Scott Dodson 2018-06-18 14:36:49 UTC
The installer doesn't populate this directory. Moving to Networking.

Comment 2 Casey Callendrello 2018-06-19 13:37:01 UTC
Do you have logs from the SDN pod? It writes the configuration file when it is up.

Comment 3 Tim Dudgeon 2018-07-06 15:27:41 UTC
I can readily show this in action if anyone wants to investigate. Please get in touch with me.
There is lots of extra information available in the related GitHub issue:
https://github.com/openshift/openshift-ansible/issues/7967

Comment 4 Casey Callendrello 2018-07-06 16:52:45 UTC
Make sure you use a 3.9 commit of the installer. The installer's master branch won't work with 3.9, since that file is written differently in 3.10.
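
One way to sanity-check this on an RPM-based install is to compare the installer package version with the `openshift_release` inventory value. The sketch below is only an illustration under stated assumptions: the helper name `installer_matches_release` is hypothetical, it expects package names in the `openshift-ansible-<major>.<minor>.<patch>-...` form shown in the description, and it does not cover installs run from a git checkout.

```python
import re

# Matches only the core package, e.g.
# "openshift-ansible-3.9.30-1.git.7.46f8678.el7.noarch", and not
# sub-packages such as "openshift-ansible-docs-...".
_RPM_PATTERN = re.compile(r"^openshift-ansible-(\d+)\.(\d+)\.")


def installer_matches_release(rpm_name, openshift_release):
    """Return True if the installer's major.minor equals the release's.

    Hypothetical helper; openshift_release is the inventory value,
    for example "v3.9".
    """
    match = _RPM_PATTERN.match(rpm_name)
    if match is None:
        raise ValueError("not an openshift-ansible package name: %r" % rpm_name)
    installer = (int(match.group(1)), int(match.group(2)))
    parts = openshift_release.lstrip("v").split(".")
    target = (int(parts[0]), int(parts[1]))
    return installer == target
```

With the values from the description, `installer_matches_release("openshift-ansible-3.9.30-1.git.7.46f8678.el7.noarch", "v3.9")` compares (3, 9) with (3, 9).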

Comment 6 Casey Callendrello 2018-07-12 15:55:23 UTC
This has been needinfo for a week; dropping from urgent.

Comment 7 Tim Dudgeon 2018-07-12 17:11:20 UTC
I object to this being dropped to low priority. 
It is a very significant problem!
5 days ago I pointed to lots of extra info on this, and offered to demonstrate this in action. Nobody has been in touch since.

Comment 8 Casey Callendrello 2018-07-13 11:15:13 UTC
Can you get the logs of the SDN pod? That should indicate why it has not written the CNI file.

Comment 9 Tim Dudgeon 2018-07-13 11:38:16 UTC
This is not a containerised installation.
Where would I look for the relevant logs?

Comment 10 Casey Callendrello 2018-07-13 13:42:00 UTC
Since it's not a containerized install, the SDN is part of the node process. Check in "journalctl -u origin-node".

The SDN will write that file once it has connected to the apiserver and determined the clusternetwork. If it fails to do so, it should log accordingly.

So a thing to check is that the node process is able to talk to the apiserver.
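
The two checks described above (is the CNI file there, and can the node reach the apiserver) can be sketched in Python as follows. This is a rough illustration, not a supported diagnostic: the function names are hypothetical, the file path is the one from the bug summary, and the connectivity test only opens a TCP connection, so it says nothing about TLS, authentication, or apiserver health.

```python
import os
import socket

# Path taken from the bug summary.
CNI_CONFIG = "/etc/cni/net.d/80-openshift-network.conf"


def cni_config_present(path=CNI_CONFIG):
    """Return True if the CNI config file exists and is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that the port accepts connections; it does not
    verify TLS, authentication, or that the API server is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On a node, one would call `cni_config_present()` and then `can_reach(host, port)` with the apiserver host and port from that node's own configuration; no specific values are assumed here. If the file is missing while the apiserver is reachable, the node log remains the place to look for the reason.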

Comment 11 Tim Dudgeon 2018-07-23 13:05:57 UTC
Sorry for the delay on this - our cluster went down and took some time to get running again.

What I'm finding now with the latest code on the release-3.9 branch of the ansible installer is that this problem is NOT manifesting itself. I ran through the installation process about 8 times and did not hit this problem.

I'm not confident that this means it's 'fixed', but right now I cannot generate any logs where it is happening.

If the other users who have encountered this could also check whether it's still happening for them, it would be useful.

Comment 13 Casey Callendrello 2018-08-10 12:06:53 UTC
Venkata,
OCP 3.7 and 3.9 use very different ways to create that file; please file a separate issue.

Comment 16 Casey Callendrello 2018-08-22 20:55:53 UTC
Tim,
I'm going to close this for now; feel free to reopen if the problem manifests itself.

Comment 17 Tim Dudgeon 2018-08-23 15:10:12 UTC
Be aware that there is still traffic on this issue on the upstream GitHub project:
https://github.com/openshift/openshift-ansible/issues/7967
For me, in one environment it's not happening now, but I'm not convinced that means it's fixed. I hope to do some more tests soon in an environment where this happened more frequently.

Comment 18 Red Hat Bugzilla 2023-09-14 04:29:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days