Description of problem:
Hit "Could not find csr for nodes: ip-172-18-9-45.ec2.internal", "state": "unknown" while setting up an OCP 3.11 cluster with the CNI network plugin (os_sdn_network_plugin_name: cni).
Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.10.35-1.git.0.e5b821eNone.noarch.rpm
rpm -q ansible
ansible --version
How reproducible:
Steps to Reproduce:
1. Set up an OCP 3.11 cluster with the parameter os_sdn_network_plugin_name: cni.
2. The installation job fails.
3. Check the CSRs: oc get csr
Actual results:
fatal: [ec2-52-203-112-231.compute-1.amazonaws.com]: FAILED! => {"attempts": 30, "changed": false, "msg": "Could not find csr for nodes: ip-172-18-9-45.ec2.internal", "state": "unknown"}
In step 3, the CSRs always stay in 'Pending' status.
Expected results:
No such error.
Additional info:
Please attach logs from ansible-playbook with the -vvv flag
Please gather the output of `oc get nodes -o yaml` and `oc get csr -o yaml`, and provide the complete inventory.
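For step 3, a small helper can list which CSRs are still pending and who requested them. This is a minimal sketch, not part of openshift-ansible. It assumes the standard Kubernetes CSR JSON layout returned by `oc get csr -o json` and treats a CSR as pending when it has neither an Approved nor a Denied condition. The function names are hypothetical, and `load_csrs_from_oc` assumes `oc` is on PATH and already logged in to the cluster.

```python
import json
import subprocess


def load_csrs_from_oc():
    """Run `oc get csr -o json` (hypothetical helper; assumes oc is on PATH and logged in)."""
    result = subprocess.run(
        ["oc", "get", "csr", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def pending_csrs(csr_list):
    """Return (name, requestor) for every CSR with neither an Approved nor a Denied condition."""
    pending = []
    for item in csr_list.get("items", []):
        # A pending CSR has an empty or missing status.conditions list.
        conditions = (item.get("status") or {}).get("conditions") or []
        condition_types = {c.get("type") for c in conditions}
        if not condition_types & {"Approved", "Denied"}:
            name = item["metadata"]["name"]
            requestor = (item.get("spec") or {}).get("username", "")
            pending.append((name, requestor))
    return pending
```

On an affected cluster, `pending_csrs(load_csrs_from_oc())` should list the CSRs that stay in Pending, which makes it easier to see whether the node named in the installer error ever submitted one.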
The problem is easy to reproduce with this configuration: os_sdn_network_plugin_name: cni
I wonder whether there is a control-plane component that should run with "hostNetwork: true" but doesn't. That would mean it cannot come up until the pod network is up. When os_sdn_network_plugin_name is "cni", no network plugin is installed and the user is expected to provide their own. This is NOT a networking bug, AFAICT.
Hmm, I'm using 'cni' to test PR https://github.com/openvswitch/ovn-kubernetes/pull/385. @phil, could you help check this?
zzhao, I am investigating. I am trying to set up a 3.11 cluster and am still debugging; this is one of the problems I have hit.
Casey (re Comment 9): when I set "os_sdn_network_plugin_name='cni'", redhat/openshift-ovs-multitenant is still installed and its daemonsets are up and running. The instructions in PR-385 include deleting those daemonsets. The 'cni' setting may be causing some other install step to not work as expected. This looks to me like an installer bug, not a network bug.
The logic to run this in the master's config was added in commit b17728d542. Clayton, can you clarify what is supposed to happen here? You are the author of that commit.
PR created against master: https://github.com/openshift/openshift-ansible/pull/10033
PR for release-3.11: https://github.com/openshift/openshift-ansible/pull/10054
These fixes missed today's build, so I've produced a new build with them via brew for testing: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=765897
Fixed in: openshift-ansible-playbooks-3.11.5-1.git.0.5a01a3c.el7_5.noarch.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days