Created attachment 1842681
$ openshift-install version
built from commit 8fc863d833b1b361efc61c81998890e1305bcf9b
release image registry.ci.openshift.org/ocp/release@sha256:4975c19c8d645f0bfa68e770c16c688bc0590b20440de0265c802aae774aa1b7
release architecture amd64
Starting on the 15th of November, our OCP 4.10 deployments have been failing.
We use the nightly channel (registry.ci.openshift.org/ocp/release:4.10) to deploy 4.10 clusters.
We use IPI deployment on top of OpenStack; deployments fail on both RHOS-D and RHOS-C01.
The master nodes are not joining the cluster. On the console of the master nodes I see only four containers running, and I see errors in the coredns-monitor container.
I don't see any suspicious errors on the bootstrap node; it is waiting for the masters to join so that it can schedule workloads on them.
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
59f033eb5dd4c 232fb20b94dcad8030a95eef9c09c9f9f4e89f1685ac405ada44d0203d8d07c5 25 minutes ago Running coredns-monitor 0 8346d59db7493
2b1dfbd079c2c quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0c1b89092c1966baa30586089f8698f2768b346717194f925cd80dfd84ed040 25 minutes ago Running coredns 0 8346d59db7493
ca7d0d4a174b7 232fb20b94dcad8030a95eef9c09c9f9f4e89f1685ac405ada44d0203d8d07c5 25 minutes ago Running keepalived-monitor 0 b0a7e4c78fef6
60594257d94dc quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3e96c1755163ecb2827bf4b4d1dfdabf2a125e6aeef620a0b8ba52d0c450432c 25 minutes ago Running keepalived 0 b0a7e4c78fef6
$ crictl logs 59f033eb5dd4c
time="2021-11-18T16:56:56Z" level=error msg="Failed to build client config: invalid configuration: [unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory]"
What did you expect to happen?
The cluster comes up as usual, with three master nodes and three worker nodes.
How to reproduce it (as minimally and precisely as possible)?
$ openshift-install create cluster --dir cnv-qe.rhcloud.com/c01-lbednar --log-level debug
Anything else we need to know?
I attached logs from the bootstrap VM and from master-0, plus the install log and the install config.
The issue was introduced with https://github.com/openshift/machine-config-operator/pull/2823/.
The local nameserver is only prepended if /var/run/NetworkManager/resolv.conf contains a default search domain.
On PSI, the default resolv.conf looks like this:
[core@mandre-psi-8x4w9-master-0 ~]$ cat /var/run/NetworkManager/resolv.conf
# Generated by NetworkManager
And thus, the resulting resolv.conf generated by the NetworkManager-resolv-prepender script is:
[core@mandre-psi-8x4w9-master-0 ~]$ cat /etc/resolv.conf
# Generated by KNI resolv prepender NM dispatcher script
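The behavior described above can be illustrated with a minimal sketch. This is not the NetworkManager-resolv-prepender script itself (which is a shell dispatcher script); the function name `build_resolv_conf`, the `require_search` switch, and all addresses and domains are hypothetical and chosen only for illustration. With `require_search=True` the sketch mimics the reported behavior; `require_search=False` shows one possible alternative, not necessarily the fix that was shipped.

```python
HEADER = "# Generated by KNI resolv prepender NM dispatcher script"


def build_resolv_conf(nm_resolv_conf: str, local_nameserver: str,
                      require_search: bool) -> str:
    """Return the text of a generated resolv.conf (illustrative only)."""
    lines = nm_resolv_conf.splitlines()
    # True when NetworkManager's file carries a "search" directive.
    has_search = any(line.split()[:1] == ["search"] for line in lines)

    out = [HEADER]
    # Reported behavior: the local nameserver is added only when a
    # search domain is present (require_search=True).
    if has_search or not require_search:
        out.append(f"nameserver {local_nameserver}")
    # Carry over NetworkManager's entries, dropping its header comment.
    out.extend(line for line in lines if line and not line.startswith("#"))
    return "\n".join(out) + "\n"


# Hypothetical inputs: documentation-range addresses, made-up domain.
nm_without_search = "# Generated by NetworkManager\nnameserver 198.51.100.53\n"
nm_with_search = ("# Generated by NetworkManager\n"
                  "search example.test\nnameserver 198.51.100.53\n")

print(build_resolv_conf(nm_without_search, "192.0.2.10", require_search=True))
print(build_resolv_conf(nm_with_search, "192.0.2.10", require_search=True))
```

In the first call the local nameserver is missing from the result, which matches the symptom on a host whose NetworkManager resolv.conf has no search domain: the node cannot resolve the cluster's internal names through its local DNS.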
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
Verified on: 4.10.0-0.nightly-2022-01-08-215919
CI run for profile 23_IPI on OSP16 & FIPS on & OVN & csidriver passed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.