Description of problem:
This is for OKD 4.10-2022-03-07, installed on vSphere UPI. After cluster deployment, when the image registry is enabled (and a PV added), this normally adds a line to /etc/hosts pointing to the internal cluster image registry so that CRI-O can find it. This has worked fine until this release. Now some nodes are missing this line in /etc/hosts. I have done four such cluster deployments, each with 3 master and 3 worker nodes, and my results varied from only one of the six nodes having the line to only one of the six nodes missing it.

Version-Release number of selected component (if applicable):
OKD 4.10-2022-03-07

How reproducible:
Always, see above

Steps to Reproduce:
1. Deploy a fresh cluster.
2. Enable the internal image registry, adding a PV (one common way is sketched below).
3. Check /etc/hosts on each node.

Actual results:
Some nodes are missing the line in /etc/hosts.

Expected results:
All nodes must have a line like this in /etc/hosts:

172.30.47.30 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local

Additional info:
I also did a deployment of a similar OCP 4.10.3 cluster and did not see the problem there.
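For reference, step 2 is typically done by patching the image registry operator's configuration to Managed and giving it storage. A minimal sketch, assuming a default StorageClass can bind a PVC (with an empty claim name the operator creates its own PVC; exact storage details vary by environment):

    oc patch configs.imageregistry.operator.openshift.io/cluster \
      --type merge \
      -p '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'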
Neither the image registry nor its operator manages /etc/hosts. Is this something node-related?
I'm not an expert on the inner workings of OpenShift ... but whenever I enabled the cluster's internal image registry after installing a cluster on vSphere UPI (the registry is disabled by default after initial cluster deployment), these /etc/hosts entries automatically appeared. E.g. after initial deployment of the cluster they were not there, and immediately after enabling the image registry they were created. So my assumption was that the image registry operator somehow causes them to be created.
See https://github.com/openshift/os/pull/657#issuecomment-983680444

So, not to name names here, but it's the DNS operator.
The DNS operator creates a "node-resolver" daemonset that runs a pod on each node in the cluster to add the entry for the cluster image registry to the /etc/hosts file on the node host. These pods have a toleration for all taints.

Can you provide (a) a list of nodes, (b) a list of pods in the "openshift-dns" namespace so we can verify that a node-resolver pod is running on each node, and (c) the logs for these pods?

    oc get nodes -o name
    oc -n openshift-dns get pods -o wide
    oc -n openshift-dns logs -l dns.operator.openshift.io/daemonset-node-resolver

Depending on what we find, we may need more details on the nodes and events:

    oc get nodes -o yaml
    oc -n openshift-dns get events -o yaml

Alternatively, a must-gather archive would have all of this information (and more).
Follow-up question: Has anyone been able to reproduce this issue on OCP, or is it specific to OKD?
I thought I had added a must-gather to this issue, but it seems I forgot it. I still have the cluster where I tested this the last time, but have meanwhile created these /etc/hosts entries on it manually as a workaround. Would it still be of value to get a must-gather from it now, 10 days after I had the problem on it? Most logs will be gone already, I guess. If it helps, I can kill and recreate the cluster easily to reproduce the problem.

Yes, in openshift-dns I see a dns-default and a node-resolver pod for each cluster node.

The "oc logs -l daemonset-node-resolver" only yields a couple of these messages:

    dig: parse of /etc/resolv.conf failed

This may be because it (for whatever reason) contains a line containing

    search .

with no real search suffix? Otherwise it looks fine (I checked on the nodes), just two nameserver lines.

The "oc get events" does not show any events at all (empty output).

See my original problem description: no, I have not been able to reproduce this with OCP 4.10, only OKD 4.10 so far.
(In reply to Kai-Uwe Rommel from comment #7)
[...]
> Would it still be of value to get a must-gather from it now, 10 days after I
> had the problem on it?

I don't think it is necessary at this point given the information that you have provided.

[...]
> Yes, in openshift-dns I see a dns-default and a node-resolver pod for each
> cluster node.
> The "oc logs -l daemonset-node-resolver" only yields a couple of these
> messages:
> dig: parse of /etc/resolv.conf failed
> This may be because it (for whatever reason) contains a line containing
> search .
> and no real search suffix? Otherwise it looks fine (I checked on the nodes),
> just two nameserver lines.
[...]

This would explain the problem. The code that adds the entry to /etc/hosts looks up the IP address for the entry using a dig command (see <https://github.com/openshift/cluster-dns-operator/blob/master/assets/node-resolver/update-node-resolver.sh>). The dig command is failing because it cannot parse /etc/resolv.conf, so no entry is added to /etc/hosts.

I suspect that having "search ." in /etc/resolv.conf is invalid. If it is valid, it isn't clear what it means (I could guess, but guesses aren't documentation). If "search ." is valid, then it should be documented, and the dig command should be fixed to allow it. If "search ." is invalid, then we need to fix the logic that added it.

Do you know what added "search ." to /etc/resolv.conf?
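For illustration, the core of that script's logic looks roughly like the following sketch (hedged: variable names and the exact dig invocation are assumptions here, see the linked update-node-resolver.sh for the real code; the nameserver IP is a placeholder):

    #!/bin/sh
    NAMESERVER="172.30.0.10"  # placeholder: the cluster DNS service IP
    SERVICE="image-registry.openshift-image-registry.svc.cluster.local"

    # dig parses /etc/resolv.conf on startup even when an explicit @server
    # is given; if that parse fails, it prints
    # "dig: parse of /etc/resolv.conf failed" and returns no answer.
    IP="$(dig +short "${SERVICE}" A "@${NAMESERVER}" | head -n1)"

    if [ -n "${IP}" ]; then
        # Only append the entry if it is not already present.
        grep -q "image-registry.openshift-image-registry.svc" /etc/hosts || \
            echo "${IP} image-registry.openshift-image-registry.svc ${SERVICE}" >> /etc/hosts
    fi
    # When dig fails, IP stays empty and no entry is ever written -- which
    # matches the missing /etc/hosts line reported here.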
If that is the cause ... why do some nodes then have the /etc/hosts entry and some do not?

I think it is caused by the node VM being statically IP-configured with Afterburn. The Afterburn specification is nothing other than kernel arguments being passed through a vSphere VM advanced property. And the IP configuration kernel arguments allow specifying IP, netmask, gateway and DNS servers, but not a DNS search suffix. This causes /etc/resolv.conf to be configured with a "search ." entry. I think that . is just a synonym for the "root" domain.

This "search ." thing is not new; it has been there since Afterburn was introduced a couple of 4.x minor releases earlier. See here: https://github.com/openshift/okd/issues/648

As that issue's thread indicated, the problem ought to have been fixed already. Looks like it somehow returned?
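To make the limitation concrete, such a static IP configuration typically looks like this as kernel arguments (addresses and interface name are placeholders; the syntax is dracut's ip= argument as used for vSphere UPI static IPs). There is a field or argument for everything except a DNS search domain:

    ip=10.99.111.20::10.99.111.1:255.255.255.0:master-01:ens192:none nameserver=10.99.111.1 nameserver=10.99.111.2

With no search domain available, newer systemd-resolved falls back to writing "search ." into resolv.conf (see the systemd change discussed later in this thread).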
(In reply to Kai-Uwe Rommel from comment #9)
> If that is the cause ... why do then some nodes have the /etc/hosts entry
> and some do not?

Do all nodes have "search ." in /etc/resolv.conf?

[...]
> This "search ." thing is not new, it's already there since when Afterburn
> was introduced a couple of 4.x minor releases earlier.
> See here: https://github.com/openshift/okd/issues/648
> As this issue's thread indicated, that issue ought to have been fixed
> already.
> Looks like it somehow it returned?

Thanks for linking that issue! From there, I found the change that addressed the issue when it came up before: https://github.com/openshift/okd-machine-os/pull/159/files

The change was to add a "fix-resolv-conf-search.service" systemd unit file that removes the problematic "search ." stanza. Can you verify that your node hosts have this unit file and that it has been run?

    systemctl status fix-resolv-conf-search.service

The unit file checks resolv.conf using a very specific regular expression: "^search .$". If the regexp doesn't exactly match (for example, if there are additional blanks or a comment on the same line), the unit file won't remove the "search ." stanza.

Would it be possible for you to attach the resolv.conf file (possibly in a private attachment/comment)? If resolv.conf is different on nodes with the /etc/hosts entry for the cluster image registry and nodes without that entry, could you share resolv.conf from both a node with the entry and a node without?
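To illustrate how exact that match is, a quick demonstration with illustrative input (note the sed expression empties the matching line rather than deleting it):

    printf 'search .\n'   | sed -e 's/^search .$//'   # matched: line is emptied
    printf 'search .  \n' | sed -e 's/^search .$//'   # trailing blanks: left untouched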
My current test cluster has 3 master and 3 worker nodes. On this cluster, only master-03 had this problem. See above, I also had different results in earlier deployments. On all nodes, resolv.conf is identical. I will upload a copy here.

On the master-03 node:

[root@master-03 ~]# systemctl status fix-resolv-conf-search.service
○ fix-resolv-conf-search.service - Remove search . from /etc/resolv.conf
     Loaded: loaded (/usr/lib/systemd/system/fix-resolv-conf-search.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-03-11 20:01:31 UTC; 1 week 5 days ago
    Process: 950 ExecStartPre=/usr/bin/sleep 5 (code=exited, status=0/SUCCESS)
    Process: 1256 ExecStart=/usr/bin/sed -i -e s/^search .$// /run/systemd/resolve/resolv.conf (code=exited, status=0/SUCCESS)
    Process: 1257 ExecStart=/usr/bin/cp /run/systemd/resolve/resolv.conf /var/run/NetworkManager/resolv.conf (code=exited, status=0/SUCCESS)
   Main PID: 1257 (code=exited, status=0/SUCCESS)
        CPU: 9ms

I also checked one of the other nodes - same status. So that fix is still in place and appears to run successfully. But it did not remove the "search ." from /etc/resolv.conf.

[root@master-03 ~]# ll /etc/resolv.conf
lrwxrwxrwx. 1 root root 39 Mar 11 17:17 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
[root@master-03 ~]# ll /run/systemd/resolve/
total 4
srw-rw-rw-. 1 systemd-resolve systemd-resolve   0 Mar 11 20:01 io.systemd.Resolve
drwx------. 2 systemd-resolve systemd-resolve  80 Mar 11 20:01 netif
-rw-r--r--. 1 systemd-resolve systemd-resolve 814 Mar 11 20:01 resolv.conf
lrwxrwxrwx. 1 systemd-resolve systemd-resolve  11 Mar 11 20:01 stub-resolv.conf -> resolv.conf

So it is still referencing the correct file. The systemd unit file looks OK (it does have quotes around the sed command):

[root@master-03 ~]# cat /etc/systemd/system/multi-user.target.wants/fix-resolv-conf-search.service
[Unit]
Description=Remove search . from /etc/resolv.conf
DefaultDependencies=no
Requires=systemd-resolved.service
After=systemd-resolved.service
BindsTo=systemd-resolved.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/sleep 5
ExecStart=/usr/bin/sed -i -e "s/^search .$//" /run/systemd/resolve/resolv.conf
ExecStart=/usr/bin/cp /run/systemd/resolve/resolv.conf /var/run/NetworkManager/resolv.conf

[Install]
WantedBy=multi-user.target

According to the systemctl status, it must have run, and it did not emit an error return code. The systemctl status shows the time when it was run, and the resolv.conf also has exactly this time as its timestamp. When I execute the sed command manually, it does remove the line.

On one node, I manually ran it with systemctl:

[root@master-02 resolve]# systemctl start fix-resolv-conf-search.service
[root@master-02 resolve]# systemctl status fix-resolv-conf-search.service
○ fix-resolv-conf-search.service - Remove search . from /etc/resolv.conf
     Loaded: loaded (/usr/lib/systemd/system/fix-resolv-conf-search.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Thu 2022-03-24 10:44:09 UTC; 3s ago
    Process: 4094059 ExecStartPre=/usr/bin/sleep 5 (code=exited, status=0/SUCCESS)
    Process: 4094230 ExecStart=/usr/bin/sed -i -e s/^search .$// /run/systemd/resolve/resolv.conf (code=exited, status=0/SUCCESS)
    Process: 4094231 ExecStart=/usr/bin/cp /run/systemd/resolve/resolv.conf /var/run/NetworkManager/resolv.conf (code=exited, status=0/SUCCESS)
   Main PID: 4094231 (code=exited, status=0/SUCCESS)
        CPU: 9ms

This time the line "search ." was removed from resolv.conf ... so why not during the initial run? Really strange.
On the other hand, while this might be a problem, it could still be unrelated. The status of this "search ." thing is identical on all nodes, but the line for the internal image registry was only missing on one node (master-03). From my point of view, I do not see a difference between the nodes.

What I can do next is recreate the cluster and, before enabling the image registry, check this "search ." status, re-run the fix-resolv-conf-search.service, and make sure this line is removed. Then enable the image registry and check whether the problem still appears. That should give us a more reliable indication of whether these two problems are related at all.
Created attachment 1868095 [details] resolv.conf from master-03
I reinstalled the test cluster from scratch. The result: of the 3 master and 3 worker nodes, master-01 and worker-02 afterwards did NOT have the "search ." line in their /etc/resolv.conf; the other 4 nodes still had it. Again, the fix-resolv-conf-search.service was run on all 6 nodes and did not log any error on any of them. No idea why it worked on 2 and not on the other 4 nodes.

And the interesting bit: when I then enabled the image registry, the addition of the line to /etc/hosts worked on master-01 and worker-02 and not on the other 4. So it looks like the "search ." issue does indeed later cause the missing /etc/hosts line.

And even more interesting: on the worker-02 node, the "search ." line did reappear in /etc/resolv.conf after a reboot (which was caused by a later certificate change). I then manually ran the fix-resolv-conf-search.service and the line was removed again. Rebooted that node and it was back again.

In journalctl I can see that the fix-resolv-conf-search.service is run after each boot. So I assume the "search ." is also added back to /etc/resolv.conf on each boot by NetworkManager (?) and should then be removed by this fix service. Sometimes this works and sometimes it does not. Perhaps some dependencies for this fix-resolv-conf-search.service are not set correctly? It has "After=systemd-resolved.service" set, but apparently this is not sufficient? Should it rather be "After=NetworkManager.service"? (I can't change it easily since it is in the read-only /usr tree.) A sketch of what such a change could look like follows below.

The fix looks weird anyway ... perhaps NetworkManager should rather be fixed to not create a "search ." line in resolv.conf at all? Or dig etc. be fixed to not complain about this?
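A hedged sketch of the stronger ordering, based on the unit shown earlier in this thread (an untested assumption, not a verified fix; only the After= line changes, to also order the unit after NetworkManager):

    [Unit]
    Description=Remove search . from /etc/resolv.conf
    DefaultDependencies=no
    Requires=systemd-resolved.service
    After=systemd-resolved.service NetworkManager.service
    BindsTo=systemd-resolved.service

    [Service]
    Type=oneshot
    ExecStartPre=/usr/bin/sleep 5
    ExecStart=/usr/bin/sed -i -e "s/^search .$//" /run/systemd/resolve/resolv.conf
    ExecStart=/usr/bin/cp /run/systemd/resolve/resolv.conf /var/run/NetworkManager/resolv.conf

    [Install]
    WantedBy=multi-user.target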
(In reply to Kai-Uwe Rommel from comment #13)
[...]
> So it looks like the "search ." issue does indeed then later causes the
> /etc/hosts line missing issue.

Thanks for verifying this!

> And even more interesting: on the worker-02 node, the "search ." line did
> reappear in /etc/resolv.conf after a reboot (which was caused by a later
> certificate change).
> I then manually ran the fix-resolv-conf-search.service and the line was
> removed again.
> Rebooted that node and it was back again.

It's worth noting that the kubelet reads the host's /etc/resolv.conf file and uses it to write the pod's /etc/resolv.conf file when the pod starts. There are two important points here. First, the kubelet does not update the pod's /etc/resolv.conf file when the host's /etc/resolv.conf file is updated, so the error could be caused by a version of /etc/resolv.conf that no longer exists on the host. Second, the kubelet parses the host's /etc/resolv.conf file and generates a new /etc/resolv.conf file for the pod, so it is possible that the kubelet has an error in its resolv.conf parsing or generation logic.

These two points could explain some of the confusion; did you check /etc/resolv.conf inside the pod, or only on the node host? (My fault for asking you to check the host file and not the pod file earlier.)

In particular, this code in the kubelet is suspect:

    if fields[0] == "search" {
        // Normalise search fields so the same domain with and without trailing dot will only count once, to avoid hitting search validation limits.
        searches = []string{}
        for _, s := range fields[1:] {
            searches = append(searches, strings.TrimSuffix(s, "."))
        }
    }

https://github.com/kubernetes/kubernetes/blob/bfe649dbc07a3707fe342b971a1dad422e6cb95f/pkg/kubelet/network/dns/dns.go#L270

It looks like the kubelet would append the "." entry after trimming the ".", resulting in an empty entry "" and thus a line with just "search ".

Can you attach /etc/resolv.conf from a node-resolver pod that is logging the "parse of /etc/resolv.conf failed" errors?

> In journalctl I can see that the fix-resolv-conf-search.service is run after
> each boot.
> So I assume the "search ." is also added back to /etc/resolv.conf on each
> boot by NetworkManager (?) and should be removed then by this fix service.
> Sometimes this works and sometimes it does not. Perhaps some dependencies
> for this fix-resolv-conf-search.service are not set correctly?
> It has set "After=systemd-resolved.service" but apparently this is not
> sufficient?
> Should rather be "After=NetworkManager.service"? (Can't change it easily
> since it is in the read-only /usr tree.)

Perhaps we should check with Vadim, who merged <https://github.com/openshift/okd-machine-os/pull/159>, the PR that added fix-resolv-conf-search.service. Vadim, can you provide some guidance here?

> The fix looks weird anyway ... perhaps rather NetworkManager should be fixed
> to not create a "search ." line in resolv.conf at all?
> Or dig etc. be fixed to not complain about this?

We could add a workaround in the node-resolver script, which might be the expedient solution. I would prefer if the issue were fixed in the kubelet, NetworkManager, or dig (if it turns out to be an issue in dig), but I would need to hand this BZ off to the team that owns one or another of those components (I work on the DNS operator and other OpenShift components, not on the kubelet or on RHEL system tools or daemons).
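For illustration, a minimal self-contained reproduction of the suspected normalization bug (hedged: this only mimics the quoted loop plus how the search line would be re-serialized; it is not the actual kubelet code path):

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // As parsed from a host resolv.conf containing "search ."
        fields := []string{"search", "."}
        searches := []string{}
        for _, s := range fields[1:] {
            // Trimming the trailing dot from the root domain "." yields ""
            searches = append(searches, strings.TrimSuffix(s, "."))
        }
        // Re-serializing the entries produces a bare "search " line,
        // which dig then fails to parse.
        fmt.Printf("search %s\n", strings.Join(searches, " "))
    }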
In FCOS 33, if an OKD node is configured with static IPs via kernel arguments, no DNS search domain is created, resulting in an /etc/resolv.conf without a search entry. This is handled by systemd-resolved; FCOS 33 uses systemd-246.14-1.fc33.

In FCOS 34 this behavior changed, and systems with static IPs via kernel args and no DNS domain now have a "search ." line added, like this:

nameserver 10.99.111.1
nameserver 10.99.111.2
search .

This was introduced with systemd/systemd#17201. FCOS 34 and above use systemd-248.3-1.fc34, which includes the above 'enhancement'.

Unfortunately this seems to cause a problem with OKD cluster DNS resolution, as cluster domains no longer seem to work. Adding this new element results in issues such as openshift/okd#694:

Get "https://image-registry.openshift-image-registry.svc:5000/v2/": dial tcp: lookup image-registry.openshift-image-registry.svc on 10.10.8.132:53: no such host
I am guessing that we have a new timing issue and that the fix service runs too soon.
Somewhat related BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=1976858
https://bugzilla.redhat.com/show_bug.cgi?id=1874419
Yes, I can confirm that the "search ." in the node's /etc/resolv.conf leads to lines with just "search" in the node-resolver pods. (We had this before; it's documented somewhere in the cited GitHub issue, if I remember correctly.)

Instead of just fixing the timing of the fix service, I'd rather suggest "really" fixing the problem in the kubelet (to not generate bare "search" lines), in dig etc. (to not fail on such lines), and in NetworkManager (to not write a "search ." line on the node), if possible. :-)
I've proposed a fix in the kubelet upstream; see <https://github.com/kubernetes/kubernetes/pull/109441>.
As a side note, the real "root cause" of this issue is a change made to systemd-resolved in FCOS 34. This made `search .` the default value when no search domain is defined, instead of deriving one from the FQDN. See https://github.com/openshift/okd-machine-os/pull/158 and specifically https://github.com/systemd/systemd/pull/17201.

Everything we are doing now is a workaround for this change. There is an RFE to add a search-domain kernel argument to potentially fix this issue. This only seems to happen when static IPs are specified in the ip= stanza of the kernel arguments (KARGS), and there is no way to pass a search domain.
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-081133   True        False         79m     Cluster version is 4.11.0-0.nightly-2022-06-25-081133
melvinjoseph@mjoseph-mac Downloads % oc get nodes
NAME                                            STATUS   ROLES    AGE    VERSION
mjoseph-2092041-dtjsp-master-0                  Ready    master   104m   v1.24.0+284d62a
mjoseph-2092041-dtjsp-master-1                  Ready    master   103m   v1.24.0+284d62a
mjoseph-2092041-dtjsp-master-2                  Ready    master   104m   v1.24.0+284d62a
mjoseph-2092041-dtjsp-worker-northcentralus-1   Ready    worker   89m    v1.24.0+284d62a
mjoseph-2092041-dtjsp-worker-northcentralus-2   Ready    worker   88m    v1.24.0+284d62a
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-2092041-dtjsp-master-0
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/mjoseph-2092041-dtjsp-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.8
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/resolv.conf
search kh5qvjfkweyelc1wrf32mjnrea.ex.internal.cloudapp.net
nameserver 168.63.129.16
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.245.193 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-2092041-dtjsp-master-1
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/mjoseph-2092041-dtjsp-master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.6
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.245.193 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search kh5qvjfkweyelc1wrf32mjnrea.ex.internal.cloudapp.net
nameserver 168.63.129.16
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-2092041-dtjsp-master-2
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/mjoseph-2092041-dtjsp-master-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.7
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/resolv.conf
search kh5qvjfkweyelc1wrf32mjnrea.ex.internal.cloudapp.net
nameserver 168.63.129.16
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.245.193 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-2092041-dtjsp-worker-northcentralus-1
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/mjoseph-2092041-dtjsp-worker-northcentralus-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.1.5
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.245.193 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search kh5qvjfkweyelc1wrf32mjnrea.ex.internal.cloudapp.net
nameserver 168.63.129.16
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-2092041-dtjsp-worker-northcentralus-2
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/mjoseph-2092041-dtjsp-worker-northcentralus-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.1.4
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.245.193 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search kh5qvjfkweyelc1wrf32mjnrea.ex.internal.cloudapp.net
nameserver 168.63.129.16
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads %
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                          AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.okd-2022-06-25-172439   True        False         14m     Cluster version is 4.11.0-0.okd-2022-06-25-172439
melvinjoseph@mjoseph-mac Downloads % oc get infrastructure cluster -o=jsonpath={.spec.platformSpec.type}
VSphere%
melvinjoseph@mjoseph-mac Downloads % oc get nodes
NAME                           STATUS   ROLES    AGE   VERSION
jimaokd03-lkb47-master-0       Ready    master   44m   v1.24.0+284d62a
jimaokd03-lkb47-master-1       Ready    master   45m   v1.24.0+284d62a
jimaokd03-lkb47-master-2       Ready    master   45m   v1.24.0+284d62a
jimaokd03-lkb47-worker-565vf   Ready    worker   35m   v1.24.0+284d62a
jimaokd03-lkb47-worker-7dv7l   Ready    worker   35m   v1.24.0+284d62a
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd03-lkb47-master-0
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd03-lkb47-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.17
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.97.223 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search us-west-2.compute.internal jimaokd03.qe.devcluster.openshift.com
nameserver 172.31.249.17
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd03-lkb47-master-1
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd03-lkb47-master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.122
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.97.223 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search us-west-2.compute.internal jimaokd03.qe.devcluster.openshift.com
nameserver 172.31.249.122
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd03-lkb47-master-2
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd03-lkb47-master-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.153
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.97.223 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search us-west-2.compute.internal jimaokd03.qe.devcluster.openshift.com
nameserver 172.31.249.153
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug nodes/jimaokd03-lkb47-worker-565vf
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd03-lkb47-worker-565vf-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.177
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.97.223 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search us-west-2.compute.internal jimaokd03.qe.devcluster.openshift.com
nameserver 172.31.249.177
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug nodes/jimaokd03-lkb47-worker-7dv7l
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd03-lkb47-worker-7dv7l-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.213
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.97.223 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
search us-west-2.compute.internal jimaokd03.qe.devcluster.openshift.com
nameserver 172.31.249.213
nameserver 10.3.192.12

Working fine in the latest OKD image as well. Hence marking as verified.
To be on the safe side, I checked one more cluster using OKD vSphere UPI.

melvinjoseph@mjoseph-mac Downloads % oc get infrastructure cluster -o=jsonpath={.spec.platformSpec.type}
VSphere%
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                          AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.okd-2022-06-25-172439   True        False         85m     Cluster version is 4.11.0-0.okd-2022-06-25-172439
melvinjoseph@mjoseph-mac Downloads % oc get nodes
NAME                              STATUS   ROLES    AGE    VERSION
jimaokd04-x55kc-compute-0         Ready    worker   96m    v1.24.0+284d62a
jimaokd04-x55kc-compute-1         Ready    worker   96m    v1.24.0+284d62a
jimaokd04-x55kc-control-plane-0   Ready    master   106m   v1.24.0+284d62a
jimaokd04-x55kc-control-plane-1   Ready    master   106m   v1.24.0+284d62a
jimaokd04-x55kc-control-plane-2   Ready    master   106m   v1.24.0+284d62a
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd04-x55kc-compute-0
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd04-x55kc-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.35
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.193.125 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd04-x55kc-compute-1
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd04-x55kc-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.29
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.193.125 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd04-x55kc-control-plane-0
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd04-x55kc-control-plane-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.84
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.193.125 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd04-x55kc-control-plane-1
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd04-x55kc-control-plane-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.37
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.193.125 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
nameserver 10.3.192.12
sh-4.4# exit
exit
Removing debug pod ...
melvinjoseph@mjoseph-mac Downloads % oc debug node/jimaokd04-x55kc-control-plane-2
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jimaokd04-x55kc-control-plane-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.28
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
172.30.193.125 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
sh-4.4# cat /etc/resolv.conf
nameserver 10.3.192.12
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069