Bug 1927836
| Summary: | Node becomes NotReady during upgrade; upgrade does not complete | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ruth Netser <rnetser> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | aos-bugs, fdeutsch, pehunt, wking |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-16 14:49:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1927373 | ||

Description
Ruth Netser
2021-02-11 16:15:41 UTC

What's the output of `oc adm node-logs ssp04-rvkqg-worker-0-zvfkw`?

    $ oc adm node-logs ssp04-rvkqg-worker-0-zvfkw
    error: error trying to reach service: dial tcp 192.168.1.21:10250: i/o timeout

Are you able to access the node via ssh? If not, you may need to reboot the node via a cloud console to get access. I believe we'll need the node logs to make any forward progress.

This looks like a DNS issue: running `ssh core@ssp04-rvkqg-worker-0-zvfkw` fails, but running `ssh core@192.168.1.21` (IP address obtained through `oc get nodes -o wide`) succeeds.

On the node, the kubelet is complaining repeatedly about:
```
Feb 11 21:46:50 ssp04-rvkqg-worker-0-zvfkw hyperkube[2292]: E0211 21:46:50.878835 2292 kubelet.go:2190] node "ssp04-rvkqg-worker-0-zvfkw" not found
```
and trying with ping from another host shows:
```
[cnv-qe-jenkins@ssp04-rvkqg-executor ~]$ ping 192.168.1.21
PING 192.168.1.21 (192.168.1.21) 56(84) bytes of data.
64 bytes from 192.168.1.21: icmp_seq=1 ttl=64 time=1.24 ms
64 bytes from 192.168.1.21: icmp_seq=2 ttl=64 time=0.665 ms
64 bytes from 192.168.1.21: icmp_seq=3 ttl=64 time=0.595 ms
64 bytes from 192.168.1.21: icmp_seq=4 ttl=64 time=0.714 ms
64 bytes from 192.168.1.21: icmp_seq=5 ttl=64 time=0.654 ms
64 bytes from 192.168.1.21: icmp_seq=6 ttl=64 time=0.798 ms
64 bytes from 192.168.1.21: icmp_seq=7 ttl=64 time=0.658 ms
^C
--- 192.168.1.21 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 111ms
rtt min/avg/max/mdev = 0.595/0.761/1.244/0.206 ms
[cnv-qe-jenkins@ssp04-rvkqg-executor ~]$ ping ssp04-rvkqg-worker-0-zvfkw
ping: ssp04-rvkqg-worker-0-zvfkw: Name or service not known
```
From within the cluster, name resolution works. From one of the masters:

    [core@ssp04-rvkqg-master-0 ~]$ ping ssp04-rvkqg-worker-0-zvfkw
    PING ssp04-rvkqg-worker-0-zvfkw.ssp04.cnv-qe.rhcloud.com (192.168.1.21) 56(84) bytes of data.
    64 bytes from host-192-168-1-21.openstacklocal (192.168.1.21): icmp_seq=1 ttl=64 time=1.45 ms
    64 bytes from host-192-168-1-21.openstacklocal (192.168.1.21): icmp_seq=2 ttl=64 time=0.928 ms
    64 bytes from host-192-168-1-21.openstacklocal (192.168.1.21): icmp_seq=3 ttl=64 time=0.615 ms
    64 bytes from host-192-168-1-21.openstacklocal (192.168.1.21): icmp_seq=4 ttl=64 time=0.609 ms
    ^C
    --- ssp04-rvkqg-worker-0-zvfkw.ssp04.cnv-qe.rhcloud.com ping statistics ---

(Access to the cluster is done via a separate VM, which does not have DNS.)

*** This bug has been marked as a duplicate of bug 1913532 ***
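
For anyone reproducing the name-versus-IP check above from a jump host, here is a minimal sketch in Python. It only separates "the name does not resolve here" from "the name resolves to a different address"; it says nothing about kubelet health. The helper names `resolve` and `diagnose` are hypothetical and are not part of `oc` or any OpenShift tooling. The hostname and IP are the ones quoted in this report.

```python
import socket
from typing import Callable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if the local resolver fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror ("Name or service not known") is a subclass
        return None


def diagnose(
    hostname: str,
    known_ip: str,
    resolver: Callable[[str], Optional[str]] = resolve,
) -> str:
    """Classify name resolution for a node whose IP is already known.

    known_ip is assumed to come from `oc get nodes -o wide`.
    Returns "unresolved", "mismatch", or "ok".
    """
    resolved = resolver(hostname)
    if resolved is None:
        return "unresolved"  # matches the jump-host symptom in this report
    if resolved != known_ip:
        return "mismatch"    # name resolves, but to a different address
    return "ok"


if __name__ == "__main__":
    # Values taken from the report; run this from the host used to reach the cluster.
    print(diagnose("ssp04-rvkqg-worker-0-zvfkw", "192.168.1.21"))
```

A result of `unresolved` on the jump host combined with a successful ping of the IP would be consistent with the note above that the access VM has no DNS for the cluster names.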