Bug 1888520 - [OSP] Kubelet can not access api-int on 1/3 masters
Summary: [OSP] Kubelet can not access api-int on 1/3 masters
Keywords:
Status: CLOSED DUPLICATE of bug 1888301
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-15 05:51 UTC by weiwei jiang
Modified: 2020-10-15 13:28 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-15 13:28:24 UTC
Target Upstream Version:
Embargoed:



Description weiwei jiang 2020-10-15 05:51:32 UTC
Description of problem:

[root@wj45ios1015a-xtq4n-master-2 core]# journalctl -f -u kubelet|grep -i kubelet_node_status
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.305412  362998 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.730980  362998 kubelet_node_status.go:342] Adding node label from cloud provider: beta.kubernetes.io/instance-type=m1.xlarge
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.731009  362998 kubelet_node_status.go:344] Adding node label from cloud provider: node.kubernetes.io/instance-type=m1.xlarge
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.731050  362998 kubelet_node_status.go:355] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=nova
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.731057  362998 kubelet_node_status.go:357] Adding node label from cloud provider: topology.kubernetes.io/zone=nova
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.731062  362998 kubelet_node_status.go:361] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=regionOne
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.731066  362998 kubelet_node_status.go:363] Adding node label from cloud provider: topology.kubernetes.io/region=regionOne
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.741407  362998 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node wj45ios1015a-xtq4n-master-2
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.741431  362998 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node wj45ios1015a-xtq4n-master-2
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.741445  362998 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node wj45ios1015a-xtq4n-master-2
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: I1015 05:27:51.741476  362998 kubelet_node_status.go:70] Attempting to register node wj45ios1015a-xtq4n-master-2
Oct 15 05:27:51 wj45ios1015a-xtq4n-master-2 hyperkube[362998]: E1015 05:27:51.742645  362998 kubelet_node_status.go:92] Unable to register node "wj45ios1015a-xtq4n-master-2" with API server: Post https://api-int.wj45ios1015a.1015-r-b.qe.rhcloud.com:6443/api/v1/nodes: dial tcp 192.168.0.5:6443: connect: connection refused
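
For reference, a minimal way to confirm the refusal from the affected node; these exact commands are an illustrative sketch, not taken from the report:

# Confirm api-int resolves to the VIP shown in the error (192.168.0.5):
dig +short api-int.wj45ios1015a.1015-r-b.qe.rhcloud.com
# Reproduce the refused connection directly against the VIP:
curl -k https://192.168.0.5:6443/healthz
# Check whether anything on this node is bound to 6443 locally:
sudo ss -tlnp | grep 6443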


Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-10-14-095539

How reproducible:
Always?

Steps to Reproduce:
1. Set up an IPI cluster on OSP
2. Check whether all master nodes are Ready

Actual results:
$ oc get nodes -o wide
NAME                          STATUS     ROLES    AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                     CONTAINER-RUNTIME
wj45ios1015a-xtq4n-master-0   Ready      master   170m   v1.18.3+2fbd7c7   192.168.2.45    <none>        Red Hat Enterprise Linux CoreOS 45.82.202010091130-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.18.3-19.rhaos4.5.git9264b4f.el8
wj45ios1015a-xtq4n-master-1   Ready      master   170m   v1.18.3+2fbd7c7   192.168.0.248   <none>        Red Hat Enterprise Linux CoreOS 45.82.202010091130-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.18.3-19.rhaos4.5.git9264b4f.el8
wj45ios1015a-xtq4n-master-2   NotReady   master   170m   v1.18.3+2fbd7c7   192.168.2.193   <none>        Red Hat Enterprise Linux CoreOS 45.82.202010091130-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.18.3-19.rhaos4.5.git9264b4f.el8


Expected results:
All master nodes should be ready without error.

Additional info:

Comment 1 weiwei jiang 2020-10-15 05:56:53 UTC
FYI, the expected traffic path is the following:
kubelet -> PREROUTING iptables rule rewrites the destination port from 6443 to 9445 (haproxy-monitor) -> haproxy (listening on port 9445) -> dispatches traffic to a healthy backend (one of the masters).

The VIP 192.168.0.5, which is managed by keepalived, is assigned to one of the masters; in my scenario it is master-2.
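
A minimal sketch of how each hop in that path could be checked on the affected master; the command set is an assumption based on the flow above, not part of the original report:

# 1. PREROUTING rule that rewrites 6443 -> 9445 for the VIP:
sudo iptables -t nat -S PREROUTING | grep -E '6443|9445'
# 2. haproxy listening on the monitor port 9445:
sudo ss -tlnp | grep 9445
# 3. VIP placement: keepalived should have put 192.168.0.5 on exactly one master:
ip addr show | grep '192.168.0.5'
# 4. The static haproxy/keepalived infra pods should be running:
sudo crictl ps | grep -E 'haproxy|keepalived'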

Comment 3 Martin André 2020-10-15 13:28:24 UTC

*** This bug has been marked as a duplicate of bug 1888301 ***

