Bug 1573122 - [3.7] OCP on Azure - if the kubelet can't reach the Azure API - marked as NotReady
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.7.z
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On: 1554748
Blocks:
Reported: 2018-04-30 08:51 UTC by Paul Dwyer
Modified: 2018-05-18 03:55 UTC (History)
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue where a Node can stop reporting status if the connection to the Azure API is terminated uncleanly, resulting in a long timeout before the connection is re-established and blocking the status update loop.
Clone Of: 1554748
Environment:
Last Closed: 2018-05-18 03:54:46 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:1576  0        None      None    None     2018-05-18 03:55:05 UTC

Comment 19 DeShuai Ma 2018-05-10 08:48:54 UTC
Verified on v3.7.46.

[root@dma37-master-etcd-nfs-1 ~]# oc version
oc v3.7.46
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dma37-master-etcd-nfs-1:8443
openshift v3.7.46
kubernetes v1.7.6+a08f5eeb62

1. Watch the node status from the master:
[root@dma37-master-etcd-nfs-1 ~]# oc get no dma37-node-registry-router-1 -w
NAME                           STATUS    AGE       VERSION
dma37-node-registry-router-1   Ready     19m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     19m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     19m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     20m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     20m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     20m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     21m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     21m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     21m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     22m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     22m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     22m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     23m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     23m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     23m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     24m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     24m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     24m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     25m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     25m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     25m       v1.7.6+a08f5eeb62
dma37-node-registry-router-1   Ready     26m       v1.7.6+a08f5eeb62


2. On the node, block the connection to the Azure API, then watch the node log:
[root@dma37-node-registry-router-1 ~]# iptables -A OUTPUT -d management.azure.com -j DROP
[root@dma37-node-registry-router-1 ~]# journalctl -f -u atomic-openshift-node.service |grep "Timeout after"
May 10 08:32:54 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:32:54.204527   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:33:14 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:33:14.225115   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:33:34 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:33:34.249522   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:33:54 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:33:54.274028   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:34:14 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:34:14.293589   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:34:34 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:34:34.318474   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:34:54 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:34:54.341656   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:35:26 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:35:26.402908   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:36:06 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:36:06.424409   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:36:46 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:36:46.454712   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:37:26 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:37:26.486660   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:38:06 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:38:06.552450   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
May 10 08:38:46 dma37-node-registry-router-1 atomic-openshift-node[29230]: W0510 08:38:46.572796   29230 kubelet_node_status.go:1007] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s
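
The behaviour verified above (the cloud provider call gives up after a bounded wait and the node keeps reporting Ready) can be illustrated with a minimal sketch. This is not the kubelet's code, which is written in Go; it is a Python illustration of the general pattern only. The names call_with_timeout and update_node_status, and the shape of the status dictionary, are hypothetical and chosen for this example.

```python
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread and wait at most timeout_s seconds.

    Returns (value, None) on success or (None, error_message) on failure.
    Hypothetical helper for illustration; not part of any kubelet API.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as exc:  # report the failure instead of crashing the loop
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # The call is still hanging; stop waiting so the caller can continue.
        return None, f"Timeout after {timeout_s}s"
    if "error" in result:
        return None, str(result["error"])
    return result["value"], None


def update_node_status(get_node_address, timeout_s=10):
    """One iteration of a status update loop (hypothetical structure).

    A slow or unreachable cloud provider only costs timeout_s seconds;
    the rest of the status is still reported.
    """
    status = {"ready": True}
    address, err = call_with_timeout(get_node_address, timeout_s)
    if err is not None:
        print("Failed to set some node status fields: "
              f"failed to get node address from cloud provider: {err}")
    else:
        status["address"] = address
    return status
```

With a wrapper like this, a hung connection delays each status update by at most the timeout instead of blocking the loop until the connection is torn down, which is consistent with the recurring "Timeout after 10s" warnings in the log while the node stays Ready.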

Comment 22 errata-xmlrpc 2018-05-18 03:54:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1576

