1720174 – No pod failover when multiple nodes are NotReady

Summary: No pod failover when multiple nodes are NotReady

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.11.0
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Ryan Phillips
QA Contact:	Weinan Liu
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1722288
Depends On:
Blocks:	1752894 1753995

Reported:	2019-06-13 10:14 UTC by Sergio G.
Modified:	2019-10-26 00:54 UTC
CC List:	22 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1752894 1753995
Environment:
Last Closed:	2019-10-18 01:34:36 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 23779	0	None	closed	[release-3.11] Bug 1720174: upstream: Kubelet status manager sync the status of local pods	2021-01-14 01:19:55 UTC
Red Hat Product Errata	RHBA-2019:3139	0	None	None	None	2019-10-18 01:34:58 UTC

Description Sergio G. 2019-06-13 10:14:45 UTC

Description of problem:
Pods do not fail over when more than 2 nodes fail at the same time, for example when 1 infra node and 2 app nodes are stopped.


Version-Release number of selected component (if applicable):
3.11


How reproducible:
Not reproducible in my lab, but it happens every time at the customer's site.


Steps to Reproduce:
1. Turn off more than 2 nodes (no masters involved)


Actual results:
2. Nodes are marked as NotReady
3. Wait for 5 minutes
4. Pods stay in the Running state. None change to Unknown or NodeLost.
5. Wait 5 more minutes
6. Turn on nodes
7. Pods are terminated and new ones are created


Expected results:
2. Nodes are marked as NotReady
3. Wait for 5 minutes
4. Pods change to Unknown or NodeLost, depending on whether they are part of a deployment or a daemonset. New pods are started to meet the required number of replicas wherever that applies
5. Wait 5 more minutes
6. Turn on nodes
7. Unknown and NodeLost pods are terminated and new ones are created where applicable.
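
A minimal sketch for checking step 4 from the command line instead of by eye. It assumes the STATUS column shown by `oc get pods` follows `status.phase` unless `status.reason` or `metadata.deletionTimestamp` is set; the helper names `oc_json`, `ready_status` and `stale_running_pods` are made up for this illustration and are not part of any tool.

```python
import json
import subprocess


def oc_json(*args):
    """Run an `oc` command with `-o json` and return the parsed output."""
    out = subprocess.run(["oc", *args, "-o", "json"], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)


def ready_status(node):
    """Return the status of the node's Ready condition: "True", "False" or "Unknown"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown")
    return "Unknown"  # no Ready condition reported at all


def stale_running_pods(nodes, pods):
    """List (namespace, pod, node) for pods still shown as Running on NotReady nodes.

    Assumption: a pod counts as "still Running" when its phase is Running and
    neither status.reason (e.g. NodeLost) nor a deletion timestamp is set.
    """
    not_ready = {n["metadata"]["name"] for n in nodes.get("items", [])
                 if ready_status(n) != "True"}
    stale = []
    for pod in pods.get("items", []):
        node_name = pod.get("spec", {}).get("nodeName")
        status = pod.get("status", {})
        if (node_name in not_ready
                and status.get("phase") == "Running"
                and not status.get("reason")
                and not pod["metadata"].get("deletionTimestamp")):
            stale.append((pod["metadata"]["namespace"], pod["metadata"]["name"], node_name))
    return sorted(stale)
```

Run about 5 minutes after the nodes go NotReady, `stale_running_pods(oc_json("get", "nodes"), oc_json("get", "pods", "--all-namespaces"))` should return an empty list in the expected case and a non-empty list in the failing case described above.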



Additional info:
- When a single node is turned off, the result is as expected. The problem only occurs when more than 2 nodes are turned off.
- Note that no master has been turned off during the test.
- See attached file with master-api and master-controllers logs during the test.
- This is a stretched cluster between two datacenters:
DataCenter-1
  dfrsijaspcpm1.example.net  10.240.153.11  (master)
  dfrsijaspcpin1.example.net 10.240.153.13  (infra)
  dfrvijaspcplb1.example.net 10.241.232.92  (loadbalancer: haproxy)
  dfrsijaspcpcn1.example.net 10.240.153.17  (compute)
  dfrsijaspcpcn3.example.net 10.240.153.19  (compute)

DataCenter-2
  dfhsijaspcpm2.example.net  10.240.153.12  (master)
  dfhsijaspcpin2.example.net 10.240.153.14  (infra)
  dfhvijaspcplb2.example.net 10.241.232.93  (loadbalancer: haproxy)
  dfhsijaspcpcn2.example.net 10.240.153.18  (compute)
  dfhsijaspcpcn4.example.net 10.240.153.20  (compute)

DataCenter-3
  dfrsijaspcpm3.example.net  10.241.92.10  (master)
The underlying network is a single VLAN containing two of the masters, all infra nodes and all compute nodes. The exceptions are the third master and both load balancers, which are located in other VLANs.

The networks are low-latency (<1 ms), with 10 Gbit connections over multipathing. Here are the ping values between the masters:

root@dfrsijaspcpm1:~# ping -c 3 dfhsijaspcpm2
PING dfhsijaspcpm2.example.net (10.240.153.12) 56(84) bytes of data.
64 bytes from dfhsijaspcpm2.example.net (10.240.153.12): icmp_seq=1 ttl=64 time=0.442 ms
64 bytes from dfhsijaspcpm2.example.net (10.240.153.12): icmp_seq=2 ttl=64 time=0.429 ms
64 bytes from dfhsijaspcpm2.example.net (10.240.153.12): icmp_seq=3 ttl=64 time=0.462 ms

root@dfrsijaspcpm1:~# ping -c 3 dfrsijaspcpm3
PING dfrsijaspcpm3.example.net (10.241.92.10) 56(84) bytes of data.
64 bytes from dfrsijaspcpm3.example.net (10.241.92.10): icmp_seq=1 ttl=60 time=0.588 ms
64 bytes from dfrsijaspcpm3.example.net (10.241.92.10): icmp_seq=2 ttl=60 time=0.598 ms
64 bytes from dfrsijaspcpm3.example.net (10.241.92.10): icmp_seq=3 ttl=60 time=0.614 ms

Comment 2 Sergio G. 2019-06-13 10:22:03 UTC

I tend to think this is related to the cluster being stretched across datacenters, but I can't find a reason why: latency is very low, and no masters were turned off during the test, so etcd and master-api are fine.

If you need anything else, please let me know and I'll get it from the customer.

Comment 15 Seth Jennings 2019-07-03 14:13:03 UTC

*** Bug 1722288 has been marked as a duplicate of this bug. ***

Comment 20 Ryan Phillips 2019-07-03 15:59:54 UTC

While going through the logs, I saw that the new pods failed to be scheduled. It's a slightly different issue, but if you could post all the events for the cluster (for all namespaces), that would help.
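
A minimal sketch of how the requested events could be narrowed down to the scheduling failures, assuming they were saved with `oc get events --all-namespaces -o json`. The helper names `load_events` and `failed_scheduling` are hypothetical and only serve this illustration.

```python
import json


def load_events(path):
    """Load a file saved from `oc get events --all-namespaces -o json`."""
    with open(path) as fh:
        return json.load(fh)


def failed_scheduling(events):
    """Return (namespace, pod, message) for each FailedScheduling event on a Pod."""
    rows = []
    for ev in events.get("items", []):
        obj = ev.get("involvedObject", {})
        if ev.get("reason") == "FailedScheduling" and obj.get("kind") == "Pod":
            rows.append((obj.get("namespace"), obj.get("name"), ev.get("message", "")))
    return rows
```

For example, `failed_scheduling(load_events("events.json"))` lists the pods the scheduler could not place together with the scheduler's message; `events.json` is a placeholder file name.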

Comment 64 Sergio G. 2019-08-21 08:14:41 UTC

For what it's worth, the original case that led to this Bugzilla is no longer affected. The customer replaced the bare-metal servers hosting the masters with virtual machines meeting the same hardware requirements, and the issue is gone.

It may still be related to networking if the bare-metal servers are connected differently than the virtual machines.

Comment 85 errata-xmlrpc 2019-10-18 01:34:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3139
