Bug 1880365

Summary: kube-proxy health check probe failing and pods are seen in CrashLoopBackOff state
Product: OpenShift Container Platform
Reporter: Rutvik <rkshirsa>
Component: Networking
Assignee: Andrew Stoycos <astoycos>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: aconstan, astoycos, jdesousa, klamb, mrooks, pooriya, rkshirsa, vwalek
Version: 4.3.z
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-15 13:15:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rutvik 2020-09-18 11:07:58 UTC
Description of problem:

One of our IBM CloudPack customers seems to be hitting the same issue that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1820778:

-<>-
Description: kube-proxy was working completely fine, but after the
upgrade from CE 2.6.2 to CE 3.2 it keeps restarting. It sounds like
kube-proxy keeps dying, complaining that the liveness/readiness probe is failing.
-<>-
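
For reference, the liveness/readiness probes normally just query the healthz endpoint that kube-proxy serves. Below is a minimal sketch of that check, assuming the default --healthz-bind-address port 10256 (the actual probe in this deployment may use a different port or path):

~~~~
// Minimal sketch: query kube-proxy's healthz endpoint from the node to see
// roughly what the kubelet's liveness/readiness probe sees when it fails.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}

	// Default kube-proxy healthz address; adjust if --healthz-bind-address differs.
	resp, err := client.Get("http://127.0.0.1:10256/healthz")
	if err != nil {
		// Connection refused or timeout is what shows up as a failed probe.
		fmt.Println("probe failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status=%d body=%s\n", resp.StatusCode, body)
}
~~~~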


Version-Release number of selected component (if applicable):

OCP 4.3.29 (Calico)


Actual results:
openshift-kube-proxy pods enter CrashLoopBackOff.

Expected results:
openshift-kube-proxy pods run normally and do not fail their health checks.

Additional info:

Below are the warnings collected from the pods, which is why I think we might be hitting the same BZ again.

~~~~
2020-09-01T10:31:32.37026376Z W0901 10:31:32.369966       1 proxier.go:584] Failed to read file /lib/modules/4.18.0-147.20.1.el8_1.x86_64/modules.builtin with error open /lib/modules/4.18.0-147.20.1.el8_1.x86_64/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules

2020-09-01T10:31:32.584849247Z W0901 10:31:32.584805       1 proxier.go:597] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-09-01T10:31:32.587739958Z W0901 10:31:32.587730       1 proxier.go:597] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
~~~~
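
For context, these warnings come from the proxier's kernel-module check, which reads modules.builtin for the running kernel and falls back to modprobe, so they are expected when /lib/modules is not mounted into the pod. A rough sketch of an equivalent check run from the node itself (assuming /proc/modules and /lib/modules are readable there):

~~~~
// Rough sketch: check whether ip_vs and nf_conntrack_ipv4 are loaded or built in,
// mirroring the lookup the kube-proxy warnings above refer to.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func haveModule(name string) bool {
	// Loaded modules are listed in /proc/modules.
	if data, err := os.ReadFile("/proc/modules"); err == nil &&
		strings.Contains(string(data), name) {
		return true
	}
	// Built-in modules are listed in modules.builtin for the running kernel;
	// this is the file the warning reports as missing inside the pod.
	kernel, _ := exec.Command("uname", "-r").Output()
	builtin := "/lib/modules/" + strings.TrimSpace(string(kernel)) + "/modules.builtin"
	if data, err := os.ReadFile(builtin); err == nil &&
		strings.Contains(string(data), name) {
		return true
	}
	return false
}

func main() {
	for _, m := range []string{"ip_vs", "nf_conntrack_ipv4"} {
		fmt.Printf("%s present: %v\n", m, haveModule(m))
	}
}
~~~~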


I've also come across the following error in most of the kube-proxy pods. For this, I think we should also consider checking BZ https://bugzilla.redhat.com/show_bug.cgi?id=1843646.
~~~~
2020-09-17T09:09:48.91469971Z E0917 09:09:48.914578       1 proxier.go:1449] Failed to execute iptables-restore: exit status 4 (iptables-restore v1.8.4 (nf_tables):
2020-09-17T09:09:48.91469971Z line 27016: CHAIN_USER_DEL failed (Device or resource busy): chain KUBE-SEP-4VGFJQBLLXV6WZYJ
~~~~
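
Under nf_tables, iptables-restore exit status 4 indicates a transient resource problem (here the "Device or resource busy" chain deletion), so the usual mitigation is to retry the restore rather than treat it as fatal. A hypothetical sketch of that kind of retry, not taken from kube-proxy itself (the ruleset and retry count are placeholders):

~~~~
// Hypothetical sketch: retry iptables-restore when it exits with status 4,
// the transient "resource busy" case shown in the log above.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"time"
)

func restoreWithRetry(rules []byte, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		cmd := exec.Command("iptables-restore", "--noflush")
		cmd.Stdin = bytes.NewReader(rules)
		out, err := cmd.CombinedOutput()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("iptables-restore: %v: %s", err, out)
		// Only exit status 4 is treated as transient; anything else is fatal.
		if ee, ok := err.(*exec.ExitError); !ok || ee.ExitCode() != 4 {
			return lastErr
		}
		time.Sleep(time.Second)
	}
	return lastErr
}

func main() {
	rules := []byte("*filter\nCOMMIT\n") // trivial placeholder ruleset
	if err := restoreWithRetry(rules, 3); err != nil {
		fmt.Println(err)
	}
}
~~~~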

Comment 2 Mark Rooks 2020-09-23 07:01:36 UTC
Duplicate of 1880680?

Comment 3 Pooriya Aghaalitari 2020-09-25 16:17:29 UTC
What is the timeline for a fix for this bug, please? Thank you.

Comment 6 Andrew Stoycos 2020-09-28 21:15:58 UTC
Still investigating; there is no timeline for a fix at the moment. See https://bugzilla.redhat.com/show_bug.cgi?id=1880680 as well.

Comment 7 Juan Luis de Sousa-Valadas 2020-10-02 10:55:34 UTC
Can you please provide kube-proxy logs at log level 5 or 6?

Comment 8 Andrew Stoycos 2020-10-15 13:15:49 UTC
This was a duplicate of the issue tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1880680; see the iptables fix there.

*** This bug has been marked as a duplicate of bug 1880680 ***