Bug 1880365

Summary: kube-proxy health check probe failing and pods are seen in CrashLoopBackOff state
Product: OpenShift Container Platform
Reporter: Rutvik <rkshirsa>
Component: Networking
Assignee: Andrew Stoycos <astoycos>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: aconstan, astoycos, jdesousa, klamb, mrooks, pooriya, rkshirsa, vwalek
Version: 4.3.z
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-15 13:15:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rutvik 2020-09-18 11:07:58 UTC
Description of problem:

One of our IBM CloudPack customers seems to be hitting the same issue that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1820778:

-<>-
Description: kube-proxy was working completely fine, but after the
upgrade from CE 2.6.2 to CE 3.2 it keeps restarting. It sounds like
kube-proxy keeps dying, complaining that the liveness/readiness probe is failing.
-<>-
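
For reference, the liveness/readiness probes normally just query the healthz endpoint that kube-proxy serves. Below is a minimal sketch of that check, assuming the default --healthz-bind-address port 10256 (the actual probe in this deployment may use a different port or path):

~~~~
// Minimal sketch: query kube-proxy's healthz endpoint from the node to see
// roughly what the kubelet's liveness/readiness probe sees when it fails.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}

	// Default kube-proxy healthz address; adjust if --healthz-bind-address differs.
	resp, err := client.Get("http://127.0.0.1:10256/healthz")
	if err != nil {
		// Connection refused or timeout is what shows up as a failed probe.
		fmt.Println("probe failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status=%d body=%s\n", resp.StatusCode, body)
}
~~~~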


Version-Release number of selected component (if applicable):

OCP 4.3.29 (Calico)


Actual results:
openshift-kube-proxy pods enter CrashLoopBackOff.

Expected results:
openshift-kube-proxy pods run normally and do not fail their health checks.

Additional info:

Below are the warnings collected from the pods, which is why I think we might be hitting the same BZ again.

~~~~
2020-09-01T10:31:32.37026376Z W0901 10:31:32.369966       1 proxier.go:584] Failed to read file /lib/modules/4.18.0-147.20.1.el8_1.x86_64/modules.builtin with error open /lib/modules/4.18.0-147.20.1.el8_1.x86_64/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules

2020-09-01T10:31:32.584849247Z W0901 10:31:32.584805       1 proxier.go:597] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-09-01T10:31:32.587739958Z W0901 10:31:32.587730       1 proxier.go:597] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
~~~~
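
For context, these warnings come from the proxier's kernel-module check, which reads modules.builtin for the running kernel and falls back to modprobe, so they are expected when /lib/modules is not mounted into the pod. A rough sketch of an equivalent check run from the node itself (assuming /proc/modules and /lib/modules are readable there):

~~~~
// Rough sketch: check whether ip_vs and nf_conntrack_ipv4 are loaded or built in,
// mirroring the lookup the kube-proxy warnings above refer to.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func haveModule(name string) bool {
	// Loaded modules are listed in /proc/modules.
	if data, err := os.ReadFile("/proc/modules"); err == nil &&
		strings.Contains(string(data), name) {
		return true
	}
	// Built-in modules are listed in modules.builtin for the running kernel;
	// this is the file the warning reports as missing inside the pod.
	kernel, _ := exec.Command("uname", "-r").Output()
	builtin := "/lib/modules/" + strings.TrimSpace(string(kernel)) + "/modules.builtin"
	if data, err := os.ReadFile(builtin); err == nil &&
		strings.Contains(string(data), name) {
		return true
	}
	return false
}

func main() {
	for _, m := range []string{"ip_vs", "nf_conntrack_ipv4"} {
		fmt.Printf("%s present: %v\n", m, haveModule(m))
	}
}
~~~~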


I've also come across the following error in most of the kube-proxy pods. For this, I think we should also consider checking BZ https://bugzilla.redhat.com/show_bug.cgi?id=1843646.
~~~~
2020-09-17T09:09:48.91469971Z E0917 09:09:48.914578       1 proxier.go:1449] Failed to execute iptables-restore: exit status 4 (iptables-restore v1.8.4 (nf_tables):
2020-09-17T09:09:48.91469971Z line 27016: CHAIN_USER_DEL failed (Device or resource busy): chain KUBE-SEP-4VGFJQBLLXV6WZYJ
~~~~
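
Under nf_tables, iptables-restore exit status 4 indicates a transient resource problem (here the "Device or resource busy" chain deletion), so the usual mitigation is to retry the restore rather than treat it as fatal. A hypothetical sketch of that kind of retry, not taken from kube-proxy itself (the ruleset and retry count are placeholders):

~~~~
// Hypothetical sketch: retry iptables-restore when it exits with status 4,
// the transient "resource busy" case shown in the log above.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"time"
)

func restoreWithRetry(rules []byte, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		cmd := exec.Command("iptables-restore", "--noflush")
		cmd.Stdin = bytes.NewReader(rules)
		out, err := cmd.CombinedOutput()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("iptables-restore: %v: %s", err, out)
		// Only exit status 4 is treated as transient; anything else is fatal.
		if ee, ok := err.(*exec.ExitError); !ok || ee.ExitCode() != 4 {
			return lastErr
		}
		time.Sleep(time.Second)
	}
	return lastErr
}

func main() {
	rules := []byte("*filter\nCOMMIT\n") // trivial placeholder ruleset
	if err := restoreWithRetry(rules, 3); err != nil {
		fmt.Println(err)
	}
}
~~~~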

Comment 2 Mark Rooks 2020-09-23 07:01:36 UTC
Duplicate of 1880680?

Comment 3 Pooriya Aghaalitari 2020-09-25 16:17:29 UTC
What is the timeline for a fix for this bug, please? Thank you.

Comment 6 Andrew Stoycos 2020-09-28 21:15:58 UTC
Still investigating; there is no timeline for a fix at the moment. See https://bugzilla.redhat.com/show_bug.cgi?id=1880680 as well.

Comment 7 Juan Luis de Sousa-Valadas 2020-10-02 10:55:34 UTC
Can you please provide kube-proxy logs at log level 5 or 6?

Comment 8 Andrew Stoycos 2020-10-15 13:15:49 UTC
This was a duplicate of the issue tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1880680; see the iptables fix there.

*** This bug has been marked as a duplicate of bug 1880680 ***