Bug 1995201

Summary: /proc/<pid>/net and /proc/net are being relabeled with the wrong SELinux context, causing pod probe failures
Product: OpenShift Container Platform Reporter: Mario Vázquez <mavazque>
Component: Node Assignee: Sascha Grunert <sgrunert>
Node sub component: CRI-O QA Contact: pmali
Status: CLOSED WORKSFORME Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, cback, djuran, eparis, gwest, jokerman, josorior, mmasters, pehunt, sgrunert
Version: 4.6   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-11 10:22:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mario Vázquez 2021-08-18 15:57:40 UTC
Description of problem:

We are running our applications on OpenShift 4.6.32, and one of our containers is having issues running its liveness and readiness probes.

The probes fail when running the following command:

netstat -ap | grep rsyslog | grep '/log/log'

These probes work on all of our nodes except two (master01 and worker06). We investigated, and our findings are below.

In the pod events we can see the following log:

Warning  Unhealthy       2m1s (x32 over 11m)  kubelet            Readiness probe failed: netstat: /proc/net/tcp: Permission denied
netstat: /proc/net/tcp6: Permission denied
netstat: /proc/net/udp: Permission denied
netstat: /proc/net/udp6: Permission denied
netstat: /proc/net/raw: Permission denied
netstat: /proc/net/raw6: Permission denied
netstat: /proc/net/unix: Permission denied
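
To confirm from inside the affected container which of these files are actually unreadable, independently of netstat, something like the following could be used. This is only a minimal sketch: the function name `denied_paths` and the file list are illustrative assumptions taken from the error output above, not part of any OpenShift or CRI-O tooling.

```python
import os

# File names taken from the netstat errors in the probe output above.
PROC_NET_FILES = ["tcp", "tcp6", "udp", "udp6", "raw", "raw6", "unix"]


def denied_paths(base="/proc/net", names=PROC_NET_FILES):
    """Return the paths under `base` that cannot be opened due to a permission error."""
    denied = []
    for name in names:
        path = os.path.join(base, name)
        try:
            with open(path, "rb") as fh:
                fh.read(1)  # force an actual read, not just an open
        except PermissionError:
            denied.append(path)
        except FileNotFoundError:
            # A missing file (for example, IPv6 disabled) is a different
            # condition from the denial described in this report.
            continue
    return denied


if __name__ == "__main__":
    for path in denied_paths():
        print("permission denied:", path)
```

An empty result on a healthy node and a full list on master01 would match the behaviour described here.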

If we connect to the node running this container, we can see that SELinux is blocking the probe (the log below was found in /var/log/audit/audit.log):

type=AVC msg=audit(1629200144.726:765654): avc:  denied  { getattr } for  pid=3581173 comm="netstat" path="/proc/133/net/tcp" dev="proc" ino=4026545509 scontext=system_u:system_r:container_t:s0:c558,c946 tcontext=system_u:object_r:devtty_t:s0 tclass=file permissive=1

As you can see, the file /proc/133/net/tcp is labeled with an invalid type, "devtty_t", where "proc_net_t" is expected.
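
For scanning a larger audit log for the same symptom, the AVC record above can be split into its fields and the target type compared against the expected one. The following is a rough sketch, not an official audit parser: the helper names (`parse_avc`, `selinux_type`, `has_unexpected_label`) are hypothetical, and the expected type "proc_net_t" is taken from this report.

```python
import re

# Sample record copied from the audit log excerpt in this report.
AVC_LINE = (
    'type=AVC msg=audit(1629200144.726:765654): avc:  denied  { getattr } for  '
    'pid=3581173 comm="netstat" path="/proc/133/net/tcp" dev="proc" '
    'ino=4026545509 scontext=system_u:system_r:container_t:s0:c558,c946 '
    'tcontext=system_u:object_r:devtty_t:s0 tclass=file permissive=1'
)


def parse_avc(line):
    """Return the key=value fields of an AVC record plus the denied permissions."""
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    fields = {key: value.strip('"') for key, value in pairs}
    perms = re.search(r"\{([^}]*)\}", line)
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


def selinux_type(context):
    """Extract the type (third colon-separated field) from a context string."""
    return context.split(":")[2]


def has_unexpected_label(line, expected_type="proc_net_t"):
    """True if the record targets a /proc path whose type is not the expected one."""
    rec = parse_avc(line)
    return rec["path"].startswith("/proc/") and selinux_type(rec["tcontext"]) != expected_type


if __name__ == "__main__":
    rec = parse_avc(AVC_LINE)
    print(rec["comm"], rec["perms"], rec["path"], selinux_type(rec["tcontext"]))
    print("unexpected label:", has_unexpected_label(AVC_LINE))
```

The sketch assumes the path field is a plain quoted string, as it is in the record shown here.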

After rebooting the node, the file got its valid context back and the probes started working on that node.

We still have one node (master01) hitting this issue. We would like to know what caused these folders to be labeled with an invalid SELinux context.
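
To compare the labels on a healthy node with those on master01, the contexts can be read directly from the files. The sketch below assumes a Linux host where the label is exposed through the "security.selinux" extended attribute; the helper names (`read_label`, `find_mislabeled`) are hypothetical and the expected type "proc_net_t" comes from this report.

```python
import os


def read_label(path):
    """Return the SELinux context of `path` as a string, or None if unavailable."""
    getxattr = getattr(os, "getxattr", None)  # only present on Linux
    if getxattr is None:
        return None
    try:
        raw = getxattr(path, "security.selinux")
    except OSError:
        # Missing file, no SELinux support, or access denied.
        return None
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def find_mislabeled(paths, expected_type="proc_net_t", reader=read_label):
    """Map each path whose context type differs from `expected_type` to its label."""
    bad = {}
    for path in paths:
        label = reader(path)
        if label is None:
            continue  # label could not be read; nothing to compare
        parts = label.split(":")
        if len(parts) < 3 or parts[2] != expected_type:
            bad[path] = label
    return bad


if __name__ == "__main__":
    candidates = ["/proc/net/tcp", "/proc/self/net/tcp"]
    for path, label in find_mislabeled(candidates).items():
        print(path, label)
```

Run on the node itself, an empty result would indicate the expected labels; any entry printed shows the path together with the context it currently carries.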


Version-Release number of selected component (if applicable):
4.6.32

How reproducible:
Only reproducible on one node. We don't know the root cause, so we don't have a reproducer at the moment.

Steps to Reproduce:
1. Schedule a pod with the probes in the description on master01
2. The probes fail and the pod goes into CrashLoopBackOff


Actual results:
Folders being labeled with an invalid SELinux context causing probes to fail.

Expected results:
Folders being labeled with a proper SELinux context allowing probes to work.

Additional info:

Comment 2 Miciah Dashiel Butler Masters 2021-08-24 16:12:28 UTC
(In reply to Mario Vázquez from comment #0)
> We are running our applications on OpenShift 4.6.32, and one of our
> containers is having issues running its liveness and readiness probes.

This looks like a general issue with the kubelet's probes, not with the router or DNS.  Re-assigning to Node/Kubelet for investigation.

Comment 10 Red Hat Bugzilla 2023-09-15 01:13:49 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days