Bug 2106668 - Removing a pod on a worker with an applied tuned profile hangs the worker
Summary: Removing a pod on a worker with an applied tuned profile hangs the worker
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jiří Mencák
QA Contact: liqcui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-13 09:24 UTC by Jose Castillo Lema
Modified: 2022-07-13 10:26 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-13 10:26:18 UTC
Target Upstream Version:
Embargoed:


Description Jose Castillo Lema 2022-07-13 09:24:43 UTC
Description of problem:
Removing a pod with open listening sockets on a worker with an applied tuned profile hangs the worker

Version-Release number of selected component (if applicable):
OCP: 4.11.0-0.nightly-2022-06-25-081133

How reproducible:
100%

Steps to Reproduce:
1. Create a tuned profile:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-ingress-performance
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Ingress performance profile
      include=openshift
      [sysctl]
      net.ipv4.ipfrag_time=0
      net.ipv4.ipfrag_high_thresh=33554432
      net.ipv4.ipfrag_low_thresh=9437184
    name: openshift-ingress-performance
  recommend:
  - match:
    - label: node-role.kubernetes.io/worker
    priority: 10
    profile: openshift-ingress-performance
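
Apply the manifest and check that the operator generates a profile for the worker (a sketch; the file name tuned-ingress.yaml is an assumption):
$ oc create -f tuned-ingress.yaml
$ oc get profiles -n openshift-cluster-node-tuning-operator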

2. Create an iperf server pod:
apiVersion: v1
kind: Pod
metadata:
  name: iperf-server
spec:
  nodeName: worker000-r640
  hostNetwork: true
  containers:
  - name: server
    image: quay.io/sronanrh/iperf
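
Create the pod and wait for it to reach Running (a sketch; the manifest above is assumed saved as iperf-server.yaml):
$ oc create -f iperf-server.yaml
$ oc get pod iperf-server -o wide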

3. Start the iperf server:
$ oc exec -it iperf-server -- iperf3 -s

4. Remove the pod
$ oc delete po iperf-server

Actual results:
The worker where the iperf pod was running hangs in NotReady state:
$ oc get node
NAME             STATUS     ROLES    AGE   VERSION
master-0         Ready      master   15d   v1.24.0+284d62a
master-1         Ready      master   15d   v1.24.0+284d62a
master-2         Ready      master   15d   v1.24.0+284d62a
worker000-r640   NotReady   worker   15d   v1.24.0+284d62a   <===
worker001-r640   Ready      worker   15d   v1.24.0+284d62a
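
The node's conditions can be inspected for the reported reason (a possible follow-up, not part of the original report):
$ oc describe node worker000-r640 | grep -A 10 Conditions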

Expected results:
No worker enters NotReady state

Comment 1 Jose Castillo Lema 2022-07-13 10:26:06 UTC
It looks like net.ipv4.ipfrag_time=0 was the culprit. With net.ipv4.ipfrag_time=1 we cannot reproduce this behaviour. Closing the BZ.
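
For context, net.ipv4.ipfrag_time is the time in seconds the kernel keeps an IP fragment in memory, so a value of 0 discards fragments more or less immediately. Once the node is reachable again, the applied value can be verified with (a sketch):
$ oc debug node/worker000-r640 -- chroot /host sysctl net.ipv4.ipfrag_time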

