2057378 – Pods in Complete state are not removed

Summary: Pods in Complete state are not removed

Keywords:
Status:	CLOSED DUPLICATE of bug 2050912
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-controller-manager
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	All
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Maciej Szulik
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-23 10:03 UTC by Joel Rosental R.
Modified:	2022-03-04 15:57 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-04 15:57:53 UTC
Target Upstream Version:
Embargoed:

Description Joel Rosental R. 2022-02-23 10:03:14 UTC

Description of problem:

After upgrading from 4.8 to 4.9.11, pods in "Completed" state are no longer deleted; e.g., cronjobs seem to ignore the successfulJobsHistoryLimit value.

The KCM (kube-controller-manager) logs contain several occurrences of these lines:

~~~
2022-01-10T09:00:08.627571256Z E0110 09:00:08.627527       1 shared_informer.go:243] unable to sync caches for garbage collector
2022-01-10T09:00:08.627571256Z E0110 09:00:08.627541       1 garbagecollector.go:242] timed out waiting for dependency graph builder sync during GC sync (attempt 5559)
~~~

No webhooks seem to be blocking the GC from running.
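
For triage, the attempt counter in the GC sync timeout message gives a rough idea of how long the garbage collector has been retrying. A minimal Python sketch that pulls the counter out of KCM log text; the function name `gc_sync_attempts` and the regular expression are illustrative assumptions, matched only against the log format quoted above:

```python
import re
from typing import List

# Pattern for the garbagecollector.go timeout message quoted in this report.
# Assumption: the message format is exactly as shown in the log excerpt.
GC_SYNC_TIMEOUT = re.compile(
    r"timed out waiting for dependency graph builder sync "
    r"during GC sync \(attempt (\d+)\)"
)


def gc_sync_attempts(log_text: str) -> List[int]:
    """Return every GC sync attempt number found in the given log text."""
    return [int(m.group(1)) for m in GC_SYNC_TIMEOUT.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "2022-01-10T09:00:08.627571256Z E0110 09:00:08.627527       1 "
        "shared_informer.go:243] unable to sync caches for garbage collector\n"
        "2022-01-10T09:00:08.627571256Z E0110 09:00:08.627541       1 "
        "garbagecollector.go:242] timed out waiting for dependency graph "
        "builder sync during GC sync (attempt 5559)\n"
    )
    print(gc_sync_attempts(sample))
```

Run over the two lines quoted above, only the second line matches, giving a single attempt value of 5559.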

Version-Release number of selected component (if applicable):
OCP 4.9.11

How reproducible:
Always (in customer env)

Steps to Reproduce:
1. Create any object that creates pods ending in "Completed" state, e.g. a cronjob, and set its "successfulJobsHistoryLimit" parameter.


Actual results:

Pods in "Completed" status are never removed, e.g.:

~~~
# oc get cronjob cronjob-ldap-group-sync -o yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  creationTimestamp: "2021-08-02T12:09:55Z"
  generation: 1
  labels:
    template: cronjob-ldap-group-sync-secure
    template.openshift.io/template-instance-owner: c09af01c-4c10-11ea-810b-0a580a80002e
  name: cronjob-ldap-group-sync
  namespace: oe930-cron
  resourceVersion: "476752196"
  uid: 7c11ecf3-86f3-4a3f-bccc-47a37a2f9764
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 0
      template:
        metadata:
          creationTimestamp: null
        spec:
          activeDeadlineSeconds: 500
          containers:
          - command:
            - /bin/bash
            - -c
            - oc adm groups sync --confirm --sync-config=/ldap-sync/config/ldap-group-sync.yaml
              $([ -s /ldap-sync/config/whitelist.txt ] && echo --whitelist=/ldap-sync/config/whitelist.txt)
            image: registry.redhat.io/openshift4/ose-cli:latest
            imagePullPolicy: IfNotPresent
            name: cronjob-ldap-group-sync
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /ldap-sync/config
              name: ldap-sync-config
            - mountPath: /ldap-sync/ca
              name: ldap-sync-ca
            - mountPath: /ldap-sync/secrets
              name: ldap-bind-password
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: ldap-group-syncer
          serviceAccountName: ldap-group-syncer
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              name: ldap-group-sync
            name: ldap-sync-config
          - configMap:
              defaultMode: 420
              name: ldap-group-sync-ca
            name: ldap-sync-ca
          - name: ldap-bind-password
            secret:
              defaultMode: 420
              secretName: ldap-bind-password
  schedule: '@hourly'
  successfulJobsHistoryLimit: 0
  suspend: false
status:
  lastScheduleTime: "2022-01-11T09:00:00Z"
  lastSuccessfulTime: "2022-01-10T15:02:26Z"
# oc get pods
NAME                                        READY   STATUS      RESTARTS   AGE
cronjob-ldap-group-sync-27363900--1-gns4s   0/1     Completed   0          16h
cronjob-ldap-group-sync-27363960--1-mcrpq   0/1     Completed   0          15h
cronjob-ldap-group-sync-27364020--1-ckprd   0/1     Completed   0          14h
cronjob-ldap-group-sync-27364080--1-mt4sn   0/1     Completed   0          13h
cronjob-ldap-group-sync-27364140--1-q29wf   0/1     Completed   0          12h
cronjob-ldap-group-sync-27364200--1-n6hcg   0/1     Completed   0          11h
cronjob-ldap-group-sync-27364260--1-4bqm4   0/1     Completed   0          10h
cronjob-ldap-group-sync-27364320--1-kbv8b   0/1     Completed   0          9h
cronjob-ldap-group-sync-27364380--1-gwd9b   0/1     Completed   0          8h
cronjob-ldap-group-sync-27364440--1-n4jrp   0/1     Completed   0          7h15m
cronjob-ldap-group-sync-27364500--1-v458d   0/1     Completed   0          6h15m
cronjob-ldap-group-sync-27364560--1-cp29v   0/1     Completed   0          5h15m
cronjob-ldap-group-sync-27364620--1-6cnxn   0/1     Completed   0          4h15m
cronjob-ldap-group-sync-27364680--1-h5f4n   0/1     Completed   0          3h15m
cronjob-ldap-group-sync-27364740--1-zwmsn   0/1     Completed   0          135m
cronjob-ldap-group-sync-27364800--1-2zkvf   0/1     Completed   0          75m
cronjob-ldap-group-sync-27364860--1-4fnv2   0/1     Completed   0          15m
~~~
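
A rough way to check the symptom from `oc get pods` output, sketched in Python. The helper names are hypothetical, the parser assumes the default column layout shown above (NAME, READY, STATUS, ...), and it assumes one pod per job, so the number of Completed pods can be compared directly with successfulJobsHistoryLimit:

```python
from typing import List


def completed_pods(oc_get_pods_output: str, cronjob_name: str) -> List[str]:
    """Names of Completed pods whose name starts with '<cronjob_name>-'."""
    names = []
    for line in oc_get_pods_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows without a STATUS column.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if name.startswith(cronjob_name + "-") and status == "Completed":
            names.append(name)
    return names


def exceeds_history_limit(oc_get_pods_output: str, cronjob_name: str,
                          limit: int) -> bool:
    """True if more Completed pods remain than the history limit allows.

    Assumption: one pod per job, so pods stand in for retained jobs.
    """
    return len(completed_pods(oc_get_pods_output, cronjob_name)) > limit


if __name__ == "__main__":
    sample = """\
NAME                                        READY   STATUS      RESTARTS   AGE
cronjob-ldap-group-sync-27364740--1-zwmsn   0/1     Completed   0          135m
cronjob-ldap-group-sync-27364800--1-2zkvf   0/1     Completed   0          75m
cronjob-ldap-group-sync-27364860--1-4fnv2   0/1     Completed   0          15m
"""
    print(exceeds_history_limit(sample, "cronjob-ldap-group-sync", 0))
```

With successfulJobsHistoryLimit: 0, as in the cronjob above, any remaining Completed pod for that cronjob counts as exceeding the limit.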

Expected results:

Completed pods should be cleaned up by GC after a while; in particular, the successfulJobsHistoryLimit set on a cronjob should be honoured for its pods.

Additional info:

Comment 6 Maciej Szulik 2022-03-04 15:57:53 UTC


*** This bug has been marked as a duplicate of bug 2050912 ***