Bug 2093046

Summary: must-gather debug pods are missing priority class
Product: OpenShift Container Platform Reporter: Stephen Benjamin <stbenjam>
Component: oc Assignee: Arda Guclu <aguclu>
oc sub component: oc QA Contact: zhou ying <yinzhou>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aguclu, astoycos, jchaloup, mfojtik, nsimha
Version: 4.11   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:49:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Stephen Benjamin 2022-06-02 19:35:08 UTC
`oc` uses "system-cluster-critical" for the must-gather pods it starts[1], but when the scripts in must-gather launch debug pods for log collection[2][3], these are missing a priority class.

We have a test that actually tries to make sure all pods in openshift-* namespaces have a priority class, and if it happens to run at the same time we're testing must-gather, we get test failures[4].

[1] https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L686-L689
[2] https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_core_dumps#L11
[3] https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_multus_logs#L13
[4] https://search.ci.openshift.org/?search=pods+found+with+invalid+priority+class&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
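
For illustration, the kind of check described above can be sketched as a small shell filter. This is not the actual test in openshift/origin; the function name `pods_missing_priority_class` and the three-column input format are assumptions made for this sketch.

```shell
# Hypothetical helper, not part of oc or must-gather.
# Reads "NAMESPACE NAME PRIORITYCLASS" rows on stdin (one pod per line) and
# prints "namespace/name" for every pod in an openshift-* namespace whose
# priority class column is empty or "<none>".
pods_missing_priority_class() {
  awk '$1 ~ /^openshift-/ && ($3 == "" || $3 == "<none>") { print $1 "/" $2 }'
}
```

Against a live cluster, the rows could come from `oc get pods --all-namespaces --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PC:.spec.priorityClassName`, which should print `<none>` for an unset field. A debug pod started by a collection script without a priority class would then show up in the output.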

Comment 4 Stephen Benjamin 2022-09-13 12:14:16 UTC
My workaround wasn't accepted into `oc`. The feedback was that it should be properly addressed by the callers of `oc debug`:

@astoycos, who added the first in openshift/must-gather@e99c730, and @nicklesimba, who added the latter in openshift/must-gather@4c55296: your approach is breaking our tests and is not the expected way to run pods. I see two paths forward:

    revert the changes
    fix them, such that the test linked by @stbenjam in this PR doesn't fail


Comment 6 zhou ying 2022-10-25 02:37:04 UTC
Checked with the latest oc client: the debug pod now has priorityClassName: openshift-user-critical

[root@localhost ~]# oc version --client
Client Version: 4.12.0-0.nightly-2022-10-25-005239
Kustomize Version: v4.5.7

[root@localhost ~]# cat /tmp/poddebug.yaml 
apiVersion: v1
kind: Pod
metadata:
  annotations:
    debug.openshift.io/source-container: container-00
    debug.openshift.io/source-resource: /v1, Resource=nodes/ip-xxx.internal
    openshift.io/scc: privileged
  creationTimestamp: "2022-10-25T02:30:40Z"
  name: ip-xxx-debug
  namespace: openshift-debug-lpdgm
  resourceVersion: "41861"
  uid: 2d4e5544-5497-48e8-8446-ed25942b4224
spec:
  priority: 1000000000
  priorityClassName: openshift-user-critical
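
The same verification can be done without reading the whole dump. The sketch below pulls the priority class out of a saved pod YAML file by plain text matching; it is not a YAML parser, and the function name `pod_priority_class` is an assumption made for this example.

```shell
# Hypothetical helper: prints the value of the first "priorityClassName:"
# key found in the given pod YAML file, or nothing if the key is absent.
pod_priority_class() {
  sed -n 's/^[[:space:]]*priorityClassName:[[:space:]]*//p' "$1" | head -n 1
}
```

For the file above, `pod_priority_class /tmp/poddebug.yaml` would be expected to print the class name. On a live cluster, `oc get pod <name> -n <namespace> -o jsonpath='{.spec.priorityClassName}'` reads the field directly.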

Comment 7 Arda Guclu 2022-10-25 08:03:11 UTC
This is the revert PR, https://github.com/openshift/origin/pull/27469, which removes must-gather from the exception list now that the problem is fixed.

Comment 10 errata-xmlrpc 2023-01-17 19:49:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 11 Red Hat Bugzilla 2023-09-18 04:38:23 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days