Bug 2093046 - must-gather debug pods are missing priority class
Summary: must-gather debug pods are missing priority class
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Arda Guclu
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-02 19:35 UTC by Stephen Benjamin
Modified: 2023-09-18 04:38 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:49:30 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 1263 | 0 | None | open | Bug 2093046: oc debug: Add priorityClassName into node debugging pod template | 2022-10-12 08:59:52 UTC
Github | openshift origin pull 27204 | 0 | None | Merged | Bug 2093046: priority class: add exception for must-gather pending bug fix | 2022-10-12 08:59:53 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:49:50 UTC

Description Stephen Benjamin 2022-06-02 19:35:08 UTC
`oc` uses "system-cluster-critical" for the must-gather pods it starts[1], but when the scripts in must-gather launch debug pods for log collection[2][3], these are missing a priority class.

We have a test that actually tries to make sure all pods in openshift-* namespaces have a priority class, and if it happens to run at the same time we're testing must-gather, we get test failures[4].



[1] https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L686-L689
[2] https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_core_dumps#L11
[3] https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_multus_logs#L13
[4] https://search.ci.openshift.org/?search=pods+found+with+invalid+priority+class&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
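
The kind of check described above can be sketched as follows. This is a minimal illustration only, not the actual test from openshift/origin: the function name, the pod dictionaries, and the example namespace and pod names are all hypothetical, and the sketch only checks that a priority class is present, whereas the real test may also validate its value.

```python
from fnmatch import fnmatch


def pods_missing_priority_class(pods, namespace_pattern="openshift-*"):
    """Return (namespace, name) pairs for pods in matching namespaces
    that have no spec.priorityClassName set.

    `pods` is a list of pod manifests as plain dicts (hypothetical input
    shape, mirroring the Kubernetes Pod JSON layout).
    """
    missing = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        namespace = metadata.get("namespace", "")
        # Only pods in openshift-* namespaces are subject to the check.
        if not fnmatch(namespace, namespace_pattern):
            continue
        # An absent or empty priorityClassName counts as missing.
        if not pod.get("spec", {}).get("priorityClassName"):
            missing.append((namespace, metadata.get("name", "")))
    return missing


# Hypothetical example data: a must-gather pod with a priority class,
# a debug pod without one, and an unrelated pod outside openshift-*.
pods = [
    {
        "metadata": {"namespace": "openshift-must-gather-example", "name": "must-gather"},
        "spec": {"priorityClassName": "system-cluster-critical"},
    },
    {
        "metadata": {"namespace": "openshift-debug-example", "name": "node-debug"},
        "spec": {},
    },
    {
        "metadata": {"namespace": "default", "name": "user-app"},
        "spec": {},
    },
]

print(pods_missing_priority_class(pods))
```

With this example data, only the debug pod is reported, which matches the failure mode described in this bug.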

Comment 4 Stephen Benjamin 2022-09-13 12:14:16 UTC
My workaround was not accepted into `oc`. The feedback was that this should be addressed properly by the callers of `oc debug`:

@astoycos, who added the first in openshift/must-gather@e99c730, and @nicklesimba, who added the latter in openshift/must-gather@4c55296: your approach is breaking our tests and is not the expected way to run pods. I see two paths forward:

    revert the changes
    fix them, such that the test linked by @stbenjam in this PR doesn't fail


https://github.com/openshift/oc/pull/1160#discussion_r889059354

Comment 6 zhou ying 2022-10-25 02:37:04 UTC
Checked with the latest oc client; the debug pod now has priorityClassName: openshift-user-critical.

oc version --client
Client Version: 4.12.0-0.nightly-2022-10-25-005239
Kustomize Version: v4.5.7


[root@localhost ~]# cat /tmp/poddebug.yaml 
apiVersion: v1
kind: Pod
metadata:
  annotations:
    debug.openshift.io/source-container: container-00
    debug.openshift.io/source-resource: /v1, Resource=nodes/ip-xxx.internal
    openshift.io/scc: privileged
  creationTimestamp: "2022-10-25T02:30:40Z"
  name: ip-xxx-debug
  namespace: openshift-debug-lpdgm
  resourceVersion: "41861"
  uid: 2d4e5544-5497-48e8-8446-ed25942b4224
spec:
...
...
  priority: 1000000000
  priorityClassName: openshift-user-critical
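
For illustration, the effect seen in the manifest above (a debug pod carrying a priority class) can be sketched as a small defaulting step on a pod template. This is an assumption-laden sketch, not the code from the `oc` pull request: the function name and the dict-based pod shape are hypothetical, and whether `oc` overrides an existing value or only fills in a missing one is not stated in this report. Only the class name `openshift-user-critical` comes from the output above.

```python
import copy

# Value observed in the debug pod manifest above.
DEBUG_PRIORITY_CLASS = "openshift-user-critical"


def with_priority_class(pod, priority_class=DEBUG_PRIORITY_CLASS):
    """Return a copy of `pod` with spec.priorityClassName filled in
    when it is absent or empty. The input dict is left unchanged.

    Assumption: an existing priority class is kept as-is.
    """
    patched = copy.deepcopy(pod)
    spec = patched.setdefault("spec", {})
    if not spec.get("priorityClassName"):
        spec["priorityClassName"] = priority_class
    return patched


# Hypothetical debug pod template without a priority class.
template = {"metadata": {"name": "node-debug"}, "spec": {}}
patched = with_priority_class(template)
print(patched["spec"]["priorityClassName"])
```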

Comment 7 Arda Guclu 2022-10-25 08:03:11 UTC
The revert PR https://github.com/openshift/origin/pull/27469 removes must-gather from the exception list, because the problem is now fixed.

Comment 10 errata-xmlrpc 2023-01-17 19:49:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 11 Red Hat Bugzilla 2023-09-18 04:38:23 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

