Bug 1820687 - [sig-devex][Feature:ImageEcosystem][Slow] openshift images should be SCL enabled is forbidden: unable to validate against any security context constraint: []
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1817099 1822298
Depends On:
Blocks: 1829327
Reported: 2020-04-03 15:34 UTC by Gabe Montero
Modified: 2020-07-13 17:25 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1829327 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:25:33 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-controller-manager-operator pull 394 0 None closed Bug 1820687: allow patch for updating namespace 2021-01-26 11:37:04 UTC
Github openshift cluster-policy-controller pull 22 0 None closed Bug 1820687: use patch when updating namespace 2021-01-26 11:36:22 UTC
Github openshift origin pull 24828 0 None closed Bug 1820687: NS SCC annotations exist, else forbidden: unable to validate... 2021-01-26 11:37:04 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:25:56 UTC

Description Gabe Montero 2020-04-03 15:34:51 UTC
So these test cases have flaked in a few ways.

This bug covers the edge case where the test namespace does not have its SCC annotations because the SCC allocation controller (namespace_scc_allocation_controller.go:336, running in the cluster-policy-controller pod of openshift-kube-controller-manager) hit update conflicts: several operators try to update a namespace as soon as it is created.

An example error log:

namespace_scc_allocation_controller.go:336] error syncing namespace, it will be retried: Operation cannot be fulfilled on namespaces "e2e-test-s2i-usage-ctzrv": the object has been modified; please apply your changes to the latest version and try again

Note: when many namespaces are created at once, this controller can apparently get backed up enough that a namespace missing its SCC annotations never appears in those error logs. Its queue fills up, the test case fails, and namespace cleanup occurs before the controller has had a chance to update the namespace.

See https://github.com/openshift/origin/pull/24703#issuecomment-608443318

So I'm going to add a wait for namespace SCC annotations to the e2e SetupProject

**May also look at updating cluster-policy-controller to reduce the wait between retries, reduce the conflict window, or just make debugging a bit easier in the future.**

Comment 1 Gabe Montero 2020-04-07 19:12:09 UTC
At Adam's prompting, and after a Slack discussion between Adam, David Eads, and myself, updating cluster-policy-controller seems warranted.

See discussion in slack: https://coreos.slack.com/archives/CB48XQ4KZ/p1586282511367900

Highlights:

- Latest failed image eco test:  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24703/pull-ci-openshift-origin-master-e2e-gcp-image-ecosystem/200
- look at the "openshift images should be SCL enabled" failures
- And the additional debug in PR 24703
- David suggests a Patch instead of Update should be used in namespace_scc_allocation_controller.go
- David is also unhappy that there were no metrics for cluster-policy-controller

https://ftygpjfx-promecieus.svc.ci.openshift.org/graph?g0.range_input=72m&g0.end_input=2020-04-02%2021%3A00&g0.expr=workqueue_retries_total&g0.tab=0
https://ftygpjfx-promecieus.svc.ci.openshift.org/graph?g0.range_input=72m&g0.end_input=2020-04-02%2021%3A00&g0.expr=workqueue_retries_total%7Bnamespace%3D%22openshift-kube-controller-manager%22%2Cname%3D%22namespace%22%7D&g0.tab=0

For now, the test case workaround associated with this bug (waiting for the namespace SCC annotations) stays in place.
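
To illustrate why a Patch can avoid the conflict that an Update hits, here is a toy model of optimistic concurrency. It is a sketch under stated assumptions, not the controller's code: `store`, `namespace`, and the `example.com/other-operator` key are hypothetical, and the in-memory `patch` stands in for a merge patch that does not carry a resource version.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// namespace is a toy model of a stored object with a resource version.
type namespace struct {
	resourceVersion int
	annotations     map[string]string
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct{ ns namespace }

// copyNamespace returns a deep copy so callers never share the map.
func copyNamespace(n namespace) namespace {
	c := namespace{resourceVersion: n.resourceVersion, annotations: map[string]string{}}
	for k, v := range n.annotations {
		c.annotations[k] = v
	}
	return c
}

// get returns a snapshot, as a client would receive from a read.
func (s *store) get() namespace { return copyNamespace(s.ns) }

// update replaces the whole object and fails if the caller's copy is stale.
func (s *store) update(n namespace) error {
	if n.resourceVersion != s.ns.resourceVersion {
		return errConflict
	}
	next := copyNamespace(n)
	next.resourceVersion++
	s.ns = next
	return nil
}

// patch merges only the given annotations into the current object, so it
// does not depend on the version the caller last read.
func (s *store) patch(annotations map[string]string) {
	for k, v := range annotations {
		s.ns.annotations[k] = v
	}
	s.ns.resourceVersion++
}

func main() {
	s := &store{ns: namespace{resourceVersion: 1, annotations: map[string]string{}}}

	// The controller reads the namespace, then another writer changes it.
	stale := s.get()
	s.patch(map[string]string{"example.com/other-operator": "touched"})

	// A full update from the stale copy is rejected.
	stale.annotations["openshift.io/sa.scc.mcs"] = "s0:c25,c20"
	fmt.Println("update:", s.update(stale))

	// A patch carrying only the new annotation is applied.
	s.patch(map[string]string{"openshift.io/sa.scc.mcs": "s0:c25,c20"})
	fmt.Println("annotations after patch:", s.ns.annotations)
}
```

In a real cluster a patch can still conflict if its body includes a resource version, so this only models the case where the patch carries just the fields being changed.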

Comment 2 Maciej Szulik 2020-04-09 08:51:05 UTC
*** Bug 1822298 has been marked as a duplicate of this bug. ***

Comment 3 Maciej Szulik 2020-04-09 08:51:39 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1822298 for more data.

Comment 6 zhou ying 2020-04-30 07:41:59 UTC
Confirmed with the latest code. I ran the e2e test with this command:

`openshift-tests run all --dry-run | grep -E "\[Feature:ImageEcosystem\]\[Slow\] openshift images should be SCL enabled" | openshift-tests run -f -`

26 pass, 0 skip (2m8s)

I'll mark this issue as verified. Correct me if I am wrong.

Comment 7 Maciej Szulik 2020-05-04 09:58:35 UTC
It would be best to use the audit log to confirm that the SCC allocation controller is using the patch operation when updating the namespace.

Comment 8 zhou ying 2020-05-08 09:32:49 UTC
Double-checked with payload 4.5.0-0.nightly-2020-05-06-003431. When a namespace is updated, the audit log shows an entry like:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"dde874a5-26b7-4b68-b2fe-bf412bf6e9ab","stage":"ResponseComplete","requestURI":"/apis/project.openshift.io/v1/projects/zhouy","verb":"patch","user":{"username":"system:admin","groups":["system:masters","system:authenticated"]},"sourceIPs":["10.0.6.201","10.130.0.1"],"userAgent":"oc/4.4.0 (linux/amd64) kubernetes/2576e48","objectRef":{"resource":"projects","namespace":"zhouy","name":"zhouy","apiGroup":"project.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2020-05-08T09:30:41.613870Z","stageTimestamp":"2020-05-08T09:30:41.626430Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}

Comment 9 Maciej Szulik 2020-05-11 16:23:33 UTC
*** Bug 1817099 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2020-07-13 17:25:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

