Bug 1820687

Summary: [sig-devex][Feature:ImageEcosystem][Slow] openshift images should be SCL enabled is forbidden: unable to validate against any security context constraint: []
Product: OpenShift Container Platform Reporter: Gabe Montero <gmontero>
Component: kube-controller-manager Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.5 CC: adam.kaplan, aos-bugs, dcbw, deads, mfojtik, wzheng, yinzhou
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1829327 (view as bug list) Environment:
Last Closed: 2020-07-13 17:25:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1829327    

Description Gabe Montero 2020-04-03 15:34:51 UTC
So these test cases have flaked in a few ways.

This bug covers the edge case where the test namespace does not have its SCC annotations because

- namespace_scc_allocation_controller.go:336
- from the cluster-policy-controller pod
- of the openshift-kube-controller-manager

hit update conflicts. Several operators try to update a namespace when it is created, so their writes collide.

An example error log:

namespace_scc_allocation_controller.go:336] error syncing namespace, it will be retried: Operation cannot be fulfilled on namespaces "e2e-test-s2i-usage-ctzrv": the object has been modified; please apply your changes to the latest version and try again

Note: this controller appears to get backed up when many namespaces are created at once. A namespace missing its SCC annotations may then never appear in those error logs, because the controller's queue is full enough that the test case fails and namespace cleanup runs before the controller has a chance to update the namespace.

See https://github.com/openshift/origin/pull/24703#issuecomment-608443318

So I'm going to add a wait for namespace SCC annotations to the e2e SetupProject
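
A minimal sketch of what such a wait could look like, not the actual SetupProject change. The three annotation keys are an assumption (they are not named in this bug), and `get_annotations` is a hypothetical callable standing in for whatever client call reads the namespace's annotations.

```python
import time
from typing import Callable, Mapping

# Assumed annotation keys; this bug does not list them explicitly.
SCC_ANNOTATIONS = (
    "openshift.io/sa.scc.mcs",
    "openshift.io/sa.scc.supplemental-groups",
    "openshift.io/sa.scc.uid-range",
)


def wait_for_scc_annotations(
    get_annotations: Callable[[], Mapping[str, str]],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every SCC annotation is present and non-empty, or time out."""
    deadline = clock() + timeout
    while True:
        annotations = get_annotations()
        if all(annotations.get(key) for key in SCC_ANNOTATIONS):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The function always checks at least once, so a namespace that is already annotated passes even with a zero timeout. Injecting `sleep` and `clock` keeps the helper testable without real delays.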

**May look at updating cluster-policy-controller to reduce the wait between retries, reduce the conflict window, or just make debugging a bit easier in the future**

Comment 1 Gabe Montero 2020-04-07 19:12:09 UTC
At Adam's prompting, and after a Slack discussion between Adam, David Eads, and myself, updating cluster-policy-controller seems warranted.

See discussion in slack: https://coreos.slack.com/archives/CB48XQ4KZ/p1586282511367900

Highlights:

- Latest failed image eco test:  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24703/pull-ci-openshift-origin-master-e2e-gcp-image-ecosystem/200
- look at the "openshift images should be SCL enabled" failures
- And the additional debug in PR 24703
- David suggests a Patch instead of Update should be used in namespace_scc_allocation_controller.go
- David is also unhappy that there were no metrics for cluster-policy-controller

https://ftygpjfx-promecieus.svc.ci.openshift.org/graph?g0.range_input=72m&g0.end_input=2020-04-02%2021%3A00&g0.expr=workqueue_retries_total&g0.tab=0
https://ftygpjfx-promecieus.svc.ci.openshift.org/graph?g0.range_input=72m&g0.end_input=2020-04-02%2021%3A00&g0.expr=workqueue_retries_total%7Bnamespace%3D%22openshift-kube-controller-manager%22%2Cname%3D%22namespace%22%7D&g0.tab=0

For now, I am leaving the test case workaround of waiting for the namespace SCC annotations associated with this bug.
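
To illustrate why a Patch sidesteps the conflict seen in the error log, here is a toy in-memory model. It is not the controller's code: `ToyStore`, `Conflict`, and the annotation names are all hypothetical. It assumes that a full update is rejected when its resourceVersion is stale, while a patch that omits resourceVersion only touches the keys it names.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class ToyStore:
    """In-memory stand-in for the API server holding one namespace object."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "annotations": {}}

    def get(self):
        # Return a copy, like a client reading the object.
        return {
            "resourceVersion": self.obj["resourceVersion"],
            "annotations": dict(self.obj["annotations"]),
        }

    def update(self, new_obj):
        # Full replacement: rejected if someone else wrote in between.
        if new_obj["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.obj = {
            "resourceVersion": self.obj["resourceVersion"] + 1,
            "annotations": dict(new_obj["annotations"]),
        }

    def patch_annotations(self, changes):
        # Merge-style patch: only the named keys change, no version check.
        self.obj["annotations"].update(changes)
        self.obj["resourceVersion"] += 1


store = ToyStore()
stale = store.get()                                # controller reads the namespace
store.patch_annotations({"other-operator": "1"})   # another writer gets in first
stale["annotations"]["scc"] = "allocated"
try:
    store.update(stale)                            # stale resourceVersion
    outcome = "updated"
except Conflict:
    outcome = "conflict"

store.patch_annotations({"scc": "allocated"})      # patch applies regardless
```

In this model the update ends in a conflict and must be retried from a fresh read, whereas the patch lands on the first attempt and leaves the other writer's annotation in place.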

Comment 2 Maciej Szulik 2020-04-09 08:51:05 UTC
*** Bug 1822298 has been marked as a duplicate of this bug. ***

Comment 3 Maciej Szulik 2020-04-09 08:51:39 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1822298 for more data.

Comment 6 zhou ying 2020-04-30 07:41:59 UTC
Confirmed with the latest code. I ran the e2e test with this command:

`openshift-tests run all --dry-run | grep -E "\[Feature:ImageEcosystem\]\[Slow\] openshift images should be SCL enabled"  | openshift-tests run -f -`

26 pass, 0 skip (2m8s)

I'll verify this issue. Correct me if I am wrong.

Comment 7 Maciej Szulik 2020-05-04 09:58:35 UTC
It would be best to use the audit log and confirm that the SCC allocation controller is using the patch operation when updating the namespace.

Comment 8 zhou ying 2020-05-08 09:32:49 UTC
Double-checked with payload 4.5.0-0.nightly-2020-05-06-003431. When a namespace is updated, the audit log shows entries like:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"dde874a5-26b7-4b68-b2fe-bf412bf6e9ab","stage":"ResponseComplete","requestURI":"/apis/project.openshift.io/v1/projects/zhouy","verb":"patch","user":{"username":"system:admin","groups":["system:masters","system:authenticated"]},"sourceIPs":["10.0.6.201","10.130.0.1"],"userAgent":"oc/4.4.0 (linux/amd64) kubernetes/2576e48","objectRef":{"resource":"projects","namespace":"zhouy","name":"zhouy","apiGroup":"project.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2020-05-08T09:30:41.613870Z","stageTimestamp":"2020-05-08T09:30:41.626430Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}
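
A check like this can be scripted. The sketch below assumes the audit log is available as one JSON event per line, as in the entry above, and uses only fields that appear in that entry. The function name is hypothetical, and how the log lines are collected from the cluster is left out.

```python
import json
from typing import Iterable, Iterator


def matching_audit_events(
    lines: Iterable[str], verb: str, resource: str
) -> Iterator[dict]:
    """Yield completed audit events with the given verb and objectRef.resource."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict):
            continue
        if event.get("stage") != "ResponseComplete":
            continue
        ref = event.get("objectRef") or {}
        if event.get("verb") == verb and ref.get("resource") == resource:
            yield event
```

To confirm what Comment 7 asks for, one would presumably filter with `verb="patch"` and `resource="namespaces"`, then inspect `user.username` and `userAgent` on each hit to see which client issued the request. The controller's identity is not stated in this bug, so it is not hard-coded here.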

Comment 9 Maciej Szulik 2020-05-11 16:23:33 UTC
*** Bug 1817099 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2020-07-13 17:25:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409