Description of problem:
The logs show:
Copying system trust bundle
cp: cannot remove '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem': Read-only file system
The root cause is that the customer was running a PoC of a security product, StackRox. The product was correctly creating its own SCC, but they had given that SCC a priority of 100 with RunAsAny and readOnlyRootFilesystem: true. This put its priority ahead of anyuid for certain operators, causing them to crash as seen above.
Similar to the DefaultSecurityContextConstraints_Mutated alerting, how can we prevent an ill-advised SCC from negatively impacting the platform and ensure ISVs are configuring things correctly?
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create an SCC as noted above
2. Observe operators like authentication
Operators in crashloop without an obvious reason related to the SCC changes.
Warning or guidance around these types of platform impacting changes.
Setting the priority to less than anyuid fixed the issue.
DefaultSecurityContextConstraints_Mutated is going to be reverted. The PRs are merged. The next z-stream release should have it removed.
The other topic must be analyzed. Have you done a comparison between the original and the installed SCC? Hard to believe that equal SCCs behave differently.
It's not that 2 equal SCCs are behaving differently, it's the impact a 3rd-party SCC can have on the platform components.
A default install:
oc get pod authentication-operator-7fb9bc495c-5pt9p -o yaml | grep scc
oc get pod oauth-openshift-594478b797-xkgxj -o yaml | grep scc
A 3rd-party tool comes along and creates its own SCC, as it should, but the SCC creates a conflict with anyuid.
oc apply -f securitycontextconstraints-collector.yaml
The full scc is attached above.
For a while, nothing may change as all of the pods are already running.
An oauth change happens and the oauth pods start rolling:
oc get pods oauth-openshift-594478b797-9gc98 -o yaml | grep scc
The first pod goes into CrashLoopBackOff because it is now using the collector SCC (which has a higher priority and sets readOnlyRootFilesystem: true) instead of anyuid, which leads to the pods failing. This would be a bigger issue during an upgrade event.
There are 4 operators and the oauth pods that use the anyuid SCC: authentication-operator, oauth-openshift, cluster-node-tuning-operator, openshift-service-catalog-apiserver-operator, and openshift-service-catalog-controller-manager-operator.
This is caused by the oauth-server pods not being specific enough about their security context and their service account's privileges being too broad.
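To illustrate why the rolled pod switches SCC, here is a minimal Python sketch of priority-based SCC ordering. It is a simplified model, not the actual admission code: it assumes all listed SCCs are usable by the pod's service account, it only compares the priority field (unset treated as 0, anyuid at its default of 10), and it breaks ties by name. The real admission plugin also ranks by restrictiveness and validates the pod against each SCC. The `SCC` class and `select_scc` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SCC:
    name: str
    priority: Optional[int]  # None (unset) is treated as priority 0
    read_only_root_filesystem: bool = False


def select_scc(candidates):
    """Pick the SCC a pod would get in this simplified model.

    Highest priority wins; ties fall back to name order. Restrictiveness
    ranking and per-SCC pod validation are deliberately left out.
    """
    return min(candidates, key=lambda s: (-(s.priority or 0), s.name))


anyuid = SCC("anyuid", priority=10)
restricted = SCC("restricted", priority=None)
# Values taken from the report: priority 100 and readOnlyRootFilesystem: true.
collector = SCC("collector", priority=100, read_only_root_filesystem=True)

# Default install: anyuid outranks restricted.
before = select_scc([anyuid, restricted])
# After the 3rd-party SCC is applied: collector outranks anyuid.
after = select_scc([anyuid, restricted, collector])
# After lowering the collector priority below anyuid: anyuid wins again.
fixed = select_scc([anyuid, restricted, SCC("collector", 0, True)])

print(before.name, after.name, fixed.name)
```

In this model the selection moves from anyuid to collector once the priority-100 SCC exists, and the pod then inherits the read-only root filesystem. Lowering the collector priority below 10 returns the selection to anyuid, which matches the workaround noted earlier in this report.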
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.