Bug 2113015

Summary: Confusion around disable-home-env-overwrite and disable-working-directory-overwrite
Product: Red Hat OpenShift Pipelines
Component: pipelines
Version: 1.7
Type: Bug
Status: NEW
Severity: medium
Priority: unspecified
Reporter: Luiz Carvalho <lucarval>
Assignee: Vincent Demeester <vdemeest>
QA Contact: Ruchir Garg <rgarg>
CC: cboudjna, kbaig, nikthoma, pgarg, ppitonak, rbehera, rwagner, sashture, vdemeest
Flags: lucarval: needinfo? (vdemeest)
Hardware: Unspecified
OS: Unspecified

Description Luiz Carvalho 2022-08-01 17:45:43 UTC
Description of problem:
There's a Red Hat solution (https://access.redhat.com/solutions/6956010) providing a remediation to the error:

warning: unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied

However, this error does not occur when using the "pipeline" ServiceAccount. It's not clear why that is the case, but it appears to be related to the SecurityContextConstraints associated with the ServiceAccount (and the created Pod).

Neither the solution nor the docs (https://docs.openshift.com/container-platform/4.10/cicd/pipelines/understanding-openshift-pipelines.html#about-tasks_understanding-openshift-pipelines) explain this subtlety. This can confuse Task authors and users, especially because the ServiceAccount used is a property of the TaskRun, not the Task.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.8.0-202201210133.p0.g88e7eba.assembly.stream-88e7eba
Server Version: 4.10.22
Kubernetes Version: v1.23.5+3afdacb

$ tkn version
Client version: 0.21.0
Pipeline version: v0.33.2
Triggers version: v0.19.0

How reproducible:
Always.

Steps to Reproduce:
1. Create a simple Task and two TaskRuns from it, one using the "pipeline" ServiceAccount and the other using the "default" ServiceAccount:

oc create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: reproducer
spec:
  steps:
  - name: reproducer
    image: registry.access.redhat.com/ubi8/ubi:latest
    script: |
      echo 'hello'
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: reproducer-default-sa
spec:
  serviceAccountName: default
  taskRef:
    kind: Task
    name: reproducer
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: reproducer-pipeline-sa
spec:
  serviceAccountName: pipeline
  taskRef:
    kind: Task
    name: reproducer
EOF

Actual results:
The TaskRun reproducer-pipeline-sa succeeds while the TaskRun reproducer-default-sa fails.

Expected results:
Both TaskRuns should behave the same way: either both fail or both succeed.

Additional info:

The main issue here is confusion. If we can make both cases behave the same, that's great. But I think it's also acceptable to document this oddity.

Comment 1 Luiz Carvalho 2022-08-01 17:50:11 UTC
It is also important to mention that granting the "pipelines-scc" SecurityContextConstraint to whichever ServiceAccount is used by the Pod makes the TaskRun succeed. With the example from the description in mind, the following command makes both use cases succeed:

$ oc policy add-role-to-user pipelines-scc-clusterrole -z default

where "default" is the name of whichever ServiceAccount is being used by the TaskRun.

Comment 2 Vincent Demeester 2022-08-04 06:29:33 UTC
@lucarval note that this is a "warning" and not necessarily an error. It won't make the Task fail on its own, but the Task could fail later due to authentication because it didn't find the files.
That said, this mainly happens when the Task and/or the image doesn't set the `HOME` environment variable. In that case (no HOME set), the "creds-init" behavior (which Tekton performs when using auth with a ServiceAccount) will still try to write to `$HOME/.docker` (for Docker credentials, for example), which becomes `/.docker`, a path that is usually not writable by any user but root (or not even by root).
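
To illustrate the path resolution described above, here is a minimal shell sketch. It is not Tekton's actual code; `cred_dest` is a hypothetical helper, and the fallback to "/" when HOME is empty is an assumption based on the `to "/"` in the warning message:

```shell
# Hypothetical helper: compute where ".docker" would be copied for a given HOME.
# Assumption: an empty/unset HOME falls back to "/", as the warning text suggests.
cred_dest() {
  home="${1:-/}"
  # Strip a trailing slash so "/" and "/home/builder/" join cleanly.
  printf '%s/.docker\n' "${home%/}"
}

cred_dest ""              # /.docker (root-owned, so mkdir fails for non-root users)
cred_dest /home/builder   # /home/builder/.docker
```

With a writable HOME, the destination lands in a directory the step's user can create, which is why the warning disappears in that case.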

If you apply the `pipelines-scc` SecurityContextConstraint, you are basically applying something very close to anyuid. The anyuid SCC runs Pods as root unless otherwise specified (in the Pod definition *or* with a `USER` in the image). This is why applying it fixes the issue.
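
One way to see which SCC each TaskRun's Pod was actually admitted under is to read the `openshift.io/scc` annotation on the Pod. The sketch below assumes the Pods carry the `tekton.dev/taskRun` label and that the reproducer TaskRuns from the description exist; `scc_for_taskrun` is a hypothetical helper name:

```shell
# Hypothetical helper: print the SCC recorded on the Pod created for a TaskRun.
# Assumes the Pod is labeled tekton.dev/taskRun=<name> and annotated with openshift.io/scc.
scc_for_taskrun() {
  oc get pod -l "tekton.dev/taskRun=$1" \
    -o jsonpath='{.items[0].metadata.annotations.openshift\.io/scc}'
}

# Usage, against a cluster where the reproducer has run:
#   scc_for_taskrun reproducer-default-sa
#   scc_for_taskrun reproducer-pipeline-sa
```

If the two TaskRuns report different SCCs, that would be consistent with the difference in behavior described in this bug.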

This is one of the reasons why the "best practices" for authentication and secrets with Tekton lean toward being more explicit, with secrets bound to a workspace (see https://docs.openshift-pipelines.xyz/pipeline/auth.html#secrets-and-workspaces). It's explicit: you can choose exactly where you "mount" them in the containers so that your Task content can do its thing.
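
As a rough sketch of the workspace approach, the manifests below bind a Secret to a workspace and point `DOCKER_CONFIG` at it. The names `reproducer-explicit-creds`, `dockerconfig`, and `my-registry-config` are hypothetical, and the Secret is assumed to already exist with a `config.json` key:

```shell
# Write the manifests to a file; the quoted 'EOF' keeps $(workspaces...) literal
# so that Tekton, not the shell, performs the substitution.
cat > secret-workspace-example.yaml <<'EOF'
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: reproducer-explicit-creds
spec:
  workspaces:
  - name: dockerconfig
    description: Holds a config.json with registry credentials.
    readOnly: true
  steps:
  - name: reproducer
    image: registry.access.redhat.com/ubi8/ubi:latest
    env:
    - name: DOCKER_CONFIG
      value: $(workspaces.dockerconfig.path)
    script: |
      ls "$DOCKER_CONFIG"
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: reproducer-explicit-creds-run
spec:
  serviceAccountName: default
  taskRef:
    kind: Task
    name: reproducer-explicit-creds
  workspaces:
  - name: dockerconfig
    secret:
      secretName: my-registry-config  # hypothetical Secret with a config.json key
EOF
# Then, against a cluster: oc create -f secret-workspace-example.yaml
```

Because the credentials are mounted at a path the Task chooses, nothing needs to be copied into `$HOME`, so the result should not depend on which SCC the ServiceAccount happens to have.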

Comment 3 Luiz Carvalho 2022-08-04 14:11:37 UTC
@vdemeest, thank you for the clarification. It makes sense. It would be nice to include this information in the solution I linked earlier. Something like this:

> This behavior may not occur if the Task is executed with a ServiceAccount which uses a less restrictive SecurityContextConstraint. For example, by default tasks are executed with the "pipeline" ServiceAccount, which uses the "pipelines-scc" SecurityContextConstraint. As such, the ServiceAccount has enough access to not trigger the warning.

I was made aware of this issue because one of the tasks I wrote did not work properly on a certain environment. If the text above was in the solution, it would've saved me a few hours of investigation to narrow down why I couldn't reproduce the issue initially. Wdyt?