Bug 1838372

Summary: Builds fail after running postCommit script if OCP cluster is configured with a container registry whitelist
Product: OpenShift Container Platform
Reporter: Garrett Hyde <ghyde>
Component: Build
Assignee: Adam Kaplan <adam.kaplan>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.z
CC: adam.kaplan, antgarci, aos-bugs, apaladug, btomlins, clasohm, wzheng
Target Milestone: ---
Target Release: 4.6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The container image signature policy used in builds did not contain any configuration for local images.
Consequence: When customers only allowed images from specific registries, postCommit scripts in builds failed because they could not use local images.
Fix: Updated the container image signature policy to always allow images that reference local storage layers directly.
Result: Builds can successfully complete if they contain a postCommit hook.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:00:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1849173    
Attachments:
Test script for reproducing bug and implementing work around (Flags: none)

Description Garrett Hyde 2020-05-21 02:31:26 UTC
Created attachment 1690453 [details]
Test script for reproducing bug and implementing work around

## Description of problem

When a BuildConfig has a postCommit script defined and the OpenShift cluster is configured to [whitelist specific registries](https://docs.openshift.com/container-platform/4.4/openshift_images/image-configuration.html), every build that runs the postCommit hook fails during the COMMIT step.

## Version-Release number of selected component (if applicable)

OpenShift Container Platform 4.2, 4.3, and 4.4

## How reproducible

Always

## Steps to reproduce

1. Add a whitelist of allowed registries.

   ```bash
   oc patch image.config.openshift.io/cluster --type=merge -p '
   spec:
     registrySources:
       allowedRegistries:
       - image-registry.openshift-image-registry.svc:5000
       - registry.access.redhat.com
       - registry.redhat.io
       - registry.connect.redhat.com
       - quay.io
       - docker.io
   '
   ```

2. Deploy an application in OpenShift.

3. Add a postCommit to the application's build.

   ```bash
   oc patch bc/${APP_NAME} --type=merge -p '
   spec:
     postCommit:
       script: echo "This is a test"
   '
   ```

4. Start a build.

   ```bash
   oc start-build ${APP_NAME} --build-loglevel=5 --wait --follow
   ```

5. Wait for the build to fail.

## Actual results

Build fails with the following error:

```text
...
STEP 9: CMD /usr/libexec/s2i/run
Getting image source signatures
Copying blob sha256:35b7a5c4e1b4a84fb05d9c6658572c2b7a9925a270e8f7860c0ae30671c0a57c
Copying blob sha256:eddcd8d2986daee57d8cd75add7ff3c998e668857847e0f2b3c3d3b7e02a3ab6
Copying blob sha256:f0f97bb39344256e639831d65c0c9db84aca2e9b0f1507f267b7cc128068fff0
Copying blob sha256:5a9c62a939b5a7eb752536378f00381f42c8cb293a026b29fa4a9384e56da6af
Copying blob sha256:72beca8812421a68c0ac833a371148e35043be85ad138b67cbce72602b92f4cc
Copying blob sha256:2aebf74dd0b4cfd3bd9b653dcae05a5c1ebd08fd27a6ea36f7a560fac9b9a5fe
Copying config sha256:1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
Writing manifest to image destination
Storing signatures
1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 10: FROM 1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 11: RUN /bin/sh -ic 'echo "This is a test"'
sh: no job control in this shell
This is a test
STEP 12: FROM 1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 13: COMMIT temp.builder.openshift.io/test-postcommit/python-3:e6712cd2
F0520 23:25:37.890850       1 helpers.go:114] error: build error: error copying image "1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918": Source image rejected: Running image containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage]@1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918 is rejected by policy.
```

## Expected results

Build should succeed.

## Additional info

1. When a container registry whitelist is not configured, builds with a postCommit succeed.

2. When a container registry whitelist is configured, builds without a postCommit succeed.

3. Patching a compute node's `/etc/containers/policy.json` does not fix this issue.

4. This [GitHub issue](https://github.com/openshift/builder/issues/71) is related but does not resolve the issue in this BZ.

## Root cause

When a build pod runs, it uses its own containers policy located at `/etc/containers/policy.json` inside the container image.

```json
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ]
}
```

[*Source*](https://github.com/openshift/builder/blob/bb6e41a1e23a61e070274f778d86d4211bdb41ff/imagecontent/policy.json)

If a container registry whitelist is configured, however, the policy.json is overridden by a ConfigMap mounted at `/var/run/configs/openshift.io/build-system/policy.json`. This ConfigMap, `${APP_NAME}-${BUILD_NUMBER}-sys-config`, is generated by OpenShift prior to a build starting.

The issue is that OpenShift does not include `containers-storage` in the whitelisted transports of the generated policy, so images referenced through that transport are rejected by default.
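
To make the rejection concrete, here is a minimal sketch of how such a lookup could behave. It is a simplified model, not the real `containers/image` implementation: the `policy` type and `verdict` method are hypothetical, scope matching is reduced to an exact match plus the `""` catch-all, and the generated policy is assumed to use a `reject` default with only `atomic` and `docker` transports.

```go
package main

import "fmt"

// policy is a simplified, hypothetical stand-in for a containers policy.json:
// a default requirement type plus per-transport, per-scope requirement types.
type policy struct {
	Default    string
	Transports map[string]map[string]string // transport -> scope -> requirement type
}

// verdict returns the requirement type that applies to a reference.
// Assumption: only an exact scope match and the "" catch-all scope are
// consulted; real scope matching is more detailed than this.
func (p policy) verdict(transport, scope string) string {
	scopes, ok := p.Transports[transport]
	if !ok {
		return p.Default // transport not listed: fall back to the default
	}
	if v, ok := scopes[scope]; ok {
		return v
	}
	if v, ok := scopes[""]; ok {
		return v
	}
	return p.Default
}

func main() {
	// Assumed shape of the generated policy: no containers-storage entry.
	generated := policy{
		Default: "reject",
		Transports: map[string]map[string]string{
			"atomic": {"quay.io": "insecureAcceptAnything"},
			"docker": {"quay.io": "insecureAcceptAnything"},
		},
	}

	fmt.Println(generated.verdict("docker", "quay.io"))                  // insecureAcceptAnything
	fmt.Println(generated.verdict("containers-storage", "local-image")) // reject
}
```

In this model the local image produced by the build never matches a listed transport, which mirrors the "rejected by policy" error in the build log.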

## Possible Fix

I believe the `openshift-controller-manager` contains the [broken code](https://github.com/openshift/openshift-controller-manager/blob/9d0118b20168324d21efba6ff7c244730abbd855/pkg/build/controller/build/build_controller.go#L2159). When it creates the transports, it needs to include a `containers-storage` entry.

```golang
policyObj.Transports = map[string]signature.PolicyTransportScopes{
    "atomic": transportScopes,
    "docker": transportScopes,
    "containers-storage": TODO,  // add entry here
}
```
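
As a rough illustration of what the extra entry might produce, the following standard-library sketch builds a policy document with a `containers-storage` transport. The `requirement`, `scopes`, and `policy` types and the `buildPolicy` function are hypothetical stand-ins for the `signature` package types, and the sample registry in `main` is only an example; the `insecureAcceptAnything` catch-all for `containers-storage` is taken from the workaround policy in this report, not from the actual fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requirement, scopes, and policy are hypothetical types that mirror the
// JSON layout of policy.json; they are not the containers/image types.
type requirement struct {
	Type string `json:"type"`
}

type scopes map[string][]requirement

type policy struct {
	Default    []requirement     `json:"default"`
	Transports map[string]scopes `json:"transports"`
}

// buildPolicy reuses the caller's registry scopes for "atomic" and "docker"
// and adds a catch-all "containers-storage" scope so local images pass.
func buildPolicy(transportScopes scopes) policy {
	accept := []requirement{{Type: "insecureAcceptAnything"}}
	return policy{
		Default: []requirement{{Type: "reject"}},
		Transports: map[string]scopes{
			"atomic":             transportScopes,
			"docker":             transportScopes,
			"containers-storage": {"": accept},
		},
	}
}

func main() {
	// Example input only: one allowed registry.
	allowed := scopes{
		"quay.io": {{Type: "insecureAcceptAnything"}},
	}

	out, err := json.MarshalIndent(buildPolicy(allowed), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```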

## Work around

As stated in BZ 1758014, users can define their own security policy for builds.

1. Create a policy.json file that includes `containers-storage` in the whitelist.

   ```json
   {
     "default": [
       {
         "type": "reject"
       }
     ],
     "transports": {
       "atomic": {
         ...
       },
       "docker": {
         ...
       },
       "containers-storage": {
         "": [
           {
             "type": "insecureAcceptAnything"
           }
         ]
       }
     }
   }
   ```

2. Create the ConfigMap `${APP_NAME}-${NEXT_BUILD_NUMBER}-sys-config` which includes the custom policy.json file.

3. Create the ConfigMaps `${APP_NAME}-${NEXT_BUILD_NUMBER}-ca` and `${APP_NAME}-${NEXT_BUILD_NUMBER}-global-ca`. These can be copied from previous builds.

   **NOTE:** If you don't create the CA ConfigMaps, the build will fail because the missing ConfigMaps cannot be mounted in the build pod.

4. Start the build.

5. Wait for the build to succeed.

The drawback of this work around is that it must be repeated before every build. I recommend using CI/CD to automate this process.
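
For automation, the three ConfigMap names from the steps above can be derived from the application name and the next build number. This is a small sketch only; `workaroundConfigMaps` is a hypothetical helper, and the sample values in `main` are illustrative.

```go
package main

import "fmt"

// workaroundConfigMaps returns the names of the ConfigMaps that must exist
// before the next build starts, following the naming used in the steps above:
// <app>-<build>-sys-config, <app>-<build>-ca, and <app>-<build>-global-ca.
func workaroundConfigMaps(appName string, nextBuildNumber int) []string {
	prefix := fmt.Sprintf("%s-%d", appName, nextBuildNumber)
	return []string{
		prefix + "-sys-config", // carries the custom policy.json
		prefix + "-ca",         // can be copied from a previous build
		prefix + "-global-ca",  // can be copied from a previous build
	}
}

func main() {
	// Illustrative values: application name and the upcoming build number.
	for _, name := range workaroundConfigMaps("python-3", 4) {
		fmt.Println(name)
	}
}
```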

Comment 6 wewang 2020-06-05 01:47:05 UTC
Still waiting for an available 4.6 nightly build payload to verify this.

Comment 7 wewang 2020-06-08 09:21:16 UTC
Verified in version: 
4.6.0-0.nightly-2020-06-07-065515

Steps:
1. Create an app.
   $ oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world

2. Add a whitelist of allowed registries.

3. Add a postCommit to the application's build.

   ```bash
   oc patch bc/ruby-hello-world --type=merge -p '
   spec:
     postCommit:
       script: echo "This is a test"
   '
   ```
4. Start a build. The build completes successfully.

Comment 11 errata-xmlrpc 2020-10-27 16:00:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 12 Anand Paladugu 2022-10-14 13:34:00 UTC
@adam.kaplan 

Hi Adam

This issue has been noticed again in 4.8 through 4.10 (refer to case 03317106). Should we reopen this BZ or open a new one?

Thx

Anand

Comment 13 Adam Kaplan 2022-10-14 14:33:24 UTC
Hi Anand,

Please open a new BZ and link the associated case. The original root cause of this issue was verified by QE in 4.6.

Thank You,
Adam