Bug 1838372 - Builds fail after running postCommit script if OCP cluster is configured with a container registry whitelist
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.2.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks: 1849173
 
Reported: 2020-05-21 02:31 UTC by Garrett Hyde
Modified: 2023-12-15 17:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The container image signature policy used in builds did not contain any configuration for local images.
Consequence: When customers allowed images only from specific registries, postCommit scripts in builds failed because they could not use local images.
Fix: The container image signature policy was updated to always allow images that reference local storage layers directly.
Result: Builds that contain a postCommit hook can complete successfully.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:00:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Test script for reproducing bug and implementing work around (5.82 KB, application/x-shellscript)
2020-05-21 02:31 UTC, Garrett Hyde


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-controller-manager pull 114 0 None closed Bug 1838372: Allow image push after postCommit script completes 2020-12-11 14:47:02 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:00:42 UTC

Description Garrett Hyde 2020-05-21 02:31:26 UTC
Created attachment 1690453 [details]
Test script for reproducing bug and implementing work around

## Description of problem

When a BuildConfig has a postCommit script defined, and the OpenShift cluster is configured to [whitelist specific registries](https://docs.openshift.com/container-platform/4.4/openshift_images/image-configuration.html), all builds with a postCommit will fail during the COMMIT step.

## Version-Release number of selected component (if applicable)

OpenShift Container Platform 4.2, 4.3, & 4.4

## How reproducible

Always

## Steps to reproduce

1. Add a whitelist of allowed registries.

   ```bash
   oc patch image.config.openshift.io/cluster --type=merge -p '
   spec:
     registrySources:
       allowedRegistries:
       - image-registry.openshift-image-registry.svc:5000
       - registry.access.redhat.com
       - registry.redhat.io
       - registry.connect.redhat.com
       - quay.io
       - docker.io
   '
   ```

2. Deploy an application in OpenShift.

3. Add a postCommit to the application's build.

   ```bash
   oc patch bc/${APP_NAME} --type=merge -p '
   spec:
     postCommit:
       script: echo "This is a test"
   '
   ```

4. Start a build.

   ```bash
   oc start-build ${APP_NAME} --build-loglevel=5 --wait --follow
   ```

5. Wait for build to fail.

## Actual results

Build fails with the following error:

```text
...
STEP 9: CMD /usr/libexec/s2i/run
Getting image source signatures
Copying blob sha256:35b7a5c4e1b4a84fb05d9c6658572c2b7a9925a270e8f7860c0ae30671c0a57c
Copying blob sha256:eddcd8d2986daee57d8cd75add7ff3c998e668857847e0f2b3c3d3b7e02a3ab6
Copying blob sha256:f0f97bb39344256e639831d65c0c9db84aca2e9b0f1507f267b7cc128068fff0
Copying blob sha256:5a9c62a939b5a7eb752536378f00381f42c8cb293a026b29fa4a9384e56da6af
Copying blob sha256:72beca8812421a68c0ac833a371148e35043be85ad138b67cbce72602b92f4cc
Copying blob sha256:2aebf74dd0b4cfd3bd9b653dcae05a5c1ebd08fd27a6ea36f7a560fac9b9a5fe
Copying config sha256:1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
Writing manifest to image destination
Storing signatures
1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 10: FROM 1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 11: RUN /bin/sh -ic 'echo "This is a test"'
sh: no job control in this shell
This is a test
STEP 12: FROM 1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918
STEP 13: COMMIT temp.builder.openshift.io/test-postcommit/python-3:e6712cd2
F0520 23:25:37.890850       1 helpers.go:114] error: build error: error copying image "1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918": Source image rejected: Running image containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage]@1875230d5230a5d11979d5c7cac7ffbe115cbeb83b7a904ffca219ceda8db918 is rejected by policy.
```
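When triaging failed builds in CI, it may help to check a saved build log for this specific rejection. A minimal sketch, assuming the log was saved to a file (for example by redirecting the output of the `oc start-build` command above); the function name is hypothetical and the check is a plain text match on the error shown above:

```bash
# Hypothetical helper; the name is illustrative and not part of oc.
# Succeeds if the saved build log contains the policy rejection of a
# containers-storage image, as in the error above.
is_local_storage_rejection() {
    # $1 = path to a saved build log
    grep -q 'containers-storage:.*is rejected by policy' "$1"
}
```

A zero exit status suggests the build hit this bug rather than an unrelated failure.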

## Expected results

Build should succeed.

## Additional info

1. When a container registry whitelist is not configured, builds with a postCommit succeed.

2. When a container registry whitelist is configured, builds without a postCommit succeed.

3. Patching a compute node's `/etc/containers/policy.json` does not fix this issue.

4. This [GitHub issue](https://github.com/openshift/builder/issues/71) is related but does not resolve the issue in this BZ.

## Root cause

When a build pod runs, it uses its own containers policy located at `/etc/containers/policy.json` inside the container image.

```json
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ]
}
```

[*Source*](https://github.com/openshift/builder/blob/bb6e41a1e23a61e070274f778d86d4211bdb41ff/imagecontent/policy.json)

If a container registry whitelist is configured, however, the policy.json is overridden by a ConfigMap mounted at `/var/run/configs/openshift.io/build-system/policy.json`. This ConfigMap, `${APP_NAME}-${BUILD_NUMBER}-sys-config`, is generated by OpenShift prior to a build starting.

The issue is that OpenShift does not include `containers-storage` in the whitelisted transports, so it is rejected by default.
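To confirm this on a failing build, the generated policy can be inspected directly. The following is a minimal sketch, not a verified procedure: it assumes the ConfigMap stores the policy under a `policy.json` data key (inferred from the mount path), and the helper names are hypothetical. The check is a crude text match and does not parse the JSON.

```bash
# Hypothetical helpers; names are illustrative and not part of oc or OpenShift.

# Print the policy generated for a build.
# Assumption: the ConfigMap stores it under the "policy.json" data key.
fetch_build_policy() {
    # $1 = application name, $2 = build number
    oc get configmap "$1-$2-sys-config" -o jsonpath='{.data.policy\.json}'
}

# Succeed if the policy file mentions the containers-storage transport.
# Crude text match: it does not parse the JSON or inspect the entry's rules.
policy_mentions_local_storage() {
    # $1 = path to a policy.json file
    grep -q '"containers-storage"' "$1"
}
```

For example, redirect the output of `fetch_build_policy "${APP_NAME}" "${BUILD_NUMBER}"` to a file and pass that file to `policy_mentions_local_storage`; a non-zero exit status indicates that the transport is missing from the policy.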

## Possible Fix

I believe the `openshift-controller-manager` contains the [broken code](https://github.com/openshift/openshift-controller-manager/blob/9d0118b20168324d21efba6ff7c244730abbd855/pkg/build/controller/build/build_controller.go#L2159). When it creates the transports, it needs to include a `containers-storage` entry.

```golang
policyObj.Transports = map[string]signature.PolicyTransportScopes{
    "atomic": transportScopes,
    "docker": transportScopes,
    "containers-storage": TODO,  // add entry here
}
```

## Work around

As stated in BZ 1758014, users can define their own security policy for builds.

1. Create a policy.json file that includes `containers-storage` in the whitelist.

   ```json
   {
     "default": [
       {
         "type": "reject"
       }
     ],
     "transports": {
       "atomic": {
         ...
       },
       "docker": {
         ...
       },
       "containers-storage": {
         "": [
           {
             "type": "insecureAcceptAnything"
           }
         ]
       }
     }
   }
   ```

2. Create the ConfigMap `${APP_NAME}-${NEXT_BUILD_NUMBER}-sys-config` which includes the custom policy.json file.

3. Create the ConfigMaps `${APP_NAME}-${NEXT_BUILD_NUMBER}-ca` and `${APP_NAME}-${NEXT_BUILD_NUMBER}-global-ca`. These can be copied from previous builds.

   **NOTE:** If you don't create the CA ConfigMaps, the build will fail because the missing ConfigMaps couldn't be mounted in the build pod.

4. Start the build.

5. Wait for build to succeed.

The issue with this work around is that it must be executed prior to every build. I recommend using CI/CD to automate this process.
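Parts of the steps above can be scripted. The following is a sketch under stated assumptions rather than the attached test script: the helper names are hypothetical, the sys-config ConfigMap is assumed to carry the policy under a `policy.json` data key, and any other files the generated ConfigMap may contain are not handled. Copying the CA ConfigMaps (step 3) is left out.

```bash
# Hypothetical helpers for automating the work around; names are illustrative.

# Print the three ConfigMap names the next build will expect, one per line.
next_build_configmaps() {
    # $1 = application name, $2 = number of the most recent build
    for suffix in sys-config ca global-ca; do
        printf '%s-%s-%s\n' "$1" "$(( $2 + 1 ))" "$suffix"
    done
}

# Read the most recent build number from the BuildConfig status.
last_build_number() {
    # $1 = application (BuildConfig) name
    oc get "bc/$1" -o jsonpath='{.status.lastVersion}'
}

# Create the sys-config ConfigMap from a prepared policy file.
# Assumption: the build reads the policy from the "policy.json" data key.
create_sys_config() {
    # $1 = ConfigMap name, $2 = path to the custom policy.json
    oc create configmap "$1" --from-file=policy.json="$2"
}
```

A CI job could call `last_build_number "${APP_NAME}"`, feed the result to `next_build_configmaps`, and create each ConfigMap before running `oc start-build`.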

Comment 6 wewang 2020-06-05 01:47:05 UTC
Still waiting for an available 4.6 nightly build payload to verify this.

Comment 7 wewang 2020-06-08 09:21:16 UTC
Verified in version: 
4.6.0-0.nightly-2020-06-07-065515

Steps:
1. Create an app.
   $ oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world

2. Add a whitelist of allowed registries.

3. Add a postCommit to the application's build.

   ```bash
   oc patch bc/ruby-hello-world --type=merge -p '
   spec:
     postCommit:
       script: echo "This is a test"
   '
   ```
4. Start a build; the build completes.

Comment 11 errata-xmlrpc 2020-10-27 16:00:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 12 Anand Paladugu 2022-10-14 13:34:00 UTC
@adam.kaplan 

Hi Adam

This issue has been noticed again in 4.8 through 4.10 (refer to 03317106). Should we re-open this BZ or open a new one?

Thx

Anand

Comment 13 Adam Kaplan 2022-10-14 14:33:24 UTC
Hi Anand,

Please open a new BZ and link the associated case. The original root cause of this issue was verified by QE in 4.6.

Thank You,
Adam

