Bug 1745192 - Builds are not configured to use mirrors in disconnected environments
Summary: Builds are not configured to use mirrors in disconnected environments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Gabe Montero
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-23 20:11 UTC by Adam Kaplan
Modified: 2020-03-13 05:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:05:22 UTC
Target Upstream Version:
Embargoed:


Attachments
build log at loglevel 8 (51.90 KB, text/plain)
2019-09-23 19:08 UTC, Gabe Montero
Build log with log level 8 (173.26 KB, text/plain)
2019-11-29 07:03 UTC, Wenjing Zheng


Links
System ID Private Priority Status Summary Last Updated
Github containers image pull 722 0 'None' closed allow for .dockercfg files to reside in non-home directories (facilit… 2021-01-24 11:50:14 UTC
Github openshift builder pull 102 0 'None' closed Bug 1745192: seed containers/image with entire dockerconfig for authentication in … 2021-01-24 11:50:18 UTC
Github openshift machine-config-operator pull 1087 0 'None' closed Bug 1745192: Move registries.conf editing into a subpackage 2021-01-24 11:50:14 UTC
Github openshift openshift-controller-manager pull 19 0 'None' closed Bug 1745192: configure builds to use mirrors in disconnected enivronments 2021-01-24 11:50:15 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:05:41 UTC

Description Adam Kaplan 2019-08-23 20:11:47 UTC
Description of problem:

OpenShift builds generate their own registries.conf config that is independent of the node host. 


Version-Release number of selected component (if applicable): 4.2.0


How reproducible: Always


Steps to Reproduce:
1. Install an OpenShift cluster in a disconnected environment, with quay.io's fedora/fedora images mirrored
2. Run a Docker strategy build with a Dockerfile that starts like `FROM quay.io/fedora/fedora:latest`

Actual results:

Build fails because fedora/fedora:latest cannot be pulled from the public quay.io registry.

Expected results:

Build can pull quay.io/fedora/fedora:latest from the mirror

Comment 1 Adam Kaplan 2019-08-23 21:00:13 UTC
openshift-controller-manager generates a `registries.conf` file that is mounted into build pods via a ConfigMap [1].

We need to do the following:

1. Watch the updates to the cluster ImageContentSourcePolicy [2].
2. Migrate our representation of `registries.conf` to use the V2 `registries.conf` format [3].
3. Use the image content policy data to set the mirror registries.

[1] https://github.com/openshift/openshift-controller-manager/blob/master/pkg/build/controller/build/build_controller.go#L2026-L2056
[2] https://github.com/openshift/api/blob/master/operator/v1alpha1/types_image_content_source_policy.go#L56-L67
[3] https://github.com/containers/image/blob/master/pkg/sysregistriesv2/system_registries_v2.go#L137-L141
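
A rough sketch of steps 1-3, written in Python for brevity. It is not the openshift-controller-manager code, only an illustration of the idea: ImageContentSourcePolicy objects (as plain dicts) go in, and a V2-style registries.conf comes out. The function name render_registries_conf, the docker.io search-registry default, and the mirror host mirror.example.com:5000 are assumptions made for the example.

```python
def render_registries_conf(icsp_items, search_registries=("docker.io",)):
    """Render a V2-style registries.conf from ImageContentSourcePolicy dicts."""
    # Merge mirrors per source repository, keeping first-seen mirror order.
    mirrors_by_source = {}
    for item in icsp_items:
        for entry in item.get("spec", {}).get("repositoryDigestMirrors", []):
            bucket = mirrors_by_source.setdefault(entry["source"], [])
            for mirror in entry.get("mirrors", []):
                if mirror not in bucket:
                    bucket.append(mirror)

    quoted = ", ".join('"%s"' % r for r in search_registries)
    lines = ["unqualified-search-registries = [%s]" % quoted]
    for source in sorted(mirrors_by_source):
        lines += [
            "",
            "[[registry]]",
            '  prefix = ""',
            '  location = "%s"' % source,
            "  mirror-by-digest-only = true",
        ]
        for mirror in mirrors_by_source[source]:
            lines += ["", "  [[registry.mirror]]", '    location = "%s"' % mirror]
    return "\n".join(lines) + "\n"


# Hypothetical policy for the reproducer image; the mirror host is made up.
icsp = [{
    "spec": {"repositoryDigestMirrors": [{
        "source": "quay.io/fedora/fedora",
        "mirrors": ["mirror.example.com:5000/fedora/fedora"],
    }]}
}]
conf = render_registries_conf(icsp)
print(conf)
```

Every entry is emitted with mirror-by-digest-only = true, since ImageContentSourcePolicy only describes digest mirrors (repositoryDigestMirrors).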

Comment 2 Adam Kaplan 2019-08-23 21:01:48 UTC
/cc Miloslav and Nalin

Comment 4 Gabe Montero 2019-09-05 13:48:19 UTC
PRs have merged ... bot must have missed this ... manually moving to modified

Comment 6 wewang 2019-09-06 02:55:11 UTC
Since the disconnected env install is currently blocked by a bug, I will verify this bug once the install is possible.

Comment 8 Miloslav Trmač 2019-09-10 15:04:20 UTC
(In reply to wewang from comment #7)
> [root@Desktop test]# oc image mirror quay.io/drahnr/fedora:latest 
> warning: Layer size mismatch for
> sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: had
> 0, wrote 32
> warning: Layer size mismatch for
> sha256:0be2a68855d7bbbba01b447a79c873f137e6fb47362e79f2fd79c72575c9b73a: had
> 0, wrote 89867780

This is apparently a bug in the mirroring code: the source image uses schema1, which does not contain blob sizes.  Seems harmless (cosmetic only, though).

> error: unable to push manifest to
> mirror-registry.qe.devcluster.openshift.com:5000/openshift/fedora1:latest:
> received unexpected HTTP status: 500 Internal Server Error

Yeah, that’s not very helpful. Maybe the registry contains an actual error cause.


----


> info: Mirroring completed in 46.74s (1.922MB/s)
> error: one or more errors occurred while uploading images
> 
> So i using below steps to test:
> 1. docker tag quay.io/drahnr/fedora:latest
> mirror-registry.qe.devcluster.openshift.com:5000/openshift/fedora2:latest
> 2. docker push
> mirror-registry.qe.devcluster.openshift.com:5000/openshift/fedora2:latest


This is pretty likely to convert that schema1 image to schema2, and possibly change the manifest digest for other reasons.  Use (skopeo copy docker://quay.io/drahnr/fedora:latest docker://mirror-registry.qe.devcluster.openshift.com:5000/openshift/fedora2:latest) instead.


> $oc new-build -D $'FROM quay.io/drahnr/fedora:latest\nRUN yum install -y

OOPS; mirroring is configured to only apply to digest references. This is not going to use the mirror anyway.

If the image is correctly mirrored, using a digest reference (FROM quay.io/drahnr/fedora@sha256:5562f951443b829832cfc603eebc0057d5e23b2448db3192f7024dbb06abac04) should work.  That would locally test that the fixes made work correctly; but it’s not going to be all that helpful for ordinary use.
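
To make the digest-only rule concrete, here is a small Python sketch of the check. It is an illustration only, not the containers/image implementation; the helper names is_digest_reference and mirror_applies are made up for the example.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_reference(image_ref):
    """Return True when the reference pins an image by sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


def mirror_applies(image_ref, mirror_by_digest_only=True):
    """With mirror-by-digest-only set, only digest references may use a mirror."""
    return is_digest_reference(image_ref) or not mirror_by_digest_only


by_tag = "quay.io/drahnr/fedora:latest"
by_digest = ("quay.io/drahnr/fedora@sha256:"
             "5562f951443b829832cfc603eebc0057d5e23b2448db3192f7024dbb06abac04")
print(mirror_applies(by_tag))     # tag reference: the mirror is skipped
print(mirror_applies(by_digest))  # digest reference: the mirror is eligible
```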

Comment 10 Gabe Montero 2019-09-11 13:11:51 UTC
OK, I'm going to try and summarize a discussion that has been going on in #warroom-disconnected (and some of which has been discussed previously in the context of https://bugzilla.redhat.com/show_bug.cgi?id=1741391)

1) Per Miloslav:  Mirrors are always set up with MirrorByDigestOnly, and that completely breaks FROM image:latest in Dockerfiles.
OpenShift installations don’t mind because they always use digest references, but that’s not really reasonable for builds.
The idea of MirrorByDigestOnly supposedly was that we don’t want to risk having several mirrors out of sync, but breaking builds to get that seems like a pretty wrong trade-off.

2) So that means either 
  a) we try again if we can get Oleg's https://bugzilla.redhat.com/show_bug.cgi?id=1741391

But we don't want to do that at this point.  Oleg's work here is complicated, and is still at risk for 4.2.

  b) QE changes the scenario so the build references any input images via an image reference that uses a SHA and the mirror registry ... i.e. no use of imagestream references 

3) The changes in this bug's PRs revolve around updating the registries.conf used by the build, pulling in the ICSP mirror config.  So all that means is, if we take Wen's test of

oc new-build -D $'FROM quay.io/drahnr/fedora:latest\nRUN yum install -y httpd' --strategy=docker

and change 'fedora:latest' to 'fedora@<sha reference>', then the new registries.conf should pick up the ICSP mirror definitions and pull from the quay.io image mirror that was previously set up.

Wen - change your test case in this fashion, and we'll go from there.

At this time, we don't want to block on

Comment 39 Ben Parees 2019-09-19 17:05:28 UTC
1) run with build loglevel 5  (oc start-build foo --build-loglevel=5)
2) collect the buildconfig yaml, build yaml, pod yaml, imagestream yaml, and imagecontentsourcepolicy yaml
3) collect the build logs.

Comment 40 wewang 2019-09-20 03:06:36 UTC
(In reply to Ben Parees from comment #39)
> 1) run with build loglevel 5  (oc start-build foo --build-loglevel=5)
> 2) collect the buildconfig yaml, build yaml, pod yaml, imagestream yaml, and
> imagecontentsourcepolicy yaml
> 3) collect the build logs.


Here's info: http://pastebin.test.redhat.com/799074

Comment 41 Ben Parees 2019-09-20 03:49:26 UTC
line 32 of your pastebin indicates that this buildconfig is going to use " docker.io/nodeshift/centos7-s2i-nodejs:latest" to substitute the FROM line of your dockerfile (which is exactly what the logs show it doing).  So it's definitely not utilizing your mirror, and this also means your cluster does have access to docker.io.

 strategy:
    dockerStrategy:
      from:
        kind: DockerImage
        name: docker.io/nodeshift/centos7-s2i-nodejs:latest

Pulling image docker.io/nodeshift/centos7-s2i-nodejs ...
STEP 1: FROM docker.io/nodeshift/centos7-s2i-nodejs


It's not clear to me *why* the buildconfig is being constructed that way (something to do with new-app i assume, if that's how you're creating it), but a valid test would require you to remove "from" section of the dockerStrategy from the buildconfig, or change the value of the name to:

docker.io/nodeshift/centos7-s2i-nodejs@sha256:eea192da5dc21ddfbfbc1a1947ecb3c73e074e2d9516e5bed7ce66015464cce9

instead of

docker.io/nodeshift/centos7-s2i-nodejs:latest

But of course none of that matters if the cluster isn't actually disconnected from docker.io.
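
A quick way to spot this kind of problem before running a build is to scan the BuildConfig for image references that are not pinned by digest. The Python sketch below assumes the BuildConfig has been loaded into a dict (for example from `oc get bc -o json`). The helper name unpinned_image_refs and the inline Dockerfile in the example are hypothetical, and the FROM parsing is deliberately simplistic (it ignores options such as --platform).

```python
def unpinned_image_refs(build_config):
    """List image references in a BuildConfig dict that use a tag, not a digest."""
    spec = build_config.get("spec", {})
    refs = []

    # FROM lines of an inline Dockerfile (spec.source.dockerfile).
    dockerfile = spec.get("source", {}).get("dockerfile") or ""
    for line in dockerfile.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            refs.append(parts[1])

    # Optional override in spec.strategy.dockerStrategy.from, which replaces
    # the FROM line of the Dockerfile at build time.
    override = spec.get("strategy", {}).get("dockerStrategy", {}).get("from")
    if override and override.get("kind") == "DockerImage":
        refs.append(override["name"])

    return [ref for ref in refs if "@sha256:" not in ref]


digest = "sha256:eea192da5dc21ddfbfbc1a1947ecb3c73e074e2d9516e5bed7ce66015464cce9"
bc = {"spec": {
    # Hypothetical inline Dockerfile that is already pinned by digest.
    "source": {"dockerfile": "FROM docker.io/nodeshift/centos7-s2i-nodejs@" + digest},
    # The override from the pastebin, which still uses a tag.
    "strategy": {"dockerStrategy": {"from": {
        "kind": "DockerImage",
        "name": "docker.io/nodeshift/centos7-s2i-nodejs:latest",
    }}},
}}
print(unpinned_image_refs(bc))
```

Here only the dockerStrategy override is reported, which is the reference that would bypass the mirror.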

Comment 42 Ben Parees 2019-09-20 03:50:43 UTC
Sending back to QE to re-validate this since it seems like the validation performed wasn't correct.

that said, if this fails QE we are not going to hold 4.2 for it.  we'll move it to 4.3 and backport to 4.2.z as needed, if there is a code change needed.

Comment 43 Wenjing Zheng 2019-09-20 09:30:44 UTC
Sorry for the confusion. We figured out why it passed in the GCP disconnected cluster: a proxy was added to BuildDefault to make github.com accessible. After we removed the proxy, the build fails because it cannot pull the mirrored image, even with the mirror rule defined. So moving this bug to assigned : (

Comment 44 Gabe Montero 2019-09-20 11:42:07 UTC
@Wenjing has provided me access to their clusters.

If today I can
a) find the projects and existing mirror images they have set up
b) and confirm their ICSP objects have properly set up the mirror
c) and some of the existing build configs they have tried 

I will attempt to change the build configs so they do *NOT* use an imagestream to override the Dockerfile FROM, which will have an image@sha reference,
as I described in https://bugzilla.redhat.com/show_bug.cgi?id=1745192#c32
and what Ben reiterated in https://bugzilla.redhat.com/show_bug.cgi?id=1745192#c41

As I mentioned to Ben/Adam in slack yesterday, and as he noted in https://bugzilla.redhat.com/show_bug.cgi?id=1745192#c41
the key at this point is this line in the build log:

Pulling image docker.io/nodeshift/centos7-s2i-nodejs ...

The line needs to contain the sha of the image that was mirrored.

If that works, we can decide if 
a) we leave this in 4.2, *I* will mark this Verified, and we'll update the release notes / docs to clarify this need
wrt builds ... even if the env is not disconnected, we've validated the change the PR for this bug was introducing.
b) we move to 4.3, and get QE to try those precise steps above in a truly disconnected env if that is what is deemed required

If for some reason that does not work, then as Ben noted, we'll address this in 4.3 and backport to 4.2.x, and add 
the release note about the general issue with builds in disconnected.

Comment 45 Gabe Montero 2019-09-20 17:49:46 UTC
OK, using the AWS-Kubeconfig provided, and looking at @Wen's attempt on project wewang1, with the build config ruby-22-centos7
I'll start dumping various artifacts.

Here are the ICSPs.  You'll see an entry for docker.io/wewang58/ruby-22-centos7 in there.  I don't have the expertise to fully
know if it is correct, but it seems OK.

gmontero ~/QE_bzs/disconnected $ oc get imagecontentsourcepolicy --all-namespaces -o yaml 
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    creationTimestamp: "2019-09-20T01:38:51Z"
    generation: 1
    name: image-policy-0
    resourceVersion: "435"
    selfLink: /apis/operator.openshift.io/v1alpha1/imagecontentsourcepolicies/image-policy-0
    uid: 6611d381-db47-11e9-b240-021d1471e24a
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/ocp/release
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    creationTimestamp: "2019-09-20T01:38:51Z"
    generation: 1
    name: image-policy-1
    resourceVersion: "436"
    selfLink: /apis/operator.openshift.io/v1alpha1/imagecontentsourcepolicies/image-policy-1
    uid: 662e9c12-db47-11e9-b240-021d1471e24a
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/ocp/release
      source: registry.svc.ci.openshift.org/ocp/release
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    creationTimestamp: "2019-09-20T02:57:07Z"
    generation: 1
    name: image-policy-centos
    resourceVersion: "30937"
    selfLink: /apis/operator.openshift.io/v1alpha1/imagecontentsourcepolicies/image-policy-centos
    uid: 54ffa7de-db52-11e9-8952-02b2fa52eb60
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7
      source: docker.io/wewang58/ruby-22-centos7
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    creationTimestamp: "2019-09-20T05:36:28Z"
    generation: 1
    name: image-policy-ruby22
    resourceVersion: "75832"
    selfLink: /apis/operator.openshift.io/v1alpha1/imagecontentsourcepolicies/image-policy-ruby22
    uid: 9820c333-db68-11e9-a378-06d03e4c03ea
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7
      source: docker.io/centos/ruby-22-centos7
- apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    creationTimestamp: "2019-09-20T08:30:37Z"
    generation: 1
    name: image-policy-wzheng
    resourceVersion: "126387"
    selfLink: /apis/operator.openshift.io/v1alpha1/imagecontentsourcepolicies/image-policy-wzheng
    uid: ec37266e-db80-11e9-9592-02b2fa52eb60
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/nodeshift/centos7-s2i-nodejs
      source: docker.io/nodeshift/centos7-s2i-nodejs

Comment 46 Gabe Montero 2019-09-20 17:51:31 UTC
Here is the registries.conf the build creates after accessing the ICSPs.  I see the docker.io/wewang58/ruby-22-centos7 entry there, pointing to the same mirror as the ICSP:

apiVersion: v1
data:
  registries.conf: |
    unqualified-search-registries = ["docker.io"]

    [[registry]]
      prefix = ""
      location = "docker.io/centos/ruby-22-centos7"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7"

    [[registry]]
      prefix = ""
      location = "docker.io/wewang58/ruby-22-centos7"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/ocp/release"

    [[registry]]
      prefix = ""
      location = "registry.svc.ci.openshift.org/ocp/release"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/ocp/release"
kind: ConfigMap

Comment 47 Gabe Montero 2019-09-20 17:52:54 UTC
The local imagestream oc new-build creates by default is odd, and this adds validation to our theory.  You'll see the import did not work.

gmontero ~/QE_bzs/disconnected $ oc get is -o yaml
apiVersion: v1
items:
- apiVersion: image.openshift.io/v1
  kind: ImageStream
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewBuild
    creationTimestamp: "2019-09-20T03:05:14Z"
    generation: 1
    labels:
      build: ruby-22-centos7
    name: ruby-22-centos7
    namespace: wewang1
    resourceVersion: "34828"
    selfLink: /apis/image.openshift.io/v1/namespaces/wewang1/imagestreams/ruby-22-centos7
    uid: 7739aee2-db53-11e9-b6f4-0a580a820024
  spec:
    lookupPolicy:
      local: false
  status:
    dockerImageRepository: image-registry.openshift-image-registry.svc:5000/wewang1/ruby-22-centos7
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 48 Gabe Montero 2019-09-20 17:56:59 UTC
Lastly, I edited the buildconfig to remove any From references in the strategy.

    source:
      dockerfile: FROM docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
      type: Dockerfile
    strategy:
      dockerStrategy: {}
      type: Docker
    successfulBuildsHistoryLimit: 5
    triggers:
    - github:
        secret: OdBXFputu9jqPMjIg111
      type: GitHub
    - generic:
        secret: MoYxBFDnGKKz_pXN4gjk
      type: Generic
    - type: ConfigChange


When I ran the build, the Pulling image reference is now correct, in that it has the sha:

Pulling image docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa ...

But the pull fails for a different reason than before:

error: build error: failed to pull image: After retrying 2 times, Pull image still failed due to error: while pulling "docker://docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa" as "docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa": Error initializing source docker://wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: pinging docker registry returned: Get https://registry-1.docker.io/v2/: Forbidden


I can pull that sha locally

gmontero ~/QE_bzs/disconnected $ docker pull docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: Pulling from wewang58/ruby-22-centos7




Digest: sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
Status: Downloaded newer image for wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
docker.io/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
gmontero ~/QE_bzs/disconnected $ 

So progress, but something is still amiss.  Not sure what it is at first blush.

Comment 49 Gabe Montero 2019-09-20 17:58:26 UTC
I've moved out to 4.3.

I have a draft of the "this doesn't work" release note queued up, but will spend some more time understanding 

Error initializing source docker://wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: pinging docker registry returned: Get https://registry-1.docker.io/v2/: Forbidden

before submitting the draft.

Comment 50 Gabe Montero 2019-09-20 18:44:00 UTC
Of course if the mirroring is correct, when does wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa get translated to the mirror location?

Comment 51 Gabe Montero 2019-09-20 19:01:59 UTC
And is there perhaps a token that is needed to access the mirror ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000 ?

Presumably it is a registry we have to log into and get a token, no?

I cannot pull from that registry on my own, as I get a cert error ... though 

@Wen - going through this admittedly long bugzilla, I do see our exchanges about the mirror's cert being in the config map 
the build controller mounts into the build pod, but I'm not finding anything about the mirror's auth token.

Perhaps you or Wenjing could provide the kubeadmin password for the AWS.kubeconfig file you provided, and I could see about 
logging into the console to get a token, and try the docker login / oc registry login, to try and at least get the token,
and create the secret for that.

Or if you do know that was not done perhaps you can do that in the wewang1 project for the BC I have modified.

I'm guessing the answer to my question in #comment 50 is that it is "under the covers" (i.e in containers/image), and the Forbidden error stems from
the fact that the mapping to the mirror has occurred "under the covers", and if in fact we do not have the token/creds for the mirror registry,
that is why it is Forbidden.

Comment 52 Miloslav Trmač 2019-09-20 22:27:40 UTC
(In reply to Gabe Montero from comment #50)
> Of course if the mirroring is correct, when does
> wewang58/ruby-22-centos7@sha256:
> fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa get
> translated to the mirror location?

The c/image library, transparently to CRI-O or c/buildah or openshift/builder. I’m afraid that does not currently show up in logs, even if accesses to that fail. So, the
> pinging docker registry returned: Get https://registry-1.docker.io/v2/: Forbidden
error means “all accesses to mirrors, if any, failed in some way that is not reported; then the access to the actual docker.io registry failed with a Forbidden error.”  (Is the Forbidden to docker.io consistent with the cluster setup?)




We definitely need to improve that, but right now, the most practical way to figure out what is going on would, I think, be to do the mapping manually and see what `podman pull` reports (or, similarly, but as it turns out below, not quite equivalently, kick off an OpenShift build that references the mirror directly):
> podman --log-level=debug pull docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa
or so.  Outside of the cluster, with no configuration at all, I get
> ERRO[0001] error pulling image "docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa": unable to pull docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: unable to pull image: Error initializing source docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: pinging docker registry returned: Get https://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/v2/: x509: certificate signed by unknown authority 
and with … pull --tls-verify=false …
> ERRO[0001] error pulling image "docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa": unable to pull docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: unable to pull image: Error initializing source docker://ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7@sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa: Error reading manifest sha256:fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa in ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000/wewang58/ruby-22-centos7: unauthorized: authentication required 

So, yes, the build must be configured to trust that registry, and authentication is probably required.
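
The manual mapping described above, and the mirrors-first, upstream-last order, can be sketched in Python as follows. This is only an approximation of what c/image does internally, under simplified matching rules (for example, a bare host location would be confused by a port suffix). The function name pull_candidates and the dict layout of the registry entries are assumptions for the example; the locations are taken from the registries.conf in comment 46.

```python
def pull_candidates(image_ref, registries):
    """Return locations to try in order: matching mirrors first, then the source."""
    is_digest = "@sha256:" in image_ref
    best = None
    for reg in registries:
        loc = reg["location"]
        # Match the whole location followed by a digest, tag, or path separator,
        # and prefer the longest matching location.
        if image_ref == loc or image_ref.startswith((loc + "@", loc + ":", loc + "/")):
            if best is None or len(loc) > len(best["location"]):
                best = reg
    candidates = []
    if best and (is_digest or not best.get("mirror_by_digest_only", True)):
        suffix = image_ref[len(best["location"]):]
        candidates = [mirror + suffix for mirror in best["mirrors"]]
    # The original location is always tried last.
    return candidates + [image_ref]


mirror_host = "ec2-18-219-50-21.us-east-2.compute.amazonaws.com:5000"
registries = [
    {"location": "docker.io/centos/ruby-22-centos7",
     "mirror_by_digest_only": True,
     "mirrors": [mirror_host + "/centos/ruby-22-centos7"]},
    {"location": "docker.io/wewang58/ruby-22-centos7",
     "mirror_by_digest_only": True,
     "mirrors": [mirror_host + "/wewang58/ruby-22-centos7"]},
]
ref = ("docker.io/wewang58/ruby-22-centos7@sha256:"
       "fda21bc1af022fb34abbecb798a0cb1a37c82ef159b57df3e21688a6adcef9fa")
candidates = pull_candidates(ref, registries)
for candidate in candidates:
    print(candidate)
```

If every mirror candidate fails, only the error from the last entry (the upstream registry) is surfaced, which matches the Forbidden error seen in the build log.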

If I’m reading https://github.com/openshift/builder/blob/04c78176099139a5d229578a9a98ed2e1d17a19d/pkg/build/builder/daemonless.go#L275 and surrounding code right, the build pod can actually receive all the necessary secrets (for the upstream repository as well as all mirrors) via $PULL_DOCKERCFG_PATH, but the “pull image” path code is structured in a way that only supports passing along exactly one secret, a secret that matches the upstream repository (i.e. none of the mirrors); unlike e.g. https://github.com/openshift/builder/blob/04c78176099139a5d229578a9a98ed2e1d17a19d/pkg/build/builder/daemonless.go#L142 , which at least in principle seems to support providing multiple secrets.

I can’t see anything openshift/builder is explicitly doing to manage TLS trusted CAs; https://github.com/openshift/openshift-controller-manager/blob/bf63394ad3ad412202d00792612e9b5fbfd4dd27/pkg/build/controller/strategy/util.go#L506  presumably already works for all kinds of registries and is not directly affected by the builder code, and that one should work for the mirrors in the usual way (but it might have to be explicitly configured).

Comment 53 Wenjing Zheng 2019-09-23 10:09:17 UTC
Gabe, I have sent necessary authentication to you in the email. 

And after podman login with correct username/password, I can podman pull the image:
# podman --log-level=debug pull docker://ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7:latest
<snip>
DEBU[0280] set names of image "e42d0dccf073123561d83ea8bbc9f0cc5e491cfd07130a464a416cdb99ced387" to [ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7:latest] 
DEBU[0280] saved image metadata "{}"                    
DEBU[0280] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7:latest" 
e42d0dccf073123561d83ea8bbc9f0cc5e491cfd07130a464a416cdb99ced387

Comment 54 Gabe Montero 2019-09-23 18:48:34 UTC
OK, I was able to log onto @Wenjing's cluster (fyi Wenjing, oc complained about the format of the kubeconfig you emailed, but I was able to pull the api server address from it and use the kubeadmin password you provided)

And I was able to use the cert, id, password to pull from the mirror via podman from my system.

I then validated the latest flavor of build config QE has, and confirmed that the auth/cert are getting injected into the build pod, that the registries.conf file is properly created, and that the basic gist of Miloslav's theory in #comment 52 holds, though in addition to the links he noted, 
there is also https://github.com/openshift/builder/blob/04c78176099139a5d229578a9a98ed2e1d17a19d/pkg/build/builder/daemonless.go#L75-L80 for pulling the FROM image, and that is where we hit problems.

More changes are in fact needed in openshift/builder to facilitate the disconnected scenario.

Gory details:

1) the latest version of QE's BC does the trick, where they massaged the oc new-build generated BC as we discussed earlier ... this time, they had both a dockerfile: FROM and a dockerStrategy, but they made sure 
that SHA image refs, and no imagestreams, are used, and they included the pull secret for the mirrored registry.  I also added BUILD_LOGLEVEL=8 to get the confirming debug we've been needing/asking for.

    source:
      dockerfile: FROM docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae
      type: Dockerfile
    strategy:
      dockerStrategy:
        env:
        - name: BUILD_LOGLEVEL
          value: "8"
        from:
          kind: DockerImage
          name: docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae
        pullSecret:
          name: pull
      type: Docker

Both the removal of imagestreams and inclusion of the pull secret were items missing from previous runs.

2) I used oc debug on Wenjing's existing build pods to confirm that 
  a) the build secret with CAs cert mounting worked.  I confirmed that the disconnected cert she sent via email was in /etc/pki/ca-trust/extracted/tls-ca-bundle.pem
  b) the pull secret she specified there was a dockerconfigjson secret that included an entry for her mirror.  In particular, /run/secrets/openshift.io/pull/.dockerconfigjson
     contained:

		"ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000": {
			"auth": "ZHVtbXk6ZHVtbXk="
		},

     based on how that looks, I'm assuming that a token was provided for this secret vs. the username/password Wenjing provided me.
  c) we previously had confirmed the registries.conf file in the sysconfig configmap looked good after creating the ICSPs ... it still does with these latest runs

  registries.conf: |
    unqualified-search-registries = ["docker.io"]

    [[registry]]
      prefix = ""
      location = "docker.io/centos/ruby-22-centos7"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7"

    [[registry]]
      prefix = ""
      location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/ocp/release"

    [[registry]]
      prefix = ""
      location = "registry.svc.ci.openshift.org/ocp/release"
      mirror-by-digest-only = true

      [[registry.mirror]]
        location = "ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/ocp/release"


3) On re-running the build with loglevel 8, you can see Miloslav's theory confirmed: only Wenjing's creds for docker.io are being passed down.  I'll attach the entire log file separately, but 
   here is the key snippet (by the way, our running at loglevel 6 or greater triggered debug level logging in c/image, CRI-O, buildah, etc.):

Asked to pull fresh copy of "docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae".
I0923 18:16:37.240758       1 daemonless.go:544] Setting authentication for registry "docker.io" for "docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae".
time="2019-09-23T18:16:37Z" level=debug msg="parsed reference into \"[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="parsed reference into \"[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="reference \"[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\" does not resolve to an image ID"
time="2019-09-23T18:16:37Z" level=debug msg="registry \"docker.io\" is not listed in registries configuration \"/var/run/configs/openshift.io/build-system/registries.conf\", assuming it's not blocked"
time="2019-09-23T18:16:37Z" level=debug msg="parsed reference into \"[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="parsed reference into \"[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="copying \"docker://centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\" to \"docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="starting to write to image \"containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\" using blob cache in \"/var/cache/blobs\""
time="2019-09-23T18:16:37Z" level=debug msg="reference rewritten from 'docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae' to 'ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae'"
time="2019-09-23T18:16:37Z" level=debug msg="reference rewritten from 'docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae' to 'docker.io/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae'"
time="2019-09-23T18:16:37Z" level=debug msg="Trying to pull \"ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7@sha256:a18c8706118a5c4c9f1adf045024d2abf06ba632b5674b23421019ee4d3edcae\""
time="2019-09-23T18:16:37Z" level=debug msg="Credentials not found"


I'll start working on a PR.

Comment 55 Gabe Montero 2019-09-23 19:08:33 UTC
Created attachment 1618338 [details]
build log at loglevel 8

Comment 56 Gabe Montero 2019-09-23 19:15:28 UTC
FYI ... things may also be further complicated by what old fsouza provides wrt auth config:  https://github.com/openshift/builder/blob/master/vendor/github.com/fsouza/go-dockerclient/auth.go#L25-L30

Seems to be only username/password.

Doesn't seem like it handles token based pull secrets like the one Wenjing provided:


		"ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000": {
			"auth": "ZHVtbXk6ZHVtbXk="
		},

Comment 57 Gabe Montero 2019-09-23 21:18:09 UTC
OK according to https://stackoverflow.com/questions/43441454/docker-login-auth-token 

"auth" is base64 encoded username:password

And I now see that this is the case when I base64 -d the value "ZHVtbXk6ZHVtbXk="

That said, I do not believe the JSON serialization with the current fsouza struct will work
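
The decoding step can be reproduced with a short shell sketch. It assumes a base64 that accepts -d (GNU coreutils or similar). The function name decode_auth is hypothetical; only the "auth" value comes from the pull secret quoted above.

```shell
# Decode a Docker-style "auth" value into username and password.
# decode_auth is an illustrative helper, not part of any tool.
decode_auth() {
  printf '%s' "$1" | base64 -d
}

auth='ZHVtbXk6ZHVtbXk='
creds="$(decode_auth "$auth")"

# The decoded form is username:password; split on the first colon.
user="${creds%%:*}"
pass="${creds#*:}"
printf 'username=%s password=%s\n' "$user" "$pass"
```

For the value above this prints username=dummy password=dummy, so the entry is plain basic-auth credentials rather than an opaque token.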

Comment 58 Miloslav Trmač 2019-09-23 21:34:45 UTC
What is the precise type (`SecretType…`) of the secret? (Or what kind of object it is at what point in the API?) The above looks like a reasonably accurate _partial fragment_ of https://github.com/projectatomic/docker/blob/f9f056ec099cc3849d15e36f08fec50130c20073/cliconfig/configfile/file.go#L24 , with the base64 encoding that is used in ~/.docker/config.json-formatted files.  (I can’t see it in the log file, so I can’t tell whether that is what it is supposed to look like, or whether it is malformed.)

(Also, note that fsouza/go-dockerclient is really only relevant for Docker daemon connections, it’s not used at all for c/image and c/buildah ; if you see that as the place where data is lost, either the code is reusing the fsouza/go-dockerclient for not-strictly-related purposes, or something is very misconfigured to use Docker instead of CRI-O/buildah.)

Comment 59 Gabe Montero 2019-09-24 16:30:51 UTC
My concern about auth getting lost in populating the fsouza struct is a red herring with respect to the situation here.  And to clarify, the use of the fsouza structs is just an encapsulating mechanism for propagating the data through the code until the calls to set up containers/image.

I was able to confirm via unit tests that the keyring stuff manages to take the data from "auth" and populate username and password i.e. https://stackoverflow.com/questions/43441454/docker-login-auth-token

Comment 60 Gabe Montero 2019-10-08 17:29:51 UTC
Just did some bookkeeping ... I had to craft https://github.com/containers/image/pull/722 so that c/image could properly handle openshift build pull secrets (where they leveraged the legacy format via .dockercfg files and the like).

I linked it to this bug directly, since I assumed the openshift bugzilla bot would not work for https://github.com/containers

https://github.com/openshift/builder/pull/102 is showing green with the e2e's and we are in the process of final review

Comment 63 Gabe Montero 2019-10-14 18:33:07 UTC
OK the last of the changes needed to allow for authentication against a mirrored registry when performing builds has merged.

As this moves back to QE, a quick recap on the many things to reconcile when trying this again:

1) The input image reference in your builds must be by sha/digest, and that image needs to be mirrored
2) the output from the mirror command should give you the info needed to create an ImageContentSourcePolicy; that will be needed so the build controller can construct a proper registries.conf for containers/image and buildah
3) any certs needed to communicate with the mirrored registry need to be added either via a) the new global ca support introduced in 4.2, or b) via an explicit CA secret supplied on the build / build config
4) any authentication needed to communicate with the mirrored registry needs to be added as a pull secret to the build / build config

If you run into any problems, please run the build with loglevel 8 (i.e. set the BUILD_LOGLEVEL env var on the build config) and gather the log.  With that, we should get both openshift build and containers/image/buildah debug info to see where things are breaking down.
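
As a rough sketch of how steps 2 and 4 and the loglevel setting might look from the CLI: the source/mirror pair below is the one that appears in the build log earlier in this bug, while the policy name, build config name, secret name, and the configure_build helper are all placeholders. The oc calls need a logged-in session, so they are wrapped in a function that is not invoked here.

```shell
# Sketch only: policy name, build config name, and secret name are placeholders.
manifest="$(mktemp)"

# Step 2: ImageContentSourcePolicy mapping the source repository to its mirror.
# The source/mirror pair is the one seen in the build log in this bug.
cat > "$manifest" <<'EOF'
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example-mirror
spec:
  repositoryDigestMirrors:
  - source: docker.io/centos/ruby-22-centos7
    mirrors:
    - ec2-18-221-93-104.us-east-2.compute.amazonaws.com:5000/centos/ruby-22-centos7
EOF

# Steps 2 and 4 plus debug logging; requires a logged-in oc session.
configure_build() {
  bc="$1"       # build config name (placeholder)
  secret="$2"   # pull secret for the mirror registry (placeholder)
  oc apply -f "$manifest" &&
  oc secrets link builder "$secret" &&
  oc set build-secret --pull "bc/$bc" "$secret" &&
  oc set env "bc/$bc" BUILD_LOGLEVEL=8
}

echo "manifest written to $manifest"
# Example call once logged in (names are hypothetical):
# configure_build my-buildconfig pullsecret
```

Step 3 (CA certificates) is left out of the sketch, since which of the two options applies depends on the cluster.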

Comment 66 Wenjing Zheng 2019-11-29 07:03:14 UTC
Created attachment 1640587 [details]
Build log with log level 8

Comment 67 Gabe Montero 2019-12-01 20:35:21 UTC
According to your repro steps @Wenjing you only linked your secret to the default SA:

$ oc secrets link default pullsecret --for=pull

You need to link it to the builder SA in order for it to get picked up.

Please retry, this time linking it to the builder SA as well, and we'll go from there.

Comment 68 Wenjing Zheng 2019-12-02 07:20:35 UTC
Actually, I had tried linking my secret to the builder SA, and the build still failed back then. But I have now figured out why it failed.

If I create the secret with the command below, the build fails:
$ oc create secret docker-registry pullsecret \
    --docker-server=upshift.mirror-registry.qe.devcluster.openshift.com:5000 \
    --docker-username=xxxx \
    --docker-password=xxxx \
    --docker-email=wzheng

If I create the secret with the commands below, the build succeeds:
$ docker login upshift.mirror-registry.qe.devcluster.openshift.com:5000 -u xxxx -p xxxx
$ oc create secret generic pull --from-file=.dockerconfigjson=/home/wzheng/.docker/config.json --type=kubernetes.io/dockerconfigjson

Anyway, this is not related to the current bug, so I will move this bug to verified on 4.3.0-0.nightly-2019-11-29-051144.

Comment 69 Ben Parees 2019-12-02 13:44:51 UTC
I believe that "oc create secret docker-registry pullsecret" creates a ".dockercfg" secret, which is a different format from a "dockerconfigjson" secret (dockerconfigjson is the newer format).  I would have expected both secrets to work, however.  It might be worth investigating a little further whether we lost the ability for builds to use ".dockercfg" secrets at some point, or whether in particular they are not working with the mirroring logic.

Comment 70 Gabe Montero 2019-12-02 19:35:06 UTC
Actually Ben I just tried it and it creates a dockerconfigjson secret:

gmontero ~ $ oc create secret docker-registry pullsecret --docker-server=upshift.mirror-registry.qe.devcluster.openshift.com:5000 --docker-username=xxxx --docker-password=xxxx --docker-email=wzheng

secret/pullsecret created
gmontero ~ $ oc get secret pullsecret -o yaml 
apiVersion: v1
data:
  .dockerconfigjson: eyJhdXRocyI6eyJ1cHNoaWZ0Lm1pcnJvci1yZWdpc3RyeS5xZS5kZXZjbHVzdGVyLm9wZW5zaGlmdC5jb206NTAwMCI6eyJ1c2VybmFtZSI6Inh4eHgiLCJwYXNzd29yZCI6Inh4eHgiLCJlbWFpbCI6Ind6aGVuZ0ByZWRoYXQuY29tIiwiYXV0aCI6ImVIaDRlRHA0ZUhoNCJ9fX0=
kind: Secret
metadata:
  creationTimestamp: "2019-12-02T19:09:27Z"
  name: pullsecret
  namespace: ggmtest
  resourceVersion: "39465"
  selfLink: /api/v1/namespaces/ggmtest/secrets/pullsecret
  uid: f2d40f7d-cae9-4de8-8a7b-6e8ffd329cc6
type: kubernetes.io/dockerconfigjson
gmontero ~ $ 

That said, I know .dockercfg format works because I had to submit a fix to containers/image in order to get the image registry pull secret to work for builds, since that still uses .dockercfg format.

Using the second form, oc create secret generic vs. oc create secret docker-registry, creates a very similar secret, except the value for the key ".dockerconfigjson" is different, since you are pointing it at your entire config.json file.

That would imply to me that containers/image does not like the format of the data stored in the ".dockerconfigjson" entry when one uses "oc create secret docker-registry ..", though in looking at the params for that command, perhaps the option

 --generator='secret-for-docker-registry/v1': The name of the API generator to use.

is needed to create the secret in a format containers/image wants.

At the end of the day, with whatever new bug is opened, the question is who we assign the debug/diagnosis to ... either the containers team or us.
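
One way to compare the two secret variants is to decode the ".dockerconfigjson" value and look at the structure of each. A minimal sketch using the value from the "oc get secret pullsecret -o yaml" output above; it assumes base64 -d and sed are available, and the variable names are illustrative only.

```shell
# .dockerconfigjson value copied from the "oc get secret pullsecret -o yaml" output above.
data='eyJhdXRocyI6eyJ1cHNoaWZ0Lm1pcnJvci1yZWdpc3RyeS5xZS5kZXZjbHVzdGVyLm9wZW5zaGlmdC5jb206NTAwMCI6eyJ1c2VybmFtZSI6Inh4eHgiLCJwYXNzd29yZCI6Inh4eHgiLCJlbWFpbCI6Ind6aGVuZ0ByZWRoYXQuY29tIiwiYXV0aCI6ImVIaDRlRHA0ZUhoNCJ9fX0='

# First layer: the secret data is base64-encoded JSON.
json="$(printf '%s' "$data" | base64 -d)"
printf '%s\n' "$json"

# Second layer: the nested "auth" field is base64-encoded username:password.
auth="$(printf '%s' "$json" | sed -n 's/.*"auth":"\([^"]*\)".*/\1/p')"
creds="$(printf '%s' "$auth" | base64 -d)"
printf '%s\n' "$creds"
```

Running the same two steps against the secret built from config.json would show where the two payloads differ, which may help decide where a follow-up bug belongs. On a live cluster, "oc extract secret/pullsecret --to=-" should print the decoded first layer directly.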

Comment 74 errata-xmlrpc 2020-01-23 11:05:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

