Description of problem:
The following test fails:

[Feature:Builds][Conformance] oc new-app should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]

with the following message:

The build "xxxx" status is "Failed"
Caused by: error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d" as "registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d": Error determining manifest MIME type for docker://registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d: unable to retrieve auth token: invalid username/password

Version-Release number of selected component (if applicable):
4.1

How reproducible:
Run a *-master-e2e-aws test.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Test to succeed

Additional info:
Failed builds:
pull/openshift_console-operator/211/pull-ci-openshift-console-operator-master-e2e-aws/1333/build-log.txt
pull/operator-framework_operator-marketplace/167/pull-ci-operator-framework-operator-marketplace-master-e2e-aws/809/build-log.txt
pull/openshift_cluster-kube-apiserver-operator/437/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws/2089/build-log.txt
pull/openshift_cluster-svcat-apiserver-operator/46/pull-ci-openshift-cluster-svcat-apiserver-operator-master-e2e-aws/97/build-log.txt
pull/openshift_cluster-svcat-apiserver-operator/44/pull-ci-openshift-cluster-svcat-apiserver-operator-master-e2e-aws/95/build-log.txt
pull/openshift_cluster-image-registry-operator/259/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1314/build-log.txt
pull/openshift_cluster-samples-operator/137/pull-ci-openshift-cluster-samples-operator-master-e2e-aws/498/build-log.txt
pull/openshift_installer/1649/pull-ci-openshift-installer-master-e2e-aws/5516/build-log.txt
pull/openshift_installer/1639/pull-ci-openshift-installer-master-e2e-aws/5522/build-log.txt
pull/22403/pull-ci-openshift-origin-master-e2e-aws/7718/build-log.txt
pull/openshift_release/3565/rehearse-3565-pull-ci-openshift-cluster-update-keys-master-e2e-aws/4/build-log.txt
pull/openshift_cluster-monitoring-operator/314/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws/754/build-log.txt
pull/openshift_cluster-monitoring-operator/334/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws/761/build-log.txt
pull/openshift_elasticsearch-operator/125/pull-ci-openshift-elasticsearch-operator-master-e2e-aws/538/build-log.txt
pull/22369/pull-ci-openshift-origin-master-e2e-aws/7666/build-log.txt
pull/22571/pull-ci-openshift-origin-master-e2e-aws/7658/build-log.txt
pull/openshift_cluster-authentication-operator/112/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws/573/build-log.txt
pull/openshift_cluster-authentication-operator/115/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws/574/build-log.txt
pull/openshift_cluster-authentication-operator/115/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws/576/build-log.txt
pull/openshift_cluster-authentication-operator/111/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws/565/build-log.txt
pull/22581/pull-ci-openshift-origin-master-e2e-aws-builds/1440/build-log.txt
pull/openshift_console/1478/pull-ci-openshift-console-master-e2e-aws/912/build-log.txt
pull/openshift_jenkins/842/pull-ci-openshift-jenkins-master-e2e-aws/219/build-log.txt
pull/openshift_cluster-version-operator/170/pull-ci-openshift-cluster-version-operator-master-e2e-aws/633/build-log.txt
pull/openshift_machine-api-operator/296/pull-ci-openshift-machine-api-operator-master-e2e-aws/984/build-log.txt
pull/openshift_machine-config-operator/613/pull-ci-openshift-machine-config-operator-master-e2e-aws/3303/build-log.txt
release-openshift-origin-installer-e2e-aws-4.1/121/build-log.txt
release-openshift-origin-installer-e2e-aws-4.1/129/build-log.txt
Linking to 1694878 - potentially related
41 out of the last 175 (23%) CI failures have the symptoms in this bug report.
@Corey We need to add additional debugging to this test to figure out why the build is trying to pull from registry.redhat.io. The test _should_ be referencing the appropriate nodejs imagestreamtag on the cluster registry. Try dumping the following on failure:
1. The nodejs imagestream YAML
2. The YAML for the BuildConfig created by new-app
Looking at https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_console-operator/211/pull-ci-openshift-console-operator-master-e2e-aws/1333/, based on this:

--> Found image 2413420 (2 weeks old) in image stream "openshift/nodejs" under tag "10" for "nodejs"

I'm pretty confident the BuildConfig is using an appropriate reference to the openshift/nodejs:10 imagestreamtag. So that means either:

1) the build controller, when resolving the imagestreamtag to an image reference, didn't properly resolve it as a "local" reference, or
2) the imagestreamtag status itself isn't properly populated such that it points to the internal registry. I'm not sure whether that's a function of when the tag is imported or when the tag is resolved, but my likely suspect would be that the openshift apiserver's config doesn't have the registry's internal hostname set properly at the time this is being set.

Since this is happening as a flake, (2) seems most likely.
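To make theory (2) concrete, here is a minimal Go sketch of the resolution rule as this thread describes it. It is not the actual build controller code; `tagEvent` and `resolvePullSpec` are hypothetical names. If the imagestream's internal repository is known, the tag resolves to a by-digest pull through the cluster registry; if it is empty, the only reference left is the external registry.redhat.io one, which matches the failing pull spec in the report.

```go
package main

import "fmt"

// tagEvent is a hypothetical, simplified stand-in for an imagestream tag's
// status entry: the external reference it was imported from plus its digest.
type tagEvent struct {
	DockerImageReference string // external pull spec, e.g. registry.redhat.io/...@sha256:...
	Image                string // image digest, e.g. sha256:...
}

// resolvePullSpec returns the pull spec a build would use for a tag.
// internalRepo is the imagestream's internal repository, e.g.
// image-registry.openshift-image-registry.svc:5000/openshift/nodejs;
// it is empty when the internal registry hostname was not known.
func resolvePullSpec(internalRepo string, tag tagEvent) string {
	if internalRepo != "" {
		// Hostname known: pull by digest through the cluster registry
		// (assumption: the cluster registry pulls from registry.redhat.io
		// on the build's behalf).
		return internalRepo + "@" + tag.Image
	}
	// Hostname unknown: fall back to the external reference, which the
	// build in this report failed to authenticate against.
	return tag.DockerImageReference
}

func main() {
	digest := "sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d"
	tag := tagEvent{
		DockerImageReference: "registry.redhat.io/rhoar-nodejs/nodejs-10@" + digest,
		Image:                digest,
	}
	fmt.Println(resolvePullSpec("image-registry.openshift-image-registry.svc:5000/openshift/nodejs", tag))
	fmt.Println(resolvePullSpec("", tag))
}
```

The second call reproduces the symptom: with no internal hostname, the resolved reference points straight at registry.redhat.io.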
Is this the same cause as in the steps below? We previously assumed the failure in the following steps was expected: the build pulls the image from registry.redhat.io, which is the product repository and requires a username/password, instead of getting it from the openshift/xxx imagestream directly.

Steps:
1. Create a project wewang1.
2. Tag the ruby image:
$ oc tag openshift/ruby:latest ruby:latest -n wewang1
$ oc get is
NAME   IMAGE REPOSITORY                                                TAGS     UPDATED
ruby   image-registry.openshift-image-registry.svc:5000/wewang1/ruby   latest   7 minutes ago
$ oc describe is ruby
Name:             ruby
Namespace:        wewang1
Created:          25 seconds ago
Labels:           <none>
Annotations:      openshift.io/image.dockerRepositoryCheck=2019-04-25T08:09:45Z
Image Repository: image-registry.openshift-image-registry.svc:5000/wewang1/ruby
Image Lookup:     local=false
Unique Images:    1
Tags:             1

latest
  tagged from openshift/ruby@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775

  * registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775
      14 seconds ago

3. Create the app:
$ oc new-app wewang1/ruby:latest~https://github.com/sclorg/ruby-ex.git
ruby-ex-1   Source   Git@c00ecd7   Failed (GenericBuildFailed)   32 seconds ago   32s
$ oc logs build/ruby-ex-1
Cloning "https://github.com/sclorg/ruby-ex.git" ...
    Commit: c00ecd7c762590f1d52c316c7d00141a745ede18 (Merge pull request #25 from pvalena/master)
    Author: Honza Horak <hhorak>
    Date:   Thu Dec 13 15:35:54 2018 +0100
Caching blobs under "/var/cache/blobs".
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775" as "registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775": Error determining manifest MIME type for docker://registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775: unable to retrieve auth token: invalid username/password
Yes, that looks the same. Can you supply the output of the following commands?
oc get is openshift/ruby -o yaml
oc get images.config.openshift.io -o yaml
Disabling test until a resolution can be found https://github.com/openshift/origin/pull/22676
Created attachment 1558913 [details] imagestream openshift/ruby yaml
Created attachment 1558914 [details] images.config.openshift.io cluster yaml
@Ben Parees, I added the attachments to the bug; please check them.
*** Bug 1703399 has been marked as a duplicate of this bug. ***
Corey, this is how the imagestream gets updated with the internal docker registry hostname on a Get:
https://github.com/openshift/origin/blob/master/pkg/image/apiserver/registry/imagestream/strategy.go#L563-L580

and this should be the code that resolves an imagestream reference in a buildconfig to a docker pull spec when a build is created:
https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L819

Note that because the internal registry hostname is added to the imagestream as part of a decorate operation during a Get, an event watcher that is caching objects could have gotten an event for the imagestream *before* the internal registry hostname was set. It would then cache that imagestream with no internal registry hostname. When the internal registry hostname is later published, no event is generated (because it's not an update to the imagestream). The user of that cached object (i.e. the build_controller or imagestream_controller that resolve imagestreams to pullspecs) would be working off a stale object with no internal registry hostname set, and would thus resolve to the external pullspec instead.

I *thought* we had safely addressed this, because the openshift controller should be restarted any time the internal registry hostname is changed, thus wiping out any cached values in the controllers. But perhaps something has changed such that this is no longer guaranteed? Either that, or we're managing to run the build before the internal registry hostname value has been propagated to the openshift apiserver. At least those are my theories.
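The stale-cache race described in this comment can be reproduced in miniature. This is a hypothetical Go model, not origin code: `registryConfig`, `imageStream`, `decorate`, and the map used as a cache are illustrative stand-ins. The internal repository is computed on read rather than stored, so a copy cached before the hostname is published keeps an empty repository, and because the stored object never changes, nothing triggers a refresh.

```go
package main

import "fmt"

// registryConfig models the apiserver-side setting that is filled in once
// the internal registry hostname is published (hypothetical stand-in).
type registryConfig struct{ hostname string }

// imageStream is a hypothetical, trimmed-down imagestream. The
// DockerImageRepository field is never persisted; decorate fills it in.
type imageStream struct {
	Namespace, Name       string
	DockerImageRepository string
}

// decorate mimics "decorate on Get": it returns a copy whose internal
// repository is derived from whatever hostname is configured right now.
func decorate(cfg *registryConfig, is imageStream) imageStream {
	if cfg.hostname != "" {
		is.DockerImageRepository = cfg.hostname + "/" + is.Namespace + "/" + is.Name
	}
	return is
}

func main() {
	cfg := &registryConfig{}
	stored := imageStream{Namespace: "openshift", Name: "nodejs"}

	// 1. A watcher sees the imagestream before the hostname is published
	//    and caches the decorated copy (repository still empty).
	cache := map[string]imageStream{"openshift/nodejs": decorate(cfg, stored)}

	// 2. The hostname is published. The stored object did not change, so
	//    in this model nothing updates the cached copy.
	cfg.hostname = "image-registry.openshift-image-registry.svc:5000"

	// 3. A fresh Get sees the hostname; the cached copy does not.
	fmt.Printf("fresh Get:   %q\n", decorate(cfg, stored).DockerImageRepository)
	fmt.Printf("cached copy: %q\n", cache["openshift/nodejs"].DockerImageRepository)
}
```

A controller reading from such a cache would see an empty internal repository and, under the resolution rule discussed earlier in the thread, fall back to the external pull spec. Restarting the controller (clearing the cache) is what makes the fresh value visible.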
I think you may be onto something, Ben. See my PR to drop the port number from the internal registry hostname [1]. If our tests were running on a clean install, the test should never have tried to pull from `:5000`. [1] https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/259/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-image-registry/723#openshift-tests-featureimagequotaregistryserialsuiteopenshiftregistryserial-image-limit-range-should-deny-a-docker-image-reference-exceeding-limit-on-openshiftioimage-tags-resource-suiteopenshift-disabledspecialconfig
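When debugging this, a test could classify the registry behind a resolved pull spec and flag a leftover `:5000` port. A minimal sketch; `registryHost` and `hasPort` are hypothetical helpers, and the host detection is a simplified version of the usual Docker reference rule (the first path component counts as a registry host only if it contains `.` or `:` or is `localhost`).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// registryHost returns the registry host (including any port) of a pull
// spec, or "" if the spec names no explicit registry. Simplified rule: the
// first path component is a host only if it contains '.' or ':' or is
// "localhost".
func registryHost(pullSpec string) string {
	i := strings.IndexByte(pullSpec, '/')
	if i < 0 {
		return "" // e.g. "ruby:latest": no registry component
	}
	first := pullSpec[:i]
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return first
	}
	return "" // e.g. "openshift/ruby": a namespace, not a host
}

// hasPort reports whether a registry host carries an explicit port such as ":5000".
func hasPort(host string) bool {
	_, port, err := net.SplitHostPort(host)
	return err == nil && port != ""
}

func main() {
	for _, spec := range []string{
		"registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775",
		"image-registry.openshift-image-registry.svc:5000/wewang1/ruby",
		"openshift/ruby:latest",
	} {
		host := registryHost(spec)
		fmt.Printf("host=%q explicitPort=%v\n", host, hasPort(host))
	}
}
```

For real reference parsing, a dedicated reference-parsing library is preferable to this hand-rolled split; the sketch only shows what a debug check on the build's pull spec could look for.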
PR to wait for registry hostname for imagestream tests: https://github.com/openshift/origin/pull/22705
Blocked by Jenkins/cri-o issue
Merged (temporarily disabling Jenkins sync test until cri-o issue is resolved)
PR to fix flaking test: https://github.com/openshift/origin/pull/22736
Tested the e2e test on my local host; it works now: http://pastebin.test.redhat.com/760597

Version:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-04-210601   True        False         4h40m   Cluster version is 4.1.0-0.nightly-2019-05-04-210601

payload:
image: registry.svc.ci.openshift.org/ocp/release@sha256:7e5686825a7cbd2fa17b0179933a8e65bdfca3af1f499fffc63f0ac101f718a0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-shared-vpc-4.3/563