Bug 1702743 - Pull image still failed due to error: while pulling "docker://registry.redhat.io/rhoar-nodejs/nodejs-10...
Summary: Pull image still failed due to error: while pulling "docker://registry.redhat...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Adam Kaplan
QA Contact: wewang
Duplicates: 1703399
Depends On:
Reported: 2019-04-24 16:01 UTC by Luis Sanchez
Modified: 2020-01-31 21:28 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2019-06-04 10:47:56 UTC
Target Upstream Version:

Attachments
imagestream openshift/ruby yaml (6.85 KB, text/plain)
2019-04-26 05:32 UTC, wewang
images.config.openshift.io cluster yaml (641 bytes, text/plain)
2019-04-26 05:33 UTC, wewang

System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1694878 0 unspecified CLOSED Unexpected `Unauthorized` errors in e2e extended tests when openshift-apiserver available==true 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:48:03 UTC

Internal Links: 1767076

Description Luis Sanchez 2019-04-24 16:01:27 UTC
Description of problem:

The following test fails:

[Feature:Builds][Conformance] oc new-app should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]

With the following message:
The build "xxxx" status is "Failed"

Caused by:
error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d" as "registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d": Error determining manifest MIME type for docker://registry.redhat.io/rhoar-nodejs/nodejs-10@sha256:cd0003f4abfa61f4e801ca498acb20120008ecd3ea7ce7bb618e06ab7b2f8b1d: unable to retrieve auth token: invalid username/password

Version-Release number of selected component (if applicable):


How reproducible:

Run a *-master-e2e-aws test.

Steps to Reproduce:

Actual results:

Expected results:

Test to succeed

Additional info:

Failed builds:

Comment 1 Adam Kaplan 2019-04-24 16:50:12 UTC
Linking to 1694878 - potentially related

Comment 2 Luis Sanchez 2019-04-24 19:06:22 UTC
41 out of the last 175 (23%) CI failures have the symptoms in this bug report.

Comment 3 Adam Kaplan 2019-04-24 21:39:13 UTC
We need to add additional debugging to this test to figure out why the build is trying to pull from registry.redhat.io. The test _should_ be referencing the appropriate nodejs imagestreamtag on the cluster registry.
Try dumping the following on failure:
1. The nodejs imagestream YAML
2. The YAML for the BuildConfig created by new-app.
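
A minimal sketch of what such a failure dump could look like, written here in Python rather than in the test suite's own language. The helper names `dump_commands` and `dump_on_failure` are hypothetical, and the `openshift` namespace for the nodejs imagestream is taken from the "openshift/nodejs" reference in the build log; only standard `oc get ... -o yaml` invocations are used.

```python
import subprocess
from typing import Callable, List


def dump_commands(build_config: str, namespace: str) -> List[List[str]]:
    """Return the oc invocations for the two objects listed above."""
    return [
        # 1. The shared nodejs imagestream (assumed to live in "openshift").
        ["oc", "get", "imagestream", "nodejs", "-n", "openshift", "-o", "yaml"],
        # 2. The BuildConfig that new-app created in the test's namespace.
        ["oc", "get", "buildconfig", build_config, "-n", namespace, "-o", "yaml"],
    ]


def dump_on_failure(build_config: str, namespace: str,
                    run: Callable = subprocess.run) -> List[str]:
    """Run each command and collect its YAML output for the failure log."""
    outputs = []
    for argv in dump_commands(build_config, namespace):
        # check=False: a failing dump must not mask the original test failure.
        result = run(argv, capture_output=True, text=True, check=False)
        outputs.append(result.stdout)
    return outputs
```

The `run` parameter exists only so the helper can be exercised without a cluster; in a real test it would default to `subprocess.run`.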

Comment 4 Ben Parees 2019-04-25 02:30:01 UTC
looking at https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_console-operator/211/pull-ci-openshift-console-operator-master-e2e-aws/1333/

based on this:
--> Found image 2413420 (2 weeks old) in image stream "openshift/nodejs" under tag "10" for "nodejs"

i'm pretty confident the buildconfig is using an appropriate reference to the openshift/nodejs:10 imagestreamtag.

So that means either:
1) the build controller, when resolving the imagestreamtag to an image reference, didn't properly resolve it as a "local" reference
2) the imagestreamtag status itself isn't properly populated such that it points to the internal registry. I'm not sure if that's a function of when the tag is imported or when the tag is resolved, but my likely suspect is that the openshift apiserver's config doesn't have the registry's internal hostname set properly at the time this is being set.

Since this is happening as a flake, (2) seems most likely.
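
To illustrate the two possibilities, here is a simplified Python model of resolving an imagestreamtag to a pull spec. It is a sketch under stated assumptions, not the build controller's actual code: the `ImageStream` and `TagEvent` classes and `resolve_pull_spec` are hypothetical, with fields loosely modelled on the imagestream's `dockerImageRepository` and the tag's `dockerImageReference`.

```python
from dataclasses import dataclass


@dataclass
class TagEvent:
    docker_image_reference: str  # external pull spec recorded at import time
    image: str                   # image digest, e.g. "sha256:..."


@dataclass
class ImageStream:
    # Internal registry repository; empty when the apiserver does not
    # (yet) know the internal registry hostname.
    docker_image_repository: str
    latest: TagEvent
    local_reference: bool = True


def resolve_pull_spec(stream: ImageStream) -> str:
    """Prefer the internal registry; fall back to the external reference."""
    if stream.local_reference and stream.docker_image_repository:
        return f"{stream.docker_image_repository}@{stream.latest.image}"
    # Case (2): no internal hostname is populated, so the build ends up
    # pulling from the external registry, which requires credentials.
    return stream.latest.docker_image_reference
```

In this model an empty `docker_image_repository` is enough to send the build to registry.redhat.io, which matches the "invalid username/password" symptom.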

Comment 5 wewang 2019-04-25 08:19:39 UTC
Is this the same cause as in my steps below? We previously thought the error in the following steps was expected,
because the image is pulled from registry.redhat.io, which is a product repository and requires a username/password, rather than being taken from the openshift/xxx imagestream directly.

1. Create a project wewang1

2. Tag ruby image
   $ oc tag openshift/ruby:latest ruby:latest -n wewang1
   $ oc get is
NAME      IMAGE REPOSITORY                                                   TAGS     UPDATED
ruby      image-registry.openshift-image-registry.svc:5000/wewang1/ruby      latest   7 minutes ago

$ oc describe is ruby 
Name:			ruby
Namespace:		wewang1
Created:		25 seconds ago
Labels:			<none>
Annotations:		openshift.io/image.dockerRepositoryCheck=2019-04-25T08:09:45Z
Image Repository:	image-registry.openshift-image-registry.svc:5000/wewang1/ruby
Image Lookup:		local=false
Unique Images:		1
Tags:			1

  tagged from openshift/ruby@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775

  * registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775
      14 seconds ago

3. Create app
$ oc new-app wewang1/ruby:latest~https://github.com/sclorg/ruby-ex.git

ruby-ex-1   Source   Git@c00ecd7   Failed (GenericBuildFailed)   32 seconds ago   32s
$ oc logs build/ruby-ex-1
Cloning "https://github.com/sclorg/ruby-ex.git" ...
	Commit:	c00ecd7c762590f1d52c316c7d00141a745ede18 (Merge pull request #25 from pvalena/master)
	Author:	Honza Horak <hhorak@redhat.com>
	Date:	Thu Dec 13 15:35:54 2018 +0100
Caching blobs under "/var/cache/blobs".
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775" as "registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775": Error determining manifest MIME type for docker://registry.redhat.io/rhscl/ruby-25-rhel7@sha256:da945b963e3350db9368a27cfee3bfab6218dc78af1aa9331687f6ee45c99775: unable to retrieve auth token: invalid username/password

Comment 6 Ben Parees 2019-04-25 13:30:32 UTC
yes that looks the same.  can you supply:

oc get is openshift/ruby -o yaml
oc get images.config.openshift.io -o yaml


Comment 7 Corey Daley 2019-04-26 00:10:41 UTC
Disabling test until a resolution can be found

Comment 8 wewang 2019-04-26 05:32:15 UTC
Created attachment 1558913 [details]
imagestream  openshift/ruby yaml

Comment 9 wewang 2019-04-26 05:33:48 UTC
Created attachment 1558914 [details]
images.config.openshift.io cluster yaml

Comment 10 wewang 2019-04-26 05:35:42 UTC
@Ben Parees, I added the attachments to the bug; please check them.

Comment 11 Ben Parees 2019-04-26 13:01:34 UTC
*** Bug 1703399 has been marked as a duplicate of this bug. ***

Comment 12 Ben Parees 2019-04-26 13:29:39 UTC
Corey, this is how the imagestream gets updated w/ the internal docker registry hostname on a Get:

and this should be the code that resolves an imagestream reference in a buildconfig, to a docker pull spec, when a build is created:

Note that because the internal registry hostname is added to the imagestream as part of a decorate operation during a Get, an event watcher that is caching objects could have gotten an event for the imagestream *before* the internal registry hostname was set. It would then cache that imagestream, with no internal registry hostname. Then when the internal registry hostname is published, no event is generated (because it's not an update to the imagestream). The user of that cached object (i.e. the build_controller or imagestream_controller, which resolve imagestreams to pullspecs) would be working off a stale object with no internal registry hostname set, and thus resolve to the external pullspec instead.

I *thought* we had safely addressed this because the openshift controller should be getting restarted any time the internal registry hostname is changed, thus wiping out any cached values in the controllers. But perhaps something has changed such that this is no longer guaranteed?

Either that or we're managing to run the build before the internal registry hostname value has been propagated to the openshift apiserver.  

At least those are my theories.
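
The stale-cache theory can be reproduced in a toy Python model. Everything here is hypothetical (the `ApiServer` class, its methods, and the dictionary keys are illustrative, not OpenShift APIs); the only behaviour taken from the comment is that the hostname is added by decoration on read, and that publishing the hostname does not generate a watch event.

```python
from typing import Callable, Dict, List, Optional


class ApiServer:
    """Toy apiserver: decorates imagestreams on read, notifies on write."""

    def __init__(self) -> None:
        self.registry_hostname: Optional[str] = None
        self._streams: Dict[str, dict] = {}
        self._watchers: List[Callable[[dict], None]] = []

    def watch(self, handler: Callable[[dict], None]) -> None:
        self._watchers.append(handler)

    def _decorate(self, stored: dict) -> dict:
        obj = dict(stored)
        # The internal repository is computed at read time, never stored.
        obj["dockerImageRepository"] = (
            f"{self.registry_hostname}/{obj['name']}"
            if self.registry_hostname else ""
        )
        return obj

    def put(self, name: str, external_ref: str) -> None:
        self._streams[name] = {"name": name, "externalRef": external_ref}
        for handler in self._watchers:
            handler(self._decorate(self._streams[name]))

    def get(self, name: str) -> dict:
        return self._decorate(self._streams[name])

    def publish_registry_hostname(self, hostname: str) -> None:
        # Config change only: no imagestream is updated, so no event fires.
        self.registry_hostname = hostname


def pull_spec(stream: dict) -> str:
    return stream["dockerImageRepository"] or stream["externalRef"]


api = ApiServer()
cache: Dict[str, dict] = {}  # stands in for a controller's informer cache
api.watch(lambda obj: cache.__setitem__(obj["name"], obj))

api.put("openshift/nodejs", "registry.redhat.io/rhoar-nodejs/nodejs-10")
api.publish_registry_hostname("image-registry.openshift-image-registry.svc:5000")

stale = pull_spec(cache["openshift/nodejs"])    # external registry
fresh = pull_spec(api.get("openshift/nodejs"))  # internal registry
```

In this model the cached object keeps resolving to the external registry until the cache is rebuilt, which is why a controller restart on hostname change would clear the problem.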

Comment 14 Adam Kaplan 2019-04-30 11:16:51 UTC
PR to wait for registry hostname for imagestream tests: https://github.com/openshift/origin/pull/22705
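
As an illustration of the general idea only (this is not the code from the PR), a test could poll until the imagestream reports an internal registry repository before starting a build. The function `wait_for_registry_hostname` and its `get_repository` callback are hypothetical names; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_registry_hostname(
    get_repository: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until get_repository() returns a non-empty internal repository.

    get_repository is assumed to read the imagestream fresh from the API
    (not from a cache) and return "" while no hostname is populated.
    """
    deadline = clock() + timeout
    while True:
        repo = get_repository()
        if repo:
            return repo
        if clock() >= deadline:
            raise TimeoutError("internal registry hostname was not published")
        sleep(interval)
```

The `sleep` and `clock` parameters are injectable so that the loop can be tested without real delays.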

Comment 15 Adam Kaplan 2019-05-03 01:27:10 UTC
Blocked by Jenkins/cri-o issue

Comment 16 Adam Kaplan 2019-05-03 12:39:58 UTC
Merged (temporarily disabling Jenkins sync test until cri-o issue is resolved)

Comment 17 Adam Kaplan 2019-05-04 18:58:20 UTC
PR to fix flaking test: https://github.com/openshift/origin/pull/22736

Comment 19 wewang 2019-05-05 06:59:14 UTC
Ran the e2e test on my local host; it works now: http://pastebin.test.redhat.com/760597
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-04-210601   True        False         4h40m   Cluster version is 4.1.0-0.nightly-2019-05-04-210601
image: registry.svc.ci.openshift.org/ocp/release@sha256:7e5686825a7cbd2fa17b0179933a8e65bdfca3af1f499fffc63f0ac101f718a0

Comment 21 errata-xmlrpc 2019-06-04 10:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

