Bug 1912590

Summary: publicImageRepository not being populated
Product: OpenShift Container Platform
Reporter: brad.williams
Component: ImageStreams
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA
QA Contact: Wenjing Zheng <wzheng>
Severity: high
Priority: urgent
Version: 4.6
CC: aos-bugs, ccoleman, jokerman, mfojtik, skuznets, wking
Keywords: Regression, UpcomingSprint
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: Newly created imagestreams were not decorated with publicDockerImageRepository, so they were cached without this value.
Consequence: Watchers did not receive the correct publicDockerImageRepository for newly created objects.
Fix: Decorate imagestreams when they are created.
Result: Watchers (oc get -w) receive imagestreams with the correct publicDockerImageRepository.
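The ordering problem behind the Fix line can be illustrated with a toy model. The following is a minimal Python sketch, not the actual OpenShift API server code: decorate, ImageStreamStore, and the PUBLIC_REGISTRY_HOST value are hypothetical names chosen for illustration. The repository format (public host, then namespace, then name) follows the verified example later in this report. The point is only that if an object is cached and emitted to watchers before it is decorated, the ADDED event lacks publicDockerImageRepository.

```python
import copy

# Hypothetical public registry hostname; on a real cluster it would come from
# the image registry's default route.
PUBLIC_REGISTRY_HOST = "default-route-openshift-image-registry.apps.example.com"


def decorate(stream, public_host):
    """Return a copy of `stream` with status.publicDockerImageRepository set."""
    decorated = copy.deepcopy(stream)
    meta = decorated["metadata"]
    # Assumption: with no public route there is nothing to decorate with.
    if public_host:
        decorated.setdefault("status", {})["publicDockerImageRepository"] = (
            f"{public_host}/{meta['namespace']}/{meta['name']}"
        )
    return decorated


class ImageStreamStore:
    """Toy store that caches created objects and records watch events (hypothetical)."""

    def __init__(self, public_host, decorate_on_create):
        self.public_host = public_host
        self.decorate_on_create = decorate_on_create
        self.cache = {}
        self.events = []  # what a watcher such as `oc get -w` would receive

    def create(self, stream):
        obj = copy.deepcopy(stream)
        if self.decorate_on_create:
            # The fix: decorate before the object is cached and sent to watchers.
            obj = decorate(obj, self.public_host)
        key = (obj["metadata"]["namespace"], obj["metadata"]["name"])
        self.cache[key] = obj
        self.events.append(("ADDED", obj))
        return obj
```

With decorate_on_create=False the first ADDED event carries only whatever status the creator supplied, which matches the symptom in the comments below; with decorate_on_create=True the field is present from the first event.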
Last Closed: 2021-02-24 15:49:43 UTC
Type: Bug

Description brad.williams 2021-01-04 20:37:09 UTC
Description of problem:
While investigating issues from migrating the release-controller, and the imagestreams it monitors, from the api.ci cluster (3.11) to the app.ci cluster (4.6), we observed that the publicDockerImageRepository status field isn't always populated.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
At the time, this was occurring continuously, so much so that we had to add a safeguard in the release-controller to prevent this issue from failing releases:
https://github.com/openshift/release-controller/pull/240

Actual results:
The publicDockerImageRepository field was not populated until some time after the imagestream was created, by which point downstream processes may already have used the imagestream.

Expected results:
The publicDockerImageRepository field should always be populated.

Comment 2 Steve Kuznetsov 2021-01-04 20:50:22 UTC
It is always populated, but only after some time: when a new ImageStream is created, a client with an open Watch sees the ImageStream without this status field.
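A watch client can defend against this by treating an image stream without the field as not ready yet and re-reading it later. The sketch below is hypothetical and is not the safeguard from release-controller PR 240: handle_event and the requeue callback are illustrative names, and events are assumed to be plain dicts decoded from the watch JSON.

```python
def public_repository(stream):
    """Return status.publicDockerImageRepository, or None if it is missing or empty."""
    value = (stream.get("status") or {}).get("publicDockerImageRepository")
    return value or None


def handle_event(event_type, stream, requeue):
    """Process one watch event, deferring image streams that lack the public repository.

    `requeue` is a hypothetical callback that schedules the object to be
    re-read later, for example through a work queue with backoff.
    """
    if event_type == "DELETED":
        return None
    repo = public_repository(stream)
    if repo is None:
        # Not decorated yet: do not act on it, look at it again later.
        meta = stream["metadata"]
        requeue(meta["namespace"], meta["name"])
        return None
    return repo
```

Deferring rather than failing keeps the client correct whether the missing field comes from this bug or from a cluster that has no public route yet.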

Comment 3 Stefan Schimanski 2021-01-05 08:57:47 UTC
Sending to Adam. Reducing severity because, to my knowledge, there is no production outage or data loss involved. Please comment if there is.

Comment 4 Adam Kaplan 2021-01-05 13:06:44 UTC
Sending to Oleg, as his team owns ImageStreams.

Comment 5 Clayton Coleman 2021-01-05 16:00:36 UTC
This completely breaks the consistency of any client built on an image stream watch. I'm bumping it back to high until I get a determination of why it broke; it completely broke the release controller.

From the moment the public route is created, there is NEVER any reason for this value to be empty unless the user deletes all the routes. Any flakiness here probably points to a broken operator.

Comment 6 Clayton Coleman 2021-01-05 16:05:02 UTC
There is no other place in our API where we would accept "sometimes the API returns incorrect info to watch"; anywhere else, that has been treated as a significant bug that imposes significant costs on clients. Therefore, we fix it.

Comment 7 Oleg Bulatov 2021-01-06 15:12:50 UTC
So far I haven't been able to reproduce this problem, but I can occasionally observe it on build02.

To observe events, I used curl '.../apis/image.openshift.io/v1/imagestreams?resourceVersion=...&watch=true'. publicDockerImageRepository is usually populated, except in some events with type="ADDED". I don't see any pattern; previous and subsequent events in the same watch request have this field correctly populated.

Example of an incorrect event:

{"type":"ADDED","object":{"kind":"ImageStream","apiVersion":"image.openshift.io/v1","metadata":{"name":"pipeline","namespace":"ci-op-kimvmsws","selfLink":"/apis/image.openshift.io/v1/namespaces/ci-op-kimvmsws/imagestreams/pipeline","uid":"b8dfe90b-40b8-42a9-ab0e-26b8b8f63440","resourceVersion":"204693272","generation":1,"creationTimestamp":"2021-01-06T15:06:46Z","managedFields":[{"manager":"ci-operator","operation":"Update","apiVersion":"image.openshift.io/v1","time":"2021-01-06T15:06:46Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:lookupPolicy":{"f:local":{}}}}}]},"spec":{"lookupPolicy":{"local":true}},"status":{"dockerImageRepository":"image-registry.openshift-image-registry.svc:5000/ci-op-kimvmsws/pipeline"}}}

There are also events with tags (both in spec and status) but without publicDockerImageRepository.
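The manual check described in this comment can be automated. The following Python sketch (the function name find_undecorated is hypothetical) scans a saved watch response body, which is a sequence of JSON event objects, and reports ImageStream events that lack status.publicDockerImageRepository. It uses only the standard json module.

```python
import json


def find_undecorated(stream_text):
    """Yield (type, namespace, name) for ImageStream watch events that are
    missing status.publicDockerImageRepository."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip the newlines
        # between concatenated events ourselves.
        while pos < len(stream_text) and stream_text[pos].isspace():
            pos += 1
        if pos >= len(stream_text):
            break
        event, pos = decoder.raw_decode(stream_text, pos)
        obj = event.get("object") or {}
        if obj.get("kind") != "ImageStream":
            continue  # e.g. a Status object sent with an ERROR event
        if not (obj.get("status") or {}).get("publicDockerImageRepository"):
            meta = obj.get("metadata") or {}
            yield event.get("type"), meta.get("namespace"), meta.get("name")
```

For example, save the output of the curl watch request above to a file and pass its contents to find_undecorated; each reported tuple corresponds to an event like the incorrect one shown in this comment.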

Comment 8 Oleg Bulatov 2021-01-07 13:37:34 UTC
I was able to reproduce it on 4.7-nightly and 4.6.0, but I cannot reproduce it on 4.5.24, so it appears to be a regression in 4.6.

Comment 11 Wenjing Zheng 2021-02-07 06:39:20 UTC
Verified on 4.7.0-0.nightly-2021-02-03-165316:
  status:
    dockerImageRepository: image-registry.openshift-image-registry.svc:5000/wzheng1/rails-postgresql-example
    publicDockerImageRepository: default-route-openshift-image-registry.apps.wsun47kuryr.0204-kdj.qe.rhcloud.com/wzheng1/rails-postgresql-example

Comment 14 errata-xmlrpc 2021-02-24 15:49:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633