Bug 1508563 - Image tag keeps disappearing
Summary: Image tag keeps disappearing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.8.0
Assignee: Michal Minar
QA Contact: Dongbo Yan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-01 16:44 UTC by Stefanie Forrester
Modified: 2018-03-28 14:10 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The prioritization algorithm for image stream tags could not cope with two tags of the same name where one carried an additional "v" prefix.
Consequence: One of the tags was lost during an image stream update.
Fix: The prioritization algorithm has been fixed, and the conversion functions no longer use it.
Result: Image stream tags no longer disappear.
Clone Of:
Environment:
Last Closed: 2018-03-28 14:09:47 UTC
Target Upstream Version:
Embargoed:


Attachments
Current running configuration of our docker registry (5.93 KB, text/plain) - 2017-11-01 16:44 UTC, Stefanie Forrester
master-config (7.04 KB, text/plain) - 2017-11-01 16:47 UTC, Stefanie Forrester
node-config (1.32 KB, text/plain) - 2017-11-01 16:47 UTC, Stefanie Forrester
Example of destroying a tag in real time (3.27 KB, text/plain) - 2017-11-21 22:02 UTC, Justin Pierce


Links
Red Hat Product Errata RHBA-2018:0489 (last updated 2018-03-28 14:10:51 UTC)

Description Stefanie Forrester 2017-11-01 16:44:16 UTC
Created attachment 1346634 [details]
Current running configuration of our docker registry

Description of problem:

We have a docker registry running on a 3.5 cluster that keeps losing or deleting some image tags. Every time we upload the tags again, they end up disappearing within a day or so.

Version-Release number of selected component (if applicable):

oc v3.5.5.26
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://internal.api.reg-aws.openshift.com:443
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4

image: registry.access.redhat.com/openshift3/ose-docker-registry:v3.5.5.26


How reproducible:

Every time.

Steps to Reproduce:
1. Add the tag 'v3.7.0' to an image.
2. Wait 12-24 hours.
3. Check whether the tag is still there: 'oc get is -n openshift3 ose -o yaml' (see the example commands below).
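
For illustration only, the push and the later check could look roughly like this (the image ID is a placeholder; the registry host is taken from this report):

# hypothetical sketch: push the tag, then re-check the imagestream later
docker tag <image-id> registry.reg-aws.openshift.com/openshift3/ose:v3.7.0
docker push registry.reg-aws.openshift.com/openshift3/ose:v3.7.0
# 12-24 hours later
oc get is -n openshift3 ose -o yaml | grep 'v3.7.0'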

Actual results:

Tag v3.7.0 has gone missing from the image yaml.

Expected results:

Tag v3.7.0 should still exist, and should have new entries prepended to it as we push that tag a few times per week (similar to the 'latest' tag).

Additional info:

Comment 2 Stefanie Forrester 2017-11-01 16:47:14 UTC
Created attachment 1346636 [details]
master-config

Comment 3 Stefanie Forrester 2017-11-01 16:47:34 UTC
Created attachment 1346637 [details]
node-config

Comment 4 Ben Parees 2017-11-01 16:49:15 UTC
Imagestream tags are not the same as what is in your registry.

I suspect you are updating (replacing) your imagestreams with one that does not have the 3.7.0 tag.

Either that, or you are running pruning and the 3.7.0 tag is not referenced by anything.
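
(For illustration of the distinction above: the imagestream is queried through the OpenShift API, while the registry keeps its own tag list. A quick side-by-side check, using the registry URL from this report, might look like:)

# imagestreamtags as OpenShift sees them
oc describe is ose -n openshift3
# tags as the integrated docker registry reports them
curl -s -H "Authorization: Bearer $(oc whoami -t)" https://registry.reg-aws.openshift.com/v2/openshift3/ose/tags/list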

Comment 6 Stefanie Forrester 2017-11-01 16:56:53 UTC
(In reply to Ben Parees from comment #4)
> Imagestream tags are not the same as what is in your registry.
> 
> I suspect you are updating (replacing) your imagestreams with one that does
> not have the 3.7.0 tag.
> 
> Either that, or you are running pruning and the 3.7.0 tag is not referenced
> by anything.

Ok, it sounds like I'm checking for the existence of this tag incorrectly then. That's good to know, thanks. But I can also confirm that the tag is missing by doing this:

[root@online-int-master-05114 ~]# curl -sH "Authorization: Bearer $(oc --config=/root/.kube/reg-aws whoami -t)" https://registry.reg-aws.openshift.com/v2/openshift3/ose/tags/list  | python -m json.tool |grep v3.7.0\"
[root@online-int-master-05114 ~]# 

Whereas the same command tells me that tag v3.7 exists.

[root@online-int-master-05114 ~]# curl -sH "Authorization: Bearer $(oc --config=/root/.kube/reg-aws whoami -t)" https://registry.reg-aws.openshift.com/v2/openshift3/ose/tags/list  | python -m json.tool |grep v3.7\"
        "v3.7",


I checked for the presence of a pruning cron job on the Ops side, but it appears to be gone. I had disabled that job over a week ago, so it makes sense that the cron job is gone now.

Is there maybe somewhere else I can check for the presence of a pruning job? Maybe something internal to openshift?

Would it help if I posted the master audit logs?

Comment 8 Ben Parees 2017-11-02 02:39:35 UTC
The only way to run pruning is via 'oadm prune' (or 'oc adm prune').

But it can be run from anywhere that has admin credentials.

I'm not aware of any other "normal" mechanism that would be just deleting tags out of the registry.
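
(For reference, a sketch of what such operations look like; the prune flags are just example values:)

# pruning is an explicit, admin-run command; without --confirm it is a dry run
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm
# an explicit tag deletion would look like
oc tag -d ose:v3.7.0 -n openshift3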

Comment 10 Ben Parees 2017-11-02 20:02:11 UTC
I spoke w/ Stefanie and she's going to set loglevel 3 on the master-api and master-controllers so we can catch the DELETE API calls (assuming they are happening).

In theory we should never see a delete event on this cluster since no tags are ever being deleted, so if we see any in the logs, that is indicative that someone is explicitly deleting them. (Assuming we do see the DELETE API call, I'm still not sure how we track down who is doing it... we'll get some client information; hopefully that will be enough.)
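
(Roughly, assuming an RPM-based install with split api/controllers services; the unit and file names vary by install type:)

# raise master verbosity to 3
sed -i 's/--loglevel=2/--loglevel=3/' /etc/sysconfig/atomic-openshift-master-api /etc/sysconfig/atomic-openshift-master-controllers
systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
# then watch the API log for DELETE requests touching imagestreams / imagestreamtags
journalctl -u atomic-openshift-master-api -f | grep -i 'DELETE.*imagestream'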

Comment 11 Ben Parees 2017-11-06 23:01:47 UTC
have any more tags disappeared since we turned on logging?

Comment 18 Justin Pierce 2017-11-21 22:02:52 UTC
Created attachment 1357016 [details]
Example of destroying a tag in real time

Comment 19 Justin Pierce 2017-11-21 22:07:33 UTC
In the attached example, pushing "3.7.9" destroys the "v3.7.9" tag. Pushing "3.7.9-1" destroys the "v3.7.9-1" tag.
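
(A condensed sketch of that reproducer; the registry, project, and stream names here are made up:)

# push a "v"-prefixed tag, then the same tag without the prefix
docker tag <image-id> docker-registry.example.com/test/demo:v3.7.9
docker push docker-registry.example.com/test/demo:v3.7.9
docker tag <image-id> docker-registry.example.com/test/demo:3.7.9
docker push docker-registry.example.com/test/demo:3.7.9
# before the fix, the v3.7.9 tag has vanished from the imagestream
oc get is demo -n test -o yaml | grep 'v3.7.9'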

Comment 20 Michal Minar 2017-11-22 12:17:02 UTC
Wow, that's really interesting. I can reproduce it locally on a 3.7 cluster. Debugging now. Thanks for the reproducer, Justin!

Comment 21 Michal Minar 2017-11-22 12:23:07 UTC
Sorry, false alarm. My reproducer was buggy. Trying to reproduce once more.

Comment 22 Michal Minar 2017-11-22 13:05:16 UTC
I switched to the latest 3.5 release and am happy to report that I can reproduce it there.

Comment 23 Ben Parees 2017-11-22 14:25:24 UTC
That is fantastic news, thank you Michal.

Justin, in the meantime you're going to want to make very sure you don't run your job with the wrong tag being pushed, since that seems to be the definitive cause of the "good" tags being lost.

Comment 24 Michal Minar 2017-11-23 16:09:07 UTC
Fix: https://github.com/openshift/ose/pull/932

Comment 27 Dongbo Yan 2018-01-04 09:16:57 UTC
Verified
oc v3.9.0-0.9.0
kubernetes v1.8.1+0d5291c
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server 
openshift v3.9.0-0.9.0
kubernetes v1.8.1+0d5291c

1. Push an image to imagestreamtag nodejs-mongodb-example:v3.9, then check that the image exists.
2. Push an image to imagestreamtag nodejs-mongodb-example:3.9, then check that the v3.9 image still exists.

Both the v3.9 and 3.9 images still exist, so this can move to VERIFIED.
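
(For illustration, with an assumed project name, the check can be as simple as:)

oc get is nodejs-mongodb-example -n myproject -o yaml | grep 'tag:'
# both "v3.9" and "3.9" should still be listed after the second push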

Comment 30 errata-xmlrpc 2018-03-28 14:09:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

