Created attachment 1346634 [details]
Current running configuration of our docker registry

Description of problem:
We have a docker registry running on a 3.5 cluster that keeps losing or deleting some image tags. Every time we upload the tags again, they end up disappearing within a day or so.

Version-Release number of selected component (if applicable):
oc v3.5.5.26
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://internal.api.reg-aws.openshift.com:443
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4

image: registry.access.redhat.com/openshift3/ose-docker-registry:v3.5.5.26

How reproducible:
Every time.

Steps to Reproduce:
1. Add the tag 'v3.7.0' to an image (see the sketch under Additional info).
2. Wait 12-24 hours.
3. Check to see if the tag is still there: 'oc get is -n openshift3 ose -o yaml'.

Actual results:
Tag v3.7.0 has gone missing from the imagestream yaml.

Expected results:
Tag v3.7.0 should still exist, and should have new entries prepended to it as we push that tag a few times per week (similar to the 'latest' tag).

Additional info:
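Roughly how the tag gets added (a sketch; the exact job differs, and the local image name is just a placeholder):

docker tag <local-image> registry.reg-aws.openshift.com/openshift3/ose:v3.7.0
docker push registry.reg-aws.openshift.com/openshift3/ose:v3.7.0

or, equivalently, by retagging within the imagestream:

oc tag openshift3/ose:latest openshift3/ose:v3.7.0 -n openshift3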
Created attachment 1346636 [details] master-config
Created attachment 1346637 [details] node-config
Imagestream tags are not the same as what is in your registry.

I suspect you are updating (replacing) your imagestreams with one that does not have the 3.7.0 tag.

Either that, or you are running pruning and the 3.7.0 tag is not referenced by anything.
(In reply to Ben Parees from comment #4)
> imagestream tags are not the same as what is in your registry.
>
> I suspect you are updating(replacing) your imagestreams with one that does
> not have the 3.7.0 tag.
>
> either that or you are running pruning and the 3.7.0 tag is not referenced
> by anything.

Ok, it sounds like I'm checking for the existence of this tag incorrectly then. That's good to know, thanks. But I can also confirm that the tag is missing by doing this:

[root@online-int-master-05114 ~]# curl -sH "Authorization: Bearer $(oc --config=/root/.kube/reg-aws whoami -t)" https://registry.reg-aws.openshift.com/v2/openshift3/ose/tags/list | python -m json.tool | grep v3.7.0\"
[root@online-int-master-05114 ~]#

Whereas the same command tells me that tag v3.7 exists:

[root@online-int-master-05114 ~]# curl -sH "Authorization: Bearer $(oc --config=/root/.kube/reg-aws whoami -t)" https://registry.reg-aws.openshift.com/v2/openshift3/ose/tags/list | python -m json.tool | grep v3.7\"
    "v3.7",

I checked for the presence of a pruning cron job on the Ops side, but it appears to be gone. I had disabled that job over a week ago, so it makes sense that the cron job is gone now. Is there maybe somewhere else I can check for the presence of a pruning job? Maybe something internal to openshift? Would it help if I posted the master audit logs?
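For completeness, this is how I understand the imagestream tag itself (as opposed to the registry's tag list) can be checked; a sketch using the same imagestream as above:

oc get istag ose:v3.7.0 -n openshift3
oc get is ose -n openshift3 -o jsonpath='{.status.tags[*].tag}'

The first should return the tag object if the imagestream still references it; the second just lists every tag recorded on the imagestream's status.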
The only way to run pruning is via oadm prune (or oc adm prune), but it can be run from anywhere that has admin credentials. I'm not aware of any other "normal" mechanism that would just delete tags out of the registry.
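For reference, a typical pruning invocation looks something like this (the retention flags here are only illustrative, not what the Ops cron job actually ran):

oadm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm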
I spoke w/ Stefanie and she's going to set loglevel 3 on the master-api and master-controllers so we can catch the DELETE api calls (assuming they are happening).

In theory we should never see a delete event on this cluster, since no tags should ever be deleted, so if we do see any in the logs, that indicates someone is explicitly deleting them.

(Assuming we do see the DELETE api call, I'm still not sure how we track down who is doing it... we'll get some client information; hopefully that will be enough.)
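As a sketch of what to grep for once the loglevel is bumped (the journalctl unit name assumes the split api/controllers master services on this cluster; adjust to wherever the apiserver logs actually land):

journalctl -u atomic-openshift-master-api --since "24 hours ago" | grep DELETE | grep -i imagestream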
Have any more tags disappeared since we turned on logging?
Created attachment 1357016 [details] Example of destroying a tag in real time
In the attached example, pushing "3.7.9" destroys the "v3.7.9" tag. Pushing "3.7.9-1" destroys the "v3.7.9-1" tag.
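In other words, a minimal version of what the attachment shows (names here are placeholders, not the exact job):

docker tag <local-image> registry.reg-aws.openshift.com/openshift3/ose:3.7.9
docker push registry.reg-aws.openshift.com/openshift3/ose:3.7.9

# after the push, the pre-existing v-prefixed tag is gone from the imagestream:
oc get istag ose:v3.7.9 -n openshift3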
Wow, that's really interesting. I can reproduce it locally on a 3.7 cluster. Debugging now. Thanks for the reproducer, Justin!
Sorry, false alarm. My reproducer was buggy. Trying to reproduce once more.
I switched to the latest 3.5 release and am happy to report that I can reproduce it there.
That is fantastic news, thank you Michal. Justin, in the meantime you're going to want to make very sure you don't run your job with the wrong tag being pushed, since that seems to be the definitive cause of the "good" tags being lost.
Fix: https://github.com/openshift/ose/pull/932
origin master: https://github.com/openshift/origin/pull/17430
ocp 3.5: https://github.com/openshift/ose/pull/932
ocp 3.6: https://github.com/openshift/ose/pull/934
ocp 3.7: https://github.com/openshift/ose/pull/935
Verified on:
oc v3.9.0-0.9.0
kubernetes v1.8.1+0d5291c
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server
openshift v3.9.0-0.9.0
kubernetes v1.8.1+0d5291c

1. Push an image to imagestreamTag nodejs-mongodb-example:v3.9, then check that the image exists.
2. Push an image to imagestreamTag nodejs-mongodb-example:3.9, then check that the v3.9 image still exists.

Both the v3.9 and 3.9 images exist, so this can move to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489