Bug 1937020 - Release new from image stream chooses incorrect ID based on status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Clayton Coleman
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-09 17:11 UTC by Hongkai Liu
Modified: 2021-07-27 22:52 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:52:25 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 797 | 0 | None | open | Bug 1937020: `oc adm release new` should look at image stream status tags | 2021-03-30 17:52:27 UTC
Github | openshift oc pull 815 | 0 | None | open | Bug 1937020: Releases from image streams must prefer status tag | 2021-04-29 16:17:00 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:52:54 UTC

Description Hongkai Liu 2021-03-09 17:11:53 UTC
Description of problem:
This is a bug in `oc adm release new`, triggered by the mirroring changes made when we moved from a single cluster to multiple clusters (the push from a child cluster to app.ci changes the behavior).

Version-Release number of selected component (if applicable):
oc binary (4.8.0-0.ci-2021-03-05-011742)

How reproducible:

$ oc adm release new -n ocp --from-image-stream 4.8 --to-image  quay.io/skumari/ocp-release:v4.8 machine-config-operator=quay.io/skumari/machine-config-operator:latest  --allow-missing-images=false 
info: Found 257 images in image stream
error: image "registry.ci.openshift.org/ocp/4.8@sha256:d11a40b241e7d790fb34dd500da5aef3b83755b929764b7f25ce22422e6f7f4c" not found: manifest unknown: manifest unknown

Comment 2 Clayton Coleman 2021-03-09 18:00:49 UTC
Roughly the old contract was:

1. look at spec tags, those are both intent and truth
2. fill in anything from status tags

That was ok because both release controller and CI model used tags and those were both intent and truth. However, when we switched to pushing (from satellite to app.ci) "truth" got more nuanced - intent and truth got separated.

Preferring status tags would be a backward-incompatible change (it might break existing users who expect the old behavior).  However, in general the behavior of an image stream is that status is truth and spec is intent, and any push is reasonably expected to be part of the truth of the release.
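
A minimal Python sketch of the two lookup orders discussed in this comment, for illustration only. It is not the oc implementation (oc is written in Go). The function names `spec_reference`, `status_reference`, and `resolve` are hypothetical, and two things are assumed: that status items are listed newest first, and that a spec tag only yields a pull reference when its `from.kind` is `DockerImage`.

```python
from typing import Optional


def spec_reference(stream: dict, tag: str) -> Optional[str]:
    """Pull reference declared in spec for the tag (intent), if any."""
    for entry in stream.get("spec", {}).get("tags", []):
        if entry.get("name") == tag:
            source = entry.get("from") or {}
            # Assumption: only DockerImage sources carry a usable pull spec.
            if source.get("kind") == "DockerImage":
                return source.get("name")
    return None


def status_reference(stream: dict, tag: str) -> Optional[str]:
    """Newest reference recorded in status for the tag (truth), if any."""
    for entry in stream.get("status", {}).get("tags", []):
        if entry.get("tag") == tag and entry.get("items"):
            # Assumption: items[0] is the most recent entry.
            return entry["items"][0]["dockerImageReference"]
    return None


def resolve(stream: dict, tag: str, prefer_status: bool) -> Optional[str]:
    """Resolve a tag, consulting status first or spec first."""
    if prefer_status:
        lookups = (status_reference, spec_reference)
    else:
        lookups = (spec_reference, status_reference)
    return lookups[0](stream, tag) or lookups[1](stream, tag)
```

With `prefer_status=False` the sketch mimics the old contract (spec first, status as a fallback); with `prefer_status=True` a pushed image recorded in status wins over a stale spec reference.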

Comment 3 W. Trevor King 2021-03-31 20:30:10 UTC
Pulling spec/status comparisons out of the ImageStream, to make this more concrete for folks like me without much ImageStream experience:

$ yaml2json <ocp.4.8.is.yaml | jq '.[].spec.tags[] | select(tostring | contains("sha256:d11a40b241e7d790fb34dd500da5aef3b83755b929764b7f25ce22422e6f7f4c"))'
{
  "annotations": null,
  "from": {
    "kind": "DockerImage",
    "name": "registry.svc.ci.openshift.org/ocp/4.8@sha256:d11a40b241e7d790fb34dd500da5aef3b83755b929764b7f25ce22422e6f7f4c"
  },
  "generation": 35,
  "importPolicy": {},
  "name": "ansible",
  "referencePolicy": {
    "type": "Local"
  }
}
$ yaml2json <ocp.4.8.is.yaml | jq '.[].status.tags[] | select(tostring | contains("sha256:d11a40b241e7d790fb34dd500da5aef3b83755b929764b7f25ce22422e6f7f4c"))'
...no hits...
$ yaml2json <ocp.4.8.is.yaml | jq '.[].status.tags[] | select(.tag == "ansible")'
{
  "items": [
    {
      "created": "2021-02-03T16:20:43Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:9e3a28e2ca71e24cb49ec2138c16710e7799d1df6ca72096e6576d0739d127ba",
      "generation": 3053,
      "image": "sha256:9e3a28e2ca71e24cb49ec2138c16710e7799d1df6ca72096e6576d0739d127ba"
    },
    {
      "created": "2021-02-02T20:18:09Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:e0dcfb44c9e2230fe968f2a5831c318c9a3f00fa5e1ff53ccb79f94f4497e053",
      "generation": 3009,
      "image": "sha256:e0dcfb44c9e2230fe968f2a5831c318c9a3f00fa5e1ff53ccb79f94f4497e053"
    },
    {
      "created": "2021-02-01T21:10:23Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:37f0119ad24f3fa4794692715cf8416804a6edbe6123fca4b368ce15ee4f3a1a",
      "generation": 2943,
      "image": "sha256:37f0119ad24f3fa4794692715cf8416804a6edbe6123fca4b368ce15ee4f3a1a"
    }
  ],
  "tag": "ansible"
}

Huh, those generation numbers seem completely decoupled between status and spec.  Picking a different image:

$ yaml2json <ocp.4.8.is.yaml | jq '.[].spec.tags[] | select(.name == "machine-os-content")'
{
  "annotations": null,
  "from": {
    "kind": "ImageStreamImage",
    "name": "4.8-art-latest@sha256:a32077727aa2ef96a1e2371dbcc53ba06f3d9727e836b72be0f0dd4513937e1e",
    "namespace": "ocp"
  },
  "generation": 3057,
  "importPolicy": {},
  "name": "machine-os-content",
  "referencePolicy": {
    "type": "Source"
  }
}
$ yaml2json <ocp.4.8.is.yaml | jq '.[].status.tags[] | select(.tag == "machine-os-content")'
{
  "items": [
    {
      "created": "2021-03-08T19:06:56Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:05c8aec832d747fa21d1554bda66c9c7bf15a9241942cfd63a8c6d428559baeb",
      "generation": 3057,
      "image": "sha256:05c8aec832d747fa21d1554bda66c9c7bf15a9241942cfd63a8c6d428559baeb"
    },
    {
      "created": "2021-03-08T15:20:05Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41",
      "generation": 3057,
      "image": "sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41"
    },
    {
      "created": "2021-03-06T15:00:13Z",
      "dockerImageReference": "image-registry.openshift-image-registry.svc:5000/ocp/4.8@sha256:ec6deaaa131ae3446b1d3f8f134d867ceab4c1fa4e94d2d6433669445e7aeca9",
      "generation": 3057,
      "image": "sha256:ec6deaaa131ae3446b1d3f8f134d867ceab4c1fa4e94d2d6433669445e7aeca9"
    }
  ],
  "tag": "machine-os-content"
}

In that case, the generation numbers match up, but there are multiple entries in status with the same generation number but different timestamps and digests.  And in no case here does the status digest match a spec digest.

$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[].items[].image | {key: ., value: true}] | from_entries) as $statusDigests | .spec.tags[].from.name | split("@")[1] | select($statusDigests[.])' | wc -l
62
$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[].items[].image | {key: ., value: true}] | from_entries) as $statusDigests | .spec.tags[].from.name | split("@")[1] | select($statusDigests[.] | not)' |  wc -l
193

So of our spec digests, 62 were present in a status entry (like I'd expect), and 193 were not represented in a status entry (which I don't understand).  There are no status.conditions or anything like that:

$ yaml2json <ocp.4.8.is.yaml | jq -r '.[].status | keys[]'
dockerImageRepository
publicDockerImageRepository
tags
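
For readers less comfortable with jq, the spec-digest-versus-status comparison above can be sketched in Python roughly as follows. This is an approximation: it assumes `stream` is a single ImageStream object already loaded as a dict (one element of the array that the jq expressions iterate over), and the function name is hypothetical. Unlike the jq version, it silently skips spec tags whose `from.name` has no `@digest` part.

```python
def split_spec_tags_by_status_presence(stream: dict):
    """Return (present, missing): spec tag names whose digest does or
    does not appear in any status item of the image stream."""
    status_digests = {
        item["image"]
        for tag in stream["status"]["tags"]
        for item in tag.get("items") or []
    }
    present, missing = [], []
    for tag in stream["spec"]["tags"]:
        name = (tag.get("from") or {}).get("name", "")
        _, separator, digest = name.partition("@")
        if not separator:
            continue  # no digest in the spec reference; nothing to compare
        (present if digest in status_digests else missing).append(tag["name"])
    return present, missing
```

The lengths of the two returned lists correspond to the two `wc -l` counts.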

Names match up fairly well:

$ diff <(yaml2json <ocp.4.8.is.yaml | jq -r '.[].spec.tags[].name' | sort) <(yaml2json <ocp.4.8.is.yaml | jq -r '.[].status.tags[].tag' | sort)
122a123
> hyperconverged-cluster-functest
245a247
> test-build-roots2i
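
The same name comparison can be sketched in Python with set differences. As above, this assumes `stream` is one ImageStream object loaded as a dict; the function name is hypothetical.

```python
def tag_name_differences(stream: dict):
    """Return (only_in_spec, only_in_status) as sorted lists of tag names."""
    spec_names = {tag["name"] for tag in stream["spec"]["tags"]}
    status_names = {tag["tag"] for tag in stream["status"]["tags"]}
    return sorted(spec_names - status_names), sorted(status_names - spec_names)
```

For the image stream examined here, the diff output indicates the first list would be empty and the second would hold the two extra status tags.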

Digging into the generations:

$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[] | {key: .tag, value: .items}] | from_entries) as $status | .spec.tags[] | select($status[.name] != null) | select($status[.name][0].generation == .generation).name' | wc -l
43
$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[] | {key: .tag, value: .items}] | from_entries) as $status | .spec.tags[] | select($status[.name] != null) | select($status[.name][0].generation > .generation).name' | wc -l
212
$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[] | {key: .tag, value: .items}] | from_entries) as $status | .spec.tags[] | select($status[.name] != null) | select($status[.name][-1].generation > .generation).name' | wc -l
192
$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[] | {key: .tag, value: .items}] | from_entries) as $status | .spec.tags[] | select($status[.name] != null) | select($status[.name][0].generation < .generation).name' | wc -l
0
$ yaml2json <ocp.4.8.is.yaml | jq -r '.[] | ([.status.tags[] | {key: .tag, value: .items}] | from_entries) as $status | .spec.tags[] | select($status[.name] != null) | select($status[.name][-1].generation < .generation).name' | wc -l
11

So vs. the current spec generation:

* 43 tags where the latest status item matches (I'd expect lots of these).
* 212 tags where the newest status item exceeds the spec (I'd expect none of these.  Who's bumping the status generation in the absence of spec bumps?).
* 192 tags where even the oldest status item exceeds the spec (I'd expect none of these.  Who's bumping the status generation in the absence of spec bumps?).
* 0 tags where the newest status item is older than the spec (I'd expect a few of these, as the controller worked to import a bumped spec tag, and the PR up for this bug is making this fatal, as it should).
* 11 tags where the oldest status item is older than the spec (I'd expect lots of these).
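
The five counts above can be produced in one pass with a sketch like the following. It is an approximation of the jq queries: it assumes `stream` is one ImageStream object loaded as a dict and that status items are ordered newest first (as the timestamps in the excerpts suggest). The function and key names are hypothetical. Spec tags with no status items are skipped.

```python
def classify_generations(stream: dict) -> dict:
    """Count spec tags by how their status generations compare to spec."""
    status_items = {
        tag["tag"]: tag.get("items") or [] for tag in stream["status"]["tags"]
    }
    counts = {
        "newest_equal": 0,
        "newest_greater": 0,
        "oldest_greater": 0,
        "newest_less": 0,
        "oldest_less": 0,
    }
    for tag in stream["spec"]["tags"]:
        items = status_items.get(tag["name"])
        if not items:
            continue  # nothing recorded in status for this spec tag
        spec_generation = tag["generation"]
        newest = items[0]["generation"]   # assumption: newest item first
        oldest = items[-1]["generation"]
        counts["newest_equal"] += newest == spec_generation
        counts["newest_greater"] += newest > spec_generation
        counts["oldest_greater"] += oldest > spec_generation
        counts["newest_less"] += newest < spec_generation
        counts["oldest_less"] += oldest < spec_generation
    return counts
```

The first, second, and fourth counters are mutually exclusive per tag, which is consistent with 43 + 212 + 0 adding up to the 255 spec tags that have a status entry.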

Comment 4 W. Trevor King 2021-04-01 03:33:58 UTC
Clayton straightened me out over in [1].  The upshot is that the tags whose status generation exceeds the spec generation are ones where someone pushed an image (instead of bumping the spec tag and having the ImageStream controller import the image).  On a push, the current metadata.generation gets frozen into the status tag, which is how it can advance beyond the spec tag.

[1]: https://github.com/openshift/oc/pull/797#discussion_r605217396

Comment 6 Sinny Kumari 2021-04-15 11:54:36 UTC
I am no longer seeing this bug. I can create custom OCP image payload now.
Thanks for fixing it!

Verified it with oc from latest available nightly registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-04-15-074503
$ oc version
Client Version: 4.8.0-0.nightly-2021-04-15-074503
Kubernetes Version: v1.20.0+bafe72f

$ oc adm release new -n ocp --from-image-stream 4.8 --to-image  quay.io/skumari/ocp-release:v4.8 machine-config-operator=quay.io/skumari/machine-config-operator:latest 
info: Found 270 images in image stream
info: Loading sha256:3a9d1aa72deb808d5445df1734b6e258384bdce878a9be3baf42acd67ba72265 baremetal-operator
...
info: Loading sha256:7edee18da908f8994480f9452d8135f7861ef21292dcbfa654f67435f0f1abae machine-api-operator
...
info: Included 137 images from 39 input operators into the release
info: Pushed to quay.io/skumari/ocp-release:v4.8
sha256:73ecff565fc5ff9c561eb353a111af357f1e8c4d765c3639f32f7a3bccb8e4f7 0.0.1-2021-04-15-114835 2021-04-15T11:48:35Z

Comment 7 zhou ying 2021-04-15 12:15:16 UTC
Based on the verification in https://bugzilla.redhat.com/show_bug.cgi?id=1937020#c6, moving this bug to verified status.

Comment 8 W. Trevor King 2021-04-29 16:16:35 UTC
We're adding a follow-up fix.

Comment 10 zhou ying 2021-05-06 08:38:47 UTC
Confirmed with the latest oc; can't reproduce this issue:

[root@localhost ~]# oc adm release new -n ocp --from-image-stream 4.8 --to-image  zhouy-registry-quay-openshift-operators.apps.yinzhou.qe.gcp.devcluster.openshift.com/yinzhou/test:ocp48 --insecure 
info: Found 271 images in image stream
info: Loading sha256:c2e239c172fc3cdd315f3127f16063dbb43a1c0262cc95010c2c49a308d315e6 baremetal-operator
info: Loading sha256:22016163fd2651a7d7b66ebe444b9969c641db57a81d9e998724e0f3f82e161b cloud-credential-operator
...
info: Included 140 images from 40 input operators into the release
info: Pushed to zhouy-registry-quay-openshift-operators.apps.yinzhou.qe.gcp.devcluster.openshift.com/yinzhou/test:ocp48
sha256:409b219276710b9c1f7ba8f98d7192408e045daba029a45a8b98758f17e7772d 0.0.1-2021-05-06-081459 2021-05-06T08:14:59Z


[root@localhost oc]# oc version --client
Client Version: 4.8.0-202104292348.p0.git.a765590-a765590

Comment 13 errata-xmlrpc 2021-07-27 22:52:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

