Bug 1685089
| Summary: | 'openshiftClusterID' is missing for ec2 resource created by installer | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Johnny Liu <jialiu> | 
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> | 
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Wenjing Zheng <wzheng> | 
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1.0 | CC: | adam.kaplan, aos-bugs, erich, jialiu, jokerman, kalexand, mmccomas, sponnaga, wking, wsun, xtian | 
| Target Milestone: | --- | Keywords: | Regression | 
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-07-31 17:47:03 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| 
        
          Description
        
        
          Johnny Liu
        
        
        
        
        
          2019-03-04 10:43:09 UTC
        
       $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-04-033148 | grep installer installer https://github.com/openshift/installer 563f71fdfb75f96177912ca9b1d4285c7f03cea1 $ git log --graph -27 --first-parent --format='%ai %h %d %s' v0.13.0 v0.13.1 afce3acda * 2019-02-28 14:16:38 -0800 a9bb8a698 (tag: v0.13.1, origin/pr/1345) hack/build: Pin to RHCOS 47.330 and quay.io/openshift-release-dev/ocp-release:4.0.0-0.6 * 2019-02-28 14:07:28 -0800 38769677d (origin/pr/1344) CHANGELOG: Document changes since 0.13.0 * 2019-02-28 13:32:53 -0800 2971dbd9b Merge branch 'version-0.13.0' into version-0.13.1 | * 2019-02-27 05:22:11 +0100 afce3acda Merge pull request #1316 from wking/version-0.13.0 | * 2019-02-27 03:35:19 +0100 99425cfc9 Merge pull request #1296 from wking/encrypted-master-amis |/ * 2019-02-27 01:44:41 +0100 563f71fdf Merge pull request #1306 from cuppett/cuppett/network-cleanups ... * 2019-02-26 01:04:54 +0100 1b13d68b1 Merge pull request #1308 from derekwaynecarr/update-route53 | * 2019-02-19 01:10:00 -0800 e02b35a3d (tag: v0.13.0, origin/pr/1271) hack/build: Pin to RHCOS 47.330 and quay.io/openshift-release-dev/ocp-release:4.0.0-0.5 | * 2019-02-26 00:16:50 -0800 4caac531f (origin/pr/1316) CHANGELOG: Document changes since 8f08508 |/ * 2019-02-25 22:31:18 +0100 020820434 Merge pull request #1305 from deads2k/admin-kubeconfig-2 ... * 2019-02-23 14:35:25 -0500 8e8774c8e Merge pull request #1302 from abhinavdahiya/fix_aws_destroy So it's not clear to me how you saw this on 4.0.0-0.nightly-2019-03-04-033148. Can you provide more details on: > Version-Release number of the following components: > Extract installer from 4.0.0-0.nightly-2019-03-04-033148 > ./openshift-install v4.0.6-1-dirty How did you extract the installer? Did you set OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE? What does 'oc adm release info' show you? For launching clusters from a given release image, I recommend the script in flight with [1]. [1]: https://github.com/openshift/installer/pull/1221/files I use today's latest payload, still reproduced.
# env|grep OVER
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=ami-0a0438a5e62dc0f9f
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-05-065158
# oc image extract $( oc adm release info "${OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE}" --image-for=installer ) --file /usr/bin/openshift-install; chmod +x ./openshift-install
# ./openshift-install version
./openshift-install v4.0.16-1-dirty
# oc adm release info --commits $OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE | grep installer
  installer                                     https://github.com/openshift/installer                                     c8b3b5532694c7713efe300a636108174d623c52
For me, this definitely happened this week, I do not think this my steps for cluster install are wrong.
Ah, I'd misread [1], which restored consuming the tag on delete, but not setting it on create. CC Abhinav. [1]: https://github.com/openshift/installer/pull/1302 trevor - looks like the above PR is not working for this issue. As far as I know, GEE guys wrote some docs for customer about "How to delete an OpenShift 4 cluster without a metadata.json file" [1], in the steps, openshiftClusterID is a key value for re-create metadata.json, QE also reference the same steps in test cases, and even ask doc team to update our official doc with the same steps. If this is on purpose everything need to be updated. If this installer candidate release out, that means, beta2 customer's cluster would have no 'opoenshiftClusterID', beta1 customer's cluster would have 'opoenshiftClusterID'. They become inconsistent. So far, I can not evaluate all possible side effect about 'openshiftClusterID' tag removal. @Eric, WDYT here? I am not sure if this is final decision about 'openshiftClusterID' tag removal, but I want everyone on the same page. I added 'beta2blocker' back to wait for CEE's input. [1]: https://access.redhat.com/solutions/3826921 > ... in the steps, openshiftClusterID is a key value for re-create metadata.json... Deletion will still work with this value set there, and since [1] we've been setting openshiftClusterID again in CI. So this seems fine to me. A potentially larger issue is the change from clusterName to infraID (appending some random chars) for kubernetes.io/cluster/... But if the docs recommend pulling that value from an existing tag, you'll be ok there too. Can you link the docs? [1]: https://github.com/openshift/installer/pull/1302 Ah, you did link them, but I can't check from my phone. Ok, so assuming we're moving ahead with the openshiftClusterID removal now that kubernetes.io/cluster/{infraID} is more unique, the docs that need updating are:
* [1] needs to drop the "You can also get the clusterID from the aws tags..." block, since that no longer works.
* [2] needs to drop openshiftClusterID.  It also needs to be updated to get the infrastructure ID from the kubernetes.io/cluster tag, with something like:
    $ aws --region "${AWS_REGION}" ec2 describe-vpcs | jq -c '.Vpcs[] | .Tags | select(. != null) | from_entries | ."name" = (keys[] | select(. | startswith("kubernetes.io/cluster/")) | sub("^kubernetes.io/cluster/"; "")) | .name | select(. | startswith("REPLACE_ME"))'
  Then reconstruct metadata.json with:
    $ echo "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/cluster/${INFRA_ID}\":\"owned\"}]}}" >metadata.json
* I've filed https://github.com/openshift/training/pull/438 to drop this from the training docs.
* I've filed https://github.com/openshift/openshift-docs/pull/13987 to drop this from the product docs.
I'm targeting the Documentation component for this remaining work (is that right for Knowledgebase Solutions?).
[1]: https://access.redhat.com/solutions/3831361slee
[2]: https://access.redhat.com/solutions/3826921
(In reply to W. Trevor King from comment #9) > * [2] needs to drop openshiftClusterID. It also needs to be updated to get > the infrastructure ID from the kubernetes.io/cluster tag, with something > like: > > $ aws --region "${AWS_REGION}" ec2 describe-vpcs | jq -c '.Vpcs[] | > .Tags | select(. != null) | from_entries | ."name" = (keys[] | select(. | > startswith("kubernetes.io/cluster/")) | sub("^kubernetes.io/cluster/"; "")) > | .name | select(. | startswith("REPLACE_ME"))' +1 > > Then reconstruct metadata.json with: > > $ echo > "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/ > cluster/${INFRA_ID}\":\"owned\"}]}}" >metadata.json I do not think this is enough. Here is an existing one from a successful install. # cat demo3/metadata.json {"clusterName":"qe-jialiu3","clusterID":"622e8b4b-242d-4741-a8a0-94f4c084767f","infraID":"qe-jialiu3-x5rjl","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu3-x5rjl":"owned"},{"openshiftClusterID":"622e8b4b-242d-4741-a8a0-94f4c084767f"}]}} Obvious cluster still use openshiftClusterID to tag some resource, seem like it is a s3 resource created for image-registry # aws --region ${AWS_REGION} resourcegroupstaggingapi get-resources --tag-filters "Key=openshiftClusterID,Values=622e8b4b-242d-4741-a8a0-94f4c084767f" { "ResourceTagMappingList": [ { "ResourceARN": "arn:aws:s3:::image-registry-us-east-2-622e8b4b242d4741a8a094f4c084767f-c3ad", "Tags": [ { "Value": "622e8b4b-242d-4741-a8a0-94f4c084767f", "Key": "openshiftClusterID" }, { "Value": "2019-03-05T12:15:02.712135+00:00", "Key": "openshift_creationDate" }, { "Value": "2019-03-07T12:15:02.712135+00:00", "Key": "openshift_expirationDate" } ] } ] } If use the above echo command to re-create metadata.json to destroy cluster, the S3 destroy would be missed. > > * I've filed https://github.com/openshift/training/pull/438 to drop this > from the training docs. +1 Trevor, will you PTAL at https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11? If Jianlin's right, I think we need more info before we can update the doc. I'll look into the registry's S3 tagging. PR for image-registry-operator: https://github.com/openshift/cluster-image-registry-operator/pull/267 This introduces a significant number of changes prior to code freeze. I don't think we can safely land it and ensure storage is correctly set up/tagged in time. Moving target to 4.2.0 Is this still an issue? What is the intended resolution for this issue?  openshiftClusterID is deprecated, so once we demonstrate consistent tagging with the new kubernetes.io/cluster/{infra-ID} tag, we should be able to close this as fixed, right?  I think we're there since [1], but I haven't done a full audit of created resources or anything.
[1]: https://github.com/openshift/cluster-image-registry-operator/commit/588ae43a99bc9b1f0b2955a396b83535c8242cd9
The image registry no longer tags the s3 bucket that it creates with the clusterID, we use infrastructures now. (In reply to Corey Daley from comment #17) > Is this still an issue? LGTM. s3 bucket does not have clusterID any more. Confirmed. |