Bug 1685089 - 'openshiftClusterID' is missing for ec2 resource created by installer
Summary: 'openshiftClusterID' is missing for ec2 resource created by installer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.2.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-04 10:43 UTC by Johnny Liu
Modified: 2019-08-01 10:37 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-31 17:47:03 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-image-registry-operator pull 260 None None None 2019-04-24 07:15:55 UTC
Github openshift cluster-image-registry-operator pull 308 None None None 2019-07-31 19:15:00 UTC

Description Johnny Liu 2019-03-04 10:43:09 UTC
Description of problem:

Version-Release number of the following components:
Extract installer from 4.0.0-0.nightly-2019-03-04-033148
./openshift-install v4.0.6-1-dirty

How reproducible:
Always

Steps to Reproduce:
1. Trigger an install
2. Go to aws console to check created ec2 resource tag
3.

Actual results:
ec2 resource missed 'openshiftClusterID' tag.

Expected results:
ec2 resource should have 'openshiftClusterID' tag.

Additional info:
Confirm with @wking, this issue already fixed. Open this bug for tracking beta2 release blocker.

Jianlin Liu   [Today at 6:00 PM]
@wking now installer stopping adding "openshiftClusterID" tag now ?
W. Trevor King   [21 minutes ago]
there was a brief window without it, but it's back since https://github.com/openshift/installer/pull/1302
W. Trevor King   [18 minutes ago]
it had been removed by https://github.com/openshift/installer/pull/1280/commits/2b9b596262a2a9992a8f323c86d2cc568eb72853 earlier.  Both of those commits were released in 0.13.0, so we haven't had a release without `openshiftClusterID` since we transitioned from `tectonicClusterID` in 0.7.0

Comment 2 W. Trevor King 2019-03-05 08:35:47 UTC
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-04-033148 | grep installer
  installer                                     https://github.com/openshift/installer                                     563f71fdfb75f96177912ca9b1d4285c7f03cea1
$ git log --graph -27 --first-parent --format='%ai %h %d %s' v0.13.0 v0.13.1 afce3acda
* 2019-02-28 14:16:38 -0800 a9bb8a698  (tag: v0.13.1, origin/pr/1345) hack/build: Pin to RHCOS 47.330 and quay.io/openshift-release-dev/ocp-release:4.0.0-0.6
* 2019-02-28 14:07:28 -0800 38769677d  (origin/pr/1344) CHANGELOG: Document changes since 0.13.0
* 2019-02-28 13:32:53 -0800 2971dbd9b  Merge branch 'version-0.13.0' into version-0.13.1
| * 2019-02-27 05:22:11 +0100 afce3acda  Merge pull request #1316 from wking/version-0.13.0
| * 2019-02-27 03:35:19 +0100 99425cfc9  Merge pull request #1296 from wking/encrypted-master-amis
|/  
* 2019-02-27 01:44:41 +0100 563f71fdf  Merge pull request #1306 from cuppett/cuppett/network-cleanups
...
* 2019-02-26 01:04:54 +0100 1b13d68b1  Merge pull request #1308 from derekwaynecarr/update-route53
| * 2019-02-19 01:10:00 -0800 e02b35a3d  (tag: v0.13.0, origin/pr/1271) hack/build: Pin to RHCOS 47.330 and quay.io/openshift-release-dev/ocp-release:4.0.0-0.5
| * 2019-02-26 00:16:50 -0800 4caac531f  (origin/pr/1316) CHANGELOG: Document changes since 8f08508
|/  
* 2019-02-25 22:31:18 +0100 020820434  Merge pull request #1305 from deads2k/admin-kubeconfig-2
...
* 2019-02-23 14:35:25 -0500 8e8774c8e  Merge pull request #1302 from abhinavdahiya/fix_aws_destroy

So it's not clear to me how you saw this on 4.0.0-0.nightly-2019-03-04-033148.  Can you provide more details on:

> Version-Release number of the following components:
> Extract installer from 4.0.0-0.nightly-2019-03-04-033148
> ./openshift-install v4.0.6-1-dirty

How did you extract the installer?  Did you set OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE?  What does 'oc adm release info' show you?  For launching clusters from a given release image, I recommend the script in flight with [1].

[1]: https://github.com/openshift/installer/pull/1221/files

Comment 3 Johnny Liu 2019-03-05 10:38:45 UTC
I use today's latest payload, still reproduced.

# env|grep OVER
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=ami-0a0438a5e62dc0f9f
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-05-065158

# oc image extract $( oc adm release info "${OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE}" --image-for=installer ) --file /usr/bin/openshift-install; chmod +x ./openshift-install

# ./openshift-install version
./openshift-install v4.0.16-1-dirty

# oc adm release info --commits $OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE | grep installer
  installer                                     https://github.com/openshift/installer                                     c8b3b5532694c7713efe300a636108174d623c52

For me, this definitely happened this week, I do not think this my steps for cluster install are wrong.

Comment 4 W. Trevor King 2019-03-05 10:54:19 UTC
Ah, I'd misread [1], which restored consuming the tag on delete, but not setting it on create. 
CC Abhinav.

[1]: https://github.com/openshift/installer/pull/1302

Comment 5 Sudha Ponnaganti 2019-03-05 11:07:57 UTC
trevor - looks like the above PR is not working for this issue.

Comment 6 Johnny Liu 2019-03-05 11:45:10 UTC
As far as I know, GEE guys wrote some docs for customer about "How to delete an OpenShift 4 cluster without a metadata.json file" [1], in the steps, openshiftClusterID is a key value for re-create metadata.json, QE also reference the same steps in test cases, and even ask doc team to update our official doc with the same steps. If this is on purpose everything need to be updated.

If this installer candidate release out, that means, beta2 customer's cluster would have no 'opoenshiftClusterID', beta1 customer's cluster would have 'opoenshiftClusterID'. They become inconsistent. So far, I can not evaluate all possible side effect about 'openshiftClusterID' tag removal. @Eric, WDYT here?

I am not sure if this is final decision about 'openshiftClusterID' tag removal, but I want everyone on the same page. I added 'beta2blocker' back to wait for CEE's input.


[1]: https://access.redhat.com/solutions/3826921

Comment 7 W. Trevor King 2019-03-05 11:54:16 UTC
> ... in the steps, openshiftClusterID is a key value for re-create metadata.json...

Deletion will still work with this value set there, and since [1] we've been setting openshiftClusterID again in CI.  So this seems fine to me.

A potentially larger issue is the change from clusterName to infraID (appending some random chars) for kubernetes.io/cluster/...  But if the docs recommend pulling that value from an existing tag, you'll be ok there too.  Can you link the docs?

[1]: https://github.com/openshift/installer/pull/1302

Comment 8 W. Trevor King 2019-03-05 11:55:58 UTC
Ah, you did link them, but I can't check from my phone.

Comment 9 W. Trevor King 2019-03-05 19:23:01 UTC
Ok, so assuming we're moving ahead with the openshiftClusterID removal now that kubernetes.io/cluster/{infraID} is more unique, the docs that need updating are:

* [1] needs to drop the "You can also get the clusterID from the aws tags..." block, since that no longer works.
* [2] needs to drop openshiftClusterID.  It also needs to be updated to get the infrastructure ID from the kubernetes.io/cluster tag, with something like:

    $ aws --region "${AWS_REGION}" ec2 describe-vpcs | jq -c '.Vpcs[] | .Tags | select(. != null) | from_entries | ."name" = (keys[] | select(. | startswith("kubernetes.io/cluster/")) | sub("^kubernetes.io/cluster/"; "")) | .name | select(. | startswith("REPLACE_ME"))'

  Then reconstruct metadata.json with:

    $ echo "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/cluster/${INFRA_ID}\":\"owned\"}]}}" >metadata.json

* I've filed https://github.com/openshift/training/pull/438 to drop this from the training docs.
* I've filed https://github.com/openshift/openshift-docs/pull/13987 to drop this from the product docs.

I'm targeting the Documentation component for this remaining work (is that right for Knowledgebase Solutions?).

[1]: https://access.redhat.com/solutions/3831361slee
[2]: https://access.redhat.com/solutions/3826921

Comment 11 Johnny Liu 2019-03-06 07:26:28 UTC
(In reply to W. Trevor King from comment #9)
> * [2] needs to drop openshiftClusterID.  It also needs to be updated to get
> the infrastructure ID from the kubernetes.io/cluster tag, with something
> like:
> 
>     $ aws --region "${AWS_REGION}" ec2 describe-vpcs | jq -c '.Vpcs[] |
> .Tags | select(. != null) | from_entries | ."name" = (keys[] | select(. |
> startswith("kubernetes.io/cluster/")) | sub("^kubernetes.io/cluster/"; ""))
> | .name | select(. | startswith("REPLACE_ME"))'
+1
> 
>   Then reconstruct metadata.json with:
> 
>     $ echo
> "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/
> cluster/${INFRA_ID}\":\"owned\"}]}}" >metadata.json
I do not think this is enough. Here is an existing one from a successful install.
# cat demo3/metadata.json 
{"clusterName":"qe-jialiu3","clusterID":"622e8b4b-242d-4741-a8a0-94f4c084767f","infraID":"qe-jialiu3-x5rjl","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu3-x5rjl":"owned"},{"openshiftClusterID":"622e8b4b-242d-4741-a8a0-94f4c084767f"}]}}

Obvious cluster still use openshiftClusterID to tag some resource, seem like it is a s3 resource created for image-registry 
# aws --region ${AWS_REGION} resourcegroupstaggingapi get-resources --tag-filters "Key=openshiftClusterID,Values=622e8b4b-242d-4741-a8a0-94f4c084767f"
{
    "ResourceTagMappingList": [
        {
            "ResourceARN": "arn:aws:s3:::image-registry-us-east-2-622e8b4b242d4741a8a094f4c084767f-c3ad", 
            "Tags": [
                {
                    "Value": "622e8b4b-242d-4741-a8a0-94f4c084767f", 
                    "Key": "openshiftClusterID"
                }, 
                {
                    "Value": "2019-03-05T12:15:02.712135+00:00", 
                    "Key": "openshift_creationDate"
                }, 
                {
                    "Value": "2019-03-07T12:15:02.712135+00:00", 
                    "Key": "openshift_expirationDate"
                }
            ]
        }
    ]
}

If use the above echo command to re-create metadata.json to destroy cluster, the S3 destroy would be missed.

> 
> * I've filed https://github.com/openshift/training/pull/438 to drop this
> from the training docs.
+1

Comment 13 Kathryn Alexander 2019-04-22 14:55:53 UTC
Trevor, will you PTAL at https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11? If Jianlin's right, I think we need more info before we can update the doc.

Comment 14 W. Trevor King 2019-04-22 15:20:47 UTC
I'll look into the registry's S3 tagging.

Comment 15 Adam Kaplan 2019-05-07 19:28:11 UTC
PR for image-registry-operator: https://github.com/openshift/cluster-image-registry-operator/pull/267

This introduces a significant number of changes prior to code freeze. I don't think we can safely land it and ensure storage is correctly set up/tagged in time.

Comment 16 Adam Kaplan 2019-05-08 12:52:51 UTC
Moving target to 4.2.0

Comment 17 Corey Daley 2019-07-31 17:30:27 UTC
Is this still an issue?

Comment 18 W. Trevor King 2019-07-31 17:40:40 UTC
What is the intended resolution for this issue?  openshiftClusterID is deprecated, so once we demonstrate consistent tagging with the new kubernetes.io/cluster/{infra-ID} tag, we should be able to close this as fixed, right?  I think we're there since [1], but I haven't done a full audit of created resources or anything.

[1]: https://github.com/openshift/cluster-image-registry-operator/commit/588ae43a99bc9b1f0b2955a396b83535c8242cd9

Comment 19 Corey Daley 2019-07-31 17:47:03 UTC
The image registry no longer tags the s3 bucket that it creates with the clusterID, we use infrastructures now.

Comment 20 Johnny Liu 2019-08-01 10:37:06 UTC
(In reply to Corey Daley from comment #17)
> Is this still an issue?

LGTM. s3 bucket does not have clusterID any more. Confirmed.


Note You need to log in before you can comment on or make changes to this bug.