CI has turned up some races like:

1. The installer removes a bucket.
2. The still-running registry operator tries to self-heal and creates a new bucket, but before it can push tags...
3. The installer terminates the instance where the registry operator was running.
4. The installer leaks the new, untagged bucket.

This has been happening in CI in jobs like [1,2], where CloudTrail shows:

12:56:32, registry operator creates the bucket
14:31:10, installer deletes the bucket
14:31:11, registry operator creates the bucket again
14:31:11, installer starts requesting instance termination

This could theoretically happen to any resource (not just buckets) that we discover by tag but that is not tagged atomically on creation.

This issue is for 4.2, but the underlying race also impacts 4.1.z (at least as of 4.1.11).

The suggested fix is to terminate instances first, and only move on to deleting other resources once there are no longer any cluster-owned instances running or shutting down [3].

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/operator-framework_operator-marketplace/232/pull-ci-operator-framework-operator-marketplace-master-e2e-aws-upgrade/454
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/operator-framework_operator-marketplace/232/pull-ci-operator-framework-operator-marketplace-master-e2e-aws-upgrade/454/artifacts/e2e-aws-upgrade/installer/.openshift_install.log
[3]: https://github.com/openshift/installer/pull/2169
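For illustration, here is a minimal sketch of that ordering against the EC2 API. This is not the installer's actual code ([3] is the authoritative implementation); it assumes aws-sdk-go v1, and the tag key, region, and function names are mine:

// Minimal sketch, not the installer's actual code (see [3] for that):
// terminate every cluster-owned instance and keep polling until none
// remain in a state where a hosted operator could recreate resources.
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// terminateClusterInstances requests termination for all instances tagged
// kubernetes.io/cluster/<id>=owned and blocks until none are left running
// or shutting down. Only then is it safe to delete tag-discovered resources.
func terminateClusterInstances(svc *ec2.EC2, clusterTag string) error {
	filters := []*ec2.Filter{
		{Name: aws.String("tag:" + clusterTag), Values: []*string{aws.String("owned")}},
		// Every non-terminated state counts as "still able to race us".
		{Name: aws.String("instance-state-name"), Values: []*string{
			aws.String("pending"), aws.String("running"), aws.String("stopping"),
			aws.String("stopped"), aws.String("shutting-down"),
		}},
	}
	for {
		out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{Filters: filters})
		if err != nil {
			return err
		}
		var ids []*string
		for _, reservation := range out.Reservations {
			for _, instance := range reservation.Instances {
				ids = append(ids, instance.InstanceId)
			}
		}
		if len(ids) == 0 {
			return nil // nothing left that could recreate a bucket
		}
		// TerminateInstances is idempotent for instances already shutting down.
		if _, err := svc.TerminateInstances(&ec2.TerminateInstancesInput{InstanceIds: ids}); err != nil {
			return err
		}
		time.Sleep(10 * time.Second)
	}
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	if err := terminateClusterInstances(ec2.New(sess), "kubernetes.io/cluster/ci-op-example"); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Other tagged resources (buckets, AMIs, IAM roles, ...) would be
	// deleted here, after the instance loop has drained.
}

The point of the polling loop is that a node which is still running or shutting down can host an operator that recreates resources, so nothing else gets deleted until the loop drains.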
*** Bug 1743313 has been marked as a duplicate of this bug. ***
*** Bug 1740935 has been marked as a duplicate of this bug. ***
I'm not sure what validation looks like for this bug. Ideally we stop leaking registry buckets in CI (at least from 4.2 clusters that will have the new code). But narrowly it might just be "I looked and saw the instances terminated first", like [1]:

time="2019-09-05T00:15:08Z" level=debug msg="OpenShift Installer unreleased-master-1687-g9c59c82b8f8631d082eaa4276e1595c95a581c4a-dirty"
time="2019-09-05T00:15:08Z" level=debug msg="Built from commit 9c59c82b8f8631d082eaa4276e1595c95a581c4a"
time="2019-09-05T00:15:08Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"kubernetes.io/cluster/ci-op-j16mjwc5-1d3f3-gpzqh\":\"owned\"}"
time="2019-09-05T00:15:08Z" level=debug msg=Terminated instance=i-01d82808356254e87
time="2019-09-05T00:15:09Z" level=info msg=Disassociated instance=i-06a364cefe8e22247 name=ci-op-j16mjwc5-1d3f3-gpzqh-master-profile role=ci-op-j16mjwc5-1d3f3-gpzqh-master-role
time="2019-09-05T00:15:09Z" level=info msg=Deleted InstanceProfileName=ci-op-j16mjwc5-1d3f3-gpzqh-master-profile arn="arn:aws:iam::460538899914:instance-profile/ci-op-j16mjwc5-1d3f3-gpzqh-master-profile" instance=i-06a364cefe8e22247
time="2019-09-05T00:15:09Z" level=info msg=Terminating instance=i-06a364cefe8e22247
time="2019-09-05T00:15:09Z" level=info msg=Terminating instance=i-0d1f65cc7f2141bae
time="2019-09-05T00:15:09Z" level=info msg=Disassociated instance=i-0f5205e4c5e40e39d name=ci-op-j16mjwc5-1d3f3-gpzqh-worker-profile role=ci-op-j16mjwc5-1d3f3-gpzqh-worker-role
time="2019-09-05T00:15:09Z" level=info msg=Deleted InstanceProfileName=ci-op-j16mjwc5-1d3f3-gpzqh-worker-profile arn="arn:aws:iam::460538899914:instance-profile/ci-op-j16mjwc5-1d3f3-gpzqh-worker-profile" instance=i-0f5205e4c5e40e39d
time="2019-09-05T00:15:09Z" level=info msg=Terminating instance=i-0f5205e4c5e40e39d
time="2019-09-05T00:15:09Z" level=info msg=Terminating instance=i-0102c7a760fd719a2
time="2019-09-05T00:15:10Z" level=info msg=Terminating instance=i-0a6d5ce45bedab363
time="2019-09-05T00:15:10Z" level=info msg=Terminating instance=i-0a57eb385a5c0226a
time="2019-09-05T00:15:10Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"openshiftClusterID\":\"c106fc46-8afb-4285-b3b2-5faf76fbda72\"}"
...
time="2019-09-05T00:15:40Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"kubernetes.io/cluster/ci-op-j16mjwc5-1d3f3-gpzqh\":\"owned\"}"
time="2019-09-05T00:15:40Z" level=debug msg=Terminated instance=i-06a364cefe8e22247
time="2019-09-05T00:15:40Z" level=debug msg=Terminated instance=i-0d1f65cc7f2141bae
time="2019-09-05T00:15:40Z" level=debug msg=Terminated instance=i-0f5205e4c5e40e39d
time="2019-09-05T00:15:40Z" level=debug msg=Terminated instance=i-0a6d5ce45bedab363
time="2019-09-05T00:15:40Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"openshiftClusterID\":\"c106fc46-8afb-4285-b3b2-5faf76fbda72\"}"
...
time="2019-09-05T00:17:00Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"kubernetes.io/cluster/ci-op-j16mjwc5-1d3f3-gpzqh\":\"owned\"}"
time="2019-09-05T00:17:00Z" level=debug msg=Terminated instance=i-0a57eb385a5c0226a
time="2019-09-05T00:17:00Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"openshiftClusterID\":\"c106fc46-8afb-4285-b3b2-5faf76fbda72\"}"
time="2019-09-05T00:17:10Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"kubernetes.io/cluster/ci-op-j16mjwc5-1d3f3-gpzqh\":\"owned\"}"
time="2019-09-05T00:17:11Z" level=debug msg=Terminated instance=i-0102c7a760fd719a2
time="2019-09-05T00:17:11Z" level=debug msg="search for and delete matching instances by tag matching aws.Filter{\"openshiftClusterID\":\"c106fc46-8afb-4285-b3b2-5faf76fbda72\"}"
time="2019-09-05T00:17:11Z" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/ci-op-j16mjwc5-1d3f3-gpzqh\":\"owned\"}"
time="2019-09-05T00:17:11Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:image/ami-04c4612436f0cde76" id=ami-04c4612436f0cde76
...

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2169/pull-ci-openshift-installer-master-e2e-aws/7522/artifacts/e2e-aws/installer/.openshift_install.log
The PR is not merged into today's nightly build.
Verified with version 4.2.0-0.nightly-2019-09-08-232045
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922