Description of problem:

Version-Release number of the following components:
4.0.0-0.nightly-2019-04-05-165550

How reproducible:
Always

Steps to Reproduce:
1. Follow https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#configure-router-for-upi-dns to create a UPI cluster on AWS.

Actual results:
After installation, the cluster never reaches a healthy state.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-05-165550   False       True          3h54m   Unable to apply 4.0.0-0.nightly-2019-04-05-165550: the cluster operator console has not yet successfully rolled out

# oc get clusteroperator
NAME             VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                                       False       False         True      3h46m
console          4.0.0-0.nightly-2019-04-05-165550   False       True          False     3h46m

Both operators fail to become healthy because of a *.apps DNS issue. Even after applying the workaround for BZ#1697236, no *.apps DNS record is registered. The ingress operator log shows:

# oc logs -f ingress-operator-7dc9b877-qzss2 -n openshift-ingress-operator
<--snip-->
2019-04-08T07:54:50.788Z ERROR operator.init.kubebuilder.controller controller/controller.go:217 Reconciler error {"controller": "operator-controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure clusteringress: failed to ensure DNS for default: failed to ensure DNS record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com} for openshift-ingress-operator/default: failed to find hosted zone for record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com}: no matching hosted zone found", "errorCauses": [<--snip: same error repeated-->]}

Expected results:
The *.apps DNS record should be created automatically.

Additional info:
Compare the hosted zone tags with an IPI environment:

IPI:
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id Z3AV2FDEBVHLZN
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone",
        "ResourceId": "Z3AV2FDEBVHLZN",
        "Tags": [
            { "Value": "owned", "Key": "kubernetes.io/cluster/qe-jialiu-jj626" },
            { "Value": "2019-04-08T03:15:02.479817+00:00", "Key": "openshift_creationDate" },
            { "Value": "2019-04-10T03:15:02.479817+00:00", "Key": "openshift_expirationDate" },
            { "Value": "qe-jialiu-jj626-int", "Key": "Name" }
        ]
    }
}

UPI:
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone",
        "ResourceId": "Z295LDR9P4ZPLI",
        "Tags": [
            { "Value": "2019-04-08T03:15:02.479817+00:00", "Key": "openshift_creationDate" },
            { "Value": "2019-04-10T03:15:02.479817+00:00", "Key": "openshift_expirationDate" }
        ]
    }
}

The UPI zone is missing the kubernetes.io/cluster/<infraID>=owned tag.

Workaround:
# aws route53 change-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI --add-tags Key=kubernetes.io/cluster/jialiuuuu1-xbhm2,Value=owned Key=Name,Value=jialiuuuu1-xbhm2-int
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone",
        "ResourceId": "Z295LDR9P4ZPLI",
        "Tags": [
            { "Value": "owned", "Key": "kubernetes.io/cluster/jialiuuuu1-xbhm2" },
            { "Value": "2019-04-08T03:15:02.479817+00:00", "Key": "openshift_creationDate" },
            { "Value": "owned", "Key": "kubernetes.io/cluster/kubernetes.io/cluster/jialiuuuu1-xbhm2" },
            { "Value": "2019-04-10T03:15:02.479817+00:00", "Key": "openshift_expirationDate" },
            { "Value": "jialiuuuu1-xbhm2-int", "Key": "Name" }
        ]
    }
}

After a few minutes, *.apps is created by the ingress operator and the cluster works.

NOTE: the hosted zone tag uses the infraID string as the tag name, which differs from BZ#1697236, where the cluster name string is used; ideally the two should be kept consistent.
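To decide whether a UPI cluster is affected, the tag output above can be checked mechanically. The sketch below is not part of any product tooling; it is a minimal shell helper, assuming the JSON printed by `aws route53 list-tags-for-resource` and the infraID from metadata.json, that reports whether the ownership tag the ingress operator's zone lookup depends on is present.

```shell
# Hypothetical helper: given the list-tags JSON and an infrastructure ID,
# check for the kubernetes.io/cluster/<infraID> tag key.
has_owner_tag() {
  # $1 = list-tags-for-resource JSON, $2 = infrastructure ID
  echo "$1" | grep -q "kubernetes.io/cluster/$2"
}

# Example using the untagged UPI output shown above (values abbreviated):
upi_tags='{"ResourceTagSet":{"Tags":[{"Key":"openshift_creationDate","Value":"2019-04-08"}]}}'
if has_owner_tag "$upi_tags" "jialiuuuu1-xbhm2"; then
  echo "zone is tagged; the ingress operator should find it"
else
  echo "missing ownership tag; apply the change-tags-for-resource workaround"
fi
```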
The infrastructure ID is not passed in as a parameter to any of the CloudFormation templates (e.g. for "jialiuuuu1-xbhm2", cluster=jialiuuuu1, infrastructure_id=xbhm2). The UPI instructions indicate a few places [1] where security group, subnet, etc. search tags need to be updated for UPI. We cannot guarantee the customer will have their DNS zone set up *exactly* like IPI, and we should try to avoid assuming that. The CloudFormation templates could set a tag, but I'd want to make sure we call out how to update the operator to find the right thing if something like the infrastructure ID is not represented. The TODO [2] was left in there specifically for this purpose.

[1]: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#option-1-dynamic-compute-using-machine-api
[2]: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#configure-router-for-upi-dns
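Until the templates set the tag themselves, the manual equivalent can be scripted. This is a hedged sketch of the workaround from the report, with the infrastructure ID and zone ID factored into variables; the zone ID is the example from this report, and the command is only echoed as a dry run so nothing is changed until `echo` is removed.

```shell
# In practice: INFRA_ID=$(jq -r .infraID metadata.json)
INFRA_ID="jialiuuuu1-xbhm2"
# Private hosted zone for the cluster domain (example ID from this report).
ZONE_ID="Z295LDR9P4ZPLI"

# The two tags the ingress operator's hosted-zone lookup expects.
owner_tag="Key=kubernetes.io/cluster/${INFRA_ID},Value=owned"
name_tag="Key=Name,Value=${INFRA_ID}-int"

# Dry run: prints the command; remove `echo` to actually apply the tags.
echo aws route53 change-tags-for-resource \
  --resource-type hostedzone \
  --resource-id "$ZONE_ID" \
  --add-tags "$owner_tag" "$name_tag"
```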
https://github.com/openshift/installer/pull/1590
The PR is merged now. I re-tested this bug with 4.0.0-0.nightly-2019-04-10-182914, and it PASSED. After applying the workaround for BZ#1697236, the *.apps DNS record is created successfully.
Some steps for verification: get the infrastructure name from metadata.json with `jq -r .infraID metadata.json`, and pass `ParameterKey=InfrastructureName,ParameterValue=${your_InfrastructureName}` when creating the 02_cluster_infra.yaml CloudFormation stack.
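The verification steps above can be sketched as a single dry-run script. The stack name is an assumption for illustration, and the template path assumes 02_cluster_infra.yaml from the UPI docs is in the current directory; remove `echo` to actually create the stack.

```shell
# In practice: INFRA_ID=$(jq -r .infraID metadata.json)
INFRA_ID="jialiuuuu1-xbhm2"

# Parameter named per the verification comment above.
param="ParameterKey=InfrastructureName,ParameterValue=${INFRA_ID}"

# Dry run: prints the create-stack command instead of running it.
echo aws cloudformation create-stack \
  --stack-name "${INFRA_ID}-cluster-infra" \
  --template-body file://02_cluster_infra.yaml \
  --parameters "$param"
```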
Per comment 3, moving this to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758