Bug 1697262

Summary: [upi-on-aws] hosted zone created by CloudFormation is missing necessary tags, leading to no *.apps DNS registration
Product: OpenShift Container Platform
Component: Installer
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: BetaBlocker
Reporter: Johnny Liu <jialiu>
Assignee: Stephen Cuppett <scuppett>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, dmace, hongli, jokerman, mmccomas, scuppett, wsun
Type: Bug
Last Closed: 2019-06-04 10:47:08 UTC

Description Johnny Liu 2019-04-08 08:36:49 UTC
Description of problem:
The private hosted zone created by the UPI CloudFormation templates lacks the tags the ingress operator uses to locate it (Name=<infraID>-int and kubernetes.io/cluster/<infraID>=owned), so the *.apps wildcard DNS record is never registered.

Version-Release number of the following components:
4.0.0-0.nightly-2019-04-05-165550

How reproducible:
Always

Steps to Reproduce:
1. Follow https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#configure-router-for-upi-dns to create a UPI cluster on AWS.

Actual results:
After installation, the cluster never reaches a healthy state.
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-05-165550   False       True          3h54m   Unable to apply 4.0.0-0.nightly-2019-04-05-165550: the cluster operator console has not yet successfully rolled out

# oc get clusteroperator
NAME                                 VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                                                           False       False         True      3h46m
console                              4.0.0-0.nightly-2019-04-05-165550   False       True          False     3h46m

Both operators fail to become healthy because of the *.apps DNS issue.

After working around BZ#1697236, there is still no *.apps DNS registration.
Check the ingress operator log:
# oc logs -f ingress-operator-7dc9b877-qzss2 -n openshift-ingress-operator
<--snip-->
2019-04-08T07:54:50.788Z	ERROR	operator.init.kubebuilder.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure clusteringress: failed to ensure DNS for default: failed to ensure DNS record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com} for openshift-ingress-operator/default: failed to find hosted zone for record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com}: no matching hosted zone found", "errorCauses": [{"error": "failed to ensure clusteringress: failed to ensure DNS for default: failed to ensure DNS record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com} for openshift-ingress-operator/default: failed to find hosted zone for record &{{ map[Name:jialiuuuu1-xbhm2-int kubernetes.io/cluster/jialiuuuu1-xbhm2:owned]} ALIAS *.apps.jialiuuuu1.qe1.devcluster.openshift.com -> a854be69459ad11e9a77f02661f80878-1072364770.us-east-2.elb.amazonaws.com}: no matching hosted zone found"}]}
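
The operator fails because it cannot find a private hosted zone carrying the tags shown in the error. A minimal sketch for inspecting the zone it should match (the domain is taken from the log above; the zone ID must be read from the first command's output, with the "/hostedzone/" prefix stripped):

# aws route53 list-hosted-zones-by-name --dns-name jialiuuuu1.qe1.devcluster.openshift.com   # find the zone Id (assumption: the private zone exists under this name)
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id <zone-id>      # then check its tags, as in "Additional info" below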


Expected results:
The *.apps DNS record should be created automatically.

Additional info:
Comparison with an IPI environment (note that the UPI zone below lacks the Name and kubernetes.io/cluster/<infraID> tags present on the IPI zone):
IPI:
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id  Z3AV2FDEBVHLZN
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone", 
        "ResourceId": "Z3AV2FDEBVHLZN", 
        "Tags": [
            {
                "Value": "owned", 
                "Key": "kubernetes.io/cluster/qe-jialiu-jj626"
            }, 
            {
                "Value": "2019-04-08T03:15:02.479817+00:00", 
                "Key": "openshift_creationDate"
            }, 
            {
                "Value": "2019-04-10T03:15:02.479817+00:00", 
                "Key": "openshift_expirationDate"
            }, 
            {
                "Value": "qe-jialiu-jj626-int", 
                "Key": "Name"
            }
        ]
    }
}
UPI:
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id   Z295LDR9P4ZPLI
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone", 
        "ResourceId": "Z295LDR9P4ZPLI", 
        "Tags": [
            {
                "Value": "2019-04-08T03:15:02.479817+00:00", 
                "Key": "openshift_creationDate"
            }, 
            {
                "Value": "2019-04-10T03:15:02.479817+00:00", 
                "Key": "openshift_expirationDate"
            }
        ]
    }
}

Workaround:
# aws route53 change-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI --add-tags Key=kubernetes.io/cluster/jialiuuuu1-xbhm2,Value=owned Key=Name,Value=jialiuuuu1-xbhm2-int

# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone", 
        "ResourceId": "Z295LDR9P4ZPLI", 
        "Tags": [
            {
                "Value": "owned", 
                "Key": "kubernetes.io/cluster/jialiuuuu1-xbhm2"
            }, 
            {
                "Value": "2019-04-08T03:15:02.479817+00:00", 
                "Key": "openshift_creationDate"
            }, 
            {
                "Value": "owned", 
                "Key": "kubernetes.io/cluster/kubernetes.io/cluster/jialiuuuu1-xbhm2"
            }, 
            {
                "Value": "2019-04-10T03:15:02.479817+00:00", 
                "Key": "openshift_expirationDate"
            }, 
            {
                "Value": "jialiuuuu1-xbhm2-int", 
                "Key": "Name"
            }
        ]
    }
}

After waiting a few minutes, *.apps is created by the ingress operator and the cluster works.
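
For reference, a generalized form of the workaround above (a sketch, not official guidance; it assumes the installer's metadata.json is in the current directory and that the private hosted zone ID has already been looked up):

# INFRA_ID=$(jq -r .infraID metadata.json)   # e.g. jialiuuuu1-xbhm2 (assumption: installer asset dir is CWD)
# aws route53 change-tags-for-resource --resource-type hostedzone --resource-id Z295LDR9P4ZPLI \
      --add-tags Key=kubernetes.io/cluster/${INFRA_ID},Value=owned Key=Name,Value=${INFRA_ID}-int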

NOTE:
The hosted zone tag uses the infraID string in the tag name, which differs from BZ#1697236, where the cluster name string is used; ideally the two should be kept consistent.

Comment 1 Stephen Cuppett 2019-04-09 11:32:44 UTC
In all the CloudFormation templates, the infrastructure ID is not passed in as a parameter (e.g. "jialiuuuu1-xbhm2", cluster=jialiuuuu1, infrastructure_id=xbhm2). The UPI instructions indicate a few places [1] where the security group, subnet, etc. search tags need to be updated for UPI. We cannot guarantee the customer will have their DNS zone set up *exactly* like IPI, and we should try to avoid requiring that. The CloudFormation templates could set a tag, but I'd want to make sure we call out how to update the operator to find the right thing if something like the infrastructure ID is not represented. The TODO [2] was left in there specifically for this purpose.

[1]: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#option-1-dynamic-compute-using-machine-api
[2]: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#configure-router-for-upi-dns

Comment 2 Stephen Cuppett 2019-04-10 20:46:39 UTC
https://github.com/openshift/installer/pull/1590

Comment 3 Johnny Liu 2019-04-15 08:14:02 UTC
The PR is merged now. I re-tested this bug with 4.0.0-0.nightly-2019-04-10-182914, and it passed.

After applying the workaround for BZ#1697236, the *.apps DNS record is created successfully.

Comment 4 Johnny Liu 2019-04-15 09:47:11 UTC
Some steps for verification:
Get the infrastructure name from metadata.json with `jq -r .infraID metadata.json`, and pass `ParameterKey=InfrastructureName,ParameterValue=${your_InfrastructureName}` when creating the 02_cluster_infra.yaml CloudFormation stack.
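
A hedged sketch of that stack creation (the stack name here is hypothetical, and the other parameters required by 02_cluster_infra.yaml are omitted):

# INFRA_ID=$(jq -r .infraID metadata.json)
# aws cloudformation create-stack --stack-name ${INFRA_ID}-cluster-infra \
      --template-body file://02_cluster_infra.yaml \
      --parameters ParameterKey=InfrastructureName,ParameterValue=${INFRA_ID}   # plus the template's other required parameters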

Comment 6 Johnny Liu 2019-04-16 08:02:21 UTC
Per comment 3, moving this to "VERIFIED".

Comment 8 errata-xmlrpc 2019-06-04 10:47:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758