Description of problem:

Unable to install an OCP cluster when the domain name ends with a digit. The authentication ClusterOperator fails with the following error:

# oc get co
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.10.0-fc.4   False       False         True       41m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found

See the detailed message below.

Version-Release number of selected component (if applicable):

# oc version
Client Version: 4.10.0-fc.4
Server Version: 4.10.0-fc.4
Kubernetes Version: v1.23.0+d30ebbc

# oc get nodes
NAME                         STATUS   ROLES    AGE   VERSION
master-0.m3558001.ocptest1   Ready    master   38m   v1.23.0+d30ebbc
master-1.m3558001.ocptest1   Ready    master   38m   v1.23.0+d30ebbc
master-2.m3558001.ocptest1   Ready    master   37m   v1.23.0+d30ebbc
worker-0.m3558001.ocptest1   Ready    worker   24m   v1.23.0+d30ebbc
worker-1.m3558001.ocptest1   Ready    worker   24m   v1.23.0+d30ebbc

Also reported for OCP 4.9.

How reproducible:

Always: install an OCP cluster with a domain name that ends with a digit.

Steps to Reproduce:
1. Install an OCP cluster with a domain name that ends with a digit.

Actual results:

# oc describe co authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2022-02-02T08:39:01Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"699e9556-072a-4a5a-8899-d8809b6ff3b4"}:
      f:spec:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-02-02T08:39:01Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:extension:
        f:relatedObjects:
        f:versions:
    Manager:      Go-http-client
    Operation:    Update
    Subresource:  status
    Time:         2022-02-02T08:41:21Z
  Owner References:
    API Version:     config.openshift.io/v1
    Kind:            ClusterVersion
    Name:            version
    UID:             699e9556-072a-4a5a-8899-d8809b6ff3b4
  Resource Version:  25295
  UID:               497eda94-d826-4d18-bdf7-00785402e9fb
Spec:
Status:
  Conditions:
    Last Transition Time:  2022-02-02T08:43:23Z
    Message:               CustomRouteControllerDegraded: Ingress.config.openshift.io "cluster" is invalid: [status.componentRoutes.currentHostnames: Invalid value: "oauth-openshift.apps.m3558001.ocptest1": status.componentRoutes.currentHostnames in body must be of type hostname: "oauth-openshift.apps.m3558001.ocptest1", status.componentRoutes.defaultHostname: Invalid value: "oauth-openshift.apps.m3558001.ocptest1": status.componentRoutes.defaultHostname in body must be of type hostname: "oauth-openshift.apps.m3558001.ocptest1"]
                           OAuthServerRouteEndpointAccessibleControllerDegraded: ingress.config/cluster does not yet have status for the "openshift-authentication/oauth-openshift" route
    Reason:                CustomRouteController_SyncError::OAuthServerRouteEndpointAccessibleController_SyncError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2022-02-02T09:04:19Z
    Message:               AuthenticatorCertKeyProgressing: All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2022-02-02T08:41:21Z
    Message:               OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found
    Reason:                OAuthServerRouteEndpointAccessibleController_ResourceNotFound
    Status:                False
    Type:                  Available
    Last Transition Time:  2022-02-02T08:41:21Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      route.openshift.io
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   routes
    Group:
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   services
    Group:
    Name:       openshift-config
    Resource:   namespaces
    Group:
    Name:       openshift-config-managed
    Resource:   namespaces
    Group:
    Name:       openshift-authentication
    Resource:   namespaces
    Group:
    Name:       openshift-authentication-operator
    Resource:   namespaces
    Group:
    Name:       openshift-ingress
    Resource:   namespaces
    Group:
    Name:       openshift-oauth-apiserver
    Resource:   namespaces
  Versions:
    Name:     operator
    Version:  4.10.0-fc.4
    Name:     oauth-apiserver
    Version:  4.10.0-fc.4
    Name:     oauth-openshift
    Version:  4.10.0-fc.4_openshift
Events:  <none>

Expected results:

OCP cluster installation works with a domain name that ends with a digit. Failing that, the limitation should at least be documented.

Additional info:

All other ClusterOperators installed successfully:

# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-fc.4   False       False         True       94m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found
baremetal                                  4.10.0-fc.4   True        False         False      92m
cloud-controller-manager                   4.10.0-fc.4   True        False         False      95m
cloud-credential                           4.10.0-fc.4   True        False         False      94m
cluster-autoscaler                         4.10.0-fc.4   True        False         False      92m
config-operator                            4.10.0-fc.4   True        False         False      94m
console                                    4.10.0-fc.4   True        False         False      75m
csi-snapshot-controller                    4.10.0-fc.4   True        False         False      93m
dns                                        4.10.0-fc.4   True        False         False      92m
etcd                                       4.10.0-fc.4   True        False         False      92m
image-registry                             4.10.0-fc.4   True        False         False      82m
ingress                                    4.10.0-fc.4   True        False         False      44m
insights                                   4.10.0-fc.4   True        False         False      85m
kube-apiserver                             4.10.0-fc.4   True        False         False      81m
kube-controller-manager                    4.10.0-fc.4   True        False         False      91m
kube-scheduler                             4.10.0-fc.4   True        False         False      91m
kube-storage-version-migrator              4.10.0-fc.4   True        False         False      39m
machine-api                                4.10.0-fc.4   True        False         False      93m
machine-approver                           4.10.0-fc.4   True        False         False      92m
machine-config                             4.10.0-fc.4   True        False         False      92m
marketplace                                4.10.0-fc.4   True        False         False      92m
monitoring                                 4.10.0-fc.4   True        False         False      73m
network                                    4.10.0-fc.4   True        False         False      94m
node-tuning                                4.10.0-fc.4   True        False         False      80m
openshift-apiserver                        4.10.0-fc.4   True        False         False      88m
openshift-controller-manager               4.10.0-fc.4   True        False         False      91m
openshift-samples                          4.10.0-fc.4   True        False         False      84m
operator-lifecycle-manager                 4.10.0-fc.4   True        False         False      93m
operator-lifecycle-manager-catalog         4.10.0-fc.4   True        False         False      93m
operator-lifecycle-manager-packageserver   4.10.0-fc.4   True        False         False      84m
service-ca                                 4.10.0-fc.4   True        False         False      94m
storage                                    4.10.0-fc.4   True        False         False      94m
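To make the reproduction step above concrete, here is a small sketch (the cluster name and base domain are taken from the output above; the real derivation in the installer and operators is more involved) of how the default OAuth route host is built. A base domain whose last label ends in a digit ends up as the last label of the hostname that the authentication operator writes into the ingress.config status:

```go
package main

import "fmt"

func main() {
	// Values from the cluster in this report; any base domain whose last
	// label ends in a digit reproduces the problem.
	clusterName := "m3558001"
	baseDomain := "ocptest1"

	// The default ingress (apps) domain and the default OAuth route host are
	// derived from these values, so the trailing digit of the base domain
	// becomes the trailing character of the generated hostname.
	ingressDomain := fmt.Sprintf("apps.%s.%s", clusterName, baseDomain)
	oauthHost := "oauth-openshift." + ingressDomain

	fmt.Println(oauthHost) // oauth-openshift.apps.m3558001.ocptest1
}
```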
Also seen in 4.9.15
It looks like there is a discrepancy in hostname validation between what the installer allows and what the CRD validation considers a "hostname". The CRD validation for fields marked `Format=hostname`, such as `currentHostname` here (ref. https://github.com/openshift/api/blob/d74727069f6fb9193d6ba50ed7e3a5edb2477bec/config/v1/types_ingress.go#L95 and https://github.com/openshift/api/blob/d74727069f6fb9193d6ba50ed7e3a5edb2477bec/config/v1/types_ingress.go#L179), is ultimately handled by the following regex: https://github.com/kubernetes/kube-openapi/blob/424119656bbfd8b633f4b9f9ef5f93cd1e01266a/pkg/validation/strfmt/default.go#L58. As you can see, the regular expression does not allow a digit to be the last symbol of the hostname. I am going to move this BZ to the Routing component, since they own the ingress.config object, so they can determine what they want to do with it. They may also have better-informed opinions about what a hostname should look like.
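For illustration, here is a minimal sketch of that behavior. The regex below is a simplified stand-in for the strfmt hostname pattern linked above, not a verbatim copy (the default.go line above is the authoritative source), but it captures the relevant restriction: the final DNS label may only contain letters, so a hostname derived from a cluster domain that ends in a digit is rejected:

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified stand-in for the kube-openapi strfmt "hostname" format check:
// dot-separated DNS labels, where the final label (the TLD) is restricted to
// 2-63 letters and therefore may not contain or end with a digit.
var hostnameRE = regexp.MustCompile(`^([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$`)

func main() {
	for _, host := range []string{
		"oauth-openshift.apps.m3558001.ocptest1", // cluster domain ends in a digit -> rejected
		"oauth-openshift.apps.m3558001.ocptest",  // same name without the trailing digit -> accepted
	} {
		fmt.Printf("%-40s valid=%t\n", host, hostnameRE.MatchString(host))
	}
}
```

Note that this is stricter than the RFC 1123 DNS-subdomain validation used elsewhere in Kubernetes, which does allow labels that end with a digit; that difference is the discrepancy with what the installer accepts.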
We have a report of a similar problem in bug 2039256. In that earlier report, the reporter is trying to customize routes to use a TLD that contains a digit, and the update to the ingress.config object's spec fails, which causes the user's customization to fail. In this new report, the reporter is trying to install a cluster with a TLD that contains a digit; operators put the default routes' host names in the ingress.config object's status, and the update to that status fails, which causes the operators to go degraded.

Both reports have the same underlying cause: validation related to route customization in the ingress.config object uses `+kubebuilder:validation:Format=hostname`, which restricts TLDs from including digits. For spec, this restriction is a limitation of the new customizable-route feature. For status, however, it breaks functionality that presumably used to work before the enhancement was implemented (namely, using a cluster domain whose TLD contains digits), which makes the issue in this new report a regression. As I understand it, the problem was introduced in OpenShift 4.8.0 with the custom route configuration enhancement (<https://github.com/openshift/enhancements/blob/master/enhancements/ingress/custom-route-configuration.md>).

We need to fix the issue in 4.11.0, and we can backport fixes to z-streams if required. Because the issue is in a shipped release, it doesn't make sense to block 4.10.0 or any z-stream releases on it, so I am setting blocker-.
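To illustrate the mechanism (a sketch with simplified type and field names, not a verbatim copy of openshift/api): a `+kubebuilder:validation:Format=hostname` marker is rendered as `format: hostname` in the generated CRD schema, and the apiserver then validates every value of that field, whether it appears in spec or in status, against the strfmt hostname regex discussed above.

```go
// Package configexample is an illustrative sketch only; it is not the real
// openshift/api package, and the names here are simplified.
package configexample

// Hostname is a host name used by component routes. The kubebuilder marker
// below becomes "format: hostname" in the generated CRD schema, so the
// apiserver rejects any value whose final label contains a digit.
// +kubebuilder:validation:Format=hostname
type Hostname string

// ComponentRouteSpec is where a user requests a custom hostname for a
// component route (the bug 2039256 case).
type ComponentRouteSpec struct {
	Hostname Hostname `json:"hostname"`
}

// ComponentRouteStatus is where operators publish the default and current
// hostnames for a component route (the case in this report: the status write
// fails when the cluster domain's last label contains a digit).
type ComponentRouteStatus struct {
	DefaultHostname  Hostname   `json:"defaultHostname"`
	CurrentHostnames []Hostname `json:"currentHostnames"`
}
```

Because the format applies to both the spec and status fields, any relaxation (for example, swapping the format for a more permissive validation pattern) would need to cover the status fields as well for this regression to go away.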
*** This bug has been marked as a duplicate of bug 2039256 ***