Bug 2049473 - Authentication Operator fails during installation when domain name ends with a digit
Summary: Authentication Operator fails during installation when domain name ends with a digit
Keywords:
Status: CLOSED DUPLICATE of bug 2039256
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: aos-network-edge-staff
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: ocp-49-z-tracker 2009709
Reported: 2022-02-02 10:17 UTC by Stefan Orth
Modified: 2022-08-04 22:35 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-03 15:23:43 UTC
Target Upstream Version:
Embargoed:



Description Stefan Orth 2022-02-02 10:17:55 UTC
Description of problem:

Unable to install an OCP cluster when the domain name ends with a digit. The authentication ClusterOperator fails with the following error:

# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-fc.4   False       False         True       41m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found

See detailed message below.


Version-Release number of selected component (if applicable):

# oc version
Client Version: 4.10.0-fc.4
Server Version: 4.10.0-fc.4
Kubernetes Version: v1.23.0+d30ebbc

# oc get nodes
NAME                         STATUS   ROLES    AGE   VERSION
master-0.m3558001.ocptest1   Ready    master   38m   v1.23.0+d30ebbc
master-1.m3558001.ocptest1   Ready    master   38m   v1.23.0+d30ebbc
master-2.m3558001.ocptest1   Ready    master   37m   v1.23.0+d30ebbc
worker-0.m3558001.ocptest1   Ready    worker   24m   v1.23.0+d30ebbc
worker-1.m3558001.ocptest1   Ready    worker   24m   v1.23.0+d30ebbc

Also reported for OCP 4.9.

How reproducible:

Install an OCP cluster with a domain name that ends with a digit.

Steps to Reproduce:
1. Install an OCP cluster with a domain name that ends with a digit.

Actual results:

# oc describe co authentication
Name:         authentication
Namespace:    
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2022-02-02T08:39:01Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"699e9556-072a-4a5a-8899-d8809b6ff3b4"}:
      f:spec:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-02-02T08:39:01Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:extension:
        f:relatedObjects:
        f:versions:
    Manager:      Go-http-client
    Operation:    Update
    Subresource:  status
    Time:         2022-02-02T08:41:21Z
  Owner References:
    API Version:     config.openshift.io/v1
    Kind:            ClusterVersion
    Name:            version
    UID:             699e9556-072a-4a5a-8899-d8809b6ff3b4
  Resource Version:  25295
  UID:               497eda94-d826-4d18-bdf7-00785402e9fb
Spec:
Status:
  Conditions:
    Last Transition Time:  2022-02-02T08:43:23Z
    Message:               CustomRouteControllerDegraded: Ingress.config.openshift.io "cluster" is invalid: [status.componentRoutes.currentHostnames: Invalid value: "oauth-openshift.apps.m3558001.ocptest1": status.componentRoutes.currentHostnames in body must be of type hostname: "oauth-openshift.apps.m3558001.ocptest1", status.componentRoutes.defaultHostname: Invalid value: "oauth-openshift.apps.m3558001.ocptest1": status.componentRoutes.defaultHostname in body must be of type hostname: "oauth-openshift.apps.m3558001.ocptest1"]
OAuthServerRouteEndpointAccessibleControllerDegraded: ingress.config/cluster does not yet have status for the "openshift-authentication/oauth-openshift" route
    Reason:                CustomRouteController_SyncError::OAuthServerRouteEndpointAccessibleController_SyncError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2022-02-02T09:04:19Z
    Message:               AuthenticatorCertKeyProgressing: All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2022-02-02T08:41:21Z
    Message:               OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found
    Reason:                OAuthServerRouteEndpointAccessibleController_ResourceNotFound
    Status:                False
    Type:                  Available
    Last Transition Time:  2022-02-02T08:41:21Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      route.openshift.io
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   routes
    Group:      
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   services
    Group:      
    Name:       openshift-config
    Resource:   namespaces
    Group:      
    Name:       openshift-config-managed
    Resource:   namespaces
    Group:      
    Name:       openshift-authentication
    Resource:   namespaces
    Group:      
    Name:       openshift-authentication-operator
    Resource:   namespaces
    Group:      
    Name:       openshift-ingress
    Resource:   namespaces
    Group:      
    Name:       openshift-oauth-apiserver
    Resource:   namespaces
  Versions:
    Name:     operator
    Version:  4.10.0-fc.4
    Name:     oauth-apiserver
    Version:  4.10.0-fc.4
    Name:     oauth-openshift
    Version:  4.10.0-fc.4_openshift
Events:       <none>


Expected results:

The OCP cluster installs successfully with a domain name that ends with a digit. If that cannot be supported, the limitation should at least be documented as a workaround.

Additional info:

All other cluster operators installed successfully:

# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-fc.4   False       False         True       94m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found
baremetal                                  4.10.0-fc.4   True        False         False      92m     
cloud-controller-manager                   4.10.0-fc.4   True        False         False      95m     
cloud-credential                           4.10.0-fc.4   True        False         False      94m     
cluster-autoscaler                         4.10.0-fc.4   True        False         False      92m     
config-operator                            4.10.0-fc.4   True        False         False      94m     
console                                    4.10.0-fc.4   True        False         False      75m     
csi-snapshot-controller                    4.10.0-fc.4   True        False         False      93m     
dns                                        4.10.0-fc.4   True        False         False      92m     
etcd                                       4.10.0-fc.4   True        False         False      92m     
image-registry                             4.10.0-fc.4   True        False         False      82m     
ingress                                    4.10.0-fc.4   True        False         False      44m     
insights                                   4.10.0-fc.4   True        False         False      85m     
kube-apiserver                             4.10.0-fc.4   True        False         False      81m     
kube-controller-manager                    4.10.0-fc.4   True        False         False      91m     
kube-scheduler                             4.10.0-fc.4   True        False         False      91m     
kube-storage-version-migrator              4.10.0-fc.4   True        False         False      39m     
machine-api                                4.10.0-fc.4   True        False         False      93m     
machine-approver                           4.10.0-fc.4   True        False         False      92m     
machine-config                             4.10.0-fc.4   True        False         False      92m     
marketplace                                4.10.0-fc.4   True        False         False      92m     
monitoring                                 4.10.0-fc.4   True        False         False      73m     
network                                    4.10.0-fc.4   True        False         False      94m     
node-tuning                                4.10.0-fc.4   True        False         False      80m     
openshift-apiserver                        4.10.0-fc.4   True        False         False      88m     
openshift-controller-manager               4.10.0-fc.4   True        False         False      91m     
openshift-samples                          4.10.0-fc.4   True        False         False      84m     
operator-lifecycle-manager                 4.10.0-fc.4   True        False         False      93m     
operator-lifecycle-manager-catalog         4.10.0-fc.4   True        False         False      93m     
operator-lifecycle-manager-packageserver   4.10.0-fc.4   True        False         False      84m     
service-ca                                 4.10.0-fc.4   True        False         False      94m     
storage                                    4.10.0-fc.4   True        False         False      94m

Comment 1 Stefan Orth 2022-02-02 10:27:17 UTC
Also seen in 4.9.15

Comment 2 Standa Laznicka 2022-02-02 13:22:44 UTC
It looks like there is a discrepancy in hostname validation between what the installer allows and what the CRD validation considers a "hostname".

The CRD validation for fields marked with `Format=hostname`, such as "currentHostnames" here (ref. https://github.com/openshift/api/blob/d74727069f6fb9193d6ba50ed7e3a5edb2477bec/config/v1/types_ingress.go#L95 https://github.com/openshift/api/blob/d74727069f6fb9193d6ba50ed7e3a5edb2477bec/config/v1/types_ingress.go#L179), is ultimately handled by the following regex: https://github.com/kubernetes/kube-openapi/blob/424119656bbfd8b633f4b9f9ef5f93cd1e01266a/pkg/validation/strfmt/default.go#L58. As you can see, the regular expression does not allow a digit to be the last character.
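The effect of that rule on the hostnames from this report can be sketched in Go. This is a minimal, ASCII-only approximation that assumes only the rule discussed here (the final label, i.e. the TLD, must consist of letters only); the pattern and the `isHostnameLike` helper are hypothetical and are not the exact kube-openapi regex, which differs in other details.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostnameLike is a hypothetical, simplified approximation of the
// Format=hostname check. Each non-final label must start and end with an
// alphanumeric character (hyphens allowed inside); the final label (TLD)
// must be 2-63 ASCII letters, so a TLD such as "ocptest1" is rejected.
// It only accepts multi-label names and is NOT the exact kube-openapi regex.
var hostnameLike = regexp.MustCompile(
	`^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$`)

// isHostnameLike reports whether s matches the simplified pattern.
func isHostnameLike(s string) bool {
	return hostnameLike.MatchString(s)
}

func main() {
	for _, host := range []string{
		"oauth-openshift.apps.m3558001.ocptest1", // hostname from this report
		"oauth-openshift.apps.m3558001.ocptest",  // same name, TLD without the digit
	} {
		fmt.Printf("%-40s valid=%v\n", host, isHostnameLike(host))
	}
}
```

Running it prints `valid=false` for the hostname from this report and `valid=true` once the trailing digit is removed from the TLD, which matches the "must be of type hostname" errors in the authentication operator status.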

I am moving this BZ to the Routing component, since they own the ingress.config object and can decide what to do with it. They may also have better opinions about what a hostname should look like.

Comment 3 Miciah Dashiel Butler Masters 2022-02-02 14:13:30 UTC
We have a report of a similar problem in bug 2039256. In that earlier report, the reporter tries to customize routes to use a TLD that contains a digit; updating the ingress.config object's spec fails, so the user's customization fails.

In this new report, the reporter tries to install a cluster whose TLD contains a digit. Operators write the default routes' host names to the ingress.config object's status, updating this status fails, and the operators go Degraded.

Both reports have a similar cause: validation related to route customization in the ingress.config object uses `+kubebuilder:validation:Format=hostname`, which forbids digits in the TLD. For spec, this restriction limits the use of the new customizable route feature. For status, though, it breaks functionality that presumably worked before the enhancement was implemented (namely, using a TLD with digits for the cluster domain). That makes the issue in this new report a regression.

As I understand it, the problem was introduced in OpenShift 4.8.0 with the custom route configuration enhancement (<https://github.com/openshift/enhancements/blob/master/enhancements/ingress/custom-route-configuration.md>).  We need to fix the issue in 4.11.0, and we can backport fixes in z-streams if required.  Because the issue is in a shipped release, it doesn't make sense to block 4.10.0 or any z-stream releases for this issue, so I am setting blocker-.

Comment 4 Candace Holman 2022-02-03 15:23:43 UTC

*** This bug has been marked as a duplicate of bug 2039256 ***

