Bug 1962502

Summary: The route generated from ingress is still admitted after updating the spec.ingressClassName to mismatch
Product: OpenShift Container Platform Reporter: Hongan Li <hongli>
Component: Networking Assignee: Suleyman Akbas <sakbas>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: low CC: jaldinge, mmasters, sakbas
Version: 4.8   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
* Previously, a cluster that was upgraded from a version of {product-title} earlier than 4.8 could have `orphaned` `Route` objects. This was caused by earlier versions of {product-title} translating `Ingress` objects into `Route` objects irrespective of a given `Ingress` object’s indicated `IngressClass`. With this update, an alert notifies administrators about any `orphaned` `Route` objects and about any `Ingress` objects that do not specify an `IngressClass`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1962502[*BZ#1962502*])
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:46:24 UTC Type: Bug

Description Hongan Li 2021-05-20 08:38:13 UTC
Description of problem:
The route generated from an ingress is still admitted after spec.ingressClassName is updated to a value that does not match.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-05-19-123944

How reproducible:
100%

Steps to Reproduce:
1. create pod/svc in your namespace/project
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/web-server-rc.yaml

2. create ingress resource with spec.ingressClassName option 
e.g.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: k8s-ingress-test
spec:
  ingressClassName: openshift-test
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          service:
            name: service-unsecure
            port:
              number: 27017
        pathType: ImplementationSpecific

3. update the spec.ingressClassName to openshift-default
$ oc patch ingress/k8s-ingress-test --type=merge --patch '{"spec":{"ingressClassName":"openshift-default"}}'

4. update the spec.ingressClassName back to openshift-test
$ oc patch ingress/k8s-ingress-test --type=merge --patch '{"spec":{"ingressClassName":"openshift-test"}}'


Actual results:
step 2: no route is generated from the ingress
step 3: a route named k8s-ingress-test-5s4gn is created
step 4: the route named k8s-ingress-test-5s4gn is still there and admitted

Expected results:
step 4: the route generated from ingress should be removed since the ingressClassName is mismatched.

Additional info:
workaround: deleting the route manually
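
To apply the workaround, one first has to find which routes were generated from the ingress. The following is a minimal sketch, assuming the generated route carries an ownerReferences entry of kind Ingress that names the ingress, and that the input is shaped like the items of `oc get routes -o json`. The function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def routes_owned_by_ingress(routes: List[Dict[str, Any]], ingress_name: str) -> List[str]:
    """Return the names of Routes whose ownerReferences point at the given Ingress."""
    owned = []
    for route in routes:
        meta = route.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            # Assumption: generated routes reference their Ingress by kind and name.
            if ref.get("kind") == "Ingress" and ref.get("name") == ingress_name:
                owned.append(meta.get("name", ""))
                break
    return owned


# Hypothetical sample: one generated route and one manually created route.
sample_routes = [
    {
        "metadata": {
            "name": "k8s-ingress-test-5s4gn",
            "ownerReferences": [{"kind": "Ingress", "name": "k8s-ingress-test"}],
        }
    },
    {"metadata": {"name": "manual-route"}},
]

print(routes_owned_by_ingress(sample_routes, "k8s-ingress-test"))
```

The names returned are the candidates to delete manually; the manually created route is not reported because it has no Ingress owner.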

Comment 1 Miciah Dashiel Butler Masters 2021-05-21 04:14:31 UTC
> step 4: the route named k8s-ingress-test-5s4gn is still there and admitted

In a way, OpenShift is actually working as designed.  The behavior is related to a decision regarding how we should handle ingresses that were made with previous versions of OpenShift:

    Finally, it is possible that a user could have created an Ingress with
    some nonempty value for spec.ingressClassName that did not match an
    OpenShift IngressClass object, and nevertheless intended for OpenShift
    to expose this Ingress.  Again, it is impossible to determine reliably
    what a user's intent was in such a scenario, but as OpenShift exposed
    such an Ingress before this enhancement, changing this behavior could
    break existing applications.
    
    To mitigate this last risk, the ingress-to-route controller does not
    remove Routes that earlier versions of OpenShift created for Ingresses
    that specify spec.ingressClassName.  Thus these Routes will continue to
    be in effect.  However, after this enhancement, OpenShift does not update
    such Routes and does not recreate them if the user deletes them.  As
    follow-up work to this enhancement, we are considering adding alerts in
    case any Routes existed in this state, so that the administrator would
    know that the Routes needed to be deleted, or the Ingress modified to
    specify an appropriate IngressClass so that OpenShift would once again
    reconcile the Routes.

https://github.com/openshift/enhancements/blob/master/enhancements/ingress/transition-ingress-from-beta-to-stable.md#risks-and-mitigations
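
The behavior quoted above can be sketched as a small decision function. This is an illustration only, not the controller's code: the function name, the action labels, and the `managed_classes` set are hypothetical, and treating an ingress with no spec.ingressClassName as managed is an assumption that the quoted text does not state.

```python
from typing import Optional, Set


def route_action(class_name: Optional[str], managed_classes: Set[str], route_exists: bool) -> str:
    """Describe what happens to the Route of an Ingress under the quoted rules."""
    # Assumption: an unset class name is treated as managed by OpenShift.
    if not class_name or class_name in managed_classes:
        return "update" if route_exists else "create"
    # Nonempty class that matches no OpenShift IngressClass: the Ingress is
    # unmanaged, so an existing Route is neither updated nor removed, and a
    # missing Route is not (re)created.
    return "leave-as-is" if route_exists else "none"


managed = {"openshift-default"}

# Steps 2 to 4 of the bug description:
print(route_action("openshift-test", managed, route_exists=False))     # step 2
print(route_action("openshift-default", managed, route_exists=False))  # step 3
print(route_action("openshift-test", managed, route_exists=True))      # step 4
```

Under these assumptions, step 4 yields "leave-as-is", which matches the observed result that the route stays admitted.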

    As follow-up work, we are considering modifying the ingress operator
    to list all Ingresses and Routes in the cluster and publish a metric
    for Routes that were created for Ingresses that OpenShift no longer
    manages.  This metric could be used in alerting rules.  The following
    alerting rules would be added to the ingress operator (see "Risks and
    Mitigations" for more context as to the purpose of these alerts):
    
    * An alert for Routes that were created from Ingresses that OpenShift
      is no longer managing.  

    * An alert for Ingresses older than 1 day that do not specify
      spec.ingressClassName.

https://github.com/openshift/enhancements/blob/master/enhancements/ingress/transition-ingress-from-beta-to-stable.md#implementation-details
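
The listing and counting described in the proposal can be sketched as follows. This is a rough illustration, not the ingress operator's implementation: the function name, the `managed_classes` set, and the sample objects are hypothetical, the "older than 1 day" condition is omitted, and the sketch returns plain counts rather than publishing metrics.

```python
from typing import Any, Dict, List, Set, Tuple

Obj = Dict[str, Any]


def summarize(ingresses: List[Obj], routes: List[Obj], managed_classes: Set[str]) -> Tuple[int, int]:
    """Return (ingresses without a class name, routes owned by unmanaged ingresses)."""
    without_class = 0
    unmanaged: Set[Tuple[str, str]] = set()
    for ing in ingresses:
        meta = ing.get("metadata", {})
        class_name = ing.get("spec", {}).get("ingressClassName")
        if not class_name:
            without_class += 1
        elif class_name not in managed_classes:
            # Nonempty class that OpenShift does not manage.
            unmanaged.add((meta.get("namespace", ""), meta.get("name", "")))

    orphaned = 0
    for route in routes:
        meta = route.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            # An owner lives in the same namespace as the object it owns.
            key = (meta.get("namespace", ""), ref.get("name", ""))
            if ref.get("kind") == "Ingress" and key in unmanaged:
                orphaned += 1
                break
    return without_class, orphaned


# Hypothetical sample data.
sample_ingresses = [
    {"metadata": {"namespace": "test", "name": "k8s-ingress-test"},
     "spec": {"ingressClassName": "openshift-test"}},
    {"metadata": {"namespace": "test", "name": "no-class"}, "spec": {}},
]
sample_routes = [
    {"metadata": {"namespace": "test", "name": "k8s-ingress-test-5s4gn",
                  "ownerReferences": [{"kind": "Ingress", "name": "k8s-ingress-test"}]}},
]

print(summarize(sample_ingresses, sample_routes, {"openshift-default"}))
```

Each of the two counts corresponds to one of the two alerts listed above.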

The scenario that you describe is similar, in that it relates to what OpenShift should do with a route that was created for an ingress when the ingress specifies a nonexistent ingressclass.  

Granted, the scenario that you describe isn't quite the same as the scenario addressed in the enhancement proposal, and we *could* modify the ingress operator to remove routes for ingresses associated with an ingressclass when that ingressclass were deleted.  However, I'm still concerned that automatically performing such a destructive action would be risky.  It makes sense that deleting an ingress, route, or ingresscontroller disrupts traffic, but it seems less obvious that deleting an ingressclass would do the same.  

In my opinion, it thus makes sense to resolve this BZ by implementing the alert described in the enhancement proposal.  The result would be that after Step 4 in the BZ description, OpenShift would raise an alert indicating that an ingress that specified a nonexistent ingressclass existed.  Does this seem like a reasonable solution?

Comment 2 Hongan Li 2021-05-21 09:29:04 UTC
Thank you for your detailed explanation, Miciah. 

Yes, I think implementing the alert is a reasonable solution.

Comment 4 Arvind iyengar 2022-09-06 08:13:48 UTC
Verified with the "4.12.0-0.ci.test-2022-09-06-043444-ci-ln-b6zz2d2-latest" image, built with the patch via clusterbot. With this payload, the newly added "openshift_ingress_to_route_controller_ingress_without_class_name" and "openshift_ingress_to_route_controller_route_with_unmanaged_owner" metrics are visible in the metric queries, and they appear to be updated correctly for the respective states of ingresses that have the ingressclass unset or set to a mismatching value [Reference images attached].

Comment 7 Arvind iyengar 2022-09-20 07:59:25 UTC
With the inclusion of PR#823, the "openshift_ingress_to_route_controller_ingress_without_class_name" metric is observed to increment and decrement as expected.

Comment 12 Hongan Li 2022-11-08 06:50:36 UTC
Verified with 4.12.0-0.nightly-2022-11-07-181244 and passed.

We can see the alert message in the web UI:
"Route k8s-ingress-test-mns6d is owned by an unmanaged Ingress."

The ingress can also be queried with the metric "openshift_ingress_to_route_controller_ingress_without_class_name".

Comment 15 errata-xmlrpc 2023-01-17 19:46:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399