Bug 1686204 - Restoring the default ingress controller requires operator restart
Summary: Restoring the default ingress controller requires operator restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-06 22:34 UTC by Dan Mace
Modified: 2019-06-04 10:45 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:45:17 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:45:33 UTC
Github openshift cluster-ingress-operator pull 172 None None None 2019-03-18 17:20:30 UTC

Description Dan Mace 2019-03-06 22:34:20 UTC
Description of problem:

Restoring the default ingress controller after deletion requires restarting the ingress operator. The ingress operator should recreate the default without user intervention or operator downtime.
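The fix (tracked in the linked pull request) amounts to a level-driven reconcile: on every pass, the operator should recreate the default ClusterIngress if it is missing, rather than only creating it at startup. A minimal sketch of that pattern, using an in-memory dict as a stand-in for the cluster API (names and the manifest stub are illustrative, not the actual operator code):

```python
# Hedged sketch of the level-driven "ensure default exists" pattern the
# ingress operator is expected to follow. The `store` dict stands in for
# the cluster API; default_clusteringress() stands in for the manifest in
# assets/defaults/cluster-ingress.yaml. All names here are illustrative.

DEFAULT_NAME = "default"

def default_clusteringress():
    # Minimal stand-in for the default ClusterIngress manifest.
    return {"name": DEFAULT_NAME, "namespace": "openshift-ingress-operator"}

def reconcile(store):
    """Ensure the default ClusterIngress exists.

    Returns True if it had to be recreated, False if it already existed.
    Because this runs on every reconcile pass, deleting the resource is
    healed automatically -- no operator restart required.
    """
    if DEFAULT_NAME not in store:
        store[DEFAULT_NAME] = default_clusteringress()
        return True
    return False

if __name__ == "__main__":
    cluster = {DEFAULT_NAME: default_clusteringress()}
    del cluster[DEFAULT_NAME]     # simulates: oc delete clusteringresses/default
    assert reconcile(cluster)     # next pass recreates it
    assert DEFAULT_NAME in cluster
```

The pre-fix behavior corresponds to running the creation step only once at operator startup, which is why a restart was needed to restore the resource.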

Version-Release number of selected component (if applicable):


How reproducible:

oc delete -n openshift-ingress-operator clusteringresses/default


Actual results:

clusteringresses/default is only recreated if the ingress operator is restarted.

Expected results:

The default ingress controller should be automatically recreated.

Additional info:

Comment 1 Daneyon Hansen 2019-03-07 00:39:14 UTC
I was able to delete and recreate the default ingress controller (aka clusteringress) without restarting the cluster-ingress-operator:

$ oc delete clusteringress/default -n openshift-ingress-operator
clusteringress.ingress.openshift.io "default" deleted

$ oc get clusteringresses -n openshift-ingress-operator
No resources found.

$ oc get deploy -n openshift-ingress
No resources found.

$ oc get svc -n openshift-ingress
No resources found.
 
$ oc create -f assets/defaults/cluster-ingress.yaml 
clusteringress.ingress.openshift.io/default created

$ oc get clusteringresses -n openshift-ingress-operator
NAME      AGE
default   56s

$ oc get deploy -n openshift-ingress
NAME             READY     UP-TO-DATE   AVAILABLE   AGE
router-default   2/2       2            2           61s

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                      AGE
router-default            LoadBalancer   172.30.197.85    aa41d9d6c406e11e9bd5a0e63434e58b-1280454278.us-east-1.elb.amazonaws.com   80:30794/TCP,443:32084/TCP   65s
router-internal-default   ClusterIP      172.30.139.207   <none>                                                                    80/TCP,443/TCP,1936/TCP      65s

I verified before/after ingress connectivity by accessing the web console using the hostname from the console route. Keep in mind that propagating DNS names from authoritative name servers to resolvers can take several minutes:

$ dig console-openshift-console.apps.danehans.devcluster.openshift.com

; <<>> DiG 9.10.6 <<>> console-openshift-console.apps.danehans.devcluster.openshift.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12580
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;console-openshift-console.apps.danehans.devcluster.openshift.com. IN A

;; ANSWER SECTION:
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 34.199.157.5
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 18.214.218.55

;; Query time: 42 msec
;; SERVER: 10.192.20.245#53(10.192.20.245)
;; WHEN: Wed Mar 06 19:26:40 EST 2019
;; MSG SIZE  rcvd: 125


For the "The ingress operator should recreate the default without user intervention" part of the bug, it sounds like this ingress controller should be named "mandatory" or "required" instead of "default".

Comment 4 Hongan Li 2019-03-21 06:10:18 UTC
Verified with 4.0.0-0.nightly-2019-03-20-153904; the issue has been fixed. The ingresscontroller/default is recreated automatically after deleting it.


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf558bd7f-hj5cm   1/1     Running   0          4h48m
router-default-7cf558bd7f-r55hj   1/1     Running   0          4h48m
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.110.219   a74971e0e4b7511e9a8eb0a21cd44590-1114946490.ap-northeast-1.elb.amazonaws.com   80:32276/TCP,443:31791/TCP   4h50m
router-internal-default   ClusterIP      172.30.180.114   <none>                                                                         80/TCP,443/TCP,1936/TCP      4h50m


$ oc delete -n openshift-ingress-operator ingresscontroller/default


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS              RESTARTS   AGE
router-default-7cf558bd7f-rj46k   0/1     ContainerCreating   0          2s
router-default-7cf558bd7f-sp8fh   0/1     ContainerCreating   0          2s
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.129.155   a4123fbf24b9e11e980c9069c412f6e1-1870353468.ap-northeast-1.elb.amazonaws.com   80:31146/TCP,443:31551/TCP   65s
router-internal-default   ClusterIP      172.30.145.212   <none>                                                                         80/TCP,443/TCP,1936/TCP      65s

As mentioned in Comment 1, since the LB changed and DNS propagation needs some time, routes cannot be accessed during that window.
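Recreating the ingresscontroller provisions a new ELB with a new hostname, so any automation should wait for the route hostname to resolve before declaring success. A hedged helper sketch (hostname, timings, and the injectable `resolver` parameter are illustrative; the default resolver is the standard library's `socket.gethostbyname`):

```python
# Hedged sketch: poll until a route hostname resolves after the LB changes.
# `resolver` is injectable so the wait logic can be exercised without real DNS.
import socket
import time

def wait_for_dns(hostname, timeout=600.0, interval=10.0,
                 resolver=socket.gethostbyname):
    """Poll until `hostname` resolves; return its address.

    Raises TimeoutError if the name does not resolve within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return resolver(hostname)
        except OSError:
            # NXDOMAIN/resolution failure: retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{hostname} did not resolve within {timeout}s")
            time.sleep(interval)
```

For example, `wait_for_dns("console-openshift-console.apps.<cluster-domain>")` would block until the new LB record propagates to the local resolver.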

Comment 6 errata-xmlrpc 2019-06-04 10:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

