Bug 1686204 - Restoring the default ingress controller requires operator restart
Summary: Restoring the default ingress controller requires operator restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-06 22:34 UTC by Dan Mace
Modified: 2019-06-04 10:45 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:45:17 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:45:33 UTC
Github openshift cluster-ingress-operator pull 172 None None None 2019-03-18 17:20:30 UTC

Description Dan Mace 2019-03-06 22:34:20 UTC
Description of problem:

Restoring the default ingress controller after deletion requires restarting the ingress operator. The ingress operator should recreate the default without user intervention or operator downtime.
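The fix (tracked in the linked pull request) amounts to a level-driven reconcile: on every pass, the operator should recreate the default ClusterIngress if it is missing, rather than only creating it at startup. A minimal sketch of that pattern, using an in-memory dict as a stand-in for the cluster API (names and the manifest stub are illustrative, not the actual operator code):

```python
# Hedged sketch of the level-driven "ensure default exists" pattern the
# ingress operator is expected to follow. The `store` dict stands in for
# the cluster API; default_clusteringress() stands in for the manifest in
# assets/defaults/cluster-ingress.yaml. All names here are illustrative.

DEFAULT_NAME = "default"

def default_clusteringress():
    # Minimal stand-in for the default ClusterIngress manifest.
    return {"name": DEFAULT_NAME, "namespace": "openshift-ingress-operator"}

def reconcile(store):
    """Ensure the default ClusterIngress exists.

    Returns True if it had to be recreated, False if it already existed.
    Because this runs on every reconcile pass, deleting the resource is
    healed automatically -- no operator restart required.
    """
    if DEFAULT_NAME not in store:
        store[DEFAULT_NAME] = default_clusteringress()
        return True
    return False

if __name__ == "__main__":
    cluster = {DEFAULT_NAME: default_clusteringress()}
    del cluster[DEFAULT_NAME]     # simulates: oc delete clusteringresses/default
    assert reconcile(cluster)     # next pass recreates it
    assert DEFAULT_NAME in cluster
```

The pre-fix behavior corresponds to running the creation step only once at operator startup, which is why a restart was needed to restore the resource.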

Version-Release number of selected component (if applicable):


How reproducible:

oc delete -n openshift-ingress-operator clusteringresses/default


Actual results:

clusteringresses/default is only recreated if the ingress operator is restarted.

Expected results:

The default ingress controller should be automatically recreated.

Additional info:

Comment 1 Daneyon Hansen 2019-03-07 00:39:14 UTC
I was able to delete and recreate the default ingress controller (aka clusteringress) without restarting the cluster-ingress-operator:

$ oc delete clusteringress/default -n openshift-ingress-operator
clusteringress.ingress.openshift.io "default" deleted

$ oc get clusteringresses -n openshift-ingress-operator
No resources found.

$ oc get deploy -n openshift-ingress
No resources found.

$ oc get svc -n openshift-ingress
No resources found.
 
$ oc create -f assets/defaults/cluster-ingress.yaml 
clusteringress.ingress.openshift.io/default created

$ oc get clusteringresses -n openshift-ingress-operator
NAME      AGE
default   56s

$ oc get deploy -n openshift-ingress
NAME             READY     UP-TO-DATE   AVAILABLE   AGE
router-default   2/2       2            2           61s

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                      AGE
router-default            LoadBalancer   172.30.197.85    aa41d9d6c406e11e9bd5a0e63434e58b-1280454278.us-east-1.elb.amazonaws.com   80:30794/TCP,443:32084/TCP   65s
router-internal-default   ClusterIP      172.30.139.207   <none>                                                                    80/TCP,443/TCP,1936/TCP      65s

I verified before/after ingress connectivity by accessing the web console using the hostname from the console route. Keep in mind that propagating DNS names from authoritative name servers to resolvers can take several minutes:

$ dig console-openshift-console.apps.danehans.devcluster.openshift.com

; <<>> DiG 9.10.6 <<>> console-openshift-console.apps.danehans.devcluster.openshift.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12580
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;console-openshift-console.apps.danehans.devcluster.openshift.com. IN A

;; ANSWER SECTION:
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 34.199.157.5
console-openshift-console.apps.danehans.devcluster.openshift.com. 5 IN A 18.214.218.55

;; Query time: 42 msec
;; SERVER: 10.192.20.245#53(10.192.20.245)
;; WHEN: Wed Mar 06 19:26:40 EST 2019
;; MSG SIZE  rcvd: 125


For the "The ingress operator should recreate the default without user intervention" part of the bug, it sounds like this ingress controller should be named "mandatory" or "required" instead of "default".

Comment 4 Hongan Li 2019-03-21 06:10:18 UTC
Verified with 4.0.0-0.nightly-2019-03-20-153904; the issue has been fixed. The ingresscontroller/default is recreated automatically after deleting it.


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf558bd7f-hj5cm   1/1     Running   0          4h48m
router-default-7cf558bd7f-r55hj   1/1     Running   0          4h48m
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.110.219   a74971e0e4b7511e9a8eb0a21cd44590-1114946490.ap-northeast-1.elb.amazonaws.com   80:32276/TCP,443:31791/TCP   4h50m
router-internal-default   ClusterIP      172.30.180.114   <none>                                                                         80/TCP,443/TCP,1936/TCP      4h50m


$ oc delete -n openshift-ingress-operator ingresscontroller/default


$ oc get pod -n openshift-ingress
NAME                              READY   STATUS              RESTARTS   AGE
router-default-7cf558bd7f-rj46k   0/1     ContainerCreating   0          2s
router-default-7cf558bd7f-sp8fh   0/1     ContainerCreating   0          2s
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
router-default            LoadBalancer   172.30.129.155   a4123fbf24b9e11e980c9069c412f6e1-1870353468.ap-northeast-1.elb.amazonaws.com   80:31146/TCP,443:31551/TCP   65s
router-internal-default   ClusterIP      172.30.145.212   <none>                                                                         80/TCP,443/TCP,1936/TCP      65s

As mentioned in Comment 1, since the LB changed and DNS propagation needs some time, routes cannot be accessed during that window.
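Recreating the ingresscontroller provisions a new ELB with a new hostname, so any automation should wait for the route hostname to resolve before declaring success. A hedged helper sketch (hostname, timings, and the injectable `resolver` parameter are illustrative; the default resolver is the standard library's `socket.gethostbyname`):

```python
# Hedged sketch: poll until a route hostname resolves after the LB changes.
# `resolver` is injectable so the wait logic can be exercised without real DNS.
import socket
import time

def wait_for_dns(hostname, timeout=600.0, interval=10.0,
                 resolver=socket.gethostbyname):
    """Poll until `hostname` resolves; return its address.

    Raises TimeoutError if the name does not resolve within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return resolver(hostname)
        except OSError:
            # NXDOMAIN/resolution failure: retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{hostname} did not resolve within {timeout}s")
            time.sleep(interval)
```

For example, `wait_for_dns("console-openshift-console.apps.<cluster-domain>")` would block until the new LB record propagates to the local resolver.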

Comment 6 errata-xmlrpc 2019-06-04 10:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

