Description of problem:
Kuryr uses OpenStack Octavia to create the load balancers that back OpenShift Services. Queens Octavia does not support UDP, so UDP Services cannot be created. The problem arises because the DNS service is also exposed through such a load balancer. To work around this, installations with Kuryr deploy an admission controller that modifies all pods to set the `use-vc` option in resolv.conf, forcing pods to do DNS resolution over TCP.
The insights operator seems to be started before the admission controller, so it does not get that option and fails to send reports to cloud.redhat.com due to a DNS resolution failure.
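One way to spot a pod that missed the mutation is to check whether `use-vc` made it into the pod's resolv.conf (in-cluster this would be `oc exec <pod> -- cat /etc/resolv.conf`). A minimal sketch, using a simulated resolv.conf, since the nameserver and search values here are illustrative assumptions rather than output from a real cluster:

```shell
# Simulate the resolv.conf the admission controller's mutation should produce.
# The search/nameserver values are illustrative, not taken from a real cluster.
cat > /tmp/resolv.conf <<'EOF'
search openshift-insights.svc.cluster.local svc.cluster.local cluster.local
nameserver 172.30.0.10
options ndots:5 use-vc
EOF

# A pod whose resolv.conf lacks use-vc was started before the admission
# controller and will try UDP DNS, which Queens Octavia cannot load-balance.
if grep -q 'use-vc' /tmp/resolv.conf; then
  echo "DNS over TCP enforced"
else
  echo "pod missed the use-vc mutation"
fi
```

The same grep against the live file is what distinguishes an affected insights-operator pod from a healthy one.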
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run installation with Kuryr
The insights operator has this in its logs:
I0919 07:05:05.686477 1 diskrecorder.go:303] Found files to send: [/var/lib/insights-operator/insights-2019-09-19-064437.tar.gz]
I0919 07:05:05.686615 1 insightsuploader.go:126] Uploading latest report since 0001-01-01T00:00:00Z
I0919 07:05:05.696600 1 insightsclient.go:160] Uploading application/vnd.redhat.openshift.periodic to https://cloud.redhat.com/api/ingress/v1/upload
I0919 07:05:35.696932 1 insightsclient.go:163] Unable to build a request, possible invalid token: Post https://cloud.redhat.com/api/ingress/v1/upload: dial tcp: i/o timeout
I0919 07:05:35.697032 1 insightsuploader.go:132] Unable to upload report after 30.01s: unable to build request to connect to Insights server
I0919 07:05:35.697070 1 controllerstatus.go:34] name=insightsuploader healthy=false reason=UploadFailed message=Unable to report: unable to build request to connect to Insights server
W0919 07:07:56.939684 1 configobserver.go:54] Unable to retrieve config: could not check support secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config/secrets/support: unexpected EOF
E0919 07:07:56.939724 1 status.go:302] Unable to write cluster operator status: Get https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/insights: unexpected EOF
I0919 07:07:56.959915 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.065988 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.066057 1 status.go:338] No status update necessary, objects are identical
I0919 07:10:37.057369 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:10:37.057472 1 status.go:338] No status update necessary, objects are identical
I0919 07:12:37.058380 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
Expected results: reports are uploaded smoothly, without errors.
Moved to 4.3. We can backport a fix into 4.2.z if appropriate.
Hi Michal, is this a 4.3 blocker bug?
I don't think it should be a blocker: we don't have a feasible solution in the near term, and there is a workaround: just delete the problematic pod and it will be fine once recreated.
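The delete-and-recreate workaround can be scripted. A sketch, assuming the deployment's pods carry the `app=insights-operator` label (an assumption; verify with `oc get pods --show-labels` before relying on it):

```shell
# Recreate the insights-operator pod so its replacement passes through the
# kuryr-dns-admission-controller webhook and picks up the use-vc option.
# NOTE: the app=insights-operator label selector is an assumption.
recreate_insights_pod() {
  ${OC:-oc} -n openshift-insights delete pod -l app=insights-operator
}

# Dry run without a cluster: stub `oc` with echo to show the command issued.
OC="echo oc" recreate_insights_pod
# prints: oc -n openshift-insights delete pod -l app=insights-operator
```

The ReplicaSet recreates the pod automatically; the new pod is mutated by the admission controller and resolves DNS over TCP.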
Verified in 4.4.0-0.nightly-2019-12-20-210709 on 2019-12-13.1 OSP 13 puddle.
The OCP installer finishes successfully:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-20-210709   True        False         14h     Cluster version is 4.4.0-0.nightly-2019-12-20-210709
The insights-operator pod is newer than the kuryr-dns-admission-controller pods: insights-operator is now restarted once the admission controller
is ready, so it gets configured to use TCP instead of UDP for DNS resolution.
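The ordering the fix relies on can be checked mechanically by comparing container start times. A sketch with the timestamps hard-coded from the `oc describe` output below, so it runs without a cluster (GNU `date` assumed):

```shell
# The fix works only if insights-operator (re)started after the admission
# controller pods were up. Timestamps copied from the describe output below;
# 12:22:56 is the latest start of the three admission-controller pods.
insights_started="2019-12-23 12:30:26 -0500"
admission_started="2019-12-23 12:22:56 -0500"

# Convert both to epoch seconds and compare.
if [ "$(date -d "$insights_started" +%s)" -gt "$(date -d "$admission_started" +%s)" ]; then
  echo "ordering OK: insights-operator started after the admission controller"
fi
```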
$ oc get pods -A
NAMESPACE            NAME                                   READY   STATUS    RESTARTS   AGE
openshift-insights   insights-operator-796d568d96-9jrt5     1/1     Running   2          17h
openshift-kuryr      kuryr-dns-admission-controller-bjz5b   1/1     Running   0          17h
openshift-kuryr      kuryr-dns-admission-controller-r5jmk   1/1     Running   0          17h
openshift-kuryr      kuryr-dns-admission-controller-t77dt   1/1     Running   0          17h
$ oc -n openshift-insights describe pod insights-operator-796d568d96-9jrt5 | grep Started
Started: Mon, 23 Dec 2019 12:40:06 -0500
Started: Mon, 23 Dec 2019 12:30:26 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-bjz5b | grep Started
Started: Mon, 23 Dec 2019 12:22:20 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-r5jmk | grep Started
Started: Mon, 23 Dec 2019 12:20:29 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-t77dt | grep Started
Started: Mon, 23 Dec 2019 12:22:56 -0500
The insights-operator logs are not showing any error messages either.
Adding more info:
$ oc -n openshift-insights get pod insights-operator-796d568d96-9jrt5 -o yaml | grep -A3 dnsConfig
- name: use-vc
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.