Description of problem:
Kuryr uses OpenStack Octavia to create the load balancers that back OpenShift Services. Queens Octavia does not support UDP, so UDP Services cannot be created. The problem arises because the DNS service is also exposed through such a load balancer. To work around this, installations with Kuryr deploy an admission controller that modifies all pods to set the `use-vc` option in resolv.conf, forcing pods to do DNS resolution over TCP.
The insights operator seems to be started before the admission controller, so it does not get that option and fails to send reports to cloud.redhat.com due to a DNS resolution failure.
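One way to spot a pod that missed the mutation is to check whether `use-vc` made it into the pod's resolv.conf (in-cluster this would be `oc exec <pod> -- cat /etc/resolv.conf`). A minimal sketch, using a simulated resolv.conf, since the nameserver and search values here are illustrative assumptions rather than output from a real cluster:

```shell
# Simulate the resolv.conf the admission controller's mutation should produce.
# The search/nameserver values are illustrative, not taken from a real cluster.
cat > /tmp/resolv.conf <<'EOF'
search openshift-insights.svc.cluster.local svc.cluster.local cluster.local
nameserver 172.30.0.10
options ndots:5 use-vc
EOF

# A pod whose resolv.conf lacks use-vc was started before the admission
# controller and will try UDP DNS, which Queens Octavia cannot load-balance.
if grep -q 'use-vc' /tmp/resolv.conf; then
  echo "DNS over TCP enforced"
else
  echo "pod missed the use-vc mutation"
fi
```

The same grep against the live file is what distinguishes an affected insights-operator pod from a healthy one.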
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run installation with Kuryr
The insights operator has this in its logs:
I0919 07:05:05.686477 1 diskrecorder.go:303] Found files to send: [/var/lib/insights-operator/insights-2019-09-19-064437.tar.gz]
I0919 07:05:05.686615 1 insightsuploader.go:126] Uploading latest report since 0001-01-01T00:00:00Z
I0919 07:05:05.696600 1 insightsclient.go:160] Uploading application/vnd.redhat.openshift.periodic to https://cloud.redhat.com/api/ingress/v1/upload
I0919 07:05:35.696932 1 insightsclient.go:163] Unable to build a request, possible invalid token: Post https://cloud.redhat.com/api/ingress/v1/upload: dial tcp: i/o timeout
I0919 07:05:35.697032 1 insightsuploader.go:132] Unable to upload report after 30.01s: unable to build request to connect to Insights server
I0919 07:05:35.697070 1 controllerstatus.go:34] name=insightsuploader healthy=false reason=UploadFailed message=Unable to report: unable to build request to connect to Insights server
W0919 07:07:56.939684 1 configobserver.go:54] Unable to retrieve config: could not check support secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config/secrets/support: unexpected EOF
E0919 07:07:56.939724 1 status.go:302] Unable to write cluster operator status: Get https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/insights: unexpected EOF
I0919 07:07:56.959915 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.065988 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.066057 1 status.go:338] No status update necessary, objects are identical
I0919 07:10:37.057369 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:10:37.057472 1 status.go:338] No status update necessary, objects are identical
I0919 07:12:37.058380 1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
Expected results: reports are uploaded smoothly, without errors.
Moved to 4.3. We can backport a fix into 4.2.z if appropriate.
Hi Michal, is this a 4.3 blocker bug?
I don't think it should be a blocker: we don't have a feasible solution in the near term, and there is a workaround: just delete the problematic pod and it will be fine once recreated.
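The delete-and-recreate workaround can be scripted. A sketch, assuming the deployment's pods carry the `app=insights-operator` label (an assumption; verify with `oc get pods --show-labels` before relying on it):

```shell
# Recreate the insights-operator pod so its replacement passes through the
# kuryr-dns-admission-controller webhook and picks up the use-vc option.
# NOTE: the app=insights-operator label selector is an assumption.
recreate_insights_pod() {
  ${OC:-oc} -n openshift-insights delete pod -l app=insights-operator
}

# Dry run without a cluster: stub `oc` with echo to show the command issued.
OC="echo oc" recreate_insights_pod
# prints: oc -n openshift-insights delete pod -l app=insights-operator
```

The ReplicaSet recreates the pod automatically; the new pod is mutated by the admission controller and resolves DNS over TCP.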
Verified in 4.4.0-0.nightly-2019-12-20-210709 on 2019-12-13.1 OSP 13 puddle.
The OCP installer finishes successfully:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-20-210709   True        False         14h     Cluster version is 4.4.0-0.nightly-2019-12-20-210709
The insights-operator pod is newer than the kuryr-dns-admission-controller pods: insights-operator is now restarted once the admission controller
is ready, so it gets configured to use TCP instead of UDP for DNS resolution.
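The ordering the fix relies on can be checked mechanically by comparing container start times. A sketch with the timestamps hard-coded from the `oc describe` output below, so it runs without a cluster (GNU `date` assumed):

```shell
# The fix works only if insights-operator (re)started after the admission
# controller pods were up. Timestamps copied from the describe output below;
# 12:22:56 is the latest start of the three admission-controller pods.
insights_started="2019-12-23 12:30:26 -0500"
admission_started="2019-12-23 12:22:56 -0500"

# Convert both to epoch seconds and compare.
if [ "$(date -d "$insights_started" +%s)" -gt "$(date -d "$admission_started" +%s)" ]; then
  echo "ordering OK: insights-operator started after the admission controller"
fi
```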
$ oc get pods -A
NAMESPACE            NAME                                   READY   STATUS    RESTARTS   AGE
openshift-insights   insights-operator-796d568d96-9jrt5     1/1     Running   2          17h
openshift-kuryr      kuryr-dns-admission-controller-bjz5b   1/1     Running   0          17h
openshift-kuryr      kuryr-dns-admission-controller-r5jmk   1/1     Running   0          17h
openshift-kuryr      kuryr-dns-admission-controller-t77dt   1/1     Running   0          17h
$ oc -n openshift-insights describe pod insights-operator-796d568d96-9jrt5 | grep Started
Started: Mon, 23 Dec 2019 12:40:06 -0500
Started: Mon, 23 Dec 2019 12:30:26 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-bjz5b | grep Started
Started: Mon, 23 Dec 2019 12:22:20 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-r5jmk | grep Started
Started: Mon, 23 Dec 2019 12:20:29 -0500
$ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-t77dt | grep Started
Started: Mon, 23 Dec 2019 12:22:56 -0500
The insights-operator logs are not showing any error messages either.
Adding more info:
$ oc -n openshift-insights get pod insights-operator-796d568d96-9jrt5 -o yaml | grep -A3 dnsConfig
- name: use-vc
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.