Bug 1754046

Summary: Insights operator unable to resolve cloud.redhat.com with Kuryr
Product: OpenShift Container Platform Reporter: Michał Dulko <mdulko>
Component: Networking Assignee: Luis Tomas Bolivar <ltomasbo>
Networking sub component: kuryr QA Contact: Jon Uriarte <juriarte>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: bbennett, eduen, juriarte, ltomasbo, nagrawal
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1783259 Environment:
Last Closed: 2020-05-04 11:13:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1783259    

Description Michał Dulko 2019-09-20 16:17:54 UTC
Description of problem:
Kuryr uses OpenStack Octavia to create the load balancers that back OpenShift Services. Queens Octavia does not support UDP, so Services that need UDP won't be created. This becomes a problem for the LB of the DNS service. To work around this, installations with Kuryr create an admission controller that modifies all pods to have the `use-vc` option set in resolv.conf, forcing pods to do DNS resolution over TCP.
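
For illustration only, the mutation described above can be sketched as a JSON Patch that a mutating admission webhook might return for a pod. This is not Kuryr's actual implementation: the function names `build_use_vc_patch` and `encode_patch` are hypothetical, and the sketch assumes the option is expressed through the pod's `spec.dnsConfig.options` field.

```python
import base64
import json


def build_use_vc_patch(pod):
    """Return JSON Patch operations that add the use-vc resolver option to a pod.

    Hypothetical helper; assumes `pod` is a parsed Pod manifest with a "spec" key.
    """
    spec = pod.get("spec", {})
    dns_config = spec.get("dnsConfig")
    option = {"name": "use-vc", "value": ""}

    if dns_config is None:
        # No dnsConfig at all: create it with the single option.
        return [{"op": "add", "path": "/spec/dnsConfig",
                 "value": {"options": [option]}}]

    options = dns_config.get("options")
    if options is None:
        # dnsConfig exists but has no options list yet.
        return [{"op": "add", "path": "/spec/dnsConfig/options",
                 "value": [option]}]

    if any(o.get("name") == "use-vc" for o in options):
        return []  # already set, nothing to patch

    # "-" appends to the end of an existing array in JSON Patch.
    return [{"op": "add", "path": "/spec/dnsConfig/options/-",
             "value": option}]


def encode_patch(ops):
    """Base64-encode the patch, as an AdmissionReview response carries it."""
    return base64.b64encode(json.dumps(ops).encode("utf-8")).decode("ascii")
```

A webhook can only mutate pods that are created after it is registered, which is why a pod started earlier never receives the option.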

The Insights operator seems to be started before the admission controller, so it does not get that option and fails to send reports to cloud.redhat.com because DNS resolution fails.

Version-Release number of selected component (if applicable):
4.2.0

How reproducible:
Always

Steps to Reproduce:
1. Run installation with Kuryr

Actual results:
Insights operator has this in logs:
I0919 07:05:05.686477       1 diskrecorder.go:303] Found files to send: [/var/lib/insights-operator/insights-2019-09-19-064437.tar.gz]
I0919 07:05:05.686615       1 insightsuploader.go:126] Uploading latest report since 0001-01-01T00:00:00Z
I0919 07:05:05.696600       1 insightsclient.go:160] Uploading application/vnd.redhat.openshift.periodic to https://cloud.redhat.com/api/ingress/v1/upload
I0919 07:05:35.696932       1 insightsclient.go:163] Unable to build a request, possible invalid token: Post https://cloud.redhat.com/api/ingress/v1/upload: dial tcp: i/o timeout
I0919 07:05:35.697032       1 insightsuploader.go:132] Unable to upload report after 30.01s: unable to build request to connect to Insights server
I0919 07:05:35.697070       1 controllerstatus.go:34] name=insightsuploader healthy=false reason=UploadFailed message=Unable to report: unable to build request to connect to Insights server
W0919 07:07:56.939684       1 configobserver.go:54] Unable to retrieve config: could not check support secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config/secrets/support: unexpected EOF
E0919 07:07:56.939724       1 status.go:302] Unable to write cluster operator status: Get https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/insights: unexpected EOF
I0919 07:07:56.959915       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.065988       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.066057       1 status.go:338] No status update necessary, objects are identical
I0919 07:10:37.057369       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:10:37.057472       1 status.go:338] No status update necessary, objects are identical
I0919 07:12:37.058380       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server

Expected results:
Smooth sending without errors.

Additional info:

Comment 3 Ben Bennett 2019-09-20 18:00:02 UTC
Moved to 4.3.  We can backport a fix into 4.2.z if appropriate.

Comment 4 Neelesh Agrawal 2019-11-26 14:32:31 UTC
Hi Michal, is this a 4.3 blocker bug?

Comment 5 Michał Dulko 2019-11-28 10:36:19 UTC
I don't think it should be a blocker. We don't really have a feasible solution in the near term, and there is a workaround: just delete the problematic pod and it will be fine once recreated.

Comment 7 Jon Uriarte 2019-12-24 10:52:35 UTC
Verified in 4.4.0-0.nightly-2019-12-20-210709 on 2019-12-13.1 OSP 13 puddle.

The OCP installer finishes successfully:

 $ oc get clusterversion
 NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
 version   4.4.0-0.nightly-2019-12-20-210709   True        False         14h     Cluster version is 4.4.0-0.nightly-2019-12-20-210709

insights-operator is newer than the kuryr-dns-admission-controller pods, because insights-operator is now restarted once the admission controller
is ready, so it can be configured to use TCP instead of UDP for DNS resolution.

 $ oc get pods -A
 NAMESPACE             NAME                                  READY   STATUS      RESTARTS   AGE
 openshift-insights    insights-operator-796d568d96-9jrt5    1/1     Running     2          17h
 openshift-kuryr       kuryr-dns-admission-controller-bjz5b  1/1     Running     0          17h
 openshift-kuryr       kuryr-dns-admission-controller-r5jmk  1/1     Running     0          17h
 openshift-kuryr       kuryr-dns-admission-controller-t77dt  1/1     Running     0          17h

 $ oc -n openshift-insights describe pod insights-operator-796d568d96-9jrt5 | grep Started
      Started:   Mon, 23 Dec 2019 12:40:06 -0500
      Started:      Mon, 23 Dec 2019 12:30:26 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-bjz5b | grep Started
      Started:      Mon, 23 Dec 2019 12:22:20 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-r5jmk | grep Started
      Started:      Mon, 23 Dec 2019 12:20:29 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-t77dt | grep Started
      Started:      Mon, 23 Dec 2019 12:22:56 -0500

insights-operator is not showing error messages either.
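
The ordering check above can also be done programmatically. The following is a minimal sketch, assuming the `Started` values are copied by hand from the `oc describe` output; the helper name `started_after_all` is hypothetical.

```python
from email.utils import parsedate_to_datetime


def started_after_all(candidate, others):
    """True if `candidate` is later than every timestamp in `others`.

    Timestamps use the "Mon, 23 Dec 2019 12:40:06 -0500" form that
    `oc describe` prints, which parsedate_to_datetime can parse.
    """
    start = parsedate_to_datetime(candidate)
    return all(start > parsedate_to_datetime(t) for t in others)


# Values copied from the `grep Started` output above.
insights_started = [
    "Mon, 23 Dec 2019 12:40:06 -0500",
    "Mon, 23 Dec 2019 12:30:26 -0500",
]
admission_controllers_started = [
    "Mon, 23 Dec 2019 12:22:20 -0500",
    "Mon, 23 Dec 2019 12:20:29 -0500",
    "Mon, 23 Dec 2019 12:22:56 -0500",
]

# Use the earlier of the two operator start times as the conservative case.
earliest_insights = min(insights_started, key=parsedate_to_datetime)
print(started_after_all(earliest_insights, admission_controllers_started))
```

Even the earlier of the two insights-operator start times (12:30:26) is later than the latest admission controller start (12:22:56), so the check holds for this cluster.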

Comment 8 Jon Uriarte 2019-12-24 11:00:44 UTC
Adding more info:

$ oc -n openshift-insights get pod insights-operator-796d568d96-9jrt5 -o yaml | grep -A3 dnsConfig 
  dnsConfig:
    options:
    - name: use-vc
      value: ""
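
The same check can be applied to every pod at once, for example to spot pods that were created before the admission controller was ready. The sketch below assumes its input is the parsed JSON from `oc get pods -A -o json`; the function name `pods_missing_dns_option` is hypothetical, and a pod listed by it is not necessarily broken (any pods the admission controller intentionally skips would show up too).

```python
def pods_missing_dns_option(pod_list, option_name="use-vc"):
    """Return "namespace/name" for each pod lacking the given dnsConfig option.

    `pod_list` is assumed to be a dict with an "items" list of Pod manifests.
    """
    missing = []
    for pod in pod_list.get("items", []):
        # dnsConfig and options may be absent or null on pods that were not mutated.
        dns_config = pod.get("spec", {}).get("dnsConfig") or {}
        options = dns_config.get("options") or []
        if not any(opt.get("name") == option_name for opt in options):
            meta = pod.get("metadata", {})
            missing.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return missing


# Example input: one pod mirroring the dnsConfig shown above, one made-up pod without it.
example = {
    "items": [
        {
            "metadata": {"namespace": "openshift-insights",
                         "name": "insights-operator-796d568d96-9jrt5"},
            "spec": {"dnsConfig": {"options": [{"name": "use-vc", "value": ""}]}},
        },
        {
            "metadata": {"namespace": "example-ns", "name": "example-pod"},
            "spec": {},
        },
    ]
}
print(pods_missing_dns_option(example))
```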

Comment 10 errata-xmlrpc 2020-05-04 11:13:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581