Bug 1754046 - Insights operator unable to resolve cloud.redhat.com with Kuryr
Summary: Insights operator unable to resolve cloud.redhat.com with Kuryr
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: All
OS: All
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks: 1783259
Reported: 2019-09-20 16:17 UTC by Michał Dulko
Modified: 2020-05-04 11:14 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1783259
Environment:
Last Closed: 2020-05-04 11:13:57 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 423 0 None closed Bug 1754046: Modif kuryr-admission-controller pod definition 2021-02-16 11:40:40 UTC
Github openshift kuryr-kubernetes pull 108 0 None closed Bug 1754046: Ensure kuryr webhook is running before other operators' pods 2021-02-16 11:40:40 UTC
OpenStack gerrit 699177 0 None MERGED Set defaults for certs and token on the k8s client 2021-02-16 11:40:40 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:14:26 UTC

Description Michał Dulko 2019-09-20 16:17:54 UTC
Description of problem:
Kuryr uses OpenStack Octavia to create the load balancers that implement OpenShift Services. Octavia in OpenStack Queens does not support UDP, so Services that need UDP won't be created. This becomes a problem for the load balancer of the DNS service. To work around this, installations with Kuryr create an admission controller that modifies all pods to set the `use-vc` option in resolv.conf, forcing pods to do DNS resolution over TCP.

The Insights operator seems to be started before the admission controller, so it does not get that option and fails to send reports to cloud.redhat.com because DNS resolution fails.
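
To illustrate the mechanism described above, here is a minimal sketch of how a mutating admission webhook could add the `use-vc` option to a pod. This is not Kuryr's actual implementation: the function names `build_use_vc_patch` and `admission_response` are hypothetical, and the response shape assumes the `admission.k8s.io/v1` AdmissionReview format (clusters of this era may have used `v1beta1`).

```python
import base64
import json

# The resolver option the webhook wants every pod to carry.
USE_VC_OPTION = {"name": "use-vc", "value": ""}


def build_use_vc_patch(pod: dict) -> list:
    """Return JSONPatch operations that add `use-vc` to the pod's dnsConfig.

    Hypothetical helper for illustration; returns an empty list when the
    option is already present.
    """
    dns_config = pod.get("spec", {}).get("dnsConfig")
    if dns_config is None:
        # No dnsConfig at all: create it with the single option.
        return [{"op": "add", "path": "/spec/dnsConfig",
                 "value": {"options": [USE_VC_OPTION]}}]
    options = dns_config.get("options")
    if options is None:
        # dnsConfig exists but has no options list yet.
        return [{"op": "add", "path": "/spec/dnsConfig/options",
                 "value": [USE_VC_OPTION]}]
    if any(opt.get("name") == "use-vc" for opt in options):
        return []
    # Append to the existing options list.
    return [{"op": "add", "path": "/spec/dnsConfig/options/-",
             "value": USE_VC_OPTION}]


def admission_response(uid: str, pod: dict) -> dict:
    """Wrap the patch in an AdmissionReview response (assumed v1 format)."""
    patch = build_use_vc_patch(pod)
    response = {"uid": uid, "allowed": True}
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(
            json.dumps(patch).encode("utf-8")).decode("ascii")
    return {"apiVersion": "admission.k8s.io/v1",
            "kind": "AdmissionReview",
            "response": response}
```

A webhook like this only affects pods created after it is registered and serving, which is consistent with the ordering problem reported here: a pod admitted earlier never receives the patch.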

Version-Release number of selected component (if applicable):
4.2.0

How reproducible:
Always

Steps to Reproduce:
1. Run installation with Kuryr

Actual results:
Insights operator has this in logs:
I0919 07:05:05.686477       1 diskrecorder.go:303] Found files to send: [/var/lib/insights-operator/insights-2019-09-19-064437.tar.gz]
I0919 07:05:05.686615       1 insightsuploader.go:126] Uploading latest report since 0001-01-01T00:00:00Z
I0919 07:05:05.696600       1 insightsclient.go:160] Uploading application/vnd.redhat.openshift.periodic to https://cloud.redhat.com/api/ingress/v1/upload
I0919 07:05:35.696932       1 insightsclient.go:163] Unable to build a request, possible invalid token: Post https://cloud.redhat.com/api/ingress/v1/upload: dial tcp: i/o timeout
I0919 07:05:35.697032       1 insightsuploader.go:132] Unable to upload report after 30.01s: unable to build request to connect to Insights server
I0919 07:05:35.697070       1 controllerstatus.go:34] name=insightsuploader healthy=false reason=UploadFailed message=Unable to report: unable to build request to connect to Insights server
W0919 07:07:56.939684       1 configobserver.go:54] Unable to retrieve config: could not check support secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config/secrets/support: unexpected EOF
E0919 07:07:56.939724       1 status.go:302] Unable to write cluster operator status: Get https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/insights: unexpected EOF
I0919 07:07:56.959915       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.065988       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:08:37.066057       1 status.go:338] No status update necessary, objects are identical
I0919 07:10:37.057369       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server
I0919 07:10:37.057472       1 status.go:338] No status update necessary, objects are identical
I0919 07:12:37.058380       1 status.go:245] The operator has some internal errors: Unable to report: unable to build request to connect to Insights server

Expected results:
Reports are sent to cloud.redhat.com without errors.

Additional info:

Comment 3 Ben Bennett 2019-09-20 18:00:02 UTC
Moved to 4.3.  We can backport a fix into 4.2.z if appropriate.

Comment 4 Neelesh Agrawal 2019-11-26 14:32:31 UTC
Hi Michal, is this a 4.3 blocker bug?

Comment 5 Michał Dulko 2019-11-28 10:36:19 UTC
I don't think it should be a blocker. We don't really have a feasible solution in the near term, and there is a workaround: just delete the problematic pod and it will be fine once recreated.

Comment 7 Jon Uriarte 2019-12-24 10:52:35 UTC
Verified in 4.4.0-0.nightly-2019-12-20-210709 on 2019-12-13.1 OSP 13 puddle.

The OCP installer finishes successfully:

 $ oc get clusterversion
 NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
 version   4.4.0-0.nightly-2019-12-20-210709   True        False         14h     Cluster version is 4.4.0-0.nightly-2019-12-20-210709

insights-operator is newer than the kuryr-dns-admission-controller pods, because insights-operator is now restarted once the admission controller
is ready, so it can be configured to use TCP instead of UDP for DNS resolution.

 $ oc get pods -A
 NAMESPACE             NAME                                  READY   STATUS      RESTARTS   AGE
 openshift-insights    insights-operator-796d568d96-9jrt5    1/1     Running     2          17h
 openshift-kuryr       kuryr-dns-admission-controller-bjz5b  1/1     Running     0          17h
 openshift-kuryr       kuryr-dns-admission-controller-r5jmk  1/1     Running     0          17h
 openshift-kuryr       kuryr-dns-admission-controller-t77dt  1/1     Running     0          17h

 $ oc -n openshift-insights describe pod insights-operator-796d568d96-9jrt5 | grep Started
      Started:   Mon, 23 Dec 2019 12:40:06 -0500
      Started:      Mon, 23 Dec 2019 12:30:26 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-bjz5b | grep Started
      Started:      Mon, 23 Dec 2019 12:22:20 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-r5jmk | grep Started
      Started:      Mon, 23 Dec 2019 12:20:29 -0500

 $ oc -n openshift-kuryr describe pod kuryr-dns-admission-controller-t77dt | grep Started
      Started:      Mon, 23 Dec 2019 12:22:56 -0500

insights-operator is not showing error messages either.
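
The start-time comparison above can also be expressed as a small check. This is only a sketch: the helper name `started_after` is hypothetical, the timestamps are copied from the `Started:` lines in this comment, and it assumes they stay in the RFC 2822 style shown here.

```python
from email.utils import parsedate_to_datetime


def started_after(pod_started: str, controller_started: list) -> bool:
    """Return True if the pod started later than every admission controller pod."""
    pod_time = parsedate_to_datetime(pod_started)
    return all(pod_time > parsedate_to_datetime(t) for t in controller_started)


# Most recent start of insights-operator-796d568d96-9jrt5.
insights = "Mon, 23 Dec 2019 12:40:06 -0500"

# Start times of the three kuryr-dns-admission-controller pods.
controllers = [
    "Mon, 23 Dec 2019 12:22:20 -0500",
    "Mon, 23 Dec 2019 12:20:29 -0500",
    "Mon, 23 Dec 2019 12:22:56 -0500",
]

print(started_after(insights, controllers))  # prints True
```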

Comment 8 Jon Uriarte 2019-12-24 11:00:44 UTC
Adding more info:

$ oc -n openshift-insights get pod insights-operator-796d568d96-9jrt5 -o yaml | grep -A3 dnsConfig 
  dnsConfig:
    options:
    - name: use-vc
      value: ""
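
The same check can be scripted against the pod definition instead of grepping the YAML. A minimal sketch, assuming the pod has been loaded into a dict (for example from `oc ... -o json` output); the helper name `has_use_vc` is hypothetical.

```python
def has_use_vc(pod: dict) -> bool:
    """Return True if the pod's dnsConfig carries the `use-vc` resolver option."""
    dns_config = pod.get("spec", {}).get("dnsConfig") or {}
    options = dns_config.get("options") or []
    return any(opt.get("name") == "use-vc" for opt in options)


# Pod fragment mirroring the dnsConfig shown above.
patched_pod = {"spec": {"dnsConfig": {"options": [{"name": "use-vc", "value": ""}]}}}

# A pod admitted before the webhook was ready would have no dnsConfig.
unpatched_pod = {"spec": {}}

print(has_use_vc(patched_pod), has_use_vc(unpatched_pod))  # prints True False
```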

Comment 10 errata-xmlrpc 2020-05-04 11:13:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

