Bug 1786217 - [Kuryr] OLM cannot work since DNS is unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On: 1811131
Blocks:
Reported: 2019-12-24 02:42 UTC by Jian Zhang
Modified: 2020-05-04 11:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:21:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
PR (74 bytes, text/plain)
2020-01-08 02:14 UTC, Jian Zhang


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 1216 0 None closed Enable CGO and CGO_DEBUG 2020-06-18 02:37:43 UTC
Github operator-framework operator-lifecycle-manager pull 1219 0 None closed Only enable CGO for prod builds 2020-06-18 02:37:43 UTC
Github operator-framework operator-lifecycle-manager pull 1221 0 None closed remove CGO_DEBUG 2020-06-18 02:37:42 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:21:47 UTC

Comment 1 Jian Zhang 2019-12-24 09:39:43 UTC
> Because Octavia does not support UDP on OSP13, DNS resolution must use TCP, which is why we use the use-vc configuration option (it should be added automatically to every newly created pod). However, there are a few limitations documented here: https://docs.openshift.com/container-platform/4.2/installing/installing_openstack/installing-openstack-installer-kuryr.html#installation-osp-kuryr-known-limitations_installing-openstack-installer-kuryr

So, if your base image is based on alpine/busybox, or if it is a Go application compiled with CGO_DEBUG, the use-vc flag is not enforced and DNS resolution still goes through UDP, and therefore fails.

The comments above are from Luis. Moving this issue to the OLM component.
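
For reference, the use-vc option mentioned above lives on the "options" line of a pod's /etc/resolv.conf. The following is a minimal Go sketch, not part of OLM or Kuryr, that checks resolv.conf-style text for that option. The helper name hasUseVC and the sample content are hypothetical and only illustrate the idea.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasUseVC reports whether resolv.conf-style content sets the "use-vc"
// option, which asks the resolver to send DNS queries over TCP.
// Hypothetical helper for illustration only.
func hasUseVC(conf string) bool {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comment lines.
		if strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "options" {
			continue
		}
		for _, opt := range fields[1:] {
			if opt == "use-vc" {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumed sample content; a real pod's file may differ.
	sample := "search svc.cluster.local cluster.local\n" +
		"nameserver 172.30.0.10\n" +
		"options ndots:5 use-vc\n"
	fmt.Println(hasUseVC(sample))
}
```

In a real pod, the input would come from reading /etc/resolv.conf. Whether the option is honored still depends on the resolver the application uses, which is the limitation described above.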

Comment 2 Dan Geoffroy 2020-01-02 12:43:56 UTC
Moving this to 4.4.  Will investigate and consider for 4.3.z backport.

Comment 3 Jian Zhang 2020-01-08 02:13:19 UTC
I submitted a PR to fix it: https://github.com/operator-framework/operator-lifecycle-manager/pull/1216

Comment 4 Jian Zhang 2020-01-08 02:14:29 UTC
Created attachment 1650563 [details]
PR

Comment 6 Jian Zhang 2020-01-09 03:28:05 UTC
Hi, Luis

Could you help explain the questions from Evan? Thanks!
Actually, I couldn't find "CGO_DEBUG" in https://golang.org/cmd/cgo/, only the "GODEBUG" flag.

Comment 7 Luis Tomas Bolivar 2020-01-09 10:20:22 UTC
(In reply to Jian Zhang from comment #6)
> Hi, Luis
> 
> Could you help explain the questions from Evan? Thanks!
> Actually, I couldn't find "CGO_DEBUG" in https://golang.org/cmd/cgo/, only
> the "GODEBUG" flag.

This is what was done on the image registry operator to fix the same issue: 
https://github.com/openshift/cluster-image-registry-operator/commit/1e85fb7cb3e2c7ec4c75f58c2413b645dfc9cabe

Sorry, it is not GODEBUG, but CGO_ENABLED

Comment 8 Jian Zhang 2020-01-10 02:23:45 UTC
Hi, Luis

OK, thanks for the update. I submitted a doc bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1789657

Hi, Evan

Sorry about this. I submitted another PR to remove the "CGO_DEBUG" flag.

Comment 9 Evan Cordell 2020-01-14 14:14:49 UTC
The linked PR merged a while ago - OLM is built with CGO enabled which should allow DNS to resolve.

Comment 11 Zhang Cheng 2020-03-09 12:09:06 UTC
Removed the dependency on bug 1800633 since it was verified.

Comment 12 Jian Zhang 2020-03-19 07:39:54 UTC
Cluster version is 4.4.0-0.nightly-2020-03-18-011500
mac:~ jianzhang$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68d59fb08856b9fa14061b1db5fd016c8fdf4c7b81f561ac420a63975bfe2b1
...
               io.openshift.build.commit.id=7dc56de256fea16f84974ca8d1e9108103d9d83a
               io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/7dc56de256fea16f84974ca8d1e9108103d9d83a


Checked the logs of the catalog-operator pods; the connection is READY.
...
time="2020-03-19T04:01:02Z" level=info msg="state.Key.Namespace=openshift-operators state.Key.Name=csctestkey state.State=READY"

mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7cfc46bd78-9qqx8   1/1     Running   0          12m
olm-operator-865ff4c4d7-zg89f       1/1     Running   0          19h
packageserver-7c94c85564-ggx9s      1/1     Running   0          29m
packageserver-7c94c85564-s5ffs      1/1     Running   0          32m

Checked DNS over TCP; it works well. LGTM, marking as verified.
mac:~ jianzhang$ oc rsh catalog-operator-7cfc46bd78-xvztq
sh-4.2$ dig  qe-app-registry.openshift-marketplace.svc +tcp 

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> qe-app-registry.openshift-marketplace.svc +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 5562
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;qe-app-registry.openshift-marketplace.svc. IN A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2020031900 1800 900 604800 86400

;; Query time: 413 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Thu Mar 19 07:02:14 UTC 2020
;; MSG SIZE  rcvd: 145

sh-4.2$ dig  csctestkey.openshift-marketplace.svc +tcp 

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> csctestkey.openshift-marketplace.svc +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53682
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;csctestkey.openshift-marketplace.svc. IN A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2020031900 1800 900 604800 86400

;; Query time: 13 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Thu Mar 19 07:01:45 UTC 2020
;; MSG SIZE  rcvd: 140
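
A rough Go counterpart to the dig +tcp checks above is a resolver whose dialer always opens a TCP connection to the name server. This is a sketch under assumptions, not code from OLM: newTCPResolver and tcpNetwork are hypothetical names, and the lookup in main only returns addresses when run inside a cluster where the service name resolves.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// tcpNetwork maps the network the resolver asks for to its TCP equivalent.
func tcpNetwork(network string) string {
	switch network {
	case "udp":
		return "tcp"
	case "udp4":
		return "tcp4"
	case "udp6":
		return "tcp6"
	}
	return network
}

// newTCPResolver returns a resolver that uses Go's built-in DNS client
// and dials the name server over TCP instead of UDP.
func newTCPResolver() *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // the custom Dial is used by the Go resolver
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, tcpNetwork(network), address)
		},
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Service name taken from the dig session above.
	addrs, err := newTCPResolver().LookupHost(ctx, "csctestkey.openshift-marketplace.svc")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```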

Comment 14 errata-xmlrpc 2020-05-04 11:21:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

