Bug 1786217
| Summary: | [Kuryr] OLM cannot work since DNS is unavailable | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> | ||||
| Component: | OLM | Assignee: | Evan Cordell <ecordell> | ||||
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | bandrade, chezhang, dageoffr, hongli, jfan, ltomasbo, wsun | ||||
| Version: | 4.3.0 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.4.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-05-04 11:21:22 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1811131 | ||||||
| Bug Blocks: | |||||||
Moving this to 4.4. Will investigate and consider for 4.3.z backport.

I submitted a PR with a fix: https://github.com/operator-framework/operator-lifecycle-manager/pull/1216

Created attachment 1650563
PR

Hi Luis,
Could you help answer the questions from Evan? Thanks!
Actually, I couldn't find "CGO_DEBUG" in https://golang.org/cmd/cgo/; I only found the "GODEBUG" flag.

(In reply to Jian Zhang from comment #6)
> Hi, Luis
>
> Could you help explain the questions from Evan? Thanks!
> Actually, I didn't find the "CGO_DEBUG" in https://golang.org/cmd/cgo/, only
> find the "GODEBUG" flag.
This is what was done on the image registry operator to fix the same issue: https://github.com/openshift/cluster-image-registry-operator/commit/1e85fb7cb3e2c7ec4c75f58c2413b645dfc9cabe

Sorry, it is not GODEBUG, but CGO_ENABLED

Hi Luis,
OK, thanks for your updates. I submitted a doc bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1789657

Hi Evan,
Sorry for this. I submitted another PR to remove the "CGO_DEBUG" flag.

The linked PR merged a while ago. OLM is built with CGO enabled, which should allow DNS to resolve.

Removed the dependency on bug 1800633 since it was verified.

Cluster version is 4.4.0-0.nightly-2020-03-18-011500
mac:~ jianzhang$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68d59fb08856b9fa14061b1db5fd016c8fdf4c7b81f561ac420a63975bfe2b1
...
io.openshift.build.commit.id=7dc56de256fea16f84974ca8d1e9108103d9d83a
io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/7dc56de256fea16f84974ca8d1e9108103d9d83a
Checked the logs of the catalog-operator pods; the connection is READY.
...
time="2020-03-19T04:01:02Z" level=info msg="state.Key.Namespace=openshift-operators state.Key.Name=csctestkey state.State=READY"
mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7cfc46bd78-9qqx8   1/1     Running   0          12m
olm-operator-865ff4c4d7-zg89f       1/1     Running   0          19h
packageserver-7c94c85564-ggx9s      1/1     Running   0          29m
packageserver-7c94c85564-s5ffs      1/1     Running   0          32m
Checked DNS over TCP; it works well. LGTM, marking as verified.
mac:~ jianzhang$ oc rsh catalog-operator-7cfc46bd78-xvztq
sh-4.2$ dig qe-app-registry.openshift-marketplace.svc +tcp
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> qe-app-registry.openshift-marketplace.svc +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 5562
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;qe-app-registry.openshift-marketplace.svc. IN A
;; AUTHORITY SECTION:
. 30 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2020031900 1800 900 604800 86400
;; Query time: 413 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Thu Mar 19 07:02:14 UTC 2020
;; MSG SIZE rcvd: 145
sh-4.2$ dig csctestkey.openshift-marketplace.svc +tcp
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> csctestkey.openshift-marketplace.svc +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53682
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;csctestkey.openshift-marketplace.svc. IN A
;; AUTHORITY SECTION:
. 30 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2020031900 1800 900 604800 86400
;; Query time: 13 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Thu Mar 19 07:01:45 UTC 2020
;; MSG SIZE rcvd: 140
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0581
> Due to Octavia not supporting UDP on OSP13, DNS resolution must use TCP, and that is why we use the use-vc configuration option (which should be added automatically to every newly created pod). However, there are a few limitations documented here: https://docs.openshift.com/container-platform/4.2/installing/installing_openstack/installing-openstack-installer-kuryr.html#installation-osp-kuryr-known-limitations_installing-openstack-installer-kuryr
> So, if your base image is based on alpine/busybox, or if it is a Go application compiled with CGO_DEBUG, that use-vc flag is not enforced and DNS resolution still goes through UDP, hence failing.

The above comments are from Luis. Moving this issue to the OLM component.
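
The `use-vc` setting mentioned in the quoted comment is an entry on the `options` line of `resolv.conf`. As a minimal sketch for confirming that a pod actually received it, the Go snippet below scans resolv.conf-style text for that option. The function name `hasUseVC` and the sample file contents are assumptions made for illustration; only the nameserver address comes from the dig output in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasUseVC (hypothetical helper) reports whether resolv.conf-style content
// enables the "use-vc" option, which tells the libc resolver to use TCP.
func hasUseVC(conf string) bool {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		// Drop comments introduced by '#' or ';'.
		if i := strings.IndexAny(line, "#;"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "options" {
			continue
		}
		for _, opt := range fields[1:] {
			if opt == "use-vc" {
				return true
			}
		}
	}
	return false
}

func main() {
	// Illustrative contents only; a real pod's /etc/resolv.conf may differ.
	sample := "search openshift-operators.svc.cluster.local svc.cluster.local\n" +
		"nameserver 172.30.0.10\n" +
		"options ndots:5 use-vc\n"
	fmt.Println("use-vc set:", hasUseVC(sample))
}
```

To check a live pod, the contents of `/etc/resolv.conf` could be read with `os.ReadFile` and passed to the same function. Note that finding the option only shows the pod was configured for TCP; whether a given binary's resolver honors it is the limitation this bug is about.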