Bug 1733867
Summary: | [IPI] [OSP] All request rely on frontproxy failed due to node name can not be resolved within cluster | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | weiwei jiang <wjiang> |
Component: | Installer | Assignee: | Tomas Sedovic <tsedovic> |
Installer sub component: | openshift-installer | QA Contact: | weiwei jiang <wjiang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | adam.kaplan, eduen, egarcia, ncredi, ppitonak, scuppett, tsedovic, wewang, wzheng, xiuwang, yanpzhan |
Version: | 4.2.0 | Keywords: | TestBlocker |
Target Milestone: | --- | ||
Target Release: | 4.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | DFG:OSasInfra | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-10-16 06:33:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1733892 |
Description
weiwei jiang
2019-07-29 05:42:09 UTC
Should be resolved once we complete migration away from Service VM. Assigning to Tomas to follow up.

Checked with 4.2.0-0.nightly-2019-08-15-232721; still not fixed.

➜ ✗ oc run h --image=aosqe/hello-openshift --replicas=6
kubectl run --generator=deploymentconfig/v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deploymentconfig.apps.openshift.io/h created
➜ ✗ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
h-1-2kd8f 1/1 Running 0 3m27s 10.128.2.13 wjosp0816d-p6jqc-worker-g87lp <none> <none>
h-1-8xdcr 1/1 Running 0 3m27s 10.129.2.8 wjosp0816d-p6jqc-worker-np44n <none> <none>
h-1-brnkf 1/1 Running 0 3m27s 10.129.2.6 wjosp0816d-p6jqc-worker-np44n <none> <none>
h-1-deploy 0/1 Completed 0 3m35s 10.129.2.4 wjosp0816d-p6jqc-worker-np44n <none> <none>
h-1-n9lh7 1/1 Running 0 3m27s 10.131.0.14 wjosp0816d-p6jqc-worker-79z2m <none> <none>
h-1-sglg8 1/1 Running 0 3m27s 10.129.2.5 wjosp0816d-p6jqc-worker-np44n <none> <none>
h-1-zd6hp 1/1 Running 0 3m27s 10.129.2.7 wjosp0816d-p6jqc-worker-np44n <none> <none>
➜ ✗ oc rsh h-1-sglg8
Error from server: error dialing backend: dial tcp: lookup wjosp0816d-p6jqc-worker-np44n on 192.168.0.6:53: no such host

This only blocks the requests that need to fetch data from the workers through openshift-kube-apiserver acting as a reverse proxy, such as oc debug, oc exec, oc rsh, oc logs, oc rsync, oc proxy, oc attach, oc cp and oc port-forward. The same features in the web console are affected too.

So, I ran this from master, and I think I understand what is wrong here. In our current implementation, we expect the user to add certain entries to their DNS. To make this work, you must attach a floating IP to the `ingress-port`. See https://github.com/openshift/installer/tree/master/docs/user/openstack#using-floating-ips for more info. Try this and let me know if it fixes your problems.
Checked with 4.2.0-0.nightly-2019-08-20-043744. Wildcard DNS for the ingress-port is already set up and working, but we still hit this issue.

➜ ~ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
h-1-2mxfd 1/1 Running 0 2m49s 10.130.2.18 preserve-groupg-4cf4r-worker-thlsr <none> <none>
h-1-54vkj 1/1 Running 0 2m49s 10.130.2.16 preserve-groupg-4cf4r-worker-thlsr <none> <none>
h-1-deploy 0/1 Completed 0 2m52s 10.128.2.18 preserve-groupg-4cf4r-worker-lp9lk <none> <none>
h-1-lqdlm 1/1 Running 0 2m49s 10.128.2.21 preserve-groupg-4cf4r-worker-lp9lk <none> <none>
h-1-pbphl 1/1 Running 0 2m49s 10.130.2.17 preserve-groupg-4cf4r-worker-thlsr <none> <none>
h-1-rd68j 1/1 Running 0 2m49s 10.128.2.20 preserve-groupg-4cf4r-worker-lp9lk <none> <none>
h-1-smqd7 1/1 Running 0 2m49s 10.128.2.19 preserve-groupg-4cf4r-worker-lp9lk <none> <none>
➜ ~ oc logs h-1-54vkj
Error from server: Get https://preserve-groupg-4cf4r-worker-thlsr:10250/containerLogs/default/h-1-54vkj/h: dial tcp: lookup preserve-groupg-4cf4r-worker-thlsr on 192.168.0.6:53: no such host

The issue here is that openshift-kube-apiserver cannot resolve the worker node names to IP addresses.

Running start-build with the --from-* parameters to create a binary build fails too:

+ oc start-build openshift-jee-sample-docker --from-file=target/ROOT.war -n u3gjn
Uploading file "target/ROOT.war" as binary input for the build ...
.
Uploading finished
Error from server (InternalError): Internal error occurred: error dialing backend: dial tcp: lookup preserve-groupg-4cf4r-worker1-x25qw on 192.168.0.6:53: no such host

@weiwei the commit (8343c018c7b99525d2d13299533b7267ca317c48) is now in the `release-4.2` branch, but it looks like a nightly with that commit has not been built yet (the latest one I checked was https://openshift-release-artifacts.svc.ci.openshift.org/4.2.0-0.nightly-2019-09-01-224700/ and the commit is not there).

Verified on 4.2.0-0.nightly-2019-09-02-172410.
➜ ~ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
h-1-4flbc 1/1 Running 0 12m 10.131.0.14 share-0903a-rngb2-worker-ghhlq <none> <none>
h-1-89rtq 1/1 Running 0 12m 10.128.2.12 share-0903a-rngb2-worker-x45bg <none> <none>
h-1-8wr45 1/1 Running 0 12m 10.129.2.10 share-0903a-rngb2-worker-h8z46 <none> <none>
h-1-d8xvp 1/1 Running 0 12m 10.129.2.9 share-0903a-rngb2-worker-h8z46 <none> <none>
h-1-deploy 0/1 Completed 0 12m 10.129.2.8 share-0903a-rngb2-worker-h8z46 <none> <none>
h-1-qrqfc 1/1 Running 0 12m 10.131.0.15 share-0903a-rngb2-worker-ghhlq <none> <none>
h-1-wcx7d 1/1 Running 0 12m 10.128.2.13 share-0903a-rngb2-worker-x45bg <none> <none>
➜ ~ oc logs -f h-1-d8xvp
serving on 8081
serving on 8888
➜ ~ oc debug pods/h-1-d8xvp
Starting pod/h-1-d8xvp-debug ...
Pod IP: 10.129.2.11
If you don't see a command prompt, try pressing enter.
/ # ls
bin dev etc hello hello-openshift home proc root run sys tmp usr var
➜ ~ oc exec -it h-1-d8xvp /bin/sh
/ # id
uid=0(root) gid=0(root) groups=10(wheel)
➜ ~ oc rsh h-1-d8xvp
/ # id
uid=0(root) gid=0(root) groups=10(wheel)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922