Bug 2004594
| Summary: | Builds fail to resolve github.com due to ndots | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stephen Reaves <reaves735> |
| Component: | Networking | Assignee: | Miheer Salunke <misalunk> |
| Networking sub component: | DNS | QA Contact: | Melvin Joseph <mjoseph> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | aos-bugs, hongli, misalunk, mmasters |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-06 14:57:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Stephen Reaves
2021-09-15 15:58:21 UTC

Did you follow https://docs.openshift.com/container-platform/4.8/cicd/builds/creating-build-inputs.html#builds-gitconfig-file_creating-build-inputs ?

I did not. I don't think it's a git or bc issue since the build config used to work. I simply created the s2i via the webconsole and it pulled and built fine. This was created on a 4.6 cluster if that means anything.

When I attach to the current running pod, curling github.com returns the local "Application not found" page unless I change ndots to 1.

I also tried to create a new s2i build from the same repo and I get this error in the web ui: "URL is valid but cannot be reached. If this is a private repository, enter a source Secret in advanced Git options". This is even after adding the ssh secret.

This looks like a DNS configuration issue. Do you have a wildcard DNS record for *.domain.com or *.openshift.domain.com?

I do. Through Google Domains I have an A record for domain.com pointing to my cluster's IP and a CNAME record for *.domain.com pointing to domain.com. I do have a pihole as a local dns and the hosts are configured to use that as their main name server, which then forwards to 1.1.1.1 and 1.0.0.2 I believe. But I don't think that would be an issue since everything else on my network can resolve github.com.

You do need a wildcard DNS record for the ingress domain, but this is usually going to be *.apps.<cluster domain> (in your case, I assume that would be "*.apps.openshift.domain.com"). By having domain.com in your search path in /etc/resolv.conf and a wildcard record for *.domain.com, applications inside the cluster will resolve any domain name with fewer than 5 dots using the wildcard record. Would it be possible either to remove or change the wildcard record or remove domain.com from the search path?

Are you talking about changing the /etc/resolv.conf on the nodes themselves? Those were auto generated using IPI so I'd have to create a machine config to overwrite them. Or are you talking about changing them on the pod itself? Because I don't know how to change the /etc/resolv.conf on the build pods.

I meant /etc/resolv.conf on the nodes themselves. Most likely the search path that is in /etc/resolv.conf came from DHCP; can you change your DHCP server not to include domain.com in the search-domain list that it sends to clients? If domain.com (the domain for which you have a *.domain.com wildcard DNS record) is the cluster's domain, then that might not be feasible; in that case, you probably need to remove the wildcard DNS record or use a different domain for the cluster.

I took it out of my DHCP, it's not in the /etc/resolv.conf of the node or the pod, and I'm getting the same results.

Pod resolv.conf in the same namespace:
```
search homelab-main.svc.cluster.local svc.cluster.local cluster.local openshift.domain.com
nameserver 172.30.0.10
options ndots:5
```
Node resolv.conf:
```
# Generated by KNI resolv prepender NM dispatcher script
search openshift.reaves.dev
nameserver 192.168.0.221
nameserver 192.168.0.22
```

I also don't see how that would change anything anyway. Github.com and github.domain.com and even github.com.domain.com all have fewer than 5 dots, so they should all be treated the same, right? Changing ndots is the only thing that's changed the outcome for me, but I can't do that on a build, only on the pod after a build...

Any update on this? It's been over two months since I've heard any update from any dev, my cluster is now on 4.9.8 (about to upgrade to 4.9.9) and I am still having this same issue. Is there anybody who can help take a look at this? My other (non-openshift) machines can access github just fine.

I migrated this cluster off of oVirt and onto BareMetal (partially because of this ticket, but also because RHEV is being dropped) and I decided to use the Assisted Installer. During the install process, the Assisted Installer pointed out that I had a wildcard subdomain on my tld (i.e. '*.domain.com') and it wouldn't let me continue until I changed that. After removing that wildcard, the install went smoothly (big fan of the assisted installer btw) and builds were working. Just to test, I put the wildcard back and I was seeing the same issues as before.

It seems fine to have '*.apps.cluster.domain.com' and '*.api.cluster.domain.com' but if I'm adding a route above those I need to manually add a specific cname for that specific route. So if I was adding Nextcloud, I could leave it as 'nextcloud.apps.cluster.domain.com' and everything works fine, but that's quite an ugly url imo, so I manually add 'nextcloud.domain.com' to the DNS, then the route works as expected and builds aren't mysteriously broken.

TL;DR: Wildcard subdomains break lots of things.

If somebody can link some documentation saying wildcard subdomains aren't supported in OpenShift then I'd be comfortable closing this ticket.

-smr

(In reply to Stephen Reaves from comment #11)
> I migrated this cluster off of oVirt and onto BareMetal (partially because
> of this ticket, but also because RHEV is being dropped) and I decided to use
> the Assisted Installer. During the install process, the Assisted Installer
> pointed out that I had a wildcard subdomain on my tld (i.e. '*.domain.com')
> and it wouldn't let me continue until I changed that. After removing that
> wildcard, the install went smoothly (big fan of the assisted installer btw)
> and builds were working. Just to test, I put the wildcard back and I was
> seeing the same issues as before.

Put the wildcard back where?

> It seems fine to have
> '*.apps.cluster.domain.com' and '*.api.cluster.domain.com' but if I'm adding
> a route above those I need to manually add a specific cname for that
> specific route. So if I was adding Nextcloud, I could leave it as
> 'nextcloud.apps.cluster.domain.com' and everything works fine, but that's
> quite an ugly url imo, so I manually add 'nextcloud.domain.com' to the DNS,
> then the route works as expected and builds aren't mysteriously broken.
>

The domain shown by oc get ingress.config.openshift.io/cluster -o yaml, or a subdomain of it, can be used as the ingress controller's domain. Also, the ingress operator does not automatically create DNS records (oc get dnsrecords -n openshift-ingress-operator -o yaml) on baremetal (it only creates them for AWS, GCP, and Azure), which is why you will need to configure your DNS yourself.

> TL;DR: Wildcard subdomains break lots of things.
>
> If somebody can link some documentation saying wildcard subdomains aren't
> supported in OpenShift then I'd be comfortable closing this ticket.
>
> -smr

Hi,

Did you get a chance to check my update?

Thanks and regards,
Miheer

Closing this ticket. Please reopen if needed.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
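
For anyone landing on this bug later, the interaction between ndots, the search path, and a wildcard record can be sketched in a few lines of Python. This is only a rough model of how a glibc-style stub resolver orders its lookups, not the actual resolver code; other resolvers (musl, Go) differ in details. The function names, the `zone_answers` predicate, and the assumption that the wildcard answers for every name under domain.com are all hypothetical stand-ins for real DNS; the search list is the one from the pod resolv.conf quoted in this thread.

```python
def candidate_queries(name, search, ndots):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    via_search = [f"{name}.{suffix}." for suffix in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + via_search
    # Too few dots: walk the search list first, try the name as-is last.
    return via_search + [absolute]


def resolve(name, search, ndots, answers):
    """Return the first candidate that answers(candidate) accepts, else None."""
    for candidate in candidate_queries(name, search, ndots):
        if answers(candidate):
            return candidate
    return None


# Search list taken from the pod resolv.conf in this thread.
SEARCH = [
    "homelab-main.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "openshift.domain.com",
]


def zone_answers(fqdn):
    """Hypothetical DNS: github.com exists, and *.domain.com is a wildcard
    that is assumed to answer for every name under domain.com."""
    return fqdn == "github.com." or fqdn.endswith(".domain.com.")


print(resolve("github.com", SEARCH, 5, zone_answers))   # github.com.openshift.domain.com.
print(resolve("github.com", SEARCH, 1, zone_answers))   # github.com.
print(resolve("github.com.", SEARCH, 5, zone_answers))  # github.com.
```

With ndots:5, github.com (one dot) is tried against every search suffix before it is tried as-is, so the wildcard under domain.com answers first and the request lands on the cluster ingress, which would explain the "Application not found" page. With ndots:1, or with a trailing dot, the name is tried as-is first and the real github.com is returned. Under this model github.domain.com and github.com.domain.com are indeed treated the same way as github.com, since all have fewer than 5 dots.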