Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2004594

Summary: Builds fail to resolve github.com due to ndots
Product: OpenShift Container Platform Reporter: Stephen Reaves <reaves735>
Component: NetworkingAssignee: Miheer Salunke <misalunk>
Networking sub component: DNS QA Contact: Melvin Joseph <mjoseph>
Status: CLOSED NOTABUG Docs Contact:
Severity: low    
Priority: low CC: aos-bugs, hongli, misalunk, mmasters
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-06 14:57:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Stephen Reaves 2021-09-15 15:58:21 UTC
Description of problem:

s2i builds fail when pulling source from github.com

OpenShift release version:

Client Version: 4.6.6
Server Version: 4.8.10
Kubernetes Version: v1.21.1+9807387

Cluster Platform:

Openshift on oVirt

How reproducible:

Every time

Steps to Reproduce (in detail):
1. Create a new app pulling from a public GitHub repo
2. Start the build


Actual results:

Cloning "https://github.com/user/repo" ...
error: fatal: unable to access 'https://github.com/user/repo/': SSL certificate problem: self signed certificate in certificate chain

Expected results:

Build succeeds


Impact of the problem:

Cannot update the app; it still runs on an older version.


Additional info:

This build has worked, but is now failing and I'm not sure why.

Build Config:

```yaml
spec:
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: 'portfolio:latest'
  resources: {}
  successfulBuildsHistoryLimit: 5
  failedBuildsHistoryLimit: 5
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        namespace: openshift
        name: 'python:3.8-ubi7'
  postCommit: {}
  source:
    type: Git
    git:
      uri: 'https://github.com/<user>/<project>'
    contextDir: /
    sourceSecret:
      name: github-ssh-private-key
  triggers:
    - type: Generic
      generic:
        secretReference:
          name: portfolio-generic-webhook-secret
    - type: GitHub
      github:
        secretReference:
          name: portfolio-github-webhook-secret
    - type: ImageChange
      imageChange:
        lastTriggeredImageID: >-
          image-registry.openshift-image-registry.svc:5000/openshift/python@sha256:9caeaafa9409cae8a5ba72f7c02e94978c0fbed85753470d265777aa6d2281fa
    - type: ConfigChange
  runPolicy: Serial
```

Build Logs (--build-loglevel=5):

```
I0908 12:22:02.744117       1 builder.go:420] openshift-builder 4.8.0-202108130208.p0.git.b59f2b3.assembly.stream-b59f2b3
I0908 12:22:02.744217       1 builder.go:420] Powered by buildah v1.20.1
I0908 12:22:02.748729       1 builder.go:421] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"portfolio-29","namespace":"homelab-main","uid":"6f1c30a2-adea-4972-9a3f-2da1e5f8e9b8","resourceVersion":"130395602","generation":1,"creationTimestamp":"2021-09-08T12:21:56Z","labels":{"app":"portfolio","app.kubernetes.io/component":"portfolio","app.kubernetes.io/instance":"portfolio","app.kubernetes.io/name":"python","app.kubernetes.io/part-of":"portfolio-app","app.openshift.io/runtime":"python","app.openshift.io/runtime-version":"3.8-ubi7","buildconfig":"portfolio","openshift.io/build-config.name":"portfolio","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"portfolio","openshift.io/build.number":"29"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"portfolio","uid":"ffaf7264-6669-47e9-b54c-0c2f285e8259","controller":true}],"managedFields":[{"manager":"openshift-apiserver","operation":"Update"...
I0908 12:22:02.749793       1 scmauths.go:61] Finding auth for "..2021_09_08_12_21_58.218620292"
I0908 12:22:02.749825       1 scmauths.go:61] Finding auth for "..data"
I0908 12:22:02.749832       1 scmauths.go:61] Finding auth for "ssh-privatekey"
I0908 12:22:02.749840       1 scmauths.go:61] Found SCMAuth "ssh-privatekey" to handle "ssh-privatekey"
I0908 12:22:02.749849       1 scmauths.go:67] Setting up SCMAuth "ssh-privatekey"
I0908 12:22:02.750090       1 scmauths.go:46] source secret dir /var/run/secrets/openshift.io/source has file ..2021_09_08_12_21_58.218620292
I0908 12:22:02.750107       1 scmauths.go:46] source secret dir /var/run/secrets/openshift.io/source has file ..data
I0908 12:22:02.750111       1 scmauths.go:46] source secret dir /var/run/secrets/openshift.io/source has file ssh-privatekey
I0908 12:22:02.750119       1 scmauths.go:46] Adding Private SSH Auth:
#!/bin/sh
ssh -i /var/run/secrets/openshift.io/source/ssh-privatekey -o StrictHostKeyChecking=false "$@"

Cloning "https://github.com/<user>/<project>" ...
I0908 12:22:02.750285       1 source.go:237] git ls-remote --heads https://github.com/<user>/<project>
I0908 12:22:02.750312       1 repository.go:450] Executing git ls-remote --heads https://github.com/<user>/<project>
I0908 12:22:02.890618       1 repository.go:541] Error executing command: exit status 128
I0908 12:22:02.890729       1 source.go:237] fatal: unable to access 'https://github.com/<user>/<project>/': SSL certificate problem: self signed certificate in certificate chain
error: fatal: unable to access 'https://github.com/<user>/<project>/': SSL certificate problem: self signed certificate in certificate chain
```

I think it has something to do with DNS or ndots, because when I debug the currently running pod (from a previous build with the same config), github.com does not resolve correctly until I change ndots.  More specifically, it returns the default internal OpenShift 'Application is not available' page until I change ndots.

```
> oc debug deployment/portfolio --image=fedora
Starting pod/portfolio-debug ...
Pod IP: 10.129.2.35
If you don't see a command prompt, try pressing enter.
sh-5.1# curl github.com
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <style type="text/css">
      body {
        font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
        line-height: 1.66666667;
        font-size: 16px;
        color: #333;
        background-color: #fff;
        margin: 2em 1em;
      }
      h1 {
        font-size: 28px;
        font-weight: 400;
      }
      p {
        margin: 0 0 10px;
      }
      .alert.alert-info {
        background-color: #F0F0F0;
        margin-top: 30px;
        padding: 30px;
      }
      .alert p {
        padding-left: 35px;
      }
      ul {
        padding-left: 51px;
        position: relative;
      }
      li {
        font-size: 14px;
        margin-bottom: 1em;
      }
      p.info {
        position: relative;
        font-size: 20px;
      }
      p.info:before, p.info:after {
        content: "";
        left: 0;
        position: absolute;
        top: 0;
      }
      p.info:before {
        background: #0066CC;
        border-radius: 16px;
        color: #fff;
        content: "i";
        font: bold 16px/24px serif;
        height: 24px;
        left: 0px;
        text-align: center;
        top: 4px;
        width: 24px;
      }

      @media (min-width: 768px) {
        body {
          margin: 6em;
        }
      }
    </style>
  </head>
  <body>
    <div>
      <h1>Application is not available</h1>
      <p>The application is currently not serving requests at this endpoint. It may not have been started or is still starting.</p>

      <div class="alert alert-info">
        <p class="info">
          Possible reasons you are seeing this page:
        </p>
        <ul>
          <li>
            <strong>The host doesn't exist.</strong>
            Make sure the hostname was typed correctly and that a route matching this hostname exists.
          </li>
          <li>
            <strong>The host exists, but doesn't have a matching path.</strong>
            Check if the URL path was typed correctly and that the route was created using the desired path.
          </li>
          <li>
            <strong>Route and path matches, but all pods are down.</strong>
            Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
          </li>
        </ul>
      </div>
    </div>
  </body>
</html>
sh-5.1# cat /etc/resolv.conf
search homelab-main.svc.cluster.local svc.cluster.local cluster.local domain.com openshift.domain.com
nameserver 172.30.0.10
options ndots:5
sh-5.1# cp /etc/resolv.conf myResolv
sh-5.1# vi myResolv
sh-5.1# cat /etc/resolv.conf
search homelab-main.svc.cluster.local svc.cluster.local cluster.local domain.com openshift.domain.com
nameserver 172.30.0.10
options ndots:5
sh-5.1# cat /myResolv
search homelab-main.svc.cluster.local svc.cluster.local cluster.local domain.com openshift.domain.com
nameserver 172.30.0.10
options ndots:1
sh-5.1# cp myResolv /etc/resolv.conf
sh-5.1# cat /etc/resolv.conf
search homelab-main.svc.cluster.local svc.cluster.local cluster.local domain.com openshift.domain.com
nameserver 172.30.0.10
options ndots:1
sh-5.1# curl github.com
sh-5.1# curl https://raw.githubusercontent.com/<user>/<project>/master/README.md
<div align="center"><img src="/scrots/d.png" align="center"/></div>
# ...
# ... It returned the expected info
# ...
sh-5.1#
```
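A sketch of how I understand the resolver's search-list behavior (per resolv.conf(5)); the helper below is my own toy model, not glibc's actual code:

```python
def candidates(name, search, ndots):
    """Order in which a stub resolver tries lookups, per resolv.conf(5).

    If the name contains at least `ndots` dots it is tried as-is first;
    otherwise every search suffix is appended before the bare name is
    tried.  (Sketch only -- real resolvers also handle trailing dots,
    negative caching, timeouts, etc.)
    """
    absolute = name + "."
    suffixed = [f"{name}.{s}." for s in search]
    if name.count(".") >= ndots:
        return [absolute] + suffixed
    return suffixed + [absolute]

# Search list from the pod's /etc/resolv.conf above
search = ["homelab-main.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "domain.com", "openshift.domain.com"]

# With ndots:5, github.com (1 dot) gets every suffix appended first --
# and github.com.domain.com matches the *.domain.com wildcard record,
# so the lookup "succeeds" with the router's IP before github.com
# proper is ever tried.
print(candidates("github.com", search, ndots=5)[3])  # github.com.domain.com.

# With ndots:1, github.com is tried as an absolute name first.
print(candidates("github.com", search, ndots=1)[0])  # github.com.
```

If that model is right, it would explain what I see in the debug pod: under ndots:5 the wildcard-covered candidate is answered by the ingress router before the real name is ever queried, while under ndots:1 the absolute name wins immediately.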




Comment 2 Stephen Reaves 2021-09-15 16:45:18 UTC
I did not.  I don't think it's a git or BuildConfig issue, since the build config used to work.  I simply created the s2i build via the web console and it pulled and built fine.  This was created on a 4.6 cluster, if that means anything.  When I attach to the currently running pod, curling github.com returns the local "Application is not available" page unless I change ndots to 1.

I also tried to create a new s2i build from the same repo, and I get this error in the web UI: "URL is valid but cannot be reached. If this is a private repository, enter a source Secret in advanced Git options".  This happens even after adding the SSH secret.

Comment 3 Miciah Dashiel Butler Masters 2021-09-16 16:13:40 UTC
This looks like a DNS configuration issue.  Do you have a wildcard DNS record for *.domain.com or *.openshift.domain.com?

Comment 4 Stephen Reaves 2021-09-16 17:04:57 UTC
I do.  Through Google Domains I have an A record for domain.com pointing to my cluster's IP and a CNAME record for *.domain.com pointing to domain.com.

I do have a Pi-hole as a local DNS server, and the hosts are configured to use that as their main name server, which then forwards to 1.1.1.1 and 1.0.0.2, I believe.  But I don't think that would be an issue, since everything else on my network can resolve github.com.

Comment 5 Miciah Dashiel Butler Masters 2021-09-16 18:57:00 UTC
You do need a wildcard DNS record for the ingress domain, but this is usually going to be *.apps.<cluster domain> (in your case, I assume that would be "*.apps.openshift.domain.com").  With domain.com in the search path in /etc/resolv.conf and a wildcard record for *.domain.com, applications inside the cluster will resolve any domain name with fewer than 5 dots using the wildcard record.  Would it be possible either to remove or change the wildcard record, or to remove domain.com from the search path?
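To make the mechanism concrete, here is a toy model of whether a wildcard record would answer a given name (illustrative only; real wildcard matching is defined by RFC 4592 and is more subtle about existing closer names):

```python
def wildcard_answers(name, record):
    """True if a wildcard DNS record like '*.domain.com' would answer `name`.

    Toy model: '*' covers one or more leading labels.  Real DNS wildcard
    synthesis (RFC 4592) also depends on which other names exist in the
    zone, but this captures the failure mode in this bug.
    """
    assert record.startswith("*.")
    zone = record[2:]
    return name != zone and name.endswith("." + zone)

# github.com, expanded through the search path, hits the wildcard...
print(wildcard_answers("github.com.domain.com", "*.domain.com"))  # True
# ...so the query resolves to the ingress router instead of failing,
# and the resolver never falls through to github.com itself.
print(wildcard_answers("github.com", "*.domain.com"))             # False
```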

Comment 6 Stephen Reaves 2021-09-16 20:36:56 UTC
Are you talking about changing the /etc/resolv.conf on the nodes themselves?  Those were auto-generated using IPI, so I'd have to create a machine config to overwrite them.  Or are you talking about changing them on the pod itself?  Because I don't know how to change /etc/resolv.conf on the build pods.

Comment 7 Miciah Dashiel Butler Masters 2021-09-20 13:02:24 UTC
I meant /etc/resolv.conf on the nodes themselves.  Most likely the search path in /etc/resolv.conf came from DHCP; can you change your DHCP server not to include domain.com in the search-domain list that it sends to clients?

If domain.com (the domain for which you have a *.domain.com wildcard DNS record) is the cluster's domain, then that might not be feasible; in that case, you probably need to remove the wildcard DNS record or use a different domain for the cluster.
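For example, if dnsmasq happens to be the DHCP server, something like the following (a hypothetical snippet; adjust for your actual DHCP server) sends only the cluster domain in the search list:

```
# /etc/dnsmasq.conf -- advertise only the cluster domain as the
# DHCP search list (option 119), not the apex domain that the
# *.domain.com wildcard record covers
dhcp-option=option:domain-search,openshift.domain.com
```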

Comment 8 Stephen Reaves 2021-09-20 18:40:15 UTC
I took it out of my DHCP; it's not in the /etc/resolv.conf of the node or the pod, and I'm getting the same results.

Pod resolv.conf in the same namespace:

```
search homelab-main.svc.cluster.local svc.cluster.local cluster.local openshift.domain.com
nameserver 172.30.0.10
options ndots:5
```

Node resolv.conf:

```
# Generated by KNI resolv prepender NM dispatcher script
search   openshift.reaves.dev
nameserver 192.168.0.221
nameserver 192.168.0.22
```

I also don't see how that would change anything anyway.  github.com, github.domain.com, and even github.com.domain.com all have fewer than 5 dots, so they should all be treated the same, right?  Changing ndots is the only thing that's changed the outcome for me, but I can't do that on a build, only on the pod after a build...

Comment 9 Stephen Reaves 2021-10-07 16:30:13 UTC
Any update on this?

Comment 10 Stephen Reaves 2021-11-30 15:58:33 UTC
It's been over two months since I've heard any update from a dev.  My cluster is now on 4.9.8 (about to upgrade to 4.9.9) and I am still having this same issue.  Is there anybody who can help take a look at this?  My other (non-OpenShift) machines can access GitHub just fine.

Comment 11 Stephen Reaves 2022-04-04 13:16:37 UTC
I migrated this cluster off of oVirt and onto bare metal (partially because of this ticket, but also because RHV is being dropped) and decided to use the Assisted Installer.  During the install process, the Assisted Installer pointed out that I had a wildcard record on my apex domain (i.e. '*.domain.com') and it wouldn't let me continue until I changed that.  After removing that wildcard, the install went smoothly (big fan of the Assisted Installer, btw) and builds were working.  Just to test, I put the wildcard back and saw the same issues as before.

It seems fine to have '*.apps.cluster.domain.com' and '*.api.cluster.domain.com', but if I'm adding a route above those I need to manually add a specific CNAME for that specific route.  So if I were adding Nextcloud, I could leave it as 'nextcloud.apps.cluster.domain.com' and everything works fine, but that's quite an ugly URL imo, so I manually add 'nextcloud.domain.com' to the DNS; then the route works as expected and builds aren't mysteriously broken.

TL;DR: Wildcard subdomains break lots of things.

If somebody can link some documentation saying wildcard subdomains aren't supported in OpenShift then I'd be comfortable closing this ticket.

-smr

Comment 12 Miheer Salunke 2022-05-04 02:46:26 UTC
(In reply to Stephen Reaves from comment #11)
> I migrated this cluster off of oVirt and onto BareMetal (partially because
> of this ticket, but also because RHEV is being dropped) and I decided to use
> the Assisted Installer.  During the install process, the Assisted Installer
> pointed out that I had a wildcard subdomain on my tld (i.e. '*.domain.com')
> and it wouldn't let me continue until I changed that.  After removing that
> wildcard, the install went smoothly (big fan of the assisted installer btw)
> and builds were working.  Just to test, I put the wildcard back and I was
> seeing the same issues as before.

Put the wildcard where?

> It seems fine to have
> '*.apps.cluster.domain.com' and '*.api.cluster.domain.com' but if I'm adding
> a route above those I need to manually add a specific cname for that
> specific route.  So if I was adding Nextcloud, I could leave it as
> 'nextcloud.apps.cluster.domain.com' and everything works fine, but that's
> quite an ugly url imo, so I manually add 'nextcloud.domain.com' to the DNS,
> then the route works as expected and builds aren't mysteriously broken.
> 


The domain shown in `oc get ingress.config.openshift.io/cluster -o yaml`, or a subdomain of it, can be used as the ingress controller's domain.

Also, the ingress operator does not automatically create DNS records (`oc get dnsrecords -n openshift-ingress-operator -o yaml`) for bare metal (it only creates them for AWS, GCP, and Azure), so you will need to configure your DNS manually.


> TL;DR: Wildcard subdomains break lots of things.
> 
> If somebody can link some documentation saying wildcard subdomains aren't
> supported in OpenShift then I'd be comfortable closing this ticket.
> 
> -smr

Comment 13 Miheer Salunke 2022-05-06 14:55:46 UTC
Hi,

Did you get a chance to check my update?

Thanks and regards,
Miheer

Comment 14 Miheer Salunke 2022-05-06 14:57:57 UTC
Closing this ticket. Please reopen if needed.

Comment 15 Red Hat Bugzilla 2023-09-15 01:36:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.