Bug 1880865 - oc login with option --insecure-skip-tls-verify=true should not fail with error: net/http: TLS handshake timeout
Summary: oc login with option --insecure-skip-tls-verify=true should not fail with error: net/http: TLS handshake timeout
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: All
OS: Mac OS
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Mike Dame
QA Contact: zhou ying
URL:
Whiteboard: LifecycleReset
Duplicates: 1917756
Depends On:
Blocks:
 
Reported: 2020-09-20 19:53 UTC by Franck Boudinet
Modified: 2021-09-27 16:10 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-12 15:56:47 UTC
Target Upstream Version:




Links:
GitHub openshift/oc pull 746 (last updated 2021-02-22 14:33:45 UTC)

Description Franck Boudinet 2020-09-20 19:53:22 UTC
Description of problem:
Cannot log in to an OCP 4.x cluster (reproduced on 4.2.x & 4.3.x) with --insecure-skip-tls-verify=true from macOS when the keychain contains more than 250 certificates with trust policies.

Version-Release number of selected component (if applicable):
oc Client Version 4.5.2 or above

How reproducible:

Steps to Reproduce:
1. Load around 250 certificates with trust policies into your macOS keychain
2. oc login -u kubeadmin -p <mypwd> api.myocp.mydomain.local:6443 --insecure-skip-tls-verify=true

Actual results:
Login fails with error message: "error: net/http: TLS handshake timeout"

Expected results:
Login should succeed. Since the user has explicitly asked for server certificates not to be checked by passing '--insecure-skip-tls-verify=true', the error above, which comes from an attempt to verify the server certificate, should not occur.

Additional info:

- See the following URL for an analysis of this bug (part 1 of the issue): https://github.com/openshift/oc/issues/496
- See the following URL for a proposed PR to fix this bug: https://github.com/openshift/oc/pull/514

Comment 1 Maciej Szulik 2020-10-01 14:50:23 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 2 Michal Fojtik 2020-10-20 20:12:06 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 Franck Boudinet 2020-11-07 13:26:43 UTC
Hi Guys, 

I saw it was parked for 4.7 so I stopped pushing for the fix. The alternative I'm using is to compile my own version of oc to benefit from the fix... 

Cheers.

Comment 4 Michal Fojtik 2020-11-07 14:12:13 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 5 Maciej Szulik 2020-11-09 13:36:55 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 Franck Boudinet 2021-01-12 12:50:28 UTC
Hello
& happy new year, all the best for 2021 ! 

I see that you're re-opening this; are you willing/planning to get it integrated in 4.7, or am I getting it wrong?
I had closed the related issue & PR based on my understanding of a previous comment in the PR.

Thx.

Comment 7 Novonil Choudhuri 2021-01-12 15:06:28 UTC
Hello, Happy New Year to you too. I am happy to have it fixed in 4.7 and backported to 4.6 as well. However, I would like to know if there is any workaround available as of now.

Comment 10 Franck Boudinet 2021-01-13 08:32:56 UTC
@Novonil,

I don't know if you looked at the analysis of the problem, and in particular the inputs provided in the PR https://github.com/openshift/oc/pull/514 on Sept 9, 2020. The reason for the problem is that even if the user passes the option "--insecure-skip-tls-verify=true", oc makes a first HTTPS request to the cluster and tries to validate the server certificate. This validation should not happen, since the user explicitly requested that it be skipped, and it is the reason for this bug report; it can be fixed with exactly one line of code.

The root cause of the bug is that on macOS, certificate verification as implemented in the Go libraries can take a very long time; as written in the PR update from Sep 9 (https://github.com/openshift/oc/pull/514#issuecomment-689348366): "...the duration of the checking is mostly coming from certificates with trust policies..."
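
For reference, here is a minimal Go sketch (illustrative only, not the actual oc code path) of what the flag is expected to mean at the transport layer: with InsecureSkipVerify set on the TLS client config, crypto/tls skips server-certificate verification entirely, so the macOS keychain should never need to be consulted for the pre-login probe.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical probe mirroring the HEAD request oc sends before login.
	// The API endpoint below is the placeholder from the reproduction steps.
	client := &http.Client{
		Transport: &http.Transport{
			// Skips verification of the server certificate chain, so no
			// keychain/trust-policy evaluation should be triggered.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Head("https://api.myocp.mydomain.local:6443/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```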

- So one workaround is to delete some (unused) certificates with trust policies from the keychain on the end user's Mac, so that the checking does not take as long, the timeout does not occur, and the connection can be made.

Two other workarounds are also mentioned in the initial issue that led to the PR above and to this bug report (https://github.com/openshift/oc/issues/496):
- run 'sudo oc ...' on the Mac; in this case the certificate verification only applies to a subset of the certificates in the keychain (the ones owned by root) and the timeout does not occur (post from July 22nd: https://github.com/openshift/oc/issues/496#issuecomment-662164068)
- or pass a cacert.pem file as a parameter when invoking oc (even a wrong one) to get past the "hurdle" of the first HEAD HTTPS request that fails, and then continue by answering 'Use insecure connections? (y/n): y' (see https://github.com/openshift/oc/issues/496#issuecomment-662177056)

Hope this helps.

Comment 11 Maciej Szulik 2021-01-13 11:46:06 UTC
Lowering the priority to reflect the customer case.

Comment 12 Novonil Choudhuri 2021-01-13 15:49:21 UTC
Thanks Franck. We tried the following but it still did not work. We will try the others and update here.

oc login -u kubeadmin -p ##### --server=https://api.ocp1.example.com:6443 --certificate-authority=./ca.pem --loglevel=9
I0112 14:09:19.747241   11299 loader.go:375] Config loaded from file:  /Users/a600000/.kube/config
I0112 14:09:19.747911   11299 round_trippers.go:423] curl -k -v -XHEAD  'https://api.ocp1.example.com:6443/'
I0112 14:09:29.912408   11299 round_trippers.go:443] HEAD https://api.ocp1.example.com:6443/  in 10164 milliseconds
I0112 14:09:29.912478   11299 round_trippers.go:449] Response Headers:
F0112 14:09:29.912563   11299 helpers.go:115] error: net/http: TLS handshake timeout'

Comment 15 Franck Boudinet 2021-01-14 13:42:39 UTC
OK...

Do you have the problem at a customer site, or are you trying to reproduce it on your test system?
Have you tried:

GODEBUG=x509roots=1 oc login -u kubeadmin -p <myPassword>  api.<myCluster>.<myDomain>:6443

to make sure this is the same problem?

Comment 16 Novonil Choudhuri 2021-01-15 15:41:25 UTC
(In reply to Franck Boudinet from comment #15)
> OK...
> 
> do you have the problem at a customer site, or are you trying to reproduce
> the problem on your test system ?
> Have you tried:
> 
> GODEBUG=x509roots=1 oc login -u kubeadmin -p <myPassword> 
> api.<myCluster>.<myDomain>:6443
> 
> to make sure this is the same problem ?

Franck, the problem is at a customer site. I do not have a Mac to test this.

Comment 18 Maciej Szulik 2021-01-18 12:45:42 UTC
Reading through https://github.com/golang/go/issues/18203#issuecomment-273005127 I found a suggestion to try CGO. We have another issue with mac + VPN (https://bugzilla.redhat.com/show_bug.cgi?id=1838512), for which we have an open fix: https://github.com/openshift/oc/pull/644.
Unfortunately, we can't merge that yet; see the PR comments.

Can either of you verify if compiling oc with CGO_ENABLED=1 helps with this issue?

Comment 19 Maciej Szulik 2021-01-20 11:02:57 UTC
A similar issue popped up in https://bugzilla.redhat.com/show_bug.cgi?id=1917756

Comment 20 Maciej Szulik 2021-01-20 11:03:12 UTC
*** Bug 1917756 has been marked as a duplicate of this bug. ***

Comment 21 Novonil Choudhuri 2021-01-20 15:42:16 UTC
(In reply to Maciej Szulik from comment #18)
> Reading through
> https://github.com/golang/go/issues/18203#issuecomment-273005127 I found 
> a suggestion to try CGO, we have another issue with mac + vpn
> (https://bugzilla.redhat.com/show_bug.cgi?id=1838512)
> for which we have an open fix https://github.com/openshift/oc/pull/644.
> Unfortunately, we can't merge that yet, see the PR comments. 
> 
> Can either of you verify if compiling oc with CGO_ENABLED=1 helps with this
> issue?


Hello,

I don't see any difference. It still fails with the same error.

CGO_ENABLED=1
oc login --token=##### --server=https://api.os4.xxx.yyy.com:6443 --loglevel=10
I0120 09:32:09.274984   84808 loader.go:375] Config loaded from file:  /Users/a6002246/.kube/config
I0120 09:32:09.275441   84808 round_trippers.go:423] curl -k -v -XHEAD  'https://api.os4.xxx.yyy.com:6443/'
I0120 09:32:20.029254   84808 round_trippers.go:443] HEAD https://api.os4.xxx.yyy.com:6443/  in 10753 milliseconds
I0120 09:32:20.029311   84808 round_trippers.go:449] Response Headers:
F0120 09:32:20.029379   84808 helpers.go:115] error: net/http: TLS handshake timeout
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00000e001, 0xc000639d40, 0x55, 0xb7)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x53defc0, 0xc000000003, 0x0, 0x0, 0xc0015b44d0, 0x5183380, 0xa, 0x73, 0x100b300)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x53defc0, 0x3, 0x0, 0x0, 0x2, 0xc00123f5e0, 0x1, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:718 +0x165
k8s.io/klog/v2.FatalDepth(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1449
k8s.io/kubectl/pkg/cmd/util.fatal(0xc0005b80f0, 0x26, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:93 +0x1f0
k8s.io/kubectl/pkg/cmd/util.checkErr(0x3e29860, 0x54137c0, 0x3b84f40)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:188 +0x945
k8s.io/kubectl/pkg/cmd/util.CheckErr(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:115
github.com/openshift/oc/pkg/cli/login.NewCmdLogin.func1(0xc00110edc0, 0xc0015819b0, 0x0, 0x3)
        /go/src/github.com/openshift/oc/pkg/cli/login/login.go:76 +0x5af
github.com/spf13/cobra.(*Command).execute(0xc00110edc0, 0xc001581980, 0x3, 0x3, 0xc00110edc0, 0xc001581980)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc00110eb00, 0x2, 0xc00110eb00, 0x2)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:887
main.main()
        /go/src/github.com/openshift/oc/cmd/oc/oc.go:110 +0x885

goroutine 6 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x53defc0)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1131 +0x8b
created by k8s.io/klog/v2.init.0
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:416 +0xd8

goroutine 49 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x3b84e68, 0x3e24d60, 0xc0005a2000, 0x1, 0xc000060420)
        /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x149
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x3b84e68, 0x12a05f200, 0x0, 0xc000d05101, 0xc000060420)
        /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x3b84e68, 0x12a05f200)
        /go/src/github.com/openshift/oc/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:81 +0x4f
created by k8s.io/component-base/logs.InitLogs
        /go/src/github.com/openshift/oc/vendor/k8s.io/component-base/logs/logs.go:58 +0x8a

goroutine 50 [select]:
io.(*pipe).Read(0xc001391920, 0xc0005ae000, 0x1000, 0x1000, 0x349ec20, 0x1, 0xc0005ae000)
        /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/pipe.go:57 +0xe7
io.(*PipeReader).Read(0xc0002fa250, 0xc0005ae000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/io/pipe.go:134 +0x4c
bufio.(*Scanner).Scan(0xc00147e400, 0x0)
        /opt/rh/go-toolset-1.15/root/usr/lib/go-toolset-1.15-golang/src/bufio/scan.go:214 +0xa9
github.com/openshift/oc/pkg/cli/admin/mustgather.newPrefixWriter.func1(0xc00147e400, 0x3e26720, 0xc00000e018, 0x39a8830, 0x17)
        /go/src/github.com/openshift/oc/pkg/cli/admin/mustgather/mustgather.go:424 +0x13e
created by github.com/openshift/oc/pkg/cli/admin/mustgather.newPrefixWriter
        /go/src/github.com/openshift/oc/pkg/cli/admin/mustgather/mustgather.go:423 +0x1d0

Comment 22 Franck Boudinet 2021-01-21 08:15:46 UTC
Hi Novonil,

Did you try with export GODEBUG=x509roots=1 to make sure that the problem you're facing is similar to the one I described in my issue? You should see a list of certificates being checked one by one and, at some point, the timeout occurring because there are too many of them.

Also, may I ask which version of oc you're using? I had put a lot of traces in at the time I investigated; I can check whether I still have the VM where I did the cross-compile to generate the oc Mac build with these traces, and provide it to you if that helps.

Franck

Comment 23 Novonil Choudhuri 2021-01-21 15:30:46 UTC
Hi Franck,

Yes, I tried this already and it is failing too.

GODEBUG=x509roots=1 oc login --token=### --server=https://api.os4.xxx.yyy.com:6443 --loglevel=9
I0115 09:45:13.504639   84481 loader.go:375] Config loaded from file:  /Users/000000/.kube/config
I0115 09:45:13.506198   84481 round_trippers.go:423] curl -k -v -XHEAD  'https://api.os4.xxx.yyy.com:6443/'
I0115 09:45:24.032962   84481 round_trippers.go:443] HEAD https://api.os4.test.xxx.yyy.com:6443/  in 10526 milliseconds
I0115 09:45:24.032993   84481 round_trippers.go:449] Response Headers:
F0115 09:45:24.033054   84481 helpers.go:115] error: net/http: TLS handshake timeout

We are using oc 4.6.9

Comment 25 Franck Boudinet 2021-01-22 17:46:40 UTC
OK Novonil,

Based on what I read above, my conclusion is that:
  - either you did the test with the export of GODEBUG=x509roots=1 on Linux (I confirm no additional traces are displayed on Linux with this option, and there is no point in doing that because the problem I reported was on Mac only),
  - or the problem you are having is different from the one I reported.

I just double-checked with the oc client 4.6.9 on both Linux and Mac, and:
  - on Linux the option GODEBUG=x509roots=1 does not display any additional traces, as mentioned above
  - but on Mac it does; see below an example of the trace your customer should get:
````
GODEBUG=x509roots=1 oc login -u <username> -p <password> https://<myclusterapiurl>:6443 --loglevel=9
I0122 18:23:13.269377   98842 loader.go:375] Config loaded from file:  /Users/fb/.kube/config
I0122 18:23:13.269766   98842 round_trippers.go:423] curl -k -v -XHEAD  'https://<myclusterapiurl>:6443/'
crypto/x509: trust settings for CN=localhost,OU=defaultServer,O=ibm,C=us: 4
crypto/x509: trust settings for CN=localhost:2998,OU=garage,O=hal,L=nice,ST=paca,C=fr,1.2.840.113549.1.9.1=#0c0f66636b626f407961686f6f2e636f6d: 1
crypto/x509: trust settings for CN=sma-unleash.ratl.swg.usma.ibm.com,O=IBM: 4
...
...
````

By the way, I can also see that compared to when I raised the issue using 4.5.x, the oc client behavior has changed in that respect, most probably because of updates in the Go libraries used by oc 4.6.9. Here is the trace using oc client 4.5.2, trying to connect to the same OCP server from exactly the same Mac and user:
````
oc.4.5.2 login -u <username> -p <password> https://<myclusterapiurl>:6443 --loglevel=9
I0122 18:32:31.529731   99156 loader.go:375] Config loaded from file:  /Users/fb/.kube/config
I0122 18:32:31.531035   99156 round_trippers.go:423] curl -k -v -XHEAD  'https://<myclusterapiurl>:6443/'
crypto/x509: 189 certs have a trust policy
crypto/x509: verify-cert approved CN=IBM JSS Built-in Certificate Authority
crypto/x509: verify-cert approved CN=IBM Internal Root CA,O=International Business Machines Corporation,C=US
crypto/x509: verify-cert approved CN=ucopia.zonezei,OU=DGSI,O=DA,L=St Cloud,ST=St Cloud,C=FR
crypto/x509: verify-cert approved CN=icp-router,O=IBM Cloud Private,L=Xi'an,ST=Shaanxi,C=CN
crypto/x509: verify-cert approved CN=http://openvpn.net/localca.html #1567088347
crypto/x509: verify-cert rejected CN=1.1.1.1,OU=DeviceSSL (WebAuth),O=Cisco Systems Inc.,C=US: "Cert Verify Result: CSSMERR_TP_NOT_TRUSTED"
crypto/x509: verify-cert approved CN=www.ibm.com,O=IBM Cloud Private,L=Armonk,ST=New York,C=US
crypto/x509: verify-cert approved CN=IBM CA,O=International Business Machines Corporation,C=US
...
...
````

You can see that the output is different.
So please confirm that you're getting the 'crypto/x509:' kind of traces when running the oc login from a Mac with GODEBUG=x509roots=1 exported. If yes, then hopefully it will help confirm that the issue is due to the TLS certificate checking taking too long because of too many certificates; if not, then the problem you're having is different from the one I raised.
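
As a side note, the slow keychain walk can be reproduced outside of oc with a few lines of Go (a rough diagnostic sketch, assuming a pure-Go, non-cgo build on macOS and a Go version that honours GODEBUG=x509roots=1): loading the system root pool is what triggers the per-certificate trust evaluation, so timing it shows whether the keychain is the bottleneck.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

func main() {
	// Run with: GODEBUG=x509roots=1 go run probe.go
	// On a Mac whose keychain holds many certificates with trust policies,
	// loading the system roots is the slow step behind the handshake timeout.
	start := time.Now()
	if _, err := x509.SystemCertPool(); err != nil {
		fmt.Println("error loading system roots:", err)
		return
	}
	fmt.Println("system root pool loaded in", time.Since(start))
}
```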

Have a good weekend.
cheers

Comment 26 Maciej Szulik 2021-01-25 11:43:44 UTC
I spoke with other folks; we'll try to fix the bug in oc in 4.7.

Comment 27 Maciej Szulik 2021-02-05 14:12:11 UTC
Due to other commitments we weren't able to get this resolved for 4.7, so I'm pushing it over to 4.8.

Comment 30 Maciej Szulik 2021-02-18 10:58:26 UTC
See the PR from the initial comment; I spoke with Standa and we've decided to go with that fix.

Comment 31 Mike Dame 2021-02-25 20:51:14 UTC
I have opened a new PR to make this change at github.com/openshift/oc/pull/746

Comment 33 zhou ying 2021-03-08 06:48:09 UTC
Confirmed with the latest oc version:
Zhous-MacBook-Pro:~ zhouying$  oc version --client -o yaml 
clientVersion:
  buildDate: "2021-03-06T04:04:17Z"
  compiler: gc
  gitCommit: f7835631278b03e4388f65145b56e1fd93bf4e37
  gitTreeState: clean
  gitVersion: 4.8.0-202103060050.p0-f783563
  goVersion: go1.15.5
  major: ""
  minor: ""
  platform: darwin/amd64
releaseClientVersion: 4.8.0-0.nightly-2021-03-06-055252


Zhous-MacBook-Pro:~ zhouying$ time oc login  https://api.yinzhou8aws.qe.devcluster.openshift.com:6443 -u pm305  --insecure-skip-tls-verify
WARNING: Using insecure TLS client config. Setting this option is not supported!

Logged into "https://api.yinzhou8aws.qe.devcluster.openshift.com:6443" as "pm305" using existing credentials.

You don't have any projects. You can try to create a new project, by running

    oc new-project <projectname>


real	0m47.463s
user	0m0.304s
sys	0m0.053s

Comment 35 Mike Dame 2021-05-20 15:56:59 UTC
I still have not had a chance to check with the upstream auth folks about the proper way (if any) that we can allow both of these flags to be set.

See https://bugzilla.redhat.com/show_bug.cgi?id=1935102#c2 for more context on why this was reverted. It appears the limitation is not just in oc (as previously thought in this Bugzilla) but an internal k8s client restriction that will require more work.
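
For context, a rough sketch of the client-go side of that restriction (illustrative only; the host is a placeholder): rest.Config exposes an Insecure flag in its TLSClientConfig, but client-go refuses, when the transport is built, to combine it with a CA file or CA data, which is roughly the internal restriction referred to above.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Illustrative only: Insecure on its own skips server-certificate checks.
	// Adding CAFile/CAData alongside Insecure is rejected by client-go at
	// transport construction time, which is (roughly) the restriction above.
	cfg := &rest.Config{
		Host: "https://api.myocp.mydomain.local:6443", // placeholder endpoint
		TLSClientConfig: rest.TLSClientConfig{
			Insecure: true,
		},
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		fmt.Println("error building client:", err)
		return
	}
	_ = client // a real caller would go on to hit the discovery endpoints
	fmt.Println("client constructed with Insecure=true")
}
```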

