Bug 1734504

Summary: Parts of oc adm must-gather do not respect --config CLI option
Product: OpenShift Container Platform
Component: oc
Version: 4.1.z
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Devan Goodwin <dgoodwin>
Assignee: Luis Sanchez <sanchezl>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, jokerman, mfojtik, sanchezl, vlaad, wzheng
Type: Bug
Last Closed: 2019-10-16 06:34:08 UTC

Description Devan Goodwin 2019-07-30 17:49:53 UTC
Description of problem:

It appears that parts of must-gather do not respect the --config CLI option. A substantial amount of work is done using the kubeconfig provided on the CLI before the command fails with an error that appears to indicate it is using the user's current kubeconfig file instead (in our case, the default kubeconfig given to the pod where we are running the installer).
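For context: when no kubeconfig is supplied via flag or KUBECONFIG, oc running inside a pod typically falls back to the serviceaccount credentials mounted into the pod, which is what the failure below points at. Shown purely for illustration, those credentials live under the standard mount path:

sh-4.2$ ls /var/run/secrets/kubernetes.io/serviceaccount/   # typically ca.crt, namespace, token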

Version-Release number of selected component (if applicable):

sh-4.2$ ./oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.4-201906271212+6b97d85-dirty", GitCommit:"6b97d85", GitTreeState:"dirty", BuildDate:"2019-06-27T18:11:21Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-07-22T15:47:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}


How reproducible:

100%

Steps to Reproduce:
1. Obtain a kubeconfig for another cluster (not the one your current kubeconfig file is logged into); a quick way to check this is shown after the steps.
2. oc --kubeconfig=auth/kubeconfig adm must-gather 
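One way to confirm that the two configurations point at different clusters before reproducing (kubeconfig path as in step 2):

$ oc whoami --show-server                               # API server from the current/default kubeconfig
$ oc --kubeconfig=auth/kubeconfig whoami --show-server  # API server from the kubeconfig passed on the CLI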


Actual results:
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
WARNING: Collecting one or more audit logs on ALL masters in your cluster. This could take a large amount of time.
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=openshift-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-147-35.ec2.internal --path=openshift-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=openshift-apiserver/audit.log
INFO: Audit logs for openshift-apiserver collected.
WARNING: Collecting one or more audit logs on ALL masters in your cluster. This could take a large amount of time.
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-26-04.758.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-51-17.699.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T17-15-45.423.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T17-40-08.877.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-147-35.ec2.internal --path=kube-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-58-57.138.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=kube-apiserver/audit.log
INFO: Audit logs for kube-apiserver collected.
WARNING: Collecting one or more service logs on ALL master nodes in your cluster. This could take a large amount of time.
INFO: Collecting host service logs for kubelet
INFO: Collecting host service logs for crio
INFO: Waiting for worker host service log collection to complete ...
INFO: Worker host service log collection to complete.
Error from server (Forbidden): pods "must-gather-4jz8m" is forbidden: User "system:serviceaccount:hive:cluster-installer" cannot get pods in the namespace "openshift-must-gather-x7zqp": no RBAC policy matched
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.2]
clusterrolebinding.rbac.authorization.k8s.io/must-gather-fkqh5 deleted
namespace/openshift-must-gather-x7zqp deleted
error: exit status 12


Additional info:

Based on the username in the error above, this appears to indicate that the must-gather command was not respecting the --config we provided and instead fell back to the default credentials of the pod we are running in.
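The same comparison with oc whoami shows which identity each configuration resolves to; based on the Forbidden error above, the first command would be expected to return the pod's serviceaccount rather than the user from the provided kubeconfig:

$ oc whoami                                # expected: system:serviceaccount:hive:cluster-installer
$ oc --kubeconfig=auth/kubeconfig whoami   # expected: the user from auth/kubeconfig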

Exporting KUBECONFIG to point at the kubeconfig file works around the problem.
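For example, with the path used in the reproduction steps above:

sh-4.2$ export KUBECONFIG=auth/kubeconfig
sh-4.2$ ./oc adm must-gather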

Comment 2 zhou ying 2019-08-26 05:58:30 UTC
The issue can still be reproduced with:

[root@dhcp-140-138 ~]# oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"ceed07c42", GitTreeState:"clean", BuildDate:"2019-08-25T17:53:06Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+8a74efb", GitCommit:"8a74efb", GitTreeState:"clean", BuildDate:"2019-08-23T14:58:50Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}


oc  --kubeconfig=/home/roottest/kubeconfig adm must-gather
....

INFO: Audit logs for kube-apiserver collected.
WARNING: Collecting one or more service logs on ALL master nodes in your cluster. This could take a large amount of time.
INFO: Collecting host service logs for kubelet
INFO: Collecting host service logs for crio
INFO: Waiting for worker host service log collection to complete ...
INFO: Worker host service log collection to complete.
Error from server (NotFound): pods "must-gather-nph8g" not found
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.2]
clusterrolebinding.rbac.authorization.k8s.io/must-gather-c79vh deleted
namespace/openshift-must-gather-dlgh2 deleted
error: exit status 12

Comment 3 zhou ying 2019-10-11 07:52:54 UTC
Checked with the version below; the issue has been fixed:
[root@dhcp-140-138 ~]# oc --kubeconfig=/home/roottest/kubeconfig  version
Client Version: v4.2.0
Server Version: 4.2.0-0.nightly-2019-10-09-203306
Kubernetes Version: v1.14.6+c795c6c



[root@dhcp-140-138 ~]# oc --kubeconfig=/home/roottest/kubeconfig  adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2130377ba7ab9dbed8350c52b098dae1575a7dbafe279f8c013e6455d2da6a93
[must-gather      ] OUT namespace/openshift-must-gather-t5p4v created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-p5cjn created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2130377ba7ab9dbed8350c52b098dae1575a7dbafe279f8c013e6455d2da6a93 created
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Finished successfully with no errors.
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Gathering data for ns/openshift-cluster-version...
[must-gather-8vqn6] POD 2019/10/11 07:39:11     Collecting resources for namespace "openshift-cluster-version"...
[must-gather-8vqn6] POD 2019/10/11 07:39:11     Gathering pod data for namespace "openshift-cluster-version"...
[must-gather-8vqn6] POD 2019/10/11 07:39:11         Gathering data for pod "cluster-version-operator-59cc658c74-vgbqw"
......

Comment 4 Wenjing Zheng 2019-10-11 08:15:44 UTC
Since the bug has been fixed, could you change this bug to ON_QA so that QE can verify it?

Comment 6 Wenjing Zheng 2019-10-14 14:53:17 UTC
Moving to verified per comment #3.

Comment 7 errata-xmlrpc 2019-10-16 06:34:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922