Description of problem:
It appears parts of must-gather do not respect the --config CLI option. A healthy amount of work is done using the --config provided on the CLI before failing with what appears to indicate use of the user's current kubeconfig file (in our case, the default kubeconfig given to the pods where we're running the installer).

Version-Release number of selected component (if applicable):
sh-4.2$ ./oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.4-201906271212+6b97d85-dirty", GitCommit:"6b97d85", GitTreeState:"dirty", BuildDate:"2019-06-27T18:11:21Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-07-22T15:47:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:
100%

Steps to Reproduce:
1. Obtain a kubeconfig for another cluster (not the one your current kubeconfig file is logged into).
2. oc --kubeconfig=auth/kubeconfig adm must-gather

Actual results:
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
2019/07/30 17:42:35 Finished successfully with no errors.
WARNING: Collecting one or more audit logs on ALL masters in your cluster. This could take a large amount of time.
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=openshift-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-147-35.ec2.internal --path=openshift-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=openshift-apiserver/audit.log
INFO: Audit logs for openshift-apiserver collected.
WARNING: Collecting one or more audit logs on ALL masters in your cluster. This could take a large amount of time.
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-26-04.758.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-51-17.699.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T17-15-45.423.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit-2019-07-30T17-40-08.877.log
/usr/bin/oc adm node-logs ip-10-0-129-195.ec2.internal --path=kube-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-147-35.ec2.internal --path=kube-apiserver/audit.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=kube-apiserver/audit-2019-07-30T16-58-57.138.log
/usr/bin/oc adm node-logs ip-10-0-169-191.ec2.internal --path=kube-apiserver/audit.log
INFO: Audit logs for kube-apiserver collected.
WARNING: Collecting one or more service logs on ALL master nodes in your cluster. This could take a large amount of time.
INFO: Collecting host service logs for kubelet
INFO: Collecting host service logs for crio
INFO: Waiting for worker host service log collection to complete ...
INFO: Worker host service log collection to complete.
Error from server (Forbidden): pods "must-gather-4jz8m" is forbidden: User "system:serviceaccount:hive:cluster-installer" cannot get pods in the namespace "openshift-must-gather-x7zqp": no RBAC policy matched
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.2]
clusterrolebinding.rbac.authorization.k8s.io/must-gather-fkqh5 deleted
namespace/openshift-must-gather-x7zqp deleted
error: exit status 12

Additional info:
Based on the username mentioned above, this appears to indicate that the must-gather command no longer respects the --config we provided and instead falls back to the default auth for the pod we're running in. Exporting KUBECONFIG to point to the kubeconfig file works around the problem.
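A possible reason the workaround behaves differently from the flag (an assumption, not confirmed from the oc source): must-gather shells out to helper commands such as oc rsync, and a CLI flag applies only to the process it was passed to, while an exported environment variable is inherited by every child process. A minimal sketch of that propagation difference (the /tmp path is hypothetical; the actual workaround from this report is exporting KUBECONFIG to the desired kubeconfig before running oc adm must-gather):

```shell
# Hypothetical path, for illustration only. Exported variables are
# inherited by child processes; flags given to the parent are not.
export KUBECONFIG=/tmp/demo-kubeconfig
# A child process (standing in for the `oc rsync` that must-gather
# spawns) sees the exported value:
sh -c 'echo "child process sees KUBECONFIG=$KUBECONFIG"'
```

This would explain why the parent command authenticates with the provided kubeconfig (the gather itself runs) while the final rsync step falls back to the pod's default credentials and fails with the Forbidden error above.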
The issue can still be reproduced with:

[root@dhcp-140-138 ~]# oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"ceed07c42", GitTreeState:"clean", BuildDate:"2019-08-25T17:53:06Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+8a74efb", GitCommit:"8a74efb", GitTreeState:"clean", BuildDate:"2019-08-23T14:58:50Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

oc --kubeconfig=/home/roottest/kubeconfig adm must-gather
....
INFO: Audit logs for kube-apiserver collected.
WARNING: Collecting one or more service logs on ALL master nodes in your cluster. This could take a large amount of time.
INFO: Collecting host service logs for kubelet
INFO: Collecting host service logs for crio
INFO: Waiting for worker host service log collection to complete ...
INFO: Worker host service log collection to complete.
Error from server (NotFound): pods "must-gather-nph8g" not found
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.2]
clusterrolebinding.rbac.authorization.k8s.io/must-gather-c79vh deleted
namespace/openshift-must-gather-dlgh2 deleted
error: exit status 12
Checked with the version below; the issue has been fixed:

[root@dhcp-140-138 ~]# oc --kubeconfig=/home/roottest/kubeconfig version
Client Version: v4.2.0
Server Version: 4.2.0-0.nightly-2019-10-09-203306
Kubernetes Version: v1.14.6+c795c6c

[root@dhcp-140-138 ~]# oc --kubeconfig=/home/roottest/kubeconfig adm must-gather
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2130377ba7ab9dbed8350c52b098dae1575a7dbafe279f8c013e6455d2da6a93
[must-gather ] OUT namespace/openshift-must-gather-t5p4v created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-p5cjn created
[must-gather ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2130377ba7ab9dbed8350c52b098dae1575a7dbafe279f8c013e6455d2da6a93 created
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Finished successfully with no errors.
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Gathering data for ns/openshift-cluster-version...
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Collecting resources for namespace "openshift-cluster-version"...
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Gathering pod data for namespace "openshift-cluster-version"...
[must-gather-8vqn6] POD 2019/10/11 07:39:11 Gathering data for pod "cluster-version-operator-59cc658c74-vgbqw"
......
Since the bug has been fixed, could you change this bug to ON_QA so that QE can verify it?
Moving to verified per comment #3.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922