Bug 1755428
Summary: | must-gather fails to run and returns unhelpful error | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Timothy Rees <trees> |
Component: | oc | Assignee: | Sally <somalley> |
Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.z | CC: | aos-bugs, jokerman, maszulik, mfojtik |
Target Milestone: | --- | ||
Target Release: | 4.4.0 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-05-04 11:13:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Description
Timothy Rees
2019-09-25 13:42:38 UTC
Additional information: the pod runs and immediately terminates:

    # date && oc adm must-gather; date
    Thu Sep 26 05:28:12 CEST 2019
    namespace/openshift-must-gather-8lvfz created
    clusterrolebinding.rbac.authorization.k8s.io/must-gather-pphzr created
    WARNING: cannot use rsync: rsync not available in container
    WARNING: cannot use tar: tar not available in container
    clusterrolebinding.rbac.authorization.k8s.io/must-gather-pphzr deleted
    namespace/openshift-must-gather-8lvfz deleted
    error: No available strategies to copy.

    # date; oc get pods -n openshift-must-gather-8lvfz -w ; date
    Thu Sep 26 05:29:18 CEST 2019
    NAME                READY   STATUS            RESTARTS   AGE
    must-gather-kck76   0/1     PodInitializing   0          66s
    must-gather-kck76   1/1     Running           0          67s
    must-gather-kck76   1/1     Terminating       0          73s
    must-gather-kck76   1/1     Terminating       0          73s
    Thu Sep 26 05:30:26 CEST 2019
    [root@int-lb ~]# oc logs must-gather-kck76 -p

Somehow there were CSR approvals pending. I don't understand how this is the case, since the cluster was working fine before it was upgraded. Approving the CSRs per [1] resolved the issue, and must-gather now runs.

Is there any way to get a more useful error message from must-gather in this scenario?

[1] https://access.redhat.com/solutions/4307511

This is not going to make 4.3, moving to 4.4.

I've looked through the rsync and must-gather code for this. The must-gather code has been restructured since 4.1 (it moved from openshift/origin to openshift/oc as of 4.2), and it's difficult to reproduce this issue. I do notice that with -v=4 you get more information regarding errors.

While trying to reproduce this, I found that if I delete the must-gather pod, the command hangs until the timeout (10 min). I've opened a PR to fix that specifically; overall it will also help in getting more information from failed must-gather runs. For this bz, however, I suggest running must-gather with a higher log level. I assume the command hung for you as well, since the must-gather pod was terminated? In that sense, this PR will serve as a fix.

https://github.com/openshift/oc/pull/295

Confirmed with the latest oc client; can't reproduce the issue now:

    [root@dhcp-140-138 ~]# oc version -o yaml
    clientVersion:
      buildDate: "2020-02-13T22:50:14Z"
      compiler: gc
      gitCommit: 5d7a12f03389b03b651f963cb5ee8ddfa9cff559
      gitTreeState: clean
      gitVersion: v4.4.0
      goVersion: go1.13.4
      major: ""
      minor: ""
      platform: linux/amd64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
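
For anyone who hits the same "No available strategies to copy" error, here is a minimal sketch of a pre-flight check based on the two workarounds mentioned in this report: look for pending CSRs first, then run must-gather with -v=4. The function names `pending_csrs` and `preflight_must_gather` are hypothetical helpers, not part of oc, and the filter assumes that CONDITION is the last column of the `oc get csr` table output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `oc get csr --no-headers` style output on stdin
# and print the name of every CSR whose last column (assumed to be
# CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Hypothetical wrapper: refuse to start must-gather while CSRs are pending,
# otherwise run it with the higher log level suggested in this report.
preflight_must_gather() {
  local pending
  pending=$(oc get csr --no-headers 2>/dev/null | pending_csrs)
  if [ -n "$pending" ]; then
    echo "Pending CSRs found; review them before running must-gather:" >&2
    echo "$pending" >&2
    return 1
  fi
  oc adm must-gather -v=4
}
```

Sourcing the file only defines the two functions; nothing runs until `preflight_must_gather` is called. Each name it prints can be approved with `oc adm certificate approve <name>`, provided the request is one you expect to see.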