Bug 1819954
| Summary: | oc adm drain panics | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | Hongkai Liu <hongkliu> |
| Component: | oc | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | aaleman, aos-bugs, jokerman, mfojtik, skuznets, yinzhou |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1820507 (view as bug list) | Environment: | |
| Last Closed: | 2020-07-13 17:24:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Hongkai Liu 2020-04-01 22:14:25 UTC
```
oc adm must-gather --dest-dir='./must-gather'
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:97ea12139f980154850164233b34c8eb4622823bd6dbb8e7772f873cb157f221
[must-gather ] OUT namespace/openshift-must-gather-zvdnt created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-f2282 created
[must-gather ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:97ea12139f980154850164233b34c8eb4622823bd6dbb8e7772f873cb157f221 created
[must-gather-zqwrb] POD Wrote inspect data to must-gather.
[must-gather-zqwrb] POD Gathering data for ns/openshift-cluster-version...
[must-gather-zqwrb] POD Wrote inspect data to must-gather.
[must-gather-zqwrb] POD Gathering data for ns/openshift-config...
[must-gather-zqwrb] POD Gathering data for ns/openshift-config-managed...
[must-gather-zqwrb] POD Gathering data for ns/openshift-authentication...
[must-gather-zqwrb] POD Gathering data for ns/openshift-authentication-operator...
[must-gather-zqwrb] POD Gathering data for ns/openshift-ingress...
[must-gather-zqwrb] POD Gathering data for ns/openshift-cloud-credential-operator...
[must-gather-zqwrb] POD Gathering data for ns/openshift-machine-api...
[must-gather-zqwrb] POD Gathering data for ns/openshift-console-operator...
[must-gather-zqwrb] POD Gathering data for ns/openshift-console...
[must-gather-zqwrb] POD Gathering data for ns/openshift-dns-operator...
[must-gather-zqwrb] POD Gathering data for ns/openshift-dns...
[must-gather-zqwrb] POD Gathering data for ns/openshift-image-registry...
[must-gather-zqwrb] OUT waiting for gather to complete
[must-gather-zqwrb] OUT gather never finished: timed out waiting for the condition
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-f2282 deleted
[must-gather ] OUT namespace/openshift-must-gather-zvdnt deleted
error: gather never finished for pod must-gather-zqwrb: timed out waiting for the condition
```

There are two problems here: one is the nil pointer in drain, and that bit is going to be addressed by my team; the stuck pods I've additionally split out into https://bugzilla.redhat.com/show_bug.cgi?id=1820507.

The oc part is fixed starting from 4.4, moving accordingly.

FYI, there is already a PR out to address the forbidden errors here: https://github.com/kubernetes/kubernetes/pull/89314. I don't know if or how that might impact the null pointer problem, though.
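For context on how the panic is reached: as described in the comments below, the client crash was hit by draining a node that still had pods stuck in Terminating, mirroring what the machine controller does when scaling the cluster down. A minimal sketch of that sequence, where `<node>` is a placeholder rather than a name from this report, using the same drain flags as the verification below:

```
# <node> is illustrative, not taken from this report.
# 1. Check that the node still has pods stuck in Terminating:
oc get po -A -o wide | grep Terminating | grep <node>
# 2. Drain the node the way the machine controller would on scale-down;
#    with the affected client this drain step is where the reported panic occurred:
oc adm drain <node> --delete-local-data --ignore-daemonsets --force
```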
Confirmed with the latest oc client; can't reproduce the issue now:

```
[root@dhcp-140-138 roottest]# oc get po -A -o wide|grep Termi
openshift-multus   multus-b5gs4   0/1   Terminating   0   45h   10.0.99.139   hrw-bar12-9rfxg-rhel-0   <none>   <none>
openshift-sdn      ovs-5m82p      0/1   Terminating   0   45h   10.0.99.139   hrw-bar12-9rfxg-rhel-0   <none>   <none>
openshift-sdn      sdn-nfkd2      0/1   Terminating   1   45h   10.0.99.139   hrw-bar12-9rfxg-rhel-0   <none>   <none>
[root@dhcp-140-138 roottest]# oc adm drain "hrw-bar12-9rfxg-rhel-0" --delete-local-data --ignore-daemonsets --force
node/hrw-bar12-9rfxg-rhel-0 cordoned
WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-j4qb7, openshift-dns/dns-default-h5bd8, openshift-image-registry/node-ca-kvtt8, openshift-machine-config-operator/machine-config-daemon-ddhzp, openshift-monitoring/node-exporter-r96z2, openshift-multus/multus-b5gs4, openshift-sdn/ovs-5m82p, openshift-sdn/sdn-nfkd2
evicting pod openshift-marketplace/community-operators-879b5f6ff-2pzgl
evicting pod openshift-image-registry/image-pruner-1586304000-t6skm
evicting pod openshift-marketplace/redhat-marketplace-6d46bccd87-hx2f5
evicting pod openshift-marketplace/redhat-operators-bfd786b97-d9ts5
pod/community-operators-879b5f6ff-2pzgl evicted
pod/redhat-operators-bfd786b97-d9ts5 evicted
pod/image-pruner-1586304000-t6skm evicted
pod/redhat-marketplace-6d46bccd87-hx2f5 evicted
node/hrw-bar12-9rfxg-rhel-0 evicted
[root@dhcp-140-138 roottest]# oc version -o yaml
clientVersion:
  buildDate: "2020-04-06T21:08:17Z"
  compiler: gc
  gitCommit: f2b01c4e4ae8c4ca11caabf8cb8e76b7a28b7009
  gitTreeState: clean
  gitVersion: 4.5.0-202004062101-f2b01c4
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
```

@zhou, could you reproduce with an older version? It is not that the drain-node command does not work at all; we need to see a pod stuck in Terminating first, and then drain the node.

(In reply to Hongkai Liu from comment #10)
> @zhou, could you reproduce with an older version?
> It is not that the drain-node command does not work at all; we need to see a
> pod stuck in Terminating first, and then drain the node.

For the verification, I just used an existing cluster that had pods in the Terminating state and ran the `oc adm drain` command; please see https://bugzilla.redhat.com/show_bug.cgi?id=1819954#c8. Today I tried to create a pod and make it stay Terminating, but failed; I can't find a way to do this. If my verification is wrong, please correct me. Thanks.

Hey Zhou Ying, thanks for the information. The cluster had lots of ongoing builds when I hit the issue. The oc CLI panic was the result of simulating what the machine controller does when scaling down the cluster. My feeling is that once the server-side bug (https://bugzilla.redhat.com/show_bug.cgi?id=1820507) is fixed, the oc CLI won't panic any more. I do not know how to verify this without cooperation from the server side.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
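On the question raised during verification about how to get a pod stuck in Terminating: one common approach (a sketch with illustrative names, finalizer, and image, none of which come from this report) is to add a custom finalizer to a pod and then delete it; the pod stays in Terminating until the finalizer is removed:

```
# All names, the finalizer, and the image below are illustrative.
oc run stuck-pod --image=registry.access.redhat.com/ubi8/ubi -- sleep 3600
oc patch pod stuck-pod --type=merge -p '{"metadata":{"finalizers":["example.com/block-deletion"]}}'
oc delete pod stuck-pod --wait=false
oc get pod stuck-pod    # STATUS shows Terminating while the finalizer is present
# Clean up by clearing the finalizer so the deletion can complete:
oc patch pod stuck-pod --type=merge -p '{"metadata":{"finalizers":null}}'
```

With such a pod parked on a node, running `oc adm drain` against that node reproduces the precondition described in the comments above (a pod stuck in Terminating before the drain).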