Description of problem:
When draining a node with --pod-selector and the --dry-run option, the matched pods are not listed.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-25-233755

How reproducible:
Always

Steps to Reproduce:
1. $ oc run hello --image=openshift/hello-openshift:latest --replicas=3
deploymentconfig.apps.openshift.io/hello created

2. $ oc get pod -o wide
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                                 NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1     Completed   0          26s   10.129.2.9    preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-jpsnt    1/1     Running     0          20s   10.129.2.10   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-jvq9l    1/1     Running     0          20s   10.129.2.12   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-w7kwx    1/1     Running     0          20s   10.129.2.11   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>

3. $ oc describe pod hello-1-jpsnt
Name:               hello-1-jpsnt
Namespace:          minmli
Priority:           0
PriorityClassName:  <none>
Node:               preserve-groupg-4cf4r-worker-s9pxg/192.168.0.40
Start Time:         Wed, 28 Aug 2019 10:54:05 +0800
Labels:             deployment=hello-1
                    deploymentconfig=hello
                    run=hello
...

4. $ oc adm drain preserve-groupg-4cf4r-worker-s9pxg --pod-selector="run=hello" --dry-run
node/preserve-groupg-4cf4r-worker-s9pxg cordoned (dry run)

Actual results:
4. The matched pods are not listed.

Expected results:
4. The matched pods are listed, for example:
node/preserve-groupg-4cf4r-worker-s9pxg cordoned (dry run)
Listing matched pods on node "preserve-groupg-4cf4r-worker-s9pxg":
NAMESPACE   NAME                AGE
minmli      pod/hello-1-jpsnt   6m
minmli      pod/hello-1-jvq9l   6m
minmli      pod/hello-1-w7kwx   6m

Additional info:
Currently the drain code simply skips listing the pods when invoked with --dry-run, see: https://github.com/kubernetes/kubernetes/blob/acf5411774ebd1b1376d389763781afc5c4cda8b/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go#L292-L294. We could list them, though. Lowering the priority since it's a nice-to-have but definitely not a bug, and moving this to 4.3.
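For illustration, here is a minimal client-go sketch of what "listing them" could look like. This is not the actual kubectl drain code; the listMatchedPods helper, its clientset/nodeName/podSelector parameters, and the use of a recent client-go (where List takes a context) are assumptions made for the sketch.

// Sketch only: print the pods on nodeName that match podSelector, roughly the
// output the expected results above ask for during a dry run.
package drainsketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listMatchedPods(clientset kubernetes.Interface, nodeName, podSelector string) error {
	// The field selector restricts the list to pods scheduled on the target node;
	// the label selector mirrors the --pod-selector flag.
	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		LabelSelector: podSelector,
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	fmt.Printf("Listing matched pods on node %q:\n", nodeName)
	for _, pod := range pods.Items {
		fmt.Printf("%s\tpod/%s\n", pod.Namespace, pod.Name)
	}
	return nil
}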
opened upstream: https://github.com/kubernetes/kubernetes/pull/82660
Can you confirm this bug is fixed in 4.2? I tested, but the result is not as expected (the matched pods are still not listed):

[lyman@dhcp-141-235 ~]$ oc run hello --image=openshift/hello-openshift:latest --replicas=3
kubectl run --generator=deploymentconfig/v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deploymentconfig.apps.openshift.io/hello created

[lyman@dhcp-141-235 ~]$ oc get pod -o wide
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1     Completed   0          84s   10.128.2.20   ip-10-0-147-58.us-east-2.compute.internal    <none>           <none>
hello-1-t8nnn    1/1     Running     0          74s   10.131.0.27   ip-10-0-131-160.us-east-2.compute.internal   <none>           <none>
hello-1-w8bwm    1/1     Running     0          74s   10.128.2.21   ip-10-0-147-58.us-east-2.compute.internal    <none>           <none>
hello-1-zftr7    1/1     Running     0          74s   10.131.0.26   ip-10-0-131-160.us-east-2.compute.internal   <none>           <none>

[lyman@dhcp-141-235 ~]$ oc describe pod hello-1-t8nnn
Name:               hello-1-t8nnn
Namespace:          minmli
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-131-160.us-east-2.compute.internal/10.0.131.160
Start Time:         Fri, 18 Oct 2019 14:33:29 +0800
Labels:             deployment=hello-1
                    deploymentconfig=hello
                    run=hello
...

[lyman@dhcp-141-235 ~]$ oc adm drain ip-10-0-131-160.us-east-2.compute.internal --pod-selector="run=hello" --dry-run
node/ip-10-0-131-160.us-east-2.compute.internal cordoned (dry run)
node/ip-10-0-131-160.us-east-2.compute.internal drained (dry run)
[lyman@dhcp-141-235 ~]$
This was moved to QA in error. The upstream PR hasn't merged yet; I'll report back here when it does.
The fix for this is merged, please confirm, thanks.
Not fixed!

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-05-08-040144   True        False         12m     Cluster version is 4.4.0-0.nightly-2020-05-08-040144

$ oc run hello --image=openshift/hello-openshift:latest --replicas=4
kubectl run --generator=deploymentconfig/v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deploymentconfig.apps.openshift.io/hello created

$ oc get pod -o wide
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1     Completed   0          90s   10.129.2.18   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-h5smx    1/1     Running     0          87s   10.129.2.19   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-lj674    1/1     Running     0          87s   10.131.0.17   zzhao4-g94ml-w-a-nfp5l.c.openshift-qe.internal   <none>           <none>
hello-1-qgqx5    1/1     Running     0          87s   10.129.2.20   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-scmxb    1/1     Running     0          87s   10.128.2.16   zzhao4-g94ml-w-c-6jww8.c.openshift-qe.internal   <none>           <none>

$ oc adm drain zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal --pod-selector="run=hello" --dry-run
node/zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal cordoned (dry run)
node/zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal drained (dry run)
Sally, what's the OpenShift PR that includes this backport? Can you please link to it? If there wasn't an OpenShift PR, it looks like this made it into Kube 1.18, so we'd only expect it to have been fixed in 4.5 at this point, correct?
The fix should have been pulled in with https://github.com/kubernetes/kubernetes/pull/82660. I'm looking at this now (upcoming sprint) to resolve, thanks.
QA, please confirm this for 4.5; the fix is in 4.5. We don't plan on backporting. I've moved the Target Release to 4.5 for this bug.
Verified with version 4.5.0-0.nightly-2020-06-11-035450:

$ oc adm drain ip-10-0-58-91.us-east-2.compute.internal --pod-selector="run=hello-openshift" --dry-run
W0611 16:31:30.136869    1132 helpers.go:535] --dry-run is deprecated and can be replaced with --dry-run=client.
node/ip-10-0-58-91.us-east-2.compute.internal cordoned (dry run)
evicting pod default/hello-openshift-252mj (dry run)
evicting pod default/hello-openshift-2z4t5 (dry run)
evicting pod default/hello-openshift-b5gqg (dry run)
node/ip-10-0-58-91.us-east-2.compute.internal drained (dry run)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409