Bug 1746227 - evacuating pods by pod selector with the --dry-run option does not list the matching pods
Summary: evacuating pods by pod selector with the --dry-run option does not list the matching pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.2.0
Hardware: x86_64
OS: Linux
Severity: low
Priority: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sally
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-28 03:12 UTC by MinLi
Modified: 2020-07-13 17:11 UTC
CC: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:11:28 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:11:52 UTC)

Description MinLi 2019-08-28 03:12:13 UTC
Description of problem:
When draining a node with --pod-selector and the --dry-run option, oc adm drain does not list the pods that match the selector.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-25-233755

How reproducible:
always

Steps to Reproduce:
1.$ oc run hello --image=openshift/hello-openshift:latest  --replicas=3
deploymentconfig.apps.openshift.io/hello created

2.$ oc get pod -o wide
NAME             READY     STATUS      RESTARTS   AGE       IP            NODE                                 NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1       Completed   0          26s       10.129.2.9    preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-jpsnt    1/1       Running     0          20s       10.129.2.10   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-jvq9l    1/1       Running     0          20s       10.129.2.12   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>
hello-1-w7kwx    1/1       Running     0          20s       10.129.2.11   preserve-groupg-4cf4r-worker-s9pxg   <none>           <none>

3.$ oc describe pod hello-1-jpsnt
Name:               hello-1-jpsnt
Namespace:          minmli
Priority:           0
PriorityClassName:  <none>
Node:               preserve-groupg-4cf4r-worker-s9pxg/192.168.0.40
Start Time:         Wed, 28 Aug 2019 10:54:05 +0800
Labels:             deployment=hello-1
                    deploymentconfig=hello
                    run=hello
...

4.$ oc adm drain preserve-groupg-4cf4r-worker-s9pxg --pod-selector="run=hello" --dry-run
node/preserve-groupg-4cf4r-worker-s9pxg cordoned (dry run)


Actual results:
4. The matching pods are not listed; only the cordon line is printed.

Expected results:
4. The matching pods are listed, for example:
node/preserve-groupg-4cf4r-worker-s9pxg cordoned (dry run)
Listing matched pods on node "preserve-groupg-4cf4r-worker-s9pxg":
NAMESPACE   NAME                AGE   
minmli      pod/hello-1-jpsnt   6m    
minmli      pod/hello-1-jvq9l   6m    
minmli      pod/hello-1-w7kwx   6m 


Additional info:
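As a cross-check (using the namespace, label, and node name from the steps above), the pods the dry run should report can be listed directly by combining the label selector with a field selector on the node name:

$ oc get pods -n minmli -l run=hello --field-selector spec.nodeName=preserve-groupg-4cf4r-worker-s9pxg

This returns the three Running hello-1 pods shown in step 2.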

Comment 1 Maciej Szulik 2019-08-29 11:14:49 UTC
Currently the code in drain just ignores listing pods when invoked with --dry-run, see:
https://github.com/kubernetes/kubernetes/blob/acf5411774ebd1b1376d389763781afc5c4cda8b/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go#L292-L294
We could list them, though. Lowering the priority since this is a nice-to-have but definitely not a bug, and moving to 4.3.
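For context, here is a minimal, self-contained Go illustration of the control flow described above (a paraphrased sketch with invented names, not the verbatim upstream code): the pod listing/eviction step runs only on the non-dry-run path, so a dry run prints the node lines and never enumerates pods.

    package main

    import "fmt"

    // drainNode mimics the shape of the drain loop: pods are enumerated
    // only on the real path, never on the dry-run path.
    func drainNode(node string, pods []string, dryRun bool) {
        fmt.Printf("node/%s cordoned", node)
        if dryRun {
            fmt.Println(" (dry run)")
            return // the dry run exits before any pods are listed
        }
        fmt.Println()
        for _, pod := range pods {
            fmt.Printf("evicting pod %s\n", pod) // unreachable with --dry-run
        }
        fmt.Printf("node/%s drained\n", node)
    }

    func main() {
        // Reproduces the truncated output from step 4 of the report.
        drainNode("preserve-groupg-4cf4r-worker-s9pxg",
            []string{"minmli/hello-1-jpsnt", "minmli/hello-1-jvq9l", "minmli/hello-1-w7kwx"},
            true)
    }

The eventual fix takes the opposite approach: the dry-run path also walks the matched pods and prints an "evicting pod ... (dry run)" line for each, as shown in comment 24.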

Comment 2 Sally 2019-09-12 20:14:35 UTC
opened upstream: https://github.com/kubernetes/kubernetes/pull/82660

Comment 5 MinLi 2019-10-18 06:44:00 UTC
Can you confirm this bug is fixed in 4.2?
I tested, but the result is not as expected (the matching pods are still not listed):

[lyman@dhcp-141-235 ~]$ oc run hello --image=openshift/hello-openshift:latest  --replicas=3
kubectl run --generator=deploymentconfig/v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deploymentconfig.apps.openshift.io/hello created

[lyman@dhcp-141-235 ~]$ oc get pod -o wide
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1     Completed   0          84s   10.128.2.20   ip-10-0-147-58.us-east-2.compute.internal    <none>           <none>
hello-1-t8nnn    1/1     Running     0          74s   10.131.0.27   ip-10-0-131-160.us-east-2.compute.internal   <none>           <none>
hello-1-w8bwm    1/1     Running     0          74s   10.128.2.21   ip-10-0-147-58.us-east-2.compute.internal    <none>           <none>
hello-1-zftr7    1/1     Running     0          74s   10.131.0.26   ip-10-0-131-160.us-east-2.compute.internal   <none>           <none>

[lyman@dhcp-141-235 ~]$ oc describe pod hello-1-t8nnn 
Name:               hello-1-t8nnn
Namespace:          minmli
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-131-160.us-east-2.compute.internal/10.0.131.160
Start Time:         Fri, 18 Oct 2019 14:33:29 +0800
Labels:             deployment=hello-1
                    deploymentconfig=hello
                    run=hello
...

[lyman@dhcp-141-235 ~]$  oc adm drain  ip-10-0-131-160.us-east-2.compute.internal --pod-selector="run=hello" --dry-run
node/ip-10-0-131-160.us-east-2.compute.internal cordoned (dry run)
node/ip-10-0-131-160.us-east-2.compute.internal drained (dry run)
[lyman@dhcp-141-235 ~]$

Comment 6 Sally 2019-10-24 19:23:31 UTC
This was moved to QA in error. The upstream PR hasn't merged yet. I'll report back here when it does.

Comment 9 Sally 2020-04-27 03:22:38 UTC
The fix for this has merged; please confirm, thanks.

Comment 16 MinLi 2020-05-08 07:37:31 UTC
Not fixed!

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-05-08-040144   True        False         12m     Cluster version is 4.4.0-0.nightly-2020-05-08-040144

$ oc run hello --image=openshift/hello-openshift:latest  --replicas=4
kubectl run --generator=deploymentconfig/v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deploymentconfig.apps.openshift.io/hello created

$ oc get pod -o wide 
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
hello-1-deploy   0/1     Completed   0          90s   10.129.2.18   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-h5smx    1/1     Running     0          87s   10.129.2.19   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-lj674    1/1     Running     0          87s   10.131.0.17   zzhao4-g94ml-w-a-nfp5l.c.openshift-qe.internal   <none>           <none>
hello-1-qgqx5    1/1     Running     0          87s   10.129.2.20   zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal   <none>           <none>
hello-1-scmxb    1/1     Running     0          87s   10.128.2.16   zzhao4-g94ml-w-c-6jww8.c.openshift-qe.internal   <none>           <none>

$ oc adm drain zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal --pod-selector="run=hello" --dry-run
node/zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal cordoned (dry run)
node/zzhao4-g94ml-w-b-9b577.c.openshift-qe.internal drained (dry run)

Comment 17 Scott Dodson 2020-05-08 23:43:40 UTC
Sally,

What's the OpenShift PR that includes this backport? Can you please link to that? If there wasn't an OpenShift PR, this looks like it made it into Kube 1.18, so we'd only expect this to be fixed in 4.5 at this point, correct?

Comment 19 Sally 2020-05-20 15:51:17 UTC
The fix should have been pulled in with https://github.com/kubernetes/kubernetes/pull/82660. I'm looking at this now (upcoming sprint) to resolve, thanks.

Comment 21 Sally 2020-06-10 16:39:03 UTC
QA, please confirm this for 4.5; the fix is in 4.5. We don't plan on backporting. I've moved the Target Release to 4.5 for this bug.

Comment 24 MinLi 2020-06-11 08:38:10 UTC
Verified with version 4.5.0-0.nightly-2020-06-11-035450:

$ oc adm  drain ip-10-0-58-91.us-east-2.compute.internal --pod-selector="run=hello-openshift" --dry-run
W0611 16:31:30.136869    1132 helpers.go:535] --dry-run is deprecated and can be replaced with --dry-run=client.
node/ip-10-0-58-91.us-east-2.compute.internal cordoned (dry run)
evicting pod default/hello-openshift-252mj (dry run)
evicting pod default/hello-openshift-2z4t5 (dry run)
evicting pod default/hello-openshift-b5gqg (dry run)
node/ip-10-0-58-91.us-east-2.compute.internal drained (dry run)
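
Per the deprecation warning above, the same dry run can also be requested with the newer flag spelling introduced in kubectl/oc 1.18:

$ oc adm drain ip-10-0-58-91.us-east-2.compute.internal --pod-selector="run=hello-openshift" --dry-run=client

Both spellings perform a client-side dry run and produce the output shown above.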

Comment 26 errata-xmlrpc 2020-07-13 17:11:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

