Bug 1835739

Summary: Support 'oc adm node drain' without --ignore-daemonsets=true --delete-local-data=true flags
Product: OpenShift Container Platform Reporter: Maciej Szulik <maszulik>
Component: oc Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: high Docs Contact:
Priority: high    
Version: 4.4 CC: aos-bugs, jokerman, lshilin, mfojtik, yinzhou
Target Milestone: --- Keywords: UpcomingSprint
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: A wrong condition in the code caused the logic to ignore deleted pods.
Consequence: oc adm drain did not properly account for daemon sets and local data attached to pods when draining a node.
Fix: The logic was corrected so that all pods are accounted for when draining a node.
Result: When draining a node that runs a daemon set's pod, or a pod with local volume data attached, the drain command fails and points to the flags that override each of the two checks.
Story Points: ---
Clone Of: 1835628 Environment:
Last Closed: 2020-06-17 22:26:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1835628    
Bug Blocks:    

Description Maciej Szulik 2020-05-14 12:14:28 UTC
+++ This bug was initially created as a clone of Bug #1835628 +++

Description of problem:

Running 'oc adm drain' without the --ignore-daemonsets and --delete-local-data flags results in an error:
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): openshift-cluster-node-tuning-operator/tuned-2kls9, openshift-dns/dns-default-c675s, openshift-image-registry/node-ca-d57th, openshift-machine-config-operator/machine-config-daemon-c5kx4, openshift-monitoring/node-exporter-jf5cr, openshift-multus/multus-cxqwz, openshift-ovn-kubernetes/ovnkube-node-zbzpv, openshift-ovn-kubernetes/ovs-node-kbjgf
cannot delete Pods with local storage (use --delete-local-data to override): openshift-image-registry/image-registry-5bcb67497d-gvq6b

Version-Release number of selected component (if applicable):
Client Version: 4.5
Server Version: 4.5

How reproducible:
Always with 4.5 client version

Steps to Reproduce:
Run 'oc adm drain NODE'

Actual results:
Error reported, drain aborted

Expected results:
Node drained, application pods evicted

Additional info:
1. With client version 4.4, the command runs without the flags and reports no error.
2. A similar problem was reported for CNV (see linked bug). Our setup is bare metal without CNV.

--- Additional comment from Maciej Szulik on 2020-05-14 13:40:00 CEST ---

--- Additional comment from Maciej Szulik on 2020-05-14 14:13:21 CEST ---

This is not a bug. It was reported upstream in https://github.com/kubernetes/kubectl/issues/803,
and only kubectl 1.17 and the accompanying oc 4.4 are affected. If you try an older oc version, 4.3 or 4.2,
you'll get a similar error. I'm closing this as not a bug and I'll try to cherry-pick the fix into 4.4.

Comment 1 Maciej Szulik 2020-05-14 12:15:24 UTC
Actually, it's the opposite: we need to pick the https://github.com/kubernetes/kubernetes/pull/87361 fix so that
oc adm drain warns about daemonsets and local storage.
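
For readers following along, here is a minimal sketch of the behavior the fix is meant to restore. It is not the kubectl or oc implementation (that code is Go). The Pod record, its fields, and the drain_blockers helper are hypothetical names introduced for illustration. Only the two messages and flag names are taken from the error output in this report.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Pod:
    # Hypothetical, simplified pod record; not the Kubernetes API type.
    namespace: str
    name: str
    owner_kind: str = ""             # e.g. "DaemonSet" or "ReplicaSet"
    has_local_storage: bool = False  # stands in for "pod has local data attached"

    @property
    def key(self) -> str:
        return f"{self.namespace}/{self.name}"


# Message text copied from the drain output quoted in this report.
DAEMONSET_MSG = "cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore)"
LOCAL_DATA_MSG = "cannot delete Pods with local storage (use --delete-local-data to override)"


def drain_blockers(
    pods: Iterable[Pod],
    ignore_daemonsets: bool = False,
    delete_local_data: bool = False,
) -> Dict[str, List[str]]:
    """Group the pods that should abort a drain by the message explaining why.

    An empty result means the drain may proceed.
    """
    blockers: Dict[str, List[str]] = {}
    for pod in pods:
        # Every pod is checked against every rule; none is skipped early.
        if pod.owner_kind == "DaemonSet" and not ignore_daemonsets:
            blockers.setdefault(DAEMONSET_MSG, []).append(pod.key)
        if pod.has_local_storage and not delete_local_data:
            blockers.setdefault(LOCAL_DATA_MSG, []).append(pod.key)
    return blockers
```

With both flags left at their defaults, any DaemonSet-managed pod or pod with local storage produces a non-empty result, which corresponds to the drain aborting. Passing both flags as True yields an empty result.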

Comment 2 Maciej Szulik 2020-05-20 09:39:39 UTC
The PR is waiting in the queue: https://github.com/openshift/oc/pull/420

Comment 5 zhou ying 2020-06-01 06:39:19 UTC
Confirmed with the latest oc: draining a node that has DaemonSet-managed pods or pods with volumes attached now prints the warning and aborts the drain.

[root@dhcp-140-138 ~]# oc version -o yaml 
  buildDate: "2020-05-29T06:43:55Z"
  compiler: gc
  gitCommit: 1960dd73b123241730531db09489d951228ad853
  gitTreeState: clean
  gitVersion: 4.4.0-202005290638-1960dd7
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.4.0-0.nightly-2020-05-30-022631
  buildDate: "2020-05-30T01:52:40Z"
  compiler: gc
  gitCommit: f5fb168
  gitTreeState: clean
  gitVersion: v1.17.1+f5fb168
  goVersion: go1.13.4
  major: "1"
  minor: 17+
  platform: linux/amd64

[root@dhcp-140-138 ~]# oc adm drain node/ip-10-0-187-6.us-east-2.compute.internal
node/ip-10-0-187-6.us-east-2.compute.internal cordoned
error: unable to drain node "ip-10-0-187-6.us-east-2.compute.internal", aborting command...

There are pending nodes to be drained:
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): openshift-cluster-node-tuning-operator/tuned-m4rdr, openshift-dns/dns-default-6lhrh, openshift-image-registry/node-ca-6lkn2, openshift-machine-config-operator/machine-config-daemon-hzg4r, openshift-monitoring/node-exporter-n4k45, openshift-multus/multus-647pc, openshift-sdn/ovs-mzkjd, openshift-sdn/sdn-tnd4z
cannot delete Pods with local storage (use --delete-local-data to override): openshift-monitoring/alertmanager-main-2, openshift-monitoring/kube-state-metrics-5595b5958b-bzcpj
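
If a script needs to react to this output, the two "cannot delete" lines can be picked apart as sketched below. This assumes the message format stays exactly as shown in this report, which is not a stable interface. The parse_drain_blockers name and the regular expression are illustrative, not part of oc.

```python
import re
from typing import Dict, List

# Matches lines such as:
#   cannot delete Pods with local storage (use --delete-local-data to override): ns/pod, ns/pod
# An optional leading "error: " is accepted because the first line may carry it.
BLOCKER_LINE = re.compile(
    r"^(?:error: )?cannot delete .*\(use (--[a-z-]+) to \w+\): (.+)$"
)


def parse_drain_blockers(output: str) -> Dict[str, List[str]]:
    """Map each suggested flag to the namespace/pod names listed after it."""
    result: Dict[str, List[str]] = {}
    for line in output.splitlines():
        match = BLOCKER_LINE.match(line.strip())
        if match:
            flag, pods = match.groups()
            result[flag] = [p.strip() for p in pods.split(",") if p.strip()]
    return result
```

Feeding it the output above would yield one entry keyed by --ignore-daemonsets and one keyed by --delete-local-data, each holding the pods that blocked the drain.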

Comment 9 errata-xmlrpc 2020-06-17 22:26:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.