1835739 – Support 'oc adm node drain' without --ignore-daemonsets=true --delete-local-data=true flags

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	oc
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.4.z
Assignee:	Maciej Szulik
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:	1835628
Blocks:

Reported:	2020-05-14 12:14 UTC by Maciej Szulik
Modified:	2020-06-17 22:26 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: A wrong condition in the code caused the logic to ignore deleted pods. Consequence: 'oc adm node drain' did not properly account for DaemonSet-managed pods and pods with local data when draining a node. Fix: The condition was corrected so that all pods are accounted for when draining a node. Result: Draining a node that runs a DaemonSet-managed pod, or a pod with local volume data attached, now fails with a message pointing to the flags that override each of the two checks.
Clone Of:	1835628
Environment:
Last Closed:	2020-06-17 22:26:36 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift oc pull 420	0	None	closed	[release-4.4] Bug 1835739: Fix oc drain to ignore daemonsets and others	2020-10-21 05:46:16 UTC
Red Hat Product Errata	RHBA-2020:2445	0	None	None	None	2020-06-17 22:26:53 UTC

Description Maciej Szulik 2020-05-14 12:14:28 UTC

+++ This bug was initially created as a clone of Bug #1835628 +++

Description of problem:

Running 'oc adm node drain' without the --ignore-daemonsets and --delete-local-data flags results in an error:
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): openshift-cluster-node-tuning-operator/tuned-2kls9, openshift-dns/dns-default-c675s, openshift-image-registry/node-ca-d57th, openshift-machine-config-operator/machine-config-daemon-c5kx4, openshift-monitoring/node-exporter-jf5cr, openshift-multus/multus-cxqwz, openshift-ovn-kubernetes/ovnkube-node-zbzpv, openshift-ovn-kubernetes/ovs-node-kbjgf
cannot delete Pods with local storage (use --delete-local-data to override): openshift-image-registry/image-registry-5bcb67497d-gvq6b
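
The error above packs every blocking pod into two long comma-separated lines. As a reading aid, here is a minimal sketch that splits such output into one pod per line, each prefixed with the flag the message suggests. The function name blocking_pods is hypothetical, and the sketch assumes the message format stays exactly as quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): reads drain error output on stdin
# and prints "<suggested flag> <namespace/pod>" for each blocking pod.
# Assumes lines of the form quoted in this report:
#   [error: ]cannot delete <what> (use --<flag> to <verb>): ns/pod, ns/pod, ...
blocking_pods() {
  # Keep only the "cannot delete" lines, reduced to "<flag> <pod list>".
  sed -n 's/^\(error: \)\{0,1\}cannot delete .*(use \(--[a-z-]*\) to [a-z]*): \(.*\)$/\2 \3/p' |
  while read -r flag pods; do
    # One pod per line, leading blanks stripped, flag prepended.
    printf '%s\n' "$pods" | tr ',' '\n' | sed "s/^ *//; s/^/$flag /"
  done
}

# Possible usage against a live cluster (NODE is a placeholder):
#   oc adm drain NODE 2>&1 | blocking_pods
```

Grouping by flag makes it easier to see which pods are DaemonSet-managed and which ones only block the drain because of local storage.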

Version-Release number of selected component (if applicable):
Client Version: 4.5
Server Version: 4.5

How reproducible:
Always with 4.5 client version

Steps to Reproduce:
Run 'oc adm drain NODE'

Actual results:
Error reported, drain aborted

Expected results:
Node drained, application pods evicted

Additional info:
1. With client version 4.4, the command runs without the flags and reports no error.
2. A similar problem was reported for CNV (see linked bug). Our setup is bare metal without CNV.

--- Additional comment from Maciej Szulik on 2020-05-14 13:40:00 CEST ---



--- Additional comment from Maciej Szulik on 2020-05-14 14:13:21 CEST ---

This is not a bug. It was reported upstream in https://github.com/kubernetes/kubectl/issues/803,
and only kubectl 1.17 and the accompanying oc 4.4 are affected. If you try an older version of oc (4.3 or 4.2),
you'll get a similar error. I'm closing this as not a bug, and I'll try to cherry-pick the fix into 4.4.

Comment 1 Maciej Szulik 2020-05-14 12:15:24 UTC

Actually, it's the opposite: we need to pick the fix from https://github.com/kubernetes/kubernetes/pull/87361 so that
oc adm drain warns about daemonsets and local storage.

Comment 2 Maciej Szulik 2020-05-20 09:39:39 UTC

The PR is waiting in the queue: https://github.com/openshift/oc/pull/420

Comment 5 zhou ying 2020-06-01 06:39:19 UTC

Confirmed with the latest oc: when the node has DaemonSet-managed pods or pods with volumes attached, oc adm drain gives the warning and aborts the drain.

[root@dhcp-140-138 ~]# oc version -o yaml 
clientVersion:
  buildDate: "2020-05-29T06:43:55Z"
  compiler: gc
  gitCommit: 1960dd73b123241730531db09489d951228ad853
  gitTreeState: clean
  gitVersion: 4.4.0-202005290638-1960dd7
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.4.0-0.nightly-2020-05-30-022631
serverVersion:
  buildDate: "2020-05-30T01:52:40Z"
  compiler: gc
  gitCommit: f5fb168
  gitTreeState: clean
  gitVersion: v1.17.1+f5fb168
  goVersion: go1.13.4
  major: "1"
  minor: 17+
  platform: linux/amd64

[root@dhcp-140-138 ~]# oc adm drain node/ip-10-0-187-6.us-east-2.compute.internal
node/ip-10-0-187-6.us-east-2.compute.internal cordoned
error: unable to drain node "ip-10-0-187-6.us-east-2.compute.internal", aborting command...

There are pending nodes to be drained:
 ip-10-0-187-6.xxx-2.compute.internal
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): openshift-cluster-node-tuning-operator/tuned-m4rdr, openshift-dns/dns-default-6lhrh, openshift-image-registry/node-ca-6lkn2, openshift-machine-config-operator/machine-config-daemon-hzg4r, openshift-monitoring/node-exporter-n4k45, openshift-multus/multus-647pc, openshift-sdn/ovs-mzkjd, openshift-sdn/sdn-tnd4z
cannot delete Pods with local storage (use --delete-local-data to override): openshift-monitoring/alertmanager-main-2, openshift-monitoring/kube-state-metrics-5595b5958b-bzcpj
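
With the fix in place, the drain only proceeds when both overrides are given explicitly. A minimal sketch of such a call follows, using only the two flags named in the error output above. The wrapper drain_node and the PRINT_ONLY variable are hypothetical conveniences, not part of oc. By default the wrapper only prints the command, because --delete-local-data discards the pods' local data.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of oc): drain a node while opting in to the
# two overrides that the error message asks for.
# PRINT_ONLY is an assumed variable: 1 (the default) prints the command,
# any other value runs it.
drain_node() {
  node=$1
  # Build the argument list once so the printed and executed forms match.
  set -- oc adm drain "$node" --ignore-daemonsets --delete-local-data
  if [ "${PRINT_ONLY:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Possible usage:
#   drain_node node/ip-10-0-187-6.us-east-2.compute.internal
#   PRINT_ONLY=0 drain_node node/ip-10-0-187-6.us-east-2.compute.internal
```

Note that the failed attempt above had already cordoned the node before aborting, so the node stays unschedulable until it is drained or uncordoned.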

Comment 9 errata-xmlrpc 2020-06-17 22:26:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2445
