Bug 1968759
| Summary: | drain timeout and pool degrading period is too short | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Kirsten Garrison <kgarriso> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.7 | CC: | bembery, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.7.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-29 04:19:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1968019 | | |
| Bug Blocks: | 1987221 | | |
Description: Kirsten Garrison, 2021-06-07 22:31:34 UTC
Doing a manual backport for this one.

OpenShift engineering has decided not to ship Red Hat OpenShift Container Platform 4.7.17 due to a regression (https://bugzilla.redhat.com/show_bug.cgi?id=1973006). All the fixes that were part of 4.7.17 will now be part of 4.7.18, planned to be available in the candidate channel on June 23, 2021 and in the fast channel on June 28.

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2021-06-20-093308 True False 30m Cluster version is 4.7.0-0.nightly-2021-06-20-093308
$ cat pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: dontevict
spec:
minAvailable: 1
selector:
matchLabels:
app: dontevict
$ oc create -f pdb.yaml
poddisruptionbudget.policy/dontevict created
$ oc get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
dontevict 1 N/A 0 4s
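The ALLOWED DISRUPTIONS value of 0 is what later blocks the drain: with `minAvailable: 1` and only one matching pod, no voluntary eviction is permitted. A minimal sketch of the arithmetic behind that column (the helper name is mine, not Kubernetes code, and real PDB status also accounts for pod health):

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PDB permits: healthy pods above the floor."""
    return max(0, healthy_pods - min_available)

# One healthy "dontevict" pod with minAvailable: 1 -> eviction blocked.
print(allowed_disruptions(1, 1))  # 0
print(allowed_disruptions(3, 1))  # 2
```

With the single `dont-evict-this-pod` pod pinned to one worker, any drain of that node must wait for the budget, which is exactly the condition this bug exercises.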
$ oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-143-129.us-west-2.compute.internal Ready master 52m v1.20.0+87cc9a4
ip-10-0-145-198.us-west-2.compute.internal Ready worker 47m v1.20.0+87cc9a4
ip-10-0-162-222.us-west-2.compute.internal Ready worker 47m v1.20.0+87cc9a4
ip-10-0-173-218.us-west-2.compute.internal Ready master 53m v1.20.0+87cc9a4
ip-10-0-220-204.us-west-2.compute.internal Ready master 53m v1.20.0+87cc9a4
ip-10-0-222-240.us-west-2.compute.internal Ready worker 42m v1.20.0+87cc9a4
$ oc get node/ip-10-0-222-240.us-west-2.compute.internal -o yaml | grep hostname
kubernetes.io/hostname: ip-10-0-222-240
f:kubernetes.io/hostname: {}
$ oc run --restart=Never --labels app=dontevict --overrides='{ "spec": { "nodeSelector": { "kubernetes.io/hostname": "ip-10-0-222-240"} } }' --image=quay.io/prometheus/busybox dont-evict-this-pod -- sleep 4h
pod/dont-evict-this-pod created
$ oc get pod
NAME READY STATUS RESTARTS AGE
dont-evict-this-pod 0/1 ContainerCreating 0 4s
$ cat << EOF > file-ig3.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
> labels:
> machineconfiguration.openshift.io/role: worker
> name: test-file
> spec:
> config:
> ignition:
> version: 3.1.0
> storage:
> files:
> - contents:
> source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
> filesystem: root
> mode: 0644
> path: /etc/test
> EOF
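The base64 payload in the `source` data URL is just a small three-line test file; it can be inspected before applying the MachineConfig, for example:

```python
import base64

# Payload copied verbatim from the MachineConfig's data URL above.
payload = "c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK"

# Prints three "server <host> maxdelay 0.4 offline" lines, the content
# that will land in /etc/test on the worker nodes.
print(base64.b64decode(payload).decode())
```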
$ oc create file -f file-ig3.yaml
error: Unexpected args: [file]
See 'oc create -h' for help and examples.
$ oc create -f file-ig3.yaml
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
00-worker 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
01-master-container-runtime 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
01-master-kubelet 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
01-worker-container-runtime 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
01-worker-kubelet 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
99-master-generated-registries 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
99-master-ssh 3.2.0 62m
99-worker-generated-registries 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
99-worker-ssh 3.2.0 62m
rendered-master-13a8e238f99c2aae13c24eac159a2db2 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
rendered-worker-96cc244920a4ac522616e39024e1d35d 8530c27d3d9b6155923d348058bc025a6a98ec3c 3.2.0 53m
test-file 3.1.0 3s
$ oc get pods -A --field-selector spec.nodeName=ip-10-0-222-240.us-west-2.compute.internal | grep machine-config-daemon
openshift-machine-config-operator machine-config-daemon-6gkv4 2/2 Running 0 45m
$ oc -n openshift-machine-config-operator logs -f machine-config-daemon-6gkv4 -c machine-config-daemon | grep 'Draining failed with'
I0621 14:15:24.136453 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:17:54.532931 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:20:24.923638 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:22:55.313792 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:25:25.750570 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:27:56.145450 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:34:26.537000 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:40:56.929332 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:47:27.320342 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
I0621 14:53:57.705077 1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
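The log above is the behavior under test: every drain attempt hits the 1m30s eviction timeout, and the spacing between retries grows from roughly 2.5 minutes for the first attempts to roughly 6.5 minutes afterwards, consistent with the longer backoff this fix introduces. A quick check of the intervals, parsing the klog timestamps above (date ignored since all entries fall on the same day):

```python
from datetime import datetime

# Timestamps taken from the machine-config-daemon log lines above.
stamps = ["14:15:24", "14:17:54", "14:20:24", "14:22:55", "14:25:25",
          "14:27:56", "14:34:26", "14:40:56", "14:47:27", "14:53:57"]
times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]

# Seconds between consecutive retries.
deltas = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(deltas)  # [150, 150, 151, 150, 151, 390, 390, 391, 390]
```

The jump from ~150s to ~390s between attempts is the extended pool-degrading period the bug title refers to.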
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2502

*** Bug 1906254 has been marked as a duplicate of this bug. ***