Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1755314

Summary: [4.1.z]machine-config clusteroperator degraded for a long time during upgrade
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED NOTABUG
QA Contact: Michael Nguyen <mnguyen>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.1.z
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-25 13:07:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
machine-config info (flags: none)

Comment 3 Junqi Zhao 2019-09-25 11:51:01 UTC
Created attachment 1618993 [details]
machine-config info

Comment 6 Antonio Murdaca 2019-09-25 13:07:53 UTC
The rollout to the masters taking some time is expected; it was probably due to load, and it does eventually reconcile.

The workers are instead hanging because this pod fails to fully terminate/drain:

```
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=ip-10-0-66-192.us-east-2.compute.internal
NAMESPACE                                NAME                             READY   STATUS        RESTARTS   AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
default                                  orca-operator-5bd67bc7f5-xbmst   1/1     Terminating   0          10h     10.128.2.54   ip-10-0-66-192.us-east-2.compute.internal   <none>           <none>
...

```
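A quick way to surface pods stuck like this (not from the bug report itself, just a sketch assuming the same cluster and node name) is to filter the `kubectl get pods` output for the `Terminating` status; if the pod truly never exits, a force delete removes the API object. The function names here are made up for illustration:

```shell
# Node from the bug; adjust for your cluster.
NODE=ip-10-0-66-192.us-east-2.compute.internal

# Print NAMESPACE/NAME for every pod on $NODE whose STATUS column reads Terminating.
# With --no-headers the namespace is column 1, the name column 2, the status column 4.
stuck_pods() {
  kubectl get pods --all-namespaces -o wide --no-headers \
    --field-selector "spec.nodeName=$NODE" \
    | awk '$4 == "Terminating" {print $1 "/" $2}'
}

# Force removal of the pod object without waiting for kubelet confirmation.
# Use with care: this can orphan container processes still running on the node.
force_delete() {
  kubectl delete pod "$2" -n "$1" --grace-period=0 --force
}
```

For the pod above, `stuck_pods` would print `default/orca-operator-5bd67bc7f5-xbmst`, and `force_delete default orca-operator-5bd67bc7f5-xbmst` would clear it, letting the drain proceed; the underlying cause in the orca operator still needs fixing.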

The drain in the MCD for that node is just waiting for the eviction to finish, but it never does:

```
...
I0925 12:59:34.904008   72033 update.go:848] Update prepared; beginning drain
I0925 12:59:35.132759   72033 update.go:93] ignoring DaemonSet-managed pods: hello-daemonset-nbgkv, tuned-p77s5, dns-default-bn6jz, node-ca-wkfgd, fluentd-rz8l8, machine-config-daemon-wvdmv, node-exporter-zslwc, multus-sxx82, ovs-55j4m, sdn-7r9d7, hello-daemonset-8c68p; deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: olm-operators-8ldjt
I0925 12:59:44.165447   72033 update.go:89] pod "olm-operators-8ldjt" removed (evicted)
```
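This isn't the MCO's actual code; as a minimal sketch (assuming `kubectl` access), the wait the daemon performs after evicting is essentially "poll until the pod object disappears". Adding a deadline, as below, turns the indefinite hang into a visible failure:

```shell
# Poll until pod $1 in namespace $2 is gone, giving up after $3 seconds.
# Returns 0 once the pod has disappeared, 1 if it is still present at the
# deadline (which is what a stuck Terminating pod would produce).
wait_for_pod_gone() {
  pod=$1; ns=$2; deadline=$3
  elapsed=0
  while kubectl get pod "$pod" -n "$ns" >/dev/null 2>&1; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$deadline" ]; then
      return 1  # pod still present: the drain is stuck on it
    fi
    sleep 1
  done
  return 0  # pod is gone; the drain can move on
}
```

In this bug, the equivalent wait for `olm-operators-8ldjt` completed (the "removed (evicted)" log line), while the orca-operator pod is the one that would hit the deadline.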

This isn't an MCO issue. Please get in touch with the orca operator's owners and file a bug against them; it should be pretty easy to reproduce as well.