Bug 1998673 - Machine Config Daemon pod takes a long time to terminate due to "Got SIGTERM, but actively updating"
Keywords:
Status: CLOSED DUPLICATE of bug 1995853
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Yu Qi Zhang
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-27 22:42 UTC by Sai Sindhur Malleni
Modified: 2021-08-30 19:41 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-30 17:32:04 UTC
Target Upstream Version:
Embargoed:


Attachments
machine-config-daemon logs (14.49 KB, text/plain)
2021-08-27 22:42 UTC, Sai Sindhur Malleni

Description Sai Sindhur Malleni 2021-08-27 22:42:56 UTC
Created attachment 1818463
machine-config-daemon logs

Description of problem:
In an OpenShift upgrade on baremetal from 4.7.11 to 4.7.24, during the upgrade of the machine-config operator, the machine-config-daemon pods take a long time to terminate. This not only causes the machine-config operator to degrade but also adds significantly to the overall upgrade time. From what I've seen, each pod stays stuck in Terminating for at least 5-7 minutes.


Log snippet
===============================================================================
I0827 18:12:04.150851    5901 update.go:1292] Deleting stale data
I0827 18:12:04.163526    5901 update.go:1735] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
I0827 18:12:04.182142    5901 update.go:1904] Node has Desired Config rendered-worker-group-4-be7070fffc9b1bb28637063800e3cfef, skipping reboot
I0827 18:12:04.183722    5901 daemon.go:802] Current config: rendered-worker-be7070fffc9b1bb28637063800e3cfef
I0827 18:12:04.183732    5901 daemon.go:803] Desired config: rendered-worker-group-4-be7070fffc9b1bb28637063800e3cfef
I0827 18:12:04.193152    5901 daemon.go:1151] Completing pending config rendered-worker-group-4-be7070fffc9b1bb28637063800e3cfef
I0827 18:12:04.193166    5901 update.go:1904] completed update for config rendered-worker-group-4-be7070fffc9b1bb28637063800e3cfef
I0827 18:12:04.194679    5901 daemon.go:1167] In desired config rendered-worker-group-4-be7070fffc9b1bb28637063800e3cfef
I0827 22:34:51.888447    5901 daemon.go:586] Got SIGTERM, but actively updating
=============================================================================


[kni@e16-h18-b03-fc640 kube-burner-templates]$ oc get pods -o wide
NAME                                         READY   STATUS        RESTARTS   AGE     IP                NODE              NOMINATED NODE   READINESS GATES
machine-config-controller-7d9bcdf859-mg54q   1/1     Running       1          2d23h   10.128.0.23       master-0          <none>           <none>
machine-config-daemon-2bxlp                  2/2     Running       0          16m     192.168.216.52    worker039-fc640   <none>           <none>
machine-config-daemon-2jsk4                  2/2     Running       0          2d22h   192.168.216.61    worker048-fc640   <none>           <none>
machine-config-daemon-2mvbq                  2/2     Running       0          16m     192.168.216.113   worker100-fc640   <none>           <none>
machine-config-daemon-2xbq4                  2/2     Running       0          2d22h   192.168.216.53    worker040-fc640   <none>           <none>
machine-config-daemon-2xpwl                  2/2     Running       0          2d19h   192.168.216.94    worker081-fc640   <none>           <none>
machine-config-daemon-2xzdc                  2/2     Running       0          2d19h   192.168.216.106   worker093-fc640   <none>           <none>
machine-config-daemon-474qf                  2/2     Running       0          2d22h   192.168.216.75    worker062-fc640   <none>           <none>
machine-config-daemon-49j6b                  2/2     Running       0          2d22h   192.168.216.81    worker068-fc640   <none>           <none>
machine-config-daemon-4b2tw                  2/2     Running       0          16m     192.168.216.69    worker056-fc640   <none>           <none>
machine-config-daemon-4h4sr                  2/2     Running       0          2d22h   192.168.216.29    worker016-fc640   <none>           <none>
machine-config-daemon-4rrwz                  2/2     Running       0          2d22h   192.168.216.56    worker043-fc640   <none>           <none>
machine-config-daemon-598mz                  2/2     Running       0          26m     192.168.216.99    worker086-fc640   <none>           <none>
machine-config-daemon-5c6kk                  2/2     Terminating   0          2d22h   192.168.216.80    worker067-fc640   <none>           <none>
machine-config-daemon-5j9wp                  2/2     Running       0          26m     192.168.216.78    worker065-fc640   <none>           <none>
machine-config-daemon-5lk7r                  2/2     Running       0          16m     192.168.216.44    worker031-fc640   <none>           <none>
machine-config-daemon-5x4xm                  2/2     Running       0          2d22h   192.168.216.66    worker053-fc640   <none>           <none>
machine-config-daemon-5xqmw                  2/2     Running       0          2d22h   192.168.216.27    worker014-fc640   <none>           <none>
machine-config-daemon-69f54                  2/2     Running       0          2d22h   192.168.216.91    worker078-fc640   <none>           <none>
machine-config-daemon-6cs8x                  2/2     Running       0          16m     192.168.216.129   worker116-fc640   <none>           <none>
machine-config-daemon-6nxlj                  2/2     Running       0          2d19h   192.168.216.118   worker105-fc640   <none>           <none>
machine-config-daemon-6q6n4                  2/2     Running       0          2d22h   192.168.216.45    worker032-fc640   <none>           <none>
machine-config-daemon-6qsq5                  2/2     Running       0          2d19h   192.168.216.130   worker117-r640    <none>           <none>
machine-config-daemon-6r8w8                  2/2     Running       0          2d19h   192.168.216.107   worker094-fc640   <none>           <none>
machine-config-daemon-76vw9                  2/2     Running       0          2d19h   192.168.216.110   worker097-fc640   <none>           <none>
machine-config-daemon-7fzv2                  2/2     Running       0          6m10s   192.168.216.30    worker017-fc640   <none>           <none>
machine-config-daemon-87gpz                  2/2     Running       0          2d19h   192.168.216.102   worker089-fc640   <none>           <none>
machine-config-daemon-8dx2l                  2/2     Running       0          26m     192.168.216.59    worker046-fc640   <none>           <none>
machine-config-daemon-9b94m                  2/2     Running       0          26m     192.168.216.37    worker024-fc640   <none>           <none>
machine-config-daemon-9bvcm                  2/2     Terminating   0          2d22h   192.168.216.89    worker076-fc640   <none>           <none>
machine-config-daemon-9twkq                  2/2     Running       0          2d22h   192.168.216.76    worker063-fc640   <none>           <none>
machine-config-daemon-9w2mg                  2/2     Running       0          16m     192.168.216.16    worker003-fc640   <none>           <none>
machine-config-daemon-9wtq9                  2/2     Running       0          2d19h   192.168.216.126   worker113-fc640   <none>           <none>
machine-config-daemon-b26hp                  2/2     Running       0          2d22h   192.168.216.25    worker012-fc640   <none>           <none>
machine-config-daemon-bv7d5                  2/2     Running       0          6m11s   192.168.216.95    worker082-fc640   <none>           <none>
machine-config-daemon-bw7cs                  2/2     Running       0          6m21s   192.168.216.22    worker009-fc640   <none>           <none>
machine-config-daemon-c5wqs                  2/2     Running       0          2d22h   192.168.216.48    worker035-fc640   <none>           <none>
machine-config-daemon-c66nv                  2/2     Terminating   0          2d22h   192.168.216.72    worker059-fc640   <none>           <none>
machine-config-daemon-cghbz                  2/2     Running       0          2d22h   192.168.216.39    worker026-fc640   <none>           <none>
machine-config-daemon-cjxkx                  2/2     Terminating   0          2d22h   192.168.216.21    worker008-fc640   <none>           <none>
machine-config-daemon-cx8t4                  2/2     Running       0          2d22h   192.168.216.88    worker075-fc640   <none>           <none>
machine-config-daemon-dgskf                  2/2     Running       0          26m     192.168.216.86    worker073-fc640   <none>           <none>
machine-config-daemon-dtkjx                  2/2     Running       0          2d19h   192.168.216.122   worker109-fc640   <none>           <none>
machine-config-daemon-f4npx                  2/2     Running       0          26m     192.168.216.108   worker095-fc640   <none>           <none>
machine-config-daemon-fjrs2                  2/2     Running       0          26m     192.168.216.82    worker069-fc640   <none>           <none>
machine-config-daemon-fn2nk                  2/2     Running       0          2d22h   192.168.216.73    worker060-fc640   <none>           <none>
machine-config-daemon-g8dsp                  2/2     Running       0          2d22h   192.168.216.32    worker019-fc640   <none>           <none>
machine-config-daemon-g96sq                  2/2     Terminating   0          2d19h   192.168.216.93    worker080-fc640   <none>           <none>
machine-config-daemon-gl4sv                  2/2     Terminating   0          2d22h   192.168.216.70    worker057-fc640   <none>           <none>
machine-config-daemon-gr9ls                  2/2     Running       0          2d22h   192.168.216.57    worker044-fc640   <none>           <none>
machine-config-daemon-h9ltw                  2/2     Running       0          6m17s   192.168.216.14    worker001-fc640   <none>           <none>
machine-config-daemon-hgt59                  2/2     Running       0          2d19h   192.168.216.97    worker084-fc640   <none>           <none>
machine-config-daemon-hpb4r                  2/2     Running       0          2d22h   192.168.216.24    worker011-fc640   <none>           <none>
machine-config-daemon-hrhnc                  2/2     Running       0          2d22h   192.168.216.35    worker022-fc640   <none>           <none>
machine-config-daemon-hzrwt                  2/2     Running       0          2d22h   192.168.216.47    worker034-fc640   <none>           <none>
machine-config-daemon-jgsrh                  2/2     Running       0          2d22h   192.168.216.68    worker055-fc640   <none>           <none>
machine-config-daemon-jhqxl                  2/2     Running       0          16m     192.168.216.46    worker033-fc640   <none>           <none>
machine-config-daemon-jtb8b                  2/2     Running       0          2d22h   192.168.216.31    worker018-fc640   <none>           <none>
machine-config-daemon-jtz4f                  2/2     Terminating   0          2d19h   192.168.216.105   worker092-fc640   <none>           <none>
machine-config-daemon-kdzwz                  2/2     Running       0          2d22h   192.168.216.60    worker047-fc640   <none>           <none>
machine-config-daemon-kgjmq                  2/2     Running       0          2d22h   192.168.216.87    worker074-fc640   <none>           <none>
machine-config-daemon-kgrjk                  2/2     Terminating   0          2d22h   192.168.216.85    worker072-fc640   <none>           <none>
machine-config-daemon-kh2fm                  2/2     Running       0          2d19h   192.168.216.115   worker102-fc640   <none>           <none>
machine-config-daemon-krbsz                  2/2     Running       0          26m     192.168.216.74    worker061-fc640   <none>           <none>
machine-config-daemon-ktckx                  2/2     Running       0          2d19h   192.168.216.119   worker106-fc640   <none>           <none>
machine-config-daemon-ktz4x                  2/2     Running       0          2d23h   192.168.216.11    master-1          <none>           <none>
machine-config-daemon-l7vhs                  2/2     Running       0          2d19h   192.168.216.112   worker099-fc640   <none>           <none>
machine-config-daemon-lbbvp                  2/2     Running       0          6m27s   192.168.216.127   worker114-fc640   <none>           <none>
machine-config-daemon-lct9m                  2/2     Running       0          16m     192.168.216.104   worker091-fc640   <none>           <none>
machine-config-daemon-ldbbl                  2/2     Running       0          2d22h   192.168.216.65    worker052-fc640   <none>           <none>
machine-config-daemon-lfv9p                  2/2     Running       0          16m     192.168.216.50    worker037-fc640   <none>           <none>
machine-config-daemon-lks2p                  2/2     Running       0          2d22h   192.168.216.34    worker021-fc640   <none>           <none>
machine-config-daemon-lrbzb                  2/2     Running       0          2d22h   192.168.216.67    worker054-fc640   <none>           <none>
machine-config-daemon-lw7qh                  2/2     Running       0          2d19h   192.168.216.103   worker090-fc640   <none>           <none>
machine-config-daemon-m65cw                  2/2     Running       0          2d19h   192.168.216.125   worker112-fc640   <none>           <none>
machine-config-daemon-m9jr6                  2/2     Running       0          26m     192.168.216.49    worker036-fc640   <none>           <none>
machine-config-daemon-mgv64                  2/2     Running       0          2d22h   192.168.216.13    worker000-fc640   <none>           <none>
machine-config-daemon-mkspb                  2/2     Running       0          2d22h   192.168.216.36    worker023-fc640   <none>           <none>
machine-config-daemon-msbpk                  2/2     Running       0          2d22h   192.168.216.26    worker013-fc640   <none>           <none>
machine-config-daemon-mvsp6                  2/2     Running       0          26m     192.168.216.63    worker050-fc640   <none>           <none>
machine-config-daemon-nz4d2                  2/2     Running       0          2d22h   192.168.216.79    worker066-fc640   <none>           <none>
machine-config-daemon-p4xcb                  2/2     Running       0          2d22h   192.168.216.62    worker049-fc640   <none>           <none>
machine-config-daemon-pdnn5                  2/2     Running       0          2d23h   192.168.216.10    master-0          <none>           <none>
machine-config-daemon-pq5zx                  2/2     Running       0          6m23s   192.168.216.90    worker077-fc640   <none>           <none>
machine-config-daemon-pt45j                  2/2     Running       0          2d19h   192.168.216.116   worker103-fc640   <none>           <none>
machine-config-daemon-pvdrf                  2/2     Running       0          6m17s   192.168.216.38    worker025-fc640   <none>           <none>
machine-config-daemon-pvr8d                  2/2     Running       0          36m     192.168.216.18    worker005-fc640   <none>           <none>
machine-config-daemon-pvs9w                  2/2     Running       0          2d22h   192.168.216.43    worker030-fc640   <none>           <none>
machine-config-daemon-pvw2k                  2/2     Running       0          6m29s   192.168.216.71    worker058-fc640   <none>           <none>
machine-config-daemon-pwbbk                  2/2     Running       0          2d19h   192.168.216.117   worker104-fc640   <none>           <none>
machine-config-daemon-pz2lq                  2/2     Running       0          16m     192.168.216.17    worker004-fc640   <none>           <none>
machine-config-daemon-qpd76                  2/2     Running       0          16m     192.168.216.40    worker027-fc640   <none>           <none>
machine-config-daemon-qvbht                  2/2     Running       0          2d22h   192.168.216.54    worker041-fc640   <none>           <none>
machine-config-daemon-rbv6b                  2/2     Running       0          16m     192.168.216.121   worker108-fc640   <none>           <none>
machine-config-daemon-rghkc                  2/2     Running       0          6m29s   192.168.216.128   worker115-fc640   <none>           <none>
machine-config-daemon-rtnzr                  2/2     Running       0          2d22h   192.168.216.83    worker070-fc640   <none>           <none>
machine-config-daemon-rx7xs                  2/2     Running       0          26m     192.168.216.33    worker020-fc640   <none>           <none>
machine-config-daemon-rzlf9                  2/2     Terminating   0          2d22h   192.168.216.41    worker028-fc640   <none>           <none>
machine-config-daemon-s5qps                  2/2     Running       0          6m7s    192.168.216.55    worker042-fc640   <none>           <none>
machine-config-daemon-s8t6c                  2/2     Running       0          2d19h   192.168.216.96    worker083-fc640   <none>           <none>
machine-config-daemon-scq75                  2/2     Running       0          2d19h   192.168.216.114   worker101-fc640   <none>           <none>
machine-config-daemon-sdjh2                  2/2     Running       0          2d22h   192.168.216.84    worker071-fc640   <none>           <none>
machine-config-daemon-sh8hr                  2/2     Running       0          26m     192.168.216.58    worker045-fc640   <none>           <none>
machine-config-daemon-sjrgd                  2/2     Running       0          5m59s   192.168.216.51    worker038-fc640   <none>           <none>
machine-config-daemon-sqvf8                  2/2     Terminating   0          2d22h   192.168.216.42    worker029-fc640   <none>           <none>
machine-config-daemon-tcbnk                  2/2     Running       0          2d22h   192.168.216.77    worker064-fc640   <none>           <none>
machine-config-daemon-td2bl                  2/2     Running       0          2d19h   192.168.216.111   worker098-fc640   <none>           <none>
machine-config-daemon-tw9xw                  2/2     Running       0          6m      192.168.216.19    worker006-fc640   <none>           <none>
machine-config-daemon-vkpw8                  2/2     Running       0          2d19h   192.168.216.123   worker110-fc640   <none>           <none>
machine-config-daemon-w8hd2                  2/2     Running       0          2d22h   192.168.216.23    worker010-fc640   <none>           <none>
machine-config-daemon-w9l6h                  2/2     Running       0          6m15s   192.168.216.109   worker096-fc640   <none>           <none>
machine-config-daemon-wctwv                  2/2     Running       0          2d19h   192.168.216.120   worker107-fc640   <none>           <none>
machine-config-daemon-wf69d                  2/2     Running       0          2d19h   192.168.216.100   worker087-fc640   <none>           <none>
machine-config-daemon-wrg7l                  2/2     Terminating   0          2d19h   192.168.216.124   worker111-fc640   <none>           <none>
machine-config-daemon-wvfs5                  2/2     Running       0          6m22s   192.168.216.28    worker015-fc640   <none>           <none>
machine-config-daemon-x98qj                  2/2     Running       0          26m     192.168.216.15    worker002-fc640   <none>           <none>
machine-config-daemon-xcr72                  2/2     Running       0          16m     192.168.216.12    master-2          <none>           <none>
machine-config-daemon-xzv49                  2/2     Terminating   0          2d22h   192.168.216.20    worker007-fc640   <none>           <none>
machine-config-daemon-zmz9b                  2/2     Running       0          2d22h   192.168.216.92    worker079-fc640   <none>           <none>
machine-config-daemon-zrlrp                  2/2     Running       0          2d19h   192.168.216.101   worker088-fc640   <none>           <none>
machine-config-daemon-zzpcc                  2/2     Running       0          2d19h   192.168.216.98    worker085-fc640   <none>           <none>
machine-config-operator-b67f5997c-5qcwn      1/1     Running       0          38m     10.129.0.6        master-1          <none>           <none>
machine-config-server-4v4fx                  1/1     Running       0          2d23h   192.168.216.11    master-1          <none>           <none>
machine-config-server-p9fvw                  1/1     Running       0          2d23h   192.168.216.12    master-2          <none>           <none>
machine-config-server-vf58h                  1/1     Running       0          2d23h   192.168.216.10    master-0          <none>           <none>
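
To quantify how many daemon pods are stuck at a given moment, the listing above can be tallied by its STATUS column. This is only a minimal sketch: it assumes the plain-text output of `oc get pods -o wide` has been captured into a string, and the function name `count_by_status` and the shortened `sample` are hypothetical, not part of any OpenShift tooling.

```python
from collections import Counter


def count_by_status(listing, prefix="machine-config-daemon-"):
    """Tally pods whose name starts with `prefix` by their STATUS column."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Column order in the listing: NAME READY STATUS RESTARTS AGE IP NODE ...
        if len(fields) >= 3 and fields[0].startswith(prefix):
            counts[fields[2]] += 1
    return counts


# Three rows copied from the listing above, plus the header and one non-daemon pod.
sample = """\
NAME                                         READY   STATUS        RESTARTS   AGE
machine-config-controller-7d9bcdf859-mg54q   1/1     Running       1          2d23h
machine-config-daemon-2bxlp                  2/2     Running       0          16m
machine-config-daemon-2jsk4                  2/2     Running       0          2d22h
machine-config-daemon-5c6kk                  2/2     Terminating   0          2d22h
"""

daemon_counts = count_by_status(sample)
```

Running this against successive captures would show whether the number of Terminating pods is shrinking or whether the rollout is stalled.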

===========================================================================
[kni@e16-h18-b03-fc640 kube-burner-templates]$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.24    True        False         False      126m
baremetal                                  4.7.24    True        False         False      2d23h
cloud-credential                           4.7.24    True        False         False      2d23h
cluster-autoscaler                         4.7.24    True        False         False      2d23h
config-operator                            4.7.24    True        False         False      2d23h
console                                    4.7.24    True        False         False      133m
csi-snapshot-controller                    4.7.24    True        False         False      126m
dns                                        4.7.24    True        False         False      2d23h
etcd                                       4.7.24    True        False         False      2d23h
image-registry                             4.7.24    True        False         False      3h58m
ingress                                    4.7.24    True        False         False      2d22h
insights                                   4.7.24    True        False         False      2d23h
kube-apiserver                             4.7.24    True        False         False      2d23h
kube-controller-manager                    4.7.24    True        False         False      2d23h
kube-scheduler                             4.7.24    True        False         False      2d23h
kube-storage-version-migrator              4.7.24    True        False         False      2d22h
machine-api                                4.7.24    True        False         False      2d23h
machine-approver                           4.7.24    True        False         False      2d23h
machine-config                             4.7.11    False       True          True       39m
marketplace                                4.7.24    True        False         False      134m
monitoring                                 4.7.24    True        False         False      132m
network                                    4.7.24    True        False         False      2d23h
node-tuning                                4.7.24    True        False         False      135m
openshift-apiserver                        4.7.24    True        False         False      126m
openshift-controller-manager               4.7.24    True        False         False      2d23h
openshift-samples                          4.7.24    True        False         False      135m
operator-lifecycle-manager                 4.7.24    True        False         False      2d23h
operator-lifecycle-manager-catalog         4.7.24    True        False         False      2d23h
operator-lifecycle-manager-packageserver   4.7.24    True        False         False      135m
service-ca                                 4.7.24    True        False         False      2d23h
storage                                    4.7.24    True        False         False      2d23h
=========================================================================
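
The operator table above can be filtered the same way to pick out operators that have not reached the target version or are unhealthy. Again a sketch under assumptions: the input is the plain-text output of `oc get co`, and the helper name `lagging_operators` is hypothetical.

```python
def lagging_operators(listing, target_version):
    """Return operator names that are not at target_version, not Available, or Degraded."""
    lagging = []
    for line in listing.splitlines():
        fields = line.split()
        # Column order: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, version, available, _progressing, degraded = fields[:5]
        if version != target_version or available != "True" or degraded != "False":
            lagging.append(name)
    return lagging


# Header and three rows copied from the table above.
co_sample = """\
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns                                        4.7.24    True        False         False      2d23h
machine-config                             4.7.11    False       True          True       39m
marketplace                                4.7.24    True        False         False      134m
"""

behind = lagging_operators(co_sample, "4.7.24")
```

For the rows shown, only machine-config is flagged, which matches what the table reports.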
Version-Release number of selected component (if applicable):
4.7.11 -> 4.7.24 upgrade

How reproducible:
100%

Steps to Reproduce:
1. Kick off an upgrade on the cluster
2. Wait until the machine-config operator is updated
3. Observe the status of the machine-config-daemon pods

Actual results:
Pods are stuck in terminating for a long time

Expected results:
machine-config-daemon pods, like other pods, should terminate gracefully within a short time after receiving SIGTERM

Additional info:

Comment 1 Sinny Kumari 2021-08-30 16:41:44 UTC
Could this be related to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1995853? Before the update, was there a MachineConfig change applied that didn't require a node reboot?

Comment 2 Yu Qi Zhang 2021-08-30 17:32:04 UTC
In the attached MCD logs, we see:

I0827 18:11:43.227189    6634 update.go:1904] Node has Desired Config rendered-worker-group-3-be7070fffc9b1bb28637063800e3cfef, skipping reboot

So it is very likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1995853, which is in the backport process. I will mark this as a duplicate and up the urgency of that.

If you would like to make sure, please attach a must-gather with the full logs, so we can see what the update was and the timing.

*** This bug has been marked as a duplicate of bug 1995853 ***

Comment 3 Sai Sindhur Malleni 2021-08-30 18:19:30 UTC
(In reply to Sinny Kumari from comment #1)
> This could be related to BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=1995853? Before update, was
> there a MachineConfig change applied that didn't require node reboot?

Yes, this is a large 120-node environment, so we split the existing worker nodes into 11 MCPs. Since the configuration didn't change, a reboot was not required.

Comment 4 Sai Sindhur Malleni 2021-08-30 18:24:24 UTC
So yes, I did split up the worker nodes into multiple MCPs before the upgrade, so they were added to a new MCP without needing a reboot. So rebootless upgrades are the trigger for https://bugzilla.redhat.com/show_bug.cgi?id=1995853 as well, right?

Comment 5 Yu Qi Zhang 2021-08-30 19:41:58 UTC
Correct. https://bugzilla.redhat.com/show_bug.cgi?id=1995853 would manifest if you perform a rebootless update of any kind, and then another update. So it sounds like a duplicate.

The fix is already in 4.9 and 4.8, if you would like to test that. Otherwise we need to wait for patch manager approval of the linked BZ for 4.7.
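
The sequence described in this thread (a rebootless update, then a SIGTERM that is deferred long after the update has finished) is consistent with an "update in progress" flag that is never cleared on the rebootless path. The toy model below illustrates that idea only. It is not the Machine Config Daemon's code, the stale-flag mechanism is an assumption that this report does not confirm, and the class name, the `clear_flag_when_rebootless` switch, and the "exiting" message are all hypothetical.

```python
class ToyDaemon:
    """Toy model of a daemon that defers SIGTERM while an update is in progress."""

    def __init__(self):
        self.updating = False
        self.log = []

    def apply_update(self, needs_reboot, clear_flag_when_rebootless):
        # Shield the update from termination while files are being written.
        self.updating = True
        if needs_reboot:
            # Modeling shortcut: a reboot restarts the process, which resets the flag.
            self.updating = False
            return
        # Rebootless path: the process keeps running, so the flag has to be
        # cleared explicitly. Skipping this step leaves the flag stale.
        if clear_flag_when_rebootless:
            self.updating = False

    def on_sigterm(self):
        """Return True if the daemon exits now, False if it defers the signal."""
        if self.updating:
            self.log.append("Got SIGTERM, but actively updating")
            return False
        self.log.append("Got SIGTERM, exiting")  # hypothetical message
        return True


# A daemon whose rebootless path forgets to clear the flag (assumed faulty behavior).
stale = ToyDaemon()
stale.apply_update(needs_reboot=False, clear_flag_when_rebootless=False)

# A daemon whose rebootless path clears the flag.
fixed = ToyDaemon()
fixed.apply_update(needs_reboot=False, clear_flag_when_rebootless=True)

stale_exits = stale.on_sigterm()
fixed_exits = fixed.on_sigterm()
```

In this model the stale daemon defers the signal indefinitely. In Kubernetes, a pod that does not exit on SIGTERM remains in Terminating until its termination grace period expires and the container is killed, which would fit the multi-minute delays reported here.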

