Description of problem:

Based on telemetry data, a UPI AWS cluster has been stuck trying to upgrade from 4.1.18 to 4.2.0-rc.1 for the past 6 days. machine-config is reporting Degraded with RequiredPoolsFailed. Supportshell shows:

Unable to apply 4.2.0-rc.1: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-d7df7ffc4886508dcc5aaa2ed70cad6e expected b8898db9af98e5c3d6a450ae123121677b0dbcb3 has a2175e587b007272f26305fe7d8b603c49e8f1fc, retrying

Version-Release number of selected component (if applicable):
4.1.18 -> 4.2.0-rc.1

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
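For reference, a rough way to inspect the mismatch the operator is reporting (a sketch only; the rendered-master name is taken from the error message above, and the generated-by-controller-version annotation is the standard MCO annotation on rendered MachineConfigs):

```
# Which controller version stamped the rendered config the master pool is on
oc get machineconfig rendered-master-d7df7ffc4886508dcc5aaa2ed70cad6e -o yaml \
  | grep generated-by-controller-version

# Pool status and the configuration it is currently targeting
oc get machineconfigpool master -o yaml

# Overall operator / upgrade state
oc get clusteroperator machine-config
oc get clusterversion
```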
Created attachment 1624884 [details] must-gather.partaf
Created attachment 1624885 [details] must-gather.partal
Created attachment 1624886 [details] must-gather.partar
Created attachment 1624887 [details] must-gather.partax
Created attachment 1624888 [details] must-gather.partbd
Created attachment 1624889 [details] must-gather.partaa
Created attachment 1624890 [details] must-gather.partag
Created attachment 1624891 [details] must-gather.partam
Created attachment 1624892 [details] must-gather.partas
Created attachment 1624893 [details] must-gather.partay
Created attachment 1624894 [details] must-gather.partab
Created attachment 1624895 [details] must-gather.partah
Created attachment 1624896 [details] must-gather.partan
Created attachment 1624897 [details] must-gather.partat
Created attachment 1624898 [details] must-gather.partaz
Created attachment 1624899 [details] must-gather.partac
Created attachment 1624900 [details] must-gather.partai
Created attachment 1624901 [details] must-gather.partao
Created attachment 1624902 [details] must-gather.partau
Created attachment 1624903 [details] must-gather.partba
Created attachment 1624904 [details] must-gather.partad
Created attachment 1624905 [details] must-gather.partaj
Created attachment 1624906 [details] must-gather.partap
Created attachment 1624907 [details] must-gather.partav
Created attachment 1624908 [details] must-gather.partbb
Created attachment 1624909 [details] must-gather.partae
Created attachment 1624910 [details] must-gather.partak
Created attachment 1624911 [details] must-gather.partaq
Created attachment 1624912 [details] must-gather.partaw
Created attachment 1624913 [details] must-gather.partbc
So masters on this cluster have been SSH accessed:

2019-10-07T15:37:07.212551663Z I1007 15:37:07.212474   11165 daemon.go:542] Detected a new login session: New session 1 of user core.
2019-10-07T15:37:07.212551663Z I1007 15:37:07.212492   11165 daemon.go:543] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh

Then, the MCD on one of the masters reports a mismatch between the expected kubelet config (A) and what is actually on disk (B):

```
2019-10-11T19:27:21.264402178Z
2019-10-11T19:27:21.264402178Z A: cgroupDriver: systemd
2019-10-11T19:27:21.264402178Z clusterDNS:
2019-10-11T19:27:21.264402178Z - 10.56.0.10
2019-10-11T19:27:21.264402178Z clusterDomain: cluster.local
2019-10-11T19:27:21.264402178Z maxPods: 250
2019-10-11T19:27:21.264402178Z runtimeRequestTimeout: 10m
2019-10-11T19:27:21.264402178Z serializeImagePulls: false
2019-10-11T19:27:21.264402178Z staticPodPath: /etc/kubernetes/manifests
2019-10-11T19:27:21.264402178Z systemReserved:
2019-10-11T19:27:21.264402178Z   cpu: 500m
2019-10-11T19:27:21.264402178Z   memory: 500Mi
2019-10-11T19:27:21.264402178Z featureGates:
2019-10-11T19:27:21.264402178Z   RotateKubeletServerCertificate: true
2019-10-11T19:27:21.264402178Z   ExperimentalCriticalPodAnnotation: true
2019-10-11T19:27:21.264402178Z   SupportPodPidsLimit: true
2019-10-11T19:27:21.264402178Z   LocalStorageCapacityIsolation: false
2019-10-11T19:27:21.264402178Z serverTLSBootstrap: true
2019-10-11T19:27:21.264402178Z
2019-10-11T19:27:21.264402178Z
2019-10-11T19:27:21.264402178Z B: authentication:
2019-10-11T19:27:21.264402178Z   x509:
2019-10-11T19:27:21.264402178Z     clientCAFile: /etc/kubernetes/kubelet-ca.crt
2019-10-11T19:27:21.264402178Z   anonymous:
2019-10-11T19:27:21.264402178Z     enabled: false
2019-10-11T19:27:21.264402178Z cgroupDriver: systemd
2019-10-11T19:27:21.264402178Z clusterDNS:
2019-10-11T19:27:21.264402178Z - 10.56.0.10
2019-10-11T19:27:21.264402178Z clusterDomain: cluster.local
2019-10-11T19:27:21.264402178Z containerLogMaxSize: 50Mi
2019-10-11T19:27:21.264402178Z maxPods: 250
2019-10-11T19:27:21.264402178Z serializeImagePulls: false
2019-10-11T19:27:21.264402178Z staticPodPath: /etc/kubernetes/manifests
2019-10-11T19:27:21.264402178Z systemReserved:
2019-10-11T19:27:21.264402178Z   cpu: 500m
2019-10-11T19:27:21.264402178Z   memory: 500Mi
2019-10-11T19:27:21.264402178Z featureGates:
2019-10-11T19:27:21.264402178Z   RotateKubeletServerCertificate: true
2019-10-11T19:27:21.264402178Z   ExperimentalCriticalPodAnnotation: true
2019-10-11T19:27:21.264402178Z   SupportPodPidsLimit: true
2019-10-11T19:27:21.264402178Z   LocalStorageCapacityIsolation: false
2019-10-11T19:27:21.264402178Z serverTLSBootstrap: true
```

So it looks like someone jumped on the node and changed kubelet.conf manually, and now the MCD is understandably complaining.
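If it helps the investigation, here is a sketch of how to see which masters were flagged for SSH access and what rendered configs the MCD considers current vs. desired (the annotation names are the standard machineconfiguration.openshift.io ones, as seen in the log above; the node name in the second command is a placeholder):

```
# Dump MCD-related annotations for each master
for node in $(oc get nodes -l node-role.kubernetes.io/master= -o name); do
  echo "== $node"
  oc get "$node" -o yaml | grep 'machineconfiguration.openshift.io/'
done

# Inspect the on-disk kubelet config on a given master
oc debug node/<master-node> -- chroot /host cat /etc/kubernetes/kubelet.conf
```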
Moving to 4.3 to investigate further, but since no other such reports have come in, I'm still leaning towards this not being a blocker, as it looks like someone manually patched the configuration.
Hi Antonio, thank you for those notes. I will reach out to the customer to see what changes they made and will update here once they get back to me.
re podman's pivot problem, RHCOS team: PTAL
This may be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1768879
@erjones Would it be possible to have the customer try out the workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1768879#c13 ? Essentially: upgrade the cluster to a newer version of 4.1 that has the fixed `podman`, then try upgrading to OCP 4.2.
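A sketch of that workaround flow with `oc adm upgrade` (the 4.1.z and 4.2 targets below are placeholders; pick a 4.1.z release that actually contains the podman fix referenced in bug 1768879):

```
oc adm upgrade                   # list available updates in the current channel
oc adm upgrade --to=<4.1.z>      # step up to a newer 4.1 release first
oc get clusterversion            # wait for the 4.1.z upgrade to complete
oc adm upgrade --to=<4.2.x>      # then retry the 4.2 upgrade
```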
Without additional information, we are unable to investigate this further in time for the approaching 4.3 deadline. Moving to 4.4.
I believe this was a dup. Support, please try having them upgrade to the latest 4.1 before proceeding to the latest >= 4.2.

*** This bug has been marked as a duplicate of bug 1768879 ***