Bug 2026562 fixed this error for the field ".spec.maxUnhealthy". We have now found a similar error for another field, ".spec.unhealthyConditions[0].timeout", in the latest 4.8 payload, so this separate bug is being created to track the new field. Details are given in the next comment.
Reproduced again in:

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.24    True        False         7m15s   Cluster version is 4.8.24

/var/log/pods/openshift-kube-apiserver_kube-apiserver-qeci-..-lftfh-master-1_b..e-8c..-..b4-a9ba-ae0aa..be../kube-apiserver/..log..-..-..+.. stderr F E.. .. .. fieldmanager.go..] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (machine.openshift.io/v1beta1, Kind=MachineHealthCheck) to smd typed: .spec.unhealthyConditions[0].timeout: expected string, got &value.valueUnstructured{Value..}" VersionKind="machine.openshift.io/v1beta1, Kind=MachineHealthCheck" 1
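In case it helps to locate this entry, a sketch of one way to search the on-disk kube-apiserver logs on a master node (the node name is a placeholder, and rotated files may not show up via `oc logs`):

```
# Search the static-pod log directory referenced above for the error.
# <master-node> is a placeholder for the actual node name.
oc debug node/<master-node> -- chroot /host \
  sh -c 'grep -r "SHOULD NOT HAPPEN" /var/log/pods/openshift-kube-apiserver_*/'
```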
(In reply to Rahul Gangwar from comment #1)
> Reproduced again in
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.8.24    True        False         7m15s   Cluster version is 4.8.24

Hi Rahul, can you please show me how to get this log?

> /var/log/pods/openshift-kube-apiserver_kube-apiserver-qeci-..-lftfh-master-1_b..e-8c..-..b4-a9ba-ae0aa..be../kube-apiserver/..log..-..-..+.. stderr F E.. .. .. fieldmanager.go..] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (machine.openshift.io/v1beta1, Kind=MachineHealthCheck) to smd typed: .spec.unhealthyConditions[0].timeout: expected string, got &value.valueUnstructured{Value..}" VersionKind="machine.openshift.io/v1beta1, Kind=MachineHealthCheck"
> 1

I checked like this but didn't get the error:

1. vSphere 7 IPI-installed cluster with windows_server_ver_2004 workers and OVNKubernetes network
2. Run this test case: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-25691
3. Check the kube-apiserver logs on all master nodes.

liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-kube-apiserver
NAME                                        READY   STATUS      RESTARTS   AGE
installer-2-huliu131-gsm4h-master-1         0/1     Completed   0          85m
installer-3-huliu131-gsm4h-master-0         0/1     Completed   0          84m
installer-3-huliu131-gsm4h-master-2         0/1     Completed   0          82m
installer-4-huliu131-gsm4h-master-2         0/1     Completed   0          81m
installer-5-huliu131-gsm4h-master-1         0/1     Completed   0          80m
installer-6-huliu131-gsm4h-master-0         0/1     Completed   0          72m
installer-6-huliu131-gsm4h-master-1         0/1     Completed   0          79m
installer-6-huliu131-gsm4h-master-2         0/1     Completed   0          75m
installer-7-huliu131-gsm4h-master-0         0/1     Completed   0          63m
installer-7-huliu131-gsm4h-master-1         0/1     Completed   0          66m
installer-7-huliu131-gsm4h-master-2         0/1     Completed   0          59m
kube-apiserver-huliu131-gsm4h-master-0      5/5     Running     0          61m
kube-apiserver-huliu131-gsm4h-master-1      5/5     Running     0          64m
kube-apiserver-huliu131-gsm4h-master-2      5/5     Running     0          57m
revision-pruner-3-huliu131-gsm4h-master-0   0/1     Completed   0          82m
revision-pruner-6-huliu131-gsm4h-master-0   0/1     Completed   0          67m
revision-pruner-6-huliu131-gsm4h-master-1   0/1     Completed   0          76m
revision-pruner-6-huliu131-gsm4h-master-2   0/1     Completed   0          72m
revision-pruner-7-huliu131-gsm4h-master-0   0/1     Completed   0          60m
revision-pruner-7-huliu131-gsm4h-master-1   0/1     Completed   0          64m
revision-pruner-7-huliu131-gsm4h-master-2   0/1     Completed   0          56m

liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-huliu131-gsm4h-master-0 -n openshift-kube-apiserver | grep "SHOULD NOT HAPPEN"
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-huliu131-gsm4h-master-0. Please use `kubectl.kubernetes.io/default-container` instead
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-huliu131-gsm4h-master-2 -n openshift-kube-apiserver | grep "SHOULD NOT HAPPEN"
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-huliu131-gsm4h-master-2. Please use `kubectl.kubernetes.io/default-container` instead
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-huliu131-gsm4h-master-1 -n openshift-kube-apiserver | grep "SHOULD NOT HAPPEN"
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-huliu131-gsm4h-master-1. Please use `kubectl.kubernetes.io/default-container` instead
liuhuali@Lius-MacBook-Pro huali-test %
We got the logs from a test run.
Cluster template: ipi-on-osp/versioned-installer-fips-ovn-ci
must-gather link: https://drive.google.com/file/d/1q3zugFCdOAsWlGqTXyE1ApmY3OdEFddT/view?usp=sharing

find . -name "*.log" | grep -inr "SHOULD NOT HAPPEN"
./must-gather.local.3960990593128029605/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a4dc776b581ba6c4b2d1a5a08ccd09598177e2934e9159d0eac4067b1bdcce01/audit_logs/kube-apiserver/qeci-30541-fvs2b-master-1.c.openshift-qe.internal-termination.log:2814:E1208 23:07:04.288418 20 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
./must-gather.local.3960990593128029605/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a4dc776b581ba6c4b2d1a5a08ccd09598177e2934e9159d0eac4067b1bdcce01/audit_logs/kube-apiserver/qeci-30541-fvs2b-master-1.c.openshift-qe.internal-termination.log:3605:E1208 23:07:10.082625 20 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
OK, thanks for the information. I'll check again.
Hi Rahul,

(In reply to Rahul Gangwar from comment #3)
> We got the logs from test run.
> Cluster template - ipi-on-osp/versioned-installer-fips-ovn-ci
> must-gather link - https://drive.google.com/file/d/1q3zugFCdOAsWlGqTXyE1ApmY3OdEFddT/view?usp=sharing

In Comment 3 the error shows "(certificates.k8s.io/v1, Kind=CertificateSigningRequest)":

> find . -name "*.log"|grep -inr "SHOULD NOT HAPPEN"
> ./must-gather.local.3960990593128029605/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a4dc776b581ba6c4b2d1a5a08ccd09598177e2934e9159d0eac4067b1bdcce01/audit_logs/kube-apiserver/qeci-30541-fvs2b-master-1.c.openshift-qe.internal-termination.log:2814:E1208 23:07:04.288418 20 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
> ./must-gather.local.3960990593128029605/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a4dc776b581ba6c4b2d1a5a08ccd09598177e2934e9159d0eac4067b1bdcce01/audit_logs/kube-apiserver/qeci-30541-fvs2b-master-1.c.openshift-qe.internal-termination.log:3605:E1208 23:07:10.082625 20 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="

But in the title and Comment 1 the error shows "(machine.openshift.io/v1beta1, Kind=MachineHealthCheck)":

(In reply to Rahul Gangwar from comment #1)
> Reproduced again in
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.8.24    True        False         7m15s   Cluster version is 4.8.24
> /var/log/pods/openshift-kube-apiserver_kube-apiserver-qeci-..-lftfh-master-1_b..e-8c..-..b4-a9ba-ae0aa..be../kube-apiserver/..log..-..-..+.. stderr F E.. .. .. fieldmanager.go..] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (machine.openshift.io/v1beta1, Kind=MachineHealthCheck) to smd typed: .spec.unhealthyConditions[0].timeout: expected string, got &value.valueUnstructured{Value..}" VersionKind="machine.openshift.io/v1beta1, Kind=MachineHealthCheck"
> 1

These are two different errors. Can you please confirm which is the real error?
So it's another error, and the above test steps cannot be applied to it. Sorry for mistaking it for Bug 2026562.
Looking through the existing logs and gathers attached to this BZ, there is no evidence that this issue has actually occurred. While I appreciate that it's hard to capture the logs, if we could capture the MachineHealthCheck resources from the clusters on which this error has occurred or is occurring, it would help us understand what values are triggering it. Without a particular value that we know triggers the issue, it's difficult to reproduce and therefore difficult to validate that we are fixing the actual issue.
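For example, something along these lines run against an affected cluster would capture what we need (a sketch; the output file name is just an illustration):

```
# Capture every MachineHealthCheck in the cluster for attachment to this bug.
oc get machinehealthchecks -A -o yaml > mhc-resources.yaml
```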
Please check whether the link below is helpful: https://drive.google.com/file/d/1RaNO8Qk96slVIk0-tBnw1ondfq__y2bh/view?usp=sharing
Looking at that link, there is only one MHC installed, which is the default. It has a timeout value of `0s`, which is a string, so I'm not sure why the error would be tripping. Notably, looking at the logs again, there is no mention of the MHC error, just job and batch ones.
Old pod logs can't be gathered once they have been rotated away. That's why we don't have this MHC error in the must-gather logs. We will try to upload logs if the error is captured again.
@Joel The same error was detected again; here is the must-gather for it: https://drive.google.com/file/d/1hfsa33AC6DB24BcSVpIEBcYEBeLBortG/view?usp=sharing
Looking into these logs, we see:

2021-12-17T03:46:01.363296436Z E1217 03:46:01.363130 16 fieldmanager.go:197] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (machine.openshift.io/v1beta1, Kind=MachineHealthCheck) to smd typed: .spec.unhealthyConditions[0].timeout: expected string, got &value.valueUnstructured{Value:3}" VersionKind="machine.openshift.io/v1beta1, Kind=MachineHealthCheck"

However, the only MHC installed within the cluster, from the gather at least, has a timeout value of `0s`. This implies to me that either we aren't actually seeing all of the MachineHealthChecks, or someone is trying to create a new one and the API server is rightfully rejecting the invalid MHC.

Looking at the MHC validation (which notably hasn't changed between releases: https://github.com/openshift/machine-api-operator/blob/383c9b959b69044ec533118cf5d41f17101137f1/pkg/apis/machine/v1beta1/machinehealthcheck_types.go#L111-L112):

```
// +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$"
// +kubebuilder:validation:Type:=string
```

The value 3 wouldn't be valid according to the regex: it forces a time-unit suffix, so the string validation seems appropriate to me.
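To illustrate the pattern (a sketch using grep's extended regex rather than the actual openapi validator), duration strings with a unit suffix match while a bare integer does not:

```
# Check candidate timeout values against the kubebuilder pattern above.
# "300s", "0s" and "1.5h" carry a unit suffix and match; a bare "3" does not.
for t in 300s 0s 1.5h 3; do
  if echo "$t" | grep -Eq '^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$'; then
    echo "$t: valid"
  else
    echo "$t: invalid"
  fi
done
```

The `0s` value from the gathered MHC passes this check, which is consistent with the gathered object being valid and the rejected value coming from somewhere else.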
@rgangwar Do you have any response to my previous comment? I'm not sure how we can move forward at the moment, as we aren't able to reproduce this and the must-gathers have so far not aligned with the errors reported.
@joel The same error was detected again, but these must-gathers don't contain it because the pod logs had already been rotated by the time the must-gather was taken. Not sure whether these logs help you or not. https://drive.google.com/file/d/1GMBkIJh-ZAgSJ5qy4Vf8gxGizLUyvxHJ/view?usp=sharing, https://drive.google.com/file/d/1uKnEIWRIsWxJgPtHCh8V9QCQkW7fy0SO/view?usp=sharing
Nothing useful in those either. Do you have access to the clusters these errors are occurring on? Could you redirect `oc get machinehealthchecks -A -o yaml` into a file and upload here some examples from clusters experiencing the issue?
We got the above error in automated test runs; those clusters are pruned after the test run completes. I will try to share the details if it is reported again.
Do you have a link to the test suite that is being run in these automated tests? Perhaps looking at the test source code might help us to identify what's happening.
@zhsun Please share with Joel the details of the MHC test cases behind the above logs; the testrun-id is 20220103-0541.
I tried with the MHC below, and I could get the SHOULD NOT HAPPEN error from must-gather.

apiVersion: "machine.openshift.io/v1beta1"
kind: "MachineHealthCheck"
metadata:
  name: mhc2
  namespace: openshift-machine-api
spec:
  selector: {}
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 3
  maxUnhealthy: 3

$ find . -name "*.log"|grep -inr "SHOULD NOT HAPPEN"    [19:55:43]
./quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-207543a62af5243a05ab3fe972768d534d2d21bc3c29acb36bdf80691e841ce6/namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-129-90.us-east-2.compute.internal/kube-apiserver/kube-apiserver/logs/current.log:1526:2022-01-11T11:52:50.058691868Z E0111 11:52:50.058606 17 fieldmanager.go:197] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (machine.openshift.io/v1beta1, Kind=MachineHealthCheck) to smd typed: .spec.unhealthyConditions[1].timeout: expected string, got &value.valueUnstructured{Value:3}" VersionKind="machine.openshift.io/v1beta1, Kind=MachineHealthCheck"
I've just tried to reproduce this given the example above. When I create the MHC I get:

```
The MachineHealthCheck "mhc2" is invalid: spec.unhealthyConditions.timeout: Invalid value: "integer": spec.unhealthyConditions.timeout in body must be of type string: "integer"
```

This error is expected and intended: we have restricted the type so that users have to set a string. Notably, the MHC object is not stored and is not created. As far as I can tell, this is working as expected and, unfortunately, the log line is a bit of a red herring. I don't know why the log line occurs when the MHC object is not actually created, but it will occur whenever a user attempts to create an invalid object.

I agree with Huali Liu, who mentioned on Slack that this isn't actually a bug; the openapi validations are working as expected.
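For completeness, a corrected version of the manifest from comment 20 (a sketch: the only change is the second timeout, from the bare integer 3 to the duration string 3m) should pass the validation:

```
# The invalid bare-integer timeout is replaced with a duration string,
# so this apply should be accepted by the openapi validation.
cat <<'EOF' | oc apply -f -
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: mhc2
  namespace: openshift-machine-api
spec:
  selector: {}
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 3m
  maxUnhealthy: 3
EOF
```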