Bug 1713479
| Summary: | During upgrade, LocalStorageCapacityIsolation feature gate is turned on temporarily | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | kube-apiserver | Assignee: | Lukasz Szaszkiewicz <lszaszki> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.2.0 | CC: | aos-bugs, deads, jokerman, lszaszki, mfojtik, mmccomas, tnozicka, xxia |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1713207 | Environment: | |
| Last Closed: | 2019-10-16 06:29:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1713207 | ||
| Bug Blocks: | |||
Description
Clayton Coleman
2019-05-23 19:59:21 UTC
This is likely an issue in 4.1 and could potentially be impactful. Once we identify the cause, we will need to backport the fix. Marked urgent because feature gates turning on during an upgrade is bad (if true).

(In reply to Clayton Coleman from comment #1)
> This likely is an issue in 4.1 and could potentially be impactful. Once we
> identify the cause we will need to backport.
>
> Urgent because flag gates going on during upgrade is bad (if true)

It looks like that feature gate is turned off by the scheduler operator. If the support operator, or anything else, is faster than the scheduler operator, there might be a revision of kube-apiserver with that gate on (because it is beta and on by default in Kubernetes). That feature gate logically belongs to the scheduler, so I don't think we want to move it to the kube-apiserver.

One option could be to make the support operator observe the kube-apiserver config and wait for that gate to be off before creating replicas?

/cc David

xrefs:
https://github.com/openshift/api/blob/master/config/v1/types_feature.go#L68
https://github.com/openshift/cluster-kube-scheduler-operator/blob/master/pkg/operator/target_config_reconciler_v311_00.go#L207

> One option could be to make support operator observe the kubeapiserver config and wait for that gate to be off before creating replicas?

Bad suggestion. We probably don't want to introduce something like this into other operators; there should be a generic fix.
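The race described above can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not the operators' actual code: it assumes the effective gates for a new kube-apiserver revision are the upstream Kubernetes defaults overlaid by whatever config has been observed so far. The names `KUBE_DEFAULTS`, `parse_feature_gates` and `effective_gates` are hypothetical.

```python
# Hypothetical model of feature-gate resolution for a kube-apiserver revision.
# The layering (upstream defaults, then observed config) is an assumption made
# for illustration; it is not taken from the operators' source code.

# LocalStorageCapacityIsolation was beta upstream at the time, so it defaulted on.
KUBE_DEFAULTS = {"LocalStorageCapacityIsolation": True}


def parse_feature_gates(value):
    """Parse a --feature-gates style string such as 'A=true,B=false'.

    In this sketch any value other than 'true' is treated as false.
    """
    gates = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, raw = item.partition("=")
        gates[name.strip()] = raw.strip().lower() == "true"
    return gates


def effective_gates(*layers):
    """Start from the upstream defaults and overlay each layer; later layers win."""
    result = dict(KUBE_DEFAULTS)
    for layer in layers:
        result.update(layer)
    return result


if __name__ == "__main__":
    observed = parse_feature_gates(
        "ExperimentalCriticalPodAnnotation=true,RotateKubeletServerCertificate=true,"
        "SupportPodPidsLimit=true,LocalStorageCapacityIsolation=false"
    )
    # A revision rendered before the scheduler operator's observation lands
    # falls back to the upstream default.
    print(effective_gates()["LocalStorageCapacityIsolation"])  # True
    # A revision rendered after the observation lands has the gate off.
    print(effective_gates(observed)["LocalStorageCapacityIsolation"])  # False
    # Pinning the gate in an earlier layer (modelling an explicit bootstrap
    # override) keeps it off even before any observation exists.
    pinned = {"LocalStorageCapacityIsolation": False}
    print(effective_gates(pinned)["LocalStorageCapacityIsolation"])  # False
```

The point of the model is that a not-yet-written observation is indistinguishable from "use the default", and for a default-on beta gate that means on.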
From the event logs, I don't see evidence of the feature gate configuration changing before or after the upgrade.
2019-05-23 05:41:08 +0200 CEST to 2019-05-23 05:41:08 +0200 CEST (1) "openshift-kube-apiserver-operator" ObserveFeatureFlagsUpdated Updated apiServerArguments.feature-gates to ExperimentalCriticalPodAnnotation=true,RotateKubeletServerCertificate=true,SupportPodPidsLimit=true,LocalStorageCapacityIsolation=false
....
2019-05-23 05:47:42 +0200 CEST to 2019-05-23 05:47:42 +0200 CEST (1) "openshift-kube-apiserver-operator" ObservedConfigChanged Writing updated observed config: {"admissionPluginConfig":{"network.openshift.io/RestrictedEndpointsAdmission":{"configuration":{"restrictedCIDRs":["10.128.0.0/14","172.30.0.0/16"]}}},"apiServerArguments":{"cloud-provider":["aws"],"feature-gates":["ExperimentalCriticalPodAnnotation=true","RotateKubeletServerCertificate=true","SupportPodPidsLimit=true","LocalStorageCapacityIsolation=false"]},"
....
2019-05-23 05:58:25 +0200 CEST to 2019-05-23 05:58:25 +0200 CEST (2) "openshift-kube-apiserver-operator" OperatorVersionChanged clusteroperator/kube-apiserver version "raw-internal" changed from "0.0.1-2019-05-23-032300" to "0.0.1-2019-05-23-032421"
....
2019-05-23 05:58:30 +0200 CEST to 2019-05-23 05:58:30 +0200 CEST (1) "openshift-kube-apiserver-operator" ConfigMapUpdated Updated ConfigMap/kube-apiserver-pod -n openshift-kube-apiserver: cause by changes in data.pod.yaml
2019-05-23 05:58:30 +0200 CEST to 2019-05-23 05:58:30 +0200 CEST (1) "openshift-kube-apiserver-operator" RevisionTriggered new revision 7 triggered by "configmap/kube-apiserver-pod has changed"
....
(No evidence of the feature gate config changing or flipping.) I can see room for a race before the scheduler operator makes the initial change, but that change should happen shortly after install, so it should not cause problems during an upgrade.
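The manual log check above can be approximated with a small scanner that records every value `LocalStorageCapacityIsolation` takes in the operator's events and reports whether it ever changes. This is a hypothetical helper (`gate_values` and `gate_flipped` are illustrative names), and it assumes the event lines have already been unwrapped, since terminal wrapping in raw excerpts can split a gate name across lines. It can only report flips between observed values: a revision rendered before any observation exists, which is the race mentioned above, would leave no matching line at all.

```python
import re

# Matches the gate in both event formats from the excerpts:
#   ...,LocalStorageCapacityIsolation=false          (ObserveFeatureFlagsUpdated)
#   ..."LocalStorageCapacityIsolation=false"...      (ObservedConfigChanged JSON)
GATE_RE = re.compile(r"LocalStorageCapacityIsolation=(true|false)")


def gate_values(lines):
    """Return the values the gate takes, in order, one entry per match."""
    values = []
    for line in lines:
        for match in GATE_RE.finditer(line):
            values.append(match.group(1) == "true")
    return values


def gate_flipped(lines):
    """True if the gate value changes between consecutive observations."""
    values = gate_values(lines)
    return any(a != b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Abridged from the event excerpts in this comment.
    events = [
        "ObserveFeatureFlagsUpdated Updated apiServerArguments.feature-gates to "
        "SupportPodPidsLimit=true,LocalStorageCapacityIsolation=false",
        'ObservedConfigChanged Writing updated observed config: {"feature-gates":'
        '["SupportPodPidsLimit=true","LocalStorageCapacityIsolation=false"]}',
        'RevisionTriggered new revision 7 triggered by "configmap/kube-apiserver-pod has changed"',
    ]
    print(gate_values(events))   # [False, False]
    print(gate_flipped(events))  # False
```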
We decided to explicitly set the feature gates in bootstrap-config-overrides for kube-apiserver-operator; PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/506

Verified in the 4.2.0-0.nightly-2019-07-21-222447 to 4.2.0-0.nightly-2019-07-22-160516 upgrade:
Because of the explicit fix in comment 5, LocalStorageCapacityIsolation is always false. In addition, per https://github.com/kubernetes/kubernetes/issues/57167, I created a deployment whose spec.template is:

    volumes:
    - emptyDir:
        sizeLimit: "0"
      name: foo

and did not hit issue 57167. Finally, I checked the openshift-kube-apiserver-operator log and did not see LocalStorageCapacityIsolation changing. So moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.