Background Info:
The only method OLM has for operators to inform it that an upgrade is occurring is the Readiness probe. Operators are supposed to set their Readiness to false whenever they are doing work that shouldn't be interrupted by OLM. However, using a Readiness probe has negative side effects, such as logs and metrics not being gathered (https://github.com/operator-framework/operator-lifecycle-manager/issues/922). Therefore, it made sense to push these problems onto CNV's user-operator, HCO, in order to avoid the side effects on component operators.

Problem:
With the conditions now on the HCO and component operators, we can't say with 100% certainty that they will always be able to gate upgrades from OLM. For example, if all the operators are Running 1/1 and one of the operators hasn't started reporting conditions in its CR, the HCO will see that operator as Ready and will report to OLM that it is upgradable when it isn't. The root of this problem is tracking state in Kubernetes: we need a way to communicate state with 100% accuracy while remaining "Kubernetes-like", i.e. declarative and distributed.

TLDR:
We can't guarantee state with 100% accuracy using a declarative API, so there's a chance OLM can interrupt an upgrade. However, it's important to note that this is unlikely to happen with 'Automatic upgrades', because most of the interruption risk is at the beginning of an upgrade, when operators haven't yet had a chance to complete a reconcile loop. Therefore, most of the risk is negated as long as multiple releases aren't published at the same time on the same upgrade graph.

Long term solutions:
1) OLM provides a better method for upgrade gating (https://github.com/openshift/enhancements/pull/28)
2) Component operators support multiple upgrades (one-to-many) across operator versions.
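The gap described under "Problem" can be illustrated with a small sketch. This is not HCO's actual code: the `ComponentStatus` type, the `Available`/`Progressing` condition names, and both functions are hypothetical, chosen only to show why treating "no conditions reported yet" as healthy lets an aggregator report upgradable too early.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ComponentStatus:
    """Hypothetical view of one component operator, as an aggregator like HCO might see it."""
    name: str
    pod_ready: bool  # e.g. the pod shows Running 1/1
    conditions: Optional[Dict[str, bool]] = None  # None: the CR has no conditions yet


def upgradeable_naive(components: List[ComponentStatus]) -> bool:
    # A component with no conditions is treated as healthy: this is the gap.
    for c in components:
        if not c.pod_ready:
            return False
        if c.conditions is not None and c.conditions.get("Progressing", False):
            return False
    return True


def upgradeable_conservative(components: List[ComponentStatus]) -> bool:
    # Every component must have explicitly reported Available=True and Progressing=False.
    for c in components:
        if not c.pod_ready or c.conditions is None:
            return False
        if not c.conditions.get("Available", False):
            return False
        if c.conditions.get("Progressing", True):
            return False
    return True
```

With one component that has reported `{"Available": True, "Progressing": False}` and another whose CR has no conditions yet, `upgradeable_naive` returns `True` while `upgradeable_conservative` returns `False`. The conservative variant only narrows the window: a condition written by the previous version can still be stale, which is consistent with the TLDR that a declarative API cannot guarantee state with 100% accuracy.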
Michael mentioned on an issue that, instead of gating, operators should tolerate being replaced by another operator. To some degree I understand this perspective, as containers in general can always be interrupted. This touches slightly on the idea of eventually consistent systems.
> 2) Component operators support multiple upgrades (one-to-many) across operator versions.

Point 2) would be the implementation of that concept.
Please add the Fixed In Version.
This bug tracks a documented release note. The fix will be in 2.3.
The fix will not be in 2.3, and we might not fix it this way at all. Deferring for now.
I'm proposing to close this bug. It is not a bug per se; it's an enhancement. https://issues.redhat.com/browse/CNV-474 is the right place to work on this.