Doc Text:
|
Cause: Depending on the timing of the events it received, the samples operator could break its contract with k8s SharedInformers and mutate the controller cache for the objects it watches (Samples Config, Templates, Imagestreams). In addition, the frequency of concurrent updates to the samples config CR instance while tracking imagestream status increased the likelihood of hitting this timing window and incorrectly mutating the controller cache.
Consequence: In many cases, robustness in k8s kept things OK, but we have now seen cases where this violation produced a panic in k8s when the samples operator tried to update the objects it watches.
Fix: Stop mutating the cache through better use of k8s DeepCopy prior to updates. Also adjusted when config information is copied from spec to status in the samples config CR instance. In addition, 4.6.z includes a change that reduces, though it cannot eliminate, concurrent attempts to update the samples operator config CR instance during imagestream event processing.
Result: The samples operator no longer mutates its SharedInformer cache and avoids panics in k8s when updating the objects it manages. The number of update conflicts that occur when concurrent imagestream events update the samples operator config CR instance has also been greatly reduced. Such conflicts cannot be eliminated entirely, and a few of them still occurring is OK.
|
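The copy-before-update pattern described in the Fix can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the `Config` type, the `cache` map, the `markImportFailed` helper, and the `ImportFailed` condition name are all hypothetical stand-ins. The real operator works with generated `DeepCopy` methods on its API types and with objects returned from informer listers.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the samples config CR.
// Real API types get a generated DeepCopy method instead of a hand-written one.
type Config struct {
	Name       string
	Skipped    []string
	Conditions map[string]string
}

// DeepCopy returns a copy that shares no slices or maps with the receiver.
func (c *Config) DeepCopy() *Config {
	if c == nil {
		return nil
	}
	out := &Config{Name: c.Name}
	if c.Skipped != nil {
		out.Skipped = make([]string, len(c.Skipped))
		copy(out.Skipped, c.Skipped)
	}
	if c.Conditions != nil {
		out.Conditions = make(map[string]string, len(c.Conditions))
		for k, v := range c.Conditions {
			out.Conditions[k] = v
		}
	}
	return out
}

// cache is a hypothetical stand-in for an informer's local store.
// Like a lister, Get hands back the shared pointer, not a copy.
type cache map[string]*Config

func (c cache) Get(name string) *Config { return c[name] }

// markImportFailed (hypothetical helper) copies the cached object first and
// only changes the copy, so the shared cache entry is never mutated.
func markImportFailed(store cache, name string) *Config {
	cached := store.Get(name)
	if cached == nil {
		return nil
	}
	updated := cached.DeepCopy()
	if updated.Conditions == nil {
		updated.Conditions = map[string]string{}
	}
	updated.Conditions["ImportFailed"] = "True"
	// In a real controller, updated would now be sent to the API server.
	return updated
}

func main() {
	store := cache{"cluster": &Config{
		Name:       "cluster",
		Conditions: map[string]string{"ImportFailed": "False"},
	}}
	updated := markImportFailed(store, "cluster")
	fmt.Println("cached:", store.Get("cluster").Conditions["ImportFailed"])
	fmt.Println("updated:", updated.Conditions["ImportFailed"])
}
```

The point of the sketch is only that the object read from the cache is treated as read-only, and that every change is made on the copy that is then submitted as the update.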