Bug 1883946
Summary: | Understand why trident CSI pods are getting deleted by OCP | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Andre Costa <andcosta> |
Component: | kube-controller-manager | Assignee: | Maciej Szulik <maszulik> |
Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.5 | CC: | aos-bugs, armin.kunaschik, jsafrane, mfojtik, vjaypurk |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:21:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Andre Costa
2020-09-30 14:44:42 UTC
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

From reading https://github.com/NetApp/trident/issues/444 and https://github.com/NetApp/trident/issues/474 it looks like the problem is on the Trident side, which should be fixed in a newer release: https://github.com/NetApp/trident/issues/444#issuecomment-718059956
For the k8s gc issue, there's a WIP PR fixing the races in https://github.com/kubernetes/kubernetes/pull/92743. Hopefully that will land in k8s 1.20 and we'll get it with the next k8s bump.

NetApp release 20.10.0 fixes the broken deployment. That means the deployment is handled correctly when parts of it are removed. There is a second hint about an incorrect ownerReference; see GitHub issue 474 for details. This is not yet fixed. I cannot judge whether or not the ownerReference is the real cause of the problem. But if it isn't, I'd expect a backport of the gc fix for at least 4.6.

I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

With k8s 1.20 already available in 4.7 and the Trident bug fixed, I'm moving this to QA.

FYI: NetApp fixed GitHub issue 474 in Trident 20.10.1. We're still testing this version. If this fixes the issue completely, we're fine. If not, we expect that it might still be necessary to backport the gc fix to 4.6, because it's a long-term support release.

(In reply to Armin Kunaschik from comment #12)
> FYI: NetApp fixed GitHub issue 474 in Trident 20.10.1. We're still
> testing this version. If this fixes the issue completely, we're fine. If
> not, we expect that it might still be necessary to backport the gc fix
> to 4.6, because it's a long-term support release.

I don't expect the GC fix to be backported, since it is too big and too risky a change.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
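
Note on the ownerReference discussion: the thread does not say what exactly was wrong with the reference reported in Trident issue 474, so the following is only a minimal sketch of one way to look for suspicious references. It assumes the general Kubernetes rule that a namespaced owner must live in the same namespace as its dependent, and that a cluster-scoped dependent may only name cluster-scoped owners. The function name `suspect_owner_refs`, the `inventory` mapping, and the sample objects are hypothetical illustrations, not Trident or OpenShift code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: owner uid -> (kind, namespace).
# namespace is None for cluster-scoped objects.
Inventory = Dict[str, Tuple[str, Optional[str]]]


def suspect_owner_refs(obj: dict, inventory: Inventory) -> List[str]:
    """Return reasons why ownerReferences on a manifest dict look unresolvable."""
    meta = obj.get("metadata", {})
    dep_ns = meta.get("namespace")  # None means the dependent is cluster-scoped
    problems = []
    for ref in meta.get("ownerReferences", []):
        label = f'{ref.get("kind")}/{ref.get("name")}'
        owner = inventory.get(ref.get("uid"))
        if owner is None:
            problems.append(f"{label}: owner uid not found")
            continue
        _, owner_ns = owner
        if owner_ns is None:
            continue  # a cluster-scoped owner is acceptable for any dependent
        if dep_ns is None:
            problems.append(f"{label}: cluster-scoped dependent, namespaced owner")
        elif owner_ns != dep_ns:
            problems.append(f"{label}: owner lives in namespace {owner_ns!r}")
    return problems


if __name__ == "__main__":
    # Made-up objects for illustration only.
    inventory = {"uid-1": ("Deployment", "storage")}
    pod = {
        "metadata": {
            "namespace": "other",
            "ownerReferences": [
                {"kind": "Deployment", "name": "csi-controller", "uid": "uid-1"}
            ],
        }
    }
    for problem in suspect_owner_refs(pod, inventory):
        print(problem)
```

The check works on plain dictionaries (for example, manifests loaded from exported YAML or JSON), so it does not need cluster access. Whether a mismatch of this kind was actually the cause in this bug is not established by the comments above.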