Bug 2100703
| Summary: | [Metro-DR] NetworkFence CR is not reconciled by the operator | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Raghavendra Talur <rtalur> |
| Component: | csi-addons | Assignee: | Niels de Vos <ndevos> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | akarsha <akrai> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.11 | CC: | aclewett, akrai, hnallurv, muagarwa, ndevos, ocs-bugs, odf-bz-bot, rar, sheggodu |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | ODF 4.12.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: When a Ceph-CSI Pod is started, it passes the IP-address of the Pod to the CSI-Addons sidecar. When the Pod restarts, it is possible that the IP-address was changed.
Consequence: If the restart does not change the name of the Pod, the CSIAddonsNode CR can still contain the previous IP-address. When the previous IP-address is listed in the CSIAddonsNode CR, the CSI-Addons Controller cannot detect the new IP-address and fails to connect to the sidecar.
Fix: Use the name of the Ceph-CSI Pod and the Namespace where the Pod is running, instead of the IP-address.
Result: The CSI-Addons Controller is able to look up the CSIAddonsNode CR, get the endpoint attribute, and resolve the name of the Pod in the Namespace to an IP-address.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-02-08 14:06:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Raghavendra Talur
2022-06-24 03:12:52 UTC
The workaround is to:
1. stop the csi-rbdplugin-provisioner pod and the csi-addons-controller-manager pod
2. delete the csiaddonsnodes object
3. start the csi-rbdplugin-provisioner pod and the csi-addons-controller-manager pod

How often does this happen? Does it happen consistently, or was this a one-time occurrence?

I guess we could investigate marking a node unavailable if there is some error, and retry on certain errors (resolve the node, and create a new connection).

Not a 4.11 blocker

(In reply to Niels de Vos from comment #5)
> How often does this happen? Does it happen consistently, or was this a
> one-time occurrence?
>
> I guess we could investigate marking a node unavailable if there is some
> error, and retry on certain errors (resolve the node, and create a new
> connection).

This was seen only once when this bug was filed, but in the past week we seem to have hit this issue twice. Rakshith R had collected the required logs last time, but I had not been able to reproduce it after that. Attaching a must-gather for further debugging.

rtalur attached must-gather.

*** Bug 2106613 has been marked as a duplicate of this bug. ***

rtalur ocs-must-gather link - https://drive.google.com/file/d/1dbKCFJdFctlJMFF0Hfnq4VXYfrD1sIXd/view?usp=sharing.

A workaround has been posted for review upstream at https://github.com/csi-addons/kubernetes-csi-addons/pull/186
We're still investigating a more appropriate solution.

https://github.com/csi-addons/kubernetes-csi-addons/pull/190 is an alternative that does not delete and re-create the CSIAddonsNode CR. I'd like to test the solution, but without steps to reproduce it is rather difficult :-/

https://github.com/red-hat-storage/kubernetes-csi-addons/commit/a9febe2efde7a9426cb53f27a86efb3535913e34 is the backport that should prevent this issue from happening again. It was included in the release-4.12 branch with https://github.com/red-hat-storage/kubernetes-csi-addons/pull/54 . Builds from the beginning of September have the fix already. |
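The fix described in the Doc Text (store the Pod name and Namespace in the CSIAddonsNode endpoint, and resolve them to an IP-address at connection time) can be illustrated with a small sketch. This is not the operator's code. The `pod://<pod-name>.<namespace>:<port>` endpoint format, the function names, the port number, and the `lookup_pod_ip` callback that stands in for a Kubernetes API query are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple


class PodEndpoint(NamedTuple):
    pod: str
    namespace: str
    port: int


def parse_pod_endpoint(endpoint: str) -> PodEndpoint:
    """Parse a hypothetical 'pod://<pod-name>.<namespace>:<port>' endpoint."""
    scheme = "pod://"
    if not endpoint.startswith(scheme):
        raise ValueError(f"unsupported endpoint scheme: {endpoint!r}")
    host, sep, port = endpoint[len(scheme):].rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"missing or invalid port: {endpoint!r}")
    # Namespaces cannot contain dots, so the last label is the namespace.
    pod, dot, namespace = host.rpartition(".")
    if not dot or not pod or not namespace:
        raise ValueError(f"missing pod name or namespace: {endpoint!r}")
    return PodEndpoint(pod, namespace, int(port))


def resolve_endpoint(
    endpoint: str,
    lookup_pod_ip: Callable[[str, str], str],
) -> str:
    """Resolve the endpoint to 'ip:port' each time a connection is made.

    lookup_pod_ip(namespace, pod) is a stand-in for asking the cluster
    for the Pod's current IP-address (an assumption of this sketch).
    """
    ep = parse_pod_endpoint(endpoint)
    return f"{lookup_pod_ip(ep.namespace, ep.pod)}:{ep.port}"


# Illustrative data: the Pod keeps its name across a restart but gets a new IP.
pod_ips = {("openshift-storage", "csi-rbdplugin-provisioner-0"): "10.0.0.5"}
endpoint = "pod://csi-rbdplugin-provisioner-0.openshift-storage:9070"

before = resolve_endpoint(endpoint, lambda ns, pod: pod_ips[(ns, pod)])
pod_ips[("openshift-storage", "csi-rbdplugin-provisioner-0")] = "10.0.0.9"
after = resolve_endpoint(endpoint, lambda ns, pod: pod_ips[(ns, pod)])
```

Because the stored endpoint never contains an IP-address, the same CSIAddonsNode value stays valid after the Pod restarts; only the lookup result changes.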