Bug 1982729
| Summary: | MigAnalytic fails to get source cluster resources, all reported as 0 in ocp 3.9 sometimes | | |
|---|---|---|---|
| Product: | Migration Toolkit for Containers | Reporter: | Sergio <sregidor> |
| Component: | Documentation | Assignee: | Avital Pinnick <apinnick> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Xin jiang <xjiang> |
| Severity: | high | Docs Contact: | Avital Pinnick <apinnick> |
| Priority: | unspecified | | |
| Version: | 1.4.6 | CC: | apinnick, ernelson, jmatthew, jmontleo, pgaikwad, sregidor, whu, xjiang |
| Target Milestone: | --- | | |
| Target Release: | 1.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1982604 | Environment: | |
| Last Closed: | 2021-07-21 13:41:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1982604 | | |
| Bug Blocks: | | | |
Description
Sergio
2021-07-15 14:48:09 UTC
I've seen this bug in my 3.9 environment (AWS 3.9 -> AWS 4.8). When I clicked "refresh", the analytic reported the right values. Nevertheless, the warning regarding the resize functionality remained, even after refreshing the migplan as well as the miganalytic.

This isn't as much a problem with Analytics as it is with the PV resize feature, which attempts to use the restic daemonset to determine the actual disk usage of the volumes. IIUC the failures are limited to using analytics for PV resize when migrating from older OCP releases (3.7, 3.9) and the pod comes into existence after the restic daemonset was started. restic uses a hostPath mount to peer into the volume, and bind remount does not exist on these versions, so if the application comes up after the daemonset, the daemonset is oblivious to it. Possible solutions might include restarting the daemonset before running the analytic (I think this would be costly performance-wise on large clusters) or creating a pod on the node to run the size check instead of using the restic daemonset, so that it always exists after the application.

Pranav/Jason, My suggestion is we do _not_ address this fix in code changes. We document this as a known issue for customers running a source cluster of 3.7/3.9, and explain that they could restart Restic or proceed without some functionality (assume they won't be able to resize PVs, maybe some loss of progress, etc.; we can explain in a doc note). Does that sound reasonable?

John, That is reasonable.
Assuming that the feature degrades gracefully and the users still have a way to manually mitigate the degradation, I wouldn't consider this a blocker. I will take the responsibility of documenting this in our upstream docs.

In my previous comment, I forgot to note an important thing. When PV resizing degrades gracefully (FailedRunningDF condition on MigAnalytic), it does _not_ block migrations from proceeding. The migrations still work. The only difference is that the migration cannot automatically resize the volumes in the target cluster based on volume usage, because MigAnalytic failed to collect that information. If users do care about PV resizing, they need to bounce the Restic pods once for resizing to happen automatically. If users don't care about PV resizing, they can simply proceed with PV resizing disabled.

Shifting this to a docs BZ so we can document upstream and get that info propagated downstream as you see fit, Avital. Pranav will follow with the details.

I can add this to the release notes > known issues for 1.4.6. Since you do not plan to address this with a code fix, I will add the bug and workaround to the 1.5.0 release notes as well.

Update: I will put this in the 1.5.0 release notes and not in 1.4.6, because PV resizing was introduced as a 1.5.0 feature and only appears in the documentation for that release.

Changes merged for OCP 4.8/MTC 1.5.0 RN

*** Bug 1982604 has been marked as a duplicate of this bug. ***
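
For anyone scripting the check described in this thread, here is a minimal sketch of how the degraded state could be detected from a MigAnalytic object that has already been fetched and parsed into a dict. It assumes the object exposes the usual Kubernetes-style `status.conditions` list with `type` and `status` fields; only the `FailedRunningDF` condition name comes from this bug. The function name and the sample objects are hypothetical and written by hand for illustration.

```python
from typing import Any, Mapping


def pv_resize_data_unavailable(miganalytic: Mapping[str, Any]) -> bool:
    """Return True if the MigAnalytic carries an active FailedRunningDF condition.

    Assumption: conditions live under status.conditions and each has a
    'type' and an optional 'status' field, as is conventional for
    Kubernetes resources. A condition is treated as active unless its
    status is explicitly "False".
    """
    status = miganalytic.get("status") or {}
    conditions = status.get("conditions") or []
    return any(
        cond.get("type") == "FailedRunningDF" and cond.get("status") != "False"
        for cond in conditions
    )


# Hypothetical, hand-written objects; not captured from a real cluster.
degraded = {"status": {"conditions": [{"type": "FailedRunningDF", "status": "True"}]}}
healthy = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
no_status = {}

for name, obj in (("degraded", degraded), ("healthy", healthy), ("no_status", no_status)):
    print(name, pv_resize_data_unavailable(obj))
```

If the function returns True, the options discussed above apply: bounce the Restic pods on the source cluster and re-run the analytic, or proceed with the migration with PV resizing disabled.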