Bug 2054231
| Summary: | Migration statistics are wrong and we are using less bandwidth than possible | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Juan Quintela <quintela> |
| Component: | qemu-kvm | Assignee: | Juan Quintela <quintela> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aadam, coli, jinzhao, juzhang, nilal, virt-maint, xiaohli |
| Version: | 9.1 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-08 14:54:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2024555 | ||
| Bug Blocks: | |||
Description
Juan Quintela
2022-02-14 13:35:15 UTC
Hi Juan, I have some questions here:
1. According to the title and Comment 0 of this bug, there are two issues, right?
   1) The calculation of the remaining amount of memory and expected downtime is wrong;
   2) The bandwidth during migration is less than the value we can use
2. Could you describe how you reached the conclusion that the calculation of the remaining amount of memory is wrong? And did it only happen when hosts have multiple terabytes of memory?
3. I'm not sure what specifically would happen for multiterabyte hosts based on your words below. Could you explain more?
> When we have hosts with Multiterabyte amounts of memory, it notices too much,
> because we are not sending memory even when we won that things are

Hi Juan, Nitesh,
Do we have the plan to fix this issue on RHEL 9.3.0?

(In reply to Li Xiaohui from comment #3)
> Hi Juan, Nitesh,
> Do we have the plan to fix this issue on RHEL 9.3.0?

Hi Xiaohui, as I have mentioned, Juan will be reviewing his assigned BZs and updating them with the status and ITR if he is planning to fix anything in 9.3.0.
Let's give him some time to go through the bugs. Is there a timeline by which you are looking for an update?

(In reply to Nitesh Narayan Lal from comment #4)
> (In reply to Li Xiaohui from comment #3)
> > Hi Juan, Nitesh,
> > Do we have the plan to fix this issue on RHEL 9.3.0?
>
> Hi Xiaohui, As I have mentioned that Juan will be reviewing his assigned BZs
> and update them with the status and ITR if he is planning to fix anything in
> 9.3.0.
> Let's give some time go through the bugs. Is there a timeline by which you
> are looking for an update?

For this bug, it would be better to have it before Tuesday of next week, because QE will discuss it in our group Bug Triage meeting.
The aim of the Bug Triage meeting is to review all opened bugs to see whether they have customer impact, and also to confirm whether we have a plan to fix them.

We have reviewed nearly all opened migration bugs; now only this bug, Bug 2048460 and Bug 2016966 remain.
Bug 2048460 has been discussed in the last bi-weekly Live Migration sync meeting. Juan said we plan to close it. Can you help do it?

(In reply to Li Xiaohui from comment #5)
> (In reply to Nitesh Narayan Lal from comment #4)
> > (In reply to Li Xiaohui from comment #3)
> > > Hi Juan, Nitesh,
> > > Do we have the plan to fix this issue on RHEL 9.3.0?
> >
> > Hi Xiaohui, As I have mentioned that Juan will be reviewing his assigned BZs
> > and update them with the status and ITR if he is planning to fix anything in
> > 9.3.0.
> > Let's give some time go through the bugs. Is there a timeline by which you
> > are looking for an update?
>
> For this bug, it's better before Tuesday of next week. Because QE would
> discuss it in our group Bug Triage meeting.
> The aim of Bug Triage meeting is to review all opened bugs to see if they
> have the customer impacted or not. And also confirm if we have a plan to fix
> them.

Sorry, not all opened bugs, I mean all new and assigned bugs.

>
> We have reviewed nearly all opened migration bugs, now only this bug, Bug
> 2048460 and Bug 2016966.

Sorry, let me update my words: we have reviewed nearly all new and assigned migration bugs; now only this bug, Bug 2048460 and Bug 2016966 remain.

>
> Bug 2048460 has been discussed in the last Bi-weekly Live migration sync
> meeting. Juan said we plan to close it. Can you help do it?

Hi

The fixes for this patch are already upstream (post 8.0), I will backport them.
There is nothing to test for, because the problem is that we are using less bandwidth than possible, but there is not a lot that we can do about that.

Hi

Second try, bugzilla got confused with my second comment.
We have fixes upstream already, I have to backport them.
But I can't think of an easy way for QE to test this improvement. It will only help when we are near the limit of the bandwidth, but you will never know whether the reason the migration now finishes is the patches or plain old luck.

Hi Juan, thanks for the update.

> The fixes for this patch are already upstream (post 8.0), I will backport them.

So we plan to fix this bug on RHEL 9.3.0? If so, please help set the ITR to 9.3.0, and also the DTM if you can.

> because the problem is that we are using less bandwidth than possible
>
> But I can't think of an easy way for QE to test this improvement. It will only help when we are near the limit of the bandwidth,
> but you will never know whether the reason the migration now finishes is the patches or plain old luck

QE has two AMD machines with 200 Gbps network cards. Can't we compare the bandwidth while migration is active 1) pre-patch and 2) with the patch, to see if the fix helps improve bandwidth utilization?

You can, but as I said, it is only seen in edge cases, and I am not sure you are going to be able to measure it. Basically the difference is whether it decides to finish migration or not, and that is not something that one can "detect" from outside qemu. I think I will not try to check it, because I am not sure you are going to be able to create a scenario where you can reproduce this consistently.

Juan,
Do you have the original environment/machine where you first observed this issue? Xiaohui will have to do some testing before marking this BZ as Tested. Or are you suggesting that we will have to verify this bug by only doing sanity testing?
I am setting the ITR. Can you please share a tentative date or DTM by which you are planning to do the backport?
Thanks

See https://bugzilla.redhat.com/show_bug.cgi?id=2024555#c7

There I show the links to the upstream series that improve the statistics.
The thing that is improved is the calculations. There is no easy way for QE to check that they have improved.
So we are closing this bug as NOT A BUG. We can check that it is better on the other bug.
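
Note for readers of this thread: the sketch below is a simplified model, not QEMU's code, of why a wrong bandwidth estimate only matters "near the limit of the bandwidth". It assumes the usual rule of thumb that expected downtime is roughly the remaining data divided by the estimated bandwidth, and that migration is allowed to finish once that value drops under the configured downtime limit. All function names and numbers here are hypothetical and chosen only for illustration.

```python
# Simplified, hypothetical model of the "finish migration or not" decision.
# Assumption: expected downtime ~= remaining bytes / estimated bandwidth.

GIB = 1024 ** 3


def expected_downtime_ms(remaining_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time needed to send the remaining data at the estimated bandwidth."""
    return remaining_bytes / bandwidth_bytes_per_s * 1000.0


def should_complete(remaining_bytes: int,
                    bandwidth_bytes_per_s: float,
                    downtime_limit_ms: float) -> bool:
    """True when the remaining data fits inside the allowed downtime."""
    return expected_downtime_ms(remaining_bytes, bandwidth_bytes_per_s) <= downtime_limit_ms


if __name__ == "__main__":
    remaining = 3 * GIB            # illustrative amount of dirty memory left
    downtime_limit = 300.0         # illustrative limit, in milliseconds
    real_bw = 12.5e9               # 100 Gbit/s expressed in bytes per second
    underestimated_bw = 0.8 * real_bw

    for label, bw in (("real", real_bw), ("underestimated", underestimated_bw)):
        ms = expected_downtime_ms(remaining, bw)
        print(f"{label:>14}: {ms:6.1f} ms, complete={should_complete(remaining, bw, downtime_limit)}")
```

With these made-up numbers the same amount of remaining memory fits the limit when the bandwidth estimate is accurate and misses it when the estimate is 20% too low. Far from the limit both estimates give the same answer, which is consistent with the comment above that the effect is hard to observe from outside qemu.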