Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2054231

Summary:	Migration statistics are wrong and we are using less bandwidth than possible
Product:	Red Hat Enterprise Linux 9	Reporter:	Juan Quintela <quintela>
Component:	qemu-kvm	Assignee:	Juan Quintela <quintela>
qemu-kvm sub component:	Live Migration	QA Contact:	Li Xiaohui <xiaohli>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	medium
Priority:	medium	CC:	aadam, coli, jinzhao, juzhang, nilal, virt-maint, xiaohli
Version:	9.1	Keywords:	Triaged
Target Milestone:	rc	Flags:	pm-rhel: mirror+
Target Release:	---
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-05-08 14:54:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2024555
Bug Blocks:

Description Juan Quintela 2022-02-14 13:35:15 UTC

Description of problem:

When doing migration, the calculation of the remaining amount of memory is wrong.  This means that the calculation of the expected downtime is wrong as well.  On hosts with multi-terabyte amounts of memory this is very noticeable, because we are not sending memory even when we know that things are 
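
For illustration, the link between the remaining-memory figure and the expected downtime can be sketched as below. This is a simplified model, not QEMU's actual code: the function name `expected_downtime_ms` and all numbers are assumptions made up for the example. It only shows that any error in the remaining amount carries straight through to the downtime estimate.

```python
def expected_downtime_ms(remaining_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time needed to send what is still pending at the measured bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return remaining_bytes / bandwidth_bytes_per_s * 1000.0


# Hypothetical values, chosen only to make the arithmetic visible.
bandwidth = 10e9 / 8                  # a 10 Gbit/s link, in bytes per second
actual_remaining = 256 * 1024**2      # bytes that are really still dirty
accounting_error = 1024**3            # bytes wrongly counted as remaining

print(expected_downtime_ms(actual_remaining, bandwidth))
print(expected_downtime_ms(actual_remaining + accounting_error, bandwidth))
```

The second estimate is larger than the first by exactly the time it would take to send the miscounted bytes, even though nothing extra actually needs to be transferred.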


Version-Release number of selected component (if applicable):

All.


How reproducible:

It always happens, but it is more acute the larger the amount of memory.

Steps to Reproduce:
1.  
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Li Xiaohui 2022-03-27 05:14:10 UTC

Hi Juan,

I have some questions here:
1. According to the title and Comment 0 of this bug, there are two issues, right?
1) The calculation of the remaining amount of memory and expected downtime are wrong;
2) The bandwidth during migration is less than the value we can use

2. Could you describe how you reached the conclusion that the calculation of the remaining amount of memory is wrong? And does it only happen when hosts have multi-terabyte memory?


3. I don't understand what specifically would happen on multi-terabyte hosts based on your words below. Could you explain more?
> When we have hosts with Multiterabyte amounts of memory, it notices too much, 
> because we are not sending memory even when we won that things are

Comment 3 Li Xiaohui 2023-04-21 02:44:40 UTC

Hi Juan, Nitesh, 
Do we have a plan to fix this issue in RHEL 9.3.0?

Comment 4 Nitesh Narayan Lal 2023-04-21 02:51:49 UTC

(In reply to Li Xiaohui from comment #3)
> Hi Juan, Nitesh, 
> Do we have the plan to fix this issue on RHEL 9.3.0?

Hi Xiaohui, as I mentioned, Juan will be reviewing his assigned BZs and updating them with the status and ITR if he is planning to fix anything in 9.3.0.
Let's give him some time to go through the bugs. Is there a date by which you are looking for an update?

Comment 5 Li Xiaohui 2023-04-21 03:48:10 UTC

(In reply to Nitesh Narayan Lal from comment #4)
> (In reply to Li Xiaohui from comment #3)
> > Hi Juan, Nitesh, 
> > Do we have the plan to fix this issue on RHEL 9.3.0?
> 
> Hi Xiaohui, As I have mentioned that Juan will be reviewing his assigned BZs
> and update them with the status and ITR if he is planning to fix anything in
> 9.3.0.
> Let's give some time go through the bugs. Is there a timeline by which you
> are looking for an update?

For this bug, it would be best before Tuesday of next week, because QE will discuss it in our group Bug Triage meeting.
The aim of the Bug Triage meeting is to review all opened bugs to see whether they have customer impact, and also to confirm whether we have a plan to fix them.

We have reviewed nearly all opened migration bugs; only this bug, Bug 2048460 and Bug 2016966 remain.

Bug 2048460 was discussed in the last bi-weekly Live Migration sync meeting. Juan said we plan to close it. Can you help close it?

Comment 6 Li Xiaohui 2023-04-21 03:52:10 UTC

(In reply to Li Xiaohui from comment #5)
> (In reply to Nitesh Narayan Lal from comment #4)
> > (In reply to Li Xiaohui from comment #3)
> > > Hi Juan, Nitesh, 
> > > Do we have the plan to fix this issue on RHEL 9.3.0?
> > 
> > Hi Xiaohui, As I have mentioned that Juan will be reviewing his assigned BZs
> > and update them with the status and ITR if he is planning to fix anything in
> > 9.3.0.
> > Let's give some time go through the bugs. Is there a timeline by which you
> > are looking for an update?
> 
> For this bug, it's better before Tuesday of next week. Because QE would
> discuss it in our group Bug Triage meeting.
> The aim of Bug Triage meeting is to review all opened bugs to see if they
> have the customer impacted or not. And also confirm if we have a plan to fix
> them.

Sorry, not all opened bugs; I mean all new and assigned bugs.

> 
> We have reviewed nearly all opened migration bugs, now only this bug, Bug
> 2048460 and Bug 2016966.

Sorry, let me correct that:
We have reviewed nearly all new and assigned migration bugs; only this bug, Bug 2048460 and Bug 2016966 remain.


> 
> Bug 2048460 has been discussed in the last Bi-weekly Live migration sync
> meeting. Juan said we plan to close it. Can you help do it?

Comment 7 Juan Quintela 2023-04-25 10:10:16 UTC

Hi
The fixes for this are already upstream (post 8.0); I will backport them.
There is nothing to test for, because the problem is that we are using less bandwidth than possible, and there is not a lot that we can do about that.

Comment 8 Juan Quintela 2023-04-25 10:11:50 UTC

Hi
The fixes for this are already upstream (post 8.0); I will backport them.
There is nothing to test for, because the problem is that we are using less bandwidth than possible, and there is not a lot that we can do about that.

Comment 9 Juan Quintela 2023-04-25 10:13:31 UTC

Hi
Second try; Bugzilla got confused with my second comment.

We have fixes upstream already; I have to backport them.
But I can't think of an easy way for QE to test this improvement.  It will only help when we are near the limit of the bandwidth, and you will never know whether the migration now finishes because of the patches or plain old luck.

Comment 10 Li Xiaohui 2023-04-25 10:29:03 UTC

Hi Juan, thanks for the update.

> > The fixes for this patch are already upstream (post 8.0), I will backport them.

So do we plan to fix this bug in RHEL 9.3.0? If so, please set the ITR to 9.3.0, and the DTM as well if you have it.

> > because the problem is that we are using less bandwidth that possible

> > But I can think on an easy way that QE can test this improvement.  It will only help when we are near the limit of the bandwidth,
> > but you will never know if the reason that now the migration finishes is because the patches or plain old luck

QE has two AMD machines with 200 Gbps network cards.
Can't we compare the bandwidth while migration is active 1) without the patches and 2) with the patches, to see whether the fix improves bandwidth utilization?
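
One possible way to make such a comparison, sketched under assumptions: poll QMP `query-migrate` on the source during each run, keep the replies, and average the reported `ram.mbps` values while `status` is `active`. The helper name `bandwidth_utilization` and the sample values below are hypothetical placeholders, not measurements.

```python
from statistics import mean


def bandwidth_utilization(samples: list[dict], link_mbps: float) -> float:
    """Mean reported throughput over active samples, as a fraction of link capacity."""
    active = [
        s["ram"]["mbps"]
        for s in samples
        if s.get("status") == "active" and "ram" in s
    ]
    if not active:
        raise ValueError("no samples taken while migration was active")
    return mean(active) / link_mbps


# Placeholder replies in the shape of query-migrate output (values are invented).
run_samples = [
    {"status": "active", "ram": {"mbps": 100000.0}},
    {"status": "active", "ram": {"mbps": 50000.0}},
    {"status": "completed"},
]

# 200 Gbps link expressed in megabits per second.
print(bandwidth_utilization(run_samples, link_mbps=200000.0))
```

Running the same helper over the samples from both builds gives two fractions that can be compared directly, although run-to-run noise may be larger than any difference.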

Comment 11 Juan Quintela 2023-04-25 10:52:39 UTC

You can, but as I said, it is only seen in edge cases, and I am not sure you are going to be able to measure it.  Basically the difference is whether it decides to finish migration or not, and that is not something that one can "detect" from outside qemu.
I think I will not try to check it, because I am not sure you are going to be able to create a scenario where you can reproduce this consistently.
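
To make the edge case concrete, here is a minimal sketch of a completion decision, assuming a simplified model in which migration enters its final phase once the estimated pending data fits within the allowed downtime at the current bandwidth. It is an illustration only, not QEMU's code; `can_complete` and every number are hypothetical.

```python
def can_complete(pending_bytes: int, bandwidth_bytes_per_s: float,
                 downtime_limit_ms: int) -> bool:
    """True if the pending data can be sent within the allowed downtime."""
    # Largest amount of data that can be sent during the allowed downtime.
    threshold_bytes = bandwidth_bytes_per_s * downtime_limit_ms / 1000.0
    return pending_bytes <= threshold_bytes


bandwidth = 1.25e9            # bytes per second (hypothetical)
downtime_limit_ms = 300       # allowed downtime (hypothetical)

accurate_estimate = 370_000_000   # just under the 375,000,000-byte threshold
inflated_estimate = 380_000_000   # same guest state, slightly overcounted

print(can_complete(accurate_estimate, bandwidth, downtime_limit_ms))  # True
print(can_complete(inflated_estimate, bandwidth, downtime_limit_ms))  # False
```

Only estimates that land close to the threshold flip the outcome, which is why the effect is hard to reproduce on demand.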

Comment 12 Nitesh Narayan Lal 2023-04-26 23:12:38 UTC

Juan, do you have the original environment/machine where you first observed this issue? Xiaohui will have to do some testing before marking this BZ as Tested. Or are you suggesting that we verify this bug with sanity testing only?

I am setting the ITR. Can you please share a tentative date or DTM by which you are planning to do the backport?
Thanks

Comment 13 Juan Quintela 2023-05-08 14:54:20 UTC

See

https://bugzilla.redhat.com/show_bug.cgi?id=2024555#c7

There I show the links to the upstream series that improve the statistics.  What is improved is the calculations.  There is no easy way for QE to check that they have improved.  So we are closing this bug as NOT A BUG.

We can check that it is better on the other bug.