Bug 1919863

Summary: dirty-rate is divided by calc-time when calculating the guest dirty rate
Product: Red Hat Enterprise Linux 9 Reporter: Li Xiaohui <xiaohli>
Component: qemu-kvm    Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Live Migration    QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED MIGRATED Docs Contact:
Severity: medium    
Priority: medium CC: chayang, jinzhao, juzhang, virt-maint
Version: 9.0    Keywords: MigratedToJIRA, Reopened, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-06-30 17:57:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Li Xiaohui 2021-01-25 10:13:29 UTC
Description of problem:
dirty-rate is divided by calc-time when calculating the guest dirty rate


Version-Release number of selected component (if applicable):
host info:
kernel-4.18.0-275.el8.x86_64 & qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64
guest info:
kernel-4.18.0-275.el8.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Boot a guest on the src host; see the qemu command [1].
2. Run stress in the guest:
# stressapptest -M 200 -s 100000
3. Query the dirty rate via QMP commands (a scripted version of this sequence is sketched after these steps):
(1) Scenario 1: set calc-time to 3:
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 3}}
(2) Scenario 2: set calc-time to 1:
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 1}}
After (1) or (2), check the dirty rate:
{"execute":"query-dirty-rate"}


Actual results:
After step 3, the dirty rates reported for Scenario 1 and Scenario 2 on the src host differ:
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 3}}
{"return": {}}
{"execute":"query-dirty-rate"}
{"return": {"status": "measured", "dirty-rate": 61, "start-time": 218710, "calc-time": 3}}
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 3}}
{"return": {}}
{"execute":"query-dirty-rate"}
{"return": {"status": "measured", "dirty-rate": 58, "start-time": 219020, "calc-time": 3}}
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 1}}
{"return": {}}
{"execute":"query-dirty-rate"}
{"return": {"status": "measured", "dirty-rate": 204, "start-time": 219044, "calc-time": 1}}
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 1}}
{"return": {}}
{"execute":"query-dirty-rate"}
{"return": {"status": "measured", "dirty-rate": 196, "start-time": 219068, "calc-time": 1}}
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 3}}
{"return": {}}
{"execute":"query-dirty-rate"}
{"return": {"status": "measured", "dirty-rate": 60, "start-time": 219082, "calc-time": 3}}
{"execute":"calc-dirty-rate", "arguments": {"calc-time": 1}}
{"return": {}}
{"execute":"query-dirty-rate"}

Note: I have checked the stressapptest program; it works well. 
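
For what it's worth, the numbers above are consistent with a fixed amount of uniquely dirtied memory being divided by the calc-time: 61 MB/s * 3 s ~= 183 MB and 204 MB/s * 1 s ~= 204 MB, i.e. roughly the same number of distinct dirty pages is counted in both scenarios.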


Expected results:
The dirty-rate shouldn't be divided by calc-time.


Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1833235#c14

Comment 1 Dr. David Alan Gilbert 2021-01-25 12:51:33 UTC
This may be a limitation of the mechanism used; it works by doing:

   a) Start dirty tracking
   b) Wait for calc-time
   c) Stop dirty tracking
   d) Count number of pages dirty

That can't tell the difference between a large area of memory that's slowly changed during 'calc-time'
and the same area of memory that's rapidly changed repeatedly.

Using the 1 second calc-time seems to make more sense here; any scaling seems bogus.

An interesting question is whether we could provide the user with the results
for different calc-times - that could then distinguish between the two cases.
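
To illustrate the effect, here is a toy model of the sampling scheme described above (a sketch, not QEMU code; the stressor's raw write throughput is an assumed number):

#!/usr/bin/env python3
# Toy model of the start/wait/stop/count scheme described in comment 1.
# The workload rewrites the same working set over and over; dirty-page
# tracking only records *which* pages were touched, not how many times.
working_set_mb = 200                # matches the stressapptest -M 200 workload
rewrite_rate_mb_s = 2000            # assumed raw write throughput of the stressor

for calc_time in (1, 3):
    written_mb = rewrite_rate_mb_s * calc_time
    # Tracking saturates at the working-set size: a page rewritten many
    # times during calc-time still counts as one dirty page.
    unique_dirty_mb = min(written_mb, working_set_mb)
    reported_rate = unique_dirty_mb / calc_time
    print(f"calc-time={calc_time}s  reported dirty-rate ~= {reported_rate:.0f} MB/s")

With these assumptions the model reports about 200 MB/s at calc-time 1 and about 67 MB/s at calc-time 3, close to the 204 and 61 observed in the report.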

Comment 2 John Ferlan 2021-02-08 19:37:17 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 5 Li Xiaohui 2021-07-15 09:11:06 UTC
Hi Dave,
I hit this bz on the latest rhel9.0; shall I clone it for rhel9.0?

Comment 6 John Ferlan 2021-09-08 21:28:23 UTC
Moving RHEL-AV bugs to RHEL9. If it is necessary to resolve this in RHEL8, clone it to the current RHEL8 release.

Comment 8 RHEL Program Management 2022-07-25 07:28:06 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 9 Li Xiaohui 2022-07-25 08:36:57 UTC
Reopening this bug, as I can reproduce it on the latest RHEL 8.7 and RHEL 9.1.

Comment 12 Li Xiaohui 2023-01-03 08:16:44 UTC
Extending the stale date by 6 months for this bug, as I can reproduce it on the latest RHEL 8.8 and RHEL 9.2.