Bug 2130490

Summary: Long periods of low bandwidth recur regularly during live migration of a 6TiB guest.
Product: Red Hat Enterprise Linux 8 Reporter: Chensheng Dong <chdong>
Component: qemu-kvm Assignee: Nitesh Narayan Lal <nilal>
qemu-kvm sub component: Live Migration QA Contact: Chensheng Dong <chdong>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: low    
Priority: low CC: coli, jinzhao, juzhang, nilal, peterx, virt-maint, xiaohli, xuhan
Version: 8.8 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-06-06 11:00:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chensheng Dong 2022-09-28 10:43:48 UTC
Description of problem:

I ran a live migration of a 6 TiB guest running SAP HANA DB (idle scenario) and recorded the output of `virsh domjobinfo rhel84-hana` every 3 seconds. The recorded memory status and bandwidth show long periods of low bandwidth (about 5 minutes each) that recur roughly every 5 minutes.

Please refer to the chart:
https://docs.google.com/spreadsheets/d/1mJAppNyge7wQ1yR8SqKNDv_3Lel2iA3LWVq0KBldabc/edit#gid=1553189673
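
Periods like these could also be picked out of the recorded data directly. The following is only a sketch, assuming a whitespace-separated data file with the sample time in column 1 and the bandwidth in MiB/s in column 5 (the layout written by the recording script under Additional info). The function name, the threshold, and the minimum run length are hypothetical and not part of the original test:

```shell
#!/bin/bash
# Sketch: report runs of consecutive samples whose bandwidth is below a threshold.
#   $1 = data file (column 1 = time, column 5 = bandwidth in MiB/s)
#   $2 = threshold in MiB/s (assumed value, choose per setup)
#   $3 = minimum number of consecutive samples to report (default 1)
# Output: one line per run: "<first time> <last time> <sample count>"
find_low_bw_periods() {
    awk -v thr="$2" -v min="${3:-1}" '
        # Print the current run if it is long enough, then reset it.
        function report_run() {
            if (n + 0 >= min + 0) print start, last, n
            n = 0
        }
        NF >= 5 {
            if ($5 + 0 < thr + 0) {
                if (n == 0) start = $1
                last = $1
                n++
            } else {
                report_run()
            }
        }
        END { report_run() }
    ' "$1"
}
```

For example, with a 3-second interval, `find_low_bw_periods sourcedata0928-1043.txt 100 20` would list every stretch of at least one minute below 100 MiB/s (both the file name and the numbers are illustrative).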

Version-Release number of selected component (if applicable):
Host: RHEL 8.6.z
Linux lenovo-sr950-02.lab.eng.pek2.redhat.com 4.18.0-372.27.1.el8_6.x86_64 #1 SMP Thu Sep 8 10:43:18 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
QEMU version:
QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-11.module+el8.6.0+16538+01ea313d.6)


How reproducible:


Steps to Reproduce:
1. First run: live-migrate with multi-fd and zero-copy, with downtime set to 300 ms. The command is: virsh migrate rhel84-hana qemu+tcp://192.168.10.20/system --live --parallel --parallel-connections 11 --migrateuri tcp://192.168.10.20 --verbose --bandwidth 999999999999999
2. Second run: live-migrate with multi-fd and without zero-copy, with downtime set to 1000 ms. The command is: virsh migrate rhel84-hana qemu+tcp://192.168.10.30/system --live --parallel --parallel-connections 5 --migrateuri tcp://192.168.10.30 --verbose --bandwidth 999999999999999
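
The commands above do not show how the downtime was set. One possible way, given here only as a sketch, is `virsh migrate-setmaxdowntime`, which applies to a domain while it is being live-migrated. The wrapper function and the `VIRSH` override are assumptions of this sketch, not something taken from the test:

```shell
#!/bin/bash
# Sketch: set the maximum tolerable downtime (in milliseconds) for a domain
# that is currently being live-migrated.
# VIRSH is a hypothetical override for dry runs (for example VIRSH=echo);
# it is a convention of this sketch, not a virsh feature.
set_max_downtime() {
    local domain=$1 downtime_ms=$2
    ${VIRSH:-virsh} migrate-setmaxdowntime "$domain" "$downtime_ms"
}

# Usage, from a second shell while the migration job is active:
#   set_max_downtime rhel84-hana 300
```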

Actual results:


Expected results:


Additional info:
The script for querying and recording the memory status and bandwidth:
#!/bin/bash
echo "There are 3 output files:
1. temp.log -- The most recent domjobinfo sample (overwritten every interval)
2. sourcedata<MMDD-HHMM>.txt -- The data which is ready for Google Sheets
3. migrate<YYYYMMDD>_<HHMMSS>.log -- Every domjobinfo sample with its date
"
MIGRATE_COMMAND='virsh domjobinfo rhel84-hana'
FINISH='true'
FILENAME=migrate$(date +%Y%m%d)_$(date +%H%M%S)
TIME=`date +%m%d-%H%M`
touch $FILENAME.log
# If no interval was passed as $1, prompt for it; otherwise use $1.
if [ -z "$1" ]
then
  read -r -p "Please specify the interval (seconds): " INTERVAL
else
  INTERVAL=$1
fi

function make_log {
    # Line 1 of temp.log is the timestamp; the domjobinfo output follows it.
    date > temp.log
    $MIGRATE_COMMAND >> temp.log
    cat temp.log >> "$FILENAME.log"
    migrate_time=$(sed -n "1p" temp.log | awk '{print $4}')

    #
    # Convert each memory value to GiB: TiB is multiplied by 1024, GiB is printed
    # as is, and any other unit is assumed to be MiB and divided by 1024.
    #
    mem_processed=$(sed -n "8p" temp.log | gawk '{if ($4 == "TiB") print $3 * 1024; else if ($4 == "GiB") print $3; else print $3 / 1024}')
    mem_remaining=$(sed -n "9p" temp.log | gawk '{if ($4 == "TiB") print $3 * 1024; else if ($4 == "GiB") print $3; else print $3 / 1024}')
    mem_total=$(sed -n "10p" temp.log | gawk '{if ($4 == "TiB") print $3 * 1024; else if ($4 == "GiB") print $3; else print $3 / 1024}')
    # Bandwidth is recorded in MiB/s; a GiB/s value is multiplied by 1024.
    mem_bandwidth=$(sed -n "11p" temp.log | gawk '{if ($4 == "GiB/s") print $3 * 1024 ; else print $3}')
    # Write one tab-separated row per sample (quoting keeps the tabs intact).
    printf '%s\t%s\t%s\t%s\t%s\n' "$migrate_time" "$mem_processed" "$mem_remaining" "$mem_total" "$mem_bandwidth" >> "sourcedata$TIME.txt"
}


while $FINISH; do
        # make_log already appends each timestamped sample to $FILENAME.log.
        make_log
        sleep "$INTERVAL"
        # Once the job is gone, domjobinfo reports an error: treat that as completion.
        TEST=$($MIGRATE_COMMAND 2>&1 | grep 'error')
        if [ -n "$TEST" ]
        then
                FINISH='false'
                echo 'Live migration completed!' >> "$FILENAME.log"
                break
        fi
done
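
The script above selects fields by line number (`sed -n "8p"` and so on), which depends on the exact layout of the domjobinfo output. A minimal sketch of matching by label instead, assuming the `Memory processed:`, `Memory remaining:` and `Memory total:` labels and the TiB/GiB/MiB units the script already handles; `job_field_gib` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: read `virsh domjobinfo` output on stdin, find the "Memory <name>:"
# line by its label, and print the value converted to GiB.
#   $1 = the word after "Memory" (processed, remaining, or total)
job_field_gib() {
    awk -v key="$1:" '
        $1 == "Memory" && $2 == key {
            if      ($4 == "TiB") print $3 * 1024
            else if ($4 == "GiB") print $3
            else if ($4 == "MiB") print $3 / 1024
            else {
                # Units other than these three are not handled in this sketch.
                print "unhandled unit: " $4 > "/dev/stderr"
                exit 1
            }
            exit
        }
    '
}

# Usage:
#   virsh domjobinfo rhel84-hana | job_field_gib remaining
```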

Comment 1 Chensheng Dong 2022-09-28 10:45:03 UTC
*** Bug 2130491 has been marked as a duplicate of this bug. ***

Comment 2 John Ferlan 2022-10-02 12:55:45 UTC
*** Bug 2130492 has been marked as a duplicate of this bug. ***

Comment 3 Nitesh Narayan Lal 2023-06-06 11:00:34 UTC
As per a discussion with Chensheng, I am closing this as fixed in the current release, since we have already successfully completed the 6TB testing.