Bug 1536357

Summary: Guest memory leak after adding/deleting vlan devices
Product: Red Hat Enterprise Linux 8 Reporter: Zhengtong <zhengtli>
Component: qemu-kvm Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Networking QA Contact: Lei Yang <leiyang>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: low    
Priority: low CC: aadam, ailan, chayang, jinzhao, juzhang, knoel, leiyang, pezhang, qzhang, rbalakri, virt-bugs, virt-maint, yuhuang
Version: 8.0 Keywords: Triaged
Target Milestone: rc   
Target Release: 8.1   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-08 07:26:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Zhengtong 2018-01-19 08:47:45 UTC
Description of problem:
After adding vlan devices to a network device in the guest and then removing them, guest memory usage stays higher than before, which indicates a memory leak.

Version-Release number of selected component (if applicable):
Host: 3.10.0-830.el7.x86_64
qemu-kvm-rhev-2.10.0-17.el7
Guest:3.10.0-831.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot the guest.
2. Inside the guest, disable swap and watch the memory usage with the following commands until step 4 has finished:
# swapoff -a
# watch -n.5 free -h
3. Inside the guest, add 300 (or more) vlan devices:

[root@dhcp-9-75 home]# sh adding.sh eth0 300
adding...1
adding...2
adding...3
...
adding...298
adding...299
adding...300

4. Inside the guest, delete the vlan devices added in step 3:
[root@dhcp-9-75 home]# sh deleting.sh eth0 300
deleting...1
deleting...2
deleting...3
...
deleting...298
deleting...299
deleting...300

Actual results:
The memory usage shown by watch has increased.

Expected results:
Memory usage does not increase.

Additional info:

[root@dhcp-9-75 home]# cat adding.sh 
if [ $# -ne 2 ]; then
    echo "parameter error"
    echo "sh adding.sh \$device \$vlan_number"
    exit 1
fi
for i in $(seq 1 $2); do
    echo "adding...$i"
    ip link add link $1 $1.$i type vlan id $i;
done
[root@dhcp-9-75 home]# cat deleting.sh 
if [ $# -ne 2 ]; then
    echo "parameter error"
    echo "sh deleting.sh \$device \$vlan_number"
    exit 1
fi
for i in $(seq 1 $2); do
    echo "deleting...$i"
    ip link delete $1.$i;
done
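
The two scripts above can be combined into a repeatable check. The following is only a sketch and was not part of the original report: the function names meminfo_kb, vlan_round and vlan_leak_check, and the IP_CMD and MEMINFO variables, are hypothetical helpers introduced here. It adds and deletes the vlan devices for several rounds and prints MemFree and MemAvailable after each round, so that steady growth across rounds can be told apart from a one-time allocation.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.
# IP_CMD defaults to the real "ip"; set IP_CMD=echo for a dry run.
IP_CMD=${IP_CMD:-ip}
# MEMINFO defaults to the live kernel file; point it at a saved copy if needed.
MEMINFO=${MEMINFO:-/proc/meminfo}

# Print the value of one /proc/meminfo field (for example MemFree), in kB.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "$MEMINFO"
}

# Add vlan ids 1..count on top of dev, then delete them again.
vlan_round() {
    dev=$1
    count=$2
    for i in $(seq 1 "$count"); do
        $IP_CMD link add link "$dev" name "$dev.$i" type vlan id "$i"
    done
    for i in $(seq 1 "$count"); do
        $IP_CMD link delete "$dev.$i"
    done
}

# Run several add/delete rounds and print memory counters after each one.
vlan_leak_check() {
    dev=$1
    count=$2
    rounds=$3
    for r in $(seq 1 "$rounds"); do
        vlan_round "$dev" "$count"
        echo "round $r: MemFree=$(meminfo_kb MemFree) kB MemAvailable=$(meminfo_kb MemAvailable) kB"
    done
}
```

After sourcing the file as root inside the guest, "vlan_leak_check eth0 300 5" would run five rounds of 300 devices each. Setting IP_CMD=echo first prints the ip commands instead of executing them.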

Comment 2 Vlad Yasevich 2018-04-02 13:01:24 UTC
I have been trying to reproduce this on an isolated system, to make sure that there is minimal interaction between other memory allocations and this test, and I am not seeing the issue.

On the first run, some memory appears not to be released, but on subsequent runs of the test there doesn't appear to be any additional memory growth.  The memory allocations we do see are the skb caches used by the kernel and the vlan array that is allocated the first time vlan devices are created.

I've also tried running with KMEMLEAK and am not seeing any reported leaks.

Could you please try running this test more than once on your system and see if the amount of memory allocated consistently increases?  Also make sure that you are seeing an increase in actual memory usage, not just in the cache.
Cache is allowed to increase as it will get re-used.

-vlad
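
One way to separate cache growth from actual usage, as requested above, is to compute a "used" figure that excludes reusable memory. The sketch below is an illustration added here, not something from the thread: the function name noncache_used_kb and the MEMINFO variable are hypothetical, and the choice to treat Buffers, Cached and SReclaimable as cache is an assumption (versions of free differ in exactly which fields they count as cache).

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
# MEMINFO defaults to the live kernel file; point it at a saved copy if needed.
MEMINFO=${MEMINFO:-/proc/meminfo}

# Print memory in use, in kB, excluding memory the kernel can reclaim.
# Assumption: Buffers, Cached and SReclaimable count as reusable cache.
noncache_used_kb() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "SReclaimable:" { sreclaim = $2 }
        END { print total - free - buffers - cached - sreclaim }
    ' "$MEMINFO"
}
```

If this figure keeps rising after every add/delete round, the growth is not just cache.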

Comment 4 juzhang 2018-04-03 01:35:25 UTC
Hi Xiyue,

Could you have a try?

Best Regards,
Junyi

Comment 5 xiywang 2018-04-03 09:32:24 UTC
Re-tested.

1. After booting the guest:
# swapoff -a
# watch -n.5 free -h
Every 0.5s: free -h    Tue Apr  3 17:23:51 2018

              total        used        free      shared  buff/cache   available
Mem:           3.7G        693M        2.6G         19M        476M        2.8G
Swap:            0B          0B          0B

# cat /proc/meminfo
MemTotal:        3880672 kB
MemFree:         2681412 kB
MemAvailable:    2890768 kB
Buffers:            2112 kB
Cached:           418596 kB
SwapCached:            0 kB
Active:           671800 kB
Inactive:         350600 kB
Active(anon):     602016 kB
Inactive(anon):    19696 kB
Active(file):      69784 kB
Inactive(file):   330904 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        601832 kB
Mapped:           111708 kB
Shmem:             20020 kB
Slab:              67720 kB
SReclaimable:      26808 kB
SUnreclaim:        40912 kB
KernelStack:        7152 kB
PageTables:        35320 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1940336 kB
Committed_AS:    3360420 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14224 kB
VmallocChunk:   34359720856 kB
HardwareCorrupted:     0 kB
AnonHugePages:     92160 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       87928 kB
DirectMap2M:     3057664 kB
DirectMap1G:     3145728 kB

2. Add 300 vlan devices:
# sh add.sh eth0 300
The output of "watch -n.5 free -h" changed to:
Every 0.5s: free -h    Tue Apr  3 17:25:26 2018

              total        used        free      shared  buff/cache   available
Mem:           3.7G        818M        2.4G         20M        493M        2.6G
Swap:            0B          0B          0B

# cat /proc/meminfo
MemTotal:        3880672 kB
MemFree:         2531116 kB
MemAvailable:    2747236 kB
Buffers:            2112 kB
Cached:           424080 kB
SwapCached:            0 kB
Active:           801488 kB
Inactive:         353744 kB
Active(anon):     730568 kB
Inactive(anon):    19692 kB
Active(file):      70920 kB
Inactive(file):   334052 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        729160 kB
Mapped:           112048 kB
Shmem:             21220 kB
Slab:              84132 kB
SReclaimable:      31768 kB
SUnreclaim:        52364 kB
KernelStack:        7152 kB
PageTables:        35436 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1940336 kB
Committed_AS:    3393644 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14236 kB
VmallocChunk:   34359720856 kB
HardwareCorrupted:     0 kB
AnonHugePages:     92160 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       87928 kB
DirectMap2M:     3057664 kB
DirectMap1G:     3145728 kB

3. Delete the 300 vlan devices in the guest:
# sh del.sh eth0 300
The output of "watch -n.5 free -h" changed to:
Every 0.5s: free -h    Tue Apr  3 17:27:12 2018

              total        used        free      shared  buff/cache   available
Mem:           3.7G        920M        2.3G         19M        489M        2.5G
Swap:            0B          0B          0B

# cat /proc/meminfo
MemTotal:        3880672 kB
MemFree:         2437112 kB
MemAvailable:    2652992 kB
Buffers:            2112 kB
Cached:           422940 kB
SwapCached:            0 kB
Active:           904436 kB
Inactive:         353720 kB
Active(anon):     833432 kB
Inactive(anon):    19692 kB
Active(file):      71004 kB
Inactive(file):   334028 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        833152 kB
Mapped:           112512 kB
Shmem:             20020 kB
Slab:              75720 kB
SReclaimable:      31168 kB
SUnreclaim:        44552 kB
KernelStack:        7056 kB
PageTables:        35900 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1940336 kB
Committed_AS:    3394224 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14236 kB
VmallocChunk:   34359720856 kB
HardwareCorrupted:     0 kB
AnonHugePages:    104448 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       87928 kB
DirectMap2M:     3057664 kB
DirectMap1G:     3145728 kB


We can see that free memory decreased from 2.6G to 2.3G after the test.
Moreover, after a second round of adding and deleting the vlan devices, free memory decreased to 2.1G.
Every 0.5s: free -h    Tue Apr  3 17:29:21 2018

              total        used        free      shared  buff/cache   available
Mem:           3.7G        1.1G        2.1G         19M        489M        2.3G
Swap:            0B          0B          0B

In the end, after running the add and delete cycle about 5 times, free memory was only 1.6G.
Every 0.5s: free -h    Tue Apr  3 17:31:01 2018

              total        used        free      shared  buff/cache   available
Mem:           3.7G        1.7G        1.6G         19M        489M        1.8G
Swap:            0B          0B          0B
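
Comparing the /proc/meminfo snapshots above field by field shows where the change landed. Between the snapshot taken after boot and the one taken after deleting the devices, for example, AnonPages went from 601832 kB to 833152 kB (+231320 kB), while Slab went from 67720 kB to 75720 kB (+8000 kB). The sketch below automates that comparison for two saved snapshots. The function name meminfo_diff is a hypothetical helper introduced here, not something used in the original test.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
# Usage: meminfo_diff BEFORE_FILE AFTER_FILE
# Prints every field whose value differs between two saved copies of
# /proc/meminfo, with the signed change (kB for most fields).
meminfo_diff() {
    awk '
        # First file: remember each field value.
        NR == FNR { before[$1] = $2; next }
        # Second file: report fields whose value changed.
        ($1 in before) && $2 != before[$1] {
            printf "%s %+d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}
```

Snapshots can be saved with "cat /proc/meminfo > before.txt" and "cat /proc/meminfo > after.txt" around a test round, then compared with "meminfo_diff before.txt after.txt".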

Comment 8 Wei 2019-02-20 09:30:59 UTC
It can be reproduced with the RHEL7 kernel 3.10.0-831, but there is no problem with the upstream 4.20.0 kernel.

RHEL7 3.10.0-831
    # free && ./add.sh eth0 3600 && free && ./delete.sh  eth0 
                  total        used        free      shared  buff/cache   available
    Mem:        3880936      117752     3612692        8508      150492     3546824
    Swap:       6160380           0     6160380
    [   54.174963] 8021q: 802.1Q VLAN Support v1.8
    [   54.175603] 8021q: adding VLAN 0 to HW filter on device eth0
    adding...
                  total        used        free      shared  buff/cache   available
    Mem:        3880936      308684     3205916       22804      366336     3166588
    Swap:       6160380           0     6160380
    deleting...
                  total        used        free      shared  buff/cache   available
    Mem:        3880936      316272     3350584        8532      214080     3303092
    Swap:       6160380           0     6160380

Upstream 4.20:
    # free && ./add.sh eth0 3600 && free && ./delete.sh  eth0 
                  total        used        free      shared  buff/cache   available
    Mem:        3771188      339792     3112068       16664      319328     3143888
    Swap:       6160380           0     6160380
    adding...
                  total        used        free      shared  buff/cache   available
    Mem:        3771188      343276     2924172       31032      503740     2965480
    Swap:       6160380           0     6160380
    deleting...
                  total        used        free      shared  buff/cache   available
    Mem:        3771188      341184     3111744       16736      318260     3142796
    Swap:       6160380           0     6160380

Comment 9 Wei 2019-02-21 03:08:36 UTC
Hi Xiyue,
Were you using a RHEL7 or a RHEL8 kernel? This bug has been moved to RHEL8; can you try it on RHEL8?

Comment 10 Lei Yang 2019-09-24 08:39:44 UTC
It can be reproduced with the RHEL8 kernel kernel-4.18.0-144.el8.x86_64.
1. After booting the guest:
# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.7Gi       788Mi       2.3Gi        12Mi       622Mi       2.7Gi
Swap:         2.0Gi          0B       2.0Gi

2. Add 4094 vlan devices.
# for i in $(seq 4094); do ip link add link eth0 name eth0.$i type vlan id $i; ip link set eth0.$i up; done
# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.7Gi       3.5Gi       107Mi        28Mi       103Mi        27Mi
Swap:         2.0Gi       559Mi       1.5Gi

3. Delete 4094 vlan devices.
# for i in $(seq 4094); do ip link delete eth0.$i ; done;
# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.7Gi       3.5Gi       101Mi       4.0Mi        77Mi        20Mi
Swap:         2.0Gi       747Mi       1.3Gi

Comment 12 Ademar Reis 2020-02-05 22:46:36 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 13 Lei Yang 2020-06-23 07:51:58 UTC
Hit the same issue with 'qemu-kvm-4.2.0-25.module+el8.3.0+6986+29a4dcd7.x86_64'.

Test Version:
qemu-kvm-4.2.0-25.module+el8.3.0+6986+29a4dcd7.x86_64
kernel-4.18.0-216.el8.x86_64

Guest: RHEL8.3.0

Comment 18 RHEL Program Management 2021-01-08 07:26:42 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 19 Lei Yang 2021-01-20 10:35:38 UTC
Since this problem has not been reproduced on the RHEL8.3-AV, RHEL8.4-virt and RHEL8.4-AV products, it may no longer exist, so I am changing the status to "CURRENTRELEASE".