Bug 660315 - [XEN][RHEL5.5] xen_net: Memory squeeze in netback driver.
Keywords:
Status: CLOSED DUPLICATE of bug 653262
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-12-06 13:41 UTC by asilva
Modified: 2018-11-14 16:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-06 15:05:16 UTC
Target Upstream Version:
Embargoed:



Description asilva 2010-12-06 13:41:17 UTC
> Description of problem:
The customer is experiencing interruptions of the Xen DomUs after the kernel logs messages such as "xen_net: Memory squeeze in netback driver".

> Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5 x86_64

> How reproducible:
At the moment, only in the customer's environment. We can ask for tests to be run there.

> Steps to Reproduce:

I ran the tests below to try to reproduce the behaviour, but unfortunately I cannot reproduce it in-house.

I will try again using kernel 2.6.18-194.el5xen (the customer's kernel on Dom0).

===================== TEST =========================================


       +------+                       +------+
 tg3   | eth0 |-----------------------| eth0 | tg3
       +------+                       +------+
          |                               |
          |                               |
          |                               |
   +------------+                   +------------+
   |   HOST 2   |                   |   HOST 1   |
   +------------+                   +------------+
          |                               |
   +------------+                   +------------+
   |   bridge   |                   |   bridge   |
   +------------+                   +------------+
          |                             | | |
     +--------+                         | | |
     |  VM4   |             +-----------+ | +-----------+
     +--------+             |             |             |
                            |        +--------+         |
                            |        |  VM2   |         |
                            |        +--------+         |
                       +--------+                  +--------+
                       |  VM1   |                  |  VM3   |
                       +--------+                  +--------+


Host 1
===========
- RHEL 5.5
- Kernel 2.6.18-194.17.4.el5xen
- 4G RAM

- Kernel/XEN params:
dom0_mem=2048M

- VMs running:
[root@host1 /]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     2048     4 r-----    136.6
rhel5_test01                               1      511     2 -b----     31.4
rhel5_test02                               2      511     2 -b----     28.3
rhel5_test03                               3      511     2 -b----     27.1
 

Host 2
===========
- RHEL 5.5
- Kernel 2.6.18-194.17.4.el5xen
- 2G RAM

[root@xen02 /]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1480     2 r-----    799.9
rhel5_test04                               1      511     1 -b----     35.9

========= Guest settings ====================

VM 1 (rhel5_test01) - PV
=============================
- 2 CPU 
- 512M 
- RHEL 5.5
- IP 10.1.1.13

VM 2 (rhel5_test02) - PV
============================
- 2 CPU 
- 512M 
- RHEL 5.5
- IP 10.1.1.24

VM 3 (rhel5_test03) - PV
===========================
- 2 CPU 
- 512M 
- RHEL 5.5
- IP 10.1.1.25

VM 4 (rhel5_test04) - PV
===========================
- 1 CPU 
- 512M 
- RHEL 5.5
- IP 10.1.1.16


Installing VMs
===================

1) Mounting ISO
# mount -o loop rhel55.iso /distro

2) Export the directory as NFS
# vi /etc/exports
/distro *(ro,sync)
             
3) Start nfs service

# service nfs start

4) Install your distro via NFS

# virt-install --paravirt --name rhel5_testXX --ram 512 --nographics --os-type=linux \
  --disk path=/var/lib/libvirt/images/rhel5_testXX.img,size=10 --location

Note: XX = VM number (01, 02, 03, 04)


================ Test =====================


VM1, VM2, VM3 to VM4
=============================

- Sending large packets (1500 bytes) from VM[1,2,3] to VM4:
[root@rhel5_test01 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024    1500   60.00      655177      0      131.0     0.20     0.256 
129024           60.00      641195             128.2     -1.00    -1.000


[root@rhel5_test02 /]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024    1500   60.00      637797      0      127.6     0.61     0.801 
129024           60.00      622325             124.5     -1.00    -1.000

[root@rhel5_test03 /]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024    1500   60.00      654484      0      130.9     0.23     0.299 
129024           60.00      639516             127.9     -1.00    -1.000


- Monitoring host 1 with vmstat. During the test the memory remained unchanged.
[root@host1 ~]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1087168  61760 585988    0    0    86    83  217  334  1  1 91  4  4
 3  0      0 1084696  61772 588264    0    0     0  1163 1737 2242  9  3 86  2  0
 5  0      0 1082836  61780 590040    0    0     0  1157 4005 1740 10  4 83  2  0
 4  0      0 1081224  61780 591456    0    0     0    37 6679 1479  5 13 81  0  1
 1  0      0 1079860  61792 592852    0    0     0   955 4290 1551  7 24 66  1  2
 5  0      0 1078372  61792 594384    0    0     0    16 6002 2096  2 20 75  0  3
 3  0      0 1076132  61804 596416    0    0     0   683 3997 1651  3 25 68  1  3
 2  0      0 1074024  61812 598452    0    0     0  1233 3773 1857 11 26 59  1  3
 2  0      0 1071784  61816 600528    0    0     0     0 4960 1681 12 26 59  0  3
 2  0      0 1069916  61824 602544    0    0     0  1164 4281 1786  8 26 62  1  3
 2  0      0 1068328  61824 604180    0    0     0     0 3911 1530  7 25 66  0  2
 2  0      0 1065724  61836 606476    0    0     0  1236 4660 2494  9 27 60  1  3
 3  0      0 1064848  61836 607252    0    0     0     8 3973 1227  5 24 68  0  3
 5  0      0 1062360  61848 609600    0    0     0   772 4260 2390 10 27 60  0  3
 8  0      0 1060624  61856 611248    0    0     0  1280 4601 2173  8 25 63  1  2
 2  0      0 1058136  61860 613684    0    0     0    25 4138 2453  7 26 64  0  3
 3  0      0 1056284  61868 615296    0    0     0  1120 4329 1709  8 24 65  1  3
 1  0      0 1054688  61872 616844    0    0     0     0 4472 1812  7 26 64  0  3
 5  0      0 1053184  61880 618272    0    0     0   860 4583 1813  8 24 65  1  2
 3  0      0 1050952  61888 620384    0    0     0  1040 4300 2207 10 27 61  0  3
 4  0      0 1049332  61892 622016    0    0     0     0 4189 1858  7 26 65  0  2

- Sending small packets (10 bytes) from VM[1,2,3] to VM4:

[root@rhel5_test01 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024      10   60.00     5993583      0        8.9     53.21    5122.857
129024           60.00     3173414               4.5     -1.00    -1.000

[root@rhel5_test02 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024      10   60.00     5890275      0        7.9     39.48    1429.859
129024           60.00     3393410               4.5     -1.00    -1.000

[root@rhel5_test03 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

129024      10   60.00     6524444      0        8.7     51.12    8289.897
129024           60.00      757835               1.0     -1.00    -1.000
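
The gap between the two rows of each netperf result is worth quantifying. For UDP_STREAM the first row counts the messages the sender handed to its local stack, and the second row counts the messages the receiving netperf actually saw. A minimal sketch, with the message counts copied from the output above (the name `loss_pct` is just a helper made up for this note):

```python
# Messages sent (first row) and received (second row) per VM and message
# size, copied from the netperf UDP_STREAM output in this report.
RESULTS = {
    ("rhel5_test01", 1500): (655177, 641195),
    ("rhel5_test02", 1500): (637797, 622325),
    ("rhel5_test03", 1500): (654484, 639516),
    ("rhel5_test01", 10): (5993583, 3173414),
    ("rhel5_test02", 10): (5890275, 3393410),
    ("rhel5_test03", 10): (6524444, 757835),
}


def loss_pct(sent, received):
    """Percentage of sent messages that the receiver did not count."""
    return 100.0 * (sent - received) / sent


for (vm, size), (sent, received) in sorted(RESULTS.items()):
    print("%s  -m %-4d  lost %.1f%%" % (vm, size, loss_pct(sent, received)))
```

With these counts the 1500-byte runs lose roughly 2% of the messages, while the 10-byte runs lose between about 42% and 88%. Even so, the squeeze message did not show up in this setup.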

- Monitoring host 1 with vmstat. During the test the memory remained unchanged

[root@xen01 ~]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  1      0 354444  66100 1329260    0    0    74    92  251  339  1  1 91  4  3
 4  1      0 354444  66100 1329260    0    0     0     0 1743  427  0  0 75 25  0
 4  1      0 354320  66100 1329260    0    0     0     0 1803  485  0  0 75 25  0
 4  1      0 354320  66100 1329260    0    0     0     3 1821  483  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1839  484  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0    16 1800  498  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1820  483  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1798  477  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1780  482  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1754  481  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1932  406  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1728  402  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1735  408  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1723  404  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1724  405  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1734  403  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1690  405  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 2026  404  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1612  407  0  0 75 25  0
 4  1      0 354320  66100 1329256    0    0     0     0 1640  479  0  0 75 25  0
 0  0      0 355948  66796 1329260    0    0     0   551  753  886  0  0 94  6  0
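
To check the "memory remained unchanged" observation against the samples, the `free` column can be compared between the start and the end of each run. A rough sketch, assuming the sample lines are pasted in as strings; `free_kib` is a hypothetical helper, and the unit is taken to be vmstat's default of KiB:

```python
def free_kib(vmstat_line):
    """Return the 'free' column (4th field) of one vmstat sample line."""
    return int(vmstat_line.split()[3])


# First sample and last sample under load for each run, copied from the
# vmstat output in this report.
big_first = " 0  0      0 1087168  61760 585988    0    0    86    83  217  334  1  1 91  4  4"
big_last = " 4  0      0 1049332  61892 622016    0    0     0     0 4189 1858  7 26 65  0  2"
small_first = " 4  1      0 354444  66100 1329260    0    0    74    92  251  339  1  1 91  4  3"
small_last = " 4  1      0 354320  66100 1329256    0    0     0     0 1640  479  0  0 75 25  0"

for name, first, last in (("1500-byte run", big_first, big_last),
                          ("10-byte run", small_first, small_last)):
    delta = free_kib(last) - free_kib(first)
    print("%s: free changed by %d KiB" % (name, delta))
```

Free memory in dom0 drops by 37836 KiB during the 1500-byte run and by 124 KiB during the 10-byte run, so there is no sign of a sudden squeeze in either case.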

> Actual results:
The customer is getting the following message: "xen_net: Memory squeeze in netback driver"

[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:207) XendDomainInfo.create(['vm', ['name', 'sapvmhlbe5'], ['memory', 8192], ['maxmem', 80000], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['vcpus', 16], ['uuid', '806352c9
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:329) parseConfig: config is ['vm', ['name', 'sapvmhlbe5'], ['memory', 8192], ['maxmem', 80000], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['vcpus', 16], ['uuid', '806352c
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:446) parseConfig: result is {'features': None, 'image': ['linux', ['kernel', '/etc/xen/boot/sapvmhlbe5/vmlinuz-2.6.18-194.el5xen'], ['ramdisk', '/etc/xen/boot/sapvmhlbe5/initrd_osr-2.6.18-194.el5xen.img']],
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1784) XendDomainInfo.construct: None
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:145) Balloon: 7569796 KiB free; need 4096; done.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1953) XendDomainInfo.initDomain: 43 256
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:229) XendDomainInfo.recreate({'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0, 'shutdown_reason': 0, 'dying': 0, 'mem_kb': 0L, 'domid': 43, 'max_vcpu_id': 15, 'crashed': 0, 'running': 0, 'maxmem_kb': 0L,
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] INFO (XendDomainInfo:241) Recreating domain 43, UUID 806352c9-0f40-4e9c-9b4b-8476b049b166.
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:222) Failed to recreate information for domain 43.  Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:228) Destruction of 43 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1943) allocating 4 NUMA nodes
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:229) XendDomainInfo.recreate({'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0, 'shutdown_reason': 0, 'dying': 0, 'mem_kb': 0L, 'domid': 43, 'max_vcpu_id': 15, 'crashed': 0, 'running': 0, 'maxmem_kb': 0L,
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] INFO (XendDomainInfo:241) Recreating domain 43, UUID 806352c9-0f40-4e9c-9b4b-8476b049b166.
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:222) Failed to recreate information for domain 43.  Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:228) Destruction of 43 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1994) _initDomain:shadow_memory=0x0, maxmem=0x13880, memory=0x2000.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:151) Balloon: 7569788 KiB free; 0 to scrub; need 8388608; retries: 20.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5342 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5342 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:21 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5343 MiB.
[2010-10-29 10:34:21 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5343 MiB.
[2010-10-29 10:34:21 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:21 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:22 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:22 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:23 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:23 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:24 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:24 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:25 xend 23443] DEBUG (balloon:145) Balloon: 8388852 KiB free; need 8388608; done.
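
The balloon lines in this log carry the numbers that matter here: how much memory Xen has free and how much the new domain needs. A minimal sketch that pulls both out of a xend.log line and returns the shortfall. The regular expression is written against the two line shapes seen above and is an assumption about the format, not something taken from xend itself. The "need" value is assumed to be in KiB, since 8388608 KiB equals the 8192 MiB (memory=0x2000) requested for the domain.

```python
import re

# Matches both "Balloon: N KiB free; need M; done." and
# "Balloon: N KiB free; 0 to scrub; need M; retries: 20."
BALLOON_RE = re.compile(r"Balloon: (\d+) KiB free;.*?need (\d+);")


def shortfall_kib(line):
    """Return need - free in KiB (negative when enough is free), or None."""
    match = BALLOON_RE.search(line)
    if match is None:
        return None
    free, need = int(match.group(1)), int(match.group(2))
    return need - free


# Two lines copied from the log in this report.
before = ("[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:151) Balloon: "
          "7569788 KiB free; 0 to scrub; need 8388608; retries: 20.")
after = ("[2010-10-29 10:34:25 xend 23443] DEBUG (balloon:145) Balloon: "
         "8388852 KiB free; need 8388608; done.")

print(shortfall_kib(before))
print(shortfall_kib(after))
```

The first line is short by 818820 KiB (about 800 MiB). That is the gap xend works on over the following six seconds, while it keeps setting the dom0 target, until the last line reports enough free memory.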


> Expected results:
Understand why the customer is getting these messages.

> Additional info:
The XenSource Bugzilla has a similar issue: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762

> Customer Hardware:
System Information
	Manufacturer: HP
	Product Name: ProLiant DL380 G6
	Family: ProLiant

# cat lspci | egrep Ethernet
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
07:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
07:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0a:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
0a:00.1 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
0d:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0d:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

Cheers,
Alberto Silva

Comment 1 Laszlo Ersek 2010-12-06 15:05:16 UTC
This is a duplicate of bug 653262. The fix for bug 653501 is also needed.

The guests' netfront drivers (in flip mode) release pages to the host's netback driver, but those pages are consumed by the host's balloon driver. (Looking at the log above, dom0 is actively trying to balloon up.) The netfronts have to be switched to copying mode (the fix for bug 653262 makes this the default), but the host must also be stopped from trying to balloon up temporarily for copying receivers (the fix for bug 653501).

*** This bug has been marked as a duplicate of bug 653262 ***

