> Description of problem:
This customer is experiencing interruptions of the Xen DomUs after the kernel spits out messages like "xen_net: Memory squeeze in netback driver".

> Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5 x86_64

> How reproducible:
At the moment only in the customer environment. We are able to ask for tests there.

> Steps to Reproduce:
I ran the tests below to try to reproduce the behaviour, but unfortunately I cannot reproduce it in house. I will try again using kernel 2.6.18-194.el5xen (the customer's kernel on Dom0).

===================== TEST =========================================

      +------+                       +------+
  tg3 | eth0 |-----------------------| eth0 | tg3
      +------+                       +------+
          |                              |
          |                              |
          |                              |
   +------------+                 +------------+
   |   HOST 2   |                 |   HOST 1   |
   +------------+                 +------------+
          |                              |
    +-----------+                  +-----------+
    |  bridge   |                  |  bridge   |
    +-----------+                  +-----------+
          |                          |   |   |
     +--------+                      |   |   |
     |  VM4   |           +----------+   |   +------------+
     +--------+           |              |                |
                          |          +--------+           |
                          |          |  VM2   |           |
                          |          +--------+           |
                      +--------+                      +--------+
                      |  VM1   |                      |  VM3   |
                      +--------+                      +--------+

Host 1
===========
- RHEL 5.5
- Kernel 2.6.18-194.17.4.el5xen
- 4G RAM
- Kernel/XEN params: dom0_mem=2048M
- VMs Running:

[root@host1 /]# xm list
Name          ID Mem(MiB) VCPUs State  Time(s)
Domain-0       0     2048     4 r-----   136.6
rhel5_test01   1      511     2 -b----    31.4
rhel5_test02   2      511     2 -b----    28.3
rhel5_test03   3      511     2 -b----    27.1

Host 2
===========
- RHEL 5.5
- Kernel 2.6.18-194.17.4.el5xen
- 2G RAM

[root@xen02 /]# xm list
Name          ID Mem(MiB) VCPUs State  Time(s)
Domain-0       0     1480     2 r-----   799.9
rhel5_test04   1      511     1 -b----    35.9

========= Guest settings ====================

VM 1 (rhel5_test01) - PV
=============================
- 2 CPU
- 512M
- RHEL 5.5
- IP 10.1.1.13

VM 2 (rhel5_test02) - PV
============================
- 2 CPU
- 512M
- RHEL 5.5
- IP 10.1.1.24

VM 3 (rhel5_test03) - PV
===========================
- 2 CPU
- 512M
- RHEL 5.5
- IP 10.1.1.25

VM 4 (rhel5_test04) - PV
===========================
- 1 CPU
- 512M
- RHEL 5.5
- IP 10.1.1.16

Installing VMs
===================
1) Mounting ISO
   # mount -o loop rhel55.iso /distro
2) Export the directory as NFS
   # vi /etc/exports
   /distro *(ro,sync)
3) Start nfs service
   # service nfs start
4) Install your distro via NFS
   # virt-install --paravirt --name rhel5_testXX --ram 512 --nographics --os-type=linux --disk path=/var/lib/libvirt/images/rhel5_testXX.img,size=10 --location

Obs: XX = VM number (01, 02, 03, 04)

================ Test =====================

VM1, VM2, VM3 to VM4
=============================
- Sending big pkgs (1500) from VM[1,2,3] to VM4:

[root@rhel5_test01 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024    1500   60.00      655177      0      131.0     0.20     0.256
129024           60.00      641195             128.2     -1.00    -1.000

[root@rhel5_test02 /]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024    1500   60.00      637797      0      127.6     0.61     0.801
129024           60.00      622325             124.5     -1.00    -1.000

[root@rhel5_test03 /]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 1500
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024    1500   60.00      654484      0      130.9     0.23     0.299
129024           60.00      639516             127.9     -1.00    -1.000

- Monitoring host 1 with vmstat. During the test the memory remained unchanged.
[root@host1 ~]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd    free  buff   cache si so bi   bo   in   cs us sy id wa st
 0  0    0 1087168 61760  585988  0  0 86   83  217  334  1  1 91  4  4
 3  0    0 1084696 61772  588264  0  0  0 1163 1737 2242  9  3 86  2  0
 5  0    0 1082836 61780  590040  0  0  0 1157 4005 1740 10  4 83  2  0
 4  0    0 1081224 61780  591456  0  0  0   37 6679 1479  5 13 81  0  1
 1  0    0 1079860 61792  592852  0  0  0  955 4290 1551  7 24 66  1  2
 5  0    0 1078372 61792  594384  0  0  0   16 6002 2096  2 20 75  0  3
 3  0    0 1076132 61804  596416  0  0  0  683 3997 1651  3 25 68  1  3
 2  0    0 1074024 61812  598452  0  0  0 1233 3773 1857 11 26 59  1  3
 2  0    0 1071784 61816  600528  0  0  0    0 4960 1681 12 26 59  0  3
 2  0    0 1069916 61824  602544  0  0  0 1164 4281 1786  8 26 62  1  3
 2  0    0 1068328 61824  604180  0  0  0    0 3911 1530  7 25 66  0  2
 2  0    0 1065724 61836  606476  0  0  0 1236 4660 2494  9 27 60  1  3
 3  0    0 1064848 61836  607252  0  0  0    8 3973 1227  5 24 68  0  3
 5  0    0 1062360 61848  609600  0  0  0  772 4260 2390 10 27 60  0  3
 8  0    0 1060624 61856  611248  0  0  0 1280 4601 2173  8 25 63  1  2
 2  0    0 1058136 61860  613684  0  0  0   25 4138 2453  7 26 64  0  3
 3  0    0 1056284 61868  615296  0  0  0 1120 4329 1709  8 24 65  1  3
 1  0    0 1054688 61872  616844  0  0  0    0 4472 1812  7 26 64  0  3
 5  0    0 1053184 61880  618272  0  0  0  860 4583 1813  8 24 65  1  2
 3  0    0 1050952 61888  620384  0  0  0 1040 4300 2207 10 27 61  0  3
 4  0    0 1049332 61892  622016  0  0  0    0 4189 1858  7 26 65  0  2

- Sending small pkgs (10) from VM[1,2,3] to VM4:

[root@rhel5_test01 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024      10   60.00     5993583      0        8.9     53.21    5122.857
129024           60.00     3173414               4.5     -1.00    -1.000

[root@rhel5_test02 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024      10   60.00     5890275      0        7.9     39.48    1429.859
129024           60.00     3393410               4.5     -1.00    -1.000

[root@rhel5_test03 ~]# netperf -c -H 10.1.1.16 -l 60 -t UDP_STREAM -- -m 10
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.16 (10.1.1.16) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
129024      10   60.00     6524444      0        8.7     51.12    8289.897
129024           60.00      757835               1.0     -1.00    -1.000

- Monitoring host 1 with vmstat. During the test the memory remained unchanged

[root@xen01 ~]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd    free  buff   cache si so bi   bo   in   cs us sy id wa st
 4  1    0  354444 66100 1329260  0  0 74   92  251  339  1  1 91  4  3
 4  1    0  354444 66100 1329260  0  0  0    0 1743  427  0  0 75 25  0
 4  1    0  354320 66100 1329260  0  0  0    0 1803  485  0  0 75 25  0
 4  1    0  354320 66100 1329260  0  0  0    3 1821  483  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1839  484  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0   16 1800  498  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1820  483  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1798  477  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1780  482  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1754  481  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1932  406  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1728  402  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1735  408  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1723  404  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1724  405  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1734  403  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1690  405  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 2026  404  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1612  407  0  0 75 25  0
 4  1    0  354320 66100 1329256  0  0  0    0 1640  479  0  0 75 25  0
 0  0    0  355948 66796 1329260  0  0  0  551  753  886  0  0 94  6  0

> Actual results:
getting the following messages "xen_net: Memory squeeze in netback driver"

[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:207) XendDomainInfo.create(['vm', ['name', 'sapvmhlbe5'], ['memory', 8192], ['maxmem', 80000], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['vcpus', 16], ['uuid', '806352c9
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:329) parseConfig: config is ['vm', ['name', 'sapvmhlbe5'], ['memory', 8192], ['maxmem', 80000], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['vcpus', 16], ['uuid', '806352c
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:446) parseConfig: result is {'features': None, 'image': ['linux', ['kernel', '/etc/xen/boot/sapvmhlbe5/vmlinuz-2.6.18-194.el5xen'], ['ramdisk', '/etc/xen/boot/sapvmhlbe5/initrd_osr-2.6.18-194.el5xen.img']],
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1784) XendDomainInfo.construct: None
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:145) Balloon: 7569796 KiB free; need 4096; done.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1953) XendDomainInfo.initDomain: 43 256
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:229) XendDomainInfo.recreate({'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0, 'shutdown_reason': 0, 'dying': 0, 'mem_kb': 0L, 'domid': 43, 'max_vcpu_id': 15, 'crashed': 0, 'running': 0, 'maxmem_kb': 0L,
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] INFO (XendDomainInfo:241) Recreating domain 43, UUID 806352c9-0f40-4e9c-9b4b-8476b049b166.
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:222) Failed to recreate information for domain 43. Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:228) Destruction of 43 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1943) allocating 4 NUMA nodes
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:229) XendDomainInfo.recreate({'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0, 'shutdown_reason': 0, 'dying': 0, 'mem_kb': 0L, 'domid': 43, 'max_vcpu_id': 15, 'crashed': 0, 'running': 0, 'maxmem_kb': 0L,
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] INFO (XendDomainInfo:241) Recreating domain 43, UUID 806352c9-0f40-4e9c-9b4b-8476b049b166.
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:222) Failed to recreate information for domain 43. Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-10-29 10:34:19 xend 23443] ERROR (XendDomain:228) Destruction of 43 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1994) _initDomain:shadow_memory=0x0, maxmem=0x13880, memory=0x2000.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:151) Balloon: 7569788 KiB free; 0 to scrub; need 8388608; retries: 20.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:19 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5342 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5342 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:20 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:21 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5343 MiB.
[2010-10-29 10:34:21 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5343 MiB.
[2010-10-29 10:34:21 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:21 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:22 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:22 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:23 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:23 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:24 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.
[2010-10-29 10:34:24 xend.XendDomainInfo 23443] DEBUG (XendDomainInfo:1346) Setting memory target of domain Domain-0 (0) to 5344 MiB.
[2010-10-29 10:34:25 xend 23443] DEBUG (balloon:145) Balloon: 8388852 KiB free; need 8388608; done.

> Expected results:
Understand why the customer is getting these messages.

> Additional info:
The XenSource Bugzilla has a similar issue:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762

> Customer Hardware:
System Information
  Manufacturer: HP
  Product Name: ProLiant DL380 G6
  Family: ProLiant

# cat lspci | egrep Ethernet
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
07:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
07:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0a:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
0a:00.1 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
0d:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0d:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

Cheers,
Alberto Silva

This is a duplicate of bug 653262. The fix for bug 653501 is also needed.

The guests' netfront drivers (in flip mode) release pages to the host's netback driver, but those pages are consumed by the host's balloon driver. (Looking at the log above, dom0 is actively trying to balloon up.) The netfronts have to be switched to copying mode (the fix for bug 653262 makes this the default), and the host must also be stopped from trying to balloon up temporarily for copying receivers (the fix for bug 653501).

*** This bug has been marked as a duplicate of bug 653262 ***
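
For anyone triaging a similar report, the dom0 ballooning activity referred to above can be pulled out of an xend log with a few lines of Python. This is only a rough sketch: the two regular expressions are derived solely from the "Balloon:" lines quoted in this report, so other xend versions may word them differently, and the function name balloon_summary and the SAMPLE list are made up for illustration.

```python
import re

# Patterns inferred from the log lines quoted in this bug report only;
# they are an assumption about the format, not a documented xend interface.
TARGET_RE = re.compile(r"Balloon: setting dom0 target to (\d+) MiB\.")
FREE_RE = re.compile(r"Balloon: (\d+) KiB free;(?: \d+ to scrub;)? need (\d+);")


def balloon_summary(lines):
    """Return (dom0 targets in MiB, memory shortfalls in KiB) in log order."""
    targets = []
    shortfalls = []
    for line in lines:
        match = TARGET_RE.search(line)
        if match:
            targets.append(int(match.group(1)))
            continue
        match = FREE_RE.search(line)
        if match:
            free_kib = int(match.group(1))
            need_kib = int(match.group(2))
            # A shortfall of 0 means enough memory was already free.
            shortfalls.append(max(need_kib - free_kib, 0))
    return targets, shortfalls


# A few lines copied from the log in this report.
SAMPLE = [
    "[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:151) Balloon: 7569788 KiB free; 0 to scrub; need 8388608; retries: 20.",
    "[2010-10-29 10:34:19 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5344 MiB.",
    "[2010-10-29 10:34:20 xend 23443] DEBUG (balloon:166) Balloon: setting dom0 target to 5342 MiB.",
    "[2010-10-29 10:34:25 xend 23443] DEBUG (balloon:145) Balloon: 8388852 KiB free; need 8388608; done.",
]

targets, shortfalls = balloon_summary(SAMPLE)
print("dom0 targets (MiB):", targets)
print("shortfalls (KiB):", shortfalls)
```

On the sample lines this reports a shortfall of 818820 KiB when the 8 GiB guest is created, followed by repeated dom0 target adjustments around 5344 MiB until enough memory is free. To run it against a whole log, pass the file's lines to balloon_summary instead of SAMPLE.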