Bug 454285

Summary: xen_net: Memory squeeze in netback driver
Product: Red Hat Enterprise Linux 5    Reporter: Monty Walls <mwalls>
Component: kernel-xen    Assignee: Chris Lalancette <clalance>
Status: CLOSED DUPLICATE    QA Contact: Martin Jenner <mjenner>
Severity: medium    Docs Contact:
Priority: low
Version: 5.2    CC: beres.laszlo, berrange, clalance, daniel.brnak, dwysocha, dzickus, grimme, herrold, jmh, mschick, orion, pbonzini, syeghiay, tao, tomg, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
When running multiple guest domains, guest networking may temporarily stop working, resulting in the following error being reported in the dom0 logs: "Memory squeeze in netback driver". To work around this, raise the amount of memory available to the dom0 with the dom0_mem hypervisor command line option.
Story Points: ---
Clone Of:    Environment:
Last Closed: 2009-05-06 08:42:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 454962, 492568    

Description Monty Walls 2008-07-07 14:47:03 UTC
With only 4 VMs running on a quad-Opteron v40z (32GB RAM) running currently
patched RHEL 5.2 (memory allocation to VMs = 22GB reserved, 18GB allocated, 12GB
free RAM in dom0).

Current network setup is 
xenbr0 = 10.254.0.0/16 = eth0
xenbr1 = 10.1.250.0/16 = eth1
crossover = eth2 (down)

with 3 vm's dual-homed on xenbr0 & xenbr1, and 1 vm with only xenbr1.

all are running production (2 oracle servers, 1 phpbbs server, 1 tomcat apps
server).

Under RHEL 5.1 I did not get this error, but under 5.2 I'm getting the netback
driver memory squeeze:

Jul  7 07:23:19 fortuna kernel: xen_net: Memory squeeze in netback driver.
Jul  7 07:24:18 fortuna kernel: xen_net: Memory squeeze in netback driver.
Jul  7 07:32:00 fortuna kernel: xen_net: Memory squeeze in netback driver.
Jul  7 07:36:44 fortuna kernel: xen_net: Memory squeeze in netback driver.
Jul  7 08:28:18 fortuna kernel: xen_net: Memory squeeze in netback driver.
Jul  7 09:30:01 fortuna kernel: xen_net: Memory squeeze in netback driver.

Comment 1 Jan Mark Holzer 2008-07-07 15:06:59 UTC
Hi,

One workaround/solution is to specify dom0_mem=XXMB on your "xen" line in
/etc/grub.conf. This will force dom0 to a pre-specified amount of memory and
avoid the 'Memory squeeze in netback driver' error you are seeing. With your
current configuration (32GB) you should be OK with 1GB-2GB for dom0_mem.

The entry in /etc/grub.conf would look similar to the following; note the
"xen" line, where we allocate 2GB of memory to dom0:

title Red Hat Enterprise Linux Server (2.6.18-90.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-90.el5 dom0_mem=2G
	module /vmlinuz-2.6.18-90.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
	module /initrd-2.6.18-90.el5xen.img

- Jan
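
After making Jan's edit and rebooting, it is easy to miss a xen kernel stanza that did not get the option. A minimal sketch of a sanity check (it writes a throwaway sample file instead of touching the real /etc/grub.conf, so the path and contents below are illustrative only):

```shell
# Write a sample grub.conf resembling the one in comment 1 (illustrative path).
cat > /tmp/grub.conf.sample <<'EOF'
title Red Hat Enterprise Linux Server (2.6.18-90.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-90.el5 dom0_mem=2G
	module /vmlinuz-2.6.18-90.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
	module /initrd-2.6.18-90.el5xen.img
EOF

# Flag any hypervisor ("kernel /xen...") line that lacks dom0_mem=.
# (If there are no xen kernel lines at all, this also reports "present".)
if grep '^[[:space:]]*kernel /xen' /tmp/grub.conf.sample | grep -qv 'dom0_mem='; then
    echo "missing dom0_mem on a xen kernel line"
else
    echo "dom0_mem present on all xen kernel lines"
fi
```

On a real host you would point the two `grep` commands at /etc/grub.conf (or /boot/grub/grub.conf) instead of the sample file.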


Comment 2 Laszlo Beres 2008-07-21 13:56:21 UTC
We have the same issue with dom0_mem=512M setting, I assume we cannot lower that.

Comment 3 Tom Georgoulias 2008-07-22 17:56:24 UTC
I am also experiencing this issue, using xen-3.0.3-64.el5_2.1 and
kernel-xen-2.6.18-92.1.1.el5 (32-bit).  Server has 16GB of RAM.

I just rebooted the server with dom0_mem=1g.  Here's what my grub.conf looks like:

title Red Hat Enterprise Linux Server (2.6.18-92.1.1.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-92.1.1.el5 dom0_mem=1g
	module /vmlinuz-2.6.18-92.1.1.el5xen ro root=/dev/vg0/root
	module /initrd-2.6.18-92.1.1.el5xen.img

Comment 4 Bill Burns 2008-07-23 12:34:20 UTC
Re comment #2, the idea is to raise the memory for dom0, not lower it. That is
where the netback driver runs and where the memory is needed.


Comment 5 Tom Georgoulias 2008-07-28 14:39:07 UTC
Is there a rule of thumb that I should use to properly determine how much memory
should be allocated to the dom0 when using this setting?  It seems like the number
of domUs that will be running, as well as the amount of RAM in the server, would
play a role in configuring this properly, but I could be wrong.
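
No official formula is given in this thread; the replies only suggest 1GB-2GB of dom0 memory on 16GB-32GB hosts. Purely as an illustrative sketch of the kind of heuristic the question asks about (the base, per-guest, and floor constants below are assumptions, not Red Hat guidance), one could compute a starting point like this:

```shell
# Illustrative heuristic only: a fixed base, a small per-domU allowance,
# and a 1024 MB floor on hosts with >= 16 GB RAM. All constants assumed.
suggest_dom0_mem() {
    total_mb=$1   # total host RAM in MB
    ndomu=$2      # number of domUs expected to run
    mem=$(( 512 + 64 * ndomu ))
    if [ "$total_mb" -ge 16384 ] && [ "$mem" -lt 1024 ]; then
        mem=1024
    fi
    echo "$mem"
}

suggest_dom0_mem 32768 4   # the reporter's host: 32 GB RAM, 4 domUs -> 1024
```

Whatever number such a heuristic produces would then go on the hypervisor line as dom0_mem, as described in comment 1.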

Comment 15 Chris Lalancette 2008-12-02 14:27:34 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When running multiple guest domains, you may see an error in the dom0 logs that says "Memory squeeze in netback driver", and guest networking may temporarily stop working.  You may be able to work around this issue by specifying "dom0_mem" on the hypervisor command-line when you boot.  For instance, if you have a machine with 16GB of memory, you can try to add "dom0_mem=2GB" on the hypervisor command-line.

Comment 16 Ryan Lerch 2008-12-03 02:52:34 UTC
This release note is now in the 5.3 Release Notes.
The changes should be visible internally (within 12 hours) on:
http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html/Release_Notes/

If there are any changes required, please edit the "Release Notes" field above
and set the requires_release_notes flag back to ?

Comment 17 Ryan Lerch 2008-12-03 02:52:34 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,5 @@
-When running multiple guest domains, you may see an error in the dom0 logs that says "Memory squeeze in netback driver", and guest networking may temporarily stop working.  You may be able to work around this issue by specifying "dom0_mem" on the hypervisor command-line when you boot.  For instance, if you have a machine with 16GB of memory, you can try to add "dom0_mem=2GB" on the hypervisor command-line.+When running multiple guest domains, guest networking may temporarily stop working, resulting in the following error being reported in the dom0 logs:
+			
+Memory squeeze in netback driver
+			
+To work around this, raise the amount of memory available to the dom0 with the dom0_mem hypervisor command line option.

Comment 18 Chris Lalancette 2009-01-09 17:13:04 UTC
*** Bug 456328 has been marked as a duplicate of this bug. ***

Comment 19 Chris Lalancette 2009-01-15 14:47:41 UTC
I've uploaded a test kernel that contains this fix (along with several others)
to this location:

http://people.redhat.com/clalance/virttest

Could the original reporter try out the test kernels there, and report back if
it fixes the problem?

Thanks,
Chris Lalancette

P.S.  In my own testing so far, this hasn't seemed to change much for me.  But I'd still like to see the results from other people testing.

Comment 20 Dave Wysochanski 2009-02-11 01:38:38 UTC
I upgraded a RHEL5.2 box (amd x86_64, 6GB RAM) to RHEL5.3 and now all my xen domUs have been crippled by this bug.  I will try your fix.

Previously I was running 2.6.18-92.1.22 with no issues.  This would point to a regression between this version and 2.6.18-128.1.1.

Comment 21 Dave Wysochanski 2009-02-11 02:25:15 UTC
Same problem with your test kernel, 2.6.18-130.el5virttest6.x86_64.rpm.  Booting back to 2.6.18-92.1.22 solves the problem.

Also confirmed that 2.6.18-128.1.1 runs ok (at least so far) on my other recently upgraded RHEL5.3 box with 2GB ram (x86_64 intel xeon).

Comment 22 Chris Lalancette 2009-02-11 07:54:28 UTC
Dave,

Yes, the patch in here actually is seeming to make things worse for me, not better.  I don't really understand why, though.  It will need more looking at.  What is interesting is that between 5.2 and 5.3 we didn't really touch this code, so something else must have changed that is causing it to happen more frequently on your box.  Can you give me more details about the machine that is having more problems with this?  Hardware details, dom0 details, guest details, workload details?  I can only make it happen randomly here, so it's proven difficult for me to debug.

Thanks,
Chris Lalancette

Comment 23 Guy Streeter 2009-02-25 18:58:51 UTC
Copied from IT 139549:

The customer confirms that the issue remains under kernel-xen-2.6.18-128.el5.  The customer has tested kernel-xen-2.6.18-130.el5virttest6 from http://people.redhat.com/clalance/virttest/, and this has resolved the issue for him.

Thanks,
Eric

Comment 24 Chris Lalancette 2009-02-25 19:05:07 UTC
(In reply to comment #23)
> Copied from IT 139549:
> 
> The customer confirms that the issue remains under kernel-xen-2.6.18-128.el5. 
> The customer has tested kernel-xen-2.6.18-130.el5virttest6 from
> http://people.redhat.com/clalance/virttest/, and this has resolved the issue
> for him.

Hm, that is actually interesting.  In my testing, this patch seemed to make things worse, not better.  I've actually dropped it for now from the virttest kernels.  That being said, it was a pretty subjective test; I didn't have a reliable reproducer, so it just "felt" like it happened more often.  Did the customer have a reliable reproducer, and if so, can you share the details?

Thanks,
Chris Lalancette

Comment 26 Orion Poplawski 2009-03-02 21:20:06 UTC
virttest6 seemed to not help me.  I've been running fine for a while with virttest7.  Will give virttest10 a try soon.

Comment 27 Orion Poplawski 2009-03-03 17:05:12 UTC
virttest10 is working fine for me as well.  Dom0 is now down to 452MB. Before I'd see problems below 1G.

Comment 28 Chris Lalancette 2009-03-05 11:00:59 UTC
(In reply to comment #26)
> virttest6 seemed to not help me.  I've been running fine for a while with
> virttest7.  Will give virttest10 a try soon.

(In reply to comment #27)
> virttest10 is working fine for me as well.  Dom0 is now down to 452MB. Before
> I'd see problems below 1G.

See, that's interesting.  I dropped the patch in this BZ way back at virttest5, and haven't had it in any of the virttest builds since.  So your tests with virttest6 and virttest7 should have shown no differences.  Maybe something else is tickling the problem slightly, but unfortunately, these results are currently inconclusive.

For what it's worth, I've looked into the issue a little more deeply.  The problem seems to be that when the networking ring fills up, the dom0 tries to balloon a little bit to get more memory.  However, for reasons I don't quite understand, this doesn't always succeed, and that's when you start getting the "Memory squeeze" errors.  At this point, if you were to manually balloon the dom0 or another domU down, I believe you would leave enough room for the auto-balloon facility of netback.  But I haven't proven that yet.

Chris Lalancette
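
The unproven manual workaround in comment 28 (balloon dom0 down by hand when netback's auto-balloon fails) can be sketched as a simple decision rule. Everything here is an assumption for illustration: the threshold, the shrink amount, and the helper name. On a real RHEL 5 Xen host the inputs would come from `xm info` / `xm list`, and the suggested target would be applied with `xm mem-set Domain-0 <MB>`, which this sketch deliberately does not run:

```shell
# Sketch: if hypervisor free memory is below a low-water mark, suggest a
# smaller dom0 allocation so netback has headroom. Constants are assumed.
maybe_balloon_dom0() {
    free_mb=$1       # free memory reported by the hypervisor, in MB
    dom0_mb=$2       # dom0's current allocation, in MB
    threshold_mb=64  # assumed low-water mark
    shrink_mb=128    # assumed amount to balloon away
    if [ "$free_mb" -lt "$threshold_mb" ]; then
        echo $(( dom0_mb - shrink_mb ))   # new target, for "xm mem-set Domain-0"
    else
        echo "$dom0_mb"                   # leave dom0 alone
    fi
}

maybe_balloon_dom0 32 2048   # squeezed: suggests 1920
maybe_balloon_dom0 512 2048  # healthy: suggests 2048
```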

Comment 29 Chris Lalancette 2009-04-17 08:24:30 UTC
Can some of the people who are affected by this bug please try out the virttest16 kernel?  It's available here:

http://people.redhat.com/clalance/virttest

As a side effect of the probable fix for BZ 479754, it seems to have fixed this bug as well.  I would like to get confirmation, though.

Thank you,
Chris Lalancette

Comment 32 Chris Lalancette 2009-05-06 08:42:06 UTC
While not technically a dup, the patches that were posted to fix bz 479754 probably fix this issue as well.  I'm going to close it as a dup of that BZ.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 479754 ***

Comment 33 Dave Wysochanski 2009-11-30 20:28:42 UTC
For the record, I started seeing this again with 2.6.18-164.2.1.el5xen.  For now I seem to have resolved the issue by following the directions here:
http://support.neosurge.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=43
How do I fix "xen_net: Memory squeeze in netback driver" ?

This error is caused by memory being dynamically allocated on the dom0 device on virtualized systems. It can be fixed by assigning a static amount of ram on the dom0. This procedure is specific for virtualized systems running the Xen platform.

   1. Edit /etc/grub.conf using your favorite editor.
      In the "kernel /versionnumberhere" line add the following to the end of the line: dom0_mem=512M
      An example would look like:
      kernel /xen.gz-2.6.18-128.1.10.el5 dom0_mem=512M
   2. Edit /etc/xen/xend-config.sxp and change the following:
      (dom0-min-mem 256) to:
      (dom0-min-mem 0)
   3. Lastly, reboot your dom0 device. This will completely fix the memory squeeze issue. 

I set dom0_mem=768M and did step #2 above as well.  Until now I was not running with a "dom0_mem" line; I tried this before and it did not seem to help.  The fix above may be a red herring, though, since from what I can tell the machine was up for 34 days without these messages.  Sometime this morning the machine started acting up, and when I tried to reboot it I was getting this error again, which prevented some domUs from starting.  So I went searching for answers once again.  Not sure what is really going on, as this problem seems to appear/disappear at random.  Once I saw the problem, a simple reboot (warm or cold) did not seem to solve it, so maybe these new settings will.

Comment 34 Paolo Bonzini 2011-01-25 13:55:05 UTC

*** This bug has been marked as a duplicate of bug 648763 ***