Bug 454285
Summary: xen_net: Memory squeeze in netback driver
Product: Red Hat Enterprise Linux 5
Component: kernel-xen
Version: 5.2
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: low
Reporter: Monty Walls <mwalls>
Assignee: Chris Lalancette <clalance>
QA Contact: Martin Jenner <mjenner>
CC: beres.laszlo, berrange, clalance, daniel.brnak, dwysocha, dzickus, grimme, herrold, jmh, mschick, orion, pbonzini, syeghiay, tao, tomg, xen-maint
Target Milestone: rc
Target Release: ---
Doc Type: Bug Fix
Doc Text:
When running multiple guest domains, guest networking may temporarily stop working, resulting in the following error being reported in the dom0 logs:
Memory squeeze in netback driver
To work around this, raise the amount of memory available to the dom0 with the dom0_mem hypervisor command line option.
Story Points: ---
Clone Of: ---
Last Closed: 2009-05-06 08:42:06 UTC
Type: ---
Regression: ---
Bug Blocks: 454962, 492568
Description (Monty Walls, 2008-07-07 14:47:03 UTC)
Hi, one workaround/solution is to add dom0_mem=XXMB to the "xen" line in /etc/grub.conf. This forces dom0 to a pre-specified amount of memory and avoids the 'Memory squeeze in netback driver' error you are seeing. With your current configuration (32GB), you should be OK with 1GB-2GB for dom0_mem. The entry in /etc/grub.conf would look similar to the following; note the "kernel /xen.gz" line, where we allocate 2GB of memory to dom0:

```
title Red Hat Enterprise Linux Server (2.6.18-90.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-90.el5 dom0_mem=2G
        module /vmlinuz-2.6.18-90.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        module /initrd-2.6.18-90.el5xen.img
```

- Jan

We have the same issue with the dom0_mem=512M setting; I assume we cannot lower that.

I am also experiencing this issue, using xen-3.0.3-64.el5_2.1 and kernel-xen-2.6.18-92.1.1.el5 (32-bit). The server has 16GB of RAM. I just rebooted the server with dom0_mem=1g. Here's what my grub.conf looks like:

```
title Red Hat Enterprise Linux Server (2.6.18-92.1.1.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-92.1.1.el5 dom0_mem=1g
        module /vmlinuz-2.6.18-92.1.1.el5xen ro root=/dev/vg0/root
        module /initrd-2.6.18-92.1.1.el5xen.img
```

Re comment #2, the idea is to raise the memory for dom0, not lower it. That is where the netback driver runs and where the memory is needed.

Is there a rule of thumb I should use to properly determine how much memory should be allocated to the dom0 when using this setting? It seems like the number of domUs that will be running, as well as the amount of RAM in the server, would play a role in configuring this properly, but I could be wrong.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
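No official sizing formula is given in this bug, but the rule-of-thumb question above can be made concrete. The sketch below is purely illustrative: the 512 MB base, the 64 MB-per-domU allowance, and the quarter-of-RAM cap are assumptions of mine, not values from this report.

```python
def suggest_dom0_mem_mb(total_ram_gb, num_domus):
    """Toy dom0_mem sizing heuristic: a fixed base for dom0 itself plus
    a per-guest allowance for backend-driver (netback/blkback) overhead.
    The 512 MB base and 64 MB per domU are illustrative assumptions."""
    suggested_mb = 512 + 64 * num_domus
    # Cap at a quarter of physical RAM so dom0 never starves the guests.
    cap_mb = (total_ram_gb * 1024) // 4
    return min(suggested_mb, cap_mb)

# e.g. a 16 GB host expected to run 8 guests
print(suggest_dom0_mem_mb(16, 8))
```

The result would then go on the hypervisor line, e.g. dom0_mem=1024M in /etc/grub.conf, along the lines of the examples above.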
New Contents:

When running multiple guest domains, you may see an error in the dom0 logs that says "Memory squeeze in netback driver", and guest networking may temporarily stop working. You may be able to work around this issue by specifying "dom0_mem" on the hypervisor command line when you boot. For instance, if you have a machine with 16GB of memory, you can try adding "dom0_mem=2GB" to the hypervisor command line.

This release note is now in the 5.3 Release Notes. The changes should be visible internally (within 12 hours) on:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html/Release_Notes/

If there are any changes required, please edit the "Release Notes" field above and set the requires_release_notes flag back to "?".

Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:

```diff
@@ -1 +1,5 @@
-When running multiple guest domains, you may see an error in the dom0 logs that says "Memory squeeze in netback driver", and guest networking may temporarily stop working. You may be able to work around this issue by specifying "dom0_mem" on the hypervisor command-line when you boot. For instance, if you have a machine with 16GB of memory, you can try to add "dom0_mem=2GB" on the hypervisor command-line.
+When running multiple guest domains, guest networking may temporarily stop working, resulting in the following error being reported in the dom0 logs:
+
+Memory squeeze in netback driver
+
+To work around this, raise the amount of memory available to the dom0 with the dom0_mem hypervisor command line option.
```

*** Bug 456328 has been marked as a duplicate of this bug. ***
I've uploaded a test kernel that contains this fix (along with several others) to this location:

http://people.redhat.com/clalance/virttest

Could the original reporter try out the test kernels there, and report back on whether they fix the problem? Thanks, Chris Lalancette

P.S. In my own testing so far, this hasn't seemed to change much for me, but I'd still like to see the results from other people's testing.

I upgraded a RHEL 5.2 box (AMD x86_64, 6GB RAM) to RHEL 5.3, and now all my Xen domUs have been crippled by this bug. I will try your fix. Previously I was running 2.6.18-92.1.22 with no issues, which would point to a regression between that version and 2.6.18-128.1.1.

Same problem with your test kernel, 2.6.18-130.el5virttest6.x86_64.rpm. Booting back to 2.6.18-92.1.22 solves the problem. Also confirmed that 2.6.18-128.1.1 runs OK (at least so far) on my other recently upgraded RHEL 5.3 box with 2GB RAM (x86_64 Intel Xeon).

Dave, yes, the patch here actually seems to make things worse for me, not better, and I don't really understand why; it will need more investigation. What is interesting is that between 5.2 and 5.3 we didn't really touch this code, so something else must have changed that is causing it to happen more frequently on your box. Can you give me more details about the machine that is having more problems with this? Hardware details, dom0 details, guest details, workload details? I can only make it happen randomly here, so it has proven difficult for me to debug. Thanks, Chris Lalancette

Copied from IT 139549:

The customer confirms that the issue remains under kernel-xen-2.6.18-128.el5. The customer has tested kernel-xen-2.6.18-130.el5virttest6 from http://people.redhat.com/clalance/virttest/, and this has resolved the issue for him. Thanks, Eric

(In reply to comment #23)
> The customer confirms that the issue remains under kernel-xen-2.6.18-128.el5.
> The customer has tested kernel-xen-2.6.18-130.el5virttest6 from
> http://people.redhat.com/clalance/virttest/, and this has resolved the issue
> for him.

Hm, that is actually interesting. In my testing, this patch seemed to make things worse, not better, and I've actually dropped it for now from the virttest kernels. That being said, it was a pretty subjective test; I didn't have a reliable reproducer, so it just "felt" like it happened more often. Did the customer have a reliable reproducer, and if so, can you share the details? Thanks, Chris Lalancette

virttest6 did not seem to help me. I've been running fine for a while with virttest7, and will give virttest10 a try soon.

virttest10 is working fine for me as well. Dom0 is now down to 452MB; before, I'd see problems below 1G.

(In reply to comments #26 and #27)
See, that's interesting. I dropped the patch in this BZ way back at virttest5, and haven't had it in any of the virttest builds since, so your tests with virttest6 and virttest7 should have had no differences. Maybe something else is tickling the problem slightly, but unfortunately these results are currently inconclusive.

For what it's worth, I've looked into the issue a little more deeply. The problem seems to be that when the networking ring fills up, the dom0 tries to balloon a little bit to get more memory. However, for reasons I don't quite understand, this doesn't always succeed, and that's when you start getting the "Memory squeeze" errors. At that point, if you were to manually balloon the dom0 or another domU down, I believe you would leave enough room for the auto-balloon facility of netback. But I haven't proven that yet.
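The failure mode described above (free memory first, then a balloon attempt, then the squeeze) can be sketched as a toy model. This is an illustration of the described behaviour only, not the actual netback/balloon kernel code, and all names and numbers are mine:

```python
def netback_get_pages(needed, free_pages, balloonable_pages):
    """Toy model: netback first uses dom0's free pages, then asks the
    balloon driver to reclaim more.  If ballooning cannot cover the
    shortfall, the 'Memory squeeze' path is taken.  Illustrative only."""
    if free_pages >= needed:
        return ("ok", free_pages - needed, balloonable_pages)
    shortfall = needed - free_pages
    if balloonable_pages >= shortfall:
        # Balloon succeeds: dom0 grows by exactly the shortfall.
        return ("ok", 0, balloonable_pages - shortfall)
    # Balloon failed: this is where the log message would appear.
    return ("xen_net: Memory squeeze in netback driver",
            free_pages, balloonable_pages)
```

In this model, manually ballooning a domU down (increasing balloonable_pages) is exactly what turns the squeeze case back into the success case, which matches the untested suggestion above.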
Chris Lalancette
Can some of the people who are affected by this bug please try out the virttest16 kernel? It's available here:

http://people.redhat.com/clalance/virttest

As a side effect of the probable fix for BZ 479754, it seems to have fixed this bug as well. I would like to get confirmation, though. Thank you, Chris Lalancette

While not technically a dup, the patches that were posted to fix bz 479754 probably fix this issue as well, so I'm going to close it as a dup of that BZ. Chris Lalancette

*** This bug has been marked as a duplicate of bug 479754 ***

For the record, I started seeing this again with 2.6.18-164.2.1.el5xen. For now I seem to have resolved the issue by following the directions here:

http://support.neosurge.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=43

How do I fix "xen_net: Memory squeeze in netback driver"?

This error is caused by memory being dynamically allocated to the dom0 on virtualized systems. It can be fixed by assigning a static amount of RAM to the dom0. This procedure is specific to virtualized systems running the Xen platform.

1. Edit /etc/grub.conf using your favorite editor. On the "kernel /versionnumberhere" line, add the following to the end of the line: dom0_mem=512M. An example would look like:

   kernel /xen.gz-2.6.18-128.1.10.el5 dom0_mem=512M

2. Edit /etc/xen/xend-config.sxp and change the following:

   (dom0-min-mem 256)

   to:

   (dom0-min-mem=0)

3. Lastly, reboot your dom0. This will completely fix the memory squeeze issue.

I set dom0_mem=768M and did step #2 above as well. Until now I was not running with a "dom0_mem" line; I tried this before and it did not seem to help. The fix above may be a red herring, though, since from what I can tell the machine was up for 34 days without these messages. Sometime this morning the machine started acting up, and when I tried to reboot it I was getting this error again, which prevented some domUs from starting. So I went searching for answers once again.
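The grub.conf edit in step 1 above amounts to appending dom0_mem to the hypervisor line. As a hypothetical helper (the function name and regex are mine, and any real change should of course be made to a backed-up /etc/grub.conf, not via this sketch):

```python
import re

def add_dom0_mem(grub_text, mem="512M"):
    """Append dom0_mem=<mem> to each hypervisor 'kernel /xen.gz...' line
    of a grub.conf, leaving lines that already set dom0_mem untouched.
    Hypothetical helper for illustration only."""
    out = []
    for line in grub_text.splitlines():
        if re.match(r"\s*kernel\s+/xen\.gz", line) and "dom0_mem=" not in line:
            line += " dom0_mem=" + mem
        out.append(line)
    return "\n".join(out)

conf = "title RHEL\n\troot (hd0,0)\n\tkernel /xen.gz-2.6.18-128.1.10.el5"
print(add_dom0_mem(conf))
```

Note the helper is idempotent: running it twice leaves an already-patched line alone, mirroring the fact that only one dom0_mem option should appear on the kernel line.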
I'm not sure what is really going on, as this problem seems to appear and disappear at random. Once I see the problem, a simple reboot (warm or cold) does not seem to solve it, so maybe these new settings will.

*** This bug has been marked as a duplicate of bug 648763 ***