Bug 213430
Summary: | xen_net: Memory squeeze in netback driver. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Stephen Tweedie <sct> |
Component: | kernel-xen | Assignee: | Herbert Xu <herbert.xu> |
Status: | CLOSED DUPLICATE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.0 | CC: | dkovalsk, mmayer, mnielsen, pbonzini, riel, xen-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2008-04-01 12:21:10 UTC | Type: | --- |
Bug Depends On: | 212826 | ||
Bug Blocks: |
Description
Stephen Tweedie
2006-11-01 14:27:05 UTC
I can provide more info on this. What would help? I *am* seeing this in RHEL 5.1.

kernel-xen-2.6.18-53.el5
xen-libs-3.0.3-41.el5
xen-3.0.3-41.el5

I am able to reproduce with 9 VMs running. Each has 2 interfaces presented. As soon as I add the 10th VM (19th and 20th vif) I get the memory squeeze and lose network connection to all VMs.

Mark, your problem sounds quite different. I think in your case the HV is simply running out of memory. With the default RHEL setup, we assign almost all the available RAM to guests, leaving the HV with very little. This is simply broken, as the HV needs to have some free memory so that things like networking can operate.

The original problem here is a suspicion that a broken domU can bring down the whole machine. However, there is currently no proof of that. So if you want to pursue your issue, please open a new bug against xen regarding memory distribution between the HV and the domains.

In any case, if you adjust your memory allocation (by shrinking your guests or dom0) so that the HV has some memory (64M should be more than enough), then it should work correctly.

Herbert, I just ran the VMs all up on the same system again. I'll get the logs, xm list, and networking configs for you today because I believe this is my bug.

I have 10 VMs running right now. The total memory assigned to them is 11G. I have a 32G system. Domain-0 in xm list reports 20552 memory available. I was watching one of the VMs, which is in a cluster, through its console. As soon as I started up the 11th VM, which brought up vif 20 and 21 in this case, I saw my clustered VM lose all its DLM connections. That shows me a definite loss of networking. I've also tested this with pings. As soon as I shut down that 11th VM, bringing my total VIF count to 19, connections are re-established and the memory squeeze error stops.

Herbert, I just re-read your comment after thinking a bit more.
Are you saying I need to shrink dom0 itself to keep it from taking all the system memory? If so, do I do that in /etc/xen/xend-config.sxp?

Yes, you need to shrink dom0. The easiest way is probably "xm mem-set" or its virsh equivalent.

Which would be virsh setmem.

I tried both ways (virsh setmem and xm mem-set) and set the Domain-0 down to 25G. I started up the same number of VMs, which only take a total of 11G, and I get the same error "memory squeeze in netback driver". Then I lose all my network connections to the VMs. After about 30 seconds, I also lost my ssh connection, though I could still ping the system.

You've got a 32G system and you set dom0 to 25G. That leaves only 7G free for the other guests. You then start 11G worth of guests, which means the HV now has almost no memory. So I suggest that you try setting dom0 down to 10G as a test. Thanks!

OK, Herbert straightened me out in e-mails about what is going on with the memory; I wasn't understanding it properly. I set dom0 down to 10G, then started up 13 domU systems at a total of 14G, and do not have the memory squeeze. Sorry for the misunderstanding. I've set this bug back to medium/medium.

Since the original bug report has now been closed and I've not received any new info indicating any bugs in xen netfront/netback, I'm going to close this bug. In conclusion, the original issue was most likely due to an incorrect memory assignment, i.e., leaving too little memory for the hypervisor.

I have at least three customers (fourth case about to be escalated) still reporting this problem in RHEL5.1 and RHEL5.2 beta. It is true that the problem arises if the allocated memory for Dom0 and DomUs becomes close to the total physical memory. However, I would expect (maybe I am wrong) from an Enterprise OS that xen is able to prevent the hypervisor from running out of memory in the first place.

While the conclusion of comment 12 is correct, newer RHELs are sidestepping the problem by disabling flipping.
So, re-closing as a dup of the bug about flipping-induced network failures.

*** This bug has been marked as a duplicate of bug 648763 ***

*** Bug 723919 has been marked as a duplicate of this bug. ***
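
For readers following the memory arithmetic in the comments above, here is a minimal sketch of the budget check being discussed. It is a simplified model, not Xen's own accounting: it ignores dom0 ballooning and per-domain overhead. The helper name `hypervisor_headroom` is hypothetical, and the default 64 MiB reserve is only an assumption taken from the "64M should be more than enough" remark.

```python
def hypervisor_headroom(total_mb, dom0_mb, guests_mb, reserve_mb=64):
    """Return (ok, headroom_mb) for a proposed memory assignment.

    Hypothetical helper: a simplified budget check, not Xen's own accounting.
    headroom_mb is what remains for the hypervisor after dom0 and all guests.
    """
    headroom_mb = total_mb - dom0_mb - guests_mb
    return headroom_mb >= reserve_mb, headroom_mb


GIB = 1024  # MiB per GiB

# Failing case from the thread: 32G host, dom0 at 25G, 11G of guests.
print(hypervisor_headroom(32 * GIB, 25 * GIB, 11 * GIB))  # (False, -4096)

# Working case from the thread: dom0 at 10G, 14G of guests.
print(hypervisor_headroom(32 * GIB, 10 * GIB, 14 * GIB))  # (True, 8192)
```

A negative headroom means dom0 plus the guests are assigned more than the host has, which matches the situation where the memory squeeze was reported; with dom0 at 10G there is ample room left over.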