Bug 1474704
| Summary: | VM instabilities with Cisco vPC-DI instances | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pierre-Andre MOREY <pmorey> |
| Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn> |
| Status: | CLOSED NOTABUG | QA Contact: | Joe H. Rahme <jhakimra> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 10.0 (Newton) | CC: | areis, awaugama, berrange, dasmith, dorian.grandsire, eglynn, kchamart, pmorey, saime, sbauza, sferdjao, sgordon, skhodri, srevivo, vromanso |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-08 11:19:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 1
Sahid Ferdjaoui
2017-07-25 09:07:01 UTC
(In reply to skhodri from comment #16)
> Yes the problem seems to happen only with Cisco OS. If they start a mininet
> image with 22 CPUs and 114 GB of RAM it works without any issue, while it
> fails when Cisco OS is used.
>
> Info from the customer regarding the kernel versions used:
> Version kernel N5.5 (Cisco OS): kernel 2.6.3.8
> Mininet used by ATOS: mininet 3.16.0-30-generic

If you're talking about the guest OS, this is also expected. In Linux, memory overcommitment is enabled by default, and processes are only killed by the OOM killer when the memory is actually *used*. So the Linux kernel (host) will happily accept qemu-kvm starting with more memory than the machine has, until that memory is actually used, either by qemu-kvm itself or, in the case of memory allocated for a guest, by the guest OS. Different guest OSes have different memory usage patterns, which is why the behavior is different.

Potential solutions:

First of all, there should be some swap space. I'm not a memory management expert, but for a machine with 130 GB of RAM I would have, at the very least, 64 GB of swap. I don't see any reason to have a machine with no swap space; if someone thinks they're doing that as a performance optimization, they're doing it wrong.

Second, they should fine-tune their memory usage and guest size with the explanations above in mind: a proper amount of RAM and swap space, depending on the memory usage patterns of the host and guest. Fixing the problems from comment #14 would also be a good idea.

I think we should close this issue as NOTABUG.
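The overcommit behavior described above can be observed directly from userspace. The following is a minimal, illustrative sketch (assuming Linux with the default overcommit policy): it reserves 1 GiB of anonymous memory but writes to only 10 MiB of it, and the process's resident set grows only with the pages actually touched, not with the reservation.

```python
import mmap
import resource

# Reserve 1 GiB of anonymous memory. Under the default overcommit
# policy the kernel grants the virtual range without backing it
# with physical pages.
size = 1 << 30
buf = mmap.mmap(-1, size)

rss_before_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Touch only the first 10 MiB; physical pages are faulted in on
# first write, so RSS grows with *usage*, not with the reservation.
touched = 10 * (1 << 20)
for off in range(0, touched, mmap.PAGESIZE):
    buf[off] = 1

rss_after_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
delta_kib = rss_after_kib - rss_before_kib

# The RSS growth is on the order of the 10 MiB we wrote, far below
# the 1 GiB (1048576 KiB) that was "allocated".
print(delta_kib)
```

This is exactly why a 114 GB guest can start on the host but only gets killed later: the OOM killer fires when the guest OS actually dirties the pages, and different guest OSes (Cisco OS vs. Mininet) dirty memory at very different rates.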
On a host with 130 GiB of RAM, 12 GiB reserved for hugepages, and no swap, the customer was trying to boot a VM of 114 GiB (small pages). It seems "expected" that the OOM killer would kill the process at some point.

During the call, the customer wanted to consider using only hugepages for its VM. So we assisted the customer in reserving 114 GiB of hugepages on the host and enabling their use with Nova. The VM spawned successfully. Since they are now using hugepages and the memory is locked/reserved for that QEMU process, they should not be in a scenario where the process gets killed.

Related to my previous comment, I'm closing this as NOTABUG. Please feel free to reopen if needed.

(In reply to Sahid Ferdjaoui from comment #18)
> I think we should close this issue as NOTABUG.
>
> On a host of 130 GiB with 12 GiB reserved for hugepages and no swap, the
> customer was trying to boot a VM of 114 GiB (small pages). It seems
> "expected" that the OOM killer would kill the process at some point.
>
> During the call the customer wanted to consider using only hugepages for
> its VM. So we assisted the customer in reserving 114 GiB of hugepages on
> the host and enabling their use with Nova. The VM spawned successfully.
> Since they are now using hugepages and the memory is locked/reserved for
> that QEMU process, they should not be in a scenario where the process gets
> killed.

What's the default swap setup configured by director? Does it follow the guidelines in the documentation:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-swapspace.html

Albeit this would only result in 4 GB of swap for a > 64 GB machine.

(In reply to Stephen Gordon from comment #20)
> What's the default swap setup configured by director? Does it follow the
> guidelines in the documentation:
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-swapspace.html
>
> Albeit this would only result in 4 GB of swap for a > 64 GB machine.

Filed Bug # 1482681.
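For reference, the swap-sizing table in the linked RHEL 7 Storage Administration Guide can be sketched as a small function. This is an illustrative reading of the guideline table (hibernation support omitted); the function name is made up for this example, not from any Red Hat tooling.

```python
def recommended_swap_gib(ram_gib: float) -> float:
    """Minimum swap per the RHEL 7 Storage Administration Guide
    sizing table (sketch, without the hibernation column)."""
    if ram_gib <= 2:
        return 2 * ram_gib  # up to 2 GB RAM: twice the RAM
    if ram_gib <= 8:
        return ram_gib      # 2-8 GB RAM: equal to the RAM
    return 4                # above 8 GB RAM: at least 4 GB

# The 130 GB compute host from this report falls in the "> 64 GB"
# row, so the guideline yields only 4 GB of swap -- the point made
# in comment #20 above.
print(recommended_swap_gib(130))
```

This illustrates why following the generic guideline alone is not enough here: 4 GB of swap cannot absorb a 114 GiB guest's working set, which is why either much larger swap or locked hugepage-backed memory (as was ultimately done) is needed.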