Bug 1667560
| Summary: | When scheduling an instance with PCI PT NIC SR-IOV on an hypervisor with a swapfile on the root partition, the IO can kill the OS | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | David Vallee Delisle <dvd> |
| Component: | qemu-kvm | Assignee: | Alex Williamson <alex.williamson> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Pei Zhang <pezhang> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 7.5 | CC: | aarcange, alex.williamson, chayang, dvd, juzhang, knoel, michen, pbonzini, pezhang, virt-maint, yfu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-03-20 21:45:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
David Vallee Delisle
2019-01-18 19:50:49 UTC
This appears to be a case of user error: a VM making use of device assignment cannot be swapped. The January 11th statement in the customer case that a 40G VM should be able to be launched on a 20G host with 20G swap is incorrect for an assigned device VM. All of the memory for the VM must be pinned in memory at the instantiation of the VM for device assignment. If swap is present the host will try very hard to free memory, often invoking the OOM killer to free that memory. Host behavior is undesirable during this phase. Running the host system without swap can improve the behavior through this transition; the VM will fail more quickly without such a stall on the host system overall. This looks like a configuration error; is there more to this request?

Alex,

I believe I might have not reproduced the exact symptoms as the customer. After looking closely, our stack traces aren't the same. This is the customer's [1] and this is mine [2].

In my case, there are 3 retries and after the 3rd one, the compute is back to an operational state. In the customer's case, the compute is still frozen after 15h.

When looking in messages [3], we see that the driver (?) is unable to remove the VF's MAC and, ~45s later when the second try is starting, it's setting a new one. It looks like it's at this moment that the host is completely frozen.

Customer has enabled sysrq and will try to reproduce this issue and have a dump for us. In the meantime, do we have anything helpful in these traces or with this new information?

Thank you very much,

DVD

[1]
~~~
Dec 14 23:33:41 compute-038 kernel: [<ffffffff885135d4>] dump_stack+0x19/0x1b
Dec 14 23:33:41 compute-038 kernel: [<ffffffff8850e79f>] dump_header+0x90/0x229
Dec 14 23:33:41 compute-038 kernel: [<ffffffff880dc63b>] ? cred_has_capability+0x6b/0x120
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87f9ac64>] oom_kill_process+0x254/0x3d0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff880dc71e>] ? selinux_capable+0x2e/0x40
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87f9b4a6>] out_of_memory+0x4b6/0x4f0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff8850f2a3>] __alloc_pages_slowpath+0x5d6/0x724
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fa17f5>] __alloc_pages_nodemask+0x405/0x420
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fef7c5>] alloc_pages_vma+0xb5/0x200
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fc8a17>] handle_pte_fault+0x887/0xd10
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fcae3d>] handle_mm_fault+0x39d/0x9b0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fb8223>] ? zone_statistics+0x63/0xa0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fc16c6>] __get_user_pages+0x1c6/0x760
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87fc1fcd>] get_user_pages_unlocked+0x15d/0x1f0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff87e7911f>] get_user_pages_fast+0x9f/0x1a0
Dec 14 23:33:41 compute-038 kernel: [<ffffffffc0856396>] vaddr_get_pfn+0x156/0x170 [vfio_iommu_type1]
Dec 14 23:33:41 compute-038 kernel: [<ffffffffc085695f>] vfio_pin_pages_remote+0x11f/0x370 [vfio_iommu_type1]
Dec 14 23:33:41 compute-038 kernel: [<ffffffffc0857c42>] vfio_iommu_type1_ioctl+0x532/0x970 [vfio_iommu_type1]
Dec 14 23:33:41 compute-038 kernel: [<ffffffffc08748c8>] vfio_fops_unl_ioctl+0x68/0x2b0 [vfio]
Dec 14 23:33:41 compute-038 kernel: [<ffffffff88034040>] do_vfs_ioctl+0x360/0x550
Dec 14 23:33:41 compute-038 kernel: [<ffffffff880dccdf>] ? file_has_perm+0x9f/0xb0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff880342d1>] SyS_ioctl+0xa1/0xc0
Dec 14 23:33:41 compute-038 kernel: [<ffffffff8852579b>] system_call_fastpath+0x22/0x27
~~~

[2]
~~~
Jan 18 18:45:01 compute-0 kernel: CPU 4/KVM invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Jan 18 18:45:01 compute-0 kernel: CPU 4/KVM cpuset=vcpu4 mems_allowed=0-1
Jan 18 18:45:01 compute-0 kernel: CPU: 16 PID: 62175 Comm: CPU 4/KVM Kdump: loaded Not tainted 3.10.0-862.11.6.el7.x86_64 #1
Jan 18 18:45:01 compute-0 kernel: Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.4.2 01/09/2017
Jan 18 18:45:01 compute-0 kernel: Call Trace:
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa97135d4>] dump_stack+0x19/0x1b
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa970e79f>] dump_header+0x90/0x229
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa92dc63b>] ? cred_has_capability+0x6b/0x120
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa919ac64>] oom_kill_process+0x254/0x3d0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa92dc71e>] ? selinux_capable+0x2e/0x40
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa919b4a6>] out_of_memory+0x4b6/0x4f0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa970f2a3>] __alloc_pages_slowpath+0x5d6/0x724
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91a17f5>] __alloc_pages_nodemask+0x405/0x420
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91ef7c5>] alloc_pages_vma+0xb5/0x200
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91dde85>] __read_swap_cache_async+0x115/0x190
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91ddf26>] read_swap_cache_async+0x26/0x60
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91de008>] swapin_readahead+0xa8/0x110
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91c89a2>] handle_pte_fault+0x812/0xd10
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91cae3d>] ? handle_mm_fault+0x39d/0x9b0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91cae3d>] handle_mm_fault+0x39d/0x9b0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa91c16c6>] __get_user_pages+0x1c6/0x760
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08cc549>] __gfn_to_pfn_memslot+0x179/0x480 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08f29b7>] try_async_pf+0x67/0x1f0 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08f4a6a>] tdp_page_fault+0x13a/0x260 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc063fea2>] ? vmx_vcpu_run+0x352/0xa90 [kvm_intel]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08ebf51>] kvm_mmu_page_fault+0x71/0x120 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc063fea2>] ? vmx_vcpu_run+0x352/0xa90 [kvm_intel]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc06389dd>] handle_ept_violation+0x8d/0x100 [kvm_intel]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc0641a14>] vmx_handle_exit+0x294/0xc90 [kvm_intel]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa914bdf4>] ? rcu_eqs_exit_common.isra.31+0x24/0xe0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc063feae>] ? vmx_vcpu_run+0x35e/0xa90 [kvm_intel]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa914bf00>] ? rcu_eqs_exit+0x50/0xa0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08de74d>] vcpu_enter_guest+0x64d/0x12c0 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08f292f>] ? kvm_can_do_async_pf+0x4f/0x70 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08e64e1>] ? kvm_arch_can_inject_async_page_present+0x21/0x30 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08e5ea8>] kvm_arch_vcpu_ioctl_run+0x358/0x480 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffc08cb641>] kvm_vcpu_ioctl+0x2b1/0x650 [kvm]
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa922045e>] ? do_readv_writev+0x19e/0x260
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa9234040>] do_vfs_ioctl+0x360/0x550
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa914bdf4>] ? rcu_eqs_exit_common.isra.31+0x24/0xe0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa92dccdf>] ? file_has_perm+0x9f/0xb0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa914bf00>] ? rcu_eqs_exit+0x50/0xa0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa92342d1>] SyS_ioctl+0xa1/0xc0
Jan 18 18:45:01 compute-0 kernel: [<ffffffffa9725a1b>] tracesys+0xa3/0xc9
~~~

[3]
~~~
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.0: removing MAC on VF 29
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.0: Could NOT remove the VF MAC address.
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.1: removing MAC on VF 29
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.1: Could NOT remove the VF MAC address.
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.0: removing MAC on VF 28
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.0: Could NOT remove the VF MAC address.
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.1: removing MAC on VF 28
Dec 14 23:33:45 compute-038 kernel: ixgbe 0000:09:00.1: Could NOT remove the VF MAC address.
Dec 14 23:33:45 compute-038 libvirtd: 2018-12-14 23:33:45.414+0000: 11521: error : virNetDevSetVfConfig:1701 : Cannot set interface MAC/vlanid to 00:00:00:00:00:00/0 for ifname ens1f0 vf 29: Cannot allocate memory
Dec 14 23:33:45 compute-038 libvirtd: 2018-12-14 23:33:45.416+0000: 11521: error : virNetDevSetVfConfig:1701 : Cannot set interface MAC/vlanid to 00:00:00:00:00:00/0 for ifname ens1f1 vf 29: Cannot allocate memory
Dec 14 23:33:45 compute-038 libvirtd: 2018-12-14 23:33:45.419+0000: 11521: error : virNetDevSetVfConfig:1701 : Cannot set interface MAC/vlanid to 00:00:00:00:00:00/0 for ifname ens1f0 vf 28: Cannot allocate memory
Dec 14 23:33:45 compute-038 libvirtd: 2018-12-14 23:33:45.422+0000: 11521: error : virNetDevSetVfConfig:1701 : Cannot set interface MAC/vlanid to 00:00:00:00:00:00/0 for ifname ens1f1 vf 28: Cannot allocate memory
<snip>
Dec 14 23:34:32 compute-038 kernel: ixgbe 0000:09:00.0: setting MAC fa:16:3e:cf:b5:4f on VF 28
Dec 14 23:34:32 compute-038 kernel: ixgbe 0000:09:00.0: Reload the VF driver to make this change effective.
Dec 14 23:34:32 compute-038 kernel: ixgbe 0000:09:00.1: setting MAC fa:16:3e:6f:48:da on VF 28
Dec 14 23:34:32 compute-038 kernel: ixgbe 0000:09:00.1: Reload the VF driver to make this change effective.
~~~

(In reply to David Vallee Delisle from comment #5)
> Alex,
>
> I believe I might have not reproduced the exact symptoms as the customer.
> After looking closely, our stack traces aren't the same.

No, you're not reproducing the issue. The customer stack trace is the point at which the vfio driver is trying to pin the guest memory and appears as a classic attempt to over-commit the host memory. In your case, I don't see what the issue is. If you run enough VMs, you can always induce an out-of-memory condition; the limit is simply much higher with VMs that allow over-committing than with device assignment VMs, which do not. The ixgbe messages in your logs suggest you haven't enabled device assignment and are probably using some sort of macvlan approach which does not involve the vfio driver.

As in comment 4, this seems to be a case of a VM being provisioned on a host with insufficient resources for it, resulting in a swap storm, OOM, and generally poor behavior on the host. Device assignment VMs do not support memory over-commit. Please re-open with additional information if there's reason to pursue further.
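
For anyone sizing hosts against the rule stated in this thread (guest memory for a device assignment VM must be pinned in RAM, so swap does not count toward it), the arithmetic can be sketched roughly as below. This is only an illustration: the function name `guest_fits`, its parameters, and the `host_reserved_gib` headroom are hypothetical and are not part of qemu-kvm, libvirt, or any scheduler. Counting swap as usable for non-assigned guests is also a simplifying assumption.

```python
def guest_fits(guest_gib: float,
               host_ram_gib: float,
               swap_gib: float,
               device_assignment: bool,
               host_reserved_gib: float = 0.0) -> bool:
    """Rough check of whether a guest's memory can be backed by the host.

    Assumption (from the thread): with device assignment, all guest
    memory is pinned, so only physical RAM counts. Without it, this
    sketch optimistically lets swap count too.
    """
    usable_gib = host_ram_gib - host_reserved_gib
    if not device_assignment:
        # Over-commit is possible only when guest pages can be swapped.
        usable_gib += swap_gib
    return guest_gib <= usable_gib


# The case from the customer statement: 40G guest, 20G host, 20G swap.
print(guest_fits(40, 20, 20, device_assignment=True))
print(guest_fits(40, 20, 20, device_assignment=False))
```

The first call models the assigned-device VM discussed here and reports that it does not fit; the second shows why the same numbers can look acceptable for a VM without an assigned device.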