Bug 1493470
| Summary: | qemu-kvm core dumped if running stress-ng test inside guest and manually quit qemu after guest crashed | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yilzhang |
| Component: | qemu-kvm-rhev | Assignee: | David Gibson <dgibson> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | yilzhang |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.4-Alt | CC: | knoel, michen, qzhang, rbalakri, virt-maint, yilzhang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-15 05:19:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
yilzhang, 2017-09-20 09:31:24 UTC
Move to qemu-kvm-rhev. This fix will apply to both RHEL KVM and qemu-kvm-rhev for RHV and RHOSP, since both packages use the same code base. Retested this case against qemu-kvm-rhev on both Power8 and Power9; the result is that in step 3 the guest does not crash, but instead hangs with many call traces on the guest console.

Version of components on Power8:
- Host kernel: 3.10.0-747.el7.ppc64le
- qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7
- Guest kernel: 3.10.0-747.el7.ppc64le
- Host processors: 185
- MemTotal: 1067454016 kB

Version of components on Power9:
- Host kernel: 4.11.0-42.el7a.ppc64le
- qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7
- Guest kernel: 4.11.0-42.el7a.ppc64le
- Host processors: 176
- MemTotal: 32232768 kB

Part of the call traces on Power8:

```
[ 2259.069411] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2259.069548] cache: nf_conntrack_c00000000120ab80, object size: 312, buffer size: 320, default order: 0, min order: 0
[ 2259.069764] node 0: slabs: 734, objs: 149736, free: 0
[ 2259.292171] stress-ng-icmp-: page allocation failure: order:0, mode:0x200020
[ 2259.302360] CPU: 55 PID: 18479 Comm: stress-ng-icmp- Not tainted 3.10.0-747.el7.ppc64le #1
[ 2259.302534] Call Trace:
[ 2259.302605] [c0000001ba1e70e0] [c00000000001b340] show_stack+0x80/0x330 (unreliable)
[ 2259.302817] [c0000001ba1e7190] [c0000000009f5cf4] dump_stack+0x30/0x44
[ 2259.302995] [c0000001ba1e71b0] [c000000000264cdc] warn_alloc_failed+0x10c/0x160
[ 2259.303203] [c0000001ba1e7260] [c00000000026ba78] __alloc_pages_nodemask+0xb68/0xc70
[ 2259.303410] [c0000001ba1e7450] [c0000000002e2a60] alloc_pages_current+0x1f0/0x430
[ 2259.303627] [c0000001ba1e74d0] [c0000000002f1a8c] new_slab+0x67c/0x690
[ 2259.303820] [c0000001ba1e7530] [c0000000002f4da8] ___slab_alloc+0x538/0x680
[ 2259.304000] [c0000001ba1e7650] [c0000000009f1554] __slab_alloc+0x2c/0x70
[ 2259.304184] [c0000001ba1e7680] [c0000000002f4ff4] kmem_cache_alloc+0x104/0x2e0
[ 2259.304418] [c0000001ba1e76d0] [d000000006b43f40] init_conntrack+0x1e0/0x960 [nf_conntrack]
[ 2259.304633] [c0000001ba1e77b0] [d000000006b44e2c] nf_conntrack_in+0x76c/0x820 [nf_conntrack]
[ 2259.304844] [c0000001ba1e78b0] [d000000006ce0948] ipv4_conntrack_local+0x58/0x80 [nf_conntrack_ipv4]
[ 2259.305081] [c0000001ba1e78d0] [c0000000008916f8] nf_hook_slow+0xc8/0x1f0
[ 2259.305264] [c0000001ba1e7930] [c0000000008e3e00] raw_sendmsg+0x9e0/0xaa0
[ 2259.305447] [c0000001ba1e7af0] [c0000000008fcb7c] inet_sendmsg+0x7c/0x180
[ 2259.305626] [c0000001ba1e7b30] [c0000000008044dc] sock_sendmsg+0xec/0x140
[ 2259.305806] [c0000001ba1e7ca0] [c00000000080a7bc] SyS_sendto+0x15c/0x240
[ 2259.305986] [c0000001ba1e7dd0] [c00000000080be08] SyS_socketcall+0x2d8/0x430
[ 2259.306165] [c0000001ba1e7e30] [c00000000000a184] system_call+0x38/0xb4
[ 2259.306341] Mem-Info:
[ 2259.306429] active_anon:289035 inactive_anon:22264 isolated_anon:0
[ 2259.306429]  active_file:1801 inactive_file:2683 isolated_file:32
[ 2259.306429]  unevictable:606 dirty:3308 writeback:33 unstable:0
[ 2259.306429]  slab_reclaimable:2619 slab_unreclaimable:27867
[ 2259.306429]  mapped:4117 shmem:14694 pagetables:2707 bounce:0
[ 2259.306429]  free:107 free_pcp:0 free_cma:0
[ 2259.307154] Node 0 DMA free:6848kB min:19136kB low:23872kB high:28672kB active_anon:18498240kB inactive_anon:1424896kB active_file:115264kB inactive_file:171712kB unevictable:38784kB isolated(anon):0kB isolated(file):2048kB present:24117248kB managed:23012416kB mlocked:38784kB dirty:211712kB writeback:2112kB mapped:263488kB shmem:940416kB slab_reclaimable:167616kB slab_unreclaimable:1783488kB kernel_stack:197936kB pagetables:173248kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5835004 all_unreclaimable? no
[ 2259.308226] lowmem_reserve[]: 0 0 0
[ 2259.308407] Node 0 DMA: 108*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 6912kB
[ 2259.308835] 19298 total pagecache pages
[ 2259.308927] 16 pages in swap cache
[ 2259.309016] Swap cache stats: add 36879, delete 36851, find 1387/1665
[ 2259.309165] Free swap  = 0kB
[ 2259.309254] Total swap = 2112832kB
```

Part of the call traces on Power9:

```
[ 5606.871593] Free swap  = 0kB
[ 5606.871616] Total swap = 2143552kB
[ 5606.871653] 376832 pages RAM
[ 5606.871668] 0 pages HighMem/MovableOnly
[ 5606.871711] 17475 pages reserved
[ 5606.871747] 0 pages cma reserved
[ 5606.871770] 2 pages hwpoisoned
[ 5612.874790] stress-ng-resou invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=1000
[ 5612.874797] stress-ng-resou cpuset=/ mems_allowed=0
[ 5612.874806] CPU: 30 PID: 14304 Comm: stress-ng-resou Not tainted 4.11.0-42.el7a.ppc64le #1
[ 5612.874807] Call Trace:
[ 5612.874815] [c0000001a25e7510] [c000000000c09008] dump_stack+0xb0/0xf0 (unreliable)
[ 5612.874819] [c0000001a25e7550] [c000000000c00768] dump_header+0xd4/0x284
[ 5612.874824] [c0000001a25e7630] [c00000000030618c] oom_kill_process+0x49c/0x7b0
[ 5612.874828] [c0000001a25e76f0] [c0000000003070f4] out_of_memory+0x8e4/0x930
[ 5612.874831] [c0000001a25e7790] [c00000000030fe74] __alloc_pages_nodemask+0xf54/0x1000
[ 5612.874835] [c0000001a25e7980] [c0000000003ba804] alloc_pages_vma+0x584/0x6e0
[ 5612.874839] [c0000001a25e7a40] [c00000000039b008] __read_swap_cache_async+0x1f8/0x2f0
[ 5612.874843] [c0000001a25e7ac0] [c00000000039b6fc] swapin_readahead+0x31c/0x5a0
[ 5612.874847] [c0000001a25e7bb0] [c00000000036a268] do_swap_page+0x608/0xad0
[ 5612.874850] [c0000001a25e7c30] [c00000000036f708] __handle_mm_fault+0x9c8/0x1100
[ 5612.874853] [c0000001a25e7d30] [c00000000036ff68] handle_mm_fault+0x128/0x210
[ 5612.874857] [c0000001a25e7d70] [c000000000072874] do_page_fault+0x5b4/0x850
[ 5612.874861] [c0000001a25e7e30] [c00000000000a3dc] handle_page_fault+0x18/0x38
[ 5612.874863] Mem-Info:
[ 5612.874873] active_anon:246805 inactive_anon:21374 isolated_anon:0
                active_file:499 inactive_file:338 isolated_file:32
                unevictable:4692 dirty:480 writeback:0 unstable:0
                slab_reclaimable:11635 slab_unreclaimable:47150
                mapped:10126 shmem:55060 pagetables:5404 bounce:0
                free:3023 free_pcp:74 free_cma:0
```

stress-ng is deliberately designed to allocate lots of memory, until it can't any more. The call traces are memory allocation failures caused by stress-ng, as expected. Unless something else is going wrong, this doesn't appear to be a bug.

Spoke with yilzhang. I suspect the problem at comment 3 is just allocation failures from the stress, along with the guest appearing unresponsive because it is loaded by the stress. In any case the original qemu segfault is no longer reproducible, so if there's a remaining problem it can be filed as a different bug.
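For context on why these traces are expected rather than a qemu bug: stress-ng's memory-hungry stressors essentially allocate and touch memory in a loop until allocation fails, which is exactly what produces the page allocation failures and eventual OOM kill seen above. A minimal Python sketch of that pattern (illustrative only, not stress-ng's actual implementation; the 1 GiB `RLIMIT_AS` cap is an assumed value added so the demo fails quickly instead of exhausting the host and waking the kernel OOM killer):

```python
import resource

# Cap this process's address space at 1 GiB (illustrative value) so the
# allocation loop below fails quickly and safely inside this process.
LIMIT = 1 << 30
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

chunks = []
allocated_mib = 0
try:
    while True:
        # bytearray zero-fills its buffer, so the pages are really committed,
        # not just reserved.
        chunks.append(bytearray(1 << 20))  # 1 MiB per iteration
        allocated_mib += 1
except MemoryError:
    chunks.clear()  # release everything so the process can keep running

print(f"allocation failed after {allocated_mib} MiB")
```

Inside the guest, stress-ng runs many such workers concurrently (the traces show `stress-ng-icmp-` and `stress-ng-resou` workers), so the guest kernel first logs page allocation failures and eventually invokes the OOM killer while appearing unresponsive under load.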