Bug 1908424 - [ARK] stress-ng bigheap stressor can trigger kernel BUG at include/linux/swapops.h:197! Panic
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1913485
Reported: 2020-12-16 16:40 UTC by Rachel Sibley
Modified: 2021-01-06 21:47 UTC (History)
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1913485
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
console.log (1.21 MB, text/plain)
2020-12-16 16:40 UTC, Rachel Sibley

Description Rachel Sibley 2020-12-16 16:40:34 UTC
Created attachment 1739701
console.log

1. Please describe the problem:

Running the stress-ng bigheap stressor on an ark-eln kernel triggers the oom-killer
and results in a kernel panic on aarch64.

kernel BUG at include/linux/swapops.h:197! Panic

2. What is the Version-Release number of the kernel:
5.10.0-100.test.eln.aarch64
5.10.0-101.test.eln.aarch64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

It's unclear when it first appeared, because the failure is sometimes masked as a timeout.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Build and run stress-ng and focus on the bigheap stressor:

git clone git://kernel.ubuntu.com/cking/stress-ng.git
cd stress-ng
make
stress-ng --bigheap 0 --timeout 5 --log-file bigheap.log
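
Since the failure is sometimes masked as a timeout, it may help to repeat the reproducer under an outer time limit and record how each run ends. The following is only a minimal sketch, not part of the original test harness: the helper names, the iteration count and the 120-second outer limit are hypothetical, and it assumes `stress-ng` is on PATH and that a nonzero exit status indicates a failed run.

```python
import subprocess


def bigheap_command(timeout_s=5, log_file="bigheap.log"):
    """Return the stress-ng invocation from the steps above as an argv list."""
    return ["stress-ng", "--bigheap", "0",
            "--timeout", str(timeout_s), "--log-file", log_file]


def run_iteration(cmd, hard_limit_s, runner=subprocess.run):
    """Run one iteration and classify it as 'pass', 'fail', or 'hang'.

    'hang' means the process outlived hard_limit_s, a hypothetical outer
    limit that is separate from stress-ng's own --timeout.
    """
    try:
        proc = runner(cmd, timeout=hard_limit_s)
    except subprocess.TimeoutExpired:
        return "hang"
    # Assumption: a nonzero exit status means the run failed.
    return "pass" if proc.returncode == 0 else "fail"


def run_many(iterations=10, hard_limit_s=120):
    """Repeat the reproducer and return the per-iteration results."""
    cmd = bigheap_command()
    return [run_iteration(cmd, hard_limit_s) for _ in range(iterations)]
```

Calling `run_many()` on the test machine would return a list such as one entry per iteration. If the kernel panics, the wrapper goes down with the system, so the console log remains the authoritative record.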

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

It's also reproducible with the latest eln kernel 5.10.0-101.test.eln.aarch64

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

The panic can be seen in the attached console log; a snippet of the failure follows:

[ 1526.626863] stress-ng-bighe invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=1000 
[ 1526.638599] CPU: 199 PID: 1032329 Comm: stress-ng-bighe Kdump: loaded Tainted: G               X --------- ---  5.10.0-100.test.eln.aarch64 #1 
[ 1526.651420] Hardware name: HPE ASSY,ARx4z SystemBoard/Comanche_2S_CN99X_ARM , BIOS L50_5.13_1.11 06/18/2019 
[ 1526.661198] Call trace: 
[ 1526.663777]  dump_backtrace+0x0/0x1f0 
[ 1526.667698]  show_stack+0x24/0x70 
[ 1526.671166]  dump_stack+0xd0/0x128 
[ 1526.674577]  dump_header+0x50/0x1f8 
[ 1526.678197]  oom_kill_process+0x228/0x230 
[ 1526.682312]  out_of_memory.part.0+0xbc/0x2d0 
[ 1526.686793]  out_of_memory+0x54/0xac 
[ 1526.690486]  __alloc_pages_may_oom+0x120/0x1a0 
[ 1526.694924]  __alloc_pages_slowpath.constprop.0+0x4b8/0x744 
[ 1526.700649]  __alloc_pages_nodemask+0x298/0x2ec 
[ 1526.705196]  alloc_pages_vma+0x98/0x240 
[ 1526.709062]  do_anonymous_page+0x9c/0x530 
[ 1526.713133]  handle_pte_fault+0x1d4/0x214 
[ 1526.717214]  __handle_mm_fault+0x10c/0x350 
[ 1526.721311]  handle_mm_fault+0xa8/0x210 
[ 1526.725150]  do_page_fault+0x154/0x3b0 
[ 1526.728892]  do_translation_fault+0x98/0xb4 
[ 1526.733101]  do_mem_abort+0x4c/0xb0 
[ 1526.736609]  el0_da+0x44/0x80 
[ 1526.739814]  el0_sync_handler+0x168/0x1c0 
[ 1526.743840]  el0_sync+0x174/0x180 
[ 1526.747436] Mem-Info: 
[ 1526.750170] active_anon:70791 inactive_anon:3671345 isolated_anon:1756 
[ 1526.750170]  active_file:1823 inactive_file:2376 isolated_file:0 
[ 1526.750170]  unevictable:2 dirty:26 writeback:0 
[ 1526.750170]  slab_reclaimable:10833 slab_unreclaimable:64399 
[ 1526.750170]  mapped:969 shmem:365 pagetables:3091 bounce:0 
[ 1526.750170]  free:322329 free_pcp:157 free_cma:0 
....
[ 1532.494613] oom_reape
[ 1534.922975] ------------[ cut here ]------------ 
[ 1534.927591] kernel BUG at include/linux/swapops.h:197! 
[ 1534.932720] Internal error: Oops - BUG: 0 [#1] SMP 
[ 1534.937501] Modules linked in: salsa20_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common xts ofb lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 raid10 raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor xor xor_neon async_tx raid6_pq loop tun af_key crypto_user xt_multiport ip_gre ip_tunnel gre overlay xt_CONNSECMARK xt_SECMARK nft_counter xt_state xt_conntrack nft_compat ah6 ah4 nft_objref nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink sctp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm mlx5_ib rfkill ib_uverbs ib_core sunrpc sr_mod mlx5_core cdrom acpi_ipmi i2c_smbus ipmi_ssif mlxfw ipmi_devintf ipmi_msghandler thunderx2_pmu ext4 vfat fat mbcache jbd2 fuse zram ip_tables xfs libcrc32c ast i2c_algo_bit drm_vram_helper 
[ 1534.937692]  drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_ce fb_sys_fops cec ghash_ce sha2_ce drm_ttm_helper uas ttm sha256_arm64 sha1_ce drm usb_storage gpio_xlp i2c_xlp9xx aes_neon_bs 
[ 1535.042069] CPU: 22 PID: 1032201 Comm: stress-ng-bighe Kdump: loaded Tainted: G               X --------- ---  5.10.0-100.test.eln.aarch64 #1 
[ 1535.054746] Hardware name: HPE ASSY,ARx4z SystemBoard/Comanche_2S_CN99X_ARM , BIOS L50_5.13_1.11 06/18/2019 
[ 1535.064474] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) 
[ 1535.070473] pc : __migration_entry_wait+0x148/0x154 
[ 1535.075338] lr : migration_entry_wait+0x60/0x6c 
[ 1535.079856] sp : ffff80006ba6fc20 
[ 1535.083158] x29: ffff80006ba6fc20 x28: ffff00bcf779c080  
[ 1535.088459] x27: 0000000000000000 x26: 0000000000000002  
[ 1535.093759] x25: ffff00a0ed019ba8 x24: ffff00a0ed019b40  
[ 1535.099059] x23: 0000000000000000 x22: ffff00a01f737530  
[ 1535.104360] x21: 7c000000000b3fa4 x20: ffff0008f9642ef8  
[ 1535.109660] x19: ffffffe0021e5928 x18: 0000000000000000  
[ 1535.114960] x17: 0000000000000000 x16: 0000000000000000  
[ 1535.120260] x15: 0000000000000000 x14: 0000000000000000  
[ 1535.125560] x13: 0000000000000000 x12: 0000000000000000  
[ 1535.130860] x11: 0000000000000000 x10: 0000000000000000  
[ 1535.136159] x9 : ffff800010347764 x8 : 0000000000000000  
[ 1535.141460] x7 : 0000000000000000 x6 : fffffc1fffe00000  
[ 1535.146759] x5 : fff1000080000000 x4 : 0000000979640003  
[ 1535.152060] x3 : ffffffe0028fe900 x2 : ffffffe0256c4187  
[ 1535.157360] x1 : ffffffe0256c4188 x0 : 17ffff8000080034  
[ 1535.162661] Call trace: 
[ 1535.165097]  __migration_entry_wait+0x148/0x154 
[ 1535.169616]  migration_entry_wait+0x60/0x6c 
[ 1535.173790]  do_swap_page+0x790/0x8f0 
[ 1535.177441]  handle_pte_fault+0x1e0/0x214 
[ 1535.181439]  __handle_mm_fault+0x10c/0x350 
[ 1535.185524]  handle_mm_fault+0xa8/0x210 
[ 1535.189349]  do_page_fault+0x154/0x3b0 
[ 1535.193086]  do_translation_fault+0x98/0xb4 
[ 1535.197259]  do_mem_abort+0x4c/0xb0 
[ 1535.200738]  el0_da+0x44/0x80 
[ 1535.203694]  el0_sync_handler+0x168/0x1c0 
[ 1535.207692]  el0_sync+0x174/0x180 
[ 1535.211000] Code: 17ffffcb 92407c21 14000832 17fffff7 (d4210000)  
[ 1535.217082] ---[ end trace c00b94b9b9a7b815 ]--- 
[ 1535.221687] Kernel panic - not syncing: Oops - BUG: Fatal exception 
[ 1535.227941] SMP: stopping secondary CPUs 
[ 1535.232126] Kernel Offset: disabled 
[ 1535.235603] CPU features: 0x0046002,63000c38 
[ 1535.239859] Memory Limit: none 
[ 1535.243739] Starting crashdump kernel... 
[ 1535.247652] Bye!
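
To find the same signature in a large console.log, the BUG line and the call trace that follows it can be picked out with a short script. This is a rough sketch that assumes the log keeps the `[ seconds.micros]` timestamp prefix and the `function+0xoff/0xsize` frame format seen in the snippet above. The function names are hypothetical.

```python
import re

TS = r"\[\s*\d+\.\d+\]\s+"  # dmesg-style timestamp prefix
BUG_RE = re.compile(TS + r"kernel BUG at (\S+):(\d+)!")
TRACE_RE = re.compile(TS + r"Call trace:\s*$")
FRAME_RE = re.compile(TS + r"(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def find_bug(lines):
    """Return (index, source_file, source_line) of the first BUG line, or None."""
    for i, line in enumerate(lines):
        m = BUG_RE.search(line)
        if m:
            return i, m.group(1), int(m.group(2))
    return None


def call_trace_after(lines, start):
    """Return the function names of the first call trace at or after `start`."""
    frames = []
    in_trace = False
    for line in lines[start:]:
        if not in_trace:
            # Skip register dumps and module lists until the trace header.
            in_trace = bool(TRACE_RE.search(line))
            continue
        m = FRAME_RE.search(line)
        if not m:
            break  # first non-frame line ends the trace
        frames.append(m.group(1))
    return frames
```

Reading the attachment with `open("console.log").read().splitlines()` and passing the result to `find_bug` gives the index to hand to `call_trace_after`.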

