Bug 1977916
| Summary: | [RHEL-9.0][PANIC] Unable to handle kernel paging request at virtual address 0058272cb3040f50 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | PaulB <pbunyan> |
| Component: | kernel | Assignee: | Kernel Drivers <hwkernel-mgr> |
| kernel sub component: | Platform Enablement | QA Contact: | PaulB <pbunyan> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aquini, crecklin, efuller, ernunes, hkrzesin, jbastian, jiyin, jlayton, lsahlber, msalter, pbunyan, pifang, rvr, smeisner, steved |
| Version: | 9.0 | Flags: | pbunyan: needinfo-; pm-rhel: mirror+ |
| Target Milestone: | beta | ||
| Target Release: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-02-11 22:09:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2041894 | ||
|
Description
PaulB
2021-06-30 17:28:20 UTC
All,

Here is a reproducer on a bare-metal aarch64 system during the nfs/connectathon test:

host: gigabyte-r120-04
distro: RHEL-9.0.0-20210626.0
kernel: 5.13.0-0.rc7.51.el9
task: /kernel/filesystems/nfs/connectathon 3.0-95
https://beaker.engineering.redhat.com/recipes/10205916#task128050274
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/06/55108/5510872/10205916/console.log

---%<-snip->%---
[ 2776.764798] restraintd[1295]: ** Running task: 128050274 [/kernel/filesystems/nfs/connectathon]
[-- MARK -- Mon Jun 28 04:45:00 2021]
[ 2837.041406] restraintd[1295]: *** Current Time: Mon Jun 28 00:45:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 2868.631167] FS-Cache: Loaded
[ 2868.712349] FS-Cache: Netfs 'nfs' registered for caching
[ 2869.091058] NFS: Registering the id_resolver key type
[ 2869.096128] Key type id_resolver registered
[ 2869.100331] Key type id_legacy registered
[ 2869.137132] NFS: Server rhel5-nfs.rhts.eng.bos.redhat.com reports our clientid is in use
[ 2869.145217] NFS: state manager: lease expired failed on NFSv4 server rhel5-nfs.rhts.eng.bos.redhat.com with error 1
[ 2875.989045] systemd-sysv-generator[105153]: SysV service '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[ 2897.032131] restraintd[1295]: *** Current Time: Mon Jun 28 00:46:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 2928.380016] NFS: Server rhel5-nfs.rhts.eng.bos.redhat.com reports our clientid is in use
[ 2928.388129] NFS: state manager: lease expired failed on NFSv4 server rhel5-nfs.rhts.eng.bos.redhat.com with error 1
[ 2957.023341] restraintd[1295]: *** Current Time: Mon Jun 28 00:47:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 2958.964592] NFS: Server rhel5-nfs.rhts.eng.bos.redhat.com reports our clientid is in use
[ 2958.972701] NFS: state manager: lease expired failed on NFSv4 server rhel5-nfs.rhts.eng.bos.redhat.com with error 1
[ 3017.031345] restraintd[1295]: *** Current Time: Mon Jun 28 00:48:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3077.032030] restraintd[1295]: *** Current Time: Mon Jun 28 00:49:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[-- MARK -- Mon Jun 28 04:50:00 2021]
[ 3137.031627] restraintd[1295]: *** Current Time: Mon Jun 28 00:50:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3197.039311] restraintd[1295]: *** Current Time: Mon Jun 28 00:51:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3257.038970] restraintd[1295]: *** Current Time: Mon Jun 28 00:52:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3317.038353] restraintd[1295]: *** Current Time: Mon Jun 28 00:53:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3377.021116] restraintd[1295]: *** Current Time: Mon Jun 28 00:54:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[-- MARK -- Mon Jun 28 04:55:00 2021]
[ 3437.029087] restraintd[1295]: *** Current Time: Mon Jun 28 00:55:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3497.028485] restraintd[1295]: *** Current Time: Mon Jun 28 00:56:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3557.036514] restraintd[1295]: *** Current Time: Mon Jun 28 00:57:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3617.029095] restraintd[1295]: *** Current Time: Mon Jun 28 00:58:12 2021 Localwatchdog at: Mon Jun 28 04:44:12 2021
[ 3628.182248] Unable to handle kernel paging request at virtual address 5a9cda118662f8fe
[ 3628.190185] Mem abort info:
[ 3628.193078] ESR = 0x96000004
[ 3628.196268] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3628.201654] SET = 0, FnV = 0
[ 3628.204712] EA = 0, S1PTW = 0
[ 3628.207843] Data abort info:
[ 3628.210712] ISV = 0, ISS = 0x00000004
[ 3628.214542] CM = 0, WnR = 0
[ 3628.217498] [5a9cda118662f8fe] address between user and kernel address ranges
[ 3628.224628] Internal error: Oops: 96000004 [#1] SMP
[ 3628.229496] Modules linked in: nfsv3 nfs_acl ib_core rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs md4 cifs libdes libarc4 dns_resolver nls_koi8_u nls_cp932 ts_kmp nls_utf8 rfkill sunrpc vfat fat ast drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper cec fb_sys_fops syscopyarea cavium_rng_vf sysfillrect ipmi_ssif ipmi_devintf sysimgblt ipmi_msghandler cavium_rng thunderx_edac drm fuse xfs libcrc32c nicvf nicpf crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce thunder_bgx i2c_thunderx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod aes_neon_bs [last unloaded: zram]
[ 3628.284708] CPU: 10 PID: 122817 Comm: runtest.sh Tainted: G X --------- --- 5.13.0-0.rc7.51.el9.aarch64 #1
[ 3628.295653] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019
[ 3628.303037] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[ 3628.309034] pc : kmem_cache_alloc+0xd0/0x2e0
[ 3628.313304] lr : kmem_cache_alloc+0xa0/0x2e0
[ 3628.317565] sp : ffff800016acfb50
[ 3628.320869] x29: ffff800016acfb50 x28: 0000000001200000 x27: 00000000000006c7
[ 3628.328000] x26: ffff00011a1902c0 x25: ffff800011fb0000 x24: ffff8000101c7ca0
[ 3628.335130] x23: ffff000100015a00 x22: 0000000000000dc0 x21: 0000000000000000
[ 3628.342261] x20: ffff8000117144b0 x19: ffff000100015a00 x18: 0000000000000000
[ 3628.349391] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 3628.356522] x14: ffff00014d004d80 x13: 0000000000000000 x12: 0000000000000000
[ 3628.363652] x11: 0000000000000000 x10: ffff00011b258a00 x9 : ffff8000103486f4
[ 3628.370782] x8 : ffff000158e73480 x7 : 0000000000000000 x6 : 0000000000000020
[ 3628.377912] x5 : 0000000000000000 x4 : ffff000ff8f944b0 x3 : 9f231e39f54f625d
[ 3628.385043] x2 : 0000000000000028 x1 : fef8628611da9c5a x0 : 5a9cda118662f8d6
[ 3628.392174] Call trace:
[ 3628.394610] kmem_cache_alloc+0xd0/0x2e0
[ 3628.398524] __delayacct_tsk_init+0x30/0x50
[ 3628.402699] copy_process+0xc50/0x1094
[ 3628.406439] kernel_clone+0x98/0x474
[ 3628.410004] __do_sys_clone+0x70/0xac
[ 3628.413657] __arm64_sys_clone+0x2c/0x40
[ 3628.417570] invoke_syscall.constprop.0+0x58/0xf0
[ 3628.422268] el0_svc_common.constprop.0+0x5c/0x164
[ 3628.427051] do_el0_svc+0x34/0xcc
[ 3628.430357] el0_svc+0x2c/0x90
[ 3628.433403] el0_sync_handler+0xa4/0x150
[ 3628.437316] el0_sync+0x198/0x1c0
[ 3628.440625] Code: b9402a62 f9405e63 8b020001 dac00c21 (f862681a)
[ 3628.446730] ---[ end trace 10e0701d5830169d ]---
[ 3628.451337] Kernel panic - not syncing: Oops: Fatal exception
[ 3628.457085] SMP: stopping secondary CPUs
[ 3628.461006] Kernel Offset: disabled
[ 3628.464482] CPU features: 0x00180251,20800a40
[ 3628.468829] Memory Limit: none
[ 3628.471886] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
---%<-snip->%---

Best,
pbunyan

steved,

Please have a look at this issue seen while running nfs/connectathon testing.
Best,
pbunyan

All,

Here is a reproducer:

host: qualcomm-amberwing-rep-17
distro: RHEL-9.0.0-20210714.2
kernel: 5.13.0-1.el9
task: kernel/distribution/ltp/lite 20210524-1
https://beaker.engineering.redhat.com/recipes/10308318#task128920979
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/07/55814/5581485/10308318/console.log

---%<-snip->%---
[ 5302.317146] restraintd[1535]: ** Installing dependencies
[ 5413.001814] restraintd[1535]: ** Running task: 128920979 [/kernel/distribution/ltp/lite]
[ 5473.044833] restraintd[1535]: *** Current Time: Thu Jul 15 21:18:37 2021 Localwatchdog at: Fri Jul 16 00:17:36 2021
[ 5533.027740] restraintd[1535]: *** Current Time: Thu Jul 15 21:19:36 2021 Localwatchdog at: Fri Jul 16 00:17:36 2021
[-- MARK -- Fri Jul 16 01:20:00 2021]
[ 5593.042218] restraintd[1535]: *** Current Time: Thu Jul 15 21:20:37 2021 Localwatchdog at: Fri Jul 16 00:17:36 2021
[ 5638.140444] Unable to handle kernel paging request at virtual address b2c34797740ca191
[ 5638.147427] Mem abort info:
[ 5638.150184] ESR = 0x96000004
[ 5638.153249] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5638.158524] SET = 0, FnV = 0
[ 5638.161555] EA = 0, S1PTW = 0
[ 5638.164699] Data abort info:
[ 5638.167545] ISV = 0, ISS = 0x00000004
[ 5638.171365] CM = 0, WnR = 0
[ 5638.174325] [b2c34797740ca191] address between user and kernel address ranges
[ 5638.181454] Internal error: Oops: 96000004 [#1] SMP
[ 5638.186314] Modules linked in: sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nf_tables nfnetlink ext4 mbcache jbd2 nfs_layout_nfsv41_files nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs md4 cifs libdes libarc4 dns_resolver nls_koi8_u nls_cp932 ts_kmp nls_utf8 rfkill sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler cppc_cpufreq fuse xfs libcrc32c crct10dif_ce mlx5_core ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt mlxfw psample sdhci_acpi tls ahci_platform sdhci libahci_platform mmc_core qcom_emac hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod aes_neon_bs [last unloaded: stap_cce5ee1d3f9699422cb35c3ced2b7d_161026]
[ 5638.247940] CPU: 26 PID: 206538 Comm: make Tainted: G OE X --------- --- 5.13.0-1.el9.aarch64 #1
[ 5638.257742] Hardware name: WIWYNN Qualcomm Centriq 2400 Reference Evaluation Platform CV90-LA115-P11/Qualcomm Centriq 2400 Customer Reference Board, BIOS
[ 5638.271557] pstate: a0400005 (NzCv daif +PAN -UAO -TCO BTYPE=--)
[ 5638.277528] pc : kmem_cache_alloc+0xd0/0x2e0
[ 5638.281781] lr : kmem_cache_alloc+0xa0/0x2e0
[ 5638.286035] sp : ffff80003748fc10
[ 5638.289339] x29: ffff80003748fc10 x28: ffff3fda71962280 x27: 000000000007e952
[ 5638.296451] x26: 0000000000000000 x25: ffffb31b59e90000 x24: ffffb31b58371028
[ 5638.303576] x23: ffff3fda00014000 x22: 0000000000000dc0 x21: ffff3fda468cec00
[ 5638.310687] x20: ffffb31b595f3670 x19: ffff3fda00014000 x18: 0000000000000000
[ 5638.317806] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 5638.324923] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 5638.332042] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : ffffb31b582260d8
[ 5638.339160] x8 : ffff3fda468cece8 x7 : 0000000000000000 x6 : 0000000000000020
[ 5638.346278] x5 : 0000000000000030 x4 : ffff3ff0f8783670 x3 : 4afc49b5c25a382a
[ 5638.353396] x2 : 0000000000000008 x1 : 91a10c749747c3b2 x0 : b2c34797740ca189
[ 5638.360527] Call trace:
[ 5638.362963] kmem_cache_alloc+0xd0/0x2e0
[ 5638.366851] security_file_alloc+0x38/0xa4
[ 5638.370931] __alloc_file+0x60/0xf0
[ 5638.374403] alloc_empty_file+0x6c/0x10c
[ 5638.378309] alloc_file+0x38/0x144
[ 5638.381694] alloc_file_clone+0x28/0x60
[ 5638.385514] create_pipe_files+0x108/0x1dc
[ 5638.389594] do_pipe2+0x4c/0x164
[ 5638.392806] __arm64_sys_pipe2+0x28/0x3c
[ 5638.396712] invoke_syscall.constprop.0+0x58/0xf0
[ 5638.401399] el0_svc_common.constprop.0+0x160/0x164
[ 5638.406260] do_el0_svc+0x34/0xcc
[ 5638.409559] el0_svc+0x2c/0x90
[ 5638.412597] el0_sync_handler+0xa4/0x150
[ 5638.416503] el0_sync+0x198/0x1c0
[ 5638.419817] Code: b9402a62 f9405e63 8b020001 dac00c21 (f862681a)
[ 5638.425913] ---[ end trace 2b8bc703f0fc168c ]---
[ 5638.430498] Kernel panic - not syncing: Oops: Fatal exception
[ 5638.436224] SMP: stopping secondary CPUs
[ 5638.440166] Kernel Offset: 0x331b47ee0000 from 0xffff800010000000
[ 5638.446191] PHYS_OFFSET: 0xfff0c02700000000
[ 5638.450364] CPU features: 0x00000251,21000840
[ 5638.454698] Memory Limit: none
[ 5638.457771] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
[-- MARK -- Fri Jul 16 01:25:00 2021]
---%<-snip->%---

smeisner,

Can you help with proper assignment of this BZ?

Best,
pbunyan

Hi Paul,

Could you boot the kernel with slub_debug=FPZU added to the kernel boot line and try to reproduce the problem?

This should hopefully capture use-after-free and out-of-bounds writes, as well as adding tracking info (stack backtraces of allocs and frees).

If you could capture a crashdump with slub_debug set I could take a look.

Thanks,
Chris

(In reply to Chris von Recklinghausen from comment #6)
> Hi Paul,
>
> Could you boot the kernel with slub_debug=FPZU added to the kernel boot line
> and try to reproduce the problem?
>
> This should hopefully capture use-after-free and out-of-bounds writes, as
> well as adding tracking info (stack backtraces of allocs and frees).
>

Agreed,
This one smells like a use-after-free. slub_debug should help prove or refute that guess.

-- Rafael

All,

I was not able to reproduce this issue using a recent distro and kernel:

distro: RHEL-9.0.0-20220128.1
kernel: 5.14.0-48.el9
https://beaker.engineering.redhat.com/jobs/6277892
https://beaker.engineering.redhat.com/jobs/6277891
https://beaker.engineering.redhat.com/jobs/6277887

The failures seen in those jobs are other known issues unrelated to BZ1977916.
I will close this BZ and reopen it if the issue resurfaces.

Best,
pbunyan
|
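A side note for anyone reading the two oops dumps above: the register values can be cross-checked with a few lines of Python. The sketch below is not part of the original report. It decodes the ESR value using the standard ESR_ELx field layout and checks two relationships that hold in both dumps: the faulting address equals x0 + x2, and x1 is the byte-swapped faulting address. The helper names are made up for illustration, and the canonical-address check assumes a 48-bit virtual address layout, which this bug does not state.

```python
"""Cross-check the register dumps from the two oopses in this bug."""

# Values copied verbatim from the console logs quoted in the description.
OOPSES = [
    {   # gigabyte-r120-04, kernel 5.13.0-0.rc7.51.el9
        "esr": 0x96000004,
        "fault": 0x5a9cda118662f8fe,
        "x0": 0x5a9cda118662f8d6,
        "x1": 0xfef8628611da9c5a,
        "x2": 0x28,
    },
    {   # qualcomm-amberwing-rep-17, kernel 5.13.0-1.el9
        "esr": 0x96000004,
        "fault": 0xb2c34797740ca191,
        "x0": 0xb2c34797740ca189,
        "x1": 0x91a10c749747c3b2,
        "x2": 0x08,
    },
]


def decode_esr(esr):
    """Split an ESR_ELx value into the fields the kernel prints (hypothetical helper)."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # 0 means the access was a read
        "dfsc": iss & 0x3F,            # data fault status code
    }


def byteswap64(value):
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


def is_canonical_48bit(addr):
    """True if the top 16 bits are all zeros or all ones (ASSUMES 48-bit VAs)."""
    return (addr >> 48) in (0x0000, 0xFFFF)


for oops in OOPSES:
    fields = decode_esr(oops["esr"])
    print(
        f"EC={fields['ec']:#x} IL={fields['il']} "
        f"DFSC={fields['dfsc']:#x} WnR={fields['wnr']}"
    )
    print("  fault == x0 + x2:         ", oops["x0"] + oops["x2"] == oops["fault"])
    print("  x1 == byteswap(fault):    ", byteswap64(oops["fault"]) == oops["x1"])
    print("  fault address canonical:  ", is_canonical_48bit(oops["fault"]))
```

In both dumps the small offset in x2 looks sane and the base pointer in x0 is the non-canonical value, which matches the "address between user and kernel address ranges" message. What put that value there is not something these logs can show, which is why the slub_debug run was requested.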