Bug 1948381
| Summary: | [RHEL9 pmem]: general protection fault, probably for non-canonical address 0xdffffc00007c8031: 0000 [#1] SMP KASAN PTI | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | xiaoli feng <xifeng> |
| Component: | kernel | Assignee: | fs-maint |
| kernel sub component: | File Systems | QA Contact: | xiaoli feng <xifeng> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | bfoster, cmaiolin, dhowells, esandeen, feliu, fsorenso, mszeredi, swhiteho, xzhou |
| Version: | 9.0 | | |
| Target Milestone: | beta | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-21 01:26:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1971841 | | |
Description
xiaoli feng
2021-04-12 06:30:18 UTC
I gave this a quick test on upstream v5.12.0-rc4 (with KASAN enabled) and couldn't reproduce. The RHEL9 baseline referenced in the error is v5.11.0, so it may be worth checking whether this has already been resolved upstream the next time the RHEL9 baseline kernel updates.

Hi Brian,

I also hit a "general protection fault"; is it the same issue as this bug?

job: https://beaker.engineering.redhat.com/recipes/10203224#task128019466
console: https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/06/55091/5509187/10203224/console.log

[ 67.200089] general protection fault, probably for non-canonical address 0x36c2ecee7f737b1e: 0000 [#1] SMP PTI
[ 67.484988] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[ 67.532719] CPU: 10 PID: 1352 Comm: systemd-udevd Tainted: G X --------- --- 5.13.0-0.rc4.33.el9.x86_64 #1
[ 67.532727] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 10/27/2015
[ 67.532729] RIP: 0010:kmem_cache_alloc_trace+0x98/0x220
[ 67.564256] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[ 67.615215] Code: 2e c4 0f 56 49 8b 00 49 83 78 10 00 48 89 44 24 08 0f 84 4d 01 00 00 48 85 c0 0f 84 44 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31
[ 67.615220] RSP: 0018:ffffbd6e89703cf0 EFLAGS: 00010206
[ 67.615225] RAX: 36c2ecee7f737ade RBX: 0000000000000000 RCX: 36c2ecee7f737b1e
[ 67.871307] RDX: 00000000000000b4 RSI: 0000000000000dc0 RDI: 00000000000310e0
[ 67.871310] RBP: ffff951500042700 R08: ffff951c6f8310e0 R09: 0000000000000000
[ 67.871311] R10: ffffbd6e89841fff R11: 000fffffffe00000 R12: 0000000000000dc0
[ 67.871312] R13: ffff951500042700 R14: 0000000000000268 R15: ffffffffaa0987d2
[ 67.871314] FS: 00007fe7d3a79b40(0000) GS:ffff951c6f800000(0000) knlGS:0000000000000000
[ 67.871316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.871318] CR2: 00007fe7d2653000 CR3: 0000000110e56006 CR4: 00000000001706e0
[ 67.871321] Call Trace:
[ 68.207803] pkcs7_parse_message+0x32/0x190
[ 68.227060] verify_pkcs7_signature+0x24/0x80
[ 68.246937] mod_verify_sig+0x85/0xb0
[ 68.263818] load_module+0x336/0xb60
[ 68.280191] ? __vmalloc_node_range+0x7f/0xe0
[ 68.300607] __do_sys_init_module+0x12e/0x1b0
[ 68.320600] do_syscall_64+0x40/0x80
[ 68.336971] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 68.360017] RIP: 0033:0x7fe7d46a427e
[ 68.376286] Code: 48 8b 0d f5 1b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c2 1b 0c 00 f7 d8 64 89 01 48
[ 68.466186] RSP: 002b:00007fff69a87138 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 68.503536] RAX: ffffffffffffffda RBX: 000055f9bb7549e0 RCX: 00007fe7d46a427e
[ 68.536208] RDX: 00007fe7d47f932c RSI: 000000000000cd08 RDI: 000055f9bb7951c0
[ 68.568812] RBP: 000055f9bb7951c0 R08: 000055f9bb756300 R09: 00007fff69a83ce8
[ 68.602313] R10: 000055fce4ee1786 R11: 0000000000000246 R12: 00007fe7d47f932c
[ 68.635182] R13: 000055f9bb7bfd00 R14: 0000000000000007 R15: 000055f9bb7548e0
[ 68.667953] Modules linked in: acpi_power_meter fuse drm ip_tables xfs libcrc32c mlx4_en sd_mod crct10dif_pclmul t10_pi crc32_pclmul cxgb4 csiostor crc32c_intel mlx4_core ghash_clmulni_intel tls scsi_transport_fc tg3 hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[ 68.784565] ---[ end trace ed2a7ccf91ba5ef1 ]---
[ 68.810362] dca service started, version 1.12.1
[ 68.814836] RIP: 0010:kmem_cache_alloc_trace+0x98/0x220
[ 68.855556] Code: 2e c4 0f 56 49 8b 00 49 83 78 10 00 48 89 44 24 08 0f 84 4d 01 00 00 48 85 c0 0f 84 44 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31
[ 69.042996] RSP: 0018:ffffbd6e89703cf0 EFLAGS: 00010206
[ 69.066837] RAX: 36c2ecee7f737ade RBX: 0000000000000000 RCX: 36c2ecee7f737b1e
[ 69.099801] RDX: 00000000000000b4 RSI: 0000000000000dc0 RDI: 00000000000310e0
[ 69.132419] RBP: ffff951500042700 R08: ffff951c6f8310e0 R09: 0000000000000000
[ 69.165014] R10: ffffbd6e89841fff R11: 000fffffffe00000 R12: 0000000000000dc0
[ 69.197728] R13: ffff951500042700 R14: 0000000000000268 R15: ffffffffaa0987d2
[ 69.230454] FS: 00007fe7d3a79b40(0000) GS:ffff951c6f800000(0000) knlGS:0000000000000000
[ 69.267482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 69.293778] CR2: 00007fe7d2653000 CR3: 0000000110e56006 CR4: 00000000001706e0
[ 69.326512] Kernel panic - not syncing: Fatal exception
[ 69.350423] Kernel Offset: 0x28c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 69.407502] ---[ end Kernel panic - not syncing: Fatal exception ]---

(In reply to Fei Liu from comment #2)
> Hi Brian
> I also hit a "general protection fault"; is it the same issue as this bug?

This looks totally unrelated to the original issue. Did you hit it with the same steps as the original report?

(In reply to Carlos Maiolino from comment #3)
> This looks totally unrelated to the original issue. Did you hit it with the same steps as the original report?

Actually, I did not try those steps; I just hit it in a beaker job.

(In reply to Fei Liu from comment #4)
> Actually, I did not try those steps; I just hit it in a beaker job.

OK. I don't see anything in your protection fault that is related to this bz itself. If anything else makes you suspect it is related to this bz, let me know. Cheers.

I'm trying to reproduce this with no success using kernel 5.11.

Did you try reproducing it manually, without a beaker job?

Also, you mentioned you had reproduced it 3 times. Were those three times consecutive, or 3 times over several attempts?

I'm trying to get a system with RHEL9 where I can actually try to reproduce it manually, but it isn't proving an easy task.

(In reply to Carlos Maiolino from comment #6)
> Did you try reproducing it manually, without a beaker job?

OK, I will try to reproduce it manually.

> Also, you mentioned you had reproduced it 3 times. Were those three times consecutive, or 3 times over several attempts?

No, it's 3 times over several attempts.

*** Bug 1979165 has been marked as a duplicate of this bug. ***

This issue can also be reproduced on 5.13.0-0.rc7.51.el9.x86_64+debug.

vmcore files:
http://ibm-x3250m4-03.rhts.eng.pek2.redhat.com/vmcore/xifeng/5.13.0-0.rc7.51.el9.x86_64+debug/5531785/hp-dl360g9-15.rhts.eng.pek2.redhat.com/10.73.4.139-2021-07-04-18:21:21/vmcore
http://ibm-x3250m4-03.rhts.eng.pek2.redhat.com/vmcore/xifeng/5.13.0-0.rc7.51.el9.x86_64+debug/5531785/hp-dl360g9-15.rhts.eng.pek2.redhat.com/10.73.4.139-2021-07-04-18:21:21/vmcore-dmesg.txt

(In reply to xiaoli feng from comment #9)
> This issue can also be reproduced on 5.13.0-0.rc7.51.el9.x86_64+debug.

Thanks for the report and the vmcore; we now have something to investigate further. Meanwhile, how reliable is the reproducer? Can you hit it every time? Could you please share the steps to start a beaker job with the reproducer? I have a virtual machine that has been running your reproducer for days and still couldn't hit it, although I'm using emulated pmem, not real pmem hardware.

Can someone point me to the debuginfo rpms for 5.13.0-0.rc7.51.el9.x86_64+.x86_64.debug? kernel-debug-5.13.0-0.rc7.51.el9.x86_64+.x86_64.debug looks to be a custom respin of kernel-debug-5.13.0-0.rc7.51.el9.x86_64.

Never mind. The kernel version number comes up differently from the version number of the rpm:

$ rpm -qp kernel-debug-debuginfo-5.13.0-0.rc7.51.el9.x86_64.rpm kernel-debuginfo-common-x86_64-5.13.0-0.rc7.51.el9.x86_64.rpm
kernel-debug-debuginfo-5.13.0-0.rc7.51.el9.x86_64
kernel-debuginfo-common-x86_64-5.13.0-0.rc7.51.el9.x86_64

$ rpm -qlp kernel-debug-debuginfo-5.13.0-0.rc7.51.el9.x86_64.rpm kernel-debuginfo-common-x86_64-5.13.0-0.rc7.51.el9.x86_64.rpm | grep vmlinux$
/usr/lib/debug/lib/modules/5.13.0-0.rc7.51.el9.x86_64+debug/vmlinux

For vmcore 189530657 on galvatron.cee.redhat.com, I made a script to start crash with the necessary args:

$ /cores/retrace/tasks/189530657/run_crash

or dwysocha also fixed retrace by pointing to the vmlinux in the directory:

$ retrace-server-interact 189530657 crash

[ 167.450145] general protection fault, probably for non-canonical address 0xdffffc00007c8031: 0000 [#1] SMP KASAN PTI
[ 167.500143] KASAN: probably user-memory-access in range [0x0000000003e40188-0x0000000003e4018f]
[ 167.541390] CPU: 25 PID: 235 Comm: kworker/25:1 Kdump: loaded Tainted: G X --------- --- 5.13.0-0.rc7.51.el9.x86_64+debug #1
[ 167.599694] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
[ 167.637878] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[ 167.665487] RIP: 0010:inode_switch_wbs_work_fn+0x356/0x11a0
[ 167.691136] Code: c0 b4 0a 85 a4 48 bb 00 00 00 00 00 fc ff df 48 89 c5 83 e0 07 48 c1 ed 03 49 89 c7 48 01 dd 49 8d 7e 08 48 89 f8 48 c1 e8 03 <80> 3c 18 00 0f 85 0b 0c 00 00 49 8b 46 08 be 08 00 00 00 a8 01 48
[ 167.777735] RSP: 0000:ffffc9000751fbc8 EFLAGS: 00010002
[ 167.801685] RAX: 00000000007c8031 RBX: dffffc0000000000 RCX: 0000000000000001
[ 167.834915] RDX: 0000000000000006 RSI: ffff8888bfc15a10 RDI: 0000000003e40189
[ 167.867748] RBP: fffffbfff490a156 R08: ffff8884ee735800 R09: ffffffffa4850027
[ 167.900751] R10: fffffbfff490a004 R11: 0000000000000001 R12: ffff8884de02a000
[ 167.933931] R13: ffff8888bfc157b0 R14: 0000000003e40181 R15: 0000000000000004
[ 167.966759] FS: 0000000000000000(0000) GS:ffff888fdd400000(0000) knlGS:0000000000000000
[ 168.003981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 168.030543] CR2: 00007efc3a590000 CR3: 00000008d0530005 CR4: 00000000001706e0
[ 168.064145] Call Trace:
[ 168.075591] ? sched_clock_cpu+0x18/0x170
[ 168.094176] ? wbc_attach_and_unlock_inode+0x210/0x210
[ 168.117753] process_one_work+0x8b2/0x1480
[ 168.136524] ? pwq_dec_nr_in_flight+0x260/0x260
[ 168.157314] ? __lock_contended+0x910/0x910
[ 168.176527] ? worker_thread+0x150/0xf80
[ 168.194583] worker_thread+0x590/0xf80
[ 168.211966] ? __kthread_parkme+0xcb/0x1b0
[ 168.230758] ? process_one_work+0x1480/0x1480
[ 168.250704] kthread+0x368/0x440
[ 168.265525] ? _raw_spin_unlock_irq+0x24/0x30
[ 168.285489] ? set_mems_allowed+0x300/0x300
[ 168.304926] ret_from_fork+0x22/0x30
0xffffffffa1276b1a <inode_switch_wbs_work_fn+0x2fa>: jb 0xffffffffa12775a8 <inode_switch_wbs_work_fn+0xd88>
/usr/src/debug/kernel-5.13.0-0.rc7.51/linux-5.13.0-0.rc7.51.el9.x86_64/fs/fs-writeback.c: 405
0xffffffffa1276b20 <inode_switch_wbs_work_fn+0x300>: lea 0xb8(%rsp),%rax
0xffffffffa1276b28 <inode_switch_wbs_work_fn+0x308>: xor %edx,%edx
0xffffffffa1276b2a <inode_switch_wbs_work_fn+0x30a>: mov $0xffffffffffffffff,%rsi
0xffffffffa1276b31 <inode_switch_wbs_work_fn+0x311>: mov %rax,%rdi
0xffffffffa1276b34 <inode_switch_wbs_work_fn+0x314>: mov %rax,0x20(%rsp)
0xffffffffa1276b39 <inode_switch_wbs_work_fn+0x319>: callq 0xffffffffa17a4f40 <xas_find_marked>
0xffffffffa1276b3e <inode_switch_wbs_work_fn+0x31e>: mov %rax,%r14
0xffffffffa1276b41 <inode_switch_wbs_work_fn+0x321>: test %rax,%rax
0xffffffffa1276b44 <inode_switch_wbs_work_fn+0x324>: je 0xffffffffa1276be9 <inode_switch_wbs_work_fn+0x3c9>
0xffffffffa1276b4a <inode_switch_wbs_work_fn+0x32a>: mov $0xffffffffa4850ab4,%rax
0xffffffffa1276b51 <inode_switch_wbs_work_fn+0x331>: movabs $0xdffffc0000000000,%rbx
0xffffffffa1276b5b <inode_switch_wbs_work_fn+0x33b>: mov %rax,%rbp
0xffffffffa1276b5e <inode_switch_wbs_work_fn+0x33e>: and $0x7,%eax
0xffffffffa1276b61 <inode_switch_wbs_work_fn+0x341>: shr $0x3,%rbp
0xffffffffa1276b65 <inode_switch_wbs_work_fn+0x345>: mov %rax,%r15
0xffffffffa1276b68 <inode_switch_wbs_work_fn+0x348>: add %rbx,%rbp
/usr/src/debug/kernel-5.13.0-0.rc7.51/linux-5.13.0-0.rc7.51.el9.x86_64/./include/linux/page-flags.h: 334
0xffffffffa1276b6b <inode_switch_wbs_work_fn+0x34b>: lea 0x8(%r14),%rdi
0xffffffffa1276b6f <inode_switch_wbs_work_fn+0x34f>: mov %rdi,%rax
0xffffffffa1276b72 <inode_switch_wbs_work_fn+0x352>: shr $0x3,%rax
0xffffffffa1276b76 <inode_switch_wbs_work_fn+0x356>: cmpb $0x0,(%rax,%rbx,1)
cmpb $0x0,(%rax,%rbx,1)
the address was 0xdffffc00007c8031
[ 167.801685] RAX: 00000000007c8031 RBX: dffffc0000000000 RCX: 0000000000000001
the address in the instruction at 0xffffffffa1276b4a:
ffffffffa4850ab4 (D) nr_cpu_ids
not sure what 0xdffffc0000000000 is
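For what it's worth, 0xdffffc0000000000 matches the x86_64 shadow offset for generic KASAN (KASAN_SHADOW_OFFSET): each 8 bytes of address space map to one shadow byte at (addr >> 3) + offset, and the `cmpb $0x0,(%rax,%rbx,1)` above is exactly that compiler-inserted shadow-byte check. A minimal userspace sketch of the arithmetic (the constant and shift are the upstream x86_64 values; the sample address is RDI from the oops):

```c
#include <stdio.h>

/* Generic KASAN on x86_64: one shadow byte per 8 bytes of address space. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000UL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Same mapping the instrumented check uses: shadow = (addr >> 3) + offset. */
static unsigned long kasan_mem_to_shadow(unsigned long addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

int main(void)
{
    /* RDI at the fault: the bogus "struct page" pointer (R14) plus 8. */
    unsigned long bad_access = 0x0000000003e40189UL;

    /* Prints 0xdffffc00007c8031 -- the non-canonical faulting address,
     * which is why KASAN reports "user-memory-access in range
     * [0x0000000003e40188-0x0000000003e4018f]".                        */
    printf("shadow = %#lx\n", kasan_mem_to_shadow(bad_access));
    return 0;
}
```

The register dump shows the same thing directly: RAX = 0x7c8031 is RDI >> 3, RBX holds the shadow offset, and the cmpb operand RAX+RBX is 0xdffffc00007c8031.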
399
400 /*
401 * Count and transfer stats. Note that PAGECACHE_TAG_DIRTY points
402 * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
403 * pages actually under writeback.
404 */
405 xas_for_each_marked(&xas, page, ULONG_MAX, PAGECACHE_TAG_DIRTY) {
406 if (PageDirty(page)) {
407 dec_wb_stat(old_wb, WB_RECLAIMABLE);
408 inc_wb_stat(new_wb, WB_RECLAIMABLE);
409 }
410 }
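The shadow check sits right where PageDirty() begins: the page-flags test policy first resolves compound_head(page), i.e. it loads the second word of struct page (offset 8), which lines up with the `lea 0x8(%r14),%rdi` above and the `mov 0x8(%r14),%rax` (49 8b 46 08) right after the cmpb in the Code: line. A simplified sketch of that shape (the *_sketch names and the bit index are illustrative, not the literal kernel macros, which go through TESTPAGEFLAG/PF_HEAD):

```c
#include <stdbool.h>

/* Illustrative layout: only the two words of struct page that matter here. */
struct page_sketch {
    unsigned long flags;         /* offset 0 */
    unsigned long compound_head; /* offset 8; bit 0 set => tail page */
};

/* Roughly what compound_head() does: the load at page+8 is the very first
 * dereference of the (here, garbage) page pointer, so it is the access
 * KASAN trapped on.                                                      */
static struct page_sketch *compound_head_sketch(struct page_sketch *page)
{
    unsigned long head = page->compound_head;

    if (head & 1)
        return (struct page_sketch *)(head - 1); /* tail -> head page */
    return page;
}

#define PG_DIRTY_BIT 3 /* illustrative bit index, not the real enum value */

static bool PageDirty_sketch(struct page_sketch *page)
{
    return compound_head_sketch(page)->flags & (1UL << PG_DIRTY_BIT);
}
```

So the fault happens before any dirty bit is ever read; the crash comes purely from treating R14 (0x3e40181) as a struct page pointer.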
the address in %rsi is ffff8888bfc15a10
crash> kmem -s ffff8888bfc15a10
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff8888c3c303c0 1840 8913 8944 559 32k xfs_inode
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea0022ff0400 ffff8888bfc10000 1 16 16 0
FREE / [ALLOCATED]
[ffff8888bfc15540]
crash> xfs_inode.i_vnode -ox ffff8888bfc15540
struct xfs_inode {
[ffff8888bfc157b0] struct inode i_vnode;
crash> files -d ffff88850873b8a8
DENTRY INODE SUPERBLK TYPE PATH
ffff88850873b8a8 ffff8888bfc157b0 ffff8888991c6000 REG /daxmnt/test-file
on this mount:
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff8888a2b6c900 ffff8888991c6000 xfs /dev/pmem0 /daxmnt
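One hedged reading of the data above: the entry handed to PageDirty() is 0x0000000003e40181 (R14), which has bit 0 set. In the xarray convention that tags a "value" (non-pointer) entry, and DAX stores exactly such value entries in the mapping's i_pages instead of struct page pointers, which fits the file living on the DAX-mounted /dev/pmem0. If that reading is right, the xas_for_each_marked() loop in inode_switch_wbs_work_fn() picked up a DAX value entry and dereferenced it as a struct page. The tagging helpers, matching the upstream shape in include/linux/xarray.h:

```c
#include <stdbool.h>

/* xarray value entries are tagged by setting bit 0 of the pointer word. */
static inline bool xa_is_value(const void *entry)
{
    return (unsigned long)entry & 1;
}

/* Recover the integer payload of a value entry. */
static inline unsigned long xa_to_value(const void *entry)
{
    return (unsigned long)entry >> 1;
}

/* R14 at the fault:
 *   xa_is_value((void *)0x3e40181) == true
 * so this is a tagged value entry (as DAX uses), not a struct page. */
```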
(In reply to Carlos Maiolino from comment #10)
> Meanwhile, how reliable is the reproducer? Can you hit it every time? Could you please share the steps to start a beaker job with the reproducer?

From my tests, it happens about twice every five runs; I can't hit it every time.

https://beaker.engineering.redhat.com/jobs/5543808 panic
https://beaker.engineering.redhat.com/jobs/5543809 panic
https://beaker.engineering.redhat.com/jobs/5543805 don't hit
https://beaker.engineering.redhat.com/jobs/5544126 don't hit
https://beaker.engineering.redhat.com/jobs/5544133 don't hit

The two panics were on the same host, fsqe-r430-02.lab.eng.bos.redhat.com. I then ran the same tests on it manually but couldn't hit the panic, so it's a little hard to reproduce; it hits more easily in beaker. If you want a quick way to start a beaker job, open https://beaker.engineering.redhat.com/jobs/clone?job_id=5544133 and click Queue. Thanks.

Has this ever been seen in later/released RHEL9 kernels?

Thanks,
-Eric

(In reply to Eric Sandeen from comment #16)
> Has this ever been seen in later/released RHEL9 kernels?

This hasn't been seen in kernel-debug-5.14.0-162.el9. Closing this bug.