Bug 1897329 - BUG: Bad page state in process systemctl pfn:1367199
Summary: BUG: Bad page state in process systemctl pfn:1367199
Keywords:
Status: CLOSED DUPLICATE of bug 1897330
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: core-kernel-bot
QA Contact: Kernel General QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-12 19:38 UTC by Mark Simmons
Modified: 2023-08-08 03:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-12 19:44:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Mark Simmons 2020-11-12 19:38:11 UTC
Description of problem: The RT kernel panics when running the stress-ng --memhotplug stressor. I confirmed that the regular RHEL kernel does not panic on the same test.
The RT kernel panics almost instantly with 1 stressor.
The regular RHEL kernel does not panic when running 16 stressors.


Version-Release number of selected component (if applicable):
Kernel:    4.18.0-247.rt14.12.el8.x86_64
Stress-ng: version 0.11.10


How reproducible:
The panic reproduces immediately every time I have run it.


Steps to Reproduce:
1. stress-ng --memhotplug 1 --timeout 10s --verbose
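
For context, the stress-ng manual describes --memhotplug as workers that offline and online memory hotplug regions, and on Linux those regions are exposed as memory blocks under /sys/devices/system/memory. The sketch below is not part of the original report; it is a minimal helper, assuming the standard sysfs layout, that tallies block states so the system can be compared before and after the command above. The function name memory_block_states is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumption: standard Linux sysfs location for memory hotplug blocks.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def memory_block_states(root=SYSFS_MEMORY):
    """Count memory blocks by the contents of their 'state' file."""
    counts = Counter()
    # Each block is a directory such as memory0, memory1, ... with a 'state' file.
    for state_file in sorted(Path(root).glob("memory[0-9]*/state")):
        try:
            counts[state_file.read_text().strip()] += 1
        except OSError:
            # The file may be unreadable while a block is being toggled.
            counts["unreadable"] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(memory_block_states().items()):
        print(f"{state}: {n}")
```

Running it once before and once after the stressor shows whether any blocks were left offline. The root argument exists only so the helper can be pointed at a test directory.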


Actual results:
Host panics with the following Call Trace:

[  178.966681] Offlined Pages 262144
[  178.987765] Offlined Pages 262144
[  179.001403] BUG: Bad page state in process systemctl  pfn:1367199
[  179.007503] page:ffffd7bc4d9c6640 refcount:2 mapcount:129 mapping:ffff9488d7ed9601 index:0x1
[  179.015937] anon flags: 0x10000000040048(uptodate|active|swapbacked)
[  179.022287] raw: 0010000000040048 dead000000000100 dead000000000200 ffff9488d7ed9601
[  179.030027] raw: 0000000000000001 0000000000000000 0000000200000080 ffff949b84ab4000
[  179.037764] page dumped because: page still charged to cgroup
[  179.043512] page->mem_cgroup:ffff949b84ab4000
[  179.047872] bad because of flags: 0x40048(uptodate|active|swapbacked)
[  179.054311] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support dcdbas intel_uncore intel_rapl_perf pcspkr lpc_ich i2c_i801 ipmi_ssif mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_vram_helper drm_ttm_helper crc32c_intel ttm ixgbe bnxt_en drm mdio ahci dca libahci i2c_algo_bit megaraid_sas tg3 libata dm_mirror dm_region_hash dm_log dm_mod
[  179.117891] CPU: 10 PID: 2124 Comm: systemctl Kdump: loaded Tainted: G          I      --------- -  - 4.18.0-247.rt14.12.el8.x86_64 #1
[  179.117892] Hardware name: Dell Inc. PowerEdge R440/0N28XX, BIOS 1.2.71 12/06/2017
[  179.117893] Call Trace:
[  179.117903]  dump_stack+0x5c/0x80
[  179.117908]  bad_page.cold.114+0x89/0xc6
[  179.117911]  get_page_from_freelist+0xa70/0x1720
[  179.117916]  __alloc_pages_nodemask+0x10d/0x2b0
[  179.117922]  alloc_pages_vma+0xc5/0x170
[  179.117927]  __handle_mm_fault+0x27c/0xa70
[  179.117931]  handle_mm_fault+0xd2/0x1e0
[  179.117936]  __do_page_fault+0x28e/0x5d0
[  179.117939]  do_page_fault+0x47/0x1b0
[  179.117943]  ? page_fault+0x8/0x30
[  179.117946]  page_fault+0x1e/0x30
[  179.117948] RIP: 0033:0x7fe3b12a2b24
[  179.117951] Code: 1f 80 00 00 00 00 48 8b 08 8b 50 08 4c 01 f9 48 83 fa 26 74 0a 48 83 fa 08 0f 85 65 10 00 00 48 8b 50 10 48 83 c0 18 4c 01 fa <48> 89 11 48 39 c3 77 d4 4d 8b b2 d0 01 00 00 4d 85 f6 0f 85 7e fa
[  179.117952] RSP: 002b:00007ffe1e5e8550 EFLAGS: 00010206
[  179.117954] RAX: 00007fe3adb39730 RBX: 00007fe3adb64f00 RCX: 00007fe3adfae028
[  179.117955] RDX: 00007fe3add3a428 RSI: 00007fe3b14bf988 RDI: 00007fe3adb65d10
[  179.117956] RBP: 00007ffe1e5e8650 R08: 00007fe3adb65d10 R09: 0000000000000000
[  179.117957] R10: 00007fe3b14aff00 R11: 00007fe3b14aff00 R12: 0000000000000000
[  179.117958] R13: 0000000000000000 R14: 00007fe3b14c0150 R15: 00007fe3adaec000
[  179.117960] Disabling lock debugging due to kernel taint
[  179.251263] list_del corruption. prev->next should be ffffd7bc4d9c6ec8, but was ffff948be0f745f0
[  179.256908] BUG: Bad page map in process systemd-run  pte:8000001367199865 pmd:14d7fbc067
[  179.256911] page:ffffd7bc4d9c6640 refcount:1 mapcount:-1 mapping:ffff9488d7ed9601 index:0x1
[  179.256913] anon flags: 0x10000000040068(uptodate|lru|active|swapbacked)
[  179.256915] raw: 0010000000040068 ffffd7bc4d9c6608 ffffd7bc4d9c6b48 ffff9488d7ed9601
[  179.256917] raw: 0000000000000001 0000000000000000 00000001fffffffe ffff949b84ab4000
[  179.256917] page dumped because: bad pte
[  179.256918] page->mem_cgroup:ffff949b84ab4000
[  179.256920] addr:00000000475a3cf8 vm_flags:08100071 anon_vma:00000000a1034350 mapping:00000000980b101d index:286 
[  179.257025] file:libsystemd-shared-239.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:xfs_vm_readpage [xfs]
[  179.257028] CPU: 5 PID: 2121 Comm: systemd-run Kdump: loaded Tainted: G    B     I      --------- -  - 4.18.0-247.rt14.12.el8.x86_64 #1
[  179.257029] Hardware name: Dell Inc. PowerEdge R440/0N28XX, BIOS 1.2.71 12/06/2017
[  179.257029] Call Trace:
[  179.257041]  dump_stack+0x5c/0x80
[  179.257048]  print_bad_pte.cold.110+0x6d/0xd5
[  179.257053]  ? __dec_node_state+0x69/0xf0
[  179.257056]  unmap_page_range+0x95d/0xbd0
[  179.257060]  unmap_vmas+0xae/0xd0
[  179.257065]  exit_mmap+0xaa/0x190
[  179.257073]  mmput+0x3a/0x100
[  179.257075]  do_exit+0x34b/0xbd0
[  179.257077]  ? handle_mm_fault+0xd2/0x1e0
[  179.257081]  ? syscall_trace_enter+0x202/0x310
[  179.257084]  do_group_exit+0x47/0xc0
[  179.257086]  __x64_sys_exit_group+0x14/0x20
[  179.257087]  do_syscall_64+0x87/0x1a0
[  179.257092]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  179.257095] RIP: 0033:0x7f12b0292b26
[  179.257099] Code: Bad RIP value.
[  179.257100] RSP: 002b:00007ffe38397838 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  179.257102] RAX: ffffffffffffffda RBX: 00007f12b0582880 RCX: 00007f12b0292b26
[  179.257103] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  179.257104] RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffec0
[  179.257104] R10: 00007ffe38397602 R11: 0000000000000246 R12: 00007f12b0582880
[  179.257105] R13: 0000000000000002 R14: 00007f12b058b348 R15: 0000000000000000
[  179.262375] BUG: Bad page state in process sh  pfn:1367199
[  179.262378] page:ffffd7bc4d9c6640 refcount:0 mapcount:-1 mapping:0000000000000000 index:0x1
[  179.262379] flags: 0x10000000000000()
[  179.262381] raw: 0010000000000000 dead000000000100 dead000000000200 0000000000000000
[  179.262382] raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
[  179.262383] page dumped because: nonzero mapcount
[  179.262384] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support dcdbas intel_uncore intel_rapl_perf pcspkr lpc_ich i2c_i801 ipmi_ssif mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_vram_helper drm_ttm_helper crc32c_intel ttm ixgbe bnxt_en drm mdio ahci dca libahci i2c_algo_bit megaraid_sas tg3 libata dm_mirror dm_region_hash dm_log dm_mod
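
The three reports above all refer to the same page. The sketch below is not part of the original report; it cross-checks the numbers in the log with plain arithmetic. It assumes 4 KiB pages, a 64-byte struct page, and that the second word of the last "raw:" line packs the reference count in its upper 32 bits and the stored map count (printed value minus one) in its lower 32 bits. Those assumptions are inferred from how the values in this log line up, not taken from the kernel source, and the helper names are hypothetical.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB pages on x86_64
STRUCT_PAGE_SIZE = 64    # assumption: sizeof(struct page) on this kernel


def to_signed32(value):
    """Interpret a 32-bit unsigned value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def split_counts(raw_word):
    """Return (refcount, mapcount) as printed in the 'page:' line."""
    refcount = to_signed32(raw_word >> 32)
    # The stored value appears to be biased by -1, so add 1 to get the printed one.
    mapcount = to_signed32(raw_word & 0xFFFFFFFF) + 1
    return refcount, mapcount


def pte_to_pfn(pte):
    """Extract the page frame number from an x86_64 PTE (bits 12..51)."""
    return (pte & ((1 << 52) - 1)) >> PAGE_SHIFT


pfn = 0x1367199
page_addr = 0xFFFFD7BC4D9C6640

# The PTE in the "Bad page map" report points at the same pfn.
print(hex(pte_to_pfn(0x8000001367199865)))

# The struct page address minus pfn * 64 gives a round base address.
print(hex(page_addr - pfn * STRUCT_PAGE_SIZE))

# Raw count words from the three reports, in order.
for raw in (0x0000000200000080, 0x00000001FFFFFFFE, 0x00000000FFFFFFFE):
    print(split_counts(raw))

# "Offlined Pages 262144" expressed in bytes.
print(262144 << PAGE_SHIFT)
```

Under these assumptions the decoded pairs are (2, 129), (1, -1) and (0, -1), matching the refcount and mapcount printed in the three reports, and each offline operation covers 1 GiB.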


Expected results:


Additional info:

Comment 1 Mark Simmons 2020-11-12 19:44:36 UTC

*** This bug has been marked as a duplicate of bug 1897330 ***

