Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1897329

Summary: BUG: Bad page state in process systemctl pfn:1367199
Product: Red Hat Enterprise Linux 8
Reporter: Mark Simmons <msimmons>
Component: kernel-rt
Assignee: core-kernel-bot <core-kernel-mgr>
kernel-rt sub component: Memory Management
QA Contact: Kernel General QE <kernel-general-qe>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: bhu, mm-maint, rt-maint, rt-qe
Version: 8.4
Flags: pm-rhel: mirror+
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-12 19:44:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mark Simmons 2020-11-12 19:38:11 UTC
Description of problem: The RT kernel panics when running the stress-ng --memhotplug stressor. I confirmed that the regular (non-RT) RHEL kernel does not panic on the same test.
The RT kernel panics almost instantly with 1 stressor.
The regular RHEL kernel does not panic even when running 16 stressors.


Version-Release number of selected component (if applicable):
Kernel:    4.18.0-247.rt14.12.el8.x86_64
Stress-ng: version 0.11.10


How reproducible:
Reproduces instantly every time I have run it.


Steps to Reproduce:
1. stress-ng --memhotplug 1 --timeout 10s --verbose
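The --memhotplug stressor offlines and onlines memory hotplug regions. To check whether plain hotplug, without stress-ng, is enough to trigger the bad page state, a manual offline/online cycle through the kernel's sysfs memory-block interface could be approximated as below. This is a hypothetical sketch, not stress-ng's implementation: the function name `cycle_memory_blocks` and the `SYSFS_MEM` override are illustrative, and on a real host it offlines live memory, so it must run as root on a test machine only.

```shell
#!/bin/bash
# Hypothetical manual approximation of a memory-hotplug cycle; this is NOT
# stress-ng's implementation. SYSFS_MEM can point at a scratch directory
# for a dry run; on a real host it must be run as root.
SYSFS_MEM=${SYSFS_MEM:-/sys/devices/system/memory}

# Offline and re-online every memory block that reports itself removable,
# then print how many blocks were cycled.
cycle_memory_blocks() {
    local blk cycled=0
    for blk in "$SYSFS_MEM"/memory[0-9]*; do
        [ -d "$blk" ] || continue
        # Skip blocks the kernel reports as not removable.
        [ "$(cat "$blk/removable" 2>/dev/null)" = "1" ] || continue
        # Offlining may fail (for example with EBUSY); only re-online
        # blocks whose offline request succeeded.
        if echo offline 2>/dev/null > "$blk/state"; then
            echo online > "$blk/state"
            cycled=$((cycled + 1))
        fi
    done
    echo "$cycled"
}
```

Running `cycle_memory_blocks` in a loop on the RT kernel while watching `dmesg` could show whether the corruption needs anything beyond the offline/online cycle itself.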


Actual results:
The host panics with the following call trace:

[  178.966681] Offlined Pages 262144
[  178.987765] Offlined Pages 262144
[  179.001403] BUG: Bad page state in process systemctl  pfn:1367199
[  179.007503] page:ffffd7bc4d9c6640 refcount:2 mapcount:129 mapping:ffff9488d7ed9601 index:0x1
[  179.015937] anon flags: 0x10000000040048(uptodate|active|swapbacked)
[  179.022287] raw: 0010000000040048 dead000000000100 dead000000000200 ffff9488d7ed9601
[  179.030027] raw: 0000000000000001 0000000000000000 0000000200000080 ffff949b84ab4000
[  179.037764] page dumped because: page still charged to cgroup
[  179.043512] page->mem_cgroup:ffff949b84ab4000
[  179.047872] bad because of flags: 0x40048(uptodate|active|swapbacked)
[  179.054311] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support dcdbas intel_uncore intel_rapl_perf pcspkr lpc_ich i2c_i801 ipmi_ssif mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_vram_helper drm_ttm_helper crc32c_intel ttm ixgbe bnxt_en drm mdio ahci dca libahci i2c_algo_bit megaraid_sas tg3 libata dm_mirror dm_region_hash dm_log dm_mod
[  179.117891] CPU: 10 PID: 2124 Comm: systemctl Kdump: loaded Tainted: G          I      --------- -  - 4.18.0-247.rt14.12.el8.x86_64 #1
[  179.117892] Hardware name: Dell Inc. PowerEdge R440/0N28XX, BIOS 1.2.71 12/06/2017
[  179.117893] Call Trace:
[  179.117903]  dump_stack+0x5c/0x80
[  179.117908]  bad_page.cold.114+0x89/0xc6
[  179.117911]  get_page_from_freelist+0xa70/0x1720
[  179.117916]  __alloc_pages_nodemask+0x10d/0x2b0
[  179.117922]  alloc_pages_vma+0xc5/0x170
[  179.117927]  __handle_mm_fault+0x27c/0xa70
[  179.117931]  handle_mm_fault+0xd2/0x1e0
[  179.117936]  __do_page_fault+0x28e/0x5d0
[  179.117939]  do_page_fault+0x47/0x1b0
[  179.117943]  ? page_fault+0x8/0x30
[  179.117946]  page_fault+0x1e/0x30
[  179.117948] RIP: 0033:0x7fe3b12a2b24
[  179.117951] Code: 1f 80 00 00 00 00 48 8b 08 8b 50 08 4c 01 f9 48 83 fa 26 74 0a 48 83 fa 08 0f 85 65 10 00 00 48 8b 50 10 48 83 c0 18 4c 01 fa <48> 89 11 48 39 c3 77 d4 4d 8b b2 d0 01 00 00 4d 85 f6 0f 85 7e fa
[  179.117952] RSP: 002b:00007ffe1e5e8550 EFLAGS: 00010206
[  179.117954] RAX: 00007fe3adb39730 RBX: 00007fe3adb64f00 RCX: 00007fe3adfae028
[  179.117955] RDX: 00007fe3add3a428 RSI: 00007fe3b14bf988 RDI: 00007fe3adb65d10
[  179.117956] RBP: 00007ffe1e5e8650 R08: 00007fe3adb65d10 R09: 0000000000000000
[  179.117957] R10: 00007fe3b14aff00 R11: 00007fe3b14aff00 R12: 0000000000000000
[  179.117958] R13: 0000000000000000 R14: 00007fe3b14c0150 R15: 00007fe3adaec000
[  179.117960] Disabling lock debugging due to kernel taint
[  179.251263] list_del corruption. prev->next should be ffffd7bc4d9c6ec8, but was ffff948be0f745f0
[  179.256908] BUG: Bad page map in process systemd-run  pte:8000001367199865 pmd:14d7fbc067
[  179.256911] page:ffffd7bc4d9c6640 refcount:1 mapcount:-1 mapping:ffff9488d7ed9601 index:0x1
[  179.256913] anon flags: 0x10000000040068(uptodate|lru|active|swapbacked)
[  179.256915] raw: 0010000000040068 ffffd7bc4d9c6608 ffffd7bc4d9c6b48 ffff9488d7ed9601
[  179.256917] raw: 0000000000000001 0000000000000000 00000001fffffffe ffff949b84ab4000
[  179.256917] page dumped because: bad pte
[  179.256918] page->mem_cgroup:ffff949b84ab4000
[  179.256920] addr:00000000475a3cf8 vm_flags:08100071 anon_vma:00000000a1034350 mapping:00000000980b101d index:286 
[  179.257025] file:libsystemd-shared-239.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:xfs_vm_readpage [xfs]
[  179.257028] CPU: 5 PID: 2121 Comm: systemd-run Kdump: loaded Tainted: G    B     I      --------- -  - 4.18.0-247.rt14.12.el8.x86_64 #1
[  179.257029] Hardware name: Dell Inc. PowerEdge R440/0N28XX, BIOS 1.2.71 12/06/2017
[  179.257029] Call Trace:
[  179.257041]  dump_stack+0x5c/0x80
[  179.257048]  print_bad_pte.cold.110+0x6d/0xd5
[  179.257053]  ? __dec_node_state+0x69/0xf0
[  179.257056]  unmap_page_range+0x95d/0xbd0
[  179.257060]  unmap_vmas+0xae/0xd0
[  179.257065]  exit_mmap+0xaa/0x190
[  179.257073]  mmput+0x3a/0x100
[  179.257075]  do_exit+0x34b/0xbd0
[  179.257077]  ? handle_mm_fault+0xd2/0x1e0
[  179.257081]  ? syscall_trace_enter+0x202/0x310
[  179.257084]  do_group_exit+0x47/0xc0
[  179.257086]  __x64_sys_exit_group+0x14/0x20
[  179.257087]  do_syscall_64+0x87/0x1a0
[  179.257092]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  179.257095] RIP: 0033:0x7f12b0292b26
[  179.257099] Code: Bad RIP value.
[  179.257100] RSP: 002b:00007ffe38397838 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  179.257102] RAX: ffffffffffffffda RBX: 00007f12b0582880 RCX: 00007f12b0292b26
[  179.257103] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  179.257104] RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffec0
[  179.257104] R10: 00007ffe38397602 R11: 0000000000000246 R12: 00007f12b0582880
[  179.257105] R13: 0000000000000002 R14: 00007f12b058b348 R15: 0000000000000000
[  179.262375] BUG: Bad page state in process sh  pfn:1367199
[  179.262378] page:ffffd7bc4d9c6640 refcount:0 mapcount:-1 mapping:0000000000000000 index:0x1
[  179.262379] flags: 0x10000000000000()
[  179.262381] raw: 0010000000000000 dead000000000100 dead000000000200 0000000000000000
[  179.262382] raw: 0000000000000001 0000000000000000 00000000fffffffe 0000000000000000
[  179.262383] page dumped because: nonzero mapcount
[  179.262384] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support dcdbas intel_uncore intel_rapl_perf pcspkr lpc_ich i2c_i801 ipmi_ssif mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_vram_helper drm_ttm_helper crc32c_intel ttm ixgbe bnxt_en drm mdio ahci dca libahci i2c_algo_bit megaraid_sas tg3 libata dm_mirror dm_region_hash dm_log dm_mod
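The three reports above all concern the same page (pfn 1367199, page:ffffd7bc4d9c6640), and their raw fields can be cross-checked with a small decoder. The sketch below is hypothetical (the names `decode_flags` and `decode_counts` are illustrative). It assumes the flag bit positions implied by the report's own decoding (0x40048 = uptodate|active|swapbacked, and 0x40068 adds lru), and that the third word of the second `raw:` line packs `_mapcount` in its low 32 bits and `_refcount` in its high 32 bits on little-endian x86, with the kernel printing mapcount as `_mapcount + 1`.

```shell
#!/bin/bash
# Hypothetical helpers for reading "Bad page state" dumps from this report.
# ASSUMPTION: the flag bit positions below are inferred from this report's
# own decoding (0x40048 = uptodate|active|swapbacked; 0x40068 adds lru) and
# hold for this 4.18 build; other kernels may number the bits differently.

# Print the named page flags set in a flags value such as 0x10000000040068.
# Upper bits (node/zone/section fields) are ignored because only the
# named bits are tested.
decode_flags() {
    local v=$(( $1 ))
    local -a names=()
    (( v & (1 << 3) ))  && names+=(uptodate)
    (( v & (1 << 5) ))  && names+=(lru)
    (( v & (1 << 6) ))  && names+=(active)
    (( v & (1 << 18) )) && names+=(swapbacked)
    local IFS='|'
    echo "${names[*]}"
}

# Split the third word of the second "raw:" line (e.g. 0000000200000080)
# into _refcount (high 32 bits) and _mapcount (low 32 bits, little-endian).
# The kernel prints mapcount as _mapcount + 1.
decode_counts() {
    local word=$(( 16#$1 ))
    local refcount=$(( (word >> 32) & 0xffffffff ))
    local mapraw=$(( word & 0xffffffff ))
    # Sign-extend the 32-bit _mapcount.
    (( mapraw >= 0x80000000 )) && mapraw=$(( mapraw - 0x100000000 ))
    echo "refcount:$refcount mapcount:$(( mapraw + 1 ))"
}
```

Applied to the words 0000000200000080, 00000001fffffffe and 00000000fffffffe from the three reports, `decode_counts` yields refcount:2 mapcount:129, refcount:1 mapcount:-1 and refcount:0 mapcount:-1, matching the values the kernel printed.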

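Expected results:
The host completes the stress-ng run without a bad page state or panic, as the regular RHEL kernel does.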

Additional info:

Comment 1 Mark Simmons 2020-11-12 19:44:36 UTC

*** This bug has been marked as a duplicate of bug 1897330 ***