Bug 1756206

Summary: [Intel 8.2 Feature] Crystal Ridge - Sub-section memory hotplug support [kexec-tools part]
Product: Red Hat Enterprise Linux 8 Reporter: Baoquan He <bhe>
Component: kexec-tools Assignee: Baoquan He <bhe>
Status: CLOSED ERRATA QA Contact: Emma Wu <xiawu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.2 CC: anderson, ruyang, xiawu
Target Milestone: rc Keywords: FutureFeature
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kexec-tools-2.0.20-7.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 16:43:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1724969, 1732733    

Description Baoquan He 2019-09-27 05:18:56 UTC
Description of problem:
Kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY
flag") added the flag to the mem_section->section_mem_map value, which causes
makedumpfile to fail with an error like the following:

readmem: Can't convert a virtual address(fffffc97d1000000) to physical address.
readmem: type_addr: 0, addr:fffffc97d1000000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.

Upstream makedumpfile has fixed it with commit:
commit 7bdb468c2c99 ("[PATCH] Increase SECTION_MAP_LAST_BIT to 4")

The above kernel commit 326e1b8f83a4 will be backported to the rhel8.2
kernel, so the makedumpfile commit needs to be backported as well. Otherwise
vmcore dumping will fail.
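
The masking problem described above can be sketched as follows. This is only an illustration, not the makedumpfile code: the bit positions are assumed from the upstream kernel after commit 326e1b8f83a4, the shift values 3 and 4 stand for the mask before and after the makedumpfile fix, and the helper names and the sample address are hypothetical.

```python
# Flag bits kept in the low bits of mem_section->section_mem_map.
# Bit positions are assumptions based on the upstream kernel after 326e1b8f83a4.
SECTION_MARKED_PRESENT = 1 << 0
SECTION_HAS_MEM_MAP = 1 << 1
SECTION_IS_ONLINE = 1 << 2
SECTION_IS_EARLY = 1 << 3  # the newly added flag

U64 = (1 << 64) - 1


def section_map_mask(last_bit_shift):
    """Return a 64-bit mask that clears every bit below 1 << last_bit_shift."""
    return ~((1 << last_bit_shift) - 1) & U64


def decode_mem_map(section_mem_map, last_bit_shift):
    """Strip the flag bits from a raw section_mem_map value (hypothetical helper)."""
    return section_mem_map & section_map_mask(last_bit_shift)


# Hypothetical encoded value: an aligned mem_map address with all four flags set.
mem_map = 0xFFFFEA0000000000
encoded = (mem_map | SECTION_MARKED_PRESENT | SECTION_HAS_MEM_MAP
           | SECTION_IS_ONLINE | SECTION_IS_EARLY)

stale = decode_mem_map(encoded, 3)  # old mask: bit 3 leaks into the address
fixed = decode_mem_map(encoded, 4)  # new mask: all four flag bits are cleared

print(hex(stale))  # 0xffffea0000000008
print(hex(fixed))  # 0xffffea0000000000
```

With the narrower mask the decoded mem_map pointer is off by 8, so every struct page address derived from it is wrong. That is consistent with the readmem failure shown above.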

Comment 4 Dave Anderson 2019-11-25 15:45:39 UTC
> Description of problem:
> Kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY
> flag") added the flag to the mem_section->section_mem_map value, which causes
> makedumpfile to fail with an error like the following:
> 
> readmem: Can't convert a virtual address(fffffc97d1000000) to physical address.
> readmem: type_addr: 0, addr:fffffc97d1000000, size:32768
> __exclude_unnecessary_pages: Can't read the buffer of struct page.
> create_2nd_bitmap: Can't exclude unnecessary pages.
> 
> Upstream makedumpfile has fixed it with commit:
> commit 7bdb468c2c99 ("[PATCH] Increase SECTION_MAP_LAST_BIT to 4")
> 
> The above kernel commit 326e1b8f83a4 will be backported to the rhel8.2
> kernel, so the makedumpfile commit needs to be backported as well. Otherwise
> vmcore dumping will fail.
 
Your tests were run with crash-7.2.6-2.el8:

> crash 7.2.6-2.el8
...
> crash> bt -F
> PID: 6544   TASK: c0000000fd0d4900  CPU: 5   COMMAND: "runtest.sh"
>  #0 [c0000007f3a8f8b0] crash_kexec at c000000000251cd0
>     c0000007f3a8f8b0: c0000007f3a8f8f0 0000000000000000 
>     c0000007f3a8f8c0: crash_kexec+128  c0000007f3a8fa50 
>     c0000007f3a8f8d0: c0000007f3a8f8f0 0000000000000000 
>     c0000007f3a8f8e0: 000000000000000b c0000007f3a8fa50 
>  #1 [c0000007f3a8f8f0] oops_end at c000000000029c78
>     c0000007f3a8f8f0: c0000007f3a8f970 0000000000000005 
>     c0000007f3a8f900: oops_end+392     000000000000000b 
>     c0000007f3a8f910: c0000007f3a8f970 0000000000000001 
>     c0000007f3a8f920: bt: invalid kernel virtual address: ffffffff01fe0118  type: "page.slab"
>
> crash> kmem -s
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
> kmem: page_to_nid: invalid page: f000000001c9bb00
> kmem: kmalloc-8k(593:restraintd.service): cannot gather relevant slab data
> c0000007f8591800     8192          ?         ?      ?   256k  kmalloc-8k(593:restraintd.service)
> c0000007dd09de00    16384          0         0      0   512k  kmalloc-16k(593:restraintd.service)
> kmem: page_to_nid: invalid page: f000000001c99580
> kmem: kmalloc-8(593:restraintd.service): cannot gather relevant slab data
> c0000007f8592400        8          ?         ?      ?    64k  kmalloc-8(593:restraintd.service)
> kmem: page_to_nid: invalid page: f000000001f99d40
> kmem: pde_opener(593:restraintd.service): cannot gather relevant slab data
> c0000007f76c6600       40          ?         ?      ?    64k  pde_opener(593:restraintd.service)
> kmem: page_to_nid: invalid page: f000000001bb1800
> kmem: kmalloc-4k(593:restraintd.service): cannot gather relevant slab data
> c0000007f8596000     4096          ?         ?      ?   128k  kmalloc-4k(593:restraintd.service)
> kmem: page_to_nid: invalid page: f000000001fdaa40
> kmem: kmalloc-32(593:restraintd.service): cannot gather relevant slab data
> c0000007dd097200       32          ?         ?      ?    64k  kmalloc-32(593:restraintd.service)
> kmem: page_to_nid: invalid page: f000000001efcc00
> kmem: kmalloc-rcl-128(593:restraintd.service): cannot gather relevant slab data
> ...

Support for the Linux 5.3-rc1 SECTION_IS_EARLY bit was addressed in
crash-7.2.7 in this patch from Kazuhito Hagio, although the error
symptoms he described were different:

  commit e1df72964f8a583000e6cb74e54f8efbab6721ac
  Author: Dave Anderson <anderson>
  Date:   Fri Jul 26 14:31:33 2019 -0400

    Fix for the "kmem -n" option on Linux 5.3-rc1 and later kernels
    that contain commit 326e1b8f83a4318b09033ef754f40c785aed5e68,
    titled "mm/sparsemem: introduce a SECTION_IS_EARLY flag".  Without
    the patch, mem_map addresses containing the flag in bit 3 incorrectly
    show it as part of the virtual address; with the patch, the option
    displays the new "E" state flag.
    (k-hagio.nec.com)
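
A rough sketch of the behaviour that commit message describes: split a raw section value into its address and a string of state letters. Only the "E" letter and its position in bit 3 come from the message above. The other letters, the function name and the sample values are assumptions for illustration, and this is not the crash source.

```python
# Hypothetical flag-to-letter table; only "E" (bit 3) is named in the commit message.
STATE_FLAGS = [(1 << 0, "P"), (1 << 1, "M"), (1 << 2, "O"), (1 << 3, "E")]
FLAG_MASK = (1 << 4) - 1  # covers bits 0-3


def split_section_value(value):
    """Return (address, state string) for a raw section_mem_map value."""
    state = "".join(letter for bit, letter in STATE_FLAGS if value & bit)
    return value & ~FLAG_MASK, state


addr, state = split_section_value(0xFFFFEA000000000F)
print(hex(addr), state)  # 0xffffea0000000000 PMOE
```

Keeping the flag bits out of the address and reporting them separately is what prevents the flag from showing up as part of the virtual address.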


Here is crash-7.2.7.el8 running the two commands above on the supplied vmcore 
from comment #3:
  
  crash> bt -F
  PID: 6544   TASK: c0000000fd0d4900  CPU: 5   COMMAND: "runtest.sh"
   #0 [c0000007f3a8f8b0] crash_kexec at c000000000251cd0
      c0000007f3a8f8b0: [thread_stack(593:restraintd.service)] 0000000000000000 
      c0000007f3a8f8c0: crash_kexec+128  [thread_stack(593:restraintd.service)] 
      c0000007f3a8f8d0: [thread_stack(593:restraintd.service)] 0000000000000000 
      c0000007f3a8f8e0: 000000000000000b [thread_stack(593:restraintd.service)] 
   #1 [c0000007f3a8f8f0] oops_end at c000000000029c78
      c0000007f3a8f8f0: [thread_stack(593:restraintd.service)] 0000000000000005 
      c0000007f3a8f900: oops_end+392     000000000000000b 
      c0000007f3a8f910: [thread_stack(593:restraintd.service)] 0000000000000001 
      c0000007f3a8f920: [kmalloc-4k]     0000000000000f7f 
      c0000007f3a8f930: [thread_stack(593:restraintd.service)] nvram_pstore_info+16 
      c0000007f3a8f940: 0000000000000000 0000000000000000 
      c0000007f3a8f950: 0000000000000000 0000000000000063 
      c0000007f3a8f960: 000000000000000b [thread_stack(593:restraintd.service)] 
   #2 [c0000007f3a8f970] bad_page_fault at c00000000007bb6c
      c0000007f3a8f970: [thread_stack(593:restraintd.service)] 0000000000000007 
      c0000007f3a8f980: bad_page_fault+268 0000000000000063 
      c0000007f3a8f990: [thread_stack(593:restraintd.service)] sysrq_handle_crash+40 
      c0000007f3a8f9a0: 0000000000000000 00126a8dfa540c7f 
      c0000007f3a8f9b0: 0000000000000166 0000000000000007 
      c0000007f3a8f9c0: 0000000000000007 0000000000000001 
      c0000007f3a8f9d0: suppress_printk  console_printk   
   #3 [c0000007f3a8f9e0] handle_page_fault at c00000000000a720
      c0000007f3a8f9e0: [thread_stack(593:restraintd.service)] c000000028222288 
      c0000007f3a8f9f0: handle_page_fault+52 [selinux_inode_security] 
      c0000007f3a8fa00: msg_print_text+216 00000001109ad558 
      c0000007f3a8fa10: [thread_stack(593:restraintd.service)] [proc_dir_entry] 
      c0000007f3a8fa20: 0000000000000117 000000000000000f 
      c0000007f3a8fa30: [thread_stack(593:restraintd.service)] 0044b82fa09b5a53 
      c0000007f3a8fa40: 7265677368657265 20c49ba5e353f7cf 
      c0000007f3a8fa50: __handle_sysrq+228 [thread_stack(593:restraintd.service)] 
      c0000007f3a8fa60: .TOC.            0000000000000063 
      c0000007f3a8fa70: c0000007ffc0cf90 c0000007ffc94668 
      c0000007f3a8fa80: 00126a8dfa53faec 0000000000000165 
      c0000007f3a8fa90: 0000000000000007 0000000000000001 
      c0000007f3a8faa0: 0000000000000000 0000000000000000 
      c0000007f3a8fab0: sysrq_handle_crash c000000007fa8a00 
      c0000007f3a8fac0: 0000000040000000 00000001109a9788 
      c0000007f3a8fad0: 00000001109a9714 0000000110946638 
      c0000007f3a8fae0: 00000001108def10 00000001109ad558 
      c0000007f3a8faf0: 00000100185059d0 0000000000000001 
      c0000007f3a8fb00: 0000000110959370 00007ffffc355324 
      c0000007f3a8fb10: 00007ffffc355320 sysrq_crash_op   
      c0000007f3a8fb20: 0000000000000000 0000000000000007 
      c0000007f3a8fb30: 0000000000000000 0000000000000063 
      c0000007f3a8fb40: suppress_printk  console_printk   
      c0000007f3a8fb50: sysrq_handle_crash+40 8000000000009033 
      c0000007f3a8fb60: slb_miss_common+228 sysrq_handle_crash 
      c0000007f3a8fb70: __handle_sysrq+228 000000000000000f 
      c0000007f3a8fb80: 0000000028222282 0000000000000000 
      c0000007f3a8fb90: 0000000000000300 0000000000000000 
      c0000007f3a8fba0: 0000000042000000 0000000000000000 
      c0000007f3a8fbb0: [thread_stack(593:restraintd.service)] console_sem      
      c0000007f3a8fbc0: 0000000000000001 0000000000000000 
      c0000007f3a8fbd0: [thread_stack(593:restraintd.service)] 000000000000000f 
      c0000007f3a8fbe0: irq_work_queue+156 log_buf_len      
      c0000007f3a8fbf0: [thread_stack(593:restraintd.service)] 0000000000002000 
      c0000007f3a8fc00: vprintk_emit+416 000000000000002c 
      c0000007f3a8fc10: 0000000110959370 00007ffffc355324 
      c0000007f3a8fc20: 00007ffffc355320 sysrq_crash_op   
      c0000007f3a8fc30: 0000000000000000 0000000000000007 
      c0000007f3a8fc40: 0000000000000000 kallsyms_token_index+13840 
      c0000007f3a8fc50: c0000000011ccfa0 c0000000011d0fa0 
      c0000007f3a8fc60: [thread_stack(593:restraintd.service)] 0000000000002000 
      c0000007f3a8fc70: vprintk_func+116 [thread_stack(593:restraintd.service)] 
      c0000007f3a8fc80: [thread_stack(593:restraintd.service)] 0000000000000100 
      c0000007f3a8fc90: [thread_stack(593:restraintd.service)] 0000000000000063 
      c0000007f3a8fca0: suppress_printk  console_printk   
      c0000007f3a8fcb0: [thread_stack(593:restraintd.service)] [dentry(465:systemd-hostnamed.service)] 
      c0000007f3a8fcc0: printk+64        [names_cache]    
   Data Access [300] exception frame:
   R0:  c00000000083be84    R1:  c0000007f3a8fcd0    R2:  c000000001717300   
   R3:  0000000000000063    R4:  c0000007ffc0cf90    R5:  c0000007ffc94668   
   R6:  00126a8dfa53faec    R7:  0000000000000165    R8:  0000000000000007   
   R9:  0000000000000001    R10: 0000000000000000    R11: 0000000000000000   
   R12: c00000000083aeb0    R13: c000000007fa8a00    R14: 0000000040000000   
   R15: 00000001109a9788    R16: 00000001109a9714    R17: 0000000110946638   
   R18: 00000001108def10    R19: 00000001109ad558    R20: 00000100185059d0   
   R21: 0000000000000001    R22: 0000000110959370    R23: 00007ffffc355324   
   R24: 00007ffffc355320    R25: c00000000161aad8    R26: 0000000000000000   
   R27: 0000000000000007    R28: 0000000000000000    R29: 0000000000000063   
   R30: c000000001752374    R31: c0000000015c4b18   
   NIP: c00000000083aed8    MSR: 8000000000009033    OR3: c000000000008934
   CTR: c00000000083aeb0    LR:  c00000000083be84    XER: 000000000000000f
   CCR: 0000000028222282    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   [NIP  : sysrq_handle_crash+40]
   [LR   : __handle_sysrq+228]
   #4 [c0000007f3a8fcd0] sysrq_handle_crash at c00000000083aed8
      c0000007f3a8fcd0: [thread_stack(593:restraintd.service)] [dentry(97:sysroot.mount)] 
      c0000007f3a8fce0: __handle_sysrq+200 .TOC.            
      c0000007f3a8fcf0: [thread_stack(593:restraintd.service)] kallsyms_token_index+576416 
      c0000007f3a8fd00: 000000000000000f 0000000053203a71 
      c0000007f3a8fd10: textbuf.49030+2  00000007fea40000 
      c0000007f3a8fd20: sysrq_handler+24 moom_work        
      c0000007f3a8fd30: [thread_stack(593:restraintd.service)] 00000001109aaf84 
      c0000007f3a8fd40: 0000000000000002 000001001848e8c0 
      c0000007f3a8fd50: 00000000fb06c200 0000000000000002 
      c0000007f3a8fd60: fffffffffffffffb 0000000000000002 
   #5 [c0000007f3a8fd70] write_sysrq_trigger at c00000000083c608
      c0000007f3a8fd70: [thread_stack(593:restraintd.service)] selinux_hooks+2360 
      c0000007f3a8fd80: write_sysrq_trigger+104 0000000000000000 
      c0000007f3a8fd90: [thread_stack(593:restraintd.service)] [proc_dir_entry] 
   #6 [c0000007f3a8fda0] proc_reg_write at c0000000005b3aa4
      c0000007f3a8fda0: [thread_stack(593:restraintd.service)] 000000000026e502 
      c0000007f3a8fdb0: proc_reg_write+132 .TOC.            
      c0000007f3a8fdc0: 0000000000000000 [cred_jar(593:restraintd.service)] 
   #7 [c0000007f3a8fdd0] sys_write at c0000000004d9738
      c0000007f3a8fdd0: [thread_stack(593:restraintd.service)] 00007fffaaf21858 
      c0000007f3a8fde0: sys_write+296    .TOC.            
      c0000007f3a8fdf0: 0000000000000000 00000100184ff060 
      c0000007f3a8fe00: do_syscall_trace_enter+404 000001001848e8c0 
      c0000007f3a8fe10: 0000000000000002 00007fffaaf21858 
      c0000007f3a8fe20: 000001001848e8c0 0000000000000002 
   #8 [c0000007f3a8fe30] system_call at c00000000000b388
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00007ffffc355100    R2:  00007fffaaf27300   
   R3:  0000000000000001    R4:  000001001848e8c0    R5:  0000000000000002   
   R6:  0000000000000010    R7:  00007fffaad83af4    R8:  0000000000000000   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00007fffab03acd0    R14: 0000000040000000   
   R15: 00000001109a9788    R16: 00000001109a9714    R17: 0000000110946638   
   R18: 00000001108def10    R19: 00000001109ad558    R20: 00000100185059d0   
   R21: 0000000000000001    R22: 0000000110959370    R23: 00007ffffc355324   
   R24: 00007ffffc355320    R25: 00000001109aaf84    R26: 0000000000000002   
   R27: 000001001848e8c0    R28: 0000000000000002    R29: 00007fffaaf21858   
   R30: 000001001848e8c0    R31: 0000000000000002   
   NIP: 00007fffaae380f4    MSR: 800000000000f033    OR3: 0000000000000001
   CTR: 0000000000000000    LR:  00007fffaadb28e4    XER: 0000000000000000
   CCR: 0000000048222282    MQ:  0000000000000000    DAR: 0000010018488540
   DSISR: 000000000a000000     Syscall Result: 0000000000000000
  crash> 

  crash> kmem -s
  CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
  c0000007f8591800     8192          0        32      1   256k  kmalloc-8k(593:restraintd.service)
  c0000007dd09de00    16384          0         0      0   512k  kmalloc-16k(593:restraintd.service)
  c0000007f8592400        8          0      8192      1    64k  kmalloc-8(593:restraintd.service)
  c0000007f76c6600       40          0      8190      5    64k  pde_opener(593:restraintd.service)
  c0000007f8596000     4096          0        32      1   128k  kmalloc-4k(593:restraintd.service)
  c0000007dd097200       32          0     10240      5    64k  kmalloc-32(593:restraintd.service)
  c0000007eda8f900      128         20      4096      8    64k  kmalloc-rcl-128(593:restraintd.service)
  c0000007eda85d00       96         29      5456      8    64k  kmalloc-rcl-96(593:restraintd.service)
  c0000000ff2e9000      752          0       258      3    64k  shmem_inode_cache(593:restraintd.service)
  c0000007dac21b00       64       4357     12288     12    64k  kmalloc-rcl-64(593:restraintd.service)
  c0000007dd09ae00    65536          0        64      8   512k  kmalloc-64k(593:restraintd.service)
  c0000007dac2ed00     1112          8       448      8    64k  signal_cache(593:restraintd.service)
  c0000007dac20900     2088          3       240      8    64k  sighand_cache(593:restraintd.service)
  c0000007dac20c00      768          3       680      8    64k  files_cache(593:restraintd.service)
  c0000007dac23600     1024          1       512      8    64k  kmalloc-1k(593:restraintd.service)
  c0000007dac2bd00      192          1      2728      8    64k  kmalloc-192(593:restraintd.service)
  c0000007dd091800      752          0         0      0    64k  shmem_inode_cache(586:crond.service)
  c0000007dd09cf00       80          0         0      0    64k  task_delay_info(586:crond.service)
  c0000007eda8cf00     1664          0       312      8    64k  UDPv6(593:restraintd.service)
  c0000000ff2e6600     1408          0       368      8    64k  UDP(593:restraintd.service)
  c0000007f859b400       80          9      6552      8    64k  task_delay_info(593:restraintd.service)
  c0000007f859b700    16384          9       256      8   512k  thread_stack(593:restraintd.service)
  c0000007f8594200     6016          9       168      8   128k  task_struct(593:restraintd.service)
  c0000007f859c600     2392          0        52      2    64k  TCP(593:restraintd.service)
  c0000007f8596c00     2544          1       200      8    64k  TCPv6(593:restraintd.service)
  c0000007eda83f00      720          0         0      0    64k  proc_inode_cache(586:crond.service)
  c0000007dac28d00      576       3211      3808     34    64k  radix_tree_node(593:restraintd.service)
  ... [ cut ] ...
  c0000007fc01e400      256       1687      2816     11    64k  kmalloc-256
  c0000007fc01e700      192       1875      6479     19    64k  kmalloc-192
  c0000007fc01ea00      128       4316      7680     15    64k  kmalloc-128
  c0000007fc01ed00       96       3108      9548     14    64k  kmalloc-96
  c0000007fc01f000       64      52083     58368     57    64k  kmalloc-64
  c0000007fc01f300       32     438892    446464    218    64k  kmalloc-32
  c0000007fc01f600       16     114898    135168     33    64k  kmalloc-16
  c0000007fc01f900        8     382570    442368     54    64k  kmalloc-8
  c0000007fc01fc00       64        985      8192      8    64k  kmem_cache_node
  c0000007fc010000      744        985      1190     14    64k  kmem_cache
  crash>

Comment 10 errata-xmlrpc 2020-04-28 16:43:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1783