Bug 1718736

Summary: [debug kernel] crash report: read error: kernel virtual address: ffff20000af33500 type: "idmap_ptrs_per_pgd" on a live system
Product: Red Hat Enterprise Linux 8 Reporter: Emma Wu <xiawu>
Component: crash Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact: Ziqian SUN (Zamir) <zsun>
Severity: low Docs Contact:
Priority: unspecified    
Version: 8.1 CC: anderson, ruyang
Target Milestone: rc Keywords: Reopened
Target Release: 8.0   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: crash-7.2.6-2.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 20:53:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1690227    

Comment 1 Dave Anderson 2019-06-10 13:16:13 UTC
> The error is reported when trying to read from /dev/mem. Crash should not show this error message unless 
> reading from /dev/mem and /proc/kcore both fail or debug flag is on: https://bugzilla.redhat.com/show_bug.cgi?id=1585944#c4

Read the last sentence closely:

  If the default live memory source /dev/mem is determined to be
  unusable because the kernel was configured with CONFIG_STRICT_DEVMEM,
  the first memory read during session initialization will fail.  The
  current behavior results in a readmem() error message, followed by two
  notification messages that indicate that /dev/mem is restricted and
  a switch to using /proc/kcore will be attempted; the readmem is
  reattempted from /proc/kcore, and if successful, the session will
  continue initialization.  With this patch, the behavior will change
  such that if the switch to /proc/kcore and the reattempted readmem()
  are successful, no messages will be displayed unless the crash
  session is invoked with "crash -d<number>".
  (anderson)

You are invoking the session with -d7, so the debug flag is "on", and
therefore the informational messages are intentionally displayed.
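
As a rough illustration of the behavior quoted above, here is a minimal
sketch of the fallback logic.  It is a standalone model, not code from
crash: the struct, the function names and the message strings are all
hypothetical, and the "read" is simulated with flags rather than done
against /dev/mem or /proc/kcore.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the fallback; none of these names are crash's own. */
enum live_source { SRC_DEVMEM, SRC_KCORE };

struct live_session {
    enum live_source source; /* current live memory source */
    int debug_level;         /* N from "crash -d<N>", 0 when not given */
    int devmem_restricted;   /* models CONFIG_STRICT_DEVMEM=y */
    int kcore_unusable;      /* models a failing /proc/kcore */
    int messages;            /* number of messages printed so far */
};

/* Simulated read: succeeds unless the current source is modeled as unusable. */
static int model_read(const struct live_session *s)
{
    if (s->source == SRC_DEVMEM)
        return !s->devmem_restricted;
    return !s->kcore_unusable;
}

static void report(struct live_session *s, const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    s->messages++;
}

/* First read of the session: returns 1 on success, 0 on failure. */
int first_read(struct live_session *s)
{
    assert(s != NULL);

    if (model_read(s))
        return 1;

    if (s->source == SRC_DEVMEM) {
        /* The intermediate messages appear only when debugging is on. */
        if (s->debug_level > 0) {
            report(s, "read error on /dev/mem");
            report(s, "/dev/mem is restricted, switching to /proc/kcore");
        }
        s->source = SRC_KCORE;
        if (model_read(s))
            return 1;
    }

    /* Both sources failed: this is reported regardless of debug level. */
    report(s, "read error: no usable live memory source");
    return 0;
}
```

In this model a session with debug_level 0 and a restricted /dev/mem ends
up on SRC_KCORE without printing anything, while the same session with
debug_level 7 prints the two intermediate messages.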

Comment 3 Dave Anderson 2019-06-10 14:42:46 UTC
Ah, so it is... sorry about that!

It's a simple fix:

--- a/arm64.c
+++ b/arm64.c
@@ -285,7 +285,7 @@ arm64_init(int when)
                case 65536:
                        if (kernel_symbol_exists("idmap_ptrs_per_pgd") &&
                            readmem(symbol_value("idmap_ptrs_per_pgd"), KVADDR,
-                           &value, sizeof(ulong), "idmap_ptrs_per_pgd", RETURN_ON_ERROR))
+                           &value, sizeof(ulong), "idmap_ptrs_per_pgd", QUIET|RETURN_ON_ERROR))
                                machdep->ptrs_per_pgd = value;
                
                        if (machdep->machspec->VA_BITS > PGDIR_SHIFT_L3_64K) {
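
To show why OR-ing in QUIET is enough, here is a small standalone sketch
of a read helper whose error message is controlled by a flag bit.  The
DEMO_* names, their values and demo_readmem() are made up for this
illustration and are not crash's definitions; only the idea that the
caller gets a false return and keeps its default value is taken from the
patch above.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative flag values only; crash has its own definitions. */
#define DEMO_RETURN_ON_ERROR 0x2u
#define DEMO_QUIET           0x4u

static int demo_messages; /* counts error messages printed */

/*
 * Simulated read: "readable" stands in for whether the memory source
 * allows the access.  Returns 1 on success and 0 on failure.
 */
int demo_readmem(int readable, unsigned long *dst, unsigned flags,
                 const char *type)
{
    assert(dst != NULL && type != NULL);
    /* This sketch only models callers that ask to return on error. */
    assert(flags & DEMO_RETURN_ON_ERROR);

    if (readable) {
        *dst = 64; /* arbitrary value for the demonstration */
        return 1;
    }

    /* The failure is reported unless the caller asked for silence. */
    if (!(flags & DEMO_QUIET)) {
        fprintf(stderr, "read error: type: \"%s\"\n", type);
        demo_messages++;
    }
    return 0;
}

int demo_message_count(void)
{
    return demo_messages;
}
```

With DEMO_QUIET set, a failed read still returns 0, so a caller written
like the patched code simply skips the assignment and no message appears.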

What I don't understand, though, is why it isn't seen on the regular kernel.

I sanity-check each new RHEL8 kernel version running live, and for example,
here is the latest 4.18.0-103.el8:

  [root@apm-mustang-ev3-07 ~]# crash

  crash 7.2.6-1.el8
  Copyright (C) 2002-2019  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
 
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "aarch64-unknown-linux-gnu"...

        KERNEL: /usr/lib/debug/lib/modules/4.18.0-103.el8.aarch64/vmlinux
      DUMPFILE: /proc/kcore
          CPUS: 8
          DATE: Mon Jun 10 10:36:48 2019
        UPTIME: 00:00:58
  LOAD AVERAGE: 1.66, 0.51, 0.18
         TASKS: 229
      NODENAME: apm-mustang-ev3-07.khw2.lab.eng.bos.redhat.com
       RELEASE: 4.18.0-103.el8.aarch64
       VERSION: #1 SMP Sat Jun 8 15:47:32 UTC 2019
       MACHINE: aarch64  (unknown Mhz)
        MEMORY: 16 GB
           PID: 4031
       COMMAND: "crash"
          TASK: ffff8003724b3200  [THREAD_INFO: ffff8003724b3200]
           CPU: 4
         STATE: TASK_RUNNING (ACTIVE)

  crash> 

I'll investigate it further.

Comment 4 Dave Anderson 2019-06-10 16:22:04 UTC
(In reply to Dave Anderson from comment #3)
> 
> I'll investigate it further.

The issue comes from the kernel's CONFIG_DEVMEM setting combined with
the very first readmem() that is performed.

  $ pwd
  /home/git_repos/rhel8.1.0-git/configs
  $

  $ find . -name CONFIG_DEVMEM
  ./debug/aarch64/CONFIG_DEVMEM
  ./generic/CONFIG_DEVMEM
  ./generic/aarch64/CONFIG_DEVMEM
  $

Generically it is set to y:

  $ cat ./generic/CONFIG_DEVMEM
  CONFIG_DEVMEM=y
  $

But aarch64 overrides that with two settings, one for debug kernels and one for regular kernels:

  $ cat ./debug/aarch64/CONFIG_DEVMEM
  CONFIG_DEVMEM=y
  $ cat ./generic/aarch64/CONFIG_DEVMEM
  # CONFIG_DEVMEM is not set
  $ 

On the debug aarch64 kernel, /dev/mem does exist, so it gets used for the very
first readmem(), which reads "idmap_ptrs_per_pgd".  Because of CONFIG_STRICT_DEVMEM=y,
that read fails and prints the error message; it is then retried using /proc/kcore,
which remains the live memory source from that point on.

On the other (generic) architectures, /dev/mem also exists, but their
very first readmem() is marked QUIET, as in the patch above, so no
error message is displayed.

On the generic aarch64 kernel, /dev/mem does *not* exist, so /proc/kcore is
set as the live memory source before the first readmem() is done.

Comment 5 Dave Anderson 2019-06-10 18:15:32 UTC
Patch applied upstream:

  https://github.com/crash-utility/crash/commit/bf48dd4e9926515345cad06c1bfce49d7a057a26

  Fix for Linux 4.16 and later ARM64 kernels that contain kernel commit
  fa2a8445b1d3810c52f2a6b3a006456bd1aacb7e, titled "arm64: allow ID map
  to be extended to 52 bits", and which have been configured with both
  CONFIG_DEVMEM=y and CONFIG_STRICT_DEVMEM=y.  Without the patch, an
  inconsequential error message indicating "crash: read error: kernel
  virtual address: <address> type: idmap_ptrs_per_pgd" is displayed
  during initialization.
  (anderson)

Comment 10 errata-xmlrpc 2019-11-05 20:53:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3349