Bug 1451181

Summary:

crash-ptdump-command: ptdump cmd returns "[0] invalid ring_buffer..."

Product:

Red Hat Enterprise Linux 7

Reporter:

Emma Wu <xiawu>

Component:

crash-ptdump-command

Assignee:

Dave Anderson <anderson>

Status:

CLOSED ERRATA

QA Contact:

Emma Wu <xiawu>

Severity:

unspecified

Docs Contact:

Priority:

high

Version:

7.4

CC:

anderson, chorn, fj-lsoft-kernel-it, ksanagi, lwang, muneda.takahiro, qguo, qzhao, salmy, tumeya, xiawu, yishimat, yoguma

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Unspecified

Whiteboard:

Fixed In Version:

crash-ptdump-command-1.0.3-2.el7

Last Closed:

2017-08-01 07:32:38 UTC

Type:

Bug

Bug Depends On:

Bug Blocks:

1446211

Attachments:

Description	Flags
crash-ptdump-command-1.0.3-2.el7.src.rpm (scratch package)	none

Description Emma Wu 2017-05-16 04:27:06 UTC

Description of problem:

Tested on 7.4 Beta Compose + crash-7.1.9-2.el7.x86_64

Run ptdump cmd, it returns '[0] invalid ring_buffer...'

crash> extend ptdump.so
/usr/lib64/crash/extensions/ptdump.so: shared object loaded
crash> ptdump /root/ptdump
[0] invalid ring_buffer
[1] invalid ring_buffer
....

Version-Release number of selected component (if applicable):
Distro: RHEL-7.4-20170504.0
kernel: 3.10.0-663.el7.x86_64
crash-ptdump-command-1.0.3-1.el7.x86_64
crash-7.1.9-2.el7.x86_64

How reproducible:
Found on dell-per730-03.khw.lab.eng.bos.redhat.com

Steps to Reproduce:
1. Find a machine whose CPUs have the intel_pt flag
2. Install the machine with RHEL-7.4-20170504.0
3. Install kernel-debuginfo, crash-ptdump-command and crash-7.1.9-2.el7.x86_64
4. Run crash, load ptdump.so and run ptdump cmd.

Actual results:
ptdump cmd returns "invalid ring_buffer"

[root@dell-per730-03 ptdump]# crash

crash 7.1.9-2.el7
Copyright (C) 2002-2016  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [630MB]: patching 77971 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-663.el7.x86_64/vmlinux 
    DUMPFILE: /dev/crash
        CPUS: 48
        DATE: Tue May 16 00:19:56 2017
      UPTIME: 16:02:41
LOAD AVERAGE: 0.34, 0.09, 0.07
       TASKS: 524
    NODENAME: dell-per730-03.khw.lab.eng.bos.redhat.com
     RELEASE: 3.10.0-663.el7.x86_64
     VERSION: #1 SMP Tue May 2 16:00:29 EDT 2017
     MACHINE: x86_64  (2199 Mhz)
      MEMORY: 63.9 GB
         PID: 33308
     COMMAND: "crash"
        TASK: ffff966a55a64e70  [THREAD_INFO: ffff966a34b98000]
         CPU: 13
       STATE: TASK_RUNNING (ACTIVE)

crash> extend ptdump.so
/usr/lib64/crash/extensions/ptdump.so: shared object loaded
crash> ptdump /root/test
[0] invalid ring_buffer
[1] invalid ring_buffer
[2] invalid ring_buffer
....

Expected results:


Additional info:

I tried to enable the pt logger based on
https://bugzilla.redhat.com/show_bug.cgi?id=1298172#c22
It failed:

perf-3.10.0-663.el7.x86_64

[root@dell-per730-03 ptdump]# perf record -vv -a -T -e intel_pt// -S -o /dev/null &
[2] 33342
[root@dell-per730-03 ptdump]# Using CPUID GenuineIntel-6-4F
intel_pt default config: tsc
intel_pt psb_period 256
Intel PT snapshot size: 4194304
------------------------------------------------------------
perf_event_attr:
  type                             7
  size                             112
  config                           0x400
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|CPU|IDENTIFIER
  read_format                      ID
  disabled                         1
  inherit                          1
  sample_id_all                    1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -16
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_pt//).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?

[2]+  Exit 255                perf record -vv -a -T -e intel_pt// -S -o /dev/null

Comment 2 Dave Anderson 2017-05-16 13:22:54 UTC

The crash-ptdump-command package was written by, and is supported by, Fujitsu.
I have added the author MUNEDA Takahiro <muneda.takahiro.com>
to the cc list.

Can somebody from Fujitsu please investigate this bugzilla?

Comment 3 Dave Anderson 2017-05-16 13:26:55 UTC

Note that I added linuxdev-kernel-it.fujitsu.com to the cc list,
but emails to them ("Fujitsu kernel engineers") from this bugzilla seem
to get excluded for some reason.  That was the email address of the 
reporter of the original new-package-request bugzilla:

  Bug 1298172 - [Fujitsu 7.3 FEAT]: New package request: crash-ptdump-command
                extension module for the crash utility

Comment 4 Koki Sanagi 2017-05-16 14:40:27 UTC

Hi Dave,

I'll ask FJ guys to investigate it.
But I can't see the description or the details of the issue in this BZ.
Perhaps it is marked as private.

Koki

Comment 5 Fujitsu kernel team 2017-05-18 05:12:35 UTC

Hi Dave,

This problem does not occur if we add 'nokaslr' in the boot parameter.
Therefore, I think this problem is caused by kaslr.

Regards,
Yuki Inoguchi

Comment 6 Dave Anderson 2017-05-18 14:35:04 UTC

> This problem does not occur if we add 'nokaslr' in the boot parameter.
> Therefore, I think this problem is caused by kaslr.

Actually it is not a "KASLR problem" per se, but rather it is symptomatic
of the text address modifications done by KASLR.  

Here is the bug reported in comment #0:

> Run ptdump cmd, it returns '[0] invalid ring_buffer...'
>
> crash> extend ptdump.so
> /usr/lib64/crash/extensions/ptdump.so: shared object loaded
> crash> ptdump /root/ptdump
> [0] invalid ring_buffer
> [1] invalid ring_buffer
> ...

That comes from line 119 in ptdump.c:

        /* symbol access check */
        if (STRUCT_EXISTS("ring_buffer") &&
            !MEMBER_EXISTS("ring_buffer", "aux_pages")) {
                fprintf(fp, "[%d] invalid ring_buffer\n", cpu);
                return FALSE;
        }

Running on a 3.10.0-665.el7.x86_64 KASLR kernel, this is a ring_buffer
structure that the embedded gdb module finds first, which comes
from "kernel/trace/ring_buffer.c":
  
  crash> struct ring_buffer
  struct ring_buffer {
      unsigned int flags;
      int cpus;
      atomic_t record_disabled;
      atomic_t resize_disabled;
      cpumask_var_t cpumask;
      struct lock_class_key *reader_lock_key;
      struct mutex mutex;
      struct ring_buffer_per_cpu **buffers;
      struct notifier_block cpu_notify;
      u64 (*clock)(void);
      struct rb_irq_work irq_work;
  }
  SIZE: 168
  crash> 
  
It has no "aux_pages" member, hence the error message.
  
The ptdump.so module is accessing a different "ring_buffer" structure
from "kernel/events/internal.h".  Therefore the gdb text "scope" needs 
to be changed to a kernel text region where the desired ring_buffer 
structure would be in scope.  
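
A minimal sketch of why the check quoted above trips, written as a self-contained C model rather than against the crash extension API. Everything in it (the type_entry table, lookup_struct, member_exists, ring_buffer_check_passes) is hypothetical and exists only for illustration. Real gdb selects the declaration by text address scope; here the scope is simplified to a source-file name, and the table order that makes the tracing declaration win by default is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one debuginfo type entry: the struct tag, the
 * source file that declares it, and a few of its member names.
 * This is not crash's or gdb's real data model. */
struct type_entry {
    const char *name;
    const char *unit;
    const char *members[4];   /* NULL-terminated subset of members */
};

/* Two unrelated declarations share the tag "ring_buffer". */
static const struct type_entry types[] = {
    { "ring_buffer", "kernel/trace/ring_buffer.c",
      { "flags", "cpus", "buffers", NULL } },
    { "ring_buffer", "kernel/events/internal.h",
      { "nr_pages", "aux_pages", "aux_priv", NULL } },
};

/* Return the first entry named `name`. When `scope_unit` is non-NULL,
 * only an entry declared in that unit can match. */
static const struct type_entry *lookup_struct(const char *name,
                                              const char *scope_unit)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (strcmp(types[i].name, name) != 0)
            continue;
        if (scope_unit == NULL || strcmp(types[i].unit, scope_unit) == 0)
            return &types[i];
    }
    return NULL;
}

/* Return 1 if the entry lists `member`, 0 otherwise (or if t is NULL). */
static int member_exists(const struct type_entry *t, const char *member)
{
    if (t == NULL)
        return 0;
    for (size_t i = 0; i < 4 && t->members[i] != NULL; i++)
        if (strcmp(t->members[i], member) == 0)
            return 1;
    return 0;
}

/* Same shape as the ptdump.c check: fail when a ring_buffer struct
 * resolves but has no aux_pages member. Returns 1 when the check passes. */
static int ring_buffer_check_passes(const char *scope_unit)
{
    const struct type_entry *t = lookup_struct("ring_buffer", scope_unit);
    return !(t != NULL && !member_exists(t, "aux_pages"));
}
```

Calling ring_buffer_check_passes(NULL) models the unscoped lookup and fails the check, while passing "kernel/events/internal.h" as the scope resolves to the perf declaration and passes it.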
  
Note how the following works, where the ring_buffer structure that is selected
by gdb depends upon whatever the gdb "scope" is.  By default, it finds this
one if KASLR is in effect:
  
  crash> ring_buffer
  struct ring_buffer {
      unsigned int flags;
      int cpus;
      atomic_t record_disabled;
      atomic_t resize_disabled;
      cpumask_var_t cpumask;
      struct lock_class_key *reader_lock_key;
      struct mutex mutex;
      struct ring_buffer_per_cpu **buffers;
      struct notifier_block cpu_notify;
      u64 (*clock)(void);
      struct rb_irq_work irq_work;
  }
  SIZE: 168
  crash>
  
I looked in the kernel code for a text function that accesses the other
ring_buffer structure's contents, e.g., the perf_mmap_to_page function,
which accesses rb->aux_pages:
  
  struct page *
  perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
  {
          if (rb->aux_nr_pages) {
                  /* above AUX space */
                  if (pgoff > rb->aux_pgoff + rb->aux_nr_pages)
                          return NULL;
  
                  /* AUX space */
                  if (pgoff >= rb->aux_pgoff)
                          return virt_to_page(rb->aux_pages[pgoff - rb->aux_pgoff]);
          }
  
          return __perf_mmap_to_page(rb, pgoff);
  }
  
So if I change the gdb scope to that function:
  
  crash> set scope perf_mmap_to_page
  scope: ffffffff8397aaa0 (perf_mmap_to_page)
  crash>
  
Now I see the ring_buffer structure that your module needs:
  
  crash> ring_buffer
  struct ring_buffer {
      atomic_t refcount;
      struct callback_head callback_head;
      int nr_pages;
      int overwrite;
      int paused;
      atomic_t poll;
      local_t head;
      local_t nest;
      local_t events;
      local_t wakeup;
      local_t lost;
      long watermark;
      long aux_watermark;
      spinlock_t event_lock;
      struct list_head event_list;
      atomic_t mmap_count;
      unsigned long mmap_locked;
      struct user_struct *mmap_user;
      local_t aux_head;
      local_t aux_nest;
      local_t aux_wakeup;
      unsigned long aux_pgoff;
      int aux_nr_pages;
      int aux_overwrite;
      atomic_t aux_mmap_count;
      unsigned long aux_mmap_locked;
      void (*free_aux)(void *);
      atomic_t aux_refcount;
      void **aux_pages;
      void *aux_priv;
      struct perf_event_mmap_page *user_page;
      void *data_pages[];
  }
  SIZE: 240
  crash> 
  
And then your module would work.

As for the KASLR relationship, the kernel virtual address modification
done by KASLR inadvertently changes the gdb scope, because internal to
the embedded gdb module, the original text virtual addresses are still
used for the gdb scope.
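
To make the address relationship concrete, here is a small arithmetic sketch in C. It rests on two assumptions that are not confirmed in this report: that the "[630MB]" figure in crash's relocation warning counts MiB, and that KASLR shifts every kernel text address by a single constant offset. The helper names (mib_to_bytes, to_runtime_addr, to_link_addr) and the example address are hypothetical.

```c
#include <assert.h>

/* Assumption: the relocation figure printed by crash is in MiB. */
static unsigned long long mib_to_bytes(unsigned long long mib)
{
    return mib << 20;
}

/* Assumption: a relocated text address is the link-time address plus
 * one constant offset. */
static unsigned long long to_runtime_addr(unsigned long long link_addr,
                                          unsigned long long offset)
{
    return link_addr + offset;
}

/* Inverse mapping: recover the link-time address, which is the form the
 * embedded gdb is described as still using for its scope. */
static unsigned long long to_link_addr(unsigned long long runtime_addr,
                                       unsigned long long offset)
{
    assert(runtime_addr >= offset);
    return runtime_addr - offset;
}
```

Under these assumptions, a 630 MiB relocation is an offset of 0x27600000 bytes, so a hypothetical link-time address of 0xffffffff81000000 would appear at 0xffffffffa8600000 at run time.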

This is similar to a modification to the crash-trace-command module in
rhel-7.3.  Note that the same type of "ring_buffer" issue occurred there:

  Bug 1265553 - crash-trace-command: failed to init the offset, struct:ftrace_event_call, member:list
  https://bugzilla.redhat.com/show_bug.cgi?id=1265553#c7

This was the part of that patch-set that fixed it:

@@ -176,6 +183,9 @@ static int init_offsets(void)
                        fprintf(fp, "per cpu buffer sizes\n");
        }

+       if (kernel_symbol_exists("ring_buffer_read"))
+               gdb_set_crash_scope(symbol_value("ring_buffer_read"), "ring_buffer_read");
+
        if (!per_cpu_buffer_sizes)
                init_offset(ring_buffer, pages);
        init_offset(ring_buffer, flags);

The ptdump.so module is going to have to do the same kind of thing,
where it should set the gdb scope to get the ring_buffer structure 
declaration that it needs.

Dave

Comment 7 Dave Anderson 2017-05-18 15:50:45 UTC

Here is a suggested patch:

--- ptdump-1.0.3/ptdump.c.orig
+++ ptdump-1.0.3/ptdump.c
@@ -502,6 +502,14 @@ cmd_ptdump(void)
 		return;
 	}
 
+	/* 
+	 * Set the gdb scope to ensure that the appropriate ring_buffer 
+	 * structure is selected. 
+	 */
+	if (kernel_symbol_exists("perf_mmap_to_page"))
+		gdb_set_crash_scope(symbol_value("perf_mmap_to_page"), 
+			"perf_mmap_to_page");
+
 	online_cpus = get_cpus_online();
 	list_len = sizeof(struct pt_info)*kt->cpus;
 	pt_info_list = malloc(list_len);

Does this patch suffice?  If so, I can move ahead with getting this
into rhel-7.4.

Comment 8 Dave Anderson 2017-05-18 18:27:07 UTC

Created attachment 1280175 [details]
crash-ptdump-command-1.0.3-2.el7.src.rpm  (scratch package)

Comment 9 Dave Anderson 2017-05-18 18:31:06 UTC

Fujitsu,

I have attached a scratch package that contains the proposed patch,
(as well as adding RPM_OPT_FLAGS to the ptdump.mk file for BZ #1450708).

Can you review it?

Comment 10 Dave Anderson 2017-05-18 18:32:29 UTC

Emma,

I have created a scratch build of the attached package here:

Information for task buildArch (crash-ptdump-command-1.0.3-2.el7.src.rpm, x86_64)
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13230889

Can you please test it?

Comment 11 Emma Wu 2017-05-18 23:26:49 UTC

Dave,

Sure. I will let you know the result once it's done.

Comment 13 Fujitsu kernel team 2017-05-19 06:11:55 UTC

Hi Dave, 

Thank you for the fix!

> I have attached a scratch package that contains the proposed patch,
> (as well as adding RPM_OPT_FLAGS to the ptdump.mk file for BZ #1450708).
> 
> Can you review it?

Yes.
We reviewed the patch and confirmed the problem is fixed.

By the way, is it ok to forward your patch to crash ML (crash-utility) 
so we can fix the upstream ptdump too?

Regards,
Yuki Inoguchi

Comment 14 Emma Wu 2017-05-19 10:09:27 UTC

Hi Dave and FJ,

I tested the new ptdump build:
   crash-ptdump-command-1.0.3-2.el7.x86_64.rpm
   crash-7.1.9-2.el7
   kernel: 3.10.0-663.el7 

It seems to have worked correctly.
crash> ptdump /root/test
[0] ring buffer is zero
[1] ring buffer is zero

But if I run it after running cmd "perf record -vv -a -T -e intel_pt// -S -o /dev/null &", it displays:

crash> ptdump /root/test2
ptdump: invalid size request: 0  type: "read page for write"


Here is the log:
[root@dell-per730-03 ~]# perf record -vv -a -T -e intel_pt// -S -o /dev/null &
[1] 14890
[root@dell-per730-03 ~]# Using CPUID GenuineIntel-6-4F
intel_pt default config: tsc
intel_pt psb_period 256
Intel PT snapshot size: 4194304
------------------------------------------------------------
perf_event_attr:
  type                             7
  size                             112
  config                           0x400
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|CPU|IDENTIFIER
  read_format                      ID
  disabled                         1
  inherit                          1
  sample_id_all                    1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
sys_perf_event_open: pid -1  cpu 24  group_fd -1  flags 0x8 = 28
sys_perf_event_open: pid -1  cpu 25  group_fd -1  flags 0x8 = 29
sys_perf_event_open: pid -1  cpu 26  group_fd -1  flags 0x8 = 30
sys_perf_event_open: pid -1  cpu 27  group_fd -1  flags 0x8 = 31
sys_perf_event_open: pid -1  cpu 28  group_fd -1  flags 0x8 = 32
sys_perf_event_open: pid -1  cpu 29  group_fd -1  flags 0x8 = 33
sys_perf_event_open: pid -1  cpu 30  group_fd -1  flags 0x8 = 34
sys_perf_event_open: pid -1  cpu 31  group_fd -1  flags 0x8 = 35
sys_perf_event_open: pid -1  cpu 32  group_fd -1  flags 0x8 = 36
sys_perf_event_open: pid -1  cpu 33  group_fd -1  flags 0x8 = 37
sys_perf_event_open: pid -1  cpu 34  group_fd -1  flags 0x8 = 38
sys_perf_event_open: pid -1  cpu 35  group_fd -1  flags 0x8 = 39
sys_perf_event_open: pid -1  cpu 36  group_fd -1  flags 0x8 = 40
sys_perf_event_open: pid -1  cpu 37  group_fd -1  flags 0x8 = 41
sys_perf_event_open: pid -1  cpu 38  group_fd -1  flags 0x8 = 42
sys_perf_event_open: pid -1  cpu 39  group_fd -1  flags 0x8 = 43
sys_perf_event_open: pid -1  cpu 40  group_fd -1  flags 0x8 = 44
sys_perf_event_open: pid -1  cpu 41  group_fd -1  flags 0x8 = 45
sys_perf_event_open: pid -1  cpu 42  group_fd -1  flags 0x8 = 46
sys_perf_event_open: pid -1  cpu 43  group_fd -1  flags 0x8 = 47
sys_perf_event_open: pid -1  cpu 44  group_fd -1  flags 0x8 = 48
sys_perf_event_open: pid -1  cpu 45  group_fd -1  flags 0x8 = 49
sys_perf_event_open: pid -1  cpu 46  group_fd -1  flags 0x8 = 50
sys_perf_event_open: pid -1  cpu 47  group_fd -1  flags 0x8 = 51
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|CPU|IDENTIFIER
  read_format                      ID
  inherit                          1
  exclude_kernel                   1
  exclude_hv                       1
  mmap                             1
  comm                             1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  context_switch                   1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 52
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 53
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 54
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 55
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 56
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 57
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 58
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 59
sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 60
sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 61
sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 62
sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 63
sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 64
sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 65
sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 66
sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 67
sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 68
sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 69
sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 70
sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 71
sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 72
sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 73
sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 74
sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 75
sys_perf_event_open: pid -1  cpu 24  group_fd -1  flags 0x8 = 76
sys_perf_event_open: pid -1  cpu 25  group_fd -1  flags 0x8 = 77
sys_perf_event_open: pid -1  cpu 26  group_fd -1  flags 0x8 = 78
sys_perf_event_open: pid -1  cpu 27  group_fd -1  flags 0x8 = 79
sys_perf_event_open: pid -1  cpu 28  group_fd -1  flags 0x8 = 80
sys_perf_event_open: pid -1  cpu 29  group_fd -1  flags 0x8 = 81
sys_perf_event_open: pid -1  cpu 30  group_fd -1  flags 0x8 = 82
sys_perf_event_open: pid -1  cpu 31  group_fd -1  flags 0x8 = 83
sys_perf_event_open: pid -1  cpu 32  group_fd -1  flags 0x8 = 84
sys_perf_event_open: pid -1  cpu 33  group_fd -1  flags 0x8 = 85
sys_perf_event_open: pid -1  cpu 34  group_fd -1  flags 0x8 = 86
sys_perf_event_open: pid -1  cpu 35  group_fd -1  flags 0x8 = 87
sys_perf_event_open: pid -1  cpu 36  group_fd -1  flags 0x8 = 88
sys_perf_event_open: pid -1  cpu 37  group_fd -1  flags 0x8 = 89
sys_perf_event_open: pid -1  cpu 38  group_fd -1  flags 0x8 = 90
sys_perf_event_open: pid -1  cpu 39  group_fd -1  flags 0x8 = 91
sys_perf_event_open: pid -1  cpu 40  group_fd -1  flags 0x8 = 92
sys_perf_event_open: pid -1  cpu 41  group_fd -1  flags 0x8 = 93
sys_perf_event_open: pid -1  cpu 42  group_fd -1  flags 0x8 = 94
sys_perf_event_open: pid -1  cpu 43  group_fd -1  flags 0x8 = 95
sys_perf_event_open: pid -1  cpu 44  group_fd -1  flags 0x8 = 96
sys_perf_event_open: pid -1  cpu 45  group_fd -1  flags 0x8 = 97
sys_perf_event_open: pid -1  cpu 46  group_fd -1  flags 0x8 = 98
sys_perf_event_open: pid -1  cpu 47  group_fd -1  flags 0x8 = 99
mmap size 528384B
AUX area mmap length 4194304
perf event ring buffer mmapped per cpu
Synthesizing TSC conversion information
Synthesizing auxtrace information

[root@dell-per730-03 ~]# crash

crash 7.1.9-2.el7
Copyright (C) 2002-2016  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [788MB]: patching 77971 gdb minimal_symbol values

please wait... (patching 77971 gdb minimal_symbol values) 
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-663.el7.x86_64/vmlinux 
    DUMPFILE: /dev/crash
        CPUS: 48
        DATE: Fri May 19 06:03:32 2017
      UPTIME: 00:27:40
LOAD AVERAGE: 0.46, 0.25, 0.15
       TASKS: 521
    NODENAME: dell-per730-03.khw.lab.eng.bos.redhat.com
     RELEASE: 3.10.0-663.el7.x86_64
     VERSION: #1 SMP Tue May 2 16:00:29 EDT 2017
     MACHINE: x86_64  (2200 Mhz)
      MEMORY: 63.9 GB
         PID: 14891
     COMMAND: "crash"
        TASK: ffff8969bf9b3ec0  [THREAD_INFO: ffff8968d37f0000]
         CPU: 13
       STATE: TASK_RUNNING (ACTIVE)

crash> 
crash> extend ptdump.so
/usr/lib64/crash/extensions/ptdump.so: shared object loaded
crash> ptdump /root/test2
ptdump: invalid size request: 0  type: "read page for write"
crash> 


Thanks,
Emma

Comment 15 Dave Anderson 2017-05-19 13:15:15 UTC

(In reply to fj-lsoft-kernel-it from comment #13)
> Hi Dave, 
> 
> Thank you for the fix!
> 
> > I have attached a scratch package that contains the proposed patch,
> > (as well as adding RPM_OPT_FLAGS to the ptdump.mk file for BZ #1450708).
> > 
> > Can you review it?
> 
> Yes.
> We reviewed the patch and confirmed the problem is fixed.
> 
> By the way, is it ok to forward your patch to crash ML 
> (crash-utility) 
> so we can fix the upstream ptdump too?
> 
> Regards,
> Yuki Inoguchi

There's really no need to post it on the mailing list given that:

 (1) the patch is from me, and
 (2) you (Fujitsu) are the maintainer

In other words I can just update the upstream package since you have
already ACK'd it.  

However, if there is somebody else (in Fujitsu?) that you would like to
ACK/review the patch, I can post it on the mailing list as well.  Let me
know whether you feel it is necessary.

Comment 16 Dave Anderson 2017-05-19 13:24:27 UTC

(In reply to Emma Wu from comment #14)
> Hi Dave and FJ,
> 
> I tested the new ptdump build:
>    crash-ptdump-command-1.0.3-2.el7.x86_64.rpm
>    crash-7.1.9-2.el7
>    kernel: 3.10.0-663.el7 
> 
> It seems to have worked correctly.
> crash> ptdump /root/test
> [0] ring buffer is zero
> [1] ring buffer is zero
> 
> But if I run it after running cmd "perf record -vv -a -T -e intel_pt// -S -o
> /dev/null &", it displays:
> 
> crash> ptdump /root/test2
> ptdump: invalid size request: 0  type: "read page for write"

Thanks, Emma.

The invalid size request of 0 bytes can come from 5 different places:

  File     Line
0 ptdump.c 315 readmem(page + offset, KVADDR, buf, len, "read page for write",
1 ptdump.c 335 readmem(page + offset, KVADDR, buf, len, "read page for write",
2 ptdump.c 354 readmem(page, KVADDR, buf, len, "read page for write",
3 ptdump.c 391 readmem(page, KVADDR, buf, len, "read page for write",
4 ptdump.c 409 readmem(page, KVADDR, buf, len, "read page for write",

This looks like a bug in functionality (or at least in error handling)
because presumably none of the readmem() calls expects a
"len" argument of 0.

This is going to have to be debugged by Fujitsu.
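
As a rough illustration of the error-handling angle, a zero-length request could be caught before it reaches the low-level read. The sketch below is self-contained C and does not use crash's readmem(); the read_fn callback, the copy_status enum, guarded_read and fill_reader are all hypothetical stand-ins, and whether ptdump should behave this way is the maintainer's call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback standing in for a memory-read routine.
 * Returns 1 on success and 0 on failure. */
typedef int (*read_fn)(unsigned long addr, void *buf, size_t len);

enum copy_status { COPY_OK, COPY_EMPTY, COPY_FAILED };

/* Reject zero-length requests up front so the caller can report
 * "nothing to read" instead of surfacing a low-level size error. */
static enum copy_status guarded_read(read_fn reader, unsigned long addr,
                                     void *buf, size_t len)
{
    assert(reader != NULL && buf != NULL);
    if (len == 0)
        return COPY_EMPTY;
    return reader(addr, buf, len) ? COPY_OK : COPY_FAILED;
}

/* Toy reader for demonstration only: ignores the address and fills the
 * buffer with a fixed byte. */
static int fill_reader(unsigned long addr, void *buf, size_t len)
{
    (void)addr;
    memset(buf, 0xAB, len);
    return 1;
}
```

With this shape, guarded_read(fill_reader, addr, buf, 0) returns COPY_EMPTY without touching the buffer, and a caller can turn that into a clearer per-CPU message.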

Comment 17 Fujitsu kernel team 2017-05-23 00:39:12 UTC

Hi Dave,

(In reply to Dave Anderson from comment #15)
> 
> There's really no need to post it on the mailing list given that:
> 
>  (1) the patch is from me, and
>  (2) you (Fujitsu) are the maintainer
> 
> In other words I can just update the upstream package since you have
> already ACK'd it.  

I understand that I don't need to post it on ML.
Please update the upstream package.

(In reply to Dave Anderson from comment #16)
> Thanks, Emma.
> 
> The invalid size request of 0 bytes can come from 5 different places:
> 
>   File     Line
> 0 ptdump.c 315 readmem(page + offset, KVADDR, buf, len, "read page for
> write",
> 1 ptdump.c 335 readmem(page + offset, KVADDR, buf, len, "read page for
> write",
> 2 ptdump.c 354 readmem(page, KVADDR, buf, len, "read page for write",
> 3 ptdump.c 391 readmem(page, KVADDR, buf, len, "read page for write",
> 4 ptdump.c 409 readmem(page, KVADDR, buf, len, "read page for write",
> 
> This looks like a bug in functionality (or at least in error handling)
> because presumably none of the readmem() calls expects a
> "len" argument of 0.
> 
> This is going to have to be debugged by Fujitsu.

This is the specified behavior, not a bug.
The ptdump command is meant to be used while crash is analyzing a vmcore, not a live system.

Therefore, if you would like to see the trace data, you first need to collect a vmcore of the system.

Regards,
Yuki Inoguchi
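
Given that restriction, a command could in principle refuse to run on a live session with a clear message. The C sketch below only classifies the memory source by path. /dev/crash is the DUMPFILE shown in the session logs in this report; treating /dev/mem and /proc/kcore as live sources as well is an assumption of this sketch, and the live_sources table and is_live_source helper are hypothetical. crash keeps its own notion of a live session, which this sketch does not use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Paths treated as live memory sources in this sketch. */
static const char *const live_sources[] = {
    "/dev/crash", "/dev/mem", "/proc/kcore"
};

/* Return 1 if `dumpfile` names a live memory source, 0 otherwise. */
static int is_live_source(const char *dumpfile)
{
    assert(dumpfile != NULL);
    for (size_t i = 0; i < sizeof(live_sources) / sizeof(live_sources[0]); i++)
        if (strcmp(dumpfile, live_sources[i]) == 0)
            return 1;
    return 0;
}
```

A caller could test is_live_source() on the session's dumpfile path and print something like "ptdump: not supported on a live system" before attempting any reads.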

Comment 18 Dave Anderson 2017-05-23 13:07:48 UTC

(In reply to fj-lsoft-kernel-it from comment #17)
> ...
> This is the specified behavior, not a bug.
> The ptdump command is meant to be used while crash is analyzing a vmcore,
> not a live system.
> 
> Therefore, if you would like to see the trace data, you first need to collect
> a vmcore of the system.

OK, that makes sense -- thanks!

Comment 19 Dave Anderson 2017-05-23 14:04:14 UTC

(In reply to fj-lsoft-kernel-it from comment #17)
> Hi Dave,
> 
> (In reply to Dave Anderson from comment #15)
> > 
> > There's really no need to post it on the mailing list given that:
> > 
> >  (1) the patch is from me, and
> >  (2) you (Fujitsu) are the maintainer
> > 
> > In other words I can just update the upstream package since you have
> > already ACK'd it.  
> 
> I understand that I don't need to post it on ML.
> Please update the upstream package.

I've bumped the upstream version to ptdump-1.0.5.tar.gz and posted 
it upstream:

  http://people.redhat.com/anderson/extensions.html#PTDUMP

Comment 20 Dave Anderson 2017-05-23 14:05:11 UTC

Emma, can you give this BZ a qa_ack+?

Comment 27 errata-xmlrpc 2017-08-01 07:32:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2288