Bug 592887

Summary: trace show -c <CPU> causes seg fault
Product: Red Hat Enterprise Linux 6
Reporter: Caspar Zhang <czhang>
Component: crash-trace-command
Assignee: Dave Anderson <anderson>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: low
Version: 6.0
CC: laijs, phan, qcai, snagar
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: crash-trace-command-1.0-3.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 20:04:32 UTC
Type: ---
Bug Blocks: 591876    

Description Caspar Zhang 2010-05-17 10:12:01 UTC
Description of problem:

When I use the trace extension in crash, executing the `trace show -c <CPU>' command causes a segmentation fault and the whole crash application exits.

Version-Release number of selected component (if applicable):
crash 5.0.0-13.el6
crash-trace-command 1.0-1.2.el6

How reproducible:
about 80%

Steps to Reproduce:
1. Crash the kernel to capture a vmcore, then run crash on it
2. extend path/to/trace.so
3. Execute trace show -c 0 or another trace show command
  
Actual results:
crash exits with a segmentation fault.

Expected results:
crash does not exit and no segmentation fault occurs.

Additional info:
I'm not familiar with crash and the new package, if I mis-used the trace show command, please tell me :-(

Comment 2 Dave Anderson 2010-05-17 15:21:23 UTC
> I'm not familiar with crash and the new package, if I mis-used the
> trace show command, please tell me :-(

I'm not familiar with the workings of the trace.c extension module,
but I act as a "proxy" maintainer of the package for Fujitsu.

I'm presuming that it is also "legal" to attempt this command on a
live system, which also causes a segmentation violation:

# rpm -qa | grep crash
crash-devel-5.0.0-14.el6.i686
crash-5.0.0-14.el6.i686
crash-debuginfo-5.0.0-14.el6.i686
crash-trace-command-1.0-1.2.el6.i686
#
# crash

crash 5.0.0-14.el6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-19.el6.i686/vmlinux
    DUMPFILE: /dev/crash
        CPUS: 4
        DATE: Mon May 17 10:55:49 2010
      UPTIME: 23:54:23
LOAD AVERAGE: 0.25, 0.07, 0.02
       TASKS: 151
    NODENAME: intel-mccreary-02.lab.bos.redhat.com
     RELEASE: 2.6.32-19.el6.i686
     VERSION: #1 SMP Tue Mar 9 18:10:40 EST 2010
     MACHINE: i686  (2826 Mhz)
      MEMORY: 1.9 GB
         PID: 25442
     COMMAND: "crash"
        TASK: f3f0fa90  [THREAD_INFO: f5b80000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)

crash> extend trace.so
/usr/lib/crash/extensions/trace.so: shared object loaded
crash> trace show -c 2
<Unknown event type>
<Unknown event type>
corrupt
Segmentation fault (core dumped)
#

Also, if I build the current upstream version of crash (5.0.3), and
the trace.c extension module that comes with it, a segmentation
violation also occurs:

# wget http://people.redhat.com/anderson/crash-5.0.3.tar.gz
... [ snip ] ...
# tar xvzmf crash-5.0.3.tar.gz
... [ snip ] ...
# cd crash-5.0.3
# make
... [ snip ] ...
# make extensions
... [ snip ] ...
gcc -nostartfiles -shared -rdynamic -o trace.so trace.c -fPIC -DX86 -D_FILE_OFFSET_BITS=64
# ./crash

crash 5.0.3
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-19.el6.i686/vmlinux
    DUMPFILE: /dev/crash
        CPUS: 4
        DATE: Mon May 17 11:17:03 2010
      UPTIME: 1 days, 00:15:37
LOAD AVERAGE: 0.21, 0.23, 0.11
       TASKS: 151
    NODENAME: intel-mccreary-02.lab.bos.redhat.com
     RELEASE: 2.6.32-19.el6.i686
     VERSION: #1 SMP Tue Mar 9 18:10:40 EST 2010
     MACHINE: i686  (2826 Mhz)
      MEMORY: 1.9 GB
         PID: 17546
     COMMAND: "crash"
        TASK: f5887030  [THREAD_INFO: ef574000]
         CPU: 1
       STATE: TASK_RUNNING (ACTIVE)

crash> extend extensions/trace.so
./extensions/trace.so: shared object loaded
crash> trace show -c 2
Segmentation fault (core dumped)
# 


The author of the trace.c extension module is laijs.com,
who does not have a Red Hat bugzilla account.  I will forward him
this information for his take on the matter.

Comment 3 Dave Anderson 2010-05-18 13:26:31 UTC
> The author of the trace.c extension module is laijs.com,
> who does not have a Red Hat bugzilla account.  I will forward him
> this information for his take on the matter.  

This is the email response from Lai Jiangshan -- thanks Lai!

-----------------------------------------------------------------------

Subject: Re: crash utility trace extension module bug
     To: anderson; Caspar Zhang
Sent By: Lai Jiangshan
     On: May 18, 2010 7:34 AM

I have already found how this bug happens; I will send
a patch within 48 hours.

Also, I have created a bugzilla account.
Thanks a lot.

Lai.


Dave Anderson wrote:
> Hello Lai,
>
> I was assigned a Red Hat bugzilla today, for which I am asking your
> assistance:
>

[...]

>
>
> Anyway, I don't know whether the reporter had any trace-point activity
> in his vmcore, but on my live system attempt, I did nothing but run
> the crash utility, install the module, and run the command.
>
> I don't know if the problem is due to there being no tracepoint activity
> in the kernel, or if the tracepoint code in the kernel has changed since
> you wrote the extension module?
>
> In any case, can you:
>
>  (1) create a bugzilla account, and
>  (2) post your response in it?
>
> Thanks very much,
>   Dave Anderson

Comment 4 Lai Jiangshan 2010-05-20 06:34:05 UTC
Subject: [PATCH] crash-trace-command: fix accessing uninitialized data

Caspar Zhang reported that trace show -c <CPU> causes a seg fault.
This happens because that code path accesses uninitialized data.

Reported-by: Caspar Zhang <czhang>
Signed-off-by: Lai Jiangshan <laijs.com>
---
diff --git a/extensions/trace.c b/extensions/trace.c
index 975756b..89eb477 100755
--- a/extensions/trace.c
+++ b/extensions/trace.c
@@ -279,8 +279,12 @@ static void ftrace_destroy_buffers(struct ring_buffer_per_cpu *buffers)
 {
 	int i;
 
-	for (i = 0; i < nr_cpu_ids; i++)
+	for (i = 0; i < nr_cpu_ids; i++) {
+		if (!buffers[i].kaddr)
+			continue;
+
 		free(buffers[i].pages);
+	}
 }
 
 static int ftrace_init_buffers(struct ring_buffer_per_cpu *buffers,
@@ -913,6 +917,7 @@ static int ftrace_dump_event_types(const char *events_path)
 }
 
 struct ring_buffer_per_cpu_stream {
+	struct ring_buffer_per_cpu *cpu_buffer;
 	ulong *pages;
 	void *curr_page;
 	int available_pages;
@@ -929,6 +934,7 @@ int ring_buffer_per_cpu_stream_init(struct ring_buffer_per_cpu *cpu_buffer,
 {
 	unsigned i, count = 0;
 
+	s->cpu_buffer = cpu_buffer;
 	s->curr_page = malloc(PAGESIZE());
 	if (s->curr_page == NULL)
 		return -1;
@@ -1104,9 +1110,7 @@ static void __rbs_destroy(struct ring_buffer_stream *s, int *cpulist, int nr)
 	int cpu;
 
 	for (cpu = 0; cpu < nr; cpu++) {
-		if (!global_buffers[cpu].kaddr)
-			continue;
-		if (cpulist && !cpulist[cpu])
+		if (!s->ss[cpu].cpu_buffer)
 			continue;
 
 		ring_buffer_per_cpu_stream_destroy(s->ss + cpu);
@@ -1132,6 +1136,7 @@ int ring_buffer_stream_init(struct ring_buffer_stream *s, int *cpulist)
 	}
 
 	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		s->ss[cpu].cpu_buffer = NULL;
 		s->es[cpu].data = NULL;
 
 		if (!global_buffers[cpu].kaddr)
@@ -1183,7 +1188,7 @@ static int ring_buffer_stream_pop_event(struct ring_buffer_stream *s,
 
 	if (s->popped_cpu == nr_cpu_ids) {
 		for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-			if (!global_buffers[cpu].kaddr)
+			if (!s->ss[cpu].cpu_buffer)
 				continue;
 
 			ring_buffer_per_cpu_stream_pop_event(s->ss + cpu,
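
As a reading aid, the pattern the patch applies can be sketched in isolation. This is a minimal illustration, not the trace.c code: the names cpu_buffer, cpu_stream, stream, owner, stream_init, stream_destroy and NR_CPUS are all hypothetical, and the fixed CPU count is an assumption made to keep the sketch short. The idea, as far as the diff shows, is that each per-CPU stream records which buffer it was initialized for, that every slot is cleared before anything else happens, and that later code consults this marker rather than re-deriving the set of active CPUs from global state.

```c
#include <stdlib.h>

#define NR_CPUS 4   /* hypothetical fixed CPU count for this sketch */

/* Hypothetical stand-in for a per-CPU ring buffer found in the dump. */
struct cpu_buffer {
	unsigned long kaddr;    /* 0 means this CPU has no buffer */
};

/* Hypothetical per-CPU stream; owner stays NULL until init succeeds. */
struct cpu_stream {
	struct cpu_buffer *owner;
	void *curr_page;
};

struct stream {
	struct cpu_stream ss[NR_CPUS];
};

/* Release only the slots that were really initialized; returns how many. */
int stream_destroy(struct stream *s)
{
	int cpu, released = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!s->ss[cpu].owner)
			continue;       /* never initialized: nothing to free */

		free(s->ss[cpu].curr_page);
		s->ss[cpu].curr_page = NULL;
		s->ss[cpu].owner = NULL;
		released++;
	}
	return released;
}

/*
 * Initialize streams for CPUs that have a buffer and, when cpulist is
 * non-NULL, are selected in it. Returns the number of streams set up,
 * or -1 if an allocation fails.
 */
int stream_init(struct stream *s, struct cpu_buffer *buffers,
		const int *cpulist, size_t page_size)
{
	int cpu, count = 0;

	/* Clear every slot first so later code can always trust owner. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		s->ss[cpu].owner = NULL;
		s->ss[cpu].curr_page = NULL;
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!buffers[cpu].kaddr)
			continue;
		if (cpulist && !cpulist[cpu])
			continue;

		s->ss[cpu].curr_page = malloc(page_size);
		if (s->ss[cpu].curr_page == NULL) {
			stream_destroy(s);  /* frees only the marked slots */
			return -1;
		}
		s->ss[cpu].owner = &buffers[cpu];
		count++;
	}
	return count;
}
```

With a cpulist that selects only CPU 2, the other slots keep owner == NULL, so a loop over all CPUs that tests owner skips them instead of touching memory that was never set up. That appears to be the situation trace show -c <CPU> ran into before the fix.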

Comment 5 Dave Anderson 2010-05-20 14:52:58 UTC
Lai,

Should your follow-up "generate trace.dat from core-file" patch also
be applied to the RHEL6 version along with the seg-fault fix?

Dave

Comment 6 Lai Jiangshan 2010-05-21 09:24:45 UTC
Dave,

Yes, please apply the "generate trace.dat from core-file" patch
to RHEL6. Thanks.

Lai

Comment 9 Caspar Zhang 2010-06-08 07:44:23 UTC
In the new version I can't load the trace.so module with 'extend /path/to/trace.so'; seems that a regression? A new bug is filed at: https://bugzilla.redhat.com/show_bug.cgi?id=601536

Comment 10 Dave Anderson 2010-06-08 15:07:27 UTC
> seems that a regression?

Good catch -- it's most definitely a regression, but it's not associated with the
fix for the segfault issue reported in this bugzilla.  It's caused by the additional
patch requested in comment #6.

Comment 12 Caspar Zhang 2010-07-01 09:18:33 UTC
Verified that the bug has been fixed in 1.0-3.el6.

Comment 13 releng-rhel@redhat.com 2010-11-10 20:04:32 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.