Bug 1440726

Summary: crash-trace-command: fails to load trace.so on ppc64 (big-endian machines)
Product: Red Hat Enterprise Linux 7 Reporter: Emma Wu <xiawu>
Component: crash-trace-commandAssignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact: Emma Wu <xiawu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4CC: feij.fnst, ksanagi, panand, qzhao, tumeya, yishimat
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Unspecified   
Whiteboard:
Fixed In Version: crash-trace-command-2.0-12.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 12:50:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1446211    

Comment 2 Dave Anderson 2017-04-10 14:15:45 UTC
Please try

Comment 4 Dave Anderson 2017-04-10 15:07:36 UTC
This patch fixes the ppc64 issue:

--- a/extensions/trace.c
+++ b/extensions/trace.c
@@ -198,8 +198,13 @@ static int init_offsets(void)
 			fprintf(fp, "per cpu buffer sizes\n");
 	}
 
+#ifdef PPC64
+	if (kernel_symbol_exists(".ring_buffer_read"))
+		gdb_set_crash_scope(symbol_value(".ring_buffer_read"), ".ring_buffer_read");
+#else
 	if (kernel_symbol_exists("ring_buffer_read"))
 		gdb_set_crash_scope(symbol_value("ring_buffer_read"), "ring_buffer_read");
+#endif
 
 	if (!per_cpu_buffer_sizes)
 		init_offset(ring_buffer, pages);


I am a proxy maintainer for the crash-trace-command, which is owned and
maintained by Fujitsu.  Jie Fei (feij.fnst.com) is the 
maintainer of the upstream trace.c package.

Jie Fei, do you agree with the patch?

Comment 5 Dave Anderson 2017-04-10 15:34:33 UTC
With the patch applied, it gets past the ring_buffer_read issue, but on
a live system running 3.10.0-632.el7, it still fails like so:

crash> extend extensions/trace.so
extend: Num of pages is less than 0
extend: ./extensions/trace.so: no commands registered: shared object unloaded
crash> 

Jie Fei, is that how the module should behave on a live system?

Comment 6 Dave Anderson 2017-04-10 19:01:15 UTC
Note: the "Num of pages is less than 0" error only happens on ppc64.

Comment 7 Dave Anderson 2017-04-11 18:21:44 UTC
The "Num of pages is less than 0" bug occurs on big-endian machines like
ppc64 because of this upstream patch that was recently backported to rhel7,
which changes the size of the ring_buffer_per_cpu.nr_pages field from
an int to a long:.

  commit 370f726386ed3957c34895695672723fd952808c
  Author: Pratyush Anand <panand>
  Date:   Wed Feb 8 06:28:39 2017 -0500

    [kernel] ring-buffer: Use long for nr_pages to avoid overflow failures
    
    Message-id: <bc38a0011de031e2b2c296c386d97933814e1de2.1486534743.git.panand>
    Patchwork-id: 165314
    O-Subject: [RHEL7.4 PATCH 1/2] ring-buffer: Use long for nr_pages to avoid overflow failures
    Bugzilla: 1339451
    RH-Acked-by: Scott Wood <swood>
    RH-Acked-by: Josh Poimboeuf <jpoimboe>
    RH-Acked-by: Jiri Olsa <jolsa>
    
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1339451
    
    commit 9b94a8fba501f38368aef6ac1b30e7335252a220
    Author: Steven Rostedt (Red Hat) <rostedt>
    Date:   Thu May 12 11:01:24 2016 -0400
    
        ring-buffer: Use long for nr_pages to avoid overflow failures
    
        The size variable to change the ring buffer in ftrace is a long. The
        nr_pages used to update the ring buffer based on the size is int. On 64 bit
        machines this can cause an overflow problem.
    
        For example, the following will cause the ring buffer to crash:
    
         # cd /sys/kernel/debug/tracing
         # echo 10 > buffer_size_kb
         # echo 8556384240 > buffer_size_kb
    
        Then you get the warning of:
    
         WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
    
        Which is:
    
          RB_WARN_ON(cpu_buffer, nr_removed);
    
        Note each ring buffer page holds 4080 bytes.
    
        This is because:
    
         1) 10 causes the ring buffer to have 3 pages.
            (10kb requires 3 * 4080 pages to hold)
    
         2) (2^31 / 2^10  + 1) * 4080 = 8556384240
            The value written into buffer_size_kb is shifted by 10 and then passed
            to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
    
         3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
            which is 4080. 8761737461760 / 4080 = 2147484672
    
         4) nr_pages is subtracted from the current nr_pages (3) and we get:
            2147484669. This value is saved in a signed integer nr_pages_to_update
    
         5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
            turns into the value of -2147482627
    
         6) As the value is a negative number, in update_pages_handler() it is
            negated and passed to rb_remove_pages() and 2147482627 pages will
            be removed, which is much larger than 3 and it causes the warning
            because not all the pages asked to be removed were removed.
    
        Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
    
        Cc: stable.org # 2.6.28+
        Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer")
        Reported-by: Hao Qin <QEver.cn>
        Signed-off-by: Steven Rostedt <rostedt>
    
    - Resolved Minor conflicts in kernel/trace/ring_buffer.c
    
    Signed-off-by: Pratyush Anand <panand>
    Signed-off-by: Rafael Aquini <aquini>

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4b1cfb4..a546fb7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -463,7 +463,7 @@ struct ring_buffer_per_cpu {
 	raw_spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
-	unsigned int			nr_pages;
+	unsigned long			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -483,7 +483,7 @@ struct ring_buffer_per_cpu {
 	u64				write_stamp;
 	u64				read_stamp;
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
-	int				nr_pages_to_update;
+	long				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
 	struct work_struct		update_pages_work;
 	struct completion		update_done;
@@ -1110,10 +1110,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }

 ...


When the trace.c module's ftrace_init_buffers() function reads the nr_pages
values for each cpu from the kernel, it reads a 32-bit value, because it
defines its own nr_pages field here: 

struct ring_buffer_per_cpu {
        ulong kaddr;

        ulong head_page;
        ulong tail_page;
        ulong commit_page;
        ulong reader_page;
        ulong real_head_page;

        int head_page_index;
        unsigned int nr_pages;

And then here, when it calls buffer_read_value(nr_pages), it reads a
32-bit quantity:

                if (per_cpu_buffer_sizes) {
   ===>                 buffer_read_value(nr_pages);
                        pages = buffers[i].nr_pages;
                } else
                        buffers[i].nr_pages = pages;

This still works OK for little-endian machines, but not for big-endian
machines like ppc64, where it ends up reading just the high 32-bits,
i.e., the upper 00000000 from the long value of 0000000000000002.

I'm not sure how you want to fix this in order to maintain backwards
compatibility?

Please advise us when you have a patch to test.

Comment 8 Dave Anderson 2017-04-12 14:31:30 UTC
  
FYI: my patch suggestion in comment 4 is not correct.  The "ifdef PPC64"
will work for both ppc64 and ppc64le.  However, the usage of the leading "."
for text symbols is only used by ppc64:

  # uname -m
  ppc64
  # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
  c0000000001f1030 T .ring_buffer_read
  c0000000013d8080 D ring_buffer_read
  #
  
But note that ppc64le does not use that construct for text symbols:
  
  # uname -m
  ppc64le
  # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
  c0000000001f6d10 T ring_buffer_read
  #

So the patch will have to check for ".ring_buffer_read", but if it
does not exist, then still use "ring_buffer_read".

So again, there are two bugs with the trace.c module:

 (1) the ".ring_buffer_read" check for ppc64
 (2) the "nr_pages" member size change 
  


Anwway, can someone from Fujitsu please respond to this bugzilla to
acknowledge the request to fix these bugs?

Thanks,
  Dave

Comment 9 Fei Jie 2017-04-13 08:20:43 UTC
(In reply to Dave Anderson from comment #8)
>   
> FYI: my patch suggestion in comment 4 is not correct.  The "ifdef PPC64"
> will work for both ppc64 and ppc64le.  However, the usage of the leading "."
> for text symbols is only used by ppc64:
> 
>   # uname -m
>   ppc64
>   # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
>   c0000000001f1030 T .ring_buffer_read
>   c0000000013d8080 D ring_buffer_read
>   #
>   
> But note that ppc64le does not use that construct for text symbols:
>   
>   # uname -m
>   ppc64le
>   # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
>   c0000000001f6d10 T ring_buffer_read
>   #
> 
> So the patch will have to check for ".ring_buffer_read", but if it
> does not exist, then still use "ring_buffer_read".
> 
> So again, there are two bugs with the trace.c module:
> 
>  (1) the ".ring_buffer_read" check for ppc64
>  (2) the "nr_pages" member size change 
>   
> 
> 
> Anwway, can someone from Fujitsu please respond to this bugzilla to
> acknowledge the request to fix these bugs?
> 
> Thanks,
>   Dave

Thanks Dave,
Sorry for the delay. 
For the "nr_pages" member size bug, I think we can use the MEMORY_SIZE to check the size of nr_pages in kernel to decide if it is unsigned int or unsigned long, then read data according to its size and assign to local nr_pages in trace.c.

Comment 10 Fei Jie 2017-04-13 09:01:23 UTC
(In reply to Fei Jie from comment #9)
> (In reply to Dave Anderson from comment #8)
> >   
> > FYI: my patch suggestion in comment 4 is not correct.  The "ifdef PPC64"
> > will work for both ppc64 and ppc64le.  However, the usage of the leading "."
> > for text symbols is only used by ppc64:
> > 
> >   # uname -m
> >   ppc64
> >   # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
> >   c0000000001f1030 T .ring_buffer_read
> >   c0000000013d8080 D ring_buffer_read
> >   #
> >   
> > But note that ppc64le does not use that construct for text symbols:
> >   
> >   # uname -m
> >   ppc64le
> >   # cat /proc/kallsyms | grep "[ .]ring_buffer_read$"
> >   c0000000001f6d10 T ring_buffer_read
> >   #
> > 
> > So the patch will have to check for ".ring_buffer_read", but if it
> > does not exist, then still use "ring_buffer_read".
> > 
> > So again, there are two bugs with the trace.c module:
> > 
> >  (1) the ".ring_buffer_read" check for ppc64
> >  (2) the "nr_pages" member size change 
> >   
> > 
> > 
> > Anwway, can someone from Fujitsu please respond to this bugzilla to
> > acknowledge the request to fix these bugs?
> > 
> > Thanks,
> >   Dave
> 
> Thanks Dave,
> Sorry for the delay. 
> For the "nr_pages" member size bug, I think we can use the MEMORY_SIZE to
> check the size of nr_pages in kernel to decide if it is unsigned int or
> unsigned long, then read data according to its size and assign to local
> nr_pages in trace.c.

typo:MEMORY_SIZE->MEMBER_SIZE

Comment 11 Dave Anderson 2017-04-13 13:35:17 UTC
> typo:MEMORY_SIZE->MEMBER_SIZE

Yes, that should definitely work.

I wasn't clear whether the local "nr_pages", and the "pages" arguments
passed around to the various functions should also be change to be longs?
But that's up to you.  

I also received a new s390x (also big-endian) BZ #1441914 that is 
probably related to the same issue.  I am currently provisioning an
s390x machine to look into it.

I understand that you most likely do not have ppc64 or s390x machines
available for testing, so if you can provide test patches, I can test
them here for you.

Thanks,
  Dave

Comment 12 Dave Anderson 2017-04-13 14:24:45 UTC
Fei Jie,

Please provide a patch to the upstream version of trace.c, because
the rhel7 crash-trace-command-2.0-10.el7 package needs to be updated
to that version.

Thanks,
  Dave

Comment 13 Fei Jie 2017-04-14 06:55:11 UTC
Hi Dave,

I've sent the patch about nr_pages member size, and I agree with you about your idea and patch for the ring_buffer_read issue.

Thanks

Comment 14 Dave Anderson 2017-04-14 12:51:25 UTC
(In reply to Fei Jie from comment #13)
> Hi Dave,
> 
> I've sent the patch about nr_pages member size, and I agree with you about
> your idea and patch for the ring_buffer_read issue.
> 
> Thanks

OK great!  I will add a .ring_buffer_issue fix to your patch.  Thanks 
for the quick response.

Dave

Comment 20 errata-xmlrpc 2017-08-01 12:50:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2267