Bug 120986

Summary: op_to_source fails with IOQ and FSB event types
Product: Red Hat Enterprise Linux 3
Reporter: Tom Lane <tgl>
Component: oprofile
Assignee: William Cohen <wcohen>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Version: 3.0
CC: hhorak
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: 0.5.4-20
Doc Type: Bug Fix
Last Closed: 2004-04-16 18:58:43 UTC
Attachments:
  Description: handle the bitwise ors of unit-masks when printing descriptions
  Flags: none

Description Tom Lane 2004-04-15 20:59:15 UTC
Description of problem:
I've been unable to get useful output out of op_to_source when trying
to profile bus activity.

Version-Release number of selected component (if applicable):
oprofile-0.5.4-13 running on RHEL3 Update 1.  Hardware is dual Xeon.

How reproducible:


Steps to Reproduce:
1. opcontrol --setup --no-vmlinux --ctr4-event=IOQ_ACTIVE_ENTRIES \
     --ctr4-count=1000000 --ctr4-unit-mask=0xAFE1 \
     --ctr0-event=IOQ_ALLOCATION --ctr0-count=1000000 --ctr0-unit-mask=0xAFE1
2.  trace something and shut down
3.  op_to_source -i executable
  
Actual results:
output ends halfway through description of active counters

Expected results:
some profile info ...

Additional info:
Similar failure occurs with FSB_DATA_ACTIVITY event type.  It looks
like op_to_source is trying to pretty-print the event mask but curls
up and dies instead.

Comment 1 William Cohen 2004-04-16 13:57:51 UTC
I quickly tried to replicate the problem on my Athlon test machine. It
was able to run op_to_source without problems with two events being
measured. The problem is tied either to the specific events or machine
being used, or to the program being analyzed. I would like to find out
what is going wrong. What precisely is meant by "output ends halfway
through description of active counters"?

Could you install oprofile-debuginfo-0.5.4-13.i386.rpm on your
machine? I have put a copy at
http://people.redhat.com/wcohen/oprofile/oprofile-debuginfo-0.5.4-13.i386.rpm
Then run the following commands:

gdb /usr/bin/op_to_source
run -i executable

When it dies, get a backtrace in gdb with:

where

Post the backtrace output to this Bugzilla entry.



Comment 2 Tom Lane 2004-04-16 14:42:20 UTC
I mean that the output looks like this:

[postgres@perc4-db pgsql]$ cat opsource.run10
/*
 * Command line: op_to_source -i /usr/local/pgsql-7.4.1/bin/postgres
 *
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output all files
 *
 * Cpu type: P4 / Xeon
 * Cpu speed (MHz estimation) : 1993.62
 *
 * Counter 0 disabled
 * Counter 1 disabled
 * Counter 2 disabled
 * Counter 3 disabled
 *
 * Counter 4 counted FSB_DATA_ACTIVITY events (DRDY or DBSY events on the front side bus) with a unit mask of 0x03 ([postgres@perc4-db pgsql]$

that is, it stops after the left paren following the numeric mask
value.

After some further experimentation, I find that it doesn't seem to be
dumping core or anything --- there is no corefile and the process exit
status is zero.  gdb indicates that main() is returning zero.  If you
can suggest other places to set breakpoints I'll be happy to look...

Comment 3 William Cohen 2004-04-16 16:23:21 UTC
I have been able to replicate this on the Athlon.

[wcohen@slingshot SPECS]$ op_to_source -i /usr/bin/oprofiled |more
/*
 * Command line: op_to_source -i /usr/bin/oprofiled
 *
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output all files
 *
 * Cpu type: Athlon
 * Cpu speed (MHz estimation) : 1687.55
 *
 *
 * Counter 0 counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
 * Total samples : 87
 *
 * Counter 1 counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x07 (
[wcohen@slingshot SPECS]$

The problem is in op_print_event() in libop++/op_print_event.cpp. It
can't handle a unit mask that is a bitwise OR of several options.
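
To illustrate the failure mode, here is a minimal sketch of a unit-mask
printer that copes with OR'd values. It is not the oprofile source: the
UnitMaskEntry type, the describe_unit_mask() function, and the output wording
are all hypothetical, chosen only to show the idea. An exact table match is
tried first; if there is none, the mask is decomposed into the table entries
whose bits it contains, and any leftover bits are printed in hex instead of
being dropped.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical table entry: one named unit-mask value and its description.
struct UnitMaskEntry {
    std::uint32_t value;
    std::string description;
};

// Hypothetical helper, not oprofile's API: describe a unit mask that may be
// either a single table value or a bitwise OR of several table values.
std::string describe_unit_mask(std::uint32_t mask,
                               const std::vector<UnitMaskEntry>& table) {
    // Case 1: the mask is exactly one of the documented values.
    for (const auto& entry : table) {
        if (entry.value == mask) return entry.description;
    }

    // Case 2: treat the mask as an OR of entries and list each one present.
    const std::string prefix = "multiple flags:";
    std::string out = prefix;
    std::uint32_t remaining = mask;
    for (const auto& entry : table) {
        if (entry.value != 0 && (mask & entry.value) == entry.value) {
            out += " [" + entry.description + "]";
            remaining &= ~entry.value;
        }
    }

    // Bits that no table entry explains are reported rather than ignored.
    if (remaining != 0) {
        std::ostringstream hex;
        hex << std::hex << remaining;
        out += " [unknown bits 0x" + hex.str() + "]";
    }

    // Nothing matched at all (only possible for an undocumented zero mask).
    if (out.size() == prefix.size()) return "???";
    return out;
}
```

With a made-up table of three single-bit entries (0x01, 0x02, 0x04), a mask
of 0x07 has no exact match, so the function lists all three entries after
"multiple flags:" and always returns a complete string that the caller can
follow with the closing parenthesis.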


Comment 4 Tom Lane 2004-04-16 16:53:45 UTC
Ah-hah.  I suppose a real solution will take a little bit of work, but
for my immediate problem it'd be fine if it'd just print "???" or some
such.  Are you contemplating a quick fix, or should I go off and hack
my own patched RPM?

Comment 5 William Cohen 2004-04-16 17:00:06 UTC
Created attachment 99486 [details]
handle the bitwise ors of unit-masks when printing descriptions

This essentially backports the printing of "multiple flags" from the
oprofile 0.8 CVS tree to oprofile 0.5.4.

The patch has been checked in; oprofile-0.5.4-20 and later RPMs will
include it.

Comment 6 Tom Lane 2004-04-16 17:33:29 UTC
Looks great.  Could I pester you to spin -20 immediately, so I could
grab the RPM from beehive?  I'm looking into a performance issue on a
customer's machine, and that test setup is only going to be available
to me for a limited time.  Thanks ...