Bug 445849 - 64-bit strace cannot follow 32-bit vfork
Summary: 64-bit strace cannot follow 32-bit vfork
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: strace
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On: 447475
Blocks: 391501
Reported: 2008-05-09 12:26 UTC by Andrew Hecox
Modified: 2018-10-20 02:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 22:09:48 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:0233 0 normal SHIPPED_LIVE strace bug-fix update 2009-01-20 16:06:35 UTC

Description Andrew Hecox 2008-05-09 12:26:57 UTC
+++ This bug was initially created as a clone of Bug #445219 +++

Escalated to Bugzilla from IssueTracker

-- Additional comment from tao on 2008-05-05 11:24 EST --
The 64-bit strace tool cannot follow a vfork() in a 32-bit process. It can
follow a normal fork(). In order to debug 32-bit programs on a 64-bit
architecture, it is desirable to have this support. 

In particular, this has come up while debugging third-party ISV code targeted at
multiple UNIX variants, where vfork() is more commonly used.

An example program, with i386 and x86_64 binaries, is attached.
This event sent from IssueTracker by sfernand  [Support Engineering Group]
 issue 173184

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello.c
This event sent from IssueTracker by sfernand  [Support Engineering Group]
 issue 173184
it_file 127234

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello32

This event sent from IssueTracker by sfernand  [Support Engineering Group]
 issue 173184
it_file 127235

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello64

This event sent from IssueTracker by sfernand  [Support Engineering Group]
 issue 173184
it_file 127236

-- Additional comment from tao on 2008-05-05 11:24 EST --
Example output:

$ strace -f -F -q -e nanosleep ./hello32
[ Process PID=23186 runs in 32 bit mode. ]
[pid 23187] nanosleep({0, 1000}, NULL)  = 0
--- SIGCHLD (Child exited) @ 0 (0) ---

$ strace -f -F -q -e nanosleep ./hello64
[pid 23238] nanosleep({0, 1000}, NULL)  = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
[pid 23239] nanosleep({0, 1000}, NULL)  = 0

Note the second nanosleep() call, which is traced in the x86_64 run but is
missing from the 32-bit run.
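
The attached hello.c is not reproduced in this report. As a rough sketch only, assuming the test program calls fork() and then vfork() and has each child issue the short nanosleep() seen in the traces, a reproducer could be built around something like the following. The names short_sleep and spawn_child are made up for this illustration and are not taken from the attachment.

```c
#define _DEFAULT_SOURCE         /* for vfork() under strict -std= modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for 1000 ns, matching nanosleep({0, 1000}, NULL) in the traces. */
static void short_sleep(void)
{
    struct timespec ts = { 0, 1000 };
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: create one child with fork() or vfork().
 * The child sleeps briefly and exits; the parent waits for it.
 * Returns the child's exit status, or -1 on failure.
 *
 * POSIX only allows _exit() or exec*() in a vfork() child; calling
 * nanosleep() there works on Linux but is not portable.
 */
static int spawn_child(int use_vfork)
{
    int status;
    pid_t pid = use_vfork ? vfork() : fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        short_sleep();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_child(0) and then spawn_child(1) from main(), building the result once with -m32 and once with -m64, and running each binary under the strace command shown above should make it visible whether the vfork() child's nanosleep() is traced.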





-- Additional comment from tao on 2008-05-05 11:24 EST --
Hi,

1. I believe biarch support in strace is a known issue. In BZ 218043,
Jakub Jelinek writes

<snip>

biarch support in current strace is clearly a hack, but until
if ever that is rewritten into a framework that can naturally cope with this,
I wrote a hack solution just for struct iovec.

</snip>

There have been many such ITs and BZs for specific calls to be handled
correctly by the 64-bit strace on 32-bit applications.

2. SuSE ships strace32 as part of its strace (s)rpm.

Regards,
-- Pai



-- Additional comment from tao on 2008-05-05 11:24 EST --
Johnray,

due to the complexity of operation, it might be easier just to try the
32-bit strace on the production pgp binary. It should only need to be run
once to determine if vfork() is the offending call.

Thanks,

Andrew



-- Additional comment from tao on 2008-05-05 11:24 EST --
similar BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=250830
https://bugzilla.redhat.com/show_bug.cgi?id=65925
https://bugzilla.redhat.com/show_bug.cgi?id=126547

I particularly like this bit, from 65925:

 For this to work correctly one has basically redesign (best rewrite from scratch)
 strace to give it a plugin architecture, so there would be modules which would
 handle printing IA-32 syscalls, IA-64 syscalls, e.g. SPARC 32-bit, 64-bit
 syscalls etc. One precondition for this is also that strace must not use
 kernel headers, it must come with its own copies of kernel structures for each
 supported target (or put it in a text file similar to ltrace.conf).

This is a different issue from 218043, though similar.  Jakub Jelinek
clearly states that his change is a hack for iovec, which affects the writev()
and readv() libc calls.  Perhaps we could get this to work for vfork as well.




-- Additional comment from tao on 2008-05-05 11:24 EST --
Well, in this case we have two possibilities for an RFE: one in which we ask for
the x86_64 strace binary to follow 32-bit vfork(), and another in which
we ask for an i386 strace binary to be made available on x86_64 systems.

Should I open a bug report?



-- Additional comment from tao on 2008-05-05 11:24 EST --
SEG,

the following strace issue is spun out of a different IT, for clarity. If
you need more business justification or customer impact, I can point back to
the original.

All the work for a reproducer should be done, see the attached code and
binaries. Similar ITs are referenced for context. 

   1. Provide time and date of the problem

anytime.

   2. Provide clear and concise problem description as it is understood at
the time of escalation
          * Observed behavior
          * Desired behavior 

The 64-bit version of strace doesn't follow vforks in 32-bit programs. We
would like it to do so.

   3. State specific action requested of SEG

I think this is ready to be escalated to BZ.

   4. State whether or not a defect in the product is suspected

yes -- it seems to be well known that such defects exist. I think this is
part of just tracking them down and identifying them.

          * Provide Bugzilla if one already exists 
   5. If there is a proposed patch, make sure it is in unified diff format
(diff -pruN)

no patch.

   6. Refrain from using the word "hang", as it can mean different
things to different people in different contexts. Use a better and more
specific description of your problem.

ok.

   7. This is especially important for severity one and two issues. What
is the impact to the customer when they experience this problem?

The customer is experiencing issues running a 32-bit closed-source program
on a 64-bit system, and without a working strace we're having a hard time
tracking down the issue.


Issue escalated to Support Engineering Group by: ahecox.
Internal Status set to 'Waiting on SEG'
Ticket type changed from 'Feature Request' to 'Problem'


-- Additional comment from tao on 2008-05-05 11:24 EST --
I've looked into the issue a little more deeply.  The problem is that on
x86_64 processors, the rax register ends up holding an unexpected value.

In get_error(), at line 1468 of syscall.c:

    if (rax < 0 && -rax < nerrnos) {
        tcp->u_rval = -1;
        u_error = -rax;
    }

You can see that u_error is set to -rax when rax is negative and -rax is below
nerrnos, which is what happens on my system while tracing vfork(). Later on...

In internal_clone(), at line 827 of process.c:

    if (syserror(tcp)) {
        if (bpt)
            clearbpt(tcp);
        return 0;
    }

This is where the vfork tracing terminates: because of the error recorded
above, syserror(tcp) is true, internal_clone() returns early, and we never see
any syscalls from the child process.

Still trying to figure out why rax holds a bad value.  It's either an error in
ptrace() or maybe there is some garbage in the high-order half of rax that
isn't covered by eax.
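
To make the arithmetic of the quoted check easier to follow, here is a small sketch. It is not strace's code: NERRNOS is an assumed stand-in for strace's nerrnos, and flags_error64 and flags_error32 are names invented for this illustration. The first function has the same shape as the quoted test on the full 64-bit rax; the second shows what the test would see if only the low 32 bits (eax) were considered.

```c
#include <stdint.h>

/* Assumed stand-in for strace's nerrnos (size of its errno table);
 * the real value depends on the strace build. */
#define NERRNOS 530

/* Same shape as the quoted check, applied to the full 64-bit rax.
 * Written as a range test so that INT64_MIN never has to be negated. */
static int flags_error64(int64_t rax)
{
    return rax < 0 && rax > -(int64_t)NERRNOS;
}

/* Hypothetical variant: keep only the low 32 bits (eax) and sign-extend,
 * which is how a 32-bit process would read its own return value. */
static int flags_error32(int64_t rax)
{
    /* Wraps to a negative value on the usual two's-complement ABIs. */
    int32_t eax = (int32_t)(uint32_t)rax;
    return eax < 0 && eax > -NERRNOS;
}
```

With these definitions, -38 (-ENOSYS on Linux) is flagged by both functions, a plausible child PID such as 1234 by neither, and 0xffffffda (a 32-bit -38 zero-extended to 64 bits) only by flags_error32. In other words, the 64-bit check fires only when the whole of rax lies between -NERRNOS and 0.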




-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: gs-linux-infosec-ebs81-8.1.0-3.i386.rpm

This event sent from IssueTracker by sfernand  [Support Engineering Group]
 issue 173184
it_file 128028

-- Additional comment from tao on 2008-05-05 11:24 EST --
to set this up in a lab, you will need a pgp.cfg that points to the
license:

# grep -v ^# pgp.cfg | grep -v ^$
INFO=verbose
RANDOM-DEVICE = /dev/random
TMP = /tmp
LICENSEFILE="/local/opt/ebs-8.1.0/ebs/EBusSvr.lic"


The pgp command is run something like:
   ulimit -f 4000;  cd /usr/jail/client/files; \
   /opt/ebs-8.1.0/ebs/pgp --encrypt --force --armor --overwrite  --sign \
   --sign-with 'redhat user <redhat>'            \
   --passphrase 'redhat passphrase'                    \
   --pubring  /home/pgpuser/.pgp/$HOSTNAME/pubring.pkr    \
   --secring  /home/pgpuser/.pgp/$HOSTNAME/secring.skr    \
   --randseed /home/pgpuser/.pgp/$HOSTNAME/randseed.rnd   \
   --info debug  --user "myname <myuser>"      \
   --output $TEST_FILE_PGP      $TEST_FILE

(you'd have to make your own keys of course)



-- Additional comment from tao on 2008-05-05 11:24 EST --
The PGP binary is in the attached RPM. I have not tried to install it on a
vanilla system, so if it won't install, try the following to unpack it:

rpm2cpio <RPM>.rpm | cpio -id

Johnray



-- Additional comment from tao on 2008-05-05 11:24 EST --
By the way, this worked with the standard (64-bit) strace.

The documentation for strace indicates that -F is not implemented on
Linux, so it was never tried.

So you can close this particular ticket once you update the manpage for
strace.



-- Additional comment from tao on 2008-05-05 11:24 EST --
Hey Rod,

I'll get the manpage fixed, though there is still an existing issue with the
64-bit strace following vfork() in 32-bit binaries.

For the pgp app though, are you saying that simply using -F made it work?


Thanks,

Andrew



-- Additional comment from tao on 2008-05-05 11:24 EST --
Yes, 64 bit strace with -F was able to follow 32-bit pgp binary.

I can't tell if the output was complete, but at least with -F we can
follow the worker enough to see the read from disk, etc.





-- Additional comment from tao on 2008-05-05 11:24 EST --
SEG,

this ticket conversation has gotten a bit derailed from the original
point. The vfork() issue still exists within strace, and I'd like to continue
that escalation. Please let me know if you need any other info.

Thanks,

Andrew

Internal Status set to 'Waiting on SEG'
Status set to: Waiting on Tech


-- Additional comment from tao on 2008-05-05 11:24 EST --
Steven,

can you please kick this issue up to BZ? Engineering has been aware of the
issue for several weeks now.

Thanks,

Andrew



Comment 3 RHEL Program Management 2008-06-02 20:01:15 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Roland McGrath 2008-08-29 00:26:47 UTC
built 4.5.18-1.el5

Comment 10 errata-xmlrpc 2009-01-20 22:09:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0233.html

