+++ This bug was initially created as a clone of Bug #445219 +++

Escalated to Bugzilla from IssueTracker

-- Additional comment from tao on 2008-05-05 11:24 EST --
The 64-bit strace tool cannot follow a vfork() in a 32-bit process. It can follow a normal fork(). In order to debug 32-bit programs on a 64-bit architecture, it is desirable to have this support. In particular, this has come up while debugging third-party ISV code targeted at multiple UNIXes, where vfork() is more commonly used. An example program with i386 and x86_64 binaries is attached.

This event sent from IssueTracker by sfernand [Support Engineering Group]
issue 173184

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello.c (it_file 127234)

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello32 (it_file 127235)

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: hello64 (it_file 127236)

-- Additional comment from tao on 2008-05-05 11:24 EST --
Example output:

    $ strace -f -F -q -e nanosleep ./hello32
    [ Process PID=23186 runs in 32 bit mode. ]
    [pid 23187] nanosleep({0, 1000}, NULL) = 0
    --- SIGCHLD (Child exited) @ 0 (0) ---

    $ strace -f -F -q -e nanosleep ./hello64
    [pid 23238] nanosleep({0, 1000}, NULL) = 0
    --- SIGCHLD (Child exited) @ 0 (0) ---
    [pid 23239] nanosleep({0, 1000}, NULL) = 0

Note the second nanosleep() call detected in the x86_64 version.

-- Additional comment from tao on 2008-05-05 11:24 EST --
Hi,

1. I believe biarch support in strace is a known issue.
In BZ 218043, Jakub Jelinek writes:

<snip>
biarch support in current strace is clearly a hack, but until if ever that is rewritten into a framework that can naturally cope with this, I wrote a hack solution just for struct iovec.
</snip>

There have been many such ITs and BZs asking for specific calls to be handled correctly by the 64-bit strace on 32-bit applications.

2. SuSE ships strace32 as part of its strace (s)rpm.

Regards,
-- Pai

-- Additional comment from tao on 2008-05-05 11:24 EST --
Johnray, due to the complexity of operation, it might be easier just to try the 32-bit strace on the production pgp binary. It should only need to be run once to determine if vfork() is the offending call.

Thanks,
Andrew

-- Additional comment from tao on 2008-05-05 11:24 EST --
Similar BZs:

https://bugzilla.redhat.com/show_bug.cgi?id=250830
https://bugzilla.redhat.com/show_bug.cgi?id=65925
https://bugzilla.redhat.com/show_bug.cgi?id=126547

I particularly like this bit, from 65925:

    For this to work correctly one has basically redesign (best rewrite from scratch) strace to give it a plugin architecture, so there would be modules which would handle printing IA-32 syscalls, IA-64 syscalls, e.g. SPARC 32-bit, 64-bit syscalls etc. One precondition for this is also that strace must not use kernel headers, it must come with its own copies of kernel structures for each supported target (or put it in a text file similar to ltrace.conf).

This is a different issue than 218043, though similar. Jakub Jelinek clearly states this is a hack for iovec, which affects the writev() and readv() libc calls. Perhaps we could get this to work for vfork as well.
-- Additional comment from tao on 2008-05-05 11:24 EST --
Well, in this case we have two possibilities for an RFE: one in which we ask for the x86_64 strace binary to follow 32-bit vfork(), and the other in which we ask for an i386 strace binary to be available for x86_64 systems. Should I open a bug report?

-- Additional comment from tao on 2008-05-05 11:24 EST --
SEG, the following strace issue is spun out of a different IT, for clarity. If you need more business justification/customer impact I can point back to the original. All the work for a reproducer should be done; see the attached code and binaries. Similar ITs are referenced for context.

1. Provide time and date of the problem
   anytime.

2. Provide clear and concise problem description as it is understood at the time of escalation
   * Observed behavior
   * Desired behavior
   The 64-bit version of strace doesn't follow vforks in 32-bit programs. We would like it to do so.

3. State specific action requested of SEG
   I think this is ready to be escalated to BZ.

4. State whether or not a defect in the product is suspected
   yes -- it seems to be well known that such defects exist. I think this is part of just tracking them down and identifying them.
   * Provide Bugzilla if one already exists

5. If there is a proposed patch, make sure it is in unified diff format (diff -pruN)
   no patch.

6. Refrain from using the word "hang", as it can mean different things to different people in different contexts. Use a better and more specific description of your problem.
   ok.

7. This is especially important for severity one and two issues. What is the impact to the customer when they experience this problem?
   The customer is experiencing issues running a 32-bit closed-source program on a 64-bit system, but without strace we're having a hard time tracking down the issue.
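The attached hello.c is not reproduced in this report. A minimal sketch of a comparable reproducer is given here, assuming (from the example output earlier in this thread) that the program first calls fork() and then vfork(), with each child calling nanosleep({0, 1000}, NULL). The function names and the overall structure are assumptions, not the contents of the attachment.

```c
#define _DEFAULT_SOURCE         /* makes glibc declare vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for 1000 ns, matching nanosleep({0, 1000}, NULL) in the traces. */
static void short_sleep(void)
{
    struct timespec ts = { 0, 1000 };
    nanosleep(&ts, NULL);
}

/* Reap one child; return 0 only if it exited normally with status 0. */
static int wait_clean(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/*
 * Hypothetical reproducer body: one fork() child, then one vfork() child.
 * Returns 0 if both children ran and exited cleanly, -1 otherwise.
 */
int run_fork_then_vfork(void)
{
    pid_t pid;

    /* First child via fork(): the case strace reportedly follows. */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        short_sleep();
        _exit(0);
    }
    if (wait_clean(pid) != 0)
        return -1;

    /*
     * Second child via vfork(): the case reported as not followed for
     * 32-bit binaries. POSIX only allows _exit() or exec*() here; calling
     * nanosleep() relies on Linux behaviour and is done only so that the
     * child produces a traceable syscall.
     */
    pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        short_sleep();
        _exit(0);
    }
    return wait_clean(pid);
}
```

Calling run_fork_then_vfork() from main() and building the file once with gcc -m32 and once with gcc -m64 should give binaries comparable to hello32 and hello64 for use with the strace command lines shown in the example output.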
Issue escalated to Support Engineering Group by: ahecox.
Internal Status set to 'Waiting on SEG'
Ticket type changed from 'Feature Request' to 'Problem'

-- Additional comment from tao on 2008-05-05 11:24 EST --
I've looked into the issue a little more deeply. The problem is that on x86_64 processors, the rax register gets a little messed up.

In get_error(), on line 1468 of syscall.c:

    if (rax < 0 && -rax < nerrnos) {
            tcp->u_rval = -1;
            u_error = -rax;
    }

You can see that u_error gets the value of -rax if rax < 0 (and -rax < nerrnos), which is what happens on my system while tracing vfork(). Later on, in internal_clone(), on line 827 of process.c:

    if (syserror(tcp)) {
            if (bpt)
                    clearbpt(tcp);
            return 0;
    }
    }

This is where the vfork tracing terminates, because of the syserror set above, so we don't get any syscalls in the child process.

Still trying to figure out why rax is so messed up. It's either an error in ptrace() or maybe there is some garbage in the higher-order part of rax that isn't used by eax.

-- Additional comment from tao on 2008-05-05 11:24 EST --
File uploaded: gs-linux-infosec-ebs81-8.1.0-3.i386.rpm (it_file 128028)

-- Additional comment from tao on 2008-05-05 11:24 EST --
To set this up in a lab, you will need a pgp.cfg that points to the license:

    # grep -v ^# pgp.cfg | grep -v ^$
    INFO=verbose
    RANDOM-DEVICE = /dev/random
    TMP = /tmp
    LICENSEFILE="/local/opt/ebs-8.1.0/ebs/EBusSvr.lic"

The pgp command is run something like:

    ulimit -f 4000; cd /usr/jail/client/files; \
    /opt/ebs-8.1.0/ebs/pgp --encrypt --force --armor --overwrite --sign \
        --sign-with 'redhat user <redhat>' \
        --passphrase 'redhat passphrase' \
        --pubring /home/pgpuser/.pgp/$HOSTNAME/pubring.pkr \
        --secring /home/pgpuser/.pgp/$HOSTNAME/secring.skr \
        --randseed /home/pgpuser/.pgp/$HOSTNAME/randseed.rnd \
        --info debug --user \"myname <myuser>\" \
        --output $TEST_FILE_PGP $TEST_FILE

(you'd have to make your own keys of course)

-- Additional comment from tao on 2008-05-05 11:24 EST --
The PGP binary is in the attached RPM. I have not tried to install it on a vanilla system, so if it won't install, try the following to unpack it:

    rpm2cpio <RPM>.rpm | cpio -id

Johnray

-- Additional comment from tao on 2008-05-05 11:24 EST --
By the way, this worked with the standard (64-bit) strace. The documentation for strace indicates that -F is not implemented on Linux. Thus it was never tried. So you can close this particular ticket once you update the manpage for strace.

-- Additional comment from tao on 2008-05-05 11:24 EST --
Hey Rod,

I'll get the manpage fixed, though there is an existing issue with following vfork() in 32-bit binaries with the 64-bit strace. For the pgp app though, are you saying that simply using -F made it work?

Thanks,
Andrew

-- Additional comment from tao on 2008-05-05 11:24 EST --
Yes, 64-bit strace with -F was able to follow the 32-bit pgp binary. I can't tell if the output was complete, but at least with -F we can follow the worker enough to see the read from disk, etc.

-- Additional comment from tao on 2008-05-05 11:24 EST --
SEG, this ticket conversation has gotten a bit de-railed from the original point. The vfork() issue still exists within strace; I'd like to continue that escalation. Please let me know if you need any other info.
Thanks,
Andrew

Internal Status set to 'Waiting on SEG'
Status set to: Waiting on Tech

-- Additional comment from tao on 2008-05-05 11:24 EST --
Steven, can you please kick this issue up to BZ? Engineering has been aware of the issue for several weeks now.

Thanks,
Andrew
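The get_error() check quoted earlier in this thread can be exercised in isolation to test the hypothesis about the upper half of rax. The sketch below is not strace code: decode_result, struct result, and the tracee_is_32bit flag are hypothetical names, and nerrnos is passed in as a parameter. It applies the same "rax < 0 && -rax < nerrnos" test to a raw register value, read either as a full 64-bit quantity or as a sign-extended eax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; strace keeps these in tcp->u_rval and u_error. */
struct result {
    int64_t rval;   /* -1 when the value is classified as an error */
    int error;      /* errno number, or 0 when no error is detected */
};

/*
 * Classify a raw rax value the way the quoted get_error() snippet does.
 * When tracee_is_32bit is set, only the low 32 bits (eax) are used and
 * they are sign-extended first (assumes the usual two's-complement
 * conversion, as on gcc/x86_64).
 */
struct result decode_result(uint64_t raw_rax, bool tracee_is_32bit, int nerrnos)
{
    struct result r = { 0, 0 };
    int64_t value;

    assert(nerrnos > 0);

    if (tracee_is_32bit)
        value = (int32_t)(uint32_t)raw_rax;   /* drop upper half, sign-extend eax */
    else
        value = (int64_t)raw_rax;             /* take all 64 bits as-is */

    /* Same test as "rax < 0 && -rax < nerrnos", written so that
     * negating INT64_MIN is never attempted. */
    if (value < 0 && value > -(int64_t)nerrnos) {
        r.rval = -1;
        r.error = (int)-value;
    } else {
        r.rval = value;
    }
    return r;
}
```

With an illustrative nerrnos of 4096, the raw value 0xFFFFFFFFFFFFFFDA decodes to error 38 under both readings, while 0x00000000FFFFFFDA is an error only under the 32-bit reading. If the raw value captured during the failing vfork() decodes to the same errno both ways, stray upper bits in rax are not the explanation and the error is coming from the syscall itself.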
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
built 4.5.18-1.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0233.html