Bug 624657 - Timing issue in systemtap initscript restart command
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: systemtap
Version: 6.0
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: David Smith
QA Contact: qe-baseos-tools
Depends On:
Blocks:
Reported: 2010-08-17 08:00 EDT by Petr Muller
Modified: 2016-09-19 22:07 EDT

See Also:
Fixed In Version: systemtap-1.4-2.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 644350
Environment:
Last Closed: 2011-05-19 09:54:36 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0651 normal SHIPPED_LIVE systemtap bug fix and enhancement update 2011-05-19 05:37:25 EDT

Description Petr Muller 2010-08-17 08:00:05 EDT
Description of problem:
Our Beaker test for the initscript sometimes fails to restart the service while a script is running. Investigating, I found that this is probably a timing issue: if I add a short delay (for example, sleep 3) between the 'stop' and 'start' calls in the 'restart' function, the issue disappears.

Version-Release number of selected component (if applicable):
systemtap-1.2-9.el6

How reproducible:
On some boxes, not always. When it appears, it is usually reproducible.

Steps to Reproduce:
1. cat > /etc/systemtap/script.d/heart.stp << EOF
> probe timer.ms(500){
>   print("Beat!\n");
> }
> EOF
2. # service systemtap start; sleep 1; service systemtap restart;
  
Actual results:
Starting systemtap:  Compiling heart ... done
 Starting heart ... done
                                                           [  OK  ]
Stopping systemtap:                                        [  OK  ]
Starting systemtap: heart is dead, but another script is running.
                                                           [FAILED]

Expected results:
Starting systemtap: [  OK  ]
Stopping systemtap: [  OK  ]
Starting systemtap:  Compiling heart ... done
 Starting heart ... done
[  OK  ]

Additional info:
Comment 1 David Smith 2010-11-23 15:41:33 EST
I haven't been able to duplicate this (tried on 3 different machines).  On a machine where this happens, can you show me the new info added to /var/log/systemtap.log?
Comment 2 Petr Muller 2010-11-24 07:58:22 EST
David,

I had another look at the issue and found that I had omitted an important piece of reproduction information: it was probably already configured by the automated test run, so I forgot to include it. Sorry about that. I see the issue after doing:

# echo "heart_OPT='-o /tmp/stap-test.log'" > /etc/systemtap/conf.d/heart.conf

before doing step 2. I haven't managed to reproduce the problem without this. Even with it, I had to run the start-sleep-restart triple in a loop, and saw the failure in about 1 of 5 runs on one box. It fails consistently on s390x, though.

This shows up in /var/log/systemtap.log:
# tail -f /var/log/systemtap.log
Nov 24 07:57:12: Starting systemtap: 
Nov 24 07:57:12:  Starting heart ... 
Nov 24 07:57:12: Exec: /usr/bin/staprun -o /tmp/stap-test.log -D /var/cache/systemtap/2.6.32-71.el6.ppc64/heart.ko
Nov 24 07:57:12: Exec: cp -f ./pid /var/run/systemtap/heart
Nov 24 07:57:12: done
Nov 24 07:57:12: Pass: systemtap startup
Nov 24 07:57:13: Stopping systemtap: 
Nov 24 07:57:13: Exec: kill -TERM 3787
Nov 24 07:57:13: Pass: systemtap stopping 
Nov 24 07:57:13: Starting systemtap: 
Nov 24 07:57:13: heart is dead, but another script is running.
Nov 24 07:57:13: Error: Failed to run "heart". (4)
Comment 3 David Smith 2010-11-30 15:27:03 EST
Fixed in upstream commit 671a1d8:

<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=671a1d824ff1320f9e2fa3ed27d5458cc44a5dcc>

Using the 'heart_OPT' configuration allowed me to reproduce this problem.  Basically we were sending stapio a signal to make it unload the module, but not waiting on the module to unload.
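
As an illustration only, here is a minimal sketch of what "waiting on the module to unload" could look like in shell. It is not the code from the commit above. The function name wait_for_unload and the MODULE_ROOT variable are hypothetical, and the sketch assumes that a loaded module shows up as a directory under /sys/module.

```shell
# Hypothetical helper, not taken from the upstream commit.
# MODULE_ROOT can be overridden so the sketch can be tried without a real module.
MODULE_ROOT=${MODULE_ROOT:-/sys/module}

# wait_for_unload NAME [TRIES]
# Poll once per second until NAME no longer appears under MODULE_ROOT.
# Returns 0 once the module is gone, 1 if it is still there after TRIES polls.
wait_for_unload() {
    name=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # Assumption: a loaded module has a directory under /sys/module.
        [ -d "$MODULE_ROOT/$name" ] || return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

In a restart path, a call such as `wait_for_unload heart` would sit between the `kill -TERM` and the next start, in place of a fixed sleep.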

While testing the solution to the stopping problem, I ran into a related, but different, problem when loading the module.  When the '-D' option is used, staprun detaches from the terminal and then prints the pid.  Then we'd check the contents of the pid file before it was written.
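
As an illustration only, the pid file race can be avoided by polling until the file has content before reading it. This is a sketch under assumptions, not the code from the commit; the function name wait_for_pidfile and its retry count are hypothetical.

```shell
# Hypothetical helper, not taken from the upstream commit.
# wait_for_pidfile FILE [TRIES]
# Poll once per second until FILE exists and is non-empty, then print its contents.
# Returns 0 on success, 1 if FILE is still missing or empty after TRIES polls.
wait_for_pidfile() {
    file=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # -s is true only when the file exists and has a size greater than zero.
        if [ -s "$file" ]; then
            cat "$file"
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

A caller would then use something like `pid=$(wait_for_pidfile ./pid) || exit 1` before copying or checking the pid, rather than reading the file immediately after starting staprun with '-D'.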

The above commit fixes both problems.
Comment 7 errata-xmlrpc 2011-05-19 09:54:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0651.html
