Bug 624657
Summary: | Timing issue in systemtap initscript restart command | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Petr Muller <pmuller> | |
Component: | systemtap | Assignee: | David Smith <dsmith> | |
Status: | CLOSED ERRATA | QA Contact: | qe-baseos-tools-bugs | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 6.0 | CC: | fche, mjw, ohudlick | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | systemtap-1.4-2.el6 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 644350 | Environment: | |
Last Closed: | 2011-05-19 13:54:36 UTC | Type: | --- | |
I haven't been able to duplicate this (tried on 3 different machines). On a machine where this happens, can you show me the new info added to /var/log/systemtap.log?

David, I had a look at the issue and found that I omitted quite an important piece of reproduction information: I probably had it configured from the automated test run, so I forgot to include it. Sorry about that. I see the issue after doing:

    # echo "heart_OPT='-o /tmp/stap-test.log'" > /etc/systemtap/conf.d/heart.conf

before doing step 2. I haven't managed to reproduce the problem without this. Even with it, I had to run the start-sleep-restart triple in a loop, seeing it in about 1 of 5 cases on one box. I can see it failing consistently on s390x, though. This shows up in /var/log/systemtap.log:

    # tail -f /var/log/systemtap.log
    Nov 24 07:57:12: Starting systemtap:
    Nov 24 07:57:12: Starting heart ...
    Nov 24 07:57:12: Exec: /usr/bin/staprun -o /tmp/stap-test.log -D /var/cache/systemtap/2.6.32-71.el6.ppc64/heart.ko
    Nov 24 07:57:12: Exec: cp -f ./pid /var/run/systemtap/heart
    Nov 24 07:57:12: done
    Nov 24 07:57:12: Pass: systemtap startup
    Nov 24 07:57:13: Stopping systemtap:
    Nov 24 07:57:13: Exec: kill -TERM 3787
    Nov 24 07:57:13: Pass: systemtap stopping
    Nov 24 07:57:13: Starting systemtap:
    Nov 24 07:57:13: heart is dead, but another script is running.
    Nov 24 07:57:13: Error: Failed to run "heart". (4)

Fixed in upstream commit 671a1d8:
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=671a1d824ff1320f9e2fa3ed27d5458cc44a5dcc>

Using the 'heart_OPT' configuration allowed me to reproduce this problem. Basically we were sending stapio a signal to make it unload the module, but not waiting for the module to unload.

While testing the solution to the stopping problem, I ran into a related, but different, problem when loading the module. When the '-D' option is used, staprun detaches from the terminal and then prints the pid. We would then check the contents of the pid file before it was written. The above commit fixes both problems.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0651.html
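The two races described in the fix comment (signalling stapio without waiting for it to finish, and reading the pid file before it is written) both come down to polling with a timeout instead of assuming the other side is already done. The following is a minimal sketch of that idea and is not the code from commit 671a1d8. The function names `wait_for_exit` and `wait_for_pidfile`, the retry counts, and the 0.1 second interval are all hypothetical. It also assumes that waiting for the signalled process to exit is an adequate stand-in for waiting on the module unload, and that `sleep` accepts fractional seconds (GNU coreutils does).

```shell
# Hypothetical helper: poll until process PID no longer exists.
# Returns 0 once the process is gone, 1 if it is still alive after TRIES polls.
wait_for_exit() {
    local pid="$1"
    local tries="${2:-50}"
    # kill -0 sends no signal; it only checks whether the pid can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 0.1
    done
    return 0
}

# Hypothetical helper: poll until FILE exists and is non-empty, then print it.
# Returns 1 if the file is still missing or empty after TRIES polls.
wait_for_pidfile() {
    local file="$1"
    local tries="${2:-50}"
    until [ -s "$file" ]; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 0.1
    done
    cat "$file"
}
```

In a stop path, such a helper would be called right after `kill -TERM "$pid"`, so that a following start does not run while the old script is still shutting down. In a start path, the pid file would be read through `wait_for_pidfile` rather than immediately after `staprun -D` returns.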
Description of problem:
Our Beaker test for the initscript sometimes fails to restart the service when some script is running. When investigating this issue, I've found out this is probably a timing issue: if I add a slight lag (like, sleep 3) between 'stop' and 'start' calls in 'restart' function, the issue disappears.

Version-Release number of selected component (if applicable):
systemtap-1.2-9.el6

How reproducible:
On some boxes, not always. When it appears, it is usually reproducible.

Steps to Reproduce:
1. cat > /etc/systemtap/script.d/heart.stp << EOF
   > probe timer.ms(500){
   > print("Beat!\n");
   > }
   > EOF
2. # service systemtap start; sleep 1; service systemtap restart;

Actual results:
    Starting systemtap: Compiling heart ... done
    Starting heart ... done [ OK ]
    Stopping systemtap: [ OK ]
    Starting systemtap: heart is dead, but another script is running. [FAILED]

Expected results:
    Starting systemtap: [ OK ]
    Stopping systemtap: [ OK ]
    Starting systemtap: Compiling heart ... done
    Starting heart ... done [ OK ]

Additional info:
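Because the failure is intermittent, the start-sleep-restart sequence from step 2 is easier to evaluate when run repeatedly and counted. The sketch below is one way to do that. The helper names `count_failures` and `restart_cycle` are hypothetical, and it assumes that `service systemtap restart` exits with a non-zero status whenever it prints [FAILED], which this report does not confirm.

```shell
# Hypothetical helper: run a command TRIALS times and print how many runs failed.
count_failures() {
    local trials="$1"
    local failures=0
    local i=0
    shift
    while [ "$i" -lt "$trials" ]; do
        # Any non-zero exit status counts as one failure.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# One start-sleep-restart cycle, as in step 2 of the reproduction steps.
# Assumption: the initscript returns non-zero when the restart fails.
restart_cycle() {
    service systemtap start && sleep 1 && service systemtap restart
}

# Example invocation (as root, on a test machine):
#   count_failures 20 restart_cycle
```

Comparing the count before and after a change to the initscript gives a rough indication of whether the race is still present on a given machine.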