Bug 755407

Summary: SIGXFSZ signal behaviour is different while running under harness
Product: [Retired] Beaker
Component: beah
Version: 0.7
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: yanfu,wang <yanwang>
Assignee: Bill Peck <bpeck>
QA Contact: yanfu,wang <yanwang>
CC: bpeck, dcallagh, jstancek, mcsontos, rmancy, stl
Doc Type: Bug Fix
Last Closed: 2012-04-10 10:56:12 UTC

Description yanfu,wang 2011-11-21 03:37:36 UTC
Description of problem:
SIGXFSZ signal behaviour is different when running under the harness, and Python seems to be to blame for this.
For the full description, please refer to the link below:
http://post-office.corp.redhat.com/archives/beaker-dev-list/2011-November/msg00027.html

Version-Release number of selected component (if applicable):
# rpm -qa|grep beah
beah-0.6.34-2.el6_0.noarch

How reproducible:
always

Steps to Reproduce:
clone the job https://beaker.engineering.redhat.com/jobs/160167

Actual results:
Unexpected behavior when run in Beaker, but the test runs OK when run manually.

Expected results:
Snippet of the log:
> :: [   LOG    ] :: mount
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> ...
> :: [   INFO   ] :: /etc/mtab md5 is now 0de9b48c805bfd985a49e1e512b622f2
> :: [   PASS   ] :: Comparing the old and new mds5sum of /etc/mtab
> :: [   PASS   ] :: Should not left stale file /etc/mtab~
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         |-> I expect FAIL here, because the stale file /etc/mtab~ should be found
> 
> 
> another one:
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> :: [   LOG    ] :: corrupt
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> 
> :: [   PASS   ] :: Remove stale /etc/mtab~ in order to umount successfully
> :: [   PASS   ] :: Umount done to prepare next test
> :: [   PASS   ] :: No localhost:/tmp entry in /etc/mtab now
> :: [   PASS   ] :: Adding the testing user
> :: [   PASS   ] :: Backing up the mtab
> :: [   FAIL   ] :: Trying to corrupt mtab with mount (Expected 153,16, got 0)
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         |-> I expect PASS here, because the command should return a non-zero exit code

Additional info:
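For context on the expected exit status 153 in the log above: a shell reports 128 plus the signal number for a command killed by a signal, and SIGXFSZ is 25 on common Linux architectures, so 153 means "killed by SIGXFSZ". The following is a minimal Python 3 sketch, not the original test, of how the inherited signal disposition changes that outcome. It assumes a Linux host with `dd` available; the helper name `run_with_file_size_limit` and the 1024-byte limit are illustrative choices only.

```python
import os
import resource
import signal
import subprocess
import tempfile


def run_with_file_size_limit(restore_signals, limit=1024):
    """Run dd under a small RLIMIT_FSIZE and return its subprocess return code."""

    def lower_limit():
        # Runs in the child between fork() and exec(); the hard limit is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
        resource.setrlimit(resource.RLIMIT_FSIZE, (limit, hard))

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out")
        proc = subprocess.run(
            # Try to write 4096 bytes, which is more than the limit allows.
            ["dd", "if=/dev/zero", "of=" + target, "bs=4096", "count=1"],
            preexec_fn=lower_limit,
            # False: the child inherits SIGXFSZ as ignored from Python.
            # True: subprocess resets SIGXFSZ to SIG_DFL before exec.
            restore_signals=restore_signals,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return proc.returncode


if __name__ == "__main__":
    # subprocess reports death by signal N as return code -N.
    print("default disposition:", run_with_file_size_limit(True))
    print("inherited (ignored):", run_with_file_size_limit(False))
```

With the default disposition the child is killed by SIGXFSZ; with the ignored disposition inherited from Python, the write fails with an error instead and the command exits on its own.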

Comment 1 yanfu,wang 2011-12-09 07:02:17 UTC
(In reply to comment #0)

any update?

Comment 2 Jan Stancek 2011-12-10 14:18:58 UTC
I googled a little and found other people hitting this too:

1. http://bugs.python.org/issue1652#msg100047
If you look at the giant patch in the next comment, it does this before execvp:
---- snip ----
if restore_signals:
    signals = ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ')
    for sig in signals:
        if hasattr(signal, sig):
            signal.signal(getattr(signal, sig), signal.SIG_DFL)
---- /snip ----


2. http://twistedmatrix.com/trac/ticket/4199
http://twistedmatrix.com/trac/attachment/ticket/4199/4199-3.diff
---- snip ----
for signalnum in range(1, signal.NSIG): 
    if signalnum in (signal.SIGKILL, signal.SIGSTOP): 
        # These two signals (commonly 9 & 19) can't be caught or ignored
        continue 

    if signal.getsignal(signalnum) == signal.SIG_IGN: 
        # Reset signal handling to the default 
        signal.signal(signalnum, signal.SIG_DFL) 
---- /snip ----


My initial approach was very similar:
---- snip ----
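import signal

# Reset every signal to its default disposition. SIGKILL and SIGSTOP
# cannot be changed, so errors for those are ignored.
for i in range(1, signal.NSIG):
    try:
        signal.signal(i, signal.SIG_DFL)
    except (ValueError, OSError, RuntimeError):
        pass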
---- /snip ----

I looked at alternatives: newgrp, sg, but these modify signal handlers/masks even more.

Perl seems decent enough to restore signals before exec, but lacks any interface to setgroups().

Comment 3 yanfu,wang 2011-12-19 09:14:39 UTC
(In reply to comment #2)

Sorry, I'm not a Python/Perl expert, so what's the solution to make my script get the expected behaviour?

Comment 4 Jan Stancek 2011-12-19 09:59:08 UTC
(In reply to comment #3)
> Sorry, I'm not python/perl expert, so what's the solution to let my script get
> expected behaviour?

1. wait until harness gets fixed
2. reset the signal before running your test
for example:

--- sigxfsz_reset.py ---
#!/usr/bin/python

import os
import sys
import signal

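# Python sets SIGXFSZ to "ignored" at interpreter startup;
# restore the default disposition so the test inherits it.
try:
    signal.signal(signal.SIGXFSZ, signal.SIG_DFL)
except Exception as e:
    print(e)
    sys.stdout.flush()

# Replace this process with the command given on the command line.
if len(sys.argv) > 1:
    os.execvp(sys.argv[1], sys.argv[1:])
else:
    print('%s unexpectedly at the end of chain' % __file__)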
--- /snip ---

chmod a+x sigxfsz_reset.py
./sigxfsz_reset.py ./runtest.sh

Note: I didn't try this with your test, only by checking output of:
cat /proc/self/status | grep Sig[BI]
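The same check can be scripted. The following is a minimal Python 3 sketch, assuming a Linux /proc filesystem and a `cat` binary; the helper name `child_ignores` is hypothetical. It starts a non-Python child (a Python child would set SIGXFSZ to ignored again at its own startup) and tests the SIGXFSZ bit in the child's SigIgn mask, with and without the `restore_signals` option of `subprocess`.

```python
import signal
import subprocess


def child_ignores(signum, restore_signals):
    """Return True if a child started via subprocess ignores signum."""
    status = subprocess.run(
        ["cat", "/proc/self/status"],  # "self" is the cat process here
        stdout=subprocess.PIPE,
        universal_newlines=True,
        restore_signals=restore_signals,
        check=True,
    ).stdout
    for line in status.splitlines():
        if line.startswith("SigIgn:"):
            mask = int(line.split()[1], 16)
            # Bit (signum - 1) of the mask corresponds to signal signum.
            return bool(mask & (1 << (signum - 1)))
    raise RuntimeError("SigIgn line not found in /proc/self/status")


if __name__ == "__main__":
    print("inherited from Python:", child_ignores(signal.SIGXFSZ, False))
    print("restored before exec: ", child_ignores(signal.SIGXFSZ, True))
```

If the child reports SIGXFSZ as ignored, a wrapper such as sigxfsz_reset.py above is needed before the test is started.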

Comment 5 yanfu,wang 2011-12-20 03:28:11 UTC
(In reply to comment #4)

Got it, thank you for your time and effort in tracking down the issue.

Comment 6 Jan Stancek 2012-04-10 07:15:59 UTC
Bill,

I think this can be closed now. With the introduction of tortilla, initgroups has been changed to reset all signal handlers before exec.
The blocked and ignored signal masks look good now:

# ps afx | grep make
18400 pts/0    S+     0:00          \_ grep make
 2262 ?        S      0:00              \_ make run

# cat /proc/2262/status | grep Sig[BI]
SigBlk: 0000000000000000
SigIgn: 0000000000000000
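The SigBlk and SigIgn values are hexadecimal bitmasks in which bit (n - 1) stands for signal number n, so all zeros means that nothing is blocked or ignored. The following is a small Python 3 sketch for decoding such a mask into signal names; the helper name `decode_mask` is hypothetical and the name lookup relies on the platform's own signal numbering.

```python
import signal


def decode_mask(hex_mask):
    """Translate a SigBlk/SigIgn hex mask into a list of signal names."""
    mask = int(hex_mask, 16)
    names = []
    for signum in range(1, signal.NSIG):
        if mask & (1 << (signum - 1)):
            try:
                names.append(signal.Signals(signum).name)
            except ValueError:
                # No symbolic name known (for example real-time signals).
                names.append(str(signum))
    return names


if __name__ == "__main__":
    print(decode_mask("0000000000000000"))
    # Build a mask with only SIGXFSZ set, using this platform's numbering.
    only_xfsz = "%016x" % (1 << (signal.SIGXFSZ - 1))
    print(only_xfsz, decode_mask(only_xfsz))
```

A process that still inherits Python's startup dispositions would be expected to show the SIGXFSZ bit in SigIgn, which is what the all-zero mask above rules out.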