Bug 1623070
Summary: | (s390x only) sos skips collecting random data due to pipe returncode None | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pavel Moravec <pmoravec> | ||||||||
Component: | sos | Assignee: | Pavel Moravec <pmoravec> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Miroslav HradĂlek <mhradile> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | urgent | ||||||||||
Version: | 7.6 | CC: | agk, bmr, cstratak, dhorak, gavin, jkejda, lmiksik, mhradile, mplch, ovasik, plambri, pmoravec, pviktori, sbradley | ||||||||
Target Milestone: | rc | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | s390x | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | sos-3.6-6.el7 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 1627544 (view as bug list) | Environment: | |||||||||
Last Closed: | 2018-10-30 10:33:42 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1627544 | ||||||||||
Attachments: |
|
Description
Pavel Moravec
2018-08-28 12:39:17 UTC
Created attachment 1479243 [details]
reproducer script
Created attachment 1479244 [details]
strace from the buggy case
strace -Tttfxvo strace.txt-None python repro_pipe_unfinished.py
from a case when returnstatus=None
Created attachment 1479245 [details]
strace from an OK case
strace -Tttfxvo strace.txt-OK python repro_pipe_unfinished.py
from a case when returnstatus=0
Comparing straces, I *think* this matters (but take the analysis as from a strace noob): OK: execve was here completed: .. 61682 08:19:05.652309 close(1023) = -1 EBADF (Bad file descriptor) <0.000005> 61682 08:19:05.652467 execve("/usr/bin/grep", ["/usr/bin/grep", "-q", "localhost", "/etc/hosts"], ["LAB_CONTROLLER=lab-02.rhts.eng.b"..., "XDG_SESSION_ID=35", "HOSTNAME=ibm-z-33.rhts.eng.bos.r"..., "SELINUX_ROLE_REQUESTED=", "TERM=xterm-256color", "SHELL=/bin/bash", "HISTSIZE=1000", "SSH_CLIENT=10.40.205.102 35474 2"..., "SELINUX_USE_CURRENT_RANGE=", "BUILDURL=http://download.eng.bos"..., "DUMPSERVER=netdump-01.eng.bos.re"..., "QTDIR=/usr/lib64/qt-3.3", "QTINC=/usr/lib64/qt-3.3/include", "SSH_TTY=/dev/pts/1", "RHTS_OPTION_COMPATIBLE=", "QT_GRAPHICSSYSTEM_CHECKED=1", "USER=root", "LS_COLORS=rs=0:di=38;5;27:ln=38;"..., "BEAKER_RECIPE_WHITEBOARD=Server "..., "BEAKER_JOB_WHITEBOARD=[TR#340267"..., "MAIL=/var/spool/mail/root", "PATH=/usr/lib64/qt-3.3/bin:/usr/"..., "PWD=/root", "XMODIFIERS=@im=none", "LANG=en_US.UTF-8", "BEAKER=https://beaker.engineerin"..., "LOOKASIDE=http://download.eng.bo"..., "SELINUX_LEVEL_REQUESTED=", "HISTCONTROL=ignoredups", "BEAKER_HUB_URL=https://beaker.en"..., "BEAKER_RECIPE_ID=5584053", "BEAKER_LAB_CONTROLLER=lab-02.rht"..., "SHLVL=1", "HOME=/root", "NFSSERVERS=rhel4-nfs.rhts.eng.bo"..., "BEAKER_LAB_CONTROLLER_URL=http:/"..., "LOGNAME=root", "RHTS_OPTION_COMPAT_SERVICE=", "QTLIB=/usr/lib64/qt-3.3/lib", "SSH_CONNECTION=10.40.205.102 354"..., "LESSOPEN=||/usr/bin/lesspipe.sh "..., "XDG_RUNTIME_DIR=/run/user/0", "_=/usr/bin/strace"]) = 0 <0.000414> None: execve was interrupted by something else and resumed later on: 61658 08:18:54.097997 close(1023) = -1 EBADF (Bad file descriptor) <0.000006> 61658 08:18:54.098170 execve("/usr/bin/grep", ["/usr/bin/grep", "-q", "localhost", "/etc/hosts"], ["LAB_CONTROLLER=lab-02.rhts.eng.b"..., "XDG_SESSION_ID=35", "HOSTNAME=ibm-z-33.rhts.eng.bos.r"..., "SELINUX_ROLE_REQUESTED=", "TERM=xterm-256color", "SHELL=/bin/bash", "HISTSIZE=1000", "SSH_CLIENT=10.40.205.102 35474 2"..., "SELINUX_USE_CURRENT_RANGE=", "BUILDURL=http://download.eng.bos"..., "DUMPSERVER=netdump-01.eng.bos.re"..., "QTDIR=/usr/lib64/qt-3.3", "QTINC=/usr/lib64/qt-3.3/include", "SSH_TTY=/dev/pts/1", "RHTS_OPTION_COMPATIBLE=", "QT_GRAPHICSSYSTEM_CHECKED=1", "USER=root", "LS_COLORS=rs=0:di=38;5;27:ln=38;"..., "BEAKER_RECIPE_WHITEBOARD=Server "..., "BEAKER_JOB_WHITEBOARD=[TR#340267"..., "MAIL=/var/spool/mail/root", "PATH=/usr/lib64/qt-3.3/bin:/usr/"..., "PWD=/root", "XMODIFIERS=@im=none", "LANG=en_US.UTF-8", "BEAKER=https://beaker.engineerin"..., "LOOKASIDE=http://download.eng.bo"..., "SELINUX_LEVEL_REQUESTED=", "HISTCONTROL=ignoredups", "BEAKER_HUB_URL=https://beaker.en"..., "BEAKER_RECIPE_ID=5584053", "BEAKER_LAB_CONTROLLER=lab-02.rht"..., "SHLVL=1", "HOME=/root", "NFSSERVERS=rhel4-nfs.rhts.eng.bo"..., "BEAKER_LAB_CONTROLLER_URL=http:/"..., "LOGNAME=root", "RHTS_OPTION_COMPAT_SERVICE=", "QTLIB=/usr/lib64/qt-3.3/lib", "SSH_CONNECTION=10.40.205.102 354"..., "LESSOPEN=||/usr/bin/lesspipe.sh "..., "XDG_RUNTIME_DIR=/run/user/0", "_=/usr/bin/strace"] <unfinished ...> 61657 08:18:54.098693 <... read resumed> "", 1048576) = 0 <0.023026> 61657 08:18:54.098715 mremap(0x3fff6884000, 1052672, 4096, MREMAP_MAYMOVE) = 0x3fff6884000 <0.000011> 61657 08:18:54.098756 close(4) = 0 <0.000006> 61657 08:18:54.098788 close(6) = 0 <0.000007> 61657 08:18:54.098812 close(7) = 0 <0.000011> 61657 08:18:54.098849 munmap(0x3fff6884000, 4096) = 0 <0.000010> 61657 08:18:54.098893 fcntl(3, F_GETFL) = 0 (flags O_RDONLY) <0.000007> 61657 08:18:54.098924 fstat(3, {st_dev=makedev(0, 9), st_ino=1639909, st_mode=S_IFIFO|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2018/08/28-08:18:54.064743374, st_mtime=2018/08/28-08:18:54.064743374, st_ctime=2018/08/28-08:18:54.064743374}) = 0 <0.000007> 61657 08:18:54.098974 fcntl(5, F_GETFL) = 0 (flags O_RDONLY) <0.000006> 61657 08:18:54.099000 fstat(5, {st_dev=makedev(0, 9), st_ino=1639910, st_mode=S_IFIFO|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2018/08/28-08:18:54.064743374, st_mtime=2018/08/28-08:18:54.064743374, st_ctime=2018/08/28-08:18:54.064743374}) = 0 <0.000026> 61657 08:18:54.099226 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x3fff6184000 <0.000009> 61657 08:18:54.099267 mprotect(0x3fff6184000, 4096, PROT_NONE) = 0 <0.000008> 61657 08:18:54.099294 clone(child_stack=0x3fff6984210, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x3fff69849e0, tls=0x3fff6984910, child_tidptr=0x3fff69849e0) = 61659 <0.000039> 61658 08:18:54.099353 <... execve resumed> ) = 0 <0.001072> Cant this bad interleaving in "None" case cause the bug? FYI on a different s390x machine (gss5.s390.bos.redhat.com, default beaker password), I can reproduce it much much _rarely_: # for i in $(seq 1 1000); do python ~/repro_pipe_unfinished.py; done | grep poll | sort | uniq -c 995 output: '', return code: 0, pollstatus=0 5 output: '', return code: None, pollstatus=None # (for python maintainers: feel free to use that machine to anything - I might test potential workaround in sosreport there but nothing else) This bug does not occur when compiled from source using --with-pydebug. (maybe the printing of "[# refs]" is slowing the process down and preventing this?) There is also quite a difference in likelihood of this happening between versions 2.7.5 and 2.7.15. (Respectively 30-100 in 1000 and 4-22 in 1000.) I have reproduced this on actual master (Python 3.8), x86_64. I'm going to report this upstream and will backport the patch once it will have become available. ---------------------- $ git clone https://github.com/python/cpython.git $ cd cpython $ ./configure --enable-shared CFLAGS="-O0" LDFLAGS="-Wl,--rpath=$(pwd)" $ make -j $ for i in $(seq 1 1000); do ./python script.py; done | grep poll | sort | uniq -c 630 output: '', return code: 0, pollstatus=0 370 output: '', return code: None, pollstatus=None Changing component to sos, as sosreport wrongly calls python's API for pipes. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3144 |