Bug 1266881
Summary: | engine-setup hangs indefinitely starting ovirt-websocket-proxy via service using python subprocess module
---|---
Product: | [oVirt] ovirt-engine
Reporter: | Sandro Bonazzola <sbonazzo>
Component: | Services
Assignee: | Sandro Bonazzola <sbonazzo>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Karolína Hajná <khajna>
Severity: | high
Priority: | unspecified
Version: | 3.6.0
CC: | alonbl, bmcclain, bugs, fdeutsch, michal.skrivanek, pstehlik, ratamir, stirabos, ylavi
Target Milestone: | ovirt-3.6.0-ga
Keywords: | Regression
Target Release: | 3.6.0
Flags: | rule-engine: ovirt-3.6.0+, rule-engine: blocker+, ylavi: Triaged+, bmcclain: planning_ack+, michal.skrivanek: devel_ack+, pstehlik: testing_ack+
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | integration
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2015-11-04 11:37:15 UTC
Type: | Bug
Bug Blocks: | 1267187 (view as bug list)
Description
Sandro Bonazzola
2015-09-28 11:20:51 UTC
Adding details: ovirt-websocket-proxy starts correctly if we invoke `service` manually:

```
[root@c66et1 ~]# /sbin/service ovirt-websocket-proxy start
Starting oVirt Engine websockets proxy: [ OK ]
[root@c66et1 ~]# echo $?
0
```

But the issue happens if we start the service through Otopi. The python daemon keeps running, the `service` command exits, but the python code is never notified; the `service` process is marked as defunct while the setup still monitors it:

```
29983  2913 29983  1416 pts/0   29983 Z+       0   0:00 [service] <defunct>
```

These few python lines are enough to reproduce it (after manually stopping the service first):

```python
import subprocess

p = subprocess.Popen(
    ('/sbin/service', 'ovirt-websocket-proxy', 'start'),
    stdin=None,
    stderr=subprocess.PIPE,
    stdout=subprocess.PIPE,
    close_fds=True,
)
output = p.communicate()
print 'output: %s' % str(output)
```

`service` completes, but this python script will wait forever on the `communicate` call. If we run it under strace we see:

```
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "Starting oVirt Engine websockets"..., 4096) = 40
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "\33[60G[\33[0;32m OK \33[0;39m]\r\n", 4096) = 29
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10928, si_status=0, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted call ...>
```

The python script correctly gets its SIGCHLD when the `service` process exits, but no handler code runs and the script continues to wait. It will wait indefinitely, because at that point `service` has already died. It could be related to this one: https://bugzilla.redhat.com/1065537

Seen with python 2.6.6-64.el6.

Not sure I understand: if it is a bug in python and a regression, what version of python last worked, and what is the first that does not?
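Whatever the answer to the python-version question, this class of hang can be reproduced without the service at all: a SysV init script forks a long-lived daemon, the daemon inherits the write ends of the pipes that `Popen` created, and `communicate()` keeps reading until EOF, which never arrives while the daemon lives. Below is a minimal Python 3 sketch (the original report uses Python 2) using an `sh -c` command as a hypothetical stand-in for the init script; the caller-side workaround shown here is to hand the child a regular file instead of pipes and just `wait()` for its exit status:

```python
import subprocess
import tempfile
import time

# Stand-in for an init script: print a status line, fork a long-lived
# background child (the "daemon") that inherits stdout, then exit.
cmd = ["sh", "-c", "echo 'Starting proxy: OK'; sleep 60 & exit 0"]

# Workaround on the caller side: give the child a regular file instead
# of a pipe.  wait() returns as soon as the direct child exits, even
# though the backgrounded sleep still holds the inherited stdout fd.
# (With stdout=subprocess.PIPE, communicate() would block until the
# sleep exits, because EOF on the pipe never arrives before that.)
t0 = time.time()
with tempfile.TemporaryFile() as out:
    p = subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                         stdout=out, stderr=subprocess.STDOUT,
                         close_fds=True)
    rc = p.wait()
    out.seek(0)
    text = out.read().decode()
elapsed = time.time() - t0
print(rc, "Starting" in text, elapsed < 5)
```

This does not explain the SIGCHLD/`restart_syscall` detail seen in the strace, but it reproduces the user-visible symptom: a `service`-style command that forks a child which inherits the caller's pipes will make `communicate()` hang.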
Checkout the service check I submitted; it closes stdout/stderr of the caller and should resolve this issue. I have no el6 environment to test.

(In reply to Alon Bar-Lev from comment #4)
> Checkout the service check I submitted, it closes stdout/stderr of caller,
> should resolve this issue, I have no el6 environment to test.

It works on el6, thanks.

Since this is a regression caused by el6 python we need to backport it to 3.5.z as well.

After this fix, setup should no longer get stuck on el6.

*** Bug 1270580 has been marked as a duplicate of this bug. ***

Verified on 3.6.0-16 (rhevm-3.6.0.1-0.1.el6.noarch)

oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.
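The principle behind the fix described above (detaching the daemonized process from the caller's stdout/stderr) can be sketched as follows. This is not the actual otopi patch, just a hedged Python 3 illustration: the inner script is a hypothetical stand-in for the fixed init script, forking its "daemon" with all stdio redirected to /dev/null, so the outer caller's original `communicate()`-with-pipes pattern returns promptly because no long-lived process holds the pipe write ends:

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the fixed init script: fork a background
# "daemon" whose stdio is detached to /dev/null, print a status line,
# then exit immediately.
script = r"""
import os, subprocess
devnull = os.open(os.devnull, os.O_RDWR)
# The long-lived child must not inherit the caller's pipes.
subprocess.Popen(["sleep", "60"], stdin=devnull, stdout=devnull,
                 stderr=devnull, close_fds=True)
os.close(devnull)
print("Starting: OK")
"""

# The caller keeps using PIPE + communicate(), exactly as otopi did.
t0 = time.time()
p = subprocess.Popen([sys.executable, "-c", script],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     close_fds=True)
out, err = p.communicate()  # returns promptly: only the direct child held the pipes
elapsed = time.time() - t0
print(p.returncode, b"OK" in out, elapsed < 5)
```

With the daemon's stdio detached, EOF arrives on both pipes as soon as the direct child exits, so `communicate()` no longer depends on the daemon's lifetime.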