Bug 1266881 - engine-setup hangs indefinitely starting ovirt-websocket-proxy via service using python subprocess module
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Services
Version: 3.6.0
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: high
Target Milestone: ovirt-3.6.0-ga
Target Release: 3.6.0
Assigned To: Sandro Bonazzola
QA Contact: Karolína Hajná
integration
Keywords: Regression
Duplicates: 1270580
Depends On:
Blocks: 1267187
Reported: 2015-09-28 07:20 EDT by Sandro Bonazzola
Modified: 2016-05-19 21:23 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1267187
Environment:
Last Closed: 2015-11-04 06:37:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: Triaged+
bmcclain: planning_ack+
michal.skrivanek: devel_ack+
pstehlik: testing_ack+


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 46769 | master | MERGED | packaging: pythonlib: service: by default redirect to /dev/null stdout/stderr | Never
oVirt gerrit | 46773 | ovirt-engine-3.6 | MERGED | packaging: pythonlib: service: by default redirect to /dev/null stdout/stderr | Never
oVirt gerrit | 46775 | ovirt-engine-3.6.0 | MERGED | packaging: pythonlib: service: by default redirect to /dev/null stdout/stderr | Never
oVirt gerrit | 46781 | ovirt-engine-3.5 | MERGED | packaging: pythonlib: service: by default redirect to /dev/null stdout/stderr | Never

Description Sandro Bonazzola 2015-09-28 07:20:51 EDT
While installing ovirt-engine, the setup is stuck on "service ovirt-websocket-proxy start".
Looks like the service is not working as a proper daemon anymore.

Workaround: manually stop the service during the setup and restart it when setup finishes.

Workaround: choose not to set up the websocket proxy on the system while configuring ovirt-engine.
Comment 1 Simone Tiraboschi 2015-09-28 11:11:59 EDT
Adding details:

ovirt-websocket-proxy starts correctly if we invoke service manually:

 [root@c66et1 ~]# /sbin/service ovirt-websocket-proxy start
 Starting oVirt Engine websockets proxy:                    [  OK  ]
 [root@c66et1 ~]# echo $?
 0

But the issue happens if we start the service through Otopi.
The Python daemon keeps running and the service command exits, but the calling Python code never notices; the service process is marked as defunct while the setup still monitors it:

29983  2913 29983  1416 pts/0    29983 Z+       0   0:00 [service] <defunct>

These few Python lines are enough to reproduce it (after manually stopping the service):
 import subprocess
 p = subprocess.Popen(
     ('/sbin/service', 'ovirt-websocket-proxy', 'start'),
     stdin=None, stderr=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True,
 )
 output = p.communicate()  # never returns, even after service has exited
 print('output: %s' % str(output))

The service command completes, but this Python script waits forever in the communicate() call.


If we run it with strace we see that:
 poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
 read(3, "Starting oVirt Engine websockets"..., 4096) = 40
 poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
 read(3, "\33[60G[\33[0;32m  OK  \33[0;39m]\r\n", 4096) = 29
 poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10928, si_status=0, si_utime=0, si_stime=0} ---
 restart_syscall(<... resuming interrupted call ...>

The Python script correctly receives its SIGCHLD when the service process exits, but no code is executed in response and it continues to wait. It will wait indefinitely, because by that point the service process has already died.
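
A minimal sketch of one plausible reading of the strace output, not taken from the bug itself: communicate() returns only at EOF on both pipes, and EOF arrives only once every process holding the write ends has closed them. If the daemon inherits those pipes, the exit of the service command alone is not enough. Here the `sh` command is a hypothetical stand-in for the init script and the daemon it leaves behind, `timed_communicate` is an illustrative name, and the code assumes Python 3 and a POSIX `sh`.

```python
import subprocess
import time

# Hypothetical stand-in for an init script: the shell exits at once, while the
# background sleep keeps the inherited stdout/stderr pipes open for 2 seconds.
CMD = ["sh", "-c", "sleep 2 & echo started"]


def timed_communicate(cmd):
    """Run cmd with piped output; return (stdout bytes, seconds spent waiting)."""
    start = time.monotonic()
    p = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        close_fds=True,
    )
    # communicate() reads until EOF on both pipes, so it keeps waiting for as
    # long as any descendant still holds a write end open.
    out, _ = p.communicate()
    return out, time.monotonic() - start


if __name__ == "__main__":
    out, waited = timed_communicate(CMD)
    print("output: %r, waited %.1fs" % (out, waited))
```

With a real daemon in place of the sleep, the wait never ends, which would match the hang described above.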

It could be related to this one:
https://bugzilla.redhat.com/1065537
Comment 2 Simone Tiraboschi 2015-09-28 11:16:27 EDT
Seen with python 2.6.6-64.el6
Comment 3 Alon Bar-Lev 2015-09-28 15:05:27 EDT
Not sure I understand: if it is a bug in Python and a regression, which version of Python last worked, and which is the first that does not?
Comment 4 Alon Bar-Lev 2015-09-28 15:21:52 EDT
Checkout the service check I submitted, it closes stdout/stderr of caller, should resolve this issue, I have no el6 environment to test.
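
The idea behind the change, as far as the gerrit summary ("by default redirect to /dev/null stdout/stderr") describes it, can be sketched as follows. This is not the actual ovirt-engine code: `start_quietly` is a hypothetical helper, the `sh` command in the usage example stands in for the init script, and `subprocess.DEVNULL` needs Python 3.3 or later (on Python 2.6 one would open `os.devnull` and pass the file object instead).

```python
import subprocess
import time


def start_quietly(cmd):
    """Run cmd with output discarded; return (exit code, seconds spent waiting)."""
    start = time.monotonic()
    # No pipes are created, so a daemon left behind by cmd has nothing of the
    # caller's to hold open; call() only waits for cmd itself to exit.
    rc = subprocess.call(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,
    )
    return rc, time.monotonic() - start


if __name__ == "__main__":
    # The background sleep plays the role of the long-running daemon.
    rc, waited = start_quietly(["sh", "-c", "sleep 2 & exit 0"])
    print("rc=%d, waited %.1fs" % (rc, waited))
```

The trade-off is that the start message and any error text from the init script are lost, so the caller has to rely on the exit code.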
Comment 5 Simone Tiraboschi 2015-09-28 16:18:33 EDT
(In reply to Alon Bar-Lev from comment #4)
> Checkout the service check I submitted, it closes stdout/stderr of caller,
> should resolve this issue, I have no el6 environment to test.

It works on el6, thanks.
Comment 6 Michal Skrivanek 2015-09-29 03:45:12 EDT
Since this is a regression caused by el6 Python, we need to backport it to 3.5.z as well.
Comment 7 Yaniv Lavi (Dary) 2015-10-07 07:19:50 EDT
After this fix setup should not get stuck on el6.
Comment 8 Simone Tiraboschi 2015-10-12 03:36:23 EDT
*** Bug 1270580 has been marked as a duplicate of this bug. ***
Comment 9 Karolína Hajná 2015-10-14 09:44:59 EDT
Verified on 3.6.0-16 (rhevm-3.6.0.1-0.1.el6.noarch)
Comment 10 Sandro Bonazzola 2015-11-04 06:37:15 EST
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.
