Bug 877999

Summary: luci not running after /etc/init.d/luci reports that it has started
Product: Red Hat Enterprise Linux 6
Reporter: michal novacek <mnovacek>
Component: luci
Assignee: Jan Pokorný [poki] <jpokorny>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact:
Priority: high
Version: 6.3
CC: apevec, cfeist, cluster-maint, fdinitto, jpokorny, mnovacek, rsteiger, slevine, tlavigne
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: luci-0.26.0-92.el6
Doc Type: Bug Fix
Doc Text:
Cause: The application server used in luci starts initializing the application while the original initscript-launched process terminates, making it hard to account for later errors in the final status of the initscript execution.
Consequence: Application configuration errors are not reflected in the outcome of the initscript execution. More specifically, luci is indicated as running upon "service luci start" while in fact that was only the case a moment before it ultimately failed.
Fix: The initscript is granted two explicit grace periods of 1 second each (first for the PID file to be created, then for it not to disappear) in which the real outcome is decided, if it is not known already. This timeout is configured in /etc/sysconfig/luci via the PID_FILE_WAIT configuration item, and the script automatically complains whenever it is found insufficient.
Result: A failed start of the luci service should no longer be marked as a success. At worst, the user is warned that the grace period is likely not enough in a particular deployment, which can be resolved easily.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-21 11:37:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 988985, 1023565    
Attachments:
Proposed fix for [bug 877999] (flags: none)
Proposed fix for [bug 877999] redux (flags: none)
1139238: Proposed fix for [bug 877999] redux 2 (flags: none)
1139238: Proposed fix for [bug 877999] redux 3 (flags: none)

Description michal novacek 2012-11-19 12:04:33 UTC
Description of problem:
With an invalid config file (/var/lib/luci/etc/luci.ini), running luci with
"service luci start" reports that it has started when it has not.

Version-Release number of selected component (if applicable):
luci-0.26.0-13.el6.x86_64

How reproducible: always

Steps to Reproduce:
1. edit /var/lib/luci/etc/luci.ini, make it syntactically incorrect
(change an uppercase letter to lowercase, for example), and save the file
2. service luci start
3. check that luci is not running
  
Actual results:
The init script reports that luci has started correctly, but it has not and is
not running.

Expected results:
An error is reported when the attempt to run luci fails.

Additional info:
The luci log file shows that luci was not started because of a parsing error
in luci.ini, which probably means that the init script is not properly
checking the return value, or that this value is incorrectly reported by
Python itself.

1) luci is not running.
$ service luci restart
Stop luci...                                               [FAILED]
Start luci...                                              [  OK  ]
Point your web browser to https://rhel63-02:8084 (or equivalent) to access luci
$ lsof -i :8084
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
python  18581 luci    5u  IPv4 9943462      0t0  TCP *:8084 (LISTEN)


2) breaking luci config file
$ vim /var/lib/luci/etc/luci.ini 
$ service luci restart
Stop luci...                                               [  OK  ]
Start luci...                                              [  OK  ]
Point your web browser to https://rhel63-02:8084 (or equivalent) to access luci
$ lsof -i :8084
$

Comment 2 Jan Pokorný [poki] 2012-11-19 12:54:06 UTC
This is caused by the fact that the "serve" command of Paste (the WSGI
server used by luci) in daemon mode spawns a child process and exits
successfully (so the initscript observes success), and it is this detached
child which actually uses the base configuration file (via the loadserver
and loadapp methods).

Cf. /usr/lib/python*/site-packages/paste/script/serve.py

Agreed, though, that this is suboptimal, and there could be an additional
check that luci did not bail out due to a problem like this.

Please be aware that the base config file (/var/lib/luci/etc/luci.ini)
is not publicly exposed (hence those warnings) and, additionally, it is
generated on-the-fly by the initscript when missing.  Hence the solution
is easy: just remove that file and restart luci.
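The failure mode described in this comment can be reproduced with a minimal shell sketch. This is purely illustrative (start_daemon and the sleeping subshell are stand-ins, not luci or Paste code): the launcher backgrounds a child and returns success immediately, so the caller prints an OK verdict even though the child aborts moments later, just as Paste's detached child does on a config parse error.

```shell
#!/bin/sh
# Stand-in for the Paste daemon-mode launch: the parent's exit status is
# decided before the detached child gets a chance to fail.
start_daemon() {
    # The backgrounded subshell plays the detached child: it "parses its
    # config" for a second and then fails.
    ( sleep 1; exit 1 ) &
    child=$!
    # The launcher's verdict is returned here, before the child ever fails.
    return 0
}

start_daemon && echo "Start luci...  [  OK  ]"
wait "$child"
echo "detached child exited with status $?"
```

Running this prints the OK verdict first; only afterwards does the child's nonzero exit status become observable, which is exactly what the initscript never checked.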

Comment 3 Jan Pokorný [poki] 2012-11-19 13:25:27 UTC
Created attachment 647756 [details]
Proposed fix for [bug 877999]

Could you check if this patch solves your problem, Michal?

Comment 4 Jan Pokorný [poki] 2012-11-19 14:00:18 UTC
Re [comment 3]:

TODO:
- refactor the success/failure functions and print the final verdict only
  after it is known that startup has indeed succeeded
  (so far the only effect of the patch is to return the correct exit code,
   not to make a proper announcement to the user)
- the restart action should also include the deferred check of whether luci
  is running

Comment 14 Jan Pokorný [poki] 2013-08-13 13:20:07 UTC
re [comment 13]:
Found the original discussion, I think: [3,4].
A newer reiteration of the same: [5].
IOW, it would have been better had this been addressed at the WSGI standard
level.

[3] https://github.com/Pylons/pyramid/issues/442
[4] http://mail.python.org/pipermail/web-sig/2012-February/005100.html
[5] https://groups.google.com/forum/#!topic/pylons-discuss/h3hSaMxTKR4

Comment 30 Jan Pokorný [poki] 2016-03-22 21:29:00 UTC
Created attachment 1139238 [details]
Proposed fix for [bug 877999] redux

Comment 31 Jan Pokorný [poki] 2016-03-23 14:40:04 UTC
Created attachment 1139618 [details]
1139238: Proposed fix for [bug 877999] redux 2

Comment 32 Jan Pokorný [poki] 2016-03-23 15:26:35 UTC
Created attachment 1139626 [details]
1139238: Proposed fix for [bug 877999] redux 3

Comment 45 errata-xmlrpc 2017-03-21 11:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0766.html