Bug 1347864 - The systemd service unit does not allow tomcat to shut down gracefully
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: tomcat
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Coty Sutherland
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1347860
Reported: 2016-06-17 21:10 UTC by Coty Sutherland
Modified: 2016-09-01 18:53 UTC

Fixed In Version: tomcat-8.0.36-2.fc25 tomcat-8.0.36-2.fc24 tomcat-8.0.36-2.fc23
Clone Of: 1347860
Environment:
Last Closed: 2016-09-01 13:38:33 UTC
Type: Bug
Embargoed:



Description Coty Sutherland 2016-06-17 21:10:54 UTC
+++ This bug was initially created as a clone of Bug #1347860 +++

Description of problem:
The systemd unit terminates the tomcat process prematurely, so it cannot shut down gracefully. As a direct result, sessions in the session manager are not persisted to disk and cannot be restored after a restart, so session data is lost (unless you are replicating data within a cluster).

Version-Release number of selected component (if applicable):
tomcat-8.0.32-5.fc23.noarch

How reproducible:
Always

Steps to Reproduce:
1. yum install tomcat
2. cp reproducer.war /usr/share/tomcat/webapps/
3. service tomcat start
4. curl http://localhost:8080/reproducer/getSession.jsp
5. service tomcat stop
6. ls /usr/share/tomcat/work/Catalina/localhost/reproducer/SESSIONS.ser

Actual results:
The session created by the curl request is not persisted into SESSIONS.ser.

Expected results:
The session created by the curl request is persisted to SESSIONS.ser
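
Step 6 above can also be scripted. The following is a minimal sketch, not part of the original report: the helper name `sessions_persisted` is hypothetical, and treating a non-empty file as "persisted" is an assumption (it shows that something was written, not that a particular session is in it).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the session store exists and is
# non-empty. This is a coarse check; it does not inspect the file contents.
sessions_persisted() {
    # $1: path to SESSIONS.ser
    [ -s "$1" ]
}

# Default path taken from step 6 of the reproduction steps.
SESSIONS_FILE="${SESSIONS_FILE:-/usr/share/tomcat/work/Catalina/localhost/reproducer/SESSIONS.ser}"

if sessions_persisted "$SESSIONS_FILE"; then
    echo "sessions persisted: $SESSIONS_FILE"
else
    echo "sessions NOT persisted: $SESSIONS_FILE"
fi
```

Run it after `service tomcat stop`; with the bug present it should report that the sessions were not persisted.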

Additional info:
When the shutdown completes successfully with FINE logging enabled on the StandardManager (org.apache.catalina.session.StandardManager.level = FINE), you see the following along with other shutdown messages:

~~~
Jun 17, 2016 4:45:08 PM org.apache.catalina.session.StandardManager doUnload
FINE: Unloading persisted sessions
Jun 17, 2016 4:45:08 PM org.apache.catalina.session.StandardManager doUnload
FINE: Saving persisted sessions to SESSIONS.ser
Jun 17, 2016 4:45:08 PM org.apache.catalina.session.StandardManager doUnload
FINE: Unloading 4 sessions
Jun 17, 2016 4:45:08 PM org.apache.catalina.session.StandardManager doUnload
FINE: Expiring 4 persisted sessions
Jun 17, 2016 4:45:08 PM org.apache.catalina.session.StandardManager doUnload
FINE: Unloading complete
~~~

When you call service tomcat stop, the process is terminated almost immediately, before the shutdown can complete:

~~~
INFO: Server startup in 3501 ms
Jun 17, 2016 4:58:20 PM org.apache.catalina.core.StandardServer await
INFO: A valid shutdown command was received via the shutdown port. Stopping the Server instance.
Jun 17, 2016 4:58:20 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-bio-8080"]
~~~

--- Additional comment from Coty Sutherland on 2016-06-17 17:07:02 EDT ---

I'm able to get a graceful shutdown if I switch the unit type from simple to forking, but then the start hangs :(

Comment 1 Coty Sutherland 2016-06-17 21:48:07 UTC
I don't know much about systemd service units, but I poked around at the service on my machine until I got something working. What I ended up with is here, please take a look and give me some feedback :)

https://github.com/csutherl/fedora-tomcat/commit/cbb79ee

Comment 2 Coty Sutherland 2016-06-20 14:46:00 UTC
I guess another option for this would be KillMode=none, but Type=forking at least provides feedback when things fail and tells you to look at status instead of silently failing.

Comment 3 Ivan Afonichev 2016-06-21 23:17:25 UTC
Forking is what we actually want to avoid.
Not sure how forking can help in this situation.

I'll try to debug systemd behavior: whether it is sending TERM or KILL after ExecStop, and if it is waiting for TimeoutStopSec or not.

We definitely should keep systemd able to kill tomcat in the end, after some timeout.

Comment 4 Coty Sutherland 2016-06-22 12:25:02 UTC
> Forking is what we actually want to avoid.

For what reason? Simple just fires the script and doesn't care about a return. Forking is closer to what SysV was doing and actually provides feedback when the process exits abnormally. The only benefit I see to using simple is that it kills the process when it doesn't stop in time (forking may also do that, but I haven't tested).

> If it is waiting for TimeoutStopSec or not.

I've already done that. I think the problem is that the tomcat stop call forks off and does its thing, so it returns immediately and systemd SIGTERMs the process before it can complete a shutdown. I've tried TimeoutStopSec, TimeoutSec, and I even tried putting a sleep in the server script after the stop to allow it time to finish; none of that worked for me :(

Hopefully you will have better luck because I'm stuck, but I'll keep poking around at it also.

Comment 5 Coty Sutherland 2016-06-27 20:26:58 UTC
OK. I've done some pretty extensive tests on this to see how all sorts of different systemd settings respond (nothing really did anything because the stop command returns immediately). I've already stated what I think the problem is above (c#4). I tested that theory a bit more by running the script outside of systemd (it worked as expected), along with a few other tests. My conclusion is that we need to fire off the stop process in the background and provide ample time for it to complete (one second is enough for a vanilla install with minimal deployments).

Here is my proposal (I agree it's a bit hacky, but it's what I came up with and works): https://github.com/csutherl/fedora-tomcat/commit/89eb646

The change will `run stop` in the background and then sleep for two seconds by default (or SHUTDOWN_WAIT if that is defined). After that time passes the ExecStop call returns and systemd SIGTERMs the remaining processes if they haven't stopped already. I've verified that this allows graceful stopping and restarting and that the processes are terminated from a hanging shutdown call. Bonus: This also restores the functionality of SHUTDOWN_WAIT which has a TODO in the tomcat.conf comments and I kept the unit type as simple to satisfy your assertion that forking will not suffice.

Thoughts?
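
The approach proposed in this comment (run the stop in the background, then sleep for SHUTDOWN_WAIT or two seconds) can be sketched roughly as below. This is an illustration only and not the contents of the linked commit; the function name `stop_and_wait` is hypothetical, and the stop command is passed in as arguments because the exact script path is not stated in this report.

```shell
#!/bin/sh
# Sketch only, not the linked commit.
# stop_and_wait CMD [ARGS...]
# Runs the given stop command in the background, then sleeps for
# SHUTDOWN_WAIT seconds (default 2) before returning. Once an ExecStop
# wrapper like this returns, systemd SIGTERMs whatever is still running.
stop_and_wait() {
    "$@" &
    sleep "${SHUTDOWN_WAIT:-2}"
}

# Hypothetical usage from a wrapper script:
#   stop_and_wait /path/to/tomcat/stop-script stop
```

Note that the sleep always runs for the full duration, whether or not the stop has already finished.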

Comment 6 Ivan Afonichev 2016-06-27 22:49:22 UTC
OK, looks rather good. The only issue I see is that it will sleep for the whole time (e.g. 30 seconds) even if tomcat has already stopped within the first few seconds.

I prefer having such timeout feature implemented on systemd side.

As an option I'll try to remove ExecStop command - maybe TERM signal will do the same as receiving SHUTDOWN word via shutdown socket (current stop is just sending it).

Comment 7 Coty Sutherland 2016-06-28 18:22:15 UTC
> I prefer having such timeout feature implemented on systemd side.

Me too :) 

> As an option I'll try to remove ExecStop command - maybe TERM signal will do the same as receiving SHUTDOWN word via shutdown socket 

If removing ExecStop and allowing tomcat to shut down via SIGTERM is an option, then I think that is the way to go. I've tested and validated that tomcat shuts down gracefully when it gets a SIGTERM (I enabled org.apache.level = FINE and compared messages in the log to a vanilla tomcat tarball bin/shutdown.sh call). After the TimeoutStopSec time passes, it receives a SIGKILL and all remaining processes immediately die.
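
The stop sequence described here (SIGTERM first, SIGKILL once the timeout has passed) can be imitated by hand to test a running instance. The sketch below only illustrates the sequence and is not how systemd implements it; the function name `term_then_kill` and the one-second polling interval are assumptions.

```shell
#!/bin/sh
# Illustration of the TERM-then-KILL sequence, not systemd's implementation.
# term_then_kill PID TIMEOUT_SECONDS
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if it had to be sent SIGKILL after the timeout.
# Caveat: kill -0 also succeeds for an exited but unreaped child, so this
# is meant for a PID that is not a child of the calling shell.
term_then_kill() {
    pid=$1
    timeout=$2
    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, `term_then_kill "$TOMCAT_PID" 90` mirrors the default 90 second TimeoutStopSec mentioned later in this report, where `TOMCAT_PID` is a placeholder for the PID of the tomcat java process.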

Comment 8 Coty Sutherland 2016-06-28 19:24:15 UTC
I also confirmed with the tomcat community that using SIGTERM to gracefully shut down tomcat is fine (it's not functionally different than the Bootstrap.stop() call) so I think that's the way we should go, if you agree.

Comment 9 Ivan Afonichev 2016-06-28 19:26:27 UTC
Yes, great work! Thanks!

Comment 10 Coty Sutherland 2016-06-28 19:46:49 UTC
Here is a commit if you'd like :)

https://github.com/csutherl/fedora-tomcat/commit/20c470d

I tried to find a way to implement the SHUTDOWN_WAIT functionality, but it looks like you can't use a variable in the TimeoutStopSec, so I guess we'll have to do without that. We could reduce the setting of TimeoutStopSec from the default 90 seconds down to thirty (or something) just so that there is an example of how to set it in the service script, if you think that is required. Otherwise, just remove ExecStop and we're good to go!
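
For a local install, the same two settings can be tried without editing the packaged unit by using a systemd drop-in. This is a sketch under stated assumptions and not the change in the linked commit: the function name `write_stop_dropin` and the file name `stop-timeout.conf` are hypothetical, and it relies on an empty `ExecStop=` assignment clearing the command list inherited from the packaged unit.

```shell
#!/bin/sh
# Sketch only. write_stop_dropin DIR [SECONDS]
# Writes a drop-in that clears ExecStop (so systemd stops the service with
# SIGTERM) and sets TimeoutStopSec to SECONDS (default 30).
write_stop_dropin() {
    dir=$1
    secs=${2:-30}
    mkdir -p "$dir" || return 1
    cat > "$dir/stop-timeout.conf" <<EOF
[Service]
# Assumption: an empty assignment resets ExecStop from the packaged unit.
ExecStop=
TimeoutStopSec=$secs
EOF
}

# Hypothetical usage (as root):
#   write_stop_dropin /etc/systemd/system/tomcat.service.d 30
#   systemctl daemon-reload
```

TimeoutStopSec takes a literal value here as well, so this does not bring back SHUTDOWN_WAIT; it only moves the timeout into a file that is easy to edit.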

Comment 12 Fedora Update System 2016-08-11 18:50:14 UTC
tomcat-8.0.36-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0a4dccdd23

Comment 13 Fedora Update System 2016-08-11 18:50:58 UTC
tomcat-8.0.36-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-2b0c16fd82

Comment 14 Fedora Update System 2016-08-11 18:51:39 UTC
tomcat-8.0.36-2.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-f4a443888b

Comment 15 Fedora Update System 2016-08-12 14:28:26 UTC
tomcat-8.0.36-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f4a443888b

Comment 16 Fedora Update System 2016-08-12 20:53:13 UTC
tomcat-8.0.36-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0a4dccdd23

Comment 17 Fedora Update System 2016-08-12 20:53:16 UTC
tomcat-8.0.36-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-2b0c16fd82

Comment 18 Fedora Update System 2016-09-01 13:37:19 UTC
tomcat-8.0.36-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2016-09-01 16:55:54 UTC
tomcat-8.0.36-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2016-09-01 18:53:23 UTC
tomcat-8.0.36-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

