718288 – Start-up of mongod appears racey under systemd

Summary: Start-up of mongod appears racey under systemd

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mongodb
Sub Component:
Version:	15
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Chris Lalancette
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-07-01 17:00 UTC by Alex Hudson (Fedora Address)
Modified:	2012-02-03 18:30 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-02-03 18:30:40 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Basic systemd service file for MongoDB (207 bytes, text/plain), 2011-07-19 11:44 UTC, Alex Hudson (Fedora Address)
Fedora-15 enhanced legacy-style initscript (7.03 KB, application/octet-stream), 2011-08-30 17:40 UTC, Chris Lalancette
Fix mongod's forking model to play nice with systemd (3.96 KB, patch), 2011-09-12 17:56 UTC, Chris Lalancette

Description Alex Hudson (Fedora Address) 2011-07-01 17:00:07 UTC

Description of problem:

mongod startup under Fedora 15's systemd init appears to be racy. It always reports success, but often the mongod process isn't actually running.

Discussion on #fedora-devel pointed out mongod's fork and exit model probably defeats the MainPID detection in systemd. It should probably follow one of these models:

    http://0pointer.de/public/systemd-man/daemon.html

Version-Release number of selected component (if applicable):

mongodb-server-1.8.0-3.fc15.x86_64

How reproducible:

Sometimes.

Comment 1 Alex Hudson (Fedora Address) 2011-07-19 11:44:23 UTC

Created attachment 513779
Basic systemd service file for MongoDB

I can't get MongoDB to start reliably at all under F15, and it doesn't seem to shut down correctly either.

I've changed its startup to a native systemd service which seems to be more reliable, and I've attached that file to this bug. I doubt this is the right way to do it, but it works for me.

Comment 2 Chris Lalancette 2011-07-21 20:12:57 UTC

Cool, thanks for that native file.  I've taken that and enhanced it.  I'm still testing this out locally, but assuming it works for me I'll push it to rawhide and then look at doing the same for F-15.

Comment 3 Michal Schmidt 2011-07-21 20:28:49 UTC

https://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Packages_migrating_to_a_systemd_unit_file_from_a_SysV_initscript says:

  Packages are strictly forbidden from migrating to systemd within updates to
  a Fedora release. The migration is only allowed between Fedora releases.

Comment 4 Chris Lalancette 2011-07-25 09:57:35 UTC

So the systemd stuff seems to work pretty well, so I pushed mongodb-1.8.2-5 to rawhide.  I'm not quite sure what to do about F-15.  Unfortunately I won't be able to look at this quite at the moment, but I'll try to get back to it later this week.

Comment 5 Alex Hudson (Fedora Address) 2011-07-26 08:42:05 UTC

Migrating to systemd is clearly verboten. However, I don't think anything would be stopping you shipping the native file in the package so that the end user could enable it - that's not migration, and there's nothing in the rules about providing the option.

It's not a perfect solution, but if other people are having as much trouble with the init system as I did, it would help. With the systemd-based sysvinit it's actually horribly hard to debug what's going on.

I'd love to hear from other F-15 users about this issue. I'd like to think that I'm alone with the init problems, but it seems a kind of fundamental issue.

Comment 6 Chris Lalancette 2011-07-26 09:11:52 UTC

The problem is that if there is both a native systemd file (/lib/systemd/system/mongod.service) and a legacy script (/etc/init.d/mongod), the native systemd one takes precedence.  So that would still be "migrating" during F-15.  I guess we could put the systemd file in an alternate location (like /usr/share/doc or something), but I'm not sure how useful that is in general.

For what it is worth, I've had good luck (on rawhide) with using legacy init-style scripts under systemd.  That is, I've never seen it be the case that it is systemd's fault that my service failed to start.  What I have found, though, is that tracking down why something is failing is more difficult under systemd.

Comment 7 Chris Lalancette 2011-08-30 17:39:22 UTC

OK, I took another look at this.  From what I can tell, this never could have worked on Fedora 15.

When a legacy style initscript attempts to source /etc/init.d/functions, systemd immediately takes over and just tries to launch the name of the program through systemd.  The rest of the initscript is totally ignored.

What this means for mongod is interesting.  If you just run mongod without any arguments, it does launch with defaults, but it launches in the foreground.  Since systemd launches things using a unix socket, presumably the backend process that is doing the launching has no controlling TTY, so mongod probably exits immediately instead of doing anything.

The right fix here is to switch to a native systemd service so we can specify the parameters that are needed.  That's what I've done in F-16/rawhide.  Since the Fedora guidelines prevent us from doing this on F-15, the backup plan is to not source /etc/init.d/functions at all, and just copy the relevant portions that we need into the mongod script.  I've tried this out and it works well for me.  Yes, it is a hack, but it is a temporary hack since we know we are doing the right thing for F-16.  I've attached the modified initscript to this bug.  Alex, if you get a chance could you copy this initscript to /etc/init.d/mongod, and then try it out to see if it works for you?  Thanks.

Comment 8 Chris Lalancette 2011-08-30 17:40:08 UTC

Created attachment 520657
Fedora-15 enhanced legacy-style initscript

Comment 9 Chris Lalancette 2011-08-30 19:38:02 UTC

Arg.  My analysis above is incomplete.  What actually happens is that systemctl puts the job on the dbus, and then systemd picks up the job from the bus.  It then fork+execs the /etc/init.d script as usual.  So it means that the whole initscript actually is run, just in a roundabout way.

I'm not sure where this leaves us.  I was having problems getting mongod started earlier today, but after yum remove '*mongo*' and a re-install, it now always succeeds.

Comment 10 Michal Schmidt 2011-09-01 15:15:34 UTC

mongodb's double forking in db/cmdline.cpp is wrong. The original process may exit before the pidfile is written. This is why 'systemctl show mongod.service' often shows 'active (exited)' and a wrong value for 'Main PID'. This should be fixed.

However, I don't know why mongod would be not running. I cannot reproduce the situation.

Comment 11 Michal Schmidt 2011-09-01 16:00:26 UTC

(In reply to comment #10)
> However, I don't know why mongod would be not running. I cannot reproduce the
> situation.
... in F15.

In F16 with the native unit file the bug is worse. Where in F15 mongod.service would go into the 'active (exited)' state, in F16 systemd will force kill the service in this case. That's because SysV units get an implicit 'RemainAfterExit=yes' and native units don't.
I do NOT recommend using RemainAfterExit as a workaround. The bug in the daemon should be fixed.

Comment 12 Chris Lalancette 2011-09-09 16:46:53 UTC

(In reply to comment #10)
> mongodb's double forking in db/cmdline.cpp is wrong. The original process may
> exit before the pidfile is written. This is why 'systemctl show mongod.service'
> often shows 'active (exited)' and a wrong value for 'Main PID'. This should be
> fixed.

I guess I don't quite understand this bit, and how it is confusing systemd.  The code in cmdline.cpp seems to be following the standard Unix daemon method, i.e.:

fork()
exit parent
chdir("/")
setsid() // become session leader
fork() // to kill session leader
reopen(stderr)
reopen(stdout)

Additionally, the pidfile isn't written until *after* this sequence, which seems to me to be the right thing; you don't want the PID of the original process, you want the PID of the final child.  Can you explain more about how this is messing up systemd, and what you think the solution would be?

Comment 13 Alex Hudson (Fedora Address) 2011-09-12 11:44:38 UTC

To label it "wrong" is a long stretch, but I think what he's saying is that the initial process shouldn't exit until the PID file is written - so, that first exit should be delayed.

The easiest way to do this would be to have the first process wait for its child and collect its exit code (i.e. join it) before it exits, and to make that child responsible for writing the PID file.

At the moment, it looks like the child's child writes the pidfile, but the child knows its child's PID, so it would be capable of doing that.

So the code would change to something like:

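if (c) {
   // original process: wait for the first child before exiting
   waitpid(c, NULL, 0);
   _exit (0);
}

...

if (c2) {
   // first child: it knows the second child's PID, so it writes the pidfile
   if ( params.count("pidfilepath")) {
        writePidFile( c2,  params["pidfilepath"].as<string>() );
   }
   _exit (0);
}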
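
A fuller, self-contained sketch of the ordering suggested above, for illustration only. It is not mongod's code: the function name daemonizeWithWait and the helper writePidFileSketch are hypothetical, and stdout/stderr redirection is left out. The point it shows is that the original process exits only after the first child has written the pidfile.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not mongod's writePidFile): store `pid` in `path`.
static bool writePidFileSketch(pid_t pid, const std::string& path) {
    FILE* f = std::fopen(path.c_str(), "w");
    if (!f) return false;
    std::fprintf(f, "%ld\n", static_cast<long>(pid));
    return std::fclose(f) == 0;
}

// Returns only in the final daemon process. The original process exits
// after the first child has exited, i.e. after the pidfile is on disk.
void daemonizeWithWait(const std::string& pidfilePath) {
    assert(!pidfilePath.empty());

    pid_t c = fork();
    if (c < 0) { std::perror("fork"); std::exit(1); }
    if (c > 0) {
        // Original process: join the first child instead of exiting at once.
        int status = 0;
        pid_t r;
        do { r = waitpid(c, &status, 0); } while (r < 0 && errno == EINTR);
        bool ok = (r == c) && WIFEXITED(status) && WEXITSTATUS(status) == 0;
        _exit(ok ? 0 : 1);
    }

    // First child: detach from the old session, then fork the daemon.
    if (chdir("/") != 0) _exit(1);
    setsid();
    pid_t c2 = fork();
    if (c2 < 0) _exit(1);
    if (c2 > 0) {
        // The first child knows the daemon's PID, so it writes the pidfile
        // and only then exits, which releases the waiting original process.
        _exit(writePidFileSketch(c2, pidfilePath) ? 0 : 1);
    }
    // Second child: this is the daemon; the caller continues from here.
}
```

Note that this only guarantees the pidfile exists when the original process exits. It says nothing about whether the daemon has finished its own initialization by then.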

Comment 14 Michal Schmidt 2011-09-12 12:54:20 UTC

(In reply to comment #12)
> I guess I don't quite understand this bit, and how it is confusing systemd.

As soon as the original process exits with success, systemd assumes the service is now started. systemd then goes on to detect the main PID of the service.

With a perfectly written forking service this is easy, because the pidfile is already on the filesystem at this point and systemd just has to read it (it knows where to find it from the 'PIDFile=' directive in the service's unit file).

mongod, however, exits the main process early and the main PID detection in systemd races with the initialization going on in mongod. The pidfile may not be written yet when systemd attempts to read it. When systemd is unable to detect the main PID of the service using the reliable method, it falls back to guessing.
The sequence may go like this:
 1. systemd starts mongod (with, say, PID=1000).
 2. mongod forks the first child (PID=1001) and exits PID 1000.
 3. systemd gets notified of the successful exit of PID 1000.
 4. systemd tries to read the pidfile. This fails.
 5. systemd guesses that the main PID of the service must be PID=1001.
 6. PID 1001 forks the second child (PID=1002) and then it exits itself.
 7. systemd gets notified of the exit of PID 1001.
 8. Since PID 1001 was the main PID of the service, systemd considers the
    whole service finished.
 9. systemd kills all the processes remaining in the service's cgroup.
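
Step 4 in the sequence above can be pictured with a small pidfile reader. This is a minimal sketch and not systemd's actual code; the name readPidFileSketch is hypothetical. It returns -1 when the file is missing or does not yet contain a PID, which is the situation a supervisor is in if it looks before the daemon has written the file.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>

// Hypothetical helper: return the PID stored in `path`, or -1 if the file
// cannot be opened or does not contain a positive integer.
pid_t readPidFileSketch(const std::string& path) {
    assert(!path.empty());
    FILE* f = std::fopen(path.c_str(), "r");
    if (!f) return -1;              // pidfile not written yet
    long value = 0;
    int fields = std::fscanf(f, "%ld", &value);
    std::fclose(f);
    if (fields != 1 || value <= 0) return -1;  // empty or malformed file
    return static_cast<pid_t>(value);
}
```

A reader like this has only one chance to get the right answer if it runs as soon as the original process exits, which is why the daemon needs to write the file before that exit.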

> Can you explain more about how
> this is messing up systemd, and what you think the solution would be?

Comment #0 gave the link to the solutions:
http://0pointer.de/public/systemd-man/daemon.html

Comment 15 Chris Lalancette 2011-09-12 17:55:44 UTC

OK, thanks for the explanation.  I've now created a patch that does the pipe() trick for mongod startup, which I'll attach here.  I've built an F-16 and rawhide package with this patch, and also sent the patch upstream, so we'll see what the reaction there is.
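
For readers unfamiliar with the pipe() trick mentioned above, here is a minimal sketch of the general technique. It is not the attached patch: the names daemonizeWithPipe and writePidFileForPipeSketch are hypothetical, and stdout/stderr redirection is left out. The original process blocks on a pipe and exits successfully only after the final daemon process has written its pidfile and sent one status byte.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: store `pid` in `path`.
static bool writePidFileForPipeSketch(const std::string& path, pid_t pid) {
    FILE* f = std::fopen(path.c_str(), "w");
    if (!f) return false;
    std::fprintf(f, "%ld\n", static_cast<long>(pid));
    return std::fclose(f) == 0;
}

// Returns only in the final daemon process. The original process exits 0
// only after the daemon reports over the pipe that the pidfile is written.
void daemonizeWithPipe(const std::string& pidfilePath) {
    assert(!pidfilePath.empty());

    int fds[2];
    if (pipe(fds) != 0) { std::perror("pipe"); std::exit(1); }

    pid_t c = fork();
    if (c < 0) { std::perror("fork"); std::exit(1); }
    if (c > 0) {
        // Original process: wait for one status byte from the daemon.
        close(fds[1]);
        char ok = 0;
        ssize_t n;
        do { n = read(fds[0], &ok, 1); } while (n < 0 && errno == EINTR);
        waitpid(c, nullptr, 0);  // reap the intermediate child
        // n == 0 means every writer exited without reporting: a failure.
        _exit((n == 1 && ok == 1) ? 0 : 1);
    }

    // Intermediate child: detach, fork the daemon, and exit.
    close(fds[0]);
    if (chdir("/") != 0) _exit(1);
    setsid();
    pid_t c2 = fork();
    if (c2 < 0) _exit(1);
    if (c2 > 0) _exit(0);

    // Daemon: write its own PID, then release the original process.
    char ok = writePidFileForPipeSketch(pidfilePath, getpid()) ? 1 : 0;
    if (write(fds[1], &ok, 1) != 1) _exit(1);
    close(fds[1]);
    if (!ok) _exit(1);
}
```

Because the daemon itself decides when to write the status byte, the same pipe could also be used to delay the original process's exit until later initialization has finished, not just until the pidfile exists.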

Comment 16 Chris Lalancette 2011-09-12 17:56:32 UTC

Created attachment 522749
Fix mongod's forking model to play nice with systemd

Comment 17 Nathaniel McCallum 2012-02-03 18:30:40 UTC

This is now working.
