Bug 894750 - MySQL does not start if /var/run/mysqld/ is missing
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-01-13 17:00 UTC by Dr. Tilmann Bubeck
Modified: 2013-01-17 19:24 UTC (History)

Last Closed: 2013-01-13 21:11:14 UTC
Type: Bug

Description Dr. Tilmann Bubeck 2013-01-13 17:00:00 UTC
Description of problem:
MySQL stores its PID file in /var/run/mysqld/mysqld.pid, but /var/run is on a tmpfs, so /var/run/mysqld does not exist after power-on. As a result, MySQL does not start.

The problem can be fixed by creating the missing directory and starting MySQL again. However, this should be done by MySQL (or its startup scripts) itself.


Version-Release number of selected component (if applicable):
mysql-server-5.5.29-1.fc18.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install Fedora
2. systemctl enable mysqld.service
3. reboot
4. systemctl status mysqld.service
  
Actual results:
MySQL did not start.

Expected results:
MySQL should be running

Additional info:
[root@frodo iso]# systemctl start mysqld.service
Job for mysqld.service failed. See 'systemctl status mysqld.service' and 'journalctl -n' for details.
[root@frodo iso]# systemctl status mysqld.service
mysqld.service - MySQL database server
          Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled)
          Active: failed (Result: exit-code) since Sun, 2013-01-13 17:44:55 CET; 1s ago
         Process: 2470 ExecStartPost=/usr/libexec/mysqld-wait-ready $MAINPID (code=exited, status=1/FAILURE)
         Process: 2469 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
         Process: 2446 ExecStartPre=/usr/libexec/mysqld-prepare-db-dir %n (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/mysqld.service

Jan 13 17:44:53 frodo.wid.reinform.de systemd[1]: Starting MySQL database server...
Jan 13 17:44:53 frodo.wid.reinform.de mysqld_safe[2469]: 130113 17:44:53 mysqld_safe Logging to '/var/log/mysqld.log'.
Jan 13 17:44:53 frodo.wid.reinform.de mysqld_safe[2469]: 130113 17:44:53 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Jan 13 17:44:55 frodo.wid.reinform.de mysqld_safe[2469]: 130113 17:44:55 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
Jan 13 17:44:55 frodo.wid.reinform.de systemd[1]: Failed to start MySQL database server.
Jan 13 17:44:55 frodo.wid.reinform.de systemd[1]: Unit mysqld.service entered failed state
[root@frodo iso]# tail -20 /var/log/mysqld.log 
130113 17:42:19 [Note] /usr/libexec/mysqld: Shutdown complete

130113 17:42:19 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
130113 17:44:53 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130113 17:44:53 [Note] Plugin 'FEDERATED' is disabled.
130113 17:44:53 InnoDB: The InnoDB memory heap is disabled
130113 17:44:53 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130113 17:44:53 InnoDB: Compressed tables use zlib 1.2.7
130113 17:44:53 InnoDB: Using Linux native AIO
130113 17:44:53 InnoDB: Initializing buffer pool, size = 128.0M
130113 17:44:53 InnoDB: Completed initialization of buffer pool
130113 17:44:53 InnoDB: highest supported file format is Barracuda.
130113 17:44:54  InnoDB: Waiting for the background threads to start
130113 17:44:55 InnoDB: 1.1.8 started; log sequence number 1595675
130113 17:44:55 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
130113 17:44:55 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
130113 17:44:55 [Note] Server socket created on IP: '0.0.0.0'.
130113 17:44:55 [ERROR] /usr/libexec/mysqld: Can't create/write to file '/var/run/mysqld/mysqld.pid' (Errcode: 2)
130113 17:44:55 [ERROR] Can't start server: can't create PID file: No such file or directory
130113 17:44:55 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
[root@frodo iso]# mkdir /var/run/mysqld
[root@frodo iso]# chown mysql /var/run/mysqld
[root@frodo iso]# systemctl start mysqld.service
[root@frodo iso]# systemctl status mysqld.service
mysqld.service - MySQL database server
          Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled)
          Active: active (running) since Sun, 2013-01-13 17:45:31 CET; 3s ago
         Process: 2699 ExecStartPost=/usr/libexec/mysqld-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
         Process: 2675 ExecStartPre=/usr/libexec/mysqld-prepare-db-dir %n (code=exited, status=0/SUCCESS)
        Main PID: 2698 (mysqld_safe)
          CGroup: name=systemd:/system/mysqld.service
                  ├ 2698 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
                  └ 2855 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mysqld.log --pid-file=/var/run/m...

Jan 13 17:45:29 frodo.wid.reinform.de systemd[1]: Starting MySQL database server...
Jan 13 17:45:29 frodo.wid.reinform.de mysqld_safe[2698]: 130113 17:45:29 mysqld_safe Logging to '/var/log/mysqld.log'.
Jan 13 17:45:29 frodo.wid.reinform.de mysqld_safe[2698]: 130113 17:45:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Jan 13 17:45:31 frodo.wid.reinform.de systemd[1]: Started MySQL database server.

Comment 1 Tom Lane 2013-01-13 17:57:26 UTC
If true, this would mean that something is broken in systemd's tmpfiles support, because mysql-server does install a file /usr/lib/tmpfiles.d/mysql.conf containing

d /var/run/mysqld 0755 mysql mysql -

So if that directory isn't there after a reboot, it's not mysql's fault.

I wonder whether bug #894590 is related ...
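
For reference, the tmpfiles.d line quoted above follows the column layout Type, Path, Mode, UID, GID, Age. The sketch below only illustrates how that one line breaks down; the function name describe_tmpfiles_line is made up for this example and is not part of systemd or of the mysql-server package.

```shell
#!/bin/sh
# describe_tmpfiles_line LINE   (hypothetical helper, for illustration only)
# Splits one tmpfiles.d entry into its whitespace-separated columns and
# prints each column with a label.
describe_tmpfiles_line() {
    printf '%s\n' "$1" | {
        # Columns as documented in tmpfiles.d(5): Type Path Mode UID GID Age
        read -r kind path mode user group age
        printf 'type=%s\n'  "$kind"   # "d" means: create the directory if missing
        printf 'path=%s\n'  "$path"
        printf 'mode=%s\n'  "$mode"
        printf 'user=%s\n'  "$user"   # must be resolvable when systemd-tmpfiles runs
        printf 'group=%s\n' "$group"
        printf 'age=%s\n'   "$age"    # "-" means: no age-based cleanup
    }
}
```

Calling describe_tmpfiles_line 'd /var/run/mysqld 0755 mysql mysql -' shows that the entry asks for a directory owned by user and group mysql, so both names have to be known at the time the entry is processed.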

Comment 2 Dr. Tilmann Bubeck 2013-01-13 20:35:25 UTC
Thanks for pointing to /usr/lib/tmpfiles.d, this gives more information:

1. I am running NIS with the mysql user coming from NIS (I was not aware of this).

2. In mysql-server's preinstall scriptlet, useradd is run for the user "mysql",
   which does nothing in my case, because that user is already known to the
   system (from NIS).

3. At power-on, NIS is not yet available, so the user mysql is unknown when
   /usr/lib/tmpfiles.d/mysql.conf is processed, which logs the following
   error to journald:

Jan 13 21:09:20 frodo.wid.reinform.de systemd-tmpfiles[513]: [/usr/lib/tmpfiles.d/mysql.conf:1] Unknown user 'mysql'.

4. Therefore, the directory is not created and the problem arises.

Proposal 1: I see mysql as a local user which should not come from NIS. Therefore
I will delete it from NIS, and everything should work.

The problem could have been spotted by running "systemctl --failed", which shows that systemd-tmpfiles-setup.service failed. However, because of plymouth I was not aware that anything had failed at all.

Proposal 2: Add something that clearly shows at power-on that something failed, which is otherwise hidden by plymouth. Maybe an information sign in GDM saying "Attention: There are failed services. Please use systemctl --failed to list them"... Or something else.

Comment 3 Tom Lane 2013-01-13 20:56:10 UTC
(In reply to comment #2) 
> 1. I am running NIS with the mysql user coming from NIS (I was not aware of
> this).
> 
> 2. In mysql-server's preinstall scriptlet, the user "mysql" is useradded 
>    which does nothing in my case, because that user is already known to the
>    system (from NIS).
> 
> 3. Upon poweron, NIS is not already loaded, so the user mysql is unknown
>    to /usr/lib/tmpfiles.d/mysql.conf which gives the following error to
>    journald:
> Jan 13 21:09:20 frodo.wid.reinform.de systemd-tmpfiles[513]:
> [/usr/lib/tmpfiles.d/mysql.conf:1] Unknown user 'mysql'.

Wow, that's an interesting failure mode.  It seems unlikely that we'd want to try to make NIS start before systemd-tmpfiles runs.  So that means that all usernames mentioned in tmpfiles scripts had better be locally known.  How can that be implemented/enforced?  Or maybe there had better be some sort of caching of those names, rather than restricting the functionality?
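
One rough way to check the "locally known" requirement raised here is to compare the user column of each tmpfiles.d entry against the local passwd file only, ignoring NIS or LDAP. The sketch below is an assumption-laden illustration, not an existing tool: the function name missing_local_users and its argument order are invented, it treats the fourth whitespace-separated column as the user, and it skips "-" and numeric IDs because those need no name lookup.

```shell
#!/bin/sh
# missing_local_users PASSWD_FILE TMPFILES_CONF...   (hypothetical helper)
# Prints one line per tmpfiles.d entry whose user name has no entry in
# PASSWD_FILE. NSS sources such as NIS or LDAP are deliberately not consulted.
missing_local_users() {
    passwd_file=$1
    shift
    awk -v passwd="$passwd_file" '
        BEGIN {
            # Load the login names (first colon-separated field) of the passwd file.
            while ((getline line < passwd) > 0) {
                split(line, f, ":")
                is_local[f[1]] = 1
            }
        }
        # Skip blank lines and comments.
        NF == 0 || $1 ~ /^#/ { next }
        # Report named users that are not in the local passwd file.
        NF >= 4 && $4 != "-" && $4 !~ /^[0-9]+$/ && !($4 in is_local) {
            printf "%s:%d: user %s is not in %s\n", FILENAME, FNR, $4, passwd
        }
    ' "$@"
}
```

A call such as missing_local_users /etc/passwd /usr/lib/tmpfiles.d/*.conf would list the entries that depend on a remote user database. Group names (the fifth column) could be checked the same way against /etc/group.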

Comment 4 Lennart Poettering 2013-01-13 21:11:14 UTC
You cannot have system users in LDAP really. 

This will fail here and in a ton of other areas. It's fine to share actual human users with LDAP, but if you share system users you must make them available even if the network is not reachable. There are solutions for that (I think sssd can cache that for you), but as system users are generally managed by postinst scripts, and hence are more under the ownership of the OS than the admin, I'd not bother.

Really, I don't see anything to fix there. This is not a supported setup, and hence you keep the parts. You get an error message, like you would get for other issues too.

Closing.

Comment 5 Tom Lane 2013-01-13 22:17:15 UTC
(In reply to comment #4)
> You cannot have system users in LDAP really.

That seems reasonable ...

> Really, I don't see anything to fix there.

... but I'm concerned about how Till managed to get system users into NIS without knowing it.  Seems like there might be at least a usability problem.  Maybe it's not systemd's fault, but some component didn't do a good job here.

As far as mysql in particular is concerned, it seems like it would have been a good thing to force the mysql user to be created locally, whether or not it was known in NIS.  Is there a way to tell useradd to do that?

Comment 6 Lennart Poettering 2013-01-15 23:41:49 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > You cannot have system users in LDAP really.
> 
> That seems reasonable ...
> 
> > Really, I don't see anything to fix there.
> 
> ... but I'm concerned about how Till managed to get system users into NIS
> without knowing it.  Seems like there might be at least a usability problem.
> Maybe it's not systemd's fault, but some component didn't do a good job here.

Hmm, does useradd(8) actually have support for creating users via LDAP? If so, it should refuse to do that if --system is specified. Might be worth filing a bug about that.

> 
> As far as mysql in particular is concerned, it seems like it would have been
> a good thing to force the mysql user to be created locally, whether or not
> it was known in NIS.  Is there a way to tell useradd to do that?

Probably something to ask in the bug to file. But honestly, "--system" should be enough of an indication for that, I think.

Comment 7 Tom Lane 2013-01-16 00:16:23 UTC
(In reply to comment #6)
> Hmm, does useradd(8) actually have support for creating users via LDAP?

More the reverse, actually.  man useradd quoth:

       You may not add a user to a NIS or LDAP group. This must be performed
       on the corresponding server.

       Similarly, if the username already exists in an external user database
       such as NIS or LDAP, useradd will deny the user account creation
       request.

mysql.spec ignores failure of the useradd command (and there's not much it could do about it anyway AFAICS).  So the failure mode was that Till transferred his whole /etc/passwd list into NIS, probably from some previous incarnation of his system, and then mysql.spec failed to recreate the local mysql user on this machine.

I can see the point of the useradd restriction, ie not to let local and remote user databases have conflicting entries, but it's not being helpful in this case.

I'll file a bug, though I suspect there may not be any very nice answer here.
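
The situation described above (a name that resolves through NSS but has no local passwd entry) can be detected with a small check. This is a minimal sketch under stated assumptions: classify_user is a hypothetical helper name, the local database is assumed to be /etc/passwd unless another file is given, and getent is assumed to be available to query NSS.

```shell
#!/bin/sh
# classify_user NAME [PASSWD_FILE]   (hypothetical helper, for illustration)
# Prints "local" if NAME has an entry in PASSWD_FILE (default /etc/passwd),
# "remote-only" if only NSS (for example NIS or LDAP) resolves it,
# and "unknown" if neither does.
classify_user() {
    name=$1
    passwd_file=${2:-/etc/passwd}
    if awk -F: -v u="$name" '$1 == u { found = 1 } END { exit !found }' "$passwd_file"; then
        echo local
    elif getent passwd "$name" >/dev/null 2>&1; then
        echo remote-only
    else
        echo unknown
    fi
}
```

A package scriptlet or an admin could run classify_user mysql after the useradd step; a result of remote-only would indicate exactly the setup that later breaks systemd-tmpfiles at boot.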

Comment 8 Tom Lane 2013-01-16 00:26:42 UTC
shadow-utils bug filed at bug #895765

Comment 9 Dr. Tilmann Bubeck 2013-01-16 19:06:38 UTC
@Lennart: Yes, it was my mistake to create the mysql user in NIS.

But the proposal I made in comment #2 was:

Proposal 2: Add something that clearly shows at power-on that a service failed, which is otherwise hidden by plymouth. Maybe an information sign in GDM saying "Attention: There are failed services. Please use systemctl --failed to list them"... Or something else.

Do you think this would be helpful for users with problems in their configuration? Are you willing to support it?

Comment 10 Lennart Poettering 2013-01-17 19:24:00 UTC
Till, on systemd's todo list is to turn on boot-time logging automatically on the first failing service. The next step would then be to tell Plymouth to terminate too. Added that bit to the todo list now, too.

