Bug 840242 - "systemd-fstab-generator[439]: Failed to create unit file: File exists" and a subsequent failure to boot
"systemd-fstab-generator[439]: Failed to create unit file: File exists" and a...
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-07-14 14:00 EDT by Michal Jaegermann
Modified: 2012-10-09 17:11 EDT (History)
CC: 11 users

Doc Type: Bug Fix
Last Closed: 2012-10-09 17:11:43 EDT
Type: Bug

Attachments
systemd related messages from dmesg (3.4.0-0.rc4.git2.1.fc18.x86_64 kernel) (3.73 KB, text/plain)
  2012-07-14 14:01 EDT, Michal Jaegermann
results of 'systemctl --all | grep dead' (6.83 KB, text/plain)
  2012-07-14 14:02 EDT, Michal Jaegermann
dmesg for 3.6.0-0.rc0.git2.1.fc18.x86_64 with systemd debugging info (205.20 KB, text/plain)
  2012-08-01 18:38 EDT, Michal Jaegermann
remainder of a dmesg output after a "manual intervention" (89.37 KB, text/plain)
  2012-08-01 18:48 EDT, Michal Jaegermann
fstab from an affected test system (1.04 KB, text/plain)
  2012-08-01 18:53 EDT, Michal Jaegermann

Description Michal Jaegermann 2012-07-14 14:00:46 EDT
Description of problem:

After applying the current rawhide updates on my test system, an attempt to boot quickly produces a series of messages:

systemd-fstab-generator[439]: Failed to create unit file: File exists

and things go downhill from there.  Nearly everything reports "Dependency failed for ..." and after a delay of close to two minutes I get:


Welcome to emergency mode. Use "systemctl default" or ^D to enter
default mode.
Press enter for maintenance (or type Control-D to continue):

"Control-D" brings more of the same so it does not make sense. "Maitenance" reveals that only / and /usr are mounted all.  Typing here 'mount -a' mounts
other file systems and starting network is causing a boot to continue, in a sense and by itself, until it brings a graphic login screen.  An attempt to login there results only in a "System is going down" alert and nothing more.

The same messages shows up while loging from a remote while a system is NOT going down by any stretch of imagination.

While there, 'systemctl --all --failed' reports only sendmail.service and yum-updatesd.service, while 'systemctl --all | grep dead' shows 82 assorted services (output attached).  All in all, not a very usable outcome.

Version-Release number of selected component (if applicable):
systemd-186-2.fc18

How reproducible:
On every attempt to boot

Additional info:
The above is while using the 3.4.0-0.rc4.git2.1.fc18.x86_64 kernel, which I can still boot.  Trying 3.5.0-... kernels causes some "difficulties" (see bug 840235).
Comment 1 Michal Jaegermann 2012-07-14 14:01:48 EDT
Created attachment 598265 [details]
systemd related messages from dmesg (3.4.0-0.rc4.git2.1.fc18.x86_64 kernel)
Comment 2 Michal Jaegermann 2012-07-14 14:02:44 EDT
Created attachment 598266 [details]
results of 'systemctl --all | grep dead'
Comment 3 Michal Jaegermann 2012-07-14 14:22:49 EDT
As an added attraction, running "poweroff" in the state described in this report does kill access to the system, but actually powering the system off never happens.
Comment 4 Michal Jaegermann 2012-07-24 20:09:46 EDT
Bug 840235 is fixed by kernel 3.5.0-1.fc18.x86_64.  This does not help with systemd, now at systemd-187-1.fc18, which is as broken as before: it fails to mount most of the disks and consequently fails at nearly everything.
Comment 5 Michal Schmidt 2012-08-01 13:00:43 EDT
Hm, we should add the file name to this error message.


Please attach a more detailed dmesg, as described in:
http://freedesktop.org/wiki/Software/systemd/Debugging#If_You_Can_Get_a_Shell
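For reference, the linked debugging guide essentially boils down to booting with extra kernel command-line options so that PID 1 logs verbosely into the kernel ring buffer, which then shows up in dmesg. A sketch of the relevant options (taken from that guide; exact names may have changed in later systemd versions):

```shell
# Append to the kernel command line in the boot loader, then boot and
# capture the output of `dmesg` once the failure has occurred:
#   systemd.log_level=debug   - make systemd log at debug level
#   systemd.log_target=kmsg   - send systemd's log to the kernel buffer
#   log_buf_len=1M            - enlarge the kernel buffer so nothing is lost
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
```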
Comment 6 Michal Jaegermann 2012-08-01 18:38:02 EDT
Created attachment 601830 [details]
dmesg for 3.6.0-0.rc0.git2.1.fc18.x86_64 with systemd debugging info

> Please attach a more detailed dmesg ...

Here we go!  This covers the span from the start up to the moment when the 'Welcome to emergency mode. ...' prompt shows up and that mode is entered.

If at this moment one tries 'systemctl list-jobs', no jobs are listed, and only entries like these:
[  252.676321] systemd[1]: Accepted connection on private bus.
[  252.702988] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.ListJobs() on /org/freedesktop/systemd1
[  252.731174] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local

show up on the console.
Comment 7 Michal Jaegermann 2012-08-01 18:48:50 EDT
Created attachment 601831 [details]
remainder of a dmesg output after a "manual intervention"

Typing 'mount -a; exit' in the "emergency shell" allows the boot to proceed, although on the resulting system non-root logins fail, presumably because of "System is going down", while in reality it is not.  In particular that means that, say, gdm-3.5.4.2-2.fc18 starts but then tries to run a gnome-shell session for the 'gdm' user and immediately exits, as the latter failed due to "System is going down".

Before the series of rawhide updates described above, this system was coming up without major issues.
Comment 8 Michal Jaegermann 2012-08-01 18:53:04 EDT
Created attachment 601832 [details]
fstab from an affected test system

/bin, /lib, etc. were converted to symlinks into /usr/ quite a while ago, and before the failure to boot happens, / and /usr do get mounted; only the other file systems are left in a funk.
Comment 9 Michal Jaegermann 2012-08-01 19:15:49 EDT
Just updated to systemd-187-3.fc18 and kernel 3.6.0-0.rc0.git6.1.fc18.x86_64.  Not a surprise, but nothing has changed in the situation described in this report.
Comment 10 Michal Schmidt 2012-08-02 08:17:51 EDT
A few observations I made so far:

- This is what eventually leads to the switch to the emergency mode:
[  109.888141] systemd[1]: Job dev-disk-by\x2dlabel-opt1.device/start timed out.

- We seem to have a problem with escaping the name of the affected device unit:
[   19.860537] systemd[1]: Installed new job dev-disk-by\x2dlabel-opt1.device/start as 57
...
[   49.497121] systemd[1]: dev-disk-by\x2dlabel-\x5cx2fopt1.device changed dead -> plugged

  Note that the names came out differently in the two cases. That's why systemd
  failed to detect that the device was already available.
  We should be able to reproduce this by referring to disks using labels
  containing "/"...

- You have some old udev rules present that still refer to hal, which is obsolete
  since F16, I believe:
[  198.061563] systemd-udevd[1876]: failed to execute '/usr/lib/udev/socket:@/org/freedesktop/hal/udev_event' 'socket:@/org/freedesktop/hal/udev_event': No such file or directory
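The mismatch in the second observation looks like double escaping, and can be illustrated with a small Python sketch of the path-to-unit-name escaping rules described in systemd.unit(5) (an illustrative reimplementation, not systemd's actual C code). Escaping the plain by-label path yields the first name from the log; escaping a path whose label component was already escaped once (udev replaces "/" in symlink names with a literal "\x2f") yields the second, different name, so the two never compare equal:

```python
def systemd_escape_path(path: str) -> str:
    """Sketch of systemd's path -> unit-name escaping (systemd.unit(5)):
    strip slashes at the ends, turn '/' into '-', and C-escape every other
    character outside [A-Za-z0-9:_.] (and any leading '.') as \\xXX."""
    path = path.strip("/")
    out = []
    for i, ch in enumerate(path):
        if ch == "/":
            out.append("-")                      # path separator
        elif (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)                       # passes through unchanged
        else:
            out.append("\\x%02x" % ord(ch))      # e.g. '-' -> \x2d, '\' -> \x5c
    return "".join(out) if out else "-"

# Label "opt1": matches the name of the installed job in the log.
print(systemd_escape_path("/dev/disk/by-label/opt1") + ".device")
# -> dev-disk-by\x2dlabel-opt1.device

# Same device, but with a udev-style pre-escaped component ('\x2f' for '/'):
# the backslash gets escaped a second time, producing the other name.
print(systemd_escape_path("/dev/disk/by-label/\\x2fopt1") + ".device")
# -> dev-disk-by\x2dlabel-\x5cx2fopt1.device
```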
Comment 11 Michal Jaegermann 2012-08-02 11:40:26 EDT
(In reply to comment #10)
> A few observations I made so far:
> 
> - This is what eventually leads to the switch to the emergency mode:

Recently on test@lists.fedoraproject.org somebody complained that he ended up in "emergency mode" after attempting a fresh rawhide installation.  Unfortunately there were no details, nor any description of attempts to get at least some explanation.  I did not save a copy of that message and now I cannot find it.  Sigh!

> - You have some old udev rules present that still refer to hal, which is
> obsolete

Yes, but I have not been adding or removing udev rules myself while going through the various updates.  One of the goals of this test system is to see how it holds up over time.

> [  198.061563] systemd-udevd[1876]: failed to execute
> '/usr/lib/udev/socket:@/org/freedesktop/hal/udev_event'
> 'socket:@/org/freedesktop/hal/udev_event': No such file or directory

Is this really significant?  "198.061563" is well past the initial failure to mount disk partitions, and it is quickly followed by "systemd-udevd.service changed start -> running".  In any case, what was responsible for cleaning up such leftovers?
Comment 12 Michal Jaegermann 2012-08-02 19:00:37 EDT
BTW, I do not know whether this is related, but one component of this mess is that sendmail.service never starts.  It tries for quite a while, but eventually it always fails with "Active: failed (Result: timeout) ...".
Comment 13 Michal Jaegermann 2012-08-16 18:37:40 EDT
systemd-188-3.fc18 has the same problem as systemd-187-3.fc18, i.e. it fails to mount most of the disk-based file systems.

In case somebody asks: I made sure that /var/run and /var/lock are symbolic links.  After I reach a shell prompt, 'systemctl --failed' gives me

systemd-...es-setup.service loaded failed failed     Recreate Volatile Files and Directories
yum-updatesd.service        loaded failed failed     YUM Package Update Service

sendmail.service is turned off at this moment.
Comment 14 Michal Jaegermann 2012-10-09 17:11:43 EDT
(In reply to comment #10)

> 
> - We seem to have a problem with escaping the name of the affected device
> unit:
> [   19.860537] systemd[1]: Installed new job
> dev-disk-by\x2dlabel-opt1.device/start as 57

AFAICS systemd-194-1.fc18 no longer has that problem, and it is now possible to boot with it without manual intervention at this point.
