Bug 694647

Summary: misconfigured swap delays boot
Product: [Fedora] Fedora
Reporter: cornel panceac <cpanceac>
Component: systemd
Assignee: Lennart Poettering <lpoetter>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 15
CC: johannbg, lpoetter, metherid, mschmidt, notting, plautrba
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-04-08 07:05:08 UTC
Description Flags
systemd dump
systemd test none

Description cornel panceac 2011-04-07 20:35:11 UTC
Description of problem:
for a while i've noticed that the system seems to hang at boot after displaying the message:

"Starting recreate volatile files and directories."

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. boot the pre-beta f15
Actual results:

Expected results:

Additional info:

# systemd-analyze 
  4021ms NetworkManager.service
  3997ms abrtd.service
  2800ms rtkit-daemon.service
  2788ms sandbox.service
  2766ms mcelog.service
  2412ms ip6tables.service
  1865ms sendmail.service
  1825ms iptables.service
  1605ms udev.service
  1379ms fedora-loadmodules.service
  1306ms cpuspeed.service
  1189ms systemd-vconsole-setup.service
  1181ms rpcbind.service
   878ms netfs.service
   792ms rpcidmapd.service
   752ms nfslock.service
   699ms sshd.service
   680ms fedora-sysinit-hack.service
   586ms systemd-remount-api-vfs.service
   545ms var-lock.mount
   535ms bluetooth.service
   533ms var-run.mount
   515ms media.mount
   511ms systemd-tmpfiles-setup.service
   420ms auditd.service
   399ms pcscd.service
   397ms fedora-sysinit-unhack.service
   389ms rpcgssd.service
   386ms console-kit-log-system-start.service
   371ms fedora-storage-init.service
   299ms dbus.service
   255ms systemd-sysctl.service
   246ms acpid.service
   160ms irqbalance.service
   157ms rsyslog.service
   154ms console-kit-daemon.service
   126ms remount-rootfs.service
   123ms rc-local.service
   118ms systemd-user-sessions.service
   117ms fedora-readonly.service
    79ms fedora-autoswap.service
    77ms udev-trigger.service
    68ms fedora-wait-storage.service
    42ms accounts-daemon.service

Comment 1 Jóhann B. Guðmundsson 2011-04-07 20:54:38 UTC
Please follow [1] and attach the mentioned files to this report.

Thank you


Comment 2 cornel panceac 2011-04-08 04:53:41 UTC
Created attachment 490694

Comment 3 cornel panceac 2011-04-08 04:54:06 UTC
Created attachment 490695

Comment 4 cornel panceac 2011-04-08 04:54:33 UTC
Created attachment 490696
systemd dump

Comment 5 cornel panceac 2011-04-08 04:58:02 UTC
update: it does not always happen. i had to reboot once today to see it happen. when it happened, the boot stalled for 2 minutes and 50 seconds before continuing.

$ cat /proc/cmdline
ro root=UUID=edece934-ea71-4c50-8890-d4539d57af90 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us

Comment 6 cornel panceac 2011-04-08 04:59:03 UTC
Created attachment 490697
systemd test

Comment 7 Michal Schmidt 2011-04-08 07:05:08 UTC
From the log:
  [  187.312786] systemd[1]: Job dev-sda3.device/start timed out.

I suppose you have in your /etc/fstab something like:
  /dev/sda3  swap  swap  defaults 0 0

You should not use /dev/sd* names to refer to disks. The order in which the sda, sdb, sdc names are assigned is not deterministic. It depends on the more or less random order in which the kernel discovers them.

Use either "UUID=...", "LABEL=...", or deterministic "/dev/disk/by-..." names.
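
As a rough sketch of what such an entry could look like, the helper below prints an fstab swap line keyed on a UUID. The function name swap_fstab_entry is made up for illustration, and the real UUID has to be read from the partition itself, for example with blkid.

```shell
# Print an /etc/fstab swap entry that refers to the partition by UUID.
# $1 = UUID of the swap partition
# $2 = mount options (optional, defaults to "defaults")
# The function name is hypothetical; it only formats the line.
swap_fstab_entry() {
    printf 'UUID=%s  swap  swap  %s  0 0\n' "$1" "${2:-defaults}"
}
```

For example, running swap_fstab_entry "$(blkid -s UUID -o value /dev/sdc3)" as root prints a line that could replace the /dev/sda3 entry. Here /dev/sdc3 is only a stand-in for whatever name the swap partition has in the current boot.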

Your messages file contains proof that the naming really does vary between boots.
Notice the disk with 7 partitions is sometimes sda and in other boots it is sdc:

$ grep 'sd[abc]:' messages.txt 
Apr  7 04:43:13 localhost kernel: [    2.348214]  sda: sda1
Apr  7 04:43:13 localhost kernel: [    2.820968]  sdb: sdb1
Apr  7 04:43:13 localhost kernel: [    2.873457]  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 >
Apr  7 18:38:10 localhost kernel: [    2.323890]  sda: sda1
Apr  7 18:38:10 localhost kernel: [    2.794952]  sdb: sdb1
Apr  7 18:38:10 localhost kernel: [    2.841086]  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 >
Apr  7 21:56:09 localhost kernel: [    2.238176]  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Apr  7 21:56:09 localhost kernel: [    2.320917]  sdb: sdb1
Apr  7 21:56:09 localhost kernel: [    2.800394]  sdc: sdc1
Apr  7 22:01:10 localhost kernel: [    2.359911]  sda: sda1
Apr  7 22:01:10 localhost kernel: [    2.829966]  sdb: sdb1
Apr  7 22:01:10 localhost kernel: [    2.901630]  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 >
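
The same check can be scripted. The following is a rough sketch, not part of the original analysis: it assumes the syslog layout shown in the grep output above, and partition_counts is a hypothetical name. For each partition line it prints the timestamp, the disk name, and the number of partitions the kernel listed.

```shell
# Summarise "sdX: sdX1 sdX2 ..." kernel lines from a syslog file.
# $1 = path to the log file (e.g. messages.txt)
# Output: month day time disk partition-count, one line per disk per boot.
partition_counts() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^sd[a-z]+:$/) {
                n = 0
                for (j = i + 1; j <= NF; j++)
                    if ($j ~ /^sd[a-z]+[0-9]+$/) n++
                disk = substr($i, 1, length($i) - 1)
                print $1, $2, $3, disk, n
                break
            }
        }
    }' "$1"
}
```

Running partition_counts messages.txt and looking for the lines that end in 7 shows which name the seven-partition disk received in each boot.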

Comment 8 cornel panceac 2011-04-09 04:36:24 UTC
thank you for your help. indeed i had a "faulty" line in fstab and i changed it accordingly. however, i must ask this: does systemd really have to wait 3 minutes to figure out a swap partition is missing? and all this time the boot has to be halted?

Comment 9 Michal Schmidt 2011-04-10 20:57:40 UTC
In general the boot has to wait until the configured swaps are ready, because continuing without them may result in running out of memory.

Also in general it is correct to give the hardware/kernel/udev enough time to discover the disks.

I see how this situation could be considered special though. When the system has discovered the /dev/sda disk and after reading its partition table it sees only one partition, it should know it is futile to wait for sda3. So perhaps the case could be handled better somehow.

But keeping in mind that this is, after all, a misconfiguration and that there are still reported bugs affecting correctly configured systems, I wouldn't expect much effort going into fixing this inconvenience any time soon.

Comment 10 Lennart Poettering 2011-04-11 18:07:20 UTC
You can add "nofail" to the mount options of your swap device. If you do, systemd will use the device when it shows up, but will not wait for it at boot. So I think your case is pretty well covered already.
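
A minimal sketch of applying this to an existing fstab: the filter below appends "nofail" to the options field of every swap entry that does not already carry it. The function name add_nofail_to_swap is hypothetical, and the filter only writes to standard output, so the result can be reviewed before anything is changed.

```shell
# Read an fstab-style file on stdin and append "nofail" to the options
# (4th field) of every swap entry that lacks it. Comments and all other
# lines pass through unchanged. Rewritten lines come out single-spaced.
add_nofail_to_swap() {
    awk '
        NF >= 4 && $1 !~ /^#/ && $3 == "swap" && index("," $4 ",", ",nofail,") == 0 {
            $4 = $4 ",nofail"
        }
        { print }
    '
}
```

For example, add_nofail_to_swap < /etc/fstab previews the modified file without touching the original.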

Comment 11 cornel panceac 2011-04-11 18:14:52 UTC
thank you very much for your support. do you think one day a system will be able to "learn" from experience and add nofail by itself (maybe temporarily) if the device was once missing? or if the device is not present in the partition table?

Comment 12 Lennart Poettering 2011-04-11 18:22:18 UTC
We cannot know what is necessary to make a device show up. We have to assume that a device is just slow in probing, and then eventually time out.

On certain machines swap devices are needed for normal operation (simply because there is not enough real RAM). I think we should follow user configuration as far as possible, and never rewrite what the user explicitly configured.

Comment 13 cornel panceac 2011-04-11 18:28:54 UTC
i see. is there a way to configure the timeout to a smaller value?

Comment 14 Lennart Poettering 2011-04-11 18:38:49 UTC
Not right now (unless you configure the swap partition via a .swap unit file, instead of a line in /etc/fstab), but I do plan to add an option for that in fstab very soon.
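
For readers on later systemd releases: an fstab option of this kind is documented in systemd.mount(5) and systemd.swap(5) as x-systemd.device-timeout=. It was not available when this comment was written, so whether it applies depends on the installed systemd version. The sketch below only formats such an entry; the function name swap_entry_with_timeout and both arguments are placeholders.

```shell
# Print an /etc/fstab swap entry that limits how long systemd waits for
# the device to appear. Assumes a systemd version that documents the
# x-systemd.device-timeout= fstab option.
# $1 = device spec such as UUID=<uuid>
# $2 = timeout such as 10s
swap_entry_with_timeout() {
    printf '%s  swap  swap  defaults,x-systemd.device-timeout=%s  0 0\n' "$1" "$2"
}
```

The option can be combined with nofail in the same comma-separated options field.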

Comment 15 cornel panceac 2011-04-11 19:00:30 UTC
that would be great. thank you very much.