Bug 748920

Summary: Setting back time breaks boot
Product: [Fedora] Fedora
Reporter: Kamil Páral <kparal>
Component: e2fsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: amcnabb, awilliam, covex, dracut-maint, duwayne.morris, emailjonathananderson-fedora, esandeen, gholms, harald, infobox.oleg, jaroslav.pulchart, jeff.raber, johannbg, jonathan, josef, jsmith.fedora, kevin, kzak, lpoetter, metherid, michaelrado, mishu, mschmidt, nathanael, notting, oliver, plautrba, robatino, samuel-rhbugs, satellitgo, systemd-maint, tflink
Keywords: CommonBugs
Hardware: Unspecified
OS: Unspecified
Whiteboard: https://fedoraproject.org/wiki/Common_F16_bugs#changed-time-breaks-boot
Fixed In Version: e2fsprogs-1.42-4.fc17
Doc Type: Bug Fix
Last Closed: 2012-05-02 04:40:20 UTC
Bug Blocks: 752650
Attachments:
Description                  Flags
first reboot                 none
second reboot                none
no prompt on bare metal      none
f17_virt_problem1.png        none
f17_virt_problem2.png        none
f17_virt_splash.png          none
ubuntu-splash.png            none
ubuntu_problem.png           none
ubuntu_problem_fixed.png     none

Description Kamil Páral 2011-10-25 15:07:03 UTC
Description of problem:
Installed the default desktop from the F16 TC2 DVD. I set the time one month back (Sep 25th instead of Oct 25th). On the next reboot, Fedora won't boot.

On first reboot, dracut complains about root partition:

/dev/mapper/VolGroup-lv_root: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)
dracut Warning: e2fsck returned with 4
dracut Warning: /dev/mapper/VolGroup-lv_root: Superblock last mount time (Tue Oct 25 14:40:08 2011,
dracut Warning: now = Sun Sep 25 14:41:50 2011) is in the future.
dracut Warning: *** An error occurred during the file system check.
dracut Warning: *** Dropping you to a shell; the system will try
dracut Warning: *** to mount the filesystem(s), when you leave the shell.

I am able to hit Ctrl+D and it mounts and continues. Then it complains about /dev/vda2 partition (/boot partition):

systemd-fsck[605]: /dev/vda2: Superblock last mount time (Tue Oct 25 10:40:12 2011,
systemd-fsck[605]: now = Sun Sep 25 10:41:59 2011) is in the future.
systemd-fsck[605]: /dev/vda2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
systemd-fsck[605]: (i.e., without -a or -p options)
[   13.652068] systemd-fsck[605]: fsck failed with error code 4.
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for maintenance
(or type Control-D to continue):

Without manually fixing the problem I'm unable to boot; the prompt repeats itself endlessly.

On the second boot, the problem with the root partition disappears and only the problem with the /boot partition remains.

However, the behavior described above is from a VM. When I originally hit this on bare metal, I did receive a dracut shell for the root partition problem, but I didn't receive a shell for the /boot partition problem. It means that even being a linux-fu guru didn't help me, because I had no way to fix it. Booting into "1" or "emergency" yielded the same results. Eventually I had to boot anaconda rescue mode to fix my disks.

To sum this up: setting back time in Fedora by more than one day is a showstopper for a common user. Dracut shell = end of game.
For some reason, on my bare metal machine it is highly problematic even for power users, because they are not allowed to enter the maintenance mode shell. I couldn't reproduce that in a VM (which I took the logs from, because it's easier).

I am not sure whether the fault is in fsck or dracut. But I have to point out that the current behavior is ridiculous. Just because a filesystem has a last mount stamp in the future doesn't mean we can't mount it. Do a full filesystem check first if you must, but don't leave the system in an unusable state just because the user decided to change his time.

A quick look at fsck's man page revealed that return code 4 is used for "filesystem errors left uncorrected". It may be necessary to add a new return code to fsck to distinguish between real uncorrected errors and a last mount time stamp in the future (which is no error). Or grep its output or something.
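
The exit status discussed above is a bit mask, so a caller can tell which conditions fsck reported. Below is a minimal Python sketch, assuming the bit meanings listed in fsck(8). The helper names `decode_fsck_status` and `is_clock_complaint` are hypothetical, and matching on the message text only illustrates the "grep its output" idea; it is not something fsck, dracut, or systemd is known to do.

```python
import re

# Exit-status bits as documented in fsck(8); the status is a bitwise OR of these.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

# Message text taken from the logs in this report; it may span two log lines.
FUTURE_MOUNT_TIME = re.compile(
    r"Superblock last mount time.*?is in the future", re.DOTALL
)


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [meaning for bit, meaning in FSCK_EXIT_BITS.items() if status & bit]


def is_clock_complaint(output):
    """Hypothetical helper: True if the output contains the future-mount-time message."""
    return bool(FUTURE_MOUNT_TIME.search(output))
```

With status 4, as in the logs above, the decoder reports only "filesystem errors left uncorrected", which is why the caller cannot tell a clock complaint from real corruption without looking at the output.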

Version-Release number of selected component (if applicable):
systemd-36-3.fc16.i686
dracut-013-15.fc16.noarch
util-linux-2.20-1.fc16.i686

How reproducible:
always

Steps to Reproduce:
1. install Fedora default desktop from DVD
2. reboot into installed system
3. change time to the past, ideally one month
4. reboot
5. see broken system

Comment 1 Kamil Páral 2011-10-25 15:07:25 UTC
Created attachment 530115 [details]
first reboot

Comment 2 Kamil Páral 2011-10-25 15:07:40 UTC
Created attachment 530116 [details]
second reboot

Comment 3 Kamil Páral 2011-10-25 15:13:51 UTC
Due to its severity proposing as F16 Blocker. There are no criteria fitting this use case, probably closest could be:
"All elements of the default panel (or equivalent) configuration in all release-blocking desktops must function correctly in common use"
(setting system clock breaks system boot, that's not "functioning correctly")

But I think we really don't need criteria for this. This is so easy to trigger and it has such severe implications that I believe this automatically qualifies as a blocker.

Comment 4 Adam Williamson 2011-10-25 15:40:24 UTC
That criterion doesn't work at all. It's not a desktop issue.

But we could consider this under "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above" and "In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention, unless the user explicitly chooses to boot in non-graphical mode. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account".

Does F15 behave the same?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 5 Adam Williamson 2011-10-25 16:06:48 UTC
CCing Eric for an e2fsprogs perspective.

Comment 6 Nathanael Noblet 2011-10-25 16:22:10 UTC
This bug is present in F15 - I just set the clock back on a VM I had after updating it. I get 

_gfx_test_week_2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    (ie without -a or -p options)


Dropping to debug shell.

sh: can't access tty; job control turned off


In my case I ran

fsck /dev/sda

fixed the time and then exited, which allowed the boot to continue and booted into gdm properly. I have two partitions, sda1 and sda2; sda1 => /boot and sda2 is an LVM. I didn't have to do anything with that partition, however, just sda1.

Comment 7 Nathanael Noblet 2011-10-25 16:22:48 UTC
sorry I meant to say I ran fsck /dev/sda1

Comment 8 Eric Sandeen 2011-10-25 16:43:10 UTC
Right, this is not a new e2fsprogs "feature" (and feature it is, if you ask upstream, though I don't know for sure why there is this fetish about time ...)

If you really want to be able to set your system clock massively behind your filesystem, you can set

broken_system_clock = 1

in /etc/e2fsck.conf and it should skip that check.

-Eric
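
For reference, the setting lives in the `[options]` stanza of e2fsck.conf. The Python sketch below only renders that stanza and writes it to a path of your choosing. The helper name `write_e2fsck_conf` is hypothetical, and the example deliberately targets a temporary directory rather than /etc.

```python
import tempfile
from pathlib import Path

# The stanza e2fsck reads; broken_system_clock sits in the [options] section.
E2FSCK_CONF_TEXT = "[options]\n\tbroken_system_clock = 1\n"


def write_e2fsck_conf(path):
    """Hypothetical helper: write the stanza to `path` and return the text written."""
    Path(path).write_text(E2FSCK_CONF_TEXT)
    return E2FSCK_CONF_TEXT


if __name__ == "__main__":
    # Write to a scratch location for inspection; copying the result to
    # /etc/e2fsck.conf is left to the administrator.
    target = Path(tempfile.mkdtemp()) / "e2fsck.conf"
    print(write_e2fsck_conf(target), end="")
```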

Comment 9 Adam Williamson 2011-10-25 18:59:01 UTC
given that f15 did the same, it's a 'feature' of e2fsck, and no-one's exploded yet, i'm probably -1 blocker on this. any other votes?

Comment 10 Jared Smith 2011-10-25 19:37:47 UTC
I got bitten by this in F15 as well.  Since it's not something new, and it really is considered a "feature" upstream, I guess I'll vote -1 blocker on this.

Comment 11 Tim Flink 2011-10-25 20:02:14 UTC
I'm -1 blocker on this, too. It's not new and there are workarounds.

That makes for -3 blocker, rejecting.

Comment 12 Dennis Gilmore 2011-10-25 20:04:09 UTC
i have some systems that often get the time set back to jan 1 1970. as long as i can fsck all filesystems that's ok. but if we are unable to get to a shell and run fsck then i think that's a blocker, as a freshly installed system may not be bootable.

so im +1 blocker

Comment 13 Adam Williamson 2011-10-25 20:17:49 UTC
dennis: this isn't anything new in f16, though...

Comment 14 Eric Sandeen 2011-10-25 22:05:17 UTC
If we as a distro think the time check is nuts, we can ship an e2fsck.conf to turn it off.  It'd need to get pulled into initFOO stuff too.  I agree that it's not a blocker for F16 but it's something we could consider.

-Eric

Comment 15 Adam Williamson 2011-10-25 22:12:27 UTC
I think the ideal thing would be for it to be *noted* but not considered some sort of screaming error, but unfortunately, it seems like this isn't easily possible (it'd require a patch).

Comment 16 Eric Sandeen 2011-10-25 22:22:26 UTC
To be a little more clear - fsck complaining about the clock is not new.  I don't know if the problem where no shell is presented after the error might be new; if it is, it's not e2fsprogs newness but something in the boot process, I think.

Comment 17 Kamil Páral 2011-10-26 10:52:32 UTC
Even though I'm outvoted, my opinion is clear: +1 blocker, +1 nth.

We have +2 blocker, -3 blocker right now, how do we count that? And why didn't we vote at least about NTH?

(In reply to comment #11)
> I'm -1 blocker on this, too. It's not new and there are workarounds.

What do you mean by workarounds? Running fsck in the dracut shell? It's not as easy if you are using a more complex disk layout. And I didn't even receive that shell on bare metal. And it's game over for common users. Or do you mean setting e2fsck.conf? That is a prior-problem workaround, not a post-problem workaround, and no one knows about it anyway.

(In reply to comment #13)
> dennis: this isn't anything new in f16, though...

I never understood this concept of "once broken, always broken". Does that mean that we will never fix such bugs? I understand that sometimes it is useful to know whether this is an immediate regression or a long-standing problem and decide accordingly. But it's not a mantra. In this case F15 doesn't really matter, it's too severe.


Guys, try to look at it from the perspective of first-time Linux users (these people are what drives me to improve Linux, I don't know what your driver is). When they set an "incorrect" time, they are screwed. If someone puts his experience into a blog post ("Just set your system time to the past and Fedora will never boot again, hah hah!"), people will die of laughter at how easy it is to break the system. Imagine that Microsoft released their system behaving this way. It would hit the headlines all over the world. This problem is embarrassing from a PR perspective and from a common user experience perspective.

I understand there are technical reasons why the current state is what it is. Maybe some operations in the disk journal may go wrong if you mount the system with past time. Maybe there are other reasons. Maybe it's just a whim of e2fsprogs people and it's not that important at all. But I would like to have some reasonable solution in F16 if possible (let's say there is no risk of losing data so we can just set the option in /etc/e2fsck.conf) and some more proper solution in the future (fsck giving special return code about this, or recognizable output, or running full fsck automatically just to be sure nothing's broken, or the tools for setting system time may add some one-time filesystem flag that time inconsistency shouldn't break next mount because it's expected, whatever).

Even if this is not accepted as a blocker, can we still pull off some easy patch to have this fixed in F16?

Comment 18 Kamil Páral 2011-10-26 10:53:39 UTC
Adding CommonBugs keyword, this should be certainly documented if released this way.

Comment 19 Kamil Páral 2011-10-26 12:09:59 UTC
Created attachment 530275 [details]
no prompt on bare metal

This is how it looks on my bare metal machine. No emergency prompt whatsoever. The only option for me is to burn a LiveCD on a different machine, boot from it and fix the problem manually.

Comment 20 Eric Sandeen 2011-10-26 15:01:14 UTC
This does potentially seem like a problem in the init environment which is
worse than before.

I always hated the dropping-to-a-shell-for-the-clock, but it was semi-common
and fixable.

Shouldn't a fsck failure drop to the shell here too?  That seems like the most
severe bug.  Unfixable fs corruption for any reason (not just the clock) would
land us in the same stuck state.

Comment 21 Adam Williamson 2011-10-26 15:50:30 UTC
harald, can you please look at why dracut isn't showing a prompt here?




Comment 22 Adam Williamson 2011-10-26 15:58:50 UTC
if fixing the lack of prompt in dracut isn't plausible, i wouldn't really hate just disabling the check in e2fsprogs, frankly. It does seem like a dumb thing to consider a critical failure.




Comment 23 Eric Sandeen 2011-10-26 16:05:40 UTC
Changing the behavior by adding /etc/e2fsck.conf will probably require that that file get sucked into the init environment, as well.

-Eric

Comment 24 Adam Williamson 2011-10-26 20:15:19 UTC
I'm re-proposing this as our understanding has changed: the lack of a prompt is new, as far as we know, and makes the problem more serious, as you can't just manually run fsck to escape.

Comment 25 Adam Williamson 2011-10-26 23:06:21 UTC
so, if we wanted to 'fix' the date being a killer, we'd have to adjust e2fsck.conf in e2fsprogs and then I think add:

dracut_install -o /etc/e2fsck.conf

to /usr/share/dracut/modules.d/99fs-lib/module-setup.sh in dracut, AIUI.

Still, not getting a recovery console when there's a critical error seems like a problem that might still be a problem if you hit some *other* critical error too, and we'd really need harald to know why that happened. I'll see if I can reproduce it on my laptop, just a sec...

Comment 26 Adam Williamson 2011-10-27 06:40:03 UTC
So I just tested on my laptop, and I *do* get the fsck errors, but I did get a prompt for both / from dracut and /home from systemd.

the thing that's not giving kparal a prompt looks to be systemd, not dracut, so re-assigning there for now.

lennart or kay, can you look at this?

kamil, can you try a few times and see whether systemd never gives you a prompt, or sometimes does?

when systemd does its fsck, note, the system's /etc/e2fsck.conf would be in play, I believe.

Comment 27 Kamil Páral 2011-10-27 07:47:07 UTC
Progress report: I found out that I can work around the missing prompt by adding rdbreak=pre-mount to kernel options. Then I can manually 'fsck /dev/sda2' and continue booting. After I have done that, I received one more error about /dev/mapper/VolGroup-lv_home, but I was given an emergency shell in this case. So currently it seems I get a dracut or emergency shell for logical volumes, but not for classic partitions.

Will investigate further.

Comment 28 Kamil Páral 2011-10-27 09:42:47 UTC
I tested on a fully updated i686 machine:
systemd-37-2
dracut-013-16

The trick with rdbreak=pre-mount (or rdbreak=mount) is useful for correcting the issue, but system doesn't boot after that and is very reluctant to reboot, so you'll probably need to hard-reboot (sysrq is not enabled by default). Next boot is fine.

I have made several attempts at setting back time [1]. Sometimes I received an error about lv_root, sometimes I did not. But I always (in a dozen reboots) received an error about /dev/sda2 (/boot partition) and I never received a recovery shell for that.

Please note that there is no file /etc/e2fsck.conf in my system.

=================

I re-installed the machine (i386 non-PAE, TC2 DVD, default package set), enabled updates-testing, fully updated, used default layout (same as before):
/dev/sda1  BIOS boot
/dev/sda2  /boot
/dev/sda3  VolGroup
VolGroup: lv_root, lv_home, lv_swap

After I changed the time and rebooted, I received a dracut shell for lv_root, fixed the issue, continued booting, received an emergency shell for sda2, fixed the issue, continued booting, and received an error for lv_home but did *not* receive any shell. Therefore the problem is a little different than before. After reboot I received an emergency shell for lv_home and fixed the problem, but the system didn't continue booting (similarly to when I used rdbreak), so I had to reset it. After reboot everything started fine.

After I changed the time again, I received a dracut shell for lv_root, fixed the issue, continued booting, and received an error for sda2 but did *not* receive any shell (the same behavior as in the original report).

On all of the subsequent attempts I never received a shell for sda2, only error.

It means that the behavior may differ for the first time you do time manipulation, but all the following attempts work the same (for me).


[1] Use gnome clock to set the time. I had some issues with "date" command. I will investigate further on it, just a heads-up, use gnome clock.

Comment 29 Adam Williamson 2011-10-27 15:50:13 UTC
it's always systemd-fsck that gives you the error with no prompt, yes?




Comment 30 Adam Williamson 2011-10-27 16:37:48 UTC
tested again on my laptop, and I still get a prompt for two partitions. dracut prompts me for / (which is identified by UUID), and systemd for /boot (/dev/sda2) . I can attach a picture of the prompt.

A thought occurs: I'm testing this not by setting the time in Fedora but by doing it in the BIOS. Why wouldn't that always be a relatively easy workaround for this? Just go into the BIOS and set the time forward?




Comment 31 Adam Williamson 2011-10-27 17:07:01 UTC
Since we have differing experiences in this one, can some other people test? To test, just reboot, go to BIOS, set the clock back a couple days or a month or whatever, and then boot to Fedora. You should get a prompt from dracut about the failure and it'll tell you to run fsck manually on some disk: do so, and then continue the boot. during boot, systemd should then have the same issue. it *should* ask for your root password then give you a prompt from which you can run the fsck manually and then continue boot. let us know if it does or doesn't. thanks!

Comment 32 Adam Williamson 2011-10-27 17:07:17 UTC
once you've finished testing you can of course just boot back to BIOS and fix the date.

Comment 33 Eric Sandeen 2011-10-27 17:23:08 UTC
Just a note:

/etc/e2fsck.conf is not currently installed on fedora.  It'd be a change to e2fsprogs to do so.

-Eric

Comment 34 Bill Nottingham 2011-10-27 19:29:42 UTC
Can't we just flip the compiled-in default in e2fsprogs? (Yes, and likely annoy upstream.)

Comment 35 Eric Sandeen 2011-10-27 20:18:25 UTC
We can fork anything we like, I suppose ;)

Seems like it might be better to use the existing upstream methods, though, no?

Comment 36 Bill Nottingham 2011-10-27 20:21:17 UTC
Sure, just wondering whether it's easier to change the default in cases where we can logically state 'upstream default is bad'.

Comment 37 Eric Sandeen 2011-10-27 20:37:33 UTC
to be honest, I do have one patch I've carried for years :(

Comment 38 Tim Flink 2011-10-27 20:46:49 UTC
I was able to work around this pretty easily when I changed the system clock after install but before first reboot (which is my understanding of the blocker issue) but I want to reproduce that again before describing it here since my subsequent "reproductions" have been different.

To put my system into this situation after first reboot, I've been doing 'touch /forcefsck', rebooting and setting my system clock back >= 1 month before the system comes back up.

When I do this, dracut errors out about fsck errors on my lv mounted as / and gives me a repair shell. I am able to fsck the / lv and continue the boot process.

Later on, systemd errors out with fsck errors on my /boot partition but gives me no option to use a maintenance shell. I can reboot any number of times and end up in the same place with the same errors.

However, if I use the 'fastboot' kernel parameter, all of a sudden systemd gives me the option to get into the maintenance shell so I can fix the fsck errors. Once I do this and exit the maintenance shell, systemd shows fsck errors for my /home lv and does not give the option to use a maintenance shell.

If I reboot one more time (without fastboot), systemd fails with fsck errors on my /home lv again but gives me the option of using the maintenance shell. Once I run fsck on my /home lv, the boot process finishes without any more problems.

I've been able to do this twice on my system but this doesn't feel quite right. It seems odd to me that the 'fastboot' param would affect systemd in this way.

Comment 39 Adam Williamson 2011-10-27 20:59:10 UTC
notting: we _can_, yes. but in that case we'd want to make sure dracut respects the change, so we'd have to pull the config file into the initramfs, I think. and then we haven't really solved the underlying bug, which appears to be systemd not always giving you a rescue environment when fsck fails: we've just removed one possible cause of fsck failing. there are others, of course...




Comment 40 Tim Flink 2011-10-27 21:26:31 UTC
I've been able to do this twice now, so it doesn't seem to be a fluke.

With a fresh minimal install of i686 F16 on bare metal, I moved the system clock back 2 months after the installation but before the first reboot.

When I reboot the system, I end up in the dracut shell with fsck errors on my / lv. When I fix those and exit the dracut shell, systemd halts booting with more fsck errors about my /home lv and gives me no option to use the maintenance shell to fix it.

If I add 'fastboot' as a kernel parameter after rebooting, I am able to boot with no problems. Since that mounts everything, the last mount time is changed and subsequent boots are fine.

After getting a working system, I tried setting the system clock ahead by 2 months to trigger the usual fsck runs at boot. There were no failures and the system is still booting normally.

Can someone else try this as a workaround? Note that I've only been able to get this exact method to work on first reboot. This doesn't work when I've forced fsck on boot (see comment 38).

Comment 41 Adam Williamson 2011-10-29 02:25:21 UTC
this is now the last remaining blocker. i'm still yet to hit a situation with this bug which couldn't reasonably be worked around, so i'm kind of shading -1 on it.

here's a thought: for those who get stuck with 'no prompt' from systemd can you try just blind-typing your root password and see if it dumps you to a prompt?




Comment 42 Adam Williamson 2011-10-29 02:26:19 UTC
fastboot does work as a workaround here, too.

Comment 43 Kevin Fenzi 2011-10-29 02:48:47 UTC
I think making sure we have docs/workaround noted is good enough for now, so -1 blocker for me.

We could also fix this in a update once we have had time to come up with a clean and tested solution in rawhide.

Comment 44 Adam Williamson 2011-10-29 02:56:47 UTC
yes, it is susceptible to fixing with an update, another consideration.




Comment 45 Adam Williamson 2011-10-29 03:56:16 UTC
well, the present vote total is clearly not enough to accept this and all other blockers are addressed, so for at least the purposes of getting RC1 spun, let's mark this as rejected. We can re-test in more depth with RC1 and re-consider it if our understanding changes further.




Comment 46 Lennart Poettering 2011-11-01 21:57:50 UTC
Hmm, so what are you expecting me to fix in systemd here?

Comment 47 Kamil Páral 2011-11-02 12:46:03 UTC
Lennart, I believe the problem we currently see in systemd related to this bug is that if systemd-fsck fails, we receive the emergency console only sometimes, not every time.

Comment 48 Jóhann B. Guðmundsson 2012-01-29 15:56:20 UTC
Is this still an issue or can this bug be closed?

Comment 49 Kamil Páral 2012-01-30 08:28:27 UTC
I'll confirm when some F17 compose is out.

Comment 50 Nathanael Noblet 2012-01-30 19:31:33 UTC
The latest rawhide VM I have (updated today) won't boot to a console. I get into starting services, but then suddenly I have a systemd-fsck line and then nothing. It sits there for a while. I'll re-post if it ever completes. However, it seems the problem is still present, even if the symptoms are slightly different.

Comment 51 Oleg 2012-02-12 10:38:27 UTC
> To sum this up. Setting back time in Fedora for more than one day is a
> showstopper for a common user. Dracut shell = end of game.
I would like to point out the issue with the common user: I have a common user using Fedora after a Windows virus incident, so that I don't have to solve such an emergency again. But it is troublesome to explain what is going on with the drop to shell described in this bug and why he has to input cryptic commands instead of "just using" the system.

Please, consider automating this. If you need a new bug for "make boot user friendly", I will post it.

Note: I have Fedora 16 and issue in this bug is very common. It happens almost every time when suspend/hibernate does not succeed.

Comment 52 Kamil Páral 2012-02-20 12:38:49 UTC
I have tried in a VM. The issue is still present. Maybe even worse, because gnome-shell was oopsing on me since I set back time. However, I hit several issues with setting time with regards to KVM, so I'll re-test properly on bare metal soon.

Comment 53 Kamil Páral 2012-02-28 13:50:00 UTC
I have done some research and here are the results:

1. This problem still exists in Fedora 17. Both of them: a) Fedora does not boot if you set back time, and b) you don't get maintenance shell in specific scenarios

Reproducer:
a) Use the default partitioning
/dev/sda1 1MB GPT
/dev/sda2 500MB /boot
/dev/sda3 rest lvm
lvm contains:
lv_root, lv_home, lv_swap

b) If you set back time more than 1 day and reboot, you'll encounter dracut shell for vg-lv_root (see f17_virt_problem1.png). Fixing the problem and continuing boot will print out errors about /dev/sda2 (see f17_virt_problem2.png). You can't fix it, no maintenance shell provided. After reboot you'll skip vg-lv_root problem (that is already fixed), but you'll again encounter /dev/sda2 problem. If you are using plymouth, the boot will appear as "stuck" indefinitely (see f17_virt_splash.png, but you can use Esc to switch to text mode). You can fix this only by reverting system time in BIOS, using a rescue CD or some higher magic (like forcing dracut shell from grub).

c) If you use a different partitioning setup without lv_home, you'll be given a maintenance shell for /dev/sda2 problem, so you can fix it. The boot will hang after that, but next reboot is fine.

2. I was able to reproduce this consistently with a bare metal machine and with a VM. But be aware that when working with VM it can be tricky to change the system time, read http://kparal.wordpress.com/2012/02/28/changing-system-time-in-a-virtual-machine/

3. This problem was already reported as bug 577126 and minor fixes were introduced (like ignoring +-24 hours time changes).

4. This problem is largely prevalent, even though it might not seem so:
* Dennis Gilmore confirmed he saw system time changes in VMs often.
* Comment 51 confirmed this might happen after failed suspend/hibernate, and also the current fix procedure is "cryptic commands magic" from a common user perspective.
* There are Fedora admins who disabled fsck on boot completely (which is madness) just because they need an environment where the system survives users playing with it (including playing with time):
http://unix.stackexchange.com/questions/8409/how-can-i-avoid-run-fsck-manually-messages-while-allowing-experimenting-with-s
* System time can be changed knowingly, by accident, or by bug. For every approach there are many test cases. E.g.: I have already tried to reproduce several bugs by changing system time. Or have your kid play with your computer and you'll end up with shifted time very soon.
* Probably the best evidence how this issue is wide-spread is this link:
https://www.google.com/search?q=Superblock%20last%20mount%20time%20is%20in%20the%20future
Just skim through it and you'll be amazed how often this can break.
* Fedora users are believed to be much more technical than e.g. Ubuntu users. Still I'd bet anything that more than 50% of all Fedora users never used fsck and the maintenance shell is a showstopper for them.

5. I tried to reproduce the same problem in Ubuntu. I have not used LVM (the default installer doesn't support it), just standard partitions, swap and /. After setting back time and rebooting, the graphical splash gives you some options for what to do (see ubuntu-splash.png). Hitting F (fix) or I (ignore) will let you boot into your system without problems. If you boot without splash, you'll get a similar choice in text mode (see ubuntu_problem.png). Hitting F will fix the problem and reboot the machine (see ubuntu_problem_fixed.png). It's not perfect, but it's a far superior user experience.

6. I didn't find any information why e2fsck upstream is so obsessed with time consistency. Eric, you seem like you have some contacts there, could you ask them, please?

7. If there are no clear risks involved, I'd like to set broken_system_clock = 1 in /etc/e2fsck.conf. My biggest argument is: You have to fix the superblock last mount time anyway, so why don't we fix it automatically? Nagging the user to write the commands by hand doesn't help anything.
Another approach (with lowered user experience, but maybe safer) is to provide users with an easy way to trigger fsck (without issuing manual commands), similarly to what Ubuntu does. "Hit F to fix automatically, M for manual recovery, ...". It can be handled by dracut and plymouth.
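
The check discussed in items 3 and 7 can be modelled roughly as follows. This is a simplified Python sketch, not e2fsck's actual code: the function name, the `fudge` parameter, and the exact comparison are assumptions based on the 24-hour tolerance and the broken_system_clock option mentioned above.

```python
from datetime import datetime, timedelta

# Assumed tolerance, following the "+-24 hours" fix mentioned in item 3.
TIME_FUDGE = timedelta(hours=24)


def last_mount_in_future(last_mount, now, broken_system_clock=False, fudge=TIME_FUDGE):
    """Return True if the superblock's last mount time would be flagged.

    Simplified model: the check is skipped entirely when broken_system_clock
    is set; otherwise the mount time is flagged only if it is ahead of the
    current clock by more than the tolerance.
    """
    if broken_system_clock:
        return False
    return last_mount > now + fudge


if __name__ == "__main__":
    # Timestamps taken from the dracut log in the original description.
    last_mount = datetime(2011, 10, 25, 14, 40, 8)
    now = datetime(2011, 9, 25, 14, 41, 50)
    print(last_mount_in_future(last_mount, now))
    print(last_mount_in_future(last_mount, now, broken_system_clock=True))
```

In this model a clock set back by a month trips the check, a clock set back by an hour does not, and setting broken_system_clock disables the check regardless of the difference.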


As you see, there are multiple issues involved. Either e2fsck defaults have to be changed, or new functionality in dracut and plymouth has to be implemented, or at least the recovery console has to be offered by systemd. I'd like this bug to be used for the first or second issue, and I can report a separate bug against systemd if this doesn't seem likely to be resolved soon.

Reassigning to e2fsprogs. Also proposing as F17 Final blocker.
(It was rejected as a blocker for F16, but the situation was a bit different back then: it was close to the Final release date and the lack of a maintenance shell happened just occasionally.)

Comment 54 Kamil Páral 2012-02-28 13:50:59 UTC
Created attachment 566310 [details]
f17_virt_problem1.png

Comment 55 Kamil Páral 2012-02-28 13:51:08 UTC
Created attachment 566311 [details]
f17_virt_problem2.png

Comment 56 Kamil Páral 2012-02-28 13:51:18 UTC
Created attachment 566312 [details]
f17_virt_splash.png

Comment 57 Kamil Páral 2012-02-28 13:51:38 UTC
Created attachment 566313 [details]
ubuntu-splash.png

Comment 58 Kamil Páral 2012-02-28 13:51:49 UTC
Created attachment 566314 [details]
ubuntu_problem.png

Comment 59 Kamil Páral 2012-02-28 13:51:56 UTC
Created attachment 566315 [details]
ubuntu_problem_fixed.png

Comment 60 Eric Sandeen 2012-02-28 15:46:40 UTC
dracut/systemd/whateveritis definitely needs to be fixed if under these circumstances no rescue shell is offered to correct the problem, right?  Please do file that bug.

As for e2fsck, I still say it is working as designed, even if we don't like the way it's designed.

Perhaps routing around the damage in e2fsck.conf is the way to go.

Comment 61 Kamil Páral 2012-02-28 16:34:07 UTC
(In reply to comment #60)
> dracut/systemd/whateveritis definitely needs to be fixed if under these
> circumstances no rescue shell is offered to correct the problem, right?  Please
> do file that bug.

I filed bug 798328. I proposed it as F17 Final blocker, which means it can lower the probability of accepting this one as a blocker as well.

Comment 62 Adam Williamson 2012-02-28 17:16:20 UTC
eric: I'd say this comes under the heading of 'reasonable distribution customization' - I don't think we should interpret Fedora 'stick close to upstream' policy *so* strictly that we refuse to change a default configuration option which certainly seems capable of causing considerable user pain, as comprehensively described by Kamil.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 63 Eric Sandeen 2012-02-29 02:28:23 UTC
Adam, that's fine as long as there's not some dire consequence that I'm missing.  Just sanity checking upstream, I don't mind turning it off, barring unforeseen consequences.

Comment 64 Harald Hoyer 2012-02-29 18:03:57 UTC
dracut includes /etc/e2fsck.conf, if present, since dracut-014, and I agree with making the failcase nicer.

I will do an Ubuntu like key select thing.

Comment 65 Kamil Páral 2012-03-01 07:43:44 UTC
(In reply to comment #64)
> dracut includes /etc/e2fsck.conf, if present, since dracut-014, and I agree with
> making the failcase nicer.
> 
> I will do an Ubuntu like key select thing.

That is great, thanks. IIUC, even if we set broken_system_clock = 1, there are still use cases where the "Ubuntu like key select thing" gets used, like fsck after a hard power-off.

I'd even say that if we have this functionality, we don't need broken_system_clock = 1. But that depends on a) whether there are risks related to this setting and b) how fast the new functionality will be available. We can switch to broken_system_clock = 1 now and consider reverting it once the new functionality is present and working.

Comment 66 Jonathan 2012-03-21 18:20:20 UTC
I'll just chime in with a non-power-user opinion. System clock inconsistency at the boot fsck just killed my boot process. Yes, the system clock was set wrong, I have no idea why it was suddenly wrong. As this happened at reboot after a major system update, I assumed that something broke in the update, not that unusual, and spent hours googling and rolling back updates to revive the system. Setting the clock right in BIOS finally made it work.

Now to my opinion: it seems that an incorrect system clock that trips the boot is not that uncommon. Killing the boot process is pretty harsh. How about it just suggests that the clock might be wrong and offers (but maybe does not recommend) to ignore this error once and try to resume the boot? Leaving a novice user with this problem will make them leave Linux.

Comment 67 Michael Rado 2012-04-04 22:25:53 UTC
I've found that several of the changes mentioned in this thread are required to remove the sanity check for last mount time at boot.

Here's my process which worked, three easy steps ;)

1. Disable the initramfs in your /boot/grub.conf (I use grub instead of grub2) file.  This is required because the initramfs performs a disk check which includes this sanity check.  

WARNING: The initramfs is required if you are using softraid (md) devices or lv's.

2. Move /lib/systemd/systemd-fsck to /lib/systemd/systemd-stopsMyBootUp, then recreate it with:

# echo "exit 0" > /lib/systemd/systemd-fsck

WARNING: This stops systemd's fsck, so no disk check will be run at boot time.

3. Add "broken_system_clock = 1" to your "/etc/e2fsck.conf"; if you don't have this file, create it with:

# echo "broken_system_clock = 1" > /etc/e2fsck.conf

Comment 68 DuWayne 2012-04-20 16:47:13 UTC
For a training class, we have nearly 200 laptops that must be rolled back in time every week to a fixed date for the class start (kinda like Groundhog Day).  We are running a Fedora VM using VMware Player on a Windows 7 laptop.

I need a configuration that will start successfully when the clock is set back one week, without having to run fsck on 200 laptops to fix this system error.

Comment 67 does not describe the three easy steps in enough detail.  I need to know exactly what I need to change so that these VMs will start every week when the clock is set back, without having to run fsck.  For example, how do I disable initramfs?  Show me the lines in grub.conf before and after editing.

Also, do I really need to disable all file system integrity checks?  It seems like all I really want is to disable the fatal start errors when the system date is earlier than the last file system mount time, or to prevent the system from writing and tracking the last file system mount time.  As was stated earlier in this bugzilla, if Windows did something like this, where a cryptic message halted system start, it would not be tolerated.
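
For readers following along, the condition being discussed can be sketched as a simple comparison. This is only an illustration of the idea, not e2fsck's actual code; the function name and the 24-hour tolerance (FUDGE) are assumptions, the latter based on how the e2fsck.conf man page describes accept_time_fudge.

```shell
#!/bin/sh
# Sketch of the "last mount time is in the future" test.
# Arguments are seconds since the epoch.
FUDGE=86400   # assumed tolerance: 24 hours

# Succeeds (exit 0) when the recorded mount time is ahead of "now"
# by more than the tolerance.
mount_time_in_future() {
    last_mount=$1
    now=$2
    [ "$last_mount" -gt $((now + FUDGE)) ]
}

# Example (not run here): a clock set back one week trips the check.
#   mount_time_in_future 1319553608 $((1319553608 - 604800)) && echo "would fail"
```

On a real system the recorded value can be inspected with "dumpe2fs -h <device>", which prints a "Last mount time:" line from the superblock.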

Comment 69 Adam Williamson 2012-04-20 19:15:58 UTC
Discussed at 2012-04-20 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-04-20/fedora-bugzappers.2012-04-20-17.01.log.txt . We had no devel or releng or FPL representation at the meeting and the call is clearly controversial (we had no clear consensus of those who were present), so we agreed to punt the decision to a meeting that is better attended. This bug remains a proposed blocker.

Comment 70 Eric Sandeen 2012-04-20 21:07:55 UTC
I think that to really fix this, whatever makes the initramfs these days (dracut, I guess :) will need to pick up /etc/e2fsck.conf, if it exists.

once it does, putting:

broken_system_clock = 1

into /etc/e2fsck.conf should suffice for now.

....

You can include it in the initramfs manually with, e.g.:

# dracut -v -I /etc/e2fsck.conf -f /boot/initramfs-3.3.2-1.fc16.x86_64.img 3.3.2-1.fc16.x86_64

Presumably you could write a dracut module to do it too, or maybe a core dracut change could be made, I'm not certain.
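
After rebuilding, it may be worth confirming that the file really landed in the image. A minimal sketch, assuming the lsinitrd tool shipped with dracut is available; the helper name listing_has_e2fsck_conf is made up for illustration and simply scans a file listing on stdin.

```shell
#!/bin/sh
# Reads an initramfs file listing on stdin and succeeds if one of the
# entries ends in etc/e2fsck.conf.
listing_has_e2fsck_conf() {
    grep -q 'etc/e2fsck\.conf$'
}

# Typical use on a dracut-based system (not run here):
#   lsinitrd /boot/initramfs-"$(uname -r)".img | listing_has_e2fsck_conf \
#       && echo "e2fsck.conf is in the initramfs"
```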

Comment 71 Eric Sandeen 2012-04-20 21:09:33 UTC
So this probably has 2 pieces; add the prescribed e2fsck.conf to e2fsprogs, and teach dracut and/or mkinitrd to pick it up.

-Eric

Comment 72 Eric Sandeen 2012-04-20 21:12:12 UTC
Harald, I'm game to toss /etc/e2fsck.conf in if dracut can pick it up...

Comment 73 Eric Sandeen 2012-04-20 21:22:45 UTC
# date
Sun Apr 20 16:27:42 EST 1980

That's after a reboot with no trouble :)  I just did what I mentioned above: added e2fsck.conf and rebuilt the initramfs.

I'll stick e2fsck.conf in e2fsprogs now, sorry for the delay.  Dracut will need to pick it up too.

Comment 74 Fedora Update System 2012-04-20 22:25:29 UTC
e2fsprogs-1.42-3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/e2fsprogs-1.42-3.fc17

Comment 75 Eric Sandeen 2012-04-20 22:26:49 UTC
Ok, I don't know whether the problem of not getting a shell to be able to fix the error has been addressed elsewhere, but the config file is in the package now at least.  Guess I'll file a dracut bug to pick it up.

Comment 76 Eric Sandeen 2012-04-20 22:31:20 UTC
Bug 814874 - Please include /etc/e2fsck.conf in initramfs

Comment 77 Fedora Update System 2012-04-21 21:01:09 UTC
Package e2fsprogs-1.42-3.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing e2fsprogs-1.42-3.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6299/e2fsprogs-1.42-3.fc17
then log in and leave karma (feedback).

Comment 78 Michael Rado 2012-04-21 23:08:39 UTC
Hi All,

This issue is complicated.  The last mount time "sanity" check is in 2 places, either of which will stop the boot and require an fsck or a "ctrl + d" to ignore.

1. /lib/systemd/systemd-fsck (which does NOT respond to the /etc/e2fsck.conf config)

 - Run this to remove this tool thus disabling the check:

 #  echo "exit 0" > /lib/systemd/systemd-fsck

2. /lib/systemd/systemd-fsck exists in two places, in dracut (the initramfs) and in the main fs run by systemd during boot.

 - Run this to disable it in your initramfs (dracut)

 # echo "exit 0" > /lib/systemd/systemd-fsck
 # dracut --install /lib/systemd/systemd-fsck

3. Also, using "system-config-date" and unchecking "System clock uses UTC" will prevent the time change that occurs during boot when switching between "other" OSes and Fedora.
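
To see which mode is currently recorded for the hardware clock, the third line of /etc/adjtime can be read. A minimal sketch; the function name hwclock_mode is hypothetical, and falling back to UTC when the file is missing is an assumption based on how hwclock documents its default.

```shell
#!/bin/sh
# Print "UTC" or "LOCAL" depending on the third line of an adjtime-style file.
# Falls back to "UTC" when the file or the line is missing (assumed default).
hwclock_mode() {
    file=${1:-/etc/adjtime}
    mode=$(sed -n '3p' "$file" 2>/dev/null)
    case "$mode" in
        LOCAL) echo "LOCAL" ;;
        *)     echo "UTC" ;;
    esac
}

# Example (not run here):
#   hwclock_mode    # reads /etc/adjtime
```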



(In reply to comment #77)
> Package e2fsprogs-1.42-3.fc17:
> * should fix your issue,
> * was pushed to the Fedora 17 testing repository,
> * should be available at your local mirror within two days.
> Update it with:
> # su -c 'yum update --enablerepo=updates-testing e2fsprogs-1.42-3.fc17'
> as soon as you are able to.
> Please go to the following url:
> https://admin.fedoraproject.org/updates/FEDORA-2012-6299/e2fsprogs-1.42-3.fc17
> then log in and leave karma (feedback).

Comment 79 Eric Sandeen 2012-04-22 22:37:21 UTC
Michael, I think you are making the issue too complicated.  There should be no need to disable systemd-fsck.

All that should be required is the presence of the new /etc/e2fsck.conf in the initramfs, something which should be taken care of automatically if/when dracut is updated per bug 814874 and the newer e2fsprogs is installed.  Until then, you can rebuild the initramfs and manually include /etc/e2fsck.conf from an updated e2fsprogs.

Once the config file is present in the initramfs, e2fsck won't exit with error due to large clock offsets, and boot will continue as normal even if the clock is off.

Can you show otherwise?

Comment 80 Michael Rado 2012-04-23 00:01:30 UTC
I believe /lib/systemd/systemd-fsck is the application stopping the bootup due to the large time gap.  It's not part of the e2fsprogs package (according to rpm -ql), and doesn't seem to respond to the /etc/e2fsck.conf file, at least in my testing on F15.

I found "disabling" /lib/systemd/systemd-fsck in both the initramfs and the root fs was the easiest way to avoid the stop during boot.

Comment 81 Michael Rado 2012-04-23 00:28:14 UTC
/lib/systemd/systemd-fsck is started by the fsck-root service.  I noted in the fsck.c file in the systemd source that systemd-fsck seems to call "/sbin/fsck" using the execv call with the fsck args -a -T -l (and others depending on state).

That being the case I'm not sure why /sbin/fsck in the initramfs did not respond to the /etc/e2fsck.conf file installed into the initramfs when executed by systemd-fsck.  

Eric, in your testing did you find that installing the /etc/e2fsck.conf file prevented the stop at boot?
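
For reference when reading the boot messages, the exit status that fsck hands back to its caller is a bit mask. The sketch below decodes it using the values documented in the fsck man page; the function name and the short labels are made up for illustration. The "e2fsck returned with 4" in the original description corresponds to errors left uncorrected.

```shell
#!/bin/sh
# Decode the bit mask that fsck returns as its exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return
    fi
    out=""
    [ $((status & 1)) -ne 0 ]   && out="$out errors-corrected"
    [ $((status & 2)) -ne 0 ]   && out="$out reboot-needed"
    [ $((status & 4)) -ne 0 ]   && out="$out errors-left-uncorrected"
    [ $((status & 8)) -ne 0 ]   && out="$out operational-error"
    [ $((status & 16)) -ne 0 ]  && out="$out usage-error"
    [ $((status & 32)) -ne 0 ]  && out="$out cancelled"
    [ $((status & 128)) -ne 0 ] && out="$out shared-library-error"
    # Strip the leading space left by the first appended label.
    echo "${out# }"
}
```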

Comment 82 Eric Sandeen 2012-04-23 04:34:44 UTC
(In reply to comment #80)

> I found "disabling" /lib/systemd/systemd-fsck in both the initramfs and the
> root fs was the easiest way to avoid the stop during boot.

Of course, but that's a big hammer.  If that completely disables any boot-time fsck, then yes, I would expect it to avoid the problematic fsck return value, but it's hardly an optimal solution.

(In reply to comment #81)

> Eric, in your testing did you find that installing the /etc/e2fsck.conf file
> prevented the stop at boot?

In my initial testing, yes.  Before, boot stopped, because e2fsck failed, because the time was wrong.  The conf file tells e2fsck not to fail due to time differences, so I expect this to be a complete solution once it's present in the boot environment.

Adding it to the initramfs manually, and then setting back the time to 1980, booted fine for me.

You found otherwise?

Comment 83 Eric Sandeen 2012-04-23 04:55:41 UTC
Just remembered that setting the clock to 1980 is a bad test :/  e2fsck decides on its own that that is "too far back" and automatically ignores the time difference :(  grumble grumble.

And, crud, the format of the conf file I pushed was wrong.  If you want to re-test, try this in e2fsck.conf:

[options]
broken_system_clock = 1

I had missed the [options] header, and without that it doesn't work.

I'll build a new e2fsprogs :/
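
For anyone re-testing by hand, a small sketch that writes the file with the header in place. The function name and the default scratch path are assumptions chosen so the snippet is safe to run; on a real system the target would be /etc/e2fsck.conf, written as root.

```shell
#!/bin/sh
# Write an e2fsck.conf that carries the [options] header.
# Defaults to a scratch file in the current directory.
write_e2fsck_conf() {
    target=${1:-./e2fsck.conf.example}
    cat > "$target" <<'EOF'
[options]
broken_system_clock = 1
EOF
}

# Example (not run here):
#   write_e2fsck_conf /etc/e2fsck.conf
```

The initramfs still has to be rebuilt afterwards so that the copy used at boot contains the new file.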

Comment 84 Fedora Update System 2012-04-23 05:09:52 UTC
e2fsprogs-1.42-4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/e2fsprogs-1.42-4.fc17

Comment 85 Adam Pribyl 2012-04-23 08:11:03 UTC
Just CCing myself, as I hit this on systems that do not have an RTC or an RTC backup battery (and there are plenty of them, usually embedded devices).

Comment 86 Eric Sandeen 2012-04-23 14:33:17 UTC
Adam, feel free to test the latest e2fsprogs-1.42-4.fc17, make a new initramfs including the new e2fsck.conf file, and see if that resolves it for you.

-Eric

Comment 87 Eric Sandeen 2012-04-23 14:48:39 UTC
Ok, dracut-018-23.git20120419.fc17 has been built, and automatically includes e2fsck.conf.

So with both updates in place, re-running dracut (or installing a new kernel) will create a new initramfs with an e2fsck which should not stop on the clock problem.

-Eric

Comment 88 Fedora Update System 2012-04-24 03:20:21 UTC
Package e2fsprogs-1.42-4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing e2fsprogs-1.42-4.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6496/e2fsprogs-1.42-4.fc17
then log in and leave karma (feedback).

Comment 89 Fedora Update System 2012-05-02 04:40:20 UTC
e2fsprogs-1.42-4.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.