Bug 713262 - RFE: advise/guide the user how to do a manual fsck when needed
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: dracut-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-14 20:11 UTC by George Lebl
Modified: 2014-03-17 03:38 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-01 08:58:43 UTC
Type: ---
Embargoed:


Attachments
The incredibly useful and very user friendly message. (318.58 KB, image/jpeg)
2012-01-31 23:16 UTC, George Lebl

Description George Lebl 2011-06-14 20:11:52 UTC
Description of problem:
After a power outage or kernel crash it is unfortunately common to reboot and get a very unfriendly drop to a shell with some cryptic message about fsck.  For a casual user, the wonderful, uber-friendly GNOME Shell machine has become a brick.

This happened to my wife about a year ago, and she was very mad that she could do no work for the rest of the day until I fixed her computer in the evening.

Now if you are a relatively knowledgeable user you might manage to run "fsck /dev/whatever".  Well, actually it's a pretty interesting intellectual problem to find out what "whatever" is, which as a user I don't really care about.  I think I am a knowledgeable user, and it still was not trivial for me to figure out.

Why not just put up a question: "Should I try to fix your disk?"  Perhaps with a warning about bad things possibly happening.  And then run "fsck /dev/whatever" for the user.  I myself never have any clue what to do if bad things really did happen, so I just do "fsck /dev/whatever" and answer yes to any question it asks me, because I simply don't know what it is asking me.  I just want a working system.  Perhaps I'll check my backups, etc... but I need a working system first.

Version-Release number of selected component (if applicable):
Fedora 15 latest.

How reproducible:


Steps to Reproduce:
1. Have a power failure, or a kernel crash, etc...
2. Get the "manual fsck required" nonsense
3. Become confused, curse Fedora, be unable to do actual work
  
Actual results:
Essentially a brick unless you are a very experienced user.

Expected results:
The computer offers to fix itself.  Perhaps give warnings.  But it should not require one to write "fsck /dev/whorememberswhatthedevicenameisnowdays"

Additional info:

Comment 1 Michal Schmidt 2011-06-14 22:36:29 UTC
(In reply to comment #0)
> On a power outage or kernel crash it is unfortunately common to reboot and
> obtain a very unfriendly drop to shell with some cryptic message about fsck. 

Could you quote (or take a picture of) the exact message?
It's not supposed to be "common" nowadays. What type of filesystem do you use?

Comment 2 George Lebl 2011-06-27 23:09:25 UTC
Well I don't really feel like trying to coax the system to corrupt my disks again.  It is the standard message you get when automatic fsck fails.

See e2fsprogs e2fsck/util.c function preenhalt, about line 261 in version 1.41.14

fsck returns "uncorrected" with that cryptic (for a new user) message, and systemd dumps you to a shell (src/fsck.c in systemd)
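For reference, fsck reports its result as a bitmask exit status, and the "uncorrected" case above corresponds to bit value 4 in the fsck(8) manual page. Here is a minimal Python sketch of decoding that status. The name `describe_fsck_status` is made up for illustration and is not part of e2fsprogs, systemd, or dracut.

```python
# Exit status bits as listed in the fsck(8) manual page.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(status):
    """Return a human-readable list of the conditions set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    # Several bits can be set at once, so report every one that matches.
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]
```

A caller could use this to print something friendlier than the raw status before deciding whether to offer a repair.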

As for the filesystem, this is a completely stock Fedora 15, so ext4.

It may not be supposed to be common, but crappy hardware exists: for example, my computer keeps overheating (stock Lenovo hardware again, no mods, cleaned), and that sometimes leads to these messages.  Plus bugs in the kernel, graphics drivers, etc. do exist and will always exist, so the system needs to recover gracefully by itself if it is to be used by a nonexpert.

(not to mention that I sometimes unplug the battery by mistake when handling the laptop)

Jiri

Comment 3 Fedora Admin XMLRPC Client 2011-10-20 16:28:12 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 4 Jóhann B. Guðmundsson 2012-01-25 14:01:21 UTC
Enhancement for emergency mode?

Comment 5 George Lebl 2012-01-25 20:12:57 UTC
Happened again a few weeks ago.  I left the laptop running on a bed by mistake.  I guess it overheated, and I came back to this message yet again.  I needed the laptop quickly, but it again took me a while to figure out the device name since I have an encrypted hard disk.  The message just tells me the volume name, which is utterly useless information, not to mention confusing, because it is just the name of the installed Fedora release (which is by now different since I upgraded, so it's doubly confusing).

I can imagine a really simple solution.  I guess I can submit a patch, though I can't guarantee how much testing it will get or when I'll get to it, but I can't imagine this necessarily needing a more complicated solution:

Add code that does something like

printf("Would you like me to run fsck for you? [y/n]\n");
...read y/n...
if (yes) {
  system("/bin/fsck  /dev/whatever");
  ...reboot... (possibly only after another prompt).
}
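The same idea can be sketched in runnable form. This is only an illustration in Python under stated assumptions, not what dracut does or would do: the function name `offer_fsck` and its `ask`/`run` parameters are hypothetical, the device path must be supplied by the caller, and the reboot step is left out.

```python
import subprocess


def offer_fsck(device, ask=input, run=subprocess.call):
    """Ask before checking `device`; return fsck's exit status, or None if declined.

    `ask` and `run` are injectable (hypothetical parameters) so the prompt and
    the fsck invocation can be replaced in tests.
    """
    answer = ask(f"Should I try to repair {device}? Data may be lost. [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        return None  # user declined; leave the disk untouched
    # Plain fsck without -y: the checker may still ask its own questions.
    return run(["fsck", device])
```

Passing the arguments as a list avoids going through a shell, which matters if a device path ever contains spaces or other special characters.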

I can't believe I am the only one who leaves laptops on beds, pulls out the battery by mistake, and hits kernel bugs related to suspend or X.  Given that pretty much every computer I have ever had suffered this (and it even happened to my wife, who is not a computer geek), I doubt this is uncommon.  I tend to hit such a problem at least once a year myself.  And I always have to waste at least 15 minutes just to figure out what /dev/whatever I am supposed to use because of how helpful that message is.

Comment 6 George Lebl 2012-01-31 23:14:35 UTC
Happened again.  I think it ran out of battery while suspended, but I can't be quite sure.  This time I figured out there is a /dev/disk/by-label/ directory, yay.  Not that the label is useful.  I am attaching a picture of the incredibly useful message.

Now try to imagine the effect of this event on a nontechnical user.
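The entries in /dev/disk/by-label/ are symlinks to the underlying device nodes, so the label-to-device lookup can be automated. A minimal Python sketch follows; the function name `labelled_devices` is invented for illustration, and it assumes the directory layout mentioned above.

```python
import os


def labelled_devices(by_label_dir="/dev/disk/by-label"):
    """Map each filesystem label to the device node its symlink resolves to."""
    if not os.path.isdir(by_label_dir):
        return {}  # directory is absent, e.g. no labelled filesystems found
    return {
        name: os.path.realpath(os.path.join(by_label_dir, name))
        for name in sorted(os.listdir(by_label_dir))
    }
```

Printing this mapping next to the "run fsck manually" hint would give the user an actual device path to type instead of only a label.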

Comment 7 George Lebl 2012-01-31 23:16:23 UTC
Created attachment 558721
The incredibly useful and very user friendly message.

Comment 8 Jóhann B. Guðmundsson 2012-02-01 08:58:43 UTC
First of all this is not a systemd bug. 

Secondly, if an automatic fsck check $foo is created, it will be implemented in dracut, not systemd.  Until that happens, users will have to run fsck -y when dracut drops them to the emergency shell.

Thirdly, you are running what seems to be an upgraded F15 beta, which might be what causes this ext corruption on suspend in the first place.  At least I personally experienced some weird ext corruption and behaviour (like slower boot) with suspend/resume, which I have not seen since I did a fresh install of the final release.

Running an F16 kernel on F15 is not supported (the version name change can cause unforeseeable behaviour or breakage).

There is absolutely no need for a user to install F16 kernels on F15, nor does it make any sense.

For example, 2.6.41.x is the 3.1.x kernel, and the 3.2.x kernel is called 2.6.42.x (same kernels, different version naming).
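The two examples above suggest a simple renaming rule: 2.6.(40+n) corresponds to upstream 3.n. The Python sketch below encodes that rule. The function name `upstream_version` is invented for illustration, and extending the pattern to 2.6.40 (as 3.0) is an assumption based on the same scheme rather than something stated in this comment.

```python
def upstream_version(version):
    """Translate a Fedora 15 style 2.6.4x version string to its upstream 3.x name."""
    parts = version.split(".")
    renamed = (
        len(parts) >= 3
        and parts[:2] == ["2", "6"]
        and parts[2].isdigit()
        and int(parts[2]) >= 40
    )
    if not renamed:
        return version  # not one of the renamed kernels; leave it untouched
    # 2.6.(40+n).rest -> 3.n.rest
    return ".".join(["3", str(int(parts[2]) - 40)] + parts[3:])
```

Anything that does not match the 2.6.4x pattern is returned unchanged, so genuine older 2.6 kernels are not misreported.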

File an RFE against dracut requesting that this be implemented.

Closing this bug.

Thanks.

Comment 9 Michal Schmidt 2012-02-01 09:06:28 UTC
Right, that's dracut's emergency shell. systemd is not involved.

The hint on how to run fsck could be made clearer in dracut, though I have some doubts about how a user who does not know how to run fsck will know what to answer to fsck's tricky questions. Blindly fixing everything can result in data loss.
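One way to reduce that risk is to default to a report-only pass and make the "fix everything" mode an explicit opt-in. The Python sketch below only builds the command line. The name `fsck_argv` is hypothetical, and the flag meanings are those documented for e2fsck (-n answers "no" to all questions, -y answers "yes"); other filesystem checkers may interpret them differently.

```python
def fsck_argv(device, repair=False):
    """Build an fsck command line: report-only by default, -y only when asked."""
    # -n: assume "no" to every question, so nothing is modified.
    # -y: assume "yes" to every question, which can discard data.
    return ["fsck", "-y" if repair else "-n", device]
```

A helper like this keeps the destructive variant out of the default path, so a prompt can show the read-only report first and only then ask whether to repair.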

It's also curious that you seem to be getting into this situation relatively
often. ext4 is not expected to break itself like that. Not even when running
out of battery while suspended.

Comment 10 George Lebl 2012-02-01 14:12:37 UTC
To see that it's a problem for other people, see:
http://andy-xhosa.blogspot.com/2012/01/love-of-work-overcomes-all-problems.html 

That's exactly the response I expected.  Because that's pretty much the kind of response I got from Ubuntu, "this never happens because ext4 is magic":

1) The bug is not about how I originally installed my system.  So what if it is an upgraded Fedora 15 beta?  This happens on cleanly installed systems as well.  THE CODE THAT DOES THE STUPID MESSAGE AND A DROP TO SHELL IS NOT CAUSED BY MY USING AN UPGRADED BETA.  What are you talking about, running an F16 kernel on F15?  That has nothing to do with the bug.  The bug is about the message, not about the corruption.

2) Yeah, so it happens in dracut and not systemd.  This is a bug report, not a patch.  If I knew exactly where and how to fix it, I would just fix the damn thing.  So reassign to dracut.  A bug is a bug even if it is somewhere else.

3) TO REITERATE:

Such corruption IS GOING TO HAPPEN no matter what.  It has happened on every release of every distro I have ever run in the last 15 years.

I GIVE UP!  File the RFE yourself.  I work for a living; I can't spend my time repeatedly trying to explain what is wrong when people apparently don't read what I write.  You guys feel free to keep a system that bricks itself (for a nontechnical user) when you unplug it by mistake.

Comment 11 Michal Schmidt 2012-02-01 14:27:06 UTC
[ moved as a RFE to dracut; changed title ]

