Bug 1170803

Summary:

calls e2fsck on all ext volumes, provides no status indicator, and hangs indefinitely if e2fsck doesn't exit

Product:

[Fedora] Fedora

Reporter:

Leslie Satenstein <lsatenstein>

Component:

python-blivet

Assignee:

Vratislav Podzimek <vpodzime>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Priority:

unspecified

CC:

amulhern, awilliam, blivet-maint-list, bugzilla, bugzilla, cra, cristian.ciupitu, djuran, dlehman, esandeen, fabrice, g.kaviyarasu, jan.public, jansen, jcapik, jeder, jonathan, josef, j, kparal, kzak, lsatenstein, marmalodak, marmarek, me, m_kretzschmar, mrmazda, oliver, pschindl, robatino, samuel-rhbugs, vanmeeuwen+fedora, vpodzime

Keywords:

CommonBugs, Reopened

Hardware:

All

OS:

Linux

Whiteboard:

https://fedoraproject.org/wiki/Common_F25_bugs#anaconda-fsck-slow AcceptedBlocker

Doc Type:

Bug Fix

Last Closed:

2017-09-26 01:41:19 UTC

Type:

Bug

Bug Blocks:

1396702

Attachments:

Description	Flags
RC5 /tmp as tar file 1170803.tar	none
anconda.log	none
ifcfg.log	none
program.log	none
sensitive-info.log	none
storage.log	none
Request install log with e2fsck commands	none

Description Leslie Satenstein 2014-12-04 22:11:17 UTC

Description of problem:

ANACONDA  "SERVER NETWORK INSTALL ISO" WRITTEN TO FLASH DRIVE.

Had to poweroff to boot this sha256sum tested iso. Even so, it took three tries to get iso to boot.

Then, I was unable to change the time zone (Winnipeg in place of Toronto, one time zone off).

In entering network (Wired connection) Anaconda locked up solidly. 
Not able to switch to command line.  I was changing hostname from localhost.localhost to fedora21.fedora21




Version-Release number of selected component (if applicable):

https://dl.fedoraproject.org/pub/alt/stage/21_RC5/Server/x86_64/iso/Fedora-Server-netinst-x86_64-21.iso

Dated 4 December 2014


Additional info:

I don't mind helping with the testing. I have two systems available on which I can test server net install and workstation versions.

Comment 1 Leslie Satenstein 2014-12-04 22:28:26 UTC

Same Lockup problem with Workspace Anaconda version (Probably the same version).

Cannot reset hostname.

Comment 2 Leslie Satenstein 2014-12-04 22:45:16 UTC

Cannot set timezone or select installation disk. With command line prompt, system was waiting for anaconda.

Was able to kill anaconda and do one restart. -- 2nd attempt with anaconda locked up.

Quite prepared to test again with newer RC6.

Comment 3 Leslie Satenstein 2014-12-06 15:12:22 UTC

Anaconda

RC5 Locks up with attempt to change hostname (wired access).

I hope this version is not the one for general release.

Shasum checking shows no download error. Self test indicates no error.

Was hoping to test a fully non-lockup anywhere version of Anaconda.

Reminder   Same problem with workspace and net-install versions

Comment 4 Leslie Satenstein 2014-12-06 22:59:42 UTC

I am reporting this problem because it occurs using a wired connection.

When I retest using a computer with wi-fi, Anaconda does not lock up.

With wired connection, one cannot correct the timezone or exit from the network hub after setting a hostname.

With the wireless connection, the default hostname is random, with faulty characters, and requires a manual correction.

Please test with wired connection. Did not work for me
Please test with wireless connection yes, worked for me.

tested with  network-server version and with live-Workspace version. 

Tested with two versions of RC5

Comment 5 Chris Murphy 2014-12-07 01:00:59 UTC

I tested a wired connection and can't reproduce any of the things you've reported. Logs from /tmp would help, so would journalctl -b -l -o short-monotonic. If you can't get to a shell for these things, then there are problems deeper than just the installer that implicate kernel and/or hardware.

Comment 6 Leslie Satenstein 2014-12-07 04:33:39 UTC

Created attachment 965493 [details]
RC5 /tmp  as tar file 1170803.tar

This tar file is created post anaconda lockup within the Wired spoke
some info.
CPU = 100%.
the tarfile contains all the contents of the /tmp (all logs).

I was able to return from the shell to gnome, and somehow was able to verify that I had copied the files to a secondary flash drive.

Anaconda appeared to have restarted.  I had changed the keyboard to ca (French Canada). That was shown as the default, but on return it showed as US (switched back).  Is Anaconda in there twice?

One feedback.  The help button on the wireless setup page eventually opened up, perhaps after 90 seconds.  (Note the 100% cpu)

I will follow up with individual log files in case the tar file does not arrive cleanly

Comment 7 Leslie Satenstein 2014-12-07 04:35:08 UTC

Created attachment 965494 [details]
anconda.log

anaconda.log

Comment 8 Leslie Satenstein 2014-12-07 04:36:17 UTC

Created attachment 965495 [details]
ifcfg.log

Comment 9 Leslie Satenstein 2014-12-07 04:38:31 UTC

Created attachment 965496 [details]
program.log

Comment 10 Leslie Satenstein 2014-12-07 04:39:22 UTC

Created attachment 965497 [details]
sensitive-info.log

Comment 11 Leslie Satenstein 2014-12-07 04:40:19 UTC

Created attachment 965498 [details]
storage.log

other logs = 0 bytes

Comment 12 Chris Murphy 2014-12-07 19:42:46 UTC

The last entry in program.log is e2fsck so it seems like anaconda is waiting for it to complete. 
1.Please reboot from install media.
2.Use blkid to confirm the device for fs uuid 0594544c-dc98-4ab7-af14-0840131a2ca1, volume name SeagateExt3. Previously it was sdg2 but this isn't guaranteed between reboots.
3. Run e2fsck -f -p -C 0 /dev/sdXY

What results do you get? If this comes up clean and completes, then you'll need to reproduce the freeze, find the PID for anaconda, and run 'pstack PID' and attach the results as a file. I'm not sure if pstack comes on all install media, or if you'll need to use live media and yum install it.

Comment 13 Leslie Satenstein 2014-12-08 16:42:10 UTC

Hi Chris, Sorry for the delay in responding. I was looking to my emails for bugzilla email showing activity on this bug #.

When you indicated SeagateEXT3, that was my 2 terabyte USB external backup.

I unplugged it, and retested. There is no problem any longer.  Since I will do the install without the external USB backup plugged in, I know it will succeed. 

So, I apologize for causing concern at the last minute.
BTW, while that e2fsck check was taking place cpu was 101% (on dual core).
When I saw that, I assumed wrongly, a tight loop in code.


For my own curiosity, I will try the rc5 beta again with drive plugged in and give it a half hour to run through the check.

I believe that the release notes should be updated to indicate potential delays as I experienced it with Anaconda.

Please redirect to release note person responsible (Pete Travis)

Comment 14 Chris Murphy 2014-12-08 18:29:03 UTC

Anaconda is running resize2fs -P and then e2fsck -f -p -C 0 on every ext volume, before I've even gotten past the language selection screen. It's fine for it to collect minimum size info with resize2fs -P for all ext volumes, but I think it's inappropriate to run e2fsck on everything.

My case:
INFO program: Running... e2fsck -f -p -C 0 /dev/sda1
INFO program: Running... e2fsck -f -p -C 0 /dev/loop3
INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/live-base
INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/live-osimg-min

That's recorded while I'm still at the Welcome+Language menu, so sda isn't even a chosen target for installation yet fsck is running on it? And I have no opt out? And I get no status report if it hangs? The fsck on the other three volumes fails with exit code other than 0 because they're all busy at the time the e2fsck was issued. So that's also pointless.

I think this needs to behave better.

Comment 15 Leslie Satenstein 2014-12-08 20:04:51 UTC

If there is a way to get the revised anaconda out before tomorrow, or if there is
going to be an emergency patch and you would like me to test with my system, I would like to know by email. 

To do so, update this bug report and as I am on the CC list, I will know to fetch the download to test. 

Please put the retrieval url for wget download in this response.
Leslie

Comment 16 Chris Murphy 2014-12-08 20:23:17 UTC

This is a sufficiently pernicious bug, I'm proposing as an F22 alpha blocker. 

- It's completely reasonable for users to have many, or a large, ext volume(s). 

- It's unreasonable for the installer to issue e2fsck on all volumes when they aren't related to the installation at all. The whole point of journaled file systems is to avoid potentially hours or days long fsck. e2fsck -f -p should only be run on volumes that are explicitly chosen for resize, or reuse (like /home), and then there needs to be a status UI for that during the installation phase so we know what's going on.

"When using a dedicated installer image, the installer must be able to complete an installation using the text, graphical and VNC installation interfaces. This criterion covers showstopper bugs in the installer for which there isn't any other specific criterion: obviously, it can't 'complete an installation' if there's a showstopper."

Comment 17 Adam Williamson 2014-12-08 20:37:00 UTC

I believe this comes from ce92b550a687547683bd36fe9419fb08894207fa , which was added to fix bug #1162215 . Note the explanation there:

    Run fsck before obtaining minimum filesystem size. (#1162215)
    
    resize2fs -P now requires an e2fsck -f first. The lack of a real minimum
    size left us with no useful minInstanceSize and allowed users to attempt
    a shrink down to any size, which led to failures.
    
    If we have a currentSize, set filesystem.

Comment 18 Adam Williamson 2014-12-08 20:39:30 UTC

Leslie: there are not going to be any emergency fixes. When go/no-go finishes, that is it, the release is done. Nothing will change in the release images after the Go decision.

Comment 19 Chris Lumens 2014-12-08 21:12:08 UTC

*** Bug 1169542 has been marked as a duplicate of this bug. ***

Comment 20 Leslie Satenstein 2014-12-08 21:18:13 UTC

Adam
I thought so.  I sent a note to Pete Travis to update the F21 release note.

I am not certain that the note too, is blocked from updates.  (Example, other distributions have corrected release notes after go-live).

If Pete is not the right person, please follow up.  It is important to update the wiki before tomorrow. I am not the only one to keep backup drives plugged in 24/7

Comment 21 Chris Murphy 2014-12-08 21:23:30 UTC

(In reply to Adam Williamson (Red Hat) from comment #17)
> I believe this comes from ce92b550a687547683bd36fe9419fb08894207fa , which
> was added to fix bug #1162215.

Yep, too bad as bcl predicted this bug in 1162215c21 if the check were applied to all volumes, not just user selected target devices.

Draft commonbugs text: If the system has any combination of slow, many, or large ext[234] volumes, the installer might hang. The hang could begin at any point from the Welcome & Language selection menu onward. This is due to the installer running e2fsck on all available ext volumes. Once the installer is launched, users are advised to wait for e2fsck instances to complete. As a workaround prior to launching the installer, users can physically detach devices unrelated to the installation.

Comment 22 Pete Travis 2014-12-08 21:39:41 UTC

I think commonbugs is the best place for this info, thanks for the report and wiki entry.

Comment 23 David Shea 2014-12-09 21:04:27 UTC

*** Bug 1172324 has been marked as a duplicate of this bug. ***

Comment 24 Vratislav Podzimek 2014-12-10 08:02:55 UTC

I think this actually is a bug in e2fsprogs. blivet just tries to get information about existing storage devices and formats (disks, partitions, file systems,...) and anaconda needs this information before the user does any disk/device selection. And with the follow-up patch for bug #1162215 (c0cccb8ee "Try to get FS info first before doing an FS check") no fsck should be run unless needed. 

That means that:
1) if you have a damaged or uncleanly unmounted file system on your machine before the installation, fsck is run, but it's probably not a good idea to run an installation on such a machine

2) if your filesystems are clean and *everything works as expected*, no fsck should be run

And the thing that is not working as expected here is resize2fs requiring fsck to tell blivet the minimum size of the filesystem. Or does anybody disagree with this? Please let me know because I plan to reassign this bug to e2fsprogs.

Comment 25 Chris Murphy 2014-12-10 08:47:26 UTC

Well, in comment 14 I show log entries when the UI is not further than the Welcome/Language menu, and it's running e2fsck -f -p on every ext volume available including the ext4 live system images. They can't all need fsck or we have other problems. When I do an e2fsck -f -p on an 8TB ext4 volume that's freshly created and has no files in it, it takes ~45 seconds. So 30-60 minutes for a large fs with a bunch of files in it, even if clean, doesn't seem outlandish. I'll add Eric Sandeen to the bug, see what he thinks. Eric, maybe start at comment 21's draft text for a shorter summary of this bug rather than wading through the whole thing.

Comment 26 mulhern 2014-12-10 13:21:03 UTC

<-- SNIP -->

> And the thing that is not working as expected here is resize2fs requiring
> fsck to tell blivet the minimum size of the filesystem. Or does anybody
> disagree with this? Please let me know because I plan to reassign this bug
> to e2fsprogs.

I agree that this requirement seems wrong. Also, I know that it has not always been the case, or perhaps not for all situations. For resize2fs version 1.42.8 (the version on my desktop), on the tests that I run on filesystems on loop devices, e2fsck is not required to be run to get resize2fs -P to return an apparently meaningful value.

Comment 27 Eric Sandeen 2014-12-10 14:41:39 UTC

If you are unconditionally running "e2fsck -f -p" you are forcing a full check, and you get to wait for it.  If you're running it on every disk on every machine in the Fedora universe, you get to wait for every single possible attached drive.

So, not an e2fsprogs bug.

resize2fs does now require a check prior to printing the minimum size, if the filesystem is in error or has a last-check-time set which has expired, because on corrupted filesystems the calculation could hang:

commit 7d7a8fe4ea4d9162977a1a6b32c4737d9ca9dd1f
Author: Eric Sandeen <sandeen>
Date:   Mon Jun 9 09:52:19 2014 -0400

    resize2fs: don't attempt to calculate minimum size on fs with errors
    
...

+       if (!(mount_flags & EXT2_MF_MOUNTED)) {
+               if (!force && ((fs->super->s_lastcheck < fs->super->s_mtime) ||
+                              (fs->super->s_state & EXT2_ERROR_FS) ||
+                              ((fs->super->s_state & EXT2_VALID_FS) == 0))) {
+                       fprintf(stderr,
+                               _("Please run 'e2fsck -f %s' first.\n\n"),
+                               device_name);
+                       exit(1);
+               }
+       }

so there are filesystems out there which will require a check prior to resize2fs -P, but certainly not *all* of them.  You could attempt resize2fs -P, and if that fails w/ the above message, run e2fsck if you still really wanted to, alert the user, etc.

As always, if Anaconda or associated bits has filesystem questions, we're happy to help - wish I'd known about this earlier.

-Eric
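
The order of operations suggested in this comment can be sketched roughly as follows. This is only an illustration, not blivet's code: the helper names (`run`, `parse_min_blocks`, `min_size_blocks`) and the `confirm` callback are hypothetical, and the sketch assumes that `resize2fs -P` prints a line beginning with "Estimated minimum size" on success.

```python
import subprocess


def run(argv):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def parse_min_blocks(output):
    """Pull the block count out of resize2fs -P output (format is assumed)."""
    for line in output.splitlines():
        if line.lower().startswith("estimated minimum size"):
            return int(line.rsplit(":", 1)[1].strip())
    return None


def min_size_blocks(device, runner=run, confirm=lambda dev: False):
    """Try resize2fs -P first; fall back to e2fsck only if asked for and agreed to."""
    rc, out = runner(["resize2fs", "-P", device])
    if rc == 0:
        return parse_min_blocks(out)
    # resize2fs refused. Only check if it asked for one and the caller agrees.
    if "e2fsck -f" not in out or not confirm(device):
        return None
    rc, _ = runner(["e2fsck", "-f", "-p", "-C", "0", device])
    if rc not in (0, 1):  # 0 = no errors, 1 = errors corrected
        return None
    rc, out = runner(["resize2fs", "-P", device])
    return parse_min_blocks(out) if rc == 0 else None
```

Passing the command runner in as a parameter keeps the sketch testable without touching a real block device; `confirm` stands in for whatever "alert the user" would mean in the installer.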

Comment 28 Eric Sandeen 2014-12-10 14:45:26 UTC

Oh, damn.  Just reread the above check.

(fs->super->s_lastcheck < fs->super->s_mtime)

does indeed require fsck if it's mounted after the last check.

hohum.

-Eric
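
To make the consequence of that condition concrete, the check quoted in comment 27 can be modelled in a few lines of Python. This is a simplified model for illustration only; the function name and its parameters are made up here, while the two flag values are the ones defined in the ext2 headers.

```python
EXT2_VALID_FS = 0x0001  # filesystem was unmounted cleanly
EXT2_ERROR_FS = 0x0002  # errors were detected at runtime


def needs_fsck_before_min_size(s_lastcheck, s_mtime, s_state,
                               mounted=False, force=False):
    """Model of the resize2fs condition quoted in comment 27."""
    if mounted or force:
        return False
    return (s_lastcheck < s_mtime
            or bool(s_state & EXT2_ERROR_FS)
            or (s_state & EXT2_VALID_FS) == 0)


# A clean filesystem that was last checked at t=100 and mounted at t=200
# still trips the check, which is the problem noted in this comment.
print(needs_fsck_before_min_size(100, 200, EXT2_VALID_FS))
```

Under this model, almost any filesystem in regular use would require a check, since it will normally have been mounted more recently than it was last checked.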

Comment 29 Eric Sandeen 2014-12-10 14:48:39 UTC

Still, if we'd known about this requirement/problem in the installer, I think we probably could have relaxed that check.  :(

We didn't anticipate a workflow which ran resize2fs -P on many filesystems in a row, I guess.

-Eric

Comment 30 Eric Sandeen 2014-12-10 15:20:22 UTC

I've sent a patch upstream to drop the fsck requirement if we're only printing the minimum size.

Comment 31 Vratislav Podzimek 2014-12-15 09:40:24 UTC

(In reply to Eric Sandeen from comment #30)
> I've sent a patch upstream to drop the fsck requirement if we're only
> printing the minimum size.
Thanks! Are you okay with taking this bug? python-blivet (now) does exactly what you suggested in comment #27 -- attempt resize2fs -P, if that fails, e2fsck.

Comment 32 Eric Sandeen 2014-12-15 17:46:12 UTC

Sure; the patch is merged upstream now, according to Ted (though not pushed yet, apparently).

Sadly my suggestion in #27 won't really work too well, it'll almost always fail.  So F20 is just kind of doomed for this, I guess.  Wish I'd known about the problem earlier, but oh well!

Part of this bug seems to be that the user gets no feedback on a long fsck action; I don't know if it should be cloned to deal with that, if it's possible.

-Eric

Comment 33 Vratislav Podzimek 2014-12-16 08:57:30 UTC

(In reply to Eric Sandeen from comment #32)
> Sure; the patch is merged upstream now, according to Ted (though not pushed
> yet, apparently).
Thanks!

> 
> Sadly my suggestion in #27 won't really work too well, it'll almost always
> fail.  So F20 is just kind of doomed for this, I guess.  Wish I'd known
> about the problem earlier, but oh well!
Yeah, what I meant by my comment is that blivet does what's right here.

> 
> Part of this bug seems to be that the user gets no feedback on a long fsck
> action; I don't know if it should be cloned to deal with that, if it's
> possible.
That's a good point. I'm going to create a separate bug for it, though.

Comment 34 Petr Schindler 2015-01-07 17:01:35 UTC

Discussed at today's blocker review meeting [1]. Rejected as a blocker. This bug doesn't clearly violate any criteria and looks to be getting worked on either way. Please repropose if it's found to violate another criterion.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2015-01-07/

Comment 35 Leslie Satenstein 2015-02-13 22:30:10 UTC

When can I test a F22 beta, so I can close this bug(let)

Comment 36 Fedora Update System 2015-02-24 17:55:48 UTC

e2fsprogs-1.42.12-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/e2fsprogs-1.42.12-3.fc21

Comment 37 Fedora Update System 2015-02-24 17:59:23 UTC

e2fsprogs-1.42.12-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/e2fsprogs-1.42.12-3.fc20

Comment 38 Fedora Update System 2015-02-25 13:27:42 UTC

Package e2fsprogs-1.42.12-3.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing e2fsprogs-1.42.12-3.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-2511/e2fsprogs-1.42.12-3.fc21
then log in and leave karma (feedback).

Comment 39 Fedora Update System 2015-03-04 10:22:57 UTC

e2fsprogs-1.42.12-3.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 40 Fedora Update System 2015-03-04 10:34:28 UTC

e2fsprogs-1.42.12-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Andy Campbell 2015-06-06 11:14:38 UTC

Still an issue installing F22.  Anaconda hangs while it is running e2fsck on each filesystem, with no indication what it's doing.

Comment 42 Eric Sandeen 2015-06-08 00:00:41 UTC

With what version of e2fsprogs?

Comment 43 Brian Lane 2015-06-08 16:26:41 UTC

(In reply to Andy Campbell from comment #41)
> Still an issue installing F22.  Anaconda hangs while it is running e2fsck on
> each filesystem,  with no indication what it's doing.

Please attach /tmp/program.log

Comment 44 Andy Campbell 2015-06-09 17:35:35 UTC

e2fsprogs, as shipped with F22 ....

[liveuser@localhost ~]$ rpm -qa | grep e2fsprogs
e2fsprogs-1.42.12-4.fc22.x86_64
e2fsprogs-libs-1.42.12-4.fc22.x86_64


Uploading requested logs.   
All I did was boot the F22 Workstation live image from a USB stick and select install. I waited for 30 mins or so for the fscks to complete. The PC was cleanly shut down before booting off of the USB stick.

[liveuser@localhost ~]$       grep e2fsck /tmp/program.log
13:17:37,177 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/vg_neotrantor-entertainment
13:17:41,118 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/vg_neotrantor-software
13:17:44,939 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/vg_neotrantor-photos
13:17:46,912 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/vg_neotrantor-stuff
18:17:48,752 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/vg_neotrantor-VirtMch2
18:23:50,320 INFO program: Running... e2fsck -f -p -C 0 /dev/sda1
18:23:50,399 INFO program: Running... e2fsck -f -p -C 0 /dev/sda2
18:23:50,746 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/fedora-mnt_VirtMch
18:23:58,089 INFO program: Running... e2fsck -f -p -C 0 /dev/sdc5
18:25:40,583 INFO program: Running... e2fsck -f -p -C 0 /dev/sdd1
18:34:31,307 INFO program: Running... e2fsck -f -p -C 0 /dev/loop3
18:34:31,313 INFO program: e2fsck: Cannot continue, aborting.
18:34:31,379 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/live-rw
18:34:31,387 INFO program: e2fsck: Cannot continue, aborting.
18:34:31,435 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/live-base
18:34:31,441 INFO program: e2fsck: Operation not permitted while trying to open /dev/mapper/live-base
18:34:31,492 INFO program: Running... e2fsck -f -p -C 0 /dev/mapper/live-osimg-min
18:34:31,497 INFO program: e2fsck: Operation not permitted while trying to open /dev/mapper/live-osimg-min
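
For anyone triaging a similar program.log, a small script along these lines can list which devices e2fsck was started on and which runs reported an error. It is a rough sketch that assumes the line layout shown in the excerpt above; the function name and regular expressions are illustrative, not part of anaconda.

```python
import re

# Matches e.g. "18:25:40,583 INFO program: Running... e2fsck -f -p -C 0 /dev/sdd1"
RUN_RE = re.compile(
    r"^(\d\d:\d\d:\d\d),\d+ INFO program: Running\.\.\. e2fsck .* (\S+)$")
# Matches e.g. "18:34:31,313 INFO program: e2fsck: Cannot continue, aborting."
ERR_RE = re.compile(r"INFO program: e2fsck: (.*)$")


def summarize(lines):
    """Return one dict per e2fsck invocation, with any error message attached."""
    results = []
    for line in lines:
        line = line.strip()
        started = RUN_RE.match(line)
        if started:
            results.append({"time": started.group(1),
                            "device": started.group(2),
                            "error": None})
            continue
        failed = ERR_RE.search(line)
        if failed and results:
            # Attribute the message to the most recently started check.
            results[-1]["error"] = failed.group(1)
    return results
```

A typical use would be `summarize(open("/tmp/program.log"))`, then printing the devices whose `error` is `None` to see which checks actually ran to completion.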

Comment 45 Andy Campbell 2015-06-09 17:36:32 UTC

Created attachment 1036924 [details]
Request install log with e2fsck commands

Comment 46 Brian Lane 2015-06-09 18:45:06 UTC

(In reply to Andy Campbell from comment #44)

> 13:17:37,177 INFO program: Running... e2fsck -f -p -C 0
> /dev/mapper/vg_neotrantor-entertainment

Thanks, this is a different problem. It looks like blivet is running e2fsck unconditionally. Please open a new bug against python-blivet with the logs from /tmp/*log attached to it as individual text/plain attachments.

Comment 47 Adam Williamson 2016-11-07 18:46:10 UTC

*** Bug 1390027 has been marked as a duplicate of this bug. ***

Comment 48 Adam Williamson 2016-11-07 18:53:48 UTC

So this bug was never really fixed, and this is not really a 'different problem'.

In #1162215 we noticed that resize2fs was requiring us to run e2fsck before it would tell us a minimum size for the filesystem, so we started running it on all filesystems to make sure we could get the minimum size info.

This bug says 'wait a minute, instead of having anaconda fsck everything, we should just make resize2fs tell us the minimum size without requiring an fsck if the fs has been mounted since last check'.

And so Eric changed resize2fs:

https://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/commit/?id=0462fd6db55de28d7e087d8d06ab20339acd8f67

and then he submitted an update, and marked it as fixing this bug, and this bug got closed.

But that was wrong, because a crucial step was missed: we never actually changed anaconda/blivet back to not running fsck on everything again. It no longer *needed to*, but it still actually *was*.

So this bug has been the same bug all along, and never has been fixed. We actually need to change this code:

https://github.com/rhinstaller/blivet/blob/2.1-devel/blivet/formats/fs.py#L122-L125
https://github.com/rhinstaller/blivet/blob/2.1-devel/blivet/formats/fs.py#L277-L279

basically, `FS.__init__()` calls `self.update_size_info()` (which means that will run for *every* filesystem anaconda sees), and `update_size_info()` unconditionally runs `do_check()`, which if the filesystem in question is ext2/3/4, results in a run of `e2fsck -f -p -C 0` on that filesystem.
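
The shape of the problem, and of one possible fix, can be shown with two toy classes. These are hypothetical stand-ins written for this comment thread, not blivet's actual classes; `EagerFS`, `LazyFS`, `prepare_for_resize` and the `checker` callback are invented names.

```python
class EagerFS:
    """Mimics the pattern described above: the check runs at construction."""

    def __init__(self, device, checker):
        self.device = device
        self._checker = checker
        self.update_size_info()  # runs for every filesystem object created

    def update_size_info(self):
        self._checker(self.device)  # unconditional check (e2fsck in the real case)


class LazyFS:
    """Alternative: defer the check until a resize is actually requested."""

    def __init__(self, device, checker):
        self.device = device
        self._checker = checker
        self.checked = False

    def prepare_for_resize(self):
        if not self.checked:
            self._checker(self.device)
            self.checked = True


checked = []
for dev in ("/dev/sda1", "/dev/sdb1"):
    EagerFS(dev, checked.append)
print(checked)  # both devices were checked just by being discovered
```

With the lazy variant, merely scanning the system creates the objects without checking anything; only a filesystem the user picks for resizing pays the cost.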

Comment 49 Adam Williamson 2016-11-07 18:56:43 UTC

For the record, cmurf correctly notes that this actually results in anaconda making changes to filesystems not involved in the install, because `e2fsck -p` means:

-p    Automatically  repair  ("preen") the file system.  This option will cause e2fsck to automatically fix any filesystem problems that can be safely fixed without human intervention.

even given that we trust it only makes 'safe' changes, that's pretty unexpected behaviour.

Also for the record, we re-considered the dupe (#1390027) as a blocker, but concluded that we'd stand by the original assessment of this bug as rejected blocker, even in light of the 'preen' thing.

Comment 50 Charles R. Anderson 2016-11-07 19:41:12 UTC

(In reply to Adam Williamson from comment #49)
> Also for the record, we re-considered the dupe (#1390027) as a blocker, but
> concluded that we'd stand by the original assessment of this bug as rejected
> blocker, even in light of the 'preen' thing.

Too bad.  So now we have to wait for Fedora 26 to get this bug fixed?  What do those of us with large filesystems on the same disk that we want to install to do until then? (Other disks could be unplugged during install, but not the install destination disk.)

Comment 51 Adam Williamson 2016-11-07 19:43:20 UTC

"Too bad.  So now we have to wait for Fedora 26 to get this bug fixed?"

I'm afraid so, yes.

"Too bad.  So now we have to wait for Fedora 26 to get this bug fixed?  What do those of us with large filesystems on the same disk that we want to install to do until then?"

I'll make an updates.img today and link it here and from common bugs. It's quite easy to hack out the fsck from the code in a slightly dirty way which isn't really appropriate to throw into the official release right now, but will work around the problem for those with giant filesystems.

Comment 52 Leslie Satenstein 2016-11-08 00:27:57 UTC

Will respins be able to have the corrected code?  In the past, Respins usually are generated a few days after the official Fedora Release.  Ditto for Remixes.

The latter two types of outputs (respin/remix) can prove the fix.

Comment 53 Adam Williamson 2016-11-08 00:29:50 UTC

Can't really say. Depends when we get around to fixing it and how complicated the fix is.

Comment 54 Adam Williamson 2016-11-08 02:26:33 UTC

https://www.happyassassin.net/updates/1170803.0.img should work around this for Fedora 25 users. Boot the installer with `inst.updates=https://www.happyassassin.net/updates/1170803.0.img` as a kernel parameter to use it.

(for dlehman - I basically just ripped the `do_check()` out of `update_size_info()` and put the `finally:` block back in line).

Comment 55 Jeremy Eder 2016-11-15 20:42:31 UTC

Thanks Adam, that worked for me.  And then after booting fedora I had to boot with fsck.mode=skip.

The reason is that the partition it wants to fsck is 1.5TB of git repos, and is only 7200rpm...probably would take days.

Comment 56 Vratislav Podzimek 2016-11-18 13:08:40 UTC

My take on this https://github.com/rhinstaller/blivet/pull/526

Comment 57 Charles R. Anderson 2016-11-18 14:07:27 UTC

(In reply to Vratislav Podzimek from comment #56)
> My take on this https://github.com/rhinstaller/blivet/pull/526

No, I just tested this and -n still takes a very long time to run.  I had to wait 7.5 hours for the installer to get past this step, which is unacceptable.  The only solution is to remove the e2fsck call completely (or ask the user if they want to run it).  e2fsck is no longer required to solve the original reason that it was added since resize2fs was fixed.

Comment 58 Vratislav Podzimek 2016-11-21 09:09:16 UTC

(In reply to Charles R. Anderson from comment #57)
> (In reply to Vratislav Podzimek from comment #56)
> > My take on this https://github.com/rhinstaller/blivet/pull/526
> 
> No, I just tested this and -n still takes a very long time to run.  I had to
> wait 7.5 hours for the installer to get past this step, which is
> unacceptable.  The only solution is to remove the e2fsck call completely (or
> ask the user if they want to run it).  e2fsck is no longer required to solve
> the original reason that it was added since resize2fs was fixed.

The problem is that blivet/anaconda need to know if the file system is clean in order to decide whether it can be resized or not. The plan is to report that information from blivet to anaconda which could then ask user if they want to do a check or not. However, there's no fast, safe and noninvasive way to tell if an ext2/3/4 file system is clean or (probably) not. Maybe there should be some way to e.g. just check the journal? Without it, blivet/anaconda would consider all ext2/3/4 file systems unresizable and only after user explicitly runs checks on them they would be considered resizable (if the check succeeds, of course).

Comment 59 Charles R. Anderson 2016-11-21 15:07:42 UTC

(In reply to Vratislav Podzimek from comment #58)
> The problem is that blivet/anaconda need to know if the file system is clean
> in order to decide whether it can be resized or not.

Please re-read Eric Sandeen's comment #27, especially this part:

"resize2fs: don't attempt to calculate minimum size on fs with errors
    
...

so there are filesystems out there which will require a check prior to resize2fs -P, but certainly not *all* of them.  You could attempt resize2fs -P, and if that fails w/ the above message, run e2fsck if you still really wanted to, alert the user, etc."

> However, there's no fast, safe and noninvasive
> way to tell if an ext2/3/4 file system is clean or (probably) not. Maybe
> there should be some way to e.g. just check the journal? Without it,
> blivet/anaconda would consider all ext2/3/4 file systems unresizable and
> only after user explicitly runs checks on them they would be considered
> resizable (if the check succeeds, of course).

tune2fs shows Filesystem features: needs_recovery if the filesystem is dirty.  You could replay the journal by mounting/unmounting the filesystem.  You could try resize2fs -P to get the minimum size without running e2fsck as Eric Sandeen suggests.  When it finally comes to doing the actual resize, you could try running the resize2fs and only run e2fsck if that fails.

I believe the goals for any solution should be:

1. e2fsck should only be run when necessary, and only on disks and filesystems that were specifically selected for resize or installation.  e.g. if you don't select a disk, don't check filesystems on that disk.  If you select a filesystem for reformat, it shouldn't be checked.  If you don't select a filesystem for any operations at all, it shouldn't be checked.

[The current method of checking all disks, all filesystems fails to check encrypted partitions, so delaying checks until the last possible moment has the additional benefit that the checks could be applied to encrypted filesystems as well]

2. On filesystems that have been selected for operations to be performed on them, avoid running e2fsck wherever possible.  Check for dirty flag with tune2fs, try mounting filesystem to cause the kernel to replay the journal if necessary, try resizing the filesystem without running e2fsck first.  Only if all those steps fail, then fall back to running e2fsck.

3. As a last resort when e2fsck is determined to be necessary, ask the user to confirm this, warning them that it may take many hours or days to complete the check. (It might even be possible to estimate the time based on the speed of the disk and the "used" size of the filesystem.)

4. When finally running e2fsck, provide visual feedback of the progress and allow the user to cancel the operation.

Thanks.
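
The four goals above could be sketched roughly as follows. This is only an illustration: the ExtFilesystem record and the planned_steps helper are hypothetical names made up for this example, not part of blivet or anaconda, and the needs_recovery field is assumed to have been filled in from tune2fs output beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtFilesystem:
    """Hypothetical record describing one ext2/3/4 filesystem."""
    device: str
    selected: bool        # user chose it for resize or installation
    will_reformat: bool   # it is going to be wiped anyway
    needs_recovery: bool  # tune2fs lists the needs_recovery feature


def planned_steps(fs: ExtFilesystem) -> List[str]:
    """Return the ordered steps for one filesystem, cheapest first."""
    # Goal 1: never touch filesystems that were not selected,
    # and never check one that is about to be reformatted.
    if not fs.selected or fs.will_reformat:
        return []
    steps = []
    # Goal 2: prefer cheap operations over a full e2fsck.
    if fs.needs_recovery:
        steps.append("replay journal")
    steps.append("try resize2fs")
    # Goals 3 and 4: e2fsck is the last resort, with confirmation
    # and progress feedback.
    steps.append("if resize2fs fails: confirm with user, then e2fsck with progress")
    return steps
```

The point of returning an empty list for unselected or to-be-reformatted filesystems is that no tool at all is run against them.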

Comment 60 Leslie Satenstein 2016-11-21 15:13:54 UTC

Perhaps one should select the drives to be part of the new installation before the integrity scan takes place. Post installation, one could start a background "re-check" on "first boot" or later. Implementing this change would be an anaconda design change F26/F27.

Comment 61 Charles R. Anderson 2016-11-21 17:04:45 UTC

(In reply to Leslie Satenstein from comment #60)
> Perhaps one should select the drives to be part of the new installation
> before the integrity scan takes place. Post installation, one could start a
> background "re-check" on "first boot" or later. Implementing this change
> would be an anaconda design change F26/F27.

Yes, but partition (not just disk) selection also needs to take place, because large/full partitions on a single disk cause this issue also, and one cannot exactly remove a partition just to do the installation like one can for a separate disk.  I have a 2.4TB data partition which is 1.7T full which took 12 hours to fsck when I tested this again a couple days ago, even though the partition was 1) unmounted cleanly, 2) had no journal to replay, and 3) had no fsck auto-checking options set:

Maximum mount count:      -1
Check interval:           0 (<none>)

Comment 62 Eric Sandeen 2016-11-21 19:30:24 UTC

If Anaconda wants to present a minimum size for any particular extN partition, then as of recent e2fsprogs, that partition will simply need to be without errors for "resize2fs -P" to calculate a minimum size.

But being "without errors" does not mean that a full e2fsck must be run.  "With errors" is a flag which gets set on the superblock if the filesystem encountered a runtime error which has not yet been fixed.  That flag will not be present the vast majority of the time, and resize2fs -P will Just Work, and will present a minimum size without needing fsck on most filesystems.

(Honestly, any filesystem which is presenting errors at install time should probably just be excluded (and noted) from the install targets - Anaconda should not be in the business of resolving such issues; if an existing filesystem requires repair that's something for the admin/owner to make a careful decision about before proceeding with a new install.)

If you actually want to shrink an extN filesystem, then it almost certainly will need an e2fsck first - it will only proceed if there have been no filesystem modifications since the last full fsck.  Shrinking is a very metadata intensive operation, and we don't want to run into inconsistencies and errors while performing that brain surgery, so a preceding e2fsck is required - but only for filesystems which /will/ actually undergo shrink.
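
As a rough illustration of the "resize2fs -P" approach described above, a caller could query the minimum size along these lines. The helper names are hypothetical, and the exact wording of the output line ("Estimated minimum size of the filesystem: N") is an assumption about recent e2fsprogs that should be checked against the installed version.

```python
import re
import subprocess
from typing import Optional

# Assumed format of the line printed by "resize2fs -P".
_MIN_SIZE_RE = re.compile(r"Estimated minimum size of the filesystem:\s*(\d+)")


def parse_min_size_blocks(output: str) -> Optional[int]:
    """Extract the minimum size (in filesystem blocks) from resize2fs -P output."""
    match = _MIN_SIZE_RE.search(output)
    return int(match.group(1)) if match else None


def query_min_size_blocks(device: str) -> Optional[int]:
    """Run resize2fs -P on a device; return None if no size could be obtained."""
    proc = subprocess.run(["resize2fs", "-P", device],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        # For example, the filesystem has the error flag set.
        return None
    return parse_min_size_blocks(proc.stdout)
```

A None result would then simply mean "treat this filesystem as not resizable", without any fsck having been run.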

Comment 63 Adam Williamson 2016-11-21 20:47:38 UTC

The code is already written to check the filesystem prior to doing the actual shrink. The additional run of e2fsck on all detected ext filesystems was very specifically added to deal with the 'resize2fs -P requires it' issue. I really think it would be pretty fine to just take it back out again.

I might try and come up with a PR for this, but I'm not totally sure as I've got a lot of other stuff to do.

Comment 64 Cristian Ciupitu 2016-11-21 22:16:12 UTC

I'm not sure if this issue applies to me, but I shared a disk with a virtual machine and I tried to install Fedora 25 Beta on it. The disk had several partitions on it, and some were part of the BTRFS filesystem (volume). Since I wasn't planning to install Fedora on that filesystem, I didn't feel the need to unmount it, but somehow it got corrupted pretty badly about that time. Could the design of blivet have anything to do with it or was it just a coincidence?

Comment 65 Adam Williamson 2016-11-21 23:53:05 UTC

It seems very unlikely to have anything at all to do with this bug. For a start, this is specific to ext2/3/4.

Comment 66 Cristian Ciupitu 2016-11-22 01:08:21 UTC

I was under the impression that fsck is run for all possible filesystems, but it's more problematic for ext2/3/4 because it takes longer.

Comment 67 Adam Williamson 2016-11-22 02:44:11 UTC

No, that's not really the case. ext, FAT, NTFS and HFS+ partitions are checked. All others are not. I don't know how fast or slow FAT, NTFS and HFS+ partition checking is compared to ext checking, but in any case, btrfs partitions are not checked (because the BTRFS class does not override the FS class's definition of self._fsck_class as fsck.UnimplementedFSCK).
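
The override mechanism mentioned above can be pictured with a much simplified sketch. Only the names FS, BTRFS, _fsck_class and UnimplementedFSCK come from the comment; everything else here (the Ext2FSCK and Ext4FS classes, the available attribute, the checkable property) is hypothetical and is not blivet's actual code.

```python
class UnimplementedFSCK:
    """Placeholder checker: nothing is ever run for this filesystem type."""
    available = False


class Ext2FSCK:
    """Stand-in for a real checker class (hypothetical)."""
    available = True


class FS:
    # Default for every filesystem type: no checker.
    _fsck_class = UnimplementedFSCK

    def __init__(self):
        self._fsck = self._fsck_class()

    @property
    def checkable(self):
        return self._fsck.available


class Ext4FS(FS):
    # ext filesystems override the default, so they get checked.
    _fsck_class = Ext2FSCK


class BTRFS(FS):
    # No override: inherits UnimplementedFSCK, so it is never checked.
    pass
```

With this layout, a filesystem type is checked only if its class opts in by overriding the class attribute.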

Comment 68 Vratislav Podzimek 2016-11-22 09:41:11 UTC

Updated https://github.com/rhinstaller/blivet/pull/526 to avoid the e2fsck call.

However, I still don't think this is right. Blivet needs to know if the file system is in a good shape and can be resized. The "if the tools tell us the minimum size, the file system is okay and resizable" sounds twisted to me. Blivet should be able to get the information about the file system's shape in some quick way not based on any assumptions.

Eric, is there a way to run 'e2fsck' somehow for it to just check the 'clean' flag and return 0/1 based on the value of that flag? Would 'e2fsck -n' (without '-f') do that? If not, could such option be added?

Comment 69 Adam Williamson 2016-11-22 16:45:18 UTC

"Blivet needs to know if the file system is in a good shape and can be resized."

But...at least AIUI, it really can't. Knowing whether the error flag has been set is not entirely the same thing. This is a 'the map is the territory' problem. As Eric said, if we're really going to *do* a resize, we have to do a full fsck before we do so. There is no way around that. And I believe there's no practical way to predict whether an fsck run will succeed any faster than simply *doing the fsck run*.

To put it another way:

* We can quickly tell whether the error flag has been set.
* If the error flag has been set, we know we must run an fsck before doing the resize.
* If the error flag has not been set, we know we must run an fsck before doing the resize.

Comment 70 Adam Williamson 2016-11-22 19:18:39 UTC

As for the error flag, I'd say I think we all agree on this:

* If the error flag is set, we should just consider the partition fundamentally non-resizable (and maybe provide sufficient info for a front-end like blivet or blivet-gui to display a warning/info box to the user telling them to check the filesystem).

The question, I guess, is whether it's OK to rely on resize2fs -P's implicit check of the error flag or not. So the question is really, does e2fsprogs upstream consider the error flag check an implementation detail of the -P feature, or a fundamental part of its job? i.e. if it somehow became the case that -P could be made to print a minimum size without checking the error flag, would they do that?

If so, then I agree with you that anaconda should ideally do an independent check of the error flag before bothering with the `resize2fs -P` call, and if the error flag is set, just mark the partition as not resizeable.

Comment 71 Chris Murphy 2016-11-22 21:07:32 UTC

This also seems unnecessary, it's run on the rootfs.img as part of the compose process, so there's no reason thousands of installations need to repeat that particular check.

Running... e2fsck -f -p -C 0 /dev/loop1
Running... e2fsck -f -p -C 0 /dev/mapper/live-rw
Running... e2fsck -f -p -C 0 /dev/mapper/live-base

The least amount of code change might be to just drop all of the flags being used, which should get a simple and fast pass/fail for whether e2fsck thinks the fs is clean.

And then only run resize2fs to get a minimum size on ext234 file systems that are located on the user selected destination device. Once installation begins, either the real e2fsck prior to resize, or the resize operation could fail, so there needs to be error handling there anyway. I'm not really sure what's gained by checking everything in advance, whether the user has any intention to modify those file systems.

Comment 72 Eric Sandeen 2016-11-22 22:21:26 UTC

(In reply to Vratislav Podzimek from comment #68)

> Eric, is there a way to run 'e2fsck' somehow for it to just check the
> 'clean' flag and return 0/1 based on the value of that flag? Would 'e2fsck
> -n' (without '-f') do that? If not, could such option be added?

"clean" simply means "no log replay needed" - I don't think there is any tool that allows you to query exactly that with a specific return value for the result.

You could also parse dumpe2fs -h output, it will contain one of the following:

Filesystem state:         clean
Filesystem state:         not clean
Filesystem state:         clean, with errors
Filesystem state:         not clean, with errors

You can quickly replay the log in userspace with e2fsck -E journal_only if "not valid" is the problem; if "error" is the problem, you must do a full e2fsck just to get the minimum size.  But again, if a filesystem is in this much distress I would /not/ try to have the installer deal with it.  Just ignore it and let the admin figure out what to do.
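
Parsing the "Filesystem state:" line could look roughly like this. It is a sketch under assumptions: the function names are made up for illustration, the four state strings are taken from the list above, and the mapping from state to action just restates the advice in this comment.

```python
from typing import Optional, Tuple


def parse_fs_state(dumpe2fs_output: str) -> Optional[Tuple[bool, bool]]:
    """Return (clean, has_errors) from dumpe2fs -h output, or None if absent."""
    for line in dumpe2fs_output.splitlines():
        if line.startswith("Filesystem state:"):
            state = line.split(":", 1)[1].strip()
            flags = [part.strip() for part in state.split(",")]
            # "not clean" is a distinct flag, so an exact match on
            # "clean" does not misfire on it.
            return ("clean" in flags, "with errors" in flags)
    return None


def suggested_action(state: Optional[Tuple[bool, bool]]) -> str:
    """Map a parsed state to what the installer might do next."""
    if state is None:
        return "unknown"
    clean, has_errors = state
    if has_errors:
        return "ignore filesystem, leave it to the admin"
    if not clean:
        return "replay journal only"
    return "ok"
```

Note that the error flag takes priority: a filesystem reported as "clean, with errors" is still left alone.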

Comment 73 Fedora End Of Life 2017-02-28 09:39:02 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 74 Charles R. Anderson 2017-05-22 21:34:38 UTC

This bug still exists in the 20160521 nightly of Fedora 26.  Can it please be fixed before F26 release?  Thanks.

Comment 75 Charles R. Anderson 2017-05-22 21:35:31 UTC

Correction, 20170521 nightly of Fedora 26.

Comment 76 Charles R. Anderson 2017-05-22 21:37:36 UTC

*** Bug 1189905 has been marked as a duplicate of this bug. ***

Comment 77 Charles R. Anderson 2017-05-22 21:38:51 UTC

*** Bug 1375894 has been marked as a duplicate of this bug. ***

Comment 78 Fedora Blocker Bugs Application 2017-05-22 21:48:43 UTC

Proposed as a Blocker for 26-beta by Fedora user cra using the blocker tracking app because:

 Violates:

Other disks not touched

Disks not selected as installation targets must not be affected by the installation process in any way.

Comment 79 Adam Williamson 2017-05-26 21:10:52 UTC

This bug has already been rejected as a blocker twice before. Do you have any particular rationale for re-considering it at this time?

Comment 80 Charles R. Anderson 2017-05-26 21:48:53 UTC

(In reply to Adam Williamson from comment #79)
> This bug has already been rejected as a blocker twice before. Do you have
> any particular rationale for re-considering it at this time?

Because this violates the criterion given (don't touch disks/filesystems not selected for install).  Preen /may/ be safe, but are we sure it is safe and always will be and will never have a bug that might accidentally destroy data?  The whole point of disk selection is as an extra safeguard to avoid touching stuff you know will not be necessary for the install.

Because the purported reasons to keep doing e2fsck before disk/filesystem selection are bogus.  For example, there may be other filesystems that become available later that are never subjected to the early e2fsck (encrypted ones for example).

Because it is absurd to accept that installing Fedora takes 12+ hours where there are preexisting large filesystems unrelated to the installation due to the refusal to remove the broken workaround that was originally put in place as a /temporary/ way to solve another problem that has long since been fixed the correct way.

Comment 81 Adam Williamson 2017-05-26 22:00:44 UTC

So, nothing new, then?

Comment 82 Eric Sandeen 2017-05-26 22:03:06 UTC

To be quite honest, I've lost the thread here.  Is there more work required in e2fsprogs to resolve this?  I /think/ I fixed the underlying problem in 2014, but if there's more to it, please remind me.

Thanks,
-Eric

Comment 83 Chris Murphy 2017-05-26 22:11:51 UTC

The relevant criterion is actually alpha:
The user must be able to select which of the disks connected to the system will be affected by the installation process. 
Disks not selected as installation targets must not be affected by the installation process in any way.

If the file system is clean, the disk is not affected, so the criterion is not violated. If the file system on the non-selected disk is fixed, then the criterion is violated. That's how the criterion reads. Since the installer still indiscriminately runs 'e2fsck -f -p -C 0' on all ext file systems, even on devices not selected as installation targets, it's very obviously a criterion violation.

So the rationale for reconsidering is, the previous explanations for rejecting it flat out ignore the criterion: non-selected disks must not be affected in any way; with just a handwave. So make it a blocker or revise the criterion.

Per 62 and 72 I think anaconda needs to be out of the fsck business entirely. If dumpe2fs -h indicates the fs is clean, it can be included as an installation target. If it's anything other than clean, it's excluded.

Comment 84 Chris Murphy 2017-05-26 22:22:12 UTC

Since an fs volume in a state other than "clean" is rare, both cases (the file system gets modified, or the file system is huge and fsck takes a long time) are arguably local configuration issues. Since it doesn't always happen, it probably can be argued this is a conditional blocker.

But I still think the installer is being ornery, running e2fsck on everything in sight.

Comment 85 Charles R. Anderson 2017-05-26 22:47:35 UTC

(In reply to Chris Murphy from comment #83)
> So the rationale for reconsidering is, the previous explanations for
> rejecting it flat out ignore the criterion: non-selected disks must not be
> affected in any way; with just a handwave. So make it a blocker or revise
> the criterion.

I was just going to say the same thing.  I was busy looking into the history:

The original criterion for blocking Fedora 21 was:

"When using a dedicated installer image, the installer must be able to
complete an installation using the text, graphical and VNC
installation interfaces. This criterion covers showstopper bugs in the
installer for which there isn't any other specific criterion:
obviously, it can't 'complete an installation' if there's a
showstopper."

The decision was:

"1170803 - RejectedBlocker - This bug doesn't clearly violate any
criteria and looks to be getting worked on either way. Please
repropose if it's found to violate another criterion."

Then the underlying issue with e2fsprogs was fixed and updated
packages released.  Everyone had to wait until Fedora 22 for a
possible installer fix, but blivet was never changed to remove the
now-no-longer-needed call to "e2fsck -f -p" on all EXT filesystems on
all attached devices.

I re-discovered this issue and filed a dupe (#1390027) which
officially asked for this to be blocked on the other criterion:

"Disk selection

The user must be able to select which of the disks connected to the
system will be affected by the installation process.  Other disks not
touched"

The decision was:

"The decision to classify this bug as a RejectedBlocker and a
RejectedFreezeException was made as this is an issue that has been
around since Fedora 21 and has not blocked since. Though we note it
causes e2fsck to perform 'safe' fixes on non-selected filesystems, as
well as taking a long time if large ext2/3/4 filesystems are present,
we don't see sufficient reason to change that decision or change this
as an FE at this time."

Looking into the meetbot logs, some of the reasons given for the
rejections are:

- "looks to be getting worked on either way"
- "because this issue was been around since Fedora 21"
- "because we are too close to the final release"
- "so obviously there's no momentum to make it a blocker even though it meets the alteration requirement people were saying needed to be true to make it a blocker"
- "it can survive one more release. But yes, let's ask for that to be prioritized"

Here we are 5 releases later.  I think these lines of reasoning
deserve to be reconsidered.

Comment 86 Adam Williamson 2017-05-26 23:28:00 UTC

So in future, if you're going to do this, could you please re-propose the bug as a blocker, say, *six weeks* before the relevant release? Rather than the day we do the go/no-go meeting?

Because now we've got another contentious issue thrown right back in at the death of a milestone that's already enough of a mess thanks to the libdb issue. It's a lot easier to make sensible decisions and ensure things are fixed properly when we're not all rushing around like a bunch of headless chickens trying to do things at the last minute.

Comment 87 Charles R. Anderson 2017-05-27 18:08:46 UTC

(In reply to Adam Williamson from comment #86)
> So in future, if you're going to do this, could you please re-propose the
> bug as a blocker, say, *six weeks* before the relevant release? Rather than
> the day we do the go/no-go meeting?

Sorry, I didn't re-discover this issue until this week when I tried an install on one of my systems that has large/filled filesystems.  I filed it on Monday, but it didn't go into the blocker tracker because of the RejectedBlocker keyword which I discovered/fixed on Friday.

> Because now we've got another contentious issue thrown right back in at the
> death of a milestone that's already enough of a mess thanks to the libdb
> issue. It's a lot easier to make sensible decisions and ensure things are
> fixed properly when we're not all rushing around like a bunch of headless
> chickens trying to do things at the last minute.

I'd be fine with moving this to a Final Blocker rather than Beta.

Comment 88 Chris Murphy 2017-05-28 21:49:01 UTC

(In reply to Adam Williamson from comment #86)
> So in future, if you're going to do this, could you please re-propose the
> bug as a blocker, say, *six weeks* before the relevant release? Rather than
> the day we do the go/no-go meeting?

That's rather blame the messenger. We need a better procedure. I've long argued for a better kick the can down the road process.

What we have now is kick the can down the road and hide it under the carpet. What I've suggested in the past, and ho hum rejected, is making certain bugs proposed as blockers as blocking the next release rather than the current release.

A consistent procedure in writing sounds nice and ideal, but a subjective process where we just do that is already better than what we have right now. At least stop sweeping certain bugs under the carpet and hope they get fixed on their own (or that no one nominates it as a blocker again).


> Because now we've got another contentious issue thrown right back in at the
> death of a milestone that's already enough of a mess thanks to the libdb
> issue. It's a lot easier to make sensible decisions and ensure things are
> fixed properly when we're not all rushing around like a bunch of headless
> chickens trying to do things at the last minute.

The procedure we have is what enables last minute running around.

Maybe at freeze we have a hard cutoff for new blocker bugs that are not regressions? If it's a regression, then it's current release current milestone blocker worthy. And if not then it gets kicked down the road to one of the next two milestones: in this case that would be either Fedora 26 Final, or Fedora 27 Alpha.

So my proposal is to kick the can down the road, but do not sweep it under the carpet, make it a Fedora 26 Final blocker subject to the input from the team tasked with fixing it, and if they have a compelling argument for pushing it to Fedora 27 Alpha instead, then do that.

Comment 89 Petr Schindler 2017-05-30 16:37:56 UTC

Discussed at 2017-05-30 blocker review meeting: [1]. 

This bug was accepted as F27 Alpha blocker and rejected as F26 Beta blocker: This bug has been in existence for a while now, and it's too sweeping of a change to accept for F26 Beta. However, per the logic in Comment 88, we've accepted this as a violation of the Alpha "No disks touched" criterion for F27.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2017-05-30/

Comment 90 Marek Marczykowski 2017-06-02 18:59:08 UTC

Besides the performance impact, this bug also has other aspects:
 - it may cause data loss, if one uses a signed block device (dm-verity, like in Chromium OS), or has a hibernated system on that disk
 - it may lead to a security issue - for example if you (re-)install a system on a previously compromised machine - then the compromised system may try to leave specially crafted filesystem metadata to exploit fsck or other filesystem parsing code involved (see below) of the new system

Both also apply to mounting and parsing selected files (/etc/fstab, /etc/os-release) from there. I hope this happens only to disks selected for installation - not all of them. Anyway it would be nice to have an option to opt out of this feature - at the cost of not having detailed information in the partitioning wizard.

Cross referencing downstream issue: https://github.com/QubesOS/qubes-issues/issues/2835

Comment 91 Marc Muehlfeld 2017-06-14 14:38:16 UTC

I installed F26 beta yesterday. After I hit the "next" button on the first Anaconda screen, the installer froze for about 5 minutes. In the "ps" output I saw that Anaconda ran e2fsck processes on each partition of my 3 disks (2x 1 TB HDD, 1x 500 GB NVMe). The same freeze happened if I clicked the "refresh" button in the Anaconda partitioning tool to re-read the partition layout.

It would be nice if the installer could tell the user that it runs file system checks in the background and that it can take several minutes. A frozen installer without any information makes users nervous. :-)

Comment 92 Vratislav Podzimek 2017-06-16 09:45:11 UTC

An updates.img that should hopefully fix the issues:
http://vpodzime.fedorapeople.org/1170803_updates.img

Please try!

Comment 93 Kamil Páral 2017-06-16 11:52:33 UTC

Hey Vratislav, can you please describe the changes you made? Thanks a lot.

Comment 94 Vratislav Podzimek 2017-06-16 12:38:05 UTC

It's this PR -- https://github.com/rhinstaller/blivet/pull/526 -- which basically does two things:

1. changes how blivet runs e2fsck to not make any changes to the FS (because there's right now no way to ask user for confirmation)

2. removes the FS checks when determining whether an FS instance is resizable and what its minimum size is because we no longer need to do it
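
For readers unfamiliar with e2fsck, a check that makes no changes could be invoked and interpreted roughly as below. This is a sketch, not the code from the PR: it assumes the read-only mode is requested with e2fsck's -n option (open read-only, answer "no" to every question), the helper names are hypothetical, and the exit status bits are those listed in the e2fsck man page.

```python
from typing import List

# Exit status bits as documented in the e2fsck man page.
E2FSCK_EXIT_BITS = {
    1: "errors corrected",
    2: "errors corrected, system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}


def readonly_check_command(device: str) -> List[str]:
    """Build an e2fsck command line that does not modify the filesystem."""
    return ["e2fsck", "-n", device]


def describe_exit_status(status: int) -> List[str]:
    """Decode an e2fsck exit status, which is a sum of the bits above."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items())
            if status & bit]
```

With -n, a damaged filesystem would be expected to yield status 4 (errors left uncorrected) rather than 1, since nothing is repaired.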

Comment 95 Adam Williamson 2017-06-20 17:00:30 UTC

Is the updates.img against current F26?

Comment 96 Vratislav Podzimek 2017-06-21 09:09:29 UTC

(In reply to Adam Williamson from comment #95)
> Is the updates.img against current F26?

"current" at the time it was created. Any problems with it?

Comment 97 Adam Williamson 2017-06-21 18:19:19 UTC

well, just trying to clarify what people should test with. I usually generate updates images against the tag matching the package build in current nightly composes, so people can properly test against a nightly compose...

Comment 98 Adam Williamson 2017-07-11 18:28:21 UTC

Seems like this has now been merged to blivet upstream, but not released yet. Vratislav, can we get it built for Rawhide and then we can finally close this one out? :)

Comment 99 Adam Williamson 2017-08-20 16:33:54 UTC

So this went into python-blivet-2.1.10-1, which is now built for Rawhide, F27 and F26. I'll mark it ON_QA for now, so we can verify the fix with recent Rawhide / F27 composes.

Comment 100 Adam Williamson 2017-08-20 16:34:33 UTC

Moving to Beta blocker, as we're not doing an Alpha for F27.

Comment 101 Adam Williamson 2017-08-23 22:01:57 UTC

Just as a note, the fix for this seems to have broken resizing - see https://bugzilla.redhat.com/show_bug.cgi?id=1484575 .

Comment 102 Adam Williamson 2017-09-12 01:55:57 UTC

Can anyone who has a system where the impact of this is very obvious please test with any recent F27 compose and confirm whether it's fixed? Thanks. See https://www.happyassassin.net/nightlies.html for image download links (that's my personal domain, if you don't trust me, just verify the links are https links to Fedora domains before downloading).

Comment 103 Kamil Páral 2017-09-25 16:45:28 UTC

It seems no one with an affected system is willing to test the fix. Closing the bug, hopefully everything works. If it doesn't, please reopen.

Comment 104 Leslie Satenstein 2017-09-25 23:45:26 UTC

Did not receive notice to do a test.  If it is part of kogj, I would have tested it.
The only request I received was today, Sept 25.

My email address has not changed.

It works better than before, I have been using 
Fedora 27 pre-beta 
for versions dated 02, 11, 18, 19, and 23 (on 5 disks with 6 systems).

Before the fix, the scan took 7 minutes elapsed.
I will post a response today (EST) on improvements.

Comment 105 Leslie Satenstein 2017-09-26 01:41:19 UTC

I have tested the fixes using 

Fedora-Workstation-netinst-x86_64-27-20170925.n.0.iso 


From the clicking of Continue (and responding yes to the beta "risk message"),

Elapsed time dropped from seven minutes to about 2¼ minutes.

A substantial improvement. A positive change.


Thank you.

Leslie


PS. I never received any emails notifications about readiness, except for Comment 103.  I do check spam filters.

7 minutes --> 2.25 minutes to scan 7 Fedora distributions file systems.
lvm, btrfs, btrfs, ext4.  F25, F26 3x(F27 test) 

Here is what I have that I used  for timings

4 disks are 1 terabyte sata2 units.  1 is an SSD. 
I typically scratch the DISKC to test nightlies on true hardware.

DISKA F27 lvm  GNOME         test
DISKB F26 ext4 XFCE
DISKC F27 btrfs Gnome        test    
DISKC F27 lvmthin GNOME      test
DISKD F25 ext4    GNOME      test
DISKE F26 BTRFS W /Home on xfs GNOME   

System Ram 8 gigs. CPU Q9650 (Dual core 64bit Intel, circa 2012)

Comment 106 Leslie Satenstein 2017-09-26 02:19:48 UTC

Addendum

All spinning disks are 7200rpm, each with 64megs cache, and 2 have NCQ

A second test ran in 2 minutes, about 15 seconds shorter time.

Seven minutes to 2 minutes is great.

Comment 107 Vratislav Podzimek 2017-09-27 11:44:27 UTC

(In reply to Leslie Satenstein from comment #106)
> Addendum
> 
> All spinning disks are 7200rpm, each with 64megs cache, and 2 have NCQ
> 
> A second test ran in 2 minutes, about 15 seconds shorter time.
> 
> Seven minutes to 2 minutes is great.

Cool!

Comment 108 Charles R. Anderson 2017-09-27 16:20:37 UTC

Also confirmed that this works for me now.  No e2fsck is being run before disk selection or custom partitioning.  No more 12+ hour wait at the Language Selection screen from trying to fsck 1.7TB of data.  Thanks!

Comment 109 David Jansen 2017-10-09 08:19:00 UTC

I ask here since the Fedora 25 common bugs page has a link to this issue: the installer update for F25 mentioned there and here seems to be gone or off-line. Is this installer image mirrored somewhere? (I know I could switch to F26 but keeping all systems in a computer classroom at the same version would be preferable)

Comment 110 Kamil Páral 2017-10-09 09:52:03 UTC

David, we'll try to get the web server back up.

Adam, happyassassin.net seems to be down and therefore https://www.happyassassin.net/updates/1170803.0.img can't be downloaded. Can you please fix?

Comment 111 David Jansen 2017-10-10 07:22:53 UTC

The site seems to be back, thanks for fixing!
As an additional note: wouldn't it be advisable, if there are installer updates like this that are advertised in the release notes / common bugs, to host these at Fedora, preferably on the mirrors like rpm updates?

Comment 112 Adam Williamson 2017-10-11 22:11:24 UTC

Frankly it's just rather *easier* for me to do it on my personal domain. These are not being provided as Official Fedora Updates, it's just something I'm doing as a courtesy. It wouldn't actually be appropriate to publish this as if it were an Official Fedora Update as no-one at all besides me has verified the contents :) (you can of course expand the image file to check what's in it).

The site usually stays up pretty well, but I went on vacation and my cable modem somehow lost its signal a week before I got back, so it was down till I could get back and power cycle it. Sorry about that.