Bug 1602046 - resize2fs in e2fsprogs-1.44.2 breaks livecd-tools
Summary: resize2fs in e2fsprogs-1.44.2 breaks livecd-tools
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Lukáš Czerner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-17 16:56 UTC by Scott Dowdle
Modified: 2018-08-14 06:56 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-08-13 17:54:29 UTC
Type: Bug
Embargoed:


Description Scott Dowdle 2018-07-17 16:56:07 UTC
I continuously build remixes of Fedora releases... mostly when there is a kernel update... or a large number of updates... so every week or two.  For the last couple of weeks I've been running into this error:

"Error creating Live CD : fsck after resize returned an error (1)!"

That is after it has figured out all of the deps, downloaded everything, installed everything, indexed the man pages... and is getting ready to compress the disk image to make the iso.  As mentioned in the summary, I'm using livecd-creator.

Comment 1 Scott Dowdle 2018-07-17 18:01:58 UTC
Tail end of build output:
- - - - -

110 man subdirectories contained newer manual pages.
7543 manual pages were added.
0 stray cats were added.
0 old database entries were purged.
/ 100.0%

Unmounting directory /var/tmp/imgcreate-xy8h7_2l/install_root failed, using lazy umount
Unmounting directory /var/tmp/imgcreate-xy8h7_2l/install_root failed, using lazy umount
lazy umount succeeded on /var/tmp/imgcreate-xy8h7_2l/install_root
Error creating Live CD : fsck after resize returned an error (1)!

Comment 2 Scott Dowdle 2018-07-26 20:35:20 UTC
I noticed that I don't have the issue until I upgrade to these packages:

e2fsprogs-1.44.2-0.fc28.x86_64
e2fsprogs-libs-1.44.2-0.fc28.x86_64
libcom_err-1.44.2-0.fc28.x86_64
libss-1.44.2-0.fc28.x86_64

Comment 3 Scott Dowdle 2018-07-26 21:19:19 UTC
Rolling back to these versions of the packages makes it work again:

e2fsprogs-1.43.8-2.fc28.x86_64.rpm
e2fsprogs-libs-1.43.8-2.fc28.x86_64.rpm
libcom_err-1.43.8-2.fc28.x86_64.rpm
libss-1.43.8-2.fc28.x86_64.rpm

Comment 4 Neal Gompa 2018-07-26 21:40:19 UTC
I've confirmed that this is caused by the e2fsprogs update. However, there have been changes to the function this error comes from in over a year: https://github.com/livecd-tools/livecd-tools/blob/e032bbea873b7095330b6b02d3711b7c713a4d6d/imgcreate/fs.py#L123-L148

This appears to be a bug with e2fsprogs' fsck within resize2fs. Reassigning to e2fsprogs.

Comment 5 Neal Gompa 2018-07-26 21:41:06 UTC
(In reply to Neal Gompa from comment #4)
> I've confirmed that this is caused by the e2fsprogs update. However, there
> have been changes to the function this error comes from in over a year:
> https://github.com/livecd-tools/livecd-tools/blob/
> e032bbea873b7095330b6b02d3711b7c713a4d6d/imgcreate/fs.py#L123-L148
> 

Sorry, I mean there have been _no_ changes to the function in over a year.

Comment 6 Eric Sandeen 2018-07-26 22:31:06 UTC
(In reply to Scott Dowdle from comment #1)

I hear you that e2fsprogs version seems to be the culprit, but:

> Unmounting directory /var/tmp/imgcreate-xy8h7_2l/install_root failed, using
> lazy umount
> Unmounting directory /var/tmp/imgcreate-xy8h7_2l/install_root failed, using
> lazy umount
> lazy umount succeeded on /var/tmp/imgcreate-xy8h7_2l/install_root
> Error creating Live CD : fsck after resize returned an error (1)!

If the lazy umount fails and the device is still mounted while fsck runs, inconsistencies would be expected.  I can't tell for sure whether the lazy umount failure is related to the fsck failure, though.

Can you gather more info, such as what exactly was being fscked?

And find the spot in the scripts where live cd creator invokes resize, as well as the fsck after the resize, and generate an e2image prior to each of these steps?

It's invoked like:

# e2image -Q /dev/whatever livecd.qcow2
or
# e2image -Q /path/to/image livecd.qcow2
if it's a filesystem image

-Q means "use qcow2", 2nd arg is the filesystem (or fs image) and 3rd arg is the filename to save it to.

Then we can look at the image before & after the resize as well as what fsck is finding.

Thanks,
-Eric

Comment 7 Scott Dowdle 2018-07-26 23:03:11 UTC
I'm pretty sure prior to the update it was giving the same lazy umount message.  I have no idea what is causing that but I don't think that is a new issue.

So far as doing what you said to test it, I'm not sure how to go about doing that but I'm guessing maybe Neal could modify the livecd-creator source as needed... if he is willing.

Comment 8 Neal Gompa 2018-07-27 01:17:20 UTC
(In reply to Scott Dowdle from comment #7)
> I'm pretty sure prior to the update it was giving the same lazy umount
> message.  I have no idea what is causing that but I don't think that is a
> new issue.
>

Yes, we always had those messages. We just ignore them.
 
> So far as doing what you said to test it, I'm not sure how to go about doing
> that but I'm guessing maybe Neal could modify the livecd-creator source as
> needed... if he is willing.

If a specific modification is needed to test the behavior, I can certainly provide a modified build accordingly. But at least in this case, the code is straightforward.

(In reply to Eric Sandeen from comment #6)
> 
> And find the spot in the scripts where live cd creator invokes resize, as
> well as the fsck after the resize, and generate an e2image prior to each of
> these steps?
> 
> It's invoked like:
> 
> # e2image -Q /dev/whatever livecd.qcow2
> or
> # e2image -Q /path/to/image livecd.qcow2
> if it's a filesystem image
> 
> -Q means "use qcow2", 2nd arg is the filesystem (or fs image) and 3rd arg is
> the filename to save it to.
> 


I linked the invocation of resize2fs in the creator in comment 4.

However, here it is again: https://github.com/livecd-tools/livecd-tools/blob/e032bbea873b7095330b6b02d3711b7c713a4d6d/imgcreate/fs.py#L123-L148

Comment 9 Jeremiah 2018-07-27 01:41:00 UTC
Getting the same error..

Unmounting directory /home/jsummers/Play/korora/kp/build/release/tmp/imgcreate-ead2_n9k/install_root failed, using lazy umount
Error creating Live CD : fsck after resize returned an error (1)!

Doing a dnf downgrade e2fsprogs fixed my issue as well.

Comment 10 Scott Dowdle 2018-07-27 15:05:05 UTC
> Can you gather more info, such as what exactly was being fscked?
> 
> And find the spot in the scripts where live cd creator invokes resize, as
> well as the fsck after the resize, and generate an e2image prior to each of
> these steps?

I don't think we are really sure what you are wanting.

Comment 11 Eric Sandeen 2018-07-27 15:07:36 UTC
e2image creates a metadata image of the filesystem.

So after it gets unmounted, and before livecd-creator does resize, gather that image.
Then post-resize, gather another image.

We can look at both images and see what the corruption is (in the latter image) and recreate it to debug resize2fs (using the former image).
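
A rough sketch of how those two captures could be scripted, using only the `e2image -Q <filesystem> <output>` form given in comment 6. The helper names (`e2image_cmd`, `capture`) and the output file names are hypothetical, and this is not code from livecd-tools; where exactly to call it around the resize in fs.py is left open.

```python
import subprocess


def e2image_cmd(filesystem, output):
    """Build the e2image command line described in comment 6."""
    # -Q writes the metadata image in qcow2 format.
    return ["e2image", "-Q", filesystem, output]


def capture(filesystem, output, run=subprocess.check_call):
    """Save a metadata image of `filesystem` to `output`.

    `run` is injectable so the command can be inspected without executing it.
    """
    cmd = e2image_cmd(filesystem, output)
    run(cmd)
    return cmd
```

One would call `capture(image_path, "before-resize.qcow2")` after the unmount and before the resize, then `capture(image_path, "after-resize.qcow2")` once the resize has finished, where `image_path` is whatever filesystem image livecd-creator is working on.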

Comment 12 Scott Dowdle 2018-07-27 21:14:01 UTC
Oh, I didn't previously mention... but this is also broken in F27... but really, is there anyone remixing F27 at this point? Well, besides me?

I tried rolling back on one system and it did not fix the problem but I have more testing to do.

Comment 13 Scott Dowdle 2018-07-27 21:43:23 UTC
Hmm, now on one of the systems that was fixed by rolling back... it is broken again.

I'm starting to believe e2fsprogs isn't the problem.  Also noticed these hints by user JMiahMan in the #korora channel on the Freenode IRC network:

<JMiahMan> I put a input("Press Enter to continue...") in fs.py to give me a chance to examine the disk image before it runs resize
<JMiahMan> https://thepasteb.in/p/lOhO8kZQpvKfB
<JMiahMan> Pharaoh_Atem: the images are telling me to run a 'e2fsck -f ext3fs.img' first.
<JMiahMan> looks like it got a bit wonky
<JMiahMan> I'll rerun and replace it with a disk image that I have ran fsck on and see if it finishes properly
<JMiahMan> two tests no.. the pause until enter actually seems to fix it.. huh I wonder if something isn't finished or synced when it starts the resize

Will hopefully have more time this weekend for additional testing / scenarios.

Comment 14 Scott Dowdle 2018-07-27 22:55:32 UTC
I added an input statement to the file mentioned in comment 13 (/usr/lib/python3.6/site-packages/imgcreate/fs.py).  It effectively adds a pause of undetermined duration, since I don't usually notice right away that it is waiting for input.  That file is part of the python3-imgcreate package.  Who knows if that file or package is even the culprit?!?

In any event, I upgraded to the latest and greatest e2fsprogs and used the same workaround and I'm able to build .isos just fine... BUT I've only done this on one system a small number of times.  I'm guessing I need to do some more due diligence and gather more data.  Hopefully will have a little more to report by the end of the weekend.
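
For anyone who wants the pause without having to press Enter, something like the following might stand in for the input statement. It is only a sketch under assumptions: the function name `settle_before_resize` and the 5 second default are made up, and the `os.sync()` call is a guess prompted by the "not finished or synced" remark in comment 13. Nothing in this bug establishes that a sync or a fixed delay addresses the real cause.

```python
import os
import time


def settle_before_resize(delay_seconds=5.0, sleep=time.sleep):
    """Flush pending writes, then wait a fixed time before the resize step.

    `sleep` is injectable so the delay can be skipped when testing.
    """
    # Ask the kernel to write out dirty filesystem buffers (Unix only).
    os.sync()
    # Fixed wait standing in for the interactive "Press Enter" pause.
    sleep(delay_seconds)
    return delay_seconds
```

It would be called at the same spot in fs.py where the input statement was added, just before the resize is started.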

Comment 15 Scott Dowdle 2018-07-28 14:19:58 UTC
Tried the pause workaround on a system at home... and it works fine... again even after upgrading to the latest e2fsprogs.

Comment 16 Lukáš Czerner 2018-08-03 07:51:31 UTC
Hi Scott,

sorry for the late response, I've been really busy lately. Is there a quick and easy way to reproduce it?

I already cloned the livecd-tools git and I can modify it myself to get some debugging out of it. I am just not really sure how to easily reproduce it.

Thanks!
-Lukas

Comment 17 Scott Dowdle 2018-08-07 02:06:08 UTC
Just try to use livecd-creator to build an iso.  You can install the spin-kickstarts and fedora-kickstarts packages, which provide a lot of kickstarts... those used by the Fedora Project to build their release products.  The respin SIG uses the kickstarts to build the periodically updated isos they provide.

The pykickstart package provides the ksflatten command which can be used to collapse all of the related/included .ks files into a single .ks file.

Basically there are two tools provided for building live isos: 1) livecd-creator and 2) livemedia-creator.  I prefer livecd-creator because it can use a cache directory flag and not have to download everything every build.  livecd-creator also understands the include statement and is more flexible with repo references.

Here's an example of using it:

livecd-creator \
  --cache=/root/livecd-creator/package-cache \
  -c MontanaLinux-F28-x86_64.ks \
  -f MontanaLinux-F28-012-x86_64

I'm able to reproduce the issue every time I try to build... although it does take a while as it successfully goes through a number of time-consuming steps before it hits the place where it fails.

If you have any additional questions, let me know.

Comment 18 Lukáš Czerner 2018-08-09 12:35:16 UTC
Just tried this:

ksflatten -c /usr/share/spin-kickstarts/fedora-live-base.ks > fedora-live-base.ks
livecd-creator --cache=/home/lczerner/test/cache/ -c fedora-live-base.ks -f fedora-live-base

and it worked with no problem; the image built without complaints.

So I downloaded stuff from 
http://img.cs.montana.edu/linux/montanalinux/config/f28/

and did 

livecd-creator --cache=/home/lczerner/test/cache/ -c MontanaLinux-F28-x86_64.ks -f MontanaLinux-F28-012-x86_64

...
139 man subdirectories contained newer manual pages.
11122 manual pages were added.
0 stray cats were added.
0 old database entries were purged.
cp: cannot stat '/root/livecd-creator/MontanaLinux/.tmux.conf': No such file or directory
cp: cannot stat '/root/livecd-creator/MontanaLinux/.tmux.conf': No such file or directory
cp: cannot stat '/root/livecd-creator/MontanaLinux': No such file or directory
ignoring %post failure (code 1)
/ 100.0%

I: -input-charset not specified, using utf-8 (detected in locale settings)
Size of boot image is 4 sectors -> No emulation
Size of boot image is 10516 sectors -> No emulation
Size of boot image is 23104 sectors -> No emulation
  0.27% done, estimate finish Thu Aug  9 14:31:53 2018
  0.54% done, estimate finish Thu Aug  9 14:31:53 2018
  0.82% done, estimate finish Thu Aug  9 14:31:53 2018
  1.09% done, estimate finish Thu Aug  9 14:31:53 2018
  1.36% done, estimate finish Thu Aug  9 14:31:53 2018
...
 99.93% done, estimate finish Thu Aug  9 14:32:11 2018
Total translation table size: 2048
Total rockridge attributes bytes: 2973
Total directory bytes: 10240
Path table size(bytes): 78
Max brk space used 1c000
1841330 extents written (3596 MB)
...
                          
and again it worked. There were some complaints about missing .tmux.conf files, but other than that the image built. No fsck problems at all. Am I missing something? For some reason I can't reproduce it.

Name         : livecd-tools
Epoch        : 1
Version      : 25.0
Release      : 6.fc28


Name         : e2fsprogs
Version      : 1.44.2
Release      : 0.fc28

kernel 4.17.12-200.fc28.x86_64

-Lukas

Comment 19 Scott Dowdle 2018-08-09 12:53:58 UTC
Yeah, the missing files are not provided by packages and since you didn't copy those, they are expected to be missing.

I was able to duplicate it with stock kickstarts at the time of the bug report, and others were able to duplicate it too... but as we discovered over time, it seemed to be timing related, and the workaround was just to put a pause in there... without really knowing where the problem lies specifically.

So, given the fact that Fedora has a firehose of updates and a lot has changed since the initial bug report, it is totally possible that something has subtly affected the timing making it work.  I know that is a lot of hand waving but it is the best I can do.

I'll see if I can duplicate it working again... sometime later today and report back my findings.

Comment 20 Lukáš Czerner 2018-08-09 13:51:24 UTC
Yeah, it does look to be timing sensitive. I'll see if I can reproduce it as well.

Thanks!
-Lukas

Comment 21 Scott Dowdle 2018-08-13 17:54:29 UTC
I was unable to reproduce it this weekend after 7 different builds... so I believe this bug can be closed now... and hopefully we won't run into it again anytime soon.
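
Since the failure looked timing dependent, repeated builds like the seven above could be counted with a small loop along these lines. This is a sketch, not something used in this bug: `count_failures` is a hypothetical helper, and it assumes livecd-creator exits with a non-zero status when it prints the "fsck after resize" error, which was not verified here.

```python
import subprocess


def count_failures(cmd, attempts, run=subprocess.call):
    """Run `cmd` `attempts` times and return how many runs exited non-zero.

    `run` is injectable so the loop can be exercised without a real build.
    """
    failures = 0
    for _ in range(attempts):
        # subprocess.call returns the command's exit status.
        if run(cmd) != 0:
            failures += 1
    return failures
```

For example, `count_failures(["livecd-creator", "--cache=/root/livecd-creator/package-cache", "-c", "MontanaLinux-F28-x86_64.ks", "-f", "MontanaLinux-F28-012-x86_64"], 7)` would repeat the build from comment 17 seven times.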

Comment 22 Lukáš Czerner 2018-08-14 06:56:01 UTC
Fair enough, please reopen this if you ever see it again.

Thanks!
-Lukas

