Bug 748708

Summary: treebuilder: "mkfs.ext4 -L Anaconda ..." returned non-zero exit status 127
Product: [Fedora] Fedora
Reporter: John Reiser <jreiser>
Component: e2fsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: anaconda-maint-list, bcl, esandeen, josef, kzak, oliver
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-11-01 19:17:29 UTC
Type: ---
Attachments:
Description                                         Flags
"strace --trace=file" beginning at the mkfs.ext4    none

Description John Reiser 2011-10-25 05:44:48 UTC
Description of problem: Composing an install DVD of rawhide (to be Fedora 17) for x86_64 fails during "creating the runtime image" with a complaint about mkfs.ext4.


Version-Release number of selected component (if applicable):
lorax-17.0   7f8a5e48b30a02afd6f9a71274a09ba73f892d60  (HEAD of treebuilder branch as of 2011-10-24 15:49:39)

How reproducible: both times


Steps to Reproduce:
1. /usr/bin/pungi -c fedora-install-fedora.ks \
        --destdir=$DESTDIR --name Fedora --ver $VERSION --nosource
   designating rawhide (which will eventually become Fedora 17)

Actual results:
   [snip]
creating the runtime image
Traceback (most recent call last):
  File "/usr/bin/pungi", line 222, in <module>
    main()
  File "/usr/bin/pungi", line 124, in main
    mypungi.doBuildinstall()
  File "/usr/lib/python2.7/site-packages/pypungi/__init__.py", line 845, in doBuildinstall
    workdir=workdir, outputdir=outputdir)
  File "/usr/lib/python2.7/site-packages/pylorax/__init__.py", line 225, in run
    compression=compression, compressargs=compressargs)
  File "/usr/lib/python2.7/site-packages/pylorax/treebuilder.py", line 136, in create_runtime
    label="Anaconda", size=fssize)
  File "/usr/lib/python2.7/site-packages/pylorax/imgutils.py", line 220, in mkext4img
    mkfsargs=["-L", label, "-b", "1024", "-m", "0"], graft=graft)
  File "/usr/lib/python2.7/site-packages/pylorax/imgutils.py", line 207, in mkfsimage
    stdout=PIPE, stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mkfs.ext4', '-L', 'Anaconda', '-b', '1024', '-m', '0', '/dev/loop0']' returned non-zero exit status 127


Expected results: success


Additional info:
cpio-2.11-5.fc17.x86_64
device-mapper-1.02.67-1.fc17.x86_64
dosfstools-3.0.11-5.fc16.x86_64
e2fsprogs-1.42-0.5.WIP.1016.fc17.x86_64
findutils-4.5.10-1.fc16.x86_64
gawk-4.0.0-1.fc16.x86_64
GConf2-3.2.0-1.fc16.x86_64
genisoimage-1.1.11-8.fc16.x86_64
glibc-2.14.90-13.x86_64
glibc-common-2.14.90-13.x86_64
gzip-1.4-3.fc15.x86_64
hfsplus-tools-332.14-12.fc15.x86_64
isomd5sum-1.0.7-1.fc16.x86_64
module-init-tools-3.16-3.fc17.x86_64
package kernel-bootwrapper is not installed
package silo is not installed
package util-linux-ng is not installed
package yaboot is not installed
parted-3.0-3.fc17.x86_64
python-mako-0.4.2-2.fc17.noarch
squashfs-tools-4.2-1.fc16.x86_64
syslinux-4.02-5.fc17.x86_64
xz-5.1.1-1alpha.fc17.x86_64

Comment 1 John Reiser 2011-10-25 15:40:24 UTC
Created attachment 530130 [details]
"strace --trace=file" beginning at the mkfs.ext4

This is what "strace -f --trace=file" reveals about the mkfs.ext4, continuing until the end of the whole pungi run.

Comment 2 John Reiser 2011-10-25 18:27:19 UTC
Problem avoided by reverting from
    e2fsprogs-1.42-0.5.WIP.1016.fc17.x86_64
to
    e2fsprogs-1.41.14-2.fc15.x86_64
Yum is inadequate for this task; instead I used "rpm --install --force --oldpackage e2fsprogs-1.41.14-2.fc15.x86_64.rpm e2fsprogs-libs-1.41.14-2.fc15.x86_64.rpm".

Also noted from syslog (/var/log/messages):
Oct 25 10:59:40 f16a64 kernel: [ 8758.127527] mkfs.ext4[16843]: segfault at 0 ip           (null) sp 00007fffabf54168 error 14 in mkfs.ext4[400000+14000]
   [snip]
Oct 25 10:59:40 f16a64 abrtd: Directory 'ccpp-2011-10-25-10:59:40-16843' creation detected
Oct 25 10:59:40 f16a64 abrt[16845]: saved core dump of pid 16843 (/sbin/mkfs.ext4) to /var/spool/abrt/ccpp-2011-10-25-10:59:40-16843 (491520 bytes)
Oct 25 10:59:40 f16a64 abrtd: Package 'e2fsprogs' isn't signed with proper key
Oct 25 10:59:40 f16a64 abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2011-10-25-10:59:40-16843 (res:2), deleting


Will try to re-assign to e2fsprogs.

abrt-libs-2.0.4.981-3.fc16.x86_64
abrt-addon-python-2.0.4.981-3.fc16.x86_64
abrt-addon-ccpp-2.0.4.981-3.fc16.x86_64
abrt-desktop-2.0.4.981-3.fc16.x86_64
abrt-addon-vmcore-2.0.4.981-3.fc16.x86_64
abrt-addon-kerneloops-2.0.4.981-3.fc16.x86_64
abrt-gui-2.0.4.981-3.fc16.x86_64
abrt-retrace-client-2.0.4.981-3.fc16.x86_64
abrt-2.0.4.981-3.fc16.x86_64

kernel-3.1.0-0.rc10.git0.1.fc16.x86_64 #1 SMP Wed Oct 19 05:02:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Comment 3 Eric Sandeen 2011-10-25 18:35:27 UTC
> Oct 25 10:59:40 f16a64 abrt[16845]: saved core dump of pid 16843
> (/sbin/mkfs.ext4) to /var/spool/abrt/ccpp-2011-10-25-10:59:40-16843 (491520
> bytes)

Is that core dump still around?  Otherwise I'll try to reproduce, thanks!

-Eric

Comment 4 Eric Sandeen 2011-10-25 18:43:25 UTC
Testing directly worked ok:

> subprocess.CalledProcessError: Command '['mkfs.ext4', '-L', 'Anaconda', '-b',
> '1024', '-m', '0', '/dev/loop0']' returned non-zero exit status 127

[root@inode e2fsprogs]# misc/mkfs.ext4 -L Anaconda -b 1024 -m 0 /dev/loop0
mke2fs 1.42-WIP (16-Oct-2011)
Filesystem label=Anaconda
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
32768 inodes, 131072 blocks
0 blocks (0.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
16 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks: 
	8193, 24577, 40961, 57345, 73729

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done 

[root@inode e2fsprogs]#

Comment 5 Eric Sandeen 2011-10-25 18:53:29 UTC
oh:

> /var/spool/abrt/ccpp-2011-10-25-10:59:40-16843 (res:2), deleting

crud.

Comment 6 Eric Sandeen 2011-10-25 18:58:53 UTC
Can you look at

/etc/abrt/abrt-action-save-package-data.conf

and set:

OpenGPGCheck = no

and do it again, to collect the core dump?

Thanks,
-Eric

Comment 7 John Reiser 2011-10-25 19:55:52 UTC
There is no coredump this time!

I set OpenGPGCheck = no.  I ran "rpm --erase --force e2fsprogs e2fsprogs-libs", then installed e2fsprogs-1.42-0.5.WIP.1016.fc17.x86_64 and the corresponding e2fsprogs-libs.  Then I re-ran the pungi+lorax(treebuilder) compose for both Fedora 16 (branched+updates+updates-testing) and Fedora 17 (rawhide).

I get the same higher-level error in both cases:
     subprocess.CalledProcessError: Command '['mkfs.ext4', '-L', 'Anaconda', '-b', '1024', '-m', '0', '/dev/loop0']' returned non-zero exit status 127
but syslog (/var/log/messages) has no mention of abrt for mkfs.ext4, only for:
     Oct 25 12:46:08 f16a64 pungi: abrt: detected unhandled Python exception in /usr/bin/pungi

Manually looking in /var/spool/abrt, I see no relevant subdirectories.  It's a mystery to me, too.

Comment 8 Eric Sandeen 2011-10-25 22:15:07 UTC
Hrmph.  ;)

Well shoot, I'm not sure how to proceed.

maybe

# ulimit -c unlimited

to allow coredumps?  I didn't think abrt required that though.

(Did you still get the segfault message from the kernel?)

Or maybe I can reproduce it; I don't know for sure what is different in my case, where it's working.  It's running mkfs on the loop device, so the failure may be size-dependent.  If you can grab /proc/partitions while it runs, that will tell me what size to test with.

-Eric
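
A minimal sketch of reading a device size out of /proc/partitions, in case it helps when capturing the loop device size during a compose. The function name partition_sizes and the sample rows are made up for illustration; on a real system the text would come from reading /proc/partitions while mkfs.ext4 is running. The #blocks column is in 1 KiB units.

```python
def partition_sizes(text):
    """Map device name -> size in bytes from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks (1 KiB units), name.
        # The header row also has four fields, but its third is not a number.
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2]) * 1024
    return sizes


# Hypothetical sample content; these rows are not taken from the reporter's machine.
SAMPLE = """\
major minor  #blocks  name

   7        0     131072 loop0
   8        0  488386584 sda
"""

sizes = partition_sizes(SAMPLE)
print("loop0 is %d bytes" % sizes["loop0"])
```

To use it against the live system, pass open("/proc/partitions").read() instead of SAMPLE.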

Comment 9 John Reiser 2011-10-26 01:26:27 UTC
Here's a related-but-different tidbit.  While I had e2fsprogs-1.42-0.5.WIP.1016.fc17.x86_64 installed, I did a "yum update" which pulled in kernel-3.1.0-1.fc16.  [Yes, fc16: I'm trying to run only the anaconda/lorax/dracut/necessary packages from rawhide fc17.]  Booting that kernel failed because dracut aborted:
   e2fsck: symbol lookup error: e2fsck: undefined symbol: set_com_err_gettext
Evidently the initramfs for 3.1.0-1 was built with e2fs stuff that was broken.

So I went back to the previous kernel, re-instated the old e2fsprogs-1.41.14-2.fc15.x86_64 and -libs, erased the new kernel package, re-updated the new kernel package, and rebooted.  This time it booted correctly.

That may be enough excitement for a while, or until I can get a rawhide-only setup in which I have some confidence.

Comment 10 Eric Sandeen 2011-10-26 02:08:09 UTC
I think that you have your packages munged up.

"rpm --erase --force ... " gave me the heebie-jeebies a few comments back ;)

libcom_err is a separate package built from the e2fsprogs srpm; all of the packages built from it should match versions.  I bet you have different versions floating around now.

If you have to fight yum and manually install, please at least make sure all of:

      e2fsprogs
      e2fsprogs-debuginfo
      e2fsprogs-devel
      e2fsprogs-libs
      e2fsprogs-static
      libcom_err
      libcom_err-devel
      libss
      libss-devel

(or at least as many as are installed) are in sync version-wise.
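
The check described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names installed_versions and distinct_versions are hypothetical, FAMILY is a subset of the list above, and the sample data mixes the two e2fsprogs versions mentioned earlier in this bug rather than reflecting any real system. It relies on "rpm -q --queryformat", which exits non-zero for a package that is not installed.

```python
import subprocess

# A subset of the package names listed above.
FAMILY = ["e2fsprogs", "e2fsprogs-libs", "libcom_err", "libss"]


def installed_versions(names):
    """Query rpm for each name; return {name: [version-release, ...]}."""
    found = {}
    for name in names:
        proc = subprocess.Popen(
            ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}\\n", name],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, _ = proc.communicate()
        if proc.returncode == 0:  # skip packages that are not installed
            found[name] = out.decode("utf-8", "replace").split()
    return found


def distinct_versions(found):
    """Return the set of all version-release strings seen across packages."""
    return set(v for versions in found.values() for v in versions)


# Hypothetical result, shaped like what installed_versions(FAMILY) returns.
sample = {
    "e2fsprogs": ["1.42-0.5.WIP.1016.fc17"],
    "e2fsprogs-libs": ["1.41.14-2.fc15"],
}
versions = distinct_versions(sample)
if len(versions) > 1:
    print("version mismatch:", sorted(versions))
```

On an rpm-based system, distinct_versions(installed_versions(FAMILY)) should contain exactly one entry when the packages are in sync.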

Comment 11 Eric Sandeen 2011-11-01 18:59:40 UTC
Is this still happening?  Does it happen on a non-hacked-up set of rpms? :)

Comment 12 John Reiser 2011-11-01 19:17:29 UTC
At the surface level, this was my user error: incorrectly swapping versions of e2fsprogs.  However, the toolset contributed to my confusion.  lorax(treebuilder) did not report the actual error precisely (an external library symbol referenced by mkfs.ext4 was not defined).  Yum does not provide a way to swap versions of a package, "holding" the dependencies while removing one version and installing the other.  "yum upgrade" works in the direction of increasing version number from a repository, but there is no 'localupgrade' or 'localdowngrade' for a "bare" .rpm file.  Yum also does not have 'localreinstall'.
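
For illustration, a minimal sketch of how a caller could surface the child's own error text. The traceback in the description shows check_call being invoked with stdout=PIPE, stderr=PIPE, so anything mkfs.ext4 wrote to stderr was never read, and only the exit status reached the user. The helper name run_reporting_stderr is hypothetical and this is not pylorax's code; the child process below is a stand-in that prints a made-up message, not mkfs.ext4.

```python
import subprocess
import sys


def run_reporting_stderr(cmd):
    """Run cmd; on failure, raise an error that includes the child's stderr."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # reads both pipes, so stderr is not lost
    if proc.returncode != 0:
        detail = err.decode("utf-8", "replace").strip()
        raise RuntimeError(
            "%r exited with status %d: %s"
            % (cmd, proc.returncode, detail or "(nothing on stderr)"))
    return out


# Stand-in child: writes a made-up message to stderr and exits with status 127.
child = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('symbol lookup error (made-up text)\\n'); "
    "sys.exit(127)",
]
try:
    run_reporting_stderr(child)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

With the child's stderr included in the exception text, an undefined-symbol failure would be visible at the point where the compose stops.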

Comment 13 Eric Sandeen 2011-11-01 19:23:44 UTC
Ok - where do you want to go with this bug? :)

Comment 14 Eric Sandeen 2011-11-01 19:28:57 UTC
Oh - you closed it!  My eyes...

Thanks!

-Eric