Bug 1339691 - supermin init segfaults when kernel has large modules
Summary: supermin init segfaults when kernel has large modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: supermin
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Richard W.M. Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-25 15:40 UTC by Luiz Capitulino
Modified: 2016-11-03 23:04 UTC (History)

Fixed In Version: supermin-5.1.16-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 23:04:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Run with debugging and tracing enabled (32.11 KB, text/plain), 2016-05-25 15:40 UTC, Luiz Capitulino
libguestfs-test-tool output (26.92 KB, text/plain), 2016-05-25 15:42 UTC, Luiz Capitulino


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2154 | 0 | normal | SHIPPED_LIVE | supermin bug fix update | 2016-11-03 13:13:19 UTC

Description Luiz Capitulino 2016-05-25 15:40:46 UTC
Created attachment 1161488
Run with debugging and tracing enabled

Description of problem:

I installed a RHEL 7.2 guest with virt-install and a kickstart file. Then I tried to copy a file into the guest:

[root@virtlab508 guest-automated-tests]# virt-copy-in -d kvm-rt-guest run-test.sh /root/bin
libguestfs: error: appliance closed the connection unexpectedly.
This usually means the libguestfs appliance crashed.
Do:
  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
and run the command again.  For further information, read:
  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
You can also run 'libguestfs-test-tool' and post the *complete* output
into a bug report or message to the libguestfs mailing list.
libguestfs: error: guestfs_launch failed.
This usually means the libguestfs appliance failed to start or crashed.
Do:
  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
and run the command again.  For further information, read:
  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
You can also run 'libguestfs-test-tool' and post the *complete* output
into a bug report or message to the libguestfs mailing list.
[root@virtlab508 guest-automated-tests]# 

My host system is the latest RHEL 7.3 and is set up for real-time. This problem started happening yesterday, possibly after a system update. Before that, virt-copy-in and virt-copy-out had been working for several days without problems.


Version-Release number of selected component (if applicable): libguestfs-tools-1.32.4-1.el7.noarch


How reproducible:


Steps to Reproduce:
1. Install a RHEL7.2 guest
2. Try to use virt-copy-in

Comment 1 Luiz Capitulino 2016-05-25 15:42:36 UTC
Created attachment 1161489
libguestfs-test-tool output

Comment 2 Richard W.M. Jones 2016-05-25 15:48:39 UTC
Interestingly the supermin init process segfaults:

[    0.966816] init[1]: segfault at 7ffd1319b278 ip 0000000000400f06 sp 00007ffd1319b280 error 6 in init[400000+c3000]

Is this supermin from the RHEL 7.3 preview repository?

It's probably segfaulting in this loop, or just after this loop:

https://github.com/libguestfs/supermin/blob/master/init/init.c#L132
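
The kernel log line quoted above can be decoded mechanically. The following is a minimal sketch, assuming the usual Linux x86 "segfault at" message layout (hexadecimal fields) and the standard x86 page-fault error bits (bit 0: page present, bit 1: write access, bit 2: user mode). The function name decode_segfault is made up for this illustration and is not part of supermin or libguestfs.

```python
import re

# Layout of the Linux x86 "segfault at ..." log line; all fields are hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Extract the fault address and stack pointer, and decode the error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault log line")
    addr = int(m.group("addr"), 16)
    sp = int(m.group("sp"), 16)
    err = int(m.group("err"), 16)
    return {
        "fault_addr": addr,
        "sp": sp,
        "bytes_below_sp": sp - addr,      # positive: fault is below the stack pointer
        "page_present": bool(err & 1),    # bit 0
        "write": bool(err & 2),           # bit 1
        "user_mode": bool(err & 4),       # bit 2
    }


# The line from this comment, copied verbatim.
LINE = ("[    0.966816] init[1]: segfault at 7ffd1319b278 ip 0000000000400f06 "
        "sp 00007ffd1319b280 error 6 in init[400000+c3000]")
info = decode_segfault(LINE)
print(info)
```

For this line the fault is a user-mode write to an unmapped page 8 bytes below the stack pointer. That pattern fits a stack that has grown past its limit, though the log line alone does not prove it.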

Comment 4 Luiz Capitulino 2016-05-25 15:58:06 UTC
Yes, package version is: supermin5-5.1.16-1.el7.x86_64

Is there a quick recipe for me to run supermin from sources and debug this issue?

Comment 5 Richard W.M. Jones 2016-05-25 16:11:44 UTC
I believe it's crashing here:

  https://github.com/libguestfs/supermin/blob/master/init/init.c#L320

I'm very unclear why.

You can run your own version of supermin by checking out:

  https://github.com/libguestfs/supermin

and running the test suite:

  sudo yum-builddep supermin
  ./autogen.sh
  make
  make check

The binary which fails is built from init/init.c.  However, you can't
really run the binary on its own since it's an early boot process.
You can insert printf calls into it, though.

Comment 7 Richard W.M. Jones 2016-05-25 16:18:01 UTC
Sorry the above comment is wrong.  The supermin test suite doesn't
actually run the init code.

Testing this is tricky.  Probably the best thing would be to
copy the supermin binary (src/supermin) over /usr/bin/supermin5
and then run libguestfs-test-tool.

The path to supermin is hard-coded into libguestfs-test-tool; it
is not possible to change it through environment variables etc.

Comment 8 Richard W.M. Jones 2016-05-25 16:21:49 UTC
And also, you have to do:

  rm -rf /var/tmp/.guestfs-*
  libguestfs-test-tool

else it will continue to use the old cached appliance.

Comment 9 Richard W.M. Jones 2016-05-25 16:26:20 UTC
One thought - could it be that your kernel modules are unstripped
and therefore huge?  The way the code is currently written, the
whole module is stored on the stack.

I notice you're not running a RHEL kernel.

What would be useful would be to do:

  cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/init

and see if any of the kernel files (*.ko) is especially big.

Comment 10 Richard W.M. Jones 2016-05-25 16:27:01 UTC
That command should be:

cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/initrd

Comment 11 Richard W.M. Jones 2016-05-25 16:40:30 UTC
On the basis that this is likely to be a stack overflow, I posted
the following patch upstream:

https://www.redhat.com/archives/libguestfs/2016-May/msg00215.html

Waiting for confirmation from comment 10 before moving to POST.

Comment 12 Luiz Capitulino 2016-05-25 18:09:53 UTC
How big is huge? :)

I'm running the RHEL RT kernel, which will be an official kernel in the next RHEL release.

[root@virtlab508 tmp]# cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/initrd
drwxr-xr-x   2 root     root            0 May 25 10:46 .
-rw-r--r--   1 root     root       423958 May 25 10:46 ata_piix.ko
-rw-r--r--   1 root     root       110246 May 25 10:46 crc-ccitt.ko
-rw-r--r--   1 root     root       110038 May 25 10:46 crc-itu-t.ko
-rw-r--r--   1 root     root       218206 May 25 10:46 crc32-pclmul.ko
-rw-r--r--   1 root     root       211982 May 25 10:46 crc32.ko
-rw-r--r--   1 root     root       270966 May 25 10:46 crc32c-intel.ko
-rw-r--r--   1 root     root       111094 May 25 10:46 crc8.ko
-rw-r--r--   1 root     root       218894 May 25 10:46 crct10dif-pclmul.ko
-rw-r--r--   1 root     root     11056054 May 25 10:46 ext4.ko
-rwxr-xr-x   1 root     root       811064 May 25 10:46 init
-rw-r--r--   1 root     root      2090550 May 25 10:46 jbd2.ko
-rw-r--r--   1 root     root      3363558 May 25 10:46 libata.ko
-rw-r--r--   1 root     root       176198 May 25 10:46 libcrc32c.ko
-rw-r--r--   1 root     root       342286 May 25 10:46 mbcache.ko
-rw-r--r--   1 root     root          290 May 25 10:46 modules
-rw-r--r--   1 root     root       737334 May 25 10:46 sd_mod.ko
-rw-r--r--   1 root     root       169510 May 25 10:46 virtio-rng.ko
-rw-r--r--   1 root     root       301286 May 25 10:46 virtio_balloon.ko
-rw-r--r--   1 root     root       347726 May 25 10:46 virtio_blk.ko
-rw-r--r--   1 root     root       421950 May 25 10:46 virtio_console.ko
-rw-r--r--   1 root     root       241326 May 25 10:46 virtio_input.ko
-rw-r--r--   1 root     root       585798 May 25 10:46 virtio_net.ko
-rw-r--r--   1 root     root       709310 May 25 10:46 virtio_pci.ko
-rw-r--r--   1 root     root       384862 May 25 10:46 virtio_scsi.ko
45738 blocks
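
Picking the oversized modules out of a listing like the one above can be scripted. This is a minimal sketch, assuming the `cpio -itv` column layout shown in this comment (size in the fifth column, name in the last). The function name large_modules and the 1,000,000-byte threshold are arbitrary choices for illustration.

```python
def large_modules(listing, threshold):
    """Return (name, size) pairs for *.ko entries above threshold bytes, biggest first."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: mode, links, owner, group, size, month, day, time, name
        if len(fields) < 9 or not fields[-1].endswith(".ko"):
            continue
        found.append((fields[-1], int(fields[4])))
    big = [(name, size) for name, size in found if size > threshold]
    return sorted(big, key=lambda entry: entry[1], reverse=True)


# A few lines copied from the listing in this comment.
SAMPLE = """\
drwxr-xr-x   2 root     root            0 May 25 10:46 .
-rw-r--r--   1 root     root       423958 May 25 10:46 ata_piix.ko
-rw-r--r--   1 root     root     11056054 May 25 10:46 ext4.ko
-rwxr-xr-x   1 root     root       811064 May 25 10:46 init
-rw-r--r--   1 root     root      2090550 May 25 10:46 jbd2.ko
-rw-r--r--   1 root     root      3363558 May 25 10:46 libata.ko
-rw-r--r--   1 root     root          290 May 25 10:46 modules
45738 blocks
"""

for name, size in large_modules(SAMPLE, 1_000_000):
    print(name, size)
```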

Comment 14 Richard W.M. Jones 2016-05-25 18:12:50 UTC
(In reply to Luiz Capitulino from comment #12)
> How big is huge? :)

> -rw-r--r--   1 root     root     11056054 May 25 10:46 ext4.ko

It actually crashed while loading this module, and that's quite
big.  I'm guessing allocating 11MB of stack wasn't such a great
idea.

I'll build a new supermin shortly with the posted patch, stay tuned ...
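
As a rough sanity check on the numbers, the module size can be compared against a stack limit. This is only a sketch: the 8 MiB figure is an assumption (a common Linux default for RLIMIT_STACK), and the thread does not state the limit actually in effect for init inside the appliance.

```python
# Assumed limit: 8 MiB, a common Linux default for RLIMIT_STACK.
# The real limit inside the appliance is not given in this bug report.
STACK_LIMIT = 8 * 1024 * 1024


def exceeds_stack(module_size, stack_limit=STACK_LIMIT):
    """True if a buffer of module_size bytes cannot fit within stack_limit."""
    return module_size > stack_limit


EXT4_SIZE = 11056054    # ext4.ko, from the cpio listing in comment 12
LIBATA_SIZE = 3363558   # libata.ko, the next largest module in that listing

print(exceeds_stack(EXT4_SIZE))     # ext4.ko alone is larger than the assumed limit
print(exceeds_stack(LIBATA_SIZE))   # libata.ko would still fit
print(EXT4_SIZE - STACK_LIMIT)      # bytes over the assumed limit
```

Under that assumption, ext4.ko is the only module in the listing that is too large for the stack on its own, which matches the crash happening while that module was being loaded.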

Comment 16 Xianghua Chen 2016-05-26 09:58:00 UTC
I want to reproduce and verify this bug. Can you tell me how to allocate 11MB of stack to ext4.ko?
And about the "RHEL RT kernel", can you give detailed info, such as which compose version you are using?
I'd like to prepare the same environment as yours, so the more info the better. Thanks a lot!

Comment 17 Richard W.M. Jones 2016-05-26 10:07:05 UTC
(In reply to Xianghua Chen from comment #16)
> I want to reproduce and verify this bug. Can you tell me how to allocate
> 11MB of stack to ext4.ko?
> And about the "RHEL RT kernel", can you give detailed info, such as which
> compose version you are using?
> I'd like to prepare the same environment as yours, so the more info the
> better. Thanks a lot!

Installing Linux version 4.4.9-rt17+ (root.lab.eng.bos.redhat.com)
should be sufficient to reproduce this. However it's not a brew
kernel.  Maybe Luiz can help to locate that kernel for you.

Comment 18 Luiz Capitulino 2016-05-26 12:51:43 UTC
(In reply to Richard W.M. Jones from comment #17)

> Installing Linux version 4.4.9-rt17+
> (root.lab.eng.bos.redhat.com)
> should be sufficient to reproduce this. However it's not a brew
> kernel.

Oh, now I get why you said I wasn't running a RHEL kernel. It's true that I have that 4.4.9-rt17+ kernel installed, but I'm not running it:

[root@virtlab508 ~]# uname -r
3.10.0-408.rt56.290.el7.x86_64
[root@virtlab508 ~]# 

However, I might have run virt-copy-in once while running 4.4.9-rt17+. So, how does libguestfs decide which kernel to run?

> Maybe Luiz can help to locate that kernel for you.

Xianghua, I'll send you instructions by email on how to download this kernel from my test machine.

Comment 19 Richard W.M. Jones 2016-05-26 13:04:22 UTC
It chooses the highest numbered kernel from /boot to run the
appliance.  It doesn't matter what kernel you are running
on the host.

I don't think it's really important to be able to QE this bug.
It's an obvious bug and is covered by the rebase we are already
doing for supermin (ie. bug 1271255).

Comment 20 Pino Toscano 2016-05-26 13:06:25 UTC
(In reply to Luiz Capitulino from comment #18)
> However, I might have run virt-copy-in once while running 4.4.9-rt17+. So,
> how does libguestfs decide which kernel to run?

By default it picks the newest kernel found.  You can point supermin to the kernel to pick with the SUPERMIN_KERNEL environment variable; see:
http://libguestfs.org/supermin.1.html#ENVIRONMENT-VARIABLES
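
Pinning the appliance kernel from a script could look roughly like the sketch below. SUPERMIN_KERNEL and the /var/tmp/.guestfs-* cache location come from this thread. The helper names and the kernel path are hypothetical, and the supermin man page linked above should be checked for companion variables that may need to be set alongside it.

```python
import glob
import os


def appliance_env(kernel_path, base_env=None):
    """Return a copy of the environment with SUPERMIN_KERNEL pointing at kernel_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["SUPERMIN_KERNEL"] = kernel_path
    return env


def cached_appliances(tmp_root="/var/tmp"):
    """List cached appliance directories (.guestfs-*) under tmp_root."""
    return sorted(glob.glob(os.path.join(tmp_root, ".guestfs-*")))


# Hypothetical kernel path, built from the `uname -r` output quoted in comment 18.
env = appliance_env("/boot/vmlinuz-3.10.0-408.rt56.290.el7.x86_64",
                    base_env={"PATH": "/usr/bin"})
print(env["SUPERMIN_KERNEL"])

# Directories that would have to be removed so the cached appliance is rebuilt.
print(cached_appliances())
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching libguestfs-test-tool, after removing the listed cache directories (for example with shutil.rmtree).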

Comment 21 Luiz Capitulino 2016-05-26 13:15:50 UTC
Is there a good reason not to pick the kernel the host is running? What if the newest or the highest installed kernel is broken?

Note that I'm not saying 4.4.9-rt17+ was broken. It was an upstream debugging kernel, so I'm not surprised the modules were big. And I'm happy the bug is fixed, because I plan to run the test scripts I'm writing against upstream kernels.

Comment 22 Richard W.M. Jones 2016-05-26 13:39:22 UTC
When supermin was a shell script, it worked by running this
precise command:

 ls -1dvr /boot/vmlinuz*.$arch* | grep -v xen | head -1

The -v -r parameters sort by version in reverse so this
picks the highest version.  The current code isn't a shell
script but it does exactly the same thing.  So that's the
reason - choosing the running kernel is more work.

Also I guess the running kernel might not exist in /boot
(although that would be an unusual situation, usually prevented
in most distros).  As Pino mentioned if the choice of kernel
for the appliance really matters then you can set it using
the various SUPERMIN_* options, but don't forget to delete
the cache under /var/tmp/.guestfs-* otherwise the existing
cached kernel will continue to be reused.
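
The selection rule described above (match vmlinuz*.$arch*, drop anything containing "xen", take the highest version) can be sketched as follows. This is an approximation, not supermin's actual code: it works on bare file names, and it uses a simple natural sort, which can differ from GNU `ls -v` ordering in edge cases. The function names and the candidate list are made up for illustration.

```python
import fnmatch
import re


def version_key(name):
    """Natural-sort key: runs of digits compare as integers, the rest as text."""
    parts = re.split(r"(\d+)", name)
    # With one capture group, odd indices hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def pick_kernel(names, arch):
    """Mimic `ls -1dvr vmlinuz*.$arch* | grep -v xen | head -1` on file names."""
    pattern = "vmlinuz*.%s*" % arch
    candidates = [n for n in names
                  if fnmatch.fnmatchcase(n, pattern) and "xen" not in n]
    return max(candidates, key=version_key) if candidates else None


# Hypothetical /boot contents; two of the versions appear elsewhere in this bug.
NAMES = [
    "vmlinuz-3.9.5.x86_64",
    "vmlinuz-3.10.0-229.el7.x86_64",
    "vmlinuz-3.10.0-408.rt56.290.el7.x86_64",
    "vmlinuz-4.0.0.x86_64.xen",
    "vmlinuz-5.0.0.aarch64",
]
print(pick_kernel(NAMES, "x86_64"))
```

Note that the running kernel never enters into the choice, which is the behaviour questioned in comment 21.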

Comment 23 Luiz Capitulino 2016-05-26 14:03:49 UTC
IMO, supermin should use the kernel the host is running as a hint and try that one first. This shouldn't be hard to do.

This BZ should be enough evidence that picking the highest-numbered kernel is not a good design decision. My kernel was a test kernel with custom patches; I was just lucky it didn't blow up at boot.

Comment 24 Richard W.M. Jones 2016-05-27 10:11:46 UTC
Discussion moved to
https://www.redhat.com/archives/libguestfs/2016-May/msg00234.html

Comment 25 Xianghua Chen 2016-05-31 02:03:26 UTC
Thank you all for all the information.
I have reproduced it successfully with supermin5-5.1.16-1.el7.x86_64 and verified it with supermin5-5.1.16-2.el7.x86_64.

Verified with the packages:
supermin5-5.1.16-2.el7.x86_64

Original host kernel:
kernel-3.10.0-229.el7.x86_64

Steps:
1. Install the kernel provided by Luiz.
# rpm -ivh kernel-4.4.9_rt17+-4.x86_64.rpm
2. Reboot the system into the new kernel and check it:
# uname -a
Linux dhcp-8-189.nay.redhat.com 4.4.9-rt17+ #3 SMP PREEMPT RT Tue May 24 14:52:54 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
3. Run virt-ls against the guest image:
# virt-ls -a RHEL-Server-7.2-64-hvm.raw /root
.bash_logout
.bash_profile
.bashrc
.cache
.config
.cshrc
.tcshrc
anaconda-ks.cfg

The following error did not occur:
libguestfs: error: appliance closed the connection unexpectedly.
This usually means the libguestfs appliance crashed.
See http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
for information about how to debug libguestfs and report bugs.
libguestfs: error: guestfs_launch failed.
This usually means the libguestfs appliance failed to start or crashed.
See http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
or run 'libguestfs-test-tool' and post the *complete* output into a
bug report or message to the libguestfs mailing list.

4. Also tried libguestfs-test-tool; the test finished OK.

The following error did not occur:
[    4.710415] init[1]: segfault at 7fffa3ec2408 ip 0000000000400f06 sp 00007fffa3ec2410 error 6 in init[400000+c3000]

So verified.

Comment 26 Richard W.M. Jones 2016-05-31 07:50:27 UTC
(In reply to Xianghua Chen from comment #25)
> Thank you all for all the information.

Could you give this bug QA ack please?

Comment 27 Xianghua Chen 2016-05-31 12:30:57 UTC
Sorry. Missed this one.

Comment 29 errata-xmlrpc 2016-11-03 23:04:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2154.html

