Bug 1339691
| Summary: | supermin init segfaults when kernel has large modules | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Luiz Capitulino <lcapitulino> |
| Component: | supermin | Assignee: | Richard W.M. Jones <rjones> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | lcapitulino, ptoscano, xchen |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | supermin-5.1.16-2.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 23:04:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Luiz Capitulino
2016-05-25 15:40:46 UTC
Created attachment 1161489
libguestfs-test-tool output
Interestingly the supermin init process segfaults:

    [ 0.966816] init[1]: segfault at 7ffd1319b278 ip 0000000000400f06 sp 00007ffd1319b280 error 6 in init[400000+c3000]

Is this supermin from the RHEL 7.3 preview repository?

It's probably segfaulting in this loop, or just after this loop: https://github.com/libguestfs/supermin/blob/master/init/init.c#L132

Yes, the package version is: supermin5-5.1.16-1.el7.x86_64

Is there a quick recipe for me to run supermin from sources and debug this issue?

I believe it's crashing here: https://github.com/libguestfs/supermin/blob/master/init/init.c#L320

I'm very unclear why.

You can run your own version of supermin by checking out https://github.com/libguestfs/supermin and running the test suite:

    sudo yum-builddep supermin
    ./autogen.sh
    make
    make check

The binary which fails is built from init/init.c. However, you can't really run the binary on its own since it's an early boot process. You can insert printf calls into it.

Sorry, the above comment is wrong. The supermin test suite doesn't actually run the init code. Testing this is tricky. Probably the best thing would be to copy the supermin binary (src/supermin) over /usr/bin/supermin5 and then run libguestfs-test-tool. The path to supermin is hard-coded into libguestfs-test-tool; it is not possible to change it through environment variables etc.

And also, you have to do:

    rm -rf /var/tmp/.guestfs-*
    libguestfs-test-tool

otherwise it will continue to use the old cached appliance.

One thought: could it be that your kernel modules are unstripped and therefore huge? The way the code is currently written, the whole module is stored on the stack. I notice you're not running a RHEL kernel. What would be useful would be to do:

    cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/init

and see if any of the kernel files (*.ko) is especially big.
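The point about the whole module being stored on the stack can be made concrete with a short C sketch. This is not the upstream patch referenced later in this thread. It assumes only that init reads each module file whole into memory before handing it to init_module(2), and it shows that read going into a malloc'd buffer instead of a stack array sized from st_size. On Linux the default stack limit (RLIMIT_STACK), including for PID 1, is commonly 8 MiB, so a variable-length array holding a multi-megabyte unstripped module can run off the end of the stack, while a heap allocation of the same size normally succeeds. The helper name read_whole_file is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the whole file at PATH into a heap buffer.
 * On success returns the buffer (caller must free it) and stores the
 * number of bytes read in *SIZE_OUT.  On failure returns NULL, errno set. */
static char *
read_whole_file (const char *path, size_t *size_out)
{
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return NULL;

  struct stat st;
  if (fstat (fd, &st) == -1) {
    int saved = errno;
    close (fd);
    errno = saved;
    return NULL;
  }
  size_t size = (size_t) st.st_size;

  /* Allocate on the heap.  A stack array such as `char buf[size];` would
   * consume SIZE bytes of stack, which a large unstripped module can push
   * past a typical 8 MiB RLIMIT_STACK. */
  char *buf = malloc (size > 0 ? size : 1);
  if (buf == NULL) {
    close (fd);
    errno = ENOMEM;
    return NULL;
  }

  size_t done = 0;
  while (done < size) {
    ssize_t r = read (fd, buf + done, size - done);
    if (r == -1) {
      if (errno == EINTR)
        continue;               /* interrupted by a signal: retry */
      int saved = errno;
      free (buf);
      close (fd);
      errno = saved;
      return NULL;
    }
    if (r == 0)
      break;                    /* file got shorter since fstat */
    done += (size_t) r;
  }

  close (fd);
  *size_out = done;
  return buf;
}
```

A caller would pass the returned buffer and length to init_module(2) and free the buffer afterwards. Stack usage then stays constant regardless of module size.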
That cpio command should be:

    cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/initrd

On the basis that this is likely to be a stack overflow, I posted the following patch upstream: https://www.redhat.com/archives/libguestfs/2016-May/msg00215.html

Waiting for confirmation from comment 10 before moving to POST.

How big is huge? :) I'm running the RHEL RT kernel, which will be an official kernel for the next RHEL release.

    [root@virtlab508 tmp]# cpio -itv < /var/tmp/.guestfs-`id -u`/appliance.d/initrd
    drwxr-xr-x   2 root root        0 May 25 10:46 .
    -rw-r--r--   1 root root   423958 May 25 10:46 ata_piix.ko
    -rw-r--r--   1 root root   110246 May 25 10:46 crc-ccitt.ko
    -rw-r--r--   1 root root   110038 May 25 10:46 crc-itu-t.ko
    -rw-r--r--   1 root root   218206 May 25 10:46 crc32-pclmul.ko
    -rw-r--r--   1 root root   211982 May 25 10:46 crc32.ko
    -rw-r--r--   1 root root   270966 May 25 10:46 crc32c-intel.ko
    -rw-r--r--   1 root root   111094 May 25 10:46 crc8.ko
    -rw-r--r--   1 root root   218894 May 25 10:46 crct10dif-pclmul.ko
    -rw-r--r--   1 root root 11056054 May 25 10:46 ext4.ko
    -rwxr-xr-x   1 root root   811064 May 25 10:46 init
    -rw-r--r--   1 root root  2090550 May 25 10:46 jbd2.ko
    -rw-r--r--   1 root root  3363558 May 25 10:46 libata.ko
    -rw-r--r--   1 root root   176198 May 25 10:46 libcrc32c.ko
    -rw-r--r--   1 root root   342286 May 25 10:46 mbcache.ko
    -rw-r--r--   1 root root      290 May 25 10:46 modules
    -rw-r--r--   1 root root   737334 May 25 10:46 sd_mod.ko
    -rw-r--r--   1 root root   169510 May 25 10:46 virtio-rng.ko
    -rw-r--r--   1 root root   301286 May 25 10:46 virtio_balloon.ko
    -rw-r--r--   1 root root   347726 May 25 10:46 virtio_blk.ko
    -rw-r--r--   1 root root   421950 May 25 10:46 virtio_console.ko
    -rw-r--r--   1 root root   241326 May 25 10:46 virtio_input.ko
    -rw-r--r--   1 root root   585798 May 25 10:46 virtio_net.ko
    -rw-r--r--   1 root root   709310 May 25 10:46 virtio_pci.ko
    -rw-r--r--   1 root root   384862 May 25 10:46 virtio_scsi.ko
    45738 blocks

(In reply to Luiz Capitulino from comment #12)
> How big is huge? :)
> -rw-r--r-- 1 root root 11056054 May 25 10:46 ext4.ko

It actually crashed while loading this module, and that's quite big. I'm guessing allocating 11MB of stack wasn't such a great idea. I'll build a new supermin shortly with the posted patch, stay tuned ...

I want to reproduce and verify this bug, can you tell me how to allocate 11MB of stack to ext4.ko? And about "RHEL RT kernel", can you give detailed info like what compose version are you using? That's all about how can I prepare the same env as yours, the more info the better, thanks a lot!

(In reply to Xianghua Chen from comment #16)
> I want to reproduce and verify this bug, can you tell me how to allocate
> 11MB of stack to ext4.ko?
> And about "RHEL RT kernel", can you give detailed info like what compose
> version are you using?
> That's all about how can I prepare the same env as yours, the more info the
> better, thanks a lot!

Installing Linux version 4.4.9-rt17+ (root.lab.eng.bos.redhat.com) should be sufficient to reproduce this. However, it's not a brew kernel. Maybe Luiz can help to locate that kernel for you.

(In reply to Richard W.M. Jones from comment #17)
> Installing Linux version 4.4.9-rt17+
> (root.lab.eng.bos.redhat.com)
> should be sufficient to reproduce this. However it's not a brew
> kernel.

Oh, now I get why you said I wasn't running a RHEL kernel. It's true that I have that 4.4.9-rt17+ kernel installed, but I'm not running it:

    [root@virtlab508 ~]# uname -r
    3.10.0-408.rt56.290.el7.x86_64
    [root@virtlab508 ~]#

However, I might have run virt-copy-in once while running 4.4.9-rt17+. So, how does libguestfs decide which kernel to run?

> Maybe Luiz can help to locate that kernel for you.

Xianghua, I'll send you instructions by email on how to download this kernel from my test machine.

It chooses the highest numbered kernel from /boot to run the appliance. It doesn't matter what kernel you are running on the host.

I don't think it's really important to be able to QE this bug.
It's an obvious bug and is covered by the rebase we are already doing for supermin (i.e. bug 1271255).

(In reply to Luiz Capitulino from comment #18)
> However, I might have run virt-copy-in once while running 4.4.9-rt17+. So,
> how does libguestfs decide which kernel to run?

By default it picks the newest kernel found. You can point supermin to the kernel to pick with the SUPERMIN_KERNEL environment variable; see: http://libguestfs.org/supermin.1.html#ENVIRONMENT-VARIABLES

Is there a good reason not to pick the kernel the host is running? What if the newest or the highest installed kernel is broken? Note that I'm not saying 4.4.9-rt17+ was broken. It was an upstream debugging kernel, so I'm not surprised the modules were big. And I'm happy the bug is fixed, because I plan to run the test scripts I'm writing against upstream kernels.

When supermin was a shell script, it worked by running this precise command:

    ls -1dvr /boot/vmlinuz*.$arch* | grep -v xen | head -1

The -v -r parameters sort by version in reverse, so this picks the highest version. The current code isn't a shell script, but it does exactly the same thing. So that's the reason: choosing the running kernel is more work. Also, I guess the running kernel might not exist in /boot (although that would be an unusual situation, usually prevented in most distros).

As Pino mentioned, if the choice of kernel for the appliance really matters, then you can set it using the various SUPERMIN_* options, but don't forget to delete the cache under /var/tmp/.guestfs-*, otherwise the existing cached kernel will continue to be reused.

IMO, supermin should use the kernel the host is running as a hint and try that one first. This shouldn't be hard to do. This BZ should be enough evidence that picking the highest-numbered kernel is not a good design decision. My kernel was a test kernel with custom patches; I was just lucky it didn't blow up at boot.
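The kernel-selection rule quoted above (highest version wins, Xen kernels excluded), plus the SUPERMIN_KERNEL override and the running-kernel hint proposed in the last comment, can be sketched in C. This is an illustration under stated assumptions, not supermin's actual code: candidates are bare file names from /boot, strverscmp(3) stands in for GNU `ls -v` ordering (close, but not guaranteed identical), the override is a string the caller would obtain from getenv ("SUPERMIN_KERNEL"), and the running release would come from uname(2). The function names pick_highest_kernel and choose_kernel are hypothetical.

```c
#define _GNU_SOURCE             /* for strverscmp(3), a glibc extension */
#include <stddef.h>
#include <string.h>

/* Return the candidate with the highest version, skipping names that
 * contain "xen".  Approximates `ls -1dvr ... | grep -v xen | head -1`.
 * Returns NULL if nothing qualifies. */
static const char *
pick_highest_kernel (const char *const *names, size_t n)
{
  const char *best = NULL;
  for (size_t i = 0; i < n; i++) {
    if (strstr (names[i], "xen") != NULL)
      continue;
    if (best == NULL || strverscmp (names[i], best) > 0)
      best = names[i];
  }
  return best;
}

/* Hypothetical policy:
 *  1. an explicit override (e.g. the value of SUPERMIN_KERNEL) wins;
 *  2. otherwise prefer "vmlinuz-<running release>" if it is a candidate
 *     (the hint proposed in this bug, not what supermin did at the time);
 *  3. otherwise fall back to the highest version. */
static const char *
choose_kernel (const char *const *names, size_t n,
               const char *running_release, const char *override)
{
  static const char prefix[] = "vmlinuz-";
  const size_t plen = sizeof prefix - 1;

  if (override != NULL && override[0] != '\0')
    return override;

  if (running_release != NULL) {
    for (size_t i = 0; i < n; i++)
      if (strncmp (names[i], prefix, plen) == 0 &&
          strcmp (names[i] + plen, running_release) == 0)
        return names[i];
  }

  return pick_highest_kernel (names, n);
}
```

With the kernels mentioned in this thread, the highest-version rule picks vmlinuz-4.4.9-rt17+ even on a host running 3.10.0-408.rt56.290.el7.x86_64, which matches what happened here; the running-kernel hint would pick the host's kernel instead, and an explicit override bypasses both.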
Discussion moved to https://www.redhat.com/archives/libguestfs/2016-May/msg00234.html

Thank you all for all the information. I have reproduced it successfully with supermin5-5.1.16-1.el7.x86_64 and verified it with supermin5-5.1.16-2.el7.x86_64.

Verified with the packages:
supermin5-5.1.16-2.el7.x86_64

Original host kernel:
kernel-3.10.0-229.el7.x86_64

Steps:

1. Install the kernel provided by Luiz:

        # rpm -ivh kernel-4.4.9_rt17+-4.x86_64.rpm

2. Reboot the system into that kernel and check it:

        # uname -a
        Linux dhcp-8-189.nay.redhat.com 4.4.9-rt17+ #3 SMP PREEMPT RT Tue May 24 14:52:54 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

3. List /root in a guest image:

        # virt-ls -a RHEL-Server-7.2-64-hvm.raw /root
        .bash_logout
        .bash_profile
        .bashrc
        .cache
        .config
        .cshrc
        .tcshrc
        anaconda-ks.cfg

   The following error did not occur:

        libguestfs: error: appliance closed the connection unexpectedly. This usually means the libguestfs appliance crashed. See http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs for information about how to debug libguestfs and report bugs.
        libguestfs: error: guestfs_launch failed. This usually means the libguestfs appliance failed to start or crashed. See http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs or run 'libguestfs-test-tool' and post the *complete* output into a bug report or message to the libguestfs mailing list.

4. Also tried libguestfs-test-tool; the test finished OK, and the following error did not occur:

        [ 4.710415] init[1]: segfault at 7fffa3ec2408 ip 0000000000400f06 sp 00007fffa3ec2410 error 6 in init[400000+c3000]

So verified.

(In reply to Xianghua Chen from comment #25)
> Thank you all for all the information.

Could you give this bug QA ack please?

Sorry. Missed this one.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2154.html