Bug 1274966 - Kernel warning on VMware ESXi 6 while booting PXE LiveISO
Summary: Kernel warning on VMware ESXi 6 while booting PXE LiveISO
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-10-24 06:58 UTC by Sitsofe Wheeler
Modified: 2016-10-29 07:27 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-29 07:27:49 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dmesg (103.13 KB, text/plain)
2015-10-24 06:58 UTC, Sitsofe Wheeler
no flags Details
iPXE debug screenshot (115.00 KB, image/png)
2016-10-27 06:12 UTC, Sitsofe Wheeler
no flags Details
Attaching increased debugging screenshot (189.18 KB, image/png)
2016-10-28 20:23 UTC, Sitsofe Wheeler
no flags Details

Description Sitsofe Wheeler 2015-10-24 06:58:27 UTC
Created attachment 1086062 [details]
dmesg

Description of problem:
After PXE booting a LiveISO that had been converted into an kernel+initrd a kernel warning came up during boot.

Version-Release number of selected component (if applicable):
kernel 4.2.3-300.fc23.x86_64

How reproducible:
Reproducible every time.

Steps to Reproduce:
1. Convert Fedora-Live-Workstation-x86_64-23-20151016.iso into a PXE compatible kernel+initrd using livecd-iso-to-pxeboot (https://fedorahosted.org/cobbler/wiki/HowToPxeAnyLiveCd ).
2. Use iPXE inside a VM with 4GBytes RAM on ESXi 6.0u1 to boot the kernel and initrd over HTTP.

Actual results:
Abrt mentions that there was an error logged against the kernel. Looking through dmesg shows this:

[    0.223735] NET: Registered protocol family 1
[    0.223744] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.223796] pci 0000:00:0f.0: Video device with shadowed ROM
[    0.224011] PCI: CLS 32 bytes, default 64
[    0.224056] Unpacking initramfs...
[    2.196627] ------------[ cut here ]------------
[    2.196635] WARNING: CPU: 1 PID: 1 at arch/x86/mm/init.c:653 free_init_pages+0x9c/0xb0()
[    2.196636] Modules linked in:
[    2.196639] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
[    2.196640] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[    2.196641]  0000000000000000 000000008df4d5d4 ffff880140223d58 ffffffff81771fca
[    2.196643]  0000000000000000 0000000000000000 ffff880140223d98 ffffffff8109e4a6
[    2.196644]  ffff880140223d78 ffff88007ffffffd ffff880080000000 ffff880026c0a000
[    2.196645] Call Trace:
[    2.196649]  [<ffffffff81771fca>] dump_stack+0x45/0x57
[    2.196652]  [<ffffffff8109e4a6>] warn_slowpath_common+0x86/0xc0
[    2.196653]  [<ffffffff8109e5da>] warn_slowpath_null+0x1a/0x20
[    2.196655]  [<ffffffff810630dc>] free_init_pages+0x9c/0xb0
[    2.196658]  [<ffffffff81d58b31>] ? unpack_to_rootfs+0x296/0x296
[    2.196661]  [<ffffffff81d71ca0>] free_initrd_mem+0x2f/0x34
[    2.196662]  [<ffffffff81d58c1a>] populate_rootfs+0xe9/0x10a
[    2.196664]  [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[    2.196666]  [<ffffffff810bbed1>] ? parse_args+0x271/0x4a0
[    2.196669]  [<ffffffff81778400>] ? _raw_write_unlock_irqrestore+0x10/0x20
[    2.196671]  [<ffffffff81d571dc>] kernel_init_freeable+0x18e/0x228
[    2.196673]  [<ffffffff81768710>] ? rest_init+0x80/0x80
[    2.196674]  [<ffffffff8176871e>] kernel_init+0xe/0xe0
[    2.196676]  [<ffffffff81778d9f>] ret_from_fork+0x3f/0x70
[    2.196677]  [<ffffffff81768710>] ? rest_init+0x80/0x80
[    2.196681] ---[ end trace aa74c03b3950d863 ]---

Expected results:
No warning.

Additional info:
Attaching dmesg.

Comment 1 Tore Anderson 2016-05-27 11:04:24 UTC
FYI, the same thing occurs when booting an Ubuntu Xenial VM with kernel 4.4.0-21-generic on a KVM-based hypervisor running 2.6.32-504.16.2.el6.x86_64:

[    0.292700] ------------[ cut here ]------------
[    0.293186] WARNING: CPU: 1 PID: 1 at /build/linux-Ay7j_C/linux-4.4.0/arch/x86/mm/init.c:656 free_init_pages+0x94/0xa0()
[    0.294287] Modules linked in:
[    0.294630] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.4.0-21-generic #37-Ubuntu
[    0.295374] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[    0.296009]  0000000000000286 00000000745a61ba ffff88011ab97db8 ffffffff813e93c3
[    0.296014]  0000000000000000 ffffffff81ca7fd0 ffff88011ab97df0 ffffffff81080f62
[    0.296014]  ffff88007fffffff ffff880080000000 ffff88007fef9000 ffff880080000000
[    0.296014] Call Trace:
[    0.296014]  [<ffffffff813e93c3>] dump_stack+0x63/0x90
[    0.296014]  [<ffffffff81080f62>] warn_slowpath_common+0x82/0xc0
[    0.296014]  [<ffffffff810810aa>] warn_slowpath_null+0x1a/0x20
[    0.296014]  [<ffffffff81068b84>] free_init_pages+0x94/0xa0
[    0.296014]  [<ffffffff81f5c7c3>] ? clean_rootfs+0x175/0x175
[    0.296014]  [<ffffffff81f77882>] free_initrd_mem+0x2f/0x34
[    0.296014]  [<ffffffff81f5c0a3>] free_initrd+0x7d/0x96
[    0.296014]  [<ffffffff81f5c8b5>] populate_rootfs+0xf2/0xfc
[    0.296014]  [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[    0.296014]  [<ffffffff8109fca5>] ? parse_args+0x295/0x4b0
[    0.296014]  [<ffffffff81f5a1a5>] kernel_init_freeable+0x173/0x212
[    0.296014]  [<ffffffff81817f30>] ? rest_init+0x80/0x80
[    0.296014]  [<ffffffff81817f3e>] kernel_init+0xe/0xe0
[    0.296014]  [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
[    0.296014]  [<ffffffff81817f30>] ? rest_init+0x80/0x80
[    0.307317] ---[ end trace f5087025e139ef3d ]---

So this is likely an upstream issue.

Comment 2 Sitsofe Wheeler 2016-06-28 20:03:19 UTC
Still here in Fedora 24:
[    0.205692] Unpacking initramfs...
[    4.148684] ------------[ cut here ]------------
[    4.148690] WARNING: CPU: 0 PID: 1 at arch/x86/mm/init.c:656 free_init_pages+0x94/0xa0()
[    4.148691] Modules linked in:
[    4.148693] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.5-300.fc24.x86_64 #1
[    4.148694] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[    4.148695]  0000000000000286 00000000fd548064 ffff88013abf3d90 ffffffff813d35af
[    4.148696]  0000000000000000 ffffffff81a6557c ffff88013abf3dc8 ffffffff810a5f12
[    4.148698]  ffff88007ffffb5a ffff880080000000 ffff880022de9000 ffff880080000000
[    4.148699] Call Trace:
[    4.148702]  [<ffffffff813d35af>] dump_stack+0x63/0x84
[    4.148705]  [<ffffffff810a5f12>] warn_slowpath_common+0x82/0xc0
[    4.148706]  [<ffffffff810a605a>] warn_slowpath_null+0x1a/0x20
[    4.148707]  [<ffffffff81066e24>] free_init_pages+0x94/0xa0
[    4.148710]  [<ffffffff81d6ab9a>] ? unpack_to_rootfs+0x29c/0x29c
[    4.148712]  [<ffffffff81d85368>] free_initrd_mem+0x2f/0x34
[    4.148713]  [<ffffffff81d6ac82>] populate_rootfs+0xe8/0x108
[    4.148715]  [<ffffffff81002123>] do_one_initcall+0xb3/0x1f0
[    4.148716]  [<ffffffff81d688ea>] ? initcall_blacklist+0xbe/0xbe
[    4.148717]  [<ffffffff81d691ef>] kernel_init_freeable+0x18f/0x22b
[    4.148720]  [<ffffffff817c1cd0>] ? rest_init+0x80/0x80
[    4.148722]  [<ffffffff817c1cde>] kernel_init+0xe/0xe0
[    4.148724]  [<ffffffff817cf03f>] ret_from_fork+0x3f/0x70
[    4.148725]  [<ffffffff817c1cd0>] ? rest_init+0x80/0x80
[    4.148729] ---[ end trace ca922cef829ef6af ]---
[    4.303464] Freeing initrd memory: 1525852K (ffff880022de9000 - ffff880080000000)

kernel-4.5.5-300.fc24.x86_64

Comment 3 Laura Abbott 2016-09-23 19:17:02 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.
 
Fedora 24 has now been rebased to 4.7.4-200.fc24.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 25, and are still experiencing this issue, please change the version to Fedora 25.
 
If you experience different issues, please open a new bug report for those.

Comment 4 Sitsofe Wheeler 2016-09-24 15:15:36 UTC
I've booted a Rawhide LiveISO from today and the problem is still here with 4.8.0-0.rc7.git2.1.fc26.x86_64 :

[    0.955216] Unpacking initramfs...
[    5.391977] ------------[ cut here ]------------
[    5.391986] WARNING: CPU: 0 PID: 1 at arch/x86/mm/init.c:671 free_init_pages+0x94/0xa0
[    5.391988] Modules linked in:
[    5.391992] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-0.rc7.git2.1.fc26.x86_64 #1
[    5.391993] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[    5.391995]  0000000000000286 00000000c1677634 ffff9d943abffd58 ffffffffb6467073
[    5.391999]  0000000000000000 0000000000000000 ffff9d943abffd98 ffffffffb60ae7ab
[    5.392001]  0000029fb71cab9d ffff9d9300000000 ffff9d9380000000 ffff9d9329861000
[    5.392003] Call Trace:
[    5.392008]  [<ffffffffb6467073>] dump_stack+0x86/0xc3
[    5.392020]  [<ffffffffb60ae7ab>] __warn+0xcb/0xf0
[    5.392022]  [<ffffffffb60ae8dd>] warn_slowpath_null+0x1d/0x20
[    5.392024]  [<ffffffffb606b094>] free_init_pages+0x94/0xa0
[    5.392027]  [<ffffffffb71cabe3>] ? unpack_to_rootfs+0x29d/0x29d
[    5.392039]  [<ffffffffb71c88cd>] ? set_debug_rodata+0x12/0x12
[    5.392042]  [<ffffffffb71e6aee>] free_initrd_mem+0x21/0x23
[    5.392044]  [<ffffffffb71cacc8>] populate_rootfs+0xe5/0x105
[    5.392046]  [<ffffffffb6002190>] do_one_initcall+0x50/0x180
[    5.392048]  [<ffffffffb71c88cd>] ? set_debug_rodata+0x12/0x12
[    5.392049]  [<ffffffffb71c923b>] kernel_init_freeable+0x200/0x29c
[    5.392053]  [<ffffffffb68e6b4e>] kernel_init+0xe/0x100
[    5.392055]  [<ffffffffb68f7b2f>] ret_from_fork+0x1f/0x40
[    5.392056]  [<ffffffffb68e6b40>] ? rest_init+0x140/0x140
[    5.392062] ---[ end trace 0fbf5e487fcf7e56 ]---
[    6.572994] Freeing initrd memory: 1416828K (ffff9d9329861000 - ffff9d9380000000)

Re-assinging to Rawhide.

Comment 5 Sitsofe Wheeler 2016-09-25 02:14:10 UTC
Reproduced using a Hyper-V VM so as Tore stated this error message looks generically spurious.

Comment 6 Neil Horman 2016-10-18 14:21:56 UTC
Seems like the various hypervisors may be passing ramdisk start locations that are not page aligned.

Comment 7 Sitsofe Wheeler 2016-10-18 21:05:09 UTC
Neil:
When I asked on LKML about this there was a suggestion that the concatanation might have created something itself misaligned (http://www.gossamer-threads.com/lists/linux/kernel/2534175#2534175 "Re: Warning from free_init_pages with large initrd").

Comment 8 Neil Horman 2016-10-19 14:50:55 UTC
Possible, but its the start of the image thats misaligned, which means the first part of the concatenation is misaligned, which comes from the bios.

Comment 9 Sitsofe Wheeler 2016-10-20 05:51:56 UTC
Neil:
What information do you need from me?

Comment 10 Neil Horman 2016-10-20 14:20:29 UTC
Well, we can start by confirming that the begining of the ramdisk is misaligned.  If you drop to a grub console, you can set debug=1 and see what grub passes to the kernel.  That should tell us what the ramdisk start and end addresses are and will confirm that they are in fact misalgned.

Comment 11 Sitsofe Wheeler 2016-10-21 06:14:55 UTC
I'm actually using iPXE (http://ipxe.org/ ) and not grub as the bootloader. I've just come across (https://github.com/ipxe/ipxe/commit/c825a9b39dddcdc8e403435f9984ef8be11967bc#diff-51986121f3799668277dead60f11f4f0 ) which talks about initrd alignment. I'm fairly sure my iPXE is later than that commit but just in case I'll build a new one and see if the problem can still be reproduced.

Comment 12 Sitsofe Wheeler 2016-10-21 06:32:02 UTC
My iPXE version is https://github.com/ipxe/ipxe/commit/57bab4 so it should already include that patch. I tried to see what grub does but https://github.com/coreos/grub/blob/6d3b1dc72b39bbfe03b894477c62f005c06f691b/grub-core/loader/linux.c is fairly complicated...

Comment 13 Neil Horman 2016-10-21 14:24:43 UTC
That patch looks like its just padding the end of the image to be page aligned, but not the start address.  The commit you need to be sure you have is 0fac05511967e791bff56996227047ade16a8533.

Comment 14 Sitsofe Wheeler 2016-10-21 21:19:10 UTC
I seem to have that patch too (see https://github.com/ipxe/ipxe/blame/57bab4e1d3c5ffae8f7c698937d20a01222275c5/src/arch/i386/image/bzimage.c#L354 ). What I really need is to include the debug statements that say which addresses were used for loading the initrd. I'll see if I can rebuild iPXE with the appropriate bits...

Comment 15 Sitsofe Wheeler 2016-10-21 21:19:39 UTC
Forgot to say - thanks for the tip offs Neil!

Comment 16 Neil Horman 2016-10-26 16:55:22 UTC
Glad to help!

Concur with you on those two patches.  If you have them it should be working, and at this point the right approach is to instrument the ipxe code to print out the start and end addresses.

Comment 17 Sitsofe Wheeler 2016-10-27 06:10:17 UTC
Here's some hand transcribed ipxe output:
bzImage 0xd4004 version 020d RM 0x10000+0x4400 PM 0x100000+0x5f8618 cmdlen 2047

bzImage 0xd3fc4 missing 55AA signature

bzImage 0xd4004 version 020d RM 0x10000+0x4400 PM 0x100000+0x5f8618 cmdlen 2047
bzImage 0xd4004 initrd 0xd3fc4 from [0x62573000,0xbf789b5b)

bzImage 0xd4004 loading initrds from 0x7fffffff downwards
bzImage 0xd4004 initrd 0xd3fc4 [0x22de8fff,0x22de8fff,0x7ffffb5a)
bzImage 0xd4004 initrds at [0x22de8fff,0x7ffffb5a)
bzImage 0xd4004 vidmode 65535
bzImage 0xd4004 jumping to RM kernel at 1020:0000 (stack 1000:9000)

Screenshot to be attached (in case of transcription errors)

Comment 18 Sitsofe Wheeler 2016-10-27 06:12:08 UTC
Created attachment 1214495 [details]
iPXE debug screenshot

iPXE 0be77e959e813773cac10f049cb62c83258f81ff debug screenshot

Comment 19 Neil Horman 2016-10-27 17:15:13 UTC
well, that definately looks like the ramdisk isn't sitting on a page boundary.  Looking at the bzimage_load_initrds function, I would have to venture a guess that we're loading some additional images, and while the offset computation for each gets properly aligned (via bzimage_align), I would get the userptr_add operation of a -offset value somehow isn't doing whats expected.  I would recommend further instrumenting this function with additional DBGC statements to dump the offset intermediate values, and the dest pointer value after each iteration through that for_each_image loop to confirm and see more precisely where we are going sideways

Comment 20 Sitsofe Wheeler 2016-10-28 20:23:42 UTC
Created attachment 1215104 [details]
Attaching increased debugging screenshot

Looks like things are going funny when dest is calculated - the raw offset appears to be a multiple of 4096...

Comment 21 Sitsofe Wheeler 2016-10-29 04:03:35 UTC
This has now been resolved in iPXE by https://git.ipxe.org/ipxe.git/commit/aa11f5deda29f91f5b8592965ed00f4a65002eff "[bzimage] Fix page alignment of initrd images". For the record this seems to have affected initrds of any size used with iPXE. Once again, thanks for the help Neil!

Tore:
Can I just confirm that you too were using iPXE as your bootloader? Once you've replied we should be able to close this issue...

Comment 22 Tore Anderson 2016-10-29 04:46:21 UTC
(In reply to Sitsofe Wheeler from comment #21)

> Tore:
> Can I just confirm that you too were using iPXE as your bootloader? Once
> you've replied we should be able to close this issue...

Confirmed.

Tore

Comment 23 Sitsofe Wheeler 2016-10-29 07:27:49 UTC
Tore: Thanks for following up. Marking this issue as CLOSED NOTABUG because the issue wasn't in the kernel, it was in iPXE and should be fixed when using a newer iPXE.


Note You need to log in before you can comment on or make changes to this bug.