Bug 1802123
| Summary: | ipxe corrupts large initramfs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Grzegorz Halat <ghalat> |
| Component: | ipxe | Assignee: | Neil Horman <nhorman> |
| ipxe sub component: | ipxe-bootimgs | QA Contact: | Raviv Bar-Tal <rbartal> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | aklimov, astupnik, cbesson, nhorman, rmetrich |
| Version: | 7.7 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-23 16:38:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Can you please post the logs that you have here regardless? Also, can you capture a tcpdump from the ipxe server (in pcap format) and attach it here?

The fact that the lines are being overwritten on the console is not an issue with the console itself. You should be able to use minicom or some other utility to dump serial output directly to a log file, which will negate any screen clearing/reset.

The packet loss is an indicator that we should get the tcpdump output I requested above to correlate those errors.

Also, what is the uncompressed size of the initramfs? Looking at your memory map above, you only have 1.9 GB of usable RAM under the 4 GB address boundary, which is needed for the ipxe image, kernel and initramfs, among other potential mappings. It is entirely possible your initramfs is just too large for that space.

Lastly, you mentioned in comment 6 that ipxe is compiled with various options. Is this a custom build of ipxe? If so, we don't support that, and I'd ask that you reproduce this issue with a supported build.

The size definitely matters. The checksum of the file image is based on the uncompressed image, so if tmpfs is running out of space, the checksum will be wrong. Please confirm the size of the uncompressed image.

FWIW, looking at the png you provided, it looks like that particular boot had a ramdisk that was a total of 444 MB (I think compressed), meaning you would need a little over a gig of space to decompress it. That's going to give you a few hundred MB of space to store the kernel (10 MB), the ipxe binary (10 MB) and the heap and stack space that the drivers need to run properly, so you may legitimately be running out of space in the ipxe environment (which all has to exist under the 4 GB mark). I suggest that you compare against a system that consistently works, specifically its E820 map, to see how much RAM is available on those servers under the 4 GB mark.

(In reply to Neil Horman from comment #11)
> Please confirm the size of the uncompressed image

I asked the customer to upload the initramfs which we were using during remote sessions.

> (...) Suggest that you
> compare a system that consistently works - specifically the E820 map to see
> how much ram is available on those servers under the 4GB mark.

We had this idea during the last remote session: boot a working server via iPXE and collect dmesg to compare it with a non-working server. Unfortunately we had some issues and we ran out of time. We will try again during the next session.

Ok, please let me know.

Created attachment 1663151 [details]
445M initramfs on 1.5G VM - kernel boots successfully

I've tried to explain this in comment 7. PXE boot environments operate entirely in the 32-bit address range, meaning only memory in the e820 map below 4 GB is accessible. On the VM that you tested on, the e820 map looks like this:

| Address range | Type |
|---|---|
| 0x0000000000000000-0x000000000009fbff | usable |
| 0x000000000009fc00-0x000000000009ffff | reserved |
| 0x00000000000f0000-0x00000000000fffff | reserved |
| 0x0000000000100000-0x00000000bb7dffff | usable |
| 0x00000000bb7e0000-0x00000000bb7fffff | reserved |
| 0x00000000feffc000-0x00000000feffffff | reserved |
| 0x00000000fffc0000-0x00000000ffffffff | reserved |

The usable sections that fit within 32 bits of address space amount to (0xbb7dffff - 0x100000) + 0x9fbff = 0xbb77fbfe = 3145202686 bytes, and 3145202686 / (1024^3) = 2.9 GB of available RAM.

On the failing system the e820 map looks like this:

| Address range | Type |
|---|---|
| 0x0000000000100000-0x0000000077e07fff | usable |
| 0x0000000000000000-0x0000000000098fff | usable |
| 0x0000000000099000-0x000000000009ffff | reserved |
| 0x00000000000e0000-0x00000000000fffff | reserved |
| 0x0000000000100000-0x0000000077e07fff | usable |
| 0x0000000077e08000-0x000000007ee45fff | reserved |
| 0x000000007ee46000-0x000000007ef58fff | ACPI data |
| 0x000000007ef59000-0x000000007f168fff | ACPI NVS |
| 0x000000007f169000-0x000000007f27cfff | reserved |
| 0x000000007f27d000-0x000000007f7fffff | ACPI NVS |
| 0x0000000080000000-0x000000008fffffff | reserved |
| 0x00000000fed1c000-0x00000000fed3ffff | reserved |
| 0x00000000ff000000-0x00000000ffffffff | reserved |
| 0x0000000100000000-0x0000000f7fffffff | usable |

It has lots more usable RAM, but only the first two usable sections fit under the 32-bit address space limit (there is a third section that does, but it's a duplicate, not sure why it's there). Regardless, the usable memory within the 32-bit address space on this system is (0x77e07fff - 0x100000) + 0x98fff = 0x77da0ffe = 2010779646 bytes, and 2010779646 / (1024^3) is approximately 1.87 GB of RAM.

If the compressed file size of the initramfs is approximately 0.5 GB and the uncompressed size is 1.2 GB, you are taking up 1.7 GB of that roughly 1.87 GB of available RAM before you factor in the kernel size, ipxe image, heap and stack.

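
The low-memory arithmetic in this comment can be rechecked with a short script. This is only a sketch: the two maps are copied from the comment, while the helper name `usable_below_4g` and the list names are made up for illustration. It treats each range as inclusive (size = end - start + 1), so its totals come out two bytes higher than the hand calculation, which does not change the conclusion.

```python
# Sum the "usable" e820 ranges that lie entirely below the 4 GiB boundary.
# Names in this sketch (usable_below_4g, WORKING_VM, FAILING_HOST) are
# hypothetical; the address ranges are the ones quoted in the comment.
FOUR_GIB = 1 << 32

WORKING_VM = [
    (0x0000000000000000, 0x000000000009fbff, "usable"),
    (0x000000000009fc00, 0x000000000009ffff, "reserved"),
    (0x00000000000f0000, 0x00000000000fffff, "reserved"),
    (0x0000000000100000, 0x00000000bb7dffff, "usable"),
    (0x00000000bb7e0000, 0x00000000bb7fffff, "reserved"),
    (0x00000000feffc000, 0x00000000feffffff, "reserved"),
    (0x00000000fffc0000, 0x00000000ffffffff, "reserved"),
]

FAILING_HOST = [
    (0x0000000000100000, 0x0000000077e07fff, "usable"),
    (0x0000000000000000, 0x0000000000098fff, "usable"),
    (0x0000000000099000, 0x000000000009ffff, "reserved"),
    (0x00000000000e0000, 0x00000000000fffff, "reserved"),
    (0x0000000000100000, 0x0000000077e07fff, "usable"),  # duplicate entry
    (0x0000000077e08000, 0x000000007ee45fff, "reserved"),
    (0x000000007ee46000, 0x000000007ef58fff, "ACPI data"),
    (0x000000007ef59000, 0x000000007f168fff, "ACPI NVS"),
    (0x000000007f169000, 0x000000007f27cfff, "reserved"),
    (0x000000007f27d000, 0x000000007f7fffff, "ACPI NVS"),
    (0x0000000080000000, 0x000000008fffffff, "reserved"),
    (0x00000000fed1c000, 0x00000000fed3ffff, "reserved"),
    (0x00000000ff000000, 0x00000000ffffffff, "reserved"),
    (0x0000000100000000, 0x0000000f7fffffff, "usable"),
]


def usable_below_4g(e820):
    """Return the number of usable bytes in ranges ending below 4 GiB."""
    seen = set()
    total = 0
    for start, end, kind in e820:
        if kind != "usable" or end >= FOUR_GIB:
            continue
        if (start, end) in seen:  # count duplicated entries only once
            continue
        seen.add((start, end))
        total += end - start + 1  # both ends of the range are inclusive
    return total


if __name__ == "__main__":
    for name, e820 in (("working VM", WORKING_VM), ("failing host", FAILING_HOST)):
        total = usable_below_4g(e820)
        print(f"{name}: {total} bytes = {total / 2**30:.2f} GiB below 4 GiB")
```

Subtracting the estimated 1.7 GB for the compressed plus uncompressed initramfs from the failing host's total leaves well under 200 MB for the kernel, the ipxe image, heap and stack, which is the margin the comment is worried about.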
You are running out of memory.

Can you provide links to the failing and non-failing initramfs?

This doesn't appear to have anything to do with ipxe or decompression. According to these logs:

1) ipxe validated the downloaded initramfs' md5sum:

    INITRD squashing agent.ramdisk [0x5b991000,0x77618495)->[0x5bfff000,0x77c86495)
    INITRD agent.ramdisk at [0x5bfff000,0x77c86495)
    md5sum ( 0x5bfff000, 0x1bc87495 ) = 5da87dd14488ec7c2eb0cf40fbf98cad

2) the kernel never gets to the decompression phase, because it can't identify the compression type from the magic numbers at the head of the initramfs.

That would suggest that we have an initramfs that passes its integrity check before booting the kernel (meaning it's unchanged from what's on the server), but that cannot be identified as using a valid compression algorithm. If you can post links to the failing and working initramfs images we can look into that further.

Well, it looks like I owe you an apology. The md5sum doesn't in fact match. I'll spend some time setting up a reproducer to see if I can get better visibility on what's going on here.

Attaching my reproducer attempt, which, unfortunately, seems to work.

Created attachment 1667556 [details]
output log of my reproducer attempt
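
The earlier comment notes that the kernel could not identify the compression type from the magic numbers at the head of the initramfs. A rough userspace check of the same idea, run against a copy of the image on the server, might look like the sketch below. The function name `sniff_compression` and the magic table are illustrative assumptions: the table is a subset of well-known file signatures, not a copy of the kernel's own detection table, and the kernel's logic may differ.

```python
import bz2
import gzip
import lzma

# Well-known leading bytes of some compression formats. Illustrative subset
# only; this is NOT the kernel's own detection table.
MAGICS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x04\x22\x4d\x18", "lz4 (frame)"),
    (b"\x02\x21\x4c\x18", "lz4 (legacy)"),
    (b"\x89LZO\x00\r\n\x1a\n", "lzo"),
    (b"070701", "uncompressed cpio (newc)"),
]


def sniff_compression(head):
    """Return a format name for the leading bytes, or None if unrecognised."""
    for magic, name in MAGICS:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path, nbytes=16):
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_compression(f.read(nbytes))


if __name__ == "__main__":
    # Self-check against data produced by the standard library compressors.
    for label, blob in (
        ("gzip", gzip.compress(b"demo")),
        ("bzip2", bz2.compress(b"demo")),
        ("xz", lzma.compress(b"demo")),
    ):
        print(label, "->", sniff_compression(blob))
```

A `None` result for an image that was built with a known compressor would point at damaged leading bytes rather than at a decompression problem.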

Ok, then I would return to my thought in comment 33. It's probably worth a test to limit the low memory available to ipxe, to check whether the e820 map on the compute server is somehow bad and stepping on some device space.

Any update here?

First of all, nice work! That's a really good find. Next, yes, if you could, at your next opportunity use the e820 map to exclude that region, just to confirm that we can make this work, that would be great. Looking at the data, here are my immediate observations:

1) It's not ASCII, so it's likely not from an input device.

2) I almost see an ethernet OUI in the data (there is a repeating pattern of 52 54 60), the first two bytes of which are a Realtek OUI, but the 60 doesn't match anything. It might be worth checking whether any of the on-board NICs have a MAC that starts with 52 54 60.

3) The data is somewhat patterned. Every block starts with either a 00 or 20, and is followed by 32 d6 50 d3. That makes it seem like an informational header of some sort, though I can't figure out exactly what it is. I thought perhaps it was an SMBIOS system boot information block, but it doesn't match up.

4) The fact that hw breakpoints aren't triggered is somewhat telling. Normally hw breakpoints are implemented by snooping the frontside bus on the CPU for writes to specific addresses, but since they are not triggering, that suggests that the corruption is occurring due to a write from a device (i.e. a DMA operation).

Is it possible to disable the IOMMU from BIOS on this system? It might not do anything, but it might be an interesting test to see if doing so changes where the corrupted memory lives.

Ping, any update here?

Ok, so that's good news. Where does that leave us, however? We seem to have a system that has a corrupted section of the e820 map in it? Is it time to contact the system vendor and ask them what sort of firmware updates are available?

Copy that, thanks for the update. Wish we could have figured out the root cause here.

Description of problem:
Some servers can't be booted by iPXE due to initramfs corruption.

Version-Release number of selected component (if applicable):
20180825-2.git133f4c.el7 and upstream built from commit 18dc73d2

How reproducible:
The issue is always reproducible, but only on some servers; on other servers booting always works correctly. There is no obvious difference in HW/firmware between working and non-working configurations. This also happens when the initramfs is downloaded via HTTP.

Steps to Reproduce:
1. Create a large initramfs.
2. Try to boot a server using iPXE. Result: kernel panic.
3. initramfs corruption can be verified by:
   - using a custom kernel with initramfs checksumming implemented (this feature doesn't exist in RHEL or upstream; it was implemented in a scratch build of the kernel for troubleshooting purposes)
   - using iPXE compiled with DEBUG=initrd:3. An iPXE build compiled this way calculates the md5 hash of the initramfs.

Actual results:
The initramfs passed to the kernel by iPXE is corrupted. The corruption is detected by the kernel, so the initramfs is not uncompressed and the kernel panics due to the lack of an init binary. The md5 hash of the initramfs is different at each boot. iPXE compiled with DEBUG=initrd:3 calculates the md5 twice and those checksums are always different, so it looks like the initramfs is corrupted twice. The second checksum calculated by iPXE matches the checksum calculated by the kernel.

Expected results:
iPXE should not corrupt the initramfs, and the server should boot successfully.

Additional info:
The upstream version of iPXE has been tested; the result is the same. More details are in a private comment because they may contain sensitive data.
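
To compare what iPXE reports against the file actually served, one possible approach is to parse the `md5sum ( address, length ) = digest` debug line quoted in the comments and check it against the md5 and size of the initramfs on the server. The sketch below assumes the debug output keeps that exact format and that the hashed region covers exactly the file's bytes; the function names are hypothetical.

```python
import hashlib
import re

# Matches the debug line quoted in the comments, for example:
#   md5sum ( 0x5bfff000, 0x1bc87495 ) = 5da87dd14488ec7c2eb0cf40fbf98cad
# Assumption: the format of this line is stable across iPXE builds.
MD5_LINE = re.compile(
    r"md5sum \( (0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+) \) = ([0-9a-fA-F]{32})"
)


def parse_md5_line(line):
    """Return (address, length, digest) from a debug line, or None."""
    m = MD5_LINE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3).lower()


def file_md5(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hexdigest) of a file, read in chunks."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def matches_server_copy(path, debug_line):
    """True if both the length and the md5 in the debug line match the file."""
    parsed = parse_md5_line(debug_line)
    if parsed is None:
        raise ValueError("not an md5sum debug line")
    _, length, reported = parsed
    size, local = file_md5(path)
    return size == length and local == reported
```

For the line quoted in the comments, the parsed length is 0x1bc87495 bytes (about 444 MB), which is consistent with the ramdisk size discussed in the thread. A mismatch in the digest with a matching length would indicate that the bytes changed after download rather than that the transfer was truncated.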