Bug 507391
| Summary: | qemu-kvm PXE boot with e1000 results in bogus packets |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | etherboot |
| Version: | 11 |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | high |
| Reporter: | Gilboa Davara <gilboad> |
| Assignee: | Mark McLoughlin <markmc> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | dwmw2, ehabkost, gcosta, itamar, jaswinder, kari.hautio, markmc, pcfe, virt-maint |
| Fixed In Version: | 5.4.4-16.fc11 |
| Doc Type: | Bug Fix |
| Last Closed: | 2009-07-02 05:41:58 UTC |
| Bug Blocks: | 480594, 494832 |
Attachments:

Created attachment 348935: Private bridge configuration (bridge running in promiscuous mode).

Created attachment 348936: tap42 Wireshark recording.
P.S. DHCP works just fine once the OS actually boots.

What version of etherboot is this? Does etherboot-5.4.4-15.fc11 help? https://admin.fedoraproject.org/updates/etherboot-5.4.4-15.fc11

I doubt it; those frames are pretty messed up. Does it work with, e.g., rtl8139, virtio, ne2k_pci or pcnet?

Works just fine with rtl8139 with etherboot-5.4.4-13. I'm still getting trashed 0xff frames with etherboot-5.4.4-15. - Gilboa

Okay, so the packet dump shows that the type field in the Ethernet header is (incorrectly) zero. Enabling debugging in etherboot-5.4.4/drivers/net/e1000.c made the problem go away, which was the first clue. The code is as follows:

```c
struct eth_hdr {
	unsigned char dst_addr[ETH_ALEN];
	unsigned char src_addr[ETH_ALEN];
	unsigned short type;
} hdr;
...
hdr.type = htons (type);
txhd = tx_base + tx_tail;
tx_tail = (tx_tail + 1) % 8;
...
txhd->buffer_addr = virt_to_bus (&hdr);
...
E1000_WRITE_REG (&hw, TDT, tx_tail);
```

I.e. we set the type in the header on the stack, set up a TX descriptor to point to that header, and then write the new tail index to the device's TDT (transmit descriptor tail) register. Looking at the assembly, I see:

```
 36d:   8b 4c 24 38             mov    0x38(%esp),%ecx
 371:   86 cd                   xchg   %cl,%ch
 ...
 3fb:   89 90 18 38 00 00       mov    %edx,0x3818(%eax)
 ...
 407:   66 89 4c 24 1e          mov    %cx,0x1e(%esp)
```

I.e. the byte-swapped result of htons() (the xchg) is not stored into the header on the stack until after we've written the TDT register (the store to offset 0x3818). At that point the packet has already been sent. The problem is that the compiler has no way of knowing that writing the register causes the device to read this memory. So, if we do:

```diff
- struct eth_hdr {
+ volatile struct eth_hdr {
```

we see:

```
 36c:   8b 44 24 38             mov    0x38(%esp),%eax
 370:   86 c4                   xchg   %al,%ah
 372:   66 89 44 24 1e          mov    %ax,0x1e(%esp)
 ...
 400:   89 90 18 38 00 00       mov    %edx,0x3818(%eax)
```

The store to the header now comes before the TDT write. This fixes the problem.
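For comparison, here is a minimal user-space sketch of the same transmit pattern. It orders the header stores with a compiler barrier instead of the `volatile` qualifier used in the actual fix. The simplified descriptor, `fake_tdt` (standing in for the TDT register), `queue_frame()` and `TX_RING_SIZE` are hypothetical and are not etherboot's real definitions. The barrier syntax assumes GCC or Clang. A barrier of this kind only constrains the compiler. Drivers for weakly ordered CPUs also need a hardware write barrier; the Linux e1000 driver, for instance, issues `wmb()`/`dma_wmb()` before writing TDT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(), ntohs() (POSIX) */

#define ETH_ALEN     6
#define ETH_P_IP     0x0800   /* IPv4 EtherType */
#define TX_RING_SIZE 8        /* hypothetical; mirrors the "% 8" in e1000.c */

/* Same field layout as the header in etherboot's e1000.c (14 bytes). */
struct eth_hdr {
    unsigned char  dst_addr[ETH_ALEN];
    unsigned char  src_addr[ETH_ALEN];
    unsigned short type;
};

/* Hypothetical, simplified TX descriptor. Real e1000 descriptors carry a
 * bus address obtained from virt_to_bus(), not a CPU pointer. */
struct tx_desc {
    const void *buffer_addr;
    uint16_t    length;
};

/* Hypothetical stand-in for the memory-mapped TDT register. Volatile accesses
 * stay ordered relative to other volatile accesses, but the compiler may still
 * move ordinary stores (such as the header fields) past this one. */
static volatile uint32_t fake_tdt;

/* Compiler-only barrier (GCC/Clang inline asm): the "memory" clobber makes the
 * compiler assume arbitrary memory is read or written here, so pending stores
 * must be emitted before it. It emits no CPU fence instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Fill in the header, point the next descriptor at it, then ring the doorbell. */
static void queue_frame(struct eth_hdr *hdr, struct tx_desc *ring,
                        unsigned *tx_tail, const unsigned char *dst,
                        const unsigned char *src, uint16_t type)
{
    assert(*tx_tail < TX_RING_SIZE);

    memcpy(hdr->dst_addr, dst, ETH_ALEN);
    memcpy(hdr->src_addr, src, ETH_ALEN);
    hdr->type = htons(type);        /* byte swap, like the xchg in the listing above */

    struct tx_desc *txhd = &ring[*tx_tail];
    *tx_tail = (*tx_tail + 1) % TX_RING_SIZE;
    txhd->buffer_addr = hdr;
    txhd->length = (uint16_t)sizeof *hdr;

    compiler_barrier();             /* header and descriptor stores are emitted here ... */
    fake_tdt = *tx_tail;            /* ... before the "device" is told to transmit */
}
```

When called with `*tx_tail == 2`, the function fills `ring[2]`, advances the tail to 3, and only then writes 3 to `fake_tdt`. On any host, the two type bytes at offset 12 of the header read `0x08 0x00` (ETH_P_IP in network byte order). Marking the whole header `volatile`, as the real fix does, orders every access to it relative to the volatile register write. A barrier leaves the header an ordinary object and pins the ordering only at the one point where it matters.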
* Tue Jun 23 2009 Mark McLoughlin <markmc> - 5.4.4-16
- Fix e1000 PXE boot - caused by compiler optimization (bug #507391)

*** Bug 494541 has been marked as a duplicate of this bug. ***

etherboot-5.4.4-16.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/etherboot-5.4.4-16.fc11

etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with `su -c 'yum --enablerepo=updates-testing update etherboot'`. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024

etherboot-5.4.4-16.fc11.noarch seems to solve the problem. - Gilboa

etherboot-5.4.4-16.fc11 works for me too and solves the "no IP" problem (bug #494541).

Gilboa and Kari, thanks for testing; I'll push to stable now. Note: in future, if you go to the update URL (https://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024), you can log in and add a comment. This increases the update's "karma", and if enough people comment, the update gets pushed automatically.

Thanks. Will do. - Gilboa

etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 348934: DSL VM configuration

Description of problem:
I've upgraded my first KVM host to F11. I'm trying to boot DSL (Damn Small Linux) via PXE. This test works just fine under F9 and F10.

Version-Release number of selected component (if applicable):
qemu-0.10.4-4.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a private bridge. (Configuration attached.)
2. Set up an empty qemu VM. (Configuration attached.)
3. Boot.

Actual results:
The client fails to receive an IP address. The host sees invalid packets. (pcap attached)

Expected results:
The VM boots.