Bug 2369047
Summary: | DHCP not working reliably between tap / openvswitch guests with qemu 9.2.0+ | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | qemu | Assignee: | Fedora Virtualization Maintainers <virt-maint> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 42 | CC: | aodaki, berrange, cfergeau, crobinso, jasowang, laine, mcascell, pbonzini, philmd, rjones, suraj.ghimire7, virt-maint |
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | qemu-9.2.4-1.fc42 | Doc Type: | --- |
Last Closed: | 2025-06-03 05:17:20 UTC | Type: | Bug |
Description
Adam Williamson
2025-05-28 18:14:01 UTC
Looking at the server logs from a test after the downgrade to qemu 9.1.3, they're way cleaner. There's a single set of DISCOVER / OFFER / REQUEST / ACK messages for each client. Four messages per client and everything is good.

9.2.0 is bad, so this broke between 9.1.x and 9.2.0.

Just to verify: in the case of Comment 1, the *only* thing changed to make it "way cleaner" was switching from qemu 9.2.3 to 9.1.3 and nothing else was changed? Normally a time-based issue like this would lead me to think of the STP learning time when a port is newly connected and forwarding hasn't yet been enabled (which is usually right around 30 seconds, and that is about how long after the first DHCPDISCOVER until the first offer is received) (although... there is also the amount of time it takes for the OS to boot up to the level of starting dhclient, so maybe that's a false clue), but that would be on the side of the bridge device, not the tap (and certainly not the emulated device in the guest). (sorry I don't have any concrete advice; just poking to see if some other clue emerges...)

Yep. qemu version is the only variable between broken and working cases.

I'm bisecting this ATM, the range is down to 'somewhere between 67194c7018b8b06a1c149757f596bb919c683725 and 9.2.0' so far, at least assuming my bisect procedure is valid (I'll verify everything and double-check results when the bisect is complete).

Still bisecting, but I took a quick break to verify a v9.2.0 build with my bisect setup is bad, and it is, so the bisect procedure looks valid. The current range is 61 commits, and it includes this juicy little cluster:

16f6804c46 vhost_net: fix assertion triggered by batch of host notifiers processing
9379ea9db3 virtio-net: Add queues before loading them
7987d2be5a virtio-net: Copy received header to buffer
17437418c4 virtio-net: Initialize hash reporting values
1981fa9d7d virtio-net: Fix hash reporting when the queue changes
162bdb8113 virtio-net: Do not check for the queue before RSS
a8575f7fb2 virtio-net: Fix size check in dhclient workaround
5930e5ccf3 net: checksum: Convert data to void *

so, I'm gonna bet we wind up in one of those, maybe the "Fix size check in dhclient workaround" one.

OK, bisect comes down to:

commit 7987d2be5a8bc3a502f89ba8cf3ac3e09f64d1ce
Author: Akihiko Odaki
Date: Fri Nov 22 14:03:12 2024 +0900

    virtio-net: Copy received header to buffer

    receive_header() used to cast the const qualifier of the pointer to the
    received packet away to modify the header. Avoid this by copying the
    received header to buffer.

    Signed-off-by: Akihiko Odaki
    Signed-off-by: Jason Wang

CCing them.

The upstream fixed the issue by reverting it. The upstream issue is tracked at: https://gitlab.com/qemu-project/qemu/-/issues/2727

ah, thanks. I should've looked there before bisecting again...sigh.

FEDORA-2025-2b4cc4d8cd (qemu-9.2.4-1.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2025-2b4cc4d8cd

FEDORA-2025-2b4cc4d8cd (qemu-9.2.4-1.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.
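
For anyone wanting to repeat the "four messages per client" check from the first comment without reading the server log by eye, here is a rough Python sketch. It is not from this report: the log line shapes are an assumption (ISC dhcpd style lines such as `DHCPDISCOVER from <mac> via <iface>`), and the function names `count_dhcp_messages` and `noisy_clients` are made up for illustration. Adjust the regular expression if your DHCP server logs differently.

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: log lines look like ISC dhcpd output, e.g.
#   "DHCPOFFER on 172.16.2.110 to 52:54:00:aa:bb:01 via eth0"
# The real server log format is not shown in the bug report.
MESSAGE_RE = re.compile(
    r"\b(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK)\b.*?"
    r"\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)

# A clean exchange has exactly one of each of these per client.
EXPECTED = ("DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK")


def count_dhcp_messages(lines):
    """Return {client MAC: Counter of DHCP message types} for the given lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = MESSAGE_RE.search(line)
        if match:
            kind = match.group(1).upper()
            mac = match.group(2).lower()
            counts[mac][kind] += 1
    return counts


def noisy_clients(counts):
    """Return the sorted MACs whose exchange is not exactly one of each message."""
    return sorted(
        mac
        for mac, per_type in counts.items()
        if any(per_type[kind] != 1 for kind in EXPECTED)
    )


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from the actual test logs.
    sample = [
        "DHCPDISCOVER from 52:54:00:aa:bb:01 via eth0",
        "DHCPOFFER on 172.16.2.110 to 52:54:00:aa:bb:01 via eth0",
        "DHCPREQUEST for 172.16.2.110 from 52:54:00:aa:bb:01 via eth0",
        "DHCPACK on 172.16.2.110 to 52:54:00:aa:bb:01 via eth0",
        "DHCPDISCOVER from 52:54:00:aa:bb:02 via eth0",
        "DHCPDISCOVER from 52:54:00:aa:bb:02 via eth0",
    ]
    print(noisy_clients(count_dhcp_messages(sample)))
```

A client that shows up in the `noisy_clients` result either repeated part of the exchange or never completed it, which is the pattern that distinguished the broken qemu builds from the working ones here.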