| Summary: | switch can not find any dhcp packages in 260 seconds | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | FuXiangChun <xfu> |
| Component: | ipxe | Assignee: | Ladi Prosek <lprosek> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | ailan, chayang, juzhang, knoel, michen, qzhang, weliao, xfu, xutian |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-04-20 12:23:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
FuXiangChun
2016-03-22 09:19:05 UTC
ipxe-roms-qemu-20130517-6.gitc4bce43.el7 works well: it can get TFTP packets in 4 seconds. Versions ipxe-roms-qemu-20130517-7.gitc4bce43.el7 through ipxe-roms-qemu-20160127-1.git6366fa7a.el7 encounter this bug, so I added Regression to the keywords.

Created attachment 1138966 [details]
vm ipxe
The main reason is that virtio-net cannot get DHCP packets for a long time. Getting TFTP packets is easy once the guest has an IP address, so I changed the bug summary (s/tftp/dhcp/).

Created attachment 1148155 [details]
virtio-net rom with DHCP debugging enabled
Hi Fu Xiang Chun,

ipxe-roms-qemu-20160127-1.git6366fa7a.el7 uses higher DHCP timeouts, but it should still be able to finish all DHCP steps within 4 minutes as long as the server is responsive and the link is solid.

On my network, the time to finish DHCP went up from 4 seconds to 12 seconds, in line with the PXE spec and the goal of bug 1196352.

Could you please re-run your scenario with the attached .rom? It is the current upstream iPXE with DHCP debugging enabled. It should help us figure out what's going on, and whether this issue has already been fixed.

Thanks!
Ladi

(In reply to Ladi Prosek from comment #7)
> Could you please re-run your scenario with the attached .rom?

Ladi,

Only one particular host can reproduce this bug, and I needed to reserve it before testing, hence the late reply to your needinfo.

Result: the bug still reproduces with your 1af41000.rom file (it takes about 260 seconds). I have attached 2 screenshots.

Additional: QE found one host that can reproduce this bug.
Host info:

1.
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
microcode : 0x1b
cpu MHz : 2466.195
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips : 6784.52
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

2.
# dmidecode |grep BIOS
SMBIOS 2.7 present.
BIOS Information
BIOS is upgradeable
BIOS shadowing is allowed
BIOS boot specification is supported
BIOS Revision: 1.2
BIOS Language Information

If you need more information, please let me know.

Created attachment 1148887 [details]
screenshot 1
Created attachment 1148888 [details]
guest screenshot 2
Thanks Fu Xiang Chun for the additional information. So here's what's happening. This machine is receiving STP packets like this one:
IEEE 802.3 Ethernet
Destination: Spanning-tree-(for-bridges)_00 (01:80:c2:00:00:00)
Address: Spanning-tree-(for-bridges)_00 (01:80:c2:00:00:00)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: JuniperN_f6:5c:a3 (3c:61:04:f6:5c:a3)
Address: JuniperN_f6:5c:a3 (3c:61:04:f6:5c:a3)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Length: 39
Padding: 00000000000000
Logical-Link Control
DSAP: Spanning Tree BPDU (0x42)
IG Bit: Individual
SSAP: Spanning Tree BPDU (0x42)
CR Bit: Command
Control field: U, func=UI (0x03)
000. 00.. = Command: Unnumbered Information (0x00)
.... ..11 = Frame type: Unnumbered frame (0x03)
Spanning Tree Protocol
Protocol Identifier: Spanning Tree Protocol (0x0000)
Protocol Version Identifier: Rapid Spanning Tree (2)
BPDU Type: Rapid/Multiple Spanning Tree (0x02)
BPDU flags: 0x0e (Port Role: Designated, Proposal)
0... .... = Topology Change Acknowledgment: No
.0.. .... = Agreement: No
..0. .... = Forwarding: No
...0 .... = Learning: No
.... 11.. = Port Role: Designated (3)
.... ..1. = Proposal: Yes
.... ...0 = Topology Change: No
Root Identifier: 32768 / 160 / 3c:61:04:f6:5c:81
Root Bridge Priority: 32768
Root Bridge System ID Extension: 160
Root Bridge System ID: JuniperN_f6:5c:81 (3c:61:04:f6:5c:81)
Root Path Cost: 0
Bridge Identifier: 32768 / 160 / 3c:61:04:f6:5c:81
Bridge Priority: 32768
Bridge System ID Extension: 160
Bridge System ID: JuniperN_f6:5c:81 (3c:61:04:f6:5c:81)
Port identifier: 0x8221
Message Age: 0
Max Age: 20
Hello Time: 2
Forward Delay: 15
Version 1 Length: 0
Because the Forwarding bit is not set, iPXE treats the link as blocked and waits for it to unblock. This functionality was added upstream quite recently:
https://git.ipxe.org/ipxe.git/commitdiff/d73982f098db9fdedb28a3826eb97a6832eac1e4
which explains why this looks like a regression. It has nothing to do with bug 1196352 / longer timeouts.
Maybe the Juniper device where the packets originate (3c:61:04:f6:5c:a3) is misconfigured. Maybe only that one port this machine is connected to is broken. Either way, not an iPXE bug.
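To make the flag decoding above easier to check, here is a minimal Python sketch that unpacks the RSTP BPDU flags octet from the capture (0x0e). The bit layout follows the dissection shown above; the names `RstpFlags`, `decode_rstp_flags` and `link_looks_blocked` are hypothetical helpers written for this illustration, not iPXE code, and the "blocked" test only mirrors the behaviour described in this comment (Forwarding bit clear means blocked).

```python
from dataclasses import dataclass

# Bit masks for the RSTP BPDU flags octet, matching the dissection above.
TOPOLOGY_CHANGE = 0x01
PROPOSAL = 0x02
PORT_ROLE_MASK = 0x0C
LEARNING = 0x10
FORWARDING = 0x20
AGREEMENT = 0x40
TOPOLOGY_CHANGE_ACK = 0x80

PORT_ROLES = {0: "Unknown", 1: "Alternate or Backup", 2: "Root", 3: "Designated"}


@dataclass(frozen=True)
class RstpFlags:
    topology_change: bool
    proposal: bool
    port_role: str
    learning: bool
    forwarding: bool
    agreement: bool
    topology_change_ack: bool


def decode_rstp_flags(flags: int) -> RstpFlags:
    """Split a one-octet RSTP flags value into its named fields."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return RstpFlags(
        topology_change=bool(flags & TOPOLOGY_CHANGE),
        proposal=bool(flags & PROPOSAL),
        port_role=PORT_ROLES[(flags & PORT_ROLE_MASK) >> 2],
        learning=bool(flags & LEARNING),
        forwarding=bool(flags & FORWARDING),
        agreement=bool(flags & AGREEMENT),
        topology_change_ack=bool(flags & TOPOLOGY_CHANGE_ACK),
    )


def link_looks_blocked(flags: int) -> bool:
    """Hypothetical check: treat the link as blocked when Forwarding is clear."""
    return not decode_rstp_flags(flags).forwarding


if __name__ == "__main__":
    decoded = decode_rstp_flags(0x0E)  # value taken from the captured BPDU
    print(decoded)
    print("link looks blocked:", link_looks_blocked(0x0E))
```

For the captured value 0x0e this yields a Designated port role with Proposal set and both Learning and Forwarding clear, which is the condition under which the link is treated as blocked.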
Can you please follow up with IT / netops?

(In reply to Ladi Prosek from comment #14)
> Can you please follow up with IT / netops?

Thanks, Ladi, for the detailed explanation. To confirm this problem, I have sent a ticket to IT and will update this bug with the reason once IT replies.

https://engineering.redhat.com/rt/Ticket/Display.html?id=399935

I also still have 2 small questions.

P1. Why does ipxe-roms-qemu-20130517-6.gitc4bce43.el7 work? (As I understand it, this version shouldn't work.)

P2. Can the upstream patch fix this problem? If it can, then this should be a valid bug until the patch is backported.

(In reply to FuXiangChun from comment #15)
> To confirm this problem, I have sent a ticket to IT and will update this bug
> with the reason once IT replies.

Thank you!

> P1. Why does ipxe-roms-qemu-20130517-6.gitc4bce43.el7 work? (As I understand
> it, this version shouldn't work.)

This version of the package doesn't contain the code that defers DHCP when STP reports the link as blocked; that code was added upstream in 2015. In other words, what you hit is a feature that's missing in the 2013 package.

> P2. Can the upstream patch fix this problem? If it can, then this should be a
> valid bug until the patch is backported.

Reverting the defer-DHCP-on-STP-blocked-link feature would be extremely unlikely to succeed. And again, this is not a bug; iPXE works as intended here. |