Bug 2022381
| Summary: | uefi: booting from http/1.0 server fails | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Gerd Hoffmann <kraxel> | ||||||
| Component: | edk2 | Assignee: | Oliver Steffen <osteffen> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | CentOS Stream | CC: | berrange, bstinson, coli, jinzhao, juzhang, jwboyer, kkiwi, kraxel, lersek, osteffen, pbonzini, virt-maint, yduan, ymankad | ||||||
| Target Milestone: | rc | Keywords: | TestOnly, Triaged | ||||||
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2022-11-15 09:56:33 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 2074831 | ||||||||
| Bug Blocks: | |||||||||
| Attachments: |
|
||||||||
|
Description
Gerd Hoffmann
2021-11-11 14:24:32 UTC
> not sure whenever edk2 or grub is at fault here.
Any hint on this one? I see there is a HttpDxe in NetworkPkg,
so fetching files via http is probaby a service offered by edk2.
Not sure grub actually uses that though.
In case this is edk2's fault: any hints on how to best report
this upstream?
- tianocore bugzilla?
- edk2 devel list?
- both?
- something else?
> > not sure whenever edk2 or grub is at fault here. > > Any hint on this one? Ok, scratch that question. Just noticed edk2 runs into an assert. log tail: [Bds]Booting UEFI HTTPv4 (MAC:525400123456) InstallProtocolInterface: BA23B311-343D-11E6-9185-5820B1D65299 3E92B180 InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E821AA0 InstallProtocolInterface: 3AD9DF29-4501-478D-B1F8-7F7FE70E50F3 3E81FEB8 InstallProtocolInterface: F4B427BB-BA21-4F16-BC4E-43E416AB619C 3E81FC30 InstallProtocolInterface: 7A59B29B-910B-4171-8242-A85A0DF25B5B 3E819020 InstallProtocolInterface: 65530BC7-A359-410F-B010-5AADC7EC2B62 3E819E38 InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E81E1A0 InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 3E811098 HttpNotify: Event - 1, EventStatus - Success HttpNotify: Event - 3, EventStatus - Success TcpOnAppAbort: connection reset issued by application for TCB 3E81E418 HttpNotify: Event - 1, EventStatus - Success HttpNotify: Event - 3, EventStatus - Success TcpOnAppAbort: connection reset issued by application for TCB 3E81E418 [Bds] Expand PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri() -> PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,TCP,DHCP,192.168.42.18,192.168.42.1,255.255.255.0)/Dns(192.168.42.1)/Uri(http://192.168.42.1:8080/EFI/BOOT/grubx64.efi) [Security] 3rd party image[0] can be loaded after EndOfDxe: PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,TCP,DHCP,192.168.42.18,192.168.42.1,255.255.255.0)/Dns(192.168.42.1)/Uri(http://192.168.42.1:8080/EFI/BOOT/grubx64.efi). InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 3E8201C0 Loading driver at 0x0003E316000 EntryPoint=0x0003E317000 InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 3E821918 ProtectUefiImageCommon - 0x3E8201C0 - 0x000000003E316000 - 0x0000000000278000 InstallProtocolInterface: 7A59B29B-910B-4171-8242-A85A0DF25B5B 3E819020 InstallProtocolInterface: 8A219718-4EF5-4761-91C8-C0F04BDA9E56 3E80C920 InstallProtocolInterface: 87C8BAD7-0595-4053-8297-DEDE395F5D5B 3E821AC0 InstallProtocolInterface: F4B427BB-BA21-4F16-BC4E-43E416AB619C 3E81AE30 InstallProtocolInterface: 65530BC7-A359-410F-B010-5AADC7EC2B62 3E81ABB8 InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E81E1A0 InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 3E80CE98 HttpNotify: Event - 1, EventStatus - Success HttpNotify: Event - 3, EventStatus - Success HttpNotify: Event - 3, EventStatus - Success TcpInput: sequence acceptance test failed for segment of TCB 3E81E418 TcpInput: Discard a packet TcpOnAppAbort: connection reset issued by application for TCB 3E81E418 ASSERT /home/kraxel/projects/edk2/NetworkPkg/TcpDxe/TcpOutput.c(155): ((INT32) ((Tcb->SndNxt) - (((TCP_SEG *) ((Nbuf)->ProtoData))->End)) <= 0) Hi Gerd, I've only ever tested edk2 HTTP(S) Boot with remote ISO images <https://github.com/tianocore/tianocore.github.io/wiki/HTTP-Boot#ram-disk-boot-from-http>; I've never tried any kind of chain-loading (shim or grub) with edk2 HTTP. The problem could be in any of those compontens. A long time ago, there was a bug in grub, where its PXE boot implementation fought over the SNP interface with the firmware's ARP driver -- SNP is exclusive use, so two agents cannot use it simultaneously. Perhaps this is a similar issue with HTTP or TCP. A packet capture and detailed backtraces will likely be necessary. FWIW, the exact ASSERT that fails goes back to the appearance of NetworkPkg in edk2 (in 2010); there could be ancient bugs in the edk2 TCP stack. Thanks Laszlo Automated test for this: https://gitlab.com/kraxel/edk2-tests/ (1) clone repo (2) fetch + extract test images ('make extract') (3) make sure avocado framework is installed (4) edit network-ns-x64.py, remove the @avocado.skip annotation for test_ovmf_q35_http4 (5) kick off network tests: './test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net' It's all a bit tricky to setup, specifically dnsmasq needs "--dhcp-option-force=60,HTTPClient" otherwise edk2 doesn't accept the response for http boot which is your problem I guess. qemu user networking isn't able to do that. The testcases in network-ns-x64.py use a network namespace and run everything (http server, dhcp server, qemu) inside. (In reply to Gerd Hoffmann from comment #7) > Automated test for this: > > https://gitlab.com/kraxel/edk2-tests/ > > (1) clone repo > (2) fetch + extract test images ('make extract') > (3) make sure avocado framework is installed > (4) edit network-ns-x64.py, remove the @avocado.skip annotation for > test_ovmf_q35_http4 > (5) kick off network tests: './test-rpms.sh > /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net' > > It's all a bit tricky to setup, specifically dnsmasq needs > "--dhcp-option-force=60,HTTPClient" otherwise > edk2 doesn't accept the response for http boot which is your problem I > guess. qemu user networking isn't > able to do that. > > The testcases in network-ns-x64.py use a network namespace and run > everything (http server, dhcp server, qemu) inside. Gerd, Thanks for your detailed explanation. Many thanks. I run the automation test case, hit the message [1], but did not find message [2] in firmware.log. Please refer to attachment for the debug.log and firmware.log [1] TcpOnAppAbort: connection reset issued by application for TCB 3D979818 [2] ASSERT /home/kraxel/projects/edk2/NetworkPkg/TcpDxe/TcpOutput.c(155): ((INT32) ((Tcb->SndNxt) - (((TCP_SEG *) ((Nbuf)->ProtoData))->End)) <= 0) (See also this Comment: https://bugzilla.tianocore.org/show_bug.cgi?id=3735#c4). The version of grub from the Red Hat boot loader team [1] uses the HTTP protocol implementation from the firmware (EFI_HTTP_PROTOCOL), unlike the upstream version [2], see grub-core/net/efi/http.c [3], from introduced with commit 4b477fef4c3aa0bdfde2bf5b2384afb5a45b584f [4,5]. This explains the user agent string to mention "Uefi" instead of "Grub" and calls to the EFI HTTP protocol functions during the runtime of grub (can be observed by adding print statements in edk2). [1] https://github.com/rhboot/grub2 [2] https://git.savannah.gnu.org/git/grub.git [3] https://github.com/rhboot/grub2/blob/centos-9/grub-core/net/efi/http.c [4] https://github.com/rhboot/grub2/commit/4b477fef4c3aa0bdfde2bf5b2384afb5a45b584f [5] https://gitlab.com/redhat/centos-stream/rpms/grub2/-/blob/c9s/0095-Support-UEFI-networking-protocols.patch#L2056 Here is a backtrace from where the ASSERT fails. I replaced the ASSERT by an if + CpuDeadLoop() to connect the debugger when when the condition occurs. The problem under investigation does not occur with debugger attached. #0 0x000000003ec255ad in CpuPause () at <...>/MdePkg/Library/BaseLib/X64/GccInline.c:45 #1 CpuDeadLoop () at <...>/MdePkg/Library/BaseLib/CpuDeadLoop.c:30 #2 TcpGetMaxSndNxt (Tcb=0x3e81ad18) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:155 <--- Originally the failing ASSERT #3 TcpGetMaxSndNxt (Tcb=Tcb@entry=0x3e81ad18) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:137 #4 0x000000003ec25624 in TcpDataToSend (Tcb=Tcb@entry=0x3e81ad18, Force=Force@entry=1) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:209 #5 0x000000003ec29594 in TcpToSendData (Tcb=0x3e81ad18, Force=1) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:806 #6 0x000000003ec2bc74 in TcpOnAppClose (Tcb=<optimized out>) at <...>/NetworkPkg/TcpDxe/TcpMisc.c:782 #7 TcpDispatcher (Sock=0x3e813d18, Request=<optimized out>, Data=0x0) at <...>/NetworkPkg/TcpDxe/TcpDispatcher.c:804 #8 0x000000003ec25254 in SockConnect (Sock=0x3e813d18, Sock@entry=0x3e81ad18, Token=0x3e8130f8) at <...>/NetworkPkg/TcpDxe/SockInterface.c:449 #9 0x000000003ec252d6 in Tcp4Connect (ConnectionToken=0x1, This=<optimized out>) at <...>/NetworkPkg/TcpDxe/TcpMain.c:272 #10 Tcp4Connect (This=<optimized out>, ConnectionToken=0x1) at <...>/NetworkPkg/TcpDxe/TcpMain.c:259 #11 0x000000003ebfaec9 in HttpCreateConnection (HttpInstance=HttpInstance@entry=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:952 #12 0x000000003ec0007d in HttpConnectTcp4 (HttpInstance=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:1222 #13 HttpInitSession (TlsConfigure=0 '\000', Configure=0 '\000', Wrap=0x3e81a418, HttpInstance=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:1413 #14 EfiHttpRequest (This=0x3e813020, Token=0x3fe9c5e0) at <...>/NetworkPkg/HttpDxe/HttpImpl.c:589 #15 0x000000003e30b552 in ?? () #16 0x000000003e813020 in ?? () #17 0x000000003fe9c5e0 in ?? () #18 0x000000003bca5860 in ?? () #19 0x000000003bc857e0 in ?? () #20 0x000000003fe9c5e0 in ?? () #21 0x000000003beb669d in ?? () #22 0x000000003bca6100 in ?? () #23 0x0000000000000000 in ?? () Tcb->State == 2 (TCP_SYN_SENT) at ASSERT. I took a look at the package capture The order of TCP responses seems in the wrong order on the firmware side. ┌────┬────────────┬──────┬─────────────────────────────────────────────────────────────┬─────────┬─────────┐ │ No.│ Time │Length│Info │ SEQ│ ACK│ ├────┼────────────┼──────┼─────────────────────────────────────────────────────────────┼─────────┼─────────┤ │ │ │ │ ( ... downloading grubx64.efi ... ) │ │ │ │1236│19.995672084│ 54│1646 > 8080 [ACK] Seq=102 Ack=2594486 Win=64207 Len=0 │392587732│115357342│ │1237│19.996215825│ 54│1646 > 8080 [RST] Seq=102 Win=65535 Len=0 │392587732│115357342│ │ │ │ │ ( ^ Download complete. Client resets connection instread of FIN ) │ │ │ │ │ │ ( starting grub ... ) │ │ │ │1238│20.023909816│ 42│(ARP) Who has 192.168.42.1? Tell 192.168.42.18 │ │ │ │1239│20.023918190│ 42│(ARP) 192.168.42.1 is at 62:8d:41:dd:01:c7 │ │ │ │1240│20.024184674│ 58│1647 > 8080 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 │392589778│ 0│ │1241│20.024196726│ 58│8080 > 1647 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│241569928│392589779│ │1242│20.024272977│ 54│1647 > 8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0 │392589779│241569929│ │1243│20.024526108│ 158│1647 > 8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=104 │392589779│241569929│ │ │ │ │ ( ^ HEAD grub.cfg ) │ │ │ │1244│20.024532734│ 54│8080 > 1647 [ACK] Seq=1 Ack=105 Win=64136 Len=0 │241569929│392589883│ │1245│20.024753511│ 255│8080 > 1647 [PSH, ACK] Seq=1 Ack=105 Win=64136 Len=201 │241569929│392589883│ │ │ │ │ ( ^ Response to HEAD ) │ │ │ │1246│20.024781818│ 54│8080 > 1647 [FIN, ACK] Seq=202 Ack=105 Win=64136 Len=0 │241570130│392589883│ │ │ │ │ ( ^ Server closes connection ) │ │ │ │1247│20.024804128│ 54│1647 > 8080 [ACK] Seq=105 Ack=202 Win=65334 Len=0 │392589883│241570130│ │ │ │ │ ( ^ Client ACK for No. 1245 ) │ │ │ │1248│20.025066997│ 157│1647 > 8080 [PSH, ACK] Seq=105 Ack=202 Win=65334 Len=103 │392589883│241570130│ │ │ │ │ ( ^ Client reuses conn. for GET request ) │ │ │ │1249│20.025075322│ 54│8080 > 1647 [RST] Seq=202 Win=0 Len=0 │241570130│ 0│ │ │ │ │ ( ^ Server complains ) │ │ │ │1250│20.025080407│ 54│1647 > 8080 [ACK] Seq=208 Ack=203 Win=65333 Len=0 │392589986│241570131│ │ │ │ │ ( ^ Client ACK for No. 1246: FIN from server ) │ │ │ │1251│20.025082495│ 54│8080 > 1647 [RST] Seq=203 Win=0 Len=0 │241570131│ 0│ │ │ │ │ ( ^ Server complains about No. 1250, conn. was already reset ) │ │1252│20.026932227│ 54│1647 > 8080 [RST] Seq=208 Win=65535 Len=0 │392589986│241570131│ │ │ │ │ ( ASSERT ) │ │ │ └────┴────────────┴──────┴─────────────────────────────────────────────────────────────┴─────────┴─────────┘ "1646 > 8080": client to server "8080 > 1646": server to client I put a bunch of print statements (debug console) the http and some of the tcp code. This slows everyhting down noticably. In this case the situation is different: The TCP communication is in a different order. ┌────┬────────────┬──────┬─────────────────────────────────────────────────────────────┬──────────┬──────────┐ │ No.│ Time│Length│Info │ SEQ│ ACK│ ├────┼────────────┼──────┼─────────────────────────────────────────────────────────────┼──────────┼──────────┤ │ │ │ │ ( ... downloading grubx64.efi ... ) │ │ │ │2376│37.015509174│ 54│1840 > 8080 [ACK] Seq=106 Ack=2600457 Win=64075 Len=0 │1145620198│1090026340│ │2377│37.033942580│ 54│1840 > 8080 [ACK] Seq=106 Ack=2602446 Win=65007 Len=0 │1145620198│1090028329│ │2378│37.052953349│ 54│1840 > 8080 [RST] Seq=106 Win=65535 Len=0 │1145620198│1090028329│ │ │ │ │ ( ^ Download complete. Client resets instread of FIN ) │ │ | │ │ │ │ ( starting grub ... ) │ │ │ │2379│37.129416802│ 42│(ARP) Who has 192.168.42.1? Tell 192.168.42.18 │ │ │ │2380│37.129428482│ 42│(ARP) 192.168.42.1 is at 42:07:6e:ac:bc:04 │ │ │ │2381│37.130443090│ 58│1841 > 8080 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 │1145630440│ 0│ │2382│37.130459835│ 58│8080 > 1841 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│2938323266│1145630441│ │2383│37.137178082│ 54│1841 > 8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0 │1145630441│2938323267│ │2384│37.146770500│ 162│1841 > 8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=108 │1145630441│2938323267│ │ │ │ │ ( ^ HEAD grub.cfg ) │ │ │ │2385│37.146788923│ 54│8080 > 1841 [ACK] Seq=1 Ack=109 Win=64132 Len=0 │2938323267│1145630549│ │2386│37.147065843│ 255│8080 > 1841 [PSH, ACK] Seq=1 Ack=109 Win=64132 Len=201 │2938323267│1145630549│ │ │ │ │ ( ^ Response to HEAD ) │ │ │ │2387│37.147098195│ 54│8080 > 1841 [FIN, ACK] Seq=202 Ack=109 Win=64132 Len=0 │2938323468│1145630549│ │2388│37.161140215│ 54│1841 > 8080 [ACK] Seq=109 Ack=202 Win=65334 Len=0 │1145630549│2938323468│ │ │ │ │ ( ^ Client ACKs 2386 ) │ │ │ │2389│37.163976426│ 54│1841 > 8080 [ACK] Seq=109 Ack=203 Win=65333 Len=0 │1145630549│2938323469│ │ │ │ │ ( ^ Client ACKs 2387: FIN ) │ │ │ │2390│37.169382000│ 54│1841 > 8080 [RST] Seq=109 Win=65535 Len=0 │1145630549│2938323469│ │ │ │ │ ( ^ Client uses RST instread of FIN ) │ │ │ │2391│37.178297699│ 58│[TCP Port numbers reused] 1841 > 8080 | | | │ │ │ │ [SYN] Seq=0 Win=65535 Len=0 MSS=1460 │1145632488│2938323469│ │2392│37.178318945│ 58│8080 > 1841 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│2939071032│1145632489│ │2393│37.185358807│ 54│1841 > 8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0 │1145632489│2939071033│ │2394│37.195330842│ 161│1841 > 8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=107 │1145632489│2939071033│ │ │ │ │ ( ^ GET grub.cfg ) │ │ │ │2395│37.195349491│ 54│8080 > 1841 [ACK] Seq=1 Ack=108 Win=64133 Len=0 │2939071033│1145632596│ │ │ │ │ ( ... ) │ │ │ └────┴────────────┴──────┴─────────────────────────────────────────────────────────────┴──────────┴──────────┘ "1840 > 8080": client to server "8080 > 1840": server to client (In reply to Oliver Steffen from comment #15) > I took a look at the package capture > The order of TCP responses seems in the wrong order on the firmware side. The ordering of the packets in the capture file can differ from the processing order in the firmware. Specifically it is quite likely that the firmware has sent out 1248 before processing the incoming 1246. In general edk2 doesn't use interrupts (timer interrupt is the exception to the rule). So the nic is polled for new packets, and we could have this sequence of events: (1) edk2 polls nic. (2) 1246 arrives. (3) 1248 is sent. (4) edk2 polls nic (and finds 1246). A variant of this problem was already discovered in 2020: https://bugzilla.tianocore.org/show_bug.cgi?id=2720 Created attachment 1862881 [details]
Patch for edk2-tests: Use HTTP/1.1 in python server
Workaround: Use the working version 1.1 of the HTTP protocol on the web server side.
> Patch for edk2-tests: Use HTTP/1.1 in python server
That works indeed. Well, the testcase fails a bit later now. when grub tries to fetch a command list ...
I get that too now when re-running the test. It worked fine when I tried it earlier when I created the patch. That might happen, we already know that this problem is timing related. The python web server now closes the connection after the 404-response (which, according to some googling, is how some server implementations handle it), and we end up in the same situation again. But this time the server even announces that it will close the connection via the header: HTTP/1.1 404 File not found Server: SimpleHTTP/0.6 Python/3.10.2 Date: Wed, 23 Feb 2022 14:40:28 GMT Connection: close Content-Type: text/html;charset=utf-8 Content-Length: 469 So it seems like neither HTTP/1.0 not 1.1 are correctly implemented. Regardless of the (potentially messed up) processing order of TCP segments the program should not have reused the connection. I am not sure who's responsibility it is to watch the headers and reopen the connections. In the upstream discussion mentioned above[1] the proposed fix was to have the user of the EFI_HTTP_PROTOCOL do that. In there discussed case the change was to HttpBootDxe, which uses DxeHttpIoLib and not the EFI_HTTP_PROTOCOL directly. In our case that would mean the RHEL grub would have to take care of all that. From a user point of view I would expect that EFI_HTTP_PROTOCOL should handle that. Side note: nginx does not close the connection on 404, and thus the test succeeds. [1] https://bugzilla.tianocore.org/show_bug.cgi?id=2720 Created attachment 1863040 [details]
Patch for edk2-tests: Use nginx as http server
For cross checks and testing only.
nginx seems a rather heavy dependency for edk2-tests.
> I am not sure who's responsibility it is to watch
> the headers and reopen the connections.
> In the upstream discussion mentioned above[1]
> the proposed fix was to have the user of the EFI_HTTP_PROTOCOL
> do that. In there discussed case the change was to HttpBootDxe,
> which uses DxeHttpIoLib and not the EFI_HTTP_PROTOCOL directly.
> In our case that would mean the RHEL grub would have to take care of
> all that.
Hmm, seems processing the response headers happens one layer up indeed.
See 'git grep HTTP_HEADER_CONTENT_LENGTH -- NetworkPkg'
So DxeHttpIoLib being responsible for finding "Connection: close" kind-of
makes sense. On the other hand the HttpVersion is not exported by HttpDxe,
so turning off pipelining for http/1.0 servers must be handled on that
level anyway.
I guess I need to browse the source code a bit more ...
Also I do not see a good way of re-opening the connection for the user of EFI_HTTP_PROTOCOL, without doing a full Http->Configure(Http, NULL); Http->Configure(Http, &Config); This is not something one wants to do between each single request (HTTP/1.0 case). For HTTP/1.1, the user still needs to check headers and status codes, and do the re-configure (above). To reduce the number of HTTP requests, one could skip the HEAD and use GET immediately. Content-Length and status (404 or not) can also be obtained from the GET response. But in the end, this does not solve anything, if multiple files are queried (as grub does). I am also wondering what the motivation of the grub team was to switch to the EFI_HTTP_PROTOCOL instead of staying with the http implementation of upstream grub. The patches in PR [1] fix the following: - Detection of server side HTTP version. If version is not 1.1, do not reuse the TCP connection for the next request. This makes it possible to use a 1.0 web server as described above. - Detect "Connection: close" headers sent by HTTP 1.1 servers and close the TCP connection after the current request in that case. Some HTTP 1.1 servers close the connection if a 404-error occurs. Since Grub probes for a set of config files which may or may not exist ths might happen. - Fix re-configuring of the HTTP instance. The TCP protocol was not de-configured correctly. Open ToDos: - Check if similar problems exist when using IPv6 (probably!) - Also check HTTPS, for both IPv4 and IPv6. Current open problem: Firmware/grub does not close the connection after loading the last file and booting the kernel. This leaves an HTTP 1.1 server waiting on the abandoned connection for a long time. The python web server is single threaded and blocks all other requests in that period. The connection should be closed properly before launching the kernel. [1] https://github.com/tianocore/edk2/pull/2564 Posted to devel list: https://edk2.groups.io/g/devel/message/87283 Patches have been merged upstream: https://github.com/tianocore/edk2/pull/2756 drop '--test-runner=runner' from the script. fyi: background story why this is there: https://github.com/avocado-framework/avocado/issues/5407 (In reply to Gerd Hoffmann from comment #35) > drop '--test-runner=runner' from the script. I updated test-rpms.sh and got following results, maybe need further updates. # cat test-rpm.sh ... avocado run $tags *-{aa64,arm,ia32,x64}.py #avocado run --test-runner=runner $tags *-{aa64,arm,ia32,x64}.py ... # ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net ### ### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json ### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd ### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd ### arch: x86_64 ### tags: -t arch:x86_64,machine:q35,network:virtio-net ### JOB ID : 93e213b0a4f3a0a79220fe427108aed380bdfbd1 JOB LOG : /root/avocado/job-results/job-2022-06-21T09.49-93e213b/job.log (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.08 s) (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s) RESULTS : PASS 0 | ERROR 9 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0 JOB HTML : /root/avocado/job-results/job-2022-06-21T09.49-93e213b/results.html JOB TIME : 13.47 s Test summary: network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: ERROR network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: ERROR ### ### rc 1 ### > (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
> (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR: __init__()
> got an unexpected keyword argument 'run.results_dir' (0.08 s)
Had that too recently. I think simply updating avocato-framework (0.97) fix that for me.
After installing qemu/python/qemu and virt-firmware, got following results: # ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net ### ### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json ### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd ### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd ### arch: x86_64 ### tags: -t arch:x86_64,machine:q35,network:virtio-net ### JOB ID : cd5d8875fbc47e11990917c5fd0f4f65b8901935 JOB LOG : /root/avocado/job-results/job-2022-06-24T03.03-cd5d887/job.log (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: INTERRUPTED: timeout (60.02 s) (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: INTERRUPTED: timeout (60.02 s) (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: INTERRUPTED: timeout (60.01 s) (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: INTERRUPTED: timeout (60.02 s) (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: SKIP: no iso image (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: SKIP: wip (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: SKIP: no iso image (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: SKIP: no iso image (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: PASS (26.12 s) RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 4 | WARN 0 | INTERRUPT 4 | CANCEL 0 JOB HTML : /root/avocado/job-results/job-2022-06-24T03.03-cd5d887/results.html JOB TIME : 290.55 s ### ### rc 8 ### > (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED > (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: INTERRUPTED: timeout > (60.02 s) > (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED > (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: SKIP: no iso > image Oh, it needs the test images, "make extract boot-iso" fetches them. And the tftp test needs a check to see whenever the images needed are there, simliar to the iso test, /me goes fix that. Latest results after fetching test images through "make extract boot-iso": # ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net ### ### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json ### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd ### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd ### arch: x86_64 ### tags: -t arch:x86_64,machine:q35,network:virtio-net ### JOB ID : 7f26f9831bb2bb9da65291007fbc5cb446cd0e41 JOB LOG : /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/job.log (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: PASS (30.48 s) (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: PASS (48.48 s) (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s) (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: PASS (40.05 s) (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U --preserve-credentials -t 113258 openssl s_client -CAfile /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5-network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s) (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: SKIP: wip (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: INTERRUPTED: timeout (60.01 s) (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR: Command 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U --preserve-credentials -t 113425 openssl s_client -CAfile /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/8-network-ns-x64.py_TestNetNS.test_ovmf_q35_https6_iso/tmp_di... (6.89 s) (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: PASS (26.03 s) RESULTS : PASS 5 | ERROR 2 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 1 | CANCEL 0 JOB HTML : /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/results.html JOB TIME : 273.77 s Test summary: network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR ### ### rc 9 ### > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s) Good. That is the testcase which was broken due to the bug. > (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command > 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U > --preserve-credentials -t 113258 openssl s_client -CAfile > /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5- > network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s) Oh. https server sanity check fails. Never saw this one before. Can I get the full debug.log? (In reply to Gerd Hoffmann from comment #45) > > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED > > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s) > > Good. That is the testcase which was broken due to the bug. This means the verification is PASS. Right? > > > (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command > > 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U > > --preserve-credentials -t 113258 openssl s_client -CAfile > > /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5- > > network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s) > > Oh. https server sanity check fails. Never saw this one before. Can I get > the full debug.log? test_ovmf_q35_https4_iso.debug.log is attached. Thanks! (In reply to yduan from comment #47) > (In reply to Gerd Hoffmann from comment #45) > > > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED > > > (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s) > > > > Good. That is the testcase which was broken due to the bug. > > This means the verification is PASS. Right? Yes. > > Oh. https server sanity check fails. Never saw this one before. Can I get > > the full debug.log? > > test_ovmf_q35_https4_iso.debug.log is attached. thanks. Change to VERIFIED based on comment 44 and comment 48. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (edk2 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:7971 |