RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2022381 - uefi: booting from http/1.0 server fails
Summary: uefi: booting from http/1.0 server fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: edk2
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Oliver Steffen
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On: 2074831
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-11-11 14:24 UTC by Gerd Hoffmann
Modified: 2022-11-15 10:25 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 09:56:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Patch for edk2-tests: Use HTTP/1.1 in python server (833 bytes, patch)
2022-02-23 09:29 UTC, Oliver Steffen
no flags Details | Diff
Patch for edk2-tests: Use nginx as http server (3.11 KB, patch)
2022-02-23 17:00 UTC, Oliver Steffen
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Github tianocore edk2 pull 2564 0 None Draft WIP: Fix HTTP connection reuse 2022-03-01 15:34:23 UTC
Github tianocore edk2 pull 2756 0 None Merged mrabeda queue - BZ 3889 & 2720 2022-04-12 08:10:58 UTC
Red Hat Issue Tracker RHELPLAN-102523 0 None None None 2021-11-11 14:28:10 UTC
Red Hat Product Errata RHEA-2022:7971 0 None None None 2022-11-15 09:57:21 UTC
TianoCore 3735 0 None None None 2021-11-15 11:32:19 UTC

Description Gerd Hoffmann 2021-11-11 14:24:32 UTC
Description of problem:
I see pipelined requests (http/1.1 feature) being used when booting
from a http/1.0 server.  Which makes the boot fail.

Version-Release number of selected component (if applicable):
edk2: upstream, master branch
grub: centos stream 9 (download from http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/EFI/BOOT/)

How reproducible:
100%

Steps to Reproduce:
1. use python http.server to serve boot files (just grub.efi and grub.cfg is enough).
2. try uefi HTTPv4 boot from that server.

Actual results:
grub.efi loads fine.
then grub fails to load the grub.cfg config file.

Expected results:
grub loads grub.cfg.

Additional info:
not sure whenever edk2 or grub is at fault here.

Comment 2 Gerd Hoffmann 2021-11-11 14:34:43 UTC
> not sure whenever edk2 or grub is at fault here.

Any hint on this one?  I see there is a HttpDxe in NetworkPkg,
so fetching files via http is probaby a service offered by edk2.
Not sure grub actually uses that though.

In case this is edk2's fault: any hints on how to best report
this upstream?

 - tianocore bugzilla?
 - edk2 devel list?
 - both?
 - something else?

Comment 3 Gerd Hoffmann 2021-11-11 17:51:24 UTC
> > not sure whenever edk2 or grub is at fault here.
> 
> Any hint on this one?

Ok, scratch that question.  Just noticed edk2 runs into an assert.  log tail:

[Bds]Booting UEFI HTTPv4 (MAC:525400123456)
InstallProtocolInterface: BA23B311-343D-11E6-9185-5820B1D65299 3E92B180
InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E821AA0
InstallProtocolInterface: 3AD9DF29-4501-478D-B1F8-7F7FE70E50F3 3E81FEB8
InstallProtocolInterface: F4B427BB-BA21-4F16-BC4E-43E416AB619C 3E81FC30
InstallProtocolInterface: 7A59B29B-910B-4171-8242-A85A0DF25B5B 3E819020
InstallProtocolInterface: 65530BC7-A359-410F-B010-5AADC7EC2B62 3E819E38
InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E81E1A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 3E811098
HttpNotify: Event - 1, EventStatus - Success
HttpNotify: Event - 3, EventStatus - Success
TcpOnAppAbort: connection reset issued by application for TCB 3E81E418
HttpNotify: Event - 1, EventStatus - Success
HttpNotify: Event - 3, EventStatus - Success
TcpOnAppAbort: connection reset issued by application for TCB 3E81E418
[Bds] Expand PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri() -> PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,TCP,DHCP,192.168.42.18,192.168.42.1,255.255.255.0)/Dns(192.168.42.1)/Uri(http://192.168.42.1:8080/EFI/BOOT/grubx64.efi)
[Security] 3rd party image[0] can be loaded after EndOfDxe: PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,TCP,DHCP,192.168.42.18,192.168.42.1,255.255.255.0)/Dns(192.168.42.1)/Uri(http://192.168.42.1:8080/EFI/BOOT/grubx64.efi).
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 3E8201C0
Loading driver at 0x0003E316000 EntryPoint=0x0003E317000 
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 3E821918
ProtectUefiImageCommon - 0x3E8201C0
  - 0x000000003E316000 - 0x0000000000278000
InstallProtocolInterface: 7A59B29B-910B-4171-8242-A85A0DF25B5B 3E819020
InstallProtocolInterface: 8A219718-4EF5-4761-91C8-C0F04BDA9E56 3E80C920
InstallProtocolInterface: 87C8BAD7-0595-4053-8297-DEDE395F5D5B 3E821AC0
InstallProtocolInterface: F4B427BB-BA21-4F16-BC4E-43E416AB619C 3E81AE30
InstallProtocolInterface: 65530BC7-A359-410F-B010-5AADC7EC2B62 3E81ABB8
InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3E81E1A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 3E80CE98
HttpNotify: Event - 1, EventStatus - Success
HttpNotify: Event - 3, EventStatus - Success
HttpNotify: Event - 3, EventStatus - Success
TcpInput: sequence acceptance test failed for segment of TCB 3E81E418
TcpInput: Discard a packet
TcpOnAppAbort: connection reset issued by application for TCB 3E81E418
ASSERT /home/kraxel/projects/edk2/NetworkPkg/TcpDxe/TcpOutput.c(155): ((INT32) ((Tcb->SndNxt) - (((TCP_SEG *) ((Nbuf)->ProtoData))->End)) <= 0)

Comment 4 Laszlo Ersek 2021-11-12 14:11:12 UTC
Hi Gerd,

I've only ever tested edk2 HTTP(S) Boot with remote ISO images <https://github.com/tianocore/tianocore.github.io/wiki/HTTP-Boot#ram-disk-boot-from-http>; I've never tried any kind of chain-loading (shim or grub) with edk2 HTTP. The problem could be in any of those compontens.

A long time ago, there was a bug in grub, where its PXE boot implementation fought over the SNP interface with the firmware's ARP driver -- SNP is exclusive use, so two agents cannot use it simultaneously. Perhaps this is a similar issue with HTTP or TCP. A packet capture and detailed backtraces will likely be necessary. FWIW, the exact ASSERT that fails goes back to the appearance of NetworkPkg in edk2 (in 2010); there could be ancient bugs in the edk2 TCP stack.


Thanks
Laszlo

Comment 7 Gerd Hoffmann 2021-12-02 13:45:13 UTC
Automated test for this:

https://gitlab.com/kraxel/edk2-tests/

(1) clone repo
(2) fetch + extract test images ('make extract')
(3) make sure avocado framework is installed
(4) edit network-ns-x64.py, remove the @avocado.skip annotation for test_ovmf_q35_http4
(5) kick off network tests: './test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net'

It's all a bit tricky to setup, specifically dnsmasq needs "--dhcp-option-force=60,HTTPClient" otherwise
edk2 doesn't accept the response for http boot which is your problem I guess.  qemu user networking isn't
able to do that.

The testcases in network-ns-x64.py use a network namespace and run everything (http server, dhcp server, qemu) inside.

Comment 8 Xueqiang Wei 2021-12-13 02:23:15 UTC
(In reply to Gerd Hoffmann from comment #7)
> Automated test for this:
> 
> https://gitlab.com/kraxel/edk2-tests/
> 
> (1) clone repo
> (2) fetch + extract test images ('make extract')
> (3) make sure avocado framework is installed
> (4) edit network-ns-x64.py, remove the @avocado.skip annotation for
> test_ovmf_q35_http4
> (5) kick off network tests: './test-rpms.sh
> /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net'
> 
> It's all a bit tricky to setup, specifically dnsmasq needs
> "--dhcp-option-force=60,HTTPClient" otherwise
> edk2 doesn't accept the response for http boot which is your problem I
> guess.  qemu user networking isn't
> able to do that.
> 
> The testcases in network-ns-x64.py use a network namespace and run
> everything (http server, dhcp server, qemu) inside.


Gerd,

Thanks for your detailed explanation. Many thanks.

I run the automation test case, hit the message [1], but did not find message [2] in firmware.log. Please refer to attachment for the debug.log and firmware.log

[1] TcpOnAppAbort: connection reset issued by application for TCB 3D979818
[2] ASSERT /home/kraxel/projects/edk2/NetworkPkg/TcpDxe/TcpOutput.c(155): ((INT32) ((Tcb->SndNxt) - (((TCP_SEG *) ((Nbuf)->ProtoData))->End)) <= 0)

Comment 13 Oliver Steffen 2022-02-04 09:44:02 UTC
(See also this Comment: https://bugzilla.tianocore.org/show_bug.cgi?id=3735#c4).

The version of grub from the Red Hat boot loader team [1]
uses the HTTP protocol implementation from the firmware (EFI_HTTP_PROTOCOL),
unlike the upstream version [2], see grub-core/net/efi/http.c [3],
from introduced with commit 4b477fef4c3aa0bdfde2bf5b2384afb5a45b584f [4,5].

This explains the user agent string to mention "Uefi" instead of "Grub" and
calls to the EFI HTTP protocol functions during the runtime of grub
(can be observed by adding print statements in edk2).


[1] https://github.com/rhboot/grub2
[2] https://git.savannah.gnu.org/git/grub.git
[3] https://github.com/rhboot/grub2/blob/centos-9/grub-core/net/efi/http.c
[4] https://github.com/rhboot/grub2/commit/4b477fef4c3aa0bdfde2bf5b2384afb5a45b584f
[5] https://gitlab.com/redhat/centos-stream/rpms/grub2/-/blob/c9s/0095-Support-UEFI-networking-protocols.patch#L2056

Comment 14 Oliver Steffen 2022-02-08 14:23:11 UTC
Here is a backtrace from where the ASSERT fails.
I replaced the ASSERT by an if + CpuDeadLoop() to connect the debugger when when the condition occurs.
The problem under investigation does not occur with debugger attached.


#0  0x000000003ec255ad in CpuPause () at <...>/MdePkg/Library/BaseLib/X64/GccInline.c:45
#1  CpuDeadLoop () at <...>/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
#2  TcpGetMaxSndNxt (Tcb=0x3e81ad18) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:155  <--- Originally the failing ASSERT
#3  TcpGetMaxSndNxt (Tcb=Tcb@entry=0x3e81ad18) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:137
#4  0x000000003ec25624 in TcpDataToSend (Tcb=Tcb@entry=0x3e81ad18, Force=Force@entry=1) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:209
#5  0x000000003ec29594 in TcpToSendData (Tcb=0x3e81ad18, Force=1) at <...>/NetworkPkg/TcpDxe/TcpOutput.c:806
#6  0x000000003ec2bc74 in TcpOnAppClose (Tcb=<optimized out>) at <...>/NetworkPkg/TcpDxe/TcpMisc.c:782
#7  TcpDispatcher (Sock=0x3e813d18, Request=<optimized out>, Data=0x0) at <...>/NetworkPkg/TcpDxe/TcpDispatcher.c:804
#8  0x000000003ec25254 in SockConnect (Sock=0x3e813d18, Sock@entry=0x3e81ad18, Token=0x3e8130f8) at <...>/NetworkPkg/TcpDxe/SockInterface.c:449
#9  0x000000003ec252d6 in Tcp4Connect (ConnectionToken=0x1, This=<optimized out>) at <...>/NetworkPkg/TcpDxe/TcpMain.c:272
#10 Tcp4Connect (This=<optimized out>, ConnectionToken=0x1) at <...>/NetworkPkg/TcpDxe/TcpMain.c:259
#11 0x000000003ebfaec9 in HttpCreateConnection (HttpInstance=HttpInstance@entry=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:952
#12 0x000000003ec0007d in HttpConnectTcp4 (HttpInstance=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:1222
#13 HttpInitSession (TlsConfigure=0 '\000', Configure=0 '\000', Wrap=0x3e81a418, HttpInstance=0x3e813018) at <...>/NetworkPkg/HttpDxe/HttpProto.c:1413
#14 EfiHttpRequest (This=0x3e813020, Token=0x3fe9c5e0) at <...>/NetworkPkg/HttpDxe/HttpImpl.c:589
#15 0x000000003e30b552 in ?? ()
#16 0x000000003e813020 in ?? ()
#17 0x000000003fe9c5e0 in ?? ()
#18 0x000000003bca5860 in ?? ()
#19 0x000000003bc857e0 in ?? ()
#20 0x000000003fe9c5e0 in ?? ()
#21 0x000000003beb669d in ?? ()
#22 0x000000003bca6100 in ?? ()
#23 0x0000000000000000 in ?? ()


Tcb->State == 2 (TCP_SYN_SENT) at ASSERT.

Comment 15 Oliver Steffen 2022-02-11 10:12:14 UTC
I took a look at the package capture
The order of TCP responses seems in the wrong order on the firmware side.

┌────┬────────────┬──────┬─────────────────────────────────────────────────────────────┬─────────┬─────────┐
│ No.│       Time │Length│Info                                                         │      SEQ│      ACK│
├────┼────────────┼──────┼─────────────────────────────────────────────────────────────┼─────────┼─────────┤
│    │            │      │ ( ... downloading grubx64.efi ... )                         │         │         │
│1236│19.995672084│    54│1646  >  8080 [ACK] Seq=102 Ack=2594486 Win=64207 Len=0      │392587732│115357342│
│1237│19.996215825│    54│1646  >  8080 [RST] Seq=102 Win=65535 Len=0                  │392587732│115357342│
│    │            │      │ ( ^ Download complete. Client resets connection instread of FIN )     │         │
│    │            │      │ ( starting grub ... )                                       │         │         │
│1238│20.023909816│    42│(ARP) Who has 192.168.42.1? Tell 192.168.42.18               │         │         │
│1239│20.023918190│    42│(ARP) 192.168.42.1 is at 62:8d:41:dd:01:c7                   │         │         │
│1240│20.024184674│    58│1647  >  8080 [SYN] Seq=0 Win=65535 Len=0 MSS=1460           │392589778│        0│
│1241│20.024196726│    58│8080  >  1647 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│241569928│392589779│
│1242│20.024272977│    54│1647  >  8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0              │392589779│241569929│
│1243│20.024526108│   158│1647  >  8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=104       │392589779│241569929│
│    │            │      │              ( ^ HEAD grub.cfg )                            │         │         │
│1244│20.024532734│    54│8080  >  1647 [ACK] Seq=1 Ack=105 Win=64136 Len=0            │241569929│392589883│
│1245│20.024753511│   255│8080  >  1647 [PSH, ACK] Seq=1 Ack=105 Win=64136 Len=201     │241569929│392589883│
│    │            │      │              ( ^ Response to HEAD )                         │         │         │
│1246│20.024781818│    54│8080  >  1647 [FIN, ACK] Seq=202 Ack=105 Win=64136 Len=0     │241570130│392589883│
│    │            │      │              ( ^ Server closes connection )                 │         │         │
│1247│20.024804128│    54│1647  >  8080 [ACK] Seq=105 Ack=202 Win=65334 Len=0          │392589883│241570130│
│    │            │      │              ( ^ Client ACK for No. 1245 )                  │         │         │
│1248│20.025066997│   157│1647  >  8080 [PSH, ACK] Seq=105 Ack=202 Win=65334 Len=103   │392589883│241570130│
│    │            │      │              ( ^ Client reuses conn. for GET request )      │         │         │
│1249│20.025075322│    54│8080  >  1647 [RST] Seq=202 Win=0 Len=0                      │241570130│        0│
│    │            │      │              ( ^ Server complains )                         │         │         │
│1250│20.025080407│    54│1647  >  8080 [ACK] Seq=208 Ack=203 Win=65333 Len=0          │392589986│241570131│
│    │            │      │              ( ^ Client ACK for No. 1246: FIN from server ) │         │         │
│1251│20.025082495│    54│8080  >  1647 [RST] Seq=203 Win=0 Len=0                      │241570131│        0│
│    │            │      │              ( ^ Server complains about No. 1250, conn. was already reset )     │
│1252│20.026932227│    54│1647  >  8080 [RST] Seq=208 Win=65535 Len=0                  │392589986│241570131│
│    │            │      │              ( ASSERT )                                     │         │         │
└────┴────────────┴──────┴─────────────────────────────────────────────────────────────┴─────────┴─────────┘
"1646  >  8080": client to server
"8080  >  1646": server to client


I put a bunch of print statements (debug console) the http and some of the tcp code.
This slows everyhting down noticably.
In this case the situation is different: The TCP communication is in a different order.

┌────┬────────────┬──────┬─────────────────────────────────────────────────────────────┬──────────┬──────────┐
│ No.│        Time│Length│Info                                                         │       SEQ│       ACK│
├────┼────────────┼──────┼─────────────────────────────────────────────────────────────┼──────────┼──────────┤
│    │            │      │ ( ... downloading grubx64.efi ... )                         │          │          │
│2376│37.015509174│    54│1840  >  8080 [ACK] Seq=106 Ack=2600457 Win=64075 Len=0      │1145620198│1090026340│
│2377│37.033942580│    54│1840  >  8080 [ACK] Seq=106 Ack=2602446 Win=65007 Len=0      │1145620198│1090028329│
│2378│37.052953349│    54│1840  >  8080 [RST] Seq=106 Win=65535 Len=0                  │1145620198│1090028329│
│    │            │      │ ( ^ Download complete. Client resets  instread of FIN )     │          │          |
│    │            │      │ ( starting grub ... )                                       │          │          │
│2379│37.129416802│    42│(ARP) Who has 192.168.42.1? Tell 192.168.42.18               │          │          │
│2380│37.129428482│    42│(ARP) 192.168.42.1 is at 42:07:6e:ac:bc:04                   │          │          │
│2381│37.130443090│    58│1841  >  8080 [SYN] Seq=0 Win=65535 Len=0 MSS=1460           │1145630440│         0│
│2382│37.130459835│    58│8080  >  1841 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│2938323266│1145630441│
│2383│37.137178082│    54│1841  >  8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0              │1145630441│2938323267│
│2384│37.146770500│   162│1841  >  8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=108       │1145630441│2938323267│
│    │            │      │              ( ^ HEAD grub.cfg )                            │          │          │
│2385│37.146788923│    54│8080  >  1841 [ACK] Seq=1 Ack=109 Win=64132 Len=0            │2938323267│1145630549│
│2386│37.147065843│   255│8080  >  1841 [PSH, ACK] Seq=1 Ack=109 Win=64132 Len=201     │2938323267│1145630549│
│    │            │      │              ( ^ Response to HEAD )                         │          │          │
│2387│37.147098195│    54│8080  >  1841 [FIN, ACK] Seq=202 Ack=109 Win=64132 Len=0     │2938323468│1145630549│
│2388│37.161140215│    54│1841  >  8080 [ACK] Seq=109 Ack=202 Win=65334 Len=0          │1145630549│2938323468│
│    │            │      │              ( ^ Client ACKs 2386 )                         │          │          │
│2389│37.163976426│    54│1841  >  8080 [ACK] Seq=109 Ack=203 Win=65333 Len=0          │1145630549│2938323469│
│    │            │      │              ( ^ Client ACKs 2387: FIN )                    │          │          │
│2390│37.169382000│    54│1841  >  8080 [RST] Seq=109 Win=65535 Len=0                  │1145630549│2938323469│
│    │            │      │              ( ^ Client uses RST instread of FIN )          │          │          │
│2391│37.178297699│    58│[TCP Port numbers reused] 1841  >  8080                      |          |          |
│    │            │      │                     [SYN] Seq=0 Win=65535 Len=0 MSS=1460    │1145632488│2938323469│
│2392│37.178318945│    58│8080  >  1841 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460│2939071032│1145632489│
│2393│37.185358807│    54│1841  >  8080 [ACK] Seq=1 Ack=1 Win=65535 Len=0              │1145632489│2939071033│
│2394│37.195330842│   161│1841  >  8080 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=107       │1145632489│2939071033│
│    │            │      │              ( ^ GET grub.cfg )                             │          │          │
│2395│37.195349491│    54│8080  >  1841 [ACK] Seq=1 Ack=108 Win=64133 Len=0            │2939071033│1145632596│
│    │            │      │              ( ... )                                        │          │          │
└────┴────────────┴──────┴─────────────────────────────────────────────────────────────┴──────────┴──────────┘
"1840  >  8080": client to server
"8080  >  1840": server to client

Comment 17 Gerd Hoffmann 2022-02-14 12:22:01 UTC
(In reply to Oliver Steffen from comment #15)
> I took a look at the package capture
> The order of TCP responses seems in the wrong order on the firmware side.

The ordering of the packets in the capture file can differ from the
processing order in the firmware.  Specifically it is quite likely that
the firmware has sent out 1248 before processing the incoming 1246.

In general edk2 doesn't use interrupts (timer interrupt is the exception
to the rule).  So the nic is polled for new packets, and we could have this
sequence of events:

  (1) edk2 polls nic.
  (2) 1246 arrives.
  (3) 1248 is sent.
  (4) edk2 polls nic (and finds 1246).

Comment 18 Oliver Steffen 2022-02-23 08:44:07 UTC
A variant of this problem was already discovered in 2020:
https://bugzilla.tianocore.org/show_bug.cgi?id=2720

Comment 19 Oliver Steffen 2022-02-23 09:29:38 UTC
Created attachment 1862881 [details]
Patch for edk2-tests: Use HTTP/1.1 in python server

Workaround: Use the working version 1.1 of the HTTP protocol on the web server side.

Comment 20 Gerd Hoffmann 2022-02-23 14:22:51 UTC
> Patch for edk2-tests: Use HTTP/1.1 in python server

That works indeed.  Well, the testcase fails a bit later now. when grub tries to fetch a command list ...

Comment 21 Oliver Steffen 2022-02-23 16:57:18 UTC
I get that too now when re-running the test.
It worked fine when I tried it earlier when I created the
patch. That might happen, we already know that this
problem is timing related.

The python web server now closes the connection after
the 404-response (which, according to some googling, is
how some server implementations handle it),
and we end up in the same situation again.

But this time the server even announces that it will
close the connection via the header:

HTTP/1.1 404 File not found
Server: SimpleHTTP/0.6 Python/3.10.2
Date: Wed, 23 Feb 2022 14:40:28 GMT
Connection: close
Content-Type: text/html;charset=utf-8
Content-Length: 469

So it seems like neither HTTP/1.0 not 1.1 are correctly implemented.
Regardless of the (potentially messed up) processing order of TCP segments
the program should not have reused the connection.

I am not sure who's responsibility it is to watch
the headers and reopen the connections.
In the upstream discussion mentioned above[1]
the proposed fix was to have the user of the EFI_HTTP_PROTOCOL
do that. In there discussed case the change was to HttpBootDxe,
which uses DxeHttpIoLib and not the EFI_HTTP_PROTOCOL directly.
In our case that would mean the RHEL grub would have to take care of
all that.

From a user point of view I would expect that EFI_HTTP_PROTOCOL
should handle that.

Side note: nginx does not close the connection on 404, and thus the
test succeeds.

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=2720

Comment 22 Oliver Steffen 2022-02-23 17:00:06 UTC
Created attachment 1863040 [details]
Patch for edk2-tests: Use nginx as http server

For cross checks and testing only.
nginx seems a rather heavy dependency for edk2-tests.

Comment 23 Gerd Hoffmann 2022-02-24 09:13:45 UTC
> I am not sure who's responsibility it is to watch
> the headers and reopen the connections.
> In the upstream discussion mentioned above[1]
> the proposed fix was to have the user of the EFI_HTTP_PROTOCOL
> do that. In there discussed case the change was to HttpBootDxe,
> which uses DxeHttpIoLib and not the EFI_HTTP_PROTOCOL directly.
> In our case that would mean the RHEL grub would have to take care of
> all that.

Hmm, seems processing the response headers happens one layer up indeed.
See 'git grep HTTP_HEADER_CONTENT_LENGTH -- NetworkPkg'

So DxeHttpIoLib being responsible for finding "Connection: close" kind-of
makes sense.  On the other hand the HttpVersion is not exported by HttpDxe,
so turning off pipelining for http/1.0 servers must be handled on that
level anyway.

I guess I need to browse the source code a bit more ...

Comment 24 Oliver Steffen 2022-02-24 09:43:39 UTC
Also I do not see a good way of re-opening the connection for the
user of EFI_HTTP_PROTOCOL, without doing a full

  Http->Configure(Http, NULL);
  Http->Configure(Http, &Config);

This is not something one wants to do between each single request (HTTP/1.0 case).

For HTTP/1.1, the user still needs to check headers and status codes, and
do the re-configure (above).

To reduce the number of HTTP requests, one could skip the HEAD and use
GET immediately. Content-Length and status (404 or not) can also be
obtained from the GET response. But in the end, this does not solve anything,
if multiple files are queried (as grub does).


I am also wondering what the motivation of the grub team was to switch to
the EFI_HTTP_PROTOCOL instead of staying with the http implementation of
upstream grub.

Comment 25 Oliver Steffen 2022-03-02 11:34:06 UTC
The patches in PR [1] fix the following:
- Detection of server side HTTP version. If version is
  not 1.1, do not reuse the TCP connection for the next request.
  This makes it possible to use a 1.0 web server as described
  above.
- Detect "Connection: close" headers sent by HTTP 1.1 servers and
  close the TCP connection after the current request in that case.
  Some HTTP 1.1 servers close the connection if a 404-error occurs.
  Since Grub probes for a set of config files which may or may not exist
  ths might happen.
- Fix re-configuring of the HTTP instance. The TCP protocol was not
  de-configured correctly.

Open ToDos:
- Check if similar problems exist when using IPv6 (probably!)
- Also check HTTPS, for both IPv4 and IPv6.

Current open problem:

Firmware/grub does not close the connection after loading the last
file and booting the kernel. This leaves an HTTP 1.1 server
waiting on the abandoned connection for a long time.
The python web server is single threaded and blocks all other requests
in that period.
The connection should be closed properly before launching the kernel.


[1] https://github.com/tianocore/edk2/pull/2564

Comment 28 Oliver Steffen 2022-03-04 18:20:45 UTC
Posted to devel list: https://edk2.groups.io/g/devel/message/87283

Comment 29 Oliver Steffen 2022-04-12 08:10:58 UTC
Patches have been merged upstream: https://github.com/tianocore/edk2/pull/2756

Comment 35 Gerd Hoffmann 2022-06-21 09:18:19 UTC
drop '--test-runner=runner' from the script.

Comment 36 Gerd Hoffmann 2022-06-21 09:38:26 UTC
fyi: background story why this is there:
https://github.com/avocado-framework/avocado/issues/5407

Comment 37 yduan 2022-06-21 13:56:43 UTC
(In reply to Gerd Hoffmann from comment #35)
> drop '--test-runner=runner' from the script.

I updated test-rpms.sh and got following results, maybe need further updates.

# cat test-rpm.sh
...
avocado run $tags *-{aa64,arm,ia32,x64}.py
#avocado run --test-runner=runner $tags *-{aa64,arm,ia32,x64}.py
...

# ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net
###
### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json
### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd
### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd
### arch: x86_64
### tags: -t arch:x86_64,machine:q35,network:virtio-net
###
JOB ID     : 93e213b0a4f3a0a79220fe427108aed380bdfbd1
JOB LOG    : /root/avocado/job-results/job-2022-06-21T09.49-93e213b/job.log
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.08 s)
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: ERROR: __init__() got an unexpected keyword argument 'run.results_dir' (0.02 s)
RESULTS    : PASS 0 | ERROR 9 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2022-06-21T09.49-93e213b/results.html
JOB TIME   : 13.47 s

Test summary:
network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: ERROR
network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: ERROR
###
### rc 1
###

Comment 39 Gerd Hoffmann 2022-06-22 08:34:47 UTC
>  (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
>  (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: ERROR: __init__()
> got an unexpected keyword argument 'run.results_dir' (0.08 s)

Had that too recently.  I think simply updating avocato-framework (0.97) fix that for me.

Comment 42 yduan 2022-06-24 07:15:12 UTC
After installing qemu/python/qemu and virt-firmware, got following results:

# ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net
###
### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json
### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd
### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd
### arch: x86_64
### tags: -t arch:x86_64,machine:q35,network:virtio-net
###
JOB ID     : cd5d8875fbc47e11990917c5fd0f4f65b8901935
JOB LOG    : /root/avocado/job-results/job-2022-06-24T03.03-cd5d887/job.log
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: INTERRUPTED: timeout (60.02 s)
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: INTERRUPTED: timeout (60.02 s)
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: INTERRUPTED: timeout (60.01 s)
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: INTERRUPTED: timeout (60.02 s)
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: SKIP: no iso image
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: SKIP: wip
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: SKIP: no iso image
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: SKIP: no iso image
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: PASS (26.12 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 4 | WARN 0 | INTERRUPT 4 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2022-06-24T03.03-cd5d887/results.html
JOB TIME   : 290.55 s
###
### rc 8
###

Comment 43 Gerd Hoffmann 2022-06-24 11:13:02 UTC
>  (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
>  (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: INTERRUPTED: timeout
> (60.02 s)

>  (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED
>  (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: SKIP: no iso
> image

Oh, it needs the test images, "make extract boot-iso" fetches them.
And the tftp test needs a check to see whenever the images needed
are there, simliar to the iso test, /me goes fix that.

Comment 44 yduan 2022-06-27 03:15:24 UTC
Latest results after fetching test images through "make extract boot-iso":

# ./test-rpms.sh /usr/share/qemu/firmware/50-edk2-ovmf.json network:virtio-net
###
### conf: /usr/share/qemu/firmware/50-edk2-ovmf.json
### code: /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd
### vars: /usr/share/edk2/ovmf/OVMF_VARS.fd
### arch: x86_64
### tags: -t arch:x86_64,machine:q35,network:virtio-net
###
JOB ID     : 7f26f9831bb2bb9da65291007fbc5cb446cd0e41
JOB LOG    : /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/job.log
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: STARTED
 (1/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp4: PASS (30.48 s)
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: STARTED
 (2/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_tftp6: PASS (48.48 s)
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
 (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s)
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: STARTED
 (4/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_iso: PASS (40.05 s)
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: STARTED
 (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U --preserve-credentials -t 113258 openssl s_client -CAfile /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5-network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s)
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: STARTED
 (6/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_dir: SKIP: wip
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: STARTED
 (7/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http6_iso: INTERRUPTED: timeout (60.01 s)
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: STARTED
 (8/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR: Command 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U --preserve-credentials -t 113425 openssl s_client -CAfile /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/8-network-ns-x64.py_TestNetNS.test_ovmf_q35_https6_iso/tmp_di... (6.89 s)
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: STARTED
 (9/9) network-x64.py:TestNet.test_ovmf_q35_virtio_net_pcie: PASS (26.03 s)
RESULTS    : PASS 5 | ERROR 2 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/results.html
JOB TIME   : 273.77 s

Test summary:
network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR
network-ns-x64.py:TestNetNS.test_ovmf_q35_https6_iso: ERROR
###
### rc 9
###

Comment 45 Gerd Hoffmann 2022-06-27 07:47:32 UTC
>  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
>  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s)

Good.  That is the testcase which was broken due to the bug.

>  (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command
> 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U
> --preserve-credentials -t 113258 openssl s_client -CAfile
> /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5-
> network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s)

Oh.  https server sanity check fails.  Never saw this one before.  Can I get the full debug.log?

Comment 47 yduan 2022-06-28 01:51:04 UTC
(In reply to Gerd Hoffmann from comment #45)
> >  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
> >  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s)
> 
> Good.  That is the testcase which was broken due to the bug.

This means the verification is PASS. Right?

> 
> >  (5/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_https4_iso: ERROR: Command
> > 'echo -ne "GET / HTTP/1.0\r\n\r\n" | /usr/bin/nsenter -n -U
> > --preserve-credentials -t 113258 openssl s_client -CAfile
> > /root/avocado/job-results/job-2022-06-26T23.08-7f26f98/test-results/5-
> > network-ns-x64.py_TestNetNS.test_ovmf_q35_https4_iso/tmp_di... (7.53 s)
> 
> Oh.  https server sanity check fails.  Never saw this one before.  Can I get
> the full debug.log?

test_ovmf_q35_https4_iso.debug.log is attached.

Thanks!

Comment 48 Gerd Hoffmann 2022-06-29 07:35:12 UTC
(In reply to yduan from comment #47)
> (In reply to Gerd Hoffmann from comment #45)
> > >  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: STARTED
> > >  (3/9) network-ns-x64.py:TestNetNS.test_ovmf_q35_http4_dir: PASS (37.25 s)
> > 
> > Good.  That is the testcase which was broken due to the bug.
> 
> This means the verification is PASS. Right?

Yes.

> > Oh.  https server sanity check fails.  Never saw this one before.  Can I get
> > the full debug.log?
> 
> test_ovmf_q35_https4_iso.debug.log is attached.

thanks.

Comment 49 yduan 2022-06-30 01:53:35 UTC
Change to VERIFIED based on comment 44 and comment 48.

Comment 52 errata-xmlrpc 2022-11-15 09:56:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (edk2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:7971


Note You need to log in before you can comment on or make changes to this bug.