Bug 1487107

Summary: MAC-based grub2 config loading is broken in RHEL 7.4
Product: Red Hat Enterprise Linux 7
Reporter: Lukas Zapletal <lzap>
Component: grub2
Assignee: Peter Jones <pjones>
Status: CLOSED DUPLICATE
QA Contact: Release Test Team <release-test-team-automation>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: Andrzej.Kacprowski, doug.forster, lersek
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-26 11:52:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1487106
Attachments (Description, Flags):
  Patch that fixes hardware address formating bug (Flags: none)
  v2: Fix for hardware address formating issue (Flags: lersek: review+)

Description Lukas Zapletal 2017-08-31 09:26:17 UTC
Description of problem:

We see odd behavior in GRUB 2 when PXE booting: for some reason it tries to load a MAC-based config file whose name ends with a dash character:

/grub2/grub.cfg-01-52-54-00-ac-d2-3d-

which apparently does not exist in our case.


 in.tftpd[135674]: RRQ from 192.168.100.15 filename grub2/shim.efi
 in.tftpd[135674]: tftp: client does not accept options
 in.tftpd[135675]: RRQ from 192.168.100.15 filename grub2/shim.efi
 in.tftpd[135675]: Client 192.168.100.15 finished grub2/shim.efi
 in.tftpd[135676]: RRQ from 192.168.100.15 filename grub2/grubx64.efi
 in.tftpd[135676]: Client 192.168.100.15 finished grub2/grubx64.efi
 in.tftpd[135677]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-01-52-54-00-ac-d2-3d-
 in.tftpd[135677]: Client 192.168.100.15 File not found /grub2/grub.cfg-01-52-54-00-ac-d2-3d-
 in.tftpd[135678]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A8640F
 in.tftpd[135678]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A8640F
 in.tftpd[135679]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A8640
 in.tftpd[135679]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A8640
 in.tftpd[135680]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A864
 in.tftpd[135680]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A864
 in.tftpd[135681]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A86
 in.tftpd[135681]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A86
 in.tftpd[135682]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A8
 in.tftpd[135682]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A8
 in.tftpd[135683]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0A
 in.tftpd[135683]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0A
 in.tftpd[135684]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C0
 in.tftpd[135684]: Client 192.168.100.15 File not found /grub2/grub.cfg-C0
 in.tftpd[135685]: RRQ from 192.168.100.15 filename /grub2/grub.cfg-C
 in.tftpd[135685]: Client 192.168.100.15 File not found /grub2/grub.cfg-C
 in.tftpd[135686]: RRQ from 192.168.100.15 filename /grub2/grub.cfg
 in.tftpd[135686]: Client 192.168.100.15 finished /grub2/grub.cfg
 in.tftpd[135687]: RRQ from 192.168.100.15 filename /EFI/redhat/x86_64-efi/command.lst
 in.tftpd[135687]: Client 192.168.100.15 File not found /EFI/redhat/x86_64-efi/command.lst
 in.tftpd[135688]: RRQ from 192.168.100.15 filename /EFI/redhat/x86_64-efi/fs.lst
 in.tftpd[135688]: Client 192.168.100.15 File not found /EFI/redhat/x86_64-efi/fs.lst
 in.tftpd[135689]: RRQ from 192.168.100.15 filename /EFI/redhat/x86_64-efi/crypto.lst
 in.tftpd[135689]: Client 192.168.100.15 File not found /EFI/redhat/x86_64-efi/crypto.lst
 in.tftpd[135690]: RRQ from 192.168.100.15 filename /EFI/redhat/x86_64-efi/terminal.lst
 in.tftpd[135690]: Client 192.168.100.15 File not found /EFI/redhat/x86_64-efi/terminal.lst
 in.tftpd[135691]: RRQ from 192.168.100.15 filename /grub2/grub.cfg
 in.tftpd[135691]: Client 192.168.100.15 finished /grub2/grub.cfg
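
The TFTP log above shows the lookup order: first the MAC-based name, then names derived from the client IP address in uppercase hex (192.168.100.15 is C0A8640F), dropping one trailing digit per attempt, and finally plain grub.cfg. A minimal sketch of the IP-based part, assuming standard C snprintf(); the helpers ip_to_hex() and ip_candidate() are hypothetical names for illustration and are not GRUB's actual code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GRUB code: writes an IPv4 address as eight
   uppercase hex digits, e.g. 192.168.100.15 -> "C0A8640F". */
void ip_to_hex(const unsigned char ip[4], char out[9])
{
    snprintf(out, 9, "%02X%02X%02X%02X",
             (unsigned) ip[0], (unsigned) ip[1],
             (unsigned) ip[2], (unsigned) ip[3]);
}

/* Hypothetical helper: builds an IP-based candidate name with `drop`
   trailing hex digits removed. Dropping all digits yields the bare
   prefix, matching the final "/grub2/grub.cfg" request in the log. */
void ip_candidate(const char *prefix, const char *hex, size_t drop,
                  char *out, size_t out_size)
{
    size_t keep = strlen(hex);

    assert(drop <= keep);
    keep -= drop;
    if (keep == 0)
        snprintf(out, out_size, "%s", prefix);
    else
        snprintf(out, out_size, "%s-%.*s", prefix, (int) keep, hex);
}
```

Calling ip_candidate() with drop values 0 through 8 reproduces the nine IP-based requests seen in the log, in order.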

Analysis done by Laszlo Ersek in https://bugzilla.redhat.com/show_bug.cgi?id=873406#c12

This is the culprit grub2 patch (which you are running, but not looking at):

commit [irrelevant]
Author: Andrzej Kacprowski <andrzej.kacprowski>
Date:   Fri Apr 21 10:06:20 2017 +0200

    Add support for non-Ethernet network cards
    
    This patch replaces fixed 6-byte link layer address with
    up to 32-byte variable sized address.
    This allows supporting Infiniband and Omni-Path fabric
    which use 20-byte address, but other network card types
    can also take advantage of this change.
    The network card driver is responsible for replacing L2
    header provided by grub2 if needed.
    This approach is compatible with UEFI network stack which
    also allows up to 32-byte variable size link address.
    
    The BOOTP/DHCP packet format is limited to 16 byte client
    hardware address, if link address is more that 16-bytes
    then chaddr field in BOOTP it will be set to 0 as per rfc4390.
    
    Resolves: rhbz#1370642
    
    Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski>
    
    Conflicts:
            grub-core/net/ip.c

Namely, *before* applying this patch, the grub2 code had indeed matched what you quote above:

- "include/grub/net.h":

#define GRUB_NET_MAX_STR_HWADDR_LEN (sizeof ("XX:XX:XX:XX:XX:XX"))

- "grub-core/net/net.c", function grub_net_hwaddr_to_str():

        for (ptr = str, i = 0; i < ARRAY_SIZE (addr->mac); i++)
          {
            grub_snprintf (ptr, GRUB_NET_MAX_STR_HWADDR_LEN - (ptr - str),
                           "%02x:", addr->mac[i] & 0xff);
            ptr += (sizeof ("XX:") - 1);
          }

This loop
- formats every byte of the MAC address with the format string "%02x:",
- and it relies on the GRUB_NET_MAX_STR_HWADDR_LEN macro to *prevent* the
  grub_snprintf() function from formatting the trailing colon (":") for the
  6th MAC address byte: on the last iteration only 3 bytes of room remain,
  enough for two hex digits and the terminating NUL.
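
To make the truncation dependence concrete, here is a minimal sketch of the pre-patch loop, assuming standard C snprintf() behaves like grub_snprintf() when the output does not fit. OLD_MAX_STR_LEN and old_hwaddr_to_str() are illustrative names, not GRUB identifiers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAC_LEN 6
/* Same value as the pre-patch macro: 17 characters plus NUL = 18 bytes. */
#define OLD_MAX_STR_LEN (sizeof ("XX:XX:XX:XX:XX:XX"))

/* Pre-patch style loop. `str` must have room for OLD_MAX_STR_LEN bytes. */
void old_hwaddr_to_str(const unsigned char mac[MAC_LEN], char *str)
{
    char *ptr = str;
    unsigned i;

    assert(str != NULL);
    for (i = 0; i < MAC_LEN; i++) {
        /* On the 6th pass only 3 bytes remain, so "%02x:" is cut down
           to two hex digits plus NUL and the colon is silently lost. */
        snprintf(ptr, OLD_MAX_STR_LEN - (size_t) (ptr - str),
                 "%02x:", (unsigned) mac[i]);
        ptr += sizeof ("XX:") - 1;
    }
}
```

With the MAC address from the log (52:54:00:ac:d2:3d) the result has no trailing colon, but only because the buffer limit is exactly one byte short of holding it.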

*After* the patch, the limit was raised like this (room for 32 octets):

-#define GRUB_NET_MAX_STR_HWADDR_LEN (sizeof ("XX:XX:XX:XX:XX:XX"))
+#define GRUB_NET_MAX_STR_HWADDR_LEN (sizeof (\
+       "XX:XX:XX:XX:XX:XX:XX:XX:"\
+       "XX:XX:XX:XX:XX:XX:XX:XX:"\
+       "XX:XX:XX:XX:XX:XX:XX:XX:"\
+       "XX:XX:XX:XX:XX:XX:XX:XX"))

And the loop was changed like this:

-    case GRUB_NET_LINK_LEVEL_PROTOCOL_ETHERNET:
-      {
-       char *ptr;
-       unsigned i;
-       for (ptr = str, i = 0; i < ARRAY_SIZE (addr->mac); i++)
-         {
-           grub_snprintf (ptr, GRUB_NET_MAX_STR_HWADDR_LEN - (ptr - str),
-                          "%02x:", addr->mac[i] & 0xff);
-           ptr += (sizeof ("XX:") - 1);
-         }
-      return;
-      }
+       str[0] = 0;
+       grub_printf (_("Unsupported hw address type %d len %d\n"),
+                   addr->type, addr->len);
+       return;
+    }
+  for (ptr = str, i = 0; i < addr->len; i++)
+    {
+      ptr += grub_snprintf (ptr, GRUB_NET_MAX_STR_HWADDR_LEN - (ptr - str),
+                    "%02x:", addr->mac[i] & 0xff);

The loop change is actually irrelevant; it preserved the same logic. The macro change, however, is what matters.

Namely, due to the new definition of GRUB_NET_MAX_STR_HWADDR_LEN, if you now format a MAC address that has fewer than 32 octets, you will end up with a trailing colon (":"), simply because the grub_snprintf() function now has *room* for that colon. As I wrote above, the loop (both pre- and post-patch) relies on running out of room for *not* producing the last colon. A 6-octet Ethernet address is therefore formatted with a trailing colon, which matches the trailing dash in the requested config file name.
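
A minimal sketch of the post-patch behavior, again assuming standard C snprintf() in place of grub_snprintf(). NEW_MAX_STR_LEN and new_hwaddr_to_str() are illustrative names; the macro body copies the post-patch definition quoted above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same value as the post-patch macro: room for 32 octets (96 bytes). */
#define NEW_MAX_STR_LEN (sizeof (\
        "XX:XX:XX:XX:XX:XX:XX:XX:"\
        "XX:XX:XX:XX:XX:XX:XX:XX:"\
        "XX:XX:XX:XX:XX:XX:XX:XX:"\
        "XX:XX:XX:XX:XX:XX:XX:XX"))

/* Post-patch style loop. `str` must have room for NEW_MAX_STR_LEN bytes.
   Restricted to 0 < len < 32 so that nothing is truncated and the
   snprintf() return value equals the number of characters written. */
void new_hwaddr_to_str(const unsigned char *mac, size_t len, char *str)
{
    char *ptr = str;
    size_t i;

    assert(len > 0 && len < 32);
    for (i = 0; i < len; i++)
        ptr += snprintf(ptr, NEW_MAX_STR_LEN - (size_t) (ptr - str),
                        "%02x:", (unsigned) mac[i]);
}
```

For the 6-octet address from the log this yields "52:54:00:ac:d2:3d:", colon included, because the limit no longer cuts off the last separator.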

In brief, this regression was introduced in the fix for RHBZ#1370642.

Version-Release number of selected component (if applicable):

grub2-2.02-0.64.el7.x86_64

How reproducible:
Always; just PXE-boot grub2.


Additional info:

Regression in RHEL 7.4 introduced in https://bugzilla.redhat.com/show_bug.cgi?id=1370642

Comment 2 Andrzej.Kacprowski 2017-08-31 16:53:50 UTC
Created attachment 1320661 [details]
Patch that fixes hardware address formating bug

Attached patch fixes net hardware address formatting regression.

Comment 4 Laszlo Ersek 2017-08-31 17:59:13 UTC
Andrzej,

(In reply to Andrzej.Kacprowski from comment #2)
> Created attachment 1320661 [details]
> Patch that fixes hardware address formating bug
> 
> Attached patch fixes net hardware address formatting regression.

is it guaranteed that "addr->len" is positive when reaching the loop?

Thanks
Laszlo

Comment 5 Andrzej.Kacprowski 2017-09-01 13:44:22 UTC
(In reply to Laszlo Ersek from comment #4)
> is it guaranteed that "addr->len" is positive when reaching the loop?

addr->len should not be 0: a network device without a hardware address makes no sense. But if for some reason (e.g. a future grub2 change) addr->len is 0, then grub_net_hwaddr_to_str() will not behave well, and that needs to be fixed.

Laszlo,
Thanks for pointing this out.
I will provide an improved patch.

Comment 6 Andrzej.Kacprowski 2017-09-01 13:52:28 UTC
Created attachment 1320974 [details]
v2: Fix for hardware address formating issue

v2 patch improves grub_net_hwaddr_to_str() function to handle zero length hardware address gracefully
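
The v2 attachment itself is not reproduced in this report. As an illustration only, here is a minimal sketch of one way to format the address so that no trailing separator is produced and a zero-length address yields an empty string. It assumes standard C snprintf(); hwaddr_to_str_fixed() and its explicit `size` parameter are hypothetical and not the attached patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not the attached patch: the separator is written
   *before* every byte except the first, so correctness no longer depends
   on the buffer running out of room. */
void hwaddr_to_str_fixed(const unsigned char *mac, size_t len,
                         char *str, size_t size)
{
    char *ptr = str;
    size_t i;

    assert(str != NULL && size > 0);
    str[0] = '\0';                      /* len == 0 gives "" */
    for (i = 0; i < len; i++) {
        size_t room = size - (size_t) (ptr - str);
        int n = snprintf(ptr, room, "%s%02x",
                         i ? ":" : "", (unsigned) mac[i]);

        if (n < 0 || (size_t) n >= room) {
            *ptr = '\0';                /* drop the partial element */
            break;
        }
        ptr += n;
    }
}
```

Writing the separator ahead of each element also avoids the alternative of stripping the last character afterwards, which would index before the start of the buffer when the length is zero.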

Comment 7 Laszlo Ersek 2017-09-02 12:22:36 UTC
(In reply to Andrzej.Kacprowski from comment #6)
> Created attachment 1320974 [details]
> v2: Fix for hardware address formating issue
> 
> v2 patch improves grub_net_hwaddr_to_str() function to handle zero length
> hardware address gracefully

Reviewed-by: Laszlo Ersek <lersek>

Comment 8 Lukas Zapletal 2017-10-26 11:52:18 UTC
All right, looking at this erratum

https://access.redhat.com/errata/RHBA-2017:2950

I was scratching my head. Then I noticed that what I reported had already been filed as bug 1483740 a few days earlier and has been fixed in 7.4 as well. I think we can close this one; it has now been fixed in both 7.4 and 7.5. Many thanks for the help!

*** This bug has been marked as a duplicate of bug 1483740 ***