Bug 2184201 - Deploying a compute node fails with: Error: 'utf-8' codec can't encode characters in position 2623-2624: surrogates not allowed _try_preserve_efi_assets
Summary: Deploying a compute node fails with: Error: 'utf-8' codec can't encode charac...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 16.2 (Train)
Hardware: x86_64
OS: All
Priority: medium
Severity: high
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Julia Kreger
QA Contact: James E. LaBarre
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-03 22:41 UTC by Mircea Vutcovici
Modified: 2023-11-08 19:19 UTC (History)

Fixed In Version: openstack-ironic-python-agent-5.0.5-2.20230502215002.8330df9.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 19:18:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 879897 0 None MERGED Fix UTF-16 result handling for efibootmgr 2023-05-10 14:03:05 UTC
Red Hat Issue Tracker OSP-23934 0 None None None 2023-04-03 22:42:54 UTC
Red Hat Knowledge Base (Solution) 7004609 0 None None None 2023-04-03 22:42:31 UTC
Red Hat Product Errata RHBA-2023:6307 0 None None None 2023-11-08 19:19:13 UTC

Description Mircea Vutcovici 2023-04-03 22:41:55 UTC
Description of problem:
With debug logging enabled in ironic, /var/log/containers/ironic/journal records the following exception:
~~~
DEBUG ironic_python_agent.extensions.image [-] Exception encountered while attempting to setup the EFI loader from a root filesystem. Error: 'utf-8' codec can't encode characters in position 2623-2624: surrogates not allowed _try_preserve_efi_assets /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:755
~~~

This error is caused by the following UEFI boot entry:
mvutcovi@supportshell-1:~/03465225/0220-efi-firmware-variables-hv57$ hexdump -C ./sys/firmware/efi/efivars/Boot0021-8be4df61-93ca-11d2-aa0d-00e098032b8c
00000000  07 00 00 00 00 01 00 00  2c 00 54 00 72 00 69 00  |........,.T.r.i.|
00000010  67 00 67 00 65 00 72 00  20 00 72 00 65 00 ff 00  |g.g.e.r. .r.e...|
00000020  64 00 79 00 2d 00 74 00  6f 00 2d 00 62 00 6f 00  |d.y.-.t.o.-.b.o.|
00000030  6f 00 74 00 20 00 65 00  76 00 65 00 6e 00 74 00  |o.t. .e.v.e.n.t.|
00000040  00 00 04 07 14 00 35 7b  bb cd 33 68 d6 4e 9a b2  |......5{..3h.N..|
00000050  57 d2 ac dd f6 f0 04 06  14 00 b0 aa ff 4a 76 13  |W............Jv.|
00000060  b4 44 9c 6e e9 23 88 75  1b c6 7f ff 04 00        |.D.n.#.u......|
0000006e
[supportshell-1.sush-001.prod.us-west-2.aws.redhat.com] [17:20:55+0000]
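
For reference, the variable dumped above can be decoded by hand. The following is a minimal sketch, assuming the usual efivarfs file layout (a 4-byte variable attribute prefix followed by an EFI_LOAD_OPTION structure: 32-bit attributes, 16-bit file path list length, NUL-terminated UTF-16LE description, device path). The function name parse_efivar_load_option is made up for illustration and is not part of ironic-python-agent.

```python
import struct

# Bytes copied from the hexdump of Boot0021 above (0x6e = 110 bytes).
RAW = bytes.fromhex(
    "07 00 00 00 00 01 00 00 2c 00 54 00 72 00 69 00 "
    "67 00 67 00 65 00 72 00 20 00 72 00 65 00 ff 00 "
    "64 00 79 00 2d 00 74 00 6f 00 2d 00 62 00 6f 00 "
    "6f 00 74 00 20 00 65 00 76 00 65 00 6e 00 74 00 "
    "00 00 04 07 14 00 35 7b bb cd 33 68 d6 4e 9a b2 "
    "57 d2 ac dd f6 f0 04 06 14 00 b0 aa ff 4a 76 13 "
    "b4 44 9c 6e e9 23 88 75 1b c6 7f ff 04 00"
)


def parse_efivar_load_option(raw):
    """Split an efivarfs Boot#### file into its main fields (illustrative helper)."""
    # 4-byte efivarfs attributes, 4-byte load option attributes,
    # 2-byte file path list length, all little-endian and unpadded.
    var_attrs, load_attrs, path_len = struct.unpack_from("<IIH", raw, 0)
    start = end = struct.calcsize("<IIH")
    # The description is a NUL-terminated UTF-16LE string.
    while end + 2 <= len(raw) and raw[end:end + 2] != b"\x00\x00":
        end += 2
    if end + 2 > len(raw):
        raise ValueError("description is not NUL-terminated")
    description = raw[start:end].decode("utf-16-le")
    device_path = raw[end + 2:end + 2 + path_len]
    return var_attrs, load_attrs, description, device_path


var_attrs, load_attrs, description, device_path = parse_efivar_load_option(RAW)
print(repr(description), len(device_path))
```

The description decodes cleanly as UTF-16LE and contains U+00FF, which suggests the stored entry itself is well-formed and that the failure comes from how the tool output is decoded afterwards.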

The corresponding "efibootmgr -v" output for this entry, recovered from a hexdump of the command output:
mvutcovi@supportshell-1:~/03465225$ cat 0200-efibootmgr-v-hv57-hexdump.txt|sed -r 's/ \|.*//;s/  / /g'|grep -v 000009c8|xxd -r|grep Boot0021
Boot0021  Trigger reÿdy-to-boot event   FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(4affaab0-1376-44b4-9c6e-e92388751bc6)

The problem is that ironic_python_agent tries to run _efi_boot_setup(device, efi_system_part_uuid), which fails not because efibootmgr failed, but because Python could not parse its output. ironic_python_agent then continues with the GRUB installation, which leads to more errors.
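
The "surrogates not allowed" message can be reproduced in isolation. This is a minimal sketch, assuming the tool output contains a byte that is not valid UTF-8; the single 0xff byte below is a guess, and the exact bytes efibootmgr emitted on the affected machine are not known from this report (the original error covers two characters). Decoding with the surrogateescape handler is what os.fsdecode() does on a POSIX system with a UTF-8 locale.

```python
# Hypothetical tool output: a raw 0xff byte is not valid UTF-8.
raw_output = b"Boot0021  Trigger re\xffdy-to-boot event"

# The undecodable byte is smuggled through as a lone surrogate (U+DCFF).
text = raw_output.decode("utf-8", errors="surrogateescape")


def try_encode(value):
    """Return the UnicodeEncodeError raised by a strict UTF-8 encode, or None."""
    try:
        value.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_encode(text)
print(repr(text))
print(error)
```

Decoding succeeds, so nothing fails at the point where the output is read. The exception only appears later, when the string is encoded again as strict UTF-8.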

The solution would be to either notify the user about the broken UEFI boot entry, or catch it properly and log a warning message with details about the entry and the problematic character position in the entry.
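
One way such a warning could look is sketched below. This assumes the raw bytes of the command output are available; find_undecodable_lines and warn_about_bad_entries are hypothetical names and not existing agent functions.

```python
import logging

LOG = logging.getLogger(__name__)


def find_undecodable_lines(raw_output):
    """Yield (line number, byte offset, escaped text) for lines that are not UTF-8."""
    for number, line in enumerate(raw_output.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # backslashreplace keeps the line printable, e.g. "\xff".
            safe_text = line.decode("utf-8", errors="backslashreplace")
            yield number, exc.start, safe_text


def warn_about_bad_entries(raw_output):
    """Log one warning per problematic line and return what was found."""
    problems = list(find_undecodable_lines(raw_output))
    for number, offset, safe_text in problems:
        LOG.warning(
            "efibootmgr output line %d has an undecodable byte at offset %d: %s",
            number, offset, safe_text)
    return problems


sample = b"BootCurrent: 0000\nBoot0021  Trigger re\xffdy-to-boot event\n"
problems = warn_about_bad_entries(sample)
```

The offset reported is a byte offset within the line, which is usually easier to match against a hexdump than a position in the whole output.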

Also please note that in the future we might see more and more UEFI boot entries that are internationalized.

https://github.com/openstack/ironic-python-agent/blob/0bf579c955477da9a43e546703146b8b2b24d05f/ironic_python_agent/extensions/image.py#L227
            efi_preserved = _try_preserve_efi_assets(
                device, path, efi_system_part_uuid,
                efi_partition, efi_partition_mount_point)
            if efi_preserved:
                _append_uefi_to_fstab(path, efi_system_part_uuid)
                # Success preserving efi assets
                return
            else:
                # Failure, either via exception or not found
                # which in this case the partition needs to be
                # remounted.
                LOG.debug('No EFI assets were preserved for setup or the '
                          'ramdisk was unable to complete the setup. '
                          'falling back to bootloader installation from '
                          'deployed image.')
                _mount_partition(root_partition, path)


https://github.com/openstack/ironic-python-agent/blob/0bf579c955477da9a43e546703146b8b2b24d05f/ironic_python_agent/extensions/image.py#L471
            try:
                # Since we have preserved the assets, we should be able
                # to call the _efi_boot_setup method to scan the device
                # and add loader entries
                efi_preserved = _efi_boot_setup(device, efi_system_part_uuid)
                # Executed before the return so we don't return and then begin
                # execution.
                return efi_preserved
            except Exception as e:
                # Remount the partition and proceed as we were.
                LOG.debug('Exception encountered while attempting to '
                          'setup the EFI loader from a root '
                          'filesystem. Error: %s', e)

https://github.com/openstack/ironic-python-agent/blob/a1670753a23a79b6536f67eae9cca154e0ed2e65/ironic_python_agent/efi_utils.py#L273
def get_boot_records():
    """Executes efibootmgr and returns boot records.
    :return: an iterator yielding pairs (boot number, boot record).
    """
    efi_output = utils.execute('efibootmgr', '-v')
    for line in efi_output[0].split('\n'):
        match = _ENTRY_LABEL.match(line)
        if match is not None:
            yield (match[1], match[2])

Version-Release number of selected component (if applicable):


How reproducible:
All the time until the boot entry with untranslatable unicode characters is deleted

Steps to Reproduce:
1. Create a boot entry containing the "ÿ" character (U+00FF) on a physical machine
https://www.compart.com/en/unicode/U+00FF
2. Try to provision this machine as a compute node

Comment 1 Julia Kreger 2023-04-07 14:38:59 UTC
This issue appears to be rooted in python2/python3 differences. The underlying call was written for python2 compatibility, which truncates the data down to the UTF-8 character set by using os.fsdecode on the standard-error and standard-output content before returning to the caller. This occurs in the oslo.concurrency library.

Adding the binary=True option to the calls where we may get UTF-16 content seems reasonable and appears to be the path forward.
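
A rough sketch of what consuming the output as bytes could look like follows. With binary=True the execute call returns bytes instead of an already decoded string, so the caller chooses the decoding. The choice below (UTF-8 with replacement of undecodable bytes) is an assumption for illustration and not necessarily what the merged patch does; ENTRY_LABEL is a stand-in for the agent's _ENTRY_LABEL pattern, which is not shown in this report, and get_boot_records_from_bytes is a made-up name.

```python
import re

# Hypothetical stand-in for the agent's _ENTRY_LABEL pattern.
ENTRY_LABEL = re.compile(r"Boot([0-9a-fA-F]{4})\*?\s+(.*)$")


def get_boot_records_from_bytes(raw):
    """Yield (boot number, boot record) pairs from raw efibootmgr output."""
    # Undecodable bytes become U+FFFD instead of lone surrogates, so the
    # resulting strings can always be encoded as UTF-8 again.
    text = raw.decode("utf-8", errors="replace")
    for line in text.split("\n"):
        match = ENTRY_LABEL.match(line)
        if match is not None:
            yield match.group(1), match.group(2)


sample = (
    b"BootCurrent: 0000\n"
    b"Boot0000* Red Hat\tHD(1,GPT)\n"
    b"Boot0021  Trigger re\xffdy-to-boot event\tFvVol(x)\n"
)
records = list(get_boot_records_from_bytes(sample))
```

The trade-off is that the original byte is lost in the replaced label, which is acceptable only if the label is used for matching and logging rather than written back to firmware.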

I'm going to upload a work-in-progress patch into CI to see whether making the overall change results in things working as expected. Having an automated low-level test of this is not really possible, but the underlying library *does* have a test for the option, so it is more a question of compatibility for the agent code that uses the library with the different option.

Comment 2 Julia Kreger 2023-05-08 18:51:43 UTC
The fix has merged downstream, and is in the build pipeline process. It should be available in our next z stream.

Comment 19 errata-xmlrpc 2023-11-08 19:18:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307

