Bug 1808302 - [RHEL8] qemu-img human format not parseable: Invalid string format: 1eB
Summary: [RHEL8] qemu-img human format not parseable: Invalid string format: 1eB
Keywords:
Status: CLOSED DUPLICATE of bug 1807375
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ironic-lib
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: RHOS Maint
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-28 08:41 UTC by Robin Cernin
Modified: 2020-02-28 12:27 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-28 12:27:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Robin Cernin 2020-02-28 08:41:37 UTC
Traceback in ironic/ironic-conductor.log during deployment:

~~~
 [req-c53baaf7-21ff-40c6-8734-4c3dd7b20cdc 2e1561e8f87744bea311b6ce7734240e c242b9ae848e418caddcd3e04adf21db - default default] Node 93fe2702-3696-4278-9d15-507ab646aa74 failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Invalid string format: 1eB: ValueError: Invalid string format: 1eB
 Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3914, in _do_next_deploy_step
     result = interface.execute_deploy_step(task, step)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 360, in execute_deploy_step
     return self._execute_step(task, step)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 283, in _execute_step
     return getattr(self, step['step'])(task)
   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
     result = f(*args, **kwargs)
   File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
     return f(*args, **kwargs)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 422, in deploy
     deploy_utils.cache_instance_image(task.context, node)
   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
     result = f(*args, **kwargs)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1197, in cache_instance_image
     force_raw)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 531, in fetch_images
     cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 140, in fetch_image
     href, master_path, dest_path, ctx=ctx, force_raw=force_raw)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 167, in _download_image
     _fetch(ctx, href, tmp_path, force_raw)
   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 321, in _fetch
     required_space = images.converted_size(path_tmp)
   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 396, in converted_size
     data = disk_utils.qemu_img_info(path)
   File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 386, in qemu_img_info
     return imageutils.QemuImgInfo(out)
   File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 62, in __init__
     details = self._parse(cmd_output or '')
   File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 178, in _parse
     details = self._extract_details(root, root_details, lines)
   File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 127, in _extract_details
     real_details = self._extract_bytes(root_details)
   File "/usr/lib/python3.6/site-packages/oslo_utils/imageutils.py", line 113, in _extract_bytes
     return_int=True)
   File "/usr/lib/python3.6/site-packages/oslo_utils/strutils.py", line 225, in string_to_bytes
     raise ValueError(msg)
 ValueError: Invalid string format: 1eB
~~~

Investigation:

~~~
(Pdb) n
> / usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py(317)_fetch()
-> path_tmp = "%s.part" % path
(Pdb) l
312 return stat.f_frsize * stat.f_bavail
313
314
315 def _fetch(context, image_href, path, force_raw=False):
316         """Fetch image and convert to raw format if needed."""
317 -> path_tmp = "%s.part" % path
318         images.fetch(context, image_href, path_tmp, force_raw=False)
# Notes(yjiang5): If glance can provide the virtual size information,
319
320         # then we can firstly clean cache and then invoke images.fetch().
321 if force_raw:
322             required_space = images.converted_size(path_tmp)
(Pdb) p force_raw
True
(Pdb) n
> / usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py(318)_fetch()
-> images.fetch(context, image_href, path_tmp, force_raw=False)
(Pdb) p force_raw
True
(Pdb) s
--Call--
> / usr/lib/python3.6/site-packages/ironic/common/images.py(320)fetch()
-> def fetch(context, image_href, path, force_raw=False):
(Pdb) l
315 except processutils.ProcessExecutionError as e:
316                 LOG.exception("Creating ISO image failed.")
317 raise exception.ImageCreationFailed(image_type='iso', error=e)
318
319
320 -> def fetch(context, image_href, path, force_raw=False):
321         # TODO(vish): Improve context handling and add owner and auth data
322  # when it is added to glance.  Right now there is no
323  # auth checking in glance, so we assume that access was
324  # checked before we got here.
325         image_service = service.get_image_service(image_href,
(Pdb) p force_raw
False
(Pdb)
~~~

As you can see here, after stepping into images.fetch we are immediately missing force_raw,
and it's being reset to False by:

~~~
318         images.fetch(context, image_href, path_tmp, force_raw=False)
~~~

This is just a pseudo code execution, but same logic(replace qe_test with force_raw)
~~~
def _qe(qe_test=False):
    qe(qe_test=False)


def qe(qe_test=False):
    print(qe_test)


_qe(qe_test=True)
~~~

As you can see even if we set it to True, it will always be False.bit_length

~~~
sh-4.4$ python qe_test.py
False
~~~

We can fix this by changing:

~~~
> /usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py(318)_fetch()
-> images.fetch(context, image_href, path_tmp, force_raw=False)
~~~

To

~~~
> /usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py(318)_fetch()
-> images.fetch(context, image_href, path_tmp, force_raw)
~~~

However it introduces another problem because as you can see below path_tmp already
contains .part.

~~~
315 def _fetch(context, image_href, path, force_raw=False):
316         """Fetch image and convert to raw format if needed."""
317 -> path_tmp="%s.part" % path
318         images.fetch(context, image_href, path_tmp, force_raw=False)
~~~

We feed path_tmp to fetch method below as path:

~~~
320 def fetch(context, image_href, path, force_raw=False):
...
336 -> if force_raw:
337             image_to_raw(image_href, path, "%s.part" % path)
~~~

And if force_raw is True, which it is, path is appended another .part to it


(Pdb) p path_tmp
'/var/lib/ironic/master_images/tmp3bd03uc5/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part.part'
(Pdb)

We can fix this by changing

~~~
336 -> if force_raw:
337             image_to_raw(image_href, path, "%s.part" % path)
~~~

To

~~~
336 -> if force_raw:
337             image_to_raw(image_href, path, "%s" % path)
~~~

Next problem I can see is that we need to run qemu-img info to verify if the image
is alreayd raw or has backing file:

~~~
340  -> def image_to_raw(image_href, path, path_tmp):
341         with fileutils.remove_path_on_error(path_tmp):
342             data = disk_utils.qemu_img_info(path_tmp)
343             fmt = data.file_format
344             if fmt is None:
~~~

The qemu-img info on RHEL 8 returns:
~~~
383         out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
384                                  'qemu-img', 'info', path,
385                                  prlimit=QEMU_IMG_LIMITS)
386  ->     return imageutils.QemuImgInfo(out)

> /usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py(386)qemu_img_info()
-> return imageutils.QemuImgInfo(out)
(Pdb) p out
'image: /var/lib/ironic/master_images/tmpj2m19d28/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part\nfile format: qcow2\nvirtual size: 3.84 GiB (4127195136 bytes)\ndisk size: 1e+03 MiB\ncluster_size: 65536\nFormat speci
fic information:\n    compat: 1.1\n    lazy refcounts: false\n    refcount bits: 16\n    corrupt: false\n'
~~~

Which causes problem because parses doesn't recognize 'e' character.

~~~
disk size: 1e+03 MiB
~~~

In parser we have human read-able output:
~~~
(Pdb) 
> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py(62)__init__()
-> details = self._parse(cmd_output or '')

 61             else:
 62  ->             details = self._parse(cmd_output or '')
 63                 self.image = details.get('image')
 64                 self.backing_file = details.get('backing_file')
 65                 self.file_format = details.get('file_format')
 66                 self.virtual_size = details.get('virtual_size')
 67                 self.cluster_size = details.get('cluster_size')
~~~

We are sending this output to _parse:
~~~
(Pdb) p cmd_output
'image: /var/lib/ironic/master_images/tmphcz_ry4w/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part\nfile format: qcow2\nvirtual size: 3.84 GiB (4127195136 bytes)\ndisk size: 1e+03 MiB\ncluster_size: 65536\nFormat speci
fic information:\n    compat: 1.1\n    lazy refcounts: false\n    refcount bits: 16\n    corrupt: false\n'
(Pdb)
~~~

And we know the _parse is the one who fails here:
~~~
(Pdb) p self._parse(cmd_output or '')
*** ValueError: Invalid string format: 1eB
~~~

Code of _parse:

~~~
159  ->     def _parse(self, cmd_output):
160             # Analysis done of qemu-img.c to figure out what is going on here
161             # Find all points start with some chars and then a ':' then a newline
162             # and then handle the results of those 'top level' items in a separate
163             # function.
164             #
(Pdb) l
165             # TODO(harlowja): newer versions might have a json output format
166             #                 we should switch to that whenever possible.
167             #                 see: http://bit.ly/XLJXDX
168             contents = {}
169             lines = [x for x in cmd_output.splitlines() if x.strip()]
170             while lines:
171                 line = lines.pop(0)
172                 top_level = self.TOP_LEVEL_RE.match(line)
173                 if top_level:
174                     root = self._canonicalize(top_level.group(1))
175                     if not root:
176                         continue
177                     root_details = top_level.group(2).strip()
178                     details = self._extract_details(root, root_details, lines)
179                     contents[root] = details
180             return contents
~~~

We are parsing:

~~~
(Pdb) p line
'disk size: 1e+03 MiB'
(Pdb) l
167             #                 see: http://bit.ly/XLJXDX
168             contents = {}
169             lines = [x for x in cmd_output.splitlines() if x.strip()]
170             while lines:
171                 line = lines.pop(0)
172  ->             top_level = self.TOP_LEVEL_RE.match(line)
173                 if top_level:
174                     root = self._canonicalize(top_level.group(1))
175                     if not root:
176                         continue
177                     root_details = top_level.group(2).strip()
(Pdb) 
(Pdb) p self.TOP_LEVEL_RE
re.compile('^([\\w\\d\\s\\_\\-]+):(.*)$')
(Pdb) p top_level
<_sre.SRE_Match object; span=(0, 20), match='disk size: 1e+03 MiB'>
(Pdb) p root
'disk_size'
(Pdb) p root_details
'1e+03 MiB'
~~~

We are getting closer to the problem now:

~~~
[EOF]
(Pdb) p self._extract_details(root, root_details, lines)
*** ValueError: Invalid string format: 1eB
(Pdb) 
(Pdb) 
> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py(117)_extract_details()
-> if root_cmd == 'backing_file':
(Pdb) p root_cmd
'disk_size'
(Pdb) p real_details
'1e+03 MiB'
122  ->         elif root_cmd in ['virtual_size', 'cluster_size', 'disk_size']:
123                 # Replace it with the byte amount (if we can convert it)
124                 if root_details in ('None', 'unavailable'):
125                     real_details = 0
126                 else:
127                     real_details = self._extract_bytes(root_details)
(Pdb) p self._extract_bytes(root_details)
*** ValueError: Invalid string format: 1eB
~~~

Even closer, we know the problem is in _extract_bytes.

Code of _extract_bytes

~~~
 97  ->     def _extract_bytes(self, details):
 98             # Replace it with the byte amount
 99             real_size = self.SIZE_RE.search(details)
100             if not real_size:
101                 raise ValueError(_('Invalid input value "%s".') % details)
102             magnitude = real_size.group(1)
(Pdb) 
103             unit_of_measure = real_size.group(2)
104             bytes_info = real_size.group(3)
105             if bytes_info:
106                 return int(real_size.group(4))
107             elif not unit_of_measure:
108                 return int(magnitude)
109             # Allow abbreviated unit such as K to mean KB for compatibility.
110             if len(unit_of_measure) == 1 and unit_of_measure != 'B':
111                 unit_of_measure += 'B'
112             return strutils.string_to_bytes('%s%s' % (magnitude, unit_of_measure),
113                                             return_int=True)
(Pdb)
~~~

And here is the real problem:

~~~
(Pdb) p self.SIZE_RE.search(details)
<_sre.SRE_Match object; span=(0, 2), match='1e'>
(Pdb)
(Pdb) 
> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py(110)_extract_bytes()
-> if len(unit_of_measure) == 1 and unit_of_measure != 'B':
(Pdb) l
105             if bytes_info:
106                 return int(real_size.group(4))
107             elif not unit_of_measure:
108                 return int(magnitude)
109             # Allow abbreviated unit such as K to mean KB for compatibility.
110  ->         if len(unit_of_measure) == 1 and unit_of_measure != 'B':
111                 unit_of_measure += 'B'
112             return strutils.string_to_bytes('%s%s' % (magnitude, unit_of_measure),
113                                             return_int=True)
114  
115         def _extract_details(self, root_cmd, root_details, lines_after):
(Pdb) p unit_of_measure
'e'
(Pdb)
~~~

We will add 'B' to the unit_of_measure, hence 'eB' and return it together with magnitude
 as an int:

~~~
-> return strutils.string_to_bytes('%s%s' % (magnitude, unit_of_measure),
(Pdb) l
107             elif not unit_of_measure:
108                 return int(magnitude)
109             # Allow abbreviated unit such as K to mean KB for compatibility.
110             if len(unit_of_measure) == 1 and unit_of_measure != 'B':
111                 unit_of_measure += 'B'
112  ->         return strutils.string_to_bytes('%s%s' % (magnitude, unit_of_measure),
113                                             return_int=True)
114  
115         def _extract_details(self, root_cmd, root_details, lines_after):
116             real_details = root_details
117             if root_cmd == 'backing_file':
(Pdb) p magnitude
'1'
(Pdb) p unit_of_measure
'eB'
~~~

'1eB'

And here is where we get the 'ValueError: Invalid string format: 1eB'

So the problem seems in the qemu-img returning the disk size in '1e+03 MiB'

~~~
# qemu-img info /var/lib/ironic/master_images/tmptfe753fb/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part
image: /var/lib/ironic/master_images/tmptfe753fb/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part
file format: qcow2
virtual size: 3.84 GiB (4127195136 bytes)
-> disk size: 1e+03 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
~~~

We don't have the same problem using 'json'

~~~
qemu-img info --output=json /var/lib/ironic/master_images/tmptfe753fb/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part
{
    "virtual-size": 4127195136,
    "filename": "/var/lib/ironic/master_images/tmptfe753fb/e48e1f98-2afe-40b6-bcf9-688c4203a5f1.part",
    "cluster-size": 65536,
    "format": "qcow2",
    "actual-size": 1048510464,
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false
        }
    },
    "dirty-flag": false
}
~~~

I have forced qemu-img ofmt to be 'json' in /usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py

~~~
383         out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
384                                  'qemu-img', 'info', path,
385                                  prlimit=QEMU_IMG_LIMITS)
386  ->     return imageutils.QemuImgInfo(out)
~~~

To

~~~
383         out, err = utils.execute('env', 'LC_ALL=C', 'LANG=C',
384                                  'qemu-img', 'info', '--output=json', path,
385                                  prlimit=QEMU_IMG_LIMITS)
386  ->     return imageutils.QemuImgInfo(out, format='json')
~~~


DISCLAIMER: All the changes were done in the lab to investigate the issue and
were never recommended. This is only to help you figure out the real fix.

Environment information:

~~~
# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="8.1 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.1"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.1 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.1:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.1
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.1"
Red Hat Enterprise Linux release 8.1 (Ootpa)
Red Hat OpenStack Platform release 16.0.1 (Train)
Red Hat Enterprise Linux release 8.1 (Ootpa)
~~~

Images on Undercloud:
~~~
# podman ps | grep ironic
476ec58cc733  undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-ironic-conductor:16.0-77           kolla_start           6 hours ago  Up 8 minutes ago         ironic_conductor
493e6a47ea70  undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-nova-compute-ironic:16.0-77        kolla_start           2 days ago   Up 6 hours ago           nova_compute
~~~

Comment 1 Bob Fournier 2020-02-28 12:27:17 UTC
This is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1807375 and the fix is available, marking as duplicate.

*** This bug has been marked as a duplicate of bug 1807375 ***


Note You need to log in before you can comment on or make changes to this bug.