Bug 1573555

Summary: _util.py:67:ensure_unicode_string:UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
Product: [Fedora] Fedora
Reporter: Torgeir Veimo <torgeir>
Component: python-blivet
Assignee: Blivet Maintenance Team <blivet-maint-list>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 28
CC: amulhern, anaconda-maint-list, apevec, blivet-maint-list, clockfor, dshea, jkonecny, jonathan, jskarvad, junli, kellin, mkolman, rvykydal, sbueno, torgeir, vanmeeuwen+fedora, v.podzimek+fedora, vponcova, vtrefny, wwoods
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:177a60b4a5a57f84c9fd51f8e2b741ba07ba8de1;VARIANT_ID=workstation;
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-28 22:58:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description         Flags
File: cgroup        none
File: cpuinfo       none
File: environ       none
File: mountinfo     none
File: namespaces    none
File: open_fds      none
File: os_info       none

Description Torgeir Veimo 2018-05-01 16:54:48 UTC
Description of problem:
The crash happens when simply starting the installer (anaconda) from the activities drawer.

I see the same problem with F27.

Version-Release number of selected component:
anaconda-core-28.22.10-1.fc28


Additional info:
cmdline:        /usr/bin/python3 /usr/bin/anaconda-cleanup anaconda --liveinst --method=livecd:///dev/mapper/live-base
crash_function: ensure_unicode_string
exception_type: UnicodeDecodeError
executable:     /usr/bin/anaconda-cleanup
interpreter:    python3-3.6.5-1.fc28.x86_64
kernel:         4.16.3-301.fc28.x86_64
runlevel:       N 5
type:           Python3
uid:            0

Comment 1 Torgeir Veimo 2018-05-01 16:54:53 UTC
Created attachment 1429371 [details]
File: cgroup

Comment 2 Torgeir Veimo 2018-05-01 16:54:54 UTC
Created attachment 1429372 [details]
File: cpuinfo

Comment 3 Torgeir Veimo 2018-05-01 16:54:56 UTC
Created attachment 1429373 [details]
File: environ

Comment 4 Torgeir Veimo 2018-05-01 16:54:57 UTC
Created attachment 1429374 [details]
File: mountinfo

Comment 5 Torgeir Veimo 2018-05-01 16:54:59 UTC
Created attachment 1429375 [details]
File: namespaces

Comment 6 Torgeir Veimo 2018-05-01 16:55:00 UTC
Created attachment 1429376 [details]
File: open_fds

Comment 7 Torgeir Veimo 2018-05-01 16:55:02 UTC
Created attachment 1429377 [details]
File: os_info

Comment 8 Torgeir Veimo 2018-05-01 16:57:45 UTC
I can provide remote login to this machine if that helps. There are Windows 10 and Mac partitions on this computer (Dell 9010 SFF) as well; I'm not sure if that's relevant.

Comment 9 Torgeir Veimo 2018-05-01 23:42:03 UTC
The same thing happens in F27, but there the error is printed to the console itself.

[root@hackintosh ~]# anaconda --loglevel debug
Starting installer, one moment...
anaconda 27.20.4-1 for anaconda bluesky (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2 and in second TMUX pane (ctrl+b, then press 2)
 * when reporting a bug add logs from /tmp as separate text/plain attachments
Traceback (most recent call last):
  File "/sbin/anaconda", line 658, in <module>
    matched = device_matches("LABEL=OEMDRV", disks_only=True)
  File "/usr/lib64/python3.6/site-packages/pyanaconda/storage_utils.py", line 897, in device_matches
    single_spec_matches = udev.resolve_glob(full_spec)
  File "/usr/lib/python3.6/site-packages/blivet/udev.py", line 155, in resolve_glob
    for dev in get_devices():
  File "/usr/lib/python3.6/site-packages/blivet/udev.py", line 73, in get_devices
    dev = device_to_dict(device)
  File "/usr/lib/python3.6/site-packages/blivet/udev.py", line 48, in device_to_dict
    result = dict(device.properties)
  File "/usr/lib/python3.6/site-packages/pyudev/device/_device.py", line 1085, in __getitem__
    return ensure_unicode_string(value)
  File "/usr/lib/python3.6/site-packages/pyudev/_util.py", line 67, in ensure_unicode_string
    value = value.decode(sys.getfilesystemencoding())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

Comment 10 Torgeir Veimo 2018-05-02 01:09:14 UTC
I'm not entirely convinced this is due to an error in _util.py. I added logging to that file, but no output is produced.

Comment 11 mulhern 2018-05-03 16:55:57 UTC
Hi, pyudev maintainer here.

I doubt this is a pyudev error, but it might be a libudev error. Would you be able to run the following script

import pyudev

from pyudev import Context


def main():
    for device in Context().list_devices(subsystem="block"):
        properties = device.properties
        names = [n for n in properties]
        for prop_name in names:
            try:
                value = properties.get(prop_name)
            except UnicodeDecodeError as err:
                print("device: %s" % device)
                print("prop name: %s" % prop_name)
                raise


if __name__ == "__main__":
    main()

and let me know the output? Ideally, it will locate the particular block device and property value that is for some reason failing to be converted properly. Thanks!

Comment 12 Torgeir Veimo 2018-05-03 23:25:45 UTC
[root@localhost-live ~]# python3 test.py 
device: Device('/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1')
prop name: PARTNAME
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    main()
  File "test.py", line 12, in main
    value = properties.get(prop_name)
  File "/usr/lib64/python3.6/_collections_abc.py", line 660, in get
    return self[key]
  File "/usr/lib/python3.6/site-packages/pyudev/device/_device.py", line 1085, in __getitem__
    return ensure_unicode_string(value)
  File "/usr/lib/python3.6/site-packages/pyudev/_util.py", line 67, in ensure_unicode_string
    value = value.decode(sys.getfilesystemencoding())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

Comment 13 Torgeir Veimo 2018-05-03 23:29:56 UTC
It looks like the string value of the property PARTNAME is simply "1".

Comment 14 Torgeir Veimo 2018-05-04 04:08:26 UTC
Correction: the value causing the problem seems to be

b'\xc0#\\!U0!\xea!!\xdck!C\xa8\xc6!\xe7\xc5k/l?!!!\xb6]'

Comment 15 Torgeir Veimo 2018-05-04 04:38:53 UTC
I believe this might be a Windows 10 NTFS recovery partition.

Comment 16 mulhern 2018-05-04 13:25:55 UTC
It does decode as "latin-1":

>>> b'\xc0#\\!U0!\xea!!\xdck!C\xa8\xc6!\xe7\xc5k/l?!!!\xb6]'.decode('latin-1')
'À#\\!U0!ê!!Ük!C¨Æ!çÅk/l?!!!¶]'
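
A note on reading that result, as a minimal sketch (the variable names are illustrative, not from pyudev): latin-1 assigns a character to every byte value from 0x00 to 0xff, so decoding with it cannot fail for any input. A successful latin-1 decode therefore does not show that the value was originally written in latin-1.

```python
# The value reported in comment 14.
raw = b'\xc0#\\!U0!\xea!!\xdck!C\xa8\xc6!\xe7\xc5k/l?!!!\xb6]'

# Strict UTF-8 decoding fails, as in the traceback.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print("utf-8 failed:", err)

# latin-1 decoding succeeds and is reversible...
as_latin1 = raw.decode("latin-1")
print(as_latin1.encode("latin-1") == raw)

# ...but it succeeds for every possible byte value, so success
# says nothing about the encoding the data was written in.
all_bytes = bytes(range(256)).decode("latin-1")
print(len(all_bytes))
```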

Comment 17 Torgeir Veimo 2018-05-04 13:28:47 UTC
What's the best approach to make this code robust against such input data? Would it be better if it just gave a warning and used the undecoded string so that installation can proceed?

Comment 18 mulhern 2018-05-04 13:44:55 UTC
pyudev doesn't log at all. It isn't clear what it should do in this situation.

If it is possible to find the "correct" encoding, it should do that. But that is unlikely to be always true.

Probably the best thing to do for the installer at this time is for blivet to catch the exception at around:

 File "/usr/lib/python3.6/site-packages/blivet/udev.py", line 73, in get_devices
    dev = device_to_dict(device)

and take whatever it considers to be the appropriate action for a failed construction of the property table for a particular device. OR blivet could construct its dict step by step, the way the test script I posted reads the properties. Then it can do whatever it wants with the particular property that can't be decoded and take it from there.

So it looks like it might make most sense to reassign to blivet at this time.
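
The step-by-step approach could look roughly like the sketch below. This is not blivet's actual code: properties_to_dict and its on_error callback are hypothetical names, and FakeProperties is only a stand-in that decodes on lookup, the way the traceback shows pyudev's properties mapping doing, so that the example runs without udev.

```python
from collections.abc import Mapping


class FakeProperties(Mapping):
    """Hypothetical stand-in for device.properties: decodes bytes on lookup."""

    def __init__(self, raw):
        self._raw = raw

    def __getitem__(self, key):
        # Strict decode, so undecodable values raise UnicodeDecodeError.
        return self._raw[key].decode("utf-8")

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


def properties_to_dict(properties, on_error=None):
    """Build a dict one property at a time, skipping undecodable values."""
    result = {}
    for name in list(properties):
        try:
            result[name] = properties[name]
        except UnicodeDecodeError as err:
            # Report the failure to the caller instead of aborting the scan.
            if on_error is not None:
                on_error(name, err)
    return result


props = FakeProperties({"DEVNAME": b"/dev/sdb1", "PARTNAME": b"\xc0#\\!U0!"})
skipped = []
result = properties_to_dict(props, on_error=lambda name, err: skipped.append(name))
print(result)
print("skipped:", skipped)
```

Whether a failed property should be dropped, logged, or stored in some other form is the caller's decision; the sketch only drops it and reports the name.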

Comment 19 mulhern 2018-05-07 12:34:07 UTC
Reassigning, because I think blivet will always have to handle the possibility of pyudev decode failure, regardless of what changes may be made to pyudev.

Comment 20 David Lehman 2018-10-03 18:08:00 UTC
What about telling decode to handle errors some way other than by raising an exception? See https://docs.python.org/2/library/codecs.html#codec-base-classes

Passing errors='replace' to decode would allow pyudev to always present valid data.
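
For reference, a small sketch of what errors='replace' does to the value from comment 14, using only the standard bytes.decode call (the variable names are illustrative). Undecodable bytes become U+FFFD, the Unicode replacement character, and no exception is raised.

```python
# The value reported in comment 14.
raw = b'\xc0#\\!U0!\xea!!\xdck!C\xa8\xc6!\xe7\xc5k/l?!!!\xb6]'

# No exception: invalid bytes are substituted with U+FFFD.
replaced = raw.decode("utf-8", errors="replace")
print(replaced)
print(replaced.count("\ufffd"), "replacement characters")
```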

Comment 21 mulhern 2018-10-10 14:01:04 UTC
I think that that is probably not a good idea. Objections are:

* It would constitute a significant change in behaviour. Clients typically object to that kind of thing.

* I don't think it would actually solve any problems/fix the bug. I think the root cause of the problem is that values can be set under one encoding and then read using another and that it is never known what the proper encoding for decoding really is (because devices can move in time and space and the values in udev properties and attributes are taken from many things).

* New clients of pyudev would rely on and be checking values that turned out to be surprising, because they contained substitute characters. Eventually, though it would take longer than with an exception, they would notice that they were not getting what they expected, and that would lead to a new set of bugs being filed.

Some sort of configuration parameter that allowed a client to explicitly change the behaviour in a global way might be possible, but it all seems like a long and tricky job.

Comment 22 mulhern 2018-10-10 14:02:48 UTC
dshea, just wondering if you had an opinion.

Comment 23 David Shea 2018-10-10 17:21:39 UTC
(In reply to mulhern from comment #22)
> dshea, just wondering if you had an opinion.

Do these strings need to be unique or reproducible? The problem I see with raising an exception is this: how is the caller supposed to handle it? You can't (or at least shouldn't) change the default encoding at runtime, so I don't see how blivet or another caller is expected to recover from the error.
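
On uniqueness and reproducibility, a sketch comparing two standard error handlers. Note that surrogateescape has not been proposed in this bug; it is shown only as a point of comparison, and the sample byte strings are made up. With errors='replace', distinct undecodable values can collapse to the same string and the original bytes cannot be recovered. With errors='surrogateescape', the bytes round-trip, but the resulting string contains lone surrogates and fails if it is later encoded strictly, for example when written out as UTF-8.

```python
# Two made-up values that differ only in one undecodable byte.
a = b"\xc0#recovery"
b = b"\xb6#recovery"

# 'replace' maps both invalid bytes to U+FFFD, so the values collide.
lossy_a = a.decode("utf-8", errors="replace")
lossy_b = b.decode("utf-8", errors="replace")
print("replace keeps values distinct:", lossy_a != lossy_b)

# 'surrogateescape' keeps them distinct and reversible.
escaped_a = a.decode("utf-8", errors="surrogateescape")
escaped_b = b.decode("utf-8", errors="surrogateescape")
print("surrogateescape keeps values distinct:", escaped_a != escaped_b)
print("round trip:", escaped_a.encode("utf-8", errors="surrogateescape") == a)

# The cost: the escaped string cannot be encoded strictly.
try:
    escaped_a.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict encode works:", strict_ok)
```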

Comment 24 Junxiang Li 2018-10-21 15:31:24 UTC
Hi, is there any update or workaround for this bug?

Comment 25 Ben Cotton 2019-05-02 20:36:34 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 26 Ben Cotton 2019-05-28 22:58:19 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.