Bug 1301593
Summary: Windows inspection fails with: guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
Product: Red Hat Enterprise Linux 7
Reporter: Richard W.M. Jones <rjones>
Component: libguestfs-winsupport
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: medium
Version: 7.3
CC: mtessun, mxie, mzhan, nicolas, ptoscano, tzheng, xiaodwan, yoguo
Target Milestone: rc
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: V2V P2V
Fixed In Version: libguestfs-winsupport-7.2-2.el7
Doc Type: If docs needed, set a value
Last Closed: 2017-08-01 16:52:00 UTC
Type: Bug
I'm adding V2V & P2V whiteboard flags, but this is not specifically about v2v (or even about libguestfs).

Our working theory is that the c:\windows\system32 directory contains a file called "Chaînes.scf". I have attached the directory listings supplied by the user as private attachments.

Created attachment 1118059
test1.img.xz

(In reply to Richard W.M. Jones from comment #5)
> Our working theory is that the c:\windows\system32 directory
> contains a file called "Chaînes.scf". I have attached the directory
> listings supplied by the user as private attachments.

This theory was wrong, but I have reproduced the problem by deliberately creating a malformed NTFS partition. I did this by touching a file called "/test/pqrst" and then, using a hex editor, changing the final "t" character of the filename on disk to the invalid[1] UCS-2 character U+DF00. The xz-compressed disk image is attached.

Opening the disk image in guestfish causes the error for various commands, e.g.:

><fs> ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character
><fs> case-sensitive-path /test/a
libguestfs: error: case_sensitive_path: readdir: Invalid or incomplete multibyte or wide character

So I suspect that the reporter's disk image contains some illegal filename. What's interesting is that Windows 2003 seems quite happy, so the filename is only illegal for ntfs-3g and not for Windows. Or maybe this is a bug in ntfs-3g and the filename is not illegal at all.

[1] http://www.fileformat.info/info/unicode/char/df00/index.htm

Long thread discussing this:
https://www.mail-archive.com/ntfs-3g-devel@lists.sourceforge.net/msg01174.html

The outcome was a couple of patches that went upstream (in ntfs-3g). Including these in ntfs-3g (or libguestfs-winsupport) ought to make our handling of these guests more robust. Unfortunately, the problem with this plan is that libguestfs-winsupport is not on the RHEL 7.3 ACL.
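U+DF00 falls in the low-surrogate range U+DC00 to U+DFFF, so on its own it is not a complete UTF-16 character. A strict UTF-16 to UTF-8 conversion therefore rejects the name. The following minimal Python sketch shows how such an entry can be spotted in the raw name. It assumes the filename is available as little-endian UTF-16 bytes, which is how NTFS stores names. `find_unpaired_surrogates` is a hypothetical helper and is not part of ntfs-3g or libguestfs:

```python
def find_unpaired_surrogates(raw: bytes) -> list[int]:
    """Return the indexes of UTF-16 code units in `raw` that are unpaired surrogates.

    `raw` is assumed to be an even-length little-endian UTF-16 name,
    as NTFS stores filenames on disk.
    """
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    unpaired = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            i += 2  # a valid high/low pair encodes one supplementary character
            continue
        if 0xD800 <= unit <= 0xDFFF:
            # Lone high surrogate, or a low surrogate with no high one before it.
            unpaired.append(i)
        i += 1
    return unpaired


# "pqrst" with its final "t" (U+0074) overwritten by U+DF00, mimicking test1.img
on_disk = "pqrs".encode("utf-16-le") + (0xDF00).to_bytes(2, "little")
print(find_unpaired_surrogates(on_disk))  # [4]

try:
    # Strict decoding refuses the lone surrogate, much as a strict
    # UTF-16 -> UTF-8 converter does.
    on_disk.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("strict conversion fails:", exc.reason)
```

Windows generally treats NTFS filenames as opaque sequences of 16-bit units and does not enforce surrogate pairing. That would be consistent with Windows 2003 having no trouble with the file.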
commit d9c61dd60ec484909f70b7a916ada3a93af94b60
Author: Erik Larsson <*@*>
Date:   Fri Apr 8 05:39:48 2016 +0200

    unistr.c: Enable encoding broken UTF-16 into broken UTF-8, A.K.A. WTF-8.

    Windows filenames may contain invalid UTF-16 sequences (specifically
    broken surrogate pairs), which cannot be converted to UTF-8 if we do
    strict conversion.

    This patch enables encoding broken UTF-16 into similarly broken UTF-8
    by encoding any surrogate character that doesn't have a match into a
    separate 3-byte UTF-8 sequence.

    This is "sort of" valid UTF-8, but not valid Unicode since the code
    points used for surrogate pair encoding are not supposed to occur in a
    valid Unicode string... but on the other hand the source UTF-16 data is
    also broken, so we aren't really making things any worse.

    This format is sometimes referred to as WTF-8 (Wobbly Translation
    Format, 8-bit encoding) and is a common solution to represent broken
    UTF-16 as UTF-8.

    It is a lossless round-trip conversion, i.e. converting from broken
    UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
    sequence. Because of this property it enables accessing these files
    by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
    expected).

    To disable this behaviour you can pass the preprocessor/compiler flag
    '-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.

commit f0370bfa9c47575d4e47c94e443aa91983683a43
Author: Erik Larsson <*@*>
Date:   Tue Apr 12 17:02:40 2016 +0200

    unistr.c: Unify the two defines NOREVBOM and ALLOW_BROKEN_SURROGATES.

    In the mailing list discussion we came to the conclusion that there
    doesn't seem to be any reason to keep these declarations separate
    since they address the same issue, namely libntfs-3g's tolerance for
    bad Unicode data in filenames and other UTF-16 strings in the file
    system, so merge the two defines into the new define
    ALLOW_BROKEN_UNICODE.
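The encoding described in the first commit can be sketched in a few lines of Python. This is an illustrative reimplementation of the general WTF-8 scheme, not ntfs-3g's `unistr.c` code, and the names `wtf8_encode` and `wtf8_decode` are hypothetical. A valid surrogate pair becomes one 4-byte sequence, as in ordinary UTF-8. An unpaired surrogate is written as its own 3-byte sequence. Decoding therefore returns exactly the original code units:

```python
def wtf8_encode(units: list[int]) -> bytes:
    """Encode UTF-16 code units (ints 0..0xFFFF) as WTF-8."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Properly paired surrogates: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            cp = u  # BMP character, or an unpaired surrogate kept as-is
            i += 1
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


def wtf8_decode(data: bytes) -> list[int]:
    """Decode WTF-8 back to UTF-16 code units (lenient: overlong forms are not rejected)."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, length = b, 1
        elif b >> 5 == 0b110:
            cp, length = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, length = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, length = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1:i + length]:
            if cont >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (cont & 0x3F)
        i += length
        if cp >= 0x10000:
            # Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)  # includes lone surrogates, restored unchanged
    return units


name = [0x70, 0x71, 0x72, 0x73, 0xDF00]  # "pqrs" + lone U+DF00, as in test1.img
encoded = wtf8_encode(name)
print(encoded)  # b'pqrs\xed\xbc\x80'
assert wtf8_decode(encoded) == name  # lossless round trip
```

Every surrogate code unit (U+D800 to U+DFFF) encodes to a 3-byte sequence starting with 0xED, which strict UTF-8 decoders reject. That is the price of the lossless round trip.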
Reproducer using RHEL 7.4 host:

$ guestfish \
    set-program virt-foo : \
    add-ro test1.img : run : mount /dev/sda1 / : \
    ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character

(In reply to Richard W.M. Jones from comment #10)
> Reproducer using RHEL 7.4 host:
>
> $ guestfish \
>     set-program virt-foo : \
>     add-ro test1.img : run : mount /dev/sda1 / : \
>     ll /test
> libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or
> incomplete multibyte or wide character

I should have said that you must download the test image from comment 6.

If you use the fixed package, you will see this output instead:

$ guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
total 4
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���

Verified with packages:
libguestfs-winsupport-7.2-2.el7.x86_64
libguestfs-1.36.3-5.el7.x86_64

1. Download test1.img.xz from the attachment.
2. # xz -d test1.img.xz
3. # guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
--------------------------------------------------
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���
--------------------------------------------------

The guestfish command executes successfully, so this bug is verified.

Note: I have reproduced it with the package libguestfs-winsupport-7.2-1.el7.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1979
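Because the conversion is lossless, the name in the fixed listing can in principle be mapped back to the original on-disk UTF-16. The following short Python sketch uses the standard `surrogatepass` error handler to do that. It assumes that the raw bytes behind `pqrs���` are `pqrs` followed by the 3-byte WTF-8 encoding of U+DF00 (ED BC 80). That assumption is inferred from the ntfs-3g commit description rather than captured from the test output. `wtf8_name_to_utf16le` is a hypothetical helper:

```python
def wtf8_name_to_utf16le(name: bytes) -> bytes:
    """Map a WTF-8 filename (raw bytes as returned by readdir) back to UTF-16LE.

    The "surrogatepass" handler lets the UTF-8 codec accept the 3-byte
    encodings of lone surrogates and lets the UTF-16 codec write them back
    out unchanged, so an unpaired U+DF00 survives the round trip.
    """
    return name.decode("utf-8", "surrogatepass").encode("utf-16-le", "surrogatepass")


listed = b"pqrs\xed\xbc\x80"  # assumed raw bytes of the entry shown as "pqrs���"
print(wtf8_name_to_utf16le(listed).hex())  # 700071007200730000df

try:
    listed.decode("utf-8")  # a strict UTF-8 decoder still rejects the name
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The trailing `00df` is U+DF00 in little-endian order, which is the code unit written into the image with the hex editor.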
Created attachment 1117985
virt-v2v conversion log

Description of problem:

During a virt-p2v conversion, Windows inspection fails with an ntfs-3g error:

libguestfs: trace: case_sensitive_path "/WINDOWS/system32/config"
guestfsd: main_loop: proc 38 (is_dir) took 0.00 seconds
guestfsd: main_loop: new request, len 0x44
guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
libguestfs: trace: case_sensitive_path = NULL (error)

The full log supplied by the user is attached.

Version-Release number of selected component (if applicable):

virt-v2v 1.28.1.1.55
CentOS 7.2

The physical server being converted is Windows Server 2003.

How reproducible:

For the user this is 100% reproducible. I have not managed to reproduce it myself yet.
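The trace shows the failure surfacing in `case_sensitive_path`, which has to list directories to find the on-disk spelling of each path component. The following rough Python model assumes that such a lookup scans each directory entry by entry and compares names case-insensitively. `case_insensitive_lookup` is a hypothetical helper, not the guestfsd implementation, and `casefold()` may differ from the daemon's comparison. The model shows why a readdir error on any entry read before the match aborts the whole lookup, even when the entry being looked for is itself fine:

```python
import errno
import os
import tempfile


def case_insensitive_lookup(root: str, path: str) -> str:
    """Resolve `path` under `root`, matching each component case-insensitively.

    Hypothetical model, loosely inspired by libguestfs's case_sensitive_path.
    Directory entries are read one at a time. An OSError raised while listing
    (for example EILSEQ from a filesystem driver that cannot convert a name)
    propagates and aborts the whole lookup.
    """
    current = root
    for component in (c for c in path.split("/") if c):
        wanted = component.casefold()
        match = None
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.name.casefold() == wanted:
                    match = entry.name
                    break
        if match is None:
            raise FileNotFoundError(errno.ENOENT, "no case-insensitive match",
                                    os.path.join(current, component))
        current = os.path.join(current, match)
    return current


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "WINDOWS", "System32", "config"))
    print(case_insensitive_lookup(tmp, "/WINDOWS/system32/config"))  # .../WINDOWS/System32/config
```

On an ordinary Linux filesystem, Python's `os.scandir` would not itself fail on an oddly encoded name. In this model the EILSEQ has to come from the filesystem driver, as it did from ntfs-3g here.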