Bug 1301593

Summary: Windows inspection fails with: guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
Product: Red Hat Enterprise Linux 7
Reporter: Richard W.M. Jones <rjones>
Component: libguestfs-winsupport
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 7.3
CC: mtessun, mxie, mzhan, nicolas, ptoscano, tzheng, xiaodwan, yoguo
Target Milestone: rc
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: V2V P2V
Fixed In Version: libguestfs-winsupport-7.2-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 16:52:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description               Flags
virt-v2v conversion log   none
test1.img.xz              none

Description Richard W.M. Jones 2016-01-25 13:17:57 UTC
Created attachment 1117985
virt-v2v conversion log

Description of problem:

During a virt-p2v conversion, Windows inspection fails with an ntfs-3g error:

libguestfs: trace: case_sensitive_path "/WINDOWS/system32/config"
guestfsd: main_loop: proc 38 (is_dir) took 0.00 seconds
guestfsd: main_loop: new request, len 0x44
guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
libguestfs: trace: case_sensitive_path = NULL (error)

The full log supplied by the user is attached.

Version-Release number of selected component (if applicable):

virt-v2v 1.28.1.1.55
CentOS 7.2

The physical server being converted is Windows Server 2003.

How reproducible:

For the user this is 100% reproducible.  I have not managed to
reproduce it myself yet.

Comment 1 Richard W.M. Jones 2016-01-25 13:20:28 UTC
I'm adding V2V & P2V whiteboard flags, but this is not specifically
about v2v (or even about libguestfs).

Comment 5 Richard W.M. Jones 2016-01-25 13:52:39 UTC
Our working theory is that the c:\windows\system32 directory
contains a file called "Chaînes.scf".  I have attached the directory
listings supplied by the user as private attachments.

Comment 6 Richard W.M. Jones 2016-01-25 14:39:42 UTC
Created attachment 1118059
test1.img.xz

(In reply to Richard W.M. Jones from comment #5)
> Our working theory is that the c:\windows\system32 directory
> contains a file called "Chaînes.scf".  I have attached the directory
> listings supplied by the user as private attachments.

This theory was wrong, but I have reproduced the problem
by deliberately creating a malformed NTFS partition.  I did
this by touching a file called "/test/pqrst" and then (using
a hex editor) modifying the final "t" character on disk to be the
invalid[1] UCS-2 character U+DF00.

The xz-compressed disk image is attached.

Opening the disk image in guestfish causes the error for various
commands, eg:

><fs> ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character
><fs> case-sensitive-path /test/a
libguestfs: error: case_sensitive_path: readdir: Invalid or incomplete multibyte or wide character

So I suspect that the reporter's disk image contains some illegal
filename.  What's interesting is that Windows 2003 seems quite
happy, so the filename is only illegal for ntfs-3g and not for
Windows.  Or maybe this is a bug in ntfs-3g and the filename is not
illegal at all.

[1] http://www.fileformat.info/info/unicode/char/df00/index.htm
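To illustrate why a strict converter rejects this name, here is a
minimal Python sketch.  It is not ntfs-3g code: Python's strict
UTF-16 codec is only standing in for the strict conversion, and the
names BROKEN_NAME and strictly_decodable are made up for this
example.  It assumes the name is stored as little-endian 16-bit code
units, "pqrs" followed by the lone low surrogate U+DF00.

```python
# Illustration only: Python's strict UTF-16 codec stands in for a strict
# UTF-16 to UTF-8 conversion. This is not ntfs-3g code.

# "pqrs" followed by the lone low surrogate U+DF00, as little-endian
# 16-bit code units (an assumption about the on-disk name in test1.img).
BROKEN_NAME = "pqrs".encode("utf-16-le") + b"\x00\xdf"


def strictly_decodable(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-16LE, False otherwise."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        # A surrogate without its partner is not valid UTF-16.
        return False


if __name__ == "__main__":
    print(strictly_decodable("pqrst".encode("utf-16-le")))  # original name
    print(strictly_decodable(BROKEN_NAME))                  # hex-edited name
```

The original name decodes and the hex-edited one does not, which is
consistent with readdir failing on the whole directory as soon as it
meets one such entry.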

Comment 8 Richard W.M. Jones 2016-06-22 13:19:33 UTC
Long thread discussing this:

https://www.mail-archive.com/ntfs-3g-devel@lists.sourceforge.net/msg01174.html

The outcome was a couple of patches which went upstream (in ntfs-3g).
Including these in ntfs-3g (or libguestfs-winsupport) ought to make
our handling of these guests more robust.

Unfortunately the problem with this plan is that libguestfs-winsupport
is not on the RHEL 7.3 ACL.

commit d9c61dd60ec484909f70b7a916ada3a93af94b60
Author: Erik Larsson <*@*>
Date:   Fri Apr 8 05:39:48 2016 +0200

    unistr.c: Enable encoding broken UTF-16 into broken UTF-8, A.K.A. WTF-8.
    
    Windows filenames may contain invalid UTF-16 sequences (specifically
    broken surrogate pairs), which cannot be converted to UTF-8 if we do
    strict conversion.
    
    This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
    encoding any surrogate character that don't have a match into a separate
    3-byte UTF-8 sequence.
    
    This is "sort of" valid UTF-8, but not valid Unicode since the code
    points used for surrogate pair encoding are not supposed to occur in a
    valid Unicode string... but on the other hand the source UTF-16 data is
    also broken, so we aren't really making things any worse.
    
    This format is sometimes referred to as WTF-8 (Wobbly Translation
    Format, 8-bit encoding) and is a common solution to represent broken
    UTF-16 as UTF-8.
    
    It is a lossless round-trip conversion, i.e converting from broken
    UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
    sequence. Because of this property it enables accessing these files
    by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
    expected).
    
    To disable this behaviour you can pass the preprocessor/compiler flag
    '-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.

commit f0370bfa9c47575d4e47c94e443aa91983683a43
Author: Erik Larsson <*@*>
Date:   Tue Apr 12 17:02:40 2016 +0200

    unistr.c: Unify the two defines NOREVBOM and ALLOW_BROKEN_SURROGATES.
    
    In the mailing list discussion we came to the conclusion that there
    doesn't seem to be any reason to keep these declarations separate since
    they address the same issue, namely libntfs-3g's tolerance for bad
    Unicode data in filenames and other UTF-16 strings in the file system,
    so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
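The lossless round trip described in the first commit message can be
sketched in Python.  This is only an approximation of the idea, not
the unistr.c implementation: it relies on Python's "surrogatepass"
error handler, which encodes an unpaired surrogate as a separate
3-byte sequence, and the function names are hypothetical.

```python
# Sketch of the WTF-8 idea using Python's "surrogatepass" error handler.
# This approximates the behaviour described in the commit message; it is
# not the ntfs-3g code, and the function names are made up.


def utf16le_to_wtf8(raw: bytes) -> bytes:
    """Convert possibly broken UTF-16LE to WTF-8-style bytes."""
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


def wtf8_to_utf16le(raw: bytes) -> bytes:
    """Convert WTF-8-style bytes back to (possibly broken) UTF-16LE."""
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


if __name__ == "__main__":
    # "pqrs" followed by the unpaired surrogate U+DF00.
    broken = "pqrs".encode("utf-16-le") + b"\x00\xdf"
    wtf8 = utf16le_to_wtf8(broken)
    print(wtf8)                             # lone surrogate becomes 3 bytes
    print(wtf8_to_utf16le(wtf8) == broken)  # round trip is lossless
```

U+DF00 becomes the three bytes ED BC 80, and converting back yields
the same broken UTF-16 sequence, which is the property that lets the
file be addressed by name.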

Comment 10 Richard W.M. Jones 2017-02-22 11:28:52 UTC
Reproducer using RHEL 7.4 host:

$ guestfish \
    set-program virt-foo : \
    add-ro test1.img : run : mount /dev/sda1 / : \
    ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character

Comment 11 Richard W.M. Jones 2017-02-22 11:30:25 UTC
(In reply to Richard W.M. Jones from comment #10)
> Reproducer using RHEL 7.4 host:
> 
> $ guestfish \
>     set-program virt-foo : \
>     add-ro test1.img : run : mount /dev/sda1 / : \
>     ll /test
> libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or
> incomplete multibyte or wide character

I should have said, you must download the test image from
comment 6.

If you use the fixed package, you will see this output instead:

$ guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
total 4
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���
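The three replacement characters after "pqrs" are what I would expect
if the name now comes back as "pqrs" plus a 3-byte encoding of U+DF00.
A small Python sketch of that reasoning follows.  The byte string is
my assumption about what the fixed package returns (computed from
U+DF00, not captured from the guest), and a terminal may render
invalid bytes differently from Python's decoder.

```python
# Assumption: with the fix, the directory entry is returned as "pqrs"
# followed by U+DF00 encoded as the three bytes ED BC 80. These bytes
# were computed by hand, not captured from the guest.
NAME_BYTES = b"pqrs\xed\xbc\x80"


def display_form(raw: bytes) -> str:
    """Decode as UTF-8, replacing each invalid sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    shown = display_form(NAME_BYTES)
    print(shown)
    print(shown.count("\ufffd"))  # number of replacement characters
```

A strict UTF-8 decoder treats none of the three bytes as part of a
valid sequence, so each one is replaced separately.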

Comment 12 YongkuiGuo 2017-06-22 11:41:00 UTC
Verified with packages:
libguestfs-winsupport-7.2-2.el7.x86_64
libguestfs-1.36.3-5.el7.x86_64


1. Download test1.img.xz from the attachment in comment 6.
2. # xz -d test1.img.xz
3. # guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
--------------------------------------------------
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���
--------------------------------------------------

The guestfish command runs successfully, so this bug is verified.

Note: I reproduced the problem with libguestfs-winsupport-7.2-1.el7.x86_64.

Comment 13 errata-xmlrpc 2017-08-01 16:52:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1979