Bug 1301593 - Windows inspection fails with: guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libguestfs-winsupport
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 7.4
Assigned To: Richard W.M. Jones
QA Contact: Virtualization Bugs
Whiteboard: V2V P2V
Depends On:
Blocks:
Reported: 2016-01-25 08:17 EST by Richard W.M. Jones
Modified: 2017-08-01 12:52 EDT

See Also:
Fixed In Version: libguestfs-winsupport-7.2-2.el7
Last Closed: 2017-08-01 12:52:00 EDT
Type: Bug


Attachments
virt-v2v conversion log (89.53 KB, text/plain)
2016-01-25 08:17 EST, Richard W.M. Jones
test1.img.xz (101.28 KB, application/x-xz)
2016-01-25 09:39 EST, Richard W.M. Jones
Description Richard W.M. Jones 2016-01-25 08:17:57 EST
Created attachment 1117985
virt-v2v conversion log

Description of problem:

During a virt-p2v conversion, Windows inspection fails with an ntfs-3g error:

libguestfs: trace: case_sensitive_path "/WINDOWS/system32/config"
guestfsd: main_loop: proc 38 (is_dir) took 0.00 seconds
guestfsd: main_loop: new request, len 0x44
guestfsd: error: readdir: Invalid or incomplete multibyte or wide character
libguestfs: trace: case_sensitive_path = NULL (error)
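
For reference, the message text in the trace above is the wording glibc uses for the errno value EILSEQ ("illegal byte sequence"), which suggests a character-set conversion failed while reading the directory. A minimal Python sketch to look the message up, assuming a glibc-based Linux host (other C libraries word it differently):

```python
import errno
import os

# EILSEQ is the POSIX "illegal byte sequence" error number.
# The text returned by strerror depends on the C library in use;
# on glibc it is expected to match the guestfsd error above.
message = os.strerror(errno.EILSEQ)
print(errno.EILSEQ, message)
```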

The full log supplied by the user is attached.

Version-Release number of selected component (if applicable):

virt-v2v 1.28.1.1.55
CentOS 7.2

The physical server being converted is Windows Server 2003.

How reproducible:

For the user this is 100% reproducible.  I have not managed to
reproduce it myself yet.
Comment 1 Richard W.M. Jones 2016-01-25 08:20:28 EST
I'm adding V2V & P2V whiteboard flags, but this is not specifically
about v2v (or even about libguestfs).
Comment 5 Richard W.M. Jones 2016-01-25 08:52:39 EST
Our working theory is that the c:\windows\system32 directory
contains a file called "Chaînes.scf".  I have attached the directory
listings supplied by the user as private attachments.
Comment 6 Richard W.M. Jones 2016-01-25 09:39 EST
Created attachment 1118059
test1.img.xz

(In reply to Richard W.M. Jones from comment #5)
> Our working theory is that the c:\windows\system32 directory
> contains a file called "Chaînes.scf".  I have attached the directory
> listings supplied by the user as private attachments.

This theory was wrong, but I have reproduced the problem
by deliberately creating a malformed NTFS partition.  I did
this by touching a file called "/test/pqrst" and then (using
a hex editor) modifying the final "t" character of the filename
on disk to be the invalid[1] UCS-2 character U+DF00 (an unpaired
low surrogate).

The xz-compressed disk image is attached.

Opening the disk image in guestfish causes the error for various
commands, e.g.:

><fs> ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character
><fs> case-sensitive-path /test/a
libguestfs: error: case_sensitive_path: readdir: Invalid or incomplete multibyte or wide character

So I suspect that the reporter's disk image contains some illegal
filename.  What's interesting is that Windows 2003 seems quite
happy, so the filename is only illegal for ntfs-3g and not for
Windows.  Or maybe this is a bug in ntfs-3g and the filename is not
illegal at all.

[1] http://www.fileformat.info/info/unicode/char/df00/index.htm
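
The effect of that hex edit can be illustrated without the disk image. The following is a minimal Python sketch, not the ntfs-3g code path: it builds the UTF-16LE code units for "pqrs" followed by U+DF00 by hand (the bytes are an assumption about how the edited name looks, not read from the attachment) and shows that a strict decoder rejects them while the original name decodes fine.

```python
# "pqrs" followed by the lone low surrogate U+DF00, as UTF-16LE code units.
# Illustrative bytes only; they are not extracted from test1.img.
name_utf16 = "pqrs".encode("utf-16-le") + (0xDF00).to_bytes(2, "little")


def strict_decode_ok(data: bytes) -> bool:
    """Return True if data is well-formed UTF-16LE under strict decoding."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(strict_decode_ok("pqrst".encode("utf-16-le")))  # the untouched name
print(strict_decode_ok(name_utf16))                   # the hex-edited name
```

A strict converter has no valid output for the second name, which is consistent with readdir failing for the whole directory rather than for one entry.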
Comment 8 Richard W.M. Jones 2016-06-22 09:19:33 EDT
Long thread discussing this:

https://www.mail-archive.com/ntfs-3g-devel@lists.sourceforge.net/msg01174.html

The outcome was a couple of patches which went upstream (in ntfs-3g).
Including these in ntfs-3g (or libguestfs-winsupport) ought to make
our handling of these guests more robust.

Unfortunately the problem with this plan is that libguestfs-winsupport
is not on the RHEL 7.3 ACL.

commit d9c61dd60ec484909f70b7a916ada3a93af94b60
Author: Erik Larsson <*@*>
Date:   Fri Apr 8 05:39:48 2016 +0200

    unistr.c: Enable encoding broken UTF-16 into broken UTF-8, A.K.A. WTF-8.
    
    Windows filenames may contain invalid UTF-16 sequences (specifically
    broken surrogate pairs), which cannot be converted to UTF-8 if we do
    strict conversion.
    
    This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
    encoding any surrogate character that don't have a match into a separate
    3-byte UTF-8 sequence.
    
    This is "sort of" valid UTF-8, but not valid Unicode since the code
    points used for surrogate pair encoding are not supposed to occur in a
    valid Unicode string... but on the other hand the source UTF-16 data is
    also broken, so we aren't really making things any worse.
    
    This format is sometimes referred to as WTF-8 (Wobbly Translation
    Format, 8-bit encoding) and is a common solution to represent broken
    UTF-16 as UTF-8.
    
    It is a lossless round-trip conversion, i.e converting from broken
    UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
    sequence. Because of this property it enables accessing these files
    by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
    expected).
    
    To disable this behaviour you can pass the preprocessor/compiler flag
    '-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.

commit f0370bfa9c47575d4e47c94e443aa91983683a43
Author: Erik Larsson <*@*>
Date:   Tue Apr 12 17:02:40 2016 +0200

    unistr.c: Unify the two defines NOREVBOM and ALLOW_BROKEN_SURROGATES.
    
    In the mailing list discussion we came to the conclusion that there
    doesn't seem to be any reason to keep these declarations separate since
    they address the same issue, namely libntfs-3g's tolerance for bad
    Unicode data in filenames and other UTF-16 strings in the file system,
    so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
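
The round-trip property described in the first commit message can be sketched in Python. This is only an approximation of the idea, not the unistr.c implementation: it assumes Python's "surrogatepass" error handler as a stand-in for the lenient conversion, and the filename bytes are made up for illustration.

```python
# Broken UTF-16LE: "pqrs" followed by an unpaired low surrogate U+DF00.
broken_utf16 = "pqrs".encode("utf-16-le") + (0xDF00).to_bytes(2, "little")

# "surrogatepass" lets the codecs carry unpaired surrogates through
# instead of raising, which approximates the WTF-8 behaviour.
text = broken_utf16.decode("utf-16-le", errors="surrogatepass")
wtf8 = text.encode("utf-8", errors="surrogatepass")
print(wtf8)  # the surrogate becomes a separate 3-byte sequence

# Converting back yields the same broken UTF-16 code units.
round_trip = wtf8.decode("utf-8", errors="surrogatepass").encode(
    "utf-16-le", errors="surrogatepass"
)
print(round_trip == broken_utf16)
```

Because nothing is lost in either direction, a name handed back by readdir can be passed to open or stat and still resolve to the same on-disk entry.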
Comment 10 Richard W.M. Jones 2017-02-22 06:28:52 EST
Reproducer using RHEL 7.4 host:

$ guestfish \
    set-program virt-foo : \
    add-ro test1.img : run : mount /dev/sda1 / : \
    ll /test
libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or incomplete multibyte or wide character
Comment 11 Richard W.M. Jones 2017-02-22 06:30:25 EST
(In reply to Richard W.M. Jones from comment #10)
> Reproducer using RHEL 7.4 host:
> 
> $ guestfish \
>     set-program virt-foo : \
>     add-ro test1.img : run : mount /dev/sda1 / : \
>     ll /test
> libguestfs: error: ll: ls: reading directory /sysroot/test: Invalid or
> incomplete multibyte or wide character

I should have said, you must download the test image from
comment 6.

If you use the fixed package, you will see this output instead:

$ guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
total 4
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���
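
The three replacement characters after "pqrs" are consistent with the unpaired surrogate being emitted as a 3-byte sequence that a strict UTF-8 terminal cannot display. A minimal Python sketch of that rendering, assuming the fixed package produces the bytes ED BC 80 for U+DF00 (the byte values are computed here, not captured from the guest):

```python
# Assumed filename bytes as returned by the fixed ntfs-3g:
# "pqrs" plus U+DF00 encoded as a 3-byte UTF-8-style sequence.
wtf8_name = b"pqrs\xed\xbc\x80"

# A strict UTF-8 consumer substitutes U+FFFD for each undecodable byte.
shown = wtf8_name.decode("utf-8", errors="replace")
print(shown)
print(shown.count("\ufffd"))
```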
Comment 12 YongkuiGuo 2017-06-22 07:41:00 EDT
Verified with packages:
libguestfs-winsupport-7.2-2.el7.x86_64
libguestfs-1.36.3-5.el7.x86_64


1. Download the test1.img.xz in the attachment.
2. # xz -d test1.img.xz
3. # guestfish set-program virt-foo : add-ro test1.img : run : mount /dev/sda1 / : ll /test
--------------------------------------------------
drwxrwxrwx 1 root root    0 Jan 25  2016 .
drwxrwxrwx 1 root root 4096 Jan 25  2016 ..
-rwxrwxrwx 1 root root    0 Jan 25  2016 pqrs���
--------------------------------------------------

The guestfish command now runs successfully, so this bug is verified.

Note: I reproduced the failure with the older package libguestfs-winsupport-7.2-1.el7.x86_64.
Comment 13 errata-xmlrpc 2017-08-01 12:52:00 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1979
