Description of problem:

When using ntfsclone we observe this error, which seems to indicate memory corruption in the binary:

ntfsclone -o - --save-image --metadata --ignore-fs-check /dev/sda2
ntfsclone v2017.3.23 (libntfs-3g)
NTFS volume version: 3.1
Cluster size : 4096 bytes
Current volume size: 268402688 bytes (269 MB)
Current device size: 268403200 bytes (269 MB)
Scanning volume ...
0.00 percent completed^M100.00 percent completed
Accounting clusters ...
Space in use : 1 MB (0.2%)
Scanning volume ...
0.00 percent completed^Mntfsclone: malloc.c:2385: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Version-Release number of selected component (if applicable):

ntfsprogs-2017.3.23-6.fc29.x86_64

I recently upgraded this machine. This did NOT happen with the previous version (2:2017.3.23-4.fc28). However, since I also upgraded a lot of other packages including glibc, it might have been caused by another component.

How reproducible:

100%

Steps to Reproduce:

1. Run the libguestfs test suite in the tests/ntfs directory.
Created attachment 1459065 [details]
Avoid malloc() of zero bytes

A recent change allocates buffers for clusters, which makes a malloc() of zero bytes possible when no backup boot sector is present in the image. This patch avoids that situation, although such a malloc() should be allowed (which means the patch might not address the real issue).
malloc(0) is valid in C, so although this may be a bug, it's unlikely to be this bug.

Here's a simple reproducer of the original bug that doesn't require any special privileges or devices:

$ rm -f test.img
$ truncate -s 1G test.img
$ mkfs.ntfs -F test.img
test.img is not a block device.
mkntfs forced anyway.
[...]
$ ntfsclone -o clone --save-image --metadata --ignore-fs-check test.img
ntfsclone v2017.3.23 (libntfs-3g)
NTFS volume version: 3.1
Cluster size : 4096 bytes
Current volume size: 1073737728 bytes (1074 MB)
Current device size: 1073741824 bytes (1074 MB)
Scanning volume ...
100.00 percent completed
Accounting clusters ...
Space in use : 1 MB (0.1%)
Scanning volume ...
ntfsclone: malloc.c:2385: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)

I was also able to get a stack trace, although it's unfortunately missing vital debug information for some stack frames even though I believe I have all the required debuginfo installed:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f52db7f2835 in __GI_abort () at abort.c:79
#2 0x00007f52db851a0a in __malloc_assert (assertion=assertion@entry=0x7f52db959ea0 "(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)", file=file@entry=0x7f52db9560d2 "malloc.c", line=line@entry=2385, function=function@entry=0x7f52db95a4f0 <__PRETTY_FUNCTION__.12924> "sysmalloc") at malloc.c:298
#3 0x00007f52db853daf in sysmalloc (nb=nb@entry=4112, av=av@entry=0x7f52dbb8dc60 <main_arena>) at malloc.c:2382
#4 0x00007f52db855146 in _int_malloc (av=av@entry=0x7f52dbb8dc60 <main_arena>, bytes=bytes@entry=4096) at malloc.c:4111
#5 0x00007f52db856387 in __GI___libc_malloc (bytes=bytes@entry=4096) at malloc.c:3041
#6 0x00007f52dbbc1922 in ntfs_malloc (size=4096) at misc.c:57
#7 0x00005634397bb707 in ?? ()
#8 0x00005634397bcc29 in ?? ()
#9 0x00005634397ba2d6 in ?? ()
#10 0x00007f52db7f43b3 in __libc_start_main (main=0x5634397b92b0, argc=7, argv=0x7fff7baebe18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff7baebe08) at ../csu/libc-start.c:308
#11 0x00005634397babaa in ?? ()

I am using ntfs-3g-2017.3.23-6.fc29.x86_64
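To illustrate the malloc(0) point at the top of this comment: a zero-byte request may return either NULL or a unique pointer that can be passed to free(), so a NULL result only signals failure when the requested size is non-zero. The sketch below uses a hypothetical helper, alloc_maybe_empty(), which is not part of ntfsclone or libntfs-3g; it only shows how a caller could tell the two cases apart.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from ntfs-3g): allocate n bytes, where n may be 0.
 * For n == 0 the result may be NULL or a unique pointer; either way it must
 * not be dereferenced, and it is safe to pass to free().
 */
void *alloc_maybe_empty(size_t n, int *failed)
{
    assert(failed != NULL);

    void *p = malloc(n);

    /* Only a NULL result for a non-zero request is a real failure. */
    *failed = (p == NULL && n != 0);
    return p;
}
```

A caller would check *failed rather than comparing the returned pointer against NULL, and would call free() on the result in both cases.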
A few valgrind errors too, although again the symbols are unfortunately unavailable for unknown reasons:

==8242== Syscall param read(buf) points to unaddressable byte(s)
==8242==    at 0x517B3D5: read (read.c:26)
==8242==    by 0x10C105: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x10CE11: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x10E81E: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x10B2D5: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x50B23B2: (below main) (libc-start.c:308)
==8242==  Address 0x54c83c0 is 0 bytes after a block of size 1,024 alloc'd
==8242==    at 0x4C2FB5B: malloc (vg_replace_malloc.c:299)
==8242==    by 0x4E6A921: ntfs_malloc (misc.c:57)
==8242==    by 0x10DF54: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x10B2D5: ??? (in /usr/sbin/ntfsclone)
==8242==    by 0x50B23B2: (below main) (libc-start.c:308)
==8242==

valgrind: m_mallocfree.c:280 (mk_plain_bszB): Assertion 'bszB != 0' failed.
valgrind: This is probably caused by your program erroneously writing
past the end of a heap block and corrupting heap metadata. If you fix
any invalid writes reported by Memcheck, this assertion failure will
probably go away. Please try that before reporting this as a bug.
Created attachment 1459151 [details]
Always allocate full clusters

OK, got it. When --ignore-fs-check is used, the rescue procedure processes full clusters, so full clusters must be allocated even when they are not fully used.
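The idea behind this fix can be sketched as follows. This is not the attached patch: the function names cluster_rounded_size() and alloc_cluster_buffer() are hypothetical, and the example values (a 1,024-byte block and a 4096-byte cluster size) are simply taken from the reports above on the assumption that they describe the affected buffer. The point is that a buffer which will later be read or written a whole cluster at a time should be sized in whole clusters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: round `used` bytes up to a whole number of clusters,
 * with a minimum of one cluster. Overflow of the size computation is not
 * handled in this sketch.
 */
size_t cluster_rounded_size(size_t used, size_t cluster_size)
{
    assert(cluster_size > 0);

    size_t clusters = (used + cluster_size - 1) / cluster_size;
    if (clusters == 0)
        clusters = 1;   /* still reserve one full cluster for empty data */
    return clusters * cluster_size;
}

/*
 * Hypothetical helper: allocate a zero-filled buffer that is safe to fill
 * with full-cluster reads, even if only `used` bytes are meaningful.
 */
void *alloc_cluster_buffer(size_t used, size_t cluster_size)
{
    return calloc(1, cluster_rounded_size(used, cluster_size));
}
```

With these example values, cluster_rounded_size(1024, 4096) yields 4096, so a full-cluster read(2) into the buffer stays within the allocation instead of running past a 1,024-byte block.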
Scratch build containing this patch: https://koji.fedoraproject.org/koji/taskinfo?taskID=28335543
Can confirm that the patch works. I have pushed it to Fedora and issued a build in Rawhide. Please let us know when the final version of the patch goes upstream, in case the patch in Fedora needs adjustment.
Fixed in rawhide, thanks Richard for handling this one while I was on vacation. :)