Bug 1923971
Summary: | free(): invalid pointer in libguestfs Python bindings | |
---|---|---|---
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Richard W.M. Jones <rjones>
Component: | libguestfs | Assignee: | Richard W.M. Jones <rjones>
Status: | CLOSED CANTFIX | QA Contact: | YongkuiGuo <yoguo>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 8.3 | CC: | alex.iribarren, ben.morrice, breilly, egarver, fweimer, kkiwi, rjones, tkopecek, virt-maint, yoguo
Target Milestone: | rc | Keywords: | Triaged
Target Release: | 8.4 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-08-24 20:06:14 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Richard W.M. Jones
2021-02-02 10:57:33 UTC
Reporter says libguestfs-test-tool output is fine.
libjansson 2.11-3.el8

I need to reproduce this one. Setting Triaged.

Hi, we're seeing this issue in RHELBLD on a RHEL 8 host - see the linked RHELBLD issue.

I was never able to reproduce this bug, and therefore could not diagnose it any further. Some kind of minimal reproducer would be greatly helpful. You could start by modifying this example:
https://github.com/libguestfs/libguestfs/blob/master/python/examples/inspect_vm.py

I notice the actual code which crashes:
https://github.com/redhat-imaging/imagefactory/blob/master/imgfac/FactoryUtils.py#L22
does not use python_return_dict=True (so it invokes the older, and these days less tested path). It also uses g.add_drive_ro which is an older synonym for g.add_drive_opts(disk, readonly=1), but also a less tested path.

I had a go at reproducing this, but I got nothing, on RHEL 8 or Fedora. I also examined the code (libguestfs.git/lib/info.c) but I don't see any obvious errors.

As well as trying to find a reproducer, I would suggest:

- Upgrade libguestfs as you're using an old version.
- Try downgrading jansson, since the crash is actually happening in that library.
- Try a different version of Python, i.e. platform-python vs one from a module.

But what we really need most of all is a small reproducer.

FWIW my non-reproducer was simply:

$ truncate -s 1G test.raw
$ python3 ./test.py

where test.py is:

import guestfs
g = guestfs.GuestFS()
g.add_drive_ro("test.raw")
g.launch()

You can also try different python interpreters, etc.

I could not reproduce the bug here.

A run under valgrind gave this error about the invalid free:

==2345280== Thread 2:
==2345280== Invalid free() / delete / delete[] / realloc()
==2345280==    at 0x4C3610C: free (vg_replace_malloc.c:538)
==2345280==    by 0x24240B98: json_delete (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2396E0C4: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239406B8: guestfs_disk_format (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23975E40: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23977117: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23964599: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239647CA: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239650D7: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x238F4697: guestfs_add_drive_opts_argv (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23965A84: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x238CEF33: guestfs_add_drive_ro (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==  Address 0x1ddb7f31 is 1 bytes inside a block of size 128 alloc'd
==2345280==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
==2345280==    by 0x2423AE30: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2424020A: json_object (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423C6D4: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423C965: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423CC0C: json_loadb (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2396DEF9: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239612AF: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23961C6C: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x2396DD0C: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x2396E005: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239406B8: guestfs_disk_format (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==

Things I notice here:

- Pointer was allocated in json_object() (as expected).
- Pointer has somehow been incremented by 1 before being passed to free.

Created attachment 1784004 [details]
test.c

The reason is ...

#0 0x00007f1d7e1bd7a7 in json_object_iter_next () at /lib64/libjson-c.so.4
#1 0x00007f1d6a7d6efd in do_dump (json=json@entry=0x7f1d60186f70, flags=flags@entry=0, depth=depth@entry=0, parents=parents@entry=0x7f1d684e1650, dump=dump@entry=0x7f1d6a7d68c0 <dump_to_file>, data=data@entry=0x7f1d845b36e0 <_IO_2_1_stdout_>) at dump.c:400

If you look very closely you'll see frame 1 is in the Jansson library. Frame 0 is in a completely different and unrelated library (json-c) that happens to contain a symbol with the same name as one from Jansson (json_object_iter_next).

Looking at /proc/pid/maps verifies that kojid is dynamically loading both libjansson and libjson-c. This causes memory corruption which creates the problem we've seen.

CCing Dan Berrange. Dan: I'm fairly sure we've had this problem before ... with libvirt? Do you remember more about that?

There are various reports of this on the web:
https://github.com/json-c/json-c/issues/621
https://github.com/akheron/jansson/issues/536

A simple reproducer that you can try on the Beaker machine is attached. You have to compile it this way to link both libraries:

$ gcc -Wall -Werror test.c -ljson-c -ljansson -o test
$ ./test
Segmentation fault (core dumped)

(Note if you link the libraries in the other order then it doesn't crash.)

At the moment I don't know how to fix this, but at least I finally know what the problem is, which is a good start. Another thing I don't know is which particular Python dependency of Koji is using json-c.

Just for the record json-c is coming to kojid from dnf, while jansson is from guestfs.

@rjones thanks for this excellent debugging. It seems newer versions of json-c and jansson implement symbol versions, which would resolve this. Can RH8 get those updates in order to fix this issue?

I don't know if there is a specific plan to do that. A possible workaround is to try importing both json extensions somewhere early on in the Koji code. Try importing them in one order, and then the other way around, and one way round ought to work.

Created attachment 1804516 [details]
libdnf vs guestfs invalid pointer

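The import-order workaround suggested above could be sketched roughly as follows. This is only an illustration, not Koji code: the helper name preload_in_order is hypothetical, and the idea that guestfs pulls in libjansson while dnf pulls in libjson-c is taken from the comments in this bug. Modules that are not installed are skipped, since guestfs may not be present on every builder.

```python
import importlib


def preload_in_order(module_names):
    """Import each named module in the given order.

    Modules that are not installed are skipped (hypothetical helper,
    not part of Koji). Returns the names that were actually imported.
    """
    loaded = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # For example, guestfs may be missing on a non-image builder.
            continue
        loaded.append(name)
    return loaded
```

Calling preload_in_order(["guestfs", "dnf"]) before any other imports would try one order; swapping the two names tries the other. Which order avoids the crash has to be found by trial.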
@rjones - you are correct. At least in the context of koji, I have managed to work around this issue by simply adding 'import guestfs' at the beginning of kojid

I've added a simple python script that illustrates the issue

@tkopecek do you see a better way to work around this bug, or should I open an issue for tracking purposes at https://pagure.io/koji/issues?

Let's open an issue for koji - it is not that easy (guestfs needn't be present for all builders, just image ones) but we should figure it out.

Ok, I opened https://pagure.io/koji/issue/2964.

Rich, I assigned this to you, since you identified and debugged the issue already, but my understanding is that the real fix for this would be elsewhere (updated packaging for jansson or json-c)? At any rate, this sounds sufficiently important that we need to prioritize it, even if we use this bug to track a workaround, and create another one to fix the underlying issue. Do you agree?

This isn't really a bug in libguestfs, it's a bug caused by two JSON libraries having conflicting symbols so if (any) program happens to load both libraries at the same time then bad things happen. kojid was already using one of the JSON libraries via Python, and libguestfs happened to be the thing that pulled in the other JSON library so we get the blame for causing the crash, but it could equally well have been caused by anything else using the other JSON library.

Interestingly it's been fixed already (in the two JSON libraries) in RHEL 9 by some ELF trickery.

(In reply to Richard W.M. Jones from comment #18)
> This isn't really a bug in libguestfs, it's a bug caused by
> two JSON libraries having conflicting symbols so if (any)
> program happens to load both libraries at the same time then
> bad things happen. kojid was already using one of the JSON
> libraries via Python, and libguestfs happened to be the thing
> that pulled in the other JSON library so we get the blame for
> causing the crash, but it could equally well have been caused
> by anything else using the other JSON library.
>
> Interestingly it's been fixed already (in the two JSON libraries)
> in RHEL 9 by some ELF trickery.

We're in agreement. I just need to know how to wrap our work on this. i.e., is the issue opened elsewhere (https://pagure.io/koji/issue/2964) sufficient, or should we re-route this one to some RHEL toolchain component? Thanks.

So how can we progress here? Are we tracking the fix for this anywhere else? What is the completion criteria?

I'm setting sev/prio to low, assuming that this is no longer blocking any team.

I don't know - there's nothing we can do to fix this in libguestfs, and in any case it's not a bug in libguestfs at all, so I'm going to close the bug.

PSA: See bug 2001062.
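
For anyone hitting a similar crash, the /proc/pid/maps check mentioned earlier in this bug can be scripted. The following is a minimal sketch, assuming a Linux host; the function names find_json_libs and loaded_json_libs are made up for illustration, and the match is a plain file-name pattern, not anything provided by either library.

```python
import re

# File-name pattern for the two JSON libraries named in this bug.
JSON_LIB_PATTERN = re.compile(r"/(libjansson|libjson-c)\.so")


def find_json_libs(maps_text):
    """Return the set of JSON library names that appear in maps_text.

    maps_text is the content of a /proc/<pid>/maps file.
    """
    found = set()
    for line in maps_text.splitlines():
        match = JSON_LIB_PATTERN.search(line)
        if match:
            found.add(match.group(1))
    return found


def loaded_json_libs(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and report mapped JSON libraries."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return find_json_libs(maps_file.read())
```

If the returned set contains both libjansson and libjson-c, the process has both libraries mapped and may be exposed to the symbol clash described in this bug.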