Bug 1923971 - free(): invalid pointer in libguestfs Python bindings
Summary: free(): invalid pointer in libguestfs Python bindings
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libguestfs
Version: 8.3
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: rc
: 8.4
Assignee: Richard W.M. Jones
QA Contact: YongkuiGuo
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-02-02 10:57 UTC by Richard W.M. Jones
Modified: 2021-09-03 16:58 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-24 20:06:14 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
test.c (601 bytes, text/plain)
2021-05-17 09:44 UTC, Richard W.M. Jones
no flags Details
libdnf vs guestfs invalid pointer (521 bytes, text/plain)
2021-07-22 12:47 UTC, Ben Morrice
no flags Details

Description Richard W.M. Jones 2021-02-02 10:57:33 UTC
Description of problem:

This was actually reported against CentOS, but I'm filing this
against RHEL because I believe the same bug could occur here.

Version-Release number of selected component (if applicable):

/usr/bin/platform-python 3.6.8-31.el8
libguestfs-1.40.2-25.module_el8.3.0+555+a55c8938.x86_64

How reproducible:

100%

Steps to Reproduce:

(Details to follow)

Actual results:

A crash was observed in kojid which uses the Python bindings
to libguestfs.  The logs leading up to the crash were:

2021-01-29 15:48:03,025 [DEBUG] {368614} koji.TaskManager:918 bins: {'1:i386': 1, '1:x86_64': 1, '1:noarch': 1, '2:i386': 1, '2:x86_64': 1, '2:noarch': 1, '7:i386': 1, '7:x86_64': 1, '7:noarch': 1}
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: create: flags = 0, handle = 0x7ff40c01b110, program = platform-python
libguestfs: trace: set_program "kojid"
libguestfs: trace: set_program = 0
libguestfs: trace: add_drive_ro "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body"
libguestfs: trace: add_drive "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body" "readonly:true"
libguestfs: creating COW overlay to protect original drive content
libguestfs: trace: disk_format "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body"
libguestfs: command: run: qemu-img --help | grep -sqE -- '\binfo\b.*-U\b'
libguestfs: command: run: qemu-img
libguestfs: command: run: \ info
libguestfs: command: run: \ -U
libguestfs: command: run: \ --output json
libguestfs: command: run: \ /var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body
libguestfs: parse_json: qemu-img info JSON output:\n{\n    "virtual-size": 10737418240,\n    "filename": "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body",\n    "format": "raw",\n    "actual-size": 615018496,\n    "dirty-flag": false\n}\n\n
free(): invalid pointer

The reporter saw the following stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff432b23c35 in __GI_abort () at abort.c:79
#2  0x00007ff432b7c987 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ff432c8911d "%s\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ff432b83d8c in malloc_printerr (str=str@entry=0x7ff432c872fb "free(): invalid pointer") at malloc.c:5374
#4  0x00007ff432b8568c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:4185
#5  0x00007ff41c461b99 in json_delete () from /lib64/libjansson.so.4
#6  0x00007ff41cd6d0c5 in json_decref (json=0x7ff40c013470) at /usr/include/jansson.h:129
#7  cleanup_json_t_decref (ptr=<synthetic pointer>) at info.c:52
#8  guestfs_impl_disk_format (g=g@entry=0x7ff40c01b110, 
    filename=filename@entry=0x7ff40c027ad0 "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body")
    at info.c:66
#9  0x00007ff41cd3f6b9 in guestfs_disk_format (g=g@entry=0x7ff40c01b110, 
    filename=0x7ff40c027ad0 "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body")
    at actions-6.c:65
#10 0x00007ff41cd74e41 in get_source_format_or_autodetect (g=0x7ff40c01b110, drv=<optimized out>) at launch-libvirt.c:201
#11 0x00007ff41cd76118 in create_cow_overlay_libvirt (g=0x7ff40c01b110, datav=0x7ff40c016360, drv=0x7ff40c2b9990) at launch-libvirt.c:270
#12 0x00007ff41cd6359a in create_overlay (g=0x7ff40c01b110, drv=0x7ff40c2b9990) at drives.c:87
#13 0x00007ff41cd637cb in create_drive_file (g=g@entry=0x7ff40c01b110, data=data@entry=0x7ff413c6b870) at drives.c:119
#14 0x00007ff41cd640d8 in guestfs_impl_add_drive_opts (g=g@entry=0x7ff40c01b110, filename=<optimized out>, 
    filename@entry=0x7ff4180ba600 "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body", 
    optargs=optargs@entry=0x7ff413c6ba10) at drives.c:830
#15 0x00007ff41ccf3698 in guestfs_add_drive_opts_argv (g=0x7ff40c01b110, 
    filename=0x7ff4180ba600 "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body", 
    optargs=optargs@entry=0x7ff413c6ba10) at actions-2.c:181
#16 0x00007ff41cd64a85 in guestfs_impl_add_drive_ro (filename=<optimized out>, g=<optimized out>) at drives.c:942
#17 guestfs_impl_add_cdrom (g=<optimized out>, filename=<optimized out>) at drives.c:1013
#18 0x00007ff41cccdf34 in guestfs_add_drive_ro (g=g@entry=0x7ff40c01b110, 
    filename=0x7ff4180ba600 "/var/builder/builder/tasks/5516/5516/output_image/7938c61b-46bb-4b6a-b9c8-008e2ae5e900.body")
    at actions-0.c:188
#19 0x00007ff41d01233d in guestfs_int_py_add_drive_ro (self=<optimized out>, args=<optimized out>) at actions-0.c:200
...

Comment 2 Richard W.M. Jones 2021-02-02 11:15:47 UTC
Reporter says libguestfs-test-tool output is fine.

libjansson 2.11-3.el8

Comment 3 Richard W.M. Jones 2021-02-18 15:05:53 UTC
I need to reproduce this one.  Setting Triaged.

Comment 4 Brendan Reilly 2021-04-16 18:42:01 UTC
Hi, we're seeing this issue in RHELBLD on a RHEL 8 host - see the linked RHELBLD issue.

Comment 5 Richard W.M. Jones 2021-04-16 21:29:14 UTC
I was never able to reproduce this bug, and therefore diagnose it any
further.  Some kind of minimal reproducer would be greatly helpful.

You could start by modifying this example:

https://github.com/libguestfs/libguestfs/blob/master/python/examples/inspect_vm.py

I notice the actual code which crashes:

https://github.com/redhat-imaging/imagefactory/blob/master/imgfac/FactoryUtils.py#L22

does not use python_return_dict=True (so it invokes the older, and these days
less tested path).  It also uses g.add_drive_ro which is an older synonym for
g.add_drive_opts(disk, readonly=1), but also a less tested path.

Comment 6 Richard W.M. Jones 2021-04-20 08:14:35 UTC
I had a go at reproducing this, but I got nothing, on RHEL 8 or Fedora.  I also
examined the code (libguestfs.git/lib/info.c) but I don't see any obvious errors.
As well as trying to find a reproducer, would suggest:
 - Upgrade libguestfs as you're using an old version.
 - Try downgrading jansson, since the crash is actually happening in that library.
 - Try a different version of Python, ie. platform-python vs one from a module.
But what we really need most of all is a small reproducer.

Comment 7 Richard W.M. Jones 2021-05-05 12:44:05 UTC
FWIW my non-reproducer was simply:

$ truncate -s 1G test.raw
$ python3 ./test.py

where test.py is:

import guestfs
g = guestfs.GuestFS()
g.add_drive_ro("test.raw")
g.launch()

You can also try different python interpreters, etc.  I could not reproduce
the bug here.

Comment 8 Richard W.M. Jones 2021-05-06 20:15:26 UTC
A run under valgrind gave this error about the invalid free:

==2345280== Thread 2:
==2345280== Invalid free() / delete / delete[] / realloc()
==2345280==    at 0x4C3610C: free (vg_replace_malloc.c:538)
==2345280==    by 0x24240B98: json_delete (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2396E0C4: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239406B8: guestfs_disk_format (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23975E40: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23977117: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23964599: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239647CA: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239650D7: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x238F4697: guestfs_add_drive_opts_argv (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23965A84: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x238CEF33: guestfs_add_drive_ro (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==  Address 0x1ddb7f31 is 1 bytes inside a block of size 128 alloc'd
==2345280==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
==2345280==    by 0x2423AE30: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2424020A: json_object (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423C6D4: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423C965: ??? (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2423CC0C: json_loadb (in /usr/lib64/libjansson.so.4.11.0)
==2345280==    by 0x2396DEF9: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239612AF: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x23961C6C: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x2396DD0C: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x2396E005: ??? (in /usr/lib64/libguestfs.so.0.507.0)
==2345280==    by 0x239406B8: guestfs_disk_format (in /usr/lib64/libguestfs.so.0.507.0)
==2345280== 

Things I notice here:

 - Pointer was allocated in json_object() (as expected).

 - Pointer has somehow been incremented by 1 before being passed to free.

Comment 9 Richard W.M. Jones 2021-05-17 09:44:59 UTC
Created attachment 1784004 [details]
test.c

The reason is ...

  #0  0x00007f1d7e1bd7a7 in json_object_iter_next () at /lib64/libjson-c.so.4   
  #1  0x00007f1d6a7d6efd in do_dump                                             
      (json=json@entry=0x7f1d60186f70, flags=flags@entry=0, depth=depth@entry=0,
parents=parents@entry=0x7f1d684e1650, dump=dump@entry=0x7f1d6a7d68c0
<dump_to_file>, data=data@entry=0x7f1d845b36e0 <_IO_2_1_stdout_>) at dump.c:400

If you look very closely you'll see frame 1 is in the Jansson library.

Frame 0 is in a completely different and unrelated library (json-c)
that happens to contain a symbol with the same name as one from
Jansson (json_object_iter_next).  Looking at /proc/pid/maps verifies
that kojid is dynamically loading both libjansson and libjson-c.

This causes memory corruption which creates the problem we've seen.

CCing Dan Berrange.  Dan: I'm fairly sure we've had this problem
before ... with libvirt?  Do you remember more about that?

There are various reports of this on the web:

  https://github.com/json-c/json-c/issues/621
  https://github.com/akheron/jansson/issues/536

A simple reproducer is that you can try on the Beaker machine is
attached.  You have to compile it this way to link both libraries:

  $ gcc -Wall -Werror test.c -ljson-c -ljansson -o test
  $ ./test
  Segmentation fault (core dumped)

(Note if you link the libraries in the other order then it doesn't
crash.)

At the moment I don't know how to fix this, but at least I finally 
know what the problem is, which is a good start.  Another thing I
don't know is which particular Python dependency of Koji is using
json-c.

Comment 10 Tomas Kopecek 2021-05-19 12:44:33 UTC
Just for the record json-c is coming to kojid from dnf, while jansson is from guestfs.

Comment 11 Alex Iribarren 2021-07-22 08:34:10 UTC
@rjones thanks for this excellent debugging. It seems newer versions of json-c and jansson implement symbol versions, which would resolve this. Can RH8 get those updates in order to fix this issue?

Comment 12 Richard W.M. Jones 2021-07-22 08:44:40 UTC
I don't know if there is a specific plan to do that.

A possible workaround is to try importing both json extensions
somewhere early on in the Koji code.  Try importing them in
one order, and then the other way around, and one way round ought
to work.

Comment 13 Ben Morrice 2021-07-22 12:47:48 UTC
Created attachment 1804516 [details]
libdnf vs guestfs invalid pointer

Comment 14 Ben Morrice 2021-07-22 12:48:39 UTC
@rjones - you are correct. At least in the context of koji, I have managed to work around this issue by simply adding 'import guestfs' at the beginning of kojid

I've added a simple python script that illustrates the issue

@tkopecek do you see a better way to work around this bug, or should I open an issue for tracking purposes at https://pagure.io/koji/issues?

Comment 15 Tomas Kopecek 2021-07-22 13:01:47 UTC
Let's open an issue for koji - it is not that easy (guestfs needn't be present for all builders, just image ones) but we should figure it out.

Comment 16 Alex Iribarren 2021-07-28 07:57:43 UTC
Ok, I opened https://pagure.io/koji/issue/2964.

Comment 17 Klaus Heinrich Kiwi 2021-07-28 13:13:23 UTC
Rich,

 I assigned this to you, since you identified and debugged the issue already, but my understanding is that the real fix for this would be elsewhere (updated packaging for jansson or json-c)?

At any rate, this sounds sufficiently important that we need to prioritize it, even if we use this bug to track a workaround, and create another one to fix the underlying issue. Do you agree?

Comment 18 Richard W.M. Jones 2021-08-02 08:04:33 UTC
This isn't really a bug in libguestfs, it's a bug caused by
two JSON libraries having conflicting symbols so if (any)
program happens to load both libraries at the same time then
bad things happen.  kojid was already using one of the JSON
libraries via Python, and libguestfs happened to be the thing
that pulled in the other JSON library so we get the blame for
causing the crash, but it could equally well have been caused
by anything else using the other JSON library.

Interestingly it's been fixed already (in the two JSON libraries)
in RHEL 9 by some ELF trickery.

Comment 19 Klaus Heinrich Kiwi 2021-08-02 20:16:51 UTC
(In reply to Richard W.M. Jones from comment #18)
> This isn't really a bug in libguestfs, it's a bug caused by
> two JSON libraries having conflicting symbols so if (any)
> program happens to load both libraries at the same time then
> bad things happen.  kojid was already using one of the JSON
> libraries via Python, and libguestfs happened to be the thing
> that pulled in the other JSON library so we get the blame for
> causing the crash, but it could equally well have been caused
> by anything else using the other JSON library.
> 
> Interestingly it's been fixed already (in the two JSON libraries)
> in RHEL 9 by some ELF trickery.

We're in agreement. I just need to know how to wrap our work on this.

i.e., is the issue opened elsewhere (https://pagure.io/koji/issue/2964) sufficient, or should we re-route this one to some RHEL toolchain component?

Thanks.

Comment 20 Klaus Heinrich Kiwi 2021-08-24 18:24:47 UTC
So how can we progress here?
Are we tracking the fix for this anywhere else? What is the completion criteria?

I'm setting sev/prio to low, assuming that this is no longer blocking any team.

Comment 21 Richard W.M. Jones 2021-08-24 20:06:14 UTC
I don't know - there's nothing we can do to fix this in libguestfs,
and in any case it's not a bug in libguestfs at all, so I'm going to
close the bug.

Comment 22 Eric Garver 2021-09-03 16:58:54 UTC
PSA: See bug 2001062.


Note You need to log in before you can comment on or make changes to this bug.