Bug 1814923 - Libvirtd get SIGABRT when blockcopy to an image of luks with secret usage
Summary: Libvirtd get SIGABRT when blockcopy to an image of luks with secret usage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: 8.0
Assignee: Peter Krempa
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-19 02:55 UTC by Han Han
Modified: 2020-05-08 07:34 UTC (History)
7 users (show)

Fixed In Version: libvirt-6.0.0-14.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:59:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)
All threads backtrace (67.00 KB, text/plain)
2020-03-19 02:55 UTC, Han Han
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2017 0 None None None 2020-05-05 09:59:43 UTC

Description Han Han 2020-03-19 02:55:18 UTC
Created attachment 1671313 [details]
All threads backtrace

Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-6.0.0-13.virtcov.el8.x86_64
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64

How reproducible:
90%

Steps to Reproduce:
1. Prepare a VM. Inactively attach a second disk to it
# qemu-img create /tmp/disk 100M

# virsh attach-disk new /tmp/disk sda --config

2. Prepare a luks on gluster
# qemu-img create -f luks --object secret,data=redhat,id=sec0 -o key-secret=sec0 gluster://XXXX/gv/new-dest  100M


3. Prepare a luks secret, referring to https://libvirt.org/formatsecret.html

4. Prepare the gluster luks dest xml:
# cat gluster-dest.xml
<disk type="network" device="disk">
  <driver name="qemu" type="raw"/>
  <source protocol='gluster' name='gv/new-dest'>
    <encryption format="luks">
      <secret type="passphrase" usage="luks"/>
    </encryption>
    <host name='XXX'/>
    <host name='XXX'/>
  </source>
  <target dev="sda" bus="scsi"/>
</disk>

5. Start VM and do blockcopy with --reuse-external
# virsh blockcopy new sda --xml gluster-dest.xml --pivot --transient-job --verbose --wait --reuse-external
error: failed to connect to the hypervisor
error: End of file while reading data: Input/output error

Sometimes libvirt will not crash.
Then you can stop vm the check the new disk xml:

# virsh destroy new                                                                                       
Domain new destroyed

# virsh dumpxml new --inactive|xmllint --xpath //disk -
<disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/new.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </disk><disk type="network" device="disk">
      <driver name="qemu" type="raw"/>
      <source protocol="gluster" name="gv/new-dest">
        <host name="gls1.usersys.redhat.com" port="24007"/>
        <host name="gls2.usersys.redhat.com" port="24007"/>
      </source>
      <backingStore/>
      <target dev="sda" bus="scsi"/>
      <encryption format="luks">
        <secret type="passphrase" usage="pu"/>
      </encryption>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

It looks like the memory is corrupted here. In case that it is a security problem. I set it private first. 

The name of secret usage is changed. 

Actual results:
As subject

Expected results:
NO SIGABRT

Additional info:
(gdb) bt
#0  0x00007f952651a70f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f9526504b25 in __GI_abort () at abort.c:79
#2  0x00007f952655d897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f952666a057 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f9526563fdc in malloc_printerr (str=str@entry=0x7f952666c0c8 "malloc(): smallbin double linked list corrupted") at malloc.c:5366
#4  0x00007f95265672a4 in _int_malloc (av=av@entry=0x7f952689fba0 <main_arena>, bytes=bytes@entry=24) at malloc.c:3656
#5  0x00007f95265692c6 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=24) at malloc.c:3444
#6  0x00007f952736b1fe in g_malloc0 (n_bytes=n_bytes@entry=24) at gmem.c:129
#7  0x00007f952a52c732 in virAlloc (ptrptr=ptrptr@entry=0x7ffd6f1c2410, size=size@entry=24) at ../../src/util/viralloc.c:48
#8  0x00007f952a5915ba in virJSONValueNewObject () at ../../src/util/virjson.c:577
#9  0x00007f952a591f87 in virJSONParserHandleStartMap (ctx=0x7ffd6f1c24c0) at ../../src/util/virjson.c:1723
#10 0x00007f9529c4ca07 in yajl_do_parse
    (hand=hand@entry=0x5583aeabf220, jsonText=jsonText@entry=0x5583aeab3500 "{\"return\": [{\"iops_rd\": 0, \"detect_zeroes\": \"off\", \"image\": {\"backing-image\": {\"virtual-size\": 104857600, \"filename\": \"/tmp/disk\", \"format\": \"raw\", \"actual-size\": 91770880, \"dirty-flag\": false}, \"back"..., jsonTextLen=jsonTextLen@entry=5878) at /usr/src/debug/yajl-2.1.0-10.el8.x86_64/src/yajl_parser.c:269
#11 0x00007f9529c4a69a in yajl_parse
    (hand=hand@entry=0x5583aeabf220, jsonText=jsonText@entry=0x5583aeab3500 "{\"return\": [{\"iops_rd\": 0, \"detect_zeroes\": \"off\", \"image\": {\"backing-image\": {\"virtual-size\": 104857600, \"filename\": \"/tmp/disk\", \"format\": \"raw\", \"actual-size\": 91770880, \"dirty-flag\": false}, \"back"..., jsonTextLen=jsonTextLen@entry=5878) at /usr/src/debug/yajl-2.1.0-10.el8.x86_64/src/yajl.c:130
#12 0x00007f952a59436a in virJSONValueFromString (jsonstring=<optimized out>, 
    jsonstring@entry=0x5583aeab3500 "{\"return\": [{\"iops_rd\": 0, \"detect_zeroes\": \"off\", \"image\": {\"backing-image\": {\"virtual-size\": 104857600, \"filename\": \"/tmp/disk\", \"format\": \"raw\", \"actual-size\": 91770880, \"dirty-flag\": false}, \"back"...) at ../../src/util/virjson.c:1854
#13 0x00007f94f15e6fa6 in qemuMonitorJSONIOProcessLine
    (mon=mon@entry=0x7f9504031290, line=0x5583aeab3500 "{\"return\": [{\"iops_rd\": 0, \"detect_zeroes\": \"off\", \"image\": {\"backing-image\": {\"virtual-size\": 104857600, \"filename\": \"/tmp/disk\", \"format\": \"raw\", \"actual-size\": 91770880, \"dirty-flag\": false}, \"back"..., msg=msg@entry=0x7f951e8ac620) at ../../src/qemu/qemu_monitor_json.c:222
#14 0x00007f94f15e76ca in qemuMonitorJSONIOProcess
    (mon=mon@entry=0x7f9504031290, data=0x5583aeab1cf0 "{\"return\": [{\"iops_rd\": 0, \"detect_zeroes\": \"off\", \"image\": {\"backing-image\": {\"virtual-size\": 104857600, \"filename\": \"/tmp/disk\", \"format\": \"raw\", \"actual-size\": 91770880, \"dirty-flag\": false}, \"back"..., len=5880, msg=0x7f951e8ac620) at ../../src/qemu/qemu_monitor_json.c:277
#15 0x00007f94f15cfa45 in qemuMonitorIOProcess (mon=0x7f9504031290) at ../../src/qemu/qemu_monitor.c:352
#16 0x00007f94f15cfa45 in qemuMonitorIO (watch=watch@entry=27, fd=<optimized out>, events=0, events@entry=1, opaque=opaque@entry=0x7f9504031290) at ../../src/qemu/qemu_monitor.c:611
#17 0x00007f952a568c47 in virEventPollDispatchHandles (fds=0x5583aeac2680, nfds=<optimized out>) at ../../src/util/vireventpoll.c:503
#18 0x00007f952a568c47 in virEventPollRunOnce () at ../../src/util/vireventpoll.c:658
#19 0x00007f952a5666dd in virEventRunDefaultImpl () at ../../src/util/virevent.c:353
#20 0x00007f952a766f2d in virNetDaemonRun (dmn=0x5583aea2d4a0) at ../../src/rpc/virnetdaemon.c:836
#21 0x00005583ad6c171a in main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_daemon.c:1430

Comment 1 Han Han 2020-03-19 03:32:43 UTC
For the usage corrupted issue, it can be reproduced by file, iscsi backend with luks usage.
Not reproduced by gluster with luks uuid.

Comment 2 Peter Krempa 2020-03-19 15:12:25 UTC
Does the crash happen if you start a clean libvirtd and retry the steps just once? I've identified a double-free regarding the 'usage' string, but that manifests itself only after you destroy the VM and start it again.

Comment 3 Han Han 2020-03-20 06:15:56 UTC
(In reply to Peter Krempa from comment #2)
> Does the crash happen if you start a clean libvirtd and retry the steps just
> once? I've identified a double-free regarding the 'usage' string, but that
> manifests itself only after you destroy the VM and start it again.

The crash will not happen when following 1~5 with a fresh libvirtd:
➜  luks virsh start new           
Domain new started

➜  luks virsh blockcopy new sda --xml file-dest.xml --pivot --transient-job --verbose --wait --reuse-external
Block Copy: [100 %]
Successfully pivoted
➜  luks virsh destroy new                                                                                    
Domain new destroyed

➜  luks virsh dumpxml new |xmllint --xpath //disk -
-:48: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC0 0x94 0xBB 0x7F
        <secret type='passphrase' usage=''/>

Comment 4 Han Han 2020-03-20 06:30:42 UTC
I have installed rpm of v6.1.0-190-gaeb909bf9b, which includes:
commit 38bc76bcc1e1bccb7f3265e15b60b0f6f8fe6dfa
Author: Peter Krempa <pkrempa>
Date:   Fri Mar 6 14:44:43 2020 +0100

    qemu: Don't take double pointer in qemuDomainSecretInfoFree
    
    Using a double pointer prevents the function from being used as the
    automatic cleanup function for the given type.
    
    Remove the double pointer use by replacing the calls with
    g_clear_pointer which ensures that the pointer is cleared.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>


But the error in comment3 could be reproduced again.

Comment 5 Peter Krempa 2020-03-20 07:13:59 UTC
(In reply to Han Han from comment #3)
> (In reply to Peter Krempa from comment #2)
> > Does the crash happen if you start a clean libvirtd and retry the steps just
> > once? I've identified a double-free regarding the 'usage' string, but that
> > manifests itself only after you destroy the VM and start it again.
> 
> The crash will not happen when following 1~5 with a fresh libvirtd:
> ➜  luks virsh start new           
> Domain new started
> 
> ➜  luks virsh blockcopy new sda --xml file-dest.xml --pivot --transient-job
> --verbose --wait --reuse-external
> Block Copy: [100 %]
> Successfully pivoted
> ➜  luks virsh destroy new                                           


Thank you for confirming.
        
> 
> Domain new destroyed
> 
> ➜  luks virsh dumpxml new |xmllint --xpath //disk -
> -:48: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0xC0 0x94 0xBB 0x7F
>         <secret type='passphrase' usage=''/>

I've seen this symptom while debugging.

(In reply to Han Han from comment #4)
> I have installed rpm of v6.1.0-190-gaeb909bf9b, which includes:
> commit 38bc76bcc1e1bccb7f3265e15b60b0f6f8fe6dfa
> Author: Peter Krempa <pkrempa>
> Date:   Fri Mar 6 14:44:43 2020 +0100
> 
>     qemu: Don't take double pointer in qemuDomainSecretInfoFree
>     
>     Using a double pointer prevents the function from being used as the
>     automatic cleanup function for the given type.
>     
>     Remove the double pointer use by replacing the calls with
>     g_clear_pointer which ensures that the pointer is cleared.
>     
>     Signed-off-by: Peter Krempa <pkrempa>
>     Reviewed-by: Ján Tomko <jtomko>
> 
> 
> But the error in comment3 could be reproduced again.

That one was just a simplification of the code. The proper fix for the issue should be the patch I've posted for a wrong copy of the 'usage' string:

https://www.redhat.com/archives/libvir-list/2020-March/msg00693.html

Comment 6 Peter Krempa 2020-03-20 08:51:34 UTC
Fixed upstream:

commit 299796328c34a30295d6cdc7ebce5d65843e921f
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 19 15:38:06 2020 +0100

    virStorageEncryptionSecretCopy: Properly copy internals
    
    virStorageEncryptionSecretPtr may have a string inside it, thus we must
    copy the string too. Use virSecretLookupDefCopy to do that.
    
    Caused by non-obvious code introduced in 756b46ddd24 and later 47e88b33b
    which added a string that needed to be copied.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1814923
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

commit 02f909b8a6ad8c271693fed9ceab606f0dda2294
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 19 15:27:40 2020 +0100

    virSecretLookupDefCopy: Remove return value
    
    The function always returns succes so there's no need for a return
    value.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

Comment 12 Han Han 2020-04-10 01:41:32 UTC
Vefified on libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64:

1. Start a VM

2. Prepare luks secret and set the password

3. Blockcopy to a luks disk:
# cat /tmp/file-dest.xml 
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/var/lib/libvirt/images/pc-luks">
    <encryption format="luks">
      <secret type="passphrase" usage="luks"/>
    </encryption>
  </source>
  <target dev="sda" bus="scsi"/>
</disk>

➜  ~ virsh blockcopy pc sda --xml /tmp/file-dest.xml --wait --verbose --transient-job
Block Copy: [100 %]

➜  ~ virsh blockjob pc sda --pivot  

➜  ~ virsh dumpxml pc | xmllint --xpath //disk -
<disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/pc-luks" index="6">
        <encryption format="luks">
          <secret type="passphrase" usage="luks"/>
        </encryption>
      </source>
      <backingStore/>
      <target dev="sda" bus="scsi"/>
      <alias name="scsi0-0-0-0"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

3. Destroy and start vm, check the vm xml again
➜  ~ virsh destroy pc                           
Domain pc destroyed

➜  ~ virsh start pc
Domain pc started

➜  ~ virsh dumpxml pc | xmllint --xpath //disk -
<disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/pc-luks" index="2"/>
      <backingStore/>
      <target dev="sda" bus="scsi"/>
      <encryption format="luks">
        <secret type="passphrase" usage="luks"/>
      </encryption>
      <alias name="scsi0-0-0-0"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

Comment 14 errata-xmlrpc 2020-05-05 09:59:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017


Note You need to log in before you can comment on or make changes to this bug.