Description of problem:
Remove RBD images from the ceph server with the rbd tool while libvirt is refreshing the pool. The pool refresh completes without error, but libvirtd crashes when asked for the name of an rbd volume that has already been removed.

Version-Release number of selected component (if applicable):
libvirt-1.3.1-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running rbd pool:

# virsh pool-dumpxml rbd
<pool type='rbd'>
  <name>rbd</name>
  <uuid>ebda974a-4fb7-4af2-b1a0-7a94e5cdda98</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name='10.66.110.191'/>
    <name>yy</name>
  </source>
</pool>

2. Create volumes:

# for i in {1..100}; do virsh vol-create-as rbd vol$i 100M; done

3. Refresh the rbd pool; meanwhile, remove 2 rbd volumes directly on the rbd server:

# virsh pool-refresh rbd

[root@osd1 ~]# rbd rm yy/vol97
Removing image: 100% complete...done.
[root@osd1 ~]# rbd rm yy/vol96
Removing image: 100% complete...done.

4. List the volumes:

# virsh vol-list rbd
error: Failed to list volumes
error: key in virGetStorageVol must not be NULL

5. Get the name of a volume removed in step 3:

# virsh vol-name yy/vol97
error: Disconnected from qemu:///system due to I/O error
error: failed to get vol 'yy/vol97'
error: internal error: client socket is closed
error: One or more references were leaked after disconnect from the hypervisor

Actual results:
libvirtd crashes with SIGSEGV; see the backtrace under Additional info.

Expected results:
Pool refresh should properly handle volumes that were deleted through a route other than libvirt.

Additional info:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2c47cd7700 (LWP 17954)]
0x00007f2c556c7fc6 in __strcmp_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f2c556c7fc6 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007f2c58362c99 in virStorageVolDefFindByKey (pool=<optimized out>, key=key@entry=0x7f2c3001c250 "yy/vol97") at conf/storage_conf.c:1734
#2  0x00007f2c3eb05698 in storageVolLookupByKey (conn=0x7f2c34003d40, key=0x7f2c3001c250 "yy/vol97") at storage/storage_driver.c:1501
#3  0x00007f2c583bad57 in virStorageVolLookupByKey (conn=0x7f2c34003d40, key=0x7f2c3001c250 "yy/vol97") at libvirt-storage.c:1342
#4  0x00007f2c58fe64d8 in remoteDispatchStorageVolLookupByKey (server=0x7f2c5ae1afb0, msg=0x7f2c5ae3cda0, ret=0x7f2c3001c3a0, args=0x7f2c3001c4e0, rerr=0x7f2c47cd6c30, client=0x7f2c5ae3c290) at remote_dispatch.h:15967
#5  remoteDispatchStorageVolLookupByKeyHelper (server=0x7f2c5ae1afb0, client=0x7f2c5ae3c290, msg=0x7f2c5ae3cda0, rerr=0x7f2c47cd6c30, args=0x7f2c3001c4e0, ret=0x7f2c3001c3a0) at remote_dispatch.h:15945
#6  0x00007f2c584033a2 in virNetServerProgramDispatchCall (msg=0x7f2c5ae3cda0, client=0x7f2c5ae3c290, server=0x7f2c5ae1afb0, prog=0x7f2c5ae37fa0) at rpc/virnetserverprogram.c:437
#7  virNetServerProgramDispatch (prog=0x7f2c5ae37fa0, server=server@entry=0x7f2c5ae1afb0, client=0x7f2c5ae3c290, msg=0x7f2c5ae3cda0) at rpc/virnetserverprogram.c:307
#8  0x00007f2c583fe61d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f2c5ae1afb0) at rpc/virnetserver.c:135
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f2c5ae1afb0) at rpc/virnetserver.c:156
#10 0x00007f2c582f78e5 in virThreadPoolWorker (opaque=opaque@entry=0x7f2c5ae0ff60) at util/virthreadpool.c:145
#11 0x00007f2c582f6e08 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#12 0x00007f2c5595edc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2c5568c1cd in clone () from /lib64/libc.so.6
libvirt info:

# pwd
/root/libvirt
# git describe
v1.3.1-rc1
Wido, does this sound familiar? Any idea if this is fixed upstream?
(In reply to Cole Robinson from comment #2)
> Wido, does this sound familiar? any idea if this is fixed upstream?

I think it does. I thought I fixed it here:
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=f46d137e33a348c0f96eaacc58e29794170757cb

Commit: f46d137e33a348c0f96eaacc58e29794170757cb

Not sure where this is coming from. I think it goes wrong inside virStorageBackendRBDRefreshPool() where it iterates over the 'names' variable. Not sure though, but that's my best guess. It could take a while before I can dig into this.
(In reply to Wido den Hollander from comment #3)
> Could take a while before I can dig into this.

No worries, I was just looking to see if it could be easily closed.
Comment #3 suggests a likely fix, and since there has been no feedback in the five years since, I'll assume it was correct.