Hide Forgot
+++ This bug was initially created as a clone of Bug #1034312 +++ Description of problem: During hosted-engine --deploy, '/bin/systemctl start vdsmd.service' fails. vdsm-4.13.0-196.gitb871832.fc19.x86_64 libvirt-1.0.5.7-2.fc19.x86_64 Reproduced by command line: # /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start vdsm: Running mkdirs vdsm: Running configure_coredump vdsm: Running run_init_hooks vdsm: Running gencerts vdsm: Running check_is_configured libvirt is already configured for vdsm sanlock service is already configured vdsm: Running validate_configuration SUCCESS: ssl configured to true. No conflicts vdsm: Running prepare_transient_repository vdsm: Running syslog_available vdsm: Running nwfilter /usr/libexec/vdsm/vdsmd_init_common.sh: line 70: 8708 Errore di segmentazione "${VDSM_TOOL}" nwfilter vdsm: failed to execute nwfilter, error code 139 dmesg: [ 2099.578442] python[11612]: segfault at 340b38aa40 ip 000000340b38aa40 sp 00007fb5c0e6bf38 error 15 in libpython2.7.so.1.0[340b375000+3e000] gdb: [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/bin/python /usr/bin/vdsm-tool nwfilter'. Program terminated with signal 11, Segmentation fault. #0 0x0000000000000007 in ?? () (gdb) py-bt #13 Frame 0x7f72680010b0, for file /usr/lib64/python2.7/site-packages/libvirt.py, line 299, in virEventRunDefaultImpl () ret = libvirtmod.virEventRunDefaultImpl() (gdb) bt #0 0x0000000000000007 in ?? () #1 0x000000312480c2ce in _sasl_log (conn=<optimized out>, level=5, fmt=0x7f7267398218 "DIGEST-MD5 client mech dispose") at common.c:1985 #2 0x00007f7267391d38 in digestmd5_client_mech_dispose (conn_context=0x143c5b0, utils=0x10ddec0) at digestmd5.c:4580 #3 0x00000031248084c4 in client_dispose (pconn=0x10dcac0) at client.c:332 #4 0x000000312480b4fb in sasl_dispose (pconn=0xe63338) at common.c:851 #5 0x0000003411885a3b in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:264 #6 0x0000003411970908 in virNetSocketDispose (obj=0xe5dc50) at rpc/virnetsocket.c:1026 #7 0x0000003411885a3b in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:264 #8 0x000000341186d182 in virEventPollCleanupHandles () at util/vireventpoll.c:582 #9 0x000000341186e3cf in virEventPollRunOnce () at util/vireventpoll.c:651 #10 0x000000341186cd0d in virEventRunDefaultImpl () at util/virevent.c:273 #11 0x00007f726ce6cd06 in libvirt_virEventRunDefaultImpl (self=<optimized out>, args=<optimized out>) at libvirt.c:3076 #12 0x000000340b0ddcee in call_function (oparg=<optimized out>, pp_stack=0x7f726ca02390) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4098 #13 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f72680010b0, for file /usr/lib64/python2.7/site-packages/libvirt.py, line 299, in virEventRunDefaultImpl (), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740 (gdb) t 2 [Switching to thread 2 (Thread 0x7f72771b5740 (LWP 3295))] #0 _PyType_Lookup (type=type@entry=0x340b3875c0 <PyInt_Type>, name=name@entry='__class__') at /usr/src/debug/Python-2.7.5/Objects/typeobject.c:2518 2518 dict = ((PyTypeObject *)base)->tp_dict; (gdb) py-bt #10 Frame 0x14b5e30, for file /usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py, line 78, in wrapper (args=(), kwargs={}, ret=0) if isinstance(ret, libvirt.virDomain): #14 Frame 0x13f73e0, for file /usr/lib64/python2.7/site-packages/vdsm/tool/nwfilter.py, line 35, in main (conn=<virConnect(restore=<function at remote 0x140a8c0>, nodeDeviceLookupSCSIHostByWWN=<function at remote 0x140a0c8>, listAllSecrets=<function at remote 0x1406398>, saveImageGetXMLDesc=<function at remote 0x140aa28>, virConnGetLastError=<function at remote 0x1433230>, listDomainsID=<function at remote 0x14066e0>, storagePoolLookupByUUIDString=<function at remote 0x140af50>, getCPUMap=<function at remote 0x13b56e0>, setKeepAlive=<function at remote 0x140ac80>, pingLibvirt=<instancemethod at remote 0xf60a00>, changeBegin=<function at remote 0x1432ed8>, dispatchDomainEventBlockPullCallback=<function at remote 0x13b5320>, _o=None, saveImageDefineXML=<function at remote 0x140a9b0>, listDefinedDomains=<function at remote 0x1406488>, getCapabilities=<function at remote 0x13b57d0>, getFreeMemory=<function at remote 0x13b58c0>, numOfDomains=<function at remote 0x140a398>, networkCreateXML=<function at remote 0x1406c8...(truncated) conn.close() #19 Frame 0xe62d90, for file /usr/bin/vdsm-tool, line 142, in main (opts=[], args=['nwfilter'], cmd='nwfilter') return tool_command[cmd]["command"](*args[1:]) #22 Frame 0xe3d530, for file /usr/bin/vdsm-tool, line 145, in <module> () sys.exit(main()) bt #0 _PyType_Lookup (type=type@entry=0x340b3875c0 <PyInt_Type>, name=name@entry='__class__') at /usr/src/debug/Python-2.7.5/Objects/typeobject.c:2518 #1 0x000000340b0847a5 in _PyObject_GenericGetAttrWithDict (obj=0, name='__class__', dict=0x0) at /usr/src/debug/Python-2.7.5/Objects/object.c:1389 #2 0x000000340b046a2e in recursive_isinstance (inst=0, cls=<type at remote 0x1363850>) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2892 #3 0x000000340b04af55 in _PyObject_RealIsInstance (inst=<optimized out>, cls=<optimized out>) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:3060 #4 0x000000340b09caf2 in type___instancecheck__ (type=<optimized out>, inst=<optimized out>) at /usr/src/debug/Python-2.7.5/Objects/typeobject.c:595 #5 0x000000340b049dd3 in PyObject_Call (func=func@entry=<built-in method __instancecheck__ of type object at remote 0x1363850>, arg=arg@entry=(0,), kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #6 0x000000340b04a65f in PyObject_CallFunctionObjArgs (callable=callable@entry=<built-in method __instancecheck__ of type object at remote 0x1363850>) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2760 #7 0x000000340b04ac5f in PyObject_IsInstance (inst=0, cls=<type at remote 0x1363850>) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2963 #8 0x000000340b0d50c9 in builtin_isinstance (self=<optimized out>, args=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/bltinmodule.c:2452 #9 0x000000340b0ddcee in call_function (oparg=<optimized out>, pp_stack=0x7fffaed4b120) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4098 #10 PyEval_EvalFrameEx (f=f@entry=Frame 0x14b5e30, for file /usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py, line 78, in wrapper (args=(), kwargs={}, ret=0), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740 --- Additional comment from Jiri Denemark on 2013-12-02 13:13:49 UTC --- The backtrace looks very similar to the following commit (v1.2.0-rc1-3-g13fdc6d): commit 13fdc6d63ef64f8e231a087e1dab7d90145c3c10 Author: Christophe Fergeau <cfergeau> Date: Fri Nov 22 17:27:21 2013 +0100 Tie SASL callbacks lifecycle to virNetSessionSASLContext The array of sasl_callback_t callbacks which is passed to sasl_client_new() must be kept alive as long as the created sasl_conn_t object is alive as cyrus-sasl uses this structure internally for things like logging, so the memory used for callbacks must only be freed after sasl_dispose() has been called. During testing of successful SASL logins with virsh -c qemu+tls:///system list --all I've been getting invalid read reports from valgrind ==9237== Invalid read of size 8 ==9237== at 0x6E93B6F: _sasl_getcallback (common.c:1745) ==9237== by 0x6E95430: _sasl_log (common.c:1850) ==9237== by 0x16593D87: digestmd5_client_mech_dispose (digestmd5.c:4580) ==9237== by 0x6E91653: client_dispose (client.c:332) ==9237== by 0x6E9476A: sasl_dispose (common.c:851) ==9237== by 0x4E225A1: virNetSASLSessionDispose (virnetsaslcontext.c:678) ==9237== by 0x4CBC551: virObjectUnref (virobject.c:262) ==9237== by 0x4E254D1: virNetSocketDispose (virnetsocket.c:1042) ==9237== by 0x4CBC551: virObjectUnref (virobject.c:262) ==9237== by 0x4E2701C: virNetSocketEventFree (virnetsocket.c:1794) ==9237== by 0x4C965D3: virEventPollCleanupHandles (vireventpoll.c:583) ==9237== by 0x4C96987: virEventPollRunOnce (vireventpoll.c:652) ==9237== by 0x4C94730: virEventRunDefaultImpl (virevent.c:274) ==9237== by 0x12C7BA: vshEventLoop (virsh.c:2407) ==9237== by 0x4CD3D04: virThreadHelper (virthreadpthread.c:161) ==9237== by 0x7DAEF32: start_thread (pthread_create.c:309) ==9237== by 0x8C86EAC: clone (clone.S:111) ==9237== Address 0xe2d61b0 is 0 bytes inside a block of size 168 free'd ==9237== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9237== by 0x4C73827: virFree (viralloc.c:580) ==9237== by 0x4DE4BC7: remoteAuthSASL (remote_driver.c:4219) ==9237== by 0x4DE33D0: remoteAuthenticate (remote_driver.c:3639) ==9237== by 0x4DDBFAA: doRemoteOpen (remote_driver.c:832) ==9237== by 0x4DDC8DC: remoteConnectOpen (remote_driver.c:1031) ==9237== by 0x4D8595F: do_open (libvirt.c:1239) ==9237== by 0x4D863F3: virConnectOpenAuth (libvirt.c:1481) ==9237== by 0x12762B: vshReconnect (virsh.c:337) ==9237== by 0x12C9B0: vshInit (virsh.c:2470) ==9237== by 0x12E9A5: main (virsh.c:3338) This commit changes virNetSASLSessionNewClient() to take ownership of the SASL callbacks. Then we can free them in virNetSASLSessionDispose() after the corresponding sasl_conn_t has been freed. --- Additional comment from Sandro Bonazzola on 2013-12-02 14:18:13 UTC --- (In reply to Yaniv Bronhaim from comment #9) > Dan, this relates to nwfilter which uses libvirt. This looks like the > segfault happens internally in libvirt code. doesn't it? > > Sandro, reproducible or just happened? reproducible quite often especially just after the boot.
Hi jiri Since there was a dependency issue for vdsm installing in rhel7, so I installed it omit the dependent paket ,however, after i finished installing the vdsm packet, i didn't find the vdsmd_init_common.sh script in /usr/libexec/vdsm/ , so i can reproduce this bug as the comment 0 description, also I'm not sure which steps i missed, can you give me some advise? thanks pkg info libvirt-1.1.1-15.el7.x86_64 qemu-kvm-rhev-1.5.3-21.el7.x86_64 kernel-3.10.0-61.el7.x86_64 qemu-img-1.5.3-21.el7.x86_64 steps 1. download the following four packet from the brew # ll -rw-r--r--. 1 root root 652140 Dec 16 18:56 vdsm-4.10.3-17.el7.x86_64.rpm -rw-r--r--. 1 root root 58184 Dec 16 18:56 vdsm-cli-4.10.3-17.el7.noarch.rpm -rw-r--r--. 1 root root 77032 Dec 16 18:56 vdsm-python-4.10.3-17.el7.x86_64.rpm -rw-r--r--. 1 root root 26924 Dec 16 18:56 vdsm-xmlrpc-4.10.3-17.el7.noarch.rpm 2.install the vdsm with the rpm command(I have install the qemu-img packet and other dependent packet except the qemu-kvm-img) , we can see it depends on the qemu-kvm-img packet # rpm -ivh vdsm-* error: Failed dependencies: qemu-kvm-img is needed by vdsm-4.10.3-17.el7.x86_64 # rpm -ivh vdsm-* --nodeps --force Preparing... ################################# [100%] Updating / installing... 1:vdsm-python-4.10.3-17.el7 ################################# [ 25%] 2:vdsm-xmlrpc-4.10.3-17.el7 ################################# [ 50%] 3:vdsm-4.10.3-17.el7 ################################# [ 75%] 4:vdsm-cli-4.10.3-17.el7 ################################# [100%] 3.After installing, check the /usr/libexec/vdsm/ , find there wasn't vdsmd_init_common.sh script over there, so i can't reproduce this bug as the comment0 description # ll /usr/libexec/vdsm/ total 56 drwxr-xr-x. 28 root root 4096 Dec 16 19:08 hooks -rwxr-xr-x. 1 root root 692 Jun 11 2013 ovirt_functions.sh -rwxr-xr-x. 1 root root 475 Jun 11 2013 persist-vdsm-hooks -rwxr-xr-x. 1 root root 24176 Jun 11 2013 safelease -rwxr-xr-x. 1 root root 6812 Jun 11 2013 spmprotect.sh -rwxr-xr-x. 1 root root 1954 Jun 11 2013 spmstop.sh -rwxr-xr-x. 1 root root 696 Jun 11 2013 unpersist-vdsm-hook -rwxr-xr-x. 1 root root 2424 Jun 11 2013 vdsm-gencerts.sh
hi jiri I just re-check the patch, then do some testing about it ,can you help me have a look aboat my steps, were they enough to reproduce this bug? thanks pkg info libvirt-1.1.1-14.el7.x86_64 kernel-3.10.0-60.el7.x86_64 qemu-kvm-rhev-1.5.3-21.el7.x86_64 steps 1.Configure the tls environment 2.Connect to the hypervisor running on host using TLS without SASL via ipv4 after connected, i could get some invalid read reports from valgrind # valgrind -v --leak-check=full virsh -c qemu+tls://zhwang71/system list --all Please enter your authentication name: redhat Please enter your password: --15980-- REDIR: 0x80526c0 (__stpcpy_sse2_unaligned) redirected to 0x4a0b260 (stpcpy) --15980-- REDIR: 0x8053ed0 (__strcat_sse2_unaligned) redirected to 0x4a08fa0 (strcat) Id Name State ---------------------------------------------------- - rhel shut off - rhel6 shut off - rhel7 shut off - rhel7com shut off - rhel7qcow2 shut off - rheltest2 shut off - test shut off - win7 shut off ==15980== Thread 2: ==15980== Invalid read of size 8 ==15980== at 0x6D4F95F: _sasl_getcallback (in /usr/lib64/libsasl2.so.3.0.0) ==15980== by 0x6D511B0: _sasl_log (in /usr/lib64/libsasl2.so.3.0.0) ==15980== by 0x14D9CD37: ??? (in /usr/lib64/sasl2/libdigestmd5.so.3.0.0) ==15980== by 0x6D4D4C3: ??? (in /usr/lib64/libsasl2.so.3.0.0) ==15980== by 0x6D504FA: sasl_dispose (in /usr/lib64/libsasl2.so.3.0.0) ==15980== by 0x4CA892A: virObjectUnref (virobject.c:262) ==15980== by 0x4D99513: virNetSocketDispose (virnetsocket.c:1024) ==15980== by 0x4CA892A: virObjectUnref (virobject.c:262) ==15980== by 0x4C8C8B9: virEventPollCleanupHandles (vireventpoll.c:580) ==15980== by 0x4C8DB27: virEventPollRunOnce (vireventpoll.c:649) ==15980== by 0x4C8C44C: virEventRunDefaultImpl (virevent.c:273) ==15980== by 0x129374: vshEventLoop (virsh.c:2290) ==15980== Address 0xd94c800 is 0 bytes inside a block of size 168 free'd ==15980== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==15980== by 0x4C753D9: virFree (viralloc.c:566) ==15980== by 0x4D816F0: doRemoteOpen (remote_driver.c:4206) ==15980== by 0x4D81F75: remoteConnectOpen (remote_driver.c:1027) ==15980== by 0x4D27FCC: do_open (libvirt.c:1239) ==15980== by 0x4D2A838: virConnectOpenAuth (libvirt.c:1481) ==15980== by 0x12BB99: vshReconnect (virsh.c:336) ==15980== by 0x126DBA: main (virsh.c:2353) ==15980== -- ==16522== HEAP SUMMARY: ==16522== in use at exit: 571,462 bytes in 2,303 blocks ==16522== total heap usage: 18,606 allocs, 16,303 frees, 5,733,185 bytes allocated ==16522== ==16522== Searching for pointers to 2,303 not-freed blocks ==16522== Checked 1,598,904 bytes ==16522== ==16522== LEAK SUMMARY: ==16522== definitely lost: 0 bytes in 0 blocks ==16522== indirectly lost: 0 bytes in 0 blocks ==16522== possibly lost: 0 bytes in 0 blocks ==16522== still reachable: 571,462 bytes in 2,303 blocks ==16522== suppressed: 0 bytes in 0 blocks you can reference my attachment for more info 3.After i update the libvirt to the libvirt-1.1.1-15, the invalid read reports from valgrind in step 2 gone.
Yes, I believe that's the ideal way of verifying this bug. Good job.
hi jiri thanks for you quick response, however, i found other invalid read report from valgrind while i update the libvirt to the libvirt-1.1.1-15, not sure whether they are harmfull or harmful or harmless, sorry to miss it in the comment4. thanks # valgrind -v --leak-check=full virsh -c qemu+tls://zhwang71/system list --all Please enter your authentication name: redhat Please enter your password: =3545== Invalid read of size 1 ==3545== at 0x8004EBB: vfprintf (in /usr/lib64/libc-2.17.so) ==3545== by 0x80C9414: __vasprintf_chk (in /usr/lib64/libc-2.17.so) ==3545== by 0x4CB5075: virVasprintfInternal (stdio2.h:210) ==3545== by 0x4C9A745: virLogVMessage (virlog.c:842) ==3545== by 0x4C9AC26: virLogMessage (virlog.c:778) ==3545== by 0x4D98BC4: virNetSASLSessionClientStep (virnetsaslcontext.c:460) ==3545== by 0x4D819D5: doRemoteOpen (remote_driver.c:4131) ==3545== by 0x4D81FE5: remoteConnectOpen (remote_driver.c:1027) ==3545== by 0x4D2802C: do_open (libvirt.c:1239) ==3545== by 0x4D2A898: virConnectOpenAuth (libvirt.c:1481) ==3545== by 0x12BB99: vshReconnect (virsh.c:336) ==3545== by 0x126DBA: main (virsh.c:2353) ==3545== Address 0xd9a184e is 0 bytes after a block of size 126 alloc'd ==3545== at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==3545== by 0x80E45A4: xdr_array (in /usr/lib64/libc-2.17.so) ==3545== by 0x4D86251: xdr_remote_auth_sasl_start_ret (remote_protocol.c:3134) ==3545== by 0x4D9729F: virNetMessageDecodePayload (virnetmessage.c:405) ==3545== by 0x4D8DC6B: virNetClientProgramCall (virnetclientprogram.c:377) ==3545== by 0x4D69391: callFull.isra.2 (remote_driver.c:5727) ==3545== by 0x4D8190A: doRemoteOpen (remote_driver.c:5749) ==3545== by 0x4D81FE5: remoteConnectOpen (remote_driver.c:1027) ==3545== by 0x4D2802C: do_open (libvirt.c:1239) ==3545== by 0x4D2A898: virConnectOpenAuth (libvirt.c:1481) ==3545== by 0x12BB99: vshReconnect (virsh.c:336) ==3545== by 0x126DBA: main (virsh.c:2353)
Invalid reads are rarely harmless but we don't try to fix all of them in this bug. Please, file a new bug for it.
According to comment7, mark this bug verified
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.