Bug 731914

Summary: matahari service seg fault on multiple method timeouts
Product: Red Hat Enterprise Linux 6
Component: matahari
Version: 6.2
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Dave Johnson <dajohnso>
Assignee: Russell Bryant <rbryant>
QA Contact: Dave Johnson <dajohnso>
CC: matahari-maint, rbryant
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: matahari-0.4.2-8.el6
Doc Type: Bug Fix
Doc Text:
No description required
Last Closed: 2011-12-06 11:40:08 UTC
Attachments:
script to recreate error (flags: none)

Description Dave Johnson 2011-08-19 03:58:10 UTC
Description of problem:
=================================
While running Python test automation against matahari-qmf-service, I hit multiple segfaults after adding tests that verify timeouts for the start/stop/status methods.

If I slow the test down by adding a 5-second sleep between iterations, no segfault occurs. There is also no segfault when I run status() 1000 times in a row and the service being called returns within the allotted timeout.


Version-Release number of selected component (if applicable):
================================================================
v0.4.2-7

How reproducible:
=======================
100%

Steps to Reproduce:
=======================
1.  Install and start the broker and the service agent.
2.  Run the attached script, "stressService.py" (checks the status of crond 100 times); no errors occur.
3.  Run the attached script as "stressService.py --timeout" (adds a wrapper script around crond with a 10-second sleep). Segfaults can occur anywhere between attempts 1 and 15.
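
For readers without the attachment, the loop in the steps above might look roughly like the sketch below. It is not the attached stressService.py: `stress`, `MethodTimeout`, and `always_times_out` are hypothetical names, and the real script calls the agent over QMF, which is replaced here by a plain Python callable.

```python
import time


class MethodTimeout(Exception):
    """Hypothetical error raised when the agent does not answer in time."""


def stress(check_status, max_tests=100, delay=0.0):
    """Call check_status() max_tests times and return the number of timeouts.

    check_status stands in for the real QMF status() call (an assumption).
    """
    timeouts = 0
    for attempt in range(1, max_tests + 1):
        try:
            check_status()
        except MethodTimeout:
            timeouts += 1
            print("attempt %d: method timed out" % attempt)
        if delay:
            # The report notes that a 5-second pause here hid the crash.
            time.sleep(delay)
    return timeouts


def always_times_out():
    # Stand-in for status() against the crond wrapper that sleeps 10 seconds.
    raise MethodTimeout()


if __name__ == "__main__":
    print(stress(always_times_out, max_tests=15))
```

With no delay between iterations, every call in this sketch times out back to back, which is the condition under which the report saw the agent crash.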
  
Actual results:
================================
segfault

Additional info:
================================

(gdb) bt full
#0  0x00000030eef2fa5f in __strlen_sse42 () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000030faa02381 in read_output (fd=9, user_data=0xc280c0) at /usr/src/debug/matahari-matahari-325f740/src/lib/services_linux.c:66
        data = 0x0
        rc = 0
        len = 0
        is_err = 0
        op = 0xc280c0
        buf = 
    "\240\277(!\377\177\000\000\265#\347\356\060\000\000\000X\n\303\000\000\000\000\000`\270\031\357\060\000\000\000\n\000\000\000\000\000\000\000@\277(!\377\177\000\000\240\277(!\377\177\000\000\017\027\347\356\060\000\000\000`\270\031\357\060\000\000\000\220\277(!\377\177\000\000@\277(!\377\177\000\000\025\033t\361\060\000\000\000x\275\031\357\060\000\000\000\020\277(!\377\177\000\000 \277(!\377\177\000\000p\301(!\377\177\000\000p\301(!\377\177\000\000\260\276(!\377\177\000\000\240\276(!\377\177\000\000\004d@\000\000\000\000\000\020\230\240\371\060\000\000\000h\030\000č\177\000\000x\r\000č\177\000\000\t\205\355\356\060", '\000' <repeats 11 times>, "\003\016\347\356\060\000\000\000\001\000\000\000\000\000\000\000`\270\031\357\060\000\000\000\001\000\000\000\000\000\000\000\343\270\031\357\060\000\000\000 \004\303\000\000\000\000\000\265#\347\356\060\000\000\000\000e\000č"...
#2  0x00000030f9202ab0 in mainloop_fd_dispatch (source=0xc28230, callback=<value optimized out>, userdata=<value optimized out>) at /usr/src/debug/matahari-matahari-325f740/src/lib/mainloop.c:297
        trig = 0xc28230
        __FUNCTION__ = "mainloop_fd_dispatch"
#3  0x00000030f0238f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#4  0x00000030f023c938 in ?? () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00000030f023cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x0000000000403f47 in main (argc=3, argv=0x7fff2128c2c8) at /usr/src/debug/matahari-matahari-325f740/src/service/service-qmf.cpp:167
        agent = {<MatahariAgent> = {_vptr.MatahariAgent = 0x40eb10, _impl = 0xc20b80}, _services = {<qmf::Handle<qmf::DataImpl>> = {impl = 0xc2fe10}, <No data fields>}, _resources = 
    {<qmf::Handle<qmf::DataImpl>> = {impl = 0xc30080}, <No data fields>}, standards = std::list = {[0] = {impl = 0xc266d0}, [1] = {impl = 0xc294c0}, [2] = {impl = 0xc29820}}, _package = {
            data_Services = {<qmf::Handle<qmf::SchemaImpl>> = {impl = 0xc282f0}, <No data fields>}, data_Resources = {<qmf::Handle<qmf::SchemaImpl>> = {impl = 0xc2ddd0}, <No data fields>}, 
            event_resource_op = {<qmf::Handle<qmf::SchemaImpl>> = {impl = 0xc2d450}, <No data fields>}}}
        rc = 0
(gdb)

Comment 2 Dave Johnson 2011-08-19 04:16:03 UTC
Created attachment 518971 [details]
script to recreate error

* assumes service agent is running and connected to local broker

Usage: stressService.py [options]

Options:
  --version       show program's version number and exit
  -h, --help      show this help message and exit
  -t, --timeout   cause timeout failures
  -d, --delay     seconds between test
  -n, --maxTests  number of the tests to perform
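
The help text above looks like optparse output, so the option handling could be sketched as follows. This is a guess at the shape, not the attached script: the version string, the integer types, the default delay of 0, and the name `build_parser` are assumptions; the default of 100 tests is taken from the reproduction steps.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the options listed in the usage text (sketch)."""
    # The version string is made up for illustration.
    parser = OptionParser(usage="%prog [options]", version="%prog 0.1")
    parser.add_option("-t", "--timeout", action="store_true",
                      dest="timeout", default=False,
                      help="cause timeout failures")
    # Assumed to take an integer number of seconds.
    parser.add_option("-d", "--delay", type="int", dest="delay", default=0,
                      help="seconds between test")
    # Assumed to take an integer count; 100 matches the reproduction steps.
    parser.add_option("-n", "--maxTests", type="int", dest="maxTests",
                      default=100, help="number of the tests to perform")
    return parser


if __name__ == "__main__":
    options, _args = build_parser().parse_args(["--timeout", "-n", "15"])
    print(options.timeout, options.delay, options.maxTests)
```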

Comment 3 Russell Bryant 2011-08-19 12:38:47 UTC
Thanks for the excellent report!  I was able to successfully reproduce the crash on the first try using your script.

Comment 4 Russell Bryant 2011-08-19 13:43:43 UTC
I have merged a patch upstream to resolve this crash.

https://github.com/matahari/matahari/commit/d7684f802f785a99e26862b5623d8b7aca3105a3

Comment 6 Dave Johnson 2011-08-24 19:40:40 UTC
Good to go in v0.4.2-9.

Comment 7 Russell Bryant 2011-11-16 21:46:12 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No description required

Comment 8 errata-xmlrpc 2011-12-06 11:40:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1569.html