Bug 894392
| Summary: | qpid c++ high level API receiver seldomly crashes while parsing map message (qmf event) around qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||||
| Component: | qpid-cpp | Assignee: | mick <mgoulish> | ||||||
| Status: | CLOSED NOTABUG | QA Contact: | Frantisek Reznicek <freznice> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | Development | CC: | esammons, iboverma, jross, lzhaldyb | ||||||
| Target Milestone: | 2.3 | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2013-01-22 15:36:43 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Frantisek Reznicek
2013-01-11 16:02:13 UTC
> Steps to Reproduce: 0. service qpidd restart > 1. make clean make # build the client > 2. while true; do ./event_receiver localhost:5672 > "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.ClientConnect.#" > "" -1; done # termA > 3. ./event_receiver.sh localhost:5672 > "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# > termB > 4. ./event_receiver.py localhost:5672 > "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# > termC > 5. ./event_receiver localhost:5672 > "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# > termD > 6. #sometimes c++ client from termD crashes > Created attachment 676952 [details]
Reproducer clients
Good, immediate repro on RHEL 6.3, using only terminals A, C, and D from reproducer attachment. C-client in term D failed for me as described above immediately upon startup. Sorry, previous comment was mistaken -- my client failure was spurious. No repro yet. How long did it take, running your reproducers, to see the failures ? i.e. -- You see 50% repro rate -- but at what test duration ? Were you on a VM ? Created attachment 680179 [details]
Another reproducer
Attached reproducer uses modified the clients attached earlier + runner runtest.sh
Usage:
./runtest.sh 1 $(hostname) 5672 10
# <b-cnt> <b-host> <b-port> <number-of-client-runs>
(In reply to comment #5) > How long did it take, running your reproducers, to see the failures ? > > i.e. -- You see 50% repro rate -- but at what test duration ? > > Were you on a VM ? Using attachment 680179 [details] I'm able to trigger the problem shortly on my VMs ~ multiple cores within first minute on RHEL 5.9 (32 bit~i386). I believe crash happens when c++ client tries to access non existing element of qpidtypes' Variant/Map instance (trying to get qpid.client_ppid property while java client does not provide it -> bug 895997).
On my machine, after 40 or 50 calls to this line:
clipid = (const std::string) ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap().find("qpid.client_pid")->second;
... the client dies. ( And the client is event_receiver.cpp from Frantisek's second reproducer. )
It dies because in the src/qpid/types/Variant.cpp code, we get a Variant that has a bad pointer in place of its VariantImpl.
I think I can finish tracking this down today.
I have prevented crashes by making a change to the cpp client code in the "another reproducer" attachment.
Instead of a single expression like this:
clipid = (const std::string) ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap().find("qpid.client_pid")->second;
we need a block of code more like this:
//------------------- start new code -------------------
Variant::Map& properties_map = ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap();
/*
Only take the string value of the qpid.client_pid Variant
if it is in the map, and its type is not VAR_VOID.
*/
Variant::Map::const_iterator client_pid_iter = properties_map.find("qpid.client_pid");
if ( client_pid_iter != properties_map.end() ) {
if ( VAR_VOID != client_pid_iter->second.getType() ) {
clipid = (const std::string) (client_pid_iter->second);
}
}
//------------------ end new code -----------------------
Please note:
I used similar code for the other field "qpid.client_ppid"
This is needed because the subexpression find("qpid.client_pid") will return Map::end() whenever that field does not exist. And that value cannot be dereferenced.
Well ... actually.... and very unfortunately .... it CAN sometimes be dereferenced! It does not *always* cause a SEGV. Although it seems like it should. I saw several instances where that code resulted in a pointer to a structure that had never been issued by a constructor -- -and we did *not* crash. This made it more difficult to understand.
My code example above checks for two conditions:
1. the field "qpid.client_pid" is not present in the properties map, and
2. the field is present, but its type is VAR_VOID
I am not sure that the VAR_VOID test is actually necessary. That code never executed in my tests. I cannot prove that it is necessary. But the test for Map::end() is definitely necessary.
With this change made in the reproducer I had zero core files after ten tests. Without it I had two core files in every one of thirty or forty tests.
I need to agree. I thought that map::find() will throw an exception which is not the case and thus dereferencing Map::end() may lead to SEGV. I'm fine with CLOSED/NOT-A-BUG resolution. I confirm that comment 10 client suggestion avoids SEGV condition[s]. The outstanding question is whether c++ client can actually crash in libqpidtypes. Variant::Map inherits from std::map so I tried to reproduce the same situation with std::map to see if standard library should check invalid data and raise exception or so. I succeeded to get similar crash with rhel5's libstdc++ which proves the c++ standard library does not do those checks as well which I believe means that user is responsible to avoid such situations -> resolution CLOSED/NOT-A-BUG is appropriate. See details below: [root@dhcp-x-y ~]# vi /tmp/a.cpp [root@dhcp-x-y ~]# cat /tmp/a.cpp // map::find #include <iostream> #include <map> int main () { std::map<char,int> mymap; std::map<char,int>::iterator it; std::map<char,std::string> * pmymap2; mymap['a']=50; mymap['b']=100; mymap['c']=150; mymap['d']=200; it=mymap.find('b'); mymap.erase (it); mymap.erase (mymap.find('d')); // print content: std::cout << "elements in mymap:" << '\n'; std::cout << "a => " << mymap.find('a')->second << '\n'; std::cout << "c => " << mymap.find('c')->second << '\n'; std::cout << "unun => " << mymap.find('q')->second << '\n'; pmymap2 = new std::map<char,std::string>(); int myint = mymap.find('q')->second; std::string mystr = (pmymap2->find('q'))->second; std::cout << myint << " / " << mystr << '\n'; return 0; } [root@dhcp-x-y ~]# g++ -g -Wall /tmp/a.cpp -o /tmp/a [root@dhcp-x-y ~]# /tmp/a elements in mymap: a => 50 c => 150 unun => 0 Segmentation fault (core dumped) [root@dhcp-x-y ~]# ls anaconda-ks.cfg bz core.5513 install.log install.log.syslog python-bugzilla-0.7.0-2.el5.noarch.rpm trusnaktest [root@dhcp-x-y ~]# gdb /tmp/a core.5513 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5) ... e was generated by `/tmp/a'. Program terminated with signal 11, Segmentation fault. #0 0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6 (gdb) info thr * 1 Thread 5513 0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6 (gdb) bt #0 0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6 #1 0x0000000000400efe in main () at /tmp/a.cpp:31 (gdb) quit |