Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 894392

Summary: qpid c++ high level API receiver seldomly crashes while parsing map message (qmf event) around qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cppAssignee: mick <mgoulish>
Status: CLOSED NOTABUG QA Contact: Frantisek Reznicek <freznice>
Severity: urgent Docs Contact:
Priority: high    
Version: DevelopmentCC: esammons, iboverma, jross, lzhaldyb
Target Milestone: 2.3   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-01-22 15:36:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Reproducer clients
none
Another reproducer none

Description Frantisek Reznicek 2013-01-11 16:02:13 UTC
Description of problem:

Qpid c++ high level API receiver seldomly crashes while parsing map message (qmf event)

There were detected multiple occurrences of c++ client crashes while parsing map message. Client crashed in qpid::types::VariantImpl::asString:

#0  qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494
#1  0x00d1209f in qpid::types::Variant::asString (this=0x99ef118) at qpid/types/Variant.cpp:834
#2  0x00d12319 in qpid::types::Variant::operator std::string (this=0x99ef118) at qpid/types/Variant.cpp:859
#3  0x08049717 in main (argc=Cannot access memory at address 0x2
) at event_receiver.cpp:79

See full backtrace below...


So far it was seen on RHEL 5.9 i386 only.

Key to reproduce is to have multiple clients listening on exchange/topic (termB-D) and have one to generate exchange/topic (termA).

The more clients are running in parallel the more likely crash happens.



Version-Release number of selected component (if applicable):
python-qpid-0.18-4.el5
python-qpid-qmf-0.18-13.el5
qpid-cpp-client-0.18-13.el5
qpid-cpp-client-devel-0.18-13.el5
qpid-cpp-client-devel-docs-0.18-13.el5
qpid-cpp-client-rdma-0.18-13.el5
qpid-cpp-client-ssl-0.18-13.el5
qpid-cpp-mrg-debuginfo-0.18-13.el5
qpid-cpp-server-0.18-13.el5
qpid-cpp-server-cluster-0.18-13.el5
qpid-cpp-server-devel-0.18-13.el5
qpid-cpp-server-ha-0.18-13.el5
qpid-cpp-server-rdma-0.18-13.el5
qpid-cpp-server-ssl-0.18-13.el5
qpid-cpp-server-store-0.18-13.el5
qpid-cpp-server-xml-0.18-13.el5
qpid-java-client-0.18-6.el5
qpid-java-common-0.18-6.el5
qpid-java-example-0.18-6.el5
qpid-qmf-0.18-13.el5
qpid-qmf-debuginfo-0.18-13.el5
qpid-qmf-devel-0.18-13.el5
qpid-tests-0.18-2.el5
qpid-tools-0.18-7.el5
rh-qpid-cpp-tests-0.18-13.el5
ruby-qpid-qmf-0.18-13.el5


How reproducible:
50%

Steps to Reproduce:
1. make clean make # build the client
2. while true; do ./event_receiver localhost:5672 "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.ClientConnect.#" "" -1; done # termA
3. ./event_receiver.sh localhost:5672 "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# termB
4. ./event_receiver.py localhost:5672 "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# termC
5. ./event_receiver localhost:5672 "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"# termD
6. #sometimes c++ client from termD crashes
  
Actual results:
Client crashes.

Expected results:
Client should not crash.

Additional info:

[root@kvm-guest-01 bz845223]# ./event_receiver localhost:5672 "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"
| clientConnect event_receiver.py 30107 18891
Segmentation fault (core dumped)
Core was generated by `./event_receiver localhost:5672 qmf.default.topic/agent.ind.event.org_apache_qp'.
Program terminated with signal 11, Segmentation fault.
#0  qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494
494         switch(type) {
(gdb) info threads
  2 Thread 30106  0x00580402 in __kernel_vsyscall ()
* 1 Thread 0xb7fa5720 (LWP 30105)  qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494
(gdb) t a a bt

Thread 2 (Thread 30106):
#0  0x00580402 in __kernel_vsyscall ()
#1  0x001e5c86 in epoll_wait () from /lib/libc.so.6
#2  0x006d805a in qpid::sys::Poller::wait (this=0x99e8040, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:568
#3  0x006d8c83 in qpid::sys::Poller::run (this=0x99e8040) at qpid/sys/epoll/EpollPoller.cpp:520
#4  0x006cef81 in qpid::sys::(anonymous namespace)::runRunnable (p=0x99e8040) at qpid/sys/posix/Thread.cpp:35
#5  0x00cbf912 in start_thread () from /lib/libpthread.so.0
#6  0x001e560e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb7fa5720 (LWP 30105)):
#0  qpid::types::VariantImpl::asString (this=0x362e3220) at qpid/types/Variant.cpp:494
#1  0x00d1209f in qpid::types::Variant::asString (this=0x99ef118) at qpid/types/Variant.cpp:834
#2  0x00d12319 in qpid::types::Variant::operator std::string (this=0x99ef118) at qpid/types/Variant.cpp:859
#3  0x08049717 in main (argc=Cannot access memory at address 0x2
) at event_receiver.cpp:79

Comment 1 Frantisek Reznicek 2013-01-11 16:04:41 UTC
> Steps to Reproduce:
  0. service qpidd restart
> 1. make clean make # build the client
> 2. while true; do ./event_receiver localhost:5672
> "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.ClientConnect.#"
> "" -1; done # termA
> 3. ./event_receiver.sh localhost:5672
> "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"#
> termB
> 4. ./event_receiver.py localhost:5672
> "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"#
> termC
> 5. ./event_receiver localhost:5672
> "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.clientConnect.#"#
> termD
> 6. #sometimes c++ client from termD crashes
>

Comment 2 Frantisek Reznicek 2013-01-11 16:12:32 UTC
Created attachment 676952 [details]
Reproducer clients

Comment 3 mick 2013-01-15 14:23:40 UTC
Good, immediate repro on RHEL 6.3, using only terminals A, C, and D from reproducer attachment.

C-client in term D failed for me as described above immediately upon startup.

Comment 4 mick 2013-01-15 16:21:52 UTC
Sorry, previous comment was mistaken -- my client failure was spurious.  No repro yet.

Comment 5 mick 2013-01-17 07:37:09 UTC
How long did it take, running your reproducers, to see the failures ?

i.e. -- You see 50% repro rate -- but at what test duration ?

Were you on a VM ?

Comment 6 Frantisek Reznicek 2013-01-17 12:18:31 UTC
Created attachment 680179 [details]
Another reproducer

Attached reproducer uses modified the clients attached earlier + runner runtest.sh

Usage:
  ./runtest.sh 1       $(hostname) 5672      10
  #           <b-cnt>  <b-host>    <b-port>  <number-of-client-runs>

Comment 7 Frantisek Reznicek 2013-01-17 12:29:33 UTC
(In reply to comment #5)
> How long did it take, running your reproducers, to see the failures ?
> 
> i.e. -- You see 50% repro rate -- but at what test duration ?
> 
> Were you on a VM ?

Using attachment 680179 [details] I'm able to trigger the problem shortly on my VMs ~ multiple cores within first minute on RHEL 5.9 (32 bit~i386).

I believe crash happens when c++ client tries to access non existing element of qpidtypes' Variant/Map instance (trying to get qpid.client_ppid property while java client does not provide it -> bug 895997).

Comment 9 mick 2013-01-18 12:43:25 UTC
On my machine, after 40 or 50 calls to this line:

  clipid = (const std::string) ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap().find("qpid.client_pid")->second;

... the client dies.   ( And the client is event_receiver.cpp from Frantisek's second reproducer. )


It dies because in the src/qpid/types/Variant.cpp code, we get a Variant that has a bad pointer in place of its VariantImpl.


I think I can finish tracking this down today.

Comment 10 mick 2013-01-22 08:43:33 UTC
I have prevented crashes by making a change to the cpp client code in the "another reproducer" attachment.

Instead of a single expression like this:

clipid = (const std::string) ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap().find("qpid.client_pid")->second;


we need a block of code more like this:

//------------------- start new code -------------------
Variant::Map& properties_map = ((((*(iter)).asMap()).find("_values")->second).asMap().find("properties")->second).asMap();

/*
  Only take the string value of the qpid.client_pid Variant  
  if it is in the map, and its type is not VAR_VOID.
*/
Variant::Map::const_iterator client_pid_iter = properties_map.find("qpid.client_pid");

if ( client_pid_iter != properties_map.end() ) {
  if ( VAR_VOID != client_pid_iter->second.getType() ) {
    clipid = (const std::string)  (client_pid_iter->second);
  }
}
//------------------ end new code -----------------------


Please note:
I used similar code for the other field "qpid.client_ppid"   




This is needed because the subexpression   find("qpid.client_pid")  will return Map::end()  whenever that field does not exist.  And that value cannot be dereferenced.



Well ... actually.... and very unfortunately .... it CAN sometimes be dereferenced!  It does not *always* cause a SEGV.  Although it seems like it should.  I saw several instances where that code resulted in a pointer to a structure that had never been issued by a constructor -- -and we did *not* crash.  This made it more difficult to understand.



My code example above checks for two conditions:  

  1. the field "qpid.client_pid" is not present in the properties map, and
  2. the field is present, but its type is VAR_VOID

I am not sure that the VAR_VOID test is actually necessary.  That code never executed in my tests.  I cannot prove that it is necessary.  But the test for Map::end() is definitely necessary.


With this change made in the reproducer I had zero core files after ten tests.  Without it I had two core files in every one of thirty or forty tests.

Comment 11 Frantisek Reznicek 2013-01-22 09:36:19 UTC
I need to agree. I thought that map::find() will throw an exception which is not the case and thus dereferencing Map::end() may lead to SEGV.

I'm fine with CLOSED/NOT-A-BUG resolution.

Comment 12 Frantisek Reznicek 2013-01-23 08:57:09 UTC
I confirm that comment 10 client suggestion avoids SEGV condition[s].
The outstanding question is whether c++ client can actually crash in libqpidtypes.

Variant::Map inherits from std::map so I tried to reproduce the same situation with std::map to see if standard library should check invalid data and raise exception or so.

I succeeded to get similar crash with rhel5's libstdc++ which proves the c++ standard library does not do those checks as well which I believe means that user is responsible to avoid such situations -> resolution CLOSED/NOT-A-BUG is appropriate.

See details below:

[root@dhcp-x-y ~]# vi /tmp/a.cpp
[root@dhcp-x-y ~]# cat /tmp/a.cpp
// map::find
#include <iostream>
#include <map>

int main ()
{
  std::map<char,int> mymap;
  std::map<char,int>::iterator it;

  std::map<char,std::string> * pmymap2;


  mymap['a']=50;
  mymap['b']=100;
  mymap['c']=150;
  mymap['d']=200;

  it=mymap.find('b');
  mymap.erase (it);
  mymap.erase (mymap.find('d'));

  // print content:
  std::cout << "elements in mymap:" << '\n';
  std::cout << "a => " << mymap.find('a')->second << '\n';
  std::cout << "c => " << mymap.find('c')->second << '\n';
  std::cout << "unun => " << mymap.find('q')->second << '\n';

  pmymap2 = new std::map<char,std::string>();

  int myint = mymap.find('q')->second;
  std::string mystr = (pmymap2->find('q'))->second;

  std::cout << myint << " / " << mystr << '\n';

  return 0;
}
[root@dhcp-x-y ~]# g++ -g -Wall  /tmp/a.cpp -o /tmp/a
[root@dhcp-x-y ~]# /tmp/a
elements in mymap:
a => 50
c => 150
unun => 0
Segmentation fault (core dumped)
[root@dhcp-x-y ~]# ls
anaconda-ks.cfg  bz  core.5513  install.log  install.log.syslog  python-bugzilla-0.7.0-2.el5.noarch.rpm  trusnaktest
[root@dhcp-x-y ~]# gdb /tmp/a core.5513
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
...
e was generated by `/tmp/a'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
(gdb) info thr
* 1 Thread 5513  0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
(gdb) bt
#0  0x0000003e3929c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#1  0x0000000000400efe in main () at /tmp/a.cpp:31
(gdb) quit