Bug 1396281

Summary: source broker of dynamic route doesnt delete bindings after two consumers of same queue bound to dest.broker are gone
Product: Red Hat Enterprise MRG Reporter: Pavel Moravec <pmoravec>
Component: qpid-cppAssignee: messaging-bugs <messaging-bugs>
Status: CLOSED ERRATA QA Contact: Messaging QE <messaging-qe-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.2CC: jross, mcressma, tross, zkraus
Target Milestone: 3.2.3   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qpid-cpp-0.34-19 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1399416 1401849 1695298 (view as bug list) Environment:
Last Closed: 2016-12-07 18:58:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1399416, 1401849, 1695298    

Description Pavel Moravec 2016-11-17 21:15:03 UTC
Description of problem:
Having dynamic route from A->B, subscribing one consumer to a queue + binding it via x-bindings to "the" exchange, A gets created the same binding. Unsubscribing the consumer and deleting the queue, A removes the binding. So far so good.

But having _two_ such consumers of the same queue in parallel and stopping them and deleting the queue, the A broker does not delete the binding.


Version-Release number of selected component (if applicable):
0.34-17 (quite probably also 0.18-39 / everywhere)


How reproducible:
100%


Steps to Reproduce:

# start 2 brokers, A = port 5672, B = port 5673
pkill -9 qpidd
rm qpidd.*.log

run_qpidd() {
    port=$1
    rm -rf _${port}
    mkdir _${port}
    nohup qpidd --port=$port --store-dir=_${port} --log-to-file=qpidd.$port.log --data-dir=_${port} --auth=no --log-to-stderr=no "$@" > /dev/null 2>&1 &
    sleep 1
}

run_qpidd 5672
run_qpidd 5673

# create the dynamic route
qpid-route dynamic add localhost:5673 localhost:5672 amq.direct

# have 2 consumers:
for i in 1 2; do 
    qpid-receive -b localhost:5673 -a "q; {create:always, delete:always, node:{ x-declare:{auto-delete:True}, x-bindings:[{exchange:'amq.direct', queue:'q', key:'key'}] }}" -f &
done

# check the binding on A exists:
qpid-config exchanges -r -b localhost:5672 amq.direct

# stop the consumers that will delete the autodel queue and binding on B
killall qpid-receive

# check no binding on B
qpid-config exchanges -r -b localhost:5673 amq.direct

# but see the binding on A survived!
qpid-config exchanges -r -b localhost:5672 amq.direct


Actual results:
First "qpid-config exchanges" shows one binding like:

Exchange 'amq.direct' (direct)
    bind [key] => qpid.bridge_queue__qpid.tcp:localhost:5672!amq.direct!amq.direct!_8ea2062a-b200-4639-877d-dc3e799a9d61 {u'qpid.replicate': u'none', u'qpid.fed.origin': u'8ea2062a-b200-4639-877d-dc3e799a9d61', u'qpid.fed.op': u'', u'qpid.fed.tags': u'8ea2062a-b200-4639-877d-dc3e799a9d61'}

Second shows nothing

Third shows the same like first


Expected results:
Third to show nothing


Additional info:

Comment 1 Pavel Moravec 2016-11-17 21:24:28 UTC
Btw. confirmed also on 0.18-39.

Comment 2 Pavel Moravec 2016-11-18 08:35:49 UTC
Technically, creating and deleting the bindings on source broker is triggered / requested by destination broker sending exchange.bind on the fed.link with parameters:

Queue: (the link queue on src.broker)
Exchange: the exchange to bind
Binding-key: the binding key
Arguments:
..
qpid.fed.op: empty for new binding, "U" for unbind

For the scenario with 2 consumers, the exchange.bind(qpid.fed.op="U") was not sent by destination broker.

Comment 3 Pavel Moravec 2016-11-18 10:37:47 UTC
Technical cause:

qpid/broker/Exchange.h has FedBinding class taking care of deciding when to call the exchange.bind to propagate the bindings to src.broker.

Adding some debug to:

        /** Returns true if propagation is needed. */
        bool addOrigin(const std::string& queueName, const std::string& origin) {
            QPID_LOG(notice, "PavelM: ADDOrigin, queueName=" << queueName << ", origin=" << origin << ", localBindings=" << localBindings << ", this=" << this);
            if (origin.empty()) {
                localBindings++;
                return localBindings == 1;
            }
            fedBindings[queueName].insert(origin);
            return true;
        }

shows the addOrigin method is called also for the 2nd consumer subscribing to the queue (and attempting to create the binding that is already present). Backtrace:

(gdb) bt
#0  qpid::broker::Exchange::FedBinding::addOrigin (this=0x7f2818003110, queueName="q", origin="") at /root/qpid-cpp-upstream/qpid-cpp/src/qpid/broker/Exchange.h:138
#1  0x00007f2825d0500b in qpid::broker::DirectExchange::bind (this=0xde0710, queue=..., routingKey="key", args=<value optimized out>)
    at /root/qpid-cpp-upstream/qpid-cpp/src/qpid/broker/DirectExchange.cpp:88
#2  0x00007f2825cc8bae in qpid::broker::Queue::bind (this=0x7f281800a570, exchange=..., key="key", arguments=...) at /root/qpid-cpp-upstream/qpid-cpp/src/qpid/broker/Queue.cpp:1601
#3  0x00007f2825c9e2fe in qpid::broker::Broker::bind (this=0xdd8da0, queueName="q", exchangeName="amq.direct", key="key", arguments=..., owner=0xdefd68, userId="anonymous", connectionId=
    "qpid.10.34.1.151:5673-10.34.1.149:42330") at /root/qpid-cpp-upstream/qpid-cpp/src/qpid/broker/Broker.cpp:1607
#4  0x00007f2825d7c1c2 in qpid::broker::SessionAdapter::ExchangeHandlerImpl::bind (this=0xdeff58, queueName="q", exchangeName="amq.direct", routingKey="key", arguments=...)
    at /root/qpid-cpp-upstream/qpid-cpp/src/qpid/broker/SessionAdapter.cpp:151
#5  0x00007f2825700653 in invoke<qpid::framing::AMQP_ServerOperations::ExchangeHandler> (this=<value optimized out>, body=<value optimized out>)
    at /root/qpid-cpp-upstream/qpid-cpp/BLD/src/qpid/framing/ExchangeBindBody.h:92
#6  qpid::framing::AMQP_ServerOperations::ExchangeHandler::Invoker::visit (this=<value optimized out>, body=<value optimized out>)
    at /root/qpid-cpp-upstream/qpid-cpp/BLD/src/qpid/framing/ServerInvoker.cpp:642
#7  0x00007f2825702391 in qpid::framing::AMQP_ServerOperations::Invoker::visit (this=0x7ffe5ada87b0, body=...) at /root/qpid-cpp-upstream/qpid-cpp/BLD/src/qpid/framing/ServerInvoker.cpp:353


I think the addOrigin should be called _only_ when a new binding is really created. Since delOrigin is called when the binding is really being destroyed, i.e. just once, and never else. That causes FedBindings::localBindings > 0 after the binding is removed on dest.broker, thus this broker does not initiate unbind on src.broker.

So the backtrace above IMHO should not happen - somewhere to the methods on that bt snippet, we have to add a condition "do this _only_ when the binding was really created".

Comment 4 Pavel Moravec 2016-11-18 10:57:11 UTC
(issue appears also in upstream for that I propose this patch)

Potential patch:

diff --git a/src/qpid/broker/DirectExchange.cpp b/src/qpid/broker/DirectExchange.cpp
index be66bc2..a8d52b8 100644
--- a/src/qpid/broker/DirectExchange.cpp
+++ b/src/qpid/broker/DirectExchange.cpp
@@ -85,7 +85,8 @@ bool DirectExchange::bind(Queue::shared_ptr queue, const string& routingKey, con
             }
         } else {
             // queue already present - still need to track fedOrigin
-            bk.fedBinding.addOrigin(queue->getName(), fedOrigin);
+            if (!fedOrigin.empty())
+                bk.fedBinding.addOrigin(queue->getName(), fedOrigin);
             return false;
         }
     } else if (fedOp == fedOpUnbind) {

(and analogously fix other exchange types in FanoutExchange.cpp, TopicExchange.cpp etc)

Reasoning:
- see previous comment
- exactly here is the branch "binding requested to add but already present"
- I dont understand "still need to track fedOrigin" but I guess it is applicable for nonempty fedOrigin only - hence the condition

Ted, could you pls. review the patch?

Comment 5 Ted Ross 2016-11-18 14:50:05 UTC
I think the diagnosis is(In reply to Pavel Moravec from comment #4)
> (issue appears also in upstream for that I propose this patch)
> 
> Potential patch:
> 
> diff --git a/src/qpid/broker/DirectExchange.cpp
> b/src/qpid/broker/DirectExchange.cpp
> index be66bc2..a8d52b8 100644
> --- a/src/qpid/broker/DirectExchange.cpp
> +++ b/src/qpid/broker/DirectExchange.cpp
> @@ -85,7 +85,8 @@ bool DirectExchange::bind(Queue::shared_ptr queue, const
> string& routingKey, con
>              }
>          } else {
>              // queue already present - still need to track fedOrigin
> -            bk.fedBinding.addOrigin(queue->getName(), fedOrigin);
> +            if (!fedOrigin.empty())
> +                bk.fedBinding.addOrigin(queue->getName(), fedOrigin);
>              return false;
>          }
>      } else if (fedOp == fedOpUnbind) {
> 
> (and analogously fix other exchange types in FanoutExchange.cpp,
> TopicExchange.cpp etc)
> 
> Reasoning:
> - see previous comment
> - exactly here is the branch "binding requested to add but already present"
> - I dont understand "still need to track fedOrigin" but I guess it is
> applicable for nonempty fedOrigin only - hence the condition
> 
> Ted, could you pls. review the patch?

I think the diagnosis is correct but the patch is incorrect.

Comment 6 Ted Ross 2016-11-18 22:26:52 UTC
On further inspection, I recant my previous assertion that the patch is incorrect.  The patch is merely incomplete.

The change to TopicExchange is good, I believe.  This same change also needs to be applied to DirectExchange and FanoutExchange.

Comment 7 Pavel Moravec 2016-11-20 20:36:12 UTC
For QE:

- alternative scenario worth to test as well: have same setup but 2 sequential consumers; 1st consumer to kill abruptly such that queue it is subscribed to is _not_ deleted. Even then subscribe identical 2nd consumer and let it unsubscribe and delete the queue properly. Example bash code:

pkill -9 qpidd
rm qpidd.*.log

run_qpidd() {
    port=$1
    rm -rf _${port}
    mkdir _${port}
    nohup qpidd --port=$port --log-to-file=$PWD/qpidd.$port.log --data-dir=$PWD/_${port} --auth=no --log-to-stderr=no "$@" > /dev/null 2>&1 &
    sleep 1
}

#start 2 brokers, 20000 is src, 20001 is dst broker
for i in $(seq 20000 20001); do
        run_qpidd $i
done

# create dynamic federation
qpid-route dynamic add localhost:20001 localhost:20000 amq.fanout

# start and stop with Ctrl+C (even) the consumer
drain "btest ; {create: always,  delete: always, link:{x-bindings:[{exchange:'amq.fanout', queue:'btest', key:'btest.1'}]} }" -b localhost:20001 -f &
pid=$!
sleep 1
kill $pid

# start consumer, stop it after 1s timeout of no message
drain "btest ; {create: always,  delete: always, link:{x-bindings:[{exchange:'amq.fanout', queue:'btest', key:'btest.1'}]} }" -b localhost:20001 --timeout 1

# now, localhost:20001 doesnt have the btest queue but localhost:20000 has the binding from amq.fanout exchange :-/
echo
qpid-stat -q -b localhost:20001
echo
qpid-config exchanges -r -b localhost:20000 amq.fanout

Comment 8 Pavel Moravec 2016-11-20 20:37:24 UTC
For QE 2:

- worth checking the scenarios for _various_ exchanges. Since the code change will quite probably needs to be applied to individual types of exchanges (fanout, direct, topic,..) so it makes sense to verify no one is forgotten

Comment 12 Zdenek Kraus 2016-12-06 09:09:12 UTC
We've found a problem still exists when using xml exchange, this problem is tracked by Bug 1401849.

Comment 13 Zdenek Kraus 2016-12-06 13:51:51 UTC
This bug was tested on RHEL 6.8 i686 and x86_64, with following packages:

qpid-cpp-client-0.34-19.el6.i686
qpid-cpp-client-devel-0.34-19.el6.i686
qpid-cpp-client-devel-docs-0.34-19.el6.noarch
qpid-cpp-client-rdma-0.34-19.el6.i686
qpid-cpp-debuginfo-0.34-19.el6.i686
qpid-cpp-server-0.34-19.el6.i686
qpid-cpp-server-devel-0.34-19.el6.i686
qpid-cpp-server-ha-0.34-19.el6.i686
qpid-cpp-server-linearstore-0.34-19.el6.i686
qpid-cpp-server-rdma-0.34-19.el6.i686
qpid-cpp-server-xml-0.34-19.el6.i686

qpid-cpp-client-0.34-19.el6.x86_64
qpid-cpp-client-devel-0.34-19.el6.x86_64
qpid-cpp-client-devel-docs-0.34-19.el6.noarch
qpid-cpp-client-rdma-0.34-19.el6.x86_64
qpid-cpp-debuginfo-0.34-19.el6.x86_64
qpid-cpp-server-0.34-19.el6.x86_64
qpid-cpp-server-devel-0.34-19.el6.x86_64
qpid-cpp-server-ha-0.34-19.el6.x86_64
qpid-cpp-server-linearstore-0.34-19.el6.x86_64
qpid-cpp-server-rdma-0.34-19.el6.x86_64
qpid-cpp-server-xml-0.34-19.el6.x86_64

There is one issue with this fix, tracked by Bug 1401849, and as per agreement, this case will be fixed separately.

Otherwise, fix work as expected.

-> VERIFIED

Comment 15 errata-xmlrpc 2016-12-07 18:58:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2921.html