Description of problem:
qdrouterd on the capsule complains about a broken connection to the Satellite, causing goferd on clients to log qd:no-route-to-dest errors. Restarting qdrouterd on the capsule temporarily fixes the issue.

Version-Release number of selected component (if applicable):
Satellite 6.2.7

How reproducible:
Sometimes

Steps to Reproduce:
1. Set up a Satellite with an external/isolated Capsule

Actual results:

Capsule:
capsule qdrouterd[19369]: read: Connection reset by peer
capsule qdrouterd[19369]: send: Broken pipe

Client:
[ERROR][pulp.agent.73ca16e7-c3e4-4afe-a4f2-82f645229495] gofer.messaging.adapter.proton.reliability:47 - receiver eb42b639-b85b-4ec9-8377-d777ab39c26c from pulp.agent.73ca16e7-c3e4-4afe-a4f2-82f645229495 closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

Expected results:
Messages are able to be routed from clients to the Satellite
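For triage, it can help to quantify how often each client hits the error. A minimal sketch (the sample log lines and the awk field position are assumptions based on the messages quoted in this report; against a real system, feed it /var/log/messages instead of the heredoc):

```shell
# Count qd:no-route-to-dest occurrences per reporting host.
# In production: grep 'qd:no-route-to-dest' /var/log/messages | awk '{print $4}' | sort | uniq -c | sort -rn
grep 'qd:no-route-to-dest' <<'EOF' | awk '{print $4}' | sort | uniq -c | sort -rn
Mar 15 18:29:07 content1-0 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender b5617a8f to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
Mar 15 18:29:12 content1-0 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender c0ffee01 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
Mar 15 18:30:01 content2-0 goferd: [INFO][worker-0] gofer.messaging.adapter.connect:28 - connecting
EOF
```

A sudden jump in a host's count is a quick signal that its capsule's qdrouterd has lost the inter-router link again.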
I am also hitting this, not only on my capsules but on content hosts connected to those capsules. It is filling up the logs.

Mar 15 18:29:07 content1-0 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender b5617a8f-c1db-449d-b9a2-f0055056f23b to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
I can confirm that Andrew's solution above fixes the issue at least temporarily.
(In reply to Charles Birk from comment #3)
> I can confirm that Andrew's solution above fixes the issue at least
> temporarily.

You mean:

> Restarting qdrouterd on the capsule temporarily fixes the issue.

Correct?
(In reply to Daniele Palumbo from comment #4)
> (In reply to Charles Birk from comment #3)
> > I can confirm that Andrew's solution above fixes the issue at least
> > temporarily.
>
> You mean:
> > Restarting qdrouterd on the capsule temporarily fixes the issue.
> Correct?

I am having to restart qdrouterd on the Satellite server as well as on every Capsule server that we have (7 total). This only lasts temporarily before we start experiencing the disconnect issue that this bug was first opened with.

Example:

Mar 21 19:48:29 satellite.example.com qdrouterd[26095]: Tue Mar 21 19:48:29 2017 ROUTER_HELLO (info) HELLO peer expired: capsule6.example.com
Mar 21 19:48:42 satellite.example.com qdrouterd[26095]: Tue Mar 21 19:48:42 2017 ROUTER_LS (info) Router Link Lost - link_id=0
Mar 21 20:46:20 satellite.example.com qdrouterd[26095]: Tue Mar 21 20:46:20 2017 ROUTER_HELLO (info) HELLO peer expired: capsule6.example.com
Mar 21 20:46:32 satellite.example.com qdrouterd[26095]: Tue Mar 21 20:46:32 2017 ROUTER_LS (info) Router Link Lost - link_id=0
Mar 21 22:30:59 satellite.example.com qdrouterd[26095]: Tue Mar 21 22:30:59 2017 ROUTER_LS (info) Router Link Lost - link_id=0
Mar 22 01:01:05 satellite.example.com qdrouterd[26095]: Wed Mar 22 01:01:05 2017 ROUTER_HELLO (info) HELLO peer expired: capsule5.example.com
Mar 22 04:09:39 satellite.example.com qdrouterd[26095]: Wed Mar 22 04:09:39 2017 ROUTER_HELLO (info) HELLO peer expired: capsule4.example.com
Mar 22 06:01:34 satellite.example.com qdrouterd[26095]: Wed Mar 22 06:01:34 2017 ROUTER_HELLO (info) HELLO peer expired: devqacapsule.example.com
Mar 22 07:47:53 satellite.example.com qdrouterd[26095]: Wed Mar 22 07:47:53 2017 ROUTER_HELLO (info) HELLO peer expired: devqacapsule.example.com
Mar 22 07:48:05 satellite.example.com qdrouterd[26095]: Wed Mar 22 07:48:05 2017 ROUTER_LS (info) Router Link Lost - link_id=0

On the capsule servers themselves, we see the following (this is specifically capsule6, just as an example):

Mar 21 12:40:38 capsule6.example.com qdrouterd[31488]: Tue Mar 21 12:40:38 2017 ROUTER (info) Configured Remote Link-route-pattern: prefix=qmf. dir=both
Mar 21 12:40:38 capsule6.example.com qdrouterd[31488]: Tue Mar 21 12:40:38 2017 SERVER (info) Operational, 4 Threads Running
Mar 21 18:15:44 capsule6.example.com qdrouterd[31488]: Tue Mar 21 18:15:44 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 18:39:17 capsule6.example.com qdrouterd[31488]: Tue Mar 21 18:39:17 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 18:47:59 capsule6.example.com qdrouterd[31488]: Tue Mar 21 18:47:59 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 19:09:40 capsule6.example.com qdrouterd[31488]: Tue Mar 21 19:09:40 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 19:31:17 capsule6.example.com qdrouterd[31488]: Tue Mar 21 19:31:17 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 19:41:52 capsule6.example.com qdrouterd[31488]: Tue Mar 21 19:41:52 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 19:48:30 capsule6.example.com qdrouterd[31488]: Tue Mar 21 19:48:30 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com
Mar 21 20:46:18 capsule6.example.com qdrouterd[31488]: Tue Mar 21 20:46:18 2017 ROUTER_HELLO (info) HELLO peer expired: satellite.example.com

And on any content host connected to 'capsule6' we are seeing the 'qd:no-route-to-dest' error. Goferd is also causing high iowait on several of our machines.
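To see which inter-router peers are flapping most, the "HELLO peer expired" events can be tallied per peer. A sketch; the parsing assumes the exact 'ROUTER_HELLO (info) HELLO peer expired: <host>' format shown in the logs above, and the heredoc lines are a shortened sample (in production, pipe `journalctl -u qdrouterd` or /var/log/messages through the same pipeline):

```shell
# Tally "HELLO peer expired" events per peer hostname.
sed -n 's/.*HELLO peer expired: //p' <<'EOF' | sort | uniq -c | sort -rn
Mar 21 19:48:29 satellite.example.com qdrouterd[26095]: Tue Mar 21 19:48:29 2017 ROUTER_HELLO (info) HELLO peer expired: capsule6.example.com
Mar 21 20:46:20 satellite.example.com qdrouterd[26095]: Tue Mar 21 20:46:20 2017 ROUTER_HELLO (info) HELLO peer expired: capsule6.example.com
Mar 22 01:01:05 satellite.example.com qdrouterd[26095]: Wed Mar 22 01:01:05 2017 ROUTER_HELLO (info) HELLO peer expired: capsule5.example.com
EOF
```

A peer that dominates this tally (capsule6 in the excerpts above) is the natural place to start looking for network or load problems.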
Guys, do you have some sort of reproducer? Without that, it is hard for engineering to fix it. Have you migrated the Content Host among Capsules, for example? When I migrate a Content Host from Sat to Caps (or vice versa), I get the same:

# set caps to the FQDN of the Satellite/Capsule you migrate to
rpm -ev $(rpm -qa | grep katello-ca-consumer)
rpm -Uvh http://${caps}.gsslab.brq.redhat.com/pub/katello-ca-consumer-latest.noarch.rpm
subscription-manager register --user admin --password redhat --environment Library --force
service goferd restart

Then I get:

Feb 11 19:35:54 pmoravec-rhel72 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender ccf5f18e-5e67-4ca2-a763-2769c2185ea8 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

(I planned to investigate this before filing a BZ but haven't had time so far.)

Does this reproducer match your scenarios?
Re-reading the BZ again, regarding "qdrouterd on the capsule complains about a broken connection to the Satellite, causing goferd on clients to log qd:no-route-to-dest errors.":

If qdrouterd on the Capsule isn't connected to qdrouterd on the Satellite, then the goferd errors are expected. qdrouterd is just a _router_ (like an IP router), inter-connecting goferd on the Content Hosts to qpidd on the Satellite. If either router, or the connection between them, isn't established, there is no route to the destination (qpidd).

So you should check why the broken connections happen:

1) "HELLO peer expired: satellite.example.com" - this often happened in 0.4-20 (or older) versions of qpid-dispatch, see:

https://bugzilla.redhat.com/show_bug.cgi?id=1398377
https://access.redhat.com/solutions/2778111

2) Is the qdrouterd trying to reconnect? It should (quite promptly and frequently, I think).

3) Does that reconnect attempt succeed or fail? If it fails, check e.g. that ulimit -n isn't being reached by either qdrouterd.

4) Once the qdrouterd connections are established, goferd should reconnect again (after some timeout between probes) - automatically.
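For point 3, a quick way to compare a process's open file descriptors against its soft limit (a sketch; here the pid defaults to the current shell so the commands are runnable anywhere - on a real box pass qdrouterd's pid, e.g. `$(pgrep -x qdrouterd)`, as the first argument):

```shell
#!/bin/sh
# Compare open fds vs. the soft "Max open files" limit for a pid.
pid=${1:-$$}                                          # default: this shell
limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)
open=$(ls /proc/$pid/fd | wc -l)
echo "pid=$pid open_fds=$open soft_limit=$limit"
```

If open_fds sits near soft_limit, the reconnect failures from point 3 become very plausible.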
(In reply to Pavel Moravec from comment #7)
> 1) "HELLO peer expired: satellite.example.com" - this often happened in 0.4-20
> (or older) version(s) of qpid-dispatch

On our Satellite servers:

libqpid-dispatch.x86_64        0.4-21.el7sat    @rhel-7-server-satellite-6.2-rpms
qpid-dispatch-router.x86_64    0.4-21.el7sat    @rhel-7-server-satellite-6.2-rpms

On our Capsule servers:

libqpid-dispatch.x86_64        0.4-21.el7sat    @rhel-7-server-satellite-capsule-6.2-rpms
qpid-dispatch-router.x86_64    0.4-21.el7sat    @rhel-7-server-satellite-capsule-6.2-rpms

> 2) is the qdrouter trying to reconnect? It should (quite promptly &
> frequently, I think)

It is, and it's filling /var/log/messages with this:

goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender b5617a8f-c1db-449d-b9a2-f0055056f23b to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

> 3) how that reconnect attempt succeeds/fails? If it fails, check if e.g.
> ulimit -n isnt reached by either qdrouterd

As far as I can tell, this limit is not being hit.

> 4) once qdrouterd connections are established, goferd shall reconnect again
> (after some timeout between probes) - automatically

As I previously stated, restarting qdrouterd on the Satellite server and then on all 7 capsules seems to _temporarily_ resolve the issue.
Thank you Charles for providing this key information. We don't have a reproducer inside engineering at this time, so I really appreciate your participation.

Pavel,

As Charles mentioned, Sat 6.2.8 ships with qpid-dispatch 0.4-21. About the no-route-to-dest goferd error messages you refer to in comment 6: does restarting qdrouterd on all the capsules fix the communication problems temporarily, or is the route between goferd and the satellite permanently broken?
(In reply to Charles Birk from comment #8)
> As I previously stated, restarting qdrouterd on the Satellite server and
> then all 7 capsules seems to _temporarily_ resolve the issue.

Here the real cause is bz1367735: qpidd consumed a lot of memory, so the OOM killer killed qpidd, and the 'qd:no-route-to-dest' errors are the expected consequence of that.

(Now I will check the other 2 cases linked to this BZ.)
Data in case 01775422 are insufficient, asking for more there.
Pavel,

I set up a 6.2.8 environment following the steps in comment 6, but I'm not able to reproduce the repeated no-route-to-dest errors in goferd.

Here's what I did:

1. Install a fresh 6.2.8 Satellite and Capsule with the RHEL 7 repos and the Sat 6.2 RHEL 7 Tools repo.
2. Register a RHEL 7 content host to the Satellite, enable both repos on it, and install katello-agent.
3. Ensure the goferd logs on the client and the qdrouterd logs on the Satellite/Capsule look good.
4. On the client:

rpm -ev $(rpm -qa | grep katello-ca-consumer)
rpm -Uvh http://capsule.example.com/pub/katello-ca-consumer-latest.noarch.rpm
subscription-manager register --user admin --password changeme --environment Library --force --auto-attach
systemctl restart goferd

I'm seeing none of the no-route-to-dest errors in goferd, just a successful connection:

Mar 24 15:28:19 sat62-host goferd: [INFO][MainThread] gofer.agent.main:87 - agent started.
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://capsule.example.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] root:490 - connecting to capsule.example.com:5647...
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.messaging.adapter.connect:30 - connected: proton+amqps://capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] root:512 - connected to capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:131 - closed: proton+amqps://capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://capsule.example.com:5647
Mar 24 15:28:19 sat62-host goferd: [INFO][worker-0] gofer.agent.plugin:368 - plugin:katelloplugin, attached => pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1
Mar 24 15:28:19 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://capsule.example.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Mar 24 15:28:19 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] root:490 - connecting to capsule.example.com:5647...
Mar 24 15:28:20 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://capsule.example.com:5647
Mar 24 15:28:20 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] gofer.messaging.adapter.connect:30 - connected: proton+amqps://capsule.example.com:5647
Mar 24 15:28:20 sat62-host goferd: [INFO][pulp.agent.9aeb80c7-75c0-44b4-bfdd-5da6065b29b1] root:512 - connected to capsule.example.com:5647

Did I miss a step? I still have the environment set up. I'd be really interested to see how your reproducer differs.
(In reply to Andrew Kofink from comment #15)
> Did I miss a step? I still have the environment set up. I'd be really
> interested to see how your reproducer differs.

My reproducer is very unreliable. I have succeeded with it just once in the past and once today - today with the above logs & coredump.

Today I know I froze qdrouterd on the Satellite for a while when I was checking the process via gdb - that triggered the "hello peer expired" and "link lost" messages (and this part is trivially reproducible, e.g. via):

pid=$(pgrep qdrouterd)
echo "$(date): freezing"
kill -s SIGSTOP $pid
sleep $((RANDOM%10+10))
echo "$(date): unfreezing"
kill -s SIGCONT $pid

Now I have left the above running in a loop, together with "migrating" the goferd to connect to the Satellite / Capsule, but without success. Simply put, we don't have a reliable reproducer :(
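The same freeze/unfreeze technique can be exercised safely against a throwaway process first, to confirm the SIGSTOP/SIGCONT mechanics before aiming it at qdrouterd. A sketch (the target here is a dummy `sleep`, not qdrouterd, and the 1-second settles are arbitrary; substitute `pgrep qdrouterd` on a real box):

```shell
#!/bin/sh
# Demonstrate the SIGSTOP/SIGCONT freeze cycle on a harmless process.
sleep 60 &                                   # stand-in for qdrouterd
pid=$!
kill -s STOP "$pid"
sleep 1                                      # let the state change settle
frozen=$(awk '{print $3}' /proc/$pid/stat)   # 'T' = stopped
kill -s CONT "$pid"
sleep 1
resumed=$(awk '{print $3}' /proc/$pid/stat)  # back to 'S' (sleeping)
kill "$pid"
echo "frozen=$frozen resumed=$resumed"
```

Freezing qdrouterd like this for longer than the HELLO timeout is what makes the peer expire, which matches the log pattern seen above.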
I think this sequence of qpidd(S) and qdrouterd(S) restarts led to the qdrouterd(C) misbehaviour:

[root@pmoravec-sat62-rhel7 ~]# ./random_restart_qpidd_qdrouterd.sh
Redirecting to /bin/systemctl start qpidd.service
Redirecting to /bin/systemctl start qdrouterd.service
Mon Mar 27 22:43:38 CEST 2017: everyting started
Mon Mar 27 22:43:38 CEST 2017: applying qpidd stop
Redirecting to /bin/systemctl stop qpidd.service
Mon Mar 27 22:43:38 CEST 2017: status is: qpidd=0, qdrouterd=1
Mon Mar 27 22:43:44 CEST 2017: applying qdrouterd stop
Redirecting to /bin/systemctl stop qdrouterd.service
Mon Mar 27 22:43:44 CEST 2017: status is: qpidd=0, qdrouterd=0
Mon Mar 27 22:43:49 CEST 2017: applying qdrouterd start
Redirecting to /bin/systemctl start qdrouterd.service
Mon Mar 27 22:43:49 CEST 2017: status is: qpidd=0, qdrouterd=1
Mon Mar 27 22:43:58 CEST 2017: applying qpidd start
Redirecting to /bin/systemctl start qpidd.service
Mon Mar 27 22:43:58 CEST 2017: status is: qpidd=1, qdrouterd=1
Mon Mar 27 22:44:06 CEST 2017: applying qpidd stop
Redirecting to /bin/systemctl stop qpidd.service
Mon Mar 27 22:44:07 CEST 2017: status is: qpidd=0, qdrouterd=1
Mon Mar 27 22:44:12 CEST 2017: applying qdrouterd stop
Redirecting to /bin/systemctl stop qdrouterd.service
Mon Mar 27 22:44:12 CEST 2017: status is: qpidd=0, qdrouterd=0
Mon Mar 27 22:44:17 CEST 2017: applying qdrouterd start
Redirecting to /bin/systemctl start qdrouterd.service
Mon Mar 27 22:44:17 CEST 2017: status is: qpidd=0, qdrouterd=1
Mon Mar 27 22:44:23 CEST 2017: applying qpidd start
Redirecting to /bin/systemctl start qpidd.service
Mon Mar 27 22:44:23 CEST 2017: status is: qpidd=1, qdrouterd=1
^C[root@pmoravec-sat62-rhel7 ~]#

Reproducer script:

service qpidd start
service qdrouterd start
qpidd=1
qdrouterd=1
echo "$(date): everyting started"
while true; do
    if [ $((RANDOM%2)) -eq 0 ]; then
        serv="qpidd"
        if [ $qpidd -eq 1 ]; then
            qpidd=0
            act="stop"
        else
            qpidd=1
            act="start"
        fi
    else
        serv="qdrouterd"
        if [ $qdrouterd -eq 1 ]; then
            qdrouterd=0
            act="stop"
        else
            qdrouterd=1
            act="start"
        fi
    fi
    echo "$(date): applying $serv $act"
    service $serv $act
    echo "$(date): status is: qpidd=$qpidd, qdrouterd=$qdrouterd"
    sleep 5
    sleep $((RANDOM%5))
done

Anyway, repeating the same steps did _not_ trigger the same bug - it sounds like some race condition affects it.
Is there a way to dump the "routing table" from either qdrouterd? (I think there isn't, due to the recent deliberate drop of management data.) Or how can it be obtained from gdb / a coredump? Or what data would be worth collecting that could help you (enable proton traces? some debugging?)?

Just a thought on where the problem might also be: theoretically, qdrouterd(S) could also be the guilty one, if it does not (due to some error) update qdrouterd(C) that link routing to qpidd is available (again / for the first time).
(In reply to Pavel Moravec from comment #18)
> Just a thought where the problem might also be: theoretically, qdrouterd(S)
> could also be the guilty one, if it does not update (due to some error) the
> qdrouterd(C) that link routing to qpidd is available (again / for the first
> time).

One reproducer for this exists - NOT seen at a customer.

If I artificially restrict ulimit -n for qdrouterd such that it can't open TCP connections _or_ some files it dynamically loads _after_ the TCP and AMQP connection is opened, I can reproduce a situation where:

- qdrouterd(S) complains about "Too many open files" (expected; this does not occur at customers)
- the TCP connection between the routers is established (hence the user _thinks_ everything is inter-connected)
- but the AMQP link isn't established, thus qd:no-route-to-dest is reasonable

So could either of the routers establish the TCP connection but not open the AMQP link?
(In reply to Pavel Moravec from comment #19)
> So could either of the routers establish TCP connection but not open the
> AMQP link?

Can we try doing the following on both routers:

1. Turn on PN_TRACE_FRM=1
2. Turn on trace logging

When the connection between the routers drops, you should see a message like this in both router logs indicating loss of connection:

Closed 127.0.0.1:39328
Tue Mar 28 10:34:13 2017 ROUTER_LS (info) Router Link Lost - link_id=0

When the connection is re-established between the routers, you should see the SENT and RCVD HELLO messages going between the routers after an AMQP connection is established on top of the TCP connection, like this:

Tue Mar 28 10:34:12 2017 ROUTER_MA (trace) RCVD: MAR(id=Router.B area=0 have_seq=0)
Tue Mar 28 10:34:12 2017 ROUTER_MA (trace) SENT: MAU(id=Router.A area=0 mobile_seq=1 add=['Cpulp.', 'Dpulp.'] del=[])

When a router receives an attach with a target address, it looks in its routing table. If the routing table *does* have this target address, it tries to find a connection on which the attach can be forwarded. If it cannot find any relevant connection, it throws the qd:no-route-to-dest error.

Note that the fact that you are seeing qd:no-route-to-dest implicitly means that the router successfully found the target address in its routing table but does not know where to forward the attach, because the connection the attach was supposed to be forwarded on is not available or established. So you can be sure that the router's routing table already has the target address.
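One possible way to enable both traces on a Satellite 6 box (a sketch only - the sysconfig path and the qdrouterd.conf log-section syntax are assumptions from qpid-dispatch 0.4-era deployments, so verify them against your installed configuration before applying, and expect trace logging to be very verbose):

```shell
# 1. Frame-level proton tracing for qdrouterd via environment variable
#    (path is an assumption; adjust to however your unit file sources env vars):
echo 'PN_TRACE_FRM=1' >> /etc/sysconfig/qdrouterd

# 2. Trace logging in qdrouterd's config (syntax assumed for dispatch 0.4):
cat >> /etc/qpid-dispatch/qdrouterd.conf <<'EOF'
log {
    module: DEFAULT
    enable: trace+
    output: /var/log/qdrouterd-trace.log
}
EOF

systemctl restart qdrouterd
```

Remember to remove both settings once the failure has been captured, since trace output grows quickly.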
I found a similar issue while performing n-1 capsule testing with Satellite 6.2.9. I upgraded my Satellite from 6.2.8 -> 6.2.9 while keeping the Capsule at 6.2.8. The Satellite was upgraded successfully. Later, I created a host through the Capsule and I was not able to install a package on that host through the Satellite server. goferd was running; however, I was getting the following error when checking goferd status:

Apr 03 09:40:37 capn1rhel7test.capqe.lab.eng.rdu2.redhat.com goferd[10812]: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender 5093ca70-6bdf-422a-99da-a9e32245...on node')

and the following error was found in /var/log/messages:

Apr 3 09:38:03 capn1rhel7test goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender 825ac9f4-9701-432e-b97c-c882eb8c41f4 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

Please note that this issue appears only for those hosts which were provisioned through the Capsule. And as soon as I restarted the "qdrouterd" service on the Capsule, the issue was gone. I'll try to reproduce this issue.
Created attachment 1268648 [details]
tcpdump of inter-qdrouterd traffic

When qdrouterd(C) was already returning the error, qdrouterd(S) was restarted with the above tcpdump running (just decode port 5646 as AMQP; no SSL). See frame 101 and the 5th message there - qdrouterd(S) tells qdrouterd(C) to add the "pulp" and "qmf" prefixes for link routing.

That suggests to me that qdrouterd(C) was well informed about the connectivity status, but it hasn't updated its routing table/rules accordingly since then - and has still been replying with the "no link" error to clients.

Questions:
- When can qdrouterd(C) ignore such an "add prefix" request?
- Can it ignore it when it thinks there is _another_ link that can be used (but that link is an orphaned record, such that when a client request comes in, the link record is skipped)?
From all the cases of this bug I can find, it seems the capsule version is one z-stream behind the Satellite server version (i.e. Satellite 6.2.8, Capsule 6.2.7). It may be helpful to upgrade the Capsule version to match the Satellite.
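A quick way to flag that z-stream skew is to compare the qpid-dispatch-router package versions from each side. A sketch - the package name is the one quoted in comment 8, and the two version strings here are illustrative placeholders; on real hosts fill them from `rpm -q --qf '%{VERSION}-%{RELEASE}\n' qpid-dispatch-router` run on the Satellite and on the Capsule:

```shell
#!/bin/sh
# Flag when the capsule's qpid-dispatch version lags the satellite's.
sat_ver="0.4-21"     # placeholder: rpm output from the satellite
caps_ver="0.4-20"    # placeholder: rpm output from the capsule
lowest=$(printf '%s\n%s\n' "$sat_ver" "$caps_ver" | sort -V | head -n1)
if [ "$lowest" = "$caps_ver" ] && [ "$caps_ver" != "$sat_ver" ]; then
    echo "capsule ($caps_ver) is behind satellite ($sat_ver) - consider upgrading"
else
    echo "versions aligned"
fi
```

This is only a heuristic, but it makes the "capsule one z-stream behind" pattern noted above easy to spot across a fleet of capsules.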
I can add that our Satellite server is at 6.2.7 and our capsules are as well.
(In reply to Charles Birk from comment #32)
> I can add that our Satellite server is at 6.2.7 and our capsules are as well.

Didn't the error occur roughly during the upgrade from 6.2.6 (or older) to 6.2.7? Any upgrade starts by upgrading the Satellite.

I also reproduced it when upgrading from 6.2.8 to a 6.2.9 QA snapshot - I will debug more and restrict the reproducer to just a standalone qdrouterd.
Created attachment 1272873 [details]
reproduced qdrouterd traces with updated mobile.py

The attached logs are from qdrouterd(C) - the faulty one - which curiously survived the Satellite's upgrade but didn't survive its reboot.

The goferd client was mimicked by this script:

#!/usr/bin/python
from time import sleep
from uuid import uuid4
import datetime

from proton import ConnectionException, Message, Timeout
from proton import SSLDomain, SSLException
from proton.utils import BlockingConnection
from proton.reactor import DynamicNodeProperties

ca_certificate = '/etc/rhsm/ca/katello-server-ca.pem'
client_certificate = '/etc/pki/consumer/bundle.pem'
client_key = None

domain = SSLDomain(SSLDomain.MODE_CLIENT)
domain.set_trusted_ca_db(ca_certificate)
domain.set_credentials(client_certificate, client_key or client_certificate, None)
domain.set_peer_authentication(SSLDomain.VERIFY_PEER)

qmf_snd = None
qmf_rec = None
qmf_conn = None
bounce = True

while True:
    try:
        exception = False
        if qmf_conn is None:
            qmf_conn = BlockingConnection("amqps://pmoravec-caps62-bz1426242.gsslab.brq.redhat.com:5647", ssl_domain=domain, heartbeat=5)
        qmf_snd = qmf_conn.create_sender("qmf.default.direct", name=str(uuid4()))
#        qmf_rec = qmf_conn.create_receiver(None, name=str(uuid4()), dynamic=True, options=DynamicNodeProperties({'x-opt-qd.address': unicode("qmf.default.direct")}))
    except Exception, e:
        print "%s: %s" % (str(datetime.datetime.now()), e)
        exception = True
    finally:
        sleep(5)
        if not exception:
            print "%s: OK" % str(datetime.datetime.now())
        if bounce or exception:
            try:
                if qmf_snd:
                    qmf_snd.close()
                if qmf_rec:
                    qmf_rec.close()
                if qmf_conn:
                    qmf_conn.close()
            except Exception:
                pass
            qmf_conn = None

Script output (relevant part):

..
2017-04-20 09:07:24.867724: OK
2017-04-20 09:07:25.911981: sender e4524b8c-4b7d-40ed-a5f0-dd60a1d532e4 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
..
(then only no-route-to-dest errors)

Curiously, when I modified the script by commenting out the qmf_snd = .. line and uncommenting the qmf_rec = .. line (just to try a _consumer_ rather than a _producer_ on the same address), the script succeeded:

2017-04-20 09:14:57.519501: OK

So it seems just _producer_ link routing is affected (??? how is that even possible?).
A small correction to the timestamps of the client:

..
2017-04-20 09:07:12.863213: OK
2017-04-20 09:07:18.910720: OK
2017-04-20 09:07:24.867724: OK
2017-04-20 09:07:25.911981: sender e4524b8c-4b7d-40ed-a5f0-dd60a1d532e4 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
2017-04-20 09:07:31.412827: sender 2c44770d-f2ad-4f37-94c6-9c6750f361d7 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
2017-04-20 09:07:36.918367: sender 8df3689c-e9fb-4f1f-819f-d90c5299a036 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
..

The timestamps are printed _after_ the 5s sleep, or _immediately_ when an exception occurs. That means that at 09:07:19 the client succeeded and waited 5s until 09:07:24; a second later, another attempt failed. Relevant trace logs from the router are:

Thu Apr 20 09:07:17 2017 MESSAGE (trace) Received Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.1
Thu Apr 20 09:07:17 2017 ROUTER_HELLO (trace) RCVD: HELLO(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492671393 seen=['pmoravec-caps62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:18 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=['pmoravec-sat62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:18 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.2
Thu Apr 20 09:07:18 2017 MESSAGE (trace) Received Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.1
Thu Apr 20 09:07:18 2017 ROUTER_HELLO (trace) RCVD: HELLO(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492671393 seen=['pmoravec-caps62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:18 2017 SERVER (debug) Accepting incoming connection from 10.34.85.74:38078 to 0.0.0.0:5647
Thu Apr 20 09:07:18 2017 SERVER (trace) Configuring SSL on incoming connection from 10.34.85.74:38078 to 0.0.0.0:5647
Thu Apr 20 09:07:19 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=['pmoravec-sat62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:19 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.2
Thu Apr 20 09:07:20 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=['pmoravec-sat62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:20 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.2
Thu Apr 20 09:07:21 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=['pmoravec-sat62-bz1426242.gsslab.brq.redhat.com'])
Thu Apr 20 09:07:21 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\94\00\00\00\08\a1\04seen\d0'} on link qd.internode.2
Thu Apr 20 09:07:22 2017 ROUTER_HELLO (info) HELLO peer expired: pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Thu Apr 20 09:07:22 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:22 2017 ROUTER (trace) Node pmoravec-sat62-bz1426242.gsslab.brq.redhat.com link removed
Thu Apr 20 09:07:22 2017 ROUTER (trace) Entered Router Flux Mode
Thu Apr 20 09:07:22 2017 ROUTER_LS (trace) Local Link State: LS(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 ls_seq=6 peers=[])
Thu Apr 20 09:07:22 2017 ROUTER_LS (trace) Computed next hops: {}
Thu Apr 20 09:07:22 2017 ROUTER_LS (trace) Computed valid origins: {}
Thu Apr 20 09:07:22 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 ls_seq=6 mobile_seq=0)
Thu Apr 20 09:07:22 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:23 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:23 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:24 2017 SERVER (debug) Accepting incoming connection from 10.34.85.74:38080 to 0.0.0.0:5647
Thu Apr 20 09:07:24 2017 SERVER (trace) Configuring SSL on incoming connection from 10.34.85.74:38080 to 0.0.0.0:5647
Thu Apr 20 09:07:24 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:24 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:25 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:25 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:26 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:26 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 ls_seq=6 mobile_seq=0)
Thu Apr 20 09:07:26 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:27 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:27 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:29 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:29 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:30 2017 ROUTER_HELLO (trace) SENT: HELLO(id=pmoravec-caps62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492670292 seen=[])
Thu Apr 20 09:07:30 2017 ROUTER (trace) Exited Router Flux Mode
Thu Apr 20 09:07:30 2017 MESSAGE (trace) Sending Message{to='amqp:/_local/qdhello' body='\d1\00\00\00\\00\00\00\08\a1\04seenE'} on link qd.internode.2
Thu Apr 20 09:07:30 2017 SERVER (debug) Accepting incoming connection from 10.34.85.74:38082 to 0.0.0.0:5647
Thu Apr 20 09:07:30 2017 SERVER (trace) Configuring SSL on incoming connection from 10.34.85.74:38082 to 0.0.0.0:5647

(I *think* the "ROUTER_LS (trace) SENT: RA(id=.." is important here - I noticed it in another case as well)
Simpler reproducer (mimicking a Satellite upgrade by pausing the VM):

- run qdrouterd(S) on a VM
- restart qdrouterd
- immediately after that, pause the VM
- wait a minute or two
- resume the VM
- optionally, restart qdrouterd(S) (to ensure this router isn't in a broken state)
- wait as long as you like, but qdrouterd(C) will bounce any _sender_ client with a qd:no-route-to-dest error - even while the router has the link to the peer router established, destinations mapped, etc.

Curiously, the traces on qdrouterd(C) have:

Thu Apr 20 13:05:32 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492686127 ls_seq=1 mobile_seq=1)
Thu Apr 20 13:05:33 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492686127 ls_seq=3 mobile_seq=1)

but the ls_seq=2 one is missing (and the traces on qdrouterd(S) are likewise missing the SENT entry for ls_seq=2).
Created attachment 1273067 [details]
trace logs with verbose mobile.py and tcpdumps

qdrouterd-S.log = Satellite's router
qdrouterd-C.log = Capsule's router
amqp-caps.cap = tcpdump from Capsule (port 5646 = AMQP)
amqp-sat.cap = tcpdump from Satellite (port 5646 = AMQP)

The VM with the Satellite was frozen from approx. 20:25:10 to 20:27:51, but the Satellite corrected its clock only at 20:28:58 (I deduce).

qdrouterd(S) thought it sent ls_seq=3:

Thu Apr 20 20:25:16 2017 ROUTER_LS (trace) SENT: LSU(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 ls=LS(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 ls_seq=2 peers=['pmoravec-caps62-bz1426242.gsslab.brq.redhat.com']))
Thu Apr 20 20:25:20 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:25:50 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:26:20 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:58 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:58 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=3 mobile_seq=1)
Thu Apr 20 20:28:59 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:28:59 2017 ROUTER_LS (trace) SENT: LSU(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 ls=LS(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 ls_seq=4 peers=['pmoravec-caps62-bz1426242.gsslab.brq.redhat.com']))
Thu Apr 20 20:29:03 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:29:33 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:30:04 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:30:34 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:31:04 2017 ROUTER_LS (trace) SENT: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)

but qdrouterd(C) didn't get it:

Thu Apr 20 20:27:58 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:28 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:57 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:58 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=2 mobile_seq=1)
Thu Apr 20 20:28:59 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:28:59 2017 ROUTER_LS (trace) RCVD: LSU(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 ls=LS(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 ls_seq=4 peers=['pmoravec-caps62-bz1426242.gsslab.brq.redhat.com']))
Thu Apr 20 20:29:03 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:29:33 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:30:03 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:30:34 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)
Thu Apr 20 20:31:04 2017 ROUTER_LS (trace) RCVD: RA(id=pmoravec-sat62-bz1426242.gsslab.brq.redhat.com area=0 inst=1492712695 ls_seq=4 mobile_seq=1)

and neither tcpdump contains that AMQP frame (search for "frame contains 6c:73:5f:73:65:71:55:03"). I guess qdrouterd(S) got confused by its timers adjusting/reacting to the OS time change. Not sure if this is the culprit (i.e. whether an ls_seq must be incremented by exactly one, otherwise it is ignored?), but this is the strangest thing I found in the traces.
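The gap-spotting done by hand above (an ls_seq that jumps from 2 to 4 with nothing in between) can be automated. A small hedged sketch - this is not part of qpid-dispatch; the RA line format is simply taken from the traces in this bug - that scans ROUTER_LS trace lines and reports ls_seq jumps:

```python
import re

# Matches the ls_seq field in ROUTER_LS RA trace lines such as:
#   ... ROUTER_LS (trace) RCVD: RA(id=... inst=... ls_seq=3 mobile_seq=1)
LS_SEQ = re.compile(r"ROUTER_LS \(trace\) (SENT|RCVD): RA\(.*?ls_seq=(\d+)")

def ls_seq_gaps(lines):
    """Return (direction, previous, current) tuples wherever the
    advertised ls_seq jumps by more than one between consecutive
    RA entries in the same direction."""
    last = {}
    gaps = []
    for line in lines:
        m = LS_SEQ.search(line)
        if not m:
            continue
        direction, seq = m.group(1), int(m.group(2))
        prev = last.get(direction)
        if prev is not None and seq > prev + 1:
            gaps.append((direction, prev, seq))
        last[direction] = seq
    return gaps
```

Feeding it the capsule-side trace from the reproducer above would flag the 1 -> 3 jump; whether qdrouterd itself treats such a jump specially is exactly the open question here.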
Under some circumstances, the router does not remove an old router reference from addr->rnodes when getting a new MAU(?). See e.g.:

# grep mapped /var/log/qdrouterd.log | grep pulp
Fri Apr 21 08:49:01 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 08:49:01 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 08:50:34 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 08:52:03 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 08:52:03 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:03:02 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:04:10 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:04:10 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:09:46 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:10:26 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:10:26 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:11:49 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:13:07 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Fri Apr 21 09:13:07
2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:00:23 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:00:26 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:00:26 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:04:33 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:11:20 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:11:20 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:19:26 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:23:18 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:23:18 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:27:12 2017 ROUTER (debug) Remote destination pulp.;class=D unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:27:12 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped 
from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:34:04 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:34:04 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:16:10 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:16:12 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:16:12 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:55 2017 ROUTER (debug) Remote destination pulp.;class=D unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:55 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:57 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:57 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com # split for C/D: # grep mapped /var/log/qdrouterd.log | grep "pulp.;class=C" Fri Apr 21 08:49:01 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 08:50:34 2017 ROUTER (debug) Remote destination 
pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 08:52:03 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:03:02 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:04:10 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:09:46 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:10:26 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:11:49 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:13:07 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:00:23 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:00:26 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:04:33 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:11:20 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:19:26 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:23:18 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router 
pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:27:12 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 12:34:04 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:16:10 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:16:12 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:55 2017 ROUTER (debug) Remote destination pulp.;class=C unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Sat Apr 22 13:33:57 2017 ROUTER (debug) Remote destination pulp.;class=C mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com # # grep mapped /var/log/qdrouterd.log | grep "pulp.;class=D" Fri Apr 21 08:49:01 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 08:52:03 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:04:10 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:10:26 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com Fri Apr 21 09:13:07 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router 
pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:00:26 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:11:20 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:23:18 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:25:52 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:27:12 2017 ROUTER (debug) Remote destination pulp.;class=D unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 12:34:04 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 13:16:12 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 13:32:36 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 13:33:55 2017 ROUTER (debug) Remote destination pulp.;class=D unmapped from router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
Sat Apr 22 13:33:57 2017 ROUTER (debug) Remote destination pulp.;class=D mapped to router pmoravec-sat62-bz1426242.gsslab.brq.redhat.com
#

Notice that every "class=C mapped" is followed (some time later) by a "class=C unmapped", but the same isn't true for class=D. The reason is described in https://issues.apache.org/jira/browse/DISPATCH-749 and explained by the fact that 'C' addresses are mapped at odd positions while 'D' addresses at even ones, and the current "removal" deletes just the odd positions. With the fix from https://github.com/apache/qpid-dispatch/pull/160 applied, the reproducer no longer triggers the endless no-route-to-dest errors \o/ .
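The DISPATCH-749 failure mode is the classic "delete from a list while iterating it" bug. A minimal illustrative sketch in plain Python (not the actual qpid-dispatch code; the rnodes layout and router names are made up for the demo) shows why only alternate entries get unmapped:

```python
def buggy_unmap(rnodes, lost_router):
    # BUGGY: deleting in place while iterating by index.
    # After each del, the following entry shifts into the freed
    # slot and enumerate() steps over it, so only every other
    # matching entry is removed -- with C and D entries
    # interleaved, the class=D entries survive.
    for i, (addr, router) in enumerate(rnodes):
        if router == lost_router:
            del rnodes[i]
    return rnodes

def fixed_unmap(rnodes, lost_router):
    # FIX: rebuild the list instead of mutating it mid-iteration,
    # so every stale entry is dropped.
    rnodes[:] = [(a, r) for (a, r) in rnodes if r != lost_router]
    return rnodes
```

With C and D destinations interleaved, the buggy variant leaves every class=D entry behind - exactly the asymmetry visible in the grep output above.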
Successfully pre-verified this on the 0.4-22 build.

Reproducer steps:

0) setup:
- Satellite on a VM you can control
- external Capsule
- client registered to the Capsule with goferd running
1) on the Satellite: service qdrouterd restart
2) Immediately(*) after that, suspend the VM with the Satellite. "Immediately" means a humanly-acceptable delay, e.g. execute the restart and click Suspend 10 seconds later.
3) wait at least 2 minutes
4) Resume the VM
5) on the client system, restart goferd (just to ensure it will re-try to connect)

Current behaviour: the error below appears forever (with some delays between goferd connection attempts)

Apr 26 08:44:16 pmoravec-rhel72 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender 13e1494a-b3ff-4cde-a15f-47dd22b16c22 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

(optionally, a "receiver" can be affected as well, but in 99% of cases it is the sender)

Expected behaviour: no such error (or at most a very temporary one, in case you restart goferd "too soon", before qdrouterd on the Capsule re-connects to the Satellite and recalculates the routing rules)
Hi Pavel,

does "suspend" also include taking a VMware snapshot that lasts for some time?
(In reply to Daniele Palumbo from comment #41)
> Hi Pavel,
>
> "suspend" would means also take a vmware snapshot that last for some time?

I guess so - anything that makes the Satellite unresponsive both in the qdrouterd process and at the kernel level, such that the Satellite's end of the inter-qdrouterd connection is absolutely silent (though when I tried to mimic this via iptables, I had no luck reproducing it - there must be something more to it..). Anyway, the above reproducer is one way to trigger the relevant code; there can be more ways to trigger the bug, I guess.
*** Hotfix instructions for 6.2.9 and prior versions of 6.2 ***

1) download libqpid-dispatch-0.4-22.el7sat.x86_64.rpm and qpid-dispatch-router-0.4-22.el7sat.x86_64.rpm from this bug

2) copy them to your Satellite and Capsules

3) upgrade the 2 packages on both the Satellite and Capsules:

# yum update ./libqpid-dispatch-0.4-22.el7sat.x86_64.rpm ./qpid-dispatch-router-0.4-22.el7sat.x86_64.rpm

4) restart services:

# katello-service restart

5) Proceed with operations
Created attachment 1274970 [details] hotfix version
Created attachment 1274971 [details] hotfix version
The hotfix worked for me with Satellite 6.2.9 on some hosts hitting this issue, applied on the Satellite and Capsules. Don't forget to run:

# systemctl daemon-reload

due to a change in qdrouterd.service, before restarting katello-service; and restart goferd on the hosts. Thanks
Created attachment 1280909 [details]
[hotfix] qpid-dispatch-tools package

(In reply to Mike McCune from comment #44)
> *** Hotfix instructions for 6.2.9 and prior versions of 6.2 ***
>
> 1) download libqpid-dispatch-0.4-22.el7sat.x86_64.rpm and
> qpid-dispatch-router-0.4-22.el7sat.x86_64.rpm from this bug
>
> 2) copy them to your Satellite and Capsules
>
> 3) upgrade the 2 packages on both the Satellite and Capsules:
>
> # yum update ./libqpid-dispatch-0.4-22.el7sat.x86_64.rpm
> ./qpid-dispatch-router-0.4-22.el7sat.x86_64.rpm
>
> 4) restart services
>
> # katello-service restart
>
> 5) Proceed with operations

There is a dependency on the qpid-dispatch-tools package, which needs to be updated too, if present. Attaching it here.
+1 also affected on 6.2.9 with capsule server.
VERIFIED. @satellite-capsule-6.2.10-3.0.el7sat.noarch libqpid-dispatch-0.4-22.el7sat.x86_64 qpid-dispatch-router-0.4-22.el7sat.x86_64 using manual reproducer described in comment#40 : 5) service goferd restart REPRO: Jun 08 15:33:17 vm1.example.com systemd[1]: Starting Gofer Agent... Jun 08 15:33:17 vm1.example.com goferd[3072]: [INFO][Thread-1] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/demo Jun 08 15:33:17 vm1.example.com goferd[3072]: [WARNING][MainThread] gofer.agent.plugin:639 - plugin:demo, DISABLED Jun 08 15:33:17 vm1.example.com goferd[3072]: [INFO][Thread-2] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/katelloplugin Jun 08 15:33:17 vm1.example.com goferd[3072]: [INFO][Thread-3] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/katelloplugin Jun 08 15:33:17 vm1.example.com goferd[3072]: [INFO][MainThread] gofer.agent.plugin:682 - plugin:katelloplugin loaded using: /usr/lib/gofer/plugins/katelloplugin.py Jun 08 15:33:17 vm1.example.com goferd[3072]: [INFO][MainThread] rhsm.connection:830 - Connection built: host=cap.example.com port=8443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ verify=False Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][MainThread] katelloplugin:178 - Using /etc/rhsm/ca/katello-default-ca.pem as the ca cert for qpid connection Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][MainThread] rhsm.connection:830 - Connection built: host=cap.example.com port=8443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ verify=False Jun 08 15:33:18 vm1.example.com goferd[3072]: Loaded plugins: product-id Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][MainThread] katelloplugin:409 - reporting: {'enabled_repos': {'repos': [{'baseurl': ['https://cap.example.com/pulp/repos/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/os'], 'repositoryid': 'rhel-7-server-rpms'}, {'baseurl': 
['https://cap.example.com/pulp/repos/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/sat-tools/6.2/os'], 'repositoryid': 'rhel-7-server-satellite-tools-6.2-rpms'}]}} Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://cap.example.com:5647 Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://cap.example.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][MainThread] gofer.agent.main:87 - agent started. Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] root:490 - connecting to cap.example.com:5647... Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://cap.example.com:5647 Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] gofer.messaging.adapter.connect:30 - connected: proton+amqps://cap.example.com:5647 Jun 08 15:33:18 vm1.example.com goferd[3072]: [INFO][worker-0] root:512 - connected to cap.example.com:5647 Jun 08 15:33:18 vm1.example.com goferd[3072]: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:47 - sender b7d01549-a41d-4d61-a774-099b6f325a33 to qmf.default.direct closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node') vs. FIX: Jun 08 15:55:47 vm1.example.com systemd[1]: Starting Gofer Agent... 
Jun 08 15:55:47 vm1.example.com goferd[3116]: [INFO][Thread-1] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/demo Jun 08 15:55:47 vm1.example.com goferd[3116]: [WARNING][MainThread] gofer.agent.plugin:639 - plugin:demo, DISABLED Jun 08 15:55:47 vm1.example.com goferd[3116]: [INFO][Thread-2] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/katelloplugin Jun 08 15:55:47 vm1.example.com goferd[3116]: [INFO][Thread-3] gofer.rmi.store:114 - Using: /var/lib/gofer/messaging/pending/katelloplugin Jun 08 15:55:47 vm1.example.com goferd[3116]: [INFO][MainThread] gofer.agent.plugin:682 - plugin:katelloplugin loaded using: /usr/lib/gofer/plugins/katelloplugin.py Jun 08 15:55:48 vm1.example.com goferd[3116]: [INFO][MainThread] rhsm.connection:830 - Connection built: host=cap.example.com port=8443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ verify=False Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][MainThread] katelloplugin:178 - Using /etc/rhsm/ca/katello-default-ca.pem as the ca cert for qpid connection Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][MainThread] rhsm.connection:830 - Connection built: host=cap.example.com port=8443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ verify=False Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][MainThread] katelloplugin:409 - reporting: {'enabled_repos': {'repos': [{'baseurl': ['https://cap.example.com/pulp/repos/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/os'], 'repositoryid': 'rhel-7-server-rpms'}, {'baseurl': ['https://cap.example.com/pulp/repos/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/sat-tools/6.2/os'], 'repositoryid': 'rhel-7-server-satellite-tools-6.2-rpms'}]}} Jun 08 15:56:39 vm1.example.com goferd[3116]: Loaded plugins: product-id Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][worker-0] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://cap.example.com:5647 Jun 08 15:56:39 vm1.example.com 
goferd[3116]: [INFO][worker-0] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://cap.example.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][MainThread] gofer.agent.main:87 - agent started. Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][worker-0] root:490 - connecting to cap.example.com:5647... Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][worker-0] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://cap.example.com:5647 Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][worker-0] gofer.messaging.adapter.connect:30 - connected: proton+amqps://cap.example.com:5647 Jun 08 15:56:39 vm1.example.com goferd[3116]: [INFO][worker-0] root:512 - connected to cap.example.com:5647 Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][worker-0] gofer.messaging.adapter.proton.connection:131 - closed: proton+amqps://cap.example.com:5647 Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://cap.example.com:5647 Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://cap.example.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][worker-0] gofer.agent.plugin:368 - plugin:katelloplugin, attached => pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] root:490 - connecting to cap.example.com:5647... 
Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://cap.example.com:5647 Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] gofer.messaging.adapter.connect:30 - connected: proton+amqps://cap.example.com:5647 Jun 08 15:56:40 vm1.example.com goferd[3116]: [INFO][pulp.agent.e0092f2a-e0a6-4a43-9979-2b0cd784d21d] root:512 - connected to cap.example.com:5647 >>> the error Condition('qd:no-route-to-dest', 'No route to the destination node') no longer appears after the Capsule<->Satellite connection is re-established
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1553