| Summary: | LinkState unable to initialize, invalid values in 'body' | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Jan Hutař <jhutar> |
| Component: | Infrastructure | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.2.0 | CC: | adprice, bbuckingham, bkearney, cduryee, jcallaha, psuriset, tross |
| Target Milestone: | Unspecified | Keywords: | Performance, Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-12-01 15:31:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
This appears to be a dispatch router bug, not related to Satellite's use of dispatch router.
The only info I found on this bug is from #0 and a couple of pastebin examples. Here are all of the "body" fields when this occurs:
body={'ls': {'ls_seq': 9L, 'peers': {'ip-10-1-1-1.us-west-2.compute.internal': 1L}, 'id': 'ip-10-1-1-4.us-west-2.compute.internal', 'area': '0'}, 'ls_seq': 9L, 'area': '0', 'id': 'ip-10-1-1-4.us-west-2.compute.internal', 'instance': 1471730162L}
body={'ls': {'ls_seq': 2L, 'peers': ['p3-dev-capsule.example.com', 'centos7-capsule-p42-nightly.example.com'], 'id': 'dev-p42.example.com', 'area': '0'}, 'ls_seq': 2L, 'area': '0', 'id': 'dev-p42.example.com', 'instance': 1472591533L}
body={'ls': {'ls_seq': 1L, 'peers': {'repo.xxxxxxxxxx.com': 1L}, 'id': 'xxxxxx.xxxxxx.com', 'area': '0'}, 'ls_seq': 1L, 'area': '0', 'id': 'xxxxxx.xxxx.com', 'instance': 1470666984L}
Note that in examples 1 and 3, 'peers' is incorrectly set to a dict instead of a list. However, in example 2, it's set correctly, but for some reason getMandatory() is checking if it's a dict:
Aug 31 10:16:21 centos7-capsule-p42-nightly qdrouterd: Exception: Protocol field has wrong data type: 'peers' type=<type 'list'> expected=<type 'dict'>
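To make the type check concrete, here is a minimal sketch of a getMandatory()-style validator. It is reconstructed only from the call `getMandatory(body, 'peers', list)` and the exception text in the traceback; it is not the actual qpid-dispatch source. The name `get_mandatory`, the helper `check`, and the missing-key message are hypothetical. The Python 2 long suffixes (2L, 9L) are written as plain ints, and Python 3 prints `<class 'dict'>` where the logs show `<type 'dict'>`.

```python
def get_mandatory(data, key, cls=None):
    """Return data[key]; raise if the key is absent or has the wrong type."""
    if key not in data:
        # Hypothetical message: the real wording is not shown in this report.
        raise Exception("Mandatory protocol field missing: '%s'" % key)
    value = data[key]
    if cls is not None and not isinstance(value, cls):
        # Same format string as the line shown in the traceback.
        raise Exception(
            "Protocol field has wrong data type: '%s' type=%r expected=%r"
            % (key, value.__class__, cls)
        )
    return value


# 'ls' sections taken from example 2 (list form) and example 1 (dict form).
ls_list = {
    "ls_seq": 2,
    "peers": ["p3-dev-capsule.example.com",
              "centos7-capsule-p42-nightly.example.com"],
    "id": "dev-p42.example.com",
    "area": "0",
}
ls_dict = {
    "ls_seq": 9,
    "peers": {"ip-10-1-1-1.us-west-2.compute.internal": 1},
    "id": "ip-10-1-1-4.us-west-2.compute.internal",
    "area": "0",
}


def check(ls, expected):
    """Return 'ok' or the exception text for a given expected type."""
    try:
        get_mandatory(ls, "peers", expected)
        return "ok"
    except Exception as exc:
        return str(exc)


print(check(ls_list, list))  # list payload, caller expects a list
print(check(ls_list, dict))  # list payload, caller expects a dict
print(check(ls_dict, list))  # dict payload, caller expects a list
```

In this sketch the expected type is whatever the caller passes as the third argument, so two logs that disagree about the expected type of 'peers' would point at two different callers rather than at one check that changes its mind.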
I took a look at the dispatch router code, but I'm not familiar enough with it to find the bug. Could there be a problem in *qd_field_to_py()? I am not sure that would explain the issue in example 2, however.
This issue occurs because the network contains a mix of dispatch 0.4 and 0.6.x routers. In 0.6.0, configurable link-cost was added as a feature in Qpid Dispatch Router, which resulted in a protocol incompatibility between 0.6.0 and earlier versions. The bug is that Dispatch Router does not gracefully handle this condition. The workaround is to ensure that the deployment uses the same major-version packages for qpid-dispatch-router.
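As an illustration of what handling this condition gracefully could look like, the sketch below accepts both shapes of 'peers' seen in the examples and normalizes them to one form. This is not how Dispatch Router behaves (the bug was closed WONTFIX and the workaround above is the supported path). The function `normalize_peers` and the constant `DEFAULT_COST` are hypothetical, and treating the dict values as link costs is an assumption drawn from the link-cost explanation and the `1L` values in the logs.

```python
DEFAULT_COST = 1  # assumption: cost used when a peer list carries no costs


def normalize_peers(peers):
    """Return peers as a {router_id: cost} dict, whichever shape was sent."""
    if isinstance(peers, dict):
        # Dict shape, as in examples 1 and 3: assumed to be id -> cost.
        return {str(name): int(cost) for name, cost in peers.items()}
    if isinstance(peers, (list, tuple)):
        # List shape, as in example 2: ids only, so apply the default cost.
        return {str(name): DEFAULT_COST for name in peers}
    raise TypeError("Unsupported 'peers' type: %r" % type(peers))


# Shapes taken from the 'body' examples in this report.
print(normalize_peers({"repo.xxxxxxxxxx.com": 1}))
print(normalize_peers(["p3-dev-capsule.example.com",
                       "centos7-capsule-p42-nightly.example.com"]))
```

Normalizing on receipt only covers one direction. An older router would still reject a dict sent by a newer one, which is why keeping qpid-dispatch-router versions aligned remains the practical fix.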
Description of problem:
Lots of tracebacks in `journalctl -f`:
qdrouterd[...]: Exception: Protocol field has wrong data type: 'peers' type=<type 'dict'> expected=<type 'list'>

Version-Release number of selected component (if applicable):
Satellite 6.2 @ RHEL 7.2

How reproducible:
rarely, but persistent when it happens

Steps to Reproduce:
1. Do not know. We have Sat with 2 capsules and 10k clients with katello-agent and we have scheduled errata apply on all of them and when investigating failures, we have noticed this

Actual results:
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: Mon Aug 22 05:49:06 2016 ROUTER (error) Control message error: opcode=LSU body={'ls': {'ls_seq': 9L, 'peers': {'ip-10-1-1-1.us-west-2.compute.internal': 1L}, 'id': 'ip-10-1-1-4.us-west-2.compute.internal', 'area': '0'}, 'ls_seq': 9L, 'area': '0', 'id': 'ip-10-1-1-4.us-west-2.compute.internal', 'instance': 1471730162L}
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: Traceback (most recent call last):
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: File "/usr/lib/qpid-dispatch/python/qpid_dispatch_internal/router/engine.py", line 143, in handleControlMessage
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: msg = MessageLSU(body)
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: File "/usr/lib/qpid-dispatch/python/qpid_dispatch_internal/router/data.py", line 176, in __init__
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: self.ls = LinkState(getMandatory(body, 'ls', dict))
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: File "/usr/lib/qpid-dispatch/python/qpid_dispatch_internal/router/data.py", line 55, in __init__
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: self.peers = getMandatory(body, 'peers', list)
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: File "/usr/lib/qpid-dispatch/python/qpid_dispatch_internal/router/data.py", line 27, in getMandatory
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: raise Exception("Protocol field has wrong data type: '%s' type=%r expected=%r" % (key, value.__class__, cls))
Aug 22 05:49:06 ip-10-1-1-1.us-west-2.compute.internal qdrouterd[20283]: Exception: Protocol field has wrong data type: 'peers' type=<type 'dict'> expected=<type 'list'>

Expected results:
No traceback

Additional info:
I have modified .../data.py to show me actual data as well, and it showed this:
Exception: Protocol field has wrong data type: 'peers' type=<type 'dict'> expected=<type 'list'> value={'ip-10-1-1-1.us-west-2.compute.internal': 1L}