Bug 472242

Summary: Cluster race conditions turned up by failover_test
Product: Red Hat Enterprise MRG Reporter: Alan Conway <aconway>
Component: qpid-cppAssignee: Alan Conway <aconway>
Status: CLOSED DUPLICATE QA Contact: Kim van der Riet <kim.vdriet>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 1.1CC: freznice
Target Milestone: 1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-12-09 13:53:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alan Conway 2008-11-19 15:46:30 UTC
Description of problem:

Running failover_test has thrown up some errors that indicate race conditions in cluster synchronization:

These look like inconsistent session state:

(09:18:58 AM) mick: 2008-nov-19 01:09:48 error Execution exception: invalid-argument: 6ef013c4-7cfa-4095-927a-a27022e4212c@guest@QPID: confirmed < (33744+0) but only sent < (33743+0) (qpid/SessionState.cpp:150)
(09:19:16 AM) mick:  2008-nov-19 05:57:08 error Connection exception: framing-error: Unexpected command start frame. (qpid/SessionState.cpp:57)

These are harmless but should be suppressed:
(09:19:49 AM) mick: 2008-nov-19 08:49:47 error amq.failovera29bae0f-a2a7-4d80-a24a-6c1ee1ae3bd9@guest error: Connection closed
(09:19:49 AM) mick: 2008-nov-19 08:49:47 error amq.failoverfdb88c35-00e1-43eb-8baf-98f97d88a96a@guest error: Connection closed

Not sure:
(09:19:50 AM) mick: 2008-nov-19 08:49:47 error 8e6c843f-2289-4493-ae64-d4bcadf5f544@guest error: Connection closed

Bug in  DumpClient or catch-up logic:
(09:19:50 AM) mick: 2008-nov-19 08:49:52 critical 6401a8c0:17552(CATCHUP) error in cluster delivered: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
(09:19:50 AM) mick: Invalid argument

How reproducible:

Sporadic

Steps to Reproduce:

Run failover_soak test.

Comment 1 Alan Conway 2008-11-26 21:29:33 UTC
Fixed in r720979, patch from mgoulish.

Comment 3 Frantisek Reznicek 2008-12-09 10:28:43 UTC
After long term running failover_soak test on three RHEL 5.2 boxes. (two x86_64, one i386) I can still see following issues: (svn co rev 724418)

  soak-7: 2008-dec-08 15:18:28 error Execution exception: invalid-argument: e55ba4e1-f17f-47a6-9677-59ce07d94a06@guest@QPID: confirmed < (21059+0) but only sent < (21058+0) (qpid/SessionState.cpp:150)

  soak-13: 2008-dec-08 13:45:01 critical 10.16.64.197:18551(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)

  soak-3: 2008-dec-08 18:45:31 critical 10.16.64.139:28381(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)

Detailed report stored in
mrg3.lab.bos.redhat.com:/root/failover_soak_failures_081209.tar.bz2

->ASSIGNED [NEEDINFO]


Additional info:

Short list of issues (non-verbose):
soak-7: 2008-dec-08 13:40:34 critical 10.16.64.155:19195(CATCHUP) error in cluster delivery: framing-error: Frame size too small 3 (qpid/framing/AMQFrame.cpp:88)
soak-7: 2008-dec-08 13:40:34 critical Multicast failure: Cannot mcast to CPG group soakTestCluster_826: access denied.
soak-13: 2008-dec-08 14:21:01 critical 10.16.64.155:527(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-10: 2008-dec-08 15:01:45 critical 10.16.64.155:15069(DUMPEE) catch-up connection closed prematurely 10.16.64.155:15069-0xb4240fb0(local,catchup)
soak-9: 2008-dec-08 15:19:03 critical 10.16.64.155:21238(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-9: 2008-dec-08 15:19:03 critical Multicast failure: Cannot mcast to CPG group soakTestCluster_633: access denied.
soak-10: 2008-dec-08 15:19:04 critical 10.16.64.155:21268(DUMPEE) catch-up connection closed prematurely 10.16.64.155:21268-0x84a3e08(local,catchup)
soak-4: 2008-dec-08 15:25:37 error Execution exception: invalid-argument: cb497a45-5c5f-488f-a257-5f3412f6fbed@guest@QPID: confirmed < (418+0) but only sent < (417+0) (qpid/SessionState.cpp:150)
soak-5: 2008-dec-08 15:25:47 error Execution exception: invalid-argument: cb497a45-5c5f-488f-a257-5f3412f6fbed@guest@QPID: confirmed < (48074+0) but only sent < (625+0) (qpid/SessionState.cpp:150)
soak-7: 2008-dec-08 15:26:03 critical 10.16.64.155:23684(DUMPEE) catch-up connection closed prematurely 10.16.64.155:23684-0x8800ed0(local,catchup)
soak-18: 2008-dec-08 16:18:00 critical 10.16.64.155:9417(DUMPEE) catch-up connection closed prematurely 10.16.64.155:9417-0x9ddee20(local,catchup)
soak-9: 2008-dec-08 16:22:10 critical 10.16.64.155:10702(DUMPEE) catch-up connection closed prematurely 10.16.64.155:10702-0x9bd8e68(local,catchup)
soak-5: 2008-dec-08 16:30:38 critical 10.16.64.155:13749(CATCHUP) error in cluster delivery: framing-error: Frame size too small 1 (qpid/framing/AMQFrame.cpp:88)
soak-3: 2008-dec-08 16:31:13 critical 10.16.64.155:14031(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-6: 2008-dec-08 17:15:21 critical 10.16.64.155:29588(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-7: 2008-dec-08 17:15:25 critical 10.16.64.155:29618(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-6: 2008-dec-08 17:24:10 critical 10.16.64.155:388(READY) error in cluster delivery: framing-error: Frame size too small 1 (qpid/framing/AMQFrame.cpp:88)
soak-7: 2008-dec-08 17:26:29 critical 10.16.64.155:1229(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-8: 2008-dec-08 17:53:18 critical 10.16.64.155:10900(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-9: 2008-dec-08 17:59:17 critical 10.16.64.155:13118(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-9: 2008-dec-08 17:59:17 critical Multicast failure: Cannot mcast to CPG group soakTestCluster_511: access denied.
soak-3: 2008-dec-08 18:01:35 critical 10.16.64.155:14062(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-9: 2008-dec-08 18:42:13 critical 10.16.64.155:28046(DUMPEE) catch-up connection closed prematurely 10.16.64.155:28046-0x9623ee0(local,catchup)
soak-10: 2008-dec-08 18:43:14 critical 10.16.64.155:28443(DUMPEE) catch-up connection closed prematurely 10.16.64.155:28443-0x924de50(local,catchup)
soak-9: 2008-dec-08 19:51:09 critical 10.16.64.155:19615(DUMPEE) catch-up connection closed prematurely 10.16.64.155:19615-0x81afe90(local,catchup)
soak-7: 2008-dec-08 20:03:06 critical 10.16.64.155:23831(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-7: 2008-dec-08 20:14:15 critical 10.16.64.155:27728(CATCHUP) error in cluster delivery: Invalid class id 55 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-9: 2008-dec-08 20:22:24 critical 10.16.64.155:30865(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-11: 2008-dec-08 20:23:31 critical 10.16.64.155:31296(DUMPEE) catch-up connection closed prematurely 10.16.64.155:31296-0x9d16ef0(local,catchup)
soak-7: 2008-dec-08 20:25:13 critical 10.16.64.155:31960(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-4: 2008-dec-08 20:27:51 critical 10.16.64.155:562(CATCHUP) error in cluster delivery: framing-error: Frame size too small 3 (qpid/framing/AMQFrame.cpp:88)





soak-4: 2008-dec-08 13:40:40 critical 10.16.64.139:27613(DUMPEE) catch-up connection closed prematurely 10.16.64.139:27613-0x2e510c0(local,catchup)
soak-3: 2008-dec-08 13:49:17 error Execution exception: invalid-argument: 1f567300-be6e-45ed-9e5b-388d5653f0e1@guest@QPID: confirmed < (99681+0) but only sent < (99680+0) (qpid/SessionState.cpp:150)
soak-5: 2008-dec-08 13:54:59 critical 10.16.64.139:4007(DUMPEE) catch-up connection closed prematurely 10.16.64.139:4007-0x1222d090(local,catchup)
soak-3: 2008-dec-08 14:17:24 error Execution exception: invalid-argument: 0570bdbb-3760-4074-b67b-5bf71490f9e0@guest@QPID: confirmed < (73958+0) but only sent < (73957+0) (qpid/SessionState.cpp:150)
soak-5: 2008-dec-08 15:52:14 critical 10.16.64.139:1283(DUMPEE) catch-up connection closed prematurely 10.16.64.139:1283-0xea05090(local,catchup)
soak-4: 2008-dec-08 15:58:44 error Execution exception: invalid-argument: 06d7735f-5cf8-4f72-87dd-5b61b497d9d2@guest@QPID: confirmed < (62846+0) but only sent < (62845+0) (qpid/SessionState.cpp:150)
soak-3: 2008-dec-08 16:09:07 error Execution exception: invalid-argument: bd1580a8-6923-4969-8c84-45fc2c5c7e71@guest@QPID: confirmed < (79513+0) but only sent < (79512+0) (qpid/SessionState.cpp:150)
soak-4: 2008-dec-08 16:46:46 critical 10.16.64.139:31617(DUMPEE) catch-up connection closed prematurely 10.16.64.139:31617-0x12f6b070(local,catchup)
soak-4: 2008-dec-08 17:31:18 critical 10.16.64.139:22785(DUMPEE) catch-up connection closed prematurely 10.16.64.139:22785-0x156820b0(local,catchup)
soak-18: 2008-dec-08 17:36:24 critical 10.16.64.139:24608(DUMPEE) catch-up connection closed prematurely 10.16.64.139:24608-0xf03d0c0(local,catchup)
soak-3: 2008-dec-08 18:03:10 error Execution exception: invalid-argument: 3732d94e-5981-48e7-807e-bffe7d708862@guest@QPID: confirmed < (60557+0) but only sent < (60556+0) (qpid/SessionState.cpp:150)
soak-3: 2008-dec-08 18:33:01 error Execution exception: invalid-argument: 0bded3b1-1338-4c4d-a9ef-e79143c64aff@guest@QPID: confirmed < (77945+0) but only sent < (77944+0) (qpid/SessionState.cpp:150)
soak-3: 2008-dec-08 18:43:07 critical 10.16.64.139:26836(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-3: 2008-dec-08 18:45:31 critical 10.16.64.139:28381(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-18: 2008-dec-08 20:38:45 critical 10.16.64.139:20230(DUMPEE) catch-up connection closed prematurely 10.16.64.139:20230-0x1b928260(local,catchup)




soak-13: 2008-dec-08 13:45:01 critical 10.16.64.197:18551(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-14: 2008-dec-08 13:49:26 critical 10.16.64.197:19859(DUMPEE) catch-up connection closed prematurely 10.16.64.197:19859-0x7ff2c80(local,catchup)
soak-7: 2008-dec-08 13:56:02 error Execution exception: invalid-argument: 8599f18f-a676-4b39-867f-404686c5998c@guest@QPID: confirmed < (15934+0) but only sent < (15933+0) (qpid/SessionState.cpp:150)
soak-8: 2008-dec-08 13:57:45 critical 10.16.64.197:22350(CATCHUP) error in cluster delivery: framing-error: Frame size too small 1 (qpid/framing/AMQFrame.cpp:88)
soak-4: 2008-dec-08 14:04:51 critical 10.16.64.197:24688(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-13: 2008-dec-08 14:05:55 critical 10.16.64.197:24936(DUMPEE) catch-up connection closed prematurely 10.16.64.197:24936-0x13b52c80(local,catchup)
soak-12: 2008-dec-08 14:12:57 critical 10.16.64.197:27040(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-10: 2008-dec-08 14:22:10 critical 10.16.64.197:29740(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-3: 2008-dec-08 14:38:25 critical 10.16.64.197:2251(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-7: 2008-dec-08 15:18:28 error Execution exception: invalid-argument: e55ba4e1-f17f-47a6-9677-59ce07d94a06@guest@QPID: confirmed < (21059+0) but only sent < (21058+0) (qpid/SessionState.cpp:150)
soak-13: 2008-dec-08 15:25:12 critical 10.16.64.197:16060(NEWBIE) catch-up connection closed prematurely 10.16.64.197:16060-0x12a56c70(local,catchup)
soak-13: 2008-dec-08 15:36:05 critical 10.16.64.197:19229(DUMPEE) catch-up connection closed prematurely 10.16.64.197:19229-0x6e72d80(local,catchup)
soak-13: 2008-dec-08 16:05:10 critical 10.16.64.197:28007(DUMPEE) catch-up connection closed prematurely 10.16.64.197:28007-0x396bc30(local,catchup)
soak-11: 2008-dec-08 16:24:16 critical 10.16.64.197:1397(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-4: 2008-dec-08 16:40:53 critical 10.16.64.197:6525(CATCHUP) error in cluster delivery: framing-error: Frame size too small 2 (qpid/framing/AMQFrame.cpp:88)
soak-10: 2008-dec-08 16:58:58 critical 10.16.64.197:12066(CATCHUP) error in cluster delivery: Invalid class id 51 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-5: 2008-dec-08 17:09:52 critical 10.16.64.197:15470(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-8: 2008-dec-08 17:33:15 critical 10.16.64.197:22554(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-5: 2008-dec-08 17:34:13 critical 10.16.64.197:22904(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-10: 2008-dec-08 18:04:12 error Execution exception: invalid-argument: 26b6d084-af0c-487d-a9c5-87f2c6333443@guest@QPID: confirmed < (19441+0) but only sent < (19440+0) (qpid/SessionState.cpp:150)
soak-8: 2008-dec-08 18:31:17 critical 10.16.64.197:7545(CATCHUP) error in cluster delivery: framing-error: Frame size too small 1 (qpid/framing/AMQFrame.cpp:88)
soak-3: 2008-dec-08 18:40:51 critical 10.16.64.197:10401(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-13: 2008-dec-08 18:52:04 critical 10.16.64.197:13876(DUMPEE) catch-up connection closed prematurely 10.16.64.197:13876-0x15995dc0(local,catchup)
soak-9: 2008-dec-08 18:56:03 critical 10.16.64.197:15105(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-2: 2008-dec-08 19:27:47 critical 10.16.64.197:24577(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-3: 2008-dec-08 19:31:18 critical 10.16.64.197:25494(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-4: 2008-dec-08 19:46:17 critical 10.16.64.197:30095(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)
soak-10: 2008-dec-08 19:55:51 critical 10.16.64.197:439(DUMPEE) catch-up connection closed prematurely 10.16.64.197:439-0xae2d050(local,catchup)
soak-3: 2008-dec-08 19:56:07 critical 10.16.64.197:622(CATCHUP) error in cluster delivery: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:80)
soak-17: 2008-dec-08 20:08:03 critical 10.16.64.197:4096(DUMPEE) catch-up connection closed prematurely 10.16.64.197:4096-0x1547bd00(local,catchup)
soak-3: 2008-dec-08 20:23:31 error Execution exception: invalid-argument: bc0a9e7b-ad07-4569-9e8e-0f92341ea70c@guest@QPID: confirmed < (12733+0) but only sent < (12732+0) (qpid/SessionState.cpp:150)
soak-6: 2008-dec-08 20:23:55 critical 10.16.64.197:9002(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-10: 2008-dec-08 20:24:25 critical 10.16.64.197:9110(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-13: 2008-dec-08 20:36:32 critical 10.16.64.197:12716(DUMPEE) catch-up connection closed prematurely 10.16.64.197:12716-0x1cf19d10(local,catchup)
soak-13: 2008-dec-08 20:42:52 critical 10.16.64.197:14594(DUMPEE) catch-up connection closed prematurely 10.16.64.197:14594-0x1d5d0d00(local,catchup)
soak-11: 2008-dec-08 21:33:12 critical 10.16.64.197:30100(CATCHUP) error in cluster delivery: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:99)
soak-8: 2008-dec-08 22:27:21 critical 10.16.64.197:14117(CATCHUP) error in cluster delivery: Invalid class id 0 (gen/qpid/framing/BodyHolder_gen.cpp:303)
soak-19: 2008-dec-08 22:28:44 critical 10.16.64.197:14403(DUMPEE) catch-up connection closed prematurely 10.16.64.197:14403-0x1770ae40(local,catchup)
soak-3: 2008-dec-08 22:28:52 critical 10.16.64.197:14562(CATCHUP) error in cluster delivery: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:88)

Comment 4 Alan Conway 2008-12-09 13:53:46 UTC

*** This bug has been marked as a duplicate of bug 473589 ***