Bug 1397110
Summary: Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'notifications.audit' in vhost '/' due to timeout

| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | rabbitmq-server |
| Version | 10.0 (Newton) |
| Target Release | 10.0 (Newton) |
| Target Milestone | ga |
| Hardware / OS | Unspecified / Unspecified |
| Severity / Priority | unspecified / unspecified |
| Status | CLOSED DUPLICATE |
| Type | Bug |
| Keywords | Triaged |
| Reporter | Yurii Prokulevych <yprokule> |
| Assignee | Peter Lemenkov <plemenko> |
| QA Contact | Asaf Hirshberg <ahirshbe> |
| CC | apevec, fdinitto, gkadam, jeckersb, jschluet, lhh, mcornea, mkrcmari, pkilambi, plemenko, sasha, srevivo, ushkalim, vstinner, yprokule |
| Fixed In Version | rabbitmq-server-3.6.3-6.el7ost |
| Clones | 1418668 (view as bug list) |
| Bug Blocks | 1418668 |
| Last Closed | 2016-12-07 14:05:47 UTC |
Description
Yurii Prokulevych, 2016-11-21 16:13:38 UTC
Forgot to mention the overcloud is deployed with SSL and IPv6.

The first issue I've found is this one: https://github.com/rabbitmq/rabbitmq-server/issues/914. It was fixed upstream and is available since version 3.6.6 (see PR 918). This part of the log shows that the cluster is affected by it:

```
** Reason for termination ==
** {{bad_return_value,
     {error,
      {function_clause,
       [{rabbit_mirror_queue_misc,promote_slave,
         [[]],
         [{file,"src/rabbit_mirror_queue_misc.erl"},{line,282}]},
        {rabbit_mirror_queue_misc,'-remove_from_queue/3-fun-1-',3,
         [{file,"src/rabbit_mirror_queue_misc.erl"},{line,93}]},
        {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
        {mnesia_tm,execute_transaction,5,
         [{file,"mnesia_tm.erl"},{line,808}]},
        {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
         [{file,"src/rabbit_misc.erl"},{line,532}]},
        {worker_pool_worker,'-run/2-fun-0-',3,
         [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}},
    {gen_server2,call,
     [<24684.21919.0>,{add_on_right,{14,<0.28037.0>}},infinity]}}
```

There might be more issues, and I'll update this ticket if I find anything. We need only this commit: https://github.com/rabbitmq/rabbitmq-server/pull/918/commits/a416060

The same issue:

```
** Reason for termination ==
** {{{case_clause,{ok,<24684.21670.0>,[<24676.26042.0>],[]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2,
      [{file,"src/rabbit_mirror_queue_coordinator.erl"},{line,351}]},
     {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1032}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]},
   {gen_server2,call,
    [<24676.28451.0>,{add_on_right,{19,<0.28396.0>}},infinity]}}
```

Another issue is related to the RabbitMQ OCF script. A similar issue for a different OCF script was described here: https://bugs.launchpad.net/fuel/+bug/1620649. It was addressed upstream in PR 946.
Btw, it looks like the network wasn't reliable:

```
=INFO REPORT==== 21-Nov-2016::14:37:29 ===
node 'rabbit@controller-0' down: etimedout

=WARNING REPORT==== 21-Nov-2016::14:37:29 ===
Cluster minority/secondary status detected - awaiting recovery

=INFO REPORT==== 21-Nov-2016::14:37:29 ===
node 'rabbit@controller-0' up

=INFO REPORT==== 21-Nov-2016::14:37:29 ===
rabbit on node 'rabbit@controller-2' down

=INFO REPORT==== 21-Nov-2016::14:37:29 ===
Stopping RabbitMQ

=INFO REPORT==== 21-Nov-2016::14:37:29 ===
stopped TCP Listener on [FD00:FD00:FD00:2000::15]:5672
```

Another issue:

```
** Reason for termination ==
** {bad_return_value,
    {error,
     {{badmatch,{error,not_found}},
      [{gm,'-record_dead_member_in_group/5-fun-1-',4,
        [{file,"src/gm.erl"},{line,1129}]},
       {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
       {mnesia_tm,execute_transaction,5,
        [{file,"mnesia_tm.erl"},{line,808}]},
       {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
        [{file,"src/rabbit_misc.erl"},{line,532}]},
       {worker_pool_worker,'-run/2-fun-0-',3,
        [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}}
```

It was fixed upstream in this PR: https://github.com/rabbitmq/rabbitmq-server/pull/951

Yet another issue:

```
** Reason for termination ==
** {{badmatch,{error,{{duplicate_live_master,'rabbit@controller-1'},
                      {gen_server2,call,
                       [<0.24797.0>,
                        {add_on_right,{1,<0.25919.0>}},
                        infinity]}}}},
   [{rabbit_mirror_queue_master,init_with_existing_bq,3,
     [{file,"src/rabbit_mirror_queue_master.erl"},{line,104}]},
    {rabbit_mirror_queue_master,init,3,
     [{file,"src/rabbit_mirror_queue_master.erl"},{line,99}]},
    {rabbit_amqqueue_process,init_it2,3,
     [{file,"src/rabbit_amqqueue_process.erl"},{line,194}]},
    {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1032}]},
    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
```

This was addressed upstream in PR 970 and is available since version 3.6.6. We are interested in this commit particularly: https://github.com/rabbitmq/rabbitmq-server/commit/4005a5a

This issue was addressed in PR 918:

```
** Reason for termination ==
** {{bad_return_value,
     {error,
      {function_clause,
       [{gm,check_membership,
         [{5,<24678.21141.0>},{error,not_found}],
         [{file,"src/gm.erl"},{line,1590}]},
        {gm,'-record_new_member_in_group/4-fun-1-',3,
         [{file,"src/gm.erl"},{line,1167}]},
        {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,833}]},
        {mnesia_tm,execute_transaction,5,
         [{file,"mnesia_tm.erl"},{line,808}]},
        {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
         [{file,"src/rabbit_misc.erl"},{line,532}]},
        {worker_pool_worker,'-run/2-fun-0-',3,
         [{file,"src/worker_pool_worker.erl"},{line,77}]}]}}},
    {gen_server2,call,
     [<24678.21141.0>,{add_on_right,{5,<0.20098.0>}},infinity]}}
```

This commit addresses it: https://github.com/rabbitmq/rabbitmq-server/pull/918/commits/53f10c9

Ok, that's all for now. There are other issues (just grep for duplicate_live_master and gm_deaths), but they are likely related to the ones above. I expect that if we address the issues listed above, all the others will go away.

My proposal is to backport all six patches from PRs 918 and 970 to the RabbitMQ 3.6.3 we're using. Then I'll adapt the fix proposed in PR 946 to our resource-agents script.

Almost done. All RabbitMQ-related patches are applied in this build: rabbitmq-server-3.6.3-6.el7ost.

(In reply to Peter Lemenkov from comment #8)
> Another one issue is related to RabbitMQ OCF script. Similar issue for the
> different OCF scrip was described here:
>
> https://bugs.launchpad.net/fuel/+bug/1620649
>
> It was addressed upstream in PR 946.

For the fix for this issue please go to bug 1397393.
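The grep suggested above is straightforward; here is a minimal sketch that demonstrates it against a fabricated sample log excerpt (the excerpt lines and the /tmp path are illustrative only; on a real controller you would point grep at the broker logs, typically /var/log/rabbitmq/rabbit@<node>.log):

```shell
# Write a small fabricated log sample so the command is self-contained;
# in production, grep the real rabbit@<node>.log files instead.
cat > /tmp/rabbit-sample.log <<'EOF'
** {{badmatch,{error,{{duplicate_live_master,'rabbit@controller-1'},
=INFO REPORT==== 21-Nov-2016::14:37:29 ===
{gm_deaths, ['rabbit@controller-0']}
EOF

# Count lines carrying either crash signature; a non-zero count
# suggests the node hit one of the bugs discussed in this ticket.
grep -cE 'duplicate_live_master|gm_deaths' /tmp/rabbit-sample.log
```

Here two of the three sample lines match, so the count printed is 2.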
Just for the record, it can be easily spotted by the following log messages:

```
=ERROR REPORT==== 21-Nov-2016::12:32:38 ===
Mnesia('rabbit@controller-1'): ** ERROR ** (core dumped to file: "/var/lib/rabbitmq/MnesiaCore.rabbit@controller-1_1479_731558_864930")
 ** FATAL ** Failed to merge schema: Bad cookie in table definition mirrored_sup_childspec:
 'rabbit@controller-1' = {cstruct,mirrored_sup_childspec,ordered_set,['rabbit@controller-2','rabbit@controller-1','rabbit@controller-0'],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],[],{{1479490843523035021,-576460752303423006,1},'rabbit@controller-0'},{{4,0},{'rabbit@controller-2',{1479,490892,472236}}}},
 'rabbit@controller-0' = {cstruct,mirrored_sup_childspec,ordered_set,['rabbit@controller-0'],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],[],{{1479731540535703648,-576460752303422975,1},'rabbit@controller-0'},{{2,0},[]}}
```

Regarding this (see comment 1): an attempt to list consumers was failing with the following message:

```
rabbitmqctl list_consumers
Listing consumers ...
Error: {badrpc,
        {'EXIT',
         {noproc,
          {gen_server2,call,[<10457.15686.0>,consumers,infinity]}}}}
```

This looks like RabbitMQ is in a failed state. I believe that if we address all the issues mentioned above, this one will magically go away.

*** This bug has been marked as a duplicate of bug 1397393 ***

Just an explanation of comment 29: we believe we've fixed all the issues related to RabbitMQ itself. One issue related to resource-agents still remains. We'll address it in bug 1397393 (thus we closed this ticket as a duplicate of that one).
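As a closing note, a deployment can check whether it carries at least the fixed build named in this ticket by comparing version strings. A minimal sketch (the `installed` value is hardcoded here for illustration; in practice it would come from `rpm -q --qf '%{VERSION}-%{RELEASE}' rabbitmq-server`):

```shell
# Fixed-in version from this bug report.
fixed="3.6.3-6.el7ost"
# Hypothetical installed version; on a real host, query rpm instead.
installed="3.6.3-6.el7ost"

# sort -V orders version strings; if the fixed version sorts first (or
# equal), the installed build is at least the fixed one.
lowest=$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
    echo "fixed build or newer"
else
    echo "older than fixed build"
fi
```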