Important bits from the case notes. Chris and I were looking at this last week before the BZ was opened...

The crash slogan:

Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{k

Systemtap was used to watch all exec()s and capture the cmdline of the crashing beam.smp. It looks like:

Fri Apr 5 17:41:51 2019 2013 74073 134463 beam.smp /usr/lib64/erlang/erts-7.3.1.6/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -sname epmd-starter-144528111 -proto_dist "inet_tcp" -noshell -eval halt().

This gets called from rabbit_nodes_common:ensure_epmd here:

https://github.com/rabbitmq/rabbitmq-common/blob/v3.6.x/src/rabbit_nodes_common.erl#L37

Which is called from the rabbit_epmd_monitor process here:

https://github.com/rabbitmq/rabbitmq-server/blob/v3.6.x/src/rabbit_epmd_monitor.erl#L108

The epmd monitor fires its check timer every 60 seconds, which explains the regular period of the crashes seen here.

What is not clear is why the epmd-starter exec fails to start distribution. There is some initial debugging in the case files around possible issues with ipv6 and/or hostname resolution, but it's not apparent that either is responsible.

Everything else seems to be functioning properly. The service is up and registered, epmd is running, and everything is clustered. Only the epmd-starter crashes. Since epmd is already running, the crash has no practical effect on the running system.
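For anyone who wants to try the failing invocation by hand, here is a rough Python sketch that rebuilds an `erl` command line resembling the captured epmd-starter exec. The helper names (`epmd_starter_argv`, `run_once`) are made up for illustration. It assumes `erl` is on the PATH, and that the quotes around inet_tcp in the capture are only quoting and not part of the argument. It does not reproduce the rabbitmq user's environment (for example -home /var/lib/rabbitmq), so it may not fail the same way.

```python
import os
import shutil
import subprocess


def epmd_starter_argv(suffix, erl="erl", proto_dist="inet_tcp"):
    """Build an argv resembling the captured epmd-starter invocation.

    Hypothetical helper: the flags mirror the systemtap capture, but the
    suffix is whatever the caller passes in.
    """
    return [
        erl,
        "-sname", f"epmd-starter-{suffix}",
        "-proto_dist", proto_dist,
        "-noshell",
        "-eval", "halt().",
    ]


def run_once(suffix=None):
    """Run the short-lived node once and return its exit code.

    Returns None when no erl binary is found on the PATH.
    """
    if suffix is None:
        suffix = os.getpid()  # any unique-ish number will do for a manual test
    argv = epmd_starter_argv(suffix)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return result.returncode
```

Calling `run_once()` as the rabbitmq user on an affected controller and checking for a non-zero exit code or a fresh erl_crash.dump should be closer to what the monitor does every 60 seconds than running it as root.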
*** Bug 1714128 has been marked as a duplicate of this bug. ***
Customer tried with 3.6.16 and it appears to have solved his problem.
I will report back on #1 in comment 20. Placing need info on Peter re: #2 & #3. -Chris
Please try the rabbitmq-server-3.6.15-4.el7ost build. It shouldn't create so many coredumps.
Verified. The puddle was tested in automation and did not present any of the issues described in this BZ: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/pidone/view/sanity/job/DFG-pidone-sanity-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-sanity/6/testReport/.home.stack.openstack-sts.tests.sanity/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2623
*** Bug 1751615 has been marked as a duplicate of this bug. ***