Description of problem:
While running longevity tests in a clustered ODL setup, we see that one of the ODL instances appears to be up and running according to ps output, systemctl, and the netstat listening ports, but it is not functional. We could not even ssh into the Karaf shell using ssh -p 8101 karaf.0.16 until we restarted opendaylight. After a service restart we were able to get into the Karaf shell, and ODL seemed to come back up. Of the other two ODL instances, one was killed due to OOM and the other seemed to be running fine. This happens after about 42 hours of running the tests.

Setup:
3 ODLs
3 OpenStack controllers
3 compute nodes

Test:
Create 40 neutron resources (routers, networks, etc.) 2 at a time using Rally, then delete them, over and over again. This is a long-running, low-stress test.

Version-Release number of selected component (if applicable):
OSP 12 Puddle from 2017-8-18.3
ODL RPM from upstream: python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP12 with ODL
2. Run low-stress Rally tests for a long time

Actual results:
ODL became non-functional on controller-0.

Expected results:
ODL should keep running fine, as this is just a low-stress test over a long time.

Additional info:
Entire Karaf log: http://8.43.86.1:8088/smalleni/karaf-controller-0.log.tar.gz
ODL became non-functional around 10:44 UTC on 08/28/2017. This was confirmed by collectd, which talks to the Karaf JMX interface: it suddenly stopped reporting heap size values at that time. Collectd was able to talk to Karaf JMX again after the service restart. The break can be clearly observed at: https://snapshot.raintank.io/dashboard/snapshot/nf6OWq7jNSeT6vwjM71jlUSWc31E9LdW
If it helps, here is the Karaf thread count: https://snapshot.raintank.io/dashboard/snapshot/EgrJsRB7HJ6tl1pjLlSY4hb6wWvJS7nT. Around 10:44 UTC the thread count suddenly spikes, and it falls back after the restart.
The ODL RPM used was opendaylight-6.2.0-0.1.20170817rel1931.el7.noarch.
Closing this, as the linked upstream ODL Bug 9063 has been closed as a duplicate. Sai (reporter), please re-open if you disagree.