Bug 1641604 - [Deployment] Logs are full of NULL (0 binary)
Summary: [Deployment] Logs are full of NULL (0 binary)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 14.0 (Rocky)
Assignee: Victor Pickard
QA Contact: Noam Manos
URL:
Whiteboard: Deployment
Depends On:
Blocks: 1648348
 
Reported: 2018-10-22 10:14 UTC by Noam Manos
Modified: 2019-01-11 11:54 UTC
CC: 11 users

Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060884.el7ost.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1648348
Environment:
Last Closed: 2019-01-11 11:54:07 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1793566 0 None None None 2018-10-23 09:33:52 UTC
OpenStack gerrit 612459 0 None None None 2018-11-01 14:38:03 UTC
OpenStack gerrit 612468 0 None None None 2018-10-23 09:34:24 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:54:18 UTC

Description Noam Manos 2018-10-22 10:14:01 UTC
Description of problem:
Old karaf logs are not removed, and the controller disk fills up to 100%.

Version-Release number of selected component (if applicable):
OSP14 puddle 2018-10-17.2


How reproducible:
always

Steps to Reproduce:
1. Install OSP14 with ODL
2. Run some Tempest and other tests
3. Check that all karaf logs are archived, with none exceeding 100 MB each (a quick pass/fail check is sketched below):
[heat-admin@controller-0 /]$ ll -h /var/log/containers/opendaylight/ 
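For a quick pass/fail check, a hypothetical find one-liner such as the following would list any karaf log above the 100 MB limit (an empty result means rotation is keeping up):

[heat-admin@controller-0 /]$ sudo find /var/log/containers/opendaylight/ -name 'karaf.log*' -size +100M -exec ls -lh {} \;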

Actual results:

[heat-admin@controller-0 ~]$ df -h | head
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2        40G   40G   20K 100% /

[heat-admin@controller-0 /]$ sudo du -ah | sort -rh | head -5
sort: cannot create temporary file in ‘/tmp’: No space left on device
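(As an aside, when / is completely full, sort can be pointed at a filesystem that still has room for its temporary files via -T; a hypothetical variant of the command above, using the tmpfs mount /dev/shm:)

[heat-admin@controller-0 /]$ sudo du -ah / 2>/dev/null | sort -rh -T /dev/shm | head -5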
[heat-admin@controller-0 /]$ ll -h /var/log/containers/opendaylight/
total 17G

-rw-r--r--. 1 42462 42462 8.2G Oct 22 04:26 karaf.log
-rw-r--r--. 1 42462 42462 6.1G Oct 22 03:02 karaf.log.1
-rw-r--r--. 1 42462 42462  37M Oct 21 18:01 karaf.log.10.gz
-rw-r--r--. 1 42462 42462 6.1M Oct 21 17:01 karaf.log.11.gz
-rw-r--r--. 1 42462 42462 3.1M Oct 21 16:00 karaf.log.12.gz
-rw-r--r--. 1 42462 42462 3.1M Oct 21 15:01 karaf.log.13.gz
-rw-r--r--. 1 42462 42462 2.9M Oct 21 14:00 karaf.log.14.gz
-rw-r--r--. 1 42462 42462 4.1M Oct 21 13:01 karaf.log.15.gz
-rw-r--r--. 1 42462 42462  11G Oct 21 19:35 karaf.log.2
-rw-r--r--. 1 42462 42462  63M Oct 22 02:01 karaf.log.2.gz
-rw-r--r--. 1 42462 42462  11G Oct 21 22:15 karaf.log.3
-rw-r--r--. 1 42462 42462 9.9M Oct 22 01:03 karaf.log.3.gz
-rw-r--r--. 1 42462 42462  11G Oct 22 00:53 karaf.log.4
-rw-r--r--. 1 42462 42462  68M Oct 22 00:02 karaf.log.4.gz
-rw-r--r--. 1 42462 42462  51M Oct 21 23:02 karaf.log.5.gz
-rw-r--r--. 1 42462 42462  71M Oct 21 22:02 karaf.log.6.gz
-rw-r--r--. 1 42462 42462  65M Oct 21 21:01 karaf.log.7.gz
-rw-r--r--. 1 42462 42462  28M Oct 21 20:02 karaf.log.8.gz
-rw-r--r--. 1 42462 42462  74M Oct 21 19:03 karaf.log.9.gz


Expected results:
All old karaf logs should be archived, with none exceeding 100 MB each.

Additional info:

Comment 1 Mike Kolesnik 2018-10-23 09:33:52 UTC
After investigating the issue with Janki, it seems that the root cause is that the log file is also being rotated by a general log rotation policy from TripleO, which causes Log4J to freak out and output tons of NULL (0x00) bytes into the log.

Vic is already working on a fix upstream, so I'm assigning this to him.

In the meantime, it is recommended to apply the fix manually before deploying.
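
For context, the NULL flood is the classic symptom of two rotators fighting over one file: logrotate truncates (or moves) the log while Log4J keeps writing through its already-open file descriptor at the old offset, so the filesystem backfills the gap with 0x00 bytes. A minimal shell sketch of the mechanism (a standalone demo, not the actual ODL/TripleO code path):

exec 3>demo.log                 # writer opens the file without O_APPEND
printf 'first line\n' >&3       # the writer's offset advances to byte 11
: > demo.log                    # external rotation truncates the file
printf 'second line\n' >&3      # the writer resumes at offset 11
od -c demo.log | head           # bytes 0-10 now read back as \0
exec 3>&-                       # close the writer's descriptor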

Comment 3 Noam Manos 2018-10-28 15:03:15 UTC
Using puddle 2018-10-17.2 + patch https://review.openstack.org/#/c/612468

By adding the following to the deployment OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:

--fetchfiles-undercloud https://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/logging/files/opendaylight-api.yaml?h=refs/changes/68/612468/1,/usr/share/openstack-tripleo-heat-templates/docker/services/logging/files/opendaylight-api.yaml

I can see that karaf.log has moved into a new subdirectory:
/var/log/containers/opendaylight/karaf/logs

rather than sitting directly in the opendaylight directory. Is this by design?

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "ll /var/log/containers/opendaylight/";
bash: ll: command not found
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "ls /var/log/containers/opendaylight/";
karaf
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "ls /var/log/containers/opendaylight/karaf";
logs
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "ls /var/log/containers/opendaylight/karaf/logs";
karaf.log
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "ls -lah /var/log/containers/opendaylight/karaf/logs";
total 5.6G
drwxr-xr-x. 2 42462 42462   23 Oct 24 13:27 .
drwxr-xr-x. 3 root  root    18 Oct 24 13:11 ..
-rw-r--r--. 1 42462 42462 5.1G Oct 28 14:56 karaf.log


However, the NULL binary bytes are not there, and normal log output is being written:

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "tail /var/log/containers/opendaylight/karaf/logs/*";

2018-10-28T14:53:32,508 | INFO  | nioEventLoopGroup-11-8 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-4 | The channel outbound queue size:1024
2018-10-28T14:53:32,508 | INFO  | nioEventLoopGroup-9-13 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-4 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:37564, NodeId:null
2018-10-28T14:53:32,509 | INFO  | nioEventLoopGroup-11-8 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-4 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:58424, NodeId:null
2018-10-28T14:53:32,509 | INFO  | nioEventLoopGroup-9-14 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-4 | The channel outbound queue size:1024
2018-10-28T14:53:32,511 | INFO  | nioEventLoopGroup-11-9 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-4 | The channel outbound queue size:1024
2018-10-28T14:53:32,511 | INFO  | nioEventLoopGroup-9-14 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-4 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:37568, NodeId:null
2018-10-28T14:53:32,512 | INFO  | nioEventLoopGroup-11-9 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-4 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:58430, NodeId:null
2018-10-28T14:53:32,533 | DEBUG | org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ElanOvsdbNodeListener            | 344 - org.opendaylight.netvirt.elanmanager-impl - 0.6.4.redhat-4 | ElanOvsdbNodeListener.update, updated node detected. original: Node{getNodeId=Uri [_value=ovsdb://uuid/ef0201ff-886a-4427-a667-e3e8b9abe8ce], augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.ovsdb.rev150105.OvsdbNodeAugmentation=OvsdbNodeAugmentation{getConnectionInfo=ConnectionInfo{getLocalIp=IpAddress [_ipv4Address=Ipv4Address [_value=172.17.1.17]]
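
A more direct way to confirm the absence of NUL bytes would be a byte-level check, e.g. this hypothetical command (requires GNU grep with PCRE support; a count of 0 means the log is clean):

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-1 "sudo grep -acP '\x00' /var/log/containers/opendaylight/karaf/logs/karaf.log"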

Comment 4 Victor Pickard 2018-10-29 12:56:46 UTC
Hi,
Yes, the karaf logs are now located here, as you stated:
/var/log/containers/karaf/logs/karaf.log

Please refer to the bug below for more details:

https://bugs.launchpad.net/tripleo/+bug/1793566

Comment 7 Noam Manos 2018-11-18 15:25:38 UTC
Running OSP14 deployment with openvswitch2.10-2.10.0-21.el7fdn.x86_64.rpm:
http://download.eng.bos.redhat.com/brewroot/packages/openvswitch2.10/2.10.0/21.el7fdn/x86_64/openvswitch2.10-2.10.0-21.el7fdn.x86_64.rpm


But there is no new file structure for the ODL logs, and the disk is 100% full after deployment:

###### controller-0: ssh heat-admin.24.11 ######

b7c98e16d013        192.168.24.1:8787/rhosp14/openstack-gnocchi-metricd:2018-11-09.3               "kolla_start"            4 hours ago         Up 4 hours (unhealthy)                             gnocchi_metricd
a15023406b91        192.168.24.1:8787/rhosp14/openstack-ceilometer-central:2018-11-09.3            "kolla_start"            4 hours ago         Up 4 hours (unhealthy)                             ceilometer_agent_central
fc00f047c54c        192.168.24.1:8787/rhosp14/openstack-opendaylight:2018-11-09.3                  "kolla_start"            4 hours ago         Up 4 hours (unhealthy)                             opendaylight_api
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2        20G   20G   20K 100% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G   39M   16G   1% /dev/shm
tmpfs            16G  151M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G     0  3.2G   0% /run/user/0
tmpfs           3.2G     0  3.2G   0% /run/user/1000



[heat-admin@controller-0 ~]$ ll /var/log/containers/opendaylight/
total 106476
-rw-r--r--. 1 42462 42462 106352640 Nov 18 15:13 karaf.log
-rw-r--r--. 1 42462 42462  88886182 Nov 18 13:00 karaf.log.1
-rw-r--r--. 1 42462 42462   2667866 Nov 18 12:00 karaf.log.2.gz


  [heat-admin@controller-0 /]$ 
  [heat-admin@controller-0 /]$ tail -1000 /var/log/containers/opendaylight/karaf.log
  2018-11-18T15:18:35,261 | INFO  | nioEventLoopGroup-11-3 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-6 | The channel outbound queue size:1024
  2018-11-18T15:18:35,261 | INFO  | nioEventLoopGroup-11-2 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-6 | The channel outbound queue size:1024
  2018-11-18T15:18:35,261 | INFO  | nioEventLoopGroup-11-3 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-6 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:46372, NodeId:null
  2018-11-18T15:18:35,262 | INFO  | nioEventLoopGroup-11-2 | SystemNotificationsListenerImpl  | 378 - org.opendaylight.openflowplugin.impl - 0.6.4.redhat-6 | ConnectionEvent: Connection closed by device, Device:/127.0.0.1:46368, NodeId:null
  2018-11-18T15:18:36,699 | ERROR | qtp626219007-21545 | DatastoreServiceStatusProvider   | 262 - org.opendaylight.genius.mdsalutil-impl - 0.4.4.redhat-6 | Unable to obtain the datastore status
  java.lang.reflect.UndeclaredThrowableException: null
  	at com.sun.proxy.$Proxy110.getRaftState(Unknown Source) [231:org.opendaylight.controller.sal-distributed-datastore:1.7.4.redhat-6]
  	at org.opendaylight.genius.mdsalutil.diagstatus.internal.DatastoreServiceStatusProvider.getServiceDescriptor(DatastoreServiceStatusProvider.java:89) [262:org.opendaylight.genius.mdsalutil-impl:0.4.4.redhat-6]
  	at Proxy241f9f36_ab17_4d9d_9a59_3fad77bf0588.getServiceDescriptor(Unknown Source) [?:?]
  	at Proxyd6f0d70e_6629_4345_b144_55579e64838d.getServiceDescriptor(Unknown Source) [?:?]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.updateServiceStatusMap(DiagStatusServiceImpl.java:137) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.getAllServiceDescriptors(DiagStatusServiceImpl.java:86) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.isOperational(DiagStatusServiceImpl.java:95) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at Proxyea07b505_74b4_459e_95c3_86daec3e8e0b.isOperational(Unknown Source) [?:?]
  	at Proxy108b1804_c30c_4378_a025_b6f5674a1215.isOperational(Unknown Source) [?:?]
  	at org.opendaylight.infrautils.diagstatus.web.DiagStatusServlet.doGet(DiagStatusServlet.java:42) [274:org.opendaylight.infrautils.diagstatus-web:1.3.4.redhat-6]
  	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) [72:javax.servlet-api:3.1.0]
  	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [72:javax.servlet-api:3.1.0]
  	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:205) [173:org.eclipse.jetty.websocket.server:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:71) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) [160:org.eclipse.jetty.security:9.3.21.v20170918]
  	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:284) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:80) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.Server.handle(Server.java:534) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at java.lang.Thread.run(Thread.java:748) [?:?]
  Caused by: javax.management.InstanceNotFoundException: org.opendaylight.controller:type=DistributedConfigDatastore,Category=Shards,name=member-0-shard-default-config
  	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) ~[?:?]
  	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:643) ~[?:?]
  	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678) ~[?:?]
  	at com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:122) ~[?:?]
  	at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167) ~[?:?]
  	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:258) ~[?:?]
  	... 41 more
  2018-11-18T15:18:36,700 | INFO  | nioEventLoopGroup-9-9 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-6 | The channel outbound queue size:1024
  2018-11-18T15:18:36,700 | INFO  | nioEventLoopGroup-11-4 | AbstractConnectionAdapter        | 387 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4.redhat-6 | The channel outbound queue size:1024
  2018-11-18T15:18:36,701 | ERROR | qtp626219007-21545 | DatastoreServiceStatusProvider   | 262 - org.opendaylight.genius.mdsalutil-impl - 0.4.4.redhat-6 | Unable to obtain the datastore status
  java.lang.reflect.UndeclaredThrowableException: null
  	at com.sun.proxy.$Proxy110.getRaftState(Unknown Source) [231:org.opendaylight.controller.sal-distributed-datastore:1.7.4.redhat-6]
  	at org.opendaylight.genius.mdsalutil.diagstatus.internal.DatastoreServiceStatusProvider.getServiceDescriptor(DatastoreServiceStatusProvider.java:89) [262:org.opendaylight.genius.mdsalutil-impl:0.4.4.redhat-6]
  	at Proxy241f9f36_ab17_4d9d_9a59_3fad77bf0588.getServiceDescriptor(Unknown Source) [?:?]
  	at Proxyd6f0d70e_6629_4345_b144_55579e64838d.getServiceDescriptor(Unknown Source) [?:?]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.updateServiceStatusMap(DiagStatusServiceImpl.java:137) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.getAllServiceDescriptors(DiagStatusServiceImpl.java:86) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.isOperational(DiagStatusServiceImpl.java:95) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at org.opendaylight.infrautils.diagstatus.internal.DiagStatusServiceImpl.getAllServiceDescriptorsAsJSON(DiagStatusServiceImpl.java:110) [272:org.opendaylight.infrautils.diagstatus-impl:1.3.4.redhat-6]
  	at Proxyea07b505_74b4_459e_95c3_86daec3e8e0b.getAllServiceDescriptorsAsJSON(Unknown Source) [?:?]
  	at Proxy108b1804_c30c_4378_a025_b6f5674a1215.getAllServiceDescriptorsAsJSON(Unknown Source) [?:?]
  	at org.opendaylight.infrautils.diagstatus.web.DiagStatusServlet.doGet(DiagStatusServlet.java:52) [274:org.opendaylight.infrautils.diagstatus-web:1.3.4.redhat-6]
  	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) [72:javax.servlet-api:3.1.0]
  	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [72:javax.servlet-api:3.1.0]
  	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:205) [173:org.eclipse.jetty.websocket.server:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:71) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) [160:org.eclipse.jetty.security:9.3.21.v20170918]
  	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:284) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [163:org.eclipse.jetty.servlet:9.3.21.v20170918]
  	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:80) [441:org.ops4j.pax.web.pax-web-jetty:6.0.9]
  	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.Server.handle(Server.java:534) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [162:org.eclipse.jetty.server:9.3.21.v20170918]
  	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [154:org.eclipse.jetty.io:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [165:org.eclipse.jetty.util:9.3.21.v20170918]
  	at java.lang.Thread.run(Thread.java:748) [?:?]
  Caused by: javax.management.InstanceNotFoundException: org.opendaylight.controller:type=DistributedConfigDatastore,Category=Shards,name=member-0-shard-default-config
  	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) ~[?:?]
  	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:643) ~[?:?]
  	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678) ~[?:?]
  	at com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:122) ~[?:?]
  	at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167) ~[?:?]
  	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:258) ~[?:?]
  	... 42 more

Comment 14 Noam Manos 2018-11-19 12:15:31 UTC
The bug still occurred even though the overcloud nodes had the new rpm:

[heat-admin@controller-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.0.1-0.20181013060884.el7ost.noarch

Apparently, we need to apply this rpm on the undercloud, not on the overcloud.

The undercloud was still using the old rpm:

(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch
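
A quick way to catch this kind of mismatch before deploying is to query both sides (hypothetical commands; the templates are rendered on the undercloud, so that is the copy that matters):

(undercloud) [stack@undercloud-0 ~]$ rpm -q openstack-tripleo-heat-templates
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@controller-0 "rpm -q openstack-tripleo-heat-templates"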

I'm redeploying OSP with the --undercloud-packages option added to OVERCLOUD_DEPLOY_OVERRIDE_OPTIONS:

--undercloud-packages http://brew.redhat.com/official/openstack-tripleo-heat-templates/9.0.1/0.20181013060884.el7ost/noarch/openstack-tripleo-heat-templates-9.0.1-0.20181013060884.el7ost.noarch.rpm

Comment 22 Noam Manos 2018-11-21 09:38:25 UTC
Deploying on OSP14 2018-11-07.3 + openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch

Passed with the following results. Note:
1) The karaf log directory is directly under /var/log/containers/opendaylight.
2) The NULL issue no longer occurs, but there may be a separate issue in the neutron log: "Failed during periodic task operation journal_recovery."

---


(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch


(undercloud) [stack@undercloud-0 ~]$ openstack server list --all
+--------------------------------------+-------------------+--------+--------------------------------------------------------------+----------+---------------+
| ID                                   | Name              | Status | Networks                                                     | Image    | Flavor        |
+--------------------------------------+-------------------+--------+--------------------------------------------------------------+----------+---------------+
| 5ecae502-3861-4aec-8ab8-0a3959ab3f50 | cirros35_vm2_net2 | ACTIVE | net_ipv64_2=2002::f816:3eff:fe81:a9f2, 10.0.2.22, 10.0.0.211 | cirros35 | cirros_flavor |
| e53bc071-ffb9-4aae-97cf-1efe0c386362 | cirros35_vm1_net2 | ACTIVE | net_ipv64_2=2002::f816:3eff:fe47:f78c, 10.0.2.24, 10.0.0.220 | cirros35 | cirros_flavor |
| 2d6f2598-9266-4ab4-ae8b-5fabcc69e63f | cirros35_vm2_net1 | ACTIVE | net_ipv64_1=10.0.1.10, 2001::f816:3eff:fec4:2570, 10.0.0.226 | cirros35 | cirros_flavor |
| f57bf00e-ded7-483b-8e89-ba725d717d22 | cirros35_vm1_net1 | ACTIVE | net_ipv64_1=10.0.1.5, 2001::f816:3eff:fee1:ee13, 10.0.0.213  | cirros35 | cirros_flavor |
+--------------------------------------+-------------------+--------+--------------------------------------------------------------+----------+---------------+


###### controller-1: ssh heat-admin.24.14 ######

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2        40G   18G   23G  43% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G   54M   16G   1% /dev/shm
tmpfs            16G  6.9M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G     0  3.2G   0% /run/user/1000



(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.24.14 "ls -lh /var/log/containers/opendaylight/"
total 4.7G
-rw-r--r--. 1 42462 42462 4.7G Nov 21 09:18 karaf.log
-rw-r--r--. 1 42462 42462 4.6G Nov 21 09:01 karaf.log.1
-rw-r--r--. 1 42462 42462 5.9M Nov 21 00:01 karaf.log.10.gz
-rw-r--r--. 1 42462 42462 5.8M Nov 20 23:01 karaf.log.11.gz
-rw-r--r--. 1 42462 42462 5.8M Nov 20 22:01 karaf.log.12.gz
-rw-r--r--. 1 42462 42462 5.6M Nov 20 21:00 karaf.log.13.gz
-rw-r--r--. 1 42462 42462 5.5M Nov 20 20:00 karaf.log.14.gz
-rw-r--r--. 1 42462 42462 6.8M Nov 21 08:02 karaf.log.2.gz
-rw-r--r--. 1 42462 42462 6.6M Nov 21 07:01 karaf.log.3.gz
-rw-r--r--. 1 42462 42462 6.5M Nov 21 06:01 karaf.log.4.gz
-rw-r--r--. 1 42462 42462 6.4M Nov 21 05:02 karaf.log.5.gz
-rw-r--r--. 1 42462 42462 6.4M Nov 21 04:02 karaf.log.6.gz
-rw-r--r--. 1 42462 42462 6.2M Nov 21 03:01 karaf.log.7.gz
-rw-r--r--. 1 42462 42462 6.1M Nov 21 02:02 karaf.log.8.gz
-rw-r--r--. 1 42462 42462 6.0M Nov 21 01:01 karaf.log.9.gz




*** /var/log/containers/neutron/server.log ***

2018-11-21 09:11:09.718 37 INFO networking_odl.journal.periodic_task [req-32bbc692-6b7d-4c29-8419-ef73aea0c343 - - - - -] Finished full_sync phase of maintenance task.
2018-11-21 09:11:09.718 37 INFO networking_odl.journal.periodic_task [req-32bbc692-6b7d-4c29-8419-ef73aea0c343 - - - - -] Starting journal_recovery phase of periodic task maintenance.
2018-11-21 09:11:09.746 37 DEBUG networking_odl.journal.recovery [req-32bbc692-6b7d-4c29-8419-ef73aea0c343 - - - - -] Attempting recovery of journal entry <networking_odl.db.models.OpenDaylightJournal[object at 7fd8308f9050] {seqnum=602, object_type=u'network', object_uuid=u'cd5cb086-22e4-434d-8eef-de38adf9e183', operation=u'update', data={'provider:physical_network': None, 'ipv6_address_scope': None, 'revision_number': 4, 'port_security_enabled': True, 'mtu': 1450, 'id': u'cd5cb086-22e4-434d-8eef-de38adf9e183', 'router:external': False, 'availability_zone_hints': [], 'availability_zones': [u'nova'], 'ipv4_address_scope': None, 'shared': False, 'project_id': u'3873ba1c11a2484bbe6b5cf2c94a742f', 'status': u'ACTIVE', 'subnets': [], 'description': u'', 'tags': [], 'updated_at': '2018-11-19T12:50:35Z', 'provider:segmentation_id': None, 'name': u'tempest-network-smoke--241992513', 'admin_state_up': True, 'tenant_id': u'3873ba1c11a2484bbe6b5cf2c94a742f', 'created_at': '2018-11-19T12:47:35Z', 'provider:network_type': None, 'vlan_transparent': None}, state='failed', retry_count=5, last_retried=datetime.datetime(2018, 11, 19, 12, 50, 36), version_id=13}>. journal_recovery /usr/lib/python2.7/site-packages/networking_odl/journal/recovery.py:39
2018-11-21 09:11:09.747 37 DEBUG networking_odl.common.client [req-32bbc692-6b7d-4c29-8419-ef73aea0c343 - - - - -] Sending METHOD (get) URL (http://172.17.1.20:8081/controller/nb/v2/neutron/networks/cd5cb086-22e4-434d-8eef-de38adf9e183) JSON (None) request /usr/lib/python2.7/site-packages/networking_odl/common/client.py:89
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task [req-32bbc692-6b7d-4c29-8419-ef73aea0c343 - - - - -] Failed during periodic task operation journal_recovery.: InvalidRequestError: Object '<OpenDaylightJournal at 0x7fd8308f9050>' is already attached to session '35892' (this is '35901')
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task Traceback (most recent call last):
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/networking_odl/journal/periodic_task.py", line 65, in _execute_op
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     operation(context)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 233, in wrapped
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     return method(*args, **kwargs)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 140, in wrapped
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     setattr(e, '_RETRY_EXCEEDED', True)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     self.force_reraise()
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     six.reraise(self.type_, self.value, self.tb)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 136, in wrapped
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     return f(*args, **kwargs)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     ectxt.value = e.inner_exc
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     self.force_reraise()
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     six.reraise(self.type_, self.value, self.tb)
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2018-11-21 09:11:09.786 37 ERROR networking_odl.journal.periodic_task     return f(*args, **kwargs)

Comment 26 Noam Manos 2018-11-22 15:01:07 UTC
Log rotation is fixed in OSP14 puddle 2018-11-21.2:

###### controller-0: ssh heat-admin.24.14 ######
/var/log/containers/opendaylight/karaf/logs/karaf.log

###### controller-1: ssh heat-admin.24.7 ######
/var/log/containers/opendaylight/karaf/logs/karaf.log

###### controller-2: ssh heat-admin.24.8 ######
/var/log/containers/opendaylight/karaf/logs/karaf.log
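
The per-controller check above could be scripted as a single loop from the undercloud (a hypothetical sketch, assuming standard heat-admin SSH access by hostname):

for host in controller-0 controller-1 controller-2; do
  echo "###### $host ######"
  ssh "heat-admin@$host" "ls -lh /var/log/containers/opendaylight/karaf/logs/"
done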

Comment 29 errata-xmlrpc 2019-01-11 11:54:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

