Bug 1818844

Summary: ovn_controller crashes after setup is up for a day
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Numan Siddique <nusiddiq>
Component: ovn2.11
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: ying xu <yinxu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: FDP 20.D
CC: apevec, atragler, ctrautma, dalvarez, dceara, ealcaniz, ffernand, fhallal, itbrown, jiji, jlibosva, jschluet, juriarte, lhh, majopela, mmichels, nusiddiq, oblaut, rheinzma, rsafrono, scohen, twilson
Target Milestone: ---
Keywords: TestBlockerForLayeredProduct, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn2.11-2.11.1-41.el7fdn
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1812009
: 1819604 1822542
Environment:
Last Closed: 2020-05-26 14:07:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1812009, 1819604, 1822542

Comment 13 ying xu 2020-04-21 09:13:07 UTC
Hi Itzik Brown,

Could you please help verify this bug in your environment?
Thanks very much!

Comment 14 Itzik Brown 2020-04-26 08:30:23 UTC
With the new images, the neutron_api container crashes after two days.

# docker ps -a |grep neutron
901af86d68b6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server-ovn:20200416.1      kolla_start           2 days ago  Exited (0) 19 minutes ago         neutron_api
ff00b5cd352e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server-ovn:20200416.1      /usr/bin/bootstra...  2 days ago  Exited (0) 2 days ago             neutron_db_sync
dca975009797  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server-ovn:20200416.1      /bin/bash -c chow...  2 days ago  Exited (0) 2 days ago             neutron_init_logs
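
The exit code and finish time can also be read directly from the container state, for example (standard Docker CLI; the container name is taken from the listing above):

# docker inspect --format '{{.State.ExitCode}} {{.State.FinishedAt}}' neutron_api
# docker logs --tail 50 neutron_api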

From neutron log:

    2020-04-26 03:43:45.119 33 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] calling wait on <oslo_messaging.rpc.server.RPCServer object at 0x7f79322629b0> _wait /usr/lib/python3.6/site-packages/neutron/service.py:131
    2020-04-26 03:43:45.139 33 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] calling wait on <oslo_messaging.rpc.server.RPCServer object at 0x7f7932262438> _wait /usr/lib/python3.6/site-packages/neutron/service.py:131
    2020-04-26 03:43:45.144 34 DEBUG oslo_concurrency.lockutils [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] Acquired lock "singleton_lock" lock /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:265
    2020-04-26 03:43:45.144 34 DEBUG oslo_concurrency.lockutils [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] Releasing lock "singleton_lock" lock /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:281
    2020-04-26 03:43:45.145 34 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] calling RpcWorker wait() _wait /usr/lib/python3.6/site-packages/neutron/service.py:128
    2020-04-26 03:43:45.145 34 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] calling wait on <oslo_messaging.rpc.server.RPCServer object at 0x7f79322b74e0> _wait /usr/lib/python3.6/site-packages/neutron/service.py:131
    2020-04-26 03:43:45.156 33 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] calling wait on <oslo_messaging.rpc.server.RPCServer object at 0x7f7932262c50> _wait /usr/lib/python3.6/site-packages/neutron/service.py:131
    2020-04-26 03:43:45.162 34 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] returning from RpcWorker wait() _wait /usr/lib/python3.6/site-packages/neutron/service.py:135
    2020-04-26 03:43:45.170 33 DEBUG neutron.service [req-f4a8fb18-1ab1-4927-8c5d-4103a955aa17 - - - - -] returning from RpcWorker wait() _wait /usr/lib/python3.6/site-packages/neutron/service.py:135
    2020-04-26 03:43:45.175 7 INFO oslo_service.service [-] Child 34 exited with status 0
    2020-04-26 03:43:45.184 7 INFO oslo_service.service [-] Child 33 exited with status 0
    2020-04-26 03:43:45.185 7 INFO oslo_service.service [-] Wait called after thread killed. Cleaning up.
    2020-04-26 03:43:45.185 7 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python3.6/site-packages/oslo_service/service.py:699
    2020-04-26 03:43:45.185 7 DEBUG oslo_service.service [-] Killing children. stop /usr/lib/python3.6/site-packages/oslo_service/service.py:704

Comment 15 Dumitru Ceara 2020-04-27 07:49:37 UTC
(In reply to Itzik Brown from comment #14)
> With the new images, the neutron_api container crashes after two days.

Hi Itzik,

Is ovn-controller still crashing? This BZ was supposed to track fixes in ovn-controller for the segfault that was discovered in BZ 1812009.

If ovn-controller crashed, can you please attach the coredump? If not, then the issue might not be related to core OVN and should be tracked through BZ 1812009.
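
For reference, one way to check for a core on the compute node is something like the following (this assumes systemd-coredump is collecting cores; on this deployment abrt or a containerized ovn_controller may be in use instead, so adjust as needed):

# systemctl status ovn-controller
# coredumpctl list ovn-controller
# coredumpctl dump ovn-controller -o /tmp/ovn-controller.core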

Thanks,
Dumitru

Comment 16 Itzik Brown 2020-04-27 11:13:10 UTC
ovn-controller is not crashing.

Comment 17 Dumitru Ceara 2020-04-27 12:12:10 UTC
(In reply to Itzik Brown from comment #16)
> ovn-controller is not crashing.

Thanks, moving back to ON_QA based on this.

Regards,
Dumitru

Comment 20 errata-xmlrpc 2020-05-26 14:07:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2318