Bug 1698462 - ovn-controller very high memory/RAM utilization if 2 chassis have same system-id configured (by mistake)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.11
Version: FDP 19.C
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dumitru Ceara
QA Contact: haidong li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-10 12:16 UTC by Alexander
Modified: 2020-01-14 20:27 UTC
CC List: 9 users

Fixed In Version: ovn2.11-2.11.0-18.el7fdn
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-20 11:05:11 UTC
Target Upstream Version:
Embargoed:


Attachments
top host-1 (72.96 KB, image/png), 2019-04-10 12:16 UTC, Alexander
top host-2 (81.83 KB, image/png), 2019-04-10 12:17 UTC, Alexander
hist-1 ovn logs (77.51 KB, application/gzip), 2019-04-10 14:43 UTC, Alexander
hist-2 ovn logs (5.84 MB, application/gzip), 2019-04-10 14:44 UTC, Alexander
host-1 ovn conf (2.79 KB, application/gzip), 2019-04-11 07:28 UTC, Alexander
host-2 ovn conf (2.79 KB, application/gzip), 2019-04-11 07:28 UTC, Alexander
before chassis-id change (2.24 KB, text/plain), 2019-04-15 08:15 UTC, Alexander
after chassis-id change (3.23 KB, text/plain), 2019-04-15 08:16 UTC, Alexander
host-1 ovn logs-1 (406.53 KB, application/gzip), 2019-04-15 08:28 UTC, Alexander
host-2 ovn logs-1 (448.48 KB, application/gzip), 2019-04-15 08:28 UTC, Alexander


Links:
Red Hat Product Errata RHBA-2019:2527 (last updated 2019-08-20 11:05:20 UTC)

Description Alexander 2019-04-10 12:16:31 UTC
Created attachment 1554214 [details]
top host-1

Description of problem:
I have oVirt 4.2 (latest version installed), which was later updated to oVirt 4.3.1-1.
I have two hosts, and on each of them the "ovn-controller" process takes many gigabytes of RAM (please see attachments).
Each oVirt host runs 1-2 VMs (3 VMs total in the cluster), each VM with 8-16GB RAM.

Version-Release number of selected component (if applicable):
# rpm -qa|grep ovirt
python2-ovirt-setup-lib-1.2.0-1.el7.noarch
python-ovirt-engine-sdk4-4.3.0-2.el7.x86_64
ovirt-host-4.3.1-1.el7.x86_64
ovirt-imageio-daemon-1.5.1-0.el7.noarch
cockpit-ovirt-dashboard-0.12.5-1.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch
ovirt-host-dependencies-4.3.1-1.el7.x86_64
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-host-deploy-common-1.8.0-1.el7.noarch
ovirt-release42-4.2.8-1.el7.noarch
cockpit-machines-ovirt-176-4.el7.centos.noarch
ovirt-release43-4.3.2-1.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
ovirt-ansible-repositories-1.1.5-1.el7.noarch
ovirt-provider-ovn-driver-1.2.20-1.el7.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
python2-ovirt-host-deploy-1.8.0-1.el7.noarch
ovirt-hosted-engine-setup-2.3.6-1.el7.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7.noarch
ovirt-imageio-common-1.5.1-0.el7.x86_64
ovirt-ansible-hosted-engine-setup-1.0.13-1.el7.noarch

rpm -qa|grep openvswitch
openvswitch-ovn-common-2.10.1-3.el7.x86_64
python-openvswitch-2.10.1-3.el7.x86_64
openvswitch-2.10.1-3.el7.x86_64
openvswitch-ovn-host-2.10.1-3.el7.x86_64

How reproducible:

N/A
Observed after updating oVirt from 4.2 to 4.3.

Steps to Reproduce:
1.
2.
3.

Actual results:

Excessive RAM utilization by the ovn-controller process.

Expected results:

RAM utilization should not be this high.

Additional info:
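A quick way to capture the per-process memory figures referenced above (a minimal sketch; it only assumes the process is named ovn-controller):

# one batch iteration of top, filtered to the ovn-controller process
top -bcn1 | grep ovn-controller
# resident (RSS) and virtual (VSZ) memory in kB, plus the full command line
ps -C ovn-controller -o pid,rss,vsz,cmd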

Comment 2 Alexander 2019-04-10 12:17:12 UTC
Created attachment 1554215 [details]
top host-2

Comment 3 Numan Siddique 2019-04-10 13:52:13 UTC
Hi Alexander,
We need more information to debug the issue.
Can you please share sosreports, or at least the OVN logs, along with the OVN DB contents if possible?

Thanks

Comment 4 Alexander 2019-04-10 14:43:24 UTC
Created attachment 1554263 [details]
hist-1 ovn logs

Comment 5 Alexander 2019-04-10 14:44:00 UTC
Created attachment 1554264 [details]
hist-2 ovn logs

Comment 6 Alexander 2019-04-10 14:45:17 UTC
(In reply to Numan Siddique from comment #3)
> Hi Alexander,
> We need more information to debug the issue.
> Can you please share sosreports or ovn logs at least along with the OVN DB
> contents if you can.
> 
> Thanks

Numan, where can I find the "OVN DB contents"?

Comment 7 Numan Siddique 2019-04-10 17:53:40 UTC
It depends on how you installed it. If it's RPM based, then please look in /var/lib/openvswitch or /etc/openvswitch/.
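For example (a minimal sketch; the exact paths depend on the packaging, as noted above):

# list the OVS/OVN database files on the host
ls -lh /etc/openvswitch/ /var/lib/openvswitch/
# on the node running the OVN central databases, summarize their contents
ovn-nbctl show
ovn-sbctl show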

Thanks

Comment 8 Numan Siddique 2019-04-10 17:59:21 UTC
It depends on how you installed it. If it's RPM based, then please look in /var/lib/openvswitch or /etc/openvswitch/.

Thanks

Comment 9 Alexander 2019-04-11 07:28:30 UTC
Created attachment 1554430 [details]
host-1 ovn conf

Comment 10 Alexander 2019-04-11 07:28:53 UTC
Created attachment 1554431 [details]
host-2 ovn conf

Comment 11 Alexander 2019-04-11 07:31:20 UTC
(In reply to Numan Siddique from comment #8)
> It depends on how you have installed. If its rpm based then please look in -
> /var/lib/openvswitch or in /etc/openvswitch/
> 
> Thanks

I installed it from the CentOS 7 repos:

rpm -qi openvswitch-ovn-host-2.10.1-3.el7.x86_64
Name        : openvswitch-ovn-host
Epoch       : 1
Version     : 2.10.1
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 15 Mar 2019 06:17:27 PM MSK
Group       : Unspecified
Size        : 168454
License     : ASL 2.0
Signature   : RSA/SHA1, Fri 15 Feb 2019 06:52:13 PM MSK, Key ID 7aebbe8261e8806c
Source RPM  : openvswitch-2.10.1-3.el7.src.rpm
Build Date  : Thu 14 Feb 2019 01:03:44 PM MSK
Build Host  : c1bd.rdu2.centos.org
Relocations : (not relocatable)
Packager    : CBS <cbs>
Vendor      : CentOS
URL         : http://www.openvswitch.org/
Summary     : Open vSwitch - Open Virtual Network support
Description :
OVN, the Open Virtual Network, is a system to support virtual network
abstraction.  OVN complements the existing capabilities of OVS to add
native support for virtual network abstractions, such as virtual L2 and L3
overlays and security groups.

Comment 12 qding 2019-04-11 07:41:57 UTC
Taking the bug.

Comment 13 qding 2019-04-11 07:43:44 UTC
Reverting the accidental modification of the Assignee and Status fields.

Comment 14 Numan Siddique 2019-04-11 08:05:39 UTC
(In reply to Alexander from comment #10)
> Created attachment 1554431 [details]
> host-2 ovn conf

It looks like you have shared the OVS DB (conf.db).

Please look for ovnnb_db.db and ovnsb_db.db.

From the logs, I see that there are a lot of disconnections from the OVN DB servers.

There is a known bug in ovn-controller: it consumes a lot of CPU when it loses the connection to
the OVN SB DB server. CPU usage returns to normal when it reconnects. This patch should address that issue - https://patchwork.ozlabs.org/patch/1076620/

Do you see the CPU usage high all the time?
What is the CPU usage when ovn-controller is connected to the OVN SB DB server?

Can you please look into the logs and check the CPU usage when it is connected? You can figure out the connection status from the logs.
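For example, one way to check the connection status from the logs (a sketch; the default log path /var/log/openvswitch is assumed):

# show the most recent connection-related messages from ovn-controller
grep -E "connect" /var/log/openvswitch/ovn-controller.log | tail -n 20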

Regarding the memory usage, I am not sure what's causing it. Is it constantly increasing?
I'm not sure whether it's because of the SSL connection.

Comment 15 Alexander 2019-04-11 09:14:19 UTC
(In reply to Numan Siddique from comment #14)
> (In reply to Alexander from comment #10)
> > Created attachment 1554431 [details]
> > host-2 ovn conf
> 
> Looks like you have shared - ovs db (conf.db)

What do you mean by "shared"?

[root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
[root@vsrvlab02-1 openvswitch]# 
[root@vsrvlab02-1 openvswitch]# ls -al
total 76
drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 11:30 .
drwxr-xr-x. 133 root        root        12288 Apr  8 16:40 ..
-rw-r--r--    1 openvswitch hugetlbfs   14546 Apr  9 15:56 conf.db
-rw-r--r--    1 root        root        19585 Mar 21 11:41 conf.db.backup7.15.1-3682332033
-rw-------.   1 openvswitch openvswitch     0 Jun 20  2018 .conf.db.~lock~
-rw-r--r--    1 root        root        14546 Apr 11 11:30 conf.db.orig
-rw-------    1 openvswitch openvswitch     0 Mar 21 11:41 .conf.db.tmp.~lock~
-rw-r--r--.   1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
-rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
[root@vsrvlab02-1 openvswitch]#

[root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
[root@vsrvlab02-2 openvswitch]# ls -al
total 60
drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 00:23 .
drwxr-xr-x. 133 root        root        12288 Apr  2 17:20 ..
-rw-r--r--    1 openvswitch hugetlbfs   14546 Apr 11 00:23 conf.db
-rw-r--r--    1 root        root        18511 Mar 21 11:48 conf.db.backup7.15.1-3682332033
-rw-------    1 openvswitch openvswitch     0 Mar 21 11:48 .conf.db.~lock~
-rw-------    1 openvswitch openvswitch     0 Mar 21 11:48 .conf.db.tmp.~lock~
-rw-r--r--    1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
-rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
[root@vsrvlab02-2 openvswitch]#

> 
> Please look for ovnnb_db.db and ovnsdb_db.db

I found these files on the oVirt engine host:

[root@ovirt-eng openvswitch]# pwd
/var/lib/openvswitch
[root@ovirt-eng openvswitch]# ls -alh
total 473M
drwxr-xr-x.  3 root root  109 Apr 11 11:51 .
drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
-rw-r--r--.  1 root root 9.1K Apr  9 18:06 ovnnb_db.db
-rw-------.  1 root root    0 Apr  2 15:23 .ovnnb_db.db.~lock~
-rw-r--r--.  1 root root 425M Apr 11 12:02 ovnsb_db.db
-rw-------.  1 root root    0 Apr  2 15:23 .ovnsb_db.db.~lock~
drwxr-xr-x.  2 root root    6 Feb 14 12:58 pki
[root@ovirt-eng openvswitch]#

> 
> From the logs I see that there are lot of disconnections to the OVN db
> servers ?

Yes

> 
> There is a known bug in ovn-controller - that it consumes lot of CPU when it
> looses connection to
> the OVN SB db server. CPU usage comes to normal when it reconnects back .
> This patch should address that issue -
> https://patchwork.ozlabs.org/patch/1076620/
> 
> Do you see the CPU usage high all the time ?

Yes, it takes 30-40% CPU (in the "top" process viewer) on a Xeon(R) CPU E5-4640 0 @ 2.40GHz.

> What is the CPU usage when ovn-controller is connected to the OVN DB db
> server.

Is it never connected?
[root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection dropped|receive error|Dropped)"
2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
[root@vsrvlab02-1 openvswitch]#

> 
> Can you please look into the logs and see the CPU usage when it is
> connected. You can figure out  the connection status yourself when you see
> the logs.

Please see above.

> 
> Regarding the memory usage, I am not sure what's causing it. Is it
> constantly increasing ?

Yes, it keeps increasing (at one point I saw about 137GB used by the ovn-controller process) until the vdsm daemon is restarted.

> Not sure its because of SSL connection.

Comment 16 Numan Siddique 2019-04-11 09:44:44 UTC
(In reply to Alexander from comment #15)
> (In reply to Numan Siddique from comment #14)
> > (In reply to Alexander from comment #10)
> > > Created attachment 1554431 [details]
> > > host-2 ovn conf
> > 
> > Looks like you have shared - ovs db (conf.db)
> 
> Why shared? And what does it mean?

I meant that in your previous attachments you shared conf.db. We are more interested in ovnnb_db.db and ovnsb_db.db.

It would be great if you can share these files.

> 
> [root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
> [root@vsrvlab02-1 openvswitch]# 
> [root@vsrvlab02-1 openvswitch]# ls -al
> total 76
> drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 11:30 .
> drwxr-xr-x. 133 root        root        12288 Apr  8 16:40 ..
> -rw-r--r--    1 openvswitch hugetlbfs   14546 Apr  9 15:56 conf.db
> -rw-r--r--    1 root        root        19585 Mar 21 11:41
> conf.db.backup7.15.1-3682332033
> -rw-------.   1 openvswitch openvswitch     0 Jun 20  2018 .conf.db.~lock~
> -rw-r--r--    1 root        root        14546 Apr 11 11:30 conf.db.orig
> -rw-------    1 openvswitch openvswitch     0 Mar 21 11:41
> .conf.db.tmp.~lock~
> -rw-r--r--.   1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
> -rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
> [root@vsrvlab02-1 openvswitch]#
> 
> [root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
> [root@vsrvlab02-2 openvswitch]# ls -al
> total 60
> drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 00:23 .
> drwxr-xr-x. 133 root        root        12288 Apr  2 17:20 ..
> -rw-r--r--    1 openvswitch hugetlbfs   14546 Apr 11 00:23 conf.db
> -rw-r--r--    1 root        root        18511 Mar 21 11:48
> conf.db.backup7.15.1-3682332033
> -rw-------    1 openvswitch openvswitch     0 Mar 21 11:48 .conf.db.~lock~
> -rw-------    1 openvswitch openvswitch     0 Mar 21 11:48
> .conf.db.tmp.~lock~
> -rw-r--r--    1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
> -rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
> [root@vsrvlab02-2 openvswitch]#
> 
> > 
> > Please look for ovnnb_db.db and ovnsdb_db.db
> 
> I found this files on ovirt engine host
> 
> [root@ovirt-eng openvswitch]# pwd
> /var/lib/openvswitch
> [root@ovirt-eng openvswitch]# ls -alh
> total 473M
> drwxr-xr-x.  3 root root  109 Apr 11 11:51 .
> drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
> -rw-r--r--.  1 root root 9.1K Apr  9 18:06 ovnnb_db.db
> -rw-------.  1 root root    0 Apr  2 15:23 .ovnnb_db.db.~lock~
> -rw-r--r--.  1 root root 425M Apr 11 12:02 ovnsb_db.db
> -rw-------.  1 root root    0 Apr  2 15:23 .ovnsb_db.db.~lock~
> drwxr-xr-x.  2 root root    6 Feb 14 12:58 pki


Please share these files if you can.

> [root@ovirt-eng openvswitch]#
> 
> > 
> > From the logs I see that there are lot of disconnections to the OVN db
> > servers ?
> 
> Yes
> 
> > 
> > There is a known bug in ovn-controller - that it consumes lot of CPU when it
> > looses connection to
> > the OVN SB db server. CPU usage comes to normal when it reconnects back .
> > This patch should address that issue -
> > https://patchwork.ozlabs.org/patch/1076620/
> > 
> > Do you see the CPU usage high all the time ?
> 
> Yes, it take 30-40% %CPU(in "top" proccess viewer), Xeon(R) CPU E5-4640 0 @
> 2.40GHz
> 
> > What is the CPU usage when ovn-controller is connected to the OVN DB db
> > server.
> 
> Is it never connected?
> [root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection
> dropped|receive error|Dropped)"
> 2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file
> /var/log/openvswitch/ovsdb-server.log
> [root@vsrvlab02-1 openvswitch]#
> 

Can you check ovn-controller.log for the status, i.e. whether it is connected or still trying to connect?

From the ovn-controller.log which you shared earlier, I see that ovn-controller does get connected after a while.

> > 
> > Can you please look into the logs and see the CPU usage when it is
> > connected. You can figure out  the connection status yourself when you see
> > the logs.
> 
> pls see above
> 
> > 
> > Regarding the memory usage, I am not sure what's causing it. Is it
> > constantly increasing ?
> 
> Yes, it's will increase (one time i see about 137GB by ovn-controller
> proccess) while not "restart vdsm" daemon
> 
> > Not sure its because of SSL connection.

Comment 17 Alexander 2019-04-11 10:19:10 UTC
I can't upload the archive; the file is too big.
I'm sharing it via Google Drive instead:
"ovnnb_db.db and ovnsb_db.db" files
https://drive.google.com/open?id=1kCP-k8-aaEwmyNKeGHfCvrUu26p-SgUe

Comment 18 Alexander 2019-04-11 10:22:53 UTC
(In reply to Numan Siddique from comment #16)
> (In reply to Alexander from comment #15)
> > (In reply to Numan Siddique from comment #14)
> > > (In reply to Alexander from comment #10)
> > > > Created attachment 1554431 [details]
> > > > host-2 ovn conf
> > > 
> > > Looks like you have shared - ovs db (conf.db)
> > 
> > Why shared? And what does it mean?
> 
> I meant in your previous attachments, you have shared conf.db. We are more
> interested in ovnnb_db.db and ovnsdb_db.db.
> 
> It would be great if you can share these files.

Attached via the Google Drive URL above.

> 
> > 
> > [root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
> > [root@vsrvlab02-1 openvswitch]# 
> > [root@vsrvlab02-1 openvswitch]# ls -al
> > total 76
> > drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 11:30 .
> > drwxr-xr-x. 133 root        root        12288 Apr  8 16:40 ..
> > -rw-r--r--    1 openvswitch hugetlbfs   14546 Apr  9 15:56 conf.db
> > -rw-r--r--    1 root        root        19585 Mar 21 11:41
> > conf.db.backup7.15.1-3682332033
> > -rw-------.   1 openvswitch openvswitch     0 Jun 20  2018 .conf.db.~lock~
> > -rw-r--r--    1 root        root        14546 Apr 11 11:30 conf.db.orig
> > -rw-------    1 openvswitch openvswitch     0 Mar 21 11:41
> > .conf.db.tmp.~lock~
> > -rw-r--r--.   1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
> > -rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
> > [root@vsrvlab02-1 openvswitch]#
> > 
> > [root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
> > [root@vsrvlab02-2 openvswitch]# ls -al
> > total 60
> > drwxr-xr-x.   2 openvswitch openvswitch  4096 Apr 11 00:23 .
> > drwxr-xr-x. 133 root        root        12288 Apr  2 17:20 ..
> > -rw-r--r--    1 openvswitch hugetlbfs   14546 Apr 11 00:23 conf.db
> > -rw-r--r--    1 root        root        18511 Mar 21 11:48
> > conf.db.backup7.15.1-3682332033
> > -rw-------    1 openvswitch openvswitch     0 Mar 21 11:48 .conf.db.~lock~
> > -rw-------    1 openvswitch openvswitch     0 Mar 21 11:48
> > .conf.db.tmp.~lock~
> > -rw-r--r--    1 openvswitch openvswitch   163 Oct 20 01:57 default.conf
> > -rw-r--r--.   1 openvswitch openvswitch    37 Jun 20  2018 system-id.conf
> > [root@vsrvlab02-2 openvswitch]#
> > 
> > > 
> > > Please look for ovnnb_db.db and ovnsdb_db.db
> > 
> > I found this files on ovirt engine host
> > 
> > [root@ovirt-eng openvswitch]# pwd
> > /var/lib/openvswitch
> > [root@ovirt-eng openvswitch]# ls -alh
> > total 473M
> > drwxr-xr-x.  3 root root  109 Apr 11 11:51 .
> > drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
> > -rw-r--r--.  1 root root 9.1K Apr  9 18:06 ovnnb_db.db
> > -rw-------.  1 root root    0 Apr  2 15:23 .ovnnb_db.db.~lock~
> > -rw-r--r--.  1 root root 425M Apr 11 12:02 ovnsb_db.db
> > -rw-------.  1 root root    0 Apr  2 15:23 .ovnsb_db.db.~lock~
> > drwxr-xr-x.  2 root root    6 Feb 14 12:58 pki
> 
> 
> If you could share these files.

Attached via the Google Drive URL above.

> 
> > [root@ovirt-eng openvswitch]#
> > 
> > > 
> > > From the logs I see that there are lot of disconnections to the OVN db
> > > servers ?
> > 
> > Yes
> > 
> > > 
> > > There is a known bug in ovn-controller - that it consumes lot of CPU when it
> > > looses connection to
> > > the OVN SB db server. CPU usage comes to normal when it reconnects back .
> > > This patch should address that issue -
> > > https://patchwork.ozlabs.org/patch/1076620/
> > > 
> > > Do you see the CPU usage high all the time ?
> > 
> > Yes, it take 30-40% %CPU(in "top" proccess viewer), Xeon(R) CPU E5-4640 0 @
> > 2.40GHz
> > 
> > > What is the CPU usage when ovn-controller is connected to the OVN DB db
> > > server.
> > 
> > Is it never connected?
> > [root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection
> > dropped|receive error|Dropped)"
> > 2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file
> > /var/log/openvswitch/ovsdb-server.log
> > [root@vsrvlab02-1 openvswitch]#
> > 
> 
> Can you see the ovn-controller.log and check the status. i.e if it is
> connected or still trying to connect.
> 
> From the ovn-controller.log whic you shared earlier, I see that
> ovn-controller do get connected after
> a while,

Yes, now I see it too (the logs were rotated overnight).
Since 00:20 AM the ovn-controller.log file has had no new records:
[root@vsrvlab02-1 openvswitch]# cat /var/log/openvswitch/ovn-controller.log
2019-04-11T00:20:01.883Z|00033|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
[root@vsrvlab02-1 openvswitch]#

[root@vsrvlab02-2 openvswitch]# cat /var/log/openvswitch/ovn-controller.log
2019-04-11T00:46:01.264Z|00217|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
[root@vsrvlab02-2 openvswitch]#

> 
> > > 
> > > Can you please look into the logs and see the CPU usage when it is
> > > connected. You can figure out  the connection status yourself when you see
> > > the logs.
> > 
> > pls see above
> > 
> > > 
> > > Regarding the memory usage, I am not sure what's causing it. Is it
> > > constantly increasing ?
> > 
> > Yes, it's will increase (one time i see about 137GB by ovn-controller
> > proccess) while not "restart vdsm" daemon
> > 
> > > Not sure its because of SSL connection.

Comment 19 Alexander 2019-04-12 07:48:47 UTC
Current RAM utilization:

[root@vsrvlab02-1 ~]# top -bc|grep ovn-controller
 14852 root      10 -10   43.4g  43.2g   3076 R  35.3 22.9 893:53.39 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+
 
[root@vsrvlab02-2 openvswitch]# top -bc|grep ovn-controller
 14919 root      10 -10   56.5g  56.3g   3100 S  50.0 29.8   1329:10 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+

Comment 20 Numan Siddique 2019-04-12 12:28:32 UTC
Hi Alexander,

Looking at the conf.db contents from 1698462-host-1-ovn-conf.tar.gz, I see the system-id used is "c28e9fa6-b925-4947-8893-bf202d5d6738".
Looking at the conf.db contents from 1698462-host-2-ovn-conf.tar.gz, I see the same system-id - "c28e9fa6-b925-4947-8893-bf202d5d6738".

In ovnsb_db.db, I can see that both ovn-controllers (host-1 and host-2) are fighting for the chassis name:

***
{"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","ed20e08d-44cd-4842-bc7b-745cc52f83b5"]}},"Encap":{"ed20e08d-44cd-4842-bc7b-745cc52f83b5":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"fbd8f8cb-f584-4e90-85ce-61e25202502c":null},"_date":1554972717993,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 aa50972cc0205d9b26f185da9a23ead771e270b1
{"Encap":{"4b010856-e431-458e-aeff-0d5ec0cbb211":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"ed20e08d-44cd-4842-bc7b-745cc52f83b5":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","4b010856-e431-458e-aeff-0d5ec0cbb211"]}},"_date":1554972717995,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 a6ceb735c89ccbd2728a752becb5db13a80dacd2
{"Encap":{"953720be-bcea-451e-8d04-d02d5b70edf3":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"4b010856-e431-458e-aeff-0d5ec0cbb211":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","953720be-bcea-451e-8d04-d02d5b70edf3"]}},"_date":1554972717996,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 a29b12e5dff94772019ef9898ab08d66a6319b0c
{"Encap":{"953720be-bcea-451e-8d04-d02d5b70edf3":null,"70b15ed9-b0e7-47fa-b52d-6b46f1cf7379":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"}},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","70b15ed9-b0e7-47fa-b52d-6b46f1cf7379"]}},"_date":1554972717997,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 d8f29c6e828f82134dadb93bdbbba8bfac7f24b4
{"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","9727cace-ab58-4922-9c69-48f147397d0a"]}},"Encap":{"9727cace-ab58-4922-9c69-48f147397d0a":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"70b15ed9-b0e7-47fa-b52d-6b46f1cf7379":null},"_date":1554972717998,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 2bbc7b255ef4c38dc98342d842d245d6b93072b6
{"Encap":{"5e899072-6ad5-43cd-8223-3ec847f31388":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"9727cace-ab58-4922-9c69-48f147397d0a":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","5e899072-6ad5-43cd-8223-3ec847f31388"]}},"_date":1554972717999,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}

***

On one of the hosts, can you please change the system-id to some other value and see?

Maybe on host-1: edit the file /etc/openvswitch/system-id.conf, change the id to some other value, and restart the openvswitch and ovn-controller services.

After starting the openvswitch service, run "ovs-vsctl get open . external_ids" and make sure that the system-id displayed there matches the value you edited.

Also, can you please share with me the output of "ovn-sbctl list chassis" before and after changing the system-id on host-1?

Before changing the system-id, also run "watch -n1 ovn-sbctl list chassis" and check whether the chassis list gets updated all the time.

Let me know how it goes. 

Also please share the output of "ovn-sbctl list encap" before and after.
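A minimal sketch of those steps (the CentOS service names and default paths are assumed; the new id below is just an example):

# on host-1 only: give the chassis a new, unique system-id
new_id=$(uuidgen)
echo "$new_id" > /etc/openvswitch/system-id.conf
systemctl restart openvswitch ovn-controller
# verify the running configuration picked up the new id
ovs-vsctl get Open_vSwitch . external_ids:system-id
# on the OVN central node, before and after the change
ovn-sbctl list chassis
ovn-sbctl list encap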

Thanks

Comment 21 Alexander 2019-04-15 08:15:38 UTC
Created attachment 1555158 [details]
before chassis-id change

Comment 22 Alexander 2019-04-15 08:16:19 UTC
Created attachment 1555160 [details]
after chassis-id change

Comment 23 Alexander 2019-04-15 08:28:31 UTC
Created attachment 1555164 [details]
host-1 ovn logs-1

Comment 24 Alexander 2019-04-15 08:28:54 UTC
Created attachment 1555165 [details]
host-2 ovn logs-1

Comment 25 Numan Siddique 2019-04-15 12:33:19 UTC
Thanks.

How is the CPU/memory utilization now ? Is it any better ?

Thanks

Comment 26 Alexander 2019-04-15 13:56:04 UTC
Yes, the RAM utilization is now fine:

[root@vsrvlab02-1 /]# top -bc|grep ovn-controller
97059 root      10 -10  251564  44112   3220 S   0.0  0.0   1:15.27 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+

[root@vsrvlab02-1 /]# top -bc|grep ovs-vswitchd
96940 openvsw+  10 -10 4861948 549068  13700 S   0.0  0.3  78:51.41 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswi+

 
[root@vsrvlab02-2 brick1]# top -bc|grep ovn-controller
88573 root      10 -10  249764  42196   3108 S   0.0  0.0   1:27.49 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+

[root@vsrvlab02-2 brick1]# top -bc|grep ovs-vswitchd
88329 openvsw+  10 -10 4861952 549064  13700 S   0.0  0.3  75:43.87 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswi+

CPU utilization has also returned to normal.

Thank you

Comment 27 Numan Siddique 2019-04-15 14:11:21 UTC
Glad to know it worked.

A couple of issues I see here:

1. After updating from oVirt 4.1 to 4.2, why did the system-id issue come up? You can probably debug this if you want.

2. Why is the memory usage so high when the system-id is the same on 2 nodes? This needs investigation. I will leave the BZ open to investigate further.


Thanks

Comment 28 Alexander 2019-04-15 15:13:51 UTC
(In reply to Numan Siddique from comment #27)
> Glad to know it worked.
> 
> Couple of issues I see here
> 
> 1. After updating from ovirt4.1 to 4.2, why did the system-id issue came ?
> Probably you can debug this if you want

I think it happened after the migration from libvirt+qemu/kvm to oVirt 4.2: I lost the data partition with the VMs on host-1, but I had VM backups on host-2 (with oVirt installed), so I just copied the data partition from host-2 to host-1, deployed GlusterFS and the oVirt self-hosted engine, and then added host-1 and host-2 to it. After that, I updated oVirt from 4.2 to 4.3 and saw the high memory utilization.
It's my first oVirt deployment and I don't have much oVirt experience, plus there were the crude/raw oVirt methods for migrating VMs from libvirt (with disks in qcow2 format) and UI bugs like https://bugzilla.redhat.com/show_bug.cgi?id=1690268 :)
Maybe there should be a check for a unique chassis ID, since virtualization keeps working even with the same chassis ID?
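For instance, a uniqueness check across the hypervisors could look like this (a hypothetical sketch; the host names are placeholders):

for h in host-1 host-2; do
    printf '%s: ' "$h"
    ssh "$h" ovs-vsctl get Open_vSwitch . external_ids:system-id
done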

> 
> 2. Why is the memory usage so high if system-id is same for 2 nodes. This
> needs investigation. I will leave the BZ open to investigate further.

I agree with you. I don't know why it uses so much memory.

Comment 29 Numan Siddique 2019-04-15 15:41:28 UTC
(In reply to Alexander from comment #28)
> (In reply to Numan Siddique from comment #27)
> > Glad to know it worked.
> > 
> > Couple of issues I see here
> > 
> > 1. After updating from ovirt4.1 to 4.2, why did the system-id issue came ?
> > Probably you can debug this if you want
> 
> I think, it's after migration from libvirt+qemu/kvm to oVirt 4.2, when I
> lost data partition with VMs on host-1, but I had VMs backups on host-2 (and
> ovirt installed), and I just copy data partition from host-2 to host-1, had
> deploy glusterfs and ovirt self-hosted engine and then added host-1 and
> host-2 to it. After that, i had update oVirt from 4.2 to 4.3 and saw that a
> lot of memory utilization.
> It's my first oVirt deployment, and I've no too much oVirt experience, plus
> crazy/raw ovirt methods for VM migration from libvirt (with disks in qcow2
> format) and UI bugs like this
> https://bugzilla.redhat.com/show_bug.cgi?id=1690268 :)

Thanks. This explains why the system-id wasn't unique. I was suspecting some bug
in the oVirt OVN module.


> May be there need checking for unique chassis ID, because virtualization
> works (even with same chassis ID)?
> 
> > 
> > 2. Why is the memory usage so high if system-id is same for 2 nodes. This
> > needs investigation. I will leave the BZ open to investigate further.
> 
> Agree with you. I don't know why too much memory.

Comment 31 haidong li 2019-07-29 09:25:43 UTC
I have tested on the latest version; the bug is verified on the latest version and memory usage is low:

[root@dell-per730-19 openvswitch]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-1.0-130.noarch
openvswitch-selinux-extra-policy-1.0-13.el7fdp.noarch
openvswitch2.11-2.11.0-18.el7fdp.x86_64
[root@dell-per730-19 openvswitch]# rpm -qa | grep ovn
kernel-kernel-networking-openvswitch-ovn-1.0-130.noarch
ovn2.11-central-2.11.0-26.el7fdp.x86_64
ovn2.11-2.11.0-26.el7fdp.x86_64
ovn2.11-host-2.11.0-26.el7fdp.x86_64

[root@dell-per730-19 openvswitch]# ovs-vsctl get Open_vSwitch . external-ids:system-id
"hv1"
[root@dell-per730-19 openvswitch]#
top - 04:28:41 up  7:06,  1 user,  load average: 1.28, 0.50, 0.23
Tasks: 446 total,   2 running, 444 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.1 us,  0.9 sy,  0.0 ni, 95.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 65708808 total, 63721112 free,  1502800 used,   484896 buff/cache
KiB Swap: 29241340 total, 29241340 free,        0 used. 63768860 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                       
 9157 root      10 -10  280948   5052   1408 S  69.8  0.0   1:32.92 ovn-controller                                                                
 9194 root      20   0   69200   5352   1736 R  64.8  0.0   1:23.27 ovsdb-server                                                                  
 9129 openvsw+  10 -10 2331468 101288  17932 S   1.3  0.2   0:01.66 ovs-vswitchd                                                                  
    9 root      20   0       0      0      0 S   0.7  0.0   0:20.68 rcu_sched                                                                     
 9436 root      20   0  162304   2632   1580 R   0.7  0.0   0:00.42 top                                                                           
   56 root      20   0       0      0      0 S   0.3  0.0   0:01.04 ksoftirqd/9                                                                   
    1 root      20   0  193908   7088   4216 S   0.0  0.0   0:07.22 systemd                                                                       
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd                                                                      
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H   


[root@dell-per730-57 openvswitch]# ovs-vsctl get Open_vSwitch . external-ids:system-id
"hv1"
[root@dell-per730-57 openvswitch]#
[root@dell-per730-57 openvswitch]# top

top - 04:31:37 up  7:09,  1 user,  load average: 0.25, 0.14, 0.09
Tasks: 465 total,   1 running, 464 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.6 us,  0.2 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65706380 total, 64442260 free,   920140 used,   343980 buff/cache
KiB Swap: 29241340 total, 29241340 free,        0 used. 64350416 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                       
 8784 root      10 -10  280948   3056   1456 S  40.5  0.0   2:03.01 ovn-controller                                                                
 8745 openvsw+  10 -10 3412884 134092  17932 S   1.0  0.2   0:02.67 ovs-vswitchd                                                                  
    9 root      20   0       0      0      0 S   0.3  0.0   0:10.07 rcu_sched                                                                     
 8907 root      20   0  162308   2652   1580 R   0.3  0.0   0:00.05 top                                                                           
    1 root      20   0  194028   7184   4216 S   0.0  0.0   0:06.30 systemd                                                                       
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd                                                                      
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.97 kworker/0:0                                                                   
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
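Presumably the duplicate-id scenario above was reproduced by pointing both chassis at the same system-id before confirming that memory stays low; a minimal sketch of such a setup (commands assumed, not taken from the QA environment):

# run on BOTH hypervisors, deliberately reusing the same id to recreate the original misconfiguration
ovs-vsctl set Open_vSwitch . external_ids:system-id=hv1
systemctl restart ovn-controller
# then watch ovn-controller memory on each host
top -bcn1 | grep ovn-controller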

Comment 33 errata-xmlrpc 2019-08-20 11:05:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2527

