Created attachment 1554214 [details] top host-1

Description of problem:

I have oVirt 4.2 (latest version) installed, which was later updated to oVirt 4.3.1-1. I have two hosts, and on each of them the "ovn-controller" process consumes many gigabytes of RAM (please see the attachments). Each oVirt host runs 1-2 VMs, 3 VMs in total in the cluster, with 8-16 GB of RAM per VM.

Version-Release number of selected component (if applicable):

# rpm -qa | grep ovirt
python2-ovirt-setup-lib-1.2.0-1.el7.noarch
python-ovirt-engine-sdk4-4.3.0-2.el7.x86_64
ovirt-host-4.3.1-1.el7.x86_64
ovirt-imageio-daemon-1.5.1-0.el7.noarch
cockpit-ovirt-dashboard-0.12.5-1.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch
ovirt-host-dependencies-4.3.1-1.el7.x86_64
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-host-deploy-common-1.8.0-1.el7.noarch
ovirt-release42-4.2.8-1.el7.noarch
cockpit-machines-ovirt-176-4.el7.centos.noarch
ovirt-release43-4.3.2-1.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
ovirt-ansible-repositories-1.1.5-1.el7.noarch
ovirt-provider-ovn-driver-1.2.20-1.el7.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
python2-ovirt-host-deploy-1.8.0-1.el7.noarch
ovirt-hosted-engine-setup-2.3.6-1.el7.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7.noarch
ovirt-imageio-common-1.5.1-0.el7.x86_64
ovirt-ansible-hosted-engine-setup-1.0.13-1.el7.noarch

# rpm -qa | grep openvswitch
openvswitch-ovn-common-2.10.1-3.el7.x86_64
python-openvswitch-2.10.1-3.el7.x86_64
openvswitch-2.10.1-3.el7.x86_64
openvswitch-ovn-host-2.10.1-3.el7.x86_64

How reproducible:
N/A - it simply started after the oVirt 4.2 to 4.3 update

Steps to Reproduce:
1.
2.
3.

Actual results:
Excessive RAM utilization by the ovn-controller process

Expected results:
Normal RAM utilization

Additional info:
Created attachment 1554215 [details] top host-2
Hi Alexander,

We need more information to debug the issue.
Can you please share sosreports, or at least the OVN logs, along with the OVN DB contents if you can.

Thanks
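For reference, a minimal sketch of how that data could be collected on CentOS 7 (paths assume the default openvswitch RPM layout, and it is assumed that the OVN DB files sit on the node running the OVN central services):

# Full sosreport from each host
sosreport

# Or, at minimum, the OVS/OVN logs from each host
tar czf ovn-logs-$(hostname).tar.gz /var/log/openvswitch/

# OVN DB contents, collected on the OVN central node
tar czf ovn-dbs.tar.gz /var/lib/openvswitch/ovnnb_db.db /var/lib/openvswitch/ovnsb_db.db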
Created attachment 1554263 [details] hist-1 ovn logs
Created attachment 1554264 [details] hist-2 ovn logs
(In reply to Numan Siddique from comment #3)
> Hi Alexander,
>
> We need more information to debug the issue.
> Can you please share sosreports, or at least the OVN logs, along with the OVN DB
> contents if you can.
>
> Thanks

Numan, where are the "OVN DB contents" located?
It depends on how you installed it. If it's RPM-based, please look in /var/lib/openvswitch or in /etc/openvswitch/.

Thanks
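For illustration, a quick way to check which databases are present in each location (with a stock ovirt-provider-ovn deployment the OVN NB/SB databases are typically on the engine host; this is an assumption, adjust to your setup):

# On each chassis host: conf.db here is the local Open vSwitch database
ls -l /etc/openvswitch/

# On the OVN central node: the OVN Northbound/Southbound databases
ls -lh /var/lib/openvswitch/ovnnb_db.db /var/lib/openvswitch/ovnsb_db.db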
Created attachment 1554430 [details] host-1 ovn conf
Created attachment 1554431 [details] host-2 ovn conf
(In reply to Numan Siddique from comment #8)
> It depends on how you installed it. If it's RPM-based, please look in
> /var/lib/openvswitch or in /etc/openvswitch/.
>
> Thanks

I installed it from the CentOS 7 repos:

rpm -qi openvswitch-ovn-host-2.10.1-3.el7.x86_64
Name        : openvswitch-ovn-host
Epoch       : 1
Version     : 2.10.1
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 15 Mar 2019 06:17:27 PM MSK
Group       : Unspecified
Size        : 168454
License     : ASL 2.0
Signature   : RSA/SHA1, Fri 15 Feb 2019 06:52:13 PM MSK, Key ID 7aebbe8261e8806c
Source RPM  : openvswitch-2.10.1-3.el7.src.rpm
Build Date  : Thu 14 Feb 2019 01:03:44 PM MSK
Build Host  : c1bd.rdu2.centos.org
Relocations : (not relocatable)
Packager    : CBS <cbs>
Vendor      : CentOS
URL         : http://www.openvswitch.org/
Summary     : Open vSwitch - Open Virtual Network support
Description :
OVN, the Open Virtual Network, is a system to support virtual network
abstraction. OVN complements the existing capabilities of OVS to add native
support for virtual network abstractions, such as virtual L2 and L3 overlays
and security groups.
Take
Recovering the incorrect modification of Assignee and Status.
(In reply to Alexander from comment #10)
> Created attachment 1554431 [details]
> host-2 ovn conf

Looks like you have shared the OVS DB (conf.db).

Please look for ovnnb_db.db and ovnsb_db.db.

From the logs I see that there are a lot of disconnections from the OVN DB servers.

There is a known bug in ovn-controller - it consumes a lot of CPU when it loses the connection to the OVN SB DB server. CPU usage comes back to normal when it reconnects. This patch should address that issue - https://patchwork.ozlabs.org/patch/1076620/

Do you see the CPU usage high all the time?
What is the CPU usage when ovn-controller is connected to the OVN SB DB server?

Can you please look into the logs and check the CPU usage when it is connected? You can figure out the connection status yourself from the logs.

Regarding the memory usage, I am not sure what's causing it. Is it constantly increasing?
I am not sure whether it's because of the SSL connection.
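As an illustration, one quick way to correlate the connection state with resource usage on a host (log path per the default RPM layout; the exact reconnect message wording may vary between versions):

# Connection state transitions of ovn-controller towards the SB DB
grep reconnect /var/log/openvswitch/ovn-controller.log | tail -n 20

# CPU/memory of the running ovn-controller at the same moment
top -b -n1 -p "$(pidof ovn-controller)"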
(In reply to Numan Siddique from comment #14)
> (In reply to Alexander from comment #10)
> > Created attachment 1554431 [details]
> > host-2 ovn conf
>
> Looks like you have shared the OVS DB (conf.db).

Why shared? And what does that mean?

[root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
[root@vsrvlab02-1 openvswitch]# ls -al
total 76
drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 11:30 .
drwxr-xr-x. 133 root root 12288 Apr 8 16:40 ..
-rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 9 15:56 conf.db
-rw-r--r-- 1 root root 19585 Mar 21 11:41 conf.db.backup7.15.1-3682332033
-rw-------. 1 openvswitch openvswitch 0 Jun 20 2018 .conf.db.~lock~
-rw-r--r-- 1 root root 14546 Apr 11 11:30 conf.db.orig
-rw------- 1 openvswitch openvswitch 0 Mar 21 11:41 .conf.db.tmp.~lock~
-rw-r--r--. 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
-rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
[root@vsrvlab02-1 openvswitch]#

[root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
[root@vsrvlab02-2 openvswitch]# ls -al
total 60
drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 00:23 .
drwxr-xr-x. 133 root root 12288 Apr 2 17:20 ..
-rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 11 00:23 conf.db
-rw-r--r-- 1 root root 18511 Mar 21 11:48 conf.db.backup7.15.1-3682332033
-rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.~lock~
-rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.tmp.~lock~
-rw-r--r-- 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
-rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
[root@vsrvlab02-2 openvswitch]#

> Please look for ovnnb_db.db and ovnsb_db.db.

I found these files on the oVirt engine host:

[root@ovirt-eng openvswitch]# pwd
/var/lib/openvswitch
[root@ovirt-eng openvswitch]# ls -alh
total 473M
drwxr-xr-x. 3 root root 109 Apr 11 11:51 .
drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
-rw-r--r--. 1 root root 9.1K Apr 9 18:06 ovnnb_db.db
-rw-------. 1 root root 0 Apr 2 15:23 .ovnnb_db.db.~lock~
-rw-r--r--. 1 root root 425M Apr 11 12:02 ovnsb_db.db
-rw-------. 1 root root 0 Apr 2 15:23 .ovnsb_db.db.~lock~
drwxr-xr-x. 2 root root 6 Feb 14 12:58 pki
[root@ovirt-eng openvswitch]#

> From the logs I see that there are a lot of disconnections from the OVN DB servers.

Yes

> There is a known bug in ovn-controller - it consumes a lot of CPU when it loses
> the connection to the OVN SB DB server. CPU usage comes back to normal when it
> reconnects. This patch should address that issue -
> https://patchwork.ozlabs.org/patch/1076620/
>
> Do you see the CPU usage high all the time?

Yes, it takes 30-40% CPU (in the "top" process viewer), Xeon(R) CPU E5-4640 0 @ 2.40GHz

> What is the CPU usage when ovn-controller is connected to the OVN SB DB server?

Is it ever connected?

[root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection dropped|receive error|Dropped)"
2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
[root@vsrvlab02-1 openvswitch]#

> Can you please look into the logs and check the CPU usage when it is connected?
> You can figure out the connection status yourself from the logs.

Please see above.

> Regarding the memory usage, I am not sure what's causing it. Is it constantly
> increasing?

Yes, it keeps increasing (at one point I saw about 137 GB used by the ovn-controller process) until I restart the vdsm daemon.

> I am not sure whether it's because of the SSL connection.
(In reply to Alexander from comment #15)
> (In reply to Numan Siddique from comment #14)
> > (In reply to Alexander from comment #10)
> > > Created attachment 1554431 [details]
> > > host-2 ovn conf
> >
> > Looks like you have shared the OVS DB (conf.db).
>
> Why shared? And what does that mean?

I meant that in your previous attachments you shared conf.db. We are more interested in ovnnb_db.db and ovnsb_db.db.

It would be great if you can share these files.

> [root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
> [root@vsrvlab02-1 openvswitch]# ls -al
> total 76
> drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 11:30 .
> drwxr-xr-x. 133 root root 12288 Apr 8 16:40 ..
> -rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 9 15:56 conf.db
> -rw-r--r-- 1 root root 19585 Mar 21 11:41 conf.db.backup7.15.1-3682332033
> -rw-------. 1 openvswitch openvswitch 0 Jun 20 2018 .conf.db.~lock~
> -rw-r--r-- 1 root root 14546 Apr 11 11:30 conf.db.orig
> -rw------- 1 openvswitch openvswitch 0 Mar 21 11:41 .conf.db.tmp.~lock~
> -rw-r--r--. 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
> -rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
> [root@vsrvlab02-1 openvswitch]#
>
> [root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
> [root@vsrvlab02-2 openvswitch]# ls -al
> total 60
> drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 00:23 .
> drwxr-xr-x. 133 root root 12288 Apr 2 17:20 ..
> -rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 11 00:23 conf.db
> -rw-r--r-- 1 root root 18511 Mar 21 11:48 conf.db.backup7.15.1-3682332033
> -rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.~lock~
> -rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.tmp.~lock~
> -rw-r--r-- 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
> -rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
> [root@vsrvlab02-2 openvswitch]#
>
> > Please look for ovnnb_db.db and ovnsb_db.db.
>
> I found these files on the oVirt engine host:
>
> [root@ovirt-eng openvswitch]# pwd
> /var/lib/openvswitch
> [root@ovirt-eng openvswitch]# ls -alh
> total 473M
> drwxr-xr-x. 3 root root 109 Apr 11 11:51 .
> drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
> -rw-r--r--. 1 root root 9.1K Apr 9 18:06 ovnnb_db.db
> -rw-------. 1 root root 0 Apr 2 15:23 .ovnnb_db.db.~lock~
> -rw-r--r--. 1 root root 425M Apr 11 12:02 ovnsb_db.db
> -rw-------. 1 root root 0 Apr 2 15:23 .ovnsb_db.db.~lock~
> drwxr-xr-x. 2 root root 6 Feb 14 12:58 pki

If you could share these files.

> [root@ovirt-eng openvswitch]#
>
> > From the logs I see that there are a lot of disconnections from the OVN DB servers.
>
> Yes
>
> > There is a known bug in ovn-controller - it consumes a lot of CPU when it loses
> > the connection to the OVN SB DB server. CPU usage comes back to normal when it
> > reconnects. This patch should address that issue -
> > https://patchwork.ozlabs.org/patch/1076620/
> >
> > Do you see the CPU usage high all the time?
>
> Yes, it takes 30-40% CPU (in the "top" process viewer), Xeon(R) CPU E5-4640 0 @ 2.40GHz
>
> > What is the CPU usage when ovn-controller is connected to the OVN SB DB server?
>
> Is it ever connected?
>
> [root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection dropped|receive error|Dropped)"
> 2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
> [root@vsrvlab02-1 openvswitch]#

Can you check the ovn-controller.log for the status, i.e. whether it is connected or still trying to connect?
From the ovn-controller.log which you shared earlier, I see that ovn-controller does get connected after a while.

> > Can you please look into the logs and check the CPU usage when it is connected?
> > You can figure out the connection status yourself from the logs.
>
> Please see above.
>
> > Regarding the memory usage, I am not sure what's causing it. Is it constantly
> > increasing?
>
> Yes, it keeps increasing (at one point I saw about 137 GB used by the ovn-controller process) until I restart the vdsm daemon.
>
> > I am not sure whether it's because of the SSL connection.
Can't upload the archive here, the file is too big. I'm trying via Google Drive instead; the "ovnnb_db.db and ovnsb_db.db" files are at https://drive.google.com/open?id=1kCP-k8-aaEwmyNKeGHfCvrUu26p-SgUe
(In reply to Numan Siddique from comment #16)
> (In reply to Alexander from comment #15)
> > (In reply to Numan Siddique from comment #14)
> > > (In reply to Alexander from comment #10)
> > > > Created attachment 1554431 [details]
> > > > host-2 ovn conf
> > >
> > > Looks like you have shared the OVS DB (conf.db).
> >
> > Why shared? And what does that mean?
>
> I meant that in your previous attachments you shared conf.db. We are more
> interested in ovnnb_db.db and ovnsb_db.db.
>
> It would be great if you can share these files.

Attached via URL to Google Drive.

> > [root@vsrvlab02-1 openvswitch]# cd /etc/openvswitch/
> > [root@vsrvlab02-1 openvswitch]# ls -al
> > total 76
> > drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 11:30 .
> > drwxr-xr-x. 133 root root 12288 Apr 8 16:40 ..
> > -rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 9 15:56 conf.db
> > -rw-r--r-- 1 root root 19585 Mar 21 11:41 conf.db.backup7.15.1-3682332033
> > -rw-------. 1 openvswitch openvswitch 0 Jun 20 2018 .conf.db.~lock~
> > -rw-r--r-- 1 root root 14546 Apr 11 11:30 conf.db.orig
> > -rw------- 1 openvswitch openvswitch 0 Mar 21 11:41 .conf.db.tmp.~lock~
> > -rw-r--r--. 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
> > -rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
> > [root@vsrvlab02-1 openvswitch]#
> >
> > [root@vsrvlab02-2 openvswitch]# cd /etc/openvswitch/
> > [root@vsrvlab02-2 openvswitch]# ls -al
> > total 60
> > drwxr-xr-x. 2 openvswitch openvswitch 4096 Apr 11 00:23 .
> > drwxr-xr-x. 133 root root 12288 Apr 2 17:20 ..
> > -rw-r--r-- 1 openvswitch hugetlbfs 14546 Apr 11 00:23 conf.db
> > -rw-r--r-- 1 root root 18511 Mar 21 11:48 conf.db.backup7.15.1-3682332033
> > -rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.~lock~
> > -rw------- 1 openvswitch openvswitch 0 Mar 21 11:48 .conf.db.tmp.~lock~
> > -rw-r--r-- 1 openvswitch openvswitch 163 Oct 20 01:57 default.conf
> > -rw-r--r--. 1 openvswitch openvswitch 37 Jun 20 2018 system-id.conf
> > [root@vsrvlab02-2 openvswitch]#
> >
> > > Please look for ovnnb_db.db and ovnsb_db.db.
> >
> > I found these files on the oVirt engine host:
> >
> > [root@ovirt-eng openvswitch]# pwd
> > /var/lib/openvswitch
> > [root@ovirt-eng openvswitch]# ls -alh
> > total 473M
> > drwxr-xr-x. 3 root root 109 Apr 11 11:51 .
> > drwxr-xr-x. 41 root root 4.0K Mar 19 11:29 ..
> > -rw-r--r--. 1 root root 9.1K Apr 9 18:06 ovnnb_db.db
> > -rw-------. 1 root root 0 Apr 2 15:23 .ovnnb_db.db.~lock~
> > -rw-r--r--. 1 root root 425M Apr 11 12:02 ovnsb_db.db
> > -rw-------. 1 root root 0 Apr 2 15:23 .ovnsb_db.db.~lock~
> > drwxr-xr-x. 2 root root 6 Feb 14 12:58 pki
>
> If you could share these files.

Attached via URL to Google Drive.

> > [root@ovirt-eng openvswitch]#
> >
> > > From the logs I see that there are a lot of disconnections from the OVN DB servers.
> >
> > Yes
> >
> > > There is a known bug in ovn-controller - it consumes a lot of CPU when it loses
> > > the connection to the OVN SB DB server. CPU usage comes back to normal when it
> > > reconnects. This patch should address that issue -
> > > https://patchwork.ozlabs.org/patch/1076620/
> > >
> > > Do you see the CPU usage high all the time?
> >
> > Yes, it takes 30-40% CPU (in the "top" process viewer), Xeon(R) CPU E5-4640 0 @ 2.40GHz
> >
> > > What is the CPU usage when ovn-controller is connected to the OVN SB DB server?
> >
> > Is it ever connected?
> >
> > [root@vsrvlab02-1 openvswitch]# cat ovsdb-server.log | grep -Ev "(connection dropped|receive error|Dropped)"
> > 2019-04-11T00:20:01.923Z|07165|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
> > [root@vsrvlab02-1 openvswitch]#
>
> Can you check the ovn-controller.log for the status, i.e. whether it is
> connected or still trying to connect?
> From the ovn-controller.log which you shared earlier, I see that ovn-controller
> does get connected after a while.

Yes, now I see that too (the logs were rotated tonight). Since 00:20 AM the ovn-controller.log file has no new records:

[root@vsrvlab02-1 openvswitch]# cat /var/log/openvswitch/ovn-controller.log
2019-04-11T00:20:01.883Z|00033|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
[root@vsrvlab02-1 openvswitch]#

[root@vsrvlab02-2 openvswitch]# cat /var/log/openvswitch/ovn-controller.log
2019-04-11T00:46:01.264Z|00217|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
[root@vsrvlab02-2 openvswitch]#

> > > Can you please look into the logs and check the CPU usage when it is connected?
> > > You can figure out the connection status yourself from the logs.
> >
> > Please see above.
> >
> > > Regarding the memory usage, I am not sure what's causing it. Is it constantly
> > > increasing?
> >
> > Yes, it keeps increasing (at one point I saw about 137 GB used by the ovn-controller process) until I restart the vdsm daemon.
> >
> > > I am not sure whether it's because of the SSL connection.
Current RAM utilization:

[root@vsrvlab02-1 ~]# top -bc|grep ovn-controller
14852 root 10 -10 43.4g 43.2g 3076 R 35.3 22.9 893:53.39 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+

[root@vsrvlab02-2 openvswitch]# top -bc|grep ovn-controller
14919 root 10 -10 56.5g 56.3g 3100 S 50.0 29.8 1329:10 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+
Hi Alexander,

Looking at the conf.db contents from the file 1698462-host-1-ovn-conf.tar.gz, I see the system-id used is "c28e9fa6-b925-4947-8893-bf202d5d6738". And looking at the conf.db contents from the file 1698462-host-2-ovn-conf.tar.gz, I see the same system-id - "c28e9fa6-b925-4947-8893-bf202d5d6738".

In the ovnsb_db.db, I can see that both ovn-controllers (host-1 and host-2) are fighting over the same chassis name:

***
{"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","ed20e08d-44cd-4842-bc7b-745cc52f83b5"]}},"Encap":{"ed20e08d-44cd-4842-bc7b-745cc52f83b5":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"fbd8f8cb-f584-4e90-85ce-61e25202502c":null},"_date":1554972717993,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 aa50972cc0205d9b26f185da9a23ead771e270b1
{"Encap":{"4b010856-e431-458e-aeff-0d5ec0cbb211":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"ed20e08d-44cd-4842-bc7b-745cc52f83b5":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","4b010856-e431-458e-aeff-0d5ec0cbb211"]}},"_date":1554972717995,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 a6ceb735c89ccbd2728a752becb5db13a80dacd2
{"Encap":{"953720be-bcea-451e-8d04-d02d5b70edf3":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"4b010856-e431-458e-aeff-0d5ec0cbb211":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","953720be-bcea-451e-8d04-d02d5b70edf3"]}},"_date":1554972717996,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 a29b12e5dff94772019ef9898ab08d66a6319b0c
{"Encap":{"953720be-bcea-451e-8d04-d02d5b70edf3":null,"70b15ed9-b0e7-47fa-b52d-6b46f1cf7379":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"}},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","70b15ed9-b0e7-47fa-b52d-6b46f1cf7379"]}},"_date":1554972717997,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 d8f29c6e828f82134dadb93bdbbba8bfac7f24b4
{"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-1.domain.ru","encaps":["uuid","9727cace-ab58-4922-9c69-48f147397d0a"]}},"Encap":{"9727cace-ab58-4922-9c69-48f147397d0a":{"ip":"172.25.133.36","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"70b15ed9-b0e7-47fa-b52d-6b46f1cf7379":null},"_date":1554972717998,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
OVSDB JSON 479 2bbc7b255ef4c38dc98342d842d245d6b93072b6
{"Encap":{"5e899072-6ad5-43cd-8223-3ec847f31388":{"ip":"172.25.133.37","options":["map",[["csum","true"]]],"chassis_name":"c28e9fa6-b925-4947-8893-bf202d5d6738","type":"geneve"},"9727cace-ab58-4922-9c69-48f147397d0a":null},"Chassis":{"15baab56-d8a7-4a7e-a817-64961cf88404":{"hostname":"vsrvlab02-2.domain.ru","encaps":["uuid","5e899072-6ad5-43cd-8223-3ec847f31388"]}},"_date":1554972717999,"_comment":"ovn-controller: registering chassis 'c28e9fa6-b925-4947-8893-bf202d5d6738'"}
***

On one of the hosts, can you please change the system-id to some other value and see what happens? For example on host-1, please edit the file /etc/openvswitch/system-id.conf, change the id to some other value, and restart the openvswitch and ovn-controller services. After starting the openvswitch service, run "ovs-vsctl get open . external_ids" and make sure that the system-id displayed there matches the value you set.

Also, can you please share the output of "ovn-sbctl list chassis" before and after changing the system-id on host-1? Before changing the system-id, also run "watch -n1 ovn-sbctl list chassis" and check whether the chassis list keeps getting updated.

Let me know how it goes. Also please share the output of "ovn-sbctl list encap" before and after.

Thanks
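For convenience, a minimal sketch of the steps described above as shell commands (the uuidgen value is only an illustrative way to pick a unique id; the ovn-sbctl commands must be run where the Southbound DB is reachable, e.g. on the engine host or with --db= pointing at it; restarting openvswitch may briefly disrupt OVS-managed networking on the host):

# Before the change: record the chassis/encap state from the SB DB
ovn-sbctl list chassis
ovn-sbctl list encap

# On host-1: give the host a unique system-id (example value via uuidgen)
echo "$(uuidgen)" > /etc/openvswitch/system-id.conf

# Restart the services so the new id is picked up
systemctl restart openvswitch
systemctl restart ovn-controller

# Verify the running OVS uses the new system-id
ovs-vsctl get open . external_ids

# After the change: each host should now own its own chassis record
ovn-sbctl list chassis
ovn-sbctl list encap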
Created attachment 1555158 [details] before chassis-id change
Created attachment 1555160 [details] after chassis-id change
Created attachment 1555164 [details] host-1 ovn logs-1
Created attachment 1555165 [details] host-2 ovn logs-1
Thanks. How is the CPU/memory utilization now? Is it any better?

Thanks
Yes, RAM utilization is now fine:

[root@vsrvlab02-1 /]# top -bc|grep ovn-controller
97059 root 10 -10 251564 44112 3220 S 0.0 0.0 1:15.27 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+
[root@vsrvlab02-1 /]# top -bc|grep ovs-vswitchd
96940 openvsw+ 10 -10 4861948 549068 13700 S 0.0 0.3 78:51.41 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswi+

[root@vsrvlab02-2 brick1]# top -bc|grep ovn-controller
88573 root 10 -10 249764 42196 3108 S 0.0 0.0 1:27.49 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/cert+
[root@vsrvlab02-2 brick1]# top -bc|grep ovs-vswitchd
88329 openvsw+ 10 -10 4861952 549064 13700 S 0.0 0.3 75:43.87 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswi+

CPU utilization has also returned to normal.

Thank you
Glad to know it worked.

A couple of issues I see here:

1. After updating from oVirt 4.1 to 4.2, why did the system-id issue come up? You could debug this further if you want.

2. Why is the memory usage so high when the system-id is the same on 2 nodes? This needs investigation. I will leave the BZ open to investigate further.

Thanks
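In the meantime, a rough way to keep an eye on the suspected growth while this is investigated (the control-socket path and DB file location are assumptions based on the default CentOS 7 packaging; run the last two commands on the OVN central node):

# On each chassis: track ovn-controller memory over time
ps -C ovn-controller -o pid,rss,vsz,cmd

# On the OVN central node: watch the Southbound DB file size; steady growth
# would match the duplicated chassis/encap records seen above
ls -lh /var/lib/openvswitch/ovnsb_db.db

# Optionally compact the SB DB to reclaim space (socket path assumed)
ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound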
(In reply to Numan Siddique from comment #27)
> Glad to know it worked.
>
> A couple of issues I see here:
>
> 1. After updating from oVirt 4.1 to 4.2, why did the system-id issue come up?
> You could debug this further if you want.

I think it happened during the migration from libvirt+qemu/kvm to oVirt 4.2: I lost the data partition with the VMs on host-1, but I had VM backups on host-2 (with oVirt installed), so I simply copied the data partition from host-2 to host-1, deployed GlusterFS and the oVirt self-hosted engine, and then added host-1 and host-2 to it. After that, I updated oVirt from 4.2 to 4.3 and saw the high memory utilization.
It's my first oVirt deployment and I don't have much oVirt experience, plus crazy/raw oVirt methods for migrating VMs from libvirt (with disks in qcow2 format) and UI bugs like this https://bugzilla.redhat.com/show_bug.cgi?id=1690268 :)

Maybe there should be a check for a unique chassis ID, because virtualization keeps working even with a duplicate chassis ID?

> 2. Why is the memory usage so high when the system-id is the same on 2 nodes?
> This needs investigation. I will leave the BZ open to investigate further.

Agree with you. I don't know why it uses so much memory.
(In reply to Alexander from comment #28)
> (In reply to Numan Siddique from comment #27)
> > Glad to know it worked.
> >
> > A couple of issues I see here:
> >
> > 1. After updating from oVirt 4.1 to 4.2, why did the system-id issue come up?
> > You could debug this further if you want.
>
> I think it happened during the migration from libvirt+qemu/kvm to oVirt 4.2: I
> lost the data partition with the VMs on host-1, but I had VM backups on host-2
> (with oVirt installed), so I simply copied the data partition from host-2 to
> host-1, deployed GlusterFS and the oVirt self-hosted engine, and then added
> host-1 and host-2 to it. After that, I updated oVirt from 4.2 to 4.3 and saw
> the high memory utilization.
> It's my first oVirt deployment and I don't have much oVirt experience, plus
> crazy/raw oVirt methods for migrating VMs from libvirt (with disks in qcow2
> format) and UI bugs like this
> https://bugzilla.redhat.com/show_bug.cgi?id=1690268 :)

Thanks. This explains why the system-id wasn't unique. I was suspecting a bug in the oVirt OVN module.

> Maybe there should be a check for a unique chassis ID, because virtualization
> keeps working even with a duplicate chassis ID?
>
> > 2. Why is the memory usage so high when the system-id is the same on 2 nodes?
> > This needs investigation. I will leave the BZ open to investigate further.
>
> Agree with you. I don't know why it uses so much memory.
I have tested on the latest version; the bug is verified there and memory usage is low:

[root@dell-per730-19 openvswitch]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-1.0-130.noarch
openvswitch-selinux-extra-policy-1.0-13.el7fdp.noarch
openvswitch2.11-2.11.0-18.el7fdp.x86_64
[root@dell-per730-19 openvswitch]# rpm -qa | grep ovn
kernel-kernel-networking-openvswitch-ovn-1.0-130.noarch
ovn2.11-central-2.11.0-26.el7fdp.x86_64
ovn2.11-2.11.0-26.el7fdp.x86_64
ovn2.11-host-2.11.0-26.el7fdp.x86_64
[root@dell-per730-19 openvswitch]# ovs-vsctl get Open_vSwitch . external-ids:system-id
"hv1"
[root@dell-per730-19 openvswitch]#

top - 04:28:41 up 7:06, 1 user, load average: 1.28, 0.50, 0.23
Tasks: 446 total, 2 running, 444 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.1 us, 0.9 sy, 0.0 ni, 95.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65708808 total, 63721112 free, 1502800 used, 484896 buff/cache
KiB Swap: 29241340 total, 29241340 free, 0 used. 63768860 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9157 root      10 -10  280948   5052   1408 S  69.8  0.0   1:32.92 ovn-controller
 9194 root      20   0   69200   5352   1736 R  64.8  0.0   1:23.27 ovsdb-server
 9129 openvsw+  10 -10 2331468 101288  17932 S   1.3  0.2   0:01.66 ovs-vswitchd
    9 root      20   0       0      0      0 S   0.7  0.0   0:20.68 rcu_sched
 9436 root      20   0  162304   2632   1580 R   0.7  0.0   0:00.42 top
   56 root      20   0       0      0      0 S   0.3  0.0   0:01.04 ksoftirqd/9
    1 root      20   0  193908   7088   4216 S   0.0  0.0   0:07.22 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H

[root@dell-per730-57 openvswitch]# ovs-vsctl get Open_vSwitch . external-ids:system-id
"hv1"
[root@dell-per730-57 openvswitch]# top

top - 04:31:37 up 7:09, 1 user, load average: 0.25, 0.14, 0.09
Tasks: 465 total, 1 running, 464 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.2 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65706380 total, 64442260 free, 920140 used, 343980 buff/cache
KiB Swap: 29241340 total, 29241340 free, 0 used. 64350416 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8784 root      10 -10  280948   3056   1456 S  40.5  0.0   2:03.01 ovn-controller
 8745 openvsw+  10 -10 3412884 134092  17932 S   1.0  0.2   0:02.67 ovs-vswitchd
    9 root      20   0       0      0      0 S   0.3  0.0   0:10.07 rcu_sched
 8907 root      20   0  162308   2652   1580 R   0.3  0.0   0:00.05 top
    1 root      20   0  194028   7184   4216 S   0.0  0.0   0:06.30 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.97 kworker/0:0
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2527