Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 856273

Summary: [Docs][Scalability][RFE][KBase] - give proper guidelines and documentation on configuring scaled RHEV-M environments
Product: Red Hat Enterprise Virtualization Manager
Reporter: Rami Vaknin <rvaknin>
Component: Documentation
Assignee: Avital Pinnick <apinnick>
Status: CLOSED CURRENTRELEASE
QA Contact: Tahlia Richardson <trichard>
Severity: medium
Docs Contact:
Priority: high
Version: 3.1.0
CC: apinnick, bsettle, dagur, dossow, jcoscia, lpeer, lsurette, mkalinin, mkolbas, mlehrer, obockows, ohochman, pkovar, pstehlik, rdlugyhe, rgolan, Rhev-m-bugs, rhodain, sgordon, srevivo, tdosek, tvvcox
Target Milestone: ovirt-4.2.8
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: docs-accepted
Fixed In Version:
Doc Type: Release Note
Doc Text:
In large-scale deployments of 200 hosts or more, you may need to increase the maximum number of allowed connections on the database server from the default value of 150 to at least 75% of the expected number of hosts. This value is set in the "/var/lib/pgsql/data/postgresql.conf" file on the database server.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-12-03 07:55:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 861705    
Bug Blocks:    
Attachments:
dwh and server logs (flags: none)
engine logs (flags: none)
Logs from case 00750506 (flags: none)
engine.log case 00807367 (flags: none)

Description Rami Vaknin 2012-09-11 15:34:09 UTC
Description of problem:
ovirt-engine-dwh and ovirt-engine can't start because they exceed PostgreSQL's default max_connections value on a loaded RHEV-M installation that runs reporting.

/var/lib/pgsql/data/postgresql.conf:max_connections=150

Scale:
224 hosts (200 fake hosts + 24 bare metal hosts)
200 vms
a few hundred storage domains

Version-Release number of selected component (if applicable):
SI17, rhevm-dwh-3.1.0-12.el6ev.noarch, RHEL6.3
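[Editor's note: given the scale described above, a rough sizing check against the 75% rule of thumb recorded in this bug's doc text. This is a hypothetical helper, not product tooling; the host count is the one from the report.]

```shell
# Rule of thumb: max_connections should be at least 75% of the expected
# number of hosts (integer arithmetic is sufficient here).
hosts=224                          # 200 fake + 24 bare metal, as above
suggested=$(( hosts * 75 / 100 ))  # 75% of the host count
default=150                        # PostgreSQL default in this setup
echo "suggested max_connections >= ${suggested} (default: ${default})"
# -> suggested max_connections >= 168 (default: 150)
```

At this scale the suggested value already exceeds the default of 150, which is consistent with the failure reported here.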

Comment 1 Rami Vaknin 2012-09-11 15:36:23 UTC
Forgot to mention that I'm working with a remote db. I see that the number of open db connections on the db machine (port 5432) is 150, while the number of db connections opened from the rhevm machine is 128.

Comment 2 Itamar Heim 2012-09-11 17:36:15 UTC
rami - can you please check what we configure when using a local postgres - I'm pretty sure we configure a higher number than 150.
for a remote postgres, this would make this a documentation issue (not sure if the installer can check this as part of connecting to the remote db and validating its config. also, we can't assume the remote postgres is only handling our application, so the number may actually need to be higher).

Thanks,
   Itamar

Comment 3 Rami Vaknin 2012-09-11 19:34:16 UTC
Local postgresql is the same - max_connections=150

Note that restarting a remote postgresql just for the dwh installation is not always possible, so when setting max_connections during engine installation it would be better to define max_connections for engine + dwh together.
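[Editor's note: for reference, a minimal sketch of the setting under discussion. The value shown is illustrative only, not a recommendation from this bug.]

```
# /var/lib/pgsql/data/postgresql.conf (illustrative value)
max_connections = 300   # budget for engine + dwh connections, plus headroom
```

Note that changing max_connections requires a PostgreSQL restart to take effect, which is exactly the constraint raised in this comment.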

Comment 4 mkublin 2012-09-16 08:20:58 UTC
Hi Rami, can you please add the engine logs?

Comment 5 Rami Vaknin 2012-09-16 20:28:09 UTC
Created attachment 613503 [details]
dwh and server logs

Comment 6 Rami Vaknin 2012-09-16 20:33:29 UTC
Created attachment 613504 [details]
engine logs

Comment 10 Juan Hernández 2012-09-26 16:08:48 UTC
Not a fix for this bug, but I am proposing the following change just in case we need to make the connection pool easily configurable:

http://gerrit.ovirt.org/8216

Comment 12 mkublin 2012-09-27 14:01:44 UTC
Not so familiar with reports , I found the following configuration inside 
rhevm-reports.war:
<Resource auth="Container" driverClassName="org.postgresql.Driver" maxActive="100" maxIdle="30" maxWait="10000" name="jdbc/jasperserver" password="123456" testOnBorrow="true" type="javax.sql.DataSource" url="jdbc:postgresql://localhost:5432/rhevmreports" username="postgres" validationQuery="SELECT 1"/>

It looks like reports has a max of 100 connections, which means:
engine - 100
reports - 100
postgres - 150.

Comment 13 Juan Hernández 2012-09-27 14:24:27 UTC
In fact reports uses 25 connections max, as far as I know (after asking Yaniv Dary).

That rhevm-reports.war configuration comes from WEB-INF/context.xml, which is not used in JBoss AS7, probably a leftover from JBoss AS5.

The real configuration for reports comes from two places:

1. The Jasper server database is configured in the file /usr/share/ovirt-engine/rhevm-reports.war/WEB-INF/js-jboss7-ds.xml. That uses the JBoss connection pool, and as we don't have an explicit max-pool-size it uses the default 20.

2. All the reports share the same data source, defined in /usr/share/ovirt-engine-reports/reports/resources/reports_resources/JDBC/data_sources/ovirt.xml. This pool is not managed by JBoss but by the reports themselves, and the default configuration uses up to 5 connections for all the reports (it can be changed in /usr/share/ovirt-engine/rhevm-reports.war/WEB-INF/applicationContext.xml):

  <bean id="dataSourceObjectPoolFactory" class="org.apache.commons.pool.impl.GenericObjectPoolFactory">
    <constructor-arg type="org.apache.commons.pool.PoolableObjectFactory">
      <null/>
    </constructor-arg>
    <constructor-arg type="int" value="5"/>
  </bean>

Then we have the up to 100 connections used by the engine, the connection used by the notification service, and the 3 connections used by DWH (I am not 100% sure about DWH, but I have never seen it use more than 3).

So all in all we have a total of 129 connections:

engine 100
notifier 1
dwh 3
reports 25

This is quite close to the limit of 150 in the database server. The moment we have some broken/hung connections or some connections opened by other applications it will trigger this issue.
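[Editor's note: the budget above can be checked back-of-the-envelope. A trivial sketch; the component figures are the ones stated in this comment.]

```shell
# Sum the per-component connection counts from the comment above and
# compare against the database server's max_connections limit.
engine=100; notifier=1; dwh=3; reports=25
total=$(( engine + notifier + dwh + reports ))
limit=150
echo "total=${total}, headroom=$(( limit - total ))"
# -> total=129, headroom=21
```

A headroom of only 21 connections explains why a handful of hung connections, or any other application sharing the database, is enough to hit the limit.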

Comment 14 Juan Hernández 2012-10-02 13:09:22 UTC
The change to make the pool size configurable has been merged:

http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=d0bbbba7a1036422d2175c841740486565e96373

I am not moving the bug to POST, as that change doesn't change the size, it only makes it configurable.

Comment 22 Barak 2012-11-18 15:41:49 UTC
This is not a zstream issue.
for 3.1 it should be handled as a KB by GSS.
We intend to investigate for 3.2.

Comment 23 Yaniv Lavi 2012-12-05 09:16:14 UTC
*** Bug 874477 has been marked as a duplicate of this bug. ***

Comment 25 Tomas Dosek 2012-12-06 09:18:45 UTC
Created attachment 658590 [details]
Logs from case 00750506

Comment 28 Juan Hernández 2012-12-10 18:27:45 UTC
Some thoughts:

1. The default number of threads created by the Quartz scheduler is 100, so if we happen to have 100 or more scheduled tasks, 100 of them will run in parallel and consume at least 1 connection each. During engine start-up one scheduled task is created to initialize each host, so if we have more than 100 hosts we will consume at least 100 connections. We can adjust (reduce) this with the following configuration inside /etc/sysconfig/ovirt-engine:

  ENGINE_PROPERTIES=org.quartz.threadPool.threadCount=10

This should reduce the pressure on the database.

2. The default configuration of the web server is to use one process per connection and to serve up to 256 simultaneous connections, so this could potentially generate up to 256 threads running and consuming database connections in the application server side. This can be relevant for situations where we have many different clients connecting simultaneously to the web server, using the REST API for example or the user portal. This behaviour can be adjusted in two ways:

* We can use the "worker" mode instead of the "prefork" mode. This is usually good for the application server because each Apache process handles several connections with several threads, and those threads can reuse the connections to the application server, thus reducing the load and the number of database connections there. I don't think this is relevant for us because all the requests we receive are redirected to the application server. Anyhow, it is worth testing: change the HTTPD variable in /etc/sysconfig/httpd and restart Apache:

  HTTPD=/usr/sbin/httpd.worker

* We can adjust the maximum number of simultaneous requests accepted by Apache with the MaxClients directive in /etc/httpd/conf/httpd.conf. This should be set so that Apache doesn't accept more connections than the application server and the database can handle.
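[Editor's note: for illustration, the two knobs mentioned above might look like this. Values are examples only; this assumes the Apache 2.2 prefork/worker MPMs shipped with RHEL 6.]

```
# /etc/httpd/conf/httpd.conf (illustrative; Apache 2.2 prefork MPM)
<IfModule prefork.c>
    MaxClients 150   # cap simultaneous requests below what the app server and DB can absorb
</IfModule>

# /etc/sysconfig/httpd -- or switch to the worker MPM, if testing that route
HTTPD=/usr/sbin/httpd.worker
```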

Comment 40 Javier Coscia 2013-04-12 12:39:57 UTC
Created attachment 734723 [details]
engine.log case 00807367

Comment 51 Lucy Bopf 2016-03-14 13:03:36 UTC
Resetting default assignee and QA contact, and moving bug to NEW until assignment.

Comment 52 Yaniv Kaul 2016-03-15 07:17:44 UTC
If QE can test this, I'd be happy to see it in 4.0.

Comment 54 Yaniv Lavi 2016-05-09 11:00:55 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 56 Yaniv Lavi 2016-06-19 13:53:23 UTC
Can we create a kbase for the tip in the doc text?

Comment 57 Yaniv Lavi 2016-12-06 09:43:04 UTC
Can you help with this?

Comment 58 Lucy Bopf 2017-07-11 00:52:18 UTC
Tomas, can you help us identify somebody to write a KBase to cover this specific use case?

Comment 64 mlehrer 2018-10-18 12:55:20 UTC
Spoke with Avital via BlueJeans and reviewed the document; made small tweaks/suggestions. Removing needinfo.

Comment 71 Roy Golan 2018-10-31 08:35:19 UTC
My comments are on the Google doc; it's the easiest way to discuss and share.

Comment 77 Avital Pinnick 2018-12-03 07:55:24 UTC
Published article:

https://access.redhat.com/articles/3641742