Bug 856273
| Summary: | [Docs][Scalability][RFE][KBase] - give proper guidelines and documentation on configuring scaled RHEV-M environments | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Rami Vaknin <rvaknin> |
| Component: | Documentation | Assignee: | Avital Pinnick <apinnick> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tahlia Richardson <trichard> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.1.0 | CC: | apinnick, bsettle, dagur, dossow, jcoscia, lpeer, lsurette, mkalinin, mkolbas, mlehrer, obockows, ohochman, pkovar, pstehlik, rdlugyhe, rgolan, Rhev-m-bugs, rhodain, sgordon, srevivo, tdosek, tvvcox |
| Target Milestone: | ovirt-4.2.8 | Keywords: | FutureFeature, Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | docs-accepted | | |
| Fixed In Version: | | Doc Type: | Release Note |
| Doc Text: | On large-scale deployments of 200 hosts or more, you may need to increase the maximum number of allowed connections on the database server from the default value of 150 to 75% of the expected number of hosts. By default, this value is found in the "/var/lib/pgsql/data/postgresql.conf" file on the database server. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-03 07:55:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 861705 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Rami Vaknin
2012-09-11 15:34:09 UTC
Forgot to mention that I'm working with a remote DB. I see that the number of open DB connections on the DB machine (port 5432) is 150, while the number of open DB connections from the rhevm machine is 128.
Rami, can you please check what we configure when using a local PostgreSQL? I'm pretty sure we configure a higher number than 150. For a remote PostgreSQL, this would make this a documentation issue (not sure if the installer can check this as part of connecting to the remote DB and validating its config; also, we can't assume the remote PostgreSQL is only handling our application, so the number may actually need to be higher). Thanks, Itamar
Local postgresql is the same - max_connections=150
We need to take into account that restarting a remote PostgreSQL just for the DWH installation is not always possible, so when setting max_connections during engine installation it would be better to define max_connections for engine + DWH.
Hi Rami, can you please add the engine logs?
Created attachment 613503 [details]
dwh and server logs
Created attachment 613504 [details]
engine logs
Not a fix for this bug, but I am proposing the following change just in case we need to make the connection pool easily configurable: http://gerrit.ovirt.org/8216
I'm not so familiar with reports, but I found the following configuration inside rhevm-reports.war:
<Resource auth="Container" driverClassName="org.postgresql.Driver" maxActive="100" maxIdle="30" maxWait="10000" name="jdbc/jasperserver" password="123456" testOnBorrow="true" type="javax.sql.DataSource" url="jdbc:postgresql://localhost:5432/rhevmreports" username="postgres" validationQuery="SELECT 1"/>
It looks like reports has a maximum of 100 connections, which means: engine 100, reports 100, PostgreSQL 150.
In fact, reports uses at most 25 connections, as far as I know (after asking Yaniv Dary).
That rhevm-reports.war configuration comes from WEB-INF/context.xml, which is not used in JBoss AS7, probably a leftover from JBoss AS5.
The real configuration for reports comes from two places:
1. The Jasper server database is configured in the file /usr/share/ovirt-engine/rhevm-reports.war/WEB-INF/js-jboss7-ds.xml. That uses the JBoss connection pool, and as we don't have an explicit max-pool-size it uses the default 20.
2. All the reports share the same data source defined in /usr/share/ovirt-engine-reports/reports/resources/reports_resources/JDBC/data_sources/ovirt.xml. This pool is not managed by JBoss but by the reports themselves, and the default configuration uses up to 5 connections for all the reports (it can be changed in /usr/share/ovirt-engine/rhevm-reports.war/WEB-INF/applicationContext.xml):
<bean id="dataSourceObjectPoolFactory" class="org.apache.commons.pool.impl.GenericObjectPoolFactory">
<constructor-arg type="org.apache.commons.pool.PoolableObjectFactory">
<null/>
</constructor-arg>
<constructor-arg type="int" value="5"/>
</bean>
Then we have up to 100 connections used by the engine, one connection used by the notification service, and 3 connections used by DWH (I am not 100% sure about DWH, but I have never seen it use more than 3).
So all in all we have a total of 129 connections:
engine 100
notifier 1
dwh 3
reports 25
This is quite close to the limit of 150 in the database server. The moment we have some broken/hung connections or some connections opened by other applications it will trigger this issue.
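The tally above can be checked with a short sketch. The figures are the ones quoted in this comment; the dictionary layout, the function name `headroom`, and the split of the reports figure into its two pools (20 + 5) are illustrative assumptions, not anything shipped with the engine.

```python
# Connection budget from the comment above; all names here are illustrative.
MAX_CONNECTIONS = 150  # max_connections value reported in this bug

POOLS = {
    "engine": 100,         # engine connection pool
    "notifier": 1,         # notification service
    "dwh": 3,              # observed value, not a guaranteed limit
    "reports_jasper": 20,  # JBoss default pool size for the Jasper database
    "reports_shared": 5,   # shared data source pool used by the reports
}


def headroom(pools, max_connections):
    """Return (total, spare) connections for the given pools."""
    total = sum(pools.values())
    return total, max_connections - total


total, spare = headroom(POOLS, MAX_CONNECTIONS)
print(f"total={total} spare={spare}")  # total=129 spare=21
```

With only 21 spare connections, a handful of hung sessions or a second application on the same PostgreSQL server is enough to reach the limit.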
The change to make the pool size configurable has been merged: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=d0bbbba7a1036422d2175c841740486565e96373
I am not moving the bug to POST, as that change doesn't change the size; it only makes it configurable.
This is not a zstream issue. For 3.1 it should be handled as a KB by GSS. We intend to investigate for 3.2.
*** Bug 874477 has been marked as a duplicate of this bug. ***
Created attachment 658590 [details]
Logs from case 00750506
Some thoughts:
1. The default number of threads created by the Quartz scheduler is 100, so if we happen to have 100 or more scheduled tasks, 100 of them will run in parallel and consume at least 1 connection each. During the startup of the engine one scheduled task is created to initialize each host, so if we have more than 100 hosts we will consume at least 100 connections. We can adjust (reduce) this with the following configuration inside /etc/sysconfig/ovirt-engine:
ENGINE_PROPERTIES=org.quartz.threadPool.threadCount=10
This should reduce the pressure on the database.
2. The default configuration of the web server is to use one process per connection and to serve up to 256 simultaneous connections, so this could potentially generate up to 256 threads running and consuming database connections on the application server side. This can be relevant for situations where we have many different clients connecting simultaneously to the web server, using the REST API or the user portal, for example. This behaviour can be adjusted in two ways:
* We can use the "worker" mode instead of the "prefork" mode. This is usually good for the application server because each Apache process handles several connections with several threads, and those threads can reuse the connections to the application server, thus reducing the load and the number of database connections there. I don't think this is relevant for us because all the requests we receive are redirected to the application server. Anyhow, it is worth testing by changing the HTTPD variable in /etc/sysconfig/httpd and restarting Apache:
HTTPD=/usr/sbin/httpd.worker
* We can adjust the maximum number of simultaneous requests accepted by Apache with the MaxClients directive in /etc/httpd/conf/httpd.conf. This should be set so that Apache doesn't accept more connections than the application server and the database can handle.
Created attachment 734723 [details]
engine.log case 00807367
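The first point in the comment above (one startup task per host, at most threadCount tasks running in parallel, each holding at least one connection) can be sketched as follows. This is a rough lower-bound model under exactly those stated assumptions; `startup_connections` is a hypothetical helper, and real usage also depends on the engine's pool size and on other activity.

```python
def startup_connections(hosts, thread_count=100):
    """Lower bound on connections held by host-initialization tasks.

    Assumes one scheduled task per host and at least one connection per
    running task, as described in the comment above.
    """
    return min(hosts, thread_count)


print(startup_connections(250))      # default Quartz thread count -> 100
print(startup_connections(250, 10))  # threadCount=10 -> 10
```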
Resetting default assignee and QA contact, and moving bug to NEW until assignment.
If QE can test this, I'd be happy to see it in 4.0.
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
Can we create a KBase for the tip in the doc text? Can you help with this?
Tomas, can you help us identify somebody to write a KBase to cover this specific use case?
Spoke with Avital via BlueJeans, reviewed the document, and made small tweaks and suggestions. Removing needinfo.
My comments go on the Google Doc; it's the easiest way to discuss and share.
Published article: https://access.redhat.com/articles/3641742