Bug 874477
| Summary: | Reports fail to create a connection to database and right after fail to close JDBC lease for connection | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Tomas Dosek <tdosek> | ||||||
| Component: | ovirt-engine-reports | Assignee: | Yaniv Lavi <ylavi> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | David Botzer <dbotzer> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 3.0.7 | CC: | bazulay, dyasny, iheim, juan.hernandez, pstehlik, Rhev-m-bugs, sgrinber, tvvcox, ykaul | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | 3.1.2 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | infra | ||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2012-12-05 09:16:14 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Created attachment 640677 [details]
rhevm.log
Here are some basic stats that customer collected for us during the issue. You can see that many postgres threads from rhevm and rhevm reports are left opened and idle for long time: top - 12:25:04 up 25 days, 18:34, 2 users, load average: 0.18, 0.10, 0.09 Tasks: 423 total, 9 running, 414 sleeping, 0 stopped, 0 zombie Cpu0 : 15.9%us, 31.8%sy, 0.0%ni, 52.0%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 17.1%us, 32.9%sy, 0.0%ni, 49.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu2 : 23.8%us, 29.2%sy, 0.0%ni, 46.3%id, 0.3%wa, 0.0%hi, 0.3%si, 0.0%st Cpu3 : 18.2%us, 30.4%sy, 0.0%ni, 51.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu4 : 22.6%us, 30.6%sy, 0.0%ni, 46.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu5 : 17.0%us, 31.6%sy, 0.0%ni, 51.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 16.1%us, 32.4%sy, 0.0%ni, 50.5%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 18.2%us, 31.0%sy, 0.0%ni, 50.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16330452k total, 15539644k used, 790808k free, 304528k buffers Swap: 13766648k total, 294440k used, 13472208k free, 8218072k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2913 postgres 20 0 229m 51m 37m R 22.2 0.3 15:32.77 postmaster 2480 postgres 20 0 228m 50m 37m S 11.9 0.3 14:14.93 postmaster 12419 postgres 20 0 234m 57m 37m S 11.9 0.4 66:52.33 postmaster 13731 postgres 20 0 237m 59m 37m R 11.9 0.4 114:39.17 postmaster 16749 postgres 20 0 238m 61m 37m S 11.9 0.4 96:26.08 postmaster 1397 postgres 20 0 241m 63m 37m S 11.3 0.4 115:46.08 postmaster 17030 postgres 20 0 240m 61m 37m S 11.3 0.4 97:03.90 postmaster 2382 postgres 20 0 229m 52m 37m R 10.9 0.3 12:07.04 postmaster 4699 postgres 20 0 237m 57m 37m S 10.9 0.4 53:26.55 postmaster 7844 postgres 20 0 230m 53m 37m S 10.9 0.3 31:15.54 postmaster 13750 postgres 20 0 241m 61m 37m S 10.9 0.4 120:46.13 postmaster 17031 postgres 20 0 232m 57m 37m S 10.9 0.4 101:59.40 postmaster 17033 postgres 20 0 238m 63m 37m R 10.9 0.4 80:01.09 postmaster 1408 postgres 20 0 240m 63m 37m S 10.6 0.4 112:22.77 postmaster 7910 postgres 20 0 233m 55m 37m S 10.6 0.4 48:19.63 postmaster 9221 postgres 20 0 237m 59m 37m S 10.6 0.4 83:29.28 postmaster 13743 postgres 20 0 240m 63m 37m S 10.6 0.4 107:35.24 postmaster 17029 postgres 20 0 240m 62m 37m S 10.6 0.4 92:18.53 postmaster 17034 postgres 20 0 232m 57m 37m S 10.6 0.4 96:58.75 postmaster 17035 postgres 20 0 237m 59m 37m S 10.6 0.4 97:23.49 postmaster 28777 postgres 20 0 220m 45m 37m S 10.6 0.3 4:41.75 postmaster 29109 postgres 20 0 233m 56m 37m S 10.6 0.4 87:44.67 postmaster 31672 postgres 20 0 234m 58m 37m S 10.6 0.4 97:55.69 postmaster 3766 postgres 20 0 238m 60m 37m R 10.3 0.4 72:48.45 postmaster 4697 postgres 20 0 236m 58m 37m R 10.3 0.4 48:54.06 postmaster 12211 postgres 20 0 233m 55m 37m S 10.3 0.3 19:58.43 postmaster 17026 postgres 20 0 238m 61m 37m S 10.3 0.4 97:00.58 postmaster 28298 postgres 20 0 237m 59m 37m S 10.3 0.4 70:48.95 postmaster 2570 postgres 20 0 231m 56m 37m R 9.9 0.4 24:10.65 postmaster [root@jerez1 log]# ps -fe | grep postgres postgres 1397 7973 8 Nov05 ? 01:55:46 postgres: rhevm rhevm 127.0.0.1(51477) idle postgres 1408 7973 8 Nov05 ? 01:52:25 postgres: rhevm rhevm 127.0.0.1(51488) SELECT postgres 1918 7973 6 09:07 ? 00:12:15 postgres: rhevm rhevm 127.0.0.1(42892) idle postgres 1985 7973 6 09:07 ? 00:13:48 postgres: rhevm rhevm 127.0.0.1(42970) SELECT postgres 1986 7973 3 09:07 ? 00:07:11 postgres: rhevm rhevm 127.0.0.1(42971) SELECT postgres 2190 7973 0 11:56 ? 00:00:03 postgres: postgres rhevmreports 127.0.0.1(57686) idle postgres 2191 7973 0 11:56 ? 00:00:00 postgres: postgres rhevmreports 127.0.0.1(57687) idle postgres 2382 7973 6 09:09 ? 00:12:10 postgres: rhevm rhevm 127.0.0.1(43449) BIND postgres 2480 7973 7 09:09 ? 00:14:18 postgres: rhevm rhevm 127.0.0.1(43570) SELECT postgres 3766 7973 8 Nov05 ? 01:12:51 postgres: rhevm rhevm 127.0.0.1(40140) SELECT postgres 4696 7973 7 01:16 ? 00:50:35 postgres: rhevm rhevm 127.0.0.1(51754) idle postgres 4697 7973 7 01:16 ? 00:48:57 postgres: rhevm rhevm 127.0.0.1(51755) SELECT postgres 4699 7973 7 01:16 ? 00:53:29 postgres: rhevm rhevm 127.0.0.1(51757) SELECT postgres 5245 7973 0 12:11 ? 00:00:00 postgres: postgres rhevmreports 127.0.0.1(33444) idle postgres 5246 7973 0 12:11 ? 00:00:00 postgres: postgres rhevmreports 127.0.0.1(33445) idle postgres 7844 7973 6 04:12 ? 00:31:18 postgres: rhevm rhevm 127.0.0.1(47703) SELECT postgres 7845 7973 7 04:12 ? 00:38:35 postgres: rhevm rhevm 127.0.0.1(47705) idle postgres 7863 7973 7 04:12 ? 00:37:59 postgres: rhevm rhevm 127.0.0.1(47725) SELECT postgres 7910 7973 7 01:32 ? 00:48:22 postgres: rhevm rhevm 127.0.0.1(59346) SELECT postgres 7973 1 0 Oct11 ? 00:01:42 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data postgres 8070 7973 0 Oct11 ? 00:00:37 postgres: logger process root 8073 9747 0 12:25 pts/0 00:00:00 grep postgres postgres 8074 7973 0 Oct11 ? 02:14:40 postgres: writer process postgres 8075 7973 0 Oct11 ? 00:08:54 postgres: wal writer process postgres 8076 7973 0 Oct11 ? 00:02:14 postgres: autovacuum launcher process postgres 8077 7973 0 Oct11 ? 00:19:36 postgres: stats collector process postgres 9221 7973 8 Nov05 ? 01:23:32 postgres: rhevm rhevm 127.0.0.1(34942) SELECT root 9301 1 1 Oct11 ? 07:49:27 java -Xms256M -Xmx1024M -cp /usr/share/rhevm-dwh/etl:/usr/share/rhevm-dwh/etl/historyETLProcedure-3.0.2.jar:/usr/share/rhevm-dwh/etl/lib/advancedPersistentLookupLib-1.0.jar:/usr/share/rhevm-dwh/etl/lib/commons-collections.jar:/usr/share/rhevm-dwh/etl/lib/jboss-serialization.jar:/usr/share/rhevm-dwh/etl/lib/log4j.jar:/usr/share/rhevm-dwh/etl/lib/postgresql-jdbc.jar:/usr/share/rhevm-dwh/etl/lib/trove.jar:/usr/share/rhevm-dwh/etl/lib/dom4j.jar:/usr/share/rhevm-dwh/etl/lib/talendRoutines-4.2.2.r63143.jar:/usr/share/rhevm-dwh/etl/lib/xml-apis-1.0.b2.jar rhevm_reports.historyetl_3_0.HistoryETL --context=Default postgres 9348 7973 0 Oct11 ? 00:11:44 postgres: postgres rhevm 127.0.0.1(52662) idle postgres 9349 7973 0 Oct11 ? 03:06:37 postgres: postgres rhevm_history 127.0.0.1(52663) idle postgres 9350 7973 0 Oct11 ? 00:32:58 postgres: postgres rhevm_history 127.0.0.1(52664) idle in transaction postgres 9351 7973 0 Oct11 ? 00:00:35 postgres: postgres rhevm_history 127.0.0.1(52665) idle postgres 9352 7973 0 Oct11 ? 00:36:49 postgres: postgres rhevm_history 127.0.0.1(52666) idle in transaction postgres 12211 7973 6 07:14 ? 00:20:01 postgres: rhevm rhevm 127.0.0.1(41475) SELECT postgres 12419 7973 8 Nov05 ? 01:06:55 postgres: rhevm rhevm 127.0.0.1(40304) SELECT postgres 13731 7973 8 Nov05 ? 01:54:42 postgres: rhevm rhevm 127.0.0.1(47062) SELECT postgres 13734 7973 9 Nov05 ? 02:00:31 postgres: rhevm rhevm 127.0.0.1(47064) idle postgres 13741 7973 8 Nov05 ? 01:47:03 postgres: rhevm rhevm 127.0.0.1(47071) SELECT postgres 13743 7973 8 Nov05 ? 01:47:38 postgres: rhevm rhevm 127.0.0.1(47073) SELECT postgres 13746 7973 8 Nov05 ? 01:54:02 postgres: rhevm rhevm 127.0.0.1(47076) idle postgres 13750 7973 9 Nov05 ? 02:00:49 postgres: rhevm rhevm 127.0.0.1(47080) SELECT postgres 14785 7973 7 02:05 ? 00:44:30 postgres: rhevm rhevm 127.0.0.1(40370) SELECT postgres 16747 7973 8 Nov05 ? 01:28:45 postgres: rhevm rhevm 127.0.0.1(48960) idle postgres 16749 7973 8 Nov05 ? 01:36:28 postgres: rhevm rhevm 127.0.0.1(48963) idle postgres 16751 7973 8 Nov05 ? 01:32:29 postgres: rhevm rhevm 127.0.0.1(48965) idle postgres 17026 7973 8 Nov05 ? 01:37:03 postgres: rhevm rhevm 127.0.0.1(51112) SELECT postgres 17029 7973 8 Nov05 ? 01:32:21 postgres: rhevm rhevm 127.0.0.1(51115) SELECT postgres 17030 7973 8 Nov05 ? 01:37:06 postgres: rhevm rhevm 127.0.0.1(51116) SELECT postgres 17031 7973 9 Nov05 ? 01:42:02 postgres: rhevm rhevm 127.0.0.1(51117) SELECT postgres 17033 7973 7 Nov05 ? 01:20:04 postgres: rhevm rhevm 127.0.0.1(51119) SELECT postgres 17034 7973 8 Nov05 ? 01:37:01 postgres: rhevm rhevm 127.0.0.1(51120) SELECT postgres 17035 7973 8 Nov05 ? 01:37:26 postgres: rhevm rhevm 127.0.0.1(51122) SELECT postgres 17036 7973 8 Nov05 ? 01:35:07 postgres: rhevm rhevm 127.0.0.1(51123) idle postgres 17037 7973 9 Nov05 ? 01:44:05 postgres: rhevm rhevm 127.0.0.1(51125) SELECT postgres 18000 7973 8 05:02 ? 00:35:54 postgres: rhevm rhevm 127.0.0.1(33268) idle postgres 20435 7973 8 Nov05 ? 01:51:17 postgres: rhevm rhevm 127.0.0.1(58210) SELECT postgres 22037 7973 8 10:51 ? 00:07:52 postgres: rhevm rhevm 127.0.0.1(40941) SELECT postgres 22038 7973 4 10:51 ? 00:04:13 postgres: rhevm rhevm 127.0.0.1(40942) idle postgres 22040 7973 6 10:51 ? 00:06:05 postgres: rhevm rhevm 127.0.0.1(40946) idle postgres 28298 7973 8 Nov05 ? 01:10:51 postgres: rhevm rhevm 127.0.0.1(38574) idle postgres 28486 7973 0 11:26 ? 00:00:00 postgres: postgres rhevmreports 127.0.0.1(49853) idle postgres 28777 7973 8 11:27 ? 00:04:44 postgres: rhevm rhevm 127.0.0.1(50128) SELECT postgres 28778 7973 4 11:27 ? 00:02:45 postgres: rhevm rhevm 127.0.0.1(50132) idle postgres 28780 7973 5 11:27 ? 00:03:26 postgres: rhevm rhevm 127.0.0.1(50134) idle postgres 29109 7973 8 Nov05 ? 01:27:47 postgres: rhevm rhevm 127.0.0.1(38069) SELECT postgres 31672 7973 8 Nov05 ? 01:37:58 postgres: rhevm rhevm 127.0.0.1(52862) SELECT [root@jerez1 log]# Yaniv, Do we think it's zstream ? Do we know what the problem is ? How easy/hard to fix ? (In reply to comment #3) > Yaniv, > > Do we think it's zstream ? > Do we know what the problem is ? How easy/hard to fix ? Juan, you explored the JRS connection usage. any clue on this? Yaniv As far as I know the reports application uses its own connection pool, separated from the one used by the engine, at least this it how it works in 3.1. We found that while working in a different bug, see [1]. So if database connection pool is exhausted it should take at most 20 connections, not all the connections available for the engine. Tomas, you say that there are lot of PostgreSQL threads idle for a long time, but what I see in the output of top in comment #2 is that the have a lot of accumulated CPU time, which means that they have been very busy. Can you elaborate? [1] https://bugzilla.redhat.com/show_bug.cgi?id=856273#c13 In the rhevm.log file attached the database errors I see haven't been generated by reports, but by the VDS broker, so this looks more and more similar to bug 856273. Why do we think that this is related to reports? We've reproduced this on environment where database was overloaded with reports data and was not cleaned up properly. Looking at the bug you've posted here it seems you're right and this one can be closed as duplicate. Medium severity, proposing to 3.1.2, not critical for 3.1.1. *** This bug has been marked as a duplicate of bug 856273 *** |
Created attachment 640676 [details] rhevm reports log Description of problem: This is a scenario that happened within large deployment of RHEV-M with RHEV-M reports. (environment containing 30 hosts and around 350 VMs). During some database operation (not yet being reproduced in our test environment, partially probably because we're not able to simulate that large deployment) RHEV-M reports try to create a connection to database using JDBC connection pool in jboss. This operation fails, but the connection lease from the pool fails to rollback. As reports try to allocate this lease again and again (all failing to rollback) jdbc connection pool, in pretty short time, contains no allocable connection threads to database which rapidly slows down operation of both RHEV-M and RHEV-M reports (for instance login procedure takes about 10 minutes). This issue disappears while restarting rhevm and rhevm reports (as all database connections are forcibly closed). Version-Release number of selected component (if applicable): Sep 14 18:56:26 Updated: rhevm-3.0.7_0001-2.el6_3.x86_64 Apr 24 12:22:07 Installed: rhevm-reports-dwh-3.0.2-3.noarch Apr 24 12:22:13 Installed: rhevm-reports-server-3.0.2-3.noarch Apr 24 12:22:14 Installed: rhevm-reports-3.0.2-3.noarch How reproducible: Very hard to reproduce - only in large deployments Steps to Reproduce: Have a large rhev deployment - due to knowing bug some DB tables are growing rapidly and not being cleaned properly. After some time this issue happens. Actual results: All connection leases from jdbc connection pool are opened even though that some of these should be automatically cleaned Expected results: JDBC connection rollback procedure should be finished with no problem and connection lease should be released back to jdbc connection pool. Additional info: Please see attached log files.