Description of problem: For Oracle 11g. Several call-time metrics received in one batch are assigned to the first resource in the batch. Version-Release number of selected component (if applicable): Reproduced on 3.0.0 but as I could see from the source should be appliciable for 4.0.0 also. How reproducible: Server should receive a batch of call-time metrics from different resources at the same time. Steps to Reproduce: 1.Turn on call-time metric collection for different resources. For example "Method Invocation Time" for RHQ Server EJB3 session beans. Or use any other resources wich emit call-time metrics 2. Wait for collection 3. Check actual response times at Monitor-Response tab. Actual results: For some resources not all responses are shown. For other resources some extra responses are shown. Expected results: Each response is displayed on Response tab for apropriate resource. Additional info: Reproduced with "Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production" and the latest 10 and 11 Oracle JDBC drivers. The actual problem is with batch insert queries. Queries for INSERT INTO RHQ_CALLTIME_DATA_VALUE and RHQ_CALLTIME_DATA_VALUE tables have some subqueries with parameters. These parameters doesn't change and have the same values (values of the first record) for whole batch. Actualy I think it's Oracle bug. But it's very important for RHQ. I would sugest not to perform batch insert in Oracle at all to avoid this situation.
Pavel, to get this right: you have e.g. 2 resources with 1 call time schedule each and the data and if they are transmitted in one batch, data for both ends up at e.g. the first resource? Heiko
Heiko, You understand that right. Both call-time records are displayed on Response time of the first resource (the first in order of the received batch).
Pavel, I've written a test ( http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=blob;f=modules/enterprise/server/jar/src/test/java/org/rhq/enterprise/server/measurement/test/MeasurementDataManagerTest.java ) to test this from the insert side - do you have a chance to run this in your env? This works for me on Postgres, but as you reported the issue on Oracle, it does not prove anything. I will also try to run it against one of our Oracle instances. Thanks Heiko
Heiko, As I could see this test should reproduce the problem. I could try this on my Oracle instance, but I have no expirience running rhq tests. If you could give me guidelines how to run it, and set it up with particular database backend, I would try.
Pavel, great that the test seems to do what it should. I think you should have a look at http://rhq-project.org/display/RHQ/Building+RHQ#BuildingRHQ-CheckoutRHQSources that talks about how to check out the sources and build RHQ. To set up the database you add an entry in ~/.m2/settings.xml a la: <profile> <id>oracle11</id> <activation> <activeByDefault>false</activeByDefault> </activation> <properties> <rhq.test.ds.user-name>rhqadmin</rhq.test.ds.user-name> <rhq.test.ds.password>rhqadmin</rhq.test.ds.password> <rhq.test.ds.type-mapping>Oracle10g</rhq.test.ds.type-mapping> <rhq.test.ds.driver-class>oracle.jdbc.driver.OracleDriver</rhq.test.ds.driver-class> <rhq.test.ds.connection-url>jdbc:oracle:thin:@pickeldi:1521:orcl</rhq.test.ds.connection-url> <rhq.test.ds.xa-datasource-class>oracle.jdbc.xa.client.OracleXADataSource</rhq.test.ds.xa-datasource-class> <rhq.test.ds.server-name>172.7.31.8</rhq.test.ds.server-name> <rhq.test.ds.port>1521</rhq.test.ds.port> <rhq.test.ds.db-name>xe</rhq.test.ds.db-name> <!-- <rhq.test.ds.connection-url>jdbc:jon-log:jdbc:oracle:thin:@dev01.qa.atl.jboss.com:1521:qadb01</rhq.test.ds.connection-url> <rhq.test.ds.driver-class>org.jboss.on.db.log.LoggerDriver</rhq.test.ds.driver-class> --> </properties> </profile> And then start the test with $ cd modules/entreprise/server/jar $ mvn -Poracle11 -Dtest=MeasurementDataManagerTest test If you have questions, join us in #rhq on irc.freenode.net
Heiko, I run your test on my Oracle instance. Test have failed as expected! The result is List 2 returned 0 entries, expected was 1 org.rhq.enterprise.server.measurement.test.MeasurementDataManagerTest.bz658491(MeasurementDataManagerTest.java:163) After run I have only one record in RHQ_CALLTIME_DATA_KEY with CALL_DESTINATION=/foo and two records in RHQ_CALLTIME_DATA_VALUE. Regards, Pavel
Investigation shows that bff7953 introduced the issue as direct path inserts ( /*+APPEND */ ) do not allow any "read after write" within the same transaction.
root cause here is insertion of data using a sub-query to the same table using append hints: INSERT /*+ APPEND */ INTO SomeTable(col...) SELECT (col...) FROM SomeTable WHERE ... the first time this is done in a transaction it will succeed, but the second time it happens it will fail because the select is trying to read from a table with direct path inserts. oddly enough, if things hadn't been batched, it would have produced a very evident trace in the log as soon as the hint was added and deployed. could the way we handle testing for batch errors be incorrect: int insertedRowCount = 0; for (int i = 0; i < results.length; i++) { // oracle returns -2 because it can't count updated rows if (((results[i] < 0) || (results[i] > 1)) && (results[i] != -2)) { throw new MeasurementStorageException( "Failed to insert call-time data key rows - result [" + results[i] + "] for batch command [" + i + "] is less than 0 or greater than 1."); } // If Oracle returns -2, just count 1 row insertedRowCount += results[i]==-2 ? 1 : results[i]; } lastly, this should have been caught by our unit tests, since we're seeing DataPurgeJobTest.createNewCalltimeData is fail today.
Fixed in 9a7672945dc3c4890b2d491e91a6c36fc633b37a in master.
Removing the blocker on the JON241 version of this bug.
this has re-appeared. I suspect this might never have been fixed in teh first place. The unit tests in place all pass, but only on Oracle 10g Express Edition. They fail on Oracle 11g Enterprise Edition. I suspect the problem is not in the version, but the edition (Express vs Enterprise).
notes: here's the two sql inserts we issue, first inserts into the calltime data key table, second inserts into calltime data value table. for the tests, we batch insert two rows in each. note that the second query, I think we can remove the join to the rhq_numbers query - probably just a cut-n-paste mistake - no need to join it like we needed to in the first query INSERT INTO RHQ_CALLTIME_DATA_KEY(id, schedule_id, call_destination) SELECT RHQ_calltime_data_key_id_seq.nextval, ?1, ?2 FROM RHQ_numbers WHERE i = 42 AND NOT EXISTS (SELECT * FROM RHQ_CALLTIME_DATA_KEY WHERE schedule_id = ?3 AND call_destination = ?4) 1,3 = 10005 2,4 = /foo 1,3 = 10006 2,4 = /bar INSERT /*+ APPEND */ INTO RHQ_CALLTIME_DATA_VALUE(id, key_id, begin_time, end_time, minimum, maximum, total, count) SELECT RHQ_calltime_data_value_id_seq.nextval, key.id, ?1, ?2, ?3, ?4, ?5, ?6 FROM RHQ_numbers num, RHQ_calltime_data_key key WHERE num.i = 42 AND key.id = (SELECT id FROM RHQ_CALLTIME_DATA_KEY WHERE schedule_id = ?7 AND call_destination = ?8) 1 = 1310660707888 (begin time) long 2 = 1310660707988 (end time) long 3 = 100.0 (min) double 4 = 100.0 (max) double 5 = 100.0 (total) double 6 = 1 (count) long 7 = 10005 (sched id) 8 = /foo 1 = 1310660707800 (begin time) long 2 = 1310660707900 (end time) long 3 = 200.0 (min) double 4 = 200.0 (max) double 5 = 200.0 (total) double 6 = 1 (count) long 7 = 10006 (sched id) 8 = /bar
unsure why this query has that subselect but is appears the parameter being bound is getting fixed for the batch (not getting substituted for each individual query in the batch). Here's what we are going to use now, the 4 failing tests are passing with this (notice we removed the rhq_number join, don't see a need for that, plus we removed the sub-select and just use "key" entity in where clause): private static final String CALLTIME_VALUE_INSERT_STATEMENT = "INSERT /*+ APPEND */ INTO " + DATA_VALUE_TABLE_NAME + "(id, key_id, begin_time, end_time, minimum, maximum, total, count) " + "SELECT %s, key.id, ?, ?, ?, ?, ?, ? FROM " + DATA_KEY_TABLE_NAME + " key WHERE key.schedule_id = ? AND key.call_destination = ?";
running tests on jenkins to see if this fixes it. Will move to ON_QA once done. here's how to test: 1) you MUST be using Oracle 11g ENTERPRISE edition! Oracle XE versions are not good enough - they do not exhibit the bug - must use *Enterprise* Edition 2) import two resources into inventory that support calltime data (you can use two EJBs from under the RHQ Server resource - so just import RHQ Server resource and you'll use two of its Session EJBs) Pick two EJBs that are used alot - like the SubjectManagerBean and the AuthorizationManagerBean) 3) Enable calltime metric for both resources and make sure they run at the same interval. I would set the collection interval to 30s (the lowest possible) to ensure the best possibility that both resources' calltime data will be sent in the same report. 4) If you are using RHQ Server's EJB resources, start clicking aroudn the UI alot, make sure yuo hit pages that require you to be auth'ed to see things - like resource tabs (inventory, measurement, configuration, etc). 5) after several minutes, go check that you have calltime data in both of the resources. If you see calltime data for both resources, we can assume this bug is fixed.
there is an automated test for this ... and the test is passing.
Bulk closing of issues that were VERIFIED, had no target release and where the status changed more than a year ago.