Bug 758724

Summary: Transaction timing can prevent drift file content from being persisted
Product: [Other] RHQ Project Reporter: John Sanda <jsanda>
Component: drift Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.2 CC: jshaughn
Target Milestone: ---   
Target Release: RHQ 4.3.0   
Hardware: All   
OS: All   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-31 05:56:25 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 752488, 707225    

Description John Sanda 2011-11-30 10:32:21 EST
Description of problem:
A transactional timing issue can occur when the RHQ server persists a change set and subsequently requests file content from the agent. The result is an exception that prevents the file content from being persisted. Here is the relevant part of the exception from the server log:

2011-11-30 07:05:38,984 INFO [org.rhq.enterprise.server.drift.JPADriftServerBean] Skipping bad drift file
javax.ejb.EJBException: java.lang.IllegalArgumentException: JPADriftFile not found [eec86c6712976844ffe31411c982fc7b6aa6d4e89b6c759273cf6c888872efb1]

Here is what happens when I reproduce the issue. I have drift definitions for two EAP servers. The agent sends the initial change set report for EAP server 1. The RHQ server processes the report, creating and persisting drift records as well as JPADriftFile instances as needed. The RHQ server then sends a request to the agent for the content of each JPADriftFile that is created.

While the agent is gathering content, the drift detector task kicks off and generates the initial change set report for EAP server 2. The RHQ server processes it and sends a request for content to the agent. The number of files for which the RHQ server requests content for EAP server 2 will be very small because most of that content has already been requested for EAP server 1. Because the number of files is small, the agent is able to process that request and send the content back to the server before the large transaction that is processing the initial change set has committed.

When the agent sends file content to the server, the server assumes that there is already a JPADriftFile in the database for each file that the agent is sending. This makes sense because the agent should only be sending content for files that the server knows about. The problem, though, is that the transaction in which the JPADriftFiles are created has not yet been committed. So to the thread handling the file content sent from the agent, the agent is sending some files that the server does not know about; hence the IllegalArgumentException above.
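
The visibility problem can be illustrated with a small, self-contained sketch. It is not RHQ code: the class DriftRaceSketch, the nested Tx class, and the method names are hypothetical stand-ins, and the "database" is just an in-memory set in which rows become visible to other callers only after commit().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the race; none of these names exist in RHQ.
class DriftRaceSketch {
    // Stand-in for the database: only committed hashes are visible to other transactions.
    static final Set<String> committed = new HashSet<>();

    // Stand-in for the long-running transaction that processes a change set.
    static class Tx {
        final Set<String> pending = new HashSet<>();

        void persistDriftFile(String sha) {
            pending.add(sha); // written, but not yet visible outside this transaction
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Mirrors the server's assumption: content may only arrive for a known drift file.
    static void storeContent(String sha) {
        if (!committed.contains(sha)) {
            throw new IllegalArgumentException("JPADriftFile not found [" + sha + "]");
        }
    }

    public static void main(String[] args) {
        Tx changeSetTx = new Tx();
        changeSetTx.persistDriftFile("abc123");
        // The content request goes out to the agent here, before the commit.

        try {
            storeContent("abc123"); // the agent answers before the transaction commits
        } catch (IllegalArgumentException e) {
            System.out.println("Skipping bad drift file: " + e.getMessage());
        }

        changeSetTx.commit();
        storeContent("abc123"); // the same call succeeds once the row is visible
    }
}
```

The first storeContent call fails only because of ordering; nothing is wrong with the data the agent sent.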

We have a bit of a race condition here and need to ensure that those JPADriftFile entities are persisted and visible to the later transaction handling the file content sent from the agent.
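
One possible way to enforce that ordering is to commit the new drift file entities in their own short transaction before any content request is sent. The sketch below only illustrates the idea under that assumption; it is not necessarily how the issue is fixed in RHQ, and CommitBeforeRequestSketch and all of its methods are hypothetical names. The short transaction is modelled as a direct write to an in-memory set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of "commit first, request content second"; not RHQ code.
class CommitBeforeRequestSketch {
    // Stand-in for committed rows that every transaction can see.
    private final Set<String> committedDriftFiles = new HashSet<>();
    // Hashes whose content has been accepted.
    final List<String> storedContent = new ArrayList<>();

    // Called when the agent sends content; rejects unknown files, as the server does.
    void storeContent(String sha) {
        if (!committedDriftFiles.contains(sha)) {
            throw new IllegalArgumentException("JPADriftFile not found [" + sha + "]");
        }
        storedContent.add(sha);
    }

    // Models a short, separate transaction that commits before returning.
    private void persistDriftFilesAndCommit(List<String> hashes) {
        committedDriftFiles.addAll(hashes);
    }

    // Commits the drift files first, and only then asks the agent for content.
    void processChangeSet(List<String> newFileHashes, Consumer<String> requestContentFromAgent) {
        persistDriftFilesAndCommit(newFileHashes);
        for (String sha : newFileHashes) {
            requestContentFromAgent.accept(sha);
        }
        // The rest of the change set work could continue in the longer transaction.
    }

    public static void main(String[] args) {
        CommitBeforeRequestSketch server = new CommitBeforeRequestSketch();
        // A very fast agent: it answers every content request immediately.
        server.processChangeSet(Arrays.asList("h1", "h2"), server::storeContent);
        System.out.println("Stored content for: " + server.storedContent);
    }
}
```

Even with an agent that replies instantly, the content handler finds the entities because they were committed before the request went out.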

Comment 1 Jay Shaughnessy 2011-11-30 16:30:38 EST
master commit b31e3a66a1e75dcad0070b5b78bbd3f8e9005533

This should resolve the timing issue where it was possible for the agent
to submit DriftFile content before the DriftFile entity was committed to
the database, thus generating exceptions due to the missing entity, and
a failure to store the required content.

This is not easy to test. jsanda had a good reproduction environment,
and if he verifies the fix, that should be sufficient. I have done some
mock testing, which has been successful. The code changes have been
reviewed by john and mazz.
Comment 2 Jay Shaughnessy 2011-11-30 18:20:52 EST
release_jon3.x commit 66a4abdf1e8661a926869f23b8dbd0d357a8c11a