Bug 758724 - Transaction timing can prevent drift file content from being persisted
Product: RHQ Project
Classification: Other
Component: drift
Hardware and OS: All
Priority: urgent, Severity: high
: RHQ 4.3.0
Assigned To: Jay Shaughnessy
Mike Foley
Depends On:
Blocks: jon30-sprint9 707225
Reported: 2011-11-30 10:32 EST by John Sanda
Modified: 2013-08-31 05:56 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-08-31 05:56:25 EDT
Description John Sanda 2011-11-30 10:32:21 EST
Description of problem:
There is a transactional timing issue that can occur when the RHQ server persists a change set and subsequently requests file content from the agent. It results in an exception that prevents the file content from being persisted. Here is the relevant part of the exception from the server log:

2011-11-30 07:05:38,984 INFO [org.rhq.enterprise.server.drift.JPADriftServerBean] Skipping bad drift file
javax.ejb.EJBException: java.lang.IllegalArgumentException: JPADriftFile not found [eec86c6712976844ffe31411c982fc7b6aa6d4e89b6c759273cf6c888872efb1]

Here is what happens when I reproduce the issue. I have drift definitions for two EAP servers. The agent sends the initial change set report for EAP server 1. The RHQ server processes the report, creating and persisting drift records as well as JPADriftFile instances as needed. The RHQ server sends a request to the agent for the content of each JPADriftFile that is created.

While the agent is gathering content, the drift detector task kicks off and generates the initial change set report for EAP server 2. The RHQ server processes it and sends a request for content to the agent. The number of files for which the RHQ server requests content for EAP server 2 will be very small because most of that content has already been requested with EAP server 1. Because the number of files is small, the agent is able to process that request and send the content back to the server before the large transaction that is processing the initial change set has committed.

When the agent sends file content to the server, the server assumes that there is already a JPADriftFile in the database for each file that the agent is sending. This makes sense because the agent should only be sending content for files that the server knows about. The problem, though, is that the transaction in which the JPADriftFiles are created has not yet been committed. So to the thread handling the file content sent from the agent, there are some files that the agent is sending that the server does not know about; hence the IllegalArgumentException shown above.
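
The visibility problem described above can be illustrated with a toy model. This is only a sketch and not RHQ code: the class DriftFileVisibilityDemo, its Tx type, and the hash "abc123" are all hypothetical, and an in-memory map stands in for the database. It assumes that a transaction's writes become visible to other transactions only at commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of commit-time visibility. Hypothetical names; not RHQ code. */
public class DriftFileVisibilityDemo {

    /** Rows that every transaction can see. */
    private final Map<String, String> committed = new HashMap<>();

    /** A transaction buffers its writes until commit() is called. */
    public class Tx {
        private final Map<String, String> pending = new HashMap<>();

        public void persistDriftFile(String sha256) {
            pending.put(sha256, "REQUESTED");
        }

        public void commit() {
            committed.putAll(pending);
            pending.clear();
        }
    }

    public Tx begin() {
        return new Tx();
    }

    /** Lookup performed by a different transaction: only committed rows are seen. */
    public String findDriftFile(String sha256) {
        String status = committed.get(sha256);
        if (status == null) {
            throw new IllegalArgumentException("JPADriftFile not found [" + sha256 + "]");
        }
        return status;
    }

    /** Agent content arrives while the change-set transaction is still open. */
    public static boolean contentRejectedBeforeCommit() {
        DriftFileVisibilityDemo db = new DriftFileVisibilityDemo();
        Tx changeSetTx = db.begin();
        changeSetTx.persistDriftFile("abc123"); // hypothetical hash
        try {
            db.findDriftFile("abc123");
            return false;
        } catch (IllegalArgumentException e) {
            return true; // the entity is not yet visible
        }
    }

    /** Agent content arrives after the change-set transaction has committed. */
    public static boolean contentAcceptedAfterCommit() {
        DriftFileVisibilityDemo db = new DriftFileVisibilityDemo();
        Tx changeSetTx = db.begin();
        changeSetTx.persistDriftFile("abc123");
        changeSetTx.commit();
        return "REQUESTED".equals(db.findDriftFile("abc123"));
    }
}
```

In this model the same lookup fails or succeeds depending only on whether the commit has happened yet, which is the behavior seen in the server log.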

We have a bit of a race condition here and need to ensure that those JPADriftFile entities are persisted and visible to the later transaction handling the file content sent from the agent.
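
One possible way to close the race is to commit the JPADriftFile rows before asking the agent for content. The sketch below only illustrates that ordering and is not the actual fix: the class CommitBeforeRequestSketch and its methods are hypothetical, a concurrent set stands in for the database, and the agent is modeled as a callback that replies immediately (the worst case). In an EJB container, a similar effect might be obtained by persisting the files in a method that runs in its own transaction (for example with TransactionAttributeType.REQUIRES_NEW), but that is an assumption about the approach, not a description of the committed change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Ordering sketch with hypothetical names; not RHQ code. */
public class CommitBeforeRequestSketch {

    /** Hashes of drift files whose rows are committed and visible to all threads. */
    private final Set<String> committedFiles = ConcurrentHashMap.newKeySet();

    /** Server-side handler for content sent by the agent. */
    public boolean acceptContent(String sha256) {
        return committedFiles.contains(sha256);
    }

    /** Racy ordering: content is requested while the change-set transaction is open. */
    public void processRacy(String sha256, Consumer<String> requestContentFromAgent) {
        requestContentFromAgent.accept(sha256); // agent may answer right away
        committedFiles.add(sha256);             // commit happens too late
    }

    /** Safer ordering: commit the drift file row first, then request content. */
    public void processCommitFirst(String sha256, Consumer<String> requestContentFromAgent) {
        committedFiles.add(sha256);
        requestContentFromAgent.accept(sha256);
    }

    /** Returns true if the content sent by the agent was stored. */
    public static boolean run(boolean commitFirst) {
        CommitBeforeRequestSketch server = new CommitBeforeRequestSketch();
        boolean[] stored = {false};
        // Worst-case agent: replies with content as soon as it is asked.
        Consumer<String> agent = hash -> stored[0] = server.acceptContent(hash);
        if (commitFirst) {
            server.processCommitFirst("abc123", agent); // hypothetical hash
        } else {
            server.processRacy("abc123", agent);
        }
        return stored[0];
    }
}
```

With the racy ordering the content is dropped even though the agent behaved correctly; with the commit-first ordering the same agent reply is accepted.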

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Jay Shaughnessy 2011-11-30 16:30:38 EST
master commit b31e3a66a1e75dcad0070b5b78bbd3f8e9005533

This should resolve the timing issue where it was possible for the agent
to submit DriftFile content before the DriftFile entity was committed to
the database, thus generating exceptions due to the missing entity, and
a failure to store the required content.

This is not easy to test. jsanda had a good reproduction environment,
and if he verifies the fix, that should be sufficient. I have done some
mock testing, which has been successful, and the code changes have been
reviewed by john and mazz.
Comment 2 Jay Shaughnessy 2011-11-30 18:20:52 EST
release_jon3.x commit 66a4abdf1e8661a926869f23b8dbd0d357a8c11a
