When a repository is imported in Guvnor, it warns the user that all the data in the old repository will be deleted. This is, however, not true: the data disappears from the GUI, but not from the data store. This was reproduced with Modeshape on PostgreSQL. It is, however, easiest to reproduce with JackRabbit using the default local database. Just import the samples repository, look at the size of the repo on the file system, then import the empty repo (see attached) and look at the size once again. This is not a problem for JackRabbit, as its performance doesn't decrease as the repository grows. However, it is a problem for Modeshape.
Created attachment 516116: Import file for the empty repository
Candidate only.
The repository is indeed cleaned up during import. Below is a detailed comparison of Jackrabbit repository directory sizes, using the empty repository and the sample repository.

Step 1: Start Guvnor with a brand-new empty repository.

repository: 5.67M
-repository: 128K
--datastore: 0
--index: 104K
--meta: 4K
--namespace: 8K
--nodetypes: 12K
-version: 1.83M
-workspaces: 3.71M
--default: 1.86M
---db: 1.83M
---index: 28K
--security: 1.85M

Step 2: Import the sample repo.

repository: 6.17M
-repository: 356K
--datastore: 72K
--index: 260K
--meta: 4K
--namespace: 8K
--nodetypes: 12K
-version: 1.83M
-workspaces: 3.99M
--default: 2.14M
---db: 1.83M
---index: 312K
--security: 1.85M

Step 3: Import the empty repo.

repository: 5.93M
-repository: 400K
--datastore: 72K
--index: 304K
--meta: 4K
--namespace: 8K
--nodetypes: 12K
-version: 1.83M
-workspaces: 3.75M
--default: 1.89M
---db: 1.83M
---index: 64K
--security: 1.85M

The only noticeable differences are the sizes of the repository/repository/datastore and repository/repository/index directories.

The repository/repository/datastore directory is used for large binary properties (e.g., the compiled Drools package binary in our case). So it looks like Jackrabbit does not clean up these binary files during import. This should have no impact on performance, as these files are no longer referenced by any nodes.

The repository/repository/index directory is the Lucene index directory for the version storage. I am not 100% sure why its size kept increasing. It might be related to https://issues.jboss.org/browse/GUVNOR-296.

Anyway, I am going to close this issue, as nothing needs to be changed in Guvnor. Let me know if this is not the case.
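A per-directory size breakdown like the one above could be collected with a short script, which may help when repeating the comparison against another backend. This is only a sketch: the function names (dir_size, size_report, format_report) are made up for illustration, and the script sums file lengths in bytes rather than allocated disk blocks, so its numbers will not exactly match what a tool such as du reports.

```python
import os
import sys


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def size_report(root, max_depth=3):
    """Return (depth, name, bytes) tuples for root and its subdirectories.

    max_depth=3 matches the depth of the listing in this comment; it is an
    arbitrary choice, not something required by Jackrabbit or Guvnor.
    """
    rows = []

    def visit(path, depth):
        name = os.path.basename(os.path.normpath(path))
        rows.append((depth, name, dir_size(path)))
        if depth < max_depth:
            for entry in sorted(os.listdir(path)):
                child = os.path.join(path, entry)
                if os.path.isdir(child) and not os.path.islink(child):
                    visit(child, depth + 1)

    visit(root, 0)
    return rows


def format_report(rows):
    """Format rows in the dash-prefixed style used in this comment."""
    return "\n".join(
        "%s%s: %d" % ("-" * depth, name, size) for depth, name, size in rows
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python repo_sizes.py /path/to/repository
    print(format_report(size_report(sys.argv[1])))
```

Running it once after each import step and diffing the three outputs would show which directories grow and which shrink back.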
Jervis, would you please try with Modeshape as well? Unless I'm seriously mistaken, the issue will manifest there.
Hi Lukas, have you got any concrete data with ModeShape to show that it does not clean up its repository after an import? As I've verified Guvnor with the JackRabbit configuration and there is nothing wrong with either Guvnor or JackRabbit, it is very clear that the problem resides in ModeShape. Once we have concrete data to prove this is a problem, we can file an issue against ModeShape directly. We will have to let the ModeShape team fix this problem.
Transferring this to ON_QA, I will look into it and try to provide specific data.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Marked as release note not required as this is an issue between internal builds.
Now even I cannot reproduce it. Marking as NOTABUG.