Bug 1113585
Summary: | LevelDBStore.stop() crashes JVM in native code |
---|---|
Product: | [JBoss] JBoss Data Grid 6 |
Component: | Infinispan |
Status: | CLOSED CURRENTRELEASE |
Severity: | medium |
Priority: | unspecified |
Version: | 6.3.0 |
Target Milestone: | ER1 |
Target Release: | 6.3.1 |
Hardware: | Unspecified |
OS: | Unspecified |
Reporter: | Radim Vansa <rvansa> |
Assignee: | Tristan Tarrant <ttarrant> |
QA Contact: | Martin Gencur <mgencur> |
CC: | afield, dmehra, gsheldon, jdg-bugs, jpallich, mhusnain, tsykora |
Doc Type: | Bug Fix |
Doc Text: | Previously in Red Hat JBoss Data Grid, when a cache using LevelDB cache store was stopped (for example, as a consequence of stopping the cache manager), the LevelDB native implementation caused a segmentation fault in the JVM process. As a result of this segmentation fault, the process crashed. This issue is now fixed in JBoss Data Grid 6.3.1 so that using the LevelDB cache store native implementation works as expected. |
Last Closed: | 2015-01-26 14:05:06 UTC |
Type: | Bug |

Description
Radim Vansa
2014-06-26 13:09:22 UTC
That looks great with the freshly built http://download.eng.bos.redhat.com/brewroot/repos/jb-edg-6-rhel-6-build/latest/maven/org/fusesource/leveldbjni/leveldbjni-all/1.13-redhat.002/leveldbjni-all-1.13-redhat.002.jar

Job: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-LIB/job/jdg-radargun-leveldb-jni-test/

Note: we will need to put the respective JAR file into our zip (used in the job) again once CR3 is out.

I am expecting this BZ to be ON_QA for 6.3.0 CR3. Setting target release.

Just CCing Alan :)) (+ thank you Alan for your help with quick pre-CR3 verification) Brilliantly awesome and quick fix :P

CR3 bits are ok, logs are clear as a mountain spring :) VERIFIED

Unfortunately, this is reproducible in JDG 6.3.0 CR3. The previous verification by Tomas did not stop a single node during the test. This job reproduces the segfaults with CR1 and CR3: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-elasticity-repl-leveldb

The new Jenkins job starts a cluster of nodes in library mode, and then tries to stop and start a single node in the cluster 3 times. The crash in the JNI code happens when stopping the node the first time. The test case code does not use JON; it uses the Infinispan/JDG API to stop the cache and cache store on the single node. This might happen when a node is being removed from the cluster.

Created attachment 925183: crash log
Attaching crash log from one instance of this issue.
I think that LevelDB cannot correctly handle a close that runs concurrently with operations in other threads. I've assembled https://github.com/rvansa/jdg/tree/BZ1113585/LevelDB_JVM_crash/jdg_6.3.x with a semaphore that gives the close operation exclusive access, and the test which was previously crashing the node now passes.

Divya: It can affect throughput because any thread writing to the store has to acquire a permit from the semaphore. However, writes can proceed concurrently; the only synchronization is an atomic CAS operation inside the semaphore.

Verified that the JVM crash does not exist in JDG 6.3.1 ER1. Performance test with and without this fix is next.

Executed distributed and replicated tests with JDG 6.3.0 and 6.3.1 ER1. No performance regressions for reads or writes were observed. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-LIB/job/jdg-radargun-leveldb-jni-test/
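
For readers who want to picture the semaphore approach described in the comments above, here is a minimal sketch. It is not the code from the linked branch: the class `GuardedStore`, its methods, and the `nativeClose` callback are hypothetical names used only for illustration, and the callback stands in for the JNI close call. Each write takes one permit, so writes still run concurrently, while stopping takes every permit and therefore waits until no operation is in flight.

```java
import java.util.concurrent.Semaphore;

// Hypothetical wrapper for illustration only; this is not Infinispan/JDG code.
class GuardedStore {
    // Each operation takes one permit; stop() takes all of them.
    private static final int ALL_PERMITS = Integer.MAX_VALUE;

    private final Semaphore permits = new Semaphore(ALL_PERMITS);
    private final Runnable nativeClose; // assumed stand-in for the native close call
    private volatile boolean stopped;

    GuardedStore(Runnable nativeClose) {
        this.nativeClose = nativeClose;
    }

    /** Runs op unless the store is already stopped; returns false if it was skipped. */
    boolean write(Runnable op) {
        permits.acquireUninterruptibly(); // one permit: other writers are not blocked
        try {
            if (stopped) {
                return false;
            }
            op.run();
            return true;
        } finally {
            permits.release();
        }
    }

    /** Waits for in-flight operations, then closes the store exactly once. */
    void stop() {
        permits.acquireUninterruptibly(ALL_PERMITS); // exclusive: no writer holds a permit
        try {
            if (!stopped) {
                stopped = true;
                nativeClose.run();
            }
        } finally {
            permits.release(ALL_PERMITS);
        }
    }
}
```

The sketch uses a non-fair semaphore, which keeps the write path cheap but means `stop()` may wait a long time under a constant stream of writers; passing `true` as the second constructor argument of `Semaphore` would trade some throughput for fairness.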