Bug 1015628
| Summary: | Enable compression of storage node data | |||
|---|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> | |
| Component: | Core Server, Performance, Storage Node | Assignee: | John Sanda <jsanda> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.9 | CC: | hrupp | |
| Target Milestone: | --- | |||
| Target Release: | RHQ 4.10 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1065652 | Environment: | |
| Last Closed: | 2014-04-23 12:31:46 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1011084, 1065652 | |||
|
Description
John Sanda
2013-10-04 16:07:19 UTC
Starting in C* 2.0, lz4 is the default compression, using the lz4-java (https://github.com/jpountz/lz4-java) library. lz4-java provides three implementations: one using native libraries via JNI, a Java port that uses the sun.misc.Unsafe API, and a pure Java one. I have started doing some work to re-enable compression using the LZ4Compressor. For the same reasons we ditched snappy-java, we want to avoid the JNI impl: we do not want to support platform-specific libraries. In a local branch, I have made build changes to strip out the native libraries from lz4-java. The Unsafe API is available in both Oracle and OpenJDK JREs. IBM JREs can fall back to using the pure Java impl.

I have already tested a heterogeneous JRE deployment where one node was running IBM Java and another was running OpenJDK. Everything worked as expected. It is worth noting that we have internode compression disabled, so on that basis alone I would expect the heterogeneous JRE deployment to work.

I also tested switching a node from OpenJDK to IBM. This resulted in errors like the following:

ERROR [CompactionExecutor:8] 2013-11-27 10:09:24,855 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:8,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.io.compress.CorruptBlockException: (/home/hudson/rhq/rhq-server-4.10.0-SNAPSHOT-lz4/rhq-storage/bin/../../../rhq-data/data/rhq/metrics_index/rhq-metrics_index-ic-1297-Data.db): corruption detected, chunk at 0 of length 1467.
...
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/home/hudson/rhq/rhq-server-4.10.0-SNAPSHOT-lz4/rhq-storage/bin/../../../rhq-data/data/rhq/metrics_index/rhq-metrics_index-ic-1297-Data.db): corruption detected, chunk at 0 of length 1467.
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:120)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:85)
    ... 25 more

In the event that a user decides to switch from Oracle/OpenJDK to IBM (or vice versa), there is a workaround: delete the corrupted sstable files and run repair on the table in question.

For reference, the LZ4Compressor was added under https://issues.apache.org/jira/browse/CASSANDRA-5038. In terms of performance, lz4-java does very well. Some benchmark results of various compression libraries can be found at https://github.com/ning/jvm-compressor-benchmark/wiki.

Enabling compression will reduce the data size on disk and should improve read performance. I am in the process of running some tests to compare read performance with and without compression. I will report the results here.

Compression has been re-enabled using lz4-java, and it has been repackaged to strip out its native components. This means that only the pure Java impl(s) will be used.

master commit hash: cb35dada1

I think (as master has shown) this may need more work, such as making sure that the lz4 lib is distributed with the storage node and that a special schema upgrade task is triggered. Re-opening.

Heiko, the problems you encountered are specific to the dev-container. If you review the commit cited in comment 3, you will see that lz4-java is in fact packaged with the Storage Node. In fact, lz4-java is included with a stock Cassandra distro. My commit strips out the native components. And the schema change is applied. I have tested upgrading a 4.10 snapshot build.
If need be, I would prefer to open a separate BZ for handling the dev-container, since the issues you hit are specific to the dev environment. Right now we do not have a really good C* db upgrade process in place for development, partly because we have not had to deal with it much yet. I do not think that should block issues that do not affect regular installs.

I retested upgrading from 4.9.0 and there were no issues. Moving this back to ON_QA.

Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
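
For anyone applying the JRE-switch workaround described earlier in this bug (delete the corrupted sstable files, then run repair on the affected table), the first step is working out which files belong to the corrupted sstable. The following is a minimal sketch, not part of RHQ or Cassandra. The helper names are hypothetical, and it assumes that the components of one sstable share the file-name prefix seen in the trace above (for example `rhq-metrics_index-ic-1297-`) and that the log message has the same shape as the one quoted in this report.

```python
import re
from pathlib import Path

# Assumption: the corrupt data file is printed in parentheses right after
# "CorruptBlockException: ", as in the log excerpt quoted in this bug.
CORRUPT_PATH = re.compile(r"CorruptBlockException: \((?P<path>[^)]+-Data\.db)\)")


def corrupted_data_files(log_text):
    """Return the distinct -Data.db paths reported as corrupt, in first-seen order."""
    seen = []
    for match in CORRUPT_PATH.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


def sstable_prefix(data_file):
    """Return the file-name prefix shared by the components of one sstable."""
    name = Path(data_file).name
    if not name.endswith("-Data.db"):
        raise ValueError(f"not an sstable data file: {data_file}")
    # Keep the trailing hyphen so generation 1297 does not also match 12970.
    return name[: -len("Data.db")]


def sibling_components(data_file):
    """List the files in the same directory that share the sstable's prefix."""
    directory = Path(data_file).parent
    prefix = sstable_prefix(data_file)
    return sorted(p for p in directory.iterdir() if p.name.startswith(prefix))
```

The sketch only lists candidate files and deliberately deletes nothing. Removing them and running repair on the table (for instance with `nodetool repair`) remain manual steps, as in the workaround above.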