Bug 1418748
| Summary: | Hawkular metrics fails to deploy when using EFS | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chris Callegari <ccallega> | ||||
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Peng Li <penli> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.4.0 | CC: | aos-bugs, ccallega, mwringe, tdawson | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause:
Neither the JDK nor Cassandra correctly determines the filesystem size on extremely large filesystems, such as EFS.
Consequence:
Cassandra reads the filesystem size at startup to configure itself. On such filesystems it gets an invalid size and fails to start.
Fix:
Cassandra has been patched to work around the failure encountered.
Result:
Cassandra can now start on systems that use extremely large filesystems.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-04-12 19:11:08 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
Description
Chris Callegari
2017-02-02 16:02:09 UTC
Following exactly your steps, this cannot be reproduced on my system.

From the log it appears that this is failing with:

Exception (java.lang.IllegalArgumentException) encountered during startup: Out of range: -2199023255552

This is caused by https://github.com/apache/cassandra/blob/cassandra-3.0.9/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L526 which would indicate that something strange is going on when it tries to determine the size of the commit log directory (which should be /cassandra_data/commitlog).

Is there anything special about this directory or the filesystem it is on?

So since this is being loaded in AWS, the disk size for the directory is huge (exabytes), which is causing issues with the code.

It looks like we are running into this Java bug: https://bugs.openjdk.java.net/browse/JDK-8162520

And possibly also Cassandra not considering this much disk space either (https://github.com/apache/cassandra/blob/cassandra-3.0.9/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L526).

This check can be overridden by setting a value in cassandra.yaml, but we currently don't expose this option. We may have to update our Cassandra start script to take this option into consideration and allow setting it via an envar or property.

Yes. AWS/EFS is an exabyte-size NFS target. There really is no ceiling. Therefore it is not a good data point on which to base Cassandra sizing.

Red Hat Mobile / SaaS has a deadline of June 1, 2017 to deploy an OpenShift based eval environment for customers. We have plenty of time to get this right. Review what needs to be reviewed. Prioritize the priorities.

Thanks,
/Chris Callegari

The best you are going to have for this for the time being is the workaround we are working on (https://github.com/openshift/origin-metrics/pull/292). This will require manual intervention.

To properly get this working upstream is most likely going to take much longer.

Longer than June 1??

Running this with an exabyte-sized filesystem is not supported by the underlying software (e.g. the JDK). If you want this to work properly and automatically, you will need to change how you are running this.

That comment alienates customers wanting to deploy OpenShift to AWS and use AWS/EFS as a storage target for Hawkular.

Same behavior with an NFS-based persistent volume:

# oc edit pv/metrics-volume
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: 2017-02-09T18:29:59Z
  name: metrics-volume
  resourceVersion: "3647"
  selfLink: /api/v1/persistentvolumes/metrics-volume
  uid: c351e1dd-eef5-11e6-8511-06eb61e6059c
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1000Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: metrics-cassandra-1
    namespace: openshift-infra
    resourceVersion: "3644"
    uid: 302a0cdb-eef6-11e6-8f3b-0a16016f2cb4
  nfs:
    path: //metrics
    server: fs-c35efe8a.efs.us-east-1.amazonaws.com
  persistentVolumeReclaimPolicy: Retain
status:
  phase: Bound

# oc logs -f po/hawkular-cassandra-1-mfcql
The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (7933222912).
The memory limit is between 4 and 32GB. Using 1/4 of the available memory for the max_heap_size.
The MAX_HEAP_SIZE has been set to 1891M
THE HEAP_NEWSIZE envar is not set. Setting to 200M based on the CPU_LIMIT of 2000. [100M per CPU core]
About to generate seeds
Trying to access the Seed list [try #1]
Trying to access the Seed list [try #2]
Trying to access the Seed list [try #3]
Setting seeds to be hawkular-cassandra-1-mfcql
The previous version of Cassandra was 3.0.9.redhat-1. The current version is 3.0.9.redhat-1
cat: /etc/ld.so.conf.d/*.conf: No such file or directory
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
OpenJDK 64-Bit Server VM warning: Cannot open file /opt/apache-cassandra/logs/gc.log due to No such file or directory
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompilerOracle: dontinline org/apache/cassandra/db/transform/BaseIterator.tryGetMoreContents ()Z
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stop ()V
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stopInPartition ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.doFlush (I)V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeExcessSlow ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeSlow (JI)V
CompilerOracle: dontinline org/apache/cassandra/io/util/RebufferingInputStream.readPrimitiveSlowly (I)J
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeVInt (JI)[B
INFO 18:40:09 Configuration location: file:/opt/apache-cassandra-3.0.9.redhat-1/conf/cassandra.yaml
INFO 18:40:09 Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=hawkular-metrics; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=LZ4Compressor; commitlog_directory=/cassandra_data/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=null; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@105fece7; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_size_in_mb=512; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; 
hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=all; internode_recv_buff_size_in_bytes=null; internode_send_buff_size_in_bytes=null; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=hawkular-cassandra-1-mfcql; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=null; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_coalescing_strategy=TIMEHORIZON; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; 
row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=hawkular-cassandra-1-mfcql; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=hawkular-cassandra-1-mfcql}; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO 18:40:09 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 18:40:09 Global memtable on-heap threshold is enabled at 468MB
INFO 18:40:09 Global memtable off-heap threshold is enabled at 468MB
Exception (java.lang.IllegalArgumentException) encountered during startup: Out of range: -2199023255552
ERROR 18:40:09 Exception encountered during startup
java.lang.IllegalArgumentException: Out of range: -2199023255552
    at com.google.common.primitives.Ints.checkedCast(Ints.java:91) ~[guava-18.0.jar:na]
    at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:526) ~[apache-cassandra-3.0.9.redhat-1.jar:3.0.9.redhat-1]
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:119) ~[apache-cassandra-3.0.9.redhat-1.jar:3.0.9.redhat-1]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:543) [apache-cassandra-3.0.9.redhat-1.jar:3.0.9.redhat-1]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696) [apache-cassandra-3.0.9.redhat-1.jar:3.0.9.redhat-1]
java.lang.IllegalArgumentException: Out of range: -2199023255552
    at com.google.common.primitives.Ints.checkedCast(Ints.java:91)
    at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:526)
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:119)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:543)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696)
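
For anyone reading the trace above: the value -2199023255552 is exactly -2^41, which is what you get if the reported total space is Long.MIN_VALUE (that is, 2^63 bytes wrapped around to a negative long) and the startup code takes a quarter of it in megabytes. The sketch below only reproduces that arithmetic. It is not the Cassandra source: the class and method names are hypothetical, Math.toIntExact stands in for Guava's Ints.checkedCast (it throws ArithmeticException rather than IllegalArgumentException), the Long.MIN_VALUE input is an inference from the number in the log, and the 8192 MB fallback is an illustrative value. The guard at the end is one possible way to tolerate a bogus size, not necessarily what the shipped patch does.

```java
public class CommitLogSizing {
    static final long BYTES_PER_MB = 1048576L;

    // Hypothetical helper: a quarter of the reported total space, in megabytes.
    static long quarterOfTotalSpaceInMb(long totalSpaceBytes) {
        return (totalSpaceBytes / BYTES_PER_MB) / 4;
    }

    // Hypothetical guard: if the result is non-positive or does not fit in an
    // int, treat the reported size as unusable and return the fallback instead.
    static int safeMinSizeInMb(long totalSpaceBytes, int fallbackMb) {
        long quarter = quarterOfTotalSpaceInMb(totalSpaceBytes);
        if (quarter <= 0 || quarter > Integer.MAX_VALUE) {
            return fallbackMb;
        }
        return (int) quarter;
    }

    public static void main(String[] args) {
        // Assumption: the filesystem size came back as an overflowed long.
        long reported = Long.MIN_VALUE;

        long quarter = quarterOfTotalSpaceInMb(reported);
        System.out.println("quarter of total space in MB: " + quarter);

        try {
            // Stand-in for the checked narrowing cast that fails in the trace.
            Math.toIntExact(quarter);
        } catch (ArithmeticException e) {
            System.out.println("narrowing to int failed: " + e.getMessage());
        }

        // With the guard, startup sizing falls back to a fixed value.
        System.out.println("guarded size in MB: " + safeMinSizeInMb(reported, 8192));
    }
}
```

Setting the commit log size explicitly (commitlog_total_space_in_mb shows as null in the node configuration above) would avoid this computation altogether, which is what the cassandra.yaml override mentioned earlier in the thread is about.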

Just an FYI, NFS is not recommended for use with metrics, as it tends to perform poorly with even modestly sized clusters.

We are looking into this issue, but it looks like the workaround may not work.

The upstream issue is https://issues.apache.org/jira/browse/CASSANDRA-13067

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884