Bug 1505156 - [starter-ca-central-1] Metrics in crash loop backoff due to insufficient disk
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Unknown
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Abhishek Gupta
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-22 15:30 UTC by Justin Pierce
Modified: 2020-11-24 12:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-23 17:24:47 UTC
Target Upstream Version:
Embargoed:



Description Justin Pierce 2017-10-22 15:30:32 UTC
[root@starter-ca-central-1-master-692e9 ~]# oc logs hawkular-cassandra-2-3zvk7
The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (2000000000).
The memory limit is less than 2GB. Using 1/2 of available memory for the max_heap_size.
The MAX_HEAP_SIZE has been set to 953M
The HEAP_NEWSIZE envar is not set. Setting the HEAP_NEWSIZE to one third the MAX_HEAP_SIZE: 317M
About to generate seeds
Trying to access the Seed list [try #1]
Setting seeds to be 10.131.2.91
Creating the Cassandra keystore from the Secret's cert data
Converting the PKCS12 keystore into a Java Keystore
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Entry for alias cassandra successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled
[Storing /opt/apache-cassandra/conf/.keystore]
Building the trust store for inter node communication
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Certificate was added to keystore
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Certificate was added to keystore
/opt/apache-cassandra/bin/cassandra-docker.sh: line 308: cd: /home/jboss: Permission denied
Building the trust store for client communication
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Certificate was added to keystore
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Certificate was added to keystore
Generating self signed certificates for the local client for cqlsh
Generating a 4096 bit RSA private key
...........................................................++
..........................................................++
writing new private key to '.cassandra.local.client.key'
-----
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Certificate was added to keystore
The previous version of Cassandra was 3.0.14.redhat-1. The current version is 3.0.14.redhat-1
cat: /etc/ld.so.conf.d/*.conf: No such file or directory
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
OpenJDK 64-Bit Server VM warning: Cannot open file /opt/apache-cassandra/logs/gc.log due to No such file or directory

CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompilerOracle: dontinline org/apache/cassandra/db/transform/BaseIterator.tryGetMoreContents ()Z
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stop ()V
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stopInPartition ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.doFlush (I)V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeExcessSlow ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeSlow (JI)V
CompilerOracle: dontinline org/apache/cassandra/io/util/RebufferingInputStream.readPrimitiveSlowly (I)J
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeVInt (JI)[B
INFO  [main] 2017-10-22 15:26:56,101 YamlConfigurationLoader.java:85 - Configuration location: file:/opt/apache-cassandra-3.0.14.redhat-1/conf/cassandra.yaml
INFO  [main] 2017-10-22 15:26:56,240 Config.java:457 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=hawkular-metrics; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=/cassandra_data/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=null; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@275710fc; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_size_in_mb=512; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; 
hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=all; internode_recv_buff_size_in_bytes=null; internode_send_buff_size_in_bytes=null; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=hawkular-cassandra-2-3zvk7; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=null; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=TIMEHORIZON; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; 
roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=hawkular-cassandra-2-3zvk7; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=10.131.2.91}; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO  [main] 2017-10-22 15:26:56,241 DatabaseDescriptor.java:323 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  [main] 2017-10-22 15:26:56,404 DatabaseDescriptor.java:430 - Global memtable on-heap threshold is enabled at 230MB
INFO  [main] 2017-10-22 15:26:56,404 DatabaseDescriptor.java:434 - Global memtable off-heap threshold is enabled at 230MB
WARN  [main] 2017-10-22 15:26:56,432 DatabaseDescriptor.java:612 - Only 5162 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO  [main] 2017-10-22 15:26:56,553 CassandraDaemon.java:434 - Hostname: hawkular-cassandra-2-3zvk7
INFO  [main] 2017-10-22 15:26:56,554 CassandraDaemon.java:441 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_144
INFO  [main] 2017-10-22 15:26:56,554 CassandraDaemon.java:442 - Heap size: 967114752/967114752
INFO  [main] 2017-10-22 15:26:56,555 CassandraDaemon.java:445 - Code Cache Non-heap memory: init = 2555904(2496K) used = 6130304(5986K) committed = 6160384(6016K) max = 251658240(245760K)
INFO  [main] 2017-10-22 15:26:56,555 CassandraDaemon.java:445 - Metaspace Non-heap memory: init = 0(0K) used = 17078856(16678K) committed = 17563648(17152K) max = -1(-1K)
INFO  [main] 2017-10-22 15:26:56,556 CassandraDaemon.java:445 - Compressed Class Space Non-heap memory: init = 0(0K) used = 1962496(1916K) committed = 2097152(2048K) max = 1073741824(1048576K)
INFO  [main] 2017-10-22 15:26:56,556 CassandraDaemon.java:445 - Par Eden Space Heap memory: init = 265945088(259712K) used = 117032104(114289K) committed = 265945088(259712K) max = 265945088(259712K)
INFO  [main] 2017-10-22 15:26:56,556 CassandraDaemon.java:445 - Par Survivor Space Heap memory: init = 33226752(32448K) used = 0(0K) committed = 33226752(32448K) max = 33226752(32448K)
INFO  [main] 2017-10-22 15:26:56,556 CassandraDaemon.java:445 - CMS Old Gen Heap memory: init = 667942912(652288K) used = 0(0K) committed = 667942912(652288K) max = 667942912(652288K)
INFO  [main] 2017-10-22 15:26:56,557 CassandraDaemon.java:447 - Classpath: /opt/apache-cassandra/conf:/opt/apache-cassandra/build/classes/main:/opt/apache-cassandra/build/classes/thrift:/opt/apache-cassandra/lib/ST4-4.0.8.jar:/opt/apache-cassandra/lib/airline-0.6.jar:/opt/apache-cassandra/lib/ant-junit-1.9.4.jar:/opt/apache-cassandra/lib/antlr-runtime-3.5.2.jar:/opt/apache-cassandra/lib/apache-cassandra-3.0.14.redhat-1.jar:/opt/apache-cassandra/lib/apache-cassandra-clientutil-3.0.14.redhat-1.jar:/opt/apache-cassandra/lib/apache-cassandra-thrift-3.0.14.redhat-1.jar:/opt/apache-cassandra/lib/asm-5.0.4.jar:/opt/apache-cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/opt/apache-cassandra/lib/commons-cli-1.1.jar:/opt/apache-cassandra/lib/commons-codec-1.2.jar:/opt/apache-cassandra/lib/commons-lang3-3.1.jar:/opt/apache-cassandra/lib/commons-math3-3.2.jar:/opt/apache-cassandra/lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/apache-cassandra/lib/disruptor-3.0.1.jar:/opt/apache-cassandra/lib/ecj-4.4.2.jar:/opt/apache-cassandra/lib/guava-18.0.jar:/opt/apache-cassandra/lib/high-scale-lib-1.0.6.jar:/opt/apache-cassandra/lib/jackson-core-asl-1.9.2.jar:/opt/apache-cassandra/lib/jackson-mapper-asl-1.9.2.jar:/opt/apache-cassandra/lib/jamm-0.3.0.jar:/opt/apache-cassandra/lib/javax.inject.jar:/opt/apache-cassandra/lib/jbcrypt-0.3m.jar:/opt/apache-cassandra/lib/jcl-over-slf4j-1.7.7.jar:/opt/apache-cassandra/lib/jmx_prometheus_javaagent.jar:/opt/apache-cassandra/lib/jna-4.4.0.jar:/opt/apache-cassandra/lib/joda-time-2.4.jar:/opt/apache-cassandra/lib/json-simple-1.1.jar:/opt/apache-cassandra/lib/jstackjunit-0.0.1.jar:/opt/apache-cassandra/lib/libthrift-0.9.2.jar:/opt/apache-cassandra/lib/log4j-over-slf4j-1.7.7.jar:/opt/apache-cassandra/lib/logback-classic-1.1.3.jar:/opt/apache-cassandra/lib/logback-core-1.1.3.jar:/opt/apache-cassandra/lib/lz4-1.3.0.jar:/opt/apache-cassandra/lib/metrics-core-3.1.0.jar:/opt/apache-cassandra/lib/metrics-jvm
-3.1.0.jar:/opt/apache-cassandra/lib/metrics-logback-3.1.0.jar:/opt/apache-cassandra/lib/netty-all-4.0.44.Final.jar:/opt/apache-cassandra/lib/ohc-core-0.4.3.jar:/opt/apache-cassandra/lib/ohc-core-j8-0.4.3.jar:/opt/apache-cassandra/lib/reporter-config-base-3.0.0.jar:/opt/apache-cassandra/lib/reporter-config3-3.0.0.jar:/opt/apache-cassandra/lib/sigar-1.6.4.jar:/opt/apache-cassandra/lib/slf4j-api-1.7.7.jar:/opt/apache-cassandra/lib/snakeyaml-1.11.jar:/opt/apache-cassandra/lib/snappy-java-1.1.1.7.jar:/opt/apache-cassandra/lib/stream-2.5.2.jar:/opt/apache-cassandra/lib/thrift-server-0.3.7.jar:/opt/apache-cassandra/lib/jsr223/*/*.jar:/opt/apache-cassandra/lib/jamm-0.3.0.jar
INFO  [main] 2017-10-22 15:26:56,557 CassandraDaemon.java:449 - JVM Arguments: [-Duser.home=/home/jboss, -Duser.name=jboss, -Dcassandra.commitlog.ignorereplayerrors=true, -Xloggc:/opt/apache-cassandra/logs/gc.log, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xms953M, -Xmx953M, -Xmn317M, -ea, -Xss256k, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:StringTableSize=1000003, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+PerfDisableSharedMem, -XX:CompileCommandFile=/opt/apache-cassandra/conf/hotspot_compiler, -javaagent:/opt/apache-cassandra/lib/jamm-0.3.0.jar, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Djava.net.preferIPv4Stack=true, -Dcassandra.jmx.local.port=7199, -XX:+DisableExplicitGC, -Djava.library.path=/opt/apache-cassandra/lib/sigar-bin, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/opt/apache-cassandra/logs, -Dcassandra.storagedir=/opt/apache-cassandra/data, -Dcassandra-foreground=yes]
WARN  [main] 2017-10-22 15:26:56,618 NativeLibrary.java:180 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN  [main] 2017-10-22 15:26:56,619 StartupChecks.java:121 - jemalloc shared library could not be preloaded to speed up memory allocations
WARN  [main] 2017-10-22 15:26:56,619 StartupChecks.java:153 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN  [main] 2017-10-22 15:26:56,620 StartupChecks.java:178 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2017-10-22 15:26:56,621 SigarLibrary.java:44 - Initializing SIGAR library
INFO  [main] 2017-10-22 15:26:56,635 SigarLibrary.java:180 - Checked OS settings and found them configured for optimal performance.
WARN  [main] 2017-10-22 15:26:56,640 StartupChecks.java:246 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN  [main] 2017-10-22 15:26:56,653 StartupChecks.java:267 - Directory /opt/apache-cassandra/data/saved_caches doesn't exist
WARN  [main] 2017-10-22 15:26:56,664 StartupChecks.java:267 - Directory /opt/apache-cassandra/data/hints doesn't exist
INFO  [main] 2017-10-22 15:26:57,746 ColumnFamilyStore.java:389 - Initializing system.IndexInfo
INFO  [SSTableBatchOpen:1] 2017-10-22 15:26:58,423 BufferPool.java:226 - Global buffer pool is enabled, when pool is exahusted (max is 512 mb) it will allocate on heap
INFO  [main] 2017-10-22 15:26:58,461 CacheService.java:115 - Initializing key cache with capacity of 46 MBs.
INFO  [main] 2017-10-22 15:26:58,468 CacheService.java:137 - Initializing row cache with capacity of 0 MBs
INFO  [main] 2017-10-22 15:26:58,470 CacheService.java:166 - Initializing counter cache with capacity of 23 MBs
INFO  [main] 2017-10-22 15:26:58,471 CacheService.java:177 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2017-10-22 15:26:58,488 ColumnFamilyStore.java:389 - Initializing system.batches
INFO  [main] 2017-10-22 15:26:58,493 ColumnFamilyStore.java:389 - Initializing system.paxos
INFO  [main] 2017-10-22 15:26:58,504 ColumnFamilyStore.java:389 - Initializing system.local
INFO  [main] 2017-10-22 15:26:58,513 ColumnFamilyStore.java:389 - Initializing system.peers
INFO  [main] 2017-10-22 15:26:58,522 ColumnFamilyStore.java:389 - Initializing system.peer_events
INFO  [main] 2017-10-22 15:26:58,525 ColumnFamilyStore.java:389 - Initializing system.range_xfers
INFO  [main] 2017-10-22 15:26:58,530 ColumnFamilyStore.java:389 - Initializing system.compaction_history
INFO  [main] 2017-10-22 15:26:58,539 ColumnFamilyStore.java:389 - Initializing system.sstable_activity
INFO  [main] 2017-10-22 15:26:58,547 ColumnFamilyStore.java:389 - Initializing system.size_estimates
INFO  [main] 2017-10-22 15:26:58,557 ColumnFamilyStore.java:389 - Initializing system.available_ranges
INFO  [main] 2017-10-22 15:26:58,563 ColumnFamilyStore.java:389 - Initializing system.views_builds_in_progress
INFO  [main] 2017-10-22 15:26:58,566 ColumnFamilyStore.java:389 - Initializing system.built_views
INFO  [main] 2017-10-22 15:26:58,569 ColumnFamilyStore.java:389 - Initializing system.hints
INFO  [main] 2017-10-22 15:26:58,573 ColumnFamilyStore.java:389 - Initializing system.batchlog
INFO  [main] 2017-10-22 15:26:58,577 ColumnFamilyStore.java:389 - Initializing system.schema_keyspaces
INFO  [main] 2017-10-22 15:26:58,580 ColumnFamilyStore.java:389 - Initializing system.schema_columnfamilies
INFO  [main] 2017-10-22 15:26:58,584 ColumnFamilyStore.java:389 - Initializing system.schema_columns
INFO  [main] 2017-10-22 15:26:58,588 ColumnFamilyStore.java:389 - Initializing system.schema_triggers
INFO  [main] 2017-10-22 15:26:58,593 ColumnFamilyStore.java:389 - Initializing system.schema_usertypes
INFO  [main] 2017-10-22 15:26:58,599 ColumnFamilyStore.java:389 - Initializing system.schema_functions
INFO  [main] 2017-10-22 15:26:58,604 ColumnFamilyStore.java:389 - Initializing system.schema_aggregates
Exception (java.lang.RuntimeException) encountered during startup: java.util.concurrent.ExecutionException: FSDiskFullWriteError in 
java.lang.RuntimeException: java.util.concurrent.ExecutionException: FSDiskFullWriteError in 
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:402)
	at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:772)
	at org.apache.cassandra.db.SystemKeyspace.removeTruncationRecord(SystemKeyspace.java:623)
	at org.apache.cassandra.db.ColumnFamilyStore.invalidate(ColumnFamilyStore.java:519)
	at org.apache.cassandra.db.ColumnFamilyStore.invalidate(ColumnFamilyStore.java:495)
	at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$unloadLegacySchemaTables$1(LegacySchemaMigrator.java:137)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.apache.cassandra.schema.LegacySchemaMigrator.unloadLegacySchemaTables(LegacySchemaMigrator.java:137)
	at org.apache.cassandra.schema.LegacySchemaMigrator.migrate(LegacySchemaMigrator.java:83)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:235)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697)
Caused by: java.util.concurrent.ExecutionException: FSDiskFullWriteError in 
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398)
	... 11 more
Caused by: FSDiskFullWriteError in 
	at org.apache.cassandra.db.Directories.getWriteableLocation(Directories.java:389)
	at org.apache.cassandra.db.Memtable.flush(Memtable.java:323)
	at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1050)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Insufficient disk space to write 436 bytes
	... 7 more
ERROR [main] 2017-10-22 15:26:58,797 CassandraDaemon.java:710 - Exception encountered during startup
java.lang.RuntimeException: java.util.concurrent.ExecutionException: FSDiskFullWriteError in 
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:402) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:772) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.SystemKeyspace.removeTruncationRecord(SystemKeyspace.java:623) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.ColumnFamilyStore.invalidate(ColumnFamilyStore.java:519) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.ColumnFamilyStore.invalidate(ColumnFamilyStore.java:495) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.schema.LegacySchemaMigrator.lambda$unloadLegacySchemaTables$1(LegacySchemaMigrator.java:137) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_144]
	at org.apache.cassandra.schema.LegacySchemaMigrator.unloadLegacySchemaTables(LegacySchemaMigrator.java:137) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.schema.LegacySchemaMigrator.migrate(LegacySchemaMigrator.java:83) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:235) [apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) [apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
Caused by: java.util.concurrent.ExecutionException: FSDiskFullWriteError in 
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_144]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_144]
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	... 11 common frames omitted
Caused by: org.apache.cassandra.io.FSDiskFullWriteError: java.io.IOException: Insufficient disk space to write 436 bytes
	at org.apache.cassandra.db.Directories.getWriteableLocation(Directories.java:389) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.Memtable.flush(Memtable.java:323) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1050) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.14.redhat-1.jar:3.0.14.redhat-1]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
Caused by: java.io.IOException: Insufficient disk space to write 436 bytes
	... 7 common frames omitted



Version-Release number of selected component (if applicable):
v3.7.0-0.143.7
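For reference, the heap sizing reported at the top of the log follows from the pod's 2000000000-byte memory limit: below 2 GB, the entrypoint uses half the limit for MAX_HEAP_SIZE and one third of that for HEAP_NEWSIZE. A minimal sketch of that arithmetic (not the actual cassandra-docker.sh logic, just a reproduction of the numbers):

```shell
# Reproduce the heap sizing seen in the log above.
limit=2000000000                         # pod memory limit in bytes
max_heap=$(( limit / 2 / 1024 / 1024 ))  # limit < 2GB: use half of it, in MB
new_size=$(( max_heap / 3 ))             # HEAP_NEWSIZE = one third of max heap
echo "MAX_HEAP_SIZE=${max_heap}M HEAP_NEWSIZE=${new_size}M"
```

This yields the 953M / 317M values shown in the log.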

Comment 1 Matt Wringe 2017-10-23 13:23:18 UTC
Setting this to unknown since it doesn't look like we have a component for ops

Comment 2 Matt Wringe 2017-10-23 13:59:58 UTC
From the Docs:

"If you are using persistent storage with Cassandra, it is the administrator’s responsibility to set a sufficient disk size for the cluster using the openshift_metrics_cassandra_pvc_size variable. It is also the administrator’s responsibility to monitor disk usage to make sure that it does not become full.

Data loss will result if the Cassandra persisted volume runs out of sufficient space."

The PVs should have been monitored so that someone could have stepped in to help before it reached this point.
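As a hedged sketch of such monitoring (the mount path is from this deployment's config; the pod name, sample df line, and 90% threshold are illustrative assumptions), usage could be polled from outside the pod and flagged before the volume fills:

```shell
# Hypothetical check: warn when the Cassandra data volume passes a threshold.
# The df line below is a hard-coded sample; in practice it would come from:
#   oc exec <cassandra-pod> -- df -P /cassandra_data | tail -1
line="/dev/xvdb 104857600 99614720 5242880 95% /cassandra_data"
used_pct=$(echo "$line" | awk '{gsub(/%/, "", $5); print $5}')  # strip the % sign
threshold=90
if [ "$used_pct" -ge "$threshold" ]; then
  echo "WARN: /cassandra_data at ${used_pct}% used"
fi
```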

@jsanda: if we just increase the PV, can we get things working again? Or has the database been corrupted, so that we either need to run some cleanup commands or deal with the consequences?

Comment 3 John Sanda 2017-10-23 14:05:58 UTC
(In reply to Matt Wringe from comment #2)
> From the Docs:
> 
> "If you are using persistent storage with Cassandra, it is the
> administrator’s responsibility to set a sufficient disk size for the cluster
> using the openshift_metrics_cassandra_pvc_size variable. It is also the
> administrator’s responsibility to monitor disk usage to make sure that it
> does not become full.
> 
> Data loss will result if the Cassandra persisted volume runs out of
> sufficient space."
> 
> The PVs should have been monitored so that someone could have stepped in to
> help before it reached this point.
> 
> @jsanda: if we just increase the PV, can we get things working again? Or
> have been corrupted the database and either need to run some clean up
> commands or deal with the consequences.

I would try increasing the size of the PV. It looks like the failure happened during commit log replay. Any replay that failed or did not yet happen can be tried again after resizing the PV.

Comment 4 Justin Pierce 2017-10-23 17:24:47 UTC
I've increased the PV from 100GB to 500GB and resized the filesystem. The Cassandra pods are now running. Thanks @mwringe and @jsanda.
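For readers hitting the same failure, the recovery amounts to growing the PV's declared capacity and then growing the filesystem on the backing volume. A hedged sketch (the PV name is hypothetical, and the actual patch/resize steps depend on your storage backend):

```shell
# Build the capacity patch for the PV; only the JSON construction runs here.
new_size="500Gi"
patch=$(printf '{"spec":{"capacity":{"storage":"%s"}}}' "$new_size")
echo "$patch"
# With cluster access, one would then apply it and grow the filesystem, e.g.:
#   oc patch pv <metrics-cassandra-pv> -p "$patch"
#   # on the backing device: resize2fs or xfs_growfs, per filesystem type
```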

