Bug 1017372

Summary: Increase permissions_validity_in_ms setting for storage node
Product: [Other] RHQ Project Reporter: John Sanda <jsanda>
Component: DatabaseAssignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.9CC: hrupp
Target Milestone: ---   
Target Release: RHQ 4.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1017432 (view as bug list) Environment:
Last Closed: 2014-04-23 12:32:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1017432    

Description John Sanda 2013-10-09 18:00:58 UTC
Description of problem:
The storage node uses org.apache.cassandra.auth.CassandraAuthorizer for authorization checks. This imposes a non-trivial amount of overhead because now authorization checks are performed for each read/write request. To mitigate that overhead, a local cache of permissions is stored. The default lifetime for a cache entry is set by the permissions_validity_in_ms property in cassandra.yaml. It defaults to two seconds.

When a node comes under heavy load, I have on several occassions started seeing read timeout exceptions, even on writes. This is because of the authorization check which very frequently has to query the system_auth.permissions table. The exceptions look like in rhq-storage.log look like,

ERROR [Native-Transport-Requests:1101] 2013-10-07 14:06:52,730 ErrorMessage.java (line 210) Unexpected exception during request
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
        at org.apache.cassandra.service.ClientState.authorize(ClientState.java:290)
        at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:170)
        at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:163)
        at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:147)
        at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:67)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:100)
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:223)
        at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:121)
        at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:287)

I want to make set permissions_validity_in_ms to five minutes which substantially reduces the overhead of the authorization checks but does not allow the permissions to get stale either.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2013-10-09 18:17:02 UTC
I committed the change to master. I set the timeout to 10 minutes though. I had been testing with 10 minutes, not 5.

master commit hash:  d61b7ed441b25

Comment 2 Heiko W. Rupp 2014-04-23 12:32:18 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.