1017432 – Increase permissions_validity_in_ms setting for storage node

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Database
Sub Component:
Version:	JON 3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	ER04
Target Release:	JON 3.2.0
Assignee:	John Sanda
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:	1017372
Blocks:	1012435

Reported:	2013-10-09 21:03 UTC by John Sanda
Modified:	2014-01-02 20:38 UTC
CC List:	3 users
Fixed In Version:
Clone Of:	1017372
Environment:
Last Closed:
Type:	Bug
Embargoed:

Attachments
permissions_validity_in_ms (167.57 KB, image/png) 2013-11-15 14:00 UTC, Armine Hovsepyan

Description John Sanda 2013-10-09 21:03:31 UTC

+++ This bug was initially created as a clone of Bug #1017372 +++

Description of problem:
The storage node uses org.apache.cassandra.auth.CassandraAuthorizer for authorization checks. This imposes a non-trivial amount of overhead because authorization checks are now performed for each read/write request. To mitigate that overhead, a local cache of permissions is kept. The lifetime of a cache entry is set by the permissions_validity_in_ms property in cassandra.yaml, which defaults to two seconds (2000 ms).
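
To illustrate why the entry lifetime matters, here is a minimal sketch of a time-limited cache and a small simulation that counts how often the backing lookup runs. This is not Cassandra's actual code (the stack trace shows it uses a Guava loading cache); the class TtlPermissionCache, the method simulateLoads, the key "rhq_user" and the permission string "MODIFY" are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical time-limited cache; a sketch, not Cassandra's implementation. */
final class TtlPermissionCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMs;

        Entry(V value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final AtomicLong loads = new AtomicLong();
    private final long validityMs;
    private final Function<K, V> loader;   // stands in for the system_auth.permissions query
    private final LongSupplier clockMs;    // injectable clock so the sketch can be simulated

    TtlPermissionCache(long validityMs, Function<K, V> loader, LongSupplier clockMs) {
        this.validityMs = validityMs;
        this.loader = loader;
        this.clockMs = clockMs;
    }

    /** Returns the cached value, reloading it once the entry is older than validityMs. */
    V get(K key) {
        long now = clockMs.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMs >= validityMs) {
            // Simplification: check-then-put is not atomic, so concurrent callers could load twice.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
            loads.incrementAndGet();
        }
        return entry.value;
    }

    long loadCount() {
        return loads.get();
    }

    /** Issues one request every requestIntervalMs for durationMs and returns the number of backing loads. */
    static long simulateLoads(long validityMs, long durationMs, long requestIntervalMs) {
        long[] now = {0L};
        TtlPermissionCache<String, String> cache =
                new TtlPermissionCache<>(validityMs, key -> "MODIFY", () -> now[0]);
        for (long t = 0; t < durationMs; t += requestIntervalMs) {
            now[0] = t;
            cache.get("rhq_user"); // hypothetical user key
        }
        return cache.loadCount();
    }
}
```

Under these assumptions, simulateLoads(2000, 600000, 100) performs 300 backing loads over ten simulated minutes, while simulateLoads(600000, 600000, 100) performs only 1. Each of those loads corresponds to a read that can time out when the node is under heavy load.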

When a node comes under heavy load, I have on several occasions seen read timeout exceptions, even on writes. This happens because the authorization check very frequently has to query the system_auth.permissions table. The exceptions in rhq-storage.log look like this:

ERROR [Native-Transport-Requests:1101] 2013-10-07 14:06:52,730 ErrorMessage.java (line 210) Unexpected exception during request
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
        at org.apache.cassandra.service.ClientState.authorize(ClientState.java:290)
        at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:170)
        at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:163)
        at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:147)
        at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:67)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:100)
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:223)
        at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:121)
        at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:287)

I want to set permissions_validity_in_ms to five minutes, which substantially reduces the overhead of the authorization checks while still not allowing the permissions to get stale.

--- Additional comment from John Sanda on 2013-10-09 14:17:02 EDT ---

I committed the change to master. I set the validity period to 10 minutes rather than 5, though, because I had been testing with 10 minutes.

master commit hash:  d61b7ed441b25
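
For reference, a 10 minute validity period corresponds to 10 * 60 * 1000 = 600000 ms. A sketch of the relevant cassandra.yaml fragment, assuming the value is written as a plain millisecond count (the committed file itself is not shown here and may differ):

```yaml
# cassandra.yaml fragment (illustrative only)
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
# 10 minutes = 600000 ms; the stock default is 2000 ms
permissions_validity_in_ms: 600000
```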

Comment 1 John Sanda 2013-10-09 21:10:31 UTC

Commit pushed to release/jon3.2.x branch.

commit hash: c4018b21d3af

Comment 2 Simeon Pinder 2013-10-24 04:09:57 UTC

Moving to ON_QA for testing in the next build.

Comment 3 Armine Hovsepyan 2013-11-15 14:00:16 UTC

Created attachment 824558
permissions_validity_in_ms

Comment 4 Armine Hovsepyan 2013-11-15 14:09:44 UTC

Verified.
No timeout exceptions in rhq-storage.log -> http://d.pr/f/FHE9

Scenario:
1. Installed storage, server and agent on a slow environment (IP1).
2. Installed and started storage on IP2, connected to IP1.
3. Ran the repair operation on storage on IP1 (expected a timeout; none observed).
