1404888 – Accumulo fails to create write-ahead logs

Summary: Accumulo fails to create write-ahead logs

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	accumulo
Sub Component:
Version:	25
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Christopher Tubbs
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-12-15 00:26 UTC by Christopher Tubbs
Modified:	2016-12-17 00:40 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-12-17 00:40:29 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
tserver-stack-trace.log (5.17 KB, text/plain) 2016-12-15 00:26 UTC, Christopher Tubbs	no flags
tserver-stack-trace.log (82.40 KB, text/plain) 2016-12-15 00:34 UTC, Christopher Tubbs	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Apache JIRA	ACCUMULO-4536	0	None	None	None	2016-12-17 00:40:42 UTC

Description Christopher Tubbs 2016-12-15 00:26:04 UTC

Created attachment 1231930
tserver-stack-trace.log

Description of problem:
When running Accumulo on top of a real HDFS volume (hdfs:// instead of file://), Accumulo fails to create write-ahead logs (WALs) and gets stuck and unresponsive.

Version-Release number of selected component (if applicable):
accumulo-1.6.6-13.fc25.x86_64
hadoop-common-2.4.1-24.fc25.noarch

How reproducible:
100%

Steps to Reproduce:
1. Start ZooKeeper
2. Start Hadoop (HDFS: datanode and namenode)
3. Configure Accumulo to use HDFS
4. Initialize Accumulo
5. Start Accumulo

Actual results:
The tserver logs show Accumulo stuck in an infinite loop: it tries to create a WAL file in HDFS, gets an error creating the file, and retries. Looking in HDFS, one can see many zero-length WAL files left behind by the retries.

Expected results:
Tserver should create a WAL successfully and continue normally.

Additional info:
The workaround is to disable write-ahead logs by setting table.walog.enabled to false in /etc/accumulo/accumulo-site.xml before starting Accumulo. Because this property is overridden for the accumulo.root and accumulo.metadata tables, it must also be changed manually in ZooKeeper for those tables, which is a somewhat advanced operation.
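
The site-file half of this workaround can be sketched roughly as below. This is only an illustration, assuming accumulo-site.xml follows the Hadoop-style configuration/property/name/value layout. The helper name set_site_property is hypothetical, and the example works on an in-memory string that stands in for the real file. It does not touch the per-table overrides stored in ZooKeeper for accumulo.root and accumulo.metadata.

```python
import xml.etree.ElementTree as ET


def set_site_property(site_xml: str, name: str, value: str) -> str:
    """Return site_xml with property `name` set to `value`.

    Hypothetical helper: updates the property if it exists, else appends it.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching <property> was found, so add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed stand-in for the contents of /etc/accumulo/accumulo-site.xml.
example_site = (
    "<configuration>"
    "<property><name>instance.zookeeper.host</name>"
    "<value>localhost:2181</value></property>"
    "</configuration>"
)

updated = set_site_property(example_site, "table.walog.enabled", "false")
print(updated)
```

Running a WAL-less instance trades durability for startup, so this only makes sense for testing.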

Comment 1 Christopher Tubbs 2016-12-15 00:34:19 UTC

Created attachment 1231931
tserver-stack-trace.log

Comment 2 Mike Miller 2016-12-15 14:41:44 UTC

Tested 1.6.6 with Hadoop 2.4.1 using just tarballs on a local setup and the WAL works fine.

Comment 3 Christopher Tubbs 2016-12-15 20:59:33 UTC

(In reply to Mike Miller from comment #2)
> Tested 1.6.6 with Hadoop 2.4.1 using just tarballs on a local setup and the
> WAL works fine.

Yeah, I was also able to confirm this is not an upstream bug. I strongly suspect it's a classpath issue. We're probably missing an essential jar in Hadoop's classpath.

Comment 4 Christopher Tubbs 2016-12-17 00:40:29 UTC

Ugh, the problem was simple. The default walog size is about 1G, which is too big for the small disks we were testing with. Setting tserver.walog.max.size to 100M fixed the problem. Allocating larger disks would also solve the problem.

Reporting the infinite loop upstream. There is nothing we can do about it here other than recommend larger disks or a lower walog max size.
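
A quick sanity check along these lines might have caught the mismatch earlier. The sketch below is an assumption-laden illustration, not anything Accumulo provides: parse_size and walog_fits are hypothetical helpers, the suffixes are treated as 1024-based, and local free space from shutil.disk_usage stands in for the space actually available to HDFS.

```python
import shutil

# Multipliers for size suffixes such as "100M" or "1G" (assumed 1024-based).
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string like '100M' or '1G' to bytes (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def walog_fits(max_size: str, free_bytes: int) -> bool:
    """Return True if a WAL of max_size would fit in free_bytes."""
    return parse_size(max_size) <= free_bytes


if __name__ == "__main__":
    # Local root filesystem used as a stand-in for the datanode's disk.
    free = shutil.disk_usage("/").free
    for candidate in ("1G", "100M"):
        print(candidate, "fits" if walog_fits(candidate, free) else "does not fit")
```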
