Bug 765347 (GLUSTER-3615)

Summary: terasort failed with quick.slave.io on
Product: [Community] GlusterFS Reporter: M S Vishwanath Bhat <vbhat>
Component: HDFS Assignee: Steve Watt <swatt>
Status: CLOSED EOL QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: pre-release CC: bugs, gluster-bugs, mzywusko, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-22 15:40:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments: complete stack trace of terasort (flags: none)

Description M S Vishwanath Bhat 2011-09-23 03:35:19 UTC
Created attachment 670


The Gluster volume configuration was a 2x2 stripe-replicate.
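
For reference, a 2x2 stripe-replicate volume of the kind described here would normally be created along these lines (server names and brick paths are illustrative, not taken from this report):

    # sketch only -- hostnames and brick paths are hypothetical
    gluster volume create hadoopvol stripe 2 replica 2 \
        server1:/export/brick1 server2:/export/brick1 \
        server3:/export/brick1 server4:/export/brick1
    gluster volume start hadoopvol

The quick.slave.io switch named in the summary is a glusterfs-hadoop plugin setting; assuming the usual property name, it would be turned on in core-site.xml roughly as follows (the exact value syntax may differ between plugin versions):

    <!-- sketch only; enables the plugin's quick slave I/O path -->
    <property>
      <name>quick.slave.io</name>
      <value>On</value>
    </property>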

Comment 1 M S Vishwanath Bhat 2011-09-23 06:29:23 UTC
Created data for terasort with 10000000 rows. Now when the terasort job is started, it fails with the following backtrace.
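
For context, an input of this size and the sort itself would typically be produced with the stock Hadoop examples jar, roughly as below (the jar name and input/output directories are illustrative):

    # sketch only -- jar version and paths are hypothetical
    hadoop jar hadoop-*-examples.jar teragen 10000000 /benchmarks/terasort-input
    hadoop jar hadoop-*-examples.jar terasort /benchmarks/terasort-input /benchmarks/terasort-output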

It says that space is not available, but there is actually space available on all the back ends.
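
Note that the spill path in the trace (taskTracker/jobcache/.../spill0.out) is allocated under the TaskTracker's mapred.local.dir, not on the Gluster volume itself, so one way to cross-check the "no space" report is to compare free space in both places, for example (paths are hypothetical):

    # sketch only -- actual brick and mapred.local.dir paths will differ
    df -h /export/brick1                  # back-end brick on each server
    df -h /var/lib/hadoop/mapred/local    # wherever mapred.local.dir points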

11/09/23 01:57:11 INFO mapred.JobClient: Task Id : attempt_201109220435_0004_m_000142_0, Status : FAILED
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201109220435_0004/attempt_201109220435_0004_m_000142_0/output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

attempt_201109220435_0004_m_000142_0: Initializing GlusterFS
11/09/23 01:57:13 INFO mapred.JobClient:  map 95% reduce 14%
11/09/23 01:57:17 INFO mapred.JobClient:  map 96% reduce 14%
11/09/23 01:57:22 INFO mapred.JobClient:  map 97% reduce 14%
11/09/23 01:57:29 INFO mapred.JobClient:  map 98% reduce 14%
11/09/23 01:57:32 INFO mapred.JobClient:  map 99% reduce 14%
11/09/23 01:57:37 INFO mapred.JobClient:  map 100% reduce 14%
11/09/23 01:57:49 INFO mapred.JobClient:  map 100% reduce 15%
11/09/23 01:57:52 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 01:57:54 INFO mapred.JobClient: Task Id : attempt_201109220435_0004_r_000000_1, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:84)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:218)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:157)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2454)

attempt_201109220435_0004_r_000000_1: Initializing GlusterFS
11/09/23 01:57:55 INFO mapred.JobClient:  map 99% reduce 0%
11/09/23 01:57:57 INFO mapred.JobClient: Task Id : attempt_201109220435_0004_m_000133_0, Status : FAILED
attempt_201109220435_0004_m_000133_0: Initializing GlusterFS
11/09/23 01:58:09 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 01:58:16 INFO mapred.JobClient:  map 100% reduce 1%
11/09/23 01:58:19 INFO mapred.JobClient:  map 100% reduce 2%
11/09/23 01:58:22 INFO mapred.JobClient:  map 100% reduce 3%


I will attach the complete set of backtraces it threw.

Comment 2 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.