| Summary: | 'terasort' after 'teragen' gives lot warnings | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | M S Vishwanath Bhat <vbhat> |
| Component: | HDFS | Assignee: | Venky Shankar <vshankar> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | pre-release | CC: | gluster-bugs, mzywusko, rabhat |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | hadoop-glusterfs | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Even the job fails with following backtrace...
11/08/10 10:48:04 WARN mapred.JobClient: Error reading task outputhttp://ubuntu3:50060/tasklog?plaintext=true&taskid=attempt_201108101044_0001_r_000001_0&filter=stdout
11/08/10 10:48:04 WARN mapred.JobClient: Error reading task outputhttp://ubuntu3:50060/tasklog?plaintext=true&taskid=attempt_201108101044_0001_r_000001_0&filter=stderr
11/08/10 10:48:08 INFO mapred.JobClient: Job complete: job_201108101044_0001
11/08/10 10:48:08 INFO mapred.JobClient: Counters: 14
11/08/10 10:48:08 INFO mapred.JobClient: Job Counters
11/08/10 10:48:08 INFO mapred.JobClient: Launched reduce tasks=2
11/08/10 10:48:08 INFO mapred.JobClient: Rack-local map tasks=15
11/08/10 10:48:08 INFO mapred.JobClient: Launched map tasks=41
11/08/10 10:48:08 INFO mapred.JobClient: Data-local map tasks=26
11/08/10 10:48:08 INFO mapred.JobClient: Failed map tasks=1
11/08/10 10:48:08 INFO mapred.JobClient: FileSystemCounters
11/08/10 10:48:08 INFO mapred.JobClient: FILE_BYTES_READ=547609080
11/08/10 10:48:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1095217136
11/08/10 10:48:08 INFO mapred.JobClient: Map-Reduce Framework
11/08/10 10:48:08 INFO mapred.JobClient: Combine output records=0
11/08/10 10:48:08 INFO mapred.JobClient: Map input records=5368710
11/08/10 10:48:08 INFO mapred.JobClient: Spilled Records=10737420
11/08/10 10:48:08 INFO mapred.JobClient: Map output bytes=536871000
11/08/10 10:48:08 INFO mapred.JobClient: Map input bytes=536871000
11/08/10 10:48:08 INFO mapred.JobClient: Combine input records=0
11/08/10 10:48:08 INFO mapred.JobClient: Map output records=5368710
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
the warnings related to file timstamps may be due to inconsistency in the clock time of nodes in the cluster. I see ~5-10 minutes difference in clock in the server. I will sync the clock and give a run again. Also the exceptions are warnings - the job runs to completion for me and teravalidate gives positive output for the run. Thanks, -Venky timestamp of the _partition.lst file on master and slave ..
for i in {21,22,23,24,33,34,35,36}; do echo "Doing 10.1.11.$i"; ssh 10.1.11.$i ' ls -l /mnt/glusterfs/gen-dir/_partition.lst';done
Doing 10.1.11.21
-rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.22
-rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.23
-rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.24
-rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.33
-rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.34
-rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.35
-rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst
Doing 10.1.11.36
-rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst
As you see the timestamp of the same file on the hosts differs on the FUSE mount.
This cause the hadoop job to assume that the file has changed and throws an Ex eption regarding the same
after syncing the clocks with ntpd - the job runs fine without any warnings. Closing the bug. re-open if you see any more. CHANGE: http://review.gluster.com/241 (Change-Id: I71bc8a477155cff62c3218fee285a22a44aa54ba) merged in release-3.2 by Anand Avati (avati) The above patch is for the bug 765116 and not this. I see this issue only in the ubuntu vm's which have few seconds of time difference. But with AWS machines and bare metals I'm not seeing this issue. So moving to verified state. |
I ran 'terasort' after generating data sets using 'teragen' during the map jobs I get following warning. 11/08/10 04:07:22 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000009_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:22 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000009_0&filter=stdout 11/08/10 04:07:22 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000009_0&filter=stderr 11/08/10 04:07:23 INFO mapred.JobClient: map 3% reduce 0% 11/08/10 04:07:27 INFO mapred.JobClient: map 4% reduce 0% 11/08/10 04:07:31 INFO mapred.JobClient: map 5% reduce 0% 11/08/10 04:07:31 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000014_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:31 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000014_0&filter=stdout 11/08/10 04:07:31 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000014_0&filter=stderr 11/08/10 04:07:34 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000015_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:34 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000015_0&filter=stdout 11/08/10 04:07:34 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000015_0&filter=stderr Though I get warnings, job seems to run fine.