Hide Forgot
I ran 'terasort' after generating data sets using 'teragen' during the map jobs I get following warning. 11/08/10 04:07:22 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000009_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:22 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000009_0&filter=stdout 11/08/10 04:07:22 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000009_0&filter=stderr 11/08/10 04:07:23 INFO mapred.JobClient: map 3% reduce 0% 11/08/10 04:07:27 INFO mapred.JobClient: map 4% reduce 0% 11/08/10 04:07:31 INFO mapred.JobClient: map 5% reduce 0% 11/08/10 04:07:31 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000014_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:31 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000014_0&filter=stdout 11/08/10 04:07:31 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000014_0&filter=stderr 11/08/10 04:07:34 INFO mapred.JobClient: Task Id : attempt_201108090104_0044_m_000015_0, Status : FAILED java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) java.io.IOException: File: glusterfs://ubuntu1:9000/mnt/glusterfs/ter-inp/_partition.lst#_partition.lst has changed on HDFS since job started at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:484) at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356) at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:172) 11/08/10 04:07:34 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000015_0&filter=stdout 11/08/10 04:07:34 WARN mapred.JobClient: Error reading task outputhttp://ubuntu2:50060/tasklog?plaintext=true&taskid=attempt_201108090104_0044_m_000015_0&filter=stderr Though I get warnings, job seems to run fine.
Even the job fails with following backtrace... 11/08/10 10:48:04 WARN mapred.JobClient: Error reading task outputhttp://ubuntu3:50060/tasklog?plaintext=true&taskid=attempt_201108101044_0001_r_000001_0&filter=stdout 11/08/10 10:48:04 WARN mapred.JobClient: Error reading task outputhttp://ubuntu3:50060/tasklog?plaintext=true&taskid=attempt_201108101044_0001_r_000001_0&filter=stderr 11/08/10 10:48:08 INFO mapred.JobClient: Job complete: job_201108101044_0001 11/08/10 10:48:08 INFO mapred.JobClient: Counters: 14 11/08/10 10:48:08 INFO mapred.JobClient: Job Counters 11/08/10 10:48:08 INFO mapred.JobClient: Launched reduce tasks=2 11/08/10 10:48:08 INFO mapred.JobClient: Rack-local map tasks=15 11/08/10 10:48:08 INFO mapred.JobClient: Launched map tasks=41 11/08/10 10:48:08 INFO mapred.JobClient: Data-local map tasks=26 11/08/10 10:48:08 INFO mapred.JobClient: Failed map tasks=1 11/08/10 10:48:08 INFO mapred.JobClient: FileSystemCounters 11/08/10 10:48:08 INFO mapred.JobClient: FILE_BYTES_READ=547609080 11/08/10 10:48:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1095217136 11/08/10 10:48:08 INFO mapred.JobClient: Map-Reduce Framework 11/08/10 10:48:08 INFO mapred.JobClient: Combine output records=0 11/08/10 10:48:08 INFO mapred.JobClient: Map input records=5368710 11/08/10 10:48:08 INFO mapred.JobClient: Spilled Records=10737420 11/08/10 10:48:08 INFO mapred.JobClient: Map output bytes=536871000 11/08/10 10:48:08 INFO mapred.JobClient: Map input bytes=536871000 11/08/10 10:48:08 INFO mapred.JobClient: Combine input records=0 11/08/10 10:48:08 INFO mapred.JobClient: Map output records=5368710 java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
the warnings related to file timstamps may be due to inconsistency in the clock time of nodes in the cluster. I see ~5-10 minutes difference in clock in the server. I will sync the clock and give a run again. Also the exceptions are warnings - the job runs to completion for me and teravalidate gives positive output for the run. Thanks, -Venky
timestamp of the _partition.lst file on master and slave .. for i in {21,22,23,24,33,34,35,36}; do echo "Doing 10.1.11.$i"; ssh 10.1.11.$i ' ls -l /mnt/glusterfs/gen-dir/_partition.lst';done Doing 10.1.11.21 -rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.22 -rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.23 -rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.24 -rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.33 -rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.34 -rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.35 -rw-r--r-- 1 root root 129 2011-08-17 00:10 /mnt/glusterfs/gen-dir/_partition.lst Doing 10.1.11.36 -rw-r--r-- 1 root root 129 2011-08-17 00:14 /mnt/glusterfs/gen-dir/_partition.lst As you see the timestamp of the same file on the hosts differs on the FUSE mount. This cause the hadoop job to assume that the file has changed and throws an Ex eption regarding the same
after syncing the clocks with ntpd - the job runs fine without any warnings. Closing the bug. re-open if you see any more.
CHANGE: http://review.gluster.com/241 (Change-Id: I71bc8a477155cff62c3218fee285a22a44aa54ba) merged in release-3.2 by Anand Avati (avati)
The above patch is for the bug 765116 and not this.
I see this issue only in the ubuntu vm's which have few seconds of time difference. But with AWS machines and bare metals I'm not seeing this issue. So moving to verified state.