Bug 1470836 - Hadoop S3A 2.7.3 - distcp file bigger than ~5G - after bz#1470301 patch, upload goes in a loop and never finishes; RGW has ERROR: client_io->complete_request() returned -5
Summary: Hadoop S3A 2.7.3 - distcp File bigger than ~5G - after bz#1470301 patch, uplo...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 2.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 2.4
Assignee: Marcus Watts
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1473436 1479701
 
Reported: 2017-07-13 19:01 UTC by Vikhyat Umrao
Modified: 2020-09-10 10:55 UTC (History)
13 users

Fixed In Version: RHEL: ceph-10.2.7-36.el7cp Ubuntu: ceph_10.2.7-35redhat1
Doc Type: Bug Fix
Doc Text:
.Fixed incorrect handling of headers containing the plus character

Incorrect handling of the plus character ("+") in Amazon Web Services (AWS) version 4 canonical headers caused unexpected authentication failures when operating on such objects. As a consequence, some operations, such as Hadoop Amazon Simple Storage Service (S3A) distributed copy (DistCp), failed unexpectedly. This update ensures that the plus character is escaped as required, and affected operations no longer fail.
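To illustrate the ambiguity the doc text describes (a hypothetical object key; Python's urllib stands in for the actual SigV4 canonicalization code): a "+" left unescaped is decoded as a space on the server side, so client and server end up canonicalizing different strings and the signatures no longer match.

```python
from urllib.parse import quote, unquote_plus

# Hypothetical object key containing a plus character.
key = "dir/file+name.dat"

# A server that decodes '+' as a space reconstructs a different key,
# so the two sides compute different canonical requests.
naively_decoded = unquote_plus(key)      # 'dir/file name.dat'

# Percent-encoding the plus sign removes the ambiguity.
escaped = quote(key, safe="/")           # 'dir/file%2Bname.dat'
round_tripped = unquote_plus(escaped)    # 'dir/file+name.dat'
```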
Clone Of:
Environment:
Last Closed: 2017-10-17 18:12:51 UTC
Embargoed:


Attachments
RGW node1 - dell-per630-12.log with debug_rgw=20 and debug_ms=1, ceph version 10.2.7-28.0.TEST.bz1470301.el7cp (1.28 MB, application/x-bzip)
2017-07-13 19:50 UTC, Vikhyat Umrao
no flags Details
RGW node2 - dell-per630-13.log with debug_rgw=20 and debug_ms=1, ceph version 10.2.7-28.0.TEST.bz1470301.el7cp (1.27 MB, application/x-bzip)
2017-07-13 19:51 UTC, Vikhyat Umrao
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 20463 0 None None None 2017-07-21 19:51:25 UTC
Red Hat Bugzilla 1470301 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Knowledge Base (Solution) 3137011 0 None None None 2017-08-02 21:30:04 UTC
Red Hat Product Errata RHBA-2017:2903 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.4 enhancement and bug fix update 2017-10-17 22:12:30 UTC

Internal Links: 1470301

Description Vikhyat Umrao 2017-07-13 19:01:01 UTC
Description of problem:
Hadoop S3A 2.7.3 - distcp file bigger than ~5G - after bz#1470301 patch, the upload goes in a loop and never finishes; RGW has ERROR: client_io->complete_request() returned -5

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.3
Test package from bz#1470301
10.2.7-28.0.TEST.bz1470301.el7cp

# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /root/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar


How reproducible:
Always

Steps to Reproduce:
1. Configure Hadoop S3A.
2. Use hadoop distcp to upload a file larger than ~5G; I tried 7G and 8G and both failed.
3. Sizes below 5G work fine.

Comment 1 Vikhyat Umrao 2017-07-13 19:16:46 UTC
1. Files larger than 5G are not uploaded from Hadoop.

It started with **Unable to copy part: Status Code: 404** on version 2.3 (10.2.7-27.el7cp).

---
17/07/13 18:22:07 ERROR util.RetriableCommand: Failure in Retriable command: Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G
com.amazonaws.AmazonClientException: Unable to copy part: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: tx000000000000000000106-0059676cf1-6dbb4-default, AWS Error Code: NoSuchKey, AWS Error Message: null
---


[1] Test results: 4G and 5G succeeded, 7G failed:
==================================================

1. 7G file:
=====================


# hadoop distcp /root/testhadoop.7G s3a://test-bucket/
17/07/13 18:14:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/root/testhadoop.7G], targetPath=s3a://test-bucket/, targetPathExists=true, preserveRawXattrs=false}
17/07/13 18:14:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/07/13 18:14:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/07/13 18:15:00 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/13 18:15:00 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/13 18:15:00 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/07/13 18:15:00 INFO mapreduce.JobSubmitter: number of splits:1
17/07/13 18:15:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local375793309_0001
17/07/13 18:15:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/13 18:15:00 INFO tools.DistCp: DistCp job-id: job_local375793309_0001
17/07/13 18:15:00 INFO mapreduce.Job: Running job: job_local375793309_0001
17/07/13 18:15:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/13 18:15:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 18:15:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
17/07/13 18:15:00 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/13 18:15:00 INFO mapred.LocalJobRunner: Starting task: attempt_local375793309_0001_m_000000_0
17/07/13 18:15:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 18:15:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/07/13 18:15:00 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-root/mapred/staging/root680833703/.staging/_distcp-2118402004/fileList.seq:0+199
17/07/13 18:15:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 18:15:00 INFO mapred.CopyMapper: Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G
17/07/13 18:15:00 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0
17/07/13 18:15:01 INFO mapreduce.Job: Job job_local375793309_0001 running in uber mode : false
17/07/13 18:15:01 INFO mapreduce.Job:  map 0% reduce 0%
17/07/13 18:15:06 INFO mapred.LocalJobRunner: 5.5% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [395.7M/7.0G] > map
17/07/13 18:15:07 INFO mapreduce.Job:  map 100% reduce 0%
17/07/13 18:15:09 INFO mapred.LocalJobRunner: 7.3% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [519.9M/7.0G] > map
17/07/13 18:15:12 INFO mapred.LocalJobRunner: 7.8% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [560.4M/7.0G] > map
17/07/13 18:15:15 INFO mapred.LocalJobRunner: 11.7% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [836.4M/7.0G] > map
17/07/13 18:15:18 INFO mapred.LocalJobRunner: 13.6% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [971.9M/7.0G] > map
17/07/13 18:15:21 INFO mapred.LocalJobRunner: 15.8% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [1.1G/7.0G] > map
17/07/13 18:15:24 INFO mapred.LocalJobRunner: 16.8% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [1.2G/7.0G] > map
17/07/13 18:15:27 INFO mapred.LocalJobRunner: 18.8% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [1.3G/7.0G] > map

[.....]

17/07/13 18:22:03 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [7.0G/7.0G] > map
17/07/13 18:22:06 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [7.0G/7.0G] > map
17/07/13 18:22:07 ERROR util.RetriableCommand: Failure in Retriable command: Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G
com.amazonaws.AmazonClientException: Unable to copy part: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: tx000000000000000000106-0059676cf1-6dbb4-default, AWS Error Code: NoSuchKey, AWS Error Message: null
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.collectPartETags(CopyMonitor.java:252)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.completeMultipartUpload(CopyMonitor.java:227)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.poll(CopyMonitor.java:171)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:136)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: tx000000000000000000106-0059676cf1-6dbb4-default, AWS Error Code: NoSuchKey, AWS Error Message: null, S3 Extended Request ID: 6dbb4-default-default
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.copyPart(AmazonS3Client.java:1638)
	at com.amazonaws.services.s3.transfer.internal.CopyPartCallable.call(CopyPartCallable.java:41)
	at com.amazonaws.services.s3.transfer.internal.CopyPartCallable.call(CopyPartCallable.java:28)
	... 4 more
17/07/13 18:22:09 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0
17/07/13 18:22:12 INFO mapred.LocalJobRunner: 3.6% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [256.2M/7.0G] > map
17/07/13 18:22:15 INFO mapred.LocalJobRunner: 6.3% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [452.6M/7.0G] > map
17/07/13 18:22:18 INFO mapred.LocalJobRunner: 7.4% Copying file:/root/testhadoop.7G to s3a://test-bucket/testhadoop.7G [527.9M/7.0G] > map


2. 4G file:
=====================


# bin/hadoop distcp /root/testhadoop.4g s3a://test-bucket/
17/07/13 07:40:30 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/root/testhadoop.4g], targetPath=s3a://test-bucket/, targetPathExists=true, preserveRawXattrs=false}
17/07/13 07:40:30 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/07/13 07:40:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/07/13 07:40:30 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/13 07:40:30 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/13 07:40:30 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/07/13 07:40:30 INFO mapreduce.JobSubmitter: number of splits:1
17/07/13 07:40:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1614843136_0001
17/07/13 07:40:30 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/13 07:40:30 INFO tools.DistCp: DistCp job-id: job_local1614843136_0001
17/07/13 07:40:30 INFO mapreduce.Job: Running job: job_local1614843136_0001
17/07/13 07:40:30 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/13 07:40:30 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:40:30 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
17/07/13 07:40:30 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/13 07:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1614843136_0001_m_000000_0
17/07/13 07:40:30 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:40:30 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/07/13 07:40:30 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-root/mapred/staging/root1675888413/.staging/_distcp1070895476/fileList.seq:0+199
17/07/13 07:40:30 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:40:30 INFO mapred.CopyMapper: Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g
17/07/13 07:40:31 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local1614843136_0001_m_000000_0
17/07/13 07:40:31 INFO mapreduce.Job: Job job_local1614843136_0001 running in uber mode : false
17/07/13 07:40:31 INFO mapreduce.Job:  map 0% reduce 0%
17/07/13 07:40:36 INFO mapred.LocalJobRunner: 10.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [411.4M/4.0G] > map
17/07/13 07:40:37 INFO mapreduce.Job:  map 100% reduce 0%
17/07/13 07:40:39 INFO mapred.LocalJobRunner: 13.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [533.9M/4.0G] > map
17/07/13 07:40:42 INFO mapred.LocalJobRunner: 16.1% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [659.4M/4.0G] > map
17/07/13 07:40:45 INFO mapred.LocalJobRunner: 18.4% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [753.7M/4.0G] > map
17/07/13 07:40:48 INFO mapred.LocalJobRunner: 20.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [820.1M/4.0G] > map

[...........]

17/07/13 07:41:54 INFO mapred.LocalJobRunner: 99.8% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G] > map
17/07/13 07:41:57 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G] > map
[........]

17/07/13 07:44:22 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G] > map
17/07/13 07:44:23 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G] > map
17/07/13 07:44:23 INFO mapred.Task: Task:attempt_local1614843136_0001_m_000000_0 is done. And is in the process of committing
17/07/13 07:44:23 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G] > map
17/07/13 07:44:23 INFO mapred.Task: Task attempt_local1614843136_0001_m_000000_0 is allowed to commit now
17/07/13 07:44:23 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1614843136_0001_m_000000_0' to file:/tmp/hadoop-root/mapred/staging/root1675888413/.staging/_distcp1070895476/_logs/_temporary/0/task_local1614843136_0001_m_000000
17/07/13 07:44:23 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.4g to s3a://test-bucket/testhadoop.4g [4.0G/4.0G]
17/07/13 07:44:23 INFO mapred.Task: Task 'attempt_local1614843136_0001_m_000000_0' done.
17/07/13 07:44:23 INFO mapred.LocalJobRunner: Finishing task: attempt_local1614843136_0001_m_000000_0
17/07/13 07:44:23 INFO mapred.LocalJobRunner: map task executor complete.
17/07/13 07:44:23 WARN mapred.CopyCommitter: Unable to cleanup temp files
java.lang.NullPointerException
	at org.apache.hadoop.fs.Path.<init>(Path.java:104)
	at org.apache.hadoop.fs.Path.<init>(Path.java:93)
	at org.apache.hadoop.tools.mapred.CopyCommitter.deleteAttemptTempFiles(CopyCommitter.java:136)
	at org.apache.hadoop.tools.mapred.CopyCommitter.cleanupTempFiles(CopyCommitter.java:126)
	at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:83)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:537)
17/07/13 07:44:23 INFO mapred.CopyCommitter: Cleaning up temporary work folder: file:/tmp/hadoop-root/mapred/staging/root1675888413/.staging/_distcp1070895476
17/07/13 07:44:23 INFO mapreduce.Job: Job job_local1614843136_0001 completed successfully
17/07/13 07:44:23 INFO mapreduce.Job: Counters: 23
	File System Counters
		FILE: Number of bytes read=4295096718
		FILE: Number of bytes written=418550
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		S3A: Number of bytes read=0
		S3A: Number of bytes written=4294967296
		S3A: Number of read operations=14
		S3A: Number of large read operations=0
		S3A: Number of write operations=44
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=156
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=18
		Total committed heap usage (bytes)=249561088
	File Input Format Counters 
		Bytes Read=231
	File Output Format Counters 
		Bytes Written=8
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=4294967296
		BYTESEXPECTED=4294967296
		COPY=1





3. 5G file:
=====================

# bin/hadoop distcp /root/testhadoop.5g s3a://test-bucket/
17/07/13 07:47:52 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/root/testhadoop.5g], targetPath=s3a://test-bucket/, targetPathExists=true, preserveRawXattrs=false}
17/07/13 07:47:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/07/13 07:47:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/07/13 07:47:52 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/13 07:47:52 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/13 07:47:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/07/13 07:47:52 INFO mapreduce.JobSubmitter: number of splits:1
17/07/13 07:47:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local389646759_0001
17/07/13 07:47:52 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/13 07:47:52 INFO tools.DistCp: DistCp job-id: job_local389646759_0001
17/07/13 07:47:52 INFO mapreduce.Job: Running job: job_local389646759_0001
17/07/13 07:47:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/13 07:47:52 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:47:52 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
17/07/13 07:47:52 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/13 07:47:52 INFO mapred.LocalJobRunner: Starting task: attempt_local389646759_0001_m_000000_0
17/07/13 07:47:53 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:47:53 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/07/13 07:47:53 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-root/mapred/staging/root1638366799/.staging/_distcp-524910745/fileList.seq:0+199
17/07/13 07:47:53 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 07:47:53 INFO mapred.CopyMapper: Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g
17/07/13 07:47:53 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local389646759_0001_m_000000_0
17/07/13 07:47:53 INFO mapreduce.Job: Job job_local389646759_0001 running in uber mode : false
17/07/13 07:47:53 INFO mapreduce.Job:  map 0% reduce 0%
17/07/13 07:47:59 INFO mapred.LocalJobRunner: 8.9% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [454.1M/5.0G] > map
17/07/13 07:47:59 INFO mapreduce.Job:  map 100% reduce 0%
17/07/13 07:48:02 INFO mapred.LocalJobRunner: 10.6% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [543.6M/5.0G] > map
17/07/13 07:48:05 INFO mapred.LocalJobRunner: 12.6% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [644.8M/5.0G] > map
17/07/13 07:48:08 INFO mapred.LocalJobRunner: 17.7% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [907.8M/5.0G] > map

[........]

17/07/13 07:49:29 INFO mapred.LocalJobRunner: 92.6% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [4.6G/5.0G] > map
17/07/13 07:49:32 INFO mapred.LocalJobRunner: 95.3% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [4.8G/5.0G] > map
17/07/13 07:49:35 INFO mapred.LocalJobRunner: 97.9% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [4.9G/5.0G] > map
17/07/13 07:49:38 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map
17/07/13 07:49:41 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map
17/07/13 07:49:44 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map

[.........]


17/07/13 07:52:38 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map
17/07/13 07:52:40 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map
17/07/13 07:52:40 INFO mapred.Task: Task:attempt_local389646759_0001_m_000000_0 is done. And is in the process of committing
17/07/13 07:52:40 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G] > map
17/07/13 07:52:40 INFO mapred.Task: Task attempt_local389646759_0001_m_000000_0 is allowed to commit now
17/07/13 07:52:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local389646759_0001_m_000000_0' to file:/tmp/hadoop-root/mapred/staging/root1638366799/.staging/_distcp-524910745/_logs/_temporary/0/task_local389646759_0001_m_000000
17/07/13 07:52:40 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.5g to s3a://test-bucket/testhadoop.5g [5.0G/5.0G]
17/07/13 07:52:40 INFO mapred.Task: Task 'attempt_local389646759_0001_m_000000_0' done.
17/07/13 07:52:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local389646759_0001_m_000000_0
17/07/13 07:52:40 INFO mapred.LocalJobRunner: map task executor complete.
17/07/13 07:52:40 WARN mapred.CopyCommitter: Unable to cleanup temp files
java.lang.NullPointerException
	at org.apache.hadoop.fs.Path.<init>(Path.java:104)
	at org.apache.hadoop.fs.Path.<init>(Path.java:93)
	at org.apache.hadoop.tools.mapred.CopyCommitter.deleteAttemptTempFiles(CopyCommitter.java:136)
	at org.apache.hadoop.tools.mapred.CopyCommitter.cleanupTempFiles(CopyCommitter.java:126)
	at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:83)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:537)
17/07/13 07:52:40 INFO mapred.CopyCommitter: Cleaning up temporary work folder: file:/tmp/hadoop-root/mapred/staging/root1638366799/.staging/_distcp-524910745
17/07/13 07:52:41 INFO mapreduce.Job: Job job_local389646759_0001 completed successfully
17/07/13 07:52:41 INFO mapreduce.Job: Counters: 23
	File System Counters
		FILE: Number of bytes read=5368838542
		FILE: Number of bytes written=416986
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		S3A: Number of bytes read=0
		S3A: Number of bytes written=5368709120
		S3A: Number of read operations=14
		S3A: Number of large read operations=0
		S3A: Number of write operations=55
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=156
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=27
		Total committed heap usage (bytes)=247463936
	File Input Format Counters 
		Bytes Read=231
	File Output Format Counters 
		Bytes Written=8
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=5368709120
		BYTESEXPECTED=5368709120
		COPY=1




$ s3cmd ls s3://test-bucket
[....]
2017-07-13 01:59      6198   s3://test-bucket/rgw.bucket.list
2017-07-13 02:14 4294967296   s3://test-bucket/testhadoop.4g
2017-07-13 02:22 5368709120   s3://test-bucket/testhadoop.5g
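The byte sizes in the listing line up exactly with GiB multiples, and the ~5G failure boundary in this report matches S3's 5 GB limit for a single copy/upload request, above which the SDK switches to the multipart copy path. A quick arithmetic check:

```python
# The s3cmd sizes above are exact GiB multiples; the ~5G boundary
# matches the S3 single-request copy limit of 5 GiB.
GIB = 1024 ** 3

assert 4294967296 == 4 * GIB   # testhadoop.4g - single copy, succeeds
assert 5368709120 == 5 * GIB   # testhadoop.5g - at the limit, succeeds
assert 7 * GIB > 5 * GIB       # testhadoop.7G - forces multipart copy, fails
```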

Comment 2 Vikhyat Umrao 2017-07-13 19:19:14 UTC
> Default RGW logs for 7G test with version 2.3 -  10.2.7-27.el7cp.

2017-07-13 18:29:07.573165 7fcdcf7e6700  1 ====== req done req=0x7fcdcf7e0710 op status=0 http_status=200 ======
2017-07-13 18:29:07.573217 7fcdcf7e6700  1 civetweb: 0x7fce54000bb0: 10.74.128.32 - - [13/Jul/2017:18:29:07 +0530] "POST /test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0 HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.291593 7fcdccfe1700  1 ====== starting new request req=0x7fcdccfdb710 =====
2017-07-13 18:29:08.292490 7fcdccfe1700  1 ====== req done req=0x7fcdccfdb710 op status=0 http_status=404 ======
2017-07-13 18:29:08.292533 7fcdccfe1700  1 civetweb: 0x7fce74003a80: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "HEAD /test-bucket/testhadoop.7G/ HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.300035 7fcdcd7e2700  1 ====== starting new request req=0x7fcdcd7dc710 =====
2017-07-13 18:29:08.301197 7fcdcd7e2700  1 ====== req done req=0x7fcdcd7dc710 op status=0 http_status=200 ======
2017-07-13 18:29:08.301232 7fcdcd7e2700  1 civetweb: 0x7fce6c003b80: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "GET /test-bucket/ HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.308735 7fcdcb7de700  1 ====== starting new request req=0x7fcdcb7d8710 =====
2017-07-13 18:29:08.309503 7fcdcb7de700  1 ====== req done req=0x7fcdcb7d8710 op status=0 http_status=404 ======
2017-07-13 18:29:08.309539 7fcdcb7de700  1 civetweb: 0x7fce880074c0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "HEAD /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.315145 7fcdc97da700  1 ====== starting new request req=0x7fcdc97d4710 =====
2017-07-13 18:29:08.316262 7fcdc97da700  1 ====== req done req=0x7fcdc97d4710 op status=0 http_status=200 ======
2017-07-13 18:29:08.316298 7fcdc97da700  1 civetweb: 0x7fce980074c0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "GET /test-bucket/ HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.323168 7fcdcbfdf700  1 ====== starting new request req=0x7fcdcbfd9710 =====
2017-07-13 18:29:08.324512 7fcdcbfdf700  1 ====== req done req=0x7fcdcbfd9710 op status=0 http_status=200 ======
2017-07-13 18:29:08.324551 7fcdcbfdf700  1 civetweb: 0x7fce7c003300: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "HEAD /test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0 HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.355642 7fcdcafdd700  1 ====== starting new request req=0x7fcdcafd7710 =====
2017-07-13 18:29:08.355672 7fcdc9fdb700  1 ====== starting new request req=0x7fcdc9fd5710 =====
2017-07-13 18:29:08.355687 7fcdc4fd1700  1 ====== starting new request req=0x7fcdc4fcb710 =====
2017-07-13 18:29:08.355700 7fcdc47d0700  1 ====== starting new request req=0x7fcdc47ca710 =====
2017-07-13 18:29:08.355709 7fcdc57d2700  1 ====== starting new request req=0x7fcdc57cc710 =====
2017-07-13 18:29:08.355897 7fcdc8fd9700  1 ====== starting new request req=0x7fcdc8fd3710 =====
2017-07-13 18:29:08.355949 7fcdc87d8700  1 ====== starting new request req=0x7fcdc87d2710 =====
2017-07-13 18:29:08.356689 7fcdcafdd700  1 ====== req done req=0x7fcdcafd7710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.356727 7fcdcafdd700  1 civetweb: 0x7fce84007530: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.356787 7fcdc57d2700  1 ====== req done req=0x7fcdc57cc710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.356826 7fcdc57d2700  1 civetweb: 0x7fcde0000ea0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.357093 7fcdc47d0700  1 ====== req done req=0x7fcdc47ca710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.357169 7fcdc47d0700  1 civetweb: 0x7fcde80050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.357180 7fcdc9fdb700  1 ====== req done req=0x7fcdc9fd5710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.357211 7fcdc9fdb700  1 civetweb: 0x7fce8c007c10: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.357239 7fcdc4fd1700  1 ====== req done req=0x7fcdc4fcb710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.357277 7fcdc4fd1700  1 civetweb: 0x7fcddc000ea0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.357404 7fcdc87d8700  1 ====== req done req=0x7fcdc87d2710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.357440 7fcdc87d8700  1 civetweb: 0x7fcea0005910: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.357531 7fcdc8fd9700  1 ====== req done req=0x7fcdc8fd3710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.357563 7fcdc8fd9700  1 civetweb: 0x7fce940009b0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.362042 7fcdc77d6700  1 ====== starting new request req=0x7fcdc77d0710 =====
2017-07-13 18:29:08.362877 7fcdc77d6700  1 ====== req done req=0x7fcdc77d0710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.362913 7fcdc77d6700  1 civetweb: 0x7fcea40064a0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.365419 7fcdc7fd7700  1 ====== starting new request req=0x7fcdc7fd1710 =====
2017-07-13 18:29:08.366235 7fcdc7fd7700  1 ====== req done req=0x7fcdc7fd1710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.366270 7fcdc7fd7700  1 civetweb: 0x7fce9c01eb70: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.367322 7fcdbffc7700  1 ====== starting new request req=0x7fcdbffc1710 =====
2017-07-13 18:29:08.368166 7fcdbffc7700  1 ====== req done req=0x7fcdbffc1710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.368202 7fcdbffc7700  1 civetweb: 0x7fce04003f70: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.368463 7fcdc6fd5700  1 ====== starting new request req=0x7fcdc6fcf710 =====
2017-07-13 18:29:08.369240 7fcdc6fd5700  1 ====== req done req=0x7fcdc6fcf710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.369276 7fcdc6fd5700  1 civetweb: 0x7fceac0009b0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.371327 7fcdc67d4700  1 ====== starting new request req=0x7fcdc67ce710 =====
2017-07-13 18:29:08.371462 7fcdc3fcf700  1 ====== starting new request req=0x7fcdc3fc9710 =====
2017-07-13 18:29:08.372086 7fcdc67d4700  1 ====== req done req=0x7fcdc67ce710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.372131 7fcdc67d4700  1 civetweb: 0x7fceb4000bd0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.372307 7fcdc3fcf700  1 ====== req done req=0x7fcdc3fc9710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.372345 7fcdc3fcf700  1 civetweb: 0x7fcde40050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.372444 7fcdc5fd3700  1 ====== starting new request req=0x7fcdc5fcd710 =====
2017-07-13 18:29:08.373164 7fcdc5fd3700  1 ====== req done req=0x7fcdc5fcd710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.373200 7fcdc5fd3700  1 civetweb: 0x7fced3ee4860: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.378685 7fcdc17ca700  1 ====== starting new request req=0x7fcdc17c4710 =====
2017-07-13 18:29:08.379524 7fcdc17ca700  1 ====== req done req=0x7fcdc17c4710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.379568 7fcdc17ca700  1 civetweb: 0x7fce000050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.380330 7fcdc07c8700  1 ====== starting new request req=0x7fcdc07c2710 =====
2017-07-13 18:29:08.381155 7fcdc07c8700  1 ====== req done req=0x7fcdc07c2710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.381193 7fcdc07c8700  1 civetweb: 0x7fce08000bb0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.381462 7fcdc2fcd700  1 ====== starting new request req=0x7fcdc2fc7710 =====
2017-07-13 18:29:08.381911 7fcdc1fcb700  1 ====== starting new request req=0x7fcdc1fc5710 =====
2017-07-13 18:29:08.382264 7fcdc2fcd700  1 ====== req done req=0x7fcdc2fc7710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.382307 7fcdc2fcd700  1 civetweb: 0x7fcdec0050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.382575 7fcdc1fcb700  1 ====== req done req=0x7fcdc1fc5710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.382611 7fcdc1fcb700  1 civetweb: 0x7fcdf40050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.389254 7fcdbf7c6700  1 ====== starting new request req=0x7fcdbf7c0710 =====
2017-07-13 18:29:08.390090 7fcdbf7c6700  1 ====== req done req=0x7fcdbf7c0710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.390140 7fcdbf7c6700  1 civetweb: 0x7fce100050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.391610 7fcdc0fc9700  1 ====== starting new request req=0x7fcdc0fc3710 =====
2017-07-13 18:29:08.392560 7fcdc0fc9700  1 ====== req done req=0x7fcdc0fc3710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.392603 7fcdc0fc9700  1 civetweb: 0x7fcdfc004170: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.393855 7fcdc27cc700  1 ====== starting new request req=0x7fcdc27c6710 =====
2017-07-13 18:29:08.394647 7fcdc27cc700  1 ====== req done req=0x7fcdc27c6710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.394687 7fcdc27cc700  1 civetweb: 0x7fcdf80050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.394921 7fcdc37ce700  1 ====== starting new request req=0x7fcdc37c8710 =====
2017-07-13 18:29:08.395643 7fcdc37ce700  1 ====== req done req=0x7fcdc37c8710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.395679 7fcdc37ce700  1 civetweb: 0x7fcdf00050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.399502 7fcdbdfc3700  1 ====== starting new request req=0x7fcdbdfbd710 =====
2017-07-13 18:29:08.400386 7fcdbdfc3700  1 ====== req done req=0x7fcdbdfbd710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.400391 7fcdbefc5700  1 ====== starting new request req=0x7fcdbefbf710 =====
2017-07-13 18:29:08.400428 7fcdbdfc3700  1 civetweb: 0x7fce140050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.401189 7fcdbefc5700  1 ====== req done req=0x7fcdbefbf710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.401227 7fcdbefc5700  1 civetweb: 0x7fce0c0050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.410795 7fcdbe7c4700  1 ====== starting new request req=0x7fcdbe7be710 =====
2017-07-13 18:29:08.410805 7fcdbd7c2700  1 ====== starting new request req=0x7fcdbd7bc710 =====
2017-07-13 18:29:08.410819 7fcdbcfc1700  1 ====== starting new request req=0x7fcdbcfbb710 =====
2017-07-13 18:29:08.411692 7fcdbcfc1700  1 ====== req done req=0x7fcdbcfbb710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.411728 7fcdbcfc1700  1 civetweb: 0x7fce24004e20: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.411854 7fcdbe7c4700  1 ====== req done req=0x7fcdbe7be710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.411905 7fcdbe7c4700  1 civetweb: 0x7fce180050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.411970 7fcdbd7c2700  1 ====== req done req=0x7fcdbd7bc710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.412000 7fcdbd7c2700  1 civetweb: 0x7fce1c000cf0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.414004 7fcdbc7c0700  1 ====== starting new request req=0x7fcdbc7ba710 =====
2017-07-13 18:29:08.414009 7fcdbbfbf700  1 ====== starting new request req=0x7fcdbbfb9710 =====
2017-07-13 18:29:08.414757 7fcdbbfbf700  1 ====== req done req=0x7fcdbbfb9710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.414793 7fcdbbfbf700  1 civetweb: 0x7fce2c0050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.414943 7fcdbc7c0700  1 ====== req done req=0x7fcdbc7ba710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.414981 7fcdbc7c0700  1 civetweb: 0x7fce200050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.423902 7fcdbafbd700  1 ====== starting new request req=0x7fcdbafb7710 =====
2017-07-13 18:29:08.423911 7fcdb7fb7700  1 ====== starting new request req=0x7fcdb7fb1710 =====
2017-07-13 18:29:08.423927 7fcdbb7be700  1 ====== starting new request req=0x7fcdbb7b8710 =====
2017-07-13 18:29:08.423940 7fcdb9fbb700  1 ====== starting new request req=0x7fcdb9fb5710 =====
2017-07-13 18:29:08.423943 7fcdba7bc700  1 ====== starting new request req=0x7fcdba7b6710 =====
2017-07-13 18:29:08.424875 7fcdb7fb7700  1 ====== req done req=0x7fcdb7fb1710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.424914 7fcdb7fb7700  1 civetweb: 0x7fce440050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.425033 7fcdbafbd700  1 ====== req done req=0x7fcdbafb7710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.425071 7fcdbafbd700  1 civetweb: 0x7fce340050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.425200 7fcdbb7be700  1 ====== req done req=0x7fcdbb7b8710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.425235 7fcdbb7be700  1 civetweb: 0x7fce28002090: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.425415 7fcdb9fbb700  1 ====== req done req=0x7fcdb9fb5710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.425474 7fcdb9fbb700  1 civetweb: 0x7fce380050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.425485 7fcdba7bc700  1 ====== req done req=0x7fcdba7b6710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.425529 7fcdba7bc700  1 civetweb: 0x7fce30001980: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.432289 7fcdb87b8700  1 ====== starting new request req=0x7fcdb87b2710 =====
2017-07-13 18:29:08.432651 7fcdb8fb9700  1 ====== starting new request req=0x7fcdb8fb3710 =====
2017-07-13 18:29:08.433166 7fcdb87b8700  1 ====== req done req=0x7fcdb87b2710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.433205 7fcdb87b8700  1 civetweb: 0x7fce480050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:08.433426 7fcdb8fb9700  1 ====== req done req=0x7fcdb8fb3710 op status=-2 http_status=404 ======
2017-07-13 18:29:08.433466 7fcdb8fb9700  1 civetweb: 0x7fce3c0050f0: 10.74.128.32 - - [13/Jul/2017:18:29:08 +0530] "PUT /test-bucket/testhadoop.7G HTTP/1.1" 404 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:13.352297 7fcdb97ba700  1 ====== starting new request req=0x7fcdb97b4710 =====
2017-07-13 18:29:13.353607 7fcdb97ba700  1 ====== req done req=0x7fcdb97b4710 op status=0 http_status=200 ======
2017-07-13 18:29:13.353650 7fcdb97ba700  1 civetweb: 0x7fce40004e20: 10.74.128.32 - - [13/Jul/2017:18:29:13 +0530] "HEAD /test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0 HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
2017-07-13 18:29:13.360350 7fcdb77b6700  1 ====== starting new request req=0x7fcdb77b0710 =====
2017-07-13 18:29:13.370771 7fcdb77b6700  1 ====== req done req=0x7fcdb77b0710 op status=0 http_status=204 ======
2017-07-13 18:29:13.370823 7fcdb77b6700  1 civetweb: 0x7fce500050f0: 10.74.128.32 - - [13/Jul/2017:18:29:13 +0530] "DELETE /test-bucket/.distcp.tmp.attempt_local375793309_0001_m_000000_0 HTTP/1.1" 204 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131
^C

Comment 3 Vikhyat Umrao 2017-07-13 19:22:15 UTC
Then I upgraded my RGW daemons, which sit behind HAPROXY in round-robin, to the bz#1470301 test packages and enabled debug_rgw=20 and debug_ms=1. I am getting this error:

2017-07-14 00:50:46.452588 7f5fb7faf700  0 ERROR: client_io->complete_request() returned -5

Then I upgraded the complete cluster with the test packages, and the error is still the same.

I am attaching debug logs from the RGW-only upgrade.

Comment 4 Vikhyat Umrao 2017-07-13 19:28:26 UTC
> This goes into a loop; I had to stop it with Ctrl+C.

# hadoop distcp /root/testhadoop.8G s3a://test-bucket/
17/07/13 23:56:45 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/root/testhadoop.8G], targetPath=s3a://test-bucket/, targetPathExists=true, preserveRawXattrs=false}
17/07/13 23:56:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/07/13 23:56:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/07/13 23:56:45 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/13 23:56:45 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/13 23:56:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/07/13 23:56:45 INFO mapreduce.JobSubmitter: number of splits:1
17/07/13 23:56:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1571695601_0001
17/07/13 23:56:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/13 23:56:46 INFO tools.DistCp: DistCp job-id: job_local1571695601_0001
17/07/13 23:56:46 INFO mapreduce.Job: Running job: job_local1571695601_0001
17/07/13 23:56:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/13 23:56:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 23:56:46 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
17/07/13 23:56:46 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/13 23:56:46 INFO mapred.LocalJobRunner: Starting task: attempt_local1571695601_0001_m_000000_0
17/07/13 23:56:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 23:56:46 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/07/13 23:56:46 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-root/mapred/staging/root1227293053/.staging/_distcp2027239354/fileList.seq:0+199
17/07/13 23:56:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/13 23:56:46 INFO mapred.CopyMapper: Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G
17/07/13 23:56:46 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local1571695601_0001_m_000000_0
17/07/13 23:56:47 INFO mapreduce.Job: Job job_local1571695601_0001 running in uber mode : false
17/07/13 23:56:47 INFO mapreduce.Job:  map 0% reduce 0%
17/07/13 23:56:52 INFO mapred.LocalJobRunner: 4.8% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [396.7M/8.0G] > map
17/07/13 23:56:53 INFO mapreduce.Job:  map 100% reduce 0%
17/07/13 23:56:55 INFO mapred.LocalJobRunner: 6.4% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [527.9M/8.0G] > map
17/07/13 23:56:58 INFO mapred.LocalJobRunner: 8.2% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [675.4M/8.0G] > map
17/07/13 23:57:01 INFO mapred.LocalJobRunner: 9.1% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [748.2M/8.0G] > map
17/07/13 23:57:04 INFO mapred.LocalJobRunner: 10.4% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [852.9M/8.0G] > map
17/07/13 23:57:07 INFO mapred.LocalJobRunner: 13.9% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [1.1G/8.0G] > map
17/07/13 23:57:10 INFO mapred.LocalJobRunner: 15.4% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [1.2G/8.0G] > map

[....]

17/07/13 23:59:19 INFO mapred.LocalJobRunner: 94.1% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [7.5G/8.0G] > map
17/07/13 23:59:22 INFO mapred.LocalJobRunner: 94.9% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [7.6G/8.0G] > map
17/07/13 23:59:25 INFO mapred.LocalJobRunner: 98.4% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [7.9G/8.0G] > map
17/07/13 23:59:28 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/13 23:59:31 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/13 23:59:34 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/13 23:59:37 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map

17/07/14 00:02:37 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:02:40 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:02:43 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:02:46 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:02:49 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:02:52 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map

17/07/14 00:04:25 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
17/07/14 00:04:28 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/testhadoop.8G to s3a://test-bucket/testhadoop.8G [8.0G/8.0G] > map
^C[root@dell-per630-10 ~]#

Comment 5 Vikhyat Umrao 2017-07-13 19:50:29 UTC
Created attachment 1297830 [details]
RGW node1 - dell-per630-12.log with debug_rgw=20 and debug_ms=1 , ceph version -10.2.7-28.0.TEST.bz1470301.el7cp

Comment 6 Vikhyat Umrao 2017-07-13 19:51:18 UTC
Created attachment 1297831 [details]
RGW node2 - dell-per630-13.log with debug_rgw=20 and debug_ms=1 , ceph version -10.2.7-28.0.TEST.bz1470301.el7cp

Comment 7 Vikhyat Umrao 2017-07-13 19:55:38 UTC
I looked into the node1 (dell-per630-12) logs for one of the requests:

Request ID: 0x7efd76fb7710 
Thread ID: 7efd76fbd700

2017-07-14 00:06:05.134757 7efd76fbd700  1 ====== starting new request req=0x7efd76fb7710 =====

2017-07-14 00:06:05.134771 7efd76fbd700  2 req 75:0.000013::PUT /test-bucket/testhadoop.8G::initializing for trans_id = tx00000000000000000004b-005967bd95-79ab2-default
2017-07-14 00:06:05.134778 7efd76fbd700 10 rgw api priority: s3=5 s3website=4

[....]

2017-07-14 00:06:05.134965 7efd76fbd700  2 req 75:0.000208:s3:PUT /test-bucket/testhadoop.8G:put_obj:verifying op params
2017-07-14 00:06:05.134968 7efd76fbd700  2 req 75:0.000210:s3:PUT /test-bucket/testhadoop.8G:put_obj:pre-executing
2017-07-14 00:06:05.134970 7efd76fbd700  2 req 75:0.000212:s3:PUT /test-bucket/testhadoop.8G:put_obj:executing

2017-07-14 00:06:05.135023 7efd76fbd700 20 get_obj_state: rctx=0x7efd76fb6e50 obj=test-bucket:.distcp.tmp.attempt_local1571695601_0001_m_000000_0 state=0x7efe14011a78 s->prefetch_data=0

2017-07-14 00:06:05.135062 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.33:6800/414106 -- osd_op(client.498354.0:17031 12.646fbbc5 53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1_.distcp.tmp.attempt_local1571695601_0001_m_000000_0 [getxattrs,stat] snapc 0=[] ack+read+known_if_redirected e113) v7 -- ?+0 0x7efe14012aa0 con 0x7efe5c015cc0

2017-07-14 00:06:05.136058 7efd76fbd700 10 manifest: total_size = 8589934592
2017-07-14 00:06:05.136066 7efd76fbd700 20 get_obj_state: setting s->obj_tag to 53c7fd09-5c03-4602-af99-466b06a3fc32.498340.48^@

2017-07-14 00:06:05.136081 7efd76fbd700 20 get_obj_state: rctx=0x7efd76fb6e50 obj=test-bucket:.distcp.tmp.attempt_local1571695601_0001_m_000000_0 state=0x7efe14011a78 s->prefetch_data=0

2017-07-14 00:06:05.136096 7efd76fbd700 20 rados->get_obj_iterate_cb oid=53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_.distcp.tmp.attempt_local1571695601_0001_m_000000_0.2~u1u8vomDJTUgI1IddJplEeGS4NsW7-h.9 obj-ofs=838860800 read_ofs=0 len=524288

2017-07-14 00:06:05.136119 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.34:6804/165197 -- osd_op(client.498354.0:17033 12.8cadf48c 53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_.distcp.tmp.attempt_local1571695601_0001_m_000000_0.2~u1u8vomDJTUgI1IddJplEeGS4NsW7-h.9 [read 0~524288] snapc 0=[] ack+read+known_if_redirected e113) v7 -- ?+0 0x7efe14014640 con 0x7efe5c01c7c0

2017-07-14 00:06:05.136147 7efd76fbd700 20 RGWObjManifest::operator++(): rule->part_size=104857600 rules.size()=2
2017-07-14 00:06:05.136149 7efd76fbd700 20 RGWObjManifest::operator++(): stripe_ofs=843055104 part_ofs=838860800 rule->part_size=104857600
2017-07-14 00:06:05.136152 7efd76fbd700 20 RGWObjManifest::operator++(): result: ofs=843055104 stripe_ofs=843055104 part_ofs=838860800 rule->part_size=104857600
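As a sanity check, the manifest offsets in these lines are internally consistent; a standalone arithmetic sketch (numbers copied from the log above, nothing here is RGW code):

```python
# Manifest numbers from the log: parts are 100 MiB, stripes within a part 4 MiB.
part_size = 104857600          # rule->part_size from the log
part_ofs = 838860800           # part_ofs from the log
stripe_ofs = 843055104         # next stripe_ofs reported by operator++()

assert part_ofs == 8 * part_size            # this is part index 8 (the 9th part)
assert stripe_ofs - part_ofs == 4 * 2**20   # 4 MiB stripe within the part

total_size = 8589934592        # "manifest: total_size" for the 8G object
parts = -(-total_size // part_size)         # ceiling division
print(parts)                                # -> 82 parts for the full upload
```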

> We have -EEXIST, need to restart write

2017-07-14 00:06:07.506048 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.34:6804/165197 -- osd_op(client.498354.0:17149 12.ca166856 53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_testhadoop.8G.2~mwaZ7gOoGqmu1SbeI3Nz0DQl7X_5qPL.9 [create 0~0 [excl],writefull 0~524288] snapc 0=[] ack+ondisk+write+known_if_redirected e113) v7 -- ?+0 0x7efe14014310 con 0x7efe5c01c7c0
2017-07-14 00:06:07.858384 7efd76fbd700  5 NOTICE: processor->throttle_data() returned -EEXIST, need to restart write
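The `[create 0~0 [excl]]` op and the `-EEXIST, need to restart write` notice show RGW's atomic-write pattern: an exclusive create detects a racing writer, and the caller restarts the write under a fresh name. A minimal in-memory sketch of that pattern (a simplified illustration, not RGW's actual code; `Store` and the naming scheme are invented for the example):

```python
import errno

class Store:
    """Toy object store with an exclusive-create primitive."""
    def __init__(self):
        self.objects = {}

    def create_excl(self, name, data):
        # Mirrors the [create 0~0 [excl]] op: fail if the object already exists.
        if name in self.objects:
            raise FileExistsError(errno.EEXIST, "object exists", name)
        self.objects[name] = data

def put_with_retry(store, base_name, data, max_attempts=3):
    for attempt in range(max_attempts):
        name = f"{base_name}.{attempt}"  # fresh tag per attempt
        try:
            store.create_excl(name, data)
            return name
        except FileExistsError:
            continue  # the log's "-EEXIST, need to restart write" case
    raise RuntimeError("write did not converge")

store = Store()
store.objects["part.0"] = b"racer"                # simulate a racing writer
print(put_with_retry(store, "part", b"payload"))  # -> part.1
```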
[....]


2017-07-14 00:07:07.653848 7efd76fbd700 20  bucket index object: .dir.53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1
2017-07-14 00:07:07.653907 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.33:6804/704143 -- osd_op(client.498354.0:20514 10.de6dd1f .dir.53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1 [call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e113) v7 -- ?+0 0x7efe14061620 con 0x7efdbc00c3a0
2017-07-14 00:07:07.655997 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.33:6800/414106 -- osd_op(client.498354.0:20517 12.823e0ca8 53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_testhadoop.8G.zSChIKp3PSgQwf2rKGsnZiQa7tzOpxg.9 [setxattr user.rgw.acl (137),setxattr user.rgw.content_type (49),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-copy-source (65),setxattr user.rgw.x-amz-copy-source-range (26),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e113) v7 -- ?+0 0x7efe14052c90 con 0x7efe5c015cc0
2017-07-14 00:07:08.363957 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.33:6804/704143 -- osd_op(client.498354.0:20530 10.de6dd1f .dir.53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1 [call rgw.bucket_complete_op] snapc 0=[] ack+ondisk+write+known_if_redirected e113) v7 -- ?+0 0x7efe1407f770 con 0x7efdbc00c3a0
2017-07-14 00:07:08.364031 7efd76fbd700 15 omap_set bucket=test-bucket(@{i=default.rgw.buckets.index,e=default.rgw.buckets.non-ec}default.rgw.buckets.data[53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1]) oid=53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_testhadoop.8G.2~mwaZ7gOoGqmu1SbeI3Nz0DQl7X_5qPL.meta key=part.00000009
2017-07-14 00:07:08.364058 7efd76fbd700  1 -- 10.74.128.34:0/2387008942 --> 10.74.128.35:6804/27200 -- osd_op(client.498354.0:20531 11.540307cb 53c7fd09-5c03-4602-af99-466b06a3fc32.214420.1__multipart_testhadoop.8G.2~mwaZ7gOoGqmu1SbeI3Nz0DQl7X_5qPL.meta [omap-set-vals 0~695] snapc 0=[] ondisk+write+known_if_redirected e113) v7 -- ?+0 0x7efe14061620 con 0x7efe5c00e320
2017-07-14 00:07:09.704648 7efd76fbd700  2 req 75:64.569890:s3:PUT /test-bucket/testhadoop.8G:put_obj:completing

> We have ERROR: client_io->complete_request() returned -5


2017-07-14 00:07:09.704715 7efd76fbd700  0 ERROR: client_io->complete_request() returned -5



2017-07-14 00:07:09.704720 7efd76fbd700  2 req 75:64.569963:s3:PUT /test-bucket/testhadoop.8G:put_obj:op status=0
2017-07-14 00:07:09.704723 7efd76fbd700  2 req 75:64.569965:s3:PUT /test-bucket/testhadoop.8G:put_obj:http status=200
2017-07-14 00:07:09.704733 7efd76fbd700  1 ====== req done req=0x7efd76fb7710 op status=0 http_status=200 ======

2017-07-14 00:07:09.704797 7efd76fbd700  1 civetweb: 0x7efe140050f0: 10.74.128.32 - - [14/Jul/2017:00:06:05 +0530] "PUT /test-bucket/testhadoop.8G HTTP/1.1" 200 0 - aws-sdk-java/1.7.4 Linux/3.10.0-510.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.131-b12/1.8.0_131

Comment 8 Vikhyat Umrao 2017-07-13 20:14:59 UTC
I removed HAPROXY and pointed Hadoop directly to node1 (dell-per630-12), with the same results.

Comment 11 Vikhyat Umrao 2017-07-13 20:37:25 UTC
> Tested a 4G file with the test packages; it uploaded successfully.


# hadoop distcp /root/Hadoop.4G.TEST s3a://test-bucket/
17/07/14 02:00:31 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/root/Hadoop.4G.TEST], targetPath=s3a://test-bucket/, targetPathExists=true, preserveRawXattrs=false}
17/07/14 02:00:31 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/07/14 02:00:31 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/07/14 02:00:31 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/07/14 02:00:31 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/07/14 02:00:31 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/07/14 02:00:31 INFO mapreduce.JobSubmitter: number of splits:1
17/07/14 02:00:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local977192475_0001
17/07/14 02:00:32 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/07/14 02:00:32 INFO tools.DistCp: DistCp job-id: job_local977192475_0001
17/07/14 02:00:32 INFO mapreduce.Job: Running job: job_local977192475_0001
17/07/14 02:00:32 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/07/14 02:00:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/14 02:00:32 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
17/07/14 02:00:32 INFO mapred.LocalJobRunner: Waiting for map tasks
17/07/14 02:00:32 INFO mapred.LocalJobRunner: Starting task: attempt_local977192475_0001_m_000000_0
17/07/14 02:00:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/14 02:00:33 INFO mapreduce.Job: Job job_local977192475_0001 running in uber mode : false
17/07/14 02:00:33 INFO mapreduce.Job:  map 0% reduce 0%
17/07/14 02:00:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/07/14 02:00:33 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-root/mapred/staging/root1503752063/.staging/_distcp488381173/fileList.seq:0+201
17/07/14 02:00:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/07/14 02:00:33 INFO mapred.CopyMapper: Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST
17/07/14 02:00:33 INFO mapred.RetriableFileCopyCommand: Creating temp file: s3a://test-bucket/.distcp.tmp.attempt_local977192475_0001_m_000000_0
17/07/14 02:00:38 INFO mapred.LocalJobRunner: 7.6% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [311.7M/4.0G] > map
17/07/14 02:00:39 INFO mapreduce.Job:  map 100% reduce 0%
17/07/14 02:00:41 INFO mapred.LocalJobRunner: 11.2% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [457.7M/4.0G] > map
17/07/14 02:00:44 INFO mapred.LocalJobRunner: 14.4% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [587.9M/4.0G] > map

[....]

17/07/14 02:04:35 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [4.0G/4.0G] > map
17/07/14 02:04:35 INFO mapred.Task: Task:attempt_local977192475_0001_m_000000_0 is done. And is in the process of committing
17/07/14 02:04:35 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [4.0G/4.0G] > map
17/07/14 02:04:35 INFO mapred.Task: Task attempt_local977192475_0001_m_000000_0 is allowed to commit now
17/07/14 02:04:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local977192475_0001_m_000000_0' to file:/tmp/hadoop-root/mapred/staging/root1503752063/.staging/_distcp488381173/_logs/_temporary/0/task_local977192475_0001_m_000000
17/07/14 02:04:35 INFO mapred.LocalJobRunner: 100.0% Copying file:/root/Hadoop.4G.TEST to s3a://test-bucket/Hadoop.4G.TEST [4.0G/4.0G]
17/07/14 02:04:35 INFO mapred.Task: Task 'attempt_local977192475_0001_m_000000_0' done.
17/07/14 02:04:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local977192475_0001_m_000000_0
17/07/14 02:04:35 INFO mapred.LocalJobRunner: map task executor complete.
17/07/14 02:04:36 WARN mapred.CopyCommitter: Unable to cleanup temp files
java.lang.NullPointerException
	at org.apache.hadoop.fs.Path.<init>(Path.java:104)
	at org.apache.hadoop.fs.Path.<init>(Path.java:93)
	at org.apache.hadoop.tools.mapred.CopyCommitter.deleteAttemptTempFiles(CopyCommitter.java:136)
	at org.apache.hadoop.tools.mapred.CopyCommitter.cleanupTempFiles(CopyCommitter.java:126)
	at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:83)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:537)
17/07/14 02:04:36 INFO mapred.CopyCommitter: Cleaning up temporary work folder: file:/tmp/hadoop-root/mapred/staging/root1503752063/.staging/_distcp488381173
17/07/14 02:04:36 INFO mapreduce.Job: Job job_local977192475_0001 completed successfully
17/07/14 02:04:36 INFO mapreduce.Job: Counters: 23
	File System Counters
		FILE: Number of bytes read=4295096729
		FILE: Number of bytes written=416959
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		S3A: Number of bytes read=0
		S3A: Number of bytes written=4294967296
		S3A: Number of read operations=14
		S3A: Number of large read operations=0
		S3A: Number of write operations=44
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=155
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=19
		Total committed heap usage (bytes)=249561088
	File Input Format Counters 
		Bytes Read=233
	File Output Format Counters 
		Bytes Written=8
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=4294967296
		BYTESEXPECTED=4294967296
		COPY=1


2017-07-13 20:34 4294967296   s3://test-bucket/Hadoop.4G.TEST
2017-07-13 01:59      6198   s3://test-bucket/rgw.bucket.list
2017-07-13 02:44 1073741824   s3://test-bucket/testhadoop.1g
2017-07-13 02:41 2147483648   s3://test-bucket/testhadoop.2g
2017-07-13 02:14 4294967296   s3://test-bucket/testhadoop.4g
2017-07-13 02:51 536870912   s3://test-bucket/testhadoop.512M
2017-07-13 02:22 5368709120   s3://test-bucket/testhadoop.5g

Comment 12 Vikhyat Umrao 2017-07-13 20:39:46 UTC
> I verified with s3cmd ls: I tried uploading that 8G file 5 times, and the upload never completed at the Hadoop layer, so I killed it with CTRL+C each time.
> But when I checked s3cmd ls, we have 5 uploads with temp filenames. Something weird is going on.

$ s3cmd ls s3://test-bucket
2017-07-13 16:06 8589934592   s3://test-bucket/.distcp.tmp.attempt_local1437356062_0001_m_000000_0
2017-07-13 18:34 8589934592   s3://test-bucket/.distcp.tmp.attempt_local1571695601_0001_m_000000_0
2017-07-13 20:09 8589934592   s3://test-bucket/.distcp.tmp.attempt_local374556484_0001_m_000000_0
2017-07-13 20:26 8589934592   s3://test-bucket/.distcp.tmp.attempt_local789721629_0001_m_000000_0
2017-07-13 19:16 8589934592   s3://test-bucket/.distcp.tmp.attempt_local804247269_0001_m_000000_0
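Each aborted distcp run leaves behind a full-size `.distcp.tmp.attempt_*` object, so repeated retries accumulate orphaned data in the bucket. A minimal cleanup sketch, assuming s3cmd is already configured for this RGW endpoint as in the listing above (the `echo` makes it a dry run; the awk/grep pipeline is an illustration, not part of the original report):

```shell
# List the bucket, keep only the object URL (4th field), select the
# orphaned distcp temp objects, and print the delete commands (dry run).
s3cmd ls s3://test-bucket \
  | awk '{print $4}' \
  | grep '/\.distcp\.tmp\.attempt_' \
  | while read -r obj; do
      echo s3cmd del "$obj"   # drop "echo" to actually delete
    done
```

Note that these temp objects are completed uploads, not in-progress multiparts, so `s3cmd del` is sufficient; stale multipart uploads would additionally need `s3cmd abortmp`.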

Comment 82 errata-xmlrpc 2017-10-17 18:12:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2903

