Bug 973784

Summary: copy performance with recursive option needs improvement
Product: [Retired] Pulp
Reporter: Michael Hrivnak <mhrivnak>
Component: rpm-support
Assignee: Michael Hrivnak <mhrivnak>
Status: CLOSED CURRENTRELEASE
QA Contact: Preethi Thomas <pthomas>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.2 Beta
CC: paji, rbarlow
Target Milestone: ---
Keywords: Triaged
Target Release: 2.4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-08-09 06:55:22 UTC
Type: Bug
Attachments:
  Description            Flags
  Output of top/ps axf   none

Description Michael Hrivnak 2013-06-12 18:03:15 UTC
Copy performance with the recursive option is slower than desirable. For example, on my machine, copying the RPMs for a rhel6 repo with about 10350 packages took 38 minutes. There are opportunities for optimization that should be explored, such as loading the list of source-side content only once.
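
To illustrate the "load the source-side content only once" idea, here is a minimal Python sketch. It is not Pulp's actual code: the dictionary layout for units ("name", "provides", "requires") and the function names build_provides_index and resolve_recursive are hypothetical, chosen only for this example. The point is that the source unit list is read a single time into an in-memory index, so each dependency lookup during the recursive walk is a dictionary access rather than a fresh query against the source repo.

```python
from collections import defaultdict, deque


def build_provides_index(source_units):
    """Index source units by each capability they provide (single pass)."""
    index = defaultdict(list)
    for unit in source_units:
        for capability in unit["provides"]:
            index[capability].append(unit["name"])
    return index


def resolve_recursive(selected_names, source_units):
    """Return the selected units plus everything they transitively require.

    Assumes every selected name exists in source_units (hypothetical layout).
    """
    units_by_name = {unit["name"]: unit for unit in source_units}
    index = build_provides_index(source_units)  # built once, reused below

    to_copy = set(selected_names)
    queue = deque(selected_names)
    while queue:
        unit = units_by_name[queue.popleft()]
        for requirement in unit["requires"]:
            # Dictionary lookup instead of re-reading the source repo.
            for provider in index.get(requirement, ()):
                if provider not in to_copy:
                    to_copy.add(provider)
                    queue.append(provider)
    return to_copy


# Made-up sample data, for illustration only.
source_units = [
    {"name": "kernel", "provides": ["kernel"],
     "requires": ["glibc", "kernel-firmware"]},
    {"name": "kernel-firmware", "provides": ["kernel-firmware"], "requires": []},
    {"name": "glibc", "provides": ["glibc", "libc.so.6"],
     "requires": ["glibc-common"]},
    {"name": "glibc-common", "provides": ["glibc-common"], "requires": []},
    {"name": "httpd", "provides": ["httpd"], "requires": ["glibc"]},
]

print(sorted(resolve_recursive(["kernel"], source_units)))
```

With this sample data, selecting only "kernel" pulls in its transitive requirements and leaves the unrelated "httpd" unit out. The work grows with the number of units and dependency edges rather than with one source-side listing per resolved unit.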

Comment 1 Partha Aji 2013-10-10 15:48:52 UTC
I hit a similar issue when using a filter to copy all the kernel* packages in a RHEL repo with the recursive option. 379 RPMs out of 11K were copied, but it took ~64 minutes on my machine. To put this in perspective, the same copy without the recursive option took ~11 seconds.
The recursive option is pretty important for Katello's whitelist filters, and I am hoping this gets improved before the next release.

Comment 2 Michael Hrivnak 2014-02-25 15:33:58 UTC
This is a related change that will help copy performance in general, and performance of many other operations: https://github.com/pulp/pulp/pull/813

Comment 3 Michael Hrivnak 2014-02-27 13:38:53 UTC
https://github.com/pulp/pulp_rpm/pull/453

Comment 4 Jeff Ortel 2014-04-03 13:36:24 UTC
build: 2.4.0-0.7.beta

Comment 5 Preethi Thomas 2014-06-03 16:29:44 UTC
Failing.

Copy with recursive took a very long time. I am attaching the output of top, logs, etc.

[root@dell-pesc430-03 ~]# time pulp-admin rpm repo copy rpm -f rhel6-5 -t rhel-copy1  --recursive 
This command may be exited via ctrl+c without affecting the request.


[-]
Running...
[/]
Running...

Task Failed

Pulp exception occurred: PulpExecutionException


real	73m5.131s
user	1m25.817s
sys	0m4.431s

Comment 6 Preethi Thomas 2014-06-03 16:31:31 UTC
Created attachment 901861 [details]
Output of top/ps axf

Comment 7 Preethi Thomas 2014-06-04 13:33:14 UTC
This could be because the system ran out of memory.

I ran another copy of the same rhel6.5 repo on a different server, and it completed with the timing below. While the copy was running, top showed CPU usage close to 100%, but memory usage was not as high as on the other server (it stayed below 10%).

[root@ibm-x3550m3-08 ~]# time pulp-admin rpm repo copy rpm -f rhel6-5 -t rhel65-copy --recursive 
This command may be exited via ctrl+c without affecting the request.


[/]
Running...

Copied:
  rpm: 12570


real	19m5.497s
user	0m17.457s
sys	0m0.917s

Comment 8 Michael Hrivnak 2014-06-13 20:54:11 UTC
https://github.com/pulp/pulp_rpm/pull/513

Comment 9 Randy Barlow 2014-06-17 19:02:35 UTC
Fixed in 2.4.0-0.21.beta.

Comment 10 Preethi Thomas 2014-06-25 18:35:43 UTC
Moving to verified
[root@ibm-x3550m3-10 ~]# rpm -qa pulp-server
pulp-server-2.4.0-0.21.beta.el6.noarch
[root@ibm-x3550m3-10 ~]# 

Recursive copy has gotten super fast!
[root@ibm-x3550m3-10 ~]# time pulp-admin rpm repo copy rpm -f rhel6-5 -t rhel65-copy --recursive 
This command may be exited via ctrl+c without affecting the request.


[-]
Running...

Copied:
  rpm: 12602


real	3m44.624s
user	0m6.992s
sys	0m0.256s

Comment 11 Randy Barlow 2014-08-09 06:55:22 UTC
This has been fixed in Pulp 2.4.0-1.