Bug 965751 - The curl settings for when to abort the connection due to slow progress seem to trigger too easily
The curl settings for when to abort the connection due to slow progress seem ...
Status: CLOSED CURRENTRELEASE
Product: Pulp
Classification: Community
Component: z_other (Show other bugs)
Master
Unspecified Unspecified
medium Severity medium
: ---
: 2.3.0
Assigned To: Michael Hrivnak
Preethi Thomas
: Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-05-21 12:19 EDT by Randy Barlow
Modified: 2013-12-09 09:29 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-09 09:29:58 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
proof of bug (1.50 KB, text/x-python)
2013-09-09 17:06 EDT, Michael Hrivnak
no flags Details

  None (edit)
Description Randy Barlow 2013-05-21 12:19:45 EDT
I made an ISO repository that was set to a max speed of 56.6 kB/s, and the download times out:

$ pulp-admin iso repo create --repo-id speedtest --feed http://pkilambi.fedorapeople.org/test_file_repo/ --max-speed 57958

$ time pulp-admin iso repo sync run --repo-id speedtest
+----------------------------------------------------------------------+
                    Synchronizing Repository [test5]
+----------------------------------------------------------------------+

This command may be exited by pressing ctrl+c without affecting the actual
operation on the server.

Downloading the Pulp Manifest...
The Pulp Manifest was downloaded successfully.

Downloading 3 ISOs...
[==============                                    ] 29%
ISOs: 0/3	Data: 3.0 MB/10.2 MB	Avg: 55.9 kB/s


Failed to retrieve 3 ISOs.

	test2.iso: {u'response_code': 0, u'error_message': u'Operation timed out after
30000 milliseconds with 0 out of 0 bytes received', u'error_code': 28}

	test.iso: {}

	test3.iso: {u'response_code': 0, u'error_message': u'Operation timed out after
30000 milliseconds with 0 out of 0 bytes received', u'error_code': 28}


real	0m55.593s
user	0m0.807s
sys	0m0.109s
Comment 1 Randy Barlow 2013-05-21 12:21:10 EDT
I should mention that the code comments we have in the curl downloader state that the connection should abort if we get less than 1 kB/s over a 5 minute period. This seems to abort after 55 seconds, and we are going much faster than 1 kB/s, so I think something is wrong in the code.
Comment 2 Randy Barlow 2013-05-21 12:34:29 EDT
I think the error message about 30 seconds is a clue. We have a setting that tells Curl to give up if it fails to connect after 30 seconds. I don't think this is the 1 kB/s limit that we're hitting, but the connect timeout setting, DEFAULT_CONNECT_TIMEOUT. When speed limiting is happening, it seems to choke out the other downloads. I wonder if it will be difficult to support speed limiting and multiple concurrent downloads at the same time.
Comment 3 Randy Barlow 2013-05-21 12:49:57 EDT
I confirmed that using a max_concurrent of 1 and a speed limit of 56 kB/s does work, so I think we'll need to do something about max_speed and max_concurrent being set at the same time. I really don't know what though, as setting a high enough max_speed (200 kB/s, for example) makes everything work out just fine.

We could consider implementing our own max_speed check instead of relying on the one from pycurl. Or we could just disallow setting both and defaulting to a single download when max_speed is set.
Comment 4 Michael Hrivnak 2013-09-09 17:06:01 EDT
Created attachment 795758 [details]
proof of bug

This reproduces the bug.

To use, create a directory "tmp" in the same dir as this script, then run it. It will take about a minute, and then you'll see two error messages.

There are two issues here:

1) As far as I can tell, the curl downloader only downloads 1 file at a time ever.

2) When a file doesn't start its download soon enough, because only 1 of the "concurrent" downloads is actually doing anything, it times out after 30 seconds.

What happens when you run the script is a download of 3 files with a speed of 56KBps. In a separate terminal, if you run "watch -n 1 ls -l tmp/", you'll see that only one file's size is growing. You can also see with netstat and with lsof that only one TCP connection exists. Thus after about a minute when the first file finishes downloading, the other two have timed out, and you see each of their error messages.

You can switch lines 33 and 34 to see how the threaded "requests"-based downloader behaves.
Comment 5 Michael Hrivnak 2013-09-09 17:42:45 EDT
Based on the above problem, I propose that we switch all code currently using the curl-based downloader to use the requests-based one. We know it works, and it is actually more performant.

In tests today, I found that I was able to achieve 75MBps (yes, I really mean a big B) at around 50% CPU use with the requests downloader fetching three big files. The same test with the curl downloader achieved about 24MBps at over 90% CPU utilization.

Then requests library is also much easier to maintain. pycurl requires all kind of non-pythonic boilerplate.
Comment 6 Michael Hrivnak 2013-09-11 21:52:45 EDT
I have not done any work toward switching to the threaded requests downloader, so please proceed without me.

An email was sent to the pulp team soliciting opinions on the proposal to switch to the threaded requests downloader instead of the curl one, and there were no objections.
Comment 7 Jeff Ortel 2013-09-18 18:07:35 EDT
The fix is to migrate pulp away from the curl downloader.  As of now, there are two places where it is still used:
 1. nodes importer
 2. iso importer

This PR to migrate nodes has been merged:

https://github.com/pulp/pulp/pull/625

The ISO importer will be migrated later in the sprint.
Comment 8 Jeff Ortel 2013-09-18 18:19:19 EDT
This, now, should only affect the ISO importer when setting the a low bandwidth limit and parallel downloads.  Moving to MEDIUM priority.
Comment 9 Jeff Ortel 2013-09-19 16:03:45 EDT
The next step is to migrate the ISO importer to use the threaded downloader.
Comment 10 Michael Hrivnak 2013-09-26 15:16:14 EDT
https://github.com/pulp/pulp_rpm/pull/341
Comment 11 Jeff Ortel 2013-09-26 16:54:11 EDT
build: 2.3.0-0.16.alpha
Comment 12 Preethi Thomas 2013-10-09 09:24:14 EDT
verified
[root@pulp-v2-server ~]# rpm -qa pulp-server
pulp-server-2.3.0-0.17.beta.el6.noarch
[root@pulp-v2-server ~]# 


[root@pulp-v2-server ~]# pulp-admin iso repo create --repo-id speedtest --fettp://pkilambi.fedorapeople.org/test_file_repo/ --max-speed 57958
Successfully created repository [speedtest]

[root@pulp-v2-server ~]# time pulp-admin iso repo sync run --repo-id speedtest
+----------------------------------------------------------------------+
                  Synchronizing Repository [speedtest]
+----------------------------------------------------------------------+

This command may be exited by pressing ctrl+c without affecting the actual
operation on the server.

The Pulp Manifest was downloaded successfully.

Downloading 3 ISOs...
[==================================================] 100%
ISOs: 3/3	Data: 10.2 MB/10.2 MB	Avg: 76.6 kB/s


Successfully downloaded 3 ISOs.

The repository was successfully published.


[root@pulp-v2-server ~]#  pulp-admin node sync  run --node-id pulp.node1.com --max-speed 57958
This command may be exited via ctrl+c without affecting the request.

[-]
Waiting to begin...



(1/2) Repository: pulp
[==================================================] 100%


(2/2) Repository: zoo
[==================================================] 100%




Synchronization succeeded

+----------------------------------------------------------------------+
                       Child Node Synchronization
+----------------------------------------------------------------------+

Repository: 
  Action: Merged
  Id:     pulp
  Units:  
    Added:   0
    Removed: 0
    Updated: 0

Repository: 
  Action: Added
  Id:     zoo
  Units:  
    Added:   39
    Removed: 0
    Updated: 0



[root@pulp-v2-server ~]# 



real	2m18.004s
user	0m3.233s
sys	0m0.178s
[root@pulp-v2-server ~]#
Comment 13 Preethi Thomas 2013-12-09 09:29:58 EST
Pulp 2.3 released.

Note You need to log in before you can comment on or make changes to this bug.