Bug 1218272 - Performance problem with libcurl and FTP on RHEL7.X
Summary: Performance problem with libcurl and FTP on RHEL7.X
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: curl
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Kamil Dudka
QA Contact: Stefan Kremen
URL:
Whiteboard:
Keywords: Patch
Duplicates: 1152628 1225836 1269086
Depends On:
Blocks: 1119415 1154205 1225836
Reported: 2015-05-04 13:18 UTC by Ken Green
Modified: 2016-01-14 12:08 UTC (History)
Previously, FTP transfers were slower than expected. Consequently, FTP operations such as downloading files took a significantly long time to complete. With this update, the FTP implementation in libcurl correctly sets blocking direction and estimated timeout for connections. As a result, FTP transfers using libcurl now complete faster.
Clone Of:
: 1225836 (view as bug list)
Last Closed: 2015-11-19 07:09:14 UTC


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2015:2159
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: curl security, bug fix, and enhancement update
Last Updated: 2015-11-19 08:26:18 UTC

Description Ken Green 2015-05-04 13:18:13 UTC
Description of problem:

Has anyone looked at the performance of libcurl-based tools on RHEL 7.0 and 7.1 when fetching files using FTP?

I started investigating this after finding that RHEL7 installs were taking massively longer than RHEL5 or 6 ones. Further investigation showed that the unexpected difference went away when switching to HTTP instead of FTP to talk to the file server.

I then repeated my tests using yum on an installed system. I installed the default base 7.1 configuration and then loaded the "Server with GUI" environment on top with yum. Over FTP, downloading the 890 RPMs took 7m40s, whereas over HTTP the same download from the same server completed in 7s, a speed difference of over 65 times.

I believe that yum uses urlgrabber (python-urlgrabber.noarch 3.10-6.el7) which in turn uses pycurl (python-pycurl.x86_64 7.19.0-17.el7) which uses libcurl (libcurl.x86_64 7.29.0-19.el7).
So next I tried comparing the behaviour of curl using FTP and HTTP and comparing these with wget and also with RHEL6.4.

I used a simple shell loop to download a small file (less than 1K) 10 times. I originally planned to do it 1000 times, but this proved to take far too long.

OS        Tool  Protocol  real        user       sys
RHEL 7.1  curl  FTP       0m11.610s   0m0.026s   0m0.041s
RHEL 7.1  curl  HTTP      0m1.551s    0m0.017s   0m0.039s
RHEL 7.1  wget  FTP       0m0.091s    0m0.011s   0m0.040s
RHEL 7.1  wget  HTTP      0m0.049s    0m0.014s   0m0.031s
RHEL 6.4  curl  FTP       0m0.099s    0m0.016s   0m0.031s
RHEL 6.4  curl  HTTP      0m0.096s    0m0.013s   0m0.042s
RHEL 6.4  wget  FTP       0m0.087s    0m0.007s   0m0.036s
RHEL 6.4  wget  HTTP      0m0.048s    0m0.008s   0m0.028s


So comparing curl on RHEL 7.1 between FTP and HTTP, FTP is slower by a factor of about 7.5. Comparing curl and wget for FTP on RHEL 7.1, curl is about 127 times slower, and comparing curl using FTP between RHEL 7.1 and 6.4, I see a difference of about 117 times.
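
The ratios quoted above can be re-derived from the raw timings. The following is a minimal sketch in POSIX shell with awk; `to_seconds` and `ratio` are hypothetical helper names introduced here for illustration and are not part of the original report.

```sh
# to_seconds: convert a `time`-style value such as 0m11.610s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# ratio: how many times larger the first timing is than the second.
ratio() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 0m11.610s 0m1.551s   # curl FTP vs curl HTTP on RHEL 7.1
ratio 0m11.610s 0m0.091s   # curl FTP vs wget FTP on RHEL 7.1
ratio 0m11.610s 0m0.099s   # curl FTP on RHEL 7.1 vs curl FTP on RHEL 6.4
ratio 7m40s 0m7s           # yum download over FTP vs HTTP
```

The printed values should agree, after rounding, with the factors of roughly 7.5, 127, 117 and 65 given in the description.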

All of these measurements were made using HP ProLiant bl460c Gen8 blades using a Broadcom Corporation NetXtreme II BCM57810 running at 10Gb talking to a fileserver running RHEL6.4 with vsftpd and Apache. I have also experienced the same problems with a variety of other HP ProLiant servers using a number of different network cards.

This issue significantly impacts the install speed of RHEL7 when using FTP. I run training classes and for years have been used to scripting rebuilds of whole classrooms taking less than 10 minutes. With RHEL7 I was seeing install times of about 1 hour. 


Version-Release number of selected component (if applicable):
The install SW for both RHEL7.0 and 7.1
yum-3.4.3-125.el7.noarch
python-urlgrabber.noarch 3.10-6.el7
python-pycurl.x86_64 7.19.0-17.el7
curl-7.29.0-19.el7.x86_64


How reproducible:
totally.


Steps to Reproduce:
1. write a script like

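#!/bin/bash
# typeset -i, ((...)) and let are bash/ksh features, so ask for bash explicitly.

typeset -i n=0

# Fetch the same small file 10 times; enable exactly one download line per run.
while ((n<10))
do
        curl -s ftp://192.168.73.24/pub/vm1 > /dev/null
#       curl -s http://192.168.73.24/ftp/pub/vm1 > /dev/null
#       wget --quiet ftp://192.168.73.24/pub/vm1
#       wget --quiet http://192.168.73.24/ftp/pub/vm1 > /dev/null
        let n=n+1
        echo $n
done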

You'll obviously need to use your own file server and filename.

2. time the script
3. try the different download options.
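
Steps 2 and 3 can also be sketched as one small helper that runs any download command a fixed number of times and prints the elapsed wall-clock time. This is only an illustration: `time_runs` is a hypothetical name, and the sketch assumes GNU `date` (for the `%N` nanoseconds format) and awk are available.

```sh
# time_runs COUNT CMD [ARGS...]
# Run CMD COUNT times, discarding its output, and print the elapsed
# wall-clock time in seconds with millisecond resolution.
time_runs() {
    count=$1
    shift
    i=0
    start=$(date +%s.%N)          # assumes GNU date; %N is nanoseconds
    while [ "$i" -lt "$count" ]; do
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example (substitute your own file server and filename):
#   time_runs 10 curl -s ftp://192.168.73.24/pub/vm1
#   time_runs 10 curl -s http://192.168.73.24/ftp/pub/vm1
#   time_runs 10 wget --quiet -O /dev/null ftp://192.168.73.24/pub/vm1
```

Passing the command as arguments keeps the loop identical for every tool and protocol, so only the download itself differs between runs.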

Actual results:

RHEL 7.1
curl    FTP     0m11.610s       0m0.026s      0m0.041s
        HTTP    0m1.551s        0m0.017s      0m0.039s


wget    FTP     0m0.091s        0m0.011s      0m0.040s
        HTTP    0m0.049s        0m0.014s      0m0.031s


Expected results:

No significant difference between curl and wget, and only a small difference between FTP and HTTP, as in the wget example. FTP needs to establish a second network connection, so it is likely to be a little slower in this test.

Additional info:

I think this is likely to be in libcurl rather than just python-pycurl, but this tool wouldn't let me put libcurl in the component field.

Comment 2 Kamil Dudka 2015-05-04 16:11:25 UTC
Thanks for the bug report!

It seems to be triggered by the following upstream commit:

https://github.com/bagder/curl/commit/7cc00d9a

... and I believe that this upstream commit will fix it:

https://github.com/bagder/curl/commit/29bf0598

Comment 3 Kamil Dudka 2015-05-04 16:31:09 UTC
*** Bug 1152628 has been marked as a duplicate of this bug. ***

Comment 6 Kamil Dudka 2015-05-06 18:26:34 UTC
(In reply to Kamil Dudka from comment #2)
> ... and I believe that this upstream commit will fix it:
> 
> https://github.com/bagder/curl/commit/29bf0598

One more upstream commit is needed to make the internal blocking logic work nicely with the FTP protocol implementation:

https://github.com/bagder/curl/commit/c4a7ca03

Comment 7 Kamil Dudka 2015-05-06 19:06:57 UTC
... and this commit will be needed to restore the functionality of HTTP PUT:

https://github.com/bagder/curl/commit/0bf5ce77

Upstream tests 154 and 155 tend to hang without the above patch applied.

Comment 12 Ken Green 2015-05-11 13:51:37 UTC
Thanks everyone, it's great to see so much attention on this.
Given the impact it has on the installation of RHEL 7.1, is the fix likely to make it into an ISO at any point? Or am I better off just advising students to use HTTP for setting up install servers?

Comment 13 Kamil Dudka 2015-05-13 11:13:05 UTC
(In reply to Ken Green from comment #12)
> Thanks everyone, it's great to see so much attention on this.
> Given the impact it has on the installation of RHEL 7.1 is the fix likely to
> make it into an ISO at any point? Or am I better just advising students to
> use HTTP for setting up install servers?

I am not sure about this.  Release engineers would give you a precise answer.  Nevertheless, installation via HTTP sounds like a reasonable default.  It will work fast enough regardless of the version of libcurl on the installation images.

Comment 14 Ken Green 2015-05-13 11:37:24 UTC
Install via HTTP works fine. No problems there. But install via FTP is scarily slow. Chapter 2.3 of the RHEL Install guide shows setting up an FTP server as an install source. For an IBM LPAR I think they don't show HTTP (I've never touched a Mainframe).
Personally I've always tended to set up my repositories using FTP, partly because I preferred the logging but mostly because wget allows wildcards with FTP but not with HTTP.
When researching this problem I found a few HowTo blogs on setting up for installing RHEL7 (or CentOS) which showed using FTP. I can't find any of those currently. But Googling for "RHEL7 network install server" returns 
http://www.tecmint.com/multiple-centos-installations-using-kickstart/
as the 3rd hit for me this morning.
As I said, for years I've set up install servers using FTP, and for RHEL 5 & 6 the installs typically took under 10 minutes, with the download and install phase taking about 4 minutes. Suddenly I was seeing times of about an hour, which doesn't show 7 in a good light. It was only when I didn't find other people complaining that I thought to look further, tried using HTTP, and spotted the vast difference in speed.

Comment 15 Kamil Dudka 2015-05-13 12:10:55 UTC
Thanks for the explanation!  I will try to get the fixed version of libcurl on the installation ISO images.  However, I cannot guarantee that through Bugzilla.  Please open a customer case if it is important for your business.

Comment 16 Kamil Dudka 2015-05-28 11:40:35 UTC
*** Bug 1225836 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2015-11-19 07:09:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2159.html

Comment 22 Valentina Mukhamedzhanova 2016-01-14 12:08:20 UTC
*** Bug 1269086 has been marked as a duplicate of this bug. ***

