Bug 1726337

Summary:	Frequent clone System-Calls with Apache, Perl and CURL with Threaded Resolver
Product:	Red Hat Enterprise Linux 7	Reporter:	Shao Miller <smiller>
Component:	curl	Assignee:	Kamil Dudka <kdudka>
Status:	CLOSED NOTABUG	QA Contact:	Daniel Rusek <drusek>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	7.6	CC:	kdudka
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Last Closed:	2019-07-03 15:45:02 UTC	Type:	Bug

Description Shao Miller 2019-07-02 15:53:25 UTC

Description of problem:

Installed:
- Apache 2.4.6 (httpd-2.4.6-88.el7)
- Perl module for Apache (mod_perl-2.0.10-3.el7 from EPEL 7)
- CURL (curl and libcurl both 7.29.0-51.el7)

In this scenario:
- Apache is running in "prefork" MPM mode.
- Certain web-requests are processed by Perl.
- Certain of those result in calls to CURL.

This combination results in frequent 'clone' system-calls, as observed with 'strace' and GDB. GDB reveals that the primary cloner is libcurl. The CURL .spec file specifies the '--enable-threaded-resolver' option for 'configure'. Rebuilding libcurl with --enable-ares instead of --enable-threaded-resolver eliminates the frequent 'clone' system-calls, but doing so means diverging from the distribution's curl and libcurl packages.

Is this a supported scenario? Are frequent 'clone' system-calls by libcurl in a multi-process (as opposed to a multi-threaded) server to be expected?

(In this particular scenario, the libcurl invocations are expected to block the processing of the web-request, so starting up one or more threads for DNS resolution and then tearing those threads down seems pretty needless; no useful work is happening on behalf of the web-request until the DNS resolution and the curl-result have been completed.)

Version-Release number of selected component (if applicable):

curl and libcurl 7.29.0-51

How reproducible:

Always.

Steps to Reproduce:

Configure Apache to handle a web-request foo.cgi with a Perl file that invokes curl to get a resource from www.google.com, then send a continuous, parallel stream of web-requests for foo.cgi for a few minutes. Watch with:

(strace -T -tt -f $(for i in $(pidof httpd); do echo -n "-p $i "; done) -e trace=clone &); sleep 60; killall strace
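To see which httpd child-processes issue the clones, the captured output can be tallied per PID. The following is a minimal sketch, assuming the bracketed "[pid N]" prefix that strace prints when following several processes; the helper name count_clones and the sample lines are made up for illustration and are not from a real capture.

```python
import re
from collections import Counter

# Matches the start of a clone call in `strace -f -tt` output, e.g.
#   [pid  1234] 15:53:25.100000 clone(child_stack=..., flags=...) = 5678 <0.000045>
# "<... clone resumed>" lines are deliberately not matched, so a call that
# strace splits into unfinished/resumed halves is counted only once.
CLONE_RE = re.compile(r"^\[pid\s+(\d+)\]\s+\S+\s+clone\(")

def count_clones(lines):
    """Return a Counter mapping calling PID -> number of clone calls seen."""
    counts = Counter()
    for line in lines:
        match = CLONE_RE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Illustrative sample lines (hypothetical PIDs, addresses and timings).
sample = [
    "[pid  1234] 15:53:25.100000 clone(child_stack=0x7f0000000000, flags=CLONE_VM|CLONE_THREAD) = 5678 <0.000045>",
    "[pid  1234] 15:53:25.300000 clone(child_stack=0x7f0000000000, flags=CLONE_VM|CLONE_THREAD) = 5679 <0.000041>",
    "[pid  4321] 15:53:25.400000 clone(child_stack=0x7f0000000000, flags=CLONE_VM|CLONE_THREAD) = 5680 <0.000050>",
    "[pid  1234] 15:53:25.500000 <... clone resumed>) = 5681 <0.000040>",
]
clone_counts = count_clones(sample)
```

In practice the lines would come from strace's stderr redirected to a file; if a different output option changes the PID prefix, the regular expression would need adjusting.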

Actual results:

Many 'clone' system-calls.

Expected results:

No 'clone' system-calls.

Comment 2 Kamil Dudka 2019-07-02 22:00:57 UTC

c-ares cannot be used for the system libcurl package because it bypasses the system Name Service Switch (NSS) stack.  We tried to make libcurl use c-ares in Fedora 13 and failed, see bug #554305 for instance.

Why exactly is "many clone syscalls" a problem for you?

Comment 3 Shao Miller 2019-07-03 00:36:41 UTC

Thank you for your response, Kamil Dudka.

libcurl's threaded-resolver threads' 'clone' system-calls:
- Add noise to 'strace' output during analysis.  (Filtering out 'clone' syscalls would then interfere with capturing non-libcurl 'clone' syscalls.)
- Come and go with web-requests instead of lasting for the duration of the Apache child-process.
- Seem unnecessary, as a blocking model is expected for these web-requests.

Using c-ares is unimportant, but I noticed it as the alternative and it seems like a better fit, with respect to these three points, above.  I've read bug #554305 and understand the other limitations that were discussed for c-ares, in that bug.

I have not yet noticed a performance-problem in either requests-per-second or web-response time.  (Since the web-request blocks anyway, I can't imagine that the threaded-resolver strategy could be any faster, with its setup and tear-down.)

Do you suppose that a request to curl to offer a blocking DNS resolution-mode (perhaps via environment-variable or some other mechanism) would be better than filing this bug-report for Red Hat's curl?

Comment 4 Kamil Dudka 2019-07-03 10:25:25 UTC

libcurl needs a separate thread for DNS resolution to implement a non-blocking interface on top of the blocking interface provided by the system's DNS resolver.  The libcurl "multi" API allows you to run multiple transfers in parallel without using any threads in your application (and without using any threads in libcurl itself, except for DNS resolution).  Without the threaded resolver, you would not be able to implement any timeout for DNS resolution (except the siglongjmp-based hack, which does not work reliably).  If you want to lower the number of clone() syscalls, you can try sharing libcurl's DNS cache between your requests and/or reusing existing connections.
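The general idea of a threaded resolver can be sketched as follows. This is not libcurl's code, only an illustration of the pattern under stated assumptions: the blocking system resolver runs in a short-lived worker thread so that the caller can enforce a timeout. The function name resolve_with_timeout is hypothetical, and a numeric address is used so that the example does not depend on a working DNS server.

```python
import socket
import threading

def resolve_with_timeout(host, port, timeout_s):
    """Run the blocking resolver in a worker thread; give up after timeout_s."""
    result = {}

    def worker():
        try:
            result["addrs"] = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except OSError as exc:
            result["error"] = exc

    # Starting the thread is the step that shows up as a clone-family
    # system-call on Linux; one such thread is created per lookup here.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The lookup is still blocked; the caller moves on without it.
        raise TimeoutError("lookup of %r exceeded %.1f s" % (host, timeout_s))
    if "error" in result:
        raise result["error"]
    return result["addrs"]

# A numeric address resolves without any DNS traffic.
addrs = resolve_with_timeout("127.0.0.1", 80, 2.0)
```

Because a thread is created and torn down for each lookup, anything that avoids the lookup altogether (a cached name, or a connection that is already open) also avoids the thread.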

Comment 5 Shao Miller 2019-07-03 14:51:21 UTC

libcurl's "multi" API seems to fit better with an event-driven loop, using 'epoll', 'kqueue', etc.  That seems like a better fit for Apache's "event MPM".  This seems modern and fast.

Using Apache's "prefork MPM" (multiple child-processes) and non-threaded Perl via mod_perl, the paradigm is more like: an Apache child-process becomes occupied by a single web-request until its web-response has been completed, then becomes available for the next web-request; web-requests aren't interleaved in a child-process.  (There is a case for "sub-requests," however, so this isn't strictly true.)  This seems old and slow, but it is a reality for some people.  Some people in this situation use SIGALRM to implement timeouts.
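The SIGALRM approach mentioned above can be sketched like this. It is a minimal illustration, assuming a single-threaded process and a call made from the main thread; the helper name call_with_alarm is hypothetical, and time.sleep stands in for a slow blocking lookup.

```python
import signal
import time

def call_with_alarm(func, timeout_s, *args):
    """Call func(*args), raising TimeoutError if SIGALRM fires first."""
    def on_alarm(signum, frame):
        raise TimeoutError("blocking call exceeded %.1f s" % timeout_s)

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)  # arm the timer
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)   # restore the old handler

# time.sleep is only a stand-in for a blocking operation.
try:
    call_with_alarm(time.sleep, 0.2, 5)
    timed_out = False
except TimeoutError:
    timed_out = True
```

No extra thread is created, which is the attraction in a prefork child-process. The weakness is that the timeout only works if the blocked call can actually be interrupted by the signal; a resolver call that retries internally may not return promptly, which is in line with the reliability concern raised in comment 4.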

If I'm not mistaken, if we have 256 Apache child-processes and each one is processing a web-request (with Perl) that happens to be using libcurl, there are 256 Apache threads (one main thread per child-process), 256 libcurl DNS resolver threads, 256 libcurl DNS caches and 256 libcurl connections.  All 256 of those libcurl DNS resolver threads seem wasteful, since the 256 Apache threads (running Perl) are expecting to block for DNS resolution and expecting to use SIGALRM for timeouts.

It seems to me that Red Hat's libcurl is assuming 'epoll' usage and assuming the cost of a libcurl thread to be cheap, but under "prefork" that cost is one resolver thread per child-process instead of one per server.

Thank you for your two suggestions.  Perhaps they will help.

Comment 6 Kamil Dudka 2019-07-03 15:45:02 UTC

Nothing specific to Red Hat; it is just how upstream libcurl works.  Its blocking (easy) API is implemented on top of the non-blocking (multi) API.  If you build libcurl from an upstream tarball on Linux, the threaded resolver is used by default.  Yes, we assume the cost of the resolver thread to be cheap, and you did not provide any data showing otherwise.  If you prefer to use SIGALRM-based timeouts for DNS resolution, you can customize your own build of libcurl.  But we are not going to build libcurl with any different resolver in Fedora/RHEL.

Comment 7 Shao Miller 2019-07-03 16:28:22 UTC

Thank you for your time.  I thought it might be a bug, as the stack seems a bit odd:
- Blocking DNS resolver
- Non-blocking libcurl "multi" API
- Blocking libcurl "easy" API
- Perl expects to block on libcurl
- Apache "prefork" child-process blocks on Perl

One of these things is not like the others.

But it's not Red Hat-specific, as you've kindly stated.