Bug 1204825 - [RFE] urlgrabber.urlread() is not thread-safe
Summary: [RFE] urlgrabber.urlread() is not thread-safe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: python-urlgrabber
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michal Domonkos
QA Contact: Jan Kepler
URL:
Whiteboard:
Depends On:
Blocks: 1112660 1549618
 
Reported: 2015-03-23 14:56 UTC by Adrian Reber
Modified: 2018-10-30 10:16 UTC
CC List: 7 users

Fixed In Version: python-urlgrabber-3.10-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 10:15:54 UTC
Target Upstream Version:
Embargoed:


Attachments
crash_test script (9.52 KB, text/x-python), 2015-04-10 12:29 UTC, Adrian Reber
input file (60.37 KB, text/plain), 2015-04-10 12:30 UTC, Adrian Reber
crash_test using pycurl (10.17 KB, text/x-python), 2015-04-10 19:25 UTC, Adrian Reber


Links
Red Hat Product Errata RHEA-2018:3130 (last updated 2018-10-30 10:16:02 UTC)

Description Adrian Reber 2015-03-23 14:56:06 UTC
Description of problem:
On one of the Fedora infrastructure machines (mm-crawler01.stg.phx2.fedoraproject.org) the new MirrorManager implementation is installed. To check whether all Fedora mirrors are up to date, it periodically scans the existing mirrors using rsync, HTTP, and FTP.

This new crawler is now multithreaded instead of forking a crawler for each host. Unfortunately this new design seems to trigger a segfault in libcurl:

 [239814.784542] mm2_crawler[16943]: segfault at 58 ip 00007fe19b2a39f7 sp 00007fe0ecfeefb0 error 4 in libpython2.7.so.1.0[7fe19b1c7000+178000]

Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/13/703bac3dafd17eda2d4a93cd8b11e766dd4721
Core was generated by `/usr/bin/python /usr/bin/mm2_crawler -c mirrormanager2.cfg -t 100'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7f1e206557 in ?? ()
"/home/fedora/adrian/core.28994" is a core file.
Please specify an executable to debug.
(gdb) bt
#0  0x00007f7f1e206557 in ?? ()
#1  0x0000000001be7630 in ?? ()
#2  0x00007f7ed4738dc0 in ?? ()
#3  0x00007f7f0400cea0 in ?? ()
#4  0x00007f7f1e20f225 in ?? ()
#5  0x00007f7e807ed394 in ?? ()
#6  0x000000000003a6dd in ?? ()
#7  0x000000000003a6de in ?? ()
#8  0x00000000000200b3 in ?? ()
#9  0x00007f7f0400cea0 in ?? ()
#10 0x0100007f29ddfdfd in ?? ()
#11 0x00007f7ed46baf60 in ?? ()
#12 0xf8d086a72e609400 in ?? ()
#13 0x000000000003a6de in ?? ()
#14 0x00007f7ed4738dc0 in ?? ()
#15 0x00007f7f0400cea0 in ?? ()
#16 0x00000000000200b3 in ?? ()
#17 0x000000000003a6de in ?? ()
#18 0x0000000001be7630 in ?? ()
#19 0x0000000000000000 in ?? ()

Unfortunately I have no debuginfo packages installed. I have the core file and can distribute it if you need it.


Version-Release number of selected component (if applicable):
[adrian@mm-crawler01 ~]$ rpm -qa | grep curl
python-pycurl-7.19.0-17.el7.x86_64
curl-7.29.0-19.el7.x86_64
libcurl-7.29.0-19.el7.x86_64
[adrian@mm-crawler01 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

Comment 2 Kamil Dudka 2015-03-23 15:17:20 UTC
(In reply to Adrian Reber from comment #0)
> Description of problem:
> One of the Fedora infrastructure machines
> (mm-crawler01.stg.phx2.fedoraproject.org) the new MirrorManager
> implementation is installed. To check if all Fedora mirrors are up to date
> it periodically scans the existing mirrors. It does this using rsync, http
> and ftp.

Where can we get source code of the crawler in question?

> This new crawler is now multithreaded instead of forking a crawler for each
> host. Unfortunately this new design seems to trigger a segfault in libcurl:

Please make sure that libcurl is used in the supported way.  That is, you never share the same handle in multiple threads:

http://curl.haxx.se/docs/faq.html#Is_libcurl_thread_safe
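
For illustration, a minimal sketch (hypothetical code, not taken from the crawler) of the supported pattern: every thread creates and uses its own easy handle, so no handle is ever shared between threads:

    import threading
    import pycurl
    from io import BytesIO

    def fetch(url):
        buf = BytesIO()
        c = pycurl.Curl()                     # one handle per thread, never shared
        c.setopt(c.URL, url)
        c.setopt(c.WRITEFUNCTION, buf.write)  # collect the response body
        try:
            c.perform()
        finally:
            c.close()
        return buf.getvalue()

    threads = [threading.Thread(target=fetch, args=(u,))
               for u in ("http://example.com/a", "http://example.com/b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()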

> Unfortunately I have no debuginfo packages installed. I have the core file
> and can distribute it if you need.

Please do.

Comment 3 Adrian Reber 2015-03-23 15:22:09 UTC
By the way, LD_PRELOAD-ing the libcurl from Fedora 21 resolves this bug. Something between the curl versions in RHEL 7.1 and F21 seems to fix this problem.

Source code: https://github.com/fedora-infra/mirrormanager2/blob/master/utility/mm2_crawler

I will send you a link to the core file, as I am not sure what kind of information is stored in it.

Comment 4 Kamil Dudka 2015-03-23 16:57:43 UTC
Thank you for providing the backtrace.  It crashes in Curl_retry_request():

1851│   if(/* workaround for broken TLS servers */ data->state.ssl_connect_retry ||
1852│       ((data->req.bytecount +
1853│         data->req.headerbytecount == 0) &&
1854│         conn->bits.reuse &&
1855│         !data->set.opt_no_body &&
1856│         data->set.rtspreq != RTSPREQ_RECEIVE)) {
1857│     /* We got no data, we attempted to re-use a connection and yet we want a
1858│        "body". This might happen if the connection was left alive when we were
1859│        done using it before, but that was closed when we wanted to read from
1860│        it again. Bad luck. Retry the same request on a fresh connect! */
1861│     infof(conn->data, "Connection died, retrying a fresh connect\n");
1862│     *url = strdup(conn->data->change.url);
1863│     if(!*url)
1864│       return CURLE_OUT_OF_MEMORY;
1865│
1866│     conn->bits.close = TRUE; /* close this connection */
1867│     conn->bits.retry = TRUE; /* mark this as a connection we're about
1868│                                 to retry. Marking it this way should
1869│                                 prevent i.e HTTP transfers to return
1870│                                 error just because nothing has been
1871│                                 transferred! */
1872│
1873│
1874├>    if((conn->handler->protocol&CURLPROTO_HTTP) &&
1875│        data->state.proto.http->writebytecount)
1876│       return Curl_readrewind(conn);
1877│   }
1878│   return CURLE_OK;
1879│ }

At line 1875 of transfer.c, 'data->state.proto.http' is NULL, and basically all the content of *data seems to be nullified.

The function was called from multi_runsingle():

1437│       k = &data->req;
1438│
1439│       if(!(k->keepon & KEEP_RECV)) {
1440│         /* We're done receiving */
1441│         easy->easy_conn->readchannel_inuse = FALSE;
1442│       }
1443│
1444│       if(!(k->keepon & KEEP_SEND)) {
1445│         /* We're done sending */
1446│         easy->easy_conn->writechannel_inuse = FALSE;
1447│       }
1448│
1449│       if(done || (easy->result == CURLE_RECV_ERROR)) {
1450│         /* If CURLE_RECV_ERROR happens early enough, we assume it was a race
1451│          * condition and the server closed the re-used connection exactly when
1452│          * we wanted to use it, so figure out if that is indeed the case.
1453│          */
1454├>        CURLcode ret = Curl_retry_request(easy->easy_conn, &newurl);
1455│         if(!ret)
1456│           retry = (newurl)?TRUE:FALSE;
1457│
1458│         if(retry) {
1459│           /* if we are to retry, set the result to OK and consider the
1460│              request as done */
1461│           easy->result = CURLE_OK;
1462│           done = TRUE;
1463│         }
1464│       }

At line 1454 of multi.c, done is TRUE and easy->result is CURLE_OK.

The backtrace looks like this:

#0  Curl_retry_request (conn=0x7f7ed471bd30, ...) at transfer.c:1874
#1  multi_runsingle (multi=0x7f7f0400cea0, ...) at multi.c:1454
#2  curl_multi_perform (multi_handle=0x7f7f0400cea0, ...) at multi.c:1713
#3  curl_easy_perform (easy=0x1be7630) at easy.c:482
#4  do_curl_perform (self=0x1b9f8c0) at src/pycurl.c:1019
#5  call_function (...) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4082

There are 105 other threads executing (or blocked in) waitpid(), sem_wait(), recv(), connect(), epoll_wait(), or mprotect().

I see no obvious bug fixes related to this issue in the upstream git repository.  The code causing SIGSEGV has been changed, but the change is not documented as a bug fix (and at first glance does not look like it would fix the bug):

https://github.com/bagder/curl/commit/e79535bc

Comment 5 Adrian Reber 2015-04-01 11:59:35 UTC
Instead of using LD_PRELOAD, I rebuilt the curl package from the Fedora master branch for epel7, and that also works:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9391121

curl from RHEL7.1 still fails every time after about 90 seconds.

The cron job starts at 00:00 and 12:00:

[Tue Mar 31 00:01:40 2015] mm2_crawler[4358]: segfault at 30 ip 00007f58062e2557 sp 00007f576a7f1250 error 4 in libcurl.so.4.3.0[7f58062b9000+63000]
[Tue Mar 31 12:01:20 2015] mm2_crawler[9945]: segfault at 30 ip 00007fc9c1db5557 sp 00007fc9aa7f9250 error 4 in libcurl.so.4.3.0[7fc9c1d8c000+63000]
[Wed Apr  1 00:01:33 2015] mm2_crawler[10428]: segfault at 30 ip 00007f22b7ede557 sp 00007f22adf84250 error 4 in libcurl.so.4.3.0[7f22b7eb5000+63000]

Comment 6 Kamil Dudka 2015-04-01 12:10:45 UTC
If you have a self-contained script that decides whether a version of curl is "good" or "bad" in a finite amount of time, we can use git-bisect to find the upstream commit that turns the "bad" behavior into the "good" behavior.

Comment 7 Adrian Reber 2015-04-04 08:21:27 UTC
I have now been running bisect for the last three days, and unfortunately it is not always easy to decide whether it fails or not. This will probably take a couple more days.

I now have additional segfaults which seem to be in libc:

[Fri Apr  3 18:42:33 2015] python[6141]: segfault at 0 ip 00007f0a3810c811 sp 00007f098bff4238 error 4 in libc-2.17.so[7f0a38086000+1b6000]
[Fri Apr  3 20:07:42 2015] python[20690]: segfault at 0 ip 00007f801287e811 sp 00007f7f5afda238 error 4 in libc-2.17.so[7f80127f8000+1b6000]

So I am not sure what this means for the curl problem I am bisecting.

Comment 8 Kamil Dudka 2015-04-07 11:46:57 UTC
(In reply to Adrian Reber from comment #7)
> I am now running bisect for the last three days and unfortunately it is not
> always easy to decide if it fails or not. This will probably take a couple
> more days.

Thank you for analyzing it further!  Wishing good luck with git-bisect!

> I now have additional segfaults which seem to be in libc:
> 
> [Fri Apr  3 18:42:33 2015] python[6141]: segfault at 0 ip 00007f0a3810c811
> sp 00007f098bff4238 error 4 in libc-2.17.so[7f0a38086000+1b6000]
> [Fri Apr  3 20:07:42 2015] python[20690]: segfault at 0 ip 00007f801287e811
> sp 00007f7f5afda238 error 4 in libc-2.17.so[7f80127f8000+1b6000]
> 
> So I am not sure what this means for the curl problem I am bisceting.

The above lines do not say much about the actual cause.  If there is a memory corruption inside libcurl, it can also result in a SIGSEGV in libc and abnormal termination of the python process.  However, we would need more info to confirm such a hypothesis.

Comment 9 Adrian Reber 2015-04-08 17:12:58 UTC
I have finished a bisect run where I treated all segfaults (libc and libcurl) the same, and if a build did not segfault I ran the crawler up to 5 times. Eventually I got a segfault with every curl version I tested. So this error has not been fixed, even if libcurl.so from Fedora 21 does not crash; it is probably just very hard to trigger. I bisected curl between 7.29 and yesterday's git. Sometimes it takes a couple of hours to trigger this segfault. I do not know how to debug this further.

Comment 10 Kamil Dudka 2015-04-08 17:29:13 UTC
Thank you for debugging this!

Did you run git-bisect on Fedora or the upstream git repository?

If upstream, are you sure that the same configure options and compiler flags were used as with rpmbuild?

Comment 11 Adrian Reber 2015-04-08 18:14:31 UTC
I was running git bisect on the upstream repository. The following is the output of git bisect log:

# bad: [992a731116fc9134d4a0acf17fe10219917ecb30] test142[23]: verify that an empty file is stored on success
# good: [bf633a584dcbb0f80273ba856b7198ad1e395315] vms: config-vms.h is removed, no use trying to distribute it
git bisect start 'master' 'bf633a584dcbb0f80273ba856b7198ad1e395315'
# good: [75e996f29f1855d47299cf29f96507cd78d5aff1] tool: Moved --progress-bar to the global config
git bisect good 75e996f29f1855d47299cf29f96507cd78d5aff1
# good: [2e121fda355ecc94e155d27adbd21525aa60fdba] sasl_sspi: Fixed typo in comment
git bisect good 2e121fda355ecc94e155d27adbd21525aa60fdba
# good: [62a018762e081a679534a19c3b41fcf597de64ec] sockfilt.c: Replace 100ms sleep with thread throttle
git bisect good 62a018762e081a679534a19c3b41fcf597de64ec
# good: [0daf1ef7299dcd4755a75d6b9342739da6be7244] curl.1: clarify that -X is used for all requests
git bisect good 0daf1ef7299dcd4755a75d6b9342739da6be7244
# good: [76afe14584a1e6377663bbf6f0587981a686b615] CURLOPT_STDERR.3: added an example
git bisect good 76afe14584a1e6377663bbf6f0587981a686b615
# good: [e438a9e2f022819c1023b6fa9daf4b37bf0b8111] CURLOPT_PATH_AS_IS.3: add type 'long' to prototype
git bisect good e438a9e2f022819c1023b6fa9daf4b37bf0b8111
# good: [ae3c985060d7d5fd7a72d7dcb0b6b176f3c972b2] tool_operate: only set SSL options if SSL is enabled
git bisect good ae3c985060d7d5fd7a72d7dcb0b6b176f3c972b2
# good: [f203edc544e1fb902fbc950e47d04e1505c594de] cyassl: Set minimum protocol version before CTX callback
git bisect good f203edc544e1fb902fbc950e47d04e1505c594de
# good: [f2a0b2164a1cdeaa806debbc3d0b46cfe04976e9] checksrc.bat: quotes to support an SRC_DIR with spaces
git bisect good f2a0b2164a1cdeaa806debbc3d0b46cfe04976e9
# good: [c3101ae287fcfc420bdd816f1eaf39c8dc9b242b] x509asn1: Silence x64 loss-of-data warning on RSA key length assignment
git bisect good c3101ae287fcfc420bdd816f1eaf39c8dc9b242b
# good: [f251417d85d232605ca86e9562a64500c67ccdee] src/tool_cb_wrt: separate fnc for output file creation
git bisect good f251417d85d232605ca86e9562a64500c67ccdee
# good: [261a0fedcf1545440190965311a1554d7695b6c0] src/tool_operate: create output file on successful download
git bisect good 261a0fedcf1545440190965311a1554d7695b6c0
# first bad commit: [992a731116fc9134d4a0acf17fe10219917ecb30] test142[23]: verify that an empty file is stored on success


I used the following configure options on RHEL 7 to build curl from git:

../configure --disable-static --enable-symbol-hiding --enable-ipv6 --enable-ldaps --enable-manual --enable-threaded-resolver --with-ca-bundle=/etc/pki/tls/certs/ca-bundle.crt --with-gssapi --with-libidn --with-libmetalink --with-libssh2 --without-ssl --with-nss --disable-sspi

Those were the options I copied from the Fedora rawhide curl spec file.

I have logfiles from all builds and test runs if you need to look at the configure or make output.

Comment 12 Kamil Dudka 2015-04-09 08:17:50 UTC
If all commits were good (meaning a crash occurred), then the output of git-bisect says nothing.

Now the question is: why does it work with the f21 libcurl?  Does it really work?

If so, I would suggest checking out the curl-7_37_0 tag in the upstream git repository, building it, and checking whether it suffers from this bug.  If it does, we need to check the downstream patches.

Comment 13 Kamil Dudka 2015-04-09 08:32:35 UTC
Also, are you sure that you had all _build_ dependencies of curl installed when you built the upstream curl from sources?

Note that libmetalink-devel is not available on RHEL but that would likely make no difference regarding this bug.

Comment 14 Adrian Reber 2015-04-09 08:36:54 UTC
I am pretty sure I had all dependencies installed. libmetalink-devel was from EPEL: libmetalink-0.1.2-4.el7.x86_64

This is the result of configure:

  curl version:     7.36.0-DEV
  Host setup:       x86_64-unknown-linux-gnu
  Install prefix:   /usr/local
  Compiler:         gcc
  SSL support:      enabled (NSS)
  SSH support:      enabled (libSSH2)
  zlib support:     enabled
  GSSAPI support:   enabled (MIT/Heimdal)
  SPNEGO support:   no      (--with-spnego)
  TLS-SRP support:  no      (--enable-tls-srp)
  resolver:         POSIX threaded
  ipv6 support:     enabled
  IDN support:      enabled
  Build libcurl:    Shared=yes, Static=no
  Built-in manual:  enabled
  --libcurl option: enabled (--disable-libcurl-option)
  Verbose errors:   enabled (--disable-verbose)
  SSPI support:     no      (--enable-sspi)
  ca cert bundle:   /etc/pki/tls/certs/ca-bundle.crt
  ca cert path:     no
  LDAP support:     enabled (OpenLDAP)
  LDAPS support:    enabled
  RTSP support:     enabled
  RTMP support:     no      (--with-librtmp)
  metalink support: enabled
  HTTP2 support:    disabled (--with-nghttp2)
  Protocols:        DICT FILE FTP FTPS GOPHER HTTP HTTPS IMAP IMAPS LDAP LDAPS POP3 POP3S RTSP SCP SFTP SMTP SMTPS TELNET TFTP

I am now running my test against the curl-7_37_0 tag to see if I can reproduce the error.

Comment 15 Kamil Dudka 2015-04-09 09:04:40 UTC
(In reply to Adrian Reber from comment #14)

Looks good.  Thanks for the confirmation!

Comment 16 Adrian Reber 2015-04-09 12:10:56 UTC
It took some tries, but I was able to crash it using the build from the curl-7_37_0 tag. I was now also able to trigger a segfault in the Fedora 21 version of curl. So this error seems to exist in all versions of curl. With some versions it segfaults after a few minutes; with others it takes a couple of hours.

Comment 17 Kamil Dudka 2015-04-09 14:27:16 UTC
My assumption that this was a bug in libcurl was based on your statement that the bug does not occur with a different version of libcurl.

Now that you say all versions of libcurl suffer from this bug, it looks more like a bug somewhere higher in the application stack.  Other people use libcurl in multi-threaded applications with no issues.

In any case, a minimal example would be nice...

Comment 18 Adrian Reber 2015-04-10 12:29:18 UTC
I was able to create a "simple" version of the crash. It does not need a database, only a list of URLs as input. I will attach the Python script and the URL list to this bug later. It does not really look like a bug in curl.

Most of the time the crawler only does HEAD requests using Python's httplib. Only for repomd.xml files does the crawler switch to urlgrabber, downloading the file to calculate its hash:

        s = urlgrabber.urlread(url)
        sha256 = hashlib.sha256(s).hexdigest()

Without this part it does not crash; with that code it segfaults as described above. It should be easy to replace this code with something that does not use urlgrabber. Maybe urlgrabber is not thread-safe. Thanks for your help.
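
A hypothetical thread-safe equivalent using urllib2, which keeps no shared module-level handle, might look like this (this is essentially what the crawler was later rewritten to use, see comment 25):

    import hashlib
    import urllib2

    def repomd_sha256(url):
        # Each call opens its own connection; nothing is shared across threads.
        s = urllib2.urlopen(url, timeout=30).read()
        return hashlib.sha256(s).hexdigest()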

Comment 19 Adrian Reber 2015-04-10 12:29:57 UTC
Created attachment 1013122 [details]
crash_test script

Comment 20 Adrian Reber 2015-04-10 12:30:21 UTC
Created attachment 1013123 [details]
input file

Comment 21 Kamil Dudka 2015-04-10 15:17:41 UTC
Thanks for the test script!  I can confirm that it indeed crashes from time to time, but it is difficult to debug.  For example, the last backtrace did not go through libcurl at all:

#0  0x0000003611073c8b in list_dealloc () from /lib64/libpython2.7.so.1.0
#1  0x000000361108004b in dict_dealloc () from /lib64/libpython2.7.so.1.0
#2  0x00000036110584a3 in instance_dealloc () from /lib64/libpython2.7.so.1.0
#3  0x000000361108004b in dict_dealloc () from /lib64/libpython2.7.so.1.0
#4  0x00000036110584a3 in instance_dealloc () from /lib64/libpython2.7.so.1.0
#5  0x00000036110ddace in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6  0x00000036110e3400 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#7  0x000000361106f5dc in function_call () from /lib64/libpython2.7.so.1.0
#8  0x000000361104a903 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#9  0x00000036110dc4c7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#10 0x00000036110d83a8 in builtin_map () from /lib64/libpython2.7.so.1.0
#11 0x00000036110e21ff in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x00000036110e3400 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#13 0x000000361106f6bd in function_call () from /lib64/libpython2.7.so.1.0
#14 0x000000361104a903 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x00000036110e0050 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x00000036110e3400 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#17 0x000000361106f6bd in function_call () from /lib64/libpython2.7.so.1.0
#18 0x000000361104a903 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#19 0x00000036110e0050 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#20 0x00000036110e1be6 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x00000036110e1be6 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x00000036110e3400 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#23 0x000000361106f5dc in function_call () from /lib64/libpython2.7.so.1.0
#24 0x000000361104a903 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#25 0x0000003611059815 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#26 0x000000361104a903 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#27 0x00000036110dc4c7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#28 0x0000003611112062 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#29 0x000000360f00752a in start_thread (arg=0x7ffeca7fc700) at pthread_create.c:310
#30 0x000000360e50022d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Would it be possible to modify it such that it uses solely libcurl for network I/O?

Comment 22 Adrian Reber 2015-04-10 19:25:38 UTC
Created attachment 1013256 [details]
crash_test using pycurl

Comment 23 Adrian Reber 2015-04-10 19:26:46 UTC
I attached a new version which uses pycurl to make the HEAD requests. It still crashes when using urlgrabber (which is also curl-based, I think) to download the repo file for checksumming.
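
The HEAD requests in the attached reproducer presumably follow this shape (a sketch under that assumption; the exact code is in the attachment): pycurl's NOBODY option requests headers only, and one handle is created per request:

    import pycurl
    from io import BytesIO

    def head(url):
        hdr = BytesIO()
        c = pycurl.Curl()
        c.setopt(c.URL, url)
        c.setopt(c.NOBODY, True)               # HEAD request: no response body
        c.setopt(c.HEADERFUNCTION, hdr.write)  # collect the response headers
        try:
            c.perform()
            return c.getinfo(c.RESPONSE_CODE), hdr.getvalue()
        finally:
            c.close()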

Comment 24 Kamil Dudka 2015-04-13 14:31:22 UTC
This took me quite a lot of time to debug, because I was looking for the cause in a completely wrong place -- bisecting libcurl up to the commit where it started to crash:

https://github.com/bagder/curl/commit/058fb335

The minimal example was pretty helpful.  I modified it to print the exceptions falling out of urlgrabber and saw that setopt() is being called _during_ the execution of perform(), which was caught by pycurl.  It reminded me of the FAQ entry I referred to in comment #2 -- curl handles are not allowed to be shared by multiple threads.
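
The instrumentation described above might look like this (a sketch, not the actual modification): wrap the urlgrabber call so exceptions raised inside worker threads are printed instead of silently unwinding the thread:

    import traceback
    import urlgrabber

    def checked_urlread(url):
        try:
            return urlgrabber.urlread(url)
        except Exception:
            # e.g. a pycurl.error complaining that setopt() was invoked
            # while perform() is still running on the same handle
            traceback.print_exc()
            raise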

Finally, I looked into urlgrabber sources and saw this:

    _curl_cache = pycurl.Curl() # make one and reuse it over and over and over

    [...]

    def _do_open(self):
        self.curl_obj = _curl_cache
        self.curl_obj.reset() # reset all old settings away, just in case

The above code clearly violates the requirement if urlgrabber.urlread() is called from multiple threads.  After patching urlgrabber like this:

--- a/urlgrabber/grabber.py
+++ b/urlgrabber/grabber.py
@@ -1592,7 +1592,7 @@ class PyCurlFileObject(object):
                 raise err

     def _do_open(self):
-        self.curl_obj = _curl_cache
+        self.curl_obj = pycurl.Curl()
         self.curl_obj.reset() # reset all old settings away, just in case
         # setup any ranges
         self._set_opts()

... your example started to work reliably (and independently of the version of curl).  I would suggest simply not using urlgrabber in your code; it is rarely useful these days.
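
To make the race concrete, a minimal sketch (hypothetical, mirroring urlgrabber's module-level _curl_cache) of why a shared handle is unsafe: one thread can reset() and reconfigure the handle while another thread is still inside perform():

    import threading
    import pycurl

    shared = pycurl.Curl()   # analogous to urlgrabber's module-level _curl_cache

    def worker(url):
        shared.reset()                                  # wipes the other thread's options
        shared.setopt(shared.URL, url)
        shared.setopt(shared.WRITEFUNCTION, lambda data: None)
        shared.perform()                                # races with the other thread

    for u in ("http://example.com/1", "http://example.com/2"):
        threading.Thread(target=worker, args=(u,)).start()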

Comment 25 Adrian Reber 2015-04-14 13:12:38 UTC
Thanks for your analysis. The MirrorManager2 crawler has been rewritten to use urllib2 instead of urlgrabber. We are happy that we found the reason and that the crash is understood.

Comment 28 Valentina Mukhamedzhanova 2016-05-31 10:49:06 UTC
*** Bug 1178659 has been marked as a duplicate of this bug. ***

Comment 31 Michal Domonkos 2016-11-23 18:00:50 UTC
Created attachment 1223392 [details]
downstream patch

Comment 37 errata-xmlrpc 2018-10-30 10:15:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3130

