Bug 1909150

Summary: yum gets deadlocked/hung up (indefinitely) waiting for urlgrabber-ext-down
Product: Red Hat Enterprise Linux 7
Reporter: Brian J. Murrell <brian.murrell>
Component: nss
Assignee: nss-nspr-maint <nss-nspr-maint>
Status: CLOSED ERRATA
QA Contact: Ivan Nikolchev <inikolch>
Severity: urgent
Docs Contact:
Priority: high
Version: 7.9
CC: akaiser, asanders, inikolch, james.antill, jreznik, mkielian, rrelyea, ssorce
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-12 15:26:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Brian J. Murrell 2020-12-18 13:34:05 UTC
Description of problem:
Yum is getting deadlocked trying to do an update.

Version-Release number of selected component (if applicable):
yum-3.4.3-168

How reproducible:
Intermittent

Steps to Reproduce:
Not really known

Actual results:
Yum deadlocks

Expected results:
Yum should not deadlock

Additional info:
The process tree looks like this:

 8702 ?        S      0:05  |       \_ /usr/bin/python /usr/bin/yum -y --disablerepo=* --enablerepo=repo.dc.hpdd.intel.com_repository_*,build.hpdd.intel.com_job_daos-stack* install --exclude openmpi daos-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-client-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-tests-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-server-1.1.2.1-1.5456.g02ce0510.el7.x86_64 openmpi3 hwloc ndctl fio patchutils ior-hpc-daos-0 romio-tests-cart-4-daos-0 testmpio-cart-4-daos-0 mpi4py-tests-cart-4-daos-0 hdf5-mpich2-tests-daos-0 hdf5-openmpi3-tests-daos-0 hdf5-vol-daos-mpich2-tests-daos-0 hdf5-vol-daos-openmpi3-tests-daos-0 MACSio-mpich2-daos-0 MACSio-openmpi3-daos-0 mpifileutils-mpich-daos-0
 8705 ?        S      0:00  |           \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down
 8711 ?        S      0:00  |           \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down
 8712 ?        S      0:00  |           \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down

The state of each process is:

# /tmp/strace -f -p 8702
/tmp/strace: Process 8702 attached
wait4(8711, ^C/tmp/strace: Process 8702 detached
 <detached ...>
# /tmp/strace -f -p 8705
/tmp/strace: Process 8705 attached
read(0, ^C/tmp/strace: Process 8705 detached
 <detached ...>
# /tmp/strace -f -p 8711
/tmp/strace: Process 8711 attached
futex(0x1444c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8711 detached
 <detached ...>
# /tmp/strace -f -p 8712
/tmp/strace: Process 8712 attached
futex(0x2174c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8712 detached
 <detached ...>

To me, this looks like 8702, 8711 and 8705 are deadlocked, all waiting on (or blocked by) each other.
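
For anyone who needs to capture the same picture on a node where strace is not installed, the following is a minimal sketch that reads the process state and kernel wait channel from /proc. It assumes a Linux /proc filesystem and enough privilege to read /proc/<pid>/wchan. The helper names (parse_stat, read_proc, describe_tree) are made up for illustration and are not part of yum or urlgrabber.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    fields = text[right + 1:].split()
    state = fields[0]        # e.g. S = interruptible sleep, D = uninterruptible
    ppid = int(fields[1])
    return pid, comm, state, ppid


def read_proc(pid, name):
    """Return the stripped contents of /proc/<pid>/<name>, or None if unreadable."""
    try:
        with open("/proc/%d/%s" % (pid, name)) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None


def describe_tree(root_pid):
    """Return sorted (pid, comm, state, wchan) rows for root_pid and its direct children."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        stat = read_proc(int(entry), "stat")
        if stat is None:
            continue  # process exited while we were scanning
        pid, comm, state, ppid = parse_stat(stat)
        if pid == root_pid or ppid == root_pid:
            rows.append((pid, comm, state, read_proc(pid, "wchan")))
    return sorted(rows)
```

Calling describe_tree(8702) on the hung node would list yum and its urlgrabber-ext-down children with their sleep state and wait channel. This only shows where each process is sleeping in the kernel; it does not identify which lock is held, so a userspace backtrace would still be needed to find the root cause.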

This is gumming up our automated CI and leaves test nodes hung for days until the hang is noticed and yum is killed.
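
As a stop-gap on the CI side (not a fix for the underlying hang), the yum invocation could be wrapped in a watchdog. The sketch below assumes the CI harness runs Python 3; run_with_timeout is a hypothetical helper and the timeout value is arbitrary. It starts the command in its own session so that, on timeout, the whole process group, including any urlgrabber-ext-down children, is killed rather than just the parent.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed after timeout_s seconds."""
    # start_new_session=True makes the child a session and process-group
    # leader, so its pid doubles as the process-group id.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole group so helper children do not linger.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the parent to avoid leaving a zombie
        return None
```

For example, run_with_timeout(["yum", "-y", "install", "hwloc"], 1800) would return None if the install had not finished within 30 minutes, letting the job fail fast instead of holding the node.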

Comment 14 Bob Relyea 2021-06-23 22:28:58 UTC
Oops, this bug needs zstream+ and pm_ack+. I don't know why it didn't automatically get pm_ack along with the devel & qa_acks.

Comment 15 Simo Sorce 2021-06-24 12:55:59 UTC
Acked

Comment 37 errata-xmlrpc 2021-10-12 15:26:38 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (nss, nss-softokn, nss-util, and nspr bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3793