Bug 1397087 - Rpm hangs after updating to glibc >= 2.25
Rpm hangs after updating to glibc >= 2.25
Status: NEW
Product: Fedora
Classification: Fedora
Component: rpm (Show other bugs)
26
Unspecified Unspecified
high Severity high
: ---
: ---
Assigned To: packaging-team-maint
Fedora Extras Quality Assurance
:
: 1435927 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-11-21 10:11 EST by Lukas Slebodnik
Modified: 2017-04-26 06:56 EDT (History)
9 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Dockerfile which reveal the bug very often (672 bytes, text/plain)
2016-11-21 10:36 EST, Lukas Slebodnik
no flags Details
Output of db_stat -CA -h /var/lib/rpm (8.52 KB, text/plain)
2016-11-22 04:09 EST, Lukas Slebodnik
no flags Details

  None (edit)
Description Lukas Slebodnik 2016-11-21 10:11:10 EST
Description of problem:
I do not have a deterministic reproducer. But I happen for me when building
docker images based on rawhide. The bug is reproducible with dnf and yum-deprecated therefore I assume bug is in rpm. But I noticed there are valgrind errors with uninitialized memory which might explain difficult reproducer.

Version-Release number of selected component (if applicable):
rpm-4.13.0-4.fc26.x86_64

How reproducible:
Deterministic in my environment when building docker image
It happened to me that installation was stuck even for installation of simple pacakge (lftp)

Steps to Reproduce:
# docker run -ti --rm fedora:rawhide bash
Run docker images for fedora rawhide and run following commands inside container
1. dnf update -y 'rpm*'
2. dnf install -y 'dnf-command(builddep)' valgrind
3. dnf debuginfo-install -y rpm libdb
# try to install package with minimal dependencies (e.g. lftp)
4. valgrind --track-origins=yes dnf install -y lftp

Actual results:
Total                                                                                                                                                      215 kB/s | 822 kB     00:03     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : lftp-4.7.3-1.fc25.x86_64                                                                                                                                               1/1 
==83== Syscall param pwrite64(buf) points to uninitialised byte(s)
==83==    at 0x5317DE3: ??? (in /usr/lib64/libpthread-2.24.90.so)
==83==    by 0x12720C30: __os_io (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1270C731: ??? (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1270CBA2: __memp_bhwrite (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1271CE14: __memp_sync_int (in /usr/lib64/libdb-5.3.so)
==83==    by 0x126B2D16: __db_sync (in /usr/lib64/libdb-5.3.so)
==83==    by 0x126C53CF: __db_sync_pp (in /usr/lib64/libdb-5.3.so)
==83==    by 0x11F53FF5: dbiSync (db3.c:547)
==83==    by 0x11F53FF5: db3_dbiCursorFree (db3.c:606)
==83==    by 0x11F5ECD9: rpmdbAdd (rpmdb.c:2411)
==83==    by 0x11F72A0C: dbAdd (psm.c:551)
==83==    by 0x11F72A0C: rpmPackageInstall (psm.c:682)
==83==    by 0x11F72A0C: rpmpsmRun (psm.c:844)
==83==    by 0x11F862A4: rpmteProcess (rpmte.c:756)
==83==    by 0x11F8CBBD: rpmtsProcess (transaction.c:1361)
==83==    by 0x11F8CBBD: rpmtsRun (transaction.c:1541)
==83==  Address 0x8e59248 is 2,312 bytes inside a block of size 4,096 alloc'd
==83==    at 0x4C2DB8D: malloc (vg_replace_malloc.c:299)
==83==    by 0x1271E1F4: __os_malloc (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1270C897: ??? (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1270CBA2: __memp_bhwrite (in /usr/lib64/libdb-5.3.so)
==83==    by 0x1271CE14: __memp_sync_int (in /usr/lib64/libdb-5.3.so)
==83==    by 0x126B2D16: __db_sync (in /usr/lib64/libdb-5.3.so)
==83==    by 0x126C53CF: __db_sync_pp (in /usr/lib64/libdb-5.3.so)
==83==    by 0x11F53FF5: dbiSync (db3.c:547)
==83==    by 0x11F53FF5: db3_dbiCursorFree (db3.c:606)
==83==    by 0x11F5ECD9: rpmdbAdd (rpmdb.c:2411)
==83==    by 0x11F72A0C: dbAdd (psm.c:551)
==83==    by 0x11F72A0C: rpmPackageInstall (psm.c:682)
==83==    by 0x11F72A0C: rpmpsmRun (psm.c:844)
==83==    by 0x11F862A4: rpmteProcess (rpmte.c:756)
==83==    by 0x11F8CBBD: rpmtsProcess (transaction.c:1361)
==83==    by 0x11F8CBBD: rpmtsRun (transaction.c:1541)
==83==  Uninitialised value was created by a stack allocation
==83==    at 0x125EB590: ??? (in /usr/lib64/libdb-5.3.so)
==83== 
  Verifying   : lftp-4.7.3-1.fc25.x86_64                                                                                                                                               1/1 

Installed:
  lftp.x86_64 4.7.3-1.fc25                                                                                                                                                                 

Complete!
==83== 
==83== HEAP SUMMARY:
==83==     in use at exit: 58,246,804 bytes in 53,257 blocks
==83==   total heap usage: 447,873 allocs, 394,616 frees, 3,993,424,200 bytes allocated
==83== 
==83== LEAK SUMMARY:
==83==    definitely lost: 345 bytes in 20 blocks
==83==    indirectly lost: 0 bytes in 0 blocks
==83==      possibly lost: 3,164,151 bytes in 18,487 blocks
==83==    still reachable: 55,081,988 bytes in 34,747 blocks
==83==                       of which reachable via heuristic:
==83==                         length64           : 426,096 bytes in 504 blocks
==83==                         newarray           : 1,536 bytes in 16 blocks
==83==         suppressed: 0 bytes in 0 blocks
==83== Rerun with --leak-check=full to see details of leaked memory
==83== 
==83== For counts of detected and suppressed errors, rerun with: -v
==83== ERROR SUMMARY: 5 errors from 1 contexts (suppressed: 0 from 0)

Expected results:
ERROR SUMMARY has 0 errors

Additional info:
I can see a little bit different valgrind error with direct usage of rpm bianry

sh# rpm -e lftp
sh# valgrind --track-origins=yes rpm -ivh https://kojipkgs.fedoraproject.org//packages/lftp/4.7.3/1.fc25/x86_64/lftp-4.7.3-1.fc25.x86_64.rpm
==91== Memcheck, a memory error detector
==91== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==91== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==91== Command: rpm -ivh https://kojipkgs.fedoraproject.org//packages/lftp/4.7.3/1.fc25/x86_64/lftp-4.7.3-1.fc25.x86_64.rpm
==91== 
Retrieving https://kojipkgs.fedoraproject.org//packages/lftp/4.7.3/1.fc25/x86_64/lftp-4.7.3-1.fc25.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:lftp-4.7.3-1.fc25                ################################# [100%]
==91== Syscall param pwrite64(buf) points to uninitialised byte(s)
==91==    at 0x6FA0DC3: __pwrite64_nocancel (in /usr/lib64/libpthread-2.24.90.so)
==91==    by 0x584AC30: __os_io (in /usr/lib64/libdb-5.3.so)
==91==    by 0x5836731: ??? (in /usr/lib64/libdb-5.3.so)
==91==    by 0x5836BA2: __memp_bhwrite (in /usr/lib64/libdb-5.3.so)
==91==    by 0x5846E14: __memp_sync_int (in /usr/lib64/libdb-5.3.so)
==91==    by 0x57DCD16: __db_sync (in /usr/lib64/libdb-5.3.so)
==91==    by 0x57EF3CF: __db_sync_pp (in /usr/lib64/libdb-5.3.so)
==91==    by 0x4E4EFF5: dbiSync (db3.c:547)
==91==    by 0x4E4EFF5: db3_dbiCursorFree (db3.c:606)
==91==    by 0x4E5626E: tag2index (rpmdb.c:2368)
==91==    by 0x4E59D4A: indexPut (rpmdb.c:2377)
==91==    by 0x4E59D4A: rpmdbAdd (rpmdb.c:2421)
==91==    by 0x4E6DA0C: dbAdd (psm.c:551)
==91==    by 0x4E6DA0C: rpmPackageInstall (psm.c:682)
==91==    by 0x4E6DA0C: rpmpsmRun (psm.c:844)
==91==    by 0x4E812A4: rpmteProcess (rpmte.c:756)
==91==  Address 0x9036f30 is in a rw- mapped file /var/lib/rpm/__db.003 segment
==91==  Uninitialised value was created by a stack allocation
==91==    at 0x572A740: ??? (in /usr/lib64/libdb-5.3.so)
==91== 
==91== 
==91== HEAP SUMMARY:
==91==     in use at exit: 13,716 bytes in 78 blocks
==91==   total heap usage: 12,555 allocs, 12,477 frees, 15,969,502 bytes allocated
==91== 
==91== LEAK SUMMARY:
==91==    definitely lost: 0 bytes in 0 blocks
==91==    indirectly lost: 0 bytes in 0 blocks
==91==      possibly lost: 0 bytes in 0 blocks
==91==    still reachable: 13,716 bytes in 78 blocks
==91==         suppressed: 0 bytes in 0 blocks
==91== Rerun with --leak-check=full to see details of leaked memory
==91== 
==91== For counts of detected and suppressed errors, rerun with: -v
==91== ERROR SUMMARY: 4 errors from 1 contexts (suppressed: 0 from 0)
Comment 1 Panu Matilainen 2016-11-21 10:18:43 EST
The valgrind complaint is an ages old false positive caused by BDB internal optimization, its even documented somewhere in BDB doco.

Try looking at what it's actually doing when it appears stuck - is it burning cpu/io cycles or just hanging there etc.
Comment 2 Lukas Slebodnik 2016-11-21 10:23:23 EST
(In reply to Panu Matilainen from comment #1)
> The valgrind complaint is an ages old false positive caused by BDB internal
> optimization, its even documented somewhere in BDB doco.
> 
> Try looking at what it's actually doing when it appears stuck - is it
> burning cpu/io cycles or just hanging there etc.

There is not any activity, but I can see also different valgrind error
which does not contain 'dbiSync'

Steps to Reproduce:
# docker run -ti --rm fedora:rawhide bash
Run docker images for fedora rawhide and run following commands inside container
    1  dnf update -y 'rpm*'
    2  dnf install -y 'dnf-command(builddep)' valgrind
    3  dnf debuginfo-install -y rpm libdb
    4  valgrind --track-origins=yes dnf update -y glibc

[root@d609e6427820 /]# valgrind --track-origins=yes dnf update -y glibc        
==80== Memcheck, a memory error detector
==80== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==80== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==80== Command: /usr/bin/dnf update -y glibc
==80== 
Last metadata expiration check: 0:02:12 ago on Mon Nov 21 15:15:36 2016 UTC.
Dependencies resolved.
================================================================================
 Package                  Arch        Version                Repository    Size
================================================================================
Upgrading:
 glibc                    x86_64      2.24.90-17.fc26        rawhide      3.3 M
 glibc-all-langpacks      x86_64      2.24.90-17.fc26        rawhide      7.0 M
 glibc-common             x86_64      2.24.90-17.fc26        rawhide      881 k
 libcrypt-nss             x86_64      2.24.90-17.fc26        rawhide       46 k

Transaction Summary
================================================================================
Upgrade  4 Packages

Total download size: 11 M
Downloading Packages:
(1/4): glibc-common-2.24.90-17.fc26.x86_64.rpm  1.8 MB/s | 881 kB     00:00    
(2/4): libcrypt-nss-2.24.90-17.fc26.x86_64.rpm  1.6 MB/s |  46 kB     00:00    
(3/4): glibc-2.24.90-17.fc26.x86_64.rpm         2.2 MB/s | 3.3 MB     00:01    
(4/4): glibc-all-langpacks-2.24.90-17.fc26.x86_ 2.4 MB/s | 7.0 MB     00:02    
--------------------------------------------------------------------------------
Total                                           1.7 MB/s |  11 MB     00:06     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Upgrading   : glibc-common-2.24.90-17.fc26.x86_64                         1/8 
==80== Syscall param pwrite64(buf) points to uninitialised byte(s)
==80==    at 0x5317DE3: ??? (in /usr/lib64/libpthread-2.24.90.so)
==80==    by 0x12720C30: __os_io (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1270C731: ??? (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1270CBA2: __memp_bhwrite (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1271CE14: __memp_sync_int (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126B2D16: __db_sync (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126C53CF: __db_sync_pp (in /usr/lib64/libdb-5.3.so)
==80==    by 0x11F53FF5: dbiSync (db3.c:547)
==80==    by 0x11F53FF5: db3_dbiCursorFree (db3.c:606)
==80==    by 0x11F5ECD9: rpmdbAdd (rpmdb.c:2411)
==80==    by 0x11F72A0C: dbAdd (psm.c:551)
==80==    by 0x11F72A0C: rpmPackageInstall (psm.c:682)
==80==    by 0x11F72A0C: rpmpsmRun (psm.c:844)
==80==    by 0x11F862A4: rpmteProcess (rpmte.c:756)
==80==    by 0x11F8CBBD: rpmtsProcess (transaction.c:1361)
==80==    by 0x11F8CBBD: rpmtsRun (transaction.c:1541)
==80==  Address 0x7017596 is 2,278 bytes inside a block of size 4,096 alloc'd
==80==    at 0x4C2DB8D: malloc (vg_replace_malloc.c:299)
==80==    by 0x1271E1F4: __os_malloc (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1270C897: ??? (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1270CBA2: __memp_bhwrite (in /usr/lib64/libdb-5.3.so)
==80==    by 0x1271CE14: __memp_sync_int (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126B2D16: __db_sync (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126C53CF: __db_sync_pp (in /usr/lib64/libdb-5.3.so)
==80==    by 0x11F53FF5: dbiSync (db3.c:547)
==80==    by 0x11F53FF5: db3_dbiCursorFree (db3.c:606)
==80==    by 0x11F5ECD9: rpmdbAdd (rpmdb.c:2411)
==80==    by 0x11F72A0C: dbAdd (psm.c:551)
==80==    by 0x11F72A0C: rpmPackageInstall (psm.c:682)
==80==    by 0x11F72A0C: rpmpsmRun (psm.c:844)
==80==    by 0x11F862A4: rpmteProcess (rpmte.c:756)
==80==    by 0x11F8CBBD: rpmtsProcess (transaction.c:1361)
==80==    by 0x11F8CBBD: rpmtsRun (transaction.c:1541)
==80==  Uninitialised value was created by a stack allocation
==80==    at 0x125EB590: ??? (in /usr/lib64/libdb-5.3.so)
==80== 
  Upgrading   : glibc-all-langpacks-2.24.90-17.fc26.x86_64                  2/8 
==80== Conditional jump or move depends on uninitialised value(s)
==80==    at 0x1260C694: __bam_stkrel (in /usr/lib64/libdb-5.3.so)
==80==    by 0x125FD257: ??? (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126B71DB: __dbc_iput (in /usr/lib64/libdb-5.3.so)
==80==    by 0x126C69C0: __dbc_put_pp (in /usr/lib64/libdb-5.3.so)
==80==    by 0x11F53234: dbiCursorPut.isra.5 (db3.c:626)
==80==    by 0x11F551C0: updateIndex.part.8 (db3.c:1097)
==80==    by 0x11F55590: updateIndex (db3.c:1144)
==80==    by 0x11F55590: db3_idxdbPut (db3.c:1140)
==80==    by 0x11F5B0B2: tag2index (rpmdb.c:2348)
==80==    by 0x11F5ED4A: indexPut (rpmdb.c:2377)
==80==    by 0x11F5ED4A: rpmdbAdd (rpmdb.c:2421)
==80==    by 0x11F72A0C: dbAdd (psm.c:551)
==80==    by 0x11F72A0C: rpmPackageInstall (psm.c:682)
==80==    by 0x11F72A0C: rpmpsmRun (psm.c:844)
==80==    by 0x11F862A4: rpmteProcess (rpmte.c:756)
==80==    by 0x11F8CBBD: rpmtsProcess (transaction.c:1361)
==80==    by 0x11F8CBBD: rpmtsRun (transaction.c:1541)
==80==  Uninitialised value was created by a stack allocation
==80==    at 0x125E9C50: ??? (in /usr/lib64/libdb-5.3.so)
==80== 
  Upgrading   : glibc-2.24.90-17.fc26.x86_64                                3/8 
  Upgrading   : libcrypt-nss-2.24.90-17.fc26.x86_64                         4/8 
  Cleanup     : libcrypt-nss-2.24.90-2.fc26.x86_64                          5/8 
  Cleanup     : glibc-common-2.24.90-2.fc26.x86_64                          6/8 
  Cleanup     : glibc-all-langpacks-2.24.90-2.fc26.x86_64                   7/8 
  Cleanup     : glibc-2.24.90-2.fc26.x86_64                                 8/8 
  Verifying   : glibc-2.24.90-17.fc26.x86_64                                1/8 
  Verifying   : glibc-common-2.24.90-17.fc26.x86_64                         2/8 
  Verifying   : glibc-all-langpacks-2.24.90-17.fc26.x86_64                  3/8 
  Verifying   : libcrypt-nss-2.24.90-17.fc26.x86_64                         4/8 
  Verifying   : glibc-2.24.90-2.fc26.x86_64                                 5/8 
  Verifying   : glibc-all-langpacks-2.24.90-2.fc26.x86_64                   6/8 
  Verifying   : glibc-common-2.24.90-2.fc26.x86_64                          7/8 
  Verifying   : libcrypt-nss-2.24.90-2.fc26.x86_64                          8/8 

Upgraded:
  glibc.x86_64 2.24.90-17.fc26                                                  
  glibc-all-langpacks.x86_64 2.24.90-17.fc26                                    
  glibc-common.x86_64 2.24.90-17.fc26                                           
  libcrypt-nss.x86_64 2.24.90-17.fc26                                           

Complete!
==80== 
==80== HEAP SUMMARY:
==80==     in use at exit: 58,807,031 bytes in 56,957 blocks
==80==   total heap usage: 1,264,661 allocs, 1,207,704 frees, 4,144,098,835 bytes allocated
==80== 
==80== LEAK SUMMARY:
==80==    definitely lost: 1,521 bytes in 23 blocks
==80==    indirectly lost: 125,140 bytes in 66 blocks
==80==      possibly lost: 3,288,601 bytes in 19,571 blocks
==80==    still reachable: 55,391,001 bytes in 37,287 blocks
==80==                       of which reachable via heuristic:
==80==                         length64           : 426,096 bytes in 504 blocks
==80==                         newarray           : 1,536 bytes in 16 blocks
==80==         suppressed: 0 bytes in 0 blocks
==80== Rerun with --leak-check=full to see details of leaked memory
==80== 
==80== For counts of detected and suppressed errors, rerun with: -v
==80== ERROR SUMMARY: 34 errors from 2 contexts (suppressed: 0 from 0)
Comment 3 Lukas Slebodnik 2016-11-21 10:36 EST
Created attachment 1222428 [details]
Dockerfile which reveal the bug very often

Copy attached file to empty directory and build and image
sh# docker build -t temp_rm_me .

The installation of build dependencies stuck for me with following output
Dependencies resolved.
================================================================================
 Package                          Arch   Version                  Repository
                                                                           Size
================================================================================
Installing:
 389-ds-base-devel                x86_64 1.3.6.1-1.fc26           rawhide 222 k
 autoconf                         noarch 2.69-23.fc26             rawhide 710 k
 automake                         noarch 1.15-8.fc26              rawhide 694 k
 cyrus-sasl-devel                 x86_64 2.1.26-27.fc26           rawhide 313 k
 dbus-python                      x86_64 1.2.4-2.fc25             rawhide 129 k

//snip

Transaction Summary
================================================================================
Install  209 Packages
Upgrade   28 Packages
Skip      14 Packages

Total download size: 99 M
Downloading Packages:
--------------------------------------------------------------------------------
Total                                            28 MB/s |  99 MB     00:03     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Upgrading   : libcom_err-1.43.3-1.fc26.x86_64                           1/265 
  Installing  : libtalloc-2.1.8-1.fc26.x86_64                             2/265 
  Upgrading   : nspr-4.13.1-1.fc26.x86_64                                 3/265 
  Installing  : libtevent-0.9.31-1.fc26.x86_64                            4/265 
  Installing  : libtdb-1.3.11-1.fc26.x86_64                               5/265 
  Upgrading   : nss-util-3.27.0-2.fc26.x86_64                             6/265 
  Installing  : python2-six-1.10.0-4.fc26.noarch                          7/265 
  Upgrading   : libuuid-2.29-1.fc26.x86_64                                8/265 
  Installing  : libldb-1.1.27-1.fc26.x86_64                               9/265 
  Upgrading   : pcre-8.39-6.fc26.x86_64                                  10/265 
  Installing  : nspr-devel-4.13.1-1.fc26.x86_64                          11/265 
  Upgrading   : libblkid-2.29-1.fc26.x86_64                              12/265 
  Installing  : xmlrpc-c-1.32.5-1909.svn2451.fc24.x86_64                 13/265


and there are just 2 processes inside container shell and python with dnf

   CGroup: /system.slice/docker-04f608509c75dc43bb7a1c5e86fae071397b4969ba03d179cc0b3466229bfd37.scope
           ├─24379 /bin/sh -c dnf --setopt=deltarpm=0 --setopt=fastestmirror=1         --setopt=install_weak_deps=false --setopt=tsflags=nodocs         install -y             sudo             git             'dnf-command(builddep)'             @buildsys-build     && curl https://git.fedorahosted.org/cgit/freeipa.git/plain/freeipa.spec.in             -o freeipa.spec.in     && dnf --setopt=deltarpm=0 --setopt=fastestmirror=1            --setopt=install_weak_deps=false --setopt=tsflags=nodocs            builddep -y --spec freeipa.spec.in     && rm -f freeipa.spec.in     && dnf clean all
           └─24589 /usr/libexec/system-python /usr/bin/dnf --setopt=deltarpm=0 --setopt=fastestmirror=1 --setopt=install_weak_deps=false --setopt=tsflags=nodocs builddep -y --spec freeipa.spec.in


The strace for python is:
sh# strace -p 24589
strace: Process 24589 attached
futex(0x7ff6f2a11064, FUTEX_WAIT, 4294967295, NULL
Comment 4 Lukas Slebodnik 2016-11-21 10:37:49 EST
As I mention in the description The same problem is with yum-deprecated.
And if I try to install some packages manually the installation is stuck on different package.
Comment 5 Lukas Slebodnik 2016-11-21 10:39:28 EST
BTW my workaround is to remove && rebuild rpm-db before calling builddep

--- a/Dockerfile        2016-11-21 16:24:25.994623950 +0100
+++ b/Dockerfile        2016-11-21 14:53:04.466568545 +0100
@@ -11,6 +11,8 @@
             @buildsys-build \
     && curl https://git.fedorahosted.org/cgit/freeipa.git/plain/freeipa.spec.in \
             -o freeipa.spec.in \
+    && rm -f /var/lib/rpm/__db* \
+    && rpm --rebuilddb \
     && dnf --setopt=deltarpm=0 --setopt=fastestmirror=1 \
            --setopt=install_weak_deps=false --setopt=tsflags=nodocs \
            builddep -y --spec freeipa.spec.in \
Comment 6 Lukas Slebodnik 2016-11-21 12:05:43 EST
(In reply to Lukas Slebodnik from comment #5)
> BTW my workaround is to remove && rebuild rpm-db before calling builddep
> 
> --- a/Dockerfile        2016-11-21 16:24:25.994623950 +0100
> +++ b/Dockerfile        2016-11-21 14:53:04.466568545 +0100
> @@ -11,6 +11,8 @@
>              @buildsys-build \
>      && curl
> https://git.fedorahosted.org/cgit/freeipa.git/plain/freeipa.spec.in \
>              -o freeipa.spec.in \
> +    && rm -f /var/lib/rpm/__db* \
> +    && rpm --rebuilddb \
>      && dnf --setopt=deltarpm=0 --setopt=fastestmirror=1 \
>             --setopt=install_weak_deps=false --setopt=tsflags=nodocs \
>             builddep -y --spec freeipa.spec.in \

Florian F. mentioned on IRC, that simpler workaround is just to remove var/lib/rpm/__db* files. Calling "rpm --rebuilddb" is not necessary.
Comment 7 Panu Matilainen 2016-11-21 23:18:11 EST
That sounds like there's a db cursor not getting freed somewhere.
Comment 8 Panu Matilainen 2016-11-22 00:31:44 EST
(too early in the morning...)

Output of 'db_stat -CA -h /var/lib/rpm' as root when it hangs would be interesting.
Comment 9 Lukas Slebodnik 2016-11-22 04:09 EST
Created attachment 1222632 [details]
Output of db_stat -CA -h /var/lib/rpm

Here you are
Comment 10 Lukas Slebodnik 2016-11-22 04:11:06 EST
FYI I also noticed that fedora rawhide has never version of libdb 5.3.28-15
which should fix mutexes not being released properly (#1272680).

I downgraded libdb to older version without this patch but it did not fix hangs.
Comment 11 Lukas Slebodnik 2016-11-23 08:12:26 EST
I hit this bug yesterday on my laptop when I was upgrading f25 -> rawhide.
I had to cancel/kill dnf and as a side effect I had installed more version of packages because upgrade transaction has not been finished.
Comment 12 Panu Matilainen 2016-11-23 11:26:34 EST
Okay I can reproduce something remotely resembling that now. I'm on F24'ish system so details will differ but there are hints at least.

# yum-deprecated --releasever=25 --installroot /srv/test --setopt=keepcache=true install bash
[...]
# db_stat -CA -h /srv/test/var/lib/rpm
db_stat: DB_ENV->open: /srv/test/var/lib/rpm: No such file or directory

That is how it should be, rpm removes the BDB environment after chroot operations. That's what happens with rpm itself (obviously). However doing the same with dnf:

# dnf --releasever=25 --installroot /srv/test --setopt=keepcache=true install bash
[...]
# db_stat -CA -h /srv/test/var/lib/rpm|wc -l
162

In this case they're all READ locks which rpm will mope up on the next run but it'll complain loudly. I know I've seen fleeting glimpses of those complaints where they shouldn't have existed, just haven't had a chance to investigate. But the screwy thing here is that the environment should not even exist after dnf exits! So it seems dnf is (re)opening the rpmdb from inside the chroot which is ... not a good idea. This is with dnf 1.1.10 on F24, IIRC F25 has 2.0 and newer hawkey/whatever so there's plenty of room for different behavior. 

You said you can reproduce it with yum-deprecated, but can you actually reproduce the hang if you *only* use yum-deprecated all the way? (ie after using "rm -f /var/lib/rpm/__db*" to clear any cruft from earlier runs)
Comment 13 Panu Matilainen 2016-11-23 11:38:44 EST
Another datapoint: for my testcase the problem is related to importing pubkeys. If I run dnf with --nogpgcheck or pre-import the keys to the test root, there are no leftover locks. Please test whether that makes a difference for you - its possible I'm seeing a different issue entirely, but its also quite possible it's the same issue but only worse with newer dnf.
Comment 14 Panu Matilainen 2016-11-24 04:35:35 EST
Right, never mind about the pubkey case, that appears to be relatively harmless.

What kills it is any outside access to rpmdb during the transaction where glibc is updated to the rawhide version. For example 'db_stat -CA -h /var/lib/rpm' is fine until glibc has gotten updated, after that trying to run it causes both rpm and the db_stat process  tohang. Ditto for running 'rpm -qa' or such (as root) durin the update. That does not happen during eg F24 -> F25 glibc update.

fweimer (cc'd): has there been some major futex-related change in rawhide glibc?
Comment 15 Panu Matilainen 2016-11-24 04:49:56 EST
Oh, actually it's simpler than that: after updating to rawhide glibc, new rpm is dead until you nuke the BDB environment: 'rm -f /var/lib/rpm/__db*'
Comment 16 Panu Matilainen 2016-11-24 04:50:47 EST
Oops, not "new rpm". Just "rpm is dead" until...
Comment 17 Florian Weimer 2016-11-24 05:10:52 EST
(In reply to Panu Matilainen from comment #14)
> Right, never mind about the pubkey case, that appears to be relatively
> harmless.
> 
> What kills it is any outside access to rpmdb during the transaction where
> glibc is updated to the rawhide version. For example 'db_stat -CA -h
> /var/lib/rpm' is fine until glibc has gotten updated, after that trying to
> run it causes both rpm and the db_stat process  tohang. Ditto for running
> 'rpm -qa' or such (as root) durin the update. That does not happen during eg
> F24 -> F25 glibc update.
> 
> fweimer (cc'd): has there been some major futex-related change in rawhide
> glibc?

This looks like bug 1394862.  Berkeley DB assumes that it's safe to use process-shared mutexes and condition variables after all processes have detached from a mapping, something which POSIX does not guarantee.
Comment 18 Panu Matilainen 2016-11-24 05:35:18 EST
(In reply to Florian Weimer from comment #17)
> This looks like bug 1394862. 

Oh, indeed. Thanks for the pointer.

> Berkeley DB assumes that it's safe to use
> process-shared mutexes and condition variables after all processes have
> detached from a mapping, something which POSIX does not guarantee.

Right. It's not the first time such glibc changes cause trouble for rpmdb, just been a while since the last one and BDB locks are still just as ill-suited for rpm as they ever were...

Technically this is a dupe of bug 1394862 but I'm leaving this open for now:
1) people are more likely to find an rpm bug than a libdb one when installation hangs
2) we might need to come up with some kind of workaround, we can't have every other upgrade blowing up in the middle

As for 2), we could have a trigger on glibc to always nuke the environment in that case, but I worry what it does to the already running rpm instance. Or we could have rpm use private bdb locking but that opens up other issues. Or something...
Comment 19 Fedora End Of Life 2017-02-28 05:38:42 EST
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.
Comment 20 Panu Matilainen 2017-04-04 06:27:13 EDT
*** Bug 1435927 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.