Bug 1668380 - Under Microsoft Windows Subsystem for Linux (WSL) - Error: rpmdb open failed [rhel8]
Summary: Under Microsoft Windows Subsystem for Linux (WSL) - Error: rpmdb open failed ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: rpm
Version: 8.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Packaging Maintenance Team
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Duplicates: 1668378
Depends On: 1668379
Blocks: 1623566
Reported: 2019-01-22 15:41 UTC by James Hartsock
Modified: 2019-05-03 22:13 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1668379
Environment:
Last Closed: 2019-02-01 08:10:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
rpm output pre and post rebuilddb (8.89 MB, application/gzip)
2019-02-01 02:19 UTC, James Hartsock
tar of rpm.works & rpm.fails (6.25 MB, application/gzip)
2019-02-01 02:21 UTC, James Hartsock
openSUSE strace of rpm and rebuilddb (16.80 MB, application/gzip)
2019-02-01 14:53 UTC, James Hartsock
RHEL 7 strace of rpm -qa before & after rpm --rebuilddb in WSL build 18890 (1.24 MB, application/gzip)
2019-05-03 22:13 UTC, James Hartsock


Links
System                               ID       Last Updated
Github Microsoft WSL issues          3939     2019-03-26 16:39:17 UTC
Github WhitewaterFoundry WLE issues  20       2019-03-26 16:38:31 UTC
Red Hat Knowledge Base (Solution)    3823982  2019-02-01 08:11:35 UTC

Description James Hartsock 2019-01-22 15:41:14 UTC
+++ This bug was initially created as a clone of Bug #1668379 +++

Description of problem:
Fedora (and RHEL 8 Beta) systems running under WSL immediately fail with the following errors from dnf.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.0 Beta (Ootpa)

How reproducible:
Very

Steps to Reproduce:
1. Take a Fedora (or RHEL 8 Beta) container image, export it as a tar archive, and gzip it:
   # podman pull registry.access.redhat.com/rhel8-beta
   # podman run -d rhel8-beta sleep 999999
   # podman ps
   CONTAINER ID
   <ContainerID>
   # podman export -o rhel8.tar <ContainerID>
   # podman kill <ContainerID>
   # gzip rhel8.tar

2. Use LxRunOffline (https://github.com/DDoSolitary/LxRunOffline) on Windows to install the tarball as a WSL distribution and make it the default:

   C:\Temp>LxRunOffline-v3.3.2\LxRunOffline.exe install -n RHEL8 -d RHEL8 -f rhel8.tar.gz
   C:\Temp>LxRunOffline-v3.3.2\LxRunOffline.exe set-default -n RHEL8
   C:\Temp>LxRunOffline-v3.3.2\LxRunOffline.exe get-default
   RHEL*

3. Run an rpm or dnf command (for example, dnf list).
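The export in step 1 can also be scripted without starting a long-running sleep and killing it afterwards. The following is a minimal sketch, assuming podman is installed and the image defines a default command; export_rootfs is a hypothetical helper name. It relies on podman create, which allocates a container without running it, so its ID is available immediately for podman export.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a container image into a gzipped root-filesystem
# tarball that LxRunOffline can install with "install -f".
# Assumes podman is installed and the image defines a default command.
export_rootfs() {
  local image="$1" tarball="$2"
  local cid
  # "podman create" allocates a container without starting it, so there is
  # no need to run "sleep" and kill the container afterwards.
  cid=$(podman create "$image")
  podman export -o "$tarball" "$cid"
  podman rm "$cid" >/dev/null
  # Compress in place; the result is "$tarball.gz".
  gzip -f "$tarball"
}

# Example usage (image name taken from the steps above):
# export_rootfs registry.access.redhat.com/rhel8-beta rhel8.tar
```

The resulting rhel8.tar.gz is the file passed to LxRunOffline.exe install -f in step 2.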


Actual results:
[root@win10 Temp]# dnf list
Failed to set locale, defaulting to C
error: db5 error(12) from dbenv->open: Cannot allocate memory
error: db5 error(22) from dbenv->close: Invalid argument
error: cannot open Packages index using db5 - Cannot allocate memory (12)
error: cannot open Packages database in /var/lib/rpm
Error: Error: rpmdb open failed

Expected results:
rpm and dnf should work without errors.


Additional info:

WSL installation instructions: https://docs.microsoft.com/en-us/windows/wsl/install-win10

Comment 1 James Hartsock 2019-01-22 15:49:36 UTC
Some public discussion going on at:

https://github.com/Microsoft/WSL/issues/90
https://github.com/Microsoft/WSL/issues/3742

Comment 2 Panu Matilainen 2019-01-31 09:09:04 UTC
"Works in native Linux, doesn't in WSL" sounds simply like a bug in WSL to me.

Berkeley DB's shared environment has a history of being a sucker for weird VM (and other kernel) bugs on native Linux too; this seems no different. Using a private environment would probably work around it (I presume 'rpm -qa' as a normal user does work?)

Comment 3 Panu Matilainen 2019-01-31 09:34:08 UTC
I have no means of testing anything on Windows, but if you can provide an strace of the failure (e.g. 'strace rpm -qa' output), I can at least take a look at it.

Comment 4 James Hartsock 2019-02-01 02:19:20 UTC
Created attachment 1525672 [details]
rpm output pre and post rebuilddb

Yes, rpm seems to work when run as normal user.

Here is how I captured the strace data:
  [root@win10 temp]# strace -o rpm_-q_rpm.works -s 2048 -tvf rpm -q rpm
  rpm-4.11.3-35.el7.x86_64

  [root@win10 temp]# strace -o rpm_--rebuilddb.strace -s 2048 -tvf rpm --rebuilddb

  [root@win10 temp]# strace -o rpm_-q_rpm.fail1 -s 2048 -tvf rpm -q rpm
  Segmentation fault (core dumped)

  [root@win10 temp]# strace -o rpm_-q_rpm.fail2 -s 2048 -tvf rpm -q rpm
  <hangs ... kill -9 rpm in another window>
  Killed

Comment 5 James Hartsock 2019-02-01 02:21:59 UTC
Created attachment 1525673 [details]
tar of rpm.works & rpm.fails

Here is a tar of the /var/lib/rpm directory both before (rpm.works) and after the rebuild (rpm.fails).

It seems you can use this to reproduce the behavior on a normal RHEL 7 system, which may be enough to gather additional information on your own if needed.

Here is the behavior replicated on my RHEL 7 (csb) laptop:
  # uname -r
  3.10.0-891.el7.x86_64 <---- normal RHEL, not WSL

  # cd /var/lib
  # tar zxf ~jhartsoc/var-lib-rpm.tar.gz
  # cp -arp rpm rpm.BACKUP
  # rm -rf rpm
  # cp -arp rpm.fails rpm
  # rpm -q rpm
  <hangs>
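A less invasive way to run the same check is to point rpm at a copy of the attached database with rpm's --dbpath option instead of replacing /var/lib/rpm, and to bound the hang with coreutils timeout instead of kill -9. This is a sketch under those assumptions: probe_rpmdb is a hypothetical helper, the 30-second limit is arbitrary, and it assumes rpm opens the Berkeley DB environment files inside the --dbpath directory the same way it does for /var/lib/rpm.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: query a copy of an rpm database directory and report
# whether "rpm -q rpm" succeeds, fails, or hangs.
# Usage: probe_rpmdb DBDIR [SECONDS]   (DBDIR should be an absolute path)
probe_rpmdb() {
  local src="$1" secs="${2:-30}" work rc=0
  # Work on a throwaway copy so rpm.works / rpm.fails stay untouched.
  work=$(mktemp -d)
  cp -a "$src/." "$work/"
  # Send TERM after $secs, then KILL 5 seconds later if rpm ignores TERM.
  timeout -k 5 "$secs" rpm --dbpath "$work" -q rpm >/dev/null 2>&1 || rc=$?
  rm -rf "$work"
  case "$rc" in
    0)       echo "ok" ;;
    124|137) echo "hang" ;;   # 124: timed out; 137: needed the KILL signal
    *)       echo "failed (exit $rc)" ;;
  esac
}

# Example usage, from the directory where the attachment was extracted:
# probe_rpmdb "$PWD/rpm.works"
# probe_rpmdb "$PWD/rpm.fails"
```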

Comment 6 Panu Matilainen 2019-02-01 08:10:39 UTC
If mmap() failed with EINVAL or similar, we could deal with it, but as long as WSL pretends all is well, we can't help.

There are several tickets on WSL reporting that Berkeley DB and LMDB are broken because of mmap() issues, e.g.
https://github.com/Microsoft/WSL/issues/3451 and https://github.com/Microsoft/WSL/issues/658

A bug in WSL can only be fixed in WSL.

Comment 7 Panu Matilainen 2019-02-01 08:11:36 UTC
*** Bug 1668378 has been marked as a duplicate of this bug. ***

Comment 8 James Hartsock 2019-02-01 14:53:15 UTC
Created attachment 1525858 [details]
openSUSE strace of rpm and rebuilddb

openSUSE does appear to work...


win10:/var/lib # cat /etc/SuSE-release
openSUSE 42.3 (x86_64)
VERSION = 42.3
CODENAME = Malachite
# /etc/SuSE-release is deprecated and will be removed in the future, use /etc/os-release instead

win10:/var/lib # strace -o suse-rpm_-q_rpm.before -s 2048 -tvf rpm -q rpm
rpm-4.11.2-13.7.x86_64

win10:/var/lib # strace -o suse-rpm_--rebuilddb.strace -s 2048 -tvf rpm --rebuilddb

win10:/var/lib # strace -o suse-rpm_-q_rpm.after -s 2048 -tvf rpm -q rpm
rpm-4.11.2-13.7.x86_64

Comment 9 Panu Matilainen 2019-02-04 07:28:48 UTC
Yeah, it "works" because they carry a patch to the shared environment of Berkeley DB (essentially disabling BDB-level locking on concurrent access, the same as we do for unprivileged users) and then a bunch of other patches to try to deal with the consequences.

Comment 14 James Hartsock 2019-05-03 22:13:38 UTC
Created attachment 1562799 [details]
RHEL 7 strace of rpm -qa before & after rpm --rebuilddb in WSL build 18890

Microsoft claims this is fixed in build 18890:
  https://github.com/Microsoft/WSL/issues/3939#issuecomment-488429593
    Fixed in Windows Insider Build 18890 - https://github.com/MicrosoftDocs/WSL/blob/live/WSL/release-notes.md#build-18890


So with build 18890, rpm does work:
  [root@win10_build18890 ~]# rpm --rebuilddb
  [root@win10_build18890 ~]# echo $?
  0

  [root@win10_build18890 ~]# rpm -q rpm
  rpm-4.11.3-32.el7.x86_64

  [root@win10_build18890 ~]# rpm -qa 2>/dev/null | grep rpm-4
  rpm-4.11.3-32.el7.x86_64


But rpm -qa reports mutex errors on stderr:
  [root@win10_build18890 ~]# rpm -qa 2>error.out | wc -l
  248

  [root@win10_build18890 ~]# sort error.out | uniq -c
        1 error: cannot open Name index using db5 - Cannot allocate memory (12)
    39313 error: rpmdb: BDB2034 unable to allocate memory for mutex; resize mutex region
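The package count and the sort | uniq -c tally above can be collected in one step while keeping stdout (package names) and stderr (error messages) apart. This is a hypothetical sketch (tally_output is an illustrative name); it only post-processes text and makes no assumptions about rpm beyond its writing errors to stderr.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, count its stdout lines, and tally its
# stderr lines by distinct message, most frequent first.
# Usage: tally_output COMMAND [ARGS...]
tally_output() {
  local errfile lines
  errfile=$(mktemp)
  # The command's exit status is ignored; the output is what matters here.
  lines=$("$@" 2>"$errfile" | wc -l) || true
  # $((lines)) strips any padding that wc may add.
  echo "stdout lines: $((lines))"
  sort "$errfile" | uniq -c | sort -rn
  rm -f "$errfile"
}

# Example usage:
# tally_output rpm -qa
```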

