Bug 1668380
Summary: | Under Microsoft Windows Subsystem for Linux (WSL) - Error: rpmdb open failed [rhel8] | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | James Hartsock <hartsjc> |
Component: | rpm | Assignee: | Packaging Maintenance Team <packaging-team-maint> |
Status: | CLOSED NOTABUG | QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 8.0 | CC: | fweimer, ngompa13, pmatilai |
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1668379 | Environment: | |
Last Closed: | 2019-02-01 08:10:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1668379 | ||
Bug Blocks: | 1623566 | ||
Attachments: |
Description
James Hartsock
2019-01-22 15:41:14 UTC
Some public discussion going on at:
https://github.com/Microsoft/WSL/issues/90
https://github.com/Microsoft/WSL/issues/3742
"Works in native Linux, doesn't in WSL" sounds simply like a bug in WSL to me. Berkeley DB's shared environment has a bit of a history of being a sucker for weird VM and kernel bugs on native Linux too; this seems no different. Using a private environment would probably work around it (I presume 'rpm -qa' as a normal user does work?).
I have no means of testing anything on Windows, but if you can provide an strace of the failure (e.g. 'strace rpm -qa' output) I can at least take a look at it.
Created attachment 1525672 [details]
rpm output pre and post rebuilddb
Yes, rpm seems to work when run as normal user.
Here is me capturing the strace data.
[root@win10 temp]# strace -o rpm_-q_rpm.works -s 2048 -tvf rpm -q rpm
rpm-4.11.3-35.el7.x86_64
[root@win10 temp]# strace -o rpm_--rebuilddb.strace -s 2048 -tvf rpm --rebuilddb
[root@win10 temp]# strace -o rpm_-q_rpm.fail1 -s 2048 -tvf rpm -q rpm
Segmentation fault (core dumped)
[root@win10 temp]# strace -o rpm_-q_rpm.fail2 -s 2048 -tvf rpm -q rpm
<hangs ... kill -9 rpm in another window>
Killed
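The hung query above had to be cleared by hand with kill -9 from another window. A minimal sketch of bounding such a run, assuming GNU coreutils `timeout` is available in the WSL image; `run_bounded` is a hypothetical helper name, not part of rpm or strace:

```shell
#!/bin/sh
# run_bounded (hypothetical helper): run a command under a time limit and
# report on stderr how it ended.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
run_bounded() {
    limit=$1
    shift
    # Send SIGKILL when the limit expires; a process stuck waiting on a
    # lock may not react to the default SIGTERM.
    timeout -s KILL "$limit" "$@"
    status=$?
    case $status in
        0)       echo "ok" >&2 ;;
        124|137) echo "timed out or killed (status $status)" >&2 ;;
        139)     echo "segmentation fault (128 + SIGSEGV)" >&2 ;;
        *)       echo "failed with status $status" >&2 ;;
    esac
    return "$status"
}
```

Something like `run_bounded 60 strace -o rpm_-q_rpm.fail2 -s 2048 -tvf rpm -q rpm` would then end on its own after a minute instead of needing a second window. The 60-second limit is an arbitrary choice for illustration.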
Created attachment 1525673 [details]
tar of rpm.works & rpm.fails
Here is a tar of the /var/lib/rpm directory both before (rpm.works) and after the rebuild (rpm.fails).
It seems you can use this to mimic the behavior on a normal RHEL 7 image, which is perhaps enough to get some additional information on your own if needed.
Here is me replicating on my RHEL 7 (csb) laptop
# uname -r
3.10.0-891.el7.x86_64 <---- normal RHEL, not WSL
# tar zxf ~jhartsoc/var-lib-rpm.tar.gz
# cd /var/lib
# cp -arp rpm rpm.BACKUP
# rm -rf rpm
# cp -arp rpm.fails rpm
# rpm -q rpm
<hangs>
If mmap() failed with EINVAL or similar we could deal with it, but as long as WSL pretends all is well we can't help. There are several tickets on WSL reporting that Berkeley DB and LMDB are broken because of mmap() issues, e.g. https://github.com/Microsoft/WSL/issues/3451 and https://github.com/Microsoft/WSL/issues/658
A bug in WSL can only be fixed in WSL.
*** Bug 1668378 has been marked as a duplicate of this bug. ***
Created attachment 1525858 [details]
opensuse strace of rpm and rebuildb
openSUSE does appear to work...
win10:/var/lib # cat /etc/SuSE-release
openSUSE 42.3 (x86_64)
VERSION = 42.3
CODENAME = Malachite
# /etc/SuSE-release is deprecated and will be removed in the future, use /etc/os-release instead
win10:/var/lib # strace -o suse-rpm_-q_rpm.before -s 2048 -tvf rpm -q rpm
rpm-4.11.2-13.7.x86_64
win10:/var/lib # strace -o suse-rpm_--rebuilddb.strace -s 2048 -tvf rpm --rebuilddb
win10:/var/lib # strace -o suse-rpm_-q_rpm.after -s 2048 -tvf rpm -q rpm
rpm-4.11.2-13.7.x86_64
Yeah, it "works" because they carry a patch to the shared environment of Berkeley DB (essentially disabling BDB-level locking on concurrent access, the same as we do for unprivileged users) and then a bunch of other patches to try to deal with the consequences.
Created attachment 1562799 [details]
RHEL 7 strace of rpm -qa before & after rpm --rebuilddb in WSL build 18890
Microsoft claims this is fixed in build 18890: https://github.com/Microsoft/WSL/issues/3939#issuecomment-488429593
Fixed in Windows Insider Build 18890 - https://github.com/MicrosoftDocs/WSL/blob/live/WSL/release-notes.md#build-18890
So with build 18890 it does work...
[root@win10_build18890 ~]# rpm --rebuilddb
[root@win10_build18890 ~]# echo $?
0
[root@win10_build18890 ~]# rpm -q rpm
rpm-4.11.3-32.el7.x86_64
[root@win10_build18890 ~]# rpm -qa 2>/dev/null | grep rpm-4
rpm-4.11.3-32.el7.x86_64
But I get mutex errors on stderr:
[root@win10_build18890 ~]# rpm -qa 2>error.out | wc -l
248
[root@win10_build18890 ~]# sort error.out | uniq -c
1 error: cannot open Name index using db5 - Cannot allocate memory (12)
39313 error: rpmdb: BDB2034 unable to allocate memory for mutex; resize mutex region
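The build 18890 check separates what the query printed from what it complained about: stdout is counted, stderr is grouped by message. A rough sketch that does both in one pass, so the exit status, package count, and error tally come from the same run; `summarize_run` is a hypothetical helper name, and `mktemp` is assumed to be available:

```shell
#!/bin/sh
# summarize_run (hypothetical helper): run a command once, then print its
# exit status, the number of stdout lines, and a count per distinct stderr line.
# Usage: summarize_run COMMAND [ARGS...]
summarize_run() {
    out=$(mktemp) || return 1
    err=$(mktemp) || { rm -f "$out"; return 1; }
    # Keep stdout and stderr in separate files so errors do not inflate the count.
    "$@" >"$out" 2>"$err"
    status=$?
    echo "exit status: $status"
    echo "stdout lines: $(wc -l <"$out" | tr -d ' ')"
    echo "stderr messages:"
    # Most frequent message first.
    sort "$err" | uniq -c | sort -rn
    rm -f "$out" "$err"
    return "$status"
}
```

Calling `summarize_run rpm -qa` before and after `rpm --rebuilddb` would make it easier to compare runs where the package list looks complete but stderr is flooded with mutex errors.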