1086784 – Proposal: Replace Berkeley DB with LMDB

Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Packaging Maintenance Team
QA Contact:	Fedora Extras Quality Assurance

Reported:	2014-04-11 13:13 UTC by Petr Spacek
Modified:	2020-04-07 05:48 UTC
CC List:	19 users
Last Closed:	2019-09-13 13:43:25 UTC
Type:	Bug

Description Petr Spacek 2014-04-11 13:13:05 UTC

Given the license problems with Berkeley DB v6, I propose that we consider moving RPM to the MDB database.

MDB is supposed to have a similar (but simpler) API to BDB, and should have much, much better performance.

The transition could give RPM a real performance boost.

MDB vs. BDB (vs. others) benchmarks:
http://symas.com/mdb/microbench/

E-mail (from Debian world) about related problems and projects moving to MDB is here:
https://lists.debian.org/debian-devel/2013/07/msg00047.html

Documentation about MDB is here:
http://symas.com/mdb/

Comment 1 Florian Weimer 2014-04-11 13:32:15 UTC

LMDB has a hard key size limit of 511 bytes.  I think RPM may need to store path names longer than that and index them, so LMDB would be a poor fit in this area.

In addition, when the database is opened, a maximum database size has to be set.  Choosing this value on 32-bit architectures might turn out to be difficult.  Upstream is focused on 64-bit architectures.

Comment 2 Panu Matilainen 2014-04-14 07:24:18 UTC

The keysize limit appears to be only a backwards-compatible default:
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/liblmdb/mdb.c;h=71b025fa68fd970504cd42ae1a80a7930ed567f7;hb=HEAD#l395

Considering LMDB appears to be "copy source software" that's probably not much of a problem, rpm could compile its internal version as it pleases. I fail to see anything on max db (size) on opening, what do you mean by that? An environment has a limit of how many db's can be opened within it, but that's not an issue for rpm.

Last I looked at LMDB, a bigger problem was lack of DB_PRIVATE counterpart which basically meant all rpm operations including queries would require root, which is simply a no-go. That situation may or may not have changed.

Comment 3 Florian Weimer 2014-04-14 08:03:17 UTC

(In reply to Panu Matilainen from comment #2)
> The keysize limit appears to be only a backwards-compatible default:
> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/
> liblmdb/mdb.c;h=71b025fa68fd970504cd42ae1a80a7930ed567f7;hb=HEAD#l395

I think this only increases the key size limit to about one third of the page size, which was possible to achieve before by changing the #define directly.

> Considering LMDB appears to be "copy source software" that's probably not
> much of a problem, rpm could compile its internal version as it pleases. I
> fail to see anything on max db (size) on opening, what do you mean by that?

See mdb_env_set_mapsize, "The size of the memory map is also the maximum size of the database."

Comment 4 Petr Spacek 2014-04-14 11:36:40 UTC

(In reply to Florian Weimer from comment #3)
> (In reply to Panu Matilainen from comment #2)
> > The keysize limit appears to be only a backwards-compatible default:
> > http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/
> > liblmdb/mdb.c;h=71b025fa68fd970504cd42ae1a80a7930ed567f7;hb=HEAD#l395
> 
> I think this only increases the key size limit to about one third of the
> page size, which was possible to achieve before by changing the #define
> directly.

I can't find any limitation in the code itself. It seems that one third of the page size is just "a" default.


> > Considering LMDB appears to be "copy source software" that's probably not
> > much of a problem, rpm could compile its internal version as it pleases. I
> > fail to see anything on max db (size) on opening, what do you mean by that?
> 
> See mdb_env_set_mapsize, "The size of the memory map is also the maximum
> size of the database."

I don't see any explicit limit on database size in mdb_env_set_mapsize():
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/liblmdb/mdb.c;h=71b025fa68fd970504cd42ae1a80a7930ed567f7;hb=HEAD#l3500

My understanding of the thread
http://comments.gmane.org/gmane.network.openldap.technical/11699
is that mdb_env_set_mapsize() configures a "sanity limit" specified by the application.



I guess that Howard will be more than happy to answer specific questions about MDB, so asking openldap-technical sounds like the best idea...

Comment 5 Howard Chu 2014-04-14 13:48:36 UTC

Keys must be small enough to fit on a page, and a page must contain at least two keys in order for the B-tree structure to be maintained. So the absolute max is 1/2 the page size; we use 1/3 to avoid size issues when using DUPSORT mixed with other data.

Database size is limited to address space size. We don't recommend using LMDB on 32-bit architectures.

Comment 6 Howard Chu 2014-04-14 13:54:38 UTC

The Cfengine folks worked around the keysize limit by using a hash of their keys. I took a similar approach in SQLightning and my LMDB driver for MySQL/MariaDB. It's not optimal; we could squeeze even more performance out of them by redesigning their data models, but that's obviously much more work.

Comment 7 Petr Spacek 2014-04-15 20:07:39 UTC

Thank you, Howard, for the explanation.

So we have 4096/3 = 1365 byte limit on key length.

/usr/include/linux/limits.h from package kernel-headers-3.13.9-200.fc20.x86_64 defines:

#define NAME_MAX         255    /* # chars in a file name */
#define PATH_MAX        4096    /* # chars in a path name including nul */

So it can't work directly with full paths... Maybe there is a clever workaround for this specific case (given that NAME_MAX is only 255 bytes)...
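To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a 4096-byte page and ignores any page or node header overhead, so the real LMDB limit would be somewhat lower than the number printed here. The helper names (third_of_page, fits_as_key) are made up for illustration and are not part of any LMDB API.

```python
# Sketch only: ignores LMDB's per-page and per-node header overhead.
PAGE_SIZE = 4096   # assumed page size
NAME_MAX = 255     # from linux/limits.h
PATH_MAX = 4096    # from linux/limits.h, includes the trailing NUL


def third_of_page(page_size: int) -> int:
    """Approximate upper bound on key length: one third of a page."""
    return page_size // 3


def fits_as_key(key: bytes, page_size: int = PAGE_SIZE) -> bool:
    """True if the key is no longer than the approximate limit."""
    return len(key) <= third_of_page(page_size)


limit = third_of_page(PAGE_SIZE)
longest_path = b"/" + b"a" * (PATH_MAX - 2)   # PATH_MAX - 1 bytes, NUL excluded
longest_name = b"a" * NAME_MAX                # a single path component

print(limit)                      # 1365
print(fits_as_key(longest_path))  # False: a maximal full path is too long
print(fits_as_key(longest_name))  # True: any single component fits
```

The point of the comparison is that a single path component always fits under the limit, while a full path may not.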


(In reply to Panu Matilainen from comment #2)
> Last I looked at LMDB, a bigger problem was lack of DB_PRIVATE counterpart
> which basically meant all rpm operations including queries would require
> root, which is simply a no-go. That situation may or may not have changed.

(Disclaimer: I'm not a BDB expert!) Panu, can you elaborate on this? I'm reading http://docs.oracle.com/cd/E17276_01/html/api_reference/C/envopen.html and I don't see the relation between DB_PRIVATE and non-root access to the database.

Could you educate me, please? Thank you for your time.

Comment 8 Howard Chu 2014-04-15 20:53:18 UTC

There's no such requirement (requiring root access). However, to run without global write access requires you to disable LMDB's own lock manager and use your own. For example, Postfix does this.

Comment 9 Panu Matilainen 2014-04-16 05:17:16 UTC

DB_PRIVATE and root are only related in the rpmdb context, not in general.

Comment 10 Howard Chu 2014-07-23 17:21:46 UTC

(In reply to Petr Spacek from comment #7)
> Thank you Howard for explanation.
> 
> So we have 4096/3 = 1365 byte limit on key length.
> 
> /usr/include/linux/limits.h from package
> kernel-headers-3.13.9-200.fc20.x86_64 defines:
> 
> #define NAME_MAX         255    /* # chars in a file name */
> #define PATH_MAX        4096    /* # chars in a path name including nul */
> 
> So it can't work directly with full paths... Maybe there is a clever
> workaround for this specific case (given that NAME_MAX is only 255 bytes)...

If pathnames are the only thing you're worried about, the obvious solution is to store records hierarchically using only the pathname components. As a bonus, if you have many objects in the same path, you only consume space once for their parent components. This is what OpenLDAP does to store DNs; you can get an overview of the design here http://www.openldap.org/conf/odd-sfo-2003/proceedings.html (Hierarchical Backend) 

The OpenLDAP hierarchical design is more complex because we needed to be able to traverse the hierarchy both top-down and bottom-up. If you're only parsing pathnames you only need top-down, so it will be even simpler.
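As a rough illustration of the top-down layout described above, the following Python sketch stores one record per path component, keyed by (parent id, component name). A plain dict stands in for the key-value store, and the class and method names are hypothetical. This is not how OpenLDAP or rpm implement it, only a toy model of the idea.

```python
from typing import Dict, Optional, Tuple

ROOT_ID = 0  # hypothetical id for "/"


class PathTree:
    """Toy key-value layout: (parent_id, component) -> node_id."""

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[int, bytes], int] = {}
        self.next_id = ROOT_ID + 1

    def insert(self, path: str) -> int:
        """Insert a path component by component; return the leaf's id."""
        node = ROOT_ID
        for comp in filter(None, path.split("/")):
            key = (node, comp.encode())
            if key not in self.nodes:
                self.nodes[key] = self.next_id
                self.next_id += 1
            node = self.nodes[key]
        return node

    def lookup(self, path: str) -> Optional[int]:
        """Walk top-down from the root; return None if any step is missing."""
        node = ROOT_ID
        for comp in filter(None, path.split("/")):
            found = self.nodes.get((node, comp.encode()))
            if found is None:
                return None
            node = found
        return node

    def longest_key(self) -> int:
        """Length of the longest component stored as part of a key."""
        return max((len(comp) for _, comp in self.nodes), default=0)


tree = PathTree()
foo = tree.insert("/usr/lib64/libfoo.so")
bar = tree.insert("/usr/lib64/libbar.so")

print(len(tree.nodes))      # 4: "usr" and "lib64" are stored only once
print(tree.longest_key())   # 9: bounded by NAME_MAX, not PATH_MAX
```

Each key holds at most one component plus a parent id, so its length is bounded by NAME_MAX rather than PATH_MAX, and shared parent directories are stored once.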

Comment 11 Jaroslav Reznik 2015-03-03 15:41:31 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 12 Ľuboš Kardoš 2015-06-24 07:59:42 UTC

(In reply to Howard Chu from comment #6)
> The Cfengine folks worked around the keysize limit by using a hash of their
> keys. I took a similar approach in SQLightning and my LMDB driver for
> MySQL/MariaDB. It's not optimal; we could squeeze even more performance
> out of them by redesigning their data models but that's obviously much more
> work.

What was your approach in the LMDB driver for MySQL/MariaDB? The ability to do range searches is lost when keys are hashed. I think you needed range search in the MySQL/MariaDB driver because MySQL/MariaDB can perform range queries.

(In reply to Howard Chu from comment #10)
> (In reply to Petr Spacek from comment #7)
> > Thank you Howard for explanation.
> > 
> > So we have 4096/3 = 1365 byte limit on key length.
> > 
> > /usr/include/linux/limits.h from package
> > kernel-headers-3.13.9-200.fc20.x86_64 defines:
> > 
> > #define NAME_MAX         255    /* # chars in a file name */
> > #define PATH_MAX        4096    /* # chars in a path name including nul */
> > 
> > So it can't work directly with full paths... Maybe there is a clever
> > workaround for this specific case (given that NAME_MAX is only 255 bytes)...
> 
> If pathnames are the only thing you're worried about, the obvious solution
> is to store records hierarchically using only the pathname components. As a
> bonus, if you have many objects in the same path, you only consume space
> once for their parent components. This is what OpenLDAP does to store DNs;
> you can get an overview of the design here
> http://www.openldap.org/conf/odd-sfo-2003/proceedings.html (Hierarchical
> Backend) 
> 
> The OpenLDAP hierarchical design is more complex because we needed to be
> able to traverse the hierarchy both top-down and bottom-up. If you're only
> parsing pathnames you only need top-down, so it will be even simpler.

It is not just about storing single paths. E.g., a package's requirements are also used as keys, and with the latest upstream rpm they can contain strings like this: "(/first/path OR /second/path AND /third/path)"

Comment 13 Howard Chu 2015-06-24 11:38:27 UTC

(In reply to Ľuboš Kardoš from comment #12)
> (In reply to Howard Chu from comment #6)
> > The Cfengine folks worked around the keysize limit by using a hash of their
> > keys. I took a similar approach in SQLightning and my LMDB driver for
> > MySQL/MariaDB. It's not optimal; we could squeeze even more performance
> > out of them by redesigning their data models but that's obviously much more
> > work.
> 
> What was your approach in LMDB driver for MySQL/MariaDB? Possibility to do
> range search is lost when keys are hashed. I think you needed range search
> in MySQL/MariaDB driver because MySQL/MariaDB can perform range queries.

The hash is a suffix, it keeps the first 64 bytes of the original value and only computes the hash for larger values. For lots of large values that are identical in the first 64 bytes, range searches will require extra sorting/preprocessing.
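A minimal Python sketch of the prefix-plus-hash scheme described above might look like the following. The 64-byte prefix comes from the comment; the choice of SHA-256, the threshold at which hashing kicks in, and the function name shorten_key are assumptions made for illustration and are not taken from the actual MySQL/MariaDB driver.

```python
import hashlib

PREFIX_LEN = 64            # prefix length mentioned in the comment above
DIGEST = hashlib.sha256    # assumption: any fixed-size hash would do


def shorten_key(value: bytes) -> bytes:
    """Keep short values as-is; otherwise keep 64 bytes and append a hash."""
    if len(value) <= PREFIX_LEN:   # assumption: threshold equals prefix length
        return value
    return value[:PREFIX_LEN] + DIGEST(value).digest()


long_a = b"a" * PREFIX_LEN + b"/x" * 100
long_b = b"b" * PREFIX_LEN + b"/x" * 100
same_1 = b"p" * PREFIX_LEN + b"1"
same_2 = b"p" * PREFIX_LEN + b"2"

# Fixed upper bound on the stored key length: 64 + 32 = 96 bytes.
print(len(shorten_key(long_a)))
# Order is preserved when the first 64 bytes differ.
print(shorten_key(long_a) < shorten_key(long_b))
# Same prefix: the shortened keys are distinct, but their relative order
# depends on the hash, so a range scan has to re-sort these candidates.
print(shorten_key(same_1) != shorten_key(same_2))
```

Values that differ within the first 64 bytes keep their relative order, which is why range searches still mostly work; only values sharing a full prefix need the extra sorting step.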

> > The OpenLDAP hierarchical design is more complex because we needed to be
> > able to traverse the hierarchy both top-down and bottom-up. If you're only
> > parsing pathnames you only need top-down, so it will be even simpler.
> 
> It is not just about storing single paths. E. g. requirements of package are
> also used as keys and with the latest upstream rpm they can contain strings
> like this: "(/first/path OR /second/path AND /third/path)"

I don't see this as a special problem. Again, OpenLDAP already handles this sort of boolean expression all the time.

Comment 14 Robert Scheck 2016-01-16 18:31:13 UTC

Does https://fedoraproject.org/wiki/Changes/NewRpmDBFormat mean that this
proposal is obsolete? Why yet another custom DB format rather than using LMDB?
Unfortunately, the proposal in the wiki does not explain the details or provide
a link, and in particular not why LMDB doesn't seem to be an option...

Comment 15 Jonathan Underwood 2016-02-01 13:27:58 UTC

The replacement DB backend is currently being discussed on the Fedora devel list:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/6TZWDVK2VGANJ25X66FH4QLXVRCH7M7D/

Comment 16 Jonathan Underwood 2016-02-01 13:29:33 UTC

In particular this response about (L)MDB as a replacement:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/5Y2KKL4PLSKGV4AM2QOXPVUMCCYHDVFV/

"LMDB seems a poor match because it requires the administrator to set a maximum database size, its key length is limited compared to Berkeley DB (which supports multi-megabyte keys). This means integrating it takes more than just rewriting the API calls."

Comment 17 Howard Chu 2016-02-01 14:09:55 UTC

1) The database size can be increased automatically, so that's not a relevant point.
2) This is a legitimate concern, being addressed in LMDB 1.x.

Comment 18 Ľuboš Kardoš 2016-02-08 13:31:22 UTC

> 2) This is a legitimate concern, being addressed in LMDB 1.x.

And when will LMDB 1.x be released?

Comment 19 Howard Chu 2016-02-08 22:49:04 UTC

(In reply to Ľuboš Kardoš from comment #18)
> > 2) This is a legitimate concern, being addressed in LMDB 1.x.
> 
> And when will LMDB 1.x be released?

LMDB 1.0.0 is due in a couple months from now.

Comment 20 Fedora End Of Life 2016-07-19 11:20:40 UTC

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 21 Panu Matilainen 2017-11-03 14:01:35 UTC

(In reply to Howard Chu from comment #19)
> (In reply to Ľuboš Kardoš from comment #18)
> > > 2) This is a legitimate concern, being addressed in LMDB 1.x.
> > 
> > And when will LMDB 1.x be released?
> 
> LMDB 1.0.0 is due in a couple months from now.

1.5 years later one might call that a fairly optimistic estimate :)
Not that versions matter, they're just numbers, but is the key size limitation still present in the current latest version 0.9.21?

Comment 22 Howard Chu 2017-11-03 14:43:08 UTC

Not sure the question is still relevant, given that this ticket was closed over a year ago, and rpm 4.14 has been released with LMDB support. http://rpm.org/#rpm-4140-released-oct-12-2017

As for LMDB, supporting very long key lengths requires an on-disk format change, so that feature will not go into LMDB 0.9. In 0.9 the default maxkey length is still 512, and the maximum limit is still 1/3rd the page size.

Comment 23 Neal Gompa 2017-11-04 00:20:15 UTC

@Howard,

This is still relevant, which is part of why LMDB is experimental for now.

Comment 24 Panu Matilainen 2017-11-06 06:28:35 UTC

Thanks for the update, Howard. And yes, it's relevant: as Neal said, LMDB is considered experimental for now and will likely remain so as long as the key limit is there. Also, *this* bug tracks the Fedora side of things; upstream support is just a prerequisite for considering a change in Fedora.

Comment 25 Howard Chu 2017-11-06 17:28:30 UTC

Can anyone point me to a description of the schema or data layout currently being used? Looking at the code and the existing DB on a CentOS 7 system (rpm 4.11.3) I'm not seeing any records using anywhere close to 512 byte long keys. Also, I would have expected any key-related problems to have already shown up in https://github.com/rpm-software-management/rpm/issues/281 e.g. using the --rebuilddb option or the other various tests cited there.

Comment 26 Richard W.M. Jones 2017-11-06 19:04:15 UTC

If you mean the current RPM (ie BDB) database then I'm not aware of a
"schema" outside of the code.  Note however that BDB is really a key-
value store.

However we have extensively examined the RPM database using plain
db_dump for our work in libguestfs grabbing the list of RPMs from
guests.  You can use commands like:

db_dump -p /var/lib/rpm/Packages | less

You'll also need to look at the RPM tag constants in rpmtag.h
in the RPM sources to make sense of what's going on.

Comment 27 Howard Chu 2017-11-06 19:42:20 UTC

(In reply to Richard W.M. Jones from comment #26)
> If you mean the current RPM (ie BDB) database then I'm not aware of a
> "schema" outside of the code.  Note however that BDB is really a key-
> value store.

Yes of course, and LMDB is key-value as well. But there's still a logical data model implemented on top of it, and that is still technically a "schema".

> However we have extensively examined the RPM database using plain
> db_dump for our work in libguestfs grabbing the list of RPMs from
> guests.  You can use commands like:
> 
> db_dump -p /var/lib/rpm/Packages | less

Yes I've been doing this. (Btw, I've been working with BDB for 16+ years and I wrote the Linux O_DIRECT support for it, as well as writing LMDB. I'm well aware of the capabilities and functionality available so please assume I've already taken the obvious steps to inspect the data.) So far, as I said, I haven't found any tables where the keys are anywhere near 512 bytes long. 

> You'll also need to look at the RPM tag constants in rpmtag.h
> in the RPM sources to make sense of what's going on.

Comment 28 Neal Gompa 2017-11-06 21:11:28 UTC

The RPM Database for Berkeley DB is split across multiple files, so you'll need to look at all of them in /var/lib/rpm to figure it out. I would recommend looking at a recent Fedora (like Fedora 27 Beta) to see what it looks like today.

Comment 29 Panu Matilainen 2017-11-07 09:56:48 UTC

The problem is that while in practice the data tends to fit fine into 512 bytes, it COULD legitimately be much longer.

For a practical example, consider paths. Rpm stores indexes for dirnames and basenames, and while on Linux I think the max filename length is 255, the path can be up to 4096 characters, which is well beyond 512 bytes. Fedora packages might not come even close to reaching 512-character paths, but that doesn't mean such packages don't exist in the wild (or perhaps more likely, behind closed company doors). And it's also a question of principle: rpm should support what the OS does instead of limiting it.

Paths aside, there's currently no (practical) limit on how long a dependency string is allowed to be, and those are stored in several indexes. I've almost certainly never met a dependency string longer than 512 characters, but that doesn't mean they couldn't exist, perhaps the most obvious case being file dependencies (in which case the path exists as a whole string instead of being split up into dirname and basename). And generated dependencies can be arbitrarily long.

So for the majority of rpmdb's indexes, it needs to be possible to handle "arbitrarily" long keys. BDB's UINT32_MAX (iirc) is certainly arbitrary enough for rpm's purposes, UINT16_MAX would also be "arbitrarily long" I think but 512 is not.

Comment 30 Jeffrey Walton 2019-03-07 11:08:19 UTC

Sorry for barging in. I recently built OpenLDAP from sources and found some awful findings during acceptance testing.

My testing regime includes -Wall, Valgrind and the sanitizers. The file liblmdb/mdb.c is full of undefined behavior. Adding -fsanitize=undefined to CFLAGS and then running self tests produces a lot of findings. (The only other library I have seen worse is OpenSSL).

Below is a small portion of them from https://www.openldap.org/its/index.cgi/Incoming?id=8988. It should be no surprise there are bug reports sprayed across the web of unexplained crashes with backtraces that originate in liblmdb/mdb.c.

$ make test V=1
...

>>>>> Starting test001-slapadd for mdb...
running defines.sh
Running slapadd to build slapd database...
../../../libraries/liblmdb/mdb.c:7544:26: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7545:3: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:6046:8: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:2418:7: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:6938:10: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:6939:6: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:6939:6: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7287:6: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7303:18: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7303:18: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7306:6: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7335:11: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7339:12: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00
              ^
../../../libraries/liblmdb/mdb.c:7341:20: runtime error: member access within
misaligned address 0x0000023fe67a for type 'struct MDB_page', which requires 8
byte alignment
0x0000023fe67a: note: pointer points here
 00 00  00 00 03 00 00 00 00 00  00 00 00 00 52 00 10 00  66 00 00 00 00 00 00
00  00 00 00 00 00 00

Comment 31 Howard Chu 2019-03-07 15:19:11 UTC

(In reply to Jeffrey Walton from comment #30)
> Sorry for barging in. I recently built OpenLDAP from sources and found some
> awful findings during acceptance testing.
> 
> My testing regime includes -Wall, Valgrind and the sanitizers. The file
> liblmdb/mdb.c is full of undefined behavior. Adding -fsanitize=undefined to
> CFLAGS and then running self tests produces a lot of findings. (The only
> other library I have seen worse is OpenSSL).
> 
> Below is a small portion of them from
> https://www.openldap.org/its/index.cgi/Incoming?id=8988. It should be no
> surprise there are bug reports sprayed across the web of unexplained crashes
> with backtraces that originate in liblmdb/mdb.c.

Please provide links to some of these bug reports. I see nothing in the OpenLDAP ITS
besides the issues you just filed.

We've tested LMDB extensively across ARM, x86, SPARC, MIPS, and POWER architectures.
Both ARM and SPARC are quite sensitive to misalignment. None of them misbehave with
any recent releases of LMDB.

Comment 32 Florian Festi 2019-04-03 10:36:54 UTC

I looked into using hashes as keys for the Dirnames index. Turns out that can't be done easily as we rely on the DB being ordered and allow ordered traversal for the files/paths. This is offered in the API and used by the filetriggers feature.
One option would be splitting the paths to single directory names and building the FS tree in the DB by pointing to the parent dirs. This will require writing new code for querying, traversing and updating which we'd rather avoid.
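The ordering problem can be illustrated with a small Python sketch. A sorted list stands in for an ordered key-value store, and the prefix_scan helper is hypothetical; it is not rpm's API, only a model of the kind of ordered traversal that hashed keys would break.

```python
import bisect
import hashlib
from typing import List


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Return all keys starting with prefix, relying on the sort order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break   # sorted input: no later key can match either
        matches.append(key)
    return matches


dirnames = [b"/etc/", b"/usr/bin/", b"/usr/lib64/", b"/usr/share/doc/", b"/var/lib/"]

# Plain keys: everything under /usr/ is one contiguous range.
plain = sorted(dirnames)
under_usr = prefix_scan(plain, b"/usr/")
print(under_usr)

# Hashed keys: the digests sort in an order unrelated to the paths, so the
# same question can only be answered by visiting and checking every record.
hashed = sorted(hashlib.sha256(d).digest() for d in dirnames)
print(len(hashed))
```

With plain keys a single seek plus a short forward walk answers the query, which is what an ordered traversal for file triggers would rely on; with hashed keys that locality is gone.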

Comment 33 Panu Matilainen 2019-09-13 13:43:25 UTC

(In reply to Howard Chu from comment #19)
> 
> LMDB 1.0.0 is due in a couple months from now.

Three and a half years later the key size limit is still there, so it's time we moved on to something else.

While paths can be split at /, we have other arbitrary strings that could exceed the key size too. And even if all that could be worked around, it's a whole lotta complications to what is otherwise a pretty simple scheme, complications which we don't want.

Comment 34 Leonid Yuriev 2020-04-04 15:08:31 UTC

(In reply to Petr Spacek from comment #0)
> Considering license problems with Berkeley DB v6, I propose to think about
> moving RPM to MDB database.
> 
> MDB is supposed to have similar (but simpler) API as BDB + should have much
> much better performance.
> 
> The transition could provide real performance boost to RPM.
> 
> MDB vs. BDB (vs. others) benchmarks:
> http://symas.com/mdb/microbench/
> 
> E-mail (from Debian world) about related problems and projects moving to MDB
> is here:
> https://lists.debian.org/debian-devel/2013/07/msg00047.html
> 
> Documentation about MDB is here:
> http://symas.com/mdb/

Please take a look at libmdbx.
Copy & paste from README:

libmdbx is superior to the legendary LMDB in terms of features and reliability, and not inferior in performance.
In comparison to LMDB, libmdbx makes things “just work” perfectly and out-of-the-box, rather than silently and catastrophically breaking down.

Improvements beyond LMDB
1) Keys can be more than 2 times longer than in LMDB.
For a DB with the default page size, libmdbx supports keys up to 1300 bytes, and up to 21780 bytes for a 64K page size. LMDB allows key sizes up to 511 bytes and may silently lose data with large values.

2) Up to 20% faster than LMDB in CRUD benchmarks.
Benchmarks of the in-tmpfs scenarios, which test the speed of the engine itself, showed that libmdbx is 10-20% faster than LMDB. These and other results can be easily reproduced with ioArena just by running make bench-quartet, including comparisons with RocksDB and WiredTiger.

3) Automatic on-the-fly database size adjustment, both increment and reduction.
libmdbx manages the database size according to parameters specified by the mdbx_env_set_geometry() function, which include the growth step and the truncation threshold.

4) Automatic continuous zero-overhead database compactification.
During each commit libmdbx merges suitable freed pages into the unallocated area at the end of the file, and then truncates the unused space when there is enough of it.

5) The same database format for 32- and 64-bit builds.
libmdbx database format depends only on the endianness but not on the bitness.

6) mdbx_chk tool for database integrity check.

7) Fixed more than 10 significant errors, in particular: page leaks, wrong sub-database statistics, segfaults in several conditions, a suboptimal page merge strategy, updating an existing record with a change in data size (including for multimap), etc.

8) All cursors can be reused and should be closed explicitly, regardless of whether they were opened within a write or read transaction.

9) Opening database handles are spared from race conditions and pre-opening is not needed.

etc.

https://github.com/erthink/libmdbx

Regards,
Leonid.

Comment 35 Robert Scheck 2020-04-04 15:27:44 UTC

Leonid, https://fedoraproject.org/wiki/Changes/Sqlite_Rpmdb is the current way. So to have at least a chance, I guess you would have to provide a pull request on GitHub containing libmdbx support for RPM to gain at least upstream support. I doubt that somebody else will implement libmdbx support while SQLite is targeted and this bug report is closed as WONTFIX.

Comment 36 Howard Chu 2020-04-04 17:32:51 UTC

(In reply to Leonid Yuriev from comment #34)
> (In reply to Petr Spacek from comment #0)
> > Considering license problems with Berkeley DB v6, I propose to think about
> > moving RPM to MDB database.
> > 
> > MDB is supposed to have similar (but simpler) API as BDB + should have much
> > much better performance.
> > 
> > The transition could provide real performance boost to RPM.
> > 
> > MDB vs. BDB (vs. others) benchmarks:
> > http://symas.com/mdb/microbench/
> > 
> > E-mail (from Debian world) about related problems and projects moving to MDB
> > is here:
> > https://lists.debian.org/debian-devel/2013/07/msg00047.html
> > 
> > Documentation about MDB is here:
> > http://symas.com/mdb/
> 
> Please take a look at libmdbx.
> Copy & paste from README:
> 
> libmdbx is superior to legendary LMDB in terms of features and reliability,
> not inferior in performance.
> In comparison to LMDB, libmdbx make things “just work” perfectly and
> out-of-the-box, not silently and catastrophically break down.
> 
> Improvements beyond LMDB
> 1) Keys could be more than 2 times longer than LMDB.
> For DB with default page size libmdbx support keys up to 1300 bytes and up
> to 21780 bytes for 64K page size. LMDB allows key size up to 511 bytes and
> may silently loses data with large values.

LMDB actually supports keys up to 1/3 page size. I.e., 1300 bytes for 4KB page.

> 5) The same database format for 32- and 64-bit builds.
> libmdbx database format depends only on the endianness but not on the
> bitness.

LMDB can be compiled to use identical format on 32-bit and 64-bit builds.

Comment 37 Leonid Yuriev 2020-04-04 21:50:45 UTC

(In reply to Howard Chu from comment #36)
> (In reply to Leonid Yuriev from comment #34)
> > Improvements beyond LMDB:
> > 1) Keys could be more than 2 times longer than LMDB.
> > For DB with default page size libmdbx support keys up to 1300 bytes and up
> > to 21780 bytes for 64K page size. LMDB allows key size up to 511 bytes and
> > may silently loses data with large values.
> 
> LMDB actually supports keys up to 1/3 page size. I.e., 1300 bytes for 4KB
> page.

Howard, unfortunately I am completely sure that LMDB has problems with large keys and other bugs (at least a broken internal audit and a page leak), because last fall I had to deal with these errors in MDBX, and as a result rewrote one more part of the code that was inherited from LMDB (page split/rebalance).

I think you can detect errors in LMDB if you fix the internal audit and use the test from MDBX (i.e. https://github.com/erthink/libmdbx/blob/master/test/long_stochastic.sh, it runs about a month).
Same with the -fsanitize=undefined, ASAN, Valgrind (see https://github.com/erthink/libmdbx/issues/82).

However, I do not think that this is the best place to discuss this issue, and I suggest that we leave it at that.

Comment 38 Leonid Yuriev 2020-04-04 22:06:12 UTC

(In reply to Robert Scheck from comment #35)
> Leonid, https://fedoraproject.org/wiki/Changes/Sqlite_Rpmdb is the current
> way. So for having at least a chance, I guess you would have to provide a
> pull request on GitHub containing libmdbx support for RPM to gain at least
> upstream support. I'm in doubt that somebody else will implement libmdbx
> support while SQLite is targeted and this bug report is closed as WONTFIX.

Robert, thanks for your reply.

Unfortunately Panu has already given me the answer "That is not the kind of upstream we want".
https://github.com/rpm-software-management/rpm/issues/958

So I don't think the pull-request will change anything.
However, I wrote about libmdbx here in case there are any difficulties with SQLite.

Regards,
Leonid.

Comment 39 Panu Matilainen 2020-04-06 07:54:24 UTC

As I already said in the GH ticket, we're not in the market for any new database backends at this point, and in particular we're not in the market to be a test-bed for new, unproven databases. Continuing to push such a solution in this manner does not make it more attractive; quite the contrary.
