Red Hat Bugzilla – Bug 460851
tdb is missing msync() calls
Last modified: 2008-09-09 10:12:52 EDT
I noticed that tdb doesn't issue msync() calls after writing to a file via mmap(). However, unless msync() is called data that is written to an MAP_SHARED mmap segment is not necessarily actually stored on disk. A call to munmap() or close() does *not* include an implicit msync. i.e. if you mmap a file, write to that segment and then immediately munmap/close it there is quite a chance that the data will not be in that file if you open it afterwards.
Right now it depends on luck whether data stored in a tdb database actually arrives on the disk before the database is closed.
tdb seems to call msync() only as part of the transaction stuff in a similar fashion as fsync(). However, msync() is strictly necessary to write correct code that uses mmap() for writing, while fsync() is optional for code that uses write() and only adds extra safety.
tdb should call msync() each time before it unlocks a writer lock on the db.
(afaik a lot of other projects copied tdb into their sources -- they'd need a fix for this too)
A quick temporary fix for clients is to pass TDB_NOMMAP to tdb.
Thanks Lennart, I am investigating the issue.
Lennat I actually pushed a patch that add msync() before munmap() in samba but I am getting a lot of push back.
The claim is that it slow things down and that the Linux implementation does an implicit msync() on munmap(), do you have a case where you actually found a problem on some OS?
I wrote a tool called "syrep" a while back which heavily uses mmap(). Back then I thought too that munmap() included an msync(). What happened is that from time to time my code produced files filled only with NUL bytes. To track this down was a royal PITA, and the man pages don't really stress this point.
This was the trivial commit I made back then:
I haven't had any iusses with tdb itself, but every time I see mmap code I now instantly look for the msync() and that's how I noticed that tdb was lacking those calls.
Also, if they claim that munmap includes msync, how could it be that adding the additional msync would slow anything down?
Do you remember what kernel was this on ?
munmap() implicitly does the same as msync(MS_ASYNC), which guarantees that the data will be written to disk eventually.
A subsequent mmap will use the exact same page cache pages that were just munmapped, so they will contain the data that was written into the mmaped area.
Apparently all people that I consulted with agree that msync() before munmap() is not necessary unless you want to protect data from sudden hard reboots.
Given TDB offers a transaction API (that uses msync(MS_SYNC) and that is the recommended API for persistent TDBs that should survive against odds I think we can close this as not a bug.
Hmm, no idea what kernel that was on. Something that was current in 2005.
Rik, was there any bug fixed related to this since 2005?