Bug 460851 - tdb is missing msync() calls
Summary: tdb is missing msync() calls
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: samba
Version: rawhide
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
Assignee: Simo Sorce
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-09-02 03:00 UTC by Lennart Poettering
Modified: 2008-09-09 14:12 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-09-08 17:49:48 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Lennart Poettering 2008-09-02 03:00:30 UTC
I noticed that tdb doesn't issue msync() calls after writing to a file via mmap(). However, unless msync() is called data that is written to an MAP_SHARED mmap segment is not necessarily actually stored on disk. A call to munmap() or close() does *not* include an implicit msync. i.e. if you mmap a file, write to that segment and then immediately munmap/close it there is quite a chance that the data will not be in that file if you open it afterwards.

Right now it depends on luck whether data stored in a tdb database actually arrives on the disk before the database is closed.

tdb seems to call msync() only as part of the transaction stuff in a similar fashion as fsync(). However, msync() is strictly necessary to write correct code that uses mmap() for writing, while fsync() is optional for code that uses write() and only adds extra safety.

tdb should call msync() each time before it unlocks a writer lock on the db.

(afaik a lot of other projects copied tdb into their sources -- they'd need a fix for this too)

A quick temporary fix for clients is to pass TDB_NOMMAP to tdb.

Comment 1 Simo Sorce 2008-09-03 12:48:40 UTC
Thanks Lennart, I am investigating the issue.

Comment 2 Simo Sorce 2008-09-04 21:41:59 UTC
Lennat I actually pushed a patch that add msync() before munmap() in samba but I am getting a lot of push back.
The claim is that it slow things down and that the Linux implementation does an implicit msync() on munmap(), do you have a case where you actually found a problem on some OS?

Comment 3 Lennart Poettering 2008-09-08 16:25:37 UTC
I wrote a tool called "syrep" a while back which heavily uses mmap(). Back then I thought too that munmap() included an msync(). What happened is that from time to time my code produced files filled only with NUL bytes. To track this down was a royal PITA, and the man pages don't really stress this point.

This was the trivial commit I made back then:

http://git.0pointer.de/?p=syrep.git;a=blobdiff;f=src/util.c;h=1f5c412feb86a2aa7b1f0659b8198eb50cd41add;hp=4eebd1a6dd49f1cd1f2ec110c8a4ef9e3903e6e7;hb=5889555ab4749b3d93ffcc35af11fc7750b6d533;hpb=078b8983f0f8d2cbf6371f093aad5615ca33bcdd

I haven't had any iusses with tdb itself, but every time I see mmap code I now instantly look for the msync() and that's how I noticed that tdb was lacking those calls.

Also, if they claim that munmap includes msync, how could it be that adding the additional msync would slow anything down?

Comment 4 Simo Sorce 2008-09-08 17:23:00 UTC
Do you remember what kernel was this on ?

Comment 5 Rik van Riel 2008-09-08 17:43:24 UTC
Lennart,

munmap() implicitly does the same as msync(MS_ASYNC), which guarantees that the data will be written to disk eventually.

A subsequent mmap will use the exact same page cache pages that were just munmapped, so they will contain the data that was written into the mmaped area.

Comment 6 Simo Sorce 2008-09-08 17:49:48 UTC
Apparently all people that I consulted with agree that msync() before munmap() is not necessary unless you want to protect data from sudden hard reboots.
Given TDB offers a transaction API (that uses msync(MS_SYNC) and that is the recommended API for persistent TDBs that should survive against odds I think we can close this as not a bug.

Comment 7 Lennart Poettering 2008-09-09 14:12:52 UTC
Hmm, no idea what kernel that was on. Something that was current in 2005.

Rik, was there any bug fixed related to this since 2005?


Note You need to log in before you can comment on or make changes to this bug.