Bug 616608
Summary: SIGBUS in RDN index reads on platforms with strict alignments
Product: [Retired] 389
Component: Directory Server
Version: 1.2.6
Hardware: ia64
OS: All
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Reporter: Ulf Weltman <ulf.weltman>
Assignee: Noriko Hosoi <nhosoi>
QA Contact: Viktor Ashirov <vashirov>
CC: amsharma, nhosoi, nkinder, rmeggins
Keywords: VerifiedUpstream
Doc Type: Bug Fix
Bug Blocks: 543590, 639035
Last Closed: 2015-12-07 17:11:30 UTC
Description
Ulf Weltman, 2010-07-20 23:02:21 UTC

Created attachment 433277 [details]
patch proposal

Please hold off on this, I was too hasty. It looks more like corruption; the address for the rdn_elem struct doesn't look right.

Created attachment 433471 [details]
new patch proposal
MODDN produces a SIGBUS when strict alignment is enforced, because data read from the RDN index is cast to an rdn_elem struct that is not properly aligned. To fix this I copy just the first part of the raw rdn_elem to stack memory and use that to find the true size.
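The stack-copy technique can be sketched roughly as follows. This is a minimal standalone model, not the server code; elem_t, elem_size_unaligned, elem_dup_unaligned, and build_sample are illustrative names that only mirror the shape of rdn_elem and _entryrdn_dup_rdn_elem:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal model of rdn_elem: a fixed header followed by two
 * NUL-terminated strings ("normalized rdn" '\0' "rdn" '\0'). */
typedef struct {
    uint32_t id;
    uint16_t nrdn_len; /* including '\0' */
    uint16_t rdn_len;  /* including '\0' */
    char nrdn_rdn[1];
} elem_t;

#define HDR_SIZE offsetof(elem_t, nrdn_rdn)

/* Find the true size of a record at a possibly UNALIGNED address:
 * memcpy the fixed-size header into an aligned stack variable first,
 * then read the length fields from the copy. */
static size_t elem_size_unaligned(const void *raw)
{
    elem_t hdr; /* automatic storage is always suitably aligned */
    memcpy(&hdr, raw, HDR_SIZE);
    return HDR_SIZE + hdr.nrdn_len + hdr.rdn_len;
}

/* Duplicate the whole record into malloc'd (hence aligned) memory. */
static elem_t *elem_dup_unaligned(const void *raw)
{
    size_t len = elem_size_unaligned(raw);
    elem_t *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, raw, len);
    return copy;
}

/* Plant a sample record at buf+1 so it is deliberately misaligned,
 * as it could be inside a bulk-retrieval buffer. */
static unsigned char *build_sample(unsigned char *buf)
{
    elem_t hdr = { 7, 4, 4, {0} };
    unsigned char *raw = buf + 1;
    memcpy(raw, &hdr, HDR_SIZE);
    memcpy(raw + HDR_SIZE, "foo\0Foo", 8);
    return raw;
}
```

On a strict-alignment platform such as ia64, dereferencing ((elem_t *)raw)->id directly at buf+1 would fault; the memcpy-based accessors work regardless of where raw points.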
Created attachment 433564 [details]
new patch proposal with additional fixes
The RDN index stores rdn_elem structs. In some places dbc->c_get() is used with the DB_MULTIPLE flag for bulk retrieval of rdn_elem structs. The Berkeley DB macro DB_MULTIPLE_NEXT is used to iterate over the buffer and return a pointer into it for each item. These pointers might not point to aligned memory, so when the data is cast to an rdn_elem struct and the members are accessed, a SIGBUS occurs.
I've found these affected areas:
1) In libback-ldbm, _entryrdn_index_read() uses _entryrdn_dup_rdn_elem() to duplicate what it gets back with DB_MULTIPLE_NEXT.
Fixed _entryrdn_dup_rdn_elem() by making it first copy just the fixed-size part of the raw rdn_elem to stack memory, use that to find the true size, and then copy the full element.
2) In libback-ldbm, _entryrdn_append_childidl() directly accesses members of the rdn_elem it gets back from DB_MULTIPLE_NEXT when appending to the IDL.
Fixed by making it duplicate the data to a buffer that is recycled each iteration.
3) In dbscan, display_entryrdn_item() retrieves rdn_elems with DB_MULTIPLE_NEXT and passes them to _entryrdn_dump_rdn_elem() for printing.
Fixed by duplicating it.
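The recycled-buffer fix in item 2 can be modeled like this. It is a standalone sketch with made-up names (rec_hdr, put_rec, sum_ids), not the server code: records sit back-to-back in one buffer, the way DB_MULTIPLE_NEXT hands them out, and each one is memcpy'd into a single reusable aligned buffer before its members are touched:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed header of each variable-sized record. */
typedef struct {
    uint32_t id;
    uint16_t len; /* payload bytes that follow the header */
} rec_hdr;

/* Append one record at dst; returns the number of bytes written. */
static size_t put_rec(unsigned char *dst, uint32_t id,
                      const void *payload, uint16_t len)
{
    rec_hdr hdr = { id, len };
    memcpy(dst, &hdr, sizeof hdr);
    memcpy(dst + sizeof hdr, payload, len);
    return sizeof hdr + len;
}

/* Walk the packed records, copying each into one work buffer that is
 * recycled across iterations and grown only when a record is bigger
 * than anything seen so far. */
static uint32_t sum_ids(const unsigned char *buf, size_t nbytes)
{
    unsigned char *work = NULL;
    size_t work_sz = 0;
    uint32_t sum = 0;
    const unsigned char *p = buf;

    while (p < buf + nbytes) {
        rec_hdr hdr;
        memcpy(&hdr, p, sizeof hdr); /* header may be unaligned */
        size_t rec = sizeof hdr + hdr.len;
        if (rec > work_sz) {
            unsigned char *tmp = realloc(work, rec);
            if (tmp == NULL)
                break; /* out of memory: bail out of the sketch */
            work = tmp;
            work_sz = rec;
        }
        memcpy(work, p, rec);         /* aligned copy */
        sum += ((rec_hdr *)work)->id; /* now safe to access members */
        p += rec;
    }
    free(work);
    return sum;
}
```

Because the second record starts at an odd offset whenever the first payload has an odd length, a direct cast at p would fault on a strict-alignment machine; copying first avoids both the realloc-per-item cost and the fault.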
Thanks for the patch, Ulf! Your fix looks good. I have one minor request. :) Please use pure C comment syntax for the commented-out line in https://bugzilla.redhat.com/attachment.cgi?id=433564&action=diff#a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c_sec4:

    2560 // myelem = NULL;

(But you don't have to leave the line, do you? Please just delete it. Thanks!)

Created attachment 433821 [details]
fix proposal with commented-out line removed
Thanks Noriko!
Oops, I use C++-style comments when I'm debugging since they're easier to type, and I intended to remove the line completely before submitting. I've attached an identical diff except with the commented line removed.
Why is it necessary to do a malloc/slapi_ch_realloc?

rdn_elem are not of fixed size; the nrdn/rdn strings trail behind it. So sizeof(rdn_elem) reports 12, but if the nrdn and rdn strings are "uid=foo\0", the true size of the rdn_elem is 12+8+8. I tried to reduce the number of reallocs by allocating 1.5x the space. Not sure if it's a good idea or not, but do we want to introduce ifdef HP-UX? (or possibly SOLARIS, as well?)

For _entryrdn_append_childidl() I could change it to use a stack buffer of decent size to start with and only switch to malloc'd memory if needed (e.g. a gigantic RDN comes along). Would that be preferable?

By the way, in dbscan I do malloc/free for each RDN, having assumed you'd prefer simple rather than more complex but high-performing code for this tool.

Is the problem this:

    typedef struct _rdn_elem {
        PRUint32 rdn_elem_id;
        PRUint16 rdn_elem_nrdn_len; /* including '\0' */
        PRUint16 rdn_elem_rdn_len;  /* including '\0' */
        char rdn_elem_nrdn_rdn[1];  /* "normalized rdn" '\0' "rdn" '\0' */
    } rdn_elem;

    #define RDN_ADDR(elem) ((elem)->rdn_elem_nrdn_rdn + (elem)->rdn_elem_nrdn_len)

Because rdn_elem_nrdn_rdn is not really a char *, the address computation and dereference of ((elem)->rdn_elem_nrdn_rdn + (elem)->rdn_elem_nrdn_len) fails? I understand the rationale for using this "trick", in order to perform only one memory alloc (instead of a separate one for a char *rdn_elem_nrdn_rdn) - is there a more portable way to do this?

I think the problem is rather that the size of rdn_elem is variable and not a multiple of 4 bytes. If we read from the db with the MULTIPLE flag, multiple rdn_elems are put into the buffer without padding. The starting point of each rdn_elem could be invalid on a strict-alignment system. If we could allocate rdn_elem always guaranteed to be aligned, then we don't have this problem (I hope...) Or we could change rdn_elem like this and convert ID and length in the entryrdn code. This might be more reasonable...

    typedef struct _rdn_elem {
        char rdn_elem_id[4];
        char rdn_elem_nrdn_len[2]; /* including '\0' */
        char rdn_elem_rdn_len[2];  /* including '\0' */
        char rdn_elem_nrdn_rdn[1]; /* "normalized rdn" '\0' "rdn" '\0' */
    } rdn_elem;

(Do we need to support the upgrade from 1.2.6.rc# to 1.2.6.rc#+1? It might be difficult...)

(In reply to comment #10)
> For _entryrdn_append_childidl() I could change it to use a stack buffer of
> decent size to start with and only switch to malloc'd memory if needed (e.g. a
> gigantic RDN comes along). Would that be preferable?
>
> By the way, in dbscan I do malloc/free for each RDN, having assumed you'd
> prefer simple rather than more complex but high-performing code for this tool.

If we're introducing mallocs to code that didn't have mallocs previously, I just want to understand why.

(In reply to comment #12)
> I think the problem is rather the size of rdn_elem is variable and not the
> multiple of 4-bytes. If we read from db with MULTIPLE flag, multiple rdn_elems
> are put into the buffer without padding. The starting point of each rdn_elem
> could be invalid on the strict alignment system.

Is there a flag to pass to db open or cursor creation or ??? so that DB_MULTIPLE_NEXT will return correctly aligned data?

Is the problem that in

    static size_t
    _entryrdn_rdn_elem_size(rdn_elem *elem)
    {
        size_t len = sizeof(rdn_elem);
        len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
        return len;
    }

the address of elem is not a multiple of 8? What if, when we stored the data, we rounded up the size of elem to the nearest multiple of the pointer size? Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned properly?

> If we could allocate rdn_elem always guaranteed to be aligned, then we don't
> have this problem (I hope...)

(In reply to comment #15)
> (In reply to comment #12)
> > I think the problem is rather the size of rdn_elem is variable and not the
> > multiple of 4-bytes.
> > If we read from db with MULTIPLE flag, multiple rdn_elems
> > are put into the buffer without padding. The starting point of each rdn_elem
> > could be invalid on the strict alignment system.
>
> Is there a flag to pass to db open or cursor creation or ??? so that
> DB_MULTIPLE_NEXT will return correctly aligned data?

DB_MULTIPLE_NEXT is a simple macro that advances the pointer in the buffer; it doesn't take any flags, nor does DB_MULTIPLE_INIT. I had hoped to find something for the initial dbc->c_get() call to combine with the DB_MULTIPLE flag when it creates the bulk buffer, but didn't spot anything.

> Is the problem that in
>
>     static size_t
>     _entryrdn_rdn_elem_size(rdn_elem *elem)
>     {
>         size_t len = sizeof(rdn_elem);
>         len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
>         return len;
>     }
>
> the address of elem is not a multiple of 8?

Yes. Also, _entryrdn_append_childidl() tries to access the rdn_elem_id and rdn_elem_nrdn_rdn members of a raw rdn_elem. Something similar is done in dbscan.

> What if, when we stored the data,
> we rounded up the size of elem to the nearest multiple of the pointer size?
> Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned
> properly?

I think that the variable-sized nrdn+rdn data will still skew the alignment of the next rdn_elem. If we could pad the end of the variable data it might work, but it seems fragile in that we'd be relying on undocumented Berkeley DB implementation details that could lay out the bulk buffer differently later.

(In reply to comment #16)
> (In reply to comment #15)
[...]
> > the address of elem is not a multiple of 8?
> Yes. Also, _entryrdn_append_childidl() tries to access the rdn_elem_id
> and rdn_elem_nrdn_rdn members of a raw rdn_elem.

I think as long as each rdn_elem's size is a multiple of 8, accessing the members should be okay.

> Something similar is done in dbscan.
As long as each datum in the db is aligned, accessing it in the MULTIPLE buffer should not cause any problem, I think. The point is we store aligned rdn_elem in the db...

> > What if, when we stored the data,
> > we rounded up the size of elem to the nearest multiple of the pointer size?
> > Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned
> > properly?
>
> I think that the variable-sized nrdn+rdn data will still skew the alignment of
> the next rdn_elem. If we could pad the end of the variable data it might work,
> but it seems fragile in that we'd be relying on undocumented Berkeley DB
> implementation that could lay out the bulk buffer differently later.

Berkeley DB does not care about the contents of the data, whether the last 1 ~ 7 bytes are used or not. All it cares about is that the size of data.data is data.size. Might be half baked, but could you experiment with this change on HP-UX, Ulf?

    diff --git a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c b/ldap/servers/slapd/b
    index dc00d33..7ceed20 100644
    --- a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
    +++ b/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
    @@ -1359,11 +1359,15 @@ _entryrdn_dup_rdn_elem(const void *raw, rdn_elem **new)
         memcpy(*new, raw, elem_len);
     }

    +#define ALIGNMENT4MULT 8
     static size_t
     _entryrdn_rdn_elem_size(rdn_elem *elem)
     {
         size_t len = sizeof(rdn_elem);
         len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
    +    /* round up the size of elem to the nearest multiple of the pointer
    +     * size */
    +    len = ((len + ALIGNMENT4MULT) & ~(ALIGNMENT4MULT - 1));
         return len;
     }

It would be interesting to see if that makes the problem go away. That's what I meant - pad the data. I guess it does seem fragile, but I don't understand why using regular put/get returns aligned data, and why using DB_MULTIPLE returns unaligned data.
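As an aside on the rounding expression: the conventional round-up-to-a-power-of-two idiom adds ALIGNMENT - 1 before masking; adding the full ALIGNMENT (as the experimental hunk above does) still yields an aligned size, but pads an extra 8 bytes when len is already a multiple of 8. A standalone sketch of the conventional form:

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT 8 /* must be a power of two for the mask trick */

/* Round len up to the nearest multiple of ALIGNMENT. Adding
 * ALIGNMENT - 1 (not ALIGNMENT) leaves already-aligned sizes alone. */
static size_t round_up(size_t len)
{
    return (len + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}
```

For example, round_up(12) is 16 and round_up(8) stays 8, whereas (8 + ALIGNMENT) & ~(ALIGNMENT - 1) would give 16.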
We're doing almost the exact same thing as is recommended in the BDB documentation:

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc_struct.html

That doesn't differentiate between single get() and DB_MULTIPLE get(). I haven't been able to find anything about the implementation details, e.g.:
* If we put aligned, padded data with put(), can we be guaranteed of reading properly aligned data with get() and DB_MULTIPLE get()?
* If it stores/retrieves unaligned data by default, is there some flag we can pass to make sure put() stores it aligned and get() reads it aligned?

The changelog code uses DB_MULTIPLE, but it uses the marshalling/unmarshalling method mentioned on that bdb doc page, so I suppose we can never run into this problem. If there is no way to guarantee that DB_MULTIPLE returns aligned data, I suggest that we just change the rdn code to use the marshall/unmarshall method instead, which seems far safer.

Note: http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc.html

"The Berkeley DB access methods provide no guarantees about byte alignment for returned key/data pairs, or callback functions which take DBT references as arguments, and applications are responsible for arranging any necessary alignment. The DB_DBT_MALLOC, DB_DBT_REALLOC, and DB_DBT_USERMEM flags may be used to store returned items in memory of arbitrary alignment."

which seems in direct opposition to the information on the am_misc_struct.html page.

On 07/23/2010 10:14 AM, bugzilla wrote:
Comment #18 from Rich Megginson <rmeggins> 2010-07-23 13:14:24 EDT
> It would be interesting to see if that makes the problem go away.
>
> That's what I meant - pad the data. I guess it does seem fragile, but I don't
> understand why using regular put/get returns aligned data, and why using
> DB_MULTIPLE returns unaligned data.

Using regular get, each retrieved datum is put in a USER_MEM buffer or internally allocated memory.
I think the starting address of the data is the issue. If using DB_MULTIPLE, various sized data are put into one big buffer contiguously.

> We're doing almost the exact same thing as
> is recommended in the BDB documentation:
>
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc_struct.html
>
> That doesn't differentiate between single get() and DB_MULTIPLE get(). I
> haven't been able to find anything about the implementation details e.g.
> * If we put aligned, padded data with put(), can we be guaranteed of reading
> properly aligned data with get() and DB_MULTIPLE get()?

I believe so. Again, "padded data" is ours. For libdb, it's part of the data.

> * If it stores/retrieves unaligned data by default, is there some flag we can
> pass to make sure put() stores it aligned and get() reads it aligned?

Ah, that's a good idea... Only when the size of the data is not a multiple of 8 would we copy it to malloc'ed memory, as Ulf did. Then we don't have to worry about the migration!

> The changelog code uses DB_MULTIPLE, but it uses the marshalling/unmarshalling
> method mentioned on that bdb doc page, so I suppose we can never run into this
> problem. If there is no way to guarantee that DB_MULTIPLE returns aligned
> data, I suggest that we just change the rdn code to use the marshall/unmarshall
> method instead, which seems far safer.

Do you prefer this way? We could switch to it, too.

> Note:
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc.html
>
> "The Berkeley DB access methods provide no guarantees about byte alignment for
> returned key/data pairs, or callback functions which take DBT references as
> arguments, and applications are responsible for arranging any necessary
> alignment. The DB_DBT_MALLOC, DB_DBT_REALLOC, and DB_DBT_USERMEM flags may be
> used to store returned items in memory of arbitrary alignment."
> which seems in direct opposition to the information on the am_misc_struct.html
> page.

It seems safest to use the marshall/unmarshall technique. However, I would really like to see what happens if we explicitly pad the data to make it aligned.

(In reply to comment #20)
> It seems safest to use the marshall/unmarshall technique. However, I would
> really like to see what happens if we explicitly pad the data to make it
> aligned.

Agreed! Let me "take" this bug. ;)

I've tested padding but it doesn't seem to work. The very first item returned from the bulk retrieval with DB_MULTIPLE_NEXT is misaligned:

    2470            DB_MULTIPLE_INIT(ptr, &data);
    (gdb) n
    2471            do {
    (gdb) n
    2472                memset(&dataret, 0, sizeof(dataret));
    (gdb) n
    2473                DB_MULTIPLE_NEXT(ptr, &data, dataret.data, dataret.size);
    (gdb) n
    2474                if (NULL == dataret.data || NULL == ptr) {
    (gdb) print dataret.data
    $4 = (void *) 0x9fffffffed2acd87
    (gdb) print &((rdn_elem*)dataret.data)->rdn_elem_id
    $5 = (unsigned int *) 0x9fffffffed2acd87

Thanks to Ulf for testing the change. All right, we really need to "marshall" the entryrdn data.

Created attachment 434086 [details]
git patch file (master)
Fix description:
Use the marshall/unmarshall technique for the entryrdn index data.
Introduced sizeushort_internal_to_stored/sizeushort_stored_to_internal
to store the size data in 2 bytes.
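A rough illustration of what such marshalling helpers look like. The names follow the patch, but these bodies are an assumption for illustration, not the code from attachment 434086: sizes are packed into plain char bytes one at a time, so no load or store ever dereferences a possibly unaligned 16-bit pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 16-bit size into two bytes, low byte first. Byte-wise stores
 * never require alignment, so buf can point anywhere. (Illustrative
 * bodies only; the real helpers live in the 389 patch.) */
static void sizeushort_internal_to_stored(uint16_t val, char *buf)
{
    buf[0] = (char)(val & 0xff);
    buf[1] = (char)((val >> 8) & 0xff);
}

/* Unpack the two bytes back into a 16-bit size. */
static uint16_t sizeushort_stored_to_internal(const char *buf)
{
    return (uint16_t)((unsigned char)buf[0] |
                      ((unsigned)(unsigned char)buf[1] << 8));
}
```

Marshalling also fixes the byte order of the stored index data, which is why a database written before this change has to be exported and re-imported, as discussed below.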
Ulf, could you apply this patch and try on HP-UX? I ran several test cases and they passed 100%.
Great, works for me, I tested on HP-UX on ia64. Thanks Noriko!

dbscan.c needs updates too, right? Some minor comments:

ldbm_entryrdn.c:
- newid is set to id_stored_to_internal(newelem->rdn_elem_id), but the ID in newelem was originally passed into the function, so you could just do newid = id so its origin is clear.
- In two instances ID is written into keybuf using slapi_ch_smprintf with format %u, in others with %d.

nextid.c:
- In sizeushort_internal_to_stored variable ui is not used.

Created attachment 434877 [details]
revised git patch file (master)

Fix description:
Use the marshall/unmarshall technique for the entryrdn index data.
Introduced sizeushort_internal_to_stored/sizeushort_stored_to_internal
to store the size data in 2 bytes. Entryrdn related functions in the
dbscan utility are also modified to support marshalled data.
Reviewed by ulf.weltman, and fixed bugs found by him.

Brief discussion about the upgrade:
> My concern is if we change the entryrdn index representation, community
> users who already installed 389 v1.2.6.a# or v1.2.6.rc# should do
> export/import when upgrading to the next release. Do you think it's
> acceptable?

I think that might be acceptable since we haven't released the official 1.2.6. We'd need to be sure to let people know about it in case they are upgrading from a pre-release to the official bits though.

Updated the design doc:
http://directory.fedoraproject.org/wiki/Subtree_Rename#Entryrdn_index

Pushed to master:

    $ git push
    Counting objects: 33, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (20/20), done.
    Writing objects: 100% (20/20), 4.20 KiB, done.
    Total 20 (delta 16), reused 0 (delta 0)
    To ssh://git.fedorahosted.org/git/389/ds.git
       5ae0c1b..666873d  master -> master

Pushed to 389-ds-base-1.2.6:

    $ git push origin ds126-local:389-ds-base-1.2.6
    Counting objects: 33, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (20/20), done.
    Writing objects: 100% (20/20), 4.28 KiB, done.
    Total 20 (delta 16), reused 0 (delta 0)
    To ssh://git.fedorahosted.org/git/389/ds.git
       af5c96e..d67bb39  ds126-local -> 389-ds-base-1.2.6

I added the upgrade steps from the non-marshalled entryrdn to the marshalled entryrdn. If the 389 rc# info is not accurate, please let me know:
http://directory.fedoraproject.org/wiki/Subtree_Rename#warning:_upgrade_from_389_v1.2.6_.28a.3F.2C_rc1_.7E_rc6.29_to_v1.2.6_rc7_or_newer

Mark as Verified Upstream.