Bug 616608

Summary: SIGBUS in RDN index reads on platforms with strict alignments
Product: [Retired] 389 Reporter: Ulf Weltman <ulf.weltman>
Component: Directory ServerAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED CURRENTRELEASE QA Contact: Viktor Ashirov <vashirov>
Severity: medium Docs Contact:
Priority: low    
Version: 1.2.6CC: amsharma, nhosoi, nkinder, rmeggins
Target Milestone: ---Keywords: VerifiedUpstream
Target Release: ---   
Hardware: ia64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-07 17:11:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 543590, 639035    
Attachments:
Description Flags
patch proposal
none
new patch proposal
none
new patch proposal with additional fixes
none
fix proposal with commented-out line removed
nhosoi: review+
git patch file (master)
none
revised git patch file (master) nkinder: review+

Description Ulf Weltman 2010-07-20 23:02:21 UTC
Program received signal SIGBUS, Bus error
  si_code: 1 - BUS_ADRALN - Invalid address alignment.
[Switching to thread 13 (system thread 161253)]
0x9fffffffed2105d0:0 in _entryrdn_rdn_elem_size (elem=0x9fffffffed11456f)
    at ../ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:1366

1366        len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;


len is a size_t while the rdn_elem_rdn_len and rdn_elem_nrdn_len members of rdn_elem struct are PRUint16.

Comment 1 Ulf Weltman 2010-07-20 23:04:26 UTC
Created attachment 433277 [details]
patch proposal

Comment 2 Ulf Weltman 2010-07-21 01:43:34 UTC
Please hold off on this, I was too hasty.  It looks more like corruption, the address for the rdn_elem struct doesn't look right.

Comment 3 Ulf Weltman 2010-07-21 18:19:04 UTC
Created attachment 433471 [details]
new patch proposal

MODDN produces SIGBUS when strict alignments are enforced because data read from the rdn index is cast to an an rdn_elem struct which is not aligned properly.  To fix this I copy just the first part of the raw rdn_elem to stack memory and use that to find the true size.

Comment 4 Ulf Weltman 2010-07-22 01:07:50 UTC
Created attachment 433564 [details]
new patch proposal with additional fixes

The RDN index stores rdn_elem structs.  In some places dbc->c_get() is used with DB_MULTIPLE flag for bulk retrieval of rdn_elem structs.  The Berkeley DB macro DB_MULTIPLE_NEXT is used to iterate over and return a pointer into the buffer for each item.  These pointers might not be for aligned memory so when the data is cast to a rdn_elem struct and the members are accessed a SIGBUS occurs.

I've found these affected areas:
1) In libback-ldbm, _entryrdn_index_read() uses _entryrdn_dup_rdn_elem() to duplicate what it gets back with DB_MULTIPLE_NEXT.
Fixed _entryrdn_dup_rdn_elem() by making it start out by copying just the first part of the raw rdn_elem to stack memory and use that to find the true size so it can proceed with copying the full thing.
2) In libback-ldbm, _entryrdn_append_childidl() tries to access members of the rdn_elem that it gets back from DB_MULTIPLE_NEXT directly for appending to the IDL.
Fixed by making it duplicate the data to a buffer that is recycled each iteration.
3) In dbscan, display_entryrdn_item() retrieves rdn_elem with DB_MULTIPLE_NEXT and passes them to _entryrdn_dump_rdn_elem() for printing.
Fixed by duplicating it.

Comment 5 Noriko Hosoi 2010-07-22 21:51:33 UTC
Thanks for the patch, Ulf!  Your fix looks good.

I have one minor request. :) Please use pure C comment syntax for the commented out line in https://bugzilla.redhat.com/attachment.cgi?id=433564&action=diff#a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c_sec4:
2560  	//  myelem = NULL;"
(But you don't have to leave the line, do you?  Please just delete it.  Thanks!)

Comment 6 Ulf Weltman 2010-07-22 22:05:50 UTC
Created attachment 433821 [details]
fix proposal with commented-out line removed

Thanks Noriko!
Oops, I use the C++-style comments when I'm debugging since it's easier to type and intended to remove the line completely before submitting.  I've attached an identical diff except with the commented line removed.

Comment 7 Rich Megginson 2010-07-22 23:01:42 UTC
Why is it necessary to do a malloc/slapi_ch_realloc?

Comment 8 Ulf Weltman 2010-07-22 23:13:00 UTC
rdn_elem are not of fixed size, the nrdn/rdn strings trail behind it.  So sizeof(rdn_elem) reports 12, but if the nrdn and rdn strings are "uid=foo\0", the true size of rdn_elem is 12+8+8.  I tried to reduce the number of reallocs by allocating 1.5x the space.

Comment 9 Noriko Hosoi 2010-07-22 23:18:48 UTC
Not sure if it's a good idea or not, but do we want to introduce ifdef HP-UX? (or possibly SOLARIS, as well?)

Comment 10 Ulf Weltman 2010-07-22 23:41:33 UTC
For _entryrdn_append_childidl() I could change it to use a stack buffer of decent size to start with and only switch to malloc'd memory if needed (e.g. a gigantic RDN comes along).  Would that be preferable?

By the way, in dbscan I do malloc/free for each RDN, having assumed you'd prefer simple rather than more complex but high-performing code for this tool.

Comment 11 Rich Megginson 2010-07-22 23:42:16 UTC
Is the problem this:
typedef struct _rdn_elem {
    PRUint32 rdn_elem_id;
    PRUint16 rdn_elem_nrdn_len; /* including '\0' */
    PRUint16 rdn_elem_rdn_len;  /* including '\0' */
    char rdn_elem_nrdn_rdn[1];  /* "normalized rdn" '\0' "rdn" '\0' */
} rdn_elem;

#define RDN_ADDR(elem) ((elem)->rdn_elem_nrdn_rdn + (elem)->rdn_elem_nrdn_len)

Because rdn_elem_nrdn_rdn is not really a char *, the address computation and
dereference of ((elem)->rdn_elem_nrdn_rdn + (elem)->rdn_elem_nrdn_len) fails? 
I understand the rationale for using this "trick", in order to perform only one
memory alloc (instead of a separate one for a char *rdn_elem_nrdn_rdn) - is
there a more portable way to do this?

Comment 12 Noriko Hosoi 2010-07-23 00:20:49 UTC
I think the problem is rather the size of rdn_elem is variable and not the multiple of 4-bytes.  If we read from db with MULTIPLE flag, multiple rdn_elems are put into the buffer without pudding.  The starting point of each rdn_elem could be invalid on the strict alignment system.

If we could allocate rdn_elem always guaranteed to be aligned, then we don't have this problem (I hope...)

Comment 13 Noriko Hosoi 2010-07-23 00:33:20 UTC
Or we could change rdn_elem like this and convert ID and length in the entryrdn code.  This might be more reasonable...
typedef struct _rdn_elem {
    char rdn_elem_id[4];
    char rdn_elem_nrdn_len[2]; /* including '\0' */
    char rdn_elem_rdn_len[2];  /* including '\0' */
    char rdn_elem_nrdn_rdn[1];  /* "normalized rdn" '\0' "rdn" '\0' */
} rdn_elem;

(Do we need to support the upgrade from 1.2.6.rc# to 1.2.6.rc#+1?  It might be difficult...)

Comment 14 Rich Megginson 2010-07-23 02:56:51 UTC
(In reply to comment #10)
> For _entryrdn_append_childidl() I could change it to use a stack buffer of
> decent size to start with and only switch to malloc'd memory if needed (e.g. a
> gigantic RDN comes along).  Would that be preferable?
> 
> By the way, in dbscan I do malloc/free for each RDN, having assumed you'd
> prefer simple rather than more complex but high-performing code for this tool.    

If we're introducing mallocs to code that didn't have mallocs previously, I just want to understand why.

Comment 15 Rich Megginson 2010-07-23 03:31:14 UTC
(In reply to comment #12)
> I think the problem is rather the size of rdn_elem is variable and not the
> multiple of 4-bytes.  If we read from db with MULTIPLE flag, multiple rdn_elems
> are put into the buffer without pudding.  The starting point of each rdn_elem
> could be invalid on the strict alignment system.

Is there a flag to pass to db open or cursor creation or ??? so that DB_MULTIPLE_NEXT will return correctly aligned data?

Is the problem that in 

static size_t
_entryrdn_rdn_elem_size(rdn_elem *elem)
{
    size_t len = sizeof(rdn_elem);
    len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
    return len;
}

the address of elem is not a multiple of 8?  What if, when we stored the data, we rounded up the size of elem to the nearest multiple of the pointer size?  Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned properly?

> 
> If we could allocate rdn_elem always guaranteed to be aligned, then we don't
> have this problem (I hope...)

Comment 16 Ulf Weltman 2010-07-23 16:52:33 UTC
(In reply to comment #15)
> (In reply to comment #12)
> > I think the problem is rather the size of rdn_elem is variable and not the
> > multiple of 4-bytes.  If we read from db with MULTIPLE flag, multiple rdn_elems
> > are put into the buffer without pudding.  The starting point of each rdn_elem
> > could be invalid on the strict alignment system.
> 
> Is there a flag to pass to db open or cursor creation or ??? so that
> DB_MULTIPLE_NEXT will return correctly aligned data?
DB_MULTPLE_NEXT is a simple macro that advances the pointer in the buffer, it doesn't take any flags, nor does DB_MULTIPLE_INIT.  I had hoped to find something for the initial dbc->getc() call to combine with the DB_MULTIPLE flag when it creates the bulk buffer but didn't spot anything.

> 
> Is the problem that in 
> 
> static size_t
> _entryrdn_rdn_elem_size(rdn_elem *elem)
> {
>     size_t len = sizeof(rdn_elem);
>     len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
>     return len;
> }
> 
> the address of elem is not a multiple of 8?
Yes.  Also in _entryrdn_append_childidl() it tries to access the rdn_elem_id and rdn_elem_nrdn_rdn members of a raw rdn_elem.  Something similar is done in dbscan.

>  What if, when we stored the data,
> we rounded up the size of elem to the nearest multiple of the pointer size? 
> Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned
> properly?
I think that the variable-sized nrdn+rdn data will still skew the alignment of the next rdn_elem.  If we could pad the end of the variable data it might work, but it seems fragile in that we'd be relying on undocumented Berkeley DB implementation that could lay out the bulk buffer differently later.

Comment 17 Noriko Hosoi 2010-07-23 17:11:43 UTC
(In reply to comment #16)
> (In reply to comment #15)
[...]
> > the address of elem is not a multiple of 8?
> Yes.  Also in _entryrdn_append_childidl() it tries to access the rdn_elem_id
> and rdn_elem_nrdn_rdn members of a raw rdn_elem. 

I think as long as each rdn_elem's size is a multiple of 8, accessing the members should be okay.

> Something similar is done in dbscan.

As long as each datum in db is aligned, accessing it in the MULTIPLE buffer should not cause any problem, I think.

The point is we store aligned rdn_elem in the db...
>
> >  What if, when we stored the data,
> > we rounded up the size of elem to the nearest multiple of the pointer size?
> > Would that guarantee that DB_MULTIPLE_NEXT would return the data aligned
> > properly?
> I think that the variable-sized nrdn+rdn data will still skew the alignment of
> the next rdn_elem.  If we could pad the end of the variable data it might work,
> but it seems fragile in that we'd be relying on undocumented Berkeley DB
> implementation that could lay out the bulk buffer differently later.   

Berkeley DB does not care the contents of data whether the last 1 ~ 7 bytes are used or not.  All it cares is the size of data.data is data.size.

Might be half baked, but could you experiment this change on HP-UX, Ulf?
diff --git a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c b/ldap/servers/slapd/b
index dc00d33..7ceed20 100644
--- a/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
+++ b/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
@@ -1359,11 +1359,15 @@ _entryrdn_dup_rdn_elem(const void *raw, rdn_elem **new)
     memcpy(*new, raw, elem_len);
 }
 
+#define ALIGNMENT4MULT 8
 static size_t
 _entryrdn_rdn_elem_size(rdn_elem *elem)
 {
     size_t len = sizeof(rdn_elem);
     len += elem->rdn_elem_rdn_len + elem->rdn_elem_nrdn_len;
+    /* we rounded up the size of elem to the nearest multiple of the pointer
+     * size*/
+    len = ((len + ALIGNMENT4MULT) & ~(ALIGNMENT4MULT - 1));
     return len;
 }

Comment 18 Rich Megginson 2010-07-23 17:14:24 UTC
It would be interesting to see if that makes the problem go away.

That's what I meant - pad the data.  I guess it does seem fragile, but I don't
understand why using regular put/get returns aligned data, and why using
DB_MULTIPLE returns unaligned data.  We're doing almost the exact same thing as
is recommended in the BDB documentation:

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc_struct.html

That doesn't differentiate between single get() and DB_MULTIPLE get().  I
haven't been able to find anything about the implementation details e.g.
* If we put aligned, padded data with put(), can we be guaranteed of reading
properly aligned data with get() and DB_MULTIPLE get()?
* If it stores/retrieves unaligned data by default, is there some flag we can
pass to make sure put() stores it aligned and get() reads it aligned?

The changelog code uses DB_MULTIPLE, but it uses the marshalling/unmarshalling
method mentioned on that bdb doc page, so I suppose we can never run into this
problem.  If there is no way to guarantee that DB_MULTIPLE returns aligned
data, I suggest that we just change the rdn code to use the marshall/unmarshall
method instead, which seems far safer.

Note: http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc.html

"The Berkeley DB access methods provide no guarantees about byte alignment for
returned key/data pairs, or callback functions which take DBT references as
arguments, and applications are responsible for arranging any necessary
alignment. The DB_DBT_MALLOC, DB_DBT_REALLOC, and DB_DBT_USERMEM flags may be
used to store returned items in memory of arbitrary alignment."

which seems in direct opposition to the information on the am_misc_struct.html
page.

Comment 19 Noriko Hosoi 2010-07-23 17:42:20 UTC
On 07/23/2010 10:14 AM, bugzilla wrote:
Comment #18 from Rich Megginson <rmeggins> 2010-07-23 13:14:24 EDT
> It would be interesting to see if that makes the problem go away.
>
> That's what I meant - pad the data.  I guess it does seem fragile, but I don't
> understand why using regular put/get returns aligned data, and why using
> DB_MULTIPLE returns unaligned data.  

Using regular get, each retrieved datum is put in a USER_MEM buffer or internally allocated memory.  I think the starting address of data is an issue.  If using DB_MULTIPLE, various sized data are put into one big buffer contiguously.

> We're doing almost the exact same thing as
> is recommended in the BDB documentation:
>
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc_struct.html
>
> That doesn't differentiate between single get() and DB_MULTIPLE get().  I
> haven't been able to find anything about the implementation details e.g.
> * If we put aligned, padded data with put(), can we be guaranteed of reading
> properly aligned data with get() and DB_MULTIPLE get()?
I believe so.  Again, "padded data" is ours.  For libdb, it's part of data.
> * If it stores/retrieves unaligned data by default, is there some flag we can
> pass to make sure put() stores it aligned and get() reads it aligned?

Ah, that's a good idea...  Only when the size of data is not a multiple of 8, we could copy it to malloc'ed memory as Ulf did.  Then, we don't have to worry about the migration!

> The changelog code uses DB_MULTIPLE, but it uses the marshalling/unmarshalling
> method mentioned on that bdb doc page, so I suppose we can never run into this
> problem.  If there is no way to guarantee that DB_MULTIPLE returns aligned
> data, I suggest that we just change the rdn code to use the marshall/unmarshall
> method instead, which seems far safer.

Do you prefer this way?  We could switch to it, too.

> Note:
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_misc.html
>
> "The Berkeley DB access methods provide no guarantees about byte alignment for
> returned key/data pairs, or callback functions which take DBT references as
> arguments, and applications are responsible for arranging any necessary
> alignment. The DB_DBT_MALLOC, DB_DBT_REALLOC, and DB_DBT_USERMEM flags may be
> used to store returned items in memory of arbitrary alignment."
>
> which seems in direct opposition to the information on the am_misc_struct.html
> page.
>

Comment 20 Rich Megginson 2010-07-23 17:52:18 UTC
It seems safest to use the marshall/unmarshall technique.  However, I would really like to see what happens if we explicitly pad the data to make it aligned.

Comment 21 Noriko Hosoi 2010-07-23 17:59:48 UTC
(In reply to comment #20)
> It seems safest to use the marshall/unmarshall technique.  However, I would
> really like to see what happens if we explicitly pad the data to make it
> aligned.    

Agreed!  Let me "take" this bug. ;)

Comment 22 Ulf Weltman 2010-07-23 18:39:23 UTC
I've tested padding but it doesn't seem to work.  The very first item returned from the bulk retrieval with DB_MULTIPLE_NEXT is misaligned:
2470                DB_MULTIPLE_INIT(ptr, &data);
(gdb) n
2471                do {
(gdb) n
2472                    memset(&dataret, 0, sizeof(dataret));
(gdb) n
2473                    DB_MULTIPLE_NEXT(ptr, &data, dataret.data, dataret.size);
(gdb) n
2474                    if (NULL == dataret.data || NULL == ptr) {
(gdb) print dataret.data
$4 = (void *) 0x9fffffffed2acd87
(gdb) print &((rdn_elem*)dataret.data)->rdn_elem_id
$5 = (unsigned int *) 0x9fffffffed2acd87

Comment 23 Noriko Hosoi 2010-07-23 20:16:32 UTC
Thanks to Ulf for testing the change.  All right.  We really need to "marshall" the entryrdn data.

Comment 24 Noriko Hosoi 2010-07-23 23:28:07 UTC
Created attachment 434086 [details]
git patch file (master)

Fix description:
Use the marshall/unmarshall technique for the entryrdn index data.
Introduced sizeushort_internal_to_stored/sizeushort_stored_to_internal
for the size data to store in 2 bytes.

Ulf, could you apply this patch and try on HP-UX?  I ran several test cases and they passed 100%.

Comment 25 Ulf Weltman 2010-07-26 21:20:10 UTC
Great, works for me, I tested on HP-UX on ia64.  Thanks Noriko!  dbscan.c needs updates too, right?

Some minor comments:
ldbm_entryrdn.c:
newid is set to id_stored_to_internal(newelem->rdn_elem_id), but the ID in newelem was originally passed into the function so you could just do newid = id so its origin is clear.
In two instances ID is written into keybuf using slapi_ch_smprintf with format %u, in others with %d.

nextid.c:
In sizeushort_internal_to_stored variable ui is not used.

Comment 26 Noriko Hosoi 2010-07-27 22:49:06 UTC
Created attachment 434877 [details]
revised git patch file (master)

Fix description:
Use the marshall/unmarshall technique for the entryrdn index data.
Introduced sizeushort_internal_to_stored/sizeushort_stored_to_internal
for the size data to store in 2 bytes.

Entryrdn related functions in the dbscan utility are also modified
to support marshalled data.

Reviewed by ulf.weltman and fixed bugs found by him.

Brief discussion about the upgrade:
> My concern is if we change the entryrdn index representation, community 
> users who already installed 389 v1.2.6.a# or v1.2.6.rc# should do 
> export/import when upgrading to the next release.  Do you think it's 
> acceptable?

I think that might be acceptable since we haven't released the official 1.2.6.  We'd need to be sure to let people know about it in case they are upgrading from a pre-release to the official bits though.

Comment 27 Noriko Hosoi 2010-07-27 23:46:47 UTC
Updated the design doc:
http://directory.fedoraproject.org/wiki/Subtree_Rename#Entryrdn_index

Comment 28 Noriko Hosoi 2010-07-30 23:14:14 UTC
Pushed to master:

$ git push
Counting objects: 33, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (20/20), 4.20 KiB, done.
Total 20 (delta 16), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   5ae0c1b..666873d  master -> master

Pushed to 389-ds-base-1.2.6:

$ git push origin ds126-local:389-ds-base-1.2.6
Counting objects: 33, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (20/20), 4.28 KiB, done.
Total 20 (delta 16), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   af5c96e..d67bb39  ds126-local -> 389-ds-base-1.2.6

I added the upgrade steps from the non-marshalled entryrdn to marshalled entryrdn.  If the 389 rc# info is not accurate, please let me know:
http://directory.fedoraproject.org/wiki/Subtree_Rename#warning:_upgrade_from_389_v1.2.6_.28a.3F.2C_rc1_.7E_rc6.29_to_v1.2.6_rc7_or_newer

Comment 30 Rich Megginson 2011-07-08 13:33:13 UTC
Mark as Verified Upstream.