Bug 441878

Summary: more unaligned access messages from libnl during install
Product: Red Hat Enterprise Linux 5
Reporter: Doug Chapman <dchapman>
Component: libnl
Assignee: Dan Williams <dcbw>
Status: CLOSED NOTABUG
QA Contact: desktop-bugs <desktop-bugs>
Severity: high
Priority: high
Version: 5.2
CC: dwa, tgraf
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-04-10 18:34:36 UTC

Description Doug Chapman 2008-04-10 17:05:37 UTC
Description of problem:
Please note that this is in addition to the issue fixed by the patch in bug
434826.  Since the only way to test the fix was to try an install with an
anaconda image built from the new libnl, I was unable to test this until the
previous fix was pulled into RHEL5.2, which didn't happen until last night.

That patch does make things better, but we still hit unaligned accesses during
stage1 anaconda:

loader(761): unaligned access to 0x60000000001a2b54, ip=0x40000000000e4420
loader(761): unaligned access to 0x60000000001a2b5c, ip=0x40000000000e4430
loader(761): unaligned access to 0x60000000001a2d3c, ip=0x40000000000e43e0
loader(761): unaligned access to 0x60000000001a2d44, ip=0x40000000000e4420


I dug through the binaries, did some disassembly, and found that these ip
values resolve to these lines of code in link_msg_parser():

lib/route/link.c:289
                link->l_mask |= LINK_ATTR_MAP;

lib/route/link.c:293
                link->l_master = nla_get_u32(tb[IFLA_MASTER]);

lib/route/link.c:294
                link->l_mask |= LINK_ATTR_MASTER;

I have confirmed the offsets of l_mask and l_master are properly aligned, so it
would appear the problem is that link itself is not aligned properly.

I manually traced the code to figure out how "link" is allocated:

lib/route/link.c:188
        link = rtnl_link_alloc();

lib/route/link.c:599
        return (struct rtnl_link *) nl_object_alloc_from_ops(&rtnl_link_ops);


lib/object.c:70
        new = nl_object_alloc(ops->co_size);

lib/object.c:49
        new = calloc(1, size);


I am not sure at all how this is possible.  Unless there is some way to trick
it, calloc should never return an unaligned pointer.  Perhaps my assumption is
wrong here.  I need to find a way to reproduce this outside of anaconda.


Version-Release number of selected component (if applicable):
libnl-1.0-0.10.pre5.5

How reproducible:
100%

Steps to Reproduce:
1. Do a network-based install on ia64.
2. Note the unaligned access messages during anaconda stage1.

Comment 1 Dan Williams 2008-04-10 17:13:16 UTC
needinfo on tgraf...

Comment 2 Dan Williams 2008-04-10 17:14:54 UTC
Thomas, the only change in libnl-1.0-0.10.pre5.5 is the addition of the patch
that Doug attached to the original libnl unaligned access bug.  It's based on
libnl-1.0-pre5 with a few patches.

Comment 3 Doug Chapman 2008-04-10 17:58:45 UTC
I am continuing to debug this issue...

I was able to run anaconda's stage one "/sbin/loader" in userspace, so I built
my own copy of anaconda to do some debugging.  With my binary I do _not_ see
the unaligned accesses, but if I run the original binary I do see them.

I need to check with release engineering to see if it is possible that last
night's anaconda was somehow still built with the older version of libnl.  It
is statically linked, so I don't know how to check this from the binary (if
someone has a suggestion, please let me know).


Comment 4 Doug Chapman 2008-04-10 18:34:36 UTC
Ah HA!

Anaconda stage 1 is statically linked, and it has not been rebuilt since libnl
was fixed.  This explains why the stage2 messages are resolved: that part of
anaconda is dynamically linked.

Sorry for the noise.