Bug 434826
Summary: unaligned access warnings during install from libnl
Product: Red Hat Enterprise Linux 5
Reporter: Doug Chapman <dchapman>
Component: libnl
Assignee: Dan Williams <dcbw>
Status: CLOSED ERRATA
QA Contact: desktop-bugs <desktop-bugs>
Severity: high
Priority: high
Version: 5.2
CC: atodorov, dwa, gbeshers, prarit, rick.hester, tgraf
Target Milestone: rc
Keywords: Regression
Hardware: ia64
OS: Linux
Fixed In Version: RHBA-2008-0457
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:28:53 UTC
Bug Depends On: 441922
Bug Blocks: 409971
Description
Doug Chapman
2008-02-25 18:11:08 UTC
Created attachment 295861 [details]
simple test code to reproduce unaligned access warnings
I did some additional digging and was surprised to see libnl has not changed
since RHEL5.1. It turns out that the reason we are seeing these unaligned
warnings is because anaconda is making use of some libnl functionality that it
didn't use before.
I wrote a simple standalone program to do the same steps anaconda is doing at
the point of the warnings. That code is attached. It is taken nearly directly
from the anaconda source.
Running the code on an ia64 system will generate the following on the console:
a.out(4542): unaligned access to 0x60000000000042ac, ip=0x200000000007b2d0
a.out(4542): unaligned access to 0x60000000000042b4, ip=0x200000000007b310
a.out(4542): unaligned access to 0x60000000000042bc, ip=0x200000000007b320
a.out(4542): unaligned access to 0x60000000000045a4, ip=0x200000000007b2d0
a.out(4542): unaligned access to 0x60000000000045ac, ip=0x200000000007b310
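A quick arithmetic check on the faulting addresses above: every one of them is a multiple of 4 but leaves a remainder of 4 when divided by 8. That pattern is consistent with 8-byte loads from data that is only 4-byte aligned, although the log alone does not prove the access width. The sketch below only does that arithmetic; the helper names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Faulting addresses copied from the console output above. */
static const uint64_t logged_addrs[] = {
    0x60000000000042acULL,
    0x60000000000042b4ULL,
    0x60000000000042bcULL,
    0x60000000000045a4ULL,
    0x60000000000045acULL,
};

/* Byte offset of addr from the previous multiple of size (0 means aligned).
 * Hypothetical helper, not part of libnl or anaconda. */
unsigned misalignment(uint64_t addr, unsigned size)
{
    return (unsigned)(addr % size);
}

/* Print each logged address with its offset from 4- and 8-byte alignment. */
void report_logged_addrs(void)
{
    size_t i;

    for (i = 0; i < sizeof logged_addrs / sizeof logged_addrs[0]; i++)
        printf("%#llx: mod 4 = %u, mod 8 = %u\n",
               (unsigned long long)logged_addrs[i],
               misalignment(logged_addrs[i], 4),
               misalignment(logged_addrs[i], 8));
}
```

Calling report_logged_addrs() from a small main() prints one line per address.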
It's a pretty old build of libnl, though newer versions break ABI compat, so we must tread carefully here. Thomas, any ideas? Doug, any chance you could get a backtrace of the spot where some of the accesses happen?

Created attachment 295933 [details]
stack trace of one of the libnl unaligned access warnings
Here is a sample stack trace. Note that this was using a slightly hacked up
version of libnl. Since the unaligned trap doesn't give a stack trace what I
had to do was determine where the access was happening via the ip, then add my
own code to libnl to do my own unaligned check at that point and then sleep
there to allow me to attach gdb and get the stack trace.
One thing in the stack confused me at first: level 11 of the stack, which is
at nl.c:199, is the nl.c that is part of anaconda, not libnl. Level 10 of the
stack is the actual entry into libnl code (at nl.c:49).
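The debugging trick described in this comment (check the pointer yourself at the suspect location, then sleep so gdb can be attached) could look roughly like the sketch below. This is not the code that was actually added to libnl, which was not kept; the function name, message text, and sleep parameter are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical debug helper: if p is not a multiple of align, print the pid
 * and sleep for secs seconds so a debugger can be attached while the
 * interesting stack frames are still live. Returns 1 if p was unaligned,
 * 0 otherwise. */
int pause_if_unaligned(const void *p, size_t align, unsigned secs)
{
    if ((uintptr_t)p % align == 0)
        return 0;

    fprintf(stderr, "pid %ld: pointer %p is not %zu-byte aligned; "
                    "attach with: gdb -p %ld\n",
            (long)getpid(), p, align, (long)getpid());
    sleep(secs);
    return 1;
}
```

Dropped in just before the suspect read, for example pause_if_unaligned(ptr, 8, 60), it leaves time to attach gdb and run bt while the process is still sleeping inside the call.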
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

Any updates on this? While this may not sound like a critical issue, it makes a mess of the console during serial console text mode installs. At times this makes it so that the user is unable to read the screen.

Not yet, need some guidance from Thomas first. Thomas, thoughts on this issue?

*** Bug 434823 has been marked as a duplicate of this bug. ***

Sorry, this slipped by me. Can you post a source code listing relevant to the backtrace? I bet it's an unaligned 64-bit read from a part in the netlink message.

Thomas, getting info like this out of shared libs is tricky (for me at least) and unfortunately I didn't keep my hacked up version of libnl around. However, iirc the unaligned access happened in this section of code from route/link.c (but there certainly could have been other locations as well, this is just the one I caught in the act):

    if (tb[IFLA_MAP]) {
        struct rtnl_link_ifmap *map = nla_data(tb[IFLA_MAP]);
        link->l_map.lm_mem_start = map->mem_start;
        link->l_map.lm_mem_end   = map->mem_end;
        link->l_map.lm_base_addr = map->base_addr;
        link->l_map.lm_irq       = map->irq;
        link->l_map.lm_dma       = map->dma;
        link->l_map.lm_port      = map->port;
        link->l_mask |= LINK_ATTR_MAP;
    }

I think it was a few of the map-> reads that triggered it, not the link-> writes. Inside the kernel the "fix" would be to wrap these in get_unaligned(), but that doesn't exist in userspace headers. I will do some more hacking to double check that what I said here matches reality.

Created attachment 299357 [details]
potential fix?
Disclaimer!!! I don't know that this fixes _all_ the unaligned accesses we were
seeing during install. The only way to test that would be to build a full
RHEL5.2 install image.
But, something like this patch might do the trick. It does fix the case caught
by my reproducer. It seems it would be better to make sure that the addresses
are aligned correctly in the first place, but I imagine that due to the format
of the netlink message that may not be possible?
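For readers without access to the attachment, one common userspace substitute for get_unaligned() is to copy the bytes with memcpy() into an aligned local variable. The sketch below illustrates that idea only; it is not the attached patch. struct demo_ifmap is a hypothetical stand-in for the ifmap payload and both function names are invented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ifmap payload carried in the netlink
 * attribute; not copied from the libnl or kernel headers. */
struct demo_ifmap {
    uint64_t mem_start;
    uint64_t mem_end;
    uint64_t base_addr;
    uint16_t irq;
    uint8_t  dma;
    uint8_t  port;
};

/* Read a 64-bit value from an address that may not be 8-byte aligned. */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Copy the whole payload into an aligned local before touching its fields. */
struct demo_ifmap copy_ifmap(const void *payload)
{
    struct demo_ifmap m;

    memcpy(&m, payload, sizeof m);
    return m;
}
```

Passing the source as const void * rather than as a pointer to the struct is deliberate in this sketch: a typed pointer lets the compiler assume the type's natural alignment, and it may then turn the copy back into a plain aligned load.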
It's where I suspected the problem to be. It's not the only 64-bit member inside of netlink message structs, but it is the only one to my knowledge which is not properly aligned with the use of __attribute__((aligned(8))). The problem with the memcpy() solution is that I've noticed gcc producing code which optimized the memcpy() away again. The use of ifmap is mainly deprecated anyway; I don't know of any application which actually makes use of it. I'll push your fix into my tree, although the problem is really on the kernel side, though not fixable without breaking interfaces.

HP has confirmed that they're willing to help QA for this bug. This event sent from IssueTracker by dwa issue 171602

This patch fixes some of the unaligned accesses but not all. We have made some improvement in that the unaligned accesses seen during anaconda stage2 are gone, but we still see some in stage1. I was unable to fully test this because the only way to verify was to try an install with a tree built with the latest libnl, which did not happen until last night. Please note that I did file these as 2 separate BZs to start with, but the other one was closed as a dup of this one (granted, they were both in libnl so they did look like the same issue on the surface). To avoid confusion I have opened BZ 441878 for the issues we are still seeing.

Follow up to comment #34. It turns out the reason I still see this in anaconda stage1 is that the stage1 "/sbin/loader" is statically linked. Since anaconda has not been rebuilt, we still see this issue there. Since stage2 is dynamically linked, that also explains why we no longer see the unaligned accesses there. I will open a new anaconda BZ to ask for a rebuild with the new libnl.

*** Bug 441518 has been marked as a duplicate of this bug. ***

With the rebuild of anaconda I now no longer see these errors. - Doug

Verified by reporter.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0457.html