Bug 463408 - unaligned access from dlopen of libdmraid.so
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dmraid
Version: 5.3
Hardware: ia64 Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Heinz Mauelshagen
QA Contact: Cluster QE
Keywords: Regression, Reopened
Depends On:
Blocks:
Reported: 2008-09-23 04:03 EDT by Alexander Todorov
Modified: 2009-01-20 15:47 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 15:47:54 EST


Description Alexander Todorov 2008-09-23 04:03:06 EDT
Description of problem:
Unaligned access in stage2 of the installer

Version-Release number of selected component (if applicable):
anaconda-11.1.2.128-1

How reproducible:
Always

Steps to Reproduce:
1. Install in text mode
  
Actual results:
Running anaconda, the Red Hat Enterprise Linux Server system installer - please wait...
anaconda(527): unaligned access to 0x2000000001956b44, ip=0x2000000000018880
anaconda(527): unaligned access to 0x2000000001956b44, ip=0x2000000000018890
anaconda(527): unaligned access to 0x2000000001956b5c, ip=0x2000000000018880
anaconda(527): unaligned access to 0x2000000001956b5c, ip=0x2000000000018890
anaconda(527): unaligned access to 0x2000000001956b74, ip=0x2000000000018880
Probing for video card:   ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]

Expected results:
No unaligned access

Additional info:
This is on host roentgen
Comment 1 Joel Andres Granados 2008-09-23 09:25:12 EDT
Was it on some type of special hardware (s390, ia64, ppc)?  Is there a test log that we can see?  Does it stop the installation, or does the installation continue?  If it does not stop the installation, do you see any odd behavior in the installed system?  If you can install a system, do you see those messages when you run another application?

thx for the info
Comment 2 Joel Andres Granados 2008-09-23 09:36:33 EDT
OK, it's ia64.
And I'm guessing it continues.
And I'm further guessing that no other apps show such message.

Look at http://kbase.redhat.com/faq/FAQ_105_9111.shtm.

Additional question: does this happen consistently?  How many installs have shown this message?

If this does not happen consistently, we might be able to safely ignore this.
Comment 3 Joel Andres Granados 2008-09-23 09:58:53 EDT
As discussed on the anaconda IRC channel, this is just the kernel being the kernel.  Additionally, if this message occurred in stage2 (which has no C code in anaconda), it is safe to say that the anaconda component did not cause the message.
Pasting the IRC log:
"
<hansg> As for what an unaligned access is: on i386, and only on i386 (and x86_64), it's allowed to read a 32-bit integer at an address which is not a multiple of 4 bytes

<hansg> So on i386 you can read an integer starting at 2 bytes from the start of a page; the hardware will then do two 32-bit reads, and shift them and OR them together to get the integer you want
<hansg> On all other hardware (read: all sane hardware) this is not allowed (and on Intel it's dog slow)

<hansg> On other hardware the kernel emulates the i386 behavior within the trap handler to stop code from crashing, and complains loudly while it does this
<hansg> This was done because most code is written for, and only ever tested on, that shitty i386 architecture
<hansg> So this can pretty much be ignored; it should be fixed one day but it's not urgent
"
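To illustrate the IRC explanation above, here is a minimal sketch of how C code can read and write a 32-bit value at an arbitrary address without triggering an unaligned access. The helper names load_u32 and store_u32 are hypothetical and do not come from anaconda, pyblock or dmraid. The pattern that traps on strict-alignment hardware is a direct cast such as *(uint32_t *)(buf + 2); going through memcpy avoids it because memcpy makes no alignment assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not
 * be a multiple of 4. memcpy copies byte-wise as far as the language is
 * concerned, so no aligned 32-bit load is required. */
static uint32_t load_u32(const void *p)
{
        uint32_t v;

        memcpy(&v, p, sizeof v);
        return v;
}

/* Hypothetical helper: store a 32-bit value to an address that may not
 * be a multiple of 4. */
static void store_u32(void *p, uint32_t v)
{
        memcpy(p, &v, sizeof v);
}
```

For example, store_u32(buf + 2, 0xdeadbeef) followed by load_u32(buf + 2) round-trips the value on any architecture, whereas the cast-based equivalent would produce the kernel message on ia64.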
Comment 4 RHEL Product and Program Management 2008-09-23 10:10:11 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
Comment 5 Prarit Bhargava 2008-09-24 09:41:10 EDT
1.  This BZ should not be closed.  It is a very public facing issue and happens in the install.

While it isn't an anaconda issue, this is a bug -- it should have been properly reassigned to the appropriate group (kernel).

2.  Adding dchapman -- Doug, have you seen this?

P.
Comment 6 Prarit Bhargava 2008-09-24 09:43:36 EDT
Exception request: This type of bug will be seen during the install -- at a time when we definitely do not want any type of errors or warnings on the screen.

This *must* be fixed prior to 5.3 shipping.

P.
Comment 7 Alexander Todorov 2008-09-24 09:49:13 EDT
Just FYI - the machine is from HP, will test on Intel machines as well
Comment 8 Doug Chapman 2008-09-24 10:00:53 EDT
I will dig into this one.  FYI it is NOT a kernel bug:

anaconda(527): unaligned access to 0x2000000001956b44, ip=0x2000000000018880

If it were in the kernel it would be "kernel unaligned access".  This is probably in one of the shared libraries that anaconda uses.  I will assign to myself and then re-assign to the proper component once that is known.

- Doug
Comment 9 RHEL Product and Program Management 2008-09-24 10:06:05 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 10 Hans de Goede 2008-09-24 10:18:21 EDT
May I suggest that, as these messages are really harmless and the entire issue now seems to be the fact that there are messages, we just disable the messages during the installation?
Comment 11 Prarit Bhargava 2008-09-24 10:26:15 EDT
... and then we hope we don't encounter them when the OS boots?

Let's find out what part of glibc is causing the problem and then decide what to do about them.

P.
Comment 12 Prarit Bhargava 2008-09-24 13:56:59 EDT
(In reply to comment #8)
> I will dig into this one.  FYI it is NOT a kernel bug:
> 
> anaconda(527): unaligned access to 0x2000000001956b44, ip=0x2000000000018880
> 
> If it were in the kernel it would be "kernel unaligned access".  This is
> probably in one of the shared libraries that anaconda uses.  I will assign to
> myself and then re-assign to the proper component once that is known.
> 
> - Doug

Absolutely ;) -- I just had to put it somewhere :)

P.
Comment 13 Doug Chapman 2008-09-24 14:06:19 EDT
The problem appears to lie in python-pyblock, more specifically it appears to be something with /usr/lib/python2.4/site-packages/block/dmmodule.so being loaded at runtime via dlopen().

Here is a simple reproducer, compile using cc foo.c -ldl

#include <dlfcn.h>

/* Loading the module is enough to trigger the unaligned accesses. */
int main(void)
{
        dlopen("/usr/lib/python2.4/site-packages/block/dmmodule.so", RTLD_NOW);
        return 0;
}

This will reproduce the unaligned accesses on ia64.

I will continue to dig.
Comment 14 Doug Chapman 2008-09-24 14:21:31 EDT
Interestingly python-pyblock has not been rebuilt in over a year.  Also, it no longer builds now (I tried to build it from .src.rpm with no success).

I guess it must have been a glibc change that triggered this?

suggestions welcome
Comment 15 Doug Chapman 2008-09-24 17:51:46 EDT
OK, it appears this is a python-pyblock bug.  With my reproducer from comment #13 I can reproduce it on a RHEL5.2 system so the bug has been there all along but evidently there was a change in anaconda that causes it to get loaded now when it wasn't in RHEL5.2.


python-pyblock doesn't build anymore (which itself is a concern) so we need to fix that before we can debug this issue.
Comment 16 Doug Chapman 2008-09-24 20:29:49 EDT
It turns out that python-pyblock is linked to libdmraid and the real issue is in libdmraid itself.  I have narrowed it down to something in the source file lib/format/format.c in libdmraid but nothing obvious jumps out.

But then again it is late, will dig more tomorrow.
Comment 17 Alasdair Kergon 2008-09-24 21:33:13 EDT
There's debugging you can enable on that arch that should take you straight to the problem - install debug packages then run under gdb:

  prctl --unaligned=signal gdb <program>

Then you break to the gdb prompt with:
  Program received signal SIGBUS, Bus error.

But a common cause is casting variables into pointers that aren't aligned correctly - any casts of non-pointers into pointers are suspect.  (One near the start of the upstream file I have in front of me needs checking, for example.)

A fix may involve appending __attribute__((aligned(8))) to troublesome declarations, such as char[N] or unaligned fields within structs.
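As a rough sketch of the alignment attribute suggested above (GCC syntax; the names raw_buf and is_aligned are hypothetical and not taken from dmraid): a plain char array only has an alignment requirement of 1, so casting its address to a wider pointer type can fault, while requesting 8-byte alignment on the declaration makes the start of the buffer safe for such a cast.

```c
#include <stdint.h>

/* Hypothetical buffer: without the attribute the compiler may place it at
 * any byte address; with it, the start is a multiple of 8. */
static char raw_buf[64] __attribute__((aligned(8)));

/* Hypothetical check: nonzero if p is a multiple of align bytes.
 * align must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
        return ((uintptr_t)p & (align - 1)) == 0;
}
```

Here is_aligned(raw_buf, 8) holds, while is_aligned(raw_buf + 4, 8) does not, which is the situation that produces the unaligned access messages when an 8-byte value is read from such an address.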
Comment 19 Joel Andres Granados 2008-09-25 11:37:52 EDT
(In reply to comment #15)
> OK, it appears this is a python-pyblock bug.  With my reproducer from comment
> #13 I can reproduce it on a RHEL5.2 system so the bug has been there all along
> but evidently there was a change in anaconda that causes it to get loaded now
> when it wasn't in RHEL5.2.
> 
> 
> python-pyblock doesn't build anymore (which itself is a concern) so we need to
> fix that before we can debug this issue.
It built OK when I did a scratch build.  It also builds if I use the command `make`, and installs correctly when I run `make install`.  Am I missing something here?
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1496274
Comment 20 Doug Chapman 2008-09-25 12:28:07 EDT
(In reply to comment #19)
> (In reply to comment #15)
> > OK, it appears this is a python-pyblock bug.  With my reproducer from comment
> > #13 I can reproduce it on a RHEL5.2 system so the bug has been there all along
> > but evidently there was a change in anaconda that causes it to get loaded now
> > when it wasn't in RHEL5.2.
> > 
> > 
> > python-pyblock doesn't build anymore (which itself is a concern) so we need to
> > fix that before we can debug this issue.
> It built ok when I did a scratch build.  It also builds with if I use the
> command `make` and installs correctly when I `make install`,  Am I missing
> something here.
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1496274

Strange, I get an error when I try to build via rpmbuild on a freshly installed RHEL5.3 system.  I have another BZ open for that issue:

https://bugzilla.redhat.com/show_bug.cgi?id=463857
Comment 21 Doug Chapman 2008-09-25 12:43:04 EDT
The unaligned accesses come from this code in lib/format/format.c


We have a packed struct:

struct format_member {
        const char *msg;
        const unsigned short offset;
        const unsigned short flags;
} __attribute__ ((packed));



then we declare an array of that type of struct:

static struct format_member format_member[] = {
        { "name", offset(name), FMT_ALL },
        { "description", offset(descr), FMT_ALL },
        { "capabilities", offset(caps), 0 },
        { "read", offset(read), FMT_ALL | FMT_METHOD },
        { "write", offset(write), FMT_METHOD },
        { "create", offset(create), FMT_METHOD },
......


At link time, when it tries to relocate this, we hit an unaligned access on every other entry in the array.  I imagine we would hit the same thing at runtime when handlers are registered.


It appears that format_member[] is only used by functions in this file that verify the validity of handlers.  Also, I don't see any cases of us casting format_member or using anything that would care that it is packed.  In this situation it appears the only advantage of using __attribute__ ((packed)) here is we save a few bytes (a whopping total of 44 bytes).

Am I missing something?  I removed the packed attribute and it does get rid of the warnings during linking but I am not sure how to test it.
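The layout arithmetic behind the analysis above can be sketched as follows, assuming an LP64 target with 8-byte pointers (as on ia64). The struct and function names are hypothetical stand-ins that mirror the fields of format_member; this is not the dmraid source. Packed, each element is 12 bytes, so the msg pointer of every odd-numbered element sits at an offset that is 4 modulo 8; with natural layout each element is padded to 16 bytes and every pointer stays aligned.

```c
#include <stddef.h>

/* Hypothetical copy of the quoted layout, packed. */
struct member_packed {
        const char *msg;
        unsigned short offset;
        unsigned short flags;
} __attribute__((packed));

/* Same fields with the compiler's natural layout. */
struct member_natural {
        const char *msg;
        unsigned short offset;
        unsigned short flags;
};

/* Nonzero if the msg pointer of element i is misaligned, given elements
 * of elem_size bytes in an array that itself starts on a pointer-aligned
 * address. msg is the first field, so its offset is elem_size * i. */
static int msg_misaligned(size_t elem_size, size_t i)
{
        return (elem_size * i) % sizeof(const char *) != 0;
}
```

This is consistent with the reported addresses: 0x...b44, 0x...b5c and 0x...b74 are 24 bytes apart (two packed elements) and each is 4 modulo 8. The 4 bytes of padding per element also match the 44 bytes quoted above if the array has 11 entries.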
Comment 22 Heinz Mauelshagen 2008-09-26 10:06:18 EDT
Doug,
this is a regression; CHECK_FORMAT_HANDLER shouldn't be defined in format.c.
Patch in CVS.
Can I get a pm_ack to check in and build?
Comment 24 Heinz Mauelshagen 2008-09-26 11:39:01 EDT
Fix checked in. Build dmraid-1_0_0_rc13-14_el5 done.
Comment 25 Heinz Mauelshagen 2008-09-26 11:39:30 EDT
State -> MODIFIED
Comment 30 errata-xmlrpc 2009-01-20 15:47:54 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0078.html
