Bug 236891 - dmraid -r -E bus error
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dmraid
Version: 5.0
Hardware: All Linux
Priority: medium Severity: high
Assigned To: Ian Kent
QA Contact: Corey Marthaler
Depends On:
Blocks: 363331
Reported: 2007-04-18 06:08 EDT by Ask Bjørn Hansen
Modified: 2008-05-21 13:20 EDT (History)

Fixed In Version: RHBA-2008-0475
Doc Type: Bug Fix
Last Closed: 2008-05-21 13:20:56 EDT


Attachments
Patch to avoid writing to non-existent memory during metadata erase (1.31 KB, patch)
2007-10-22 03:28 EDT, Ian Kent
Description Ask Bjørn Hansen 2007-04-18 06:08:49 EDT
Description of problem:

[root@a2 ~]# dmraid -r -d -d -d -d -d  -d -d -d -d  -E /dev/sdb
ERROR: opening "sdb_isw.dat"
ERROR: opening "sdb_isw.size"
Do you really want to erase "isw" ondisk metadata on /dev/sdb ? [y/n] :y
Bus error

Version-Release number of selected component (if applicable):

1.0.0.rc13 

How reproducible:

Always.

Steps to Reproduce:
1. Run the command above.
  
Actual results:

Bus error.


Expected results:

Getting rid of that @#$@#$ dmraid data.

Additional info:

I've tried with the RAID option both disabled and enabled in the BIOS, but it doesn't seem to make a
difference. (?!)  I did succeed in deleting the metadata from /dev/sda.

[root@a2 ~]# dmraid -s -g -d -d -d -d -s 
DEBUG: _find_set: searching isw_dhgghbdiha
DEBUG: _find_set: not found isw_dhgghbdiha
DEBUG: _find_set: searching isw_dhgghbdiha_Volume0
DEBUG: _find_set: searching isw_dhgghbdiha_Volume0
DEBUG: _find_set: not found isw_dhgghbdiha_Volume0
DEBUG: _find_set: not found isw_dhgghbdiha_Volume0
DEBUG: checking isw device "/dev/sdb"
ERROR: isw device for volume "Volume0" broken on /dev/sdb in RAID set "isw_dhgghbdiha_Volume0"
ERROR: isw: wrong # of devices in RAID set "isw_dhgghbdiha_Volume0" [1/2] on /dev/sdb
DEBUG: set status of set "isw_dhgghbdiha_Volume0" to 2
DEBUG: set status of set "isw_dhgghbdiha" to 4
*** *Inconsistent* Superset
name   : isw_dhgghbdiha
size   : 976773166
stride : 0
type   : GROUP
status : inconsistent
subsets: 1
devs   : 1
spares : 0
--> Subset
name   : isw_dhgghbdiha_Volume0
size   : 976766976
stride : 128
type   : mirror
status : broken
subsets: 0
devs   : 1
spares : 0
DEBUG: freeing devices of RAID set "isw_dhgghbdiha_Volume0"
DEBUG: freeing device "isw_dhgghbdiha_Volume0", path "/dev/sdb"
DEBUG: freeing devices of RAID set "isw_dhgghbdiha"
DEBUG: freeing device "isw_dhgghbdiha", path "/dev/sdb"

[root@a2 ~]# gdb  dmraid
GNU gdb Red Hat Linux (6.5-16.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run -r -d -d -d -d -d  -d -d -d -d  -E /dev/sdb
Starting program: /sbin/dmraid -r -d -d -d -d -d  -d -d -d -d  -E /dev/sdb
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
ERROR: opening "sdb_isw.dat"
ERROR: opening "sdb_isw.size"
Do you really want to erase "isw" ondisk metadata on /dev/sdb ? [y/n] :
Do you really want to erase "isw" ondisk metadata on /dev/sdb ? [y/n] :y

Program received signal SIGBUS, Bus error.
0x0000003bd9c6ca27 in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003bd9c6ca27 in malloc_consolidate () from /lib64/libc.so.6
#1  0x0000003bd9c6eea2 in _int_malloc () from /lib64/libc.so.6
#2  0x0000003bd9c706dd in malloc () from /lib64/libc.so.6
#3  0x0000003bda80b9be in _dbg_free () from /usr/lib64/libdmraid.so.1.0.0.rc13
#4  0x0000003bda807502 in display_devices () from /usr/lib64/libdmraid.so.1.0.0.rc13
#5  0x0000003bda807a9a in check_valid_format () from /usr/lib64/libdmraid.so.1.0.0.rc13
#6  0x0000003bda807b1f in check_valid_format () from /usr/lib64/libdmraid.so.1.0.0.rc13
#7  0x0000003bda810a9c in _dbg_realloc () from /usr/lib64/libdmraid.so.1.0.0.rc13
#8  0x0000003bda80974e in erase_metadata () from /usr/lib64/libdmraid.so.1.0.0.rc13
#9  0x0000000000401a2c in perform ()
#10 0x000000000040164e in main ()
Comment 1 Ian Kent 2007-09-07 10:02:54 EDT
It's been a while and there doesn't seem to have been
any activity on this bug.

Is this still a problem?

If it is, could you post the files created by "dmraid -rD",
and also send the standard output to a file and post that
as well, please?

Otherwise just close the bug.

Ian
Comment 2 Ask Bjørn Hansen 2007-09-10 04:00:27 EDT
Hi Ian,

After a few weeks of not hearing back I got the server reconfigured to get rid of
the dmraid stuff and reinstalled RHEL (If I recall correctly then it could be done
on the actual console, the idiotic Intel RAID bios didn't do console redirection).

All the data I have is included above.



 - ask
Comment 3 Ian Kent 2007-09-10 08:57:06 EDT
(In reply to comment #2)
> Hi Ian,
> 
> After a few weeks of not hearing back I got the server reconfigured to get rid
> of the dmraid stuff and reinstalled RHEL (If I recall correctly then it could
> be done on the actual console, the idiotic Intel RAID bios didn't do console
> redirection).
> 
> All the data I have is included above.

Yes, it makes it hard when nobody can get to look into
the issue in a sensible time.

Unfortunately, to work out what's going on I would need
the output from the command I mentioned above. Without
it I'd just be guessing and that's rarely ever useful.

So I'm stuck now.
I'll see if Heinz has any suggestions but I don't hold
much hope. Sorry.

Ian
Comment 4 Ian Kent 2007-10-22 03:26:25 EDT
Using a test environment I got from Heinz Mauelshagen
I was able to reliably cause a SEGV with "dmraid -r -E".

This may not be what was seen here but since we no longer
have the setup to test I think we should just fix the
problem I have found. I suspect there is another problem
lurking here as there were two raid devices in this setup
but the problem I found is triggered when the configuration
has only one raid device. The above report does hint
that the configuration may actually have had only one device
in it when the error occurred but I can't be sure.

The problem occurs when the on-disk metadata configuration
contains only one raid device (one block only). In that case
the isw module determines there are no extended attributes,
so extra space isn't allocated during the metadata read.

However, the metadata write function fails to check whether
the extra space is present before copying the metadata, and
SEGVs when trying to write it to the non-existent memory.

Ian
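
The failure mode described in comment 4 can be sketched in a few lines of C.
This is only an illustration under stated assumptions: it is not the dmraid
source and not the attached patch. The names struct meta, meta_read,
meta_erase and BLOCK_SIZE are hypothetical, and the 512-byte block size is
assumed. The read path allocates only as many blocks as the disk holds, and
the erase path clamps its write to that allocation, which is the kind of
check the comment says was missing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512 /* assumed block size for this sketch */

/* Hypothetical in-memory copy of on-disk metadata (not dmraid's real type). */
struct meta {
    unsigned char *buf; /* heap buffer filled by the read path */
    size_t blocks;      /* number of blocks actually allocated */
};

/* Read path: allocate exactly as many blocks as the disk holds.
 * With one raid device that is a single block and no extra space. */
int meta_read(struct meta *m, size_t blocks_on_disk)
{
    assert(m != NULL && blocks_on_disk > 0);
    m->buf = calloc(blocks_on_disk, BLOCK_SIZE);
    if (m->buf == NULL)
        return -1;
    m->blocks = blocks_on_disk;
    return 0;
}

/* Erase path with the missing check added: clamp the request to what the
 * read path allocated. Without the clamp, asking for 2 blocks on a 1-block
 * buffer would write past the end of the heap allocation.
 * Returns the number of bytes cleared. */
size_t meta_erase(struct meta *m, size_t blocks_wanted)
{
    size_t blocks = blocks_wanted < m->blocks ? blocks_wanted : m->blocks;

    memset(m->buf, 0, blocks * BLOCK_SIZE);
    return blocks * BLOCK_SIZE;
}
```

For instance, after meta_read(&m, 1), a call to meta_erase(&m, 2) clears 512
bytes instead of running 512 bytes past the allocation. A write past a heap
buffer often only shows up later, inside the allocator, which would be
consistent with the malloc_consolidate frame in the original backtrace. That
link is not confirmed anywhere in this report.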
Comment 5 Ian Kent 2007-10-22 03:28:18 EDT
Created attachment 233921 [details]
Patch to avoid writing to non-existent memory during metadata erase
Comment 6 Ask Bjørn Hansen 2007-10-22 03:33:56 EDT
Hi Ian,

That sounds right actually. The box I worked on had 2 or 4 drives, but I think only
one of the drives had problems getting the dmraid data removed, so from the
perspective of dmraid there might only have been one.

I don't use dmraid, but I'm glad that it got fixed! 


 - ask
Comment 7 RHEL Product and Program Management 2007-10-22 03:35:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 errata-xmlrpc 2008-05-21 13:20:56 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0475.html
