Bug 496835

Summary: PATCH: fails to activate isw raidset on disks with a long serial number
Product: Red Hat Enterprise Linux 5 Reporter: Hans de Goede <hdegoede>
Component: dmraid Assignee: Heinz Mauelshagen <heinzm>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4 CC: agk, bmr, dwysocha, hdegoede, heinzm, lvm-team, mbroz, pato.lukaz, prockai, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 490121 Environment:
Last Closed: 2009-09-02 11:17:06 UTC Type: ---
Bug Depends On: 490121    
Bug Blocks:    

Description Hans de Goede 2009-04-21 12:09:01 UTC
This is a clone of a Fedora bug about problems activating
isw dmraid sets on newer hard disks. It would be very good to
have this fixed in 5.4 too.

This bug report includes a (tested and now part of Fedora) patch.

+++ This bug was initially created as a clone of Bug #490121 +++

A report to anaconda led to a user finding an issue with dmraid on the live CD. See https://bugzilla.redhat.com/show_bug.cgi?id=470543#c27. The reporter provided a patch.

Anaconda has a tracker bug used to track all dmraid-related issues, and this bug is getting duped so we can have one point of entry for dmraid issues. I'm opening a new bug for dmraid in case this has not been reported and the patch is relevant.

--- Additional comment from pato.lukaz on 2009-03-14 17:20:59 EDT ---

Summary:

dmraid uses the ioctl() function to get disk info (as does the hdparm command). The isw RAID metadata also contains disk serial numbers, and dmraid validates the serial number obtained from ioctl() against the serial number stored in the RAID metadata. However, for my particular hardware configuration the serial number from ioctl() is "090114FC3D00NJG3ZDYD" and the serial number from the isw metadata is "14FC3D00NJG3ZDYD".
This causes the error:

sh-4.0# dmraid -s
ERROR: isw: Could not find disk /dev/sdb in the metadata
ERROR: isw: Could not find disk /dev/sda in the metadata

Detail:

I ran the F10 Live CD and installed the dmraid src rpm from the rawhide
repo. After debugging dmraid, I narrowed down my particular problem
(Dell Precision M6400):

[root@localhost ~]# hdparm -i /dev/sda

/dev/sda:

 Model=Hitachi HTS723225L9A362                 , FwRev=FCDOC30F,  SerialNo=090114FC3D00NJG3ZDYD
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=15058kB, MaxMultSect=16, MultSect=?0?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

The ioctl function fills in the serial number in the di struct like this:
(gdb) p *di
$16 = {list = {next = 0x63f900, prev = 0x63f900}, path = 0x63f8e0 "/dev/sda",
serial = 0x63f930 "090114FC3D00NJG3ZDYD", sectors = 488397168}

and isw disk metadata info has this:

(gdb) p *isw
$25 = {sig = "Intel Raid ISM Cfg Sig. 1.0.00\000", check_sum = 4095911876,
mpb_size = 480, family_num = 3938914151, generation_num = 196, 
  error_log_size = 4080, attributes = 2147483648, num_disks = 2 '\002',
num_raid_devs = 1 '\001', error_log_pos = 2 '\002', fill = "", cache_size = 0, 
  orig_family_num = 3938914151, filler = {94, 0 <repeats 36 times>}, disk =
{{serial = "14FC3D00NJG3ZDYD", totalBlocks = 488397168, scsiId = 0, 
      status = 314, owner_cfg_num = 0, filler = {0, 0, 0, 0}}}}

(gdb) p isw->disk[0]
$26 = {serial = "14FC3D00NJG3ZDYD", totalBlocks = 488397168, scsiId = 0, status
= 314, owner_cfg_num = 0, filler = {0, 0, 0, 0}}

(gdb) p isw->disk[1]
$27 = {serial = "14FC3D00NJG3ZJ6D", totalBlocks = 488397168, scsiId = 262144,
status = 314, owner_cfg_num = 0, filler = {0, 0, 0, 0}}

The function _get_disk tries to match the serial number from di->serial
("090114FC3D00NJG3ZDYD") with the serial number from isw->disk[0].serial
("14FC3D00NJG3ZDYD"), but the serial number from ioctl has a 4-byte prefix
("0901") that makes the comparison fail.
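
To make the failure concrete, here is a minimal standalone sketch of the comparison described above. It is not the dmraid code itself: prefix_compare_matches is a hypothetical helper, plain C strings stand in for the real structs, and MAX_RAID_SERIAL_LEN is assumed to be 16, the width of the isw serial field as discussed in this report.

```c
#include <string.h>

/* Assumption: 16 is the width of the isw metadata serial field. */
#define MAX_RAID_SERIAL_LEN 16

/*
 * Hypothetical helper mirroring the original check: both serials are
 * compared from their first character, for at most 16 characters.
 */
static int prefix_compare_matches(const char *ioctl_serial,
				  const char *meta_serial)
{
	return strncmp(ioctl_serial, meta_serial, MAX_RAID_SERIAL_LEN) == 0;
}
```

With the values from this report, prefix_compare_matches("090114FC3D00NJG3ZDYD", "14FC3D00NJG3ZDYD") returns 0, because the strings already differ at the first character ('0' versus '1').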

Patch: (Also attached)

--- lib/format/ataraid/isw.old	2009-03-10 23:05:09.000000000 -0400
+++ lib/format/ataraid/isw.c	2009-03-10 23:19:45.000000000 -0400
@@ -84,12 +84,18 @@
 static struct isw_disk *
 _get_disk(struct isw *isw, struct dev_info *di)
 {
+	size_t serial_len, serial_off;
+
 	if (di->serial) {
 		int i = isw->num_disks;
 		struct isw_disk *disk = isw->disk;
 
 		while (i--) {
-			if (!strncmp(di->serial, (const char *) disk[i].serial,
+			serial_off = ((serial_len=strlen(di->serial)) > MAX_RAID_SERIAL_LEN) ?
+						serial_len - MAX_RAID_SERIAL_LEN : 0;
+
+
+			if (!strncmp(di->serial+serial_off, (const char *) disk[i].serial,
 				     MAX_RAID_SERIAL_LEN))
 				return disk + i;
 		}

--- Additional comment from pato.lukaz on 2009-03-14 17:27:46 EDT ---

Created an attachment (id=335230)
Get last 16 characters from serial number when it is longer than MAX_RAID_SERIAL_LEN

Similar to the patch that removes white space from the serial number, this is needed when the serial number is more than 16 characters wide.
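
As an illustration of the "last 16 characters" idea in this attachment, the following is a rough standalone sketch, not the attached patch. serial_offset and tail_compare_matches are hypothetical names, plain C strings stand in for the dmraid structs, and MAX_RAID_SERIAL_LEN is assumed to be 16.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 16 is the width of the isw metadata serial field. */
#define MAX_RAID_SERIAL_LEN 16

/*
 * Hypothetical helper: number of leading characters to skip so that
 * only the last MAX_RAID_SERIAL_LEN characters of the serial remain.
 * Serials of 16 characters or fewer are left untouched (offset 0).
 */
static size_t serial_offset(const char *serial)
{
	size_t len = strlen(serial);

	return len > MAX_RAID_SERIAL_LEN ? len - MAX_RAID_SERIAL_LEN : 0;
}

/* Compare the tail of the ioctl serial against the metadata serial. */
static int tail_compare_matches(const char *ioctl_serial,
				const char *meta_serial)
{
	return strncmp(ioctl_serial + serial_offset(ioctl_serial),
		       meta_serial, MAX_RAID_SERIAL_LEN) == 0;
}
```

For the 20-character serial "090114FC3D00NJG3ZDYD" the offset is 4, so the comparison starts at "14FC3D00NJG3ZDYD" and matches the metadata value. Since the offset depends only on the ioctl serial, it can be computed once rather than per disk entry.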

--- Additional comment from hdegoede on 2009-04-10 08:59:18 EDT ---

Created an attachment (id=339090)
PATCH adding support for longer disk serial numbers

Here is a patch that fixes this properly. The code is partially based on:
http://git.kernel.org/?p=linux/kernel/git/djbw/mdadm.git;a=blob_plain;f=super-intel.c;h=dd15673c153801bbb743b9fa4051867a254cc714

Thanks for the initial patch! Can you please test whether this patch fixes things for you as well?

Heinz, could you please review this patch?

--- Additional comment from pato.lukaz on 2009-04-12 23:22:12 EDT ---

Hi Hans,

Thanks for the proper patch. I tested it with rawhide dmraid-1.0.0.rc15-6.fc11.src.rpm and it works fine; I also tested it with a serial number shorter than MAX_RAID_SERIAL_LEN.

[root@hawking ~]# /usr/local/sbin/dmraid -s
*** Group superset isw_djdijbebfb
--> Active Subset
name   : isw_djdijbebfb_ARRAY
size   : 976783360
stride : 256
type   : stripe
status : ok
subsets: 0
devs   : 2
spares : 0

I hope Heinz can test it and add it to Fedora 11. I had to build my own DVD/CD set that includes my patched dmraid in order to install Fedora 11 Beta on my Dell laptop.

--- Additional comment from hdegoede on 2009-04-20 15:04:08 EDT ---

This is fixed (with the patch I attached) in dmraid-1.0.0.rc15-7.fc11, which
will be in the next rawhide (and F-11 final).

Thanks for testing!

Comment 4 errata-xmlrpc 2009-09-02 11:17:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1347.html