Bug 100735 - Upgraded kernel will not start Software-RAID (md)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-07-24 20:08 UTC by Ricky Boone
Modified: 2007-04-18 16:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:41:20 UTC
Embargoed:


Attachments
Latest dmesg running kernel-2.4.20-19-17 (6.88 KB, text/plain)
2003-07-24 20:11 UTC, Ricky Boone

Description Ricky Boone 2003-07-24 20:08:06 UTC
Description of problem:

Background:  The system is running Red Hat Linux 7.2.  I am using Software-RAID 
for the root partition (/dev/hda3 and /dev/hdd1 as /dev/md0 RAID-1), and a 
standard ext3 partition for /boot.  I am using kernel-2.4.9-34, but would like 
to upgrade as soon as possible.

Also, the kernel had been upgraded before without incident from the original 
version that came with Red Hat Linux 7.2 (2.4.7-x) to 2.4.9-34.  The 
Software-RAID partition was set up with the original installation.

Please note that most of the logs or config files I show will be trimmed to 
only show what is needed.  More detailed logs can be provided on request.


Problem:  I am trying to upgrade to kernel-2.4.20-18.7, but Software-RAID stops 
working once the new kernel RPM is installed.  After installing the RPM with 
up2date or rpm, I made sure the changes to LILO were made, and rebooted.  The 
system appeared to come up normally, but I wasn't getting any statistics from 
the second drive in the array (hdd1).  I ran procinfo from the shell, and it 
confirmed that only the first drive (hda3) was getting any reads/writes.  I 
went through /var/spool/messages, and found the following:

kernel: kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
kernel: md: personality 3 is not loaded!
kernel: md :do_md_run() returned -22
kernel: md: md0 stopped.
kernel: md: unbind<hdd1,1>
kernel: md: export_rdev(hdd1)
kernel: md: unbind<hda3,0>
kernel: md: export_rdev(hda3)

I checked the output of /sbin/modprobe -c, and it had an alias set up that 
looked okay:

alias md-personality-3 raid1
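
For anyone retracing this check, here is a minimal shell sketch that pulls the 
missing personality number out of a saved log and looks up its alias in a saved 
copy of the modprobe -c output.  The function names and both file arguments are 
hypothetical and not part of any tool.  Note that errno = 2 is ENOENT (no such 
file or directory), which may mean /sbin/modprobe itself was not reachable when 
the kernel tried to run it; that is an interpretation, not something the logs 
confirm.

```shell
#!/bin/sh
# Sketch only: helper names and file arguments are hypothetical.

# Print each distinct personality number reported as
# "md: personality N is not loaded!" in the given log file.
missing_personalities() {
    # $1: log file, for example a saved copy of the dmesg output
    sed -n 's/.*md: personality \([0-9][0-9]*\) is not loaded!.*/\1/p' "$1" | sort -u
}

# Print the module name aliased to md-personality-N, if any.
personality_alias() {
    # $1: personality number
    # $2: file holding the saved output of `modprobe -c`
    awk -v want="md-personality-$1" \
        '$1 == "alias" && $2 == want { print $3 }' "$2"
}
```

Usage might look like `missing_personalities /tmp/dmesg.txt` followed by 
`personality_alias 3 /tmp/modprobe-c.txt`, with both files saved beforehand 
(the paths are examples).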

Most RPMs on the system have been updated using up2date, including raidtools, 
etc.

With assistance from a few members of the redhat-list mailing list, I've 
tried troubleshooting the problem, but it remains unresolved.  Below are 
some of the things we tried:

We mounted the initrd ramdisks for both kernels and found they were identical.  
I even attempted to create a new one with mkinitrd to force the raid1 module to 
load.  The system would boot, but did not have RAID.
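
The initrd comparison above could be scripted roughly as follows.  This is a 
sketch that assumes the image has already been unpacked and loop-mounted, and 
that it follows the layout I would expect from a mkinitrd-built 2.4 image 
(modules under lib/ as .o files, loaded by insmod lines in linuxrc).  The 
function name, mount point, and image file name are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumed preparation (run as root; paths are examples):
#   gzip -dc /boot/initrd-2.4.20-18.7.img > /tmp/initrd.ext2
#   mount -o loop /tmp/initrd.ext2 /mnt/initrd

# Succeed only if the mounted initrd ships the module AND linuxrc loads it.
initrd_has_module() {
    # $1: mount point of the unpacked initrd
    # $2: module name, for example raid1
    [ -f "$1/lib/$2.o" ] || return 1
    grep -q "insmod .*$2\.o" "$1/linuxrc" 2>/dev/null
}
```

For example, `initrd_has_module /mnt/initrd raid1 && echo present` would print 
"present" only when both the module file and the insmod line are found.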

We've also tried running raidstart on /dev/md0 by hand.  No luck there.

# raidstart /dev/md0
/dev/md0: File exists
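
One way to see what the kernel currently has assembled is to read /proc/mdstat.  
The sketch below lists the member devices of one array; it assumes the usual 
2.4-style line format ("md0 : active raid1 hdd1[1] hda3[0]"), and the function 
name is hypothetical.  If the array is stopped, it prints nothing.

```shell
#!/bin/sh
# Sketch only: prints one member device per line for the named md array.
md_members() {
    # $1: array name without /dev/, for example md0
    # $2: mdstat file to read (defaults to /proc/mdstat)
    awk -v dev="$1" '$1 == dev && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) {   # members look like hda3[0]
                sub(/\[.*/, "", $i)
                print $i
            }
    }' "${2:-/proc/mdstat}"
}
```

Running `md_members md0` on the working 2.4.9 kernel and again on the upgraded 
kernel would show whether both hda3 and hdd1 are attached in each case.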



Version-Release number of selected component (if applicable):

kernel-2.4.20-19.7
kernel-2.4.20-18.7

How reproducible:

Reproducible every time on this machine, but not on other systems with 
similar environments.

Steps to Reproduce:

1. Install Red Hat Linux 7.2, conforming to the setup instructions provided by 
Ensim for Webppliance 3.1.x (specific packages on the Red Hat CDs, etc).  
The installation manual is located here: 
http://www.ensim.com/support/wpls/documents/lwp_313ls_install_manually.pdf
2. Upgrade specific RPMs (also noted in the Ensim manual).
3. Install Ensim Webppliance 3.1.3, as well as incremental updates.
4. Using up2date, upgrade the RPMs that aren't being used by Ensim (Ensim mostly 
uses apache, perl, sendmail, etc.), including the kernel and related packages.
5. Make sure lilo is configured correctly, then reboot.
6. Check to see if the md-personality-3 (raid1) module loaded.
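
Step 6 can be checked with lsmod, or with a small helper like the sketch below 
that reads /proc/modules directly.  The function name is hypothetical, and the 
check only makes sense if raid1 is built as a module rather than compiled into 
the kernel.

```shell
#!/bin/sh
# Sketch only: exit status 0 if the named module appears in the modules list.
module_loaded() {
    # $1: module name, for example raid1
    # $2: modules file to read (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

For example: `module_loaded raid1 && echo "raid1 is loaded"`.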

    
Actual results:

On the production system, I get the following error in dmesg:

kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.

RAID is not started, and the / partition is running from hda3 alone.

Expected results:

RAID should have loaded, with hda3 and hdd1 being a RAID1 / partition.

Additional info:

The only thing special about this system is that there are custom RPMs 
installed, mostly for web-related services (Ensim).  These should not have any 
effect on the kernel.

/dev/hda1 has recently been replaced due to hardware failure.  The mirror was 
rebuilt and everything runs just fine under kernel-2.4.9.

Comment 1 Ricky Boone 2003-07-24 20:11:26 UTC
Created attachment 93119
Latest dmesg running kernel-2.4.20-19-17

Comment 2 Ricky Boone 2003-07-25 16:31:05 UTC
Ensim's list of RPMs that must not be upgraded for WP LS 3.1 is available 
here for those who are interested (log in as Guest):
http://onlinesupport.ensim.com/TWKB/ViewCase.asp?QSRuleID=640&QSType=ALL&QSKeyword=rpm&QSSortby=SCORE&QSCatid=271&QSInt1=0&QSCat1=271&QSTop1=0&QSCScore=1&QSDescType=P&QSMatchType=E&QSStatus=ALLPUB&QSFromFileName=ListRules&QSSearchResult=SR&QSReturn=&CurrentPage=1

My skip-list in up2date is as follows:
analog*;apache*;cronolog*;frontpage*;gettext*;libsfio*;majordomo*;mod_jk*;mod_ssl*;mod_perl*;mx*;perl*;php*;phpMyAdmin*;postgresql*;proftpd*;python2*;sendmail*;tomcat4*;ucd-snmp*;vacation*;Zope*;
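
To show that this skip-list should not hold back the kernel or raidtools 
packages, here is a small sketch that tests a package name against the list.  
It assumes up2date treats each entry as a shell-style glob, which I have not 
verified; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: exit status 0 if the package name matches any glob in the list.
in_skip_list() {
    # $1: package name
    # $2: semicolon-separated list of globs
    pkg=$1
    old_ifs=$IFS
    IFS=';'
    set -f                  # do not expand the globs against local files
    for pat in $2; do
        [ -n "$pat" ] || continue
        case $pkg in
            $pat) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}
```

With the list above, `in_skip_list perl "$list"` succeeds while 
`in_skip_list kernel "$list"` and `in_skip_list raidtools "$list"` fail, so 
under that assumption neither package is excluded from updates.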

Comment 3 Bugzilla owner 2004-09-30 15:41:20 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


