1353423 – 7.2 not compatible with C236 and RSTe motherboard

Summary: 7.2 not compatible with C236 and RSTe motherboard

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	systemd
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	systemd-maint
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-07-07 03:30 UTC by Todd
Modified:	2020-12-15 07:42 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-12-15 07:42:49 UTC
Target Upstream Version:
Embargoed:

Attachments
drive info from gsmartcontrol (5.00 KB, text/plain) 2016-07-07 03:30 UTC, Todd
journal -xb after drop to maintenance mode (223.06 KB, text/plain) 2016-07-13 22:00 UTC, Todd
/var/log/boot.log showing freeze up at /boot (32.86 KB, text/plain) 2016-07-13 22:03 UTC, Todd
kdump fails too (1015 bytes, application/octet-stream) 2016-07-13 22:04 UTC, Todd

Description Todd 2016-07-07 03:30:26 UTC

Created attachment 1177129 [details]
drive info from gsmartcontrol

Dear Red Hat,

I am coming from the community: Scientific Linux 7.2 x64

Would one of our intrepid heroes please fix this for me?

I designed a server with a Supermicro X11SAE-M motherboard:
http://www.supermicro.com/products/motherboard/Xeon/C236_C232/X11SAE-M.cfm

Which uses an Intel C236 chipset:
http://ark.intel.com/products/90594/Intel-GL82C236-PCH

I have two 2TB drives configured as RSTe RAID 1.

I have a third 4TB hard drive configured in a removable SATA sleeve for
backup (actually two drives that rotate daily, but only one can be inserted at a time).  Both drives are TOSHIBA MG03ACA400.

Problem 1: SL 7.2 will not boot if one of these drives is blank and inserted at the time of boot.  Boot gets a ways in, then drops to maintenance mode.  Remove the drive, power off and back on, and SL 7.2 boots up just fine.

Problem 2: after boot up and reinserting the blank drive, gdisk, fdisk, and gparted all fail to see the disk (/dev/sdc).  Therefore you cannot format it.
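
For anyone revisiting Problem 2, one way to tell whether the kernel itself sees the disk (as opposed to the partitioning tools missing it) is to read sysfs directly. The following is a minimal sketch, not something that was run on the reporter's machine. The function name `list_block_devices` is made up for illustration, and it assumes a standard Linux `/sys/block` layout in which each device directory has a `size` file counted in 512-byte units.

```python
from pathlib import Path


def list_block_devices(sys_block="/sys/block"):
    """Return {device_name: size_in_bytes} for block devices exposed in sysfs.

    Hypothetical helper for illustration; `sys_block` is a parameter so the
    function can be pointed at a test directory.
    """
    devices = {}
    root = Path(sys_block)
    if not root.is_dir():
        return devices
    for entry in sorted(root.iterdir()):
        try:
            sectors = int((entry / "size").read_text().strip())
        except (OSError, ValueError):
            # Entry without a readable size file; skip it.
            continue
        # sysfs reports size in 512-byte units regardless of sector size.
        devices[entry.name] = sectors * 512
    return devices


if __name__ == "__main__":
    for name, size in list_block_devices().items():
        print(f"{name}: {size / 1e9:.1f} GB")
```

If the removable drive (expected here as `sdc`) is absent from this listing after insertion, the disk was never registered by the kernel, which would point below udev and the partitioning tools.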

To format the disk, I booted off of a direct install Fedora 23 USB flash drive.  I used gparted to set the disks up with a GPT partition table and XFS.  After that, booting into SL 7.2 was not an issue and every tool saw the drive.  Hot swap
worked too.

So, the problem is with SL 7.2 and not Fedora 23.

Many thanks,
-T

Comment 2 Todd 2016-07-13 21:59:24 UTC

Okay, this is even worse than I thought. 

Now it transpires that SL 7.2 will randomly not find my /boot partition either.  So SL 7.2 will randomly find or fail to find the primary RAID partition and the second (third physical) drive.

Raid 1 pair is Toshiba MG04ACA200A 2TB SATA 6Gb/s 7200RPM 128MB 3.5 inch 4Kn (Tomcat) Bare.

This reproduces unencrypted or encrypted.

Here is the absolute bugger!  The Live DVD I made this off of:

http://ftp1.scientificlinux.org/linux/scientific/7.2/x86_64/iso/SL-72-x86_64-2016-02-03-LiveDVDgnome.iso

Recognizes both the RAID 1 pair and the extra drive PERFECTLY!  And it is repeatable every boot (or at least the 10 times I tried).

This issue repeated after wiping and reinstalling unencrypted with the above Live DVD.

As I cannot deliver a server to a customer in this unreliable condition, I eventually had to wipe SL 7.2 and install (also Red Hat's) Fedora 24 from

https://spins.fedoraproject.org/en/xfce/

And reboot after reboot, it is now working perfectly.

So far this has cost me nine unbillable hours.  Finding this out has hurt me badly in the pocketbook.  You guys may wish to fix this before more developers get hurt.  EL is not supposed to be BUGGIER than Fedora!  :'(

-T

Comment 3 Todd 2016-07-13 22:00:37 UTC

Created attachment 1179466 [details]
journal -xb after drop to maintenance mode

Comment 4 Todd 2016-07-13 22:03:27 UTC

Created attachment 1179467 [details]
/var/log/boot.log showing freeze up at /boot

boot log on screen:

Reached Target Encrypted Volumes
A start job is running on /boot
<ctl><D> drop to maintenance mode after job on /boot can't read /boot

takes five minutes to drop to maintenance mode

Comment 5 Todd 2016-07-13 22:04:13 UTC

Created attachment 1179468 [details]
kdump fails too

Who knows if this is related, but ...

Comment 6 Michal Sekletar 2016-07-20 07:03:07 UTC

udev in RHEL7 is part of systemd package. Moving against correct component.

Comment 7 Todd 2016-09-02 20:37:47 UTC

This is now urgent.

Cim Cor, which makes CimTrak file Integrity monitoring software, is balking at supporting Fedora 24.  I have had to move this server to Fedora 24 due to this bug.

Cim Cor is having "issues" getting their servers to work with SELinux under Fedora 24.  Cim Cor wants Red Hat 7.2 (or CentOS 7.2), which, due to this bug, will not run on a C236 motherboard.

Also please note that this bug does not occur from the EXACT same Live DVD that the server was installed from.  Therefore, this issue is a "timing issue": the Live DVD runs slower.
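
The timing hypothesis above is the reporter's and was never confirmed on this ticket. As a rough way to gather evidence for or against it, one could measure how long a device node takes to appear. The sketch below is illustrative only: `wait_for_path` is a made-up helper, and the path and timeout values in the usage example are assumptions rather than values taken from the attached logs.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists; return True if it appeared within `timeout` seconds.

    Hypothetical helper for illustration. It only observes; it does not
    change how udev or the kernel enumerate devices.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    start = time.monotonic()
    # /dev/sda1 is one of the nodes reported missing in the journal excerpt.
    found = wait_for_path("/dev/sda1", timeout=60.0)
    print(f"found={found} after {time.monotonic() - start:.1f}s")
```

A node that shows up only after a long delay, or that appears and later vanishes, would be consistent with a race during device enumeration, though it would not by itself identify the cause.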

I have already lost close to $1000 USD in free service due to this bug.  You guys are killing me here!

Comment 8 Todd 2016-09-03 01:20:20 UTC

Okay, back down to high as all new server grade hardware will be impacted for everyone else.

Cim Cor got back to me.  I had misunderstood them as to their support for Fedora.  And, they had a patch too!  Yippee!

Comment 9 Todd 2016-09-03 01:21:13 UTC

(In reply to Todd from comment #7)
> Cim Cor, which makes CimTrak file Integrity monitoring software, is balking
> at supporting Fedora 24. 

This was a misunderstanding on my part.

Comment 10 Jan Synacek 2017-02-13 11:55:16 UTC

> Problem 1: SL 7.2 will not boot if one of these drives is blank and inserted at the time of boot.  Boot gets a ways in, then drops
> to maintenance mode.  Remove the drive, power off and back on, and SL 7.2 boots up just fine.

I couldn't reproduce this. I'm not sure why you would want to hotplug a drive *during* the boot.

> Problem 2: after boot up and reinserting the blank drive, gdisk, fdisk, and gparted all fail to see the disk (/dev/sdc).
> Therefore you cannot format it.

Seems like a kernel/hardware issue.

> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda3: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda2: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda1: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[306]: inotify_add_watch(6, /dev/sda2, 10) failed: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[305]: inotify_add_watch(6, /dev/sda1, 10) failed: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[315]: inotify_add_watch(6, /dev/sda3, 10) failed: No such file or directory

This looks like the drive was there for a moment and then disappeared. I've seen such cases and the problem was that there was not enough power supplied to the drive (usually USB drives).
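
To make the pattern in the quoted journal lines easier to see across a full `journalctl` dump, the messages can be tallied per device. This is a minimal sketch that assumes only the two message shapes quoted above; the function name `missing_devices` and the regular expression are illustrative and may not cover other systemd-udevd message formats.

```python
import re
from collections import defaultdict

# Matches the two systemd-udevd message shapes quoted in this comment.
UDEV_MISSING = re.compile(
    r"systemd-udevd\[\d+\]: "
    r"(?:error: (?P<dev1>/dev/\S+?):"
    r"|inotify_add_watch\(\d+, (?P<dev2>/dev/[^,]+), \d+\) failed:)"
    r" No such file or directory"
)


def missing_devices(lines):
    """Count 'No such file or directory' udev messages per device node."""
    counts = defaultdict(int)
    for line in lines:
        m = UDEV_MISSING.search(line)
        if m:
            counts[m.group("dev1") or m.group("dev2")] += 1
    return dict(counts)


# The six lines quoted above, verbatim.
SAMPLE = [
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda3: No such file or directory",
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda2: No such file or directory",
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda1: No such file or directory",
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[306]: inotify_add_watch(6, /dev/sda2, 10) failed: No such file or directory",
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[305]: inotify_add_watch(6, /dev/sda1, 10) failed: No such file or directory",
    "Jul 12 16:46:22 localhost.localdomain systemd-udevd[315]: inotify_add_watch(6, /dev/sda3, 10) failed: No such file or directory",
]

if __name__ == "__main__":
    for dev, n in sorted(missing_devices(SAMPLE).items()):
        print(dev, n)
```

On the sample, every partition of `/dev/sda` is reported missing within the same second, which fits the reading that the whole disk dropped out rather than a single partition.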

Anyway, there is nothing I can do. Even if the problem still happens in 7.3, I would need a reproducer to debug this further. Simple hotplug during boot didn't help.

Comment 11 Todd 2017-02-13 22:52:31 UTC

Hi Jan,

Did you use the same motherboard and drives?

The power supply is at least 50% over spec'ed.

And, Fedora 24 and 25 have no such issue.

To add insult, the Live DVD that I used to install also did not have the issue, so it is a timing issue (the DVD is slower).

Please dig a little deeper.  I am frightened to put together another EL7 based server because of this.

-T

Comment 12 Jan Synacek 2017-02-14 09:40:31 UTC

Fedora has a different kernel. I don't have the hardware to dig deeper.

Comment 13 Todd 2017-02-15 04:40:30 UTC

Well, Fedora's kernel working tells you that the issue has nothing to do with the hardware and is completely the fault of RHEL's Kernel.

I will ask Supermicro if they can load you a motherboard to test.

This bug is extremely important as it is a blocker to any new server quotes on RHEL.

Comment 14 Todd 2017-02-15 04:41:01 UTC

> load you
  loan you

Comment 15 Lukáš Nykrýn 2017-02-15 11:36:37 UTC

Can you send us a sos report from that machine when it is running?
https://access.redhat.com/solutions/3592
The sos package should be available for centos as well.

Comment 16 Todd 2017-02-16 17:24:51 UTC

(In reply to Todd from comment #13)
> Well, Fedora's kernel working tells you that the issue has nothing to do
> with the hardware and is completely the fault of RHEL's Kernel.
> 
> I will ask Supermicro if they can loan you a motherboard to test.
> 
> This bug is extremely important as it is a blocker to any new server quotes
> on RHEL.

Word from Supermicro is to have your "contact" person "contact" them: 408-503-8000

Comment 17 Todd 2017-02-16 17:29:27 UTC

(In reply to Lukáš Nykrýn from comment #15)
> Can you send us a sos report from that machine when it is running?
> https://access.redhat.com/solutions/3592
> The sos package should be available for centos as well.


Well now, there are some issues with an sos report:

1) won't run if it can't boot
2) Fedora is currently installed on this server

But, I would be able to boot off the Live DVD I installed from and run whatever tests you like

-T

Comment 18 Todd 2017-06-23 18:48:14 UTC

Dear Red Hat,

I will be building a new server for a customer in a few weeks time.  The server will be very close to the one reported on this ticket.  (It will be a Fedora server for obvious reasons).  The only major difference will be the drives: Samsung SSD MZ-7KM960NE SM863a 960GB.

There will be a point in the build where I can run whatever tests you like (I will consider it hardware burn-in) before installing Fedora for delivery to the customer.

If this works for you, please write me a list of tests.


Were you ever able to contact Supermicro for hardware to test?

-T

Comment 20 Todd 2017-07-12 00:56:59 UTC

Uh guys!  I start building tomorrow.  Are you ignoring me on purpose?

Comment 21 Jan Synacek 2017-07-12 07:10:45 UTC

(In reply to Todd from comment #20)
> Uh guys!  I start building tomorrow.  Are you ignoring me on purpose?

No, as I have already said, we don't have the hardware to try.

(In reply to Lukáš Nykrýn from comment #15)
> Can you send us a sos report from that machine when it is running?
> https://access.redhat.com/solutions/3592
> The sos package should be available for centos as well.

And your reply was:

> Word from Supermicro is to have your "contact" person "contact" them:
> 408-503-8000

This attitude will get you exactly nowhere.

Comment 22 Todd 2017-07-12 18:28:48 UTC

(In reply to Jan Synacek from comment #21)
> (In reply to Todd from comment #20)
> > Uh guys!  I start building tomorrow.  Are you ignoring me on purpose?
> 
> No, as I have already said, we don't have the hardware to try.

Apparently you misunderstand me.  I "WILL" have the hardware for a few days.  I was volunteering to run whatever test you wanted me to run.  It is a small window, but I am offering.

> (In reply to Lukáš Nykrýn from comment #15)
> > Can you send us a sos report from that machine when it is running?
> > https://access.redhat.com/solutions/3592
> > The sos package should be available for centos as well.

There is indeed an sos report for Scientific Linux.  If the machine will randomly not boot, will this be of any help?  Seems to me the report would only cover a working boot and not write data to a hard drive it cannot mount when it does not boot.  Am I missing something?
 
> And your reply was:
> 
> > Word from Supermicro is to have your "contact" person "contact" them:
> > 408-503-8000
> 
> This attitude will get you exactly nowhere.

Why so crabby?  I am offering to help.  Supermicro told me that they have a program for big guys like you under which they will provide hardware for testing.  If you call their tech support (extension 2), they will give you the details.

Comment 23 Todd 2017-07-14 23:12:11 UTC

I know you guys did not ask for this information, but in case this ticket ever gets revisited: I did temporarily install Scientific Linux 7.3 Gnome from a Live USB on the new server.  The install went perfectly.  After the install, I was unable to boot natively.  After choosing the kernel, the screen went blank with the dreaded flashing cursor in the upper left corner.  This repeated several times.

Since the hard drives this time were SSD drives, one can eliminate the drive from the equation.

Fedora 26 x64 Xfce Live USB installed and boots perfectly.

Comment 27 RHEL Program Management 2020-12-15 07:42:49 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
