Created attachment 1177129 [details]
drive info from gsmartcontrol

Dear Red Hat,

I am coming from the community: Scientific Linux 7.2 x64

Would one of our intrepid heroes please fix this for me?

I designed a server with a Supermicro X11SAE-M motherboard:
http://www.supermicro.com/products/motherboard/Xeon/C236_C232/X11SAE-M.cfm

which uses an Intel C236 chipset:
http://ark.intel.com/products/90594/Intel-GL82C236-PCH

I have two 2TB drives configured as RSTe RAID 1. I have a third 4TB hard drive configured in a removable SATA sleeve for backup (actually two drives, but only one can be inserted at a time; they rotate daily). The/these drive(s) is/are a TOSHIBA MG03ACA400.

Problem 1: SL 7.2 will not boot if one of these drives is blank and inserted at the time of boot. Boot gets a ways in, then drops to maintenance. Remove the drive, power off and back on, and SL 7.2 boots up just fine.

Problem 2: after boot up and reinserting the blank drive, gdisk, fdisk, and gparted all do not see the disk (/dev/sdc). Therefore you cannot format it.

To format the disk, I booted off of a direct install Fedora 23 USB flash drive. I used gparted to set the disks up with a gpt partition table and xfs. After that, booting into SL 7.2 was not an issue and everyone saw the drive. Hot swap worked too.

So, the problem is with SL 7.2 and not Fedora 23.

Many thanks,
-T
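For Problem 2, one way to tell whether the kernel itself has registered the drive, independently of gdisk, fdisk, or gparted, is to look at /proc/partitions. The following is only a minimal Python sketch of that check. The function names `kernel_block_devices` and `kernel_sees` are hypothetical, and the device name `sdc` is taken from the report above.

```python
from pathlib import Path


def kernel_block_devices(text):
    """Return the device names listed in /proc/partitions-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row starts with the word "major", so it is skipped.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def kernel_sees(device, text):
    """True if the kernel lists the given device name (e.g. 'sdc')."""
    return device in kernel_block_devices(text)


if __name__ == "__main__":
    proc = Path("/proc/partitions")
    if proc.exists():  # only present on Linux
        print("sdc visible to kernel:", kernel_sees("sdc", proc.read_text()))
```

If the drive is missing here as well, the partitioning tools are not at fault; the kernel never registered the disk.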
Okay, this is even worse than I thought.

Now it transpires that SL 7.2 will randomly not find my /boot partition either. So SL 7.2 randomly will find and not find the primary RAID partition and the second (third physical) drive.

The RAID 1 pair is Toshiba MG04ACA200A 2TB SATA 6Gb/s 7200RPM 128MB 3.5 inch 4Kn (Tomcat) Bare.

This reproduces unencrypted or encrypted.

Here is the absolute bugger! The Live DVD I made this off of:
http://ftp1.scientificlinux.org/linux/scientific/7.2/x86_64/iso/SL-72-x86_64-2016-02-03-LiveDVDgnome.iso

recognizes both the RAID 1 pair and the extra drive PERFECTLY! And it is repeatable every boot (or at least the 10 times I tried).

This issue repeated after wiping and reinstalling unencrypted with the above Live DVD.

As I cannot deliver a server to a customer in this unreliable condition, I eventually had to wipe off SL 7.2 and install (also Red Hat's) Fedora 24 from
https://spins.fedoraproject.org/en/xfce/

And reboot after reboot, it is now working perfectly.

So far this has cost me nine unbillable hours. Finding this out has hurt me badly in the pocketbook. You guys may wish to fix this before more developers get hurt.

EL is not supposed to be BUGGIER than Fedora!

:'(

-T
Created attachment 1179466 [details] journal -xb after drop to maintenance mode
Created attachment 1179467 [details]
/var/log/boot.log showing freeze up at /boot

boot log on screen:
    Reached Target Encrypted Volumes
    A start job is running on /boot

<ctl><D> drop to maintenance mode after job on /boot can't read /boot

takes five minutes to drop to maintenance mode
Created attachment 1179468 [details]
kdump fails too

Who knows if this is related, but ...
udev in RHEL7 is part of systemd package. Moving against correct component.
This is now urgent.

Cim Cor, which makes CimTrak file integrity monitoring software, is balking at supporting Fedora 24. I have had to move this server to Fedora 24 due to this bug.

Cim Cor is having "issues" getting their servers to work with SELinux under Fedora 24.

Cim Cor wants Red Hat 7.2 (or CentOS 7.2), which, due to this bug, will not run on a C236 motherboard.

Also please note that this bug does not occur from the EXACT same Live DVD that the server was installed from. Therefore, this issue is a "timing issue": the Live DVD runs slower.

I have already lost close to $1000 U$D in free service due to this bug. You guys are killing me here!
Okay, back down to high as all new server grade hardware will be impacted for everyone else. Cim Cor got back to me. I had misunderstood them as to their support for Fedora. And, they had a patch too! Yippee!
(In reply to Todd from comment #7)
> Cim Cor, which makes CimTrak file Integrity monitoring software, is balking
> at supporting Fedora 24.

This was a misunderstanding on my part.
> Problem 1: SL 7.2 will not boot if one of these drives is blank and inserted at the time of boot. Boot get a ways in, then drops
> to maintenance. Remove the drive, power off and back on, and SL 7.2 boots up just fine.

I couldn't reproduce this. I'm not sure why you would want to hotplug a drive *during* the boot.

> Problem 2: after boot up and reinserting the blank drive. gdisk, fdisk, gparted all do not see the disk (/dev/sdc).
> Therefore you can not format it

Seems like a kernel/hardware issue.

> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda3: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda2: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[251]: error: /dev/sda1: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[306]: inotify_add_watch(6, /dev/sda2, 10) failed: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[305]: inotify_add_watch(6, /dev/sda1, 10) failed: No such file or directory
> Jul 12 16:46:22 localhost.localdomain systemd-udevd[315]: inotify_add_watch(6, /dev/sda3, 10) failed: No such file or directory

This looks like the drive was there for a moment and then disappeared. I've seen such cases and the problem was that there was not enough power supplied to the drive (usually USB drives).

Anyway, there is nothing I can do. Even if the problem still happens in 7.3, I would need a reproducer to debug this further. Simple hotplug during boot didn't help.
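To see at a glance which device nodes udev reported as missing, the journal lines quoted above can be grouped by device. This is only a rough Python sketch: the regular expression is an assumption derived from the six quoted lines alone, and the function name `vanished_devices` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern, based only on the two message shapes quoted above:
# "error: /dev/X: No such file or directory" and
# "inotify_add_watch(6, /dev/X, 10) failed: No such file or directory".
MISSING = re.compile(
    r"systemd-udevd\[(?P<pid>\d+)\]: .*?(?P<dev>/dev/[A-Za-z0-9]+)"
    r".*No such file or directory"
)


def vanished_devices(lines):
    """Map each device node to the udev worker PIDs that reported it missing."""
    found = defaultdict(set)
    for line in lines:
        match = MISSING.search(line)
        if match:
            found[match.group("dev")].add(int(match.group("pid")))
    return dict(found)


PREFIX = "Jul 12 16:46:22 localhost.localdomain "
SAMPLE = [
    PREFIX + "systemd-udevd[251]: error: /dev/sda3: No such file or directory",
    PREFIX + "systemd-udevd[251]: error: /dev/sda2: No such file or directory",
    PREFIX + "systemd-udevd[251]: error: /dev/sda1: No such file or directory",
    PREFIX + "systemd-udevd[306]: inotify_add_watch(6, /dev/sda2, 10) failed: No such file or directory",
    PREFIX + "systemd-udevd[305]: inotify_add_watch(6, /dev/sda1, 10) failed: No such file or directory",
    PREFIX + "systemd-udevd[315]: inotify_add_watch(6, /dev/sda3, 10) failed: No such file or directory",
]

if __name__ == "__main__":
    for dev, pids in sorted(vanished_devices(SAMPLE).items()):
        print(dev, sorted(pids))
```

On the quoted excerpt this groups the errors under /dev/sda1, /dev/sda2, and /dev/sda3, each reported by worker 251 and by one further worker. All three are partitions of the same disk and all six messages share one timestamp, which fits the reading that the whole drive disappeared at once.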
Hi Jan,

Did you use the same motherboard and drives?

The power supply is at least 50% over-spec'ed.

And Fedora 24 and 25 have no such issue. To add insult, the Live DVD that I used to install also did not have the issue, so it is a timing issue (the DVD is slower).

Please dig a little deeper. I am frightened to put together another EL7-based server because of this.

-T
Fedora has a different kernel. I don't have the hardware to dig deeper.
Well, Fedora's kernel working tells you that the issue has nothing to do with the hardware and is completely the fault of RHEL's Kernel. I will ask Supermicro if they can load you a motherboard to test. This bug is extremely important as it is a blocker to any new server quotes on RHEL.
> load you

loan you
Can you send us a sos report from that machine when it is running? https://access.redhat.com/solutions/3592 The sos package should be available for centos as well.
(In reply to Todd from comment #13)
> Well, Fedora's kernel working tells you that the issue has nothing to do
> with the hardware and is completely the fault of RHEL's Kernel.
>
> I will ask Supermicro if they can loan you a motherboard to test.
>
> This bug is extremely important as it is a blocker to any new server quotes
> on RHEL.

Word from Supermicro is to have your "contact" person "contact" them:
408-503-8000
(In reply to Lukáš Nykrýn from comment #15)
> Can you send us a sos report from that machine when it is running?
> https://access.redhat.com/solutions/3592
> The sos package should be available for centos as well.

Well now, there are some issues with sos report:

1) won't run if it can't boot
2) Fedora is currently installed on this server

But I would be able to boot off the Live DVD I installed from and run whatever tests you like.

-T
Dear Red Hat,

I will be building a new server for a customer in a few weeks' time. The server will be very close to the one reported on this ticket. (It will be a Fedora server for obvious reasons.)

The only major difference will be the drives: Samsung SSD MZ-7KM960NE SM863a 960GB.

There will be a point in the build where I can run whatever test you like (I will consider it hardware burn-in) before installing Fedora for delivery to the customer.

If this works for you, please write me a list of tests.

Were you ever able to contact Supermicro for hardware to test?

-T
Uh guys! I start building tomorrow. Are you ignoring me on purpose?
(In reply to Todd from comment #20)
> Uh guys! I start building tomorrow. Are you ignoring me on purpose?

No, as I have already said, we don't have the hardware to try.

(In reply to Lukáš Nykrýn from comment #15)
> Can you send us a sos report from that machine when it is running?
> https://access.redhat.com/solutions/3592
> The sos package should be available for centos as well.

And your reply was:

> Work from Supermicro is to have your "contact" person "contact" them:
> 408-503-8000

This attitude will get you exactly nowhere.
(In reply to Jan Synacek from comment #21)
> (In reply to Todd from comment #20)
> > Uh guys! I start building tomorrow. Are you ignoring me on purpose?
>
> No, as I have already said, we don't have the hardware to try.

Apparently you misunderstand me. I "WILL" have the hardware for a few days. I was volunteering to run whatever test you wanted me to run. It is a small window, but I am offering.

> (In reply to Lukáš Nykrýn from comment #15)
> > Can you send us a sos report from that machine when it is running?
> > https://access.redhat.com/solutions/3592
> > The sos package should be available for centos as well.

There is indeed an sos report for Scientific Linux. If the machine will randomly not boot, will this be of any help? It seems to me the report would only cover a working boot, since a failed boot cannot write data to a hard drive it cannot mount. Am I missing something?

> And your reply was:
>
> > Work from Supermicro is to have your "contact" person "contact" them:
> > 408-503-8000
>
> This attitude will get you exactly nowhere.

Why so crabby? I am offering to help. Supermicro told me that they have a program for big guys like you under which they will provide hardware for testing. If you call their tech support (extension 2), they will give you the details.
I know you guys did not ask for this information, but I am adding it in case this ticket ever gets revisited.

I did temporarily install Scientific Linux 7.3 Gnome from a Live USB on the new server. The install went perfectly. After the install, I was unable to boot natively. After choosing the kernel, the screen went blank with the dreaded flashing cursor in the upper left corner. This repeated several times.

Since the drives this time were SSDs, one can eliminate the drive from the equation.

Fedora 26 x64 Xfce Live USB installed and boots perfectly.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.