Bug 1557655 - Install to system with HPSA storage fails to boot, stops at "Failed to start udev Wait for Complete Device Initialization"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker
Depends On:
Blocks:
Reported: 2018-03-17 12:27 UTC by lnie
Modified: 2018-10-01 15:18 UTC (History)
CC List: 33 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-03 12:38:13 UTC
Type: Bug
Embargoed:


Attachments
console output of the f28 installation beaker job (41.62 KB, text/plain)
2018-03-17 12:27 UTC, lnie
console output of the "success" f28 installation beaker job(with Fedora-28-20180315.n.0 Server x86_64,too) (385.09 KB, text/plain)
2018-03-17 14:25 UTC, lnie
picture (132.89 KB, image/png)
2018-03-27 05:51 UTC, lnie
console output with debug rd.debug (684.41 KB, text/plain)
2018-03-27 06:05 UTC, lnie
screenshot (99.38 KB, image/png)
2018-03-29 07:00 UTC, lnie
pure hpsa server 28 log with systemd.journald.forward_to_console=1 (53.09 KB, text/plain)
2018-03-29 07:03 UTC, lnie
the fcoe server 28 log with systemd.journald.forward_to_console=1 (45.35 KB, text/plain)
2018-03-29 07:05 UTC, lnie
pure hpsa 26 log with systemd.journald.forward_to_console=1 (87.56 KB, text/plain)
2018-03-29 07:11 UTC, lnie
console output of the successful boot (63.94 KB, text/plain)
2018-04-10 07:43 UTC, lnie
console output of the f28 systemd on 27 system (64.55 KB, text/plain)
2018-04-11 06:40 UTC, lnie
console output with 28 kernel (64.80 KB, text/plain)
2018-04-11 07:34 UTC, lnie
console output of the successful f28 installation on hpsa server (145.53 KB, text/plain)
2018-05-03 12:39 UTC, lnie
Document that describes how to set up ilo VSP logging console (490.68 KB, application/pdf)
2018-09-26 23:11 UTC, Don Brace (Microchip)

Description lnie 2018-03-17 12:27:27 UTC
Created attachment 1409134 [details]
console output of the f28 installation beaker job

Description of problem:

I tried to install F28 on an FCoE server, but it failed.
I've tried three times and every attempt failed, but installing F27/RHEL on the same server succeeds.

[FAILED] Failed to start udev Wait for Complete Device Initialization.
See 'systemctl status systemd-udev-settle.service' for details.
[  136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  136.516982] audit: type=1130 audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=multipathd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


Version-Release number of selected component (if applicable):

Fedora-28-20180315.n.0 Server x86_64

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 lnie 2018-03-17 14:25:02 UTC
Created attachment 1409169 [details]
console output of the "success" f28 installation beaker job(with Fedora-28-20180315.n.0 Server x86_64,too)

Comment 2 lnie 2018-03-17 14:32:41 UTC
The bug affects both the ixgbe and bnx2fc drivers.

Comment 3 Fedora Blocker Bugs Application 2018-03-17 14:33:00 UTC
Proposed as a Blocker for 28-final by Fedora user lnie using the blocker tracking app because:

This seems to affect the criterion "The installer must be able to detect (if possible) and install to supported network-attached storage devices"

Comment 4 Jan Synacek 2018-03-20 10:34:22 UTC
(In reply to lnie from comment #0)
> FAILED[  136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle
> comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
> res=failed' 
>    
> ] Failed to start udev Wait for Complete Device Initialization.  
> See 'systemctl status syst[  136.516982] audit: type=1130
> audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295
> subj=kernel msg='unit=multipathd comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' 
> emd-udev-settle.service' for details.  

I couldn't find any of these in the attached log. Also, it looks like a selinux problem to me.

Comment 5 lnie 2018-03-20 12:53:46 UTC
(In reply to Jan Synacek from comment #4)
> (In reply to lnie from comment #0)
> > FAILED[  136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0
> > auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle
> > comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
> > res=failed' 
> >    
> > ] Failed to start udev Wait for Complete Device Initialization.  
> > See 'systemctl status syst[  136.516982] audit: type=1130
> > audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295
> > subj=kernel msg='unit=multipathd comm="systemd"
> > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' 
> > emd-udev-settle.service' for details.  
> 
> I couldn't find any of these in the attached log. Also, it looks like a
> selinux problem to me.

Really? Just open the attachment from comment 0 and search for 'Failed to start udev Wait for Complete Device Initialization'; you will see all of that.

Comment 6 Jan Synacek 2018-03-22 10:56:16 UTC
Yeah, I was looking into the wrong attachment, sorry about that.

Comment 7 Zbigniew Jędrzejewski-Szmek 2018-03-23 06:31:49 UTC
Hmm, it looks like something goes wrong here, but it'll be very hard to diagnose without further information. Any chance you could attach the journal?

Comment 8 Adam Williamson 2018-03-26 18:10:24 UTC
lnie: can you please try to get the full system logs out, as Zbigniew requests? As he says, it'll be very difficult for anyone without FCoE hardware (which is just about everyone :>) to debug this without the full logs. Thanks!

Comment 9 Geoffrey Marr 2018-03-26 19:10:16 UTC
Discussed during the 2018-03-26 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made as it violates the following blocker criteria:

"The installer must be able to detect (if possible) and install to supported network-attached storage devices...Supported network-attached storage types include iSCSI, Fibre Channel and Fibre Channel over Ethernet (FCoE)"

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2018-03-26/f28-blocker-review.2018-03-26-16.01.txt

Comment 10 lnie 2018-03-27 05:50:27 UTC
As shown in the attached screenshot, the installer just hangs there and no shell is provided, so it seems that I'm not able to run journalctl. I'm going to do a fresh installation with "debug rd.debug"; I hope we can get something from that.

Comment 11 lnie 2018-03-27 05:51:16 UTC
Created attachment 1413517 [details]
picture

Comment 12 lnie 2018-03-27 06:05:41 UTC
Created attachment 1413518 [details]
console output with debug rd.debug

Comment 13 Zbigniew Jędrzejewski-Szmek 2018-03-27 07:18:38 UTC
2: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000\    link/ether e8:39:35:2d:e1:c8 brd ff:ff:ff:ff:ff:ff

dracut-initqueue is waiting for that device to be ready, but it has NO-CARRIER. This looks like the networking is not set up properly there. @lnie, how is the network configured? Is it possible that the network hardware is not connected properly?

Comment 14 lnie 2018-03-27 14:01:51 UTC
I don't think the network configuration or hardware is the problem, as I'm able to install F26/RHEL7 successfully, and the CNA card works fine with all the FCoE test cases.

Comment 15 Adam Williamson 2018-03-27 16:01:02 UTC
lnie: some of the tips at https://freedesktop.org/wiki/Software/systemd/Debugging/ may help, particularly getting the logs out via a serial console, if you have access to the serial console on the test system.

Comment 16 lnie 2018-03-29 06:59:18 UTC
With systemd.journald.forward_to_console=1,
I got "seq 1481 '/devices/pci0000:00/0000:00:01.0/0000:04:00.0' is taking a long time", and that device is an hpsa RAID disk. I tested on a pure hpsa server and saw the same bug.
To make sure, I booted the system with rd.break=pre-trigger, and the system just hangs after I run modprobe hpsa manually, as shown in the attached screenshot.

So it seems that we should start with the hpsa driver, though what is a little strange is that F26 reports the same issue but doesn't hang there, and works fine.

Comment 17 lnie 2018-03-29 07:00:06 UTC
Created attachment 1414550 [details]
screenshot

Comment 18 lnie 2018-03-29 07:03:03 UTC
Created attachment 1414551 [details]
pure hpsa server 28 log with systemd.journald.forward_to_console=1

Comment 19 lnie 2018-03-29 07:05:06 UTC
Created attachment 1414564 [details]
the fcoe server 28 log with systemd.journald.forward_to_console=1

Comment 20 lnie 2018-03-29 07:11:55 UTC
Created attachment 1414566 [details]
pure hpsa 26 log with systemd.journald.forward_to_console=1

Comment 21 lnie 2018-03-29 07:19:13 UTC
Adam, the FCoE server is a ProLiant DL120 G7 and the pure hpsa server is a ProLiant DL380 G6. Thanks for systemd.journald.forward_to_console=1.

Comment 22 Adam Williamson 2018-04-02 16:58:49 UTC
lili: can you test FCoE without the problematic HPSA storage?

Comment 23 Zbigniew Jędrzejewski-Szmek 2018-04-09 09:35:58 UTC
Comment on attachment 1414550 [details]
screenshot

pre-trigger:/# lsmod | grep hpsa
pre-trigger:/# modprobe hpsa
[  349.185228] HP HPSA Driver (v 3.4.20-125)
[  349.......] hpsa 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[  349.......] hpsa 0000:04:00.0: Logical aborts not supported
[  349.......] hpsa 0000:04:00.0: HP SSD Smart Path aborts not supported
[  349.......] scsi host2: hpsa
[  349.......] hpsa can't handle SMP requests

Comment 24 Zbigniew Jędrzejewski-Szmek 2018-04-09 09:43:17 UTC
Since modprobe never returns, it seems there's a hardware/driver problem. Then systemd-udev-settle.service times out, which is expected. udevadm settle has a default timeout of 120s, which matches the attached logs.

Let's get some input from the kernel folks.

Comment 25 Justin M. Forbes 2018-04-09 18:54:25 UTC
Can you try the 4.16 kernel from F28 on an F27 installation and see if that works?

Comment 26 lnie 2018-04-10 07:43:56 UTC
Created attachment 1419725 [details]
console output of the successful boot

Comment 27 lnie 2018-04-10 07:46:37 UTC
Hi, the F28 4.16 kernel boots successfully on an F27 installation, and the console output is attached.

Comment 28 lnie 2018-04-10 07:51:59 UTC
Adam, I didn't see this bug on FCoE servers without the hpsa driver.

Comment 29 Zbigniew Jędrzejewski-Szmek 2018-04-10 08:00:18 UTC
Comment on attachment 1419725 [details]
console output of the successful boot

The relevant part is:
[    3.810747] HP HPSA Driver (v 3.4.20-125)
[    3.811642] hpsa 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[    3.813750] hpsa 0000:04:00.0: Logical aborts not supported
[    3.814926] hpsa 0000:04:00.0: HP SSD Smart Path aborts not supported
[    3.843823] scsi host2: hpsa
[    3.844581] hpsa can't handle SMP requests
[    3.853139] hpsa 0000:04:00.0: scsi 2:0:0:0: added RAID              HP       P410i            controller SSDSmartPathCap- En- Exp=1
[    3.855651] hpsa 0000:04:00.0: scsi 2:0:1:0: masked Direct-Access     HP       EG0146FAWJC      PHYS DRV SSDSmartPathCap- En- Exp=0
[    3.858225] hpsa 0000:04:00.0: scsi 2:1:0:0: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    3.875698] hpsa can't handle SMP requests
[    3.881761] scsi 2:0:0:0: RAID              HP       P410i            6.64 PQ: 0 ANSI: 5
[    3.903799] scsi 2:1:0:0: Direct-Access     HP       LOGICAL VOLUME   6.64 PQ: 0 ANSI: 5
[    3.915674] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[    3.915889] sd 2:1:0:0: Attached scsi generic sg2 type 0
[    3.916283] sd 2:1:0:0: [sda] 286677120 512-byte logical blocks: (147 GB/137 GiB)
[    3.916613] sd 2:1:0:0: [sda] Write Protect is off
[    3.916829] sd 2:1:0:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[    3.919095]  sda: sda1 sda2
[    3.985891] sd 2:1:0:0: [sda] Attached SCSI disk
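
One way to compare a hanging boot with a successful one is to scan the console log for the hpsa probe messages. The Python sketch below is illustrative only: the function name `hpsa_probe_summary` is hypothetical, and it assumes that a completed probe logs `hpsa ...: added ...` lines as in the successful boot above, while a hung modprobe stops after the driver banner.

```python
import re

# Kernel console line, e.g. "[    3.853139] hpsa 0000:04:00.0: scsi 2:0:0:0: added RAID ..."
# The timestamp pattern also accepts elided stamps such as "[  349.......]".
LINE_RE = re.compile(r"^\[\s*([0-9.]+)\]\s+(.*)$")


def hpsa_probe_summary(log_text):
    """Return (driver_loaded, devices_added) for a kernel console log.

    driver_loaded: True if the "HP HPSA Driver" banner was seen.
    devices_added: list of hpsa messages reporting an added device.
    """
    driver_loaded = False
    devices_added = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped kernel message
        message = match.group(2)
        if message.startswith("HP HPSA Driver"):
            driver_loaded = True
        elif message.startswith("hpsa ") and ": added " in message:
            devices_added.append(message)
    return driver_loaded, devices_added
```

If the banner is present but no device was added, the log matches the pattern seen in the pre-trigger screenshot, where modprobe never returned.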

Comment 30 Zbigniew Jędrzejewski-Szmek 2018-04-10 08:20:35 UTC
So... it looks like something is different in userspace and causes the failure. No idea what.

@lnie, sorry to bother you like that, but can you check whether installing the F28 systemd rpms on F27 causes it to fail? I think you cannot install F28 rpms on F27 directly because of glibc, but you can use the rpms from copr (https://copr.fedorainfracloud.org/coprs/zbyszek/systemd/build/739255/, building now). These are simply F28/rawhide sources rebuilt on older Fedoras.

Comment 31 lnie 2018-04-11 05:35:27 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #30)
> So... it looks like something is different in userspace and causes the
> failure. No idea what.
> 
> @lnie, sorry to bother you like that, but can you try if installing  the F28
> systemd rpms on F27 causes it to fail? I think you cannot install F28 rpms
> on F27 directly because of glibc, you can use the rpms from copr
> (https://copr.fedorainfracloud.org/coprs/zbyszek/systemd/build/739255/,
> building now). These are simply F28/rawhide sources rebuilt on older Fedoras.

No bother, but the rpms you built need glibc 2.27, while the latest version for F27 is glibc-2.26-27.fc27.

[root@storageqe-03 ~]# rpm -Uvh systemd*
warning: systemd-238-7.fc28.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID ab44190f: NOKEY
error: Failed dependencies:
	libc.so.6(GLIBC_2.27)(64bit) is needed by systemd-238-7.fc28.x86_64
	libcrypt.so.1(XCRYPT_2.0)(64bit) is needed by systemd-238-7.fc28.x86_64
	libcryptsetup.so.12()(64bit) is needed by systemd-238-7.fc28.x86_64
	libcryptsetup.so.12(CRYPTSETUP_2.0)(64bit) is needed by systemd-238-7.fc28.x86_64
	libc.so.6(GLIBC_2.27)(64bit) is needed by systemd-libs-238-7.fc28.x86_64
	libcryptsetup.so.12()(64bit) is needed by systemd-udev-238-7.fc28.x86_64
	libcryptsetup.so.12(CRYPTSETUP_2.0)(64bit) is needed by systemd-udev-238-7.fc28.x86_64

Comment 32 Zbigniew Jędrzejewski-Szmek 2018-04-11 05:49:10 UTC
There are separate builds for F26, F27, F28, and rawhide there. See https://copr-be.cloud.fedoraproject.org/results/zbyszek/systemd/fedora-26-x86_64/00739255-systemd/.

Comment 33 lnie 2018-04-11 06:15:12 UTC
Hi, you want me to install the F28 rpms on F27, right?
So I downloaded the rpms from this link: https://copr-be.cloud.fedoraproject.org/results/zbyszek/systemd/fedora-28-x86_64/00739255-systemd/

[root@storageqe-03 ~]# ls | grep systemd
systemd-238-7.fc28.x86_64.rpm
systemd-libs-238-7.fc28.x86_64.rpm
systemd-pam-238-7.fc28.x86_64.rpm
systemd-udev-238-7.fc28.x86_64.rpm

You need to build the F28 rpms on an F27 server if you want to test on an F27 system.
Or am I missing something here?

Comment 34 Zbigniew Jędrzejewski-Szmek 2018-04-11 06:20:33 UTC
It's the same SRPM built multiple times. Technically, the resulting binary RPMs are not *exactly* the same, since they were built using a different version of the compiler, against slightly different versions of libraries, etc, but in this case I don't expect this to make any difference; the goal is to test the changes in systemd itself. So please use the .fc26 rpms.

Comment 35 lnie 2018-04-11 06:40:43 UTC
Created attachment 1420187 [details]
console output of the f28 systemd on 27 system

Comment 36 lnie 2018-04-11 06:42:50 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #34)
> It's the same SRPM built multiple times. Technically, the resulting binary
> RPMs are not *exactly* the same, since they were built using a different
> version of the compiler, against slightly different versions of libraries,
> etc, but in this case I don't expect this to make any difference; the goal
> is to test the changes in systemd itself. So please use the .fc26 rpms.

I see, thanks for your explanation. F28 systemd seems fine, and the console output is attached.

Comment 37 Zbigniew Jędrzejewski-Szmek 2018-04-11 07:21:06 UTC
Thanks.

I assume that you just installed the systemd RPMs and rebooted. But the initramfs module is not rebuilt automatically when systemd is installed. We should check with an initramfs rebuilt with the new systemd. The easiest way is to install systemd, and then reinstall the kernel, which also triggers an initramfs build. It could be the F27 or F28 kernel, doesn't matter, as long as you boot using the kernel that was installed *after* systemd was updated.
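
A quick way to tell whether the initramfs predates the systemd update is to compare the image's modification time with the package install time. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the image is assumed to live at the usual Fedora path /boot/initramfs-<release>.img, and the install time is read with rpm's INSTALLTIME query tag.

```python
import os
import subprocess


def rpm_install_time(package):
    """Install time (epoch seconds) of the most recently installed build of `package`."""
    out = subprocess.run(
        ["rpm", "-q", "--qf", "%{INSTALLTIME}\n", package],
        capture_output=True, text=True, check=True,
    ).stdout
    # Several builds (e.g. multilib) print one line each; take the newest.
    return max(int(line) for line in out.split())


def initramfs_is_stale(initramfs_mtime, package_install_time):
    """True if the initramfs was built before the package was installed."""
    return initramfs_mtime < package_install_time


def check_running_kernel(package="systemd"):
    """Check the initramfs of the running kernel (assumed Fedora path layout)."""
    image = "/boot/initramfs-{}.img".format(os.uname().release)
    return initramfs_is_stale(os.path.getmtime(image), rpm_install_time(package))
```

If `check_running_kernel()` returns True, the boot was done with an initramfs that still contains the old systemd, so the test would need to be repeated after reinstalling the kernel. Passing `package="systemd-udev"` checks the udev subpackage instead.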

Comment 38 lnie 2018-04-11 07:34:04 UTC
Created attachment 1420202 [details]
console output with 28 kernel

Comment 39 lnie 2018-04-11 07:36:57 UTC
Yes. The console output with the kernel updated is attached, and it still seems fine.

Comment 40 Zbigniew Jędrzejewski-Szmek 2018-04-11 20:38:01 UTC
We discussed this in the systemd team, and we realized that in F27 and F28 the firmware packages have different versions. But hpsa does not seem to use any firmware, so this does not seem directly relevant.

Comment 41 lnie 2018-04-12 05:49:33 UTC
I have a feeling that I'm making noise, but just in case:
By "firmware packages" you mean linux-firmware, right?
Just out of curiosity, how can you tell that hpsa doesn't use any firmware?
Btw, the hpsa P410i controller has the latest 6.64 firmware installed, though I'm almost sure that you've already seen that in the attached log.

Comment 42 Zbigniew Jędrzejewski-Szmek 2018-04-12 05:59:06 UTC
Yes, linux-firmware or any other *-firmware.rpm that is installed as rpms and loaded during boot. (Of course it has firmware on the device, but that wouldn't change based on Fedora version, so not relevant here.) I didn't see anything in /lib/firmware that seemed to apply and the hpsa module doesn't seem to load anything. Please correct me if I'm wrong.

Comment 43 Adam Williamson 2018-04-17 23:04:44 UTC
Per comment 28 above - "Adam,I didn't see this bug on fcoe servers without hpsa driver" - I'm kicking this back to proposed blocker. This was accepted as a blocker on the basis that FCoE installs were broken, which is clearly covered by the release criteria. However, it seems the problem here is not to do with FCoE at all, but with this HPSA storage controller / driver, so we need to re-consider whether it's a blocker.

The new criterion to consider this for would be "The installer must be able to detect and install to hardware or firmware RAID storage devices" - https://fedoraproject.org/wiki/Fedora_28_Beta_Release_Criteria#Hardware_and_firmware_RAID . Note there's a footnote "System-specific bugs don't necessarily constitute an infringement of this criterion. It is not unusual that support for some specific firmware RAID controller, for instance, might be broken. In the case of such system-specific bugs, whether the bug is considered to infringe the criterion will be a subjective decision based on the severity of the bug and how common the hardware in question is considered to be."

Comment 44 František Zatloukal 2018-04-23 16:40:49 UTC
Discussed during the 2018-04-23 blocker review meeting: [1]

The decision to classify this bug as a RejectedBlocker was made:

"we consider this bug as falling under the "System-specific bugs" footnote to the RAID criterion, and our feeling is that this family of storage controllers is not sufficiently commonly used with Fedora to make this bug constitute a release blocker."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-04-23/f28-blocker-review.2018-04-23-16.00.log.txt

Comment 45 lnie 2018-05-03 12:38:13 UTC
It turns out this has already been fixed by Ming Lei, and the F28 installer with the 4.16.2 kernel works fine, so I'm closing this. Thanks Tomas and Joseph.

Comment 46 lnie 2018-05-03 12:39:38 UTC
Created attachment 1430671 [details]
console output of the successful f28  installation on hpsa server

Comment 47 Zbigniew Jędrzejewski-Szmek 2018-05-07 07:59:36 UTC
Yay! Do you have a link to the fix maybe?

Comment 49 Joe Byers 2018-09-25 12:30:16 UTC
I am experiencing this same issue.  I posted my issue on ask.fedora before finding this bug.  https://ask.fedoraproject.org/en/question/126727/boot-to-new-kernel-hangs-after-distro-upgrade-25-to-26-to-27-to-28/?answer=126931#post-id-126931

I started to upgrade an F25 HP Proliant server with a P400 SA, my personal server, with kernel 4.13.16-100.  I upgraded to F26, F27, and F28, all with the same result: the vconsole failed to start, the device initialization failed to finish, and a start job on %x2droot ran indefinitely.  It seems the new boot build process fails to get the modules loaded to see the SA.

F25 still boots.  Any kernel installed with files that build the /boot images experiences this issue.  I have tried reinstalling F26 kernels from downloaded rpms, but the boot still fails.

What can I provide to help resolve this issue?  I know this is vintage equipment, but it is what I have.

Comment 51 Zbigniew Jędrzejewski-Szmek 2018-09-25 17:48:28 UTC
Joe: you can just install any kernel (newer or older) without doing the whole upgrade. This way you should be able to test e.g. the F29 kernels [see https://koji.fedoraproject.org/koji/packageinfo?packageID=8]. If even the newest ones don't work, then this might be a different kernel bug.

Comment 52 Joe Byers 2018-09-25 18:03:10 UTC
I downloaded F26 kernel files because I had an older F26 Cinnamon network install spin, which I used for my wife's laptop, that booted.  I installed the F26 kernel and tried to boot.  It hung as well.  It failed on systemd-udev-settle, with the screen showing Reached Basic Target.  I think it is something in the systemd files, because /boot images built with systemd 233-6 (F26; F27: 234-8, F28: 238-7) or newer are not bootable with P400 controllers.  The F26 kernel install used the F28 systemd files to install the kernel and create the boot images, since I had upgraded all the way to F28.

Comment 53 Joe Byers 2018-09-26 16:31:32 UTC
I have tried multiple kernels as suggested, going back to a 4.11 F26 kernel.  No kernel, with its associated boot images, will boot on the HP Proliant with the P400 SA.  The only bootable kernel from my grub is the F25 4.13.16 EOL kernel that I was upgrading from.  It seems that any newly installed kernel, and the associated boot images created using the file versions for that kernel's releasever, will not load the LVM modules or others correctly early in the boot to see / (root) and read the disks.

I also get a failure to start vconsole early in the boot sequence on one booted kernel.   When I press Ctrl-Alt-Del after reaching the "reached basic target" hang point, I get a start job running on %x2droot with no timeout.   I can't get to a shell to get to logs on any system with boot files built with the current F28, 4.18.9-200, system files.  A version of F28, 4.18.8-200, that was updated before the last couple of rounds of upgrade pushes, will drop me to a dracut shell with a dracut init-queue warning rolling across my screen.  I run lvm pvscan and lvscan and they return no devices found.  Again, I can't mount a USB device to dump logs, or probably can't figure out how to mount it.

Thank you all for any and all help.

Comment 54 Don Brace (Microchip) 2018-09-26 23:09:05 UTC
(In reply to Joe Byers from comment #53)
> I have tried multiple kernels as suggested, going back to a 4.11 F26 Kernel.
> Any kernel and the associated boot images will not boot on the HP Proliant
> with P400 SA computer.  The only bootable kernel from my grub is the F25
> 4.13.16 EOL kernel that was the updated.  It seems any new kernel installed
> and associate boot images created using the versions of files with that
> kernel releasever will not load the LVM modules or others correctly early in
> the boot to see \ (root) and read the disks.
> 
> I also get a failure to start vconsole early in the boot sequence on one
> booted kernel.   When I crtl-alt-del, after getting the the hang point of
> reached basic target system I get a boot job running on %x2droot with no
> time out.   I can't get to a shell to get to logs on any system with boot
> files built with the current F28, 4.18.9-200, system files.  A version of
> F28, 4.18.8-200, that was updated before the last couple of rounds of
> upgrade pushes, will drop me to a dracut shell with an dracut init-queue
> warning rolling across my screen.  I run lvm pv(lv)scan and they return no
> devices found.  Again, I can't mount a USB to dump logs, or probably can't
> figure out how to mount the USB device.
> 
> Thank you all for any and all help.

Can you try adding VSP logging? I'll attach a document that explains how to do this.

Comment 55 Don Brace (Microchip) 2018-09-26 23:11:30 UTC
Created attachment 1487510 [details]
Document that describes how to set up ilo VSP logging console

This allows you to add console=ttyS0,115200 console=tty0 to your boot line. This can be done from the grub menu so you do not need a successful boot to add.

You also will need to update some entries in RBSU to enable serial logging and VSP.

Comment 56 Joe Byers 2018-09-27 03:31:33 UTC
I will work on this.  I have a DL380 G5, not a G9.  My BIOS version is 2.1 and my iLO is version 1.73.  I think there is a firmware update from HP, but it is hard to figure out what files to download and install.

I followed the instructions for the BIOS settings.  In my version the option is not in a sub-menu; I set it as instructed.  The BIOS serial console and EMS menu options were as described and set as instructed.

My iLO is very different, with no place to set access settings.  I am not sure how to proceed here.  I have rebooted and will configure the grub2 config files.

Where do I get the iLO IP address from?  I saw the network configuration, but I am not sure if I set it up to get an IP from my router.

Thanks so much for this help.

Comment 57 Don Brace (Microchip) 2018-09-27 14:37:10 UTC
(In reply to Joe Byers from comment #56)
> I will work on this.  I have a DL380 G5 not a G9.  My bios version is 2.1
> and my Ilo is version 1.73.  I think there is a firmware update from HP but
> it is hard to figure out what files to download and install. 
> 
> I followed the instructions for bios setting.  My version, the is not in a
> sub-menu, I set it as instructed.  The bios serial console and ems menu
> options were as described and set as instructed.  
> 
> My Ilo is very different, no place to set access settings.  Not how to
> proceed here.  I  have rebooted and will configure grub2 config files.  
> 
> Where do I get the Ilo IP address from?  I saw the Network configuration but
> not sure if I set it up to get an IP from my router.
> 
> Thanks so much for this help.

So, you can install ipmitool to get your ilo IP, but you would have to be able to boot and install ipmitool. And your server would have to have support for ipmi.

If you can boot and the server has support for ipmi, this command will work:
       ipmitool lan print | awk '/IP Address *:/ {print $4}'
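
For anyone who prefers to do the same extraction in Python, here is a minimal sketch. It assumes the usual "IP Address : <addr>" line in the `ipmitool lan print` output; the function name `ilo_ip_from_lan_print` is hypothetical and the sample address in the usage note is made up.

```python
import re

# Matches "IP Address   : 192.0.2.10" but not "IP Address Source : DHCP Address",
# mirroring the awk pattern /IP Address *:/ used above.
IP_LINE_RE = re.compile(r"^IP Address\s*:\s*(\S+)")


def ilo_ip_from_lan_print(output):
    """Return the BMC/iLO IP from `ipmitool lan print` output, or None if absent."""
    for line in output.splitlines():
        match = IP_LINE_RE.match(line.strip())
        if match:
            return match.group(1)
    return None
```

Usage would be along the lines of `ilo_ip_from_lan_print(subprocess.run(["ipmitool", "lan", "print"], capture_output=True, text=True).stdout)`, run as root on the booted server.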

The ilo IP should also show up at POST on the console. That is the one to use and is easiest.

After that, if you have another Linux box, you can use:
script -c "ssh <ilo IP address>" /tmp/console_log_for_my_issue

script will log everything to the log file.

If you have windows, you can use putty or some other tool to connect to the IP and log that way.

Comment 58 Joe Byers 2018-09-28 13:05:38 UTC
I can boot to the EOL F25 kernel that I was upgrading. I have ipmitool installed on the server, so how do I determine if ipmi is supported?

Also, on a side note, since you showed me screenshots of a DL G9 BIOS: can I upgrade my firmware from 4.12 to 7.24, skipping the versions in between?  I know this is off topic for here, but I would appreciate any reassurance.

Lastly, this is my home server, so I might be delayed in getting back on some items due to other time requirements, like needing to write 2 exams :)

Thank you so much.

Comment 59 Don Brace (Microchip) 2018-09-28 14:57:43 UTC
(In reply to Joe Byers from comment #58)
> I can boot to an the EOL F25 kernel that I was upgrading. I have impitool
> installed on the server, so How do I determine if impi is supported?  
> 
> Also, on a side note since you showed me screenshots of a DL G9 bios.  Can I
> upgrade my firmware from 4.12 to 7.24, skipping version between.  I know
> this is off topic for here, but I would appreciate any reassurance. 
> 
> Last this is my home server, so I might be delayed in getting back on some
> of items due to other time requirements like I need to write 2 exams:)
> 
> Thank you so much.

Do you mean the P410 FW? Yes.

For ipmi:
dmidecode --type 38 (Running on a workstation without ipmi)
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Running it on a server with ipmi:
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
# SMBIOS implementations newer than version 3.0 are not
# fully supported by this version of dmidecode.

Handle 0x0015, DMI type 38, 18 bytes
IPMI Device Information
	Interface Type: KCS (Keyboard Control Style)
	Specification Version: 2.0
	I2C Slave Address: 0x10
	NV Storage Device: Not Present
	Base Address: 0x0000000000000CA2 (I/O)
	Register Spacing: Successive Byte Boundaries
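
The two outputs above differ only in whether an "IPMI Device Information" record is present, so the check can be automated. This Python sketch is a rough illustration, assuming the output layout shown in this comment; the function name `parse_ipmi_info` is hypothetical.

```python
def parse_ipmi_info(dmidecode_output):
    """Parse `dmidecode --type 38` output.

    Returns a dict of the IPMI record's fields, or None when no
    "IPMI Device Information" record is present (no IPMI support reported).
    """
    lines = [line.strip() for line in dmidecode_output.splitlines()]
    try:
        start = lines.index("IPMI Device Information")
    except ValueError:
        return None
    info = {}
    for line in lines[start + 1:]:
        if not line or ":" not in line:
            break  # a blank line or a non "key: value" line ends the record
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info
```

A None result corresponds to the workstation case; a dict with keys such as "Interface Type" and "Specification Version" corresponds to the server case.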


Hope this helps.

Comment 60 Joe Byers 2018-09-29 03:47:48 UTC
I want to provide a follow-up.  I followed all the instructions, and thank you for the support. I updated the P400 firmware (from 4.12 to 7.24) and the iLO2 firmware for this DL380 G5.  I rebooted after each update just to be sure.

I was working more on ipmi, which is running on the server.  I decided to try to boot an F28 kernel, and I'll be darned if it didn't boot the F28 4.18.9-200 kernel.  This blew my mind.  I did try 4.18.8 without success.

I am not sure, but it seems that something in the new version of the P400 controller firmware was needed for the kernel to see the disks and the partitions.  The driver is hpsa, since the devices are named sdX and not ccissX.

I want to thank you all for the assistance.  I am not sure if things are fully fixed, but I am at least running on the latest kernel.

Thank you all again.

Joe

Comment 61 Don Brace (Microchip) 2018-10-01 15:18:54 UTC
Thanks for the update. I do not think there are many (if any) hpsa driver differences. But I would like to diff them.

Can you post the links to the kernel sources?

