Created attachment 1409134 [details]
console output of the f28 installation beaker job

Description of problem:
I tried to install F28 on an FCoE server, but the installation failed. I have tried three times and every attempt failed, while installing F27/RHEL on the same server succeeds.

FAILED[ 136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
] Failed to start udev Wait for Complete Device Initialization.
See 'systemctl status syst[ 136.516982] audit: type=1130 audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=multipathd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
emd-udev-settle.service' for details.

Version-Release number of selected component (if applicable):
Fedora-28-20180315.n.0 Server x86_64

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1409169 [details] console output of the "success" f28 installation beaker job(with Fedora-28-20180315.n.0 Server x86_64,too)
The bug affects both the ixgbe and bnx2fc drivers.
Proposed as a Blocker for 28-final by Fedora user lnie using the blocker tracking app because: this seems to affect "The installer must be able to detect (if possible) and install to supported network-attached storage devices"
(In reply to lnie from comment #0)
> FAILED[ 136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle
> comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
> res=failed'
>
> ] Failed to start udev Wait for Complete Device Initialization.
> See 'systemctl status syst[ 136.516982] audit: type=1130
> audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295
> subj=kernel msg='unit=multipathd comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> emd-udev-settle.service' for details.

I couldn't find any of these in the attached log. Also, it looks like a SELinux problem to me.
(In reply to Jan Synacek from comment #4)
> (In reply to lnie from comment #0)
> > FAILED[ 136.413954] audit: type=1130 audit(1521265040.092:13): pid=1 uid=0
> > auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-settle
> > comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
> > res=failed'
> >
> > ] Failed to start udev Wait for Complete Device Initialization.
> > See 'systemctl status syst[ 136.516982] audit: type=1130
> > audit(1521265040.195:14): pid=1 uid=0 auid=4294967295 ses=4294967295
> > subj=kernel msg='unit=multipathd comm="systemd"
> > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> > emd-udev-settle.service' for details.
>
> I couldn't find any of these in the attached log. Also, it looks like a
> selinux problem to me.

Really? Just open the attachment from comment 0 and search for 'Failed to start udev Wait for Complete Device Initialization'; you will see all of that.
Yeah, I was looking into the wrong attachment, sorry about that.
Hmm, it looks like something goes wrong here, but it'll be very hard to diagnose without further information. Any chance you could attach the journal?
lnie: can you please try to get the full system logs out, as Zbigniew requests? As he says, it'll be very difficult for anyone without FCoE hardware (which is just about everyone :>) to debug this without the full logs. Thanks!
Discussed during the 2018-03-26 blocker review meeting: [1] The decision to classify this bug as an AcceptedBlocker was made as it violates the following blocker criteria: "The installer must be able to detect (if possible) and install to supported network-attached storage devices...Supported network-attached storage types include iSCSI, Fibre Channel and Fibre Channel over Ethernet (FCoE)" [1] https://meetbot.fedoraproject.org/fedora-blocker-review/2018-03-26/f28-blocker-review.2018-03-26-16.01.txt
As shown in the attached screenshot, the installer just hangs there and no shell is provided, so it seems that I'm not able to run journalctl. I'm going to do a fresh installation with "debug rd.debug"; I hope we can get something.
Created attachment 1413517 [details] picture
Created attachment 1413518 [details] console output with debug rd.debug
2: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000\ link/ether e8:39:35:2d:e1:c8 brd ff:ff:ff:ff:ff:ff

dracut-initqueue is waiting for that device to be ready, but it has NO-CARRIER. It looks like the networking is not set up properly there.

@lnie, how is the network configured? Is it possible that the network hardware is not connected properly?
I don't think the network configuration or hardware is the problem, as I'm able to install F26/RHEL 7 successfully, and the CNA card works fine with all the FCoE test cases.
lnie: some of the tips at https://freedesktop.org/wiki/Software/systemd/Debugging/ may help, particularly getting the logs out via a serial console, if you have access to the serial console on the test system.
With systemd.journald.forward_to_console=1, I got "seq 1481 '/devices/pci0000:00/0000:00:01.0/0000:04:00.0' is taking a long time", and that device is an hpsa RAID disk. I tested on a pure hpsa server and saw the same bug. To make sure, I booted the system with rd.break=pre-trigger, and the system just hangs after I run modprobe hpsa manually, as shown in the attached screenshot. So it seems that we should start with the hpsa driver. What is a little strange is that F26 logs the same complaint, but it doesn't hang and works fine.
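For a long console log, the stuck device can be pulled out mechanically. The following is a minimal sketch, assuming the log contains udev messages in the form quoted in the comment above; the function name `slow_udev_devices` and the log file name are hypothetical.

```shell
# Read a console log on stdin and print the sysfs path named in every
# udev "seq N '<path>' is taking a long time" message.
slow_udev_devices() {
    sed -n "s/.*seq [0-9]* '\([^']*\)' is taking a long time.*/\1/p"
}
```

Usage would look like `slow_udev_devices < console.log | sort -u`; the resulting PCI path can then be matched against `lspci` output to identify the controller.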
Created attachment 1414550 [details] screenshot
Created attachment 1414551 [details] pure hpsa server 28 log with systemd.journald.forward_to_console=1
Created attachment 1414564 [details] the fcoe server 28 log with systemd.journald.forward_to_console=1
Created attachment 1414566 [details] pure hpsa 26 log with systemd.journald.forward_to_console=1
Adam, the FCoE server is a ProLiant DL120 G7 and the pure hpsa server is a ProLiant DL380 G6. Thanks for systemd.journald.forward_to_console=1.
lili: can you test FCoE without the problematic HPSA storage?
Comment on attachment 1414550 [details]
screenshot

pre-trigger:/# lsmod | grep hpsa
pre-trigger:/# modprobe hpsa
[ 349.185228] HP HPSA Driver (v 3.4.20-125)
[ 349.......] hpsa 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 349.......] hpsa 0000:04:00.0: Logical aborts not supported
[ 349.......] hpsa 0000:04:00.0: HP SSD Smart Path aborts not supported
[ 349.......] scsi host2: hpsa
[ 349.......] hpsa can't handle SMP requests
Since modprobe never returns, it seems there's a hardware/driver problem. Then systemd-udev-settle.service times out, which is expected. udevadm settle has a default timeout of 120s, which matches the attached logs. Let's get some input from the kernel folks.
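The timing argument can be checked against a saved console log. This is a rough sketch, assuming audit lines shaped like the ones in comment 0; `settle_failure_time` is a hypothetical helper name. It prints the kernel timestamp, in seconds since boot, at which systemd-udev-settle was reported as failed.

```shell
# Read a console log on stdin and print the kernel timestamp of each audit
# record that reports unit=systemd-udev-settle with res=failed.
settle_failure_time() {
    awk '/unit=systemd-udev-settle/ && /res=failed/ {
        # The timestamp looks like "[  136.413954]"; strip brackets and padding.
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            ts = substr($0, RSTART + 1, RLENGTH - 2)
            gsub(/ /, "", ts)
            print ts
        }
    }'
}
```

For the log in comment 0 this yields 136.413954. That would fit a 120 s settle timeout if the service started roughly 16 s into boot, which is an inference rather than something shown in the thread.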
Can you try the 4.16 kernel from F28 on an F27 installation and see if that works?
Created attachment 1419725 [details] console output of the successful boot
Hi, the F28 4.16 kernel boots successfully on an F27 installation, and the console output is attached.
Adam,I didn't see this bug on fcoe servers without hpsa driver
Comment on attachment 1419725 [details]
console output of the successful boot

The relevant part is:

[ 3.810747] HP HPSA Driver (v 3.4.20-125)
[ 3.811642] hpsa 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 3.813750] hpsa 0000:04:00.0: Logical aborts not supported
[ 3.814926] hpsa 0000:04:00.0: HP SSD Smart Path aborts not supported
[ 3.843823] scsi host2: hpsa
[ 3.844581] hpsa can't handle SMP requests
[ 3.853139] hpsa 0000:04:00.0: scsi 2:0:0:0: added RAID HP P410i controller SSDSmartPathCap- En- Exp=1
[ 3.855651] hpsa 0000:04:00.0: scsi 2:0:1:0: masked Direct-Access HP EG0146FAWJC PHYS DRV SSDSmartPathCap- En- Exp=0
[ 3.858225] hpsa 0000:04:00.0: scsi 2:1:0:0: added Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap- En- Exp=1
[ 3.875698] hpsa can't handle SMP requests
[ 3.881761] scsi 2:0:0:0: RAID HP P410i 6.64 PQ: 0 ANSI: 5
[ 3.903799] scsi 2:1:0:0: Direct-Access HP LOGICAL VOLUME 6.64 PQ: 0 ANSI: 5
[ 3.915674] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[ 3.915889] sd 2:1:0:0: Attached scsi generic sg2 type 0
[ 3.916283] sd 2:1:0:0: [sda] 286677120 512-byte logical blocks: (147 GB/137 GiB)
[ 3.916613] sd 2:1:0:0: [sda] Write Protect is off
[ 3.916829] sd 2:1:0:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[ 3.919095] sda: sda1 sda2
[ 3.985891] sd 2:1:0:0: [sda] Attached SCSI disk
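One quick way to compare this log with the hung probe from the earlier screenshot is to count how many devices the hpsa driver reported as added. The sketch below assumes the message format seen in the two excerpts in this thread; `hpsa_added_devices` is a hypothetical name.

```shell
# Read a console log on stdin and print how many "hpsa <pci-id>: ... added ..."
# device lines it contains. Prints 0 when the probe never got that far.
hpsa_added_devices() {
    awk '/hpsa .*: added / { n++ } END { print n + 0 }'
}
```

On the successful boot above the count is 2 (the controller and the logical volume); on the hung probe the log stops after "hpsa can't handle SMP requests", so the count is 0.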
So... it looks like something is different in userspace and causes the failure. No idea what. @lnie, sorry to bother you like that, but can you try if installing the F28 systemd rpms on F27 causes it to fail? I think you cannot install F28 rpms on F27 directly because of glibc, you can use the rpms from copr (https://copr.fedorainfracloud.org/coprs/zbyszek/systemd/build/739255/, building now). These are simply F28/rawhide sources rebuilt on older Fedoras.
(In reply to Zbigniew Jędrzejewski-Szmek from comment #30)
> So... it looks like something is different in userspace and causes the
> failure. No idea what.
>
> @lnie, sorry to bother you like that, but can you try if installing the F28
> systemd rpms on F27 causes it to fail? I think you cannot install F28 rpms
> on F27 directly because of glibc, you can use the rpms from copr
> (https://copr.fedorainfracloud.org/coprs/zbyszek/systemd/build/739255/,
> building now). These are simply F28/rawhide sources rebuilt on older Fedoras.

Nope, but the rpms you built need glibc 2.27, while the latest version for F27 is glibc-2.26-27.fc27:

[root@storageqe-03 ~]# rpm -Uvh systemd*
warning: systemd-238-7.fc28.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID ab44190f: NOKEY
error: Failed dependencies:
	libc.so.6(GLIBC_2.27)(64bit) is needed by systemd-238-7.fc28.x86_64
	libcrypt.so.1(XCRYPT_2.0)(64bit) is needed by systemd-238-7.fc28.x86_64
	libcryptsetup.so.12()(64bit) is needed by systemd-238-7.fc28.x86_64
	libcryptsetup.so.12(CRYPTSETUP_2.0)(64bit) is needed by systemd-238-7.fc28.x86_64
	libc.so.6(GLIBC_2.27)(64bit) is needed by systemd-libs-238-7.fc28.x86_64
	libcryptsetup.so.12()(64bit) is needed by systemd-udev-238-7.fc28.x86_64
	libcryptsetup.so.12(CRYPTSETUP_2.0)(64bit) is needed by systemd-udev-238-7.fc28.x86_64
There are separate builds for F26, F27, F28, and rawhide there. See https://copr-be.cloud.fedoraproject.org/results/zbyszek/systemd/fedora-26-x86_64/00739255-systemd/.
Hi, you want me to install F28 rpms on F27, right? So I downloaded the rpms from this link: https://copr-be.cloud.fedoraproject.org/results/zbyszek/systemd/fedora-28-x86_64/00739255-systemd/

[root@storageqe-03 ~]# ls | grep systemd
systemd-238-7.fc28.x86_64.rpm
systemd-libs-238-7.fc28.x86_64.rpm
systemd-pam-238-7.fc28.x86_64.rpm
systemd-udev-238-7.fc28.x86_64.rpm

You need to build the F28 rpms on an F27 server if you want to test on an F27 system. Or am I missing something here?
It's the same SRPM built multiple times. Technically, the resulting binary RPMs are not *exactly* the same, since they were built using a different version of the compiler, against slightly different versions of libraries, etc, but in this case I don't expect this to make any difference; the goal is to test the changes in systemd itself. So please use the .fc26 rpms.
Created attachment 1420187 [details] console output of the f28 systemd on 27 system
(In reply to Zbigniew Jędrzejewski-Szmek from comment #34)
> It's the same SRPM built multiple times. Technically, the resulting binary
> RPMs are not *exactly* the same, since they were built using a different
> version of the compiler, against slightly different versions of libraries,
> etc, but in this case I don't expect this to make any difference; the goal
> is to test the changes in systemd itself. So please use the .fc26 rpms.

I see, thanks for your explanation. F28 systemd seems fine, and the console output is attached.
Thanks. I assume that you just installed the systemd RPMs and rebooted. But the initramfs module is not rebuilt automatically when systemd is installed. We should check with an initramfs rebuilt with the new systemd. The easiest way is to install systemd, and then reinstall the kernel, which also triggers an initramfs build. It could be the F27 or F28 kernel, doesn't matter, as long as you boot using the kernel that was installed *after* systemd was updated.
Created attachment 1420202 [details] console output with 28 kernel
Nope. Yes, and the console output with the updated kernel is attached; it still seems fine.
We discussed this in the systemd team, and we realized that in F27 and F28 the firmware packages have different versions. But hpsa does not seem to use any firmware, so this does not seem directly relevant.
I have a feeling that I'm making noise, but just in case: by "firmware packages" you mean linux-firmware, right? Just out of curiosity, how can you tell that hpsa doesn't use any firmware? By the way, the hpsa P410i controller has the latest 6.64 firmware installed, though I'm almost sure that you've already seen that in the attached log.
Yes, linux-firmware or any other *-firmware.rpm that is installed as rpms and loaded during boot. (Of course it has firmware on the device, but that wouldn't change based on Fedora version, so not relevant here.) I didn't see anything in /lib/firmware that seemed to apply and the hpsa module doesn't seem to load anything. Please correct me if I'm wrong.
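One way to look for firmware a module declares is to read the `firmware:` fields in `modinfo` output. This is a hedged sketch, not necessarily the check performed above; `declared_firmware` is a hypothetical helper name.

```shell
# Read `modinfo <module>` output on stdin and print every file listed in a
# "firmware:" field (one per line). Prints nothing if none are declared.
declared_firmware() {
    awk '$1 == "firmware:" { print $2 }'
}
```

Usage would be `modinfo hpsa | declared_firmware`. Empty output only shows that the module declares no firmware files; a driver could in principle still request firmware at runtime without declaring it, so this is supporting evidence rather than proof.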
Per comment 28 above - "Adam,I didn't see this bug on fcoe servers without hpsa driver" - I'm kicking this back to proposed blocker. This was accepted as a blocker on the basis that FCoE installs were broken, which is clearly covered by the release criteria. However, it seems the problem here is not to do with FCoE at all, but with this HPSA storage controller / driver, so we need to re-consider whether it's a blocker. The new criterion to consider this for would be "The installer must be able to detect and install to hardware or firmware RAID storage devices" - https://fedoraproject.org/wiki/Fedora_28_Beta_Release_Criteria#Hardware_and_firmware_RAID . Note there's a footnote "System-specific bugs don't necessarily constitute an infringement of this criterion. It is not unusual that support for some specific firmware RAID controller, for instance, might be broken. In the case of such system-specific bugs, whether the bug is considered to infringe the criterion will be a subjective decision based on the severity of the bug and how common the hardware in question is considered to be."
Discussed during the 2018-04-23 blocker review meeting: [1] The decision to classify this bug as a RejectedBlocker was made: "we consider this bug as falling under the "System-specific bugs" footnote to the RAID criterion, and our feeling is that this family of storage controllers is not sufficiently commonly used with Fedora to make this bug constitute a release blocker." [1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-04-23/f28-blocker-review.2018-04-23-16.00.log.txt
It turns out this has already been fixed by Ming Lei, and the F28 installer with the 4.16.2 kernel works fine, so I'm closing this. Thanks, Tomas and Joseph.
Created attachment 1430671 [details] console output of the successful f28 installation on hpsa server
Yay! Do you have a link to the fix maybe?
Yes: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.16.7&id=8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef
I am experiencing this same issue. I posted my issue on ask.fedora before finding this bug: https://ask.fedoraproject.org/en/question/126727/boot-to-new-kernel-hangs-after-distro-upgrade-25-to-26-to-27-to-28/?answer=126931#post-id-126931

I started to upgrade an F25 HP ProLiant server with a P400 SA, my personal server, with kernel 4.13.16-100. I upgraded to F26, F27, and F28, all with the same result: the vconsole failed to start, the device initialization failed to finish, and a start job on %x2droot ran indefinitely. It seems the new boot build process fails to get the modules loaded to see the SA. F25 still boots. Any kernel installed with files that build the /boot images experiences this issue. I have tried reinstalling F26 kernels from downloaded rpms, but the boot still fails.

What can I provide to help resolve this issue? I know this is vintage equipment, but it is what I have.
Jon: you can just install any kernel (newer or older), without doing the whole upgrade. This way you should be able to test e.g. the F29 kernels [see https://koji.fedoraproject.org/koji/packageinfo?packageID=8]. If even the newest ones don't work, then this might be a different kernel bug.
I downloaded F26 kernel files because I had an older F26 Cinnamon network install spin, which I used for my wife's laptop, that booted. I installed the F26 kernel and tried to boot. It hung as well. It failed on systemd-udev-settle, with the screen showing Reached Basic Target. I think it is something in the systemd files, because any /boot images built with systemd with newer version that is 233-6 (F26, 27: 234-8, 28: 238-7) or older does not create bootable images with P400 controllers. The F26 kernel install used the F28 systemd files to install the kernel and create the boot images, since I upgraded all the way to F28.
I have tried multiple kernels as suggested, going back to a 4.11 F26 kernel. Any kernel and its associated boot images will not boot on the HP ProLiant with the P400 SA. The only bootable kernel from my grub is the F25 4.13.16 EOL kernel that was being updated. It seems any new kernel installed, with associated boot images created using the versions of files from that kernel's releasever, will not load the LVM modules or others correctly early in the boot to see / (root) and read the disks.

I also get a failure to start vconsole early in the boot sequence on one booted kernel. When I press ctrl-alt-del after getting to the hang point of "reached basic target", I get a boot job running on %x2droot with no timeout. I can't get to a shell to get to logs on any system with boot files built with the current F28, 4.18.9-200, system files. A version of F28, 4.18.8-200, that was updated before the last couple of rounds of upgrade pushes, will drop me to a dracut shell with a dracut initqueue warning rolling across my screen. I run lvm pv(lv)scan and they return no devices found. Again, I can't mount a USB device to dump logs, or probably can't figure out how to mount the USB device.

Thank you all for any and all help.
(In reply to Joe Byers from comment #53)
> I have tried multiple kernels as suggested, going back to a 4.11 F26 Kernel.
> Any kernel and the associated boot images will not boot on the HP Proliant
> with P400 SA computer. The only bootable kernel from my grub is the F25
> 4.13.16 EOL kernel that was the updated. It seems any new kernel installed
> and associate boot images created using the versions of files with that
> kernel releasever will not load the LVM modules or others correctly early in
> the boot to see \ (root) and read the disks.
>
> I also get a failure to start vconsole early in the boot sequence on one
> booted kernel. When I crtl-alt-del, after getting the the hang point of
> reached basic target system I get a boot job running on %x2droot with no
> time out. I can't get to a shell to get to logs on any system with boot
> files built with the current F28, 4.18.9-200, system files. A version of
> F28, 4.18.8-200, that was updated before the last couple of rounds of
> upgrade pushes, will drop me to a dracut shell with an dracut init-queue
> warning rolling across my screen. I run lvm pv(lv)scan and they return no
> devices found. Again, I can't mount a USB to dump logs, or probably can't
> figure out how to mount the USB device.
>
> Thank you all for any and all help.

Can you try adding VSP logging? I'll attach a document that explains how to do this.
Created attachment 1487510 [details]
Document that describes how to set up ilo VSP logging console

This allows you to add console=ttyS0,115200 console=tty0 to your boot line. This can be done from the grub menu, so you do not need a successful boot to add it. You will also need to update some entries in RBSU to enable serial logging and VSP.
I will work on this. I have a DL380 G5 not a G9. My bios version is 2.1 and my Ilo is version 1.73. I think there is a firmware update from HP but it is hard to figure out what files to download and install. I followed the instructions for bios setting. My version, the is not in a sub-menu, I set it as instructed. The bios serial console and ems menu options were as described and set as instructed. My Ilo is very different, no place to set access settings. Not how to proceed here. I have rebooted and will configure grub2 config files. Where do I get the Ilo IP address from? I saw the Network configuration but not sure if I set it up to get an IP from my router. Thanks so much for this help.
(In reply to Joe Byers from comment #56)
> I will work on this. I have a DL380 G5 not a G9. My bios version is 2.1
> and my Ilo is version 1.73. I think there is a firmware update from HP but
> it is hard to figure out what files to download and install.
>
> I followed the instructions for bios setting. My version, the is not in a
> sub-menu, I set it as instructed. The bios serial console and ems menu
> options were as described and set as instructed.
>
> My Ilo is very different, no place to set access settings. Not how to
> proceed here. I have rebooted and will configure grub2 config files.
>
> Where do I get the Ilo IP address from? I saw the Network configuration but
> not sure if I set it up to get an IP from my router.
>
> Thanks so much for this help.

So, you can install ipmitool to get your iLO IP, but you would have to be able to boot and install ipmitool, and your server would have to support IPMI. If you can boot and the server supports IPMI, this command will work:

ipmitool lan print | awk '/IP Address *:/ {print $4}'

The iLO IP should also show up at POST on the console. That is the one to use and is easiest.

After that, if you have another Linux box, you can use:

script -c "ssh <ilo IP address>" /tmp/console_log_for_my_issue

script will log everything to the log file. If you have Windows, you can use PuTTY or some other tool to connect to the IP and log that way.
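To make the awk filter from the comment above easier to reuse and to try without BMC hardware, it can be wrapped in a small function that reads `ipmitool lan print` output on stdin. This is a minimal sketch; the function name `bmc_ip_from_lan_print` is hypothetical.

```shell
# Read `ipmitool lan print` output on stdin and print the BMC/iLO IP address.
# The pattern requires only spaces between "IP Address" and the colon, so the
# "IP Address Source" line is skipped.
bmc_ip_from_lan_print() {
    awk '/IP Address *:/ { print $4 }'
}
```

On the server itself this would be run as `ipmitool lan print | bmc_ip_from_lan_print`.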
I can boot to the EOL F25 kernel that I was upgrading. I have ipmitool installed on the server, so how do I determine if IPMI is supported?

Also, on a side note, since you showed me screenshots of a DL G9 BIOS: can I upgrade my firmware from 4.12 to 7.24, skipping the versions in between? I know this is off topic for here, but I would appreciate any reassurance.

Lastly, this is my home server, so I might be delayed in getting back on some of these items due to other time requirements, like needing to write 2 exams :)

Thank you so much.
(In reply to Joe Byers from comment #58)
> I can boot to an the EOL F25 kernel that I was upgrading. I have impitool
> installed on the server, so How do I determine if impi is supported?
>
> Also, on a side note since you showed me screenshots of a DL G9 bios. Can I
> upgrade my firmware from 4.12 to 7.24, skipping version between. I know
> this is off topic for here, but I would appreciate any reassurance.
>
> Last this is my home server, so I might be delayed in getting back on some
> of items due to other time requirements like I need to write 2 exams:)
>
> Thank you so much.

Do you mean the P410 FW? Yes.

For ipmi:

dmidecode --type 38

(Running on a workstation without ipmi)
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Running it on a server with ipmi:
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
# SMBIOS implementations newer than version 3.0 are not
# fully supported by this version of dmidecode.

Handle 0x0015, DMI type 38, 18 bytes
IPMI Device Information
	Interface Type: KCS (Keyboard Control Style)
	Specification Version: 2.0
	I2C Slave Address: 0x10
	NV Storage Device: Not Present
	Base Address: 0x0000000000000CA2 (I/O)
	Register Spacing: Successive Byte Boundaries

Hope this helps.
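The difference between the two outputs above is the "IPMI Device Information" record, so the check can be scripted. The following is a small sketch under that assumption; `has_ipmi_device` is a hypothetical helper name.

```shell
# Read `dmidecode --type 38` output on stdin. Exit status 0 means an
# "IPMI Device Information" record is present, non-zero means it is not.
has_ipmi_device() {
    grep -q 'IPMI Device Information'
}
```

Usage would be `dmidecode --type 38 | has_ipmi_device && echo "IPMI present"`, run as root since dmidecode needs access to the SMBIOS tables.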
I want to provide a follow-up. I followed all the instructions, and thank you for the support. I updated the P400 firmware and the iLO2 firmware for this DL380 G5, rebooting after each update just to be sure.

P400 firmware: from 4.12 to 7.24

I was working more on IPMI, which is running on the server, when I decided to try booting an F28 kernel. And I'll be if it didn't boot to the F28 4.18.9-200 kernel. This blew my mind. I did try 4.18.8 without success.

I am not sure, but it seems that something in the new version of the P400 controller firmware was needed for the kernel to see the disks and the partitions. The driver is hpsa, since the devices follow sdX and not ccissX.

I want to thank you all for the assistance. I am not sure if things are fully fixed, but I am at least running on the latest kernel.

Thank you all again.
Joe
Thanks for the update. I do not think there are many (if any) hpsa driver differences, but I would like to diff them. Can you post the links to the kernel sources?