Created attachment 1409136 [details]
console output of the f28 installation beaker job
Description of problem:
Tried to install F28 on an FCoE server with the aacraid driver, but it failed.
I've tried three times and all attempts failed, but installing F27/RHEL on the same server succeeds.
Version-Release number of selected component (if applicable):
Fedora-28-20180315.n.0 Server x86_64
Steps to Reproduce:
Created attachment 1409183 [details]
console output of the successful F28 installation Beaker job (also with Fedora-28-20180315.n.0 Server x86_64)
Installation on the server with the megaraid_sas driver succeeds.
Proposed as a Blocker for 28-beta by Fedora user lnie using the blocker tracking app because:
This seems to affect the criterion "The installer must be able to detect and install to hardware or firmware RAID storage devices"
Discussed at 2018-03-19 Fedora 28 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-03-19/f28-blocker-review.2018-03-19-16.02.html . We agreed that if this only affects a RAID set accessed via FCoE - effectively stacking two relatively unusual use cases that are blockers alone - this is too much of a corner case to constitute a blocker. However, if this can be reproduced with a *local* Adaptec RAID controller, we will re-evaluate it.
Lili, can you test with the RAID controller accessed as a regular local device? Thanks!
Hi Adam, I have done a manual installation on a local aacraid server (Dell PowerEdge T110). I also found the bug with an HP ProLiant DL120 G7, so it seems that this bug is widespread.
Er, I mean the manual installation on the local server failed due to this bug, of course. I forgot to type "but failed".
The two guilty lines might be:
dracut-pre-udev: modprobe: ERROR: could not insert 'floppy': No such device
dracut-pre-udev: modprobe: ERROR: could not insert 'sha256_mb': No such device
I didn't see those lines in the logs of the successful installations.
Nah, I don't think so; it's odd that there's a difference, but those are just kernel modules for floppy disk support and SHA-256, I don't think they relate to aacraid.
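To check whether modprobe errors like these really differ between the failing and successful console logs, a comparison along the following lines might help. This is only a sketch: the function names are made up for illustration, and the sample strings stand in for the attached console logs, which are not reproduced here.

```python
import re

# Matches dracut lines such as:
#   dracut-pre-udev: modprobe: ERROR: could not insert 'floppy': No such device
MODPROBE_ERROR = re.compile(r"modprobe: ERROR: could not insert '([^']+)'")


def failed_modules(log_text):
    """Return the set of module names that modprobe failed to insert."""
    return set(MODPROBE_ERROR.findall(log_text))


def only_in_failing(failing_log, working_log):
    """Modules that failed to load in the failing log but not in the working one."""
    return sorted(failed_modules(failing_log) - failed_modules(working_log))


if __name__ == "__main__":
    # Sample input: the two lines quoted in this bug. The "working" text is a
    # placeholder for a log in which no such errors appear.
    failing = (
        "dracut-pre-udev: modprobe: ERROR: could not insert 'floppy': No such device\n"
        "dracut-pre-udev: modprobe: ERROR: could not insert 'sha256_mb': No such device\n"
    )
    working = "dracut-pre-udev: some unrelated message\n"
    print(only_in_failing(failing, working))
```

A difference found this way only shows that the logs differ; it does not show that the listed modules have anything to do with the failure.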
This is definitely a problem, though :( Laura, Justin, how can lili debug this further?
I do see various module params for tweaking the module's behaviour:
parm: aac_sync_mode:Force sync. transfer mode 0=off, 1=on (int)
parm: aac_convert_sgl:Convert non-conformable s/g list 0=off, 1=on (int)
parm: nondasd:Control scanning of hba for nondasd devices. 0=off, 1=on (int)
parm: cache:Disable Queue Flush commands:
bit 0 - Disable FUA in WRITE SCSI commands
bit 1 - Disable SYNCHRONIZE_CACHE SCSI command
bit 2 - Disable only if Battery is protecting Cache (int)
parm: dacmode:Control whether dma addressing is using 64 bit DAC. 0=off, 1=on (int)
parm: commit:Control whether a COMMIT_CONFIG is issued to the adapter for foreign arrays.
This is typically needed in systems that do not have a BIOS. 0=off, 1=on (int)
parm: msi:IRQ handling. 0=PIC(default), 1=MSI, 2=MSI-X) (int)
parm: startup_timeout:The duration of time in seconds to wait for adapter to have it's kernel up and
running. This is typically adjusted for large systems that do not have a BIOS. (int)
parm: aif_timeout:The duration of time in seconds to wait for applications to pick up AIFs before
deregistering them. This is typically adjusted for heavily burdened systems. (int)
parm: aac_fib_dump:Dump controller fibs prior to IOP_RESET 0=off, 1=on (int)
parm: numacb:Request a limit to the number of adapter control blocks (FIB) allocated. Valid values are 512 and down. Default is to use suggestion from Firmware. (int)
parm: acbsize:Request a specific adapter control block (FIB) size. Valid values are 512, 2048, 4096 and 8192. Default is to use suggestion from Firmware. (int)
parm: update_interval:Interval in seconds between time sync updates issued to adapter. (int)
parm: check_interval:Interval in seconds between adapter health checks. (int)
parm: check_reset:If adapter fails health check, reset the adapter. a value of -1 forces the reset to adapters programmed to ignore it. (int)
parm: expose_physicals:Expose physical components of the arrays. -1=protect 0=off, 1=on (int)
parm: reset_devices:Force an adapter reset at initialization. (int)
Could you try playing with any of those that might affect things? You might need help from someone who knows what values might sensibly be used, though, I honestly have no idea (I've never worked with one of these adapters).
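As a rough aid for picking candidates from a list like the one above, the "parm:" lines can be turned into a name-to-description mapping and filtered for simple on/off switches. This is a sketch under assumptions: the helper names are hypothetical, the input is assumed to contain only the "parm:" section of the module information (with continuation lines following their parameter), and the sample text is a short excerpt of the list above.

```python
def parse_parms(parm_text):
    """Map each parameter name to its description, joining continuation lines."""
    params = {}
    current = None
    for raw in parm_text.splitlines():
        line = raw.strip()
        if line.startswith("parm:"):
            # "parm: name:description" -> split on the first colon after the name
            name, _, desc = line[len("parm:"):].strip().partition(":")
            current = name.strip()
            params[current] = desc.strip()
        elif current is not None and line:
            # Anything else is treated as a continuation of the previous parameter
            params[current] += " " + line
    return params


def on_off_params(params):
    """Names of parameters documented as plain 0=off, 1=on switches."""
    return sorted(name for name, desc in params.items() if "0=off, 1=on" in desc)


if __name__ == "__main__":
    sample = """\
parm: aac_sync_mode:Force sync. transfer mode 0=off, 1=on (int)
parm: cache:Disable Queue Flush commands:
bit 0 - Disable FUA in WRITE SCSI commands
bit 1 - Disable SYNCHRONIZE_CACHE SCSI command
parm: reset_devices:Force an adapter reset at initialization. (int)
"""
    parms = parse_parms(sample)
    print(on_off_params(parms))
    print(parms["cache"])
```

The filter only reflects how each parameter is documented; which values are sensible to try still needs someone who knows the adapter.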
Just out of curiosity, as I am trying to track this down, does this system function as expected with F27 and the latest 4.15 kernel update? Entirely likely that this was introduced in 4.16, as there were many changes, but I want to rule out earlier kernels.
I have installed F27 on the Dell PowerEdge T110 and updated the kernel to 4.15; it works fine.
lnie: just to confirm, can you also try installing a 4.16 kernel - an fc28 or f29 one, from koji - and see if *then* you see the bug? thanks!
Adam, the system failed to boot after I updated the kernel to kernel-4.16.0-0.rc6.git0.2.fc28. screenshot1 is the last screen before the dracut messages start flowing.
Created attachment 1411013 [details]
I got that picture after removing rhgb quiet, of course.
Created attachment 1411035 [details]
journal after booting into the 4.15 kernel, just in case it's useful
(In reply to Adam Williamson from comment #8)
> Could you try playing with any of those that might affect things? You might
> need help from someone who knows what values might sensibly be used, though,
> I honestly have no idea (I've never worked with one of these adapters).
I'm going to see tomorrow whether the reset_devices param makes any difference :)
One last thing to try and narrow things down, can you try to boot that F27 install with the rc4 kernel? https://koji.fedoraproject.org/koji/buildinfo?buildID=1053469
The criteria say we block on failures to install to hardware RAID, but I can also see us handwaving this if it was the last blocker at the Go/No-Go meeting and treating it as a Final blocker. I'm not going to +1 blocker this unless I hear evidence that it's going to hit a meaningful percentage of our users.
I'm definitely +1 FE in the meantime.
Adam, I've played with reset_devices and aac_sync_mode. aac_sync_mode works: the system boots successfully with kernel-4.16.0-0.rc6.git0.2.fc28.x86_64, yay~~
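For anyone wanting to reproduce the workaround, module parameters are normally passed either on the kernel command line as module.param=value or through a modprobe.d "options" line. The sketch below just builds both strings. The value 1 ("on", per the parameter description) is an assumption, since the comment above does not say which value was used or how it was passed, and the helper names are hypothetical.

```python
def kernel_cmdline_arg(module, param, value):
    """Kernel command-line form of a module parameter: module.param=value."""
    return f"{module}.{param}={value}"


def modprobe_options_line(module, param, value):
    """modprobe.d form of a module parameter: options module param=value."""
    return f"options {module} {param}={value}"


def with_arg(cmdline, arg):
    """Append arg to a command line unless it is already present."""
    tokens = cmdline.split()
    if arg not in tokens:
        tokens.append(arg)
    return " ".join(tokens)


if __name__ == "__main__":
    # Assumption: 1 means "on" for aac_sync_mode, as in "0=off, 1=on".
    arg = kernel_cmdline_arg("aacraid", "aac_sync_mode", 1)
    print(with_arg("ro", arg))
    print(modprobe_options_line("aacraid", "aac_sync_mode", 1))
```

For an installer boot, the command-line form is probably the more practical of the two, since it can be typed at the boot menu; a modprobe.d file only helps early in boot if it ends up in the initramfs.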
Created attachment 1411599 [details]
dmesg of the successful boot
Created attachment 1411600 [details]
journal of the successful boot
Created attachment 1411601 [details]
last picture before the dracut messages start flowing (with reset_devices)
(In reply to Justin M. Forbes from comment #17)
> One last thing to try and narrow things down, can you try to boot that F27
> install with the rc4 kernel?
The rc4 kernel also doesn't work; the system just hangs there (picture6).
Created attachment 1411602 [details]
So if this fails with default options on 4.16 but passes with default options on 4.15, that's clearly a bug, but if a non-default option can make it work, that's probably a sufficient workaround at least for Beta. On that basis I'm -1 blocker for Beta, not sure about FE, would depend on the fix.
Thanks for the testing, Lili. Presumably RC4 kernel on F27 also works if you use aac_sync_mode ?
This would make sense. What I was looking for with the rc4 test isn't directly related to the aacraid code, but to patches that upstream hasn't picked up yet. I will see what upstream has as a fix. With a known workaround, I am -1 to blocker as well.
With the information above, I'm firmly -1 blocker.
I'm also going to go for -1 FE; if the fix is going to require a kernel rebase, I'm not comfortable with that level of risk during Freeze. Let's document the workaround in Known Issues and fix it as soon after Freeze lifts as possible.
Discussed during blocker review :
RejectedBlocker (Final) - this is specific to one particular type of RAID adapter, and we have a reasonable workaround, so we don't consider this a serious enough violation to be a Beta blocker
Discussed during blocker review :
RejectedBlocker (Beta) - this is specific to one particular type of RAID adapter, and we have a reasonable workaround, so we don't consider this a serious enough violation to be a Beta blocker
(In reply to Adam Williamson from comment #25)
> So if this fails with default options on 4.16 but passes with default
> options on 4.15, that's clearly a bug, but if a non-default option can make
> it work, that's probably a sufficient workaround at least for Beta. On that
> basis I'm -1 blocker for Beta, not sure about FE, would depend on the fix.
> Thanks for the testing, Lili. Presumably RC4 kernel on F27 also works if you
> use aac_sync_mode ?
Which card and system did you reproduce this on? And what is the firmware version for the card?
Created attachment 1413479 [details]
picture of the RAID controller
Please feel free to ask if more information is needed.
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
Fedora 28 has now been rebased to 4.17.7-200.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.