Bug 486244
Summary: | [Intel 6.0 FEAT] Add FCoE boot capability | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | John Ronciak <john.ronciak>
Component: | anaconda | Assignee: | Hans de Goede <hdegoede>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Release Test Team <release-test-team-automation>
Severity: | medium | Priority: | low
Version: | 6.0 | Target Release: | 6.0
Target Milestone: | beta | Keywords: | FutureFeature
Hardware: | All | OS: | Linux
Fixed In Version: | anaconda-13.21.50-8 | Doc Type: | Enhancement
Last Closed: | 2010-11-10 19:36:25 UTC | |
CC: | andriusb, atodorov, berthiaume_wayne, ddumas, ed.ciechanowski, eric.w.multanen, gene.heskett, hdegoede, jane.lv, jlv, john.ronciak, jvillalo, keve.a.gabbert, luyu, mchristi, minh.t.pham, robert.w.love, ross.b.brattain, rpacheco, rwilliam, snagar, supreeth.venkataraman, syeghiay, yi.zou | |
Bug Depends On: | 513011, 513018, 569766, 593744, 618875, 619604, 619605, 692939 | |
Bug Blocks: | 510435, 519933, 554559 | |
Description
John Ronciak
2009-02-19 00:25:17 UTC
So, what exactly is required to support this feature? Do we have hardware and documentation in the Westford office that we can use to develop and test support?

There are two parts to this feature. Support during install time, and support during boot time.

Installation:
1. The default kernel needs to have the ixgbe, scsi_transport_fc, libfc, and fcoe modules pre-loaded.
2. Anaconda must perform discovery of Fibre Channel targets/LUNs using the open-fcoe initiator, and the user must be given an option to install to the remote disk. (This could probably be similar to the iSCSI solution under the "advanced storage configuration" option.)
3. To do the discovery, an fcoe interface needs to be created by writing the name of the interface that is connected to the fabric to /sys/module/fcoe/parameters/create.

Post-Installation:
Changes need to be made to the initrd image to support FCoE boot. At boot time, the initrd image will be transferred to the initiator machine using an Int13h connection from the FCoE boot Option ROM (similar to iSCSI boot). To complete the boot, the open-fcoe initiator should be started from within the initrd image so that the correct root partition is mounted. Once this is done, we simply chroot to the new root partition and run /sbin/init. For this to happen, changes need to be made to mkinitrd so that init will:
1. Load the ixgbe, scsi_transport_fc, libfc, and fcoe modules soon after the SCSI module is loaded.
2. Ensure that networking is started.
3. Ensure that the correct ethernet interface is brought up.
4. Initiate an fcoe connection through that interface and perform discovery.
5. Mount the correct root and run /sbin/init.

The above steps are very similar to RHEL's iSCSI boot solution, which could serve as the reference for developing this feature.

(In reply to comment #2)
> There are two parts to this feature. Support during install time, and support
> during boot time.
> > Installation: > Hi, As discussed by mail (some time ago), I'll be working on adding support for this to Fedora (F-12). As also discussed I do not have access to hardware to test this, so testing will need to be done by you. I've written a first version of support of the "Installation" part of this feature. Here is an update.img which can be used together with F-11, which adds the ability to configure FCoE under the "Advanced Storage" button in the initial partitioning screen: http://people.atrpms.net/~hdegoede/updates-486244.img Please give this a try, note that you will still need to manually update your initrd after the install before rebooting. Thanks, Hans. I will give it a try and report back. cheers, Supreeth Hans, I tried the update image and could not get it to work. So there's either some problem with the image itself or I am doing something wrong. Here's a step by step account of what I'm doing 1. I'm using instructions from http://fedoraproject.org/wiki/Anaconda/Updates 2. I transferred the contents of the image into a floppy drive using dd as described in the link. 3. I ran the Fedora 11 installation CD, pressed TAB and included "linux updates" in the kernel command line. 4. I selected my floppy drive as the update device when asked (/dev/fd0) 4. The box on the screen said "Reading Anaconda Updates". 5. The usual install screen showed up, and there was no fcoe configuration in the "Advanced Storage" screen only iSCSI I wrote down the kernel messages as anaconda started and found some FATAL: messages. I'm including the messages I saw below (Note: I just copied these from the screen and these are not from a serial console). ----- INFO: kernel command line: initrd=initrd.img linux updates BOOT_IMAGE=vmlinuz ... INFO: UPDATES device is /dev/fd0 FATAL: Module dm_mod not found FATAL: Module dm_zero not found FATAL: Module dm_mirror not found FATAL: Module dm_snapshot not found INFO: Running anaconda script /usr/bin/anaconda ... 
INFO: anaconda called with cmdline = ['/usr/bin/anaconda', '--stage2', 'cdrom:///dev/sr0:/mnt/stage2', '--graphical', '--selinux'] ... ERROR: Error running xrandr: None ERROR: Exception when running xrandr: Error running xrandr: None INFO: Starting graphical installation ... WARNING: step installtype does not exist WARNING: step confirminstall does not exist WARNING: step complete does not exist ... ------- Please let me know if I need to do any extra steps in addition to what's on the link above. If you need any other info/traces please let me know. Thanks, Supreeth (In reply to comment #5) > Hans, > > I tried the update image and could not get it to work. <snip> Hmm, it looks like its not finding your updates.img. I'm not sure if the updates.img I provided is suitable for use on a floppy (its a compressed cpio archive). Try using it like this (from the boot command line): linux updates=http://people.atrpms.net/~hdegoede/updates-486244.img This requires the system has internet access ofcourse. If it doesn't you can put it on a local http server, if this is a problem let me know and I'll try the floppy method locally and see if I can get that to work. Once you've started anaconda this way you can check if you actually have the updates.img by switching to tty2 (ctrl+alt+f2) and then doing: ls /tmp/updates If that dir is empty somehow you did not get the updates.img Thanks, Hans. I put the updates.img on a USB drive and this time it read the image, but the results are the same as what I found yesterday. I looked into /tmp/updates and the image was there. All the messages I posted in comment#5 (including the FATAL ones) are exactly the same. My test machine is not connected to the Internet to try your other suggestion, but at least we know for sure that the image is being read from the USB. I also verified that the image is not corrupted by uncompressing all the files within to a separate directory and cpio had no issues in uncompresing them. 
Thanks, Supreeth Hi, are there any updates on this? Thanks, Supreeth (In reply to comment #7) > Thanks, Hans. I put the updates.img on a USB drive and this time it read the > image, but the results are the same as what I found yesterday. I looked into > /tmp/updates and the image was there. All the messages I posted in comment#5 > (including the FATAL ones) are exactly the same. > > My test machine is not connected to the Internet to try your other suggestion, > but at least we know for sure that the image is being read from the USB. I also > verified that the image is not corrupted by uncompressing all the files within > to a separate directory and cpio had no issues in uncompresing them. > > Thanks, > Supreeth Hmm, So you saw the same files you get if you un cpio the updates.img under /mnt/updates, right ? Strange I just double checked and that updates.img + F-11 gold does give me the FCoE option. This might be arch specific somehow, are you using an i386 (well i586 now a days actually) install CD ? If not could you try that? Also could you (in the non working case) try: modprobe fcoe ls -l /sys/module/fcoe On tty 2 and paste the output here ? Thanks Hans. Today I was able to get it working by copying the update image on a local http server I set up and giving the URL of the updates file to "linux updates." Once I did this the /tmp/updates directory had the correct glade/python files. So I got to the "Advanced storage" screen and it gave me the "Add FCoE SAN" option. I chose that and here's what happened. 1. Discovery is successful and I can see all the LUNs I expect to see. 2. The "What Drive would you like to boot this installation from?" box is inactive. When I click "next" it gives me the "Must select a drive to uses the bootable device" message and does not let me proceed any further. The only way I can proceed any further is if I choose the "create custom layout" option. 3. 
If I go to the "create custom layout" option and manually configure a partition with ext3 filesystem on the remote LUN I choose (say /dev/sdb), it asks me if I want to write the changes to disk. When I say yes, it tries to write the config to the remote disk but fails with a box that says "Storage Activation Failed". If I click on "Details" it says "Error opening /dev/sdb: No such device or address". If I go to tty2 and look at the partitions /dev/sdb is present and is accessible. At this point I only have the option to "File Bug" or "Exit installer." Upon quick investigation it looks like some I/O error happened and we'll be investigating the network traces to see what's going on. I'll update the BZ when we have some new info. Could you please look into #2 above? About #2, can you please attach /tmp/anaconda.log and /tmp/storage.log from after adding the drive (you can get to them from tty2) ? Created attachment 348122 [details]
/tmp/anaconda.log
Attaching /tmp/anaconda.log
Created attachment 348123 [details]
/tmp/storage.log
Attaching /tmp/storage.log
Hi,

I've found the issue with the grayed out boot device selection and updated the updates.img to include a fix.

Thanks, Hans. The new image you sent looks good. I was able to successfully install to a remote LUN using FCoE, which is really cool! Your fix to the grayed out boot device selection also seems to have resolved the I/O error I had reported earlier. Do you have any ideas on how the boot device box being grayed out could have triggered the I/O issue?

Thanks!
Supreeth

Hi Supreeth,

Any news / progress on this?

Hi Hans,

I think there was a miscomm where both of us were waiting for the other to reply. The installation is successful as I'd mentioned earlier. The next step would be for you guys to do the mkinitrd changes to facilitate auto boot. The changes should be very similar to your iSCSI solution.

The difference between the iSCSI and FCoE solutions would be that the iSCSI solution reads parameters from the iBFT, whereas the FCoE solution will be reading them from the EDD structure, as I described in a mail sent to you and Mike Christie on 5/28/2009. Mike also suggested how the info from the EDD can be used. Since we did not exchange any mails after that, I assumed everyone was OK with the solution.

The EDD structure can be read by the kernel code in /drivers/firmware/edd.c. Once read, interface information from the EDD will be copied to sysfs (in the file /sys/firmware/edd/int13_dev80/interface). There will also be a link to the associated PCI device under pci_dev in this folder. Basically what should happen is:

1. Some userspace app reads the interface file and retrieves the target WWPN and LUN #.
2. Follow the symlink pci_dev and determine which of /sys/class/net/ethX contains the pci_dev info (a simple regex script should take care of this). When a match is made, the corresponding ethX becomes the boot interface. Let's say this info is stored in some variable called $ifname.
3. Bring up the interface - ifconfig $ifname up
4. Create an FCoE interface - fcoeadm --create $ifname
5. Determine the remote port with the correct WWPN - a simple script will scan through the newly discovered remote port directories and check which port_name matches the WWPN from the interface file in step 1. This is the target we want to connect to using the LUN # from the interface file.

Eventually the solution will also need to support DCB at both install and boot time. We're currently working on a prototype solution here, which we will share with you as soon as it works. The DCB stuff, however, should not hinder implementation of the steps described in comment #2 or reading the EDD structure above in any way.

Which version of the Option ROM do you have? I want to make sure that you have the latest version, which includes support for writing firmware parameters to EDD.

Thanks,
Supreeth

(In reply to comment #17)
> Hi Hans,
>

Hi,

> I think there was a miscomm where both of us were waiting for the other to
> reply. The installation is successful as I'd mentioned earlier.

Ah, I had read over the "Your fix to the grayed out boot device selection also seems to have resolved the I/O error I had reported earlier" part of your previous comment, so I was waiting on further feedback wrt that, my bad.

> The next step
> would be for you guys to do the mkinitrd changes to facilitate auto boot. The
> changes should be very similar to your iSCSI solution.
> <snip>

This is not what this bug is about. This bug is about adding basic FCoE support as you outlined in comment #2; it does not include reading firmware tables through sysfs as you are now asking for in comment #17.

The plan for the basic FCoE support in the case where / lives on an FCoE disk is to pass an ethernet device to use for FCoE to dracut on the kernel cmdline, and dracut will then write the device name to /sys/module/fcoe/parameters/create. After this, dracut will use mount by LABEL or mount by UUID to find the root filesystem.
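
As an aside, step 2 of the EDD-based proposal in comment #17 (matching the pci_dev link to a network interface) could be sketched roughly as below. This is only an illustration, not code from anaconda, dracut, or fcoe-utils: the function name `find_boot_interface` and its parameters are hypothetical, and it assumes that each physical NIC under /sys/class/net exposes a `device` symlink pointing at its PCI device.

```python
import os


def find_boot_interface(edd_dev_dir, net_class_dir="/sys/class/net"):
    """Return the NIC whose PCI device matches the EDD pci_dev link, or None.

    edd_dev_dir is a directory such as /sys/firmware/edd/int13_dev80.
    Both arguments are parameters so the function can be tried on a fake tree.
    """
    boot_pci = os.path.realpath(os.path.join(edd_dev_dir, "pci_dev"))
    for ifname in sorted(os.listdir(net_class_dir)):
        dev_link = os.path.join(net_class_dir, ifname, "device")
        if not os.path.exists(dev_link):
            # Assumption: interfaces without a device link (e.g. lo) are virtual.
            continue
        if os.path.realpath(dev_link) == boot_pci:
            return ifname
    return None
```

Comparing resolved symlink targets avoids parsing PCI addresses with a regex; whether that is sufficient on a given system is untested here.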
Please file a new feature request for the firmware table support.

> Which version of the Option ROM do you have? I want to make sure that you have
> the latest version which includes support for writing firmware parameters to
> EDD.
>

IIRC, the FCoE capable firmware only works on 10 gigabit NICs. I only have a 1 gigabit NIC, and having a 10 gigabit NIC would be of little use as I have nothing to connect it to.

Regards,
Hans

Hi,

I've a couple of questions for you:

I'm currently working on writing out the information about FCoE SANs configured during boot to the installed system. This comes down to writing an /etc/fcoe/cfg-eth# file containing:

###
FCOE_ENABLE="yes"
DCB_REQUIRED="no"
###

For each interface used for FCoE during the install. I wonder though, when / is on FCoE, if this should still be done (as dracut will already bring up the FCoE). Writing this will cause "fcoeadm -c eth#" to be called for an eth# which has already been activated as FCoE interface by dracut. I assume / hope that this is a no-op and thus not a problem, because if it is a problem we need to find a way to make this not happen.

For iscsi we currently recognize if a scsi disk (/dev/sd?) is an iscsi disk or a regular scsi disk. It would be good to do the same for FCoE, as it might come in handy to tell the difference later on. For iscsi we use the /dev/disk/by-path name to see the disk is iscsi. Does udev currently create sensible /dev/disk/by-path names for FCoE disks like it does for iscsi? If it doesn't, could you provide a patch to /lib/udev/path_id for this?

Thanks,
Hans

Hi,

I've just put an updated updates.img here:
http://people.atrpms.net/~hdegoede/updates-486244.img
which adds writing out of /etc/fcoe/cfg-eth# files for all NICs used to connect to an FCoE SAN during the installation. Could you give this a try, and see if the /etc/fcoe/cfg-eth# file(s) get written out correctly?

Thanks!

(In reply to comment #19)
Hi Hans,

> For each interface used for FCoE during the install.
> I wonder though, when
> / is on FCoE, if this should still be done (as dracut will already bring up the
> FCoE). Writing this will cause "fcoeadm -c eth#" to be called for an eth# which
> has already been activated as FCoE interface by dracut. I assume / hope that
> this is a no-op and thus not a problem, because if it is a problem we need to
> find a way to make this not happen.

I see this as a no-op because fcoeadm -c "ethX" essentially is "echo ethX > /sys/module/fcoe/parameters/create". If the interface already exists, echo will flag a write error but the interface itself will still be available for use. So I tried the following on eth1 after the interface was created and discovery done.

sh-4.0# echo "eth1" > /sys/module/fcoe/parameters/create
sh: echo: write error: File exists

The interface is still usable and I verified it as well. If the "write error" message is a concern then we should take steps to make sure this does not happen.

> For iscsi we currently recognize if a scsi disk (/dev/sd?) is an iscsi disk
> or a regular scsi disk. It would be good to do the same for FCoE, as it
> might come in handy to tell the difference later on. For iscsi we use the
> /dev/disk/by-path name to see the disk is iscsi. Does udev currently create
> sensible /dev/disk/by-path names for FCoE disks like it does for iscsi, if it
> doesn't could you provide a patch to /lib/udev/path_id for this ?

Yes, udev does indeed create proper /dev/disk/by-path names for fc disks after discovery. Here's a run of ls on /dev/disk/by-path. I added newlines to facilitate easy reading.

sh-4.0# cd /mnt/sysimage
sh-4.0# cd dev/disk/by-path
sh-4.0# ls
pci-0000:00:1f.1-scsi-0:0:0:0
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2
pci-eth1-fc-0x201600a0b842138c:0x0001000000000000
pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3
pci-eth1-fc-0x5006016141e03375:0x0000000000000000
sh-4.0#

I think this should be sufficient to identify fc disks at boot time.

cheers,
Supreeth

> Thanks,
> Hans

Created attachment 350196 [details]
Anaconda dump after unhandled exception.
Attaching the anaconda dump file.
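
For reference, the "File exists" behaviour reported in comment #21 suggests that a caller writing to the create parameter could treat that one error as "already set up". The following is a minimal sketch under that assumption; `create_fcoe_interface` is a hypothetical helper, not part of fcoe-utils, anaconda, or dracut, and treating EEXIST as success is a design choice made here rather than something the thread settles.

```python
import errno


def create_fcoe_interface(ifname,
                          create_path="/sys/module/fcoe/parameters/create"):
    """Write ifname to the fcoe module's create parameter.

    Returns True if the write succeeded and False if the kernel reported
    EEXIST ("File exists"), i.e. an FCoE instance was already present.
    Any other error is re-raised to the caller.
    """
    try:
        # The write error, if any, surfaces when the file is flushed on close,
        # which happens inside this try block.
        with open(create_path, "w") as f:
            f.write(ifname)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise
    return True
```

A caller that only cares whether the interface ends up available can ignore the return value; one that wants to log the "already created" case can check for False.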
(In reply to comment #20) > Hi, > I've just put an updated updates.img here: > http://people.atrpms.net/~hdegoede/updates-486244.img > Which adds writing out of /etc/fcoe/cfg-eth# files for all NIC's used to > connect to an FCoE SAN during the installation, could you give this a try, > and see of the /etc/fcoe/cfg-eth# file(s) gets written out correctly? Thanks! Hans, I tried the new image and got an unhandled exception after the filesystem was written to the remote LUN. The exception happened after the installer asked me to choose additional repositories to install and was checking dependencies of the packages chosen for install. I have attached the anacdump.txt for your investigation. I will also do some investigation here to see if a possible I/O error caused the issue. Thanks, Supreeth (In reply to comment #21) Hi Supreeth, Thanks for the info and the testing! > I see this as a no-op because fcoeadm -c "ethX" essentially is > "echo ethX > /sys/module/fcoe/parameters/create". If the interface already > exists echo will flag a write error but the interface itself will still be > available for use. So I tried the following on eth1 after the interface was > created and discovery done. > > sh-4.0# echo "eth1" > /sys/module/fcoe/parameters/create > sh: echo: write error: File exists > > The interface is still usable and I verified it as well. If the "write error" > message is a concern then we should take steps to make sure this does not > happen. > Ok, could you try doing "fcoeadm -c eth#" on an already up interface, maybe that is smart enough to just return success, if it isn't we indeed need to get rid of the error somehow (errors like this tends to scare users). But this is a minor issue. > > For iscsi we currently recognize if a scsis disk (/dev/sd?) is an iscsi disk > > or a regular scsi disk. It would be good to do the same for FCoE, as it > > might come in handy to tell the difference later on. 
For iscsi we use the > > /dev/disk/by-path name to see the disk is iscsi. Does udev currently create > > sensible /dev/disk/by-path names for FCoE disks like it does for iscsi, if it > > doesn't could you provide a patch to /lib/udev/path_id for this ? > > Yes, udev does indeed create proper /dev/disk/by-path names for fc disks after > discovery. Here's a run of ls on /dev/disk/by-path. I added newlines to > facilitate easy reading. > > sh-4.0# cd /mnt/sysimage > sh-4.0# cd dev/disk/by-path > sh-4.0# ls > pci-0000:00:1f.1-scsi-0:0:0:0 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2 > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3 > pci-eth1-fc-0x5006016141e03375:0x0000000000000000 > sh-4.0# > > I think this should be sufficient to identify fc disks at boot time. It is, cool! Thanks, Hans (In reply to comment #23) Hi, > I tried the new image and got an unhandled exception after the filesystem was > written to the remote LUN. The exception happened after the installer asked me > to choose additional repositories to install and was checking dependencies of > the packages chosen for install. I have attached the anacdump.txt for your > investigation. I will also do some investigation here to see if a possible I/O > error caused the issue. 
> Thanks for the anacdump.txt this was a bug in my write out /etc/fcoe/cfg-eth# code, should be fixed now: http://people.atrpms.net/~hdegoede/updates-486244.img (In reply to comment #24) > sh-4.0# ls > pci-0000:00:1f.1-scsi-0:0:0:0 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2 > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2 > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3 > pci-eth1-fc-0x5006016141e03375:0x0000000000000000 Question what does this look like for "real" fibrechannel ? I'm wondering if I need to check for "eth#" there (which could be tricky as nics can be renamed), or if just checking for pci-*-fc-* is enough. (In reply to comment #25) Hi Hans, > (In reply to comment #23) > Hi, > > I tried the new image and got an unhandled exception after the filesystem was > > written to the remote LUN. The exception happened after the installer asked me > > to choose additional repositories to install and was checking dependencies of > > the packages chosen for install. I have attached the anacdump.txt for your > > investigation. I will also do some investigation here to see if a possible I/O > > error caused the issue. > > > Thanks for the anacdump.txt this was a bug in my write out > /etc/fcoe/cfg-eth# code, should be fixed now: > http://people.atrpms.net/~hdegoede/updates-486244.img Thank you for the new image. The install is successful and the file is written correctly to /mnt/sysimage/etc/fcoe/cfg-eth1. Thanks, Supreeth (In reply to comment #24) Hi Hans, > (In reply to comment #21) > Hi Supreeth, > Thanks for the info and the testing! Not a problem at all! 
<snip> > Ok, could you try doing "fcoeadm -c eth#" on an already up interface, maybe > that is smart enough to just return success, if it isn't we indeed need to > get rid of the error somehow (errors like this tends to scare users). But > this is a minor issue. I haven't had a chance to check but I did inspect the source code. As long as the write is sucessful to /sys/module/fcoe/parameters/create (using fopen and fputs) fcoeadm -c is designed to return success. I think we should be fine but we will keep our eyes peeled out for this. I will check for any issues and update you. <snip> Thanks, Supreeth (In reply to comment #26) Hi Hans, > (In reply to comment #24) > > sh-4.0# ls > > pci-0000:00:1f.1-scsi-0:0:0:0 > > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000 > > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1 > > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2 > > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000 > > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1 > > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000 > > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1 > > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2 > > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3 > > pci-eth1-fc-0x5006016141e03375:0x0000000000000000 > Question what does this look like for "real" fibrechannel ? I'm wondering > if I need to check for "eth#" there (which could be tricky as nics can be > renamed), or if just checking for pci-*-fc-* is enough. For a real FC HBA, this looks similar in a traditional sense but the eth1 is replaced by PCI device information. For example it looks something like, pci-0000:00:1c.0-fc-0x201600a0b842138c:0x0001000000000000 Checking for pci-*-fc-* would retrieve all available FC disks, but since we'll always be connecting to an FCoE switch to do the discovery and not a traditional FC switch I think the search should be sufficient IMHO. 
I will however run some experiments here and try some permutations and combinations. Thanks, Supreeth Hi Supreeth, Once more thanks for all the input, I've prepared a new updates.img for you: http://people.atrpms.net/~hdegoede/updates-486244.img New this time around is that the code now recognizes FCoE disks as a separate type of disk from normal SCSI disk and tracks through which NIC it is connected, this allows for the following 2 things: 1) Make sure fcoe-utils gets installed 2) Write out an /etc/sysconfig/network-scripts/ifcfg-eth# file with "NM_CONTROLLED=no" in there, so that NetworkManager won't touch the interface Can you do another test install with this updates.img please and check that: 1) fcoe-utils gets installed 2) /etc/sysconfig/network-scripts/ifcfg-eth# with "NM_CONTROLLED=no" in there gets written ? Thanks! One more question, from a reviewer of this new batch of changes, is it possible to use one interface for both IP traffic and FCoE at the same time ? and if this is possible, is this a realistic scenario ? (In reply to comment #30) Hi Hans, > Once more thanks for all the input, I've prepared a new updates.img for you: > http://people.atrpms.net/~hdegoede/updates-486244.img > New this time around is that the code now recognizes FCoE disks as a separate > type of disk from normal SCSI disk and tracks through which NIC it is > connected, this allows for the following 2 things: > 1) Make sure fcoe-utils gets installed > 2) Write out an /etc/sysconfig/network-scripts/ifcfg-eth# file > with "NM_CONTROLLED=no" in there, so that NetworkManager won't > touch the interface > Can you do another test install with this updates.img please and check that: > 1) fcoe-utils gets installed > 2) /etc/sysconfig/network-scripts/ifcfg-eth# with > "NM_CONTROLLED=no" in there gets written ? > Thanks! Not a problem. 
I will work on this ASAP and update you (mostly the ETA is sometime tomorrow) > One more question, from a reviewer of this new batch of changes, is it > possible to use one interface for both IP traffic and FCoE at the same time ? > and if this is possible, is this a realistic scenario ? The short answer is yes, we can do this using DCB. DCB allows us to tag storage packets and IP packets with separate priorities. This in turn is used for Priority Flow Control using Pause frames. For example some priority grouping might say that assign 40% of bandwidth to LAN, 40% to SAN etc. When a class of traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause frame to manage this situation. Thanks, Supreeth (In reply to comment #31) > > One more question, from a reviewer of this new batch of changes, is it > > possible to use one interface for both IP traffic and FCoE at the same time ? > > and if this is possible, is this a realistic scenario ? > > The short answer is yes, we can do this using DCB. DCB allows us to tag storage > packets and IP packets with separate priorities. This in turn is used for > Priority Flow Control using Pause frames. For example some priority grouping > might say that assign 40% of bandwidth to LAN, 40% to SAN etc. When a class of > traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause > frame to manage this situation. > Ok, so this can be done, but it sounds like a really weird setup to me, surely one wants separate networks for TCP/IP and for storage ? So do you think this is something which we ought to support in the installer (my own vote goes to declaring this an unsupported setup) ? (In reply to comment #32) Hi Hans, > (In reply to comment #31) > > > One more question, from a reviewer of this new batch of changes, is it > > > possible to use one interface for both IP traffic and FCoE at the same time ? > > > and if this is possible, is this a realistic scenario ? 
> > > > The short answer is yes, we can do this using DCB. DCB allows us to tag storage > > packets and IP packets with separate priorities. This in turn is used for > > Priority Flow Control using Pause frames. For example some priority grouping > > might say that assign 40% of bandwidth to LAN, 40% to SAN etc. When a class of > > traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause > > frame to manage this situation. > > > Ok, so this can be done, but it sounds like a really weird setup to me, > surely one wants separate networks for TCP/IP and for storage ? > So do you think this is something which we ought to support in the installer > (my own vote goes to declaring this an unsupported setup) ? We probably do not need to support both TCP and FCoE traffic (converged traffic) on the same port at install or boot time for now. However, since FCoE runs on DCB enabled networks, DCB needs to be supported at all times FCoE is running (install time and boot time) even if the install/boot interface is only being used for FCoE traffic and not converged traffic. (In reply to comment #33) > > We probably do not need to support both TCP and FCoE traffic (converged > traffic) on the same port at install or boot time for now. However, since FCoE > runs on DCB enabled networks, DCB needs to be supported at all times FCoE is > running (install time and boot time) even if the install/boot interface is only > being used for FCoE traffic and not converged traffic. Thanks for the input, as for DCB support, I understand from previous comments that that is still being worked on at the tools / kernel level, and that was not part of the original feature request. So please file a new feature request for adding DCB support, and lets focus first on getting the basic FCoE minimal support up and running. 
WRT your mail that fcoe-utils did not get installed and the ifcfg-eth# did not get written, can you please attach /tmp/storage.log and /tmp/anaconda.log from an install with the latest updates.img ? Thanks! Created attachment 351290 [details]
Anaconda.log file July 10, 2009
Created attachment 351291 [details]
Storage.log file July 10, 2009
(In reply to comment #30)
Hi Hans,
<snip>
> Can you do another test install with this updates.img please and check that:
> 1) fcoe-utils gets installed
> 2) /etc/sysconfig/network-scripts/ifcfg-eth# with
> "NM_CONTROLLED=no" in there gets written ?
I did a full reinstall with the latest update image and here are some observations.
1. NM_CONTROLLED=no is written to /etc/sysconfig/network-scripts/ifcfg-eth1 but is not written to the corresponding spot in /mnt/sysimage, which is where I was looking for it. Is this the expected behavior, or does the file need to be written to /mnt/sysimage as well?
2. After install I manually looked for fcoeadm, fcoemon, etc. in /usr/sbin, /mnt/sysimage/usr/sbin and /mnt/runtime/usr/sbin and did not find them. I also did a system wide "find . | grep fcoe" from / and could not find any of the userspace apps. I think fcoe-utils is not being installed.
I have attached the anaconda.log and storage.log files for your investigation.
Thanks,
Supreeth
Hi,
As always, thanks for testing. Judging from the logs, your disks are being seen as regular disks, whereas anaconda should recognize them as being on FCoE in order to do things like install fcoe-utils. Are you sure you were using the latest version of the updates.img? Looking at the logs and the code for the latest updates.img, I can find no reason for it not to recognize the disks as FCoE disks.
Here is a new updates.img with some debugging:
http://people.atrpms.net/~hdegoede/updates-486244.img
Can you please attach logs from an install done with this img? There is no need to do a full install: after adding the FCoE SAN and the disks showing up in the partitioning UI, save the logs and I have what I need.
I don't know how good your python is, maybe you can help debug this issue? The problem is the udev_device_is_fcoe() function from /tmp/updates/storage/udev.py. The info["ID_PATH"] and info["ID_BUS"] values used in there can also be found in the storage.log file by searching for ID_PATH and ID_BUS respectively. I've done some local tests using the values for these from the latest storage.log you attached, and udev_device_is_fcoe() should recognize the disks from that install as FCoE, but somehow does not (the log does not contain "sda is an fcoe disk" but instead contains "sda is a disk").
Regards,
Hans
Created attachment 351504 [details]
anaconda.log July 13, 2009
Created attachment 351505 [details]
storage.log July 13, 2009
(In reply to comment #38) Hi Hans, I tried the new image you sent and the disks are now being recognized as fcoe disks. In storage.log I can see the "sdX is an fcoe disk" log messages. I looked into the older updates image and found that it was the exact same size as an older one. So I'm thinking that I might have been using an older image. It looks like there might have been a mixup either when I saved the updates file or when you updated it. In any case the disks are being recognized as fcoe disks now :-) Now that the disks are being recognized as fcoe disks, I let the installer run fully but could not find any trace of fcoe-utils again. What do I need to do in order to install fcoe-utils? I looked at customizing the packages to install from the UI but could not find fcoe-utils as an option there. Thanks, Supreeth (In reply to comment #41) > I tried the new image you sent and the disks are now being recognized as fcoe > disks. In storage.log I can see the "sdX is an fcoe disk" log messages. I > looked into the older updates image and found that it was the exact same size > as an older one. So I'm thinking that I might have been using an older image. > It looks like there might have been a mixup either when I saved the updates > file or when you updated it. In any case the disks are being recognized as fcoe > disks now :-) > Great, good to hear it is recognizing the disks now! > Now that the disks are being recognized as fcoe disks, I let the installer run > fully but could not find any trace of fcoe-utils again. What do I need to do in > order to install fcoe-utils? I looked at customizing the packages to install > from the UI but could not find fcoe-utils as an option there. > Hmm, are you doing the install from a F-11 dvd by chance ? That probably does not have fcoe-utils on there, you probably need to do a net install from a mirror of the "Everything" directory. 
Anyways, the part of anaconda which is responsible for getting fcoe-utils installed has not been changed by the FCoE patches. Now that the disk is recognized as an FCoE disk, fcoe-utils should get installed if available in the repository.
More interesting is the question: did NM_CONTROLLED=NO get written to the ifcfg-eth# file on the installed system?
(In reply to comment #42)
Hi Hans,
> Hmm, are you doing the install from a F-11 dvd by chance ? That probably does
> not have fcoe-utils on there, you probably need to do a net install from
> a mirror of the "Everything" directory.
> Anyways, the part of anaconda which is responsible for getting fcoe-utils
> installed has not been changed by the FCoE patches, now that the disk is
> recognized as an FCoE disk, fcoe-utils should get installed if available in the
> repository.
I can manually install fcoe-utils now. Yes, I was indeed using the F-11 DVD. I tried doing a net install but ran into some connection/proxy issues. So I went to the internal mirrors and got the RPMs for fcoe-utils and all dependencies (dcbd, libhbaapi, libhbalinux). Once the install completed I did a chroot to /mnt/sysimage and was able to install fcoe-utils without any issue. I have no doubt that the install from the repository will succeed as well, and I will give it a try once I resolve some proxy issues.
> More interesting is the question did NM_CONTROLLED=NO get written to the
> ifcfg-eth# file on the installed system ?
Yes, NM_CONTROLLED=no is now written to the ifcfg-eth# file on the installed system too!
Thanks!
Supreeth
Created attachment 358632 [details] PATCH: add FCoE boot support to dracut
Hi Supreeth,
Attached you find a patch for dracut to add support for booting from FCoE. Can you please test this.
To test this:
1) Install F-11 using the latest updates.img I provided in this bug
2) When the install is completed switch to tty2 and chroot to /mnt/sysimage
3) Install dracut:
http://koji.fedoraproject.org/koji/buildinfo?buildID=127300
4) Apply the attached patch under /usr/share/dracut
5) Regenerate the initrd using:
dracut -f /boot/initrd-<kver>.img <kver>
Where kver is the version string of the existing initrd
6) Leave the chroot
7) Reboot
8) In grub add: "fcoe=<mac-address-lowercase>:nodcb" to the kernel cmdline
You can leave the : in the mac address as is, i.e.:
fcoe=aa:bb:cc:dd:ee:ff:nodcb
Regards,
Hans
Hi Hans,
> Attached you find a patch for dracut to add support for booting from FCoE. Can
> you please test this.
Thank you for the patch and the instructions! I spent a few hours on this today, and in summary boot did not work. I will comment below after each instruction.
> To test this:
> 1) Install F-11 using the latest updates.img I provided in this bug
Install completed successfully.
> 2) When the install is completed switch to tty2 and chroot to /mnt/sysimage
> 3) Install dracut:
> http://koji.fedoraproject.org/koji/buildinfo?buildID=127300
I installed fcoe-utils and all dependencies, and after that I installed dracut. I had to install bridge-utils and dash first before dracut would install.
> 4) Apply the attached patch under /usr/share/dracut
Patch applied successfully.
> 5) Regenerate the initrd using:
> dracut -f /boot/initrd-<kver>.img <kver>
> Where kver is the version string of the existing initrd
I did this and I know a new initrd was written.
> 6) Leave the chroot
> 7) Reboot
> 8) In grub add: "fcoe=<mac-address-lowercase>:nodcb" to the kernel cmdline
> You can leave the : in the mac address as is so ie:
> fcoe=aa:bb:cc:dd:ee:ff:nodcb
The Option ROM managed to pull the bootloader, and on the kernel cmdline (the 'c' option) the "fcoe=" command triggered an "Error 27: Command not found". The only way I was able to add this was by appending (the 'a' option) to the kernel commands. After this everything died with the message "Boot has failed. Sleeping forever."
I ran the installer again so I could access the LUN. I uncompressed the initrd image and took a look inside. The fcoe modules are present in /lib/modules/... but I could not find fcoeadm or anything related. I also could not find any place where there was code to insmod the modules, but then the init script has changed a bit from what it was in RHEL5, so I might not have looked in the right places.
I would like to attach the initrd image for you but it is 12 MB in size. So please let me know if there are any specific files that you would like to see and I will send them to you.
Thanks!
Supreeth
Created attachment 358783 [details]
Initrd image for failed FCoE boot
I hope this attaches. If not please let me know what files you'd like to go over.
Hi Supreeth,
Using the a option to append the fcoe command is fine, can you please try again also adding "rdinitdebug rdbreak" to the commandline. This will drop you to a shell when dracut is done probing.
From this shell can you please do:
ip link show eth#
Changing # until it shows the card you want to use for fcoe, and "paste" the output here?
And also:
cat /etc/udev/rules.d/60-fcoe.rules
And again paste the output here ?
Thanks,
Hans
Hi Hans,
> Using the a option to append the fcoe command is fine, can you please try again
> also adding "rdinitdebug rdbreak" to the commandline. This will drop you to a
> shell when dracut is done probing.
Cool! I did not know about this feature. This will be helpful for so many things :-)
> From this shell can you please do:
> ip link show eth#
> Changing # until it shows the card you want to use for fcoe, and "paste" the
> output here?
I did this, and found that the link was still down. The output is:
ip link show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:1b:21:01:e5:2b brd ff:ff:ff:ff:ff:ff
> And also:
> cat /etc/udev/rules.d/60-fcoe.rules
> And again paste the output here ?
This file does not exist under /etc/udev/rules.d. So it looks like we're definitely missing some information here.
Thanks,
Supreeth
> Thanks,
> Hans
Hi,
Hmm, I think I know what is going on: patch does not preserve file permissions. Could you please do:
chmod +x /usr/share/dracut/modules.d/95fcoe/*
And then re-generate the initrd and try again ?
Thanks,
Hans
p.s.
For a faster turn around time, I'm hansg on irc, Freenode server, channel #anaconda or #dracut
(In reply to comment #49)
> Hi,
> Hmm, I think I know what is going on patch does not preserve rights, could you
> please do:
> chmod +x /usr/share/dracut/modules.d/95fcoe/*
> And then re-generate the initrd and try again ?
I did this, and sure enough it worked. All FCoE LUNs are being discovered now, but the boot still fails saying "Root device not found".
The contents of /etc/udev/rules.d/60-fcoe.rules are as follows:
ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="00:1b:21:01:e5:2b", RUN+="/sbin/fcoe-up $env{INTERFACE} nodcb
In the grub menu, I changed the "root=" to root=/dev/sdc2. This time it gave me an error
"mount: unknown filesystem type 'lvm2pv' "
and then the usual "Boot has failed, sleeping forever" message popped up. I found this same behavior (unknown fs error) when I tried to mount /dev/sdc2 to simply access a couple of files. If the filesystem is forced to ext3 at install time maybe this will not happen, but I am speculating :-)
I will be out on vacation Friday but back Monday.
cheers,
Supreeth
> Thanks,
> Hans
> p.s.
> For a faster turn around time, I'm hansg on irc, Freenode server, channel
> #anaconda or #dracut
(In reply to comment #50)
> (In reply to comment #49)
> > please do:
> > chmod +x /usr/share/dracut/modules.d/95fcoe/*
> > And then re-generate the initrd and try again ?
>
> I did this, and sure enough it worked. All FCoE LUNs are being discovered now,
Great, that means it's working!
> but the boot still fails saying "Root device not found".
> In the grub menu, I changed the "root=" to root=/dev/sdc2. This time it gave me
> an error
>
> "mount: unknown filesystem type 'lvm2pv' "
>
> and then the usual "Boot has failed, sleeping forever" message popped up. I
> found this same behavior (unknown fs error)when I tried to mount /dev/sdc2 to
> simply access a couple of files. If the filesystem is forced to ext3 at install
> time maybe this will not happen but I am speculating :-)
>
You probably used default partitioning, which uses lvm. Add rdinitdebug and rdbreak again, and when dropped in the shell do "ls /dev/mapper"
you should have a file named something like this there:
VolGroup-lv_root
Now try again with:
root=/dev/mapper/VolGroup-lv_root
That should fix things, note that this should be present in the default grub.conf written by anaconda though.
Hi Supreeth,
dracut also has a mode called host only mode, where it generates an initrd tailored to the system it is running on. I need to add some code to dracut for this to check if a disk is an fcoe disk. For iscsi for example we have (bash code):
is_iscsi() (
    [[ -L /sys/dev/block/$1 ]] || return
    cd "$(readlink -f /sys/dev/block/$1)"
    until [[ -d sys || -d iscsi_session ]]; do
        cd ..
    done
    [[ -d iscsi_session ]]
)
Which then can be called as "is_iscsi 8:0" (where 8 is the scsi disk major number; use 8:16 for sdb, 8:32 for sdc, etc).
Could you please write and test a similar piece of bash code to detect FCoE disks?
Hi Hans,
> You probably used default partitioning, which uses lvm. Add rdinitdebug and
> rdbreak again, and when dropped in the shell do "ls /dev/mapper"
> you should have a file named something like this there:
> VolGroup-lv_root
> Now try again with:
> root=/dev/mapper/VolGroup-lv_root
> That should fix things, note that this should be present in the default
> grub.conf written by anaconda though.
I was on vacation for a couple of days last week, hence the late reply. Yes, I used default partitioning. After discovery I did an "ls /dev/mapper", and all it displayed was "control". I could not see a VolGroup-lv_root or anything similar; control was the only file in the directory. I can see all the LUNs under /dev though.
Thanks,
Supreeth
Hi Supreeth,
Hmm, so the FCoE part is working, but if the disks are FCoE somehow the lvm scanning is not working. Can you try again please with the latest dracut (which now includes the FCoE support):
http://koji.fedoraproject.org/koji/buildinfo?buildID=131035
Note: remove "rhgb quiet" from the kernel cmdline and add "nomodeset 1 rdbreak rdshell". Then, when at the shell, do "ls /dev/mapper" again. If the nodes are there, try "ls /sysroot". If "ls /sysroot" shows a / filesystem, then exit the shell, and everything works. If things don't work, can you do "ls -a /tmp" and write down what is shown there please ?
All bits for basic FCoE boot support in both dracut and anaconda are in place now in rawhide, setting this to modified.
What is bug 519933, and why does this bug block it? I'm unable to access #519933.
Thanks,
Supreeth
Hi Supreeth,
It's just a tracker bugzilla to link together requests for FCoE support from partners, as well as problem reports or anything that might potentially affect FCoE for RHEL6.
Thank you for the clarification, Denise!
-Supreeth
Hi Supreeth, have you had a chance to test the latest rawhide trees? FCoE support has been in rawhide for 2 months already and should be working. Thanks.
The following is a summary of FCoE boot related features in RHEL 6.0. It looks like we need both the installer and dracut to be updated with many FCoE related binaries. Also see BZ 563790 and BZ 563794.
Anaconda Support:
Anaconda UI support is present, EDD script is correctly used, and LLDP is successfully configured on the interface found in the EDD. I really like the “Use DCB” checkbox in Anaconda!
However discovery fails due to lack of VLAN support. RHEL 6.0 is missing fipvlan and vconfig from the installer initrd. As a result it still calls fcoemon -c ethX, and discovery fails because the VLAN is not configured.
I was able to get the install to succeed manually after adding the 8021q kernel module (for vconfig) and manually calling fipvlan (fipvlan -s ethX). We need to replace the call to fcoemon with calls to fipvlan in anaconda or else install will not be possible without manual intervention.
Dracut Support:
After install I am not able to see any required binaries in the initramfs image. We’re missing lldpad, dcbtool, fipvlan, fcoe_edd.sh, and vconfig. Maybe Dracut needs a specific command to build an fcoe specific initrd, but it might not be getting triggered as I bypassed the normal process with my manual steps. *What are the arguments given to Dracut to build an initramfs with fcoe support?*
The Option ROM finds the remote LUN with no problems at boot time and the initrd loads. After that everything stalls as the fcoe initiator is not brought up from within initrd.
Thanks,
Supreeth
Hi Supreeth,
Thanks for testing!
(In reply to comment #61)
> The following is a summary of FCoE boot related features in RHEL 6.0. It looks
> like we need both the installer and dracut to be updated with many FCoE related
> binaries. Also see BZ 563790 and BZ 563794.
>
> Anaconda Support:
>
> Anaconda UI support is present, EDD script is correctly used, and LLDP is
> successfully configured on the interface found in the EDD. I really like the
> “Use DCB” checkbox in Anaconda!
>
> However discovery fails due to lack of VLAN support. RHEL 6.0 is missing
> fipvlan and vconfig from the installer initrd. As a result it still calls
> fcoemon -c ethX which results in discovery failing as a result of VLAN not
> being configured. I was able to manually get install to be successful after
> adding the 8021q kernel module (for vconfig) and manually calling fipvlan
> (fipvlan -s ethX). We need to replace the call to fcoemon with calls to fipvlan
> in anaconda or else install will not be possible without manual intervention.
>
Oh, this is caused by a bit of misunderstanding between me and Eric Multanen. Eric mentioned in bug 563794 that we should switch from fcoemon to fipvlan for dracut (the normal boot initrd), but did not say the same should be done for anaconda. But doing the same thing in anaconda and the boot initrd makes sense, I should have asked.
So assuming anaconda should do the same as dracut, I'll write a patch to bring up an interface for fcoe+dcb like this:
start lldpad (if not done already)
dcbtool sc ethX dcb on
dcbtool sc ethX app:fcoe e:1 a:1 w:1
fipvlan ethX -c -s
You're also talking about vconfig. I assume that this is needed / called by fipvlan and that vconfig (and the 8021q kernel module) thus should be part of the installer environment too? This is new information for me, and AFAIK dracut gets this wrong too. I'll include adding these to the installer environment in the patch for switching from fcoemon to fipvlan.
> Dracut Support:
>
> After install I am not able to see any required binaries in the initramfs
> image. We’re missing lldpad, dcbtool, fipvlan, fcoe_edd.sh, and vconfig. Maybe
> Dracut needs a specific command to include build an fcoe specific initrd but it
> might not be getting triggered as I bypassed the normal process with my manual
> steps. *What are the arguments given to Dracut to build an initrams with fcoe
> support?*
>
fcoe support is part of the dracut-network package, which does not get installed by default. When the code for bringing up the anaconda disks in FCoE is fixed to work (rather than you doing it manually), anaconda will automatically add dracut-network and fcoe-utils to the list of packages to install. Currently the installed system is likely missing not only dracut-network but also fcoe-utils.
You can check the presence of the necessary tools in an initrd by generating an initrd on a normally installed system on which dracut-network and fcoe-utils have been installed afterwards.
Note that the initrd will be missing vconfig and the 8021q kernel module as I was not aware that those were needed by fipvlan. Also the initrd will not include fcoe_edd.sh, as that will only be used during install time, after which the info about which nic to use will be stored in grub.conf and passed to the initrd through the kernel cmdline. The how and why of this has been discussed in detail in bug 513018 comment 24 . > The Option ROM finds the remote LUN with no problems at boot time and the > initrd loads. After that everything stalls as the fcoe initiator is not brought > up from within initrd. Yes, that is to be expected with fcoe support missing from it (and it probably also is not being passed the right kernel cmdline parameters). So moving forward with this, I'll do a patch for the anaconda bits. Since various files are missing from the installer environment I cannot easily fix this with an updates.img, so my plan is to get the fix in place asap, and then you'll have to wait till there is a new snapshot. For the dracut problem, I'll ask Harald Hoyer to do a new dracut fixing the missing of vconfig thing asap. Then you should be able to test this bit, by building a new initrd on a system with the exact same kernel as the FCoE test install (and dracut-network and fcoe-utils installed), and using that. When testing with a new initrd make sure you have the following on the kernel cmdline: ifname=eth#:AA:BB:CC:DD:EE:FF netroot=fcoe:eth#:dcb Thanks & Regards, Hans p.s. Note that due to how stage1 and stage2 get "glued" together that /sbin/vconfig will be /usr/sbin/fipvlan under the installer environment I hope that fipvlan can handle this. Hi Hans, Thank you very much for the detailed note. I think the changes you're making should take care of all the details for FCoE boot. Looking forward to seeing these changes in the next snapshot. Thanks! Supreeth *** Bug 588598 has been marked as a duplicate of this bug. 
*** Hi Hans, I tried out the Anaconda changes and they work really well. I was able to get a full install to succeed without any manual intervention which is awesome! I did however run into issues after install because the initramfs does not have any FCoE related binaries (lldpad, fcoe-utils, vconfig, and 8021q module). It looks like the dracut changes did not make it into the new snapshot yet(or dracut created initramfs without the command option for including fcoe). Could you please let us know when these binaries will be included in dracut? Without these we will not be able to test any of the initramfs portions of remote booting via FCoE. I will also reference this in the dracut BZ 563794. Thanks, Supreeth (In reply to comment #67) > Hi Hans, > > I tried out the Anaconda changes and they work really well. I was able to get a > full install to succeed without any manual intervention which is awesome! > Good to hear. > I did however run into issues after install because the initramfs does not have > any FCoE related binaries (lldpad, fcoe-utils, vconfig, and 8021q module). It > looks like the dracut changes did not make it into the new snapshot yet. Then only vconfig and the 8021q module would be missing, so this seems to be another problem. I hope you have access to the filesystem of the installed system in some other way? Note that if this is a lot of trouble, don't bother I need you to start a fresh install anyway, and I can also get all needed info from the logs from that install. So if you do have access to the filesystem of the installed system in some other way, can you please check if dracut-network and fcoe-utils were installed, I have the feeling they were not (likely caused by the FCoE disk not being recognized as such). 
To check if they were installed, check for the presence of the following files:
/usr/share/dracut/modules.d/95fcoe/install
/usr/sbin/fipvlan
Also, could you please collect the following log files from the filesystem and attach them here?
/root/install.log
/var/log/anaconda*
Then start a new install again using FCoE for / and /boot, and when it is installing packages, switch to tty2 (ctrl + alt + f2) and do:
tar cvfz udevdb.tgz /dev/.udev/db
tar cvfz logs.tgz /tmp/*log
and attach the resulting tgz files here (you can use for example scp to get them out of the installer environment).
Note: with some luck this can be fixed using an updates.img
Created attachment 414606 [details]
install logs from the old install process as requested
Created attachment 414607 [details]
Install logs from the new install process
Created attachment 414608 [details]
udevb.tgz from time of install
Created attachment 414609 [details]
logs.tgz from installing system
(In reply to comment #68) Hi Hans, > I hope you have access to the filesystem of the installed system in some other > way? I do. I can connect to the remote LUN by using open-fcoe initiator in the installer and can examine the contents. > /usr/share/dracut/modules.d/95fcoe/install > /usr/sbin/fipvlan These files are not present in the installed system as you suspected. So it indeed looks as though fcoe-utils and dracut's FCoE module were not installed. However I also realized that on my first attempt I had not explicitly chosen to install the "FCoE Client" package and so am not surprised fipvlan was not present. On the second attempt I chose the FCoE client package and fipvlan is now found in the installed filesystem and so are lldpad and vconfig. However I cannot find the 95fcoe directory yet again. I also extracted the initramfs img file and could not find any fcoe related binaries in there. > Also, could you please collect the following log files from the filesystem and > attach them here? > /root/install.log > /var/log/anaconda* Please find these attached. I have attached the logs from both the old install and the new install with fcoe packages installed. > Then start a new install again using FCoE for / and /boot, and when it is > installing packages, switch to tty2 (ctrl + alt + f2) and > do: > tar cvfz udevdb.tgz /dev/.udev/db > tar cvfz logs.tgz /tmp/*log > and attach the resulting tgz files here (you can use for example scp to > get them out of the installer environment). Please find these attached as requested. These are from the new install taken around the time install was midway through. > Note: with some luck this can be fixed using an updates.img I hope so too. I am really pleased with the way anaconda is handling install via FCoE and we are all eagerly awaiting the changes to overcome what looks to be the final hurdle in the process. Many thanks for your quick investigation! 
Please let me know if you need me to send you any other logs or conduct further experiments for your investigation. cheers, Supreeth Supreeth, Many thanks for all the logs. As I expected / feared already anaconda is not recognizing the disk as an FCoE disk, but instead it sees it as a regular scsi disk, and thus it never adds fcoe-utils and dracut-network to the package set automatically. anaconda determines if a disk is an FCoE disk or not based on the ID_PATH udev database property. In the logs you attached that is: 'ID_PATH': 'pci-0000:00:1f.1-scsi-0:0:0:0' Which is, well, wrong, or at least different from earlier testing. I think this has to do with recent udev changes. PATH_ID used to be determined by a shell script, but now it is determined by an executable written in C. I'll prep an updates.img with a workaround as a solution for this for now. Regards, Hans Ok here is an updates.img working around the udev id_path issue: http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img As usual, to use this pass updates=http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img On the bootloader cmdline. Created attachment 414660 [details]
path_id C-code
Hi,
Ok, so here is the culprit code (it needs libudev-devel to compile and to be linked against libudev). I hope you can take a look at this when you have some
time as debugging this without hardware access is very hard.
The problem function is handle_scsi_fibre_channel(), which does not
seem to recognize the fcoe disk as being fibre channel.
Once compiled to test it run it like this:
id_path /devices/virtual/net/eth2.168-fcoe/host4/rport-4:0-4/target4:0:0/4:0:0:0/block/sda
Note you can get the path to pass in by doing:
readlink -f /sys/block/sda
And then stripping off the /sys at the front.
The output should be something in the following form:
pci-eth#-fc-${id}
At least that is what the old shell code spewed out and thus what anaconda expects.
Thanks,
Hans
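The path derivation described in the comment above can be sketched in shell. This is only an illustration under stated assumptions: the helper names strip_sys_prefix and disk_devpath are made up here and are not part of the attached path_id code, and the id_path invocation is simply the one quoted in the comment.

```shell
#!/bin/sh
# Hypothetical helper (not from the attachment): drop a leading /sys from a
# resolved sysfs path, giving the devpath form shown in the comment above.
strip_sys_prefix() {
    printf '%s\n' "${1#/sys}"
}

# Hypothetical helper: resolve /sys/block/<disk> and print its devpath.
# The disk name defaults to sda, as in the example above.
disk_devpath() {
    strip_sys_prefix "$(readlink -f "/sys/block/${1:-sda}")"
}
```

With these defined, the test run from the comment would look like: id_path "$(disk_devpath sda)"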
Hi Hans, Thanks for the quick updates! I am going to try the new updates image today and I am going to try find some time to tackle the C code debugging as well (most likely tomorrow). I will post updates as soon as possible. Thanks! Supreeth Hi Hans, I ran into a couple of issues when trying out the fcoe-updates.img. I cannot access the people.fedoraprojects.org site from within Intel so I copied the img file onto a local webserver (I renamed it as updates.img). First, the updates.img did not seem to get copied even though /tmp/anaconda.log has the message "Transferring /xx/updates.img" I have verified that the updates.img is accessible by transferring it via sftp. After this I also ran into an issue in the UI after "Add FCoE SAN". The remote disk(s) is not listed even though discovery is successful. /dev has the correct device info after discovery but the UI is not displaying the device. This issue does not happen if I do not attempt to add the "linux updates=" command line option. I am attaching the /tmp/anaconda.log and /tmp/storage.log for your scrutiny. Thanks, Supreeth Created attachment 414674 [details]
Anaconda.log for failing updates.img copy.
Created attachment 414675 [details]
Storage.log for case where discovery succeeds but LUN not displayed in UI
Hi Supreeth,
> Hi Hans,
>
> After this I also ran into an issue in the UI after "Add FCoE SAN". The remote
> disk(s) is not listed even though discovery is successful. /dev has the correct
> device info after discovery but the UI is not displaying the device. This issue
> does not happen if I do not attempt to add the "linux updates=" command line
> option.
This actually is good news, this means anaconda now recognizes the FCoE drive as an FCoE drive and thus lists it in the "Other SAN Devices" tab where it belongs.
So the updates.img is doing what it should, try switching to the "Other SAN Devices" tab and selecting the drive there.
Regards,
Hans
Created attachment 414910 [details]
init.log for failed boot.
Hi Hans,
> This actually is good news, this means anaconda now recognizes the FCoE drive
> as an FCoE drive and thus lists it in the "Other SAN Devices" tab where it
> belongs.
> So the updates.img is doing what it should, try switching to the "Other SAN
> Devices" tab and selecting the drive there.
> Regards,
> Hans
You're right! I was able to find the drive under "Other SAN devices" tab. The install completed and all the necessary files for FCoE boot (including 95fcoe dracut module) are present in the root filesystem and initramfs img.
On to the actual boot itself. Once the Option ROM fetched the initramfs, dracut was stalling. So I enabled debug and used the rdbreak/rdinitdebug options to get into a shell. Upon scrutinizing init.log I found the following
+ handler=fcoe
+ handler=fcoe
+ handler=/sbin/fcoeroot
+ [ -z fcoe:eth2:dcb ]
+ [ ! -e /sbin/fcoeroot ]
+ die No handler for netroot type 'fcoe:eth2:dcb'
+ echo <1>dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
+ echo <1>dracut: Refusing to continue
+ echo dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
+ echo dracut: Refusing to continue
dracut: Refusing to continue
+ exit 1
The /sbin/fcoeroot script is not present. I do however find the regular netroot and iscsiroot scripts. I am attaching the init.log for you. Please let me know if you need any other files or need me to run further experiments. I am going to try my best to get some debugging done on the handle_scsi_fibre_channel() function today as well.
Thanks,
Supreeth
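The trace above suggests how the handler is chosen: the text before the first colon of the netroot value is taken as the type, "root" is appended to form a script name, and the boot aborts if that script does not exist. Below is a minimal Python sketch of that lookup, assuming it mirrors the traced shell steps (the second, unchanged handler= assignment in the trace is not modelled). The function name resolve_netroot_handler and the sbin_dir parameter are hypothetical and are not part of dracut.

```python
import os


def resolve_netroot_handler(netroot, sbin_dir="/sbin"):
    """Return the handler script path for a netroot value, or raise.

    Hypothetical helper that mirrors the traced steps; not dracut code.
    """
    # "fcoe:eth2:dcb" -> "fcoe" (text before the first colon)
    handler_type = netroot.split(":", 1)[0]
    # "fcoe" -> "<sbin_dir>/fcoeroot"
    handler = os.path.join(sbin_dir, handler_type + "root")
    # Same two checks as in the trace: empty netroot, or missing script
    if not netroot or not os.path.exists(handler):
        raise RuntimeError("No handler for netroot type '%s'" % netroot)
    return handler
```

With no fcoeroot script in the directory, the call for "fcoe:eth2:dcb" raises, which corresponds to the "No handler for netroot type" failure in the log.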
Upon further scrutiny: It looks like dracut should call /sbin/fcoe-up instead of /sbin/fcoeroot. The fcoe-up script has all the required components to bring up the fcoe interface.
Thanks,
Supreeth
Hi,
(In reply to comment #84)
> Upon further scrutiny: It looks like dracut should call /sbin/fcoe-up instead
> of /sbin/fcoeroot. The fcoe-up script has all the required components to bring
> up the fcoe interface.
>
Yes, but it looks like the generic netroot= parsing code wants to see a /sbin/fcoeroot. Can you try creating an empty sh script named /sbin/fcoeroot (and make it executable) which just contains:
exit 0
And add that to the initrd. To unpack the initrd do:
mkdir t
cd t
zcat .../initrd-....img | cpio -i
Then add t/sbin/fcoeroot
And do:
find . | cpio --quiet -c -o | gzip -9 > .../initrd-....img
Thanks & Regards,
Hans
p.s.
You can discuss things more interactively with me on irc. I'm on the freenode network and you can find me in #anaconda, #dracut and this week also in #fcoe; I'm hansg there.
p.s.
The fcoe-up script will be called from a udev rule, whereas /sbin/fcoeroot gets called when there is an ip configuration complete event (which likely never happens in the fcoe case, as no ip config is done from the initrd for fcoe). Or so I think. I've asked Harald Hoyer to look further into this, but for now creating an empty /sbin/fcoeroot as described is a good starting point for debugging this further.
Created attachment 414932 [details]
init.log for failed boot
Thanks, Hans. I added the fcoeroot script and the behavior changed. It now seems to be bringing up the eth0 interface instead of eth2. I have attached the init.log file. From there I find this:
+ handler=fcoe
+ handler=fcoe
+ handler=/sbin/fcoeroot
+ [ -z fcoe:eth2:dcb ]
+ [ ! -e /sbin/fcoeroot ]
+ [ -z ]
+ IFACES=eth0
+ . /tmp/net.eth0.up
+ ip addr add 10.23.21.47/255.255.255.0 broadcast 10.23.21.255 dev eth0
+ [ -e /tmp/net.eth0.gw ]
+ . /tmp/net.eth0.gw
+ ip route add default via 10.23.21.1 dev eth0
+ [ -e /tmp/net.eth0.hostname ]
+ [ -e /tmp/net.eth0.resolv.conf ]
+ cp -f /tmp/net.eth0.resolv.conf /etc/resolv.conf
+ [ -e /tmp/net.eth0.override ]
+ [ -e /tmp/dhclient.eth0.dhcpopts ]
+ . /tmp/dhclient.eth0.dhcpopts
+ new_broadcast_address=10.23.21.255
+ new_dhcp_lease_time=43200
+ new_dhcp_message_type=5
+ new_dhcp_server_identifier=10.22.226.254
+ new_domain_name=jf.intel.com
+ new_domain_name_servers=10.22.227.254 10.22.226.254 143.181.9.1
+ new_expiry=1274220932
+ new_filename=BStrap/X86pc/BStrap.0
+ new_ip_address=10.23.21.47
+ new_network_number=10.23.21.0
+ new_routers=10.23.21.1
+ new_subnet_mask=255.255.255.0
+ [ -n 10.23.21.1 ]
+ dest=10.23.21.1
+ [ -n ]
+ [ -z 10.23.21.1 ]
+ [ -n 10.23.21.1 ]
+ arping -q -f -w 60 -I eth0 10.23.21.1
+ source_all netroot
+ local f
+ [ netroot ]
+ [ -d /netroot ]
+ return
+ /sbin/fcoeroot eth0 fcoe:eth2:dcb /sysroot
+ [ -f /tmp/dhclient.eth0.lease ]
+ cp /tmp/dhclient.eth0.lease /tmp/net.eth0.lease
+ [ -f /tmp/dhclient.eth0.dhcpopts ]
+ cp /tmp/dhclient.eth0.dhcpopts /tmp/net.eth0.dhcpopts
+ [ ! -f /tmp/net.ifaces ]
+ echo eth0
+ exit 0
Thanks,
Supreeth
PS: I will try to find you on IRC. I don't have an IRC client installed but will work on it!
Hi,
Thanks for the log. I see now that there is a bug in the anaconda dracut cmdline generating code.
The current dracut does not expect a netroot=fcoe:eth2:dcb argument, but rather an fcoe=eth2:dcb argument. Using fcoe= will also get rid of dracut bringing eth0 online (which it does because it sees a netroot= argument) and of the need for the empty /sbin/fcoeroot binary. I'll do a new updates.img including a fix for this tomorrow. For now, if you edit the cmdline in grub.conf and change the
netroot=fcoe:eth2:dcb
to
fcoe=eth2:dcb
things should work better.
Thanks,
Hans
Hi Hans,
This worked like a charm :-) I was able to successfully boot from the remote LUN with the stub fcoeroot script and after changing netroot= to fcoe=eth2:dcb. I am not seeing any other issues as of now. Looking forward to your new updates.img. I will try to spend some time debugging the C code you attached. Hopefully I'll have an update on that for you tomorrow!
So to summarize, the changes we need to work on in the ISOs for a fully automated fcoe boot solution are:
1. Include either the fix or workaround to anaconda viewing the remote disk as an fcoe disk and not basic disk.
2. Ensure kernel cmd line has fcoe=ethX:dcb instead of netroot=fcoe:ethX:dcb
3. Have the fcoeroot script in /sbin of initrd.
Thanks again for all your help and assistance today!
cheers,
Supreeth
(In reply to comment #90)
> Hi Hans,
>
> This worked like a charm :-) I was able to successfully boot from the remote
> LUN with the stub fcoeroot script and after changing netroot= to fcoe=eth2:dcb.
> I am not seeing any other issues as of now.
That is very good news!
> So to summarize, the changes we need to work on in the ISOs for fully automated
> fcoe boot solution are
>
> 1. Include either the fix or workaround to anaconda viewing the remote disk as
> an fcoe disk and not basic disk.
Correct. I would like to note that it is greatly preferred to fix id_path; the current workaround is a bit of a hack which depends on the output of "readlink -f /sys/block/sdx" following a certain pattern for fcoe disks.
To be precise the python code of the workaround looks like this:
def udev_device_is_fcoe(info):
    return "fcoe" in info["sysfs_path"]
def udev_device_get_fcoe_nic(info):
    return info["sysfs_path"].split("/")[4].split(".")[0]
Where info["sysfs_path"] is the output of "readlink -f /sys/block/sdx" with the /sys prefix removed. I'm afraid esp the second function will break in some cases, so I would much rather see id_path fixed.
> 2. Ensure kernel cmd line has fcoe=ethX:dcb instead of netroot=fcoe:ethX:dcb
Right, the new updates.img (see below) should fix this, as will the next anaconda build for rhel-6.
> 3. Have the fcoeroot script in /sbin of initrd.
No, I believe the need for that was caused by the mistaken use of netroot=. Please test a fresh install with the new updates.img; I believe this will boot up without issues, iow no dracut changes are necessary.
Here is an updated updates.img which should result in anaconda writing the proper kernel cmdline dracut option for fcoe, and still including the path_id workaround:
http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img
Hi Hans,
> Correct, I would like to note that it is greatly prefered to fix id_path,
> the current workaround is a bit of a hack which depends on the output
> of "readlink -f /sys/block/sdx" following a certain pattern for fcoe disks.
> To be precise the python code of the workaround looks like this:
> def udev_device_is_fcoe(info):
>     return "fcoe" in info["sysfs_path"]
> def udev_device_get_fcoe_nic(info):
>     return info["sysfs_path"].split("/")[4].split(".")[0]
> Where info["sysfs_path"] is the output of "readlink -f /sys/block/sdx"
> with the /sys prefix removed. I'm afraid esp the second function
> will break in some cases, so I would much rather see id_path fixed.
I agree. I would like to see this fixed too rather than use a workaround unless there is no other option. I will work on debugging this and hopefully get a fix soon.
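The concern about the second workaround function is that it relies on the NIC name always being the fifth path component. As an illustration only, here is a sketch of a slightly more defensive variant. It assumes the path has the shape reported elsewhere in this thread (/devices/virtual/net/eth4.802-fcoe/host3/...), that is, a "net" component followed by a name ending in "-fcoe". The function name fcoe_nic_from_sysfs_path is hypothetical; this is not anaconda code, and other FCoE setups may not follow this pattern.

```python
def fcoe_nic_from_sysfs_path(sysfs_path):
    """Return the NIC name (e.g. "eth4") from an FCoE sysfs path, or None.

    Assumption: the path contains ".../net/<nic>.<vlan>-fcoe/...", as in the
    example from this thread. Hypothetical helper, not anaconda code.
    """
    parts = sysfs_path.split("/")
    # Walk adjacent component pairs and look for "net" followed by "*-fcoe"
    for prev, part in zip(parts, parts[1:]):
        if prev == "net" and part.endswith("-fcoe"):
            # "eth4.802-fcoe" -> "eth4"
            return part.split(".")[0]
    # Path does not look like the FCoE pattern assumed above
    return None
```

Unlike a fixed index, this returns None for paths that do not match, so the caller can fall back instead of getting an unrelated path component.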
> Here is an updated updates.img which should result in anaconda writing
> the proper kernel cmdline dracut option for fcoe, and still including
> the path_id workaround:
> http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img
Awesome. Let me give this a whirl after our Red Hat FCoE call in a few minutes and post an update!
Thanks,
Supreeth
Hi Supreeth,
Have you had a chance to look into the id_path issue yet? We really need some help in getting this fixed.
Thanks & Regards,
Hans
Hi Hans,
I should have updated this BZ earlier. I took a look at the id_path and did not find any issues. We think the problem is being caused by an issue which is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=595522
Thanks,
Supreeth
Hi Supreeth,
QA gave me access to a FCoE equipped machine, and id_path returns the following there:
[root@storageqe-03 ~]# /lib/udev/path_id /devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-4/target3:0:1/3:0:1:0/block/sde
ID_PATH=fc-0x500a0982871965c8:0x0000000000000000
However anaconda expects (and that is what the older id_path gave us AFAIK):
pci-eth#-fc-${id}
Whereas this looks like it is just
fc-${id}
Which means:
1) We cannot distinguish regular fc and fcoe this way
2) We cannot determine which nic the disk is attached to this way
Regards,
Hans
Hi Hans,
I see what you're saying. So I guess we want to change id_path so handle_scsi_fibre_channel prepends a pci-eth#- to the current string or write a new handle_scsi_fcoe function? I am not a udev expert by any means, however. Is there a udev function we can use to get the eth name associated with the fcoe disk so we can include the correct number in the eth# part?
Thanks,
Supreeth
Hi Supreeth,
I've investigated this further and the problem is that with virtual devices like a vlan device, there is no link in sysfs between the vlan device and the nic over which it is running.
So the id-path output is correctly only:
fc-0x500a0982871965c8:0x0000000000000000
Thus I've written an anaconda patch combining this id path output with some direct sysfs path parsing to get the necessary information about the disk. Note that this patch also requires bug 595522 to be resolved.
Regards,
Hans
p.s.
I've noticed that the wwpn (and the scsi id) for all disks attached to our FCoE test machine is the same, this does not seem right:
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-3/target3:0:0/fc_transport/target3:0:0/port_name
0x500a0983971965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-4/target3:0:1/fc_transport/target3:0:1/port_name
0x500a0982871965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth6.802-fcoe/host4/rport-4:0-4/target4:0:0/fc_transport/target4:0:0/port_name
0x500a0982871965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth6.802-fcoe/host4/rport-4:0-5/target4:0:1/fc_transport/target4:0:1/port_name
0x500a0983971965c8
This may be a different bug, (I did search) but I just got bit when installing Mint9-x64 on /dev/sdb. I have 4 drives in this box. It did a system scan at the end of the install and found the other bootable mdv install on /dev/sdd, but while it got the (hd3,1) statement for the kernel line correct, it blew the line for the initrd, placing it on (hd3,0)
I was able to boot this mdv install by live editing that line to read (hd3,1) also, and then it booted. So it would appear the scan the system utility gets it wrong. I have now made /dev/sdb1/grub/grub.cfg editable long enough to fix it, then made it read-only and set the +i attribute too, so I'm fine till I need to do another kernel, but a new bee is gonna be flummoxed.
Hi Hans,
Here's an update after testing Beta 2 for full DVD install: Verified that the udev fix is in place and that the kernel cmd line is generated correctly in grub.conf.
I however ran into a failure when trying to find the root device as the FCoE interface went down for some reason. I think it was a link issue but will investigate further and report back. This looks good so far! We'll have an update on the converged traffic BZ tomorrow.
Thanks!
Supreeth
Supreeth, did this turn out to be a link issue, or is there a problem lurking here?
Thanks!
Denise
I think there is more than a link issue here. I filed BZ 611976 for this issue. I'm going to try out the patch harald proposed after our weekly call this morning.
Thanks!
Supreeth
Hi Supreeth,
comment #103 indicates a positive test result. Can we move this bug to VERIFIED state, or would you like to do more testing before declaring this feature fixed?
Hi Alex,
We definitely need to test this in Snapshot 7 before moving it to VERIFIED. There are a couple of fixes we are expecting in snapshot 7 that relate directly to boot (BZ 611796, BZ 602330).
Thanks,
Supreeth
Moving to VERIFIED based on comment 103; the bugs referenced in comment 107 have also been addressed.
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
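For reference, the kernel command line change discussed in this thread (replacing netroot=fcoe:ethX:dcb with fcoe=ethX:dcb in grub.conf) can be sketched as a small text rewrite. This is only an illustration of the manual edit described above, not the anaconda fix itself; the function name fix_fcoe_cmdline is hypothetical.

```python
import re

# Matches a whole "netroot=fcoe:<rest>" token on a kernel command line.
# The lookbehind keeps it from matching inside a longer word.
_NETROOT_FCOE = re.compile(r"(?<!\S)netroot=fcoe:(\S+)")


def fix_fcoe_cmdline(cmdline):
    """Rewrite netroot=fcoe:ethX:dcb to fcoe=ethX:dcb; leave the rest as-is.

    Hypothetical helper illustrating the grub.conf edit from this thread.
    """
    return _NETROOT_FCOE.sub(r"fcoe=\1", cmdline)
```

Other netroot= values, such as an iSCSI one, are left unchanged, since only the fcoe: form is affected by the issue described here.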