Bug 486244

Summary: [Intel 6.0 FEAT] Add FCoE boot capability
Product: Red Hat Enterprise Linux 6 Reporter: John Ronciak <john.ronciak>
Component: anaconda Assignee: Hans de Goede <hdegoede>
Status: CLOSED CURRENTRELEASE QA Contact: Release Test Team <release-test-team-automation>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: andriusb, atodorov, berthiaume_wayne, ddumas, ed.ciechanowski, eric.w.multanen, gene.heskett, hdegoede, jane.lv, jlv, john.ronciak, jvillalo, keve.a.gabbert, luyu, mchristi, minh.t.pham, robert.w.love, ross.b.brattain, rpacheco, rwilliam, snagar, supreeth.venkataraman, syeghiay, yi.zou
Target Milestone: beta Keywords: FutureFeature
Target Release: 6.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: anaconda-13.21.50-8 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-10 19:36:25 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 513011, 513018, 569766, 593744, 618875, 619604, 619605, 692939    
Bug Blocks: 510435, 519933, 554559    
Attachments:
Description Flags
/tmp/anaconda.log
none
/tmp/storage.log
none
Anaconda dump after unhandled exception.
none
Anaconda.log file July 10, 2009
none
Storage.log file July 10, 2009
none
anaconda.log July 13, 2009
none
storage.log July 13, 2009
none
PATCH: add FCoE boot support to dracut
none
Initrd image for failed FCoE boot
none
install logs from the old install process as requested
none
Install logs from the new install process
none
udevb.tgz from time of install
none
logs.tgz from installing system
none
path_id C-code
none
Anaconda.log for failing updates.img copy.
none
Storage.log for case where discovery succeeds but LUN not displayed in UI
none
init.log for failed boot.
none
init.log for failed boot
none

Description John Ronciak 2009-02-19 00:25:17 UTC
Description of problem: Add FCoE boot capability to the system.

This feature is not there today but Mike Christie believes that it should be and says that the installer is making changes now and can also be updated to support this.

Comment 1 Chris Lumens 2009-02-19 16:20:33 UTC
So, what exactly is required to support this feature?  Do we have hardware and documentation in the Westford office that we can use to develop and test support?

Comment 2 Supreeth 2009-03-25 15:42:34 UTC
There are two parts to this feature. Support during install time, and support during boot time.

Installation:

1. The default kernel needs to have the ixgbe, scsi_transport_fc, libfc, and the fcoe modules pre-loaded.
2. Anaconda must perform discovery of Fibre Channel targets/LUNs using the open-fcoe initiator and the user must be given an option to install to the remote disk (This could probably be similar to the iSCSI solution under the "advanced storage configuration" option).
3. To do the discovery, an fcoe interface needs to be created by writing the interface name that is connected to the fabric to /sys/module/fcoe/parameters/create.
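Step 3 above can be sketched in a few lines. This is only an illustration, not anaconda's actual code: the function name `create_fcoe_interface` and the overridable `create_path` argument are assumptions made here so the sketch can be tried against an ordinary file on a machine without the fcoe module loaded.

```python
# Path named in step 3 above; the default can be overridden for a dry run.
FCOE_CREATE = "/sys/module/fcoe/parameters/create"


def create_fcoe_interface(ifname, create_path=FCOE_CREATE):
    """Write the NIC name to the fcoe module's 'create' parameter.

    `ifname` is the ethernet interface connected to the fabric, e.g. "eth1".
    On a real system this needs root and the fcoe module loaded.
    """
    with open(create_path, "w") as f:
        f.write(ifname)
```

Once the write succeeds, discovery of targets and LUNs is left to the kernel; the installer would then rescan for new block devices.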

Post-Installation:

Changes need to be made to the initrd image to support fcoe boot. At boot time, the initrd image will be transferred to the initiator machine using an Int13h connection from the FCoE boot Option ROM (Similar to iSCSI boot). To complete the boot, the open-fcoe initiator should be started from within the initrd image so that the correct root partition is mounted. Once this is done, we simply chroot to the new root partition and run /sbin/init. For this to happen changes need to be made to mkinitrd so that

1. init should load the ixgbe, scsi_transport_fc, libfc, and the fcoe modules soon after the SCSI module is loaded.
2. Ensure that networking is started.
3. Ensure that the correct ethernet interface is brought up.
4. Initiate a fcoe connection through that interface and perform discovery.
5. Mount the correct root and run /sbin/init.

The above steps are very similar to RHEL's iSCSI boot solution which could serve as the reference for developing this feature.

Comment 3 Hans de Goede 2009-06-08 08:34:30 UTC
(In reply to comment #2)
> There are two parts to this feature. Support during install time, and support
> during boot time.
> 
> Installation:
> 

Hi,

As discussed by mail (some time ago), I'll be working on adding support for this to Fedora (F-12). As also discussed I do not have access to hardware to test this, so testing will need to be done by you.

I've written a first version of support of the "Installation" part of this feature. Here is an update.img which can be used together with F-11, which
adds the ability to configure FCoE under the "Advanced Storage" button in
the initial partitioning screen:
http://people.atrpms.net/~hdegoede/updates-486244.img

Please give this a try, note that you will still need to manually update
your initrd after the install before rebooting.

Comment 4 Supreeth 2009-06-08 14:11:10 UTC
Thanks, Hans. I will give it a try and report back.

cheers,
Supreeth

Comment 5 Supreeth 2009-06-09 16:42:18 UTC
Hans,

I tried the update image and could not get it to work. So there's either some problem with the image itself or I am doing something wrong. Here's a step by step account of what I'm doing

1. I'm using instructions from http://fedoraproject.org/wiki/Anaconda/Updates
2. I transferred the contents of the image into a floppy drive using dd as described in the link.
3. I ran the Fedora 11 installation CD, pressed TAB and included "linux updates" in the kernel command line.
4. I selected my floppy drive as the update device when asked (/dev/fd0)
5. The box on the screen said "Reading Anaconda Updates".
6. The usual install screen showed up, and there was no fcoe configuration in the "Advanced Storage" screen, only iSCSI.

I wrote down the kernel messages as anaconda started and found some FATAL: messages. I'm including the messages I saw below (Note: I just copied these from the screen and these are not from a serial console).

-----
INFO: kernel command line: initrd=initrd.img linux updates BOOT_IMAGE=vmlinuz

...

INFO:  UPDATES device is /dev/fd0
FATAL: Module dm_mod not found
FATAL: Module dm_zero not found
FATAL: Module dm_mirror not found
FATAL: Module dm_snapshot not found

INFO: Running anaconda script /usr/bin/anaconda

...

INFO: anaconda called with cmdline = ['/usr/bin/anaconda', '--stage2', 'cdrom:///dev/sr0:/mnt/stage2', '--graphical', '--selinux']

...

ERROR: Error running xrandr: None
ERROR: Exception when running xrandr: Error running xrandr: None
INFO: Starting graphical installation
...

WARNING: step installtype does not exist
WARNING: step confirminstall does not exist
WARNING: step complete does not exist
...

------- 

Please let me know if I need to do any extra steps in addition to what's on the link above. If you need any other info/traces please let me know.

Thanks,
Supreeth

Comment 6 Hans de Goede 2009-06-10 07:11:06 UTC
(In reply to comment #5)
> Hans,
> 
> I tried the update image and could not get it to work. 

<snip>

Hmm, it looks like it's not finding your updates.img. I'm not sure if the
updates.img I provided is suitable for use on a floppy (it's a compressed
cpio archive).

Try using it like this (from the boot command line):
linux updates=http://people.atrpms.net/~hdegoede/updates-486244.img

This requires that the system has internet access, of course. If it doesn't, you
can put it on a local http server. If this is a problem, let me know and I'll
try the floppy method locally and see if I can get that to work.

Once you've started anaconda this way you can check if you actually have
the updates.img by switching to tty2 (ctrl+alt+f2) and then doing:
ls /tmp/updates

If that dir is empty somehow you did not get the updates.img

Comment 7 Supreeth 2009-06-10 19:26:55 UTC
Thanks, Hans. I put the updates.img on a USB drive and this time it read the image, but the results are the same as what I found yesterday. I looked into /tmp/updates and the image was there. All the messages I posted in comment#5 (including the FATAL ones) are exactly the same.

My test machine is not connected to the Internet to try your other suggestion, but at least we know for sure that the image is being read from the USB. I also verified that the image is not corrupted by uncompressing all the files within to a separate directory, and cpio had no issues in uncompressing them.

Thanks,
Supreeth

Comment 8 Supreeth 2009-06-15 14:49:27 UTC
Hi, are there any updates on this?

Thanks,
Supreeth

Comment 9 Hans de Goede 2009-06-15 18:34:35 UTC
(In reply to comment #7)
> Thanks, Hans. I put the updates.img on a USB drive and this time it read the
> image, but the results are the same as what I found yesterday. I looked into
> /tmp/updates and the image was there. All the messages I posted in comment#5
> (including the FATAL ones) are exactly the same.
> 
> My test machine is not connected to the Internet to try your other suggestion,
> but at least we know for sure that the image is being read from the USB. I also
> verified that the image is not corrupted by uncompressing all the files within
> to a separate directory and cpio had no issues in uncompressing them. 
> 
> Thanks,
> Supreeth  

Hmm,

So you saw the same files you get if you un cpio the updates.img under /mnt/updates, right ?

Strange, I just double-checked and that updates.img + F-11 gold does give me the
FCoE option. This might be arch-specific somehow. Are you using an i386 (well, i586 nowadays) install CD? If not, could you try that?

Also could you (in the non working case) try:
modprobe fcoe
ls -l /sys/module/fcoe

On tty 2 and paste the output here ?

Comment 10 Supreeth 2009-06-16 00:15:43 UTC
Thanks Hans. Today I was able to get it working by copying the update image on a local http server I set up and  giving the URL of the updates file to "linux updates." Once I did this the /tmp/updates directory had the correct glade/python files. So I got to the "Advanced storage" screen and it gave me the "Add FCoE SAN" option. I chose that and here's what happened.

1. Discovery is successful and I can see all the LUNs I expect to see. 

2. The "What Drive would you like to boot this installation from?" box is inactive. When I click "next" it gives me the "Must select a drive to uses the bootable device" message and does not let me proceed any further. The only way I can proceed any further is  if I choose the "create custom layout" option. 

3. If I go to the "create custom layout" option and manually configure a partition with ext3 filesystem on the remote LUN I choose (say /dev/sdb), it asks me if I want to write the changes to disk. When I say yes, it tries to write the config to the remote disk but fails with a box that says "Storage Activation Failed". If I click on "Details" it says "Error opening /dev/sdb: No such device or address". If I go to tty2 and look at the partitions /dev/sdb is present and is accessible. At this point I only have the option to "File Bug" or "Exit installer."

Upon quick investigation it looks like some I/O error happened and we'll be investigating the network traces to see what's going on. I'll update the BZ when we have some new info. Could you please look into #2 above?

Comment 11 Hans de Goede 2009-06-16 08:08:59 UTC
About #2, can you please attach /tmp/anaconda.log and /tmp/storage.log from after adding the drive (you can get to them from tty2) ?

Comment 12 Supreeth 2009-06-16 14:44:29 UTC
Created attachment 348122 [details]
/tmp/anaconda.log

Attaching /tmp/anaconda.log

Comment 13 Supreeth 2009-06-16 14:45:12 UTC
Created attachment 348123 [details]
/tmp/storage.log

Attaching /tmp/storage.log

Comment 14 Hans de Goede 2009-06-17 08:52:23 UTC
Hi,

I've found the issue with the grayed out boot device selection and updated the 
updates.img to include a fix.

Comment 15 Supreeth 2009-06-18 17:48:00 UTC
Thanks, Hans. The new image you sent looks good. I was able to successfully install to a remote LUN using FCoE which is really cool! 

Your fix to the grayed out boot device selection also seems to have resolved the I/O error I had reported earlier. Do you have any idea how the grayed-out boot device box could have triggered the I/O issue?

Thanks!
Supreeth

Comment 16 Hans de Goede 2009-06-30 20:26:03 UTC
Hi Supreeth,

Any news / progress on this ?

Comment 17 Supreeth 2009-06-30 21:36:22 UTC
Hi Hans,

I think there was a miscommunication where both of us were waiting for the other to reply. The installation is successful, as I'd mentioned earlier. The next step would be for you guys to make the mkinitrd changes to facilitate auto boot. The changes should be very similar to your iSCSI solution.

The difference between the iSCSI and FCoE solution would be that iSCSI solution reads parameters from the iBFT whereas the FCoE solution will be reading it from the EDD structure as I described in a mail sent to you and Mike Christie on 5/28/2009. Mike also suggested how the info from the EDD can be used. Since we did not exchange any mails after that I assumed everyone was OK with the solution. 

The EDD structure can be read by the kernel code in /drivers/firmware/edd.c. Once read, interface information from the EDD will be copied to sysfs (in the file /sys/firmware/edd/int13_dev80/interface). There will also be a link to the associated PCI device under pci_dev in this folder. Basically what should happen is

1. Some userspace app reads the interface file and retrieves the target WWPN and LUN #.
2. Follow the symlink pci_dev and determine which of /sys/class/net/ethX contains the pci_dev info (a simple regex script should take care of this). When a match is made, the corresponding ethX becomes the boot interface. Let's say this info is stored in some variable called $ifname.
3. Bring up the interface 
    - ifconfig $ifname up
4. Create an FCoE interface
    - fcoeadm --create $ifname
5. Determine the remote port with the correct WWPN
    - A simple script will scan through the newly discovered remote port directories and check which port_name matches the WWPN from the interface file in step 1. This is the target we want to connect to using LUN# from the interface file.
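Steps 2 and 5 above could look roughly like the following. This is a hedged sketch, not the eventual dracut or anaconda implementation: the function names are made up for illustration, and the `/sys/class/fc_remote_ports` location for the remote port directories is an assumption (the comment above only says "remote port directories" with a `port_name` file). Both directories are parameters so the logic can be exercised against a fake tree.

```python
import os


def find_boot_interface(edd_dev_dir, net_class_dir="/sys/class/net"):
    """Return the NIC whose PCI device is the one the EDD entry points at.

    `edd_dev_dir` is e.g. /sys/firmware/edd/int13_dev80; its pci_dev
    symlink is compared with each interface's `device` symlink.
    """
    boot_pci = os.path.realpath(os.path.join(edd_dev_dir, "pci_dev"))
    for ifname in sorted(os.listdir(net_class_dir)):
        dev_link = os.path.join(net_class_dir, ifname, "device")
        if os.path.realpath(dev_link) == boot_pci:
            return ifname
    return None


def find_remote_port(wwpn, rports_dir="/sys/class/fc_remote_ports"):
    """Return the remote port directory name whose port_name equals `wwpn`."""
    wanted = int(wwpn, 16)
    for rport in sorted(os.listdir(rports_dir)):
        try:
            with open(os.path.join(rports_dir, rport, "port_name")) as f:
                if int(f.read().strip(), 16) == wanted:
                    return rport
        except (OSError, ValueError):
            # Unreadable or malformed entry: skip it and keep scanning.
            continue
    return None
```

Comparing resolved symlink targets avoids parsing PCI addresses out of path strings, and comparing WWPNs as integers sidesteps differences in letter case.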

Eventually the solution will also need to support dcb at both install and boot time and we're working on a prototype solution here currently which we will share with you as soon as it works. The dcb stuff however should not hinder implementation of the steps described in comment #2 or reading the EDD structure above in any way.

Which version of the Option ROM do you have? I want to make sure that you have the latest version which includes support for writing firmware parameters to EDD.

Thanks,
Supreeth

Comment 18 Hans de Goede 2009-06-30 21:52:58 UTC
(In reply to comment #17)
> Hi Hans,
> 

Hi,

> I think there was a miscomm where both of us was waiting for the other to
> reply. The installation is successful as I'd mentioned earlier.

Ah I had read over the "Your fix to the grayed out boot device selection also seems to have resolved the I/O error I had reported earlier" part of your previous comment, so I was waiting on further feedback wrt that, my bad.

> The next step
> would be for you guys to do the mkinitrd changes to facilitate auto boot. The
> changes should be very similar to your iSCSI solution.
> 

<snip>

This is not what this bug is about, this bug is about adding basic FCoE support
as you outlined in comment #2, this does not include reading firmware
tables through sysfs as you are now asking for in comment #17.

The plan for the basic FCoE support in the case where / lives on an FCoE disk
is to pass an ethernet device to use for FCoE to dracut on the kernel cmdline, and dracut will then write the device name to /sys/module/fcoe/parameters/create

After this dracut will use mount by LABEL or mount by UUID to find the root
filesystem.

Please file a new feature request for the firmware table support.

> Which version of the Option ROM do you have? I want to make sure that you have
> the latest version which includes support for writing firmware parameters to
> EDD.
> 

iirc, the FCoE capable firmware only works on 10 gigabit nics, I only
have a 1 gigabit nic, and having a 10 gigabit nic would be of little use as I
have nothing to connect it to.

Regards,

Hans

Comment 19 Hans de Goede 2009-07-01 08:59:51 UTC
Hi,

I've a couple of questions for you:

I'm currently working on writing out the information about FCoE SAN's configured
during boot to the installed system. This comes down to writing a
/etc/fcoe/cfg-eth#
File containing:
###
FCOE_ENABLE="yes"
DCB_REQUIRED="no"
###

For each interface used for FCoE during the install. I wonder though, when
/ is on FCoE, if this should still be done (as dracut will already bring up the
FCoE). Writing this will cause "fcoeadm -c eth#" to be called for an eth# which
has already been activated as FCoE interface by dracut. I assume / hope that this
is a no-op and thus not a problem, because if it is a problem we need to
find a way to make this not happen.
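For reference, writing out such a file could be as simple as the sketch below. The two keys and their values come from the comment above; the function name `write_fcoe_cfg`, the `root` argument (for example /mnt/sysimage during an install), and the `dcb_required` flag are illustrative assumptions, not the actual anaconda code.

```python
import os


def write_fcoe_cfg(root, ifname, dcb_required=False):
    """Write <root>/etc/fcoe/cfg-<ifname> and return its path."""
    cfg_dir = os.path.join(root, "etc", "fcoe")
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, "cfg-" + ifname)
    with open(path, "w") as f:
        f.write('FCOE_ENABLE="yes"\n')
        f.write('DCB_REQUIRED="%s"\n' % ("yes" if dcb_required else "no"))
    return path
```

One file is written per interface, so the installer would call this once for every NIC that was used to reach an FCoE SAN.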

For iscsi we currently recognize if a SCSI disk (/dev/sd?) is an iscsi disk
or a regular scsi disk. It would be good to do the same for FCoE, as it
might come in handy to tell the difference later on. For iscsi we use the 
/dev/disk/by-path name to see the disk is iscsi. Does udev currently create
sensible /dev/disk/by-path names for FCoE disks like it does for iscsi, if it
doesn't could you provide a patch to /lib/udev/path_id for this ?

Thanks,

Hans

Comment 20 Hans de Goede 2009-07-01 11:27:07 UTC
Hi,

I've just put an updated updates.img here:
http://people.atrpms.net/~hdegoede/updates-486244.img

Which adds writing out of /etc/fcoe/cfg-eth# files for all NICs used to
connect to an FCoE SAN during the installation. Could you give this a try
and see if the /etc/fcoe/cfg-eth# file(s) get written out correctly? Thanks!

Comment 21 Supreeth 2009-07-01 20:52:57 UTC
(In reply to comment #19)
Hi Hans,

> For each interface used for FCoE during the install. I wonder though, when
> / is on FCoE, if this should still be done (as dracut will already bring up the
> FCoE). Writing this will cause "fcoeadm -c eth#" to be called for an eth# which
> has already been activated as FCoE interface by dracut. I assume / hope that
> this
> is a no-op and thus not a problem, because if it is a problem we need to
> find a way to make this not happen.

I see this as a no-op because fcoeadm -c "ethX" essentially is 
"echo ethX > /sys/module/fcoe/parameters/create". If the interface already exists echo will flag a write error but the interface itself will still be available for use. So I tried the following on eth1 after the interface was created and discovery done.

sh-4.0# echo "eth1" > /sys/module/fcoe/parameters/create
sh: echo: write error: File exists

The interface is still usable and I verified it as well. If the "write error" message is a concern then we should take steps to make sure this does not happen. 
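If the "write error" message does turn out to be a concern, one option is for the caller to treat "File exists" as "already created" and stay quiet. The sketch below shows that idea only; `create_if_missing` is a hypothetical helper, and `write` stands for whatever function actually performs the write to /sys/module/fcoe/parameters/create.

```python
import errno


def create_if_missing(ifname, write):
    """Call write(ifname), treating an existing FCoE instance as success.

    Returns True if a new instance was created and False if the kernel
    reported EEXIST ("File exists"). Any other error is re-raised.
    """
    try:
        write(ifname)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise
    return True
```

Only EEXIST is swallowed, so a genuinely failing create (for example a nonexistent interface) still surfaces as an error.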


> For iscsi we currently recognize if a SCSI disk (/dev/sd?) is an iscsi disk
> or a regular scsi disk. It would be good to do the same for FCoE, as it
> might come in handy to tell the difference later on. For iscsi we use the 
> /dev/disk/by-path name to see the disk is iscsi. Does udev currently create
> sensible /dev/disk/by-path names for FCoE disks like it does for iscsi, if it
> doesn't could you provide a patch to /lib/udev/path_id for this ?

Yes, udev does indeed create proper /dev/disk/by-path names for fc disks after discovery. Here's a run of ls on /dev/disk/by-path. I added newlines to facilitate easy reading.
 
sh-4.0# cd /mnt/sysimage
sh-4.0# cd dev/disk/by-path
sh-4.0# ls
pci-0000:00:1f.1-scsi-0:0:0:0   
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2
pci-eth1-fc-0x201600a0b842138c:0x0001000000000000
pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2
pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3
pci-eth1-fc-0x5006016141e03375:0x0000000000000000                       
sh-4.0#

I think this should be sufficient to identify fc disks at boot time. 
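To illustrate how those names could be picked apart, here is a small parser for the listing above. The pattern is inferred purely from the names shown in this comment (pci-<port>-fc-<WWPN>:<LUN>, optionally followed by -partN), so treat it as an assumption about udev's naming rather than a specification; `parse_fc_by_path` is a made-up name.

```python
import re

# Pattern inferred from the listing above, not taken from udev's sources.
BY_PATH_FC = re.compile(
    r"^pci-(?P<port>.+)-fc-(?P<wwpn>0x[0-9a-f]+):(?P<lun>0x[0-9a-f]+)"
    r"(?:-part(?P<part>\d+))?$"
)


def parse_fc_by_path(name):
    """Split an fc by-path name into port, wwpn, lun and part.

    Returns a dict of strings (part is None for a whole disk), or None
    if the name is not an fc by-path name.
    """
    m = BY_PATH_FC.match(name)
    if m is None:
        return None
    return m.groupdict()
```

For the entries above, `parse_fc_by_path("pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3")` yields port "eth1" and part "3", while the local disk `pci-0000:00:1f.1-scsi-0:0:0:0` is rejected.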

cheers,
Supreeth

> Thanks,
> Hans

Comment 22 Supreeth 2009-07-01 20:55:33 UTC
Created attachment 350196 [details]
Anaconda dump after unhandled exception.

Attaching the anaconda dump file.

Comment 23 Supreeth 2009-07-01 20:57:45 UTC
(In reply to comment #20)
> Hi,
> I've just put an updated updates.img here:
> http://people.atrpms.net/~hdegoede/updates-486244.img
> Which adds writing out of /etc/fcoe/cfg-eth# files for all NIC's used to
> connect to an FCoE SAN during the installation, could you give this a try,
> and see if the /etc/fcoe/cfg-eth# file(s) gets written out correctly? Thanks!  

Hans,

I tried the new image and got an unhandled exception after the filesystem was written to the remote LUN. The exception happened after the installer asked me to choose additional repositories to install and was checking dependencies of the packages chosen for install. I have attached the anacdump.txt for your investigation. I will also do some investigation here to see if a possible I/O error caused the issue.

Thanks,
Supreeth

Comment 24 Hans de Goede 2009-07-02 09:35:13 UTC
(In reply to comment #21)

Hi Supreeth,

Thanks for the info and the testing!

> I see this as a no-op because fcoeadm -c "ethX" essentially is 
> "echo ethX > /sys/module/fcoe/parameters/create". If the interface already
> exists echo will flag a write error but the interface itself will still be
> available for use. So I tried the following on eth1 after the interface was
> created and discovery done.
> 
> sh-4.0# echo "eth1" > /sys/module/fcoe/parameters/create
> sh: echo: write error: File exists
> 
> The interface is still usable and I verified it as well. If the "write error"
> message is a concern then we should take steps to make sure this does not
> happen. 
> 

Ok, could you try doing "fcoeadm -c eth#" on an already up interface, maybe
that is smart enough to just return success, if it isn't we indeed need to
get rid of the error somehow (errors like this tend to scare users). But
this is a minor issue.

> > For iscsi we currently recognize if a SCSI disk (/dev/sd?) is an iscsi disk
> > or a regular scsi disk. It would be good to do the same for FCoE, as it
> > might come in handy to tell the difference later on. For iscsi we use the 
> > /dev/disk/by-path name to see the disk is iscsi. Does udev currently create
> > sensible /dev/disk/by-path names for FCoE disks like it does for iscsi, if it
> > doesn't could you provide a patch to /lib/udev/path_id for this ?
> 
> Yes, udev does indeed create proper /dev/disk/by-path names for fc disks after
> discovery. Here's a run of ls on /dev/disk/by-path. I added newlines to
> facilitate easy reading.
> 
> sh-4.0# cd /mnt/sysimage
> sh-4.0# cd dev/disk/by-path
> sh-4.0# ls
> pci-0000:00:1f.1-scsi-0:0:0:0   
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2
> pci-eth1-fc-0x201600a0b842138c:0x0001000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3
> pci-eth1-fc-0x5006016141e03375:0x0000000000000000                       
> sh-4.0#
> 
> I think this should be sufficient to identify fc disks at boot time. 

It is, cool!

Thanks,

Hans

Comment 25 Hans de Goede 2009-07-02 09:36:47 UTC
(In reply to comment #23)
Hi,

> I tried the new image and got an unhandled exception after the filesystem was
> written to the remote LUN. The exception happened after the installer asked me
> to choose additional repositories to install and was checking dependencies of
> the packages chosen for install. I have attached the anacdump.txt for your
> investigation. I will also do some investigation here to see if a possible I/O
> error caused the issue.
> 

Thanks for the anacdump.txt this was a bug in my write out
/etc/fcoe/cfg-eth# code, should be fixed now:
http://people.atrpms.net/~hdegoede/updates-486244.img

Comment 26 Hans de Goede 2009-07-02 09:44:01 UTC
(In reply to comment #24)
> sh-4.0# ls
> pci-0000:00:1f.1-scsi-0:0:0:0   
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2
> pci-eth1-fc-0x201600a0b842138c:0x0001000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2
> pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3
> pci-eth1-fc-0x5006016141e03375:0x0000000000000000                       

Question what does this look like for "real" fibrechannel ? I'm wondering
if I need to check for "eth#" there (which could be tricky as nics can be renamed), or if just checking for pci-*-fc-* is enough.

Comment 27 Supreeth 2009-07-06 18:19:45 UTC
(In reply to comment #25)
Hi Hans,

> (In reply to comment #23)
> Hi,
> > I tried the new image and got an unhandled exception after the filesystem was
> > written to the remote LUN. The exception happened after the installer asked me
> > to choose additional repositories to install and was checking dependencies of
> > the packages chosen for install. I have attached the anacdump.txt for your
> > investigation. I will also do some investigation here to see if a possible I/O
> > error caused the issue.
> > 
> Thanks for the anacdump.txt this was a bug in my write out
> /etc/fcoe/cfg-eth# code, should be fixed now:
> http://people.atrpms.net/~hdegoede/updates-486244.img  

Thank you for the new image. The install is successful and the file is written correctly to /mnt/sysimage/etc/fcoe/cfg-eth1.

Thanks,
Supreeth

Comment 28 Supreeth 2009-07-06 18:26:27 UTC
(In reply to comment #24)
Hi Hans,
> (In reply to comment #21)
> Hi Supreeth,
> Thanks for the info and the testing!

Not a problem at all!

<snip>

> Ok, could you try doing "fcoeadm -c eth#" on an already up interface, maybe
> that is smart enough to just return success, if it isn't we indeed need to
> get rid of the error somehow (errors like this tend to scare users). But
> this is a minor issue.

I haven't had a chance to check, but I did inspect the source code. As long as the write to /sys/module/fcoe/parameters/create (using fopen and fputs) is successful, fcoeadm -c is designed to return success. I think we should be fine, but we will keep our eyes peeled for this. I will check for any issues and update you.

<snip>

Thanks,
Supreeth

Comment 29 Supreeth 2009-07-06 18:36:14 UTC
(In reply to comment #26)
Hi Hans,

> (In reply to comment #24)
> > sh-4.0# ls
> > pci-0000:00:1f.1-scsi-0:0:0:0   
> > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000
> > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part1
> > pci-eth1-fc-0x201600a0b842138c:0x0000000000000000-part2
> > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000
> > pci-eth1-fc-0x201600a0b842138c:0x0001000000000000-part1
> > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000
> > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part1
> > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part2
> > pci-eth1-fc-0x201600a0b842138c:0x0002000000000000-part3
> > pci-eth1-fc-0x5006016141e03375:0x0000000000000000                       
> Question what does this look like for "real" fibrechannel ? I'm wondering
> if I need to check for "eth#" there (which could be tricky as nics can be
> renamed), or if just checking for pci-*-fc-* is enough.  

For a real FC HBA, this looks similar in a traditional sense but the eth1 is replaced by PCI device information. For example it looks something like,

pci-0000:00:1c.0-fc-0x201600a0b842138c:0x0001000000000000 

Checking for pci-*-fc-* would retrieve all available FC disks, but since we'll always be connecting to an FCoE switch to do the discovery and not a traditional FC switch I think the search should be sufficient IMHO. I will however run some experiments here and try some permutations and combinations.
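If telling the two cases apart ever becomes necessary, one possible heuristic (an assumption on my part, not something udev or anaconda guarantees) is to look at the component between "pci-" and "-fc-": a PCI address such as 0000:00:1c.0 suggests a real FC HBA, and anything else is taken to be a NIC name. That avoids matching on "eth#", which breaks when NICs are renamed. `classify_by_path` is a hypothetical helper.

```python
import re

# domain:bus:slot.function, as in the FC HBA example above
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
FC_NAME = re.compile(r"^pci-(.+)-fc-0x[0-9a-f]+:0x[0-9a-f]+(?:-part\d+)?$")


def classify_by_path(name):
    """Return "fc", "fcoe" or "other" for a /dev/disk/by-path entry."""
    m = FC_NAME.match(name)
    if m is None:
        return "other"
    # A PCI address means a real HBA; otherwise assume a NIC name (FCoE).
    return "fc" if PCI_ADDR.match(m.group(1)) else "fcoe"
```

With the names from this thread, the eth1 entries classify as "fcoe", the 0000:00:1c.0 entry as "fc", and the local SCSI disk as "other".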

Thanks,
Supreeth

Comment 30 Hans de Goede 2009-07-06 18:44:31 UTC
Hi Supreeth,

Once more thanks for all the input, I've prepared a new updates.img for you:
http://people.atrpms.net/~hdegoede/updates-486244.img

New this time around is that the code now recognizes FCoE disks as a separate
type of disk from normal SCSI disk and tracks through which NIC it is connected, this allows for the following 2 things:

1) Make sure fcoe-utils gets installed
2) Write out an /etc/sysconfig/network-scripts/ifcfg-eth# file
   with "NM_CONTROLLED=no" in there, so that NetworkManager won't
   touch the interface
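A minimal sketch of item 2, for anyone who wants to check the result by hand. Only the NM_CONTROLLED=no line is taken from this comment; the DEVICE= line, the function name `write_ifcfg`, and the `root` argument are assumptions, and the real file written by anaconda may contain more keys.

```python
import os


def write_ifcfg(root, ifname):
    """Write <root>/etc/sysconfig/network-scripts/ifcfg-<ifname>."""
    cfg_dir = os.path.join(root, "etc", "sysconfig", "network-scripts")
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, "ifcfg-" + ifname)
    with open(path, "w") as f:
        f.write("DEVICE=%s\n" % ifname)   # assumed key, not from this bug
        f.write("NM_CONTROLLED=no\n")     # keeps NetworkManager off the NIC
    return path
```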

Can you do another test install with this updates.img please and check that:
1) fcoe-utils gets installed
2) /etc/sysconfig/network-scripts/ifcfg-eth# with
   "NM_CONTROLLED=no" in there gets written ?

Thanks!

One more question, from a reviewer of this new batch of changes, is it
possible to use one interface for both IP traffic and FCoE at the same time ?
and if this is possible, is this a realistic scenario ?

Comment 31 Supreeth 2009-07-06 21:42:32 UTC
(In reply to comment #30)
Hi Hans,


> Once more thanks for all the input, I've prepared a new updates.img for you:
> http://people.atrpms.net/~hdegoede/updates-486244.img
> New this time around is that the code now recognizes FCoE disks as a separate
> type of disk from normal SCSI disk and tracks through which NIC it is
> connected, this allows for the following 2 things:
> 1) Make sure fcoe-utils gets installed
> 2) Write out an /etc/sysconfig/network-scripts/ifcfg-eth# file
>    with "NM_CONTROLLED=no" in there, so that NetworkManager won't
>    touch the interface
> Can you do another test install with this updates.img please and check that:
> 1) fcoe-utils gets installed
> 2) /etc/sysconfig/network-scripts/ifcfg-eth# with
>    "NM_CONTROLLED=no" in there gets written ?
> Thanks!

Not a problem. I will work on this ASAP and update you (mostly the ETA is sometime tomorrow)
 
> One more question, from a reviewer of this new batch of changes, is it
> possible to use one interface for both IP traffic and FCoE at the same time ?
> and if this is possible, is this a realistic scenario ?  

The short answer is yes, we can do this using DCB. DCB allows us to tag storage packets and IP packets with separate priorities. This in turn is used for Priority Flow Control using Pause frames. For example, a priority grouping might assign 40% of bandwidth to LAN, 40% to SAN, etc. When a class of traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause frame to manage this situation.

Thanks,
Supreeth

Comment 32 Hans de Goede 2009-07-07 18:42:07 UTC
(In reply to comment #31)
> > One more question, from a reviewer of this new batch of changes, is it
> > possible to use one interface for both IP traffic and FCoE at the same time ?
> > and if this is possible, is this a realistic scenario ?  
> 
> The short answer is yes, we can do this using DCB. DCB allows us to tag storage
> packets and IP packets with separate priorities. This in turn is used for
> Priority Flow Control using Pause frames.  For example some priority grouping
> might assign 40% of bandwidth to LAN, 40% to SAN etc. When a class of
> traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause
> frame to manage this situation. 
> 

Ok, so this can be done, but it sounds like a really weird setup to me,
surely one wants separate networks for TCP/IP and for storage ?

So do you think this is something which we ought to support in the installer
(my own vote goes to declaring this an unsupported setup) ?

Comment 33 Supreeth 2009-07-09 23:01:26 UTC
(In reply to comment #32)

Hi Hans,
> (In reply to comment #31)
> > > One more question, from a reviewer of this new batch of changes, is it
> > > possible to use one interface for both IP traffic and FCoE at the same time ?
> > > and if this is possible, is this a realistic scenario ?  
> > 
> > The short answer is yes, we can do this using DCB. DCB allows us to tag storage
> > packets and IP packets with separate priorities. This in turn is used for
> > Priority Flow Control using Pause frames.  For example some priority grouping
> > might say that assign 40% of bandwidth to LAN, 40% to SAN etc. When a class of
> > traffic sees a sudden increase in bandwidth, the FCoE switch can send a pause
> > frame to manage this situation. 
> > 
> Ok, so this can be done, but it sounds like a really weird setup to me,
> surely one wants separate networks for TCP/IP and for storage ?
> So do you think this is something which we ought to support in the installer
> (my own vote goes to declaring this an unsupported setup) ?  

We probably do not need to support both TCP and FCoE traffic (converged traffic) on the same port at install or boot time for now. However, since FCoE runs on DCB enabled networks, DCB needs to be supported at all times FCoE is running (install time and boot time) even if the install/boot interface is only being used for FCoE traffic and not converged traffic.

Comment 34 Hans de Goede 2009-07-10 07:26:53 UTC
(In reply to comment #33)
>
> We probably do not need to support both TCP and FCoE traffic (converged
> traffic) on the same port at install or boot time for now. However, since FCoE
> runs on DCB enabled networks, DCB needs to be supported at all times FCoE is
> running (install time and boot time) even if the install/boot interface is only
> being used for FCoE traffic and not converged traffic.  

Thanks for the input, as for DCB support, I understand from previous comments
that that is still being worked on at the tools / kernel level, and that was
not part of the original feature request. So please file a new feature request
for adding DCB support, and let's focus first on getting basic, minimal FCoE support up and running.

WRT your mail that fcoe-utils did not get installed and the ifcfg-eth# did not get written, can you please attach /tmp/storage.log and /tmp/anaconda.log from an install with the latest updates.img ?

Thanks!

Comment 35 Supreeth 2009-07-10 18:27:39 UTC
Created attachment 351290 [details]
Anaconda.log file July 10, 2009

Comment 36 Supreeth 2009-07-10 18:28:22 UTC
Created attachment 351291 [details]
Storage.log file July 10, 2009

Comment 37 Supreeth 2009-07-10 18:31:18 UTC
(In reply to comment #30)

Hi Hans,

<snip>

> Can you do another test install with this updates.img please and check that:
> 1) fcoe-utils gets installed
> 2) /etc/sysconfig/network-scripts/ifcfg-eth# with
>    "NM_CONTROLLED=no" in there gets written ?

I did a full reinstall with the latest update image and here are some observations.

1. The NM_CONTROLLED=no is written to /etc/sysconfig/network-scripts/ifcfg-eth1 but is not written to the corresponding spot in /mnt/sysimage which is where I was looking for it. Is this the expected behavior, or does the file need to be written to /mnt/sysimage as well?

2. After install I manually looked for fcoeadm, fcoemon, etc. in
   /usr/sbin
   /mnt/sysimage/usr/sbin
   /mnt/runtime/usr/sbin
   
and did not find them. I also did a system wide "find . | grep fcoe" from / and could not find any of the userspace apps. I think fcoe-utils is not being installed. I have attached the anaconda.log and storage.log files for your investigation.

Thanks,
Supreeth
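
The ifcfg check from point 1 can be scripted so that the same test is run against both the installer copy and the copy under /mnt/sysimage. The following is only a sketch: the helper name nm_controlled_disabled is made up for illustration, and it assumes the file uses plain KEY=value lines. The comparison ignores case because this thread writes both "no" and "NO".

```python
def nm_controlled_disabled(ifcfg_text):
    """Return True if the ifcfg content sets NM_CONTROLLED to "no".

    Hypothetical helper for illustration; assumes simple KEY=value lines.
    """
    value = None
    for raw in ifcfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "NM_CONTROLLED":
            # A later assignment overrides an earlier one.
            value = val.strip().strip('"').strip("'").lower()
    return value == "no"
```

To use it, read /etc/sysconfig/network-scripts/ifcfg-eth1 and the same path under /mnt/sysimage and pass each file's text to the function.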

Comment 38 Hans de Goede 2009-07-12 21:14:36 UTC
Hi,

As always thanks for testing. Judging from the logs, your disks are being
seen as regular disks, whereas anaconda should recognize them as being
on FCoE in order to do things like install fcoe-utils.

Are you sure you were using the latest version of the updates.img ? I can
find no reason, looking at the logs + code for the latest updates.img, for it to
not recognize the disks as FCoE disks.

Here is a new updates.img with some debugging:
http://people.atrpms.net/~hdegoede/updates-486244.img

Can you please attach logs from an install done with this img ? There is no
need to do a full install, after adding the FCoE SAN and the disks showing up
in the partitioning UI, save the logs and I have what I need.

I don't know how good your python is, maybe you can help debug this issue ? The problem is in the udev_device_is_fcoe() function from /tmp/updates/storage/udev.py.
The info["ID_PATH"] and info["ID_BUS"] values used in there can also
be found in the storage.log file by searching for ID_PATH and ID_BUS respectively.

I've done some local tests using the values for these from the latest storage.log you attached, and udev_device_is_fcoe() should recognize the disks from that install as FCoE, but somehow does not (the log does not contain
"sda is an fcoe disk" but instead contains "sda is a disk").

Regards,

Hans
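
For anyone wanting to reproduce the local test described above, here is a rough Python sketch of a check driven by ID_BUS and ID_PATH. It is not the code from storage/udev.py. It assumes that an FCoE disk reports ID_BUS "scsi" and an ID_PATH of the form pci-eth#-fc-<id>, and the function name looks_like_fcoe is invented for this example.

```python
def looks_like_fcoe(info):
    """Guess whether a udev property dict describes an FCoE disk.

    Assumption (not taken from anaconda): FCoE disks report
    ID_BUS == "scsi" and an ID_PATH shaped like "pci-eth<N>-fc-<id>".
    """
    if info.get("ID_BUS") != "scsi":
        return False
    parts = info.get("ID_PATH", "").split("-")
    return (len(parts) >= 4
            and parts[0] == "pci"
            and parts[1].startswith("eth")
            and parts[2] == "fc")
```

Feeding it the ID_PATH and ID_BUS values copied from storage.log shows quickly whether the values themselves are the problem or whether the function is never reached.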

Comment 39 Supreeth 2009-07-13 18:15:05 UTC
Created attachment 351504 [details]
anaconda.log July 13, 2009

Comment 40 Supreeth 2009-07-13 18:15:50 UTC
Created attachment 351505 [details]
storage.log July 13, 2009

Comment 41 Supreeth 2009-07-13 18:24:29 UTC
(In reply to comment #38)

Hi Hans,

I tried the new image you sent and the disks are now being recognized as fcoe disks. In storage.log I can see the "sdX is an fcoe disk" log messages. I looked into the older updates image and found that it was the exact same size as an older one. So I'm thinking that I might have been using an older image. It looks like there might have been a mixup either when I saved the updates file or when you updated it. In any case the disks are being recognized as fcoe disks now :-) 

Now that the disks are being recognized as fcoe disks, I let the installer run fully but could not find any trace of fcoe-utils again. What do I need to do in order to install fcoe-utils? I looked at customizing the packages to install from the UI but could not find fcoe-utils as an option there.

Thanks,
Supreeth

Comment 42 Hans de Goede 2009-07-14 07:34:03 UTC
(In reply to comment #41)
> I tried the new image you sent and the disks are now being recognized as fcoe
> disks. In storage.log I can see the "sdX is an fcoe disk" log messages. I
> looked into the older updates image and found that it was the exact same size
> as an older one. So I'm thinking that I might have been using an older image.
> It looks like there might have been a mixup either when I saved the updates
> file or when you updated it. In any case the disks are being recognized as fcoe
> disks now :-) 
> 

Great, good to hear it is recognizing the disks now!

> Now that the disks are being recognized as fcoe disks, I let the installer run
> fully but could not find any trace of fcoe-utils again. What do I need to do in
> order to install fcoe-utils? I looked at customizing the packages to install
> from the UI but could not find fcoe-utils as an option there.
> 

Hmm, are you doing the install from a F-11 dvd by chance ? That probably does not have fcoe-utils on there, you probably need to do a net install from
a mirror of the "Everything" directory.

Anyways, the part of anaconda which is responsible for getting fcoe-utils installed has not been changed by the FCoE patches, now that the disk is recognized as an FCoE disk, fcoe-utils should get installed if available in the repository.

More interesting is the question did NM_CONTROLLED=NO get written to the ifcfg-eth# file on the installed system ?

Comment 43 Supreeth 2009-07-15 18:48:34 UTC
(In reply to comment #42)
Hi Hans,

> Hmm, are you doing the install from a F-11 dvd by chance ? That probably does
> not have fcoe-utils on there, you probably need to do a net install from
> a mirror of the "Everything" directory.
> Anyways, the part of anaconda which is responsible for getting fcoe-utils
> installed has not been changed by the FCoE patches, now that the disk is
> recognized as an FCoE disk, fcoe-utils should get installed if available in the
> repository.

I can manually install fcoe-utils now.

Yes, I was indeed using the F-11 DVD. I tried doing a net install but ran into some connection/proxy issues. So I went to the internal mirrors and got the RPMs for fcoe-utils and all dependencies (dcbd, libhbaapi, libhbalinux). Once the install completed I did a chroot to /mnt/sysimage and was able to install fcoe-utils without any issue. I have no doubt that the install from the repository will succeed as well and will give it a try once I resolve some proxy issues.
                   
> More interesting is the question did NM_CONTROLLED=NO get written to the
> ifcfg-eth# file on the installed system ?  

Yes, NM_CONTROLLED=no is now written to the ifcfg-eth# file on the installed system too!

Thanks!
Supreeth

Comment 44 Hans de Goede 2009-08-25 20:49:45 UTC
Created attachment 358632 [details]
PATCH: add FCoE boot support to dracut

Hi Supreeth,

Attached you find a patch for dracut to add support for booting from FCoE. Can you please test this.

To test this:
1) Install F-11 using the latest updates.img I provided in this bug
2) When the install is completed switch to tty2 and chroot to /mnt/sysimage
3) Install dracut:
   http://koji.fedoraproject.org/koji/buildinfo?buildID=127300
4) Apply the attached patch under /usr/share/dracut
5) Regenerate the initrd using:
   dracut -f /boot/initrd-<kver>.img <kver>
   Where kver is the version string of the existing initrd
6) Leave the chroot
7) Reboot
8) In grub add: "fcoe=<mac-address-lowercase>:nodcb" to the kernel cmdline
   You can leave the : in the mac address as is, i.e.:
   fcoe=aa:bb:cc:dd:ee:ff:nodcb

Regards,

Hans
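
Because the MAC address keeps its colons, the fcoe= value has to be split on the last colon only. The sketch below illustrates that in Python. It is not the dracut implementation (which is shell), the name parse_fcoe_arg is made up, and accepting "dcb" next to the "nodcb" shown in step 8 is an assumption.

```python
def parse_fcoe_arg(arg):
    """Split "fcoe=<mac>:<dcb|nodcb>" into (mac, use_dcb).

    Illustrative only; "dcb" as the counterpart of "nodcb" is assumed.
    """
    if not arg.startswith("fcoe="):
        raise ValueError("not an fcoe= argument: %r" % arg)
    value = arg[len("fcoe="):]
    # The MAC itself contains colons, so split on the last one only.
    mac, sep, mode = value.rpartition(":")
    if not sep or mode not in ("dcb", "nodcb"):
        raise ValueError("expected <mac>:dcb or <mac>:nodcb, got %r" % value)
    if len(mac.split(":")) != 6:
        raise ValueError("expected a six-octet MAC address, got %r" % mac)
    return mac.lower(), mode == "dcb"
```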

Comment 45 Supreeth 2009-08-26 22:24:38 UTC
Hi Hans,

> Attached you find a patch for dracut to add support for booting from FCoE. Can
> you please test this.

Thank you for the patch and the instructions! I spent a few hours on this today, and in summary boot did not work. I will comment below after each instruction.

> To test this:
> 1) Install F-11 using the latest updates.img I provided in this bug

Install completed successfully.

> 2) When the install is completed switch to tty2 and chroot to /mnt/sysimage
> 3) Install dracut:
>    http://koji.fedoraproject.org/koji/buildinfo?buildID=127300

I installed fcoe-utils and all dependencies, and after that I installed dracut. I had to install the bridge-utils and dash first before dracut installed.

> 4) Apply the attached patch under /usr/share/dracut

Patch applied successfully.

> 5) Regenerate the initrd using:
>    dracut -f /boot/initrd-<kver>.img <kver>
>    Where kver is the version string of the existing initrd

I did this and I know a new initrd was written. 

> 6) Leave the chroot
> 7) Reboot
> 8) In grub add: "fcoe=<mac-address-lowercase>:nodcb" to the kernel cmdline
>    You can leave the : in the mac address as is so ie:
>    fcoe=aa:bb:cc:dd:ee:ff:nodcb

The Option ROM managed to pull the bootloader, but on the kernel cmdline (the 'c' option) the "fcoe=" command triggered an "Error 27: Command not found". The only way I was able to add this was by appending it (the 'a' option) to the kernel commands. After this everything died with the message "Boot has failed. Sleeping forever."

I ran the installer again so I could access the LUN. I uncompressed the initrd image and took a look inside. The fcoe modules are present in /lib/modules/... but I could not find fcoeadm or anything related. I also could not find any place where there was code to insmod the modules but then the init script has changed a bit from what it was in RHEL5. So I might not have looked in the right places. I would like to attach the initrd image for you but it is 12 MB in size. So please let me know if there are any specific files that you would like to see and I will send them to you. 

Thanks!
Supreeth

Comment 46 Supreeth 2009-08-26 22:30:45 UTC
Created attachment 358783 [details]
Initrd image for failed FCoE boot

I hope this attaches. If not please let me know what files you'd like to go over.

Comment 47 Hans de Goede 2009-08-27 06:29:07 UTC
Hi Supreeth,

Using the a option to append the fcoe command is fine, can you please try again also adding "rdinitdebug rdbreak" to the commandline. This will drop you to a shell when dracut is done probing.

From this shell can you please do:
ip link show eth#

Changing # until it shows the card you want to use for fcoe, and "paste" the output here?

And also:
cat /etc/udev/rules.d/60-fcoe.rules

And again paste the output here ?

Thanks,

Hans

Comment 48 Supreeth 2009-08-27 16:27:24 UTC
Hi Hans,

> Using the a option to append the fcoe command is fine, can you please try again
> also adding "rdinitdebug rdbreak" to the commandline. This will drop you to a
> shell when dracut is done probing.

Cool! I did not know about this feature. This will be helpful for so many things :-)

> From this shell can you please do:
> ip link show eth#
> Changing # until it shows the card you want to use for fcoe, and "paste" the
> output here?

I did this, and found that the link was still down. The output is

ip link show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1b:21:01:e5:2b brd ff:ff:ff:ff:ff:ff

> And also:
> cat /etc/udev/rules.d/60-fcoe.rules
> And again paste the output here ?

This file does not exist under /etc/udev/rules.d. So it looks like we're definitely missing some information here. 

Thanks,
Supreeth

> Thanks,
> Hans

Comment 49 Hans de Goede 2009-08-27 17:43:27 UTC
Hi,

Hmm, I think I know what is going on: patch does not preserve file permissions, could you please do:
chmod +x /usr/share/dracut/modules.d/95fcoe/*

And then re-generate the initrd and try again ?

Thanks,

Hans

p.s.

For a faster turn around time, I'm hansg on irc, Freenode server, channel
#anaconda or #dracut

Comment 50 Supreeth 2009-08-27 20:23:05 UTC
(In reply to comment #49)
> Hi,
> Hmm, I think I know what is going on patch does not preserve rights, could you
> please do:
> chmod +x /usr/share/dracut/modules.d/95fcoe/*
> And then re-generate the initrd and try again ?

I did this, and sure enough it worked. All FCoE LUNs are being discovered now, but the boot still fails saying "Root device not found". The contents of /etc/udev/rules.d/60-fcoe.rules is as follows

ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="00:1b:21:01:e5:2b", RUN+="/sbin/fcoe-up $env{INTERFACE} nodcb
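
For comparison, a rule of this shape can be assembled from the MAC address and the DCB mode as sketched below in Python. This is an illustration only: the real rule is written by the dracut fcoe module, the function name fcoe_udev_rule is invented, and the sketch emits a closing double quote after the mode, which the pasted line above appears to have lost.

```python
def fcoe_udev_rule(mac, mode="nodcb"):
    """Build a udev rule that runs fcoe-up when the NIC with this MAC appears.

    Hypothetical helper; mirrors the rule text pasted in this comment.
    """
    return ('ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="%s", '
            'RUN+="/sbin/fcoe-up $env{INTERFACE} %s"' % (mac, mode))
```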

In the grub menu, I changed the "root=" to root=/dev/sdc2. This time it gave me an error

"mount: unknown filesystem type 'lvm2pv' "

and then the usual "Boot has failed, sleeping forever" message popped up. I found this same behavior (unknown fs error) when I tried to mount /dev/sdc2 to simply access a couple of files. If the filesystem is forced to ext3 at install time maybe this will not happen but I am speculating :-)

I will be out on vacation Friday but back Monday. 

cheers,
Supreeth

> Thanks,
> Hans
> p.s.
> For a faster turn around time, I'm hansg on irc, Freenode server, channel
> #anaconda or #dracut

Comment 51 Hans de Goede 2009-08-28 07:01:11 UTC
(In reply to comment #50)
> (In reply to comment #49)
> > please do:
> > chmod +x /usr/share/dracut/modules.d/95fcoe/*
> > And then re-generate the initrd and try again ?
> 
> I did this, and sure enough it worked. All FCoE LUNs are being discovered now,

Great, that means it's working!

> but the boot still fails saying "Root device not found".
> In the grub menu, I changed the "root=" to root=/dev/sdc2. This time it gave me
> an error
> 
> "mount: unknown filesystem type 'lvm2pv' "
> 
> and then the usual "Boot has failed, sleeping forever" message popped up. I
> found this same behavior (unknown fs error)when I tried to mount /dev/sdc2 to
> simply access a couple of files. If the filesystem is forced to ext3 at install
> time maybe this will not happen but I am speculating :-)
> 

You probably used default partitioning, which uses lvm. Add rdinitdebug and rdbreak again, and when dropped in the shell do "ls /dev/mapper"

you should have a file named something like this there:
VolGroup-lv_root

Now try again with:
root=/dev/mapper/VolGroup-lv_root

That should fix things, note that this should be present in the default grub.conf written by anaconda though.

Comment 52 Hans de Goede 2009-08-31 08:32:33 UTC
Hi Supreeth,

dracut also has a mode called host only mode, where it generates an initrd tailored to the system it is running on. I need to add some code to dracut for this to check if a disk is an fcoe disk. For iscsi for example we have
(bash code):

is_iscsi() ( 
    [[ -L /sys/dev/block/$1 ]] || return
    cd "$(readlink -f /sys/dev/block/$1)"
    until [[ -d sys || -d iscsi_session ]]; do
        cd ..
    done
    [[ -d iscsi_session ]]
)

Which then can be called as "is_iscsi 8:0" (Where 8 is the scsi disk major number,
use 8:16 for sdb 8:32 for sdc, etc). Could you please write and test a similar
piece of bash code to detect FCoE disks?
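
The major:minor values mentioned above follow from whole SCSI disks on major 8 being spaced 16 minors apart. A small Python sketch of that arithmetic is given below. The helper name sd_dev_number is made up, and the sketch deliberately covers only sda through sdp, the sixteen disks that live on major 8.

```python
import string

def sd_dev_number(name):
    """Map "sda".."sdp" to the "major:minor" string of the whole disk.

    Hypothetical helper; only the first 16 disks (major 8) are handled.
    """
    letters = string.ascii_lowercase[:16]  # a..p
    if len(name) != 3 or not name.startswith("sd") or name[2] not in letters:
        raise ValueError("only sda..sdp are handled, got %r" % name)
    index = ord(name[2]) - ord("a")
    # Each whole disk reserves 16 minors (the disk plus 15 partitions).
    return "8:%d" % (16 * index)
```

The result is the value to pass to a check such as is_iscsi, for example sd_dev_number("sdc") for the third disk.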

Comment 53 Supreeth 2009-09-08 21:45:30 UTC
Hi Hans,

> You probably used default partitioning, which uses lvm. Add rdinitdebug and
> rdbreak again, and when dropped in the shell do "ls /dev/mapper"
> you should have a file named something like this there:
> VolGroup-lv_root
> Now try again with:
> root=/dev/mapper/VolGroup-lv_root
> That should fix things, note that this should be present in the default
> grub.conf written by anaconda though.  

I was on vacation for a couple of days last week, hence the late reply. Yes, I used default partitioning. After discovery I did an ls /dev/mapper, and all it displayed was "control". I could not see a VolGroup-lv_root or anything similar; control was the only file in the directory. I can see all the LUNs under /dev though.

Thanks,
Supreeth

Comment 54 Hans de Goede 2009-09-09 07:12:02 UTC
Hi Supreeth,

Hmm, so the FCoE part is working, but if the disks are FCoE somehow the lvm scanning is not working, can you try again please with the latest dracut
(which now includes the FCoE support):
http://koji.fedoraproject.org/koji/buildinfo?buildID=131035

Note: remove "rhgb quiet" from the kernel cmdline and add: "nomodeset 1 rdbreak rdshell", then when at the shell, do "ls /dev/mapper" again. If the nodes are there, try "ls /sysroot"; if "ls /sysroot" shows a / filesystem, then exit the shell, and everything works.

If things don't work, can you do "ls -a /tmp" and write down what is shown there please ?

Comment 55 Hans de Goede 2009-09-21 12:36:32 UTC
All bits for basic FCoE boot support in both dracut, and anaconda are in place now in rawhide, setting this to modified.

Comment 56 Supreeth 2009-11-02 20:08:31 UTC
What is bug 519933, and why does this bug block it? I'm unable to access #519933.

Thanks,
Supreeth

Comment 57 Denise Dumas 2009-11-02 20:18:04 UTC
Hi Supreeth, 

It's just a tracker bugzilla to link together requests for FCoE support from partners, as well as problem reports or anything that might potentially affect FCoE for RHEL6.

Comment 58 Supreeth 2009-11-03 00:07:01 UTC
Thank you for the clarification, Denise!

-Supreeth

Comment 59 Alexander Todorov 2009-11-20 13:44:56 UTC
Hi Supreeth,
have you had a chance to test the latest rawhide trees? FCoE support has been in rawhide for 2 months already and should be working.

Thanks.

Comment 61 Supreeth 2010-05-04 16:51:21 UTC
The following is a summary of FCoE boot related features in RHEL 6.0. It looks like we need both the installer and dracut to be updated with many FCoE related  binaries. Also see BZ 563790 and BZ 563794.

Anaconda Support:

Anaconda UI support is present, EDD script is correctly used, and LLDP is successfully configured on the interface found in the EDD. I really like the “Use DCB” checkbox in Anaconda! 

However discovery fails due to lack of VLAN support. RHEL 6.0 is missing fipvlan and vconfig from the installer initrd. As a result it still calls fcoemon -c ethX, and discovery fails because the VLAN is not configured. I was able to manually get the install to succeed after adding the 8021q kernel module (for vconfig) and manually calling fipvlan (fipvlan -s ethX). We need to replace the call to fcoemon with calls to fipvlan in anaconda or else install will not be possible without manual intervention.

Dracut Support:

After install I am not able to see any required binaries in the initramfs image. We’re missing lldpad, dcbtool, fipvlan, fcoe_edd.sh, and vconfig. Maybe Dracut needs a specific command to build an fcoe specific initrd, but it might not be getting triggered as I bypassed the normal process with my manual steps. *What are the arguments given to Dracut to build an initramfs with fcoe support?* 

The Option ROM finds the remote LUN with no problems at boot time and the initrd loads. After that everything stalls as the fcoe initiator is not brought up from within initrd.

Thanks,
Supreeth

Comment 62 Hans de Goede 2010-05-06 10:30:09 UTC
Hi Supreeth,

Thanks for testing!

(In reply to comment #61)
> The following is a summary of FCoE boot related features in RHEL 6.0. It looks
> like we need both the installer and dracut to be updated with many FCoE related
>  binaries. Also see BZ 563790 and BZ 563794.
> 
> Anaconda Support:
> 
> Anaconda UI support is present, EDD script is correctly used, and LLDP is
> successfully configured on the interface found in the EDD. I really like the
> “Use DCB” checkbox in Anaconda! 
> 
> However discovery fails due to lack of VLAN support. RHEL 6.0 is missing
> fipvlan and vconfig from the installer initrd. As a result it still calls
> fcoemon -c ethX which results in discovery failing as a result of VLAN not
> being configured. I was able to manually get install to be successful after
> adding the 8021q kernel module (for vconfig) and manually calling fipvlan
> (fipvlan -s ethX). We need to replace the call to fcoemon with calls to fipvlan
> in anaconda or else install will not be possible without manual intervention.
> 

Oh, this is caused by a bit of a misunderstanding between me and Eric Multanen.
Eric mentioned in bug 563794 that we should switch from fcoemon to fipvlan for
dracut (the normal boot initrd), but did not say the same should be done for
anaconda. But doing the same thing in anaconda and the boot initrd makes sense, I
should have asked.

So assuming anaconda should do the same as dracut, I'll write a patch to bring up
an interface for fcoe+dcb like this:
start lldpad (if not done already)
dcbtool sc ethX dcb on
dcbtool sc ethX app:fcoe e:1 a:1 w:1
fipvlan ethX -c -s
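
Expressed as data, the sequence above could look like the following Python sketch. It only builds the argument lists and runs nothing. The function name fcoe_dcb_bringup_commands is made up, and starting lldpad is left out because no exact command for it is given here.

```python
def fcoe_dcb_bringup_commands(iface):
    """Return the argv lists for bringing up FCoE with DCB on one interface.

    Hypothetical helper: commands are copied from the list above, and
    lldpad is assumed to be running already.
    """
    return [
        ["dcbtool", "sc", iface, "dcb", "on"],
        ["dcbtool", "sc", iface, "app:fcoe", "e:1", "a:1", "w:1"],
        ["fipvlan", iface, "-c", "-s"],
    ]
```

Each list could then be handed to subprocess.check_call in order, stopping at the first failure.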

You're also talking about vconfig, I assume that this is needed / called by
fipvlan and that vconfig (and the 8021q kernel module) thus should be part
of the installer environment too? This is new information for me, and
AFAIK dracut gets this wrong too. I'll include adding these to the installer
environment to the patch for switching from fcoemon to fipvlan.

> Dracut Support:
> 
> After install I am not able to see any required binaries in the initramfs
> image. We’re missing lldpad, dcbtool, fipvlan, fcoe_edd.sh, and vconfig. Maybe
> Dracut needs a specific command to build an fcoe specific initrd but it
> might not be getting triggered as I bypassed the normal process with my manual
> steps. *What are the arguments given to Dracut to build an initramfs with fcoe
> support?* 
> 

fcoe support is part of the dracut-network package, which does not get installed
by default. When the code for bringing up the anaconda disks in FCoE is fixed to
work (rather than you doing it manually) anaconda will automatically add
dracut-network and fcoe-utils to the list of packages to install. Currently
the installed system likely is not only missing dracut-network but also fcoe-utils.

You can check the presence of the necessary tools in an initrd by generating
an initrd on a normally installed system, which afterwards has
dracut-network and fcoe-utils installed.

Note that the initrd will be missing vconfig and the 8021q kernel module as I was
not aware that those were needed by fipvlan. Also the initrd will not include
fcoe_edd.sh, as that will only be used during install time, after which the
info about which nic to use will be stored in grub.conf and passed to the initrd
through the kernel cmdline. The how and why of this has been discussed in detail
in bug 513018 comment 24.

> The Option ROM finds the remote LUN with no problems at boot time and the
> initrd loads. After that everything stalls as the fcoe initiator is not brought
> up from within initrd.

Yes, that is to be expected with fcoe support missing from it (and it probably
also is not being passed the right kernel cmdline parameters).

So moving forward with this, I'll do a patch for the anaconda bits. Since various
files are missing from the installer environment I cannot easily fix this with an
updates.img, so my plan is to get the fix in place asap, and then you'll have
to wait till there is a new snapshot.

For the dracut problem, I'll ask Harald Hoyer to do a new dracut build that fixes
the missing vconfig asap. Then you should be able to test this bit, by
building a new initrd on a system with the exact same kernel as the FCoE
test install (and dracut-network and fcoe-utils installed), and using that.

When testing with a new initrd make sure you have the following on the kernel
cmdline:
ifname=eth#:AA:BB:CC:DD:EE:FF netroot=fcoe:eth#:dcb
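
To avoid typos when editing the grub entry by hand, the two parameters can be generated from the interface name and MAC address, as in this Python sketch. The helper name fcoe_boot_args is invented, and "nodcb" as the alternative to "dcb" in netroot= is an assumption carried over from the earlier fcoe= syntax in this bug.

```python
def fcoe_boot_args(iface, mac, dcb=True):
    """Build the ifname= and netroot= kernel parameters for FCoE boot.

    Hypothetical helper; "nodcb" for dcb=False is an assumption.
    """
    mode = "dcb" if dcb else "nodcb"
    return "ifname=%s:%s netroot=fcoe:%s:%s" % (iface, mac, iface, mode)
```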

Thanks & Regards,

Hans

Comment 63 Hans de Goede 2010-05-06 10:51:30 UTC
p.s.

Note that due to how stage1 and stage2 get "glued" together, /sbin/vconfig
will be /usr/sbin/vconfig under the installer environment. I hope that fipvlan
can handle this.

Comment 64 Supreeth 2010-05-06 14:55:51 UTC
Hi Hans,

Thank you very much for the detailed note. I think the changes you're making should take care of all the details for FCoE boot. Looking forward to seeing these changes in the next snapshot.

Thanks!
Supreeth

Comment 66 Hans de Goede 2010-05-10 17:42:12 UTC
*** Bug 588598 has been marked as a duplicate of this bug. ***

Comment 67 Supreeth 2010-05-14 21:22:30 UTC
Hi Hans,

I tried out the Anaconda changes and they work really well. I was able to get a full install to succeed without any manual intervention which is awesome! 

I did however run into issues after install because the initramfs does not have any FCoE related binaries (lldpad, fcoe-utils, vconfig, and 8021q module). It looks like the dracut changes did not make it into the new snapshot yet (or dracut created the initramfs without the command option for including fcoe). Could you please let us know when these binaries will be included in dracut? Without these we will not be able to test any of the initramfs portions of remote booting via FCoE. I will also reference this in the dracut BZ 563794.

Thanks,
Supreeth

Comment 68 Hans de Goede 2010-05-15 11:42:10 UTC
(In reply to comment #67)
> Hi Hans,
> 
> I tried out the Anaconda changes and they work really well. I was able to get a
> full install to succeed without any manual intervention which is awesome! 
> 

Good to hear.

> I did however run into issues after install because the initramfs does not have
> any FCoE related binaries (lldpad, fcoe-utils, vconfig, and 8021q module). It
> looks like the dracut changes did not make it into the new snapshot yet.

Then only vconfig and the 8021q module would be missing, so this seems to be another problem.

I hope you have access to the filesystem of the installed system in some other way? Note that if this is a lot of trouble, don't bother; I need you to start a fresh install anyway, and I can also get all the needed info from the logs
from that install.

So if you do have access to the filesystem of the installed system in some other way, can you please check if dracut-network and fcoe-utils were installed, I have the feeling they were not (likely caused by the FCoE disk not being recognized as such). To check if they were installed, check for the
presence of the following files:
/usr/share/dracut/modules.d/95fcoe/install
/usr/sbin/fipvlan
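
If the installed filesystem is mounted somewhere other than /, the presence check can be done relative to that mount point. The Python sketch below does this. The names FCOE_MARKER_FILES and missing_fcoe_files are made up, and the mount point is whatever you pass in (for example /mnt/sysimage).

```python
import os

# The two files named above, relative to the root of the installed system.
FCOE_MARKER_FILES = (
    "usr/share/dracut/modules.d/95fcoe/install",
    "usr/sbin/fipvlan",
)

def missing_fcoe_files(root="/"):
    """Return the marker files that do not exist under the given root."""
    return [path for path in FCOE_MARKER_FILES
            if not os.path.exists(os.path.join(root, path))]
```

An empty list means both dracut-network's FCoE module and fipvlan are present on that filesystem.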

Also, could you please collect the following log files from the filesystem and attach them here?
/root/install.log
/var/log/anaconda*

Then start a new install again using FCoE for / and /boot, and when it is installing packages, switch to tty2 (ctrl + alt + f2) and
do:
tar cvfz udevdb.tgz /dev/.udev/db
tar cvfz logs.tgz /tmp/*log
and attach the resulting tgz files here (you can use for example scp to
get them out of the installer environment).

Note: with some luck this can be fixed using an updates.img

Comment 69 Supreeth 2010-05-17 16:40:50 UTC
Created attachment 414606 [details]
install logs from the old install process as requested

Comment 70 Supreeth 2010-05-17 16:41:37 UTC
Created attachment 414607 [details]
Install logs from the new install process

Comment 71 Supreeth 2010-05-17 16:42:31 UTC
Created attachment 414608 [details]
udevb.tgz from time of install

Comment 72 Supreeth 2010-05-17 16:43:07 UTC
Created attachment 414609 [details]
logs.tgz from installing system

Comment 73 Supreeth 2010-05-17 16:45:48 UTC
(In reply to comment #68)

Hi Hans,

> I hope you have access to the filesystem of the installed system in some other
> way? 

I do. I can connect to the remote LUN by using open-fcoe initiator in the installer and can examine the contents. 


> /usr/share/dracut/modules.d/95fcoe/install
> /usr/sbin/fipvlan

These files are not present in the installed system as you suspected. So it indeed looks as though fcoe-utils and dracut's FCoE module were not installed. However I also realized that on my first attempt I had not explicitly chosen to install the "FCoE Client" package and so am not surprised fipvlan was not present. On the second attempt I chose the FCoE client package and fipvlan is now found in the installed filesystem and so are lldpad and vconfig. However I cannot find the 95fcoe directory yet again. I also extracted the initramfs img file and could not find any fcoe related binaries in there. 

> Also, could you please collect the following log files from the filesystem and
> attach them here?
> /root/install.log
> /var/log/anaconda*

Please find these attached. I have attached the logs from both the old install and the new install with fcoe packages installed.

> Then start a new install again using FCoE for / and /boot, and when it is
> installing packages, switch to tty2 (ctrl + alt + f2) and
> do:
> tar cvfz udevdb.tgz /dev/.udev/db
> tar cvfz logs.tgz /tmp/*log
> and attach the resulting tgz files here (you can use for example scp to
> get them out of the installer environment).

Please find these attached as requested. These are from the new install taken around the time install was midway through.

> Note: with some luck this can be fixed using an updates.img    

I hope so too. I am really pleased with the way anaconda is handling install via FCoE and we are all eagerly awaiting the changes to overcome what looks to be the final hurdle in the process. Many thanks for your quick investigation! Please let me know if you need me to send you any other logs or conduct further experiments for your investigation.

cheers,
Supreeth

Comment 74 Hans de Goede 2010-05-17 20:11:48 UTC
Supreeth,

Many thanks for all the logs. As I expected / feared already anaconda is not recognizing the disk as an FCoE disk, but instead it sees it as a regular
scsi disk, and thus it never adds fcoe-utils and dracut-network to the package set automatically.

anaconda determines if a disk is an FCoE disk or not based on the ID_PATH udev database property. In the logs you attached that is:
'ID_PATH': 'pci-0000:00:1f.1-scsi-0:0:0:0'

Which is, well, wrong, or at least different from earlier testing. I think this has to do with recent udev changes. PATH_ID used to be determined by a shell script, but now it is determined by an executable written in C.

I'll prep an updates.img with a workaround as a solution for this for now.

Regards,

Hans

Comment 75 Hans de Goede 2010-05-17 20:35:45 UTC
Ok here is an updates.img working around the udev id_path issue:
http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img

As usual, to use this pass
updates=http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img

on the bootloader cmdline.

Comment 76 Hans de Goede 2010-05-17 20:41:50 UTC
Created attachment 414660 [details]
path_id C-code

Hi,

Ok, so here is the culprit code (it needs libudev-devel to compile and to be linked against libudev). I hope you can take a look at this when you have some
time as debugging this without hardware access is very hard.

The problem function is handle_scsi_fibre_channel(), which does not
seem to recognize the fcoe disk as being fibre channel.

Once compiled, test it by running it like this:
id_path /devices/virtual/net/eth2.168-fcoe/host4/rport-4:0-4/target4:0:0/4:0:0:0/block/sda

Note you can get the path to pass in by doing:
readlink -f /sys/block/sda

And then stripping off the /sys at the front.
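
The same path can also be derived programmatically. The following is a minimal Python sketch, not part of path_id or anaconda; the helper names `strip_sys_prefix` and `sysfs_devpath` are hypothetical. It relies on `os.path.realpath`, which resolves symlinks much like `readlink -f`:

```python
import os

def strip_sys_prefix(resolved, sysfs_root="/sys"):
    """Drop the leading /sys from an already-resolved sysfs path."""
    if not resolved.startswith(sysfs_root + "/"):
        raise ValueError("not a path below %s: %r" % (sysfs_root, resolved))
    return resolved[len(sysfs_root):]

def sysfs_devpath(disk, sysfs_root="/sys"):
    """Resolve /sys/block/<disk> and return the path without the /sys prefix."""
    resolved = os.path.realpath(os.path.join(sysfs_root, "block", disk))
    return strip_sys_prefix(resolved, sysfs_root)
```

On a machine with an FCoE disk, `sysfs_devpath("sda")` should yield a string that can be passed directly as the argument in the command above.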

The output should be something in the following form:
pci-eth#-fc-${id}

At least that is what the old shell code spewed out and thus what anaconda expects.
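
To make the expected form concrete, here is a rough Python sketch of a check for it. It is an illustration only, not anaconda's actual code: the regular expression is one assumed reading of `pci-eth#-fc-${id}`, and `parse_fcoe_id_path` is a hypothetical name.

```python
import re

# Assumed reading of the "pci-eth#-fc-${id}" form quoted above.
FCOE_ID_PATH = re.compile(r"^pci-(?P<nic>eth\d+)-fc-(?P<fc_id>.+)$")

def parse_fcoe_id_path(id_path):
    """Return (nic, fc_id) if id_path has the pci-eth#-fc-<id> form, else None."""
    match = FCOE_ID_PATH.match(id_path)
    if match is None:
        return None
    return match.group("nic"), match.group("fc_id")

# Value from the attached logs: it does not match, so the disk looks like plain SCSI.
print(parse_fcoe_id_path("pci-0000:00:1f.1-scsi-0:0:0:0"))
# Constructed value in the expected form (made up for illustration, not from any log).
print(parse_fcoe_id_path("pci-eth2-fc-0x0000000000000000:0x0000000000000000"))
```

Under these assumptions the first call returns None, which matches the behaviour seen during the install: the disk is never classified as FCoE.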

Thanks,

Hans

Comment 77 Supreeth 2010-05-17 20:58:27 UTC
Hi Hans,

Thanks for the quick updates! I am going to try the new updates image today and I am going to try to find some time to tackle the C code debugging as well (most likely tomorrow). I will post updates as soon as possible.

Thanks!
Supreeth

Comment 78 Supreeth 2010-05-17 22:18:23 UTC
Hi Hans,

I ran into a couple of issues when trying out the fcoe-updates.img. I cannot access the people.fedoraproject.org site from within Intel so I copied the img file onto a local webserver (I renamed it as updates.img).

First, the updates.img did not seem to get copied even though /tmp/anaconda.log has the message "Transferring /xx/updates.img". I have verified that the updates.img is accessible by transferring it via sftp.

After this I also ran into an issue in the UI after "Add FCoE SAN". The remote disk(s) is not listed even though discovery is successful. /dev has the correct device info after discovery but the UI is not displaying the device. This issue does not happen if I do not attempt to add the "linux updates=" command line option.

I am attaching the /tmp/anaconda.log and /tmp/storage.log for your scrutiny.

Thanks,
Supreeth

Comment 79 Supreeth 2010-05-17 22:22:02 UTC
Created attachment 414674 [details]
Anaconda.log for failing updates.img copy.

Comment 80 Supreeth 2010-05-17 22:23:00 UTC
Created attachment 414675 [details]
Storage.log for case where discovery succeeds but LUN not displayed in UI

Comment 81 Hans de Goede 2010-05-18 07:57:58 UTC
Hi Supreeth,

> Hi Hans,
> 
> After this I also ran into an issue in the UI after "Add FCoE SAN". The remote
> disk(s) is not listed even though discovery is successful. /dev has the correct
> device info after discovery but the UI is not displaying the device. This issue
> does not happen if I do not attempt to add the "linux updates=" command line
> option.

This actually is good news, this means anaconda now recognizes the FCoE drive as an FCoE drive and thus lists it in the "Other SAN Devices" tab where it belongs.

So the updates.img is doing what it should, try switching to the "Other SAN Devices" tab and selecting the drive there.

Regards,

Hans

Comment 82 Supreeth 2010-05-18 16:48:03 UTC
Created attachment 414910 [details]
init.log for failed boot.

Comment 83 Supreeth 2010-05-18 16:48:36 UTC
Hi Hans,

> This actually is good news, this means anaconda now recognizes the FCoE drive
> as an FCoE drive and thus lists it in the "Other SAN Devices" tab where it
> belongs.
> So the updates.img is doing what it should, try switching to the "Other SAN
> Devices" tab and selecting the drive there.
> Regards,
> Hans    

You're right! I was able to find the drive under "Other SAN devices" tab. The install completed and all the necessary files for FCoE boot (including 95fcoe dracut module) are present in the root filesystem and initramfs img.

On to the actual boot itself. Once the Option ROM fetched the initramfs, dracut was stalling. So I enabled debug and used the rdbreak/rdinitdebug options to get into a shell. Upon scrutinizing init.log I found the following:

+ handler=fcoe
+ handler=fcoe
+ handler=/sbin/fcoeroot
+ [ -z fcoe:eth2:dcb ]
+ [ ! -e /sbin/fcoeroot ]
+ die No handler for netroot type 'fcoe:eth2:dcb'
+ echo <1>dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
+ echo <1>dracut: Refusing to continue
+ echo dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
dracut: FATAL: No handler for netroot type 'fcoe:eth2:dcb'
+ echo dracut: Refusing to continue
dracut: Refusing to continue
+ exit 1

The /sbin/fcoeroot script is not present. I do however find the regular netroot and iscsiroot scripts. I am attaching the init.log for you. Please let me know if you need any other files or need me to run further experiments. I am going to try my best to get some debugging done on the handle_scsi_fibre_channel() function today as well.
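
For readers following the trace, the lookup it shows can be paraphrased in Python roughly as below. This is a reconstruction from the log lines only, not dracut's source: the function name is hypothetical, the handling of an empty netroot is an assumption, and whatever the second `handler=` assignment in the trace does is not modelled.

```python
import os

def find_netroot_handler(netroot, exists=os.path.exists):
    """Map a netroot value to /sbin/<type>root, as the trace suggests."""
    if not netroot:
        # Assumption: the trace does not show what happens in this case.
        raise ValueError("netroot is empty")
    # "fcoe:eth2:dcb" -> "fcoe"
    root_type = netroot.split(":", 1)[0]
    handler = "/sbin/%sroot" % root_type
    if not exists(handler):
        # Mirrors the FATAL message seen in init.log.
        raise RuntimeError("No handler for netroot type '%s'" % netroot)
    return handler
```

With `netroot=fcoe:eth2:dcb` and no /sbin/fcoeroot in the initramfs, this takes the error branch, which corresponds to the stall described above.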

Thanks,
Supreeth

Comment 84 Supreeth 2010-05-18 16:59:12 UTC
Upon further scrutiny: It looks like dracut should call /sbin/fcoe-up instead of /sbin/fcoeroot. The fcoe-up script has all the required components to bring up the fcoe interface.

Thanks,
Supreeth

Comment 85 Hans de Goede 2010-05-18 17:36:17 UTC
Hi,

(In reply to comment #84)
> Upon further scrutiny: It looks like dracut should call /sbin/fcoe-up instead
> of /sbin/fcoeroot. The fcoe-up script has all the required components to bring
> up the fcoe interface.
> 

Yes, but it looks like the generic netroot= parsing code wants to see
a /sbin/fcoeroot. Can you try creating an empty sh script named /sbin/fcoeroot (and make it executable) which just contains:
exit 0

And add that to the initrd, to unpack the initrd do:
mkdir t
cd t
zcat .../initrd-....img | cpio -i

Then add t/sbin/fcoeroot and do:
find . | cpio --quiet -c -o | gzip -9 > .../initrd-....img

Thanks & Regards,

Hans

p.s.

You can discuss things more interactively with me on irc, I'm on the freenode network and you can find me in #anaconda, #dracut and this week also in #fcoe, I'm hansg there.

Comment 86 Hans de Goede 2010-05-18 17:38:34 UTC
P.S. The fcoe-up script will be called from a udev rule, whereas /sbin/fcoeroot gets called when there is an ip configuration complete event (which likely never happens in the fcoe case, as no ip config is done from the initrd for fcoe).

Or so I think. I've asked Harald Hoyer to look further into this, but for now
creating an empty /sbin/fcoeroot as described is a good starting point for debugging this further.

Comment 87 Supreeth 2010-05-18 18:23:48 UTC
Created attachment 414932 [details]
init.log for failed boot

Comment 88 Supreeth 2010-05-18 18:25:54 UTC
Thanks, Hans. I added the fcoeroot script and the behavior changed. It now seems to be bringing up the eth0 interface instead of eth2. I have attached the init.log file, in which I find this:

+ handler=fcoe
+ handler=fcoe
+ handler=/sbin/fcoeroot
+ [ -z fcoe:eth2:dcb ]
+ [ ! -e /sbin/fcoeroot ]
+ [ -z  ]
+ IFACES=eth0
+ . /tmp/net.eth0.up
+ ip addr add 10.23.21.47/255.255.255.0 broadcast 10.23.21.255 dev eth0
+ [ -e /tmp/net.eth0.gw ]
+ . /tmp/net.eth0.gw
+ ip route add default via 10.23.21.1 dev eth0
+ [ -e /tmp/net.eth0.hostname ]
+ [ -e /tmp/net.eth0.resolv.conf ]
+ cp -f /tmp/net.eth0.resolv.conf /etc/resolv.conf
+ [ -e /tmp/net.eth0.override ]
+ [ -e /tmp/dhclient.eth0.dhcpopts ]
+ . /tmp/dhclient.eth0.dhcpopts
+ new_broadcast_address=10.23.21.255
+ new_dhcp_lease_time=43200
+ new_dhcp_message_type=5
+ new_dhcp_server_identifier=10.22.226.254
+ new_domain_name=jf.intel.com
+ new_domain_name_servers=10.22.227.254 10.22.226.254 143.181.9.1
+ new_expiry=1274220932
+ new_filename=BStrap/X86pc/BStrap.0
+ new_ip_address=10.23.21.47
+ new_network_number=10.23.21.0
+ new_routers=10.23.21.1
+ new_subnet_mask=255.255.255.0
+ [ -n 10.23.21.1 ]
+ dest=10.23.21.1
+ [ -n  ]
+ [ -z 10.23.21.1 ]
+ [ -n 10.23.21.1 ]
+ arping -q -f -w 60 -I eth0 10.23.21.1
+ source_all netroot
+ local f
+ [ netroot ]
+ [ -d /netroot ]
+ return
+ /sbin/fcoeroot eth0 fcoe:eth2:dcb /sysroot
+ [ -f /tmp/dhclient.eth0.lease ]
+ cp /tmp/dhclient.eth0.lease /tmp/net.eth0.lease
+ [ -f /tmp/dhclient.eth0.dhcpopts ]
+ cp /tmp/dhclient.eth0.dhcpopts /tmp/net.eth0.dhcpopts
+ [ ! -f /tmp/net.ifaces ]
+ echo eth0
+ exit 0

Thanks,
Supreeth

PS: I will try to find you on IRC. I don't have an IRC client installed but will work on it!

Comment 89 Hans de Goede 2010-05-18 20:09:38 UTC
Hi,

Thanks for the log. I see now that there is a bug in the anaconda dracut cmdline generating code.

The current dracut does not expect a netroot=fcoe:eth2:dcb argument, but rather
an fcoe=eth2:dcb argument, this will also get rid of dracut bringing online
eth0 as it sees a netroot= argument, and the need for the empty /sbin/fcoeroot
binary.

I'll do a new updates.img including a fix for this tomorrow. For now if you edit the cmdline in grub.conf and change the
netroot=fcoe:eth2:dcb
to
fcoe=eth2:dcb

Things should work better.
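
The edit itself is a plain string substitution. As a small Python sketch of it (the function name is hypothetical, and this is not the anaconda fix itself):

```python
def fix_fcoe_cmdline(cmdline):
    """Rewrite netroot=fcoe:<nic>:<dcb> arguments as fcoe=<nic>:<dcb>."""
    prefix = "netroot=fcoe:"
    fixed = []
    for arg in cmdline.split():
        if arg.startswith(prefix):
            # Keep everything after "netroot=fcoe:" and put it behind "fcoe=".
            arg = "fcoe=" + arg[len(prefix):]
        fixed.append(arg)
    # Note: runs of whitespace between arguments are collapsed to one space.
    return " ".join(fixed)

# The "ro quiet" arguments are made up; only the netroot= part is from this bug.
print(fix_fcoe_cmdline("ro quiet netroot=fcoe:eth2:dcb"))
```

Other netroot= values (for example an iSCSI root) are left untouched, since only the `netroot=fcoe:` prefix is rewritten.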

Thanks,

Hans

Comment 90 Supreeth 2010-05-18 21:22:53 UTC
Hi Hans,

This worked like a charm :-) I was able to successfully boot from the remote LUN with the stub fcoeroot script and after changing netroot= to fcoe=eth2:dcb. I am not seeing any other issues as of now.

Looking forward to your new updates.img. I will try to spend some time debugging the C code you attached. Hopefully I'll have an update on that for you tomorrow!

So to summarize, the changes we need to work on in the ISOs for fully automated fcoe boot solution are

1. Include either the fix or workaround to anaconda viewing the remote disk as an fcoe disk and not basic disk.
2. Ensure kernel cmd line has fcoe=ethX:dcb instead of netroot=fcoe:ethX:dcb
3. Have the fcoeroot script in /sbin of initrd.

Thanks again for all your help and assistance today!  

cheers,
Supreeth

Comment 91 Hans de Goede 2010-05-19 08:40:33 UTC
(In reply to comment #90)
> Hi Hans,
> 
> This worked like a charm :-) I was able to successfully boot from the remote
> LUN with the stub fcoeroot script and after changing netroot= to fcoe=eth2:dcb.
> I am not seeing any other issues as of now.

That is very good news!

> So to summarize, the changes we need to work on in the ISOs for fully automated
> fcoe boot solution are
> 
> 1. Include either the fix or workaround to anaconda viewing the remote disk as
> an fcoe disk and not basic disk.

Correct, I would like to note that it is greatly preferred to fix id_path,
the current workaround is a bit of a hack which depends on the output
of "readlink -f /sys/block/sdx" following a certain pattern for fcoe disks.

To be precise the python code of the workaround looks like this:

def udev_device_is_fcoe(info):
    return "fcoe" in info["sysfs_path"]

def udev_device_get_fcoe_nic(info):
    return info["sysfs_path"].split("/")[4].split(".")[0]

Where info["sysfs_path"] is the output of "readlink -f /sys/block/sdx"
with the /sys prefix removed. I'm afraid esp the second function
will break in some cases, so I would much rather see id_path fixed.
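
To illustrate why the second function is fragile, here is a small self-contained Python run of the two workaround functions. The first path is the one from comment 76; the second is a hypothetical path, invented here, in which the interface name carries no ".<vlan>" part.

```python
def udev_device_is_fcoe(info):
    return "fcoe" in info["sysfs_path"]

def udev_device_get_fcoe_nic(info):
    # Component 4 is the interface directory, e.g. "eth2.168-fcoe".
    return info["sysfs_path"].split("/")[4].split(".")[0]

# Path quoted in comment 76.
vlan_disk = {"sysfs_path": "/devices/virtual/net/eth2.168-fcoe/host4/"
                           "rport-4:0-4/target4:0:0/4:0:0:0/block/sda"}
# Hypothetical path: no "." in the interface directory name.
no_vlan_disk = {"sysfs_path": "/devices/virtual/net/eth2-fcoe/host4/"
                              "rport-4:0-4/target4:0:0/4:0:0:0/block/sda"}

print(udev_device_is_fcoe(vlan_disk), udev_device_get_fcoe_nic(vlan_disk))
print(udev_device_is_fcoe(no_vlan_disk), udev_device_get_fcoe_nic(no_vlan_disk))
```

For the first path the NIC comes out as eth2. For the hypothetical second path the result is eth2-fcoe, which is not a NIC name; any sysfs layout that shifts the position of the interface directory would break the fixed index 4 in a similar way.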

> 2. Ensure kernel cmd line has fcoe=ethX:dcb instead of netroot=fcoe:ethX:dcb

Right the new updates.img (see below) should fix this, as will the
next anaconda build for rhel-6

> 3. Have the fcoeroot script in /sbin of initrd.

No, I believe the need for that was caused by the mistaken use of netroot=
please test a fresh install with the new updates.img, I believe this will
bootup without issues, iow no dracut changes are necessary.

Here is an updated updates.img which should result in anaconda writing
the proper kernel cmdline dracut option for fcoe, and still including
the path_id workaround:
http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img

Comment 92 Supreeth 2010-05-19 14:56:04 UTC
Hi Hans,

> Correct, I would like to note that it is greatly preferred to fix id_path,
> the current workaround is a bit of a hack which depends on the output
> of "readlink -f /sys/block/sdx" following a certain pattern for fcoe disks.
> To be precise the python code of the workaround looks like this:
> def udev_device_is_fcoe(info):
>     return "fcoe" in info["sysfs_path"]
> def udev_device_get_fcoe_nic(info):
>     return info["sysfs_path"].split("/")[4].split(".")[0]
> Where info["sysfs_path"] is the output of "readlink -f /sys/block/sdx"
> with the /sys prefix removed. I'm afraid esp the second function
> will break in some cases, so I would much rather see id_path fixed.

I agree. I would like to see this fixed too rather than use a workaround unless there is no other option. I will work on debugging this and hopefully get a fix soon.

> Here is an updated updates.img which should result in anaconda writing
> the proper kernel cmdline dracut option for fcoe, and still including
> the path_id workaround:
> http://people.fedoraproject.org/~jwrdegoede/fcoe-updates.img   

Awesome. Let me give this a whirl after our Red Hat FCoE call in a few minutes and post an update!

Thanks,
Supreeth

Comment 93 Hans de Goede 2010-06-07 13:44:37 UTC
Hi Supreeth,

Have you had a chance to look into the id_path issue yet? We really need some help in getting this fixed.

Thanks & Regards,

Hans

Comment 94 Supreeth 2010-06-07 14:53:12 UTC
Hi Hans,

I should have updated this BZ earlier. I took a look at the id_path and did not find any issues. We think the problem is being caused by an issue which is being tracked at 

https://bugzilla.redhat.com/show_bug.cgi?id=595522

Thanks,
Supreeth

Comment 95 Hans de Goede 2010-06-07 15:22:30 UTC
Hi Supreeth,

QA gave me access to a FCoE equipped machine, and id_path returns the following
there:
[root@storageqe-03 ~]# /lib/udev/path_id /devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-4/target3:0:1/3:0:1:0/block/sde
ID_PATH=fc-0x500a0982871965c8:0x0000000000000000

However anaconda expects (and that is what the older id_path gave us AFAIK):
pci-eth#-fc-${id}

Whereas this looks like it is just
fc-${id}

Which means:
1) We cannot distinguish regular fc and fcoe this way
2) We cannot determine which nic the disk is attached to this way

Regards,

Hans

Comment 96 Supreeth 2010-06-07 15:51:23 UTC
Hi Hans,

I see what you're saying. So I guess we want to change id_path so handle_scsi_fibre_channel prepends a pci-eth#- to the current string or write a new handle_scsi_fcoe function? I am not a udev expert by any means, however. Is there a udev function we can use to get the eth name associated with the fcoe disk so we can include the correct number in the eth# part?

Thanks,
Supreeth

Comment 97 Hans de Goede 2010-06-08 11:40:22 UTC
Hi Supreeth,

I've investigated this further and the problem is that with virtual devices like a vlan device, there is no link in sysfs between the vlan device and the nic over which it is running. So the id-path output is correctly only:
fc-0x500a0982871965c8:0x0000000000000000

Thus I've written an anaconda patch combining this id path output with some
direct sysfs path parsing to get the necessary information about the disk.
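
As a rough idea of what such a combination could look like, here is a Python sketch. It is not the actual anaconda patch: the function name and regular expression are hypothetical, and the sysfs layout it assumes is simply the one visible in the paths quoted in this bug.

```python
import re

# Assumed layout, based on the sysfs paths quoted in this bug:
# /devices/virtual/net/<nic>.<vlan>-fcoe/host<N>/...
FCOE_SYSFS = re.compile(r"/net/(?P<nic>[^/.]+)\.(?P<vlan>\d+)-fcoe/")

def describe_fcoe_disk(info):
    """Return a dict describing an FCoE disk, or None if it does not look like one."""
    id_path = info.get("ID_PATH", "")
    match = FCOE_SYSFS.search(info.get("sysfs_path", ""))
    # ID_PATH says "fibre channel"; the sysfs path says "over which NIC".
    if not id_path.startswith("fc-") or match is None:
        return None
    return {
        "nic": match.group("nic"),
        "vlan": int(match.group("vlan")),
        "fc_id": id_path[len("fc-"):],
    }
```

Fed the ID_PATH and sysfs path from comment 95, this reports NIC eth4 and VLAN 802; a disk whose sysfs path has no `-fcoe` interface directory is not treated as FCoE, which is how regular FC and FCoE would be told apart under these assumptions.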

Note that this patch also requires bug 595522 to be resolved.

Regards,

Hans

p.s.

I've noticed that the wwpn (and the scsi id) for all disks attached to our
FCoE test machine is the same, this does not seem right:

[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-3/target3:0:0/fc_transport/target3:0:0/port_name
0x500a0983971965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth4.802-fcoe/host3/rport-3:0-4/target3:0:1/fc_transport/target3:0:1/port_name
0x500a0982871965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth6.802-fcoe/host4/rport-4:0-4/target4:0:0/fc_transport/target4:0:0/port_name
0x500a0982871965c8
[root@storageqe-03 ~]# cat /sys/devices/virtual/net/eth6.802-fcoe/host4/rport-4:0-5/target4:0:1/fc_transport/target4:0:1/port_name
0x500a0983971965c8

Comment 98 Gene Heskett 2010-06-12 23:04:26 UTC
This may be a different bug, (I did search) but I just got bit when installing Mint9-x64 on /dev/sdb.  I have 4 drives in this box.

It did a system scan at the end of the install and found the other bootable mdv install on /dev/sdd, but while it got the (hd3,1) statement for the kernel line correct, it blew the line for the initrd, placing it on (hd3,0)

I was able to boot this mdv install by live editing that line to read (hd3,1) also, and then it booted.

So it would appear the system-scan utility gets it wrong.  I have now made /dev/sdb1/grub/grub.cfg editable long enough to fix it, then made it read-only and set the +i attribute too, so I'm fine till I need to do another kernel, but a newbie is gonna be flummoxed.

Comment 103 Supreeth 2010-07-01 21:55:46 UTC
Hi Hans,

Here's an update after testing Beta 2 for full DVD install: Verified that the udev fix is in place and that the kernel cmd line is generated correctly in grub.conf. I however ran into a failure when trying to find the root device as the FCoE interface went down for some reason. I think it was a link issue but will investigate further and report back. This looks good so far!

We'll have an update on the converged traffic BZ tomorrow.

Thanks!
Supreeth

Comment 104 Denise Dumas 2010-07-07 14:38:11 UTC
Supreeth, did this turn out to be a link issue, or is there a problem lurking here?
Thanks!
Denise

Comment 105 Supreeth 2010-07-07 14:42:11 UTC
I think there is more than a link issue here. I filed BZ 611976 for this issue. I'm going to try out the patch Harald proposed after our weekly call this morning.

Thanks!
Supreeth

Comment 106 Alexander Todorov 2010-07-13 16:36:43 UTC
Hi Supreeth,
comment #103 indicates a positive test result. Can we move this bug to the VERIFIED state, or would you like to do more testing before declaring this feature fixed?

Comment 107 Supreeth 2010-07-13 16:42:50 UTC
Hi Alex,

We definitely need to test this in Snapshot 7 before moving it to VERIFIED. There are a couple of fixes we are expecting in snapshot 7 that relate directly to boot (BZ 611796, BZ 602330).

Thanks,
Supreeth

Comment 108 Robert M Williams 2010-08-30 14:25:39 UTC
Moving to VERIFIED based on comment 103; the bugs referenced in comment 107 have also been addressed.

Comment 109 releng-rhel@redhat.com 2010-11-10 19:36:25 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.