Bug 840507

Summary: grub2 gets stuck in an infinite loop on qemu
Product: [Fedora] Fedora Reporter: IBM Bug Proxy <bugproxy>
Component: grub2    Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 17    CC: dennis, gustavold, jkachuck, mads, pjones, wgomerin
Target Milestone: ---   
Target Release: ---   
Hardware: ppc64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-01-20 16:42:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description IBM Bug Proxy 2012-07-16 14:01:06 UTC
Grub2 cannot boot a kernel in a qemu environment. Even running 'ls' at grub2's command line is enough to get it stuck in an infinite loop.
Below is the output of 'ls' at grub2's command line with debugging enabled. The last lines repeat indefinitely.

                             GNU GRUB  version 2.00

   Minimal BASH-like line editing is supported. For the first word, TAB      
   lists possible command completions. Anywhere else TAB lists possible      
   device or file completions. ESC at any time exits.      
   

grub> set pager=1
grub> set debug=all
script/script.c:65: free 0x7ffd4660
script/script.c:65: free 0x7ffd4710
script/script.c:65: free 0x7ffd4730
script/script.c:65: free 0x7ffd40d0
script/script.c:65: free 0x7ffd4100
script/script.c:65: free 0x7ffd4120
script/script.c:65: free 0x7ffd4150
script/script.c:65: free 0x7ffd4450
script/script.c:65: free 0x7ffd4570
script/script.c:65: free 0x7ffd4680
script/script.c:65: free 0x7ffd46b0
script/script.c:65: free 0x7ffd46d0
grub> ls
script/lexer.c:318: token 288 text [ls]
script/script.c:50: malloc 0x7ffd4260
script/script.c:50: malloc 0x7ffd4240
script/script.c:163: arglist
script/script.c:50: malloc 0x7ffd4150
script/lexer.c:318: token 259 text [
]
script/script.c:50: malloc 0x7ffd4120
script/script.c:50: malloc 0x7ffd4100
script/script.c:198: cmdline
script/script.c:50: malloc 0x7ffd7850
script/lexer.c:318: token 0 text []
script/script.c:50: malloc 0x7ffd40d0
script/script.c:50: malloc 0x7ffd7830
script/script.c:294: append command
script/script.c:50: malloc 0x7ffd68f0
kern/disk.c:230: Opening `ieee1275/disk,msdos2'...
disk/ieee1275/ofdisk.c:330: Opening `disk'.
partmap/msdos.c:181: partition 0: flag 0x80, type 0x41, start 0x800, len
0x2000
partmap/msdos.c:181: partition 1: flag 0x0, type 0x83, start 0x2800, len
0xfa000 
kern/fs.c:55: Detecting ext2...
kern/disk.c:326: Closing `ieee1275/disk'.
kern/dl.c:602: module at 0x7ffde7b0, size 0x1638
kern/dl.c:626: relocating to 0x7ffd46e0
kern/dl.c:590: flushing 0x1695 bytes at 0x7ffdd100
kern/dl.c:649: module name: ls
kern/dl.c:650: init function: 0x7ffdd750
kern/ieee1275/openfw.c:155: devalias name = scsi
kern/ieee1275/openfw.c:155: devalias name = cdrom
disk/ieee1275/ofdisk.c:126: disk name = cdrom, path =
/vdevice/v-scsi@1002/cdrom@2,0
disk/ieee1275/ofdisk.c:90: devpath = cdrom, canonical =
/vdevice/v-scsi@1002/cdrom@2,0
kern/ieee1275/openfw.c:155: devalias name = disk
disk/ieee1275/ofdisk.c:126: disk name = disk, path =
/vdevice/v-scsi@1002/disk@0,0
disk/ieee1275/ofdisk.c:90: devpath = disk, canonical =
/vdevice/v-scsi@1002/disk@0,0
kern/ieee1275/openfw.c:155: devalias name = net
kern/ieee1275/openfw.c:155: devalias name = hvterm
kern/ieee1275/openfw.c:155: devalias name = name
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@d8, path =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:90: devpath = /vdevice/v-scsi@1002/disk@d8, canonical =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@88, path =
/vdevice/v-scsi@1002/disk@88
disk/ieee1275/ofdisk.c:90: devpath = /vdevice/v-scsi@1002/disk@88, canonical =
/vdevice/v-scsi@1002/disk@88
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@d8, path =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@88, path =
/vdevice/v-scsi@1002/disk@88
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@d8, path =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@88, path =
/vdevice/v-scsi@1002/disk@88
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@d8, path =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@88, path =
/vdevice/v-scsi@1002/disk@88
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@d8, path =
/vdevice/v-scsi@1002/disk@d8
disk/ieee1275/ofdisk.c:126: disk name = /vdevice/v-scsi@1002/disk@88, path =
/vdevice/v-scsi@1002/disk@88
(continues indefinitely)


Steps to reproduce:
1) Get the latest upstream qemu code: git clone git://git.qemu.org/qemu.git
2) Configure it: ./configure --target-list=ppc64-softmmu
3) Build it: make
4) Create an image: ./qemu-img create -f raw fedora.img 10G
5) Run qemu: ./ppc64-softmmu/qemu-system-ppc64 -M pseries -m 1024  -nographic -cdrom Fedora-17-ppc64-DVD.iso fedora.img
6) Follow the installation instructions
7) Reboot the VM and try to boot the installed kernel

Expected Results:
Grub2 should boot the installed kernel

Actual Results:
Grub2 gets stuck trying to boot the installed kernel

Additional info:
The same result occurs with both grub2-2.0-0.36.beta6.fc17 and grub2-2.00-1.fc18.

I am not sure about the root cause of the problem, but from a quick look the unit address (the @xxxx part of the device path) seems to be wrong.

My understanding is that grub2 manipulates unit addresses and tries to "know" how to build them from scratch. In reality, unit addresses are highly device specific, so such an algorithm can only be very fragile. In this case, it tries to build a unit address that might work with the IBM Open Firmware vscsi driver under pHyp but doesn't work with SLOF (the former uses a custom format; the latter uses the old OFW standard @id,lun format for SCSI addresses).
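To make the contrast concrete, here is a minimal C sketch of the old OFW SCSI "@id,lun" form that SLOF uses (hex fields, as in disk@0,0 and cdrom@2,0). The helper names format_id_lun and parse_id_lun are hypothetical, not grub2 functions. The parser deliberately rejects any other encoding, such as the pHyp-style 8000000000000000 address, to illustrate why code that assumes one firmware's encoding breaks on the other.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a SCSI unit address in the old OFW "id,lun" style (hex),
 * e.g. "0,0" for disk@0,0 and "2,0" for cdrom@2,0. */
static int format_id_lun(char *buf, size_t len, unsigned long id, unsigned long lun)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%lx,%lx", id, lun);
}

/* Parse an "id,lun" unit address. Returns 0 on success, or -1 when the
 * string uses some other encoding (for example pHyp's 64-bit SRP LUN). */
static int parse_id_lun(const char *unit, unsigned long *id, unsigned long *lun)
{
    char *end;
    const char *comma;

    assert(unit != NULL && id != NULL && lun != NULL);
    comma = strchr(unit, ',');
    if (comma == NULL || comma == unit)
        return -1;                        /* no "id," prefix at all */
    errno = 0;
    *id = strtoul(unit, &end, 16);
    if (end != comma)
        return -1;                        /* junk before the comma */
    *lun = strtoul(comma + 1, &end, 16);
    if (end == comma + 1 || *end != '\0' || errno != 0)
        return -1;                        /* missing, trailing junk or overflow */
    return 0;
}
```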

I think grub2 should be less proactive about manipulating these, i.e., it should only do so at install time if it knows for sure that it will have to access files outside of the device it was loaded from. In 99% of cases it will not need to, and it can simply use the path it was loaded from as a device path with a working unit address. This is how yaboot does it as well, and it works reliably.
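A minimal C sketch of that approach, assuming the firmware reports where grub2 was loaded from through the standard /chosen "bootpath" property and that only the last path component carries ":" device arguments (typical for boot paths, but not guaranteed in general). The helper name of_device_from_bootpath is hypothetical. Because the device part is reused verbatim, its unit address is whatever the firmware itself produced, so grub2 never has to construct one.

```c
#include <assert.h>
#include <string.h>

/* Strip the device arguments (":partition,file") from an OF boot path,
 * keeping the device path and the unit address the firmware used.
 * Assumes only the final component has ":" arguments.
 * Returns 0 on success, -1 if buf is too small. */
static int of_device_from_bootpath(const char *bootpath, char *buf, size_t len)
{
    size_t n;

    assert(bootpath != NULL && buf != NULL);
    n = strcspn(bootpath, ":");           /* length of the device part */
    if (n + 1 > len)
        return -1;
    memcpy(buf, bootpath, n);
    buf[n] = '\0';
    return 0;
}
```

For example, a boot path of /vdevice/v-scsi@1002/disk@0,0:2 yields /vdevice/v-scsi@1002/disk@0,0.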

Comment 1 IBM Bug Proxy 2012-07-18 23:00:21 UTC
------- Comment From bherren.com 2012-07-18 22:51 EDT-------
So I'm going to use this bug as a place for a more complete
discussion of the problem of getting the appropriate unit
addresses (the last @xxx part) into OFW paths. Can somebody make
sure we have the right grub2 people CCed on the Red Hat side?

So the unit address is more or less adapter specific. Unfortunately,
each adapter's firmware has its own way of encoding it. We can start
building specific knowledge about each adapter type into our grub2
configuration script (fortunately we have a limited number of
supported adapters with OF firmware on them), but it might be better
to seek a solution in which the kernel drivers know about the
methods the firmware uses for the specific adapters they drive.

I'm adding Brian to CC; he might help discuss this from a SCSI
driver perspective. Ideally we'd want to add a "devspec" attribute
to the sysfs nodes of the disks.
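If such an attribute existed, tools could read it like any other one-line sysfs file. A minimal C sketch: the per-disk devspec file is only a proposal here, so any path such as /sys/block/sda/device/devspec is hypothetical, and read_sysfs_attr is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a one-line sysfs attribute (such as the proposed "devspec").
 * On success stores the value without its trailing newline and
 * returns 0; returns -1 if the file cannot be opened or is empty. */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';       /* drop the newline, if any */
    return 0;
}
```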

So I've collected some info about a couple of common adapters.

First VSCSI:

The unit address for vscsi is the SRP "LUN" value (which is not
the same as the SCSI LUN). The formula to calculate it can be found in
the linux driver:

static inline u16 lun_from_dev(struct scsi_device *dev)
{
    return (0x2 << 14) | (dev->id << 8) | (dev->channel << 5) | dev->lun;
}
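For experimenting outside the kernel, the formula above can be compiled with a small stand-in for struct scsi_device. The stand-in struct and vscsi_unit_address are assumptions for illustration. In particular, placing the 16-bit SRP LUN in the top bits of a 64-bit value is inferred from the ofpathname output /vdevice/v-scsi@1002/disk@8000000000000000 shown later in this bug, not from firmware documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's struct scsi_device with only the fields
 * lun_from_dev() reads (the real structure has many more). */
struct scsi_device {
    unsigned int channel;
    unsigned int id;
    uint64_t lun;
};

/* Same bit layout as the driver: 0b10 in the top two bits, then the
 * target id, channel and LUN. */
static inline uint16_t lun_from_dev(const struct scsi_device *dev)
{
    return (uint16_t)((0x2 << 14) | (dev->id << 8) |
                      (dev->channel << 5) | dev->lun);
}

/* Assumed pHyp unit address: the SRP LUN in the top 16 bits of a
 * 64-bit value, printed as 16 hex digits. */
static void vscsi_unit_address(char *buf, size_t len, const struct scsi_device *dev)
{
    uint64_t unit = (uint64_t)lun_from_dev(dev) << 48;

    assert(len >= 17);                    /* 16 hex digits + NUL */
    snprintf(buf, len, "%016llx", (unsigned long long)unit);
}
```

For id 0, channel 0, LUN 0 this gives 0x8000 and the address 8000000000000000, matching the ofpathname result.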

Currently, however, SLOF in qemu doesn't use the above formula, but I
will change it ASAP so that grub2 doesn't have to deal with two different
methods for vscsi.

Then we have IPR/Obsidian. This itself falls into two categories: the
newer "SIS64" variants, which you can recognize by the presence of an
"ibm,sis64" property in the adapter device node, and the older "SIS32"
variants, which don't have this property.

For SIS32, the unit address is a "resource address" of the form

(bus << 16) | (id << 8) | lun

However, bus can be 0xff when using hardware RAID.
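The SIS32 encoding is simple enough to state directly in C. This sketch assumes bus, id and lun each fit in 8 bits, which the shifts in the formula suggest but the comment above does not state; sis32_resource_address is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* SIS32 IPR unit address ("resource address"):
 *   (bus << 16) | (id << 8) | lun
 * bus is 0xff for disks behind hardware RAID. */
static uint32_t sis32_resource_address(unsigned int bus, unsigned int id,
                                       unsigned int lun)
{
    assert(bus <= 0xff && id <= 0xff && lun <= 0xff);  /* assumed 8-bit fields */
    return ((uint32_t)bus << 16) | ((uint32_t)id << 8) | (uint32_t)lun;
}
```

For example, bus 0, id 2, lun 3 gives 0x203, and a RAID disk on bus 0xff, id 0, lun 0 gives 0xff0000.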

For SIS64, the unit address is the SAS WWN of the disk (though a
",lun" suffix might be appended when applicable; we need to double-check
that).
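Detecting the variant and formatting a SIS64 address might look like the following C sketch. It assumes the adapter node is visible under Linux's /proc/device-tree, where each OF property appears as a file, and it prints the WWN as 16 lowercase hex digits. That format and the optional ",lun" suffix (which, as noted above, still needs checking) are assumptions, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the adapter node directory (e.g. under /proc/device-tree)
 * contains an "ibm,sis64" property file, 0 otherwise. */
static int ipr_is_sis64(const char *node_dir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/ibm,sis64", node_dir);

    if (n < 0 || (size_t)n >= sizeof path)
        return 0;                          /* path too long: treat as SIS32 */
    return access(path, F_OK) == 0;
}

/* Hypothetical SIS64 formatter: WWN as 16 hex digits, plus ",lun"
 * only when the caller says one applies. */
static int sis64_unit_address(char *buf, size_t len, uint64_t wwn,
                              int has_lun, unsigned int lun)
{
    assert(buf != NULL && len > 0);
    if (has_lun)
        return snprintf(buf, len, "%016llx,%x", (unsigned long long)wwn, lun);
    return snprintf(buf, len, "%016llx", (unsigned long long)wwn);
}
```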

So here we'll have to detect the adapter type. We'll also need to be
careful, because below the adapter PCI device in the device-tree there can
be a "functional" sub-node that differentiates SAS from SATA, which some of
those adapters support (for the optical drive). By the way, I don't know
the address encoding scheme for SATA; I'll try to find it out later.

Due to the above complexity, this is clearly a piece of logic that is
best located in the IPR driver itself, which could either expose an
ioctl to retrieve a disk's unit address or, better, create devspec
attributes in the disks' sysfs directories. That won't happen
immediately, though.

I'm still trying to get more info about other supported adapters, such
as our Fibre Channel ones.

Additionally, our adapter firmwares provide some methods that can be
called within the OFW environment to retrieve lists of attached devices.
Those are used by the SMS menu system and would be handy for grub2 to
use as well in some cases, for example to display fallback menus of
devices.

I'm in the process of obtaining the documentation for these and will
update this BZ when I have it.

Comment 2 IBM Bug Proxy 2012-07-18 23:10:21 UTC
------- Comment From bjking1.com 2012-07-18 23:04 EDT-------
Is there any reason that grub2 can't use the ofpathname script included in powerpc-utils to translate a logical device name (e.g. /dev/sda) to an OF path name? There are far too many different pieces of code trying to do this translation, which makes it nearly impossible to keep them all up to date when the OF binding changes.
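For reference, calling ofpathname from another program can be as simple as the C sketch below, which runs the script through popen() and keeps the first output line. The helper names first_line_of and ofpath_for_device are hypothetical; ofpathname itself is only assumed to be on PATH and to print the OF path for a device argument, as in the /dev/sda1 example later in this bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run a shell command and copy the first line of its output into buf,
 * without the trailing newline. Returns 0 on success, -1 if the command
 * could not be started, printed nothing, or exited with a failure. */
static int first_line_of(const char *cmd, char *buf, size_t len)
{
    FILE *p;
    char *line;
    int status;

    assert(cmd != NULL && buf != NULL && len > 0);
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    line = fgets(buf, (int)len, p);
    status = pclose(p);
    if (line == NULL || status != 0)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Ask ofpathname for the OF path of a Linux device node such as
 * /dev/sda1. The name goes through the shell, so it must be trusted. */
static int ofpath_for_device(const char *dev, char *buf, size_t len)
{
    char cmd[512];
    int n = snprintf(cmd, sizeof cmd, "ofpathname %s", dev);

    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    return first_line_of(cmd, buf, len);
}
```

A fork()/execvp() variant would avoid passing the device name through the shell at all.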

As for doing something in the ipr driver itself, the ipr driver already exposes some sysfs attributes that the ofpathname script uses to build the OF path, but using them correctly still requires some intelligence.

However, I'm not convinced that adding a devspec to the ipr driver for each device is the right answer, since plenty of other I/O adapters need special treatment as well: LSI SAS, QLogic FC, Emulex FC, VSCSI, VFC, and more.

Comment 3 IBM Bug Proxy 2012-07-19 00:40:20 UTC
------- Comment From pfsmorigo.com 2012-07-19 00:31 EDT-------
I tested ofpathname here, and it shows the ID in pHyp's OFW format:

[root@localhost target0:0:0]# ofpathname /dev/sda1
/vdevice/v-scsi@1002/disk@8000000000000000

I discovered that boot-device was blank after the installation:

0 > printenv
---environment variable--------current value-------------default value------
use-axon-ddr?               true                      true
real-mode?                  true                      true
direct-serial?              false                     false
use-nvramrc?                false                     false
selftest-#megs              0                         0
security-password
security-mode               0                         0
security-#badlogins         0                         0
screen-#rows                200                       200
screen-#columns             200                       200
output-device
oem-logo?                   false                     false
oem-logo
oem-banner?                 false                     false
oem-banner
nvramrc
input-device
fcode-debug?                true                      true
diag-switch?                false                     false
diag-file
diag-device
boot-command                boot                      boot
boot-file
boot-device
auto-boot?                  true                      true

The device tree and the aliases:

0 > ls
3e597a18 :  /vdevice
3e597c98 :  |-- vty@1000
3e597e70 :  |-- l-lan@1001
3e5981f8 :  +-- v-scsi@1002
3e5b7368 :      |-- disk@0,0
3e5b7a58 :      +-- cdrom@2,0 ok

0 > devalias
scsi : /vdevice/v-scsi@1002
cdrom : /vdevice/v-scsi@1002/cdrom@2,0
disk : /vdevice/v-scsi@1002/disk@0,0
net : /vdevice/l-lan@1001
hvterm : /vdevice/vty@1000 ok

I set boot-device as follows, and the system booted:
setenv boot-device disk
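This works because the firmware expands the alias "disk" into the full path listed by devalias. A simplified C sketch of that expansion, with the alias table copied from the SLOF output above: a specifier that does not start with '/' has its first name (up to ':' or '/') looked up as an alias and the remainder appended. expand_alias is a hypothetical name, and real firmware alias handling covers more cases than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Alias table copied from the SLOF "devalias" output above. */
struct of_alias { const char *name; const char *path; };

static const struct of_alias aliases[] = {
    { "scsi",   "/vdevice/v-scsi@1002" },
    { "cdrom",  "/vdevice/v-scsi@1002/cdrom@2,0" },
    { "disk",   "/vdevice/v-scsi@1002/disk@0,0" },
    { "net",    "/vdevice/l-lan@1001" },
    { "hvterm", "/vdevice/vty@1000" },
};

/* Expand "disk" or "disk:2" into a full device path. Absolute paths are
 * copied unchanged. Returns 0 on success, -1 if the alias is unknown or
 * buf is too small. */
static int expand_alias(const char *spec, char *buf, size_t len)
{
    size_t name_len, i;
    int n;

    assert(spec != NULL && buf != NULL && len > 0);
    if (spec[0] == '/') {
        n = snprintf(buf, len, "%s", spec);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }
    name_len = strcspn(spec, ":/");           /* alias name ends at ':' or '/' */
    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strlen(aliases[i].name) == name_len &&
            strncmp(aliases[i].name, spec, name_len) == 0) {
            n = snprintf(buf, len, "%s%s", aliases[i].path, spec + name_len);
            return (n < 0 || (size_t)n >= len) ? -1 : 0;
        }
    }
    return -1;
}
```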

Comment 4 IBM Bug Proxy 2012-07-19 01:20:24 UTC
------- Comment From bherren.com 2012-07-19 01:10 EDT-------
So to begin with, I wasn't even aware we had an ofpathname script in powerpc-utils :-)

As for the problem of qemu and OFW using different formats for vscsi: as I said in my previous post, I will fix that in qemu/SLOF, hopefully later today.

------- Comment From bherren.com 2012-07-19 01:13 EDT-------
BTW, we should also make yaboot use ofpathname instead of its own built-in ofpath. That might cause some "interesting" dependencies, but it's probably the way to go: a single script to rule them all, and in the darkness of Forth bind them!

Comment 5 IBM Bug Proxy 2012-07-19 07:00:34 UTC
------- Comment From bherren.com 2012-07-19 06:58 EDT-------
I've now fixed qemu to behave like the pHyp firmware's vscsi. I've also added a vscsi-report-luns method (which, unlike the OFW one, supports multiple SCSI IDs, and which from what I can tell grub will parse properly).

This makes grub2 work in qemu for me. I've pushed the fixes to GitHub and will submit a qemu patch to get a new build of SLOF upstream.

Comment 6 Peter Jones 2012-08-08 18:58:00 UTC
So can we close this out, then?