Bug 729340 - kernel 2.6.40-4 running on EC2 makes devices ordering wrong
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-09 15:26 UTC by Marek Goldmann
Modified: 2011-11-26 21:26 UTC

Fixed In Version: kernel-2.6.40.3-0.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-17 21:05:40 UTC
Type: ---
Embargoed:


Attachments
EC2 boot log (19.67 KB, text/plain), 2011-08-09 15:26 UTC, Marek Goldmann
Successful boot log (30.06 KB, text/plain), 2011-08-10 06:44 UTC, Marek Goldmann


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 733313 0 unspecified CLOSED Fedora 16 kernel-3.0.1-3 running on EC2 makes devices ordering wrong 2021-02-22 00:41:40 UTC

Internal Links: 733313

Description Marek Goldmann 2011-08-09 15:26:05 UTC
Created attachment 517429
EC2 boot log

Description of problem:

With kernel-2.6.40-4 on a Fedora 15 AMI, the available block devices are detected with the wrong names and in the wrong order:

[7913500.473912] blkfront: xvde1: barrier or flush: disabled
[7913500.474696] Setting capacity to 20971520
[7913500.474708] xvde1: detected capacity change from 0 to 10737418240
[7913500.492794] blkfront: xvdf: barrier or flush: disabled
[7913500.493882] xvdf: unknown partition table
[7913500.494695] Setting capacity to 880732160
[7913500.494708] xvdf: detected capacity change from 0 to 450934865920
[7913500.501176] blkfront: xvdg: barrier or flush: disabled
[7913500.505442] xvdg: unknown partition table
[7913500.505676] Setting capacity to 880732160

Devices should be detected starting with letter "a": /dev/xvda, /dev/xvdb and so on.

With kernel-2.6.38.8-35.fc15, everything works as expected.

Examples with new kernel (wrong):

/dev/xvde1
/dev/xvdf
/dev/xvdg

And with older kernel (correct):

/dev/xvda1
/dev/xvdb
/dev/xvdc

Attached a boot log from AWS.

This causes boot failures on *every AMI that updates to the new 2.6.40 kernel*.

Version-Release number of selected component (if applicable):

kernel-2.6.40-4.fc15

How reproducible:

Always

Steps to Reproduce:
1. Upgrade to 2.6.40-4 on EC2
2. Reboot.
  
Actual results:

Device names incorrect, boot fails.

Expected results:

Correct device names, successful boot.

Comment 1 Konrad Rzeszutek Wilk 2011-08-09 17:56:39 UTC
Can you attach the bootup log of a successful boot please (I am curious if it worked with 2.6.38 or 2.6.39)? How does one go about reproducing this? Can you dump the xenstore keys using this little script inside the guest (the working and the non-working one please):

You might have to install the xenstore-ls tool - not exactly sure which package that is in FC15 or FC14. 

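sh-4.1# more /a
# Candidate domain ID of this guest, starting at 1
A=1
# Device suffixes as logged by the guest, e.g. "/51760"
DEV=`dmesg | grep vbd | sed 's/.*vbd//'`
RC=1
# Try each domain ID in turn until xenstore-ls finds the backend entries
while true
do
 # e.g. /local/domain/0/backend/vbd/12/51760
 for a in $DEV
 do
        xenstore-ls /local/domain/0/backend/vbd/$A$a
        RC=$?
 done
 # Stop once the last lookup succeeded
 if [ $RC -eq 0 ]; then
        break
 fi
 A=$(($A+1))
 # Note: A only counts up from 1, so this check never fires in practice;
 # the loop ends only when a lookup succeeds
 if [ $A -eq 0 ]; then
        break
 fi
done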

You should get something like this:

frontend = "/local/domain/12/device/vbd/51712"
physical-device = "fc:a"
params = "/dev/vg_guest_1/data-ext4"
frontend-id = "12"
online = "1"
removable = "0"
bootable = "1"
state = "4"
dev = "xvda"
type = "phy"
mode = "w"
feature-flush-cache = "1"
sectors = "20971520"
info = "0"
sector-size = "512"
frontend = "/local/domain/12/device/vbd/51744"
physical-device = "fc:9"
params = "/dev/vg_guest_1/data-ext3"
frontend-id = "12"
online = "1"
removable = "0"
bootable = "1"
state = "4"
... and so on.

Comment 2 Konrad Rzeszutek Wilk 2011-08-09 18:07:13 UTC
Ah, hadn't noticed that the previous working instance was 2.6.38. You guys skipped 2.6.39 in this test.

In which case I think you are hitting:
c80a420995e721099906607b07c09a24543b31d9:
Author: Stefano Stabellini <stefano.stabellini.com>
Date:   Thu Dec 2 17:55:00 2010 +0000

    xen-blkfront: handle Xen major numbers other than XENVBD

    This patch makes sure blkfront handles correctly virtual device numbers
    corresponding to Xen emulated IDE and SCSI disks: in those cases
    blkfront translates the major number to XENVBD and the minor number to a
    low xvd minor.

    Note: this behaviour is different from what old xenlinux PV guests used
    to do: they used to steal an IDE or SCSI major number and use it
    instead.

And for 3.1 we have this patch:
commit 196cfe2ae8fcdc03b3c7d627e7dfe8c0ce7229f9
Author: Stefan Bader <stefan.bader>
Date:   Thu Jul 14 15:30:22 2011 +0200

    xen-blkfront: Drop name and minor adjustments for emulated scsi devices
    
    These were intended to avoid the namespace clash when representing
    emulated IDE and SCSI devices. However that seems to confuse users
    more than expected (a disk defined as sda becomes xvde).
    So for now go back to the scheme which does no adjustments. This
    will break when mixing IDE and SCSI names in the configuration of
    guests but should be by now expected.
    
    Acked-by: Stefano Stabellini <stefano.stabellini.com>
    Signed-off-by: Stefan Bader <stefan.bader>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>

which fixes that. I think that is what you are hitting.

Comment 4 Marek Goldmann 2011-08-10 06:44:52 UTC
Created attachment 517524
Successful boot log

(In reply to comment #1)
> Can you attach the bootup log of a successful boot please (I am curious if it
> worked with 2.6.38 or 2.6.39)? How does one go about reproducing this? Can you
> dump the xenstore keys using this little script inside the guest (the working
> and the non-working one please):
> 
> You might have to install the xenstore-ls tool - not exactly sure which package
> that is in FC15 or FC14. 

This script only returns this output in an unending loop:

    xenstore-ls: xs_open: No such file or directory

$ dmesg | grep vbd | sed s/.*vbd//
/2049
/2064
/2080

Successful boot log attached, please ignore running yum afterwards.
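
For reference, the three numbers in the dmesg output above look like Xen virtual device numbers in the classic (major << 8) | minor encoding. A minimal Python sketch, assuming that encoding and 16 minors per disk, decodes them. The function names are made up for illustration and only the SCSI major 8 is handled:

```python
# Sketch: split a Xen virtual device number into Linux major/minor.
# Assumption: classic encoding (major << 8) | minor, 16 minors per disk.

SCSI_DISK0_MAJOR = 8   # standard Linux major for sd* disks
PARTS_PER_DISK = 16    # minors reserved per sd* disk (assumed)


def decode_vdev(vdev):
    """Return (major, minor) for a virtual device number."""
    return vdev >> 8, vdev & 0xFF


def legacy_name(vdev):
    """Return the sd* name the number encodes (SCSI major 8 only)."""
    major, minor = decode_vdev(vdev)
    if major != SCSI_DISK0_MAJOR:
        raise ValueError("only SCSI major 8 is modelled in this sketch")
    disk, part = divmod(minor, PARTS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name + str(part) if part else name


# The three values reported by dmesg in this comment
for vdev in (2049, 2064, 2080):
    print(vdev, decode_vdev(vdev), legacy_name(vdev))
```

Under these assumptions all three values decode to major 8, so the instance's disks would be described with SCSI-style numbers (sda1, sdb, sdc) rather than native xvd ones.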

Comment 5 Marek Goldmann 2011-08-10 06:48:12 UTC
(In reply to comment #2)
> Ah, hadn't noticed that the previous working instance was 2.6.38. You guys
> skipped 2.6.39 in this test.

There was no 2.6.39 kernel submitted for Fedora 15, or at least I cannot find one:

    https://admin.fedoraproject.org/updates/search/kernel

Comment 7 Konrad Rzeszutek Wilk 2011-08-10 13:25:14 UTC
Ugh, well then I think we are back to trying the patch:

commit 196cfe2ae8fcdc03b3c7d627e7dfe8c0ce7229f9
Author: Stefan Bader <stefan.bader>
Date:   Thu Jul 14 15:30:22 2011 +0200

    xen-blkfront: Drop name and minor adjustments for emulated scsi devices
    
    These were intended to avoid the namespace clash when representing
    emulated IDE and SCSI devices. However that seems to confuse users
    more than expected (a disk defined as sda becomes xvde).
    So for now go back to the scheme which does no adjustments. This
    will break when mixing IDE and SCSI names in the configuration of
    guests but should be by now expected.
    
    Acked-by: Stefano Stabellini <stefano.stabellini.com>
    Signed-off-by: Stefan Bader <stefan.bader>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b536a9c..238b941 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -123,8 +123,8 @@ static DEFINE_SPINLOCK(minor_lock);
 #define BLKIF_MINOR_EXT(dev) ((dev)&(~EXTENDED))
 #define EMULATED_HD_DISK_MINOR_OFFSET (0)
 #define EMULATED_HD_DISK_NAME_OFFSET (EMULATED_HD_DISK_MINOR_OFFSET / 256)
-#define EMULATED_SD_DISK_MINOR_OFFSET (EMULATED_HD_DISK_MINOR_OFFSET + (4 * 16))
-#define EMULATED_SD_DISK_NAME_OFFSET (EMULATED_HD_DISK_NAME_OFFSET + 4)
+#define EMULATED_SD_DISK_MINOR_OFFSET (0)
+#define EMULATED_SD_DISK_NAME_OFFSET (EMULATED_SD_DISK_MINOR_OFFSET / 256)
 
 #define DEV_NAME	"xvd"	/* name in /dev */
 

Can you apply it to the kernel and try it out?
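
To illustrate what the two changed defines do, here is a rough Python model of the naming, not the driver code itself. It assumes the disk letter is (minor / 16) plus the name offset for the SCSI major 8, with no offset for the native XENVBD major 202. `xvd_name` and its parameters are hypothetical; only the offset values 4 (before the patch) and 0 (after) come from the diff:

```python
# Simplified model of xvd* naming for emulated SCSI device numbers.
# Assumptions: (major << 8) | minor encoding, 16 minors per disk,
# SCSI major 8 and XENVBD major 202. Not taken from the kernel source.

PARTS_PER_DISK = 16
SCSI_DISK0_MAJOR = 8
XENVBD_MAJOR = 202

OLD_SD_NAME_OFFSET = 4   # EMULATED_HD_DISK_NAME_OFFSET + 4, before the patch
NEW_SD_NAME_OFFSET = 0   # EMULATED_SD_DISK_MINOR_OFFSET / 256, after the patch


def xvd_name(vdev, sd_name_offset):
    """Return the xvd* name this model assigns to a virtual device number."""
    major, minor = vdev >> 8, vdev & 0xFF
    if major == SCSI_DISK0_MAJOR:
        index = minor // PARTS_PER_DISK + sd_name_offset
    elif major == XENVBD_MAJOR:
        index = minor // PARTS_PER_DISK   # native xvd numbers get no offset
    else:
        raise ValueError("major %d is not modelled in this sketch" % major)
    partition = minor % PARTS_PER_DISK
    name = "xvd" + chr(ord("a") + index)
    return name + str(partition) if partition else name


for vdev in (2049, 2064, 2080):
    print(vdev,
          xvd_name(vdev, OLD_SD_NAME_OFFSET),
          xvd_name(vdev, NEW_SD_NAME_OFFSET))
```

With the old offset the model reproduces the xvde1, xvdf and xvdg names from the failing boot log, and with the new offset it gives xvda1, xvdb and xvdc, matching the names seen under 2.6.38.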

Comment 8 Marek Goldmann 2011-08-10 16:12:16 UTC
(In reply to comment #7)
> 
> Can you apply it the kernel and try it out?

I'll do this shortly!

Comment 9 Marek Goldmann 2011-08-11 06:39:38 UTC
Konrad,

I can confirm that the patch fixes the issue! I created a fresh AMI and booted it successfully.

Comment 10 Dave Jones 2011-08-11 16:02:25 UTC
thanks for testing. I'll add this for the next build.
I'll push out a 2.6.40.2 update as soon as GregKH drops a 3.0.2 release upstream.

thanks for your help chasing this down Konrad.

Comment 11 Fedora Update System 2011-08-16 12:46:16 UTC
kernel-2.6.40.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15

Comment 12 Fedora Update System 2011-08-17 01:17:00 UTC
Package kernel-2.6.40.3-0.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.40.3-0.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15
then log in and leave karma (feedback).

Comment 13 Fedora Update System 2011-08-18 02:29:11 UTC
kernel-2.6.40.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

