Bug 910359 - Incorrect lun numbers of virtio_scsi disks
Summary: Incorrect lun numbers of virtio_scsi disks
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-12 13:20 UTC by Lukáš Doktor
Modified: 2015-01-22 14:24 UTC (History)
10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-01-22 14:24:14 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
seabios output (23.43 KB, text/x-log)
2013-02-12 13:20 UTC, Lukáš Doktor
serial console output (38.14 KB, text/x-log)
2013-02-12 13:21 UTC, Lukáš Doktor

Description Lukáš Doktor 2013-02-12 13:20:08 UTC
Description of problem:
The LUN number specified on the qemu command line does not match the LUN number shown in /proc/scsi/scsi.


Version-Release number of selected component (if applicable):
host: Fedora 17
      qemu-kvm-1.0.1-3.fc17.x86_64
      3.6.11-5.fc17.x86_64
guest: Fedora 17
       kernel-3.7.3-101.fc17.x86_64


How reproducible:
Always


Steps to Reproduce:
1. /bin/qemu-kvm -S -name 'vm1' -nodefaults -monitor stdio -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130212-131229-D51juGBx,server,nowait -device isa-serial,chardev=serial_id_serial1 -chardev socket,id=seabioslog_id_20130212-131229-D51juGBx,path=/tmp/seabios-20130212-131229-D51juGBx,server,nowait -device isa-debugcon,chardev=seabioslog_id_20130212-131229-D51juGBx,iobase=0x402 -device ich9-usb-uhci1,id=usb1 -drive file='/home/medic/Work/Projekty/autotest/autotest-ldoktor/client/tests/virt/shared/data/images/jeos-17-64.qcow2',if=none,id=virtio0 -device virtio-blk-pci,drive=virtio0,bootindex=1 -device virtio-scsi-pci,id=virtio_scsi_pci0 -drive file='/tmp/stg0.qcow2',if=none,id=virtio-scsi2-i0-l0 -device scsi-disk,scsi-id=0,lun=0,drive=virtio-scsi2-i0-l0 -drive file='/tmp/stg1.qcow2',if=none,id=virtio-scsi3-i0-l8191 -device scsi-disk,scsi-id=0,lun=8191,drive=virtio-scsi3-i0-l8191 -drive file='/tmp/stg2.qcow2',if=none,id=virtio-scsi4-i0-l16382 -device scsi-disk,scsi-id=0,lun=16382,drive=virtio-scsi4-i0-l16382 -drive file='/tmp/stg3.qcow2',if=none,id=virtio-scsi5-i127-l0 -device scsi-disk,scsi-id=127,lun=0,drive=virtio-scsi5-i127-l0 -drive file='/tmp/stg4.qcow2',if=none,id=virtio-scsi6-i127-l8191 -device scsi-disk,scsi-id=127,lun=8191,drive=virtio-scsi6-i127-l8191 -drive file='/tmp/stg5.qcow2',if=none,id=virtio-scsi7-i127-l16382 -device scsi-disk,scsi-id=127,lun=16382,drive=virtio-scsi7-i127-l16382 -drive file='/tmp/stg6.qcow2',if=none,id=virtio-scsi8-i254-l0 -device scsi-disk,scsi-id=254,lun=0,drive=virtio-scsi8-i254-l0 -drive file='/tmp/stg7.qcow2',if=none,id=virtio-scsi9-i254-l8191 -device scsi-disk,scsi-id=254,lun=8191,drive=virtio-scsi9-i254-l8191 -drive file='/tmp/stg8.qcow2',if=none,id=virtio-scsi10-i254-l16382 -device scsi-disk,scsi-id=254,lun=16382,drive=virtio-scsi10-i254-l16382 -device virtio-net-pci,netdev=idiM6M1W,mac='9a:95:96:97:98:99',id='idc9mORo' -netdev user,id=idiM6M1W,hostfwd=tcp::5000-:22 -m 512 -smp 
2,maxcpus=2,cores=1,threads=1,sockets=2 -cpu 'Nehalem' -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -vga std -rtc base=utc,clock=host,driftfix=none  -boot order=cdn,once=c,menu=off  -enable-kvm

2. info qtree - all disks are present with the correct lun/scsi-id

3. cat /proc/scsi/scsi in the guest - disks with lun=0 are fine, the other disks show LUN numbers 2-3x higher.


Actual results:
Some disks show different LUN numbers in qtree and in the guest:
13:12:38 ERROR| Disk 0-127-8191 is in qtree but not in /proc/scsi/scsi.
13:12:38 ERROR| Disk 0-254-8191 is in qtree but not in /proc/scsi/scsi.
13:12:38 ERROR| Disk 0-0-16382 is in qtree but not in /proc/scsi/scsi.
13:12:38 ERROR| Disk 0-254-16382 is in qtree but not in /proc/scsi/scsi.
13:12:38 ERROR| Disk 0-127-16382 is in qtree but not in /proc/scsi/scsi.
13:12:38 ERROR| Disk 0-0-8191 is in qtree but not in /proc/scsi/scsi.

13:12:38 ERROR| Disk 0-254-32766 is in /proc/scsi/scsi but not in qtree.
13:12:38 ERROR| Disk 0-127-32766 is in /proc/scsi/scsi but not in qtree.
13:12:38 ERROR| Disk 0-0-24575 is in /proc/scsi/scsi but not in qtree.
13:12:38 ERROR| Disk 0-254-24575 is in /proc/scsi/scsi but not in qtree.
13:12:38 ERROR| Disk 0-0-32766 is in /proc/scsi/scsi but not in qtree.
13:12:38 ERROR| Disk 0-127-24575 is in /proc/scsi/scsi but not in qtree.


Expected results:
The LUN numbers specified on the command line should appear unchanged in the guest.


Additional info:
With guest kernel-3.6.10-2.fc17, only the disks with lun=0 are added. dmesg contains a message saying:

scsi: host 2 channel 0 id 254 lun32766 has a LUN larger than allowed by the host adapter

(or lun24575) and the corresponding disks are not available in guest.
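The dmesg line above comes from the kernel's SCSI scan path, which refuses to probe any LUN at or above the limit the host adapter advertises. A minimal Python sketch of that check (purely illustrative; the function name and the max_lun value used in the example are assumptions, not taken from the actual driver):

```python
def scan_lun(host, channel, scsi_id, lun, max_lun):
    """Sketch of the midlayer check behind the dmesg message above.

    The SCSI midlayer rejects any LUN at or above the host adapter's
    advertised max_lun; the HBA driver (virtio-scsi here) sets max_lun
    itself, and the value it advertised in 3.6-era kernels is not shown
    in this report.
    """
    if lun >= max_lun:
        return ("scsi: host %d channel %d id %d lun%d has a LUN larger "
                "than allowed by the host adapter"
                % (host, channel, scsi_id, lun))
    return None  # the midlayer would go on to probe this LUN

# With a hypothetical limit of 256, the LUNs from this report are rejected:
print(scan_lun(2, 0, 254, 32766, max_lun=256))
```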

Comment 1 Lukáš Doktor 2013-02-12 13:20:42 UTC
Created attachment 696490 [details]
seabios output

Comment 2 Lukáš Doktor 2013-02-12 13:21:12 UTC
Created attachment 696491 [details]
serial console output

Comment 3 Lukáš Doktor 2013-02-12 13:25:31 UTC
The outputs are based on a modified autotest test 'multi_disk..virtio_scsi_variants..multi_scsiid_lun'; the total number of disks was lowered. The original test behaves the same way, so you can reproduce it by running:

AUTOTEST_PATH=#PATH_TO_AUTOTEST ./run -v -t qemu --tests='multi_disk..virtio_scsi_variants..multi_scsiid_lun'

Comment 4 Paolo Bonzini 2013-02-18 13:34:10 UTC
Linux chooses to treat LUN numbers as "black boxes".  LUNs > 255 are represented as numbers between 0x4100 and 0x7FFF, so they come up with numbers starting at 16384+256=16640.
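The encoding described here is SAM flat-space LUN addressing: LUNs above 255 are reported with bit 14 set, i.e. as 16384 + lun, which matches the 24575 and 32766 values seen earlier in the report. A small decoding sketch (plain Python illustration; the helper name is made up):

```python
def decode_reported_lun(reported):
    """Recover the command-line LUN from the number Linux reports.

    LUNs 0-255 use peripheral addressing and pass through unchanged;
    larger LUNs use flat-space addressing, 0x4000 | lun, so the guest
    sees 16384 + lun.
    """
    if reported >= 0x4000:
        return reported & 0x3FFF
    return reported

# Values from the error output above:
for seen in (24575, 32766):
    print(seen, "->", decode_reported_lun(seen))  # 8191 and 16382
```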

Comment 5 Lukáš Doktor 2013-02-20 13:11:01 UTC
Hi Paolo,

thank you for the reply, I wasn't aware of that. Could you please tell me whether there is any way to get the LUN mapping?

I tried sg_map, which gives me:
Strange, could not find device /dev/sdb mapped to sg device??
Strange, could not find device /dev/sdc mapped to sg device??
/dev/sg0  /dev/sda
/dev/sg1
/dev/sg2


sg_scan:
/dev/sg0: scsi2 channel=0 id=0 lun=0
/dev/sg1: scsi2 channel=0 id=0 lun=254
/dev/sg2: scsi2 channel=0 id=0 lun=255

proc/scsi/scsi:
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: QEMU     Model: QEMU HARDDISK    Rev: 1.0.
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 32766
  Vendor: QEMU     Model: QEMU HARDDISK    Rev: 1.0.
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 24575
  Vendor: QEMU     Model: QEMU HARDDISK    Rev: 1.0.
  Type:   Direct-Access                    ANSI  SCSI revision: 05


I defined:
virtio-scsi2-l0
virtio-scsi3-l8191
virtio-scsi4-l16382

This gives me three different LUN numbers for the same device, plus one error. I went through /sys but was unable to find any useful information.

I used these numbers for verifying qtree vs. guest_os disks.

Comment 6 Lukáš Doktor 2013-02-20 14:10:54 UTC
^^ or do you think I could rely on a hook like this:
lun = int(scsi[3])
if lun > 16384:
    lun = lun - 16384

I still don't understand how I would remove/rescan a specific disk if the LUN numbers don't correspond.

Also, when there are more disks, none of them are auto-discovered. When I run:
for AAA in `seq 0 33000` ; do echo "scsi add-single-device 2 0 0 $AAA" > /proc/scsi/scsi; done

I get some disks twice:
/dev/sg0: scsi2 channel=0 id=0 lun=0
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg1: scsi2 channel=0 id=0 lun=63
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg2: scsi2 channel=0 id=0 lun=126
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg3: scsi2 channel=0 id=0 lun=189
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg4: scsi2 channel=0 id=0 lun=252
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg5: scsi2 channel=0 id=0 lun=59
...
/dev/sg517: scsi2 channel=0 id=0 lun=0
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg518: scsi2 channel=0 id=0 lun=63
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg519: scsi2 channel=0 id=0 lun=126
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg520: scsi2 channel=0 id=0 lun=189
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 
/dev/sg521: scsi2 channel=0 id=0 lun=252
    QEMU      QEMU HARDDISK     1.0. [rmb=0 cmdq=1 pqual=0 pdev=0x0] 

That's really confusing to me. Would you please be so kind as to give me some pointers on where I can learn more about this? Or, if you know where I can get reliable information about every disk in the system which I could compare to qtree, it would help me a lot.

Comment 7 Daniel Berrangé 2013-02-20 14:16:29 UTC
(In reply to comment #6)
> I still don't understand how would I remove/scan the specific disk if the
> lun numbers don't correspond.

If you want a reliable way to match up host vs guest devices, then the only good option is to specify a unique "serial" string for each disk you export to the guest. In the guest you can find the disk with that serial and then lookup its LUN number.
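A sketch of that lookup on the guest side (Python; the helper name is hypothetical, and the 'scsi-0QEMU_QEMU_HARDDISK_' prefix is taken from the by-id listing in comment 8 below, so treat it as an assumption on other setups):

```python
import os

def find_disk_by_serial(serial, entries=None):
    """Return the /dev/disk/by-id name for the disk with the given serial.

    'entries' defaults to listing /dev/disk/by-id on a real guest; a list
    can be passed in for testing.  virtio-scsi disks show up as
    'scsi-0QEMU_QEMU_HARDDISK_<serial>' (prefix matches the by-id listing
    in comment 8; it may differ with other udev rules).
    """
    if entries is None:
        entries = os.listdir("/dev/disk/by-id")
    for name in entries:
        if name.endswith("_" + serial):
            return name
    return None

# Example with the entry seen later in this report:
print(find_disk_by_serial("virtio-scsi2-l0",
                          ["scsi-0QEMU_QEMU_HARDDISK_virtio-scsi2-l0"]))
```

Once the by-id entry is found, its symlink target (e.g. ../../sda) identifies the guest device, from which the kernel-assigned LUN can be read back.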

Comment 8 Lukáš Doktor 2013-02-22 18:36:45 UTC
Thank you, I also thought about that. Still, there is the problem of how to get the original LUN numbers in order to verify that they are set according to the qemu command line.

Anyway, I still have the problem I described earlier. I add 262 disks, but the system shows 524 disks. Here is a cropped example for the first disk:

[cmdline]
-device scsi-disk,serial=0,lun=0,drive=virtio-scsi2-l0

[disks-by-path]
lrwxrwxrwx. 1 root root 10 22. úno 13.14 pci-0000:00:05.0-virtio-pci-virtio1-scsi-0:0:0:16384 -> ../../sdjb
lrwxrwxrwx. 1 root root  9 22. úno 13.14 pci-0000:00:05.0-virtio-pci-virtio1-scsi-0:0:0:0 -> ../../sda

[disks-by-id]
lrwxrwxrwx. 1 root root 10 22. úno 13.14 scsi-0QEMU_QEMU_HARDDISK_virtio-scsi2-l0 -> ../../sdjb

and when I mount the /dev/sda, create a file on it and then mount /dev/sdjb, the file is there.

I doubt this is correct behavior, or am I wrong?

Comment 9 Lukáš Doktor 2013-02-26 14:30:28 UTC
I was told to use sg_luns, which should be the most accurate. The command reports errors and is unable to get any information about the LUNs, even for drives where sg_map works fine (sdsx is mapped to /dev/sg0):

[root@localhost ~]# sg_luns -v -s 2 /dev/sda
    report luns cdb: a0 00 02 00 00 00 00 00 20 00 00 00 
report luns:  Fixed format, current;  Sense key: Illegal Request
 Additional sense: Invalid field in cdb
  Info fld=0x0 [0] 
Report Luns command has bad field in cdb

[root@localhost ~]# sg_luns -v -s 2 /dev/sdsx
    report luns cdb: a0 00 02 00 00 00 00 00 20 00 00 00 
report luns:  Fixed format, current;  Sense key: Illegal Request
 Additional sense: Invalid field in cdb
  Info fld=0x0 [0] 
Report Luns command has bad field in cdb

Comment 10 Paolo Bonzini 2013-02-26 14:38:16 UTC
re. comment 7/8: I cannot really figure that out without sg_luns output. Do you have 2 disks after 262 hotplugs, or was the VM started with 262 disks? virtio-scsi supports hotplug without having to write to /proc/scsi/scsi, and that should explain the duplicate disks: the kernel is already adding the devices on its own.

re. comment 9: do not use -s 2.

Comment 11 Lukáš Doktor 2013-02-27 10:55:45 UTC
Hi Paolo,

I booted the VM with 262 disks specified on the command line. They didn't show up anywhere (/proc/scsi/scsi, /dev/sg*, /dev/sd*, /dev/disks/*/*, sg_scan).

for AAA in `seq 0 33000` ; do echo "scsi add-single-device 2 0 0 $AAA" > /proc/scsi/scsi; done

Then there were:
sg_scan 522
/proc/scsi/scsi 522
/dev/sd* 522
/dev/sg* 522
/dev/disks/by-id 265
/dev/disk/by-path 522
/dev/disk/by-uuid 3

When I create a filesystem on them (ext3), there is:
/dev/disk/by-uuid/ 525

After a while:
/dev/disk/by-uuid/ 269

After reboot + add-single-device over all luns:
/dev/disk/by-uuid/ 264


About comment 9: I tried -s with 0, 1 and 2, with the same results; I tried all the options, always with the same reply for all disks. There are still 522 disks in the system; I sent only partial output to show that two disks with different LUN numbers are the same. The output is exactly the same for every disk, even for disks with lun < 256.

Comment 12 Lukáš Doktor 2013-02-27 13:04:51 UTC
I don't know which information could help you, so I tried various settings, where:

RESULTSA - results directly after boot
RESULTSB - results after execution of: 'for AAA in `seq 0 32768` ; do echo "scsi add-single-device 2 0 0 $AAA" > /proc/scsi/scsi; done'

sg_s = number of disks according to sg_scan
proc = number of disks from /proc/scsi/scsi
sd = number of disks in /dev/sd*
sg = number of disks in /dev/sg*
id = number of disks in /dev/disks/by-id/*
path = number of disks in /dev/disks/by-path/*
uuid = number of disks in /dev/disks/by-uuid/*
SCAN = maximal lun number from sg_scan
SCSI = maximal lun number from /proc/scsi/scsi
luns = assigned lun numbers on cmdline


TIME    LEVEL | RESULTS : sg_s proc  sd   sg   id  path uuid  SCAN  SCSI luns
13:42:40 INFO | RESULTSA:    3    3    3    3    7    3    3   170 27306 range(0,16383,5461)
13:42:45 INFO | RESULTSB:    6    6    6    6    7    6    3   170 27306 range(0,16383,5461)

13:43:19 INFO | RESULTSA:   61   61   61   61   65   61    3   255 32764 range(0,16383,273)
13:43:24 INFO | RESULTSB:  122  122  122  122   65  122    3   255 32764 range(0,16383,273)

13:44:18 INFO | RESULTSA:  253  253  253  253  257  253    3   255 32764 range(0,16383,65)
13:44:27 INFO | RESULTSB:  506  506  506  506  257  506    3   255 32764 range(0,16383,65)

13:45:37 INFO | RESULTSA:  256  256  256  256  260  256    3   192 32704 range(0,16383,64)
13:45:47 INFO | RESULTSB:  512  512  512  512  260  512    3   192 32704 range(0,16383,64)

13:46:52 INFO | RESULTSA:    1    1    1    1    5    1    3     0     0 range(0,16383,63)
13:47:02 INFO | RESULTSB:  522  522  522  522  265  522    3   255 32764 range(0,16383,63)

13:48:21 INFO | RESULTSA:    1    1    1    1    5    1    3     0     0 range(0,16383,31)
13:48:40 INFO | RESULTSB: 1058 1058 1058 1058  533 1058    3   255 32752 range(0,16383,31)


I used autotest, you can checkout the modified version of multi_disk here:
https://github.com/ldoktor/virt-test/tree/multi_disk

I used the command:
./run -t qemu --tests='multi_disk..virtio_scsi_variants..multi_lun' -g Fedora.18.x86_64 --smp=1 -m 4096 -v

Comment 13 Lukáš Doktor 2013-02-27 14:10:09 UTC
I played with it a bit: when I have <= 256 disks (with LUN numbers spread across the interval 0-16383), it works fine (or at least better) until I scan for disks.

This means that after boot:
/proc/scsi/scsi reports 256 disks with lun numbers exactly how you described (0, 64, 128, 192, 16640, 16704, ..., 32704).
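That numbering follows directly from the flat-space encoding; a short Python illustration (only the encoding rule, not qemu or kernel code):

```python
# 256 LUNs assigned on the command line: 0, 64, 128, ..., 16320
assigned = range(0, 16384, 64)
# What the guest reports: LUNs < 256 pass through, larger ones get bit 14 set
reported = [lun if lun < 256 else 0x4000 | lun for lun in assigned]
print(reported[:6])   # starts 0, 64, 128, 192, then jumps to 16640, 16704
print(reported[-1])   # ends at 32704, as in /proc/scsi/scsi above
```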

sg_luns reports luns exactly how they are in qtree (0, 64, 128, 192, 256, 320, ... 16320). (for all -s 0, 1, 2)

sg_map maps only 4 disks (sg0, sg253, sg254, sg255) but shows /dev/sg0 - /dev/sg255


When I scan for disks:
for AAA in `seq 0 32768` ; do echo "scsi add-single-device 2 0 0 $AAA" > /proc/scsi/scsi; done

then:
/proc/scsi/scsi reports 512 disks (0, 64, 128, ..., 32704)

sg_luns reports everything as before (256 disks, lun numbers 0, 64, 128, ... 16320)

and sg_map maps only 4 disks but shows /dev/sg0 - sg511


When I have > 256 disks, only the first disk (0-0-0) is there after boot, and I'm unable to use sg_luns.

Comment 14 Paolo Bonzini 2013-02-27 18:19:08 UTC
Ok, so the bug is in how /proc/scsi/scsi is handled, plus the 257-disk case.

Comment 15 Cole Robinson 2013-04-01 20:46:09 UTC
(In reply to comment #14)
> Ok, so the bug is in how /proc/scsi/scsi is handled, plus the 257-disk case.

Excuse my ignorance, are these issues in virt-test, qemu, or the kernel? Just seeing whether I should re-assign.

Comment 16 Paolo Bonzini 2013-04-02 12:29:21 UTC
qemu is fine for now, then we can clone/reassign.

Comment 17 Fedora End Of Life 2013-07-04 04:14:38 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2013-08-01 12:59:59 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 19 Lukáš Doktor 2013-10-07 19:50:45 UTC
Hi guys,

I retested this with 260 LUN targets on a single SCSI controller and hit the exact same errors, using:

host: Fedora 19, upstream git ~qemu-1.6.50
guest: Fedora 19, kernel-3.11.3-201.fc19

(executed autotest test: "run -t qemu --tests='multi_disk.virtio_scsi_variants.multi_lun' -g Fedora.19.x86_64 --machine-type=i440fx --qemu-bin=/usr/local/bin/qemu-system-x86_64 -v --no-download")

Kind regards,
Lukáš

Comment 20 Fedora End Of Life 2015-01-09 22:04:09 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 reached end of life. If you would still
like to see this bug fixed and are able to reproduce it against a later
version of Fedora, you are encouraged to change the 'version' to a later
Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Lukáš Doktor 2015-01-22 09:32:59 UTC
Well, good news, Paolo. I tried that on:

host: kernel-3.17.2-200.fc20.x86_64
qemu: upstream tag v2.2.0-rc5
guest: kernel-3.17.8-200.fc20

and it seems to work properly :-) Thanks.

Comment 22 Cole Robinson 2015-01-22 14:24:14 UTC
Thanks for testing, Lukáš.

