Bug 680407 - xm block-attach fails to attach more than ~20 images
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 452650
Blocks:
Reported: 2011-02-25 13:25 UTC by Laszlo Ersek
Modified: 2018-12-01 18:32 UTC
CC List: 10 users

Fixed In Version: xen-3.0.3-125.el5
Doc Type: Bug Fix
Doc Text:
Previously, the blktap user-space component did not support attaching more than 100 devices to a guest. With this update, a patch has been provided to address this issue, and now up to 255 image files can be properly attached to a guest.
Clone Of:
Environment:
Last Closed: 2011-07-21 09:13:04 UTC
Target Upstream Version:
Embargoed:


Attachments
synchronize blktapctrl limit to kernel value (256) (466 bytes, patch)
2011-02-28 18:23 UTC, Laszlo Ersek
synch blktapctrl limit to kernel value (256); fix aio-max-nr warning msgs (1.99 KB, patch)
2011-03-01 13:43 UTC, Laszlo Ersek
synch MAX_TAP_DEV to kernel value (256); fix aio-max-nr warnings & xenstored cmdline help (2.76 KB, patch)
2011-03-02 07:48 UTC, Laszlo Ersek
synch MAX_TAP_DEV to kernel (256); fix aio-max-nr warnings & xenstored cmdline help; quadruple xenstored default quotas (3.17 KB, patch)
2011-03-02 11:46 UTC, Laszlo Ersek


Links
System: Red Hat Product Errata
ID: RHBA-2011:1070
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: xen bug fix and enhancement update
Last Updated: 2011-07-21 09:12:56 UTC

Description Laszlo Ersek 2011-02-25 13:25:32 UTC
** Description of problem:

Testing the different fix attempts for bug 452650 requires attaching at least 100 vbds to a PV guest. I first tried "xm block-attach domid file://.. /dev/xvdZZ r", but that is loop-based and cannot scale up to the necessary number of devices. Then I tried "tap:aio:..." instead of "file://...". That did not work either, and this bug report is about the latter.

** Version-Release number of selected component (if applicable):
xen-3.0.3-120.el5 (with brew kernels based on -244 and -245)

** How reproducible:
Always

** Steps to Reproduce:

Create a PV guest:

name = "rhel-56pv"
uuid = "76b3c8e1-5591-b7a5-a259-ee4b82091a19"
maxmem = 4096
memory = 4096
vcpus = 16
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/var/lib/xen/images/rhel56-pv.img,xvda,w" ]
vif = [ "mac=00:16:36:75:d5:0c,bridge=virbr0,script=vif-bridge" ]

Start it. Then create a bunch of image files in dom0:

seq -w 0 255 \
| while read X; do
    dd if=/dev/zero of=$X.img bs=1M count=1
  done

Try to attach them to the guest:

ARR=(a b c d e f g h i j k l m n o p q r s t u v w x y z)

seq -w 0 255 \
| (
    while read X; do
      echo $X
      FIRST=$((10#$X / 26))
      SECOND=$((10#$X % 26))
      xm block-attach rhel-56pv tap:aio:$X.img \
          /dev/xvd${ARR[$FIRST]}${ARR[$SECOND]} r
      sleep 6
    done
  )

** Actual results:

The "sleep 6" is necessary because the loop seems to get slower and slower (this is a known bug with an already committed fix). The attached drives seem to show up in virt-manager/virt-viewer, but after a while (>= ~20 images) they no longer appear within the guest (/dev/xvdZZ), and the guest console displays messages like

xenbus: failed to write error node for ...

** Expected results:

"xm block-attach" should successfully attach the images as vbds to the guest, up to

min { dom0 blktap limit, guest vbd limit }

which is 256 at this point.

Comment 1 Laszlo Ersek 2011-02-26 00:21:39 UTC
Host is RHEL5-Server-U6, with upgraded kernel and xen packages:

[root@hp-dl385g7-02 ~]# uname -psrn
Linux hp-dl385g7-02.lab.eng.brq.redhat.com 2.6.18-245.el5xen x86_64

[root@hp-dl385g7-02 ~]# rpm -qa | grep xen
xen-libs-3.0.3-123.el5
xen-3.0.3-123.el5
kernel-xen-2.6.18-245.el5

Created the following RHEL5-Server-U6 PV guest (2.6.18-238.el5xen x86_64):

name = "rhel56pv"
uuid = "bd692cfb-1792-49e9-2866-3de97077b884"
maxmem = 4096
memory = 4096
vcpus = 4
bootloader = "/usr/bin/pygrub"
kernel = "/var/lib/xen/boot_kernel.XRj4yk"
ramdisk = "/var/lib/xen/boot_ramdisk.NIiHY9"
extra = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "file:/var/lib/xen/images/rhel56pv.img,xvda,w" ]
vif = [ "mac=00:16:36:32:ad:c0,bridge=virbr0,script=vif-bridge,vifname=vif8.0" ]

Attached two 1 MB files:

[root@hp-dl385g7-02 ~]# xm block-attach rhel56pv file:/root/000.img xvdb w
[root@hp-dl385g7-02 ~]# xm block-attach rhel56pv tap:aio:/root/001.img xvdc w

Both xm commands completed (seemingly) successfully on the host. In the guest, only /dev/xvdb was created; /dev/xvdc is missing. The guest dmesg says

 xvdb: unknown partition table

but the corresponding message is missing for xvdc.

xvdb and xvdc are displayed by virt-manager (in dom0) alike.

I have been experimenting with this for about six hours now, looking at udev debug output, xenstore contents, blktap and blkfront driver source, dmesgs, and xend.log, and as seen above, I even gave up on xvdaa (i.e. extended minors). It seems that I can't attach a tap:aio disk at all.

xend seems to write the correct nodes to xenstore:

[2011-02-25 18:44:22 xend 6371] DEBUG (DevController:116) DevController:
    writing {'virtual-device': '51744', 'device-type': 'disk', 'protocol':
    'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
    '/local/domain/0/backend/tap/1/51744'} to
    /local/domain/1/device/vbd/51744.
[2011-02-25 18:44:22 xend 6371] DEBUG (DevController:118) DevController:
    writing {'domain': 'rhel56pv', 'frontend':
    '/local/domain/1/device/vbd/51744', 'format': 'raw', 'dev': 'xvdc',
    'state': '1', 'params': 'aio:/root/001.img', 'mode': 'w', 'online': '1',
    'frontend-id': '1', 'type': 'tap'} to
    /local/domain/0/backend/tap/1/51744.
[2011-02-25 18:44:22 xend 6371] DEBUG (DevController:166) Waiting for
    51744.
[2011-02-25 18:44:22 xend 6371] DEBUG (DevController:547)
    hotplugStatusCallback
    /local/domain/0/backend/tap/1/51744/hotplug-status.
[2011-02-25 18:44:23 xend 6371] DEBUG (DevController:547)
    hotplugStatusCallback
    /local/domain/0/backend/tap/1/51744/hotplug-status.
[2011-02-25 18:44:23 xend 6371] DEBUG (DevController:561)
    hotplugStatusCallback 1.

The reference count of the xenblk module is 3 nonetheless (xvda, xvdb, xvdc).

# find /proc/ /sys/ | sed -n 's/51728/XXXX/p' >xvdb
# find /proc/ /sys/ | sed -n 's/51744/XXXX/p' >xvdc
# diff xvdb xvdc
4d3
< /sys/devices/xen/vbd-XXXX/block:xvdb

If I create /dev/xvdc manually (mknod /dev/xvdc b 202 32), then I cannot open that device with fdisk (ENXIO).
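
The numbers above (51728 for xvdb, 51744 for xvdc, minor 32 in the mknod call) line up if one assumes the usual Xen numbering for the first 16 xvd disks, where the virtual-device ID is 202 << 8 | disk << 4 | partition and the low byte is the minor number. A minimal sketch under that assumption; "xvd_id" is a name made up for illustration and is not part of any Xen tool:

```shell
# Sketch: virtual-device ID for a small xvd disk (disk index 0..15),
# ASSUMING the classic encoding  major(202) << 8 | disk << 4 | partition.
# xvd_id is a hypothetical helper, not an existing command.
xvd_id() {
    disk=$1            # 0 = xvda, 1 = xvdb, 2 = xvdc, ...
    part=${2:-0}       # 0 = whole disk
    echo $(( (202 << 8) | (disk << 4) | part ))
}

xvd_id 1    # xvdb: 51728
xvd_id 2    # xvdc: 51744, i.e. major 202, minor 32
```

Under this reading the manually created node (b 202 32) does carry the right numbers, so the ENXIO is consistent with the frontend never having finished setting up the disk rather than with a wrong minor.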

Comment 2 Laszlo Ersek 2011-02-28 11:03:17 UTC
xm block-attach failed in these further tests (same machine (hp-dl385g7-02.lab.eng.brq.redhat.com) and host kernel (-245) as in comment 1):

1. host userspace: -105, PV guest kernel: -238,   1 MB image file, xvdc
2. host userspace: -120, PV guest kernel: -194,   1 MB image file, xvdc
3. host userspace: -120, PV guest kernel: -194, 100 MB image file, xvdb

I'm obviously doing something wrong.

Comment 3 Laszlo Ersek 2011-02-28 11:36:31 UTC
When rebooting the guest such that the tap:aio disks are present in /etc/xen/rhel56pv, the guest boot takes very long, and the following message is printed in dmesg:

XENBUS: Waiting for devices to initialise: 295s...290s...285s...280s...275s...270s...265s...260s...255s...250s...245s...240s...235s...230s...225s...220s...215s...210s...205s...200s...195s...190s...185s...180s...175s...170s...165s...160s...155s...150s...145s...140s...135s...130s...125s...120s...115s...110s...105s...100s...95s...90s...85s...80s...75s...70s...65s...60s...55s...50s...45s...40s...35s...30s...25s...20s...15s...10s...5s...0s...
XENBUS: Timeout connecting to device: device/vbd/51744 (local state 3, remote state 1)
XENBUS: Timeout connecting to device: device/vbd/51728 (local state 3, remote state 1)

This is printed by wait_for_devices() in drivers/xen/xenbus/xenbus_probe.c; probably waiting for the xenbus state walks to complete, then giving up. For some reason the tap:aio disks' states are out of sync between the host and the guest.

I now think this might even be a host kernel problem. I did all the testing with -244 and -245 based host kernels, and a much bigger range of guest kernels and xen userspace.

Comment 4 Laszlo Ersek 2011-02-28 18:06:50 UTC
Reinstalled the same machine with pristine x86_64 rhel-56, created a PV guest
with the same OS, using virt-install:

name = "rhel56pv"
uuid = "c51570f7-34ea-16b4-0469-027263743425"
maxmem = 4096
memory = 4096
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/var/lib/xen/images/rhel56pv.img,xvda,w" ]
vif = [ "mac=00:16:36:04:7c:5f,bridge=virbr0,script=vif-bridge" ]

The following command works perfectly:

# xm block-attach rhel56pv tap:aio:/root/images/000.img xvdaa r

---------------------------

Installed 2.6.18-245.el5.blktap_limit_bz452650_try2xen on the host, and
repeated the loop to attach a further 254 tap:aio images.

seq -w 0 253 \
| (
    ARR=(a b c d e f g h i j k l m n o p q r s t u v w x y z)
    while read X; do
      FIRST=$((10#$X / 26))
      SECOND=$((10#$X % 26))
      DEV=xvd${ARR[$FIRST]}${ARR[$SECOND]}
      echo $X $DEV
      xm block-attach rhel56pv tap:aio:/root/images/$X.img $DEV r
      sleep 2
    done
  )

The first 99 extra disks (after the default xvda, xvda1, xvda2), that is, xvdaa
to xvddu were attached. xvddv to xvdiz completed in the host, but did not show
up in the guest. Attaching xvdja failed with "Error: Unable to find number for
device (xvdja)".
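
The device names quoted above follow directly from the loop's naming arithmetic. A small sketch that reproduces only the index-to-name mapping (no xm calls); "idx_to_dev" is a name invented here, and it expects a plain decimal index without the leading zeros that "seq -w" produces:

```shell
# Sketch: the same FIRST/SECOND arithmetic as the attach loop, as a function.
# idx_to_dev is a hypothetical helper; pass a plain integer (no leading zeros).
idx_to_dev() {
    letters=abcdefghijklmnopqrstuvwxyz
    first=$(( $1 / 26 + 1 ))     # cut -c positions are 1-based
    second=$(( $1 % 26 + 1 ))
    echo "xvd$(echo $letters | cut -c $first)$(echo $letters | cut -c $second)"
}

idx_to_dev 0      # xvdaa
idx_to_dev 98     # xvddu, the 99th extra disk
idx_to_dev 233    # xvdiz
idx_to_dev 234    # xvdja, the first name xm refuses
```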

---------------------------

Reinstalled the -238 (rhel-56) kernel. Again, up to 100 disks in total could be
attached. This is a bit different from the patched kernel because there the
xvda1 and xvda2 partitions didn't count in the 100 (so in total there were 102
block devices), while now with -238 they did count. (xvdds is the maximum with
-238.) This suggests that there's another limit of 100 which was not raised.

---------------------------

This limit is in xen userspace, tools/blktap/drivers/blktapctrl.c (as of xen-3.0.3-124.el5):

/* Abitrary values, must match the underlying driver... */
#define MAX_TAP_DEV 100

and it's used in tools/blktap/drivers/blktapctrl.c, get_new_dev():

        ret = ioctl(ctlfd, BLKTAP_IOCTL_NEWINTF_EXT, &tr_ext);
[...]
if ( (ret <= 0)||(ret > MAX_TAP_DEV) ) {
        DPRINTF("Incorrect Dev ID [%d]\n",ret);
        return -1;
}

I guess the return value gets ignored somewhere along the way, but after looking at tools/blktap/drivers/tapdisk.h, blktapctrl seems to log the message to syslog on level LOG_DEBUG. After enabling this level in /etc/syslog.conf and retrying with the patched -245, the messages do show up:

Feb 28 13:04:02 hp-dl385g7-02 BLKTAPCTRL[6041]: Received a poll for a new vbd 
Feb 28 13:04:02 hp-dl385g7-02 BLKTAPCTRL[6041]: Sent domid 1 and be_id
                                                268467456 
Feb 28 13:04:02 hp-dl385g7-02 BLKTAPCTRL[6041]: Incorrect Dev ID [101] 
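
Once debug-level messages reach a log file, the rejected attach attempts can be counted with grep. A minimal sketch; the log file to pass in is whatever /etc/syslog.conf was configured to write debug messages to (not specified here), and "count_tap_rejects" is a made-up name:

```shell
# Sketch: count blktapctrl "Incorrect Dev ID" rejections in a syslog file.
# count_tap_rejects is a hypothetical helper; $1 is the log file to scan.
count_tap_rejects() {
    # grep -c prints 0 but returns non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'BLKTAPCTRL\[[0-9]*\]: Incorrect Dev ID' "$1" || true
}
```

A non-zero count means blktapctrl refused at least one device ID handed out by the kernel, which is the symptom of the two limits being out of sync.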

blktapctrl is required in order to create new tapdisks, see "Detailed description" under http://wiki.xensource.com/xenwiki/blktap: "The request for a new virtual disk is propagated to blktapctrl, which creates a new character device and two named pipes for communication with a newly forked tapdisk process".

It is interesting that guest device nodes that correspond to *partitions* (like xvda1, xvda2) count against the kernel limit, but do not appear to count against the blktapctrl limit. Raising the kernel limit allowed me to create two more entire-disk device nodes, so that the total number of entire-disk nodes reached 100.
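
To compare the two counts on a live guest, the xvd nodes can be split into whole disks and partitions by whether the name ends in a digit. A sketch that reads names on standard input, for example from "ls /dev | grep '^xvd'" inside the guest; "count_xvd_nodes" is an illustrative name:

```shell
# Sketch: classify xvd node names read from stdin, one per line.
# Names ending in digits are treated as partitions, the rest as whole disks.
# count_xvd_nodes is a hypothetical helper.
count_xvd_nodes() {
    awk '/^xvd[a-z]+[0-9]+$/ { parts++ }
         /^xvd[a-z]+$/       { disks++ }
         END { printf "disks=%d partitions=%d\n", disks, parts }'
}
```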

Comment 5 Laszlo Ersek 2011-02-28 18:23:11 UTC
Created attachment 481425 [details]
synchronize blktapctrl limit to kernel value (256)

This patch (if correct) will have to be sent to upstream as well, because as of xen-unstable c/s 22956:8af88ff698ff, they still have a limit of 100 in userspace, even though linux-2.6.18-xen c/s 1071:23270e45ef1e has a kernel limit of 256. The upstream commit that changed the kernel limit (xen-unstable c/s 11868:80b296ec93dc) seems to have missed the userspace limit; in fact, the last change of the "#define MAX_TAP_DEV 100" line in userspace is rev 10677, which is the first commit of the file ever.

(
(In reply to comment #4)

> This limit is in xen userspace, tools/blktap/drivers/blktapctrl.c (as of
> xen-3.0.3-124.el5):
> 
> /* Abitrary values, must match the underlying driver... */
> #define MAX_TAP_DEV 100

Cut'n'paste error, the file name is "tools/blktap/lib/blktaplib.h".
)

Comment 6 Laszlo Ersek 2011-03-01 13:43:33 UTC
Created attachment 481617 [details]
synch blktapctrl limit to kernel value (256); fix aio-max-nr warning msgs

Extended the previous patch by fixing typos in the aio-max-nr warning messages.

Comment 7 Laszlo Ersek 2011-03-01 14:03:18 UTC
It turns out that there's a third (run-time) limit: the "fs.aio-max-nr" sysctl, according to blktapctrl debug log messages.

Mar  1 08:22:10 hp-dl385g7-02 TAPDISK[14507]: Couldn't setup AIO context.  If
                                              you are trying to concurrently use
                                              a large number of blktap-based
                                              disks, you may need to increase
                                              the system-wide aio request limit.
                                              (e.g. 'echo echo 1048576 >
                                              /proc/sys/fs/aio-max-nr')

(This message contains a typo, "echo echo", so I'll fix that too.)
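
Before attaching many tap:aio disks it may help to see how much of the system-wide AIO budget is still free. A minimal sketch that subtracts /proc/sys/fs/aio-nr (currently allocated requests) from /proc/sys/fs/aio-max-nr; the two file arguments exist only so the function can be tried on sample files, and "aio_headroom" is a name invented here:

```shell
# Sketch: remaining system-wide AIO request slots (aio-max-nr minus aio-nr).
# aio_headroom is a hypothetical helper; both arguments are optional overrides.
aio_headroom() {
    max_file=${1:-/proc/sys/fs/aio-max-nr}
    nr_file=${2:-/proc/sys/fs/aio-nr}
    max=$(cat "$max_file")
    nr=$(cat "$nr_file")
    echo $(( max - nr ))
}
```

Running "aio_headroom" with no arguments on dom0 prints the free slots; if the value is close to zero, further tapdisk processes are likely to log the warning shown above.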

After permanently setting "fs.aio-max-nr = 262144" in /etc/sysctl.conf (raising it from the default 65536), I tested the patch (attachment 481617 [details]) with the following kernel and xen-userspace:

http://brewweb.devel.redhat.com/taskinfo?taskID=3134123
http://brewweb.devel.redhat.com/taskinfo?taskID=3147838

kernel: 2.6.18-245.el5.blktap_limit_bz452650_try2xen
userspace: xen-{,libs-}3.0.3-124.el5.blktap_lost_bz680407_v2

and now I could get up to xvdeb (109 devices in total, including xvda, xvda1, and xvda2). For each attempt after that, the guest dmesg contains a triplet like this:

vbd vbd-268469248: 2 writing ring-ref
xenbus: failed to write error node for device/vbd/268469248 (2 writing ring-ref)
vbd vbd-268469248: 2 xenbus_dev_probe on device/vbd/268469248

This is the message I reported in comment #0. I'm going to investigate further.
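
The ID in these messages can be mapped back to a device name, assuming the extended Xen numbering (1 << 28 | disk << 8 | partition) that the larger IDs in this report appear to follow. A rough sketch; "ext_vbd_name" and "letter" are illustrative helpers that only handle one- and two-letter names:

```shell
# Sketch: map an extended virtual-device ID back to an xvd name,
# ASSUMING the layout  1 << 28 | disk << 8 | partition.
# Handles disk indexes 0..701 only. Both helpers are hypothetical.
letter() { echo abcdefghijklmnopqrstuvwxyz | cut -c "$(( $1 + 1 ))"; }

ext_vbd_name() {
    id=$1
    disk=$(( (id >> 8) & 0xFFFFF ))
    part=$(( id & 0xFF ))
    if [ "$disk" -lt 26 ]; then
        name=xvd$(letter "$disk")
    else
        name=xvd$(letter $(( disk / 26 - 1 )))$(letter $(( disk % 26 )))
    fi
    if [ "$part" -gt 0 ]; then
        name=$name$part
    fi
    echo "$name"
}
```

With this mapping, "ext_vbd_name 268469248" gives xvdec, the device right after xvdeb, and "ext_vbd_name 268467456" (the be_id logged in comment 4) gives xvddv.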

Comment 8 Laszlo Ersek 2011-03-01 14:51:10 UTC
Under drivers/xen:

xenbus/xenbus_probe.c      -- xenbus_dev_probe() calls drv->probe()
blkfront/blkfront.c        -- blkfront_probe() calls talk_to_backend()
                           -- talk_to_backend() calls xenbus_printf() to
                              communicate the ring-ref, but fails
                           -- talk_to_backend() calls xenbus_dev_fatal(),
xenbus/xenbus_client.c     -- xenbus_dev_fatal() calls _dev_error(),
                           -- _dev_error() calls dev_err()
.../include/linux/device.h -- dev_err() logs the first message
xenbus/xenbus_client.c     -- _dev_error() tries to communicate the error
                              via xenbus_write(), but fails, logs the second
                              message
xenbus/xenbus_probe.c      -- xenbus_dev_probe() calls xenbus_dev_error(),
                              etc, logs third message

The error node could likely not be propagated back to the backend for the same reason why the ring-ref could not be communicated to the backend in the first place. XenStore just refuses writing nodes for this device.

Error code 2 is ENOENT.

Comment 9 Laszlo Ersek 2011-03-01 15:30:36 UTC
Enabled xenstored debugging by putting XENSTORED_TRACE=true in /etc/environment; also modified xend in-place to pass -V (--verbose) to xenstored as well. Perhaps I will have to use Michal's advanced xenstored debugging/tracing patches as well.

I did not reboot the machine yet, stopped xend and xenstored, and restarted xend (which starts xenstored itself). /var/log/xen/xenstored-trace.log contains an entry like

OUT 0x19500d80 20110301 10:08:44 ERROR (ENOSPC )

and xend.log contains a big exception trace saying

[2011-03-01 10:08:44 xend 20319] ERROR (SrvDaemon:297) Exception starting xend ((28, 'No space left on device'))
Traceback (most recent call last):
[...]
  File "/usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 218, in mkdir
    xshandle().mkdir(self.transaction, self.path)
Error: (28, 'No space left on device')

Perhaps the guest can't write to the xenstore (either ring-ref or error-node) because the xenstore ran out of space. /etc/init.d/xend has

XENSTORED_TMPFS=yes

which puts /var/lib/xenstored on tmpfs. Perhaps that was the problem.

... Though now I tried to look in the xenstore source to see whether there was a limit on the size of the entire TDB database. Instead of that, there's "quota_max_entry_size", settable by -S/--entry-size, defaulting to 2KB. The "quota_max_entry_size" variable is used as a limit in write_node() (in tools/xenstore/xenstored_core.c), and if an unprivileged domain tries to exceed that, it gets ENOSPC.

Raising this limit would require UI changes, so I'll just distribute the approx. 240 vbds I wish to attach over two guests.

Comment 10 Laszlo Ersek 2011-03-01 15:33:50 UTC
Actually there are further quotas:

"  --entry-nb <nb>     limit the number of entries per domain,\n"
"  --entry-size <size> limit the size of entry per domain, and\n"
"  --entry-watch <nb>  limit the number of watches per domain,\n"

Distributing the vbds over two (or even more) guests should satisfy these too.

Comment 15 Laszlo Ersek 2011-03-02 07:48:46 UTC
Created attachment 481801 [details]
synch MAX_TAP_DEV to kernel value (256); fix aio-max-nr warnings & xenstored cmdline help

While constructing the command line for xenstored, in order to quadruple all quotas, I noticed that the command line help does not match the accepted options exactly. Extended the patch with "--entry-watch" --> "--watch-nb". (The set of accepted options didn't change, only the help text was updated.)
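
As an illustration of the kind of command line meant here, the following sketch prints xenstored options with each quota multiplied by four. The default values used (1000 entries, 2048 bytes per entry, 128 watches, 10 transactions) and the "--transaction" option name are assumptions that should be checked against the installed xenstored; only "--entry-nb", "--entry-size" and "--watch-nb" are named in this report. "quota_opts" is a made-up helper:

```shell
# Sketch: build xenstored quota options with every default multiplied by a factor.
# The defaults below and the --transaction option are ASSUMED, not taken from
# this report; verify them against "xenstored --help" before use.
quota_opts() {
    factor=${1:-4}
    echo "--entry-nb $(( 1000 * factor ))" \
         "--entry-size $(( 2048 * factor ))" \
         "--watch-nb $(( 128 * factor ))" \
         "--transaction $(( 10 * factor ))"
}

quota_opts    # --entry-nb 4000 --entry-size 8192 --watch-nb 512 --transaction 40
```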

Comment 16 Laszlo Ersek 2011-03-02 11:46:07 UTC
Created attachment 481845 [details]
synch MAX_TAP_DEV to kernel (256); fix aio-max-nr warnings & xenstored cmdline help; quadruple xenstored default quotas

Quadrupled all four xenstored quotas to verify the idea in comment 8 & comment 9.

I was finally able to attach 234 additional tap:aio disks to the same guest, xvdaa to xvdiz, inclusive, on top of xvda+xvda1+xvda2. I'll try to describe here everything, also in order to help QE repeat the verification.

** 1. Provision "hp-dl385g7-02.lab.eng.brq.redhat.com" with RHEL-5.6 from Beaker, x86_64 flavor.

** 2. Install "kernel-xen-2.6.18-245.el5.blktap_limit_bz452650_try2" from https://brewweb.devel.redhat.com/taskinfo?taskID=3134123 -- that is, the fix for bug 452650; this raises the kernel side limit on tap disks to 256. Do not reboot just yet.

** 3. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf".

** 4. Install (with the rpm tool) xen-libs-3.0.3-124.el5.blktap_lost_bz680407_v3 from https://brewweb.devel.redhat.com/taskinfo?taskID=3150848 -- this (and the xen package, to be installed later) contain the fixes in the patch I'm attaching now. rpm will complain about missing dependencies; install those (but nothing else) with yum. Then install the RPM with rpm.

** 5. Separately install (with the rpm tool) xen-3.0.3-124.el5.blktap_lost_bz680407_v3 from the location in step 4. rpm will complain about missing dependencies (but notably *not* about xen-libs); install those dependencies (but nothing else) with yum. Then install the RPM with rpm.

** 6. Modify "/etc/grub.conf" so that the new kernel (step 2) is the default. Reboot the host now.

** 7. After reboot, install virt-viewer with yum.

** 8. Create the following rhel56pv guest, and complete its installation with virt-viewer (choose default options everywhere, or use a kickstart file to the same effect):

virt-install \
    --connect=xen \
    --name=rhel56pv \
    --ram=4096 \
    --arch=x86_64 \
    --vcpus=4 \
    --os-type=linux \
    --os-variant=rhel5.4 \
    -p \
    --location=http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-5-Server/U6/x86_64/os/ \
    --disk=path=/var/lib/xen/images/rhel56pv.img,size=6,sparse=false \
    --network=network:default \
    --debug \
    --prompt \
    --extra-args=text

** 9. After the guest is up & running, execute the following in the host, as root -- create image files:

(
  set -e
  mkdir /root/images
  cd /root/images
  seq -w 0 255 \
  | while read X; do
      dd if=/dev/zero of=$X.img bs=1M count=1
    done
)

** 10. ssh into the guest, or open a virtual terminal on it with virt-viewer. Execute and keep running this command:

watch --interval=1 ls -C '/dev/xvd*'

** 11. On the host, run the following script as root. It will take some time due to the sleep, which may not be necessary, but it's better to give udev and co. some time:

seq -w 0 255 \
| (
    ARR=(a b c d e f g h i j k l m n o p q r s t u v w x y z)

    while read X; do
      FIRST=$((10#$X / 26))
      SECOND=$((10#$X % 26))
      DEV=xvd${ARR[$FIRST]}${ARR[$SECOND]}
      echo $X $DEV
      xm block-attach rhel56pv tap:aio:/root/images/$X.img $DEV r
      sleep 5
    done
  )

** 12. Watch in the guest terminal opened in step 10 how devices get added.

** 13. Somewhere above 200, the host script described in step 11 will start to return errors like

----v----
234 xvdja
Error: Unable to find number for device (xvdja)
Usage: xm block-attach <Domain> <BackDev> <FrontDev> <Mode>

Create a new virtual block device.
----^----

Interrupt the script then with Ctrl-C.

** 14. In the guest, select the last attached device, e.g. /dev/xvdiz, and open it with fdisk:

fdisk /dev/xvdiz 

fdisk should open the block device, complain a bit, but then offer its command prompt. Exit with "q".
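
To check the outcome of steps 11 to 13 without reading the watch output by eye, the expected names can be compared with what exists in the guest. A sketch under the assumption that the first N images were attached as xvdaa onward, as in the script of step 11; "missing_xvd" and its arguments are illustrative:

```shell
# Sketch: print the expected two-letter xvd names (index 0..count-1) that are
# absent from a directory (default /dev). Meant to be run inside the guest.
# missing_xvd is a hypothetical helper.
missing_xvd() {
    count=$1
    dir=${2:-/dev}
    letters=abcdefghijklmnopqrstuvwxyz
    i=0
    while [ "$i" -lt "$count" ]; do
        a=$(echo $letters | cut -c $(( i / 26 + 1 )))
        b=$(echo $letters | cut -c $(( i % 26 + 1 )))
        [ -e "$dir/xvd$a$b" ] || echo "xvd$a$b"
        i=$(( i + 1 ))
    done
}
```

After a fully successful run, "missing_xvd 234" should print nothing, meaning every device from xvdaa to xvdiz is present.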

Comment 18 Laszlo Ersek 2011-03-07 09:03:27 UTC
(In reply to comment #16)

> I was finally able to attach 234 additional tap:aio disks to the same guest,
> xvdaa to xvdiz, inclusive, on top of xvda+xvda1+xvda2. I'll try to describe
> here everything, also in order to help QE repeat the verification.

If one skips steps 3, 4 and 5, then this list should provide a reliable reproducer as well.

Comment 22 Yuyu Zhou 2011-04-25 09:48:35 UTC
I have two questions.
1. I failed to reproduce the bug as described in comment 1, on xen-3.0.3-120.el5, kernel-xen-2.6.18-245.el5.
I succeeded in attaching devices up to /dev/xvddu, 99 in all,
and the guest console displays messages like:
xenbus: failed to write error node for device/vbd/...
It didn't fail around 20 images.

2. When I tried to verify the bug according to comment 16, xvdaa to xvdiz were attached in the host, but xvdhd to xvdiz did not show up in the guest.

** Version-Release number of selected component(if applicable)
kernel-xen-2.6.18-257.el5, xen-3.0.3-129.el5

** How reproducible:
Always

** Steps to Reproduce:
1. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf" in host.
2. Create a PV guest and create 256 image files in dom0
3. Attach the 256 images to guest one by one by script.

** Actual Result:
Errors are returned from the 235th image on, but only the first 185 images (xvdaa-xvdhc) can be seen in the guest.

** Expected result
Errors are returned from the 235th image on; all 234 images can be seen and work well in the guest.

Comment 23 Laszlo Ersek 2011-04-26 09:10:57 UTC
(In reply to comment #22)
> I have two questions.
> 1. I failed to reproduce the bug as described in comment 1, on
> xen-3.0.3-120.el5, kernel-xen-2.6.18-245.el5.
> I succeeded in attaching devices up to /dev/xvddu, 99 in all,
> and the guest console displays messages like:
> xenbus: failed to write error node for device/vbd/...
> It didn't fail around 20 images.

That's alright. I first saw it go wrong at about 20 images, but my test environment was quite messed up at that time. There are several limits in userspace (see points 3 & 4 in comment 11), and I likely ran out of one of those earlier than you ran out of the blktapctrl limit (which is point 2 in comment 11).

So what you see is alright. I was progressively hitting more and more *types* of limits as I was trying to analyze this bug. The bug's summary / title and comment #0 describe only the first problem I faced.

I cannot provide a recipe to reproduce the exact same problem that I saw. There are multiple limits. The actual resource usages against those, and the first limit that "hits", can be different from host to host. The only symptom that matters is that the disks don't show up in the guest.

The fact that "xm block-attach" does not provide a clear error message in this case is a separate issue.


> 2. When I tried to verify the bug according to comment 16, xvdaa to xvdiz
> were attached in the host, but xvdhd to xvdiz did not show up in the guest.
> 
> ** Version-Release number of selected component(if applicable)
> kernel-xen-2.6.18-257.el5, xen-3.0.3-129.el5
> 
> ** How reproducible:
> Always
> 
> ** Steps to Reproduce:
> 1. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf" in host.

Did you make sure that this also took effect? For example, by executing
"sysctl -p" after modifying "/etc/sysctl.conf"?


> 2. Create a PV guest and create 256 image files in dom0
> 3. Attach the 256 images to guest one by one by script.

An excerpt from comment 16:

> ** 3. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf".
> ** 6. [...] Reboot the host now.
> ** 11. On the host, run the following script as root. [...]
>       xm block-attach rhel56pv tap:aio:/root/images/$X.img $DEV r

So I didn't spell out "sysctl -p" in comment 16, because the steps I described there require (for other reasons as well) rebooting the host after "/etc/sysctl.conf" is modified.

To see if indeed the "aio-max-nr" sysctl setting is missing, I suggest two things:

(1) As per comment #4, you can enable the logging of debug level messages in /etc/syslog.conf (don't forget "service syslog restart" afterwards). Then, after retrying the same steps (without a host reboot), you should see the "Couldn't setup AIO context" in the syslog file, as described in comment #7.

(2) Reboot the host with the "aio-max-nr" setting in place and retry the test.
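
A quick way to tell whether the value in /etc/sysctl.conf has actually been applied is to compare it with the running value. A sketch; "conf_aio_max" is a made-up helper that only understands plain "fs.aio-max-nr = N" lines and ignores commented-out ones:

```shell
# Sketch: print the last fs.aio-max-nr value set in a sysctl.conf-style file.
# conf_aio_max is a hypothetical helper; $1 defaults to /etc/sysctl.conf.
conf_aio_max() {
    sed -n 's/^[[:space:]]*fs\.aio-max-nr[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' \
        "${1:-/etc/sysctl.conf}" | tail -n 1
}

# Usage on the host (not executed here):
#   [ "$(conf_aio_max)" = "$(cat /proc/sys/fs/aio-max-nr)" ] || echo "setting not applied yet"
```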

Thanks!

Comment 24 Yuyu Zhou 2011-04-26 12:52:35 UTC
Verified the bug on kernel-xen-2.6.18-257.el5, xen-3.0.3-129.el5.

Steps to verify:
1. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf" in host, and execute
"sysctl -p".
2. Create a PV guest and create 256 image files in dom0
3. Attach the 256 images to guest one by one by script.

Results:
Errors are returned from the 235th image on; all 234 images can be seen and work well in
the guest. No other limits are hit.

So I am changing the Status to VERIFIED.

Comment 28 Tomas Capek 2011-07-13 13:31:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the MAX_TAP_DEV variable value was not synchronized with kernel. As a consequence, the "xm block-attach" failed to attach a large number of image files as virtual block devices to a guest. With this update, a patch has been provided to address this issue, and now image files are properly attached to a guest up to the set limit.

Comment 30 Paolo Bonzini 2011-07-13 15:04:03 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, the MAX_TAP_DEV variable value was not synchronized with kernel. As a consequence, the "xm block-attach" failed to attach a large number of image files as virtual block devices to a guest. With this update, a patch has been provided to address this issue, and now image files are properly attached to a guest up to the set limit.
+Previously, the user-space blktap component did not support attaching more than 100 devices. With this update, a patch has been provided to address this issue, and now up to 256 image files can be properly attached to a guest.

Comment 32 Laszlo Ersek 2011-07-13 16:14:29 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, the user-space blktap component did not support attaching more than 100 devices. With this update, a patch has been provided to address this issue, and now up to 256 image files can be properly attached to a guest.
+Previously, the user-space blktap component did not support attaching more than 100 devices. With this update, a patch has been provided to address this issue, and now up to 255 image files can be properly attached to a guest.

Comment 33 Tomas Capek 2011-07-14 14:03:53 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, the user-space blktap component did not support attaching more than 100 devices. With this update, a patch has been provided to address this issue, and now up to 255 image files can be properly attached to a guest.
+Previously, the blktap user-space component did not support attaching more than 100 devices to a guest. With this update, a patch has been provided to address this issue, and now up to 255 image files can be properly attached to a guest.

Comment 34 errata-xmlrpc 2011-07-21 09:13:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html


