Bug 452650 - [RHEL5.2]: Blktap is limited to 100 disks total
[RHEL5.2]: Blktap is limited to 100 disks total
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen (Show other bugs)
5.2
All Linux
low Severity low
: rc
: ---
Assigned To: Laszlo Ersek
Virtualization Bugs
:
Depends On:
Blocks: 514490 680407
  Show dependency treegraph
 
Reported: 2008-06-24 06:09 EDT by Chris Lalancette
Modified: 2011-07-21 06:26 EDT (History)
12 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-21 06:26:32 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Remove unused MAX_DEV_NAME macro. (422 bytes, patch)
2011-02-18 11:22 EST, Laszlo Ersek
no flags Details | Diff
Make the maximum number of blktap devices a module parameter. (5.40 KB, patch)
2011-02-18 11:24 EST, Laszlo Ersek
no flags Details | Diff
Remove unused MAX_DEV_NAME macro. Make the maximum number of blktap devices a module parameter. (5.75 KB, patch)
2011-02-21 10:12 EST, Laszlo Ersek
drjones: review-
Details | Diff
Remove unused MAX_DEV_NAME macro. Make the maximum number of blktap devices a module parameter. (5.32 KB, patch)
2011-02-21 12:29 EST, Laszlo Ersek
no flags Details | Diff
max_tap_dev module parameter; v4 (5.17 KB, patch)
2011-02-22 07:37 EST, Laszlo Ersek
no flags Details | Diff
Bump MAX_TAP_DEV from 100 to 256. Remove superfluous MAX_DEV_NAME. (528 bytes, patch)
2011-02-23 07:26 EST, Laszlo Ersek
no flags Details | Diff

  None (edit)
Description Chris Lalancette 2008-06-24 06:09:34 EDT
Description of problem:
Currently in the RHEL-5.2 kernel, the blktap driver has a hardcoded limit of 100
devices.  This means that the dom0 kernel can only service 100 disks total,
across all guests.  Upstream has kicked this up to 256; however, there really
seems to be no reason that this couldn't be made dynamic in the
drivers/xen/blktap/blktapmain.c code, so you could have basically unlimited
disks (pending memory requirements, of course).

The basic thing to fix is changing the statically allocated tapfds and
domid_translate arrays to dynamically allocated, and then fixing all of the
fallout from that.  It should be relatively straightforward and confined to the
blktap driver.  Of course, we should definitely work with upstream on this one.
Comment 1 Chris Lalancette 2008-06-24 06:11:55 EDT
Note that this is closely related to BZ 442723, where we are expanding the
maximum number of disks that you can attach to a guest.

Chris Lalancette
Comment 2 RHEL Product and Program Management 2008-06-24 06:23:20 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 Laszlo Ersek 2011-02-17 11:51:37 EST
(In reply to comment #0)
> Description of problem:
> Currently in the RHEL-5.2 kernel, the blktap driver has a hardcoded limit of
> 100 devices.  This means that the dom0 kernel can only service 100 disks
> total, across all guests.  Upstream has kicked this up to 256; however, there
> really seems to be no reason that this couldn't be made dynamic in the
> drivers/xen/blktap/blktapmain.c code, so you could have basically unlimited
> disks (pending memory requirements, of course).
> 
> The basic thing to fix is changing the statically allocated tapfds and
> domid_translate arrays to dynamically allocated, and then fixing all of the
> fallout from that.  It should be relatively straightforward and confined to
> the blktap driver.  Of course, we should definitely work with upstream on
> this one.

Minor devices for /dev/xen/blktap* are already dynamically allocated and reused from a "pool"; it's only that the pool (the array) holding the pointers has static storage duration (and so fixed size).

A dynamically resized pool would require kmalloc() + memcpy() + kfree(), then resetting the pointer pointing to the base of the pool. (Naturally, the array object would be replaced by a pointer object.) This reassignment would require locking or atomic operations, beacuse lots of places in "blktap.c" query "tapfds[minor]". I think this would introduce unnecessary risk.

Another data structure to consider would be a list, but that's even worse -- not only does it involve continuous pointer manipulation, it also doesn't support O(1) access by device minor id. But structure more complicated than a vector or a list is unwarranted, in my opinion.

I think the best we can do, and the most we should do, is to introduce a module parameter that specifies the pool size, and replace the static array with a dynamically *pre*allocated array (blkif_init()).

(I'm talking about a single array to replace, but I realize in RHEL-5 we also have the "translate_domid array".)

Considering this bug is two and a half years old and was not linked with business cases here; what's more, that upstream still has a fixed limit of 256 as of rev 1070:2994d2997f9d, I think we can safely assume that a fixed but tuneable pool size would cover all real demands. It seems that the real bottleneck was bug 442723 (number of guest disks).

Now that a single guest can handle up to 26 + 26 * 26 = 702 disks, considering a 1 TB host with ~1000 1 GB guests, I think our new module parameter should be clamped between the round limits of "64" and "512 * 1024", and it should default to 256 (to adjust ourselves to upstream). Whoever needs more than the default 256 does qualify, after 2.5 years, as a special case, and should be willing to set the module parameter. Minor device numbers have 20 bits of information, so this should fit.

Two arguments for lowering the upper limit to something actually sensible:
- There really shouldn't be hundreds of thousands of /dev/xen/blktap* nodes.
- Many functions search these arrays in a linear fashion. For example, a
  sequence of device allocations (via N invocations of get_next_free_dev(),
  each called from a separate BLKTAP_IOCTL_NEWINTF / _EXT ioctl) takes
  quadratic time in total.

BTW. the "MAX_DEV_NAME" macro is superfluous; there's no single application of it in the entire tree, not even in upstream.

Proposition:
a) Brainstorm an upper limit for the pool size.
b) Delete MAX_DEV_NAME.
c) Make the pool size a module parameter.
d) Send patch to upstream.
e) Send (potentially modified) patch to rhkernel-list.

I'm willing to do (b)-(e); please comment on (a), and of course on the idea itself. Thank you very much!
Comment 6 Chris Lalancette 2011-02-17 13:32:48 EST
(In reply to comment #5)
> A dynamically resized pool would require kmalloc() + memcpy() + kfree(), then
> resetting the pointer pointing to the base of the pool. (Naturally, the array
> object would be replaced by a pointer object.) This reassignment would require
> locking or atomic operations, beacuse lots of places in "blktap.c" query
> "tapfds[minor]". I think this would introduce unnecessary risk.
> 
> Another data structure to consider would be a list, but that's even worse --
> not only does it involve continuous pointer manipulation, it also doesn't
> support O(1) access by device minor id. But structure more complicated than a
> vector or a list is unwarranted, in my opinion.
> 
> I think the best we can do, and the most we should do, is to introduce a module
> parameter that specifies the pool size, and replace the static array with a
> dynamically *pre*allocated array (blkif_init()).
> 
> (I'm talking about a single array to replace, but I realize in RHEL-5 we also
> have the "translate_domid array".)
> 
> Considering this bug is two and a half years old and was not linked with
> business cases here; what's more, that upstream still has a fixed limit of 256
> as of rev 1070:2994d2997f9d, I think we can safely assume that a fixed but
> tuneable pool size would cover all real demands. It seems that the real
> bottleneck was bug 442723 (number of guest disks).

I disagree with this point, though not strongly.  In those 2.5 years, I have had multiple people ask why this limitation is in place, and a way to workaround it.  The workaround is typically easy; use phy: devices instead.  Still, it is something that has come up frequently enough that it is annoying (and very non-obvious).

My preferred solution (and the one that I would have tried to implement, given enough time) is the kmalloc() + memcpy() + kfree() one with the locking.  I don't think it will be that hard to do.

All of that being said, I do agree that it adds some risk to the whole thing, and we are getting awfully late in RHEL-5 Xen's lifetime to start doing risky things.  Additionally, hasn't blktap mostly been replaced by blktap2 in upstream Xen?  Assuming this is the case, it would be code that wouldn't get much testing by the community, so we would be going it alone here.

So...

> Now that a single guest can handle up to 26 + 26 * 26 = 702 disks, considering
> a 1 TB host with ~1000 1 GB guests, I think our new module parameter should be
> clamped between the round limits of "64" and "512 * 1024", and it should
> default to 256 (to adjust ourselves to upstream). Whoever needs more than the
> default 256 does qualify, after 2.5 years, as a special case, and should be
> willing to set the module parameter. Minor device numbers have 20 bits of
> information, so this should fit.
> 
> Two arguments for lowering the upper limit to something actually sensible:
> - There really shouldn't be hundreds of thousands of /dev/xen/blktap* nodes.
> - Many functions search these arrays in a linear fashion. For example, a
>   sequence of device allocations (via N invocations of get_next_free_dev(),
>   each called from a separate BLKTAP_IOCTL_NEWINTF / _EXT ioctl) takes
>   quadratic time in total.
> 
> BTW. the "MAX_DEV_NAME" macro is superfluous; there's no single application of
> it in the entire tree, not even in upstream.
> 
> Proposition:
> a) Brainstorm an upper limit for the pool size.
> b) Delete MAX_DEV_NAME.
> c) Make the pool size a module parameter.
> d) Send patch to upstream.
> e) Send (potentially modified) patch to rhkernel-list.
> 
> I'm willing to do (b)-(e); please comment on (a), and of course on the idea
> itself. Thank you very much!

I think that we should increase the limit to 256 like you said.  The addition of a module parameter is a nice-to-have; if you do add it, I wouldn't artificially restrict it.  As we like to say, we'll give you all the rope you want; it's up to you whether you choose to strangle yourself with it.

So that means my counter-proposition would be:
a) Delete MAX_DEV_NAME
b) Increase the limit to 256 to match upstream
c) Make the pool size a module parameter
d) Send patch to upstream
e) Send patch to rhkernel-list

Does that work for you?

Chris Lalancette
Comment 7 Laszlo Ersek 2011-02-18 04:14:43 EST
(In reply to comment #6)
> (In reply to comment #5)

> My preferred solution (and the one that I would have tried to
> implement, given enough time) is the kmalloc() + memcpy() + kfree()
> one with the locking.  I don't think it will be that hard to do.

I'm generally not afraid (just very conscious) of locking. Being a
novice in kernel space, I couldn't claim I'm completely aware of my
surroundings all the time, so I try to constrain my modifications to a
scope as narrow as possible.

Perhaps a spinlock would be appropriate here. I'm not sure if a normal
spinlock ensures visibility (cache flushes, "publication safety") -- I
might have to force acquire-release semantics with memory barriers after
storing the new pointer value.

Protecting only the base pointer doesn't seem enough, even; the accesses
to tapfds, tapfds[i], and *tapfds[i] do not form a pure
producer/consumer pattern. Protecting each entry separately is overkill
and not very easy to get right IMHO; protecting all those 1 + n + n
objects with a single spinlock might prove a bottleneck.


> All of that being said, I do agree that it adds some risk to the
> whole thing, and we are getting awfully late in RHEL-5 Xen's lifetime
> to start doing risky things.  Additionally, hasn't blktap mostly been
> replaced by blktap2 in upstream Xen?

I couldn't say; I was surprised they have blktap2 at all.


> I think that we should increase the limit to 256 like you said.  The
> addition of a module parameter is a nice-to-have; if you do add it, I
> wouldn't artificially restrict it.  As we like to say, we'll give you
> all the rope you want; it's up to you whether you choose to strangle
> yourself with it.
>
> So that means my counter-proposition would be:
> a) Delete MAX_DEV_NAME
> b) Increase the limit to 256 to match upstream
> c) Make the pool size a module parameter
> d) Send patch to upstream
> e) Send patch to rhkernel-list
>
> Does that work for you?

It's less work than what I proposed, so obviously it is :)

Though there is a constraint we certainly must put on the upper value.
The array size (after multiplication) must not overflow a size_t. That
is, supposing the module parameter must be a signed integer,

0 < num_tapfds
&& (size_t)num_tapfds <= (size_t)-1 / sizeof tapfds[0]

must hold. (Otherwise, the allocation might "succeed" for an overflown
size_t, and the subsequent initialization loop will trample on
whatever's after the truncated array.) On 64 bit, the second condition
can't yield false; sometimes that condition stand-alone is the source of
a useless gcc warning.

Thank you!
lacos
Comment 8 RHEL Product and Program Management 2011-02-18 08:40:10 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 9 Laszlo Ersek 2011-02-18 11:22:33 EST
Created attachment 479561 [details]
Remove unused MAX_DEV_NAME macro.
Comment 10 Laszlo Ersek 2011-02-18 11:24:16 EST
Created attachment 479562 [details]
Make the maximum number of blktap devices a module parameter.

Testing in progress:

https://brewweb.devel.redhat.com/taskinfo?taskID=3123631
Comment 12 Laszlo Ersek 2011-02-21 10:12:00 EST
Created attachment 479935 [details]
Remove unused MAX_DEV_NAME macro. Make the maximum number of blktap devices a module parameter.

Make limit checks stricter.
Rely on correct implicit conversions instead of explicit casts.
Collapse patches into one.
Comment 13 Andrew Jones 2011-02-21 11:21:15 EST
Comment on attachment 479935 [details]
Remove unused MAX_DEV_NAME macro. Make the maximum number of blktap devices a module parameter.

http://www.kernel.org/doc/Documentation/CodingStyle

The static asserts are unnecessary.

The additional context block for the 'new' is unnecessary.

Using uint32_t, so the limit is the same for both 32-bit and 64-bit compiles, or even something smaller would be better than using size_t-1.

The " As of kernel-2.6.18-245.el5" comment is confusing. It sounds like those leaks were introduced with -245, but really you mean that those leaks have been
there a long time, up to and including -245. I would just drop it from the comment. If someone fixes the leaks, they'd remove the comment anyway.
Comment 14 Paolo Bonzini 2011-02-21 11:39:41 EST
Also, if you really want static asserts the idiom is BUILD_BUG_ON.
Comment 15 Paolo Bonzini 2011-02-21 11:43:48 EST
Bugzilla ate my first comment. :) I would just set a relatively low LIM_TAP_DEV, like 1024 (without any static check: 1024 times any sensible struct size is less than 2^32 or 2^64!).

Don't forget that the limit on the number of event channels in dom0 would be hit before that: only Windows guests are able to tear down block devices and free their event channels.
Comment 16 Laszlo Ersek 2011-02-21 11:59:34 EST
(In reply to comment #14)
> Also, if you really want static asserts the idiom is BUILD_BUG_ON.

Thanks; I was searching for variations on "static_assert" and couldn't remember BUILD_BUG_ON.

(In reply to comment #15)
> I would just set a relatively low
> LIM_TAP_DEV, like 1024 (without any static check: 1024 times any sensible
> struct size is less than 2^32 or 2^64!).

Thank you; this is what I asked for in comment 5 ("Brainstorm an upper limit for the pool size"). I wrote the current patch (attachment 479935 [details]) for the case recommended by Chris in comment 6 ("we'll give you all the rope you
want"). Unsigned overflow is even more rope than that, so I had to make sure that the product can be represented in a size_t (= the type that kmalloc() takes).


> Don't forget that the limit on the number of event channels in dom0 would be
> hit before that: only Windows guests are able to tear down block devices and
> free their event channels.

Doesn't this defeat the entire purpose of this bug? Since guests can now support up to 600-700 virtual block devices, I would think of at least an order of (decimal) magnitude higher limit in the host, like 8192.

(In reply to comment #13)

> The " As of kernel-2.6.18-245.el5" comment is confusing. It sounds like those
> leaks were introduced with -245, but really you mean that those leaks have
> been there a long time, up to and including -245. I would just drop it from
> the comment. If someone fixes the leaks, they'd remove the comment anyway.

I will correct the comment, thanks.
Comment 17 Laszlo Ersek 2011-02-21 12:29:11 EST
Created attachment 479965 [details]
Remove unused MAX_DEV_NAME macro. Make the maximum number of blktap devices a module parameter.

Define LIM_MAX_TAP_DEV as 1024. Consequently:
- Update description of "max_tap_dev" modparam to display LIM_MAX_TAP_DEV as
  well.
- Remove static asserts.
- Use <include/linux/kernel.h> clamp() macro for clamping.

Remove RHEL kernel tag reference from leak comment.

Task info (in progress):

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3128347
Comment 18 Andrew Jones 2011-02-21 13:09:59 EST
I put a link in my first review to the CodingStyle because you have a few things that violate it.

1) you have dyslexic conditions, e.g. 0 != ret. I know the reasoning, but the kernel doesn't do it like that. kernel guys just don't confuse = and ==.
2) you're missing your () for sizeof, it's not required, even by the CodingStyle, but it's the norm.
3) you have {} braces wrapping a single statement, for an if-block. they should never be used for single statements
4) not your fault, but there's a multi-statement line that you touch
  for(i = 0; i < max_tap_dev; i++) translate_domid[i].domid = 0xFFFF;
it would be nice to drop that second statement down where it belongs

sorry for the nitpicking...
Comment 19 Laszlo Ersek 2011-02-22 04:11:04 EST
The Xen code has these:

<include/xen/sched.h>
#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d) * 64)

and

<include/public/xen.h>
/*
 * Event channel endpoints per domain:
 *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
 */
#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)

The latter is present in the kernel tree as well (<include/xen/interface/xen.h>); I'll change the patch to use NR_EVENT_CHANNELS as the upper limit. Thanks Paolo for the hint. I'll also fix the problems listen in comment 18, thanks.
Comment 20 Laszlo Ersek 2011-02-22 07:37:05 EST
Created attachment 480123 [details]
max_tap_dev module parameter; v4

Difference to previous version:
- Use NR_EVENT_CHANNELS instead of LIM_MAX_TAP_DEV.
- Since NR_EVENT_CHANNELS is unsuitable for stringification by the preprocessor,
  change modparam desc to display only default value.
- Fix extra spaces inside parentheses on touched lines (CodingStyle section 3.1)
  and break out loop body in a separate line.
- Switch to clamp_t() from clamp(), as NR_EVENT_CHANNELS is of type "size_t".
- Remove "0 != " where it means success.
- Replace "0 != " with " != 0" where it means failure.
- Remove spurious braces around single statement under "if".
- Use parentheses with sizeof all the time.
Comment 21 Laszlo Ersek 2011-02-22 12:10:02 EST
(In reply to comment #0)

> Currently in the RHEL-5.2 kernel, the blktap driver has a hardcoded limit of
> 100 devices.  This means that the dom0 kernel can only service 100 disks
> total, across all guests.  Upstream has kicked this up to 256; however, there
> really seems to be no reason that this couldn't be made dynamic in the
> drivers/xen/blktap/blktapmain.c code, so you could have basically unlimited
> disks (pending memory requirements, of course).

http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01332.html

> Without replacing the call to register_chrdev() with one to
> __register_chrdev() (available only with 2.6.32 and newer) I
> can't see how you would get beyond 256 devices with the
> changes you propose.

I assumed this was already covered, based on the related bug 442723.

So, what should I do? I could simply bump the original macro from 100 to 256, or I could reimplement register_chrdev() in blktap in a way that I can pass in max_tap_dev:

- alloc_chrdev_region()
- cdev_alloc()
- cdev_init()
- cdev_add()

On the error path:
- cdev_del()
- unregister_chrdev_region()

However, some settings would have to be done by hand:
- setting the cdev->owner,
- setting the name of &cdev->kobject.

Is this bug worth that?
Comment 22 Don Dutile 2011-02-22 13:40:49 EST
(In reply to comment #21)
> (In reply to comment #0)
> 
> > Currently in the RHEL-5.2 kernel, the blktap driver has a hardcoded limit of
> > 100 devices.  This means that the dom0 kernel can only service 100 disks
> > total, across all guests.  Upstream has kicked this up to 256; however, there
> > really seems to be no reason that this couldn't be made dynamic in the
> > drivers/xen/blktap/blktapmain.c code, so you could have basically unlimited
> > disks (pending memory requirements, of course).
> 
> http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01332.html
> 
> > Without replacing the call to register_chrdev() with one to
> > __register_chrdev() (available only with 2.6.32 and newer) I
> > can't see how you would get beyond 256 devices with the
> > changes you propose.
> 
> I assumed this was already covered, based on the related bug 442723.
> 
> So, what should I do? I could simply bump the original macro from 100 to 256,
> or I could reimplement register_chrdev() in blktap in a way that I can pass in
> max_tap_dev:
> 
> - alloc_chrdev_region()
> - cdev_alloc()
> - cdev_init()
> - cdev_add()
> 
> On the error path:
> - cdev_del()
> - unregister_chrdev_region()
> 
> However, some settings would have to be done by hand:
> - setting the cdev->owner,
> - setting the name of &cdev->kobject.
> 
> Is this bug worth that?

NO! ... RHEL5.7 ! .... wire to 256 & move on.

Like this 'feature', it's unlikely another bz could change what should be init'd &/or added to the cdev structure, and thus, what else may need to be
init'd (which this code will sit hidden, and unknown to a modification in the common cdev code),  but.... it could happen, and this (rotten) code could cause mysterious cdev events/failures/secondary-effects due to lack of another element not init'd properly.

If rhel6.1, and able to modify/use common register_chrdev() functions, that'd
be one (out-of-band) step, but to mimic the ops of an existing register_chrdev() with caveats, hoping the common, replicated code doesn't change in a way that buries a potential issue in blktap.... I think there is too much risk for this late in the RHEL5 schedule.

Unfortunately, someone didn't do their scaling homework, and solve this problem earlier in (upstream first and then) RHEL5.

Yet another selling reason for RHEL6! ;-)
Comment 23 Laszlo Ersek 2011-02-23 07:16:12 EST
The upstream commit that changed MAX_TAP_DEV from 100 to 256 is xen-unstable c/s 11868:80b296ec93dc, written by (if I may) our own Steven Rostedt.

The patch raises the limit to 256, and solves the dynamic allocation of tapfd descriptors. It merges the translate_domid array with the tapfds array (by embedding parallel elements of the former in elements of the latter), and adds a wmb() to get_next_free_dev() (see comment 7), besides rewriting most of it.

The upstream patch delays the allocation of *tapfds[i], but the array of pointers (tapfds) itself remains statically allocated and cannot be resized. I think the only thing that we need to backport from it is the MAX_TAP_DEV change.

(I wrote this comment purely to point out the upstream origin of the MAX_TAP_DEV increase.)
Comment 24 Laszlo Ersek 2011-02-23 07:26:32 EST
Created attachment 480437 [details]
Bump MAX_TAP_DEV from 100 to 256. Remove superfluous MAX_DEV_NAME.
Comment 25 Chris Lalancette 2011-02-24 11:42:27 EST
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #0)
> > 
> > > Currently in the RHEL-5.2 kernel, the blktap driver has a hardcoded limit of
> > > 100 devices.  This means that the dom0 kernel can only service 100 disks
> > > total, across all guests.  Upstream has kicked this up to 256; however, there
> > > really seems to be no reason that this couldn't be made dynamic in the
> > > drivers/xen/blktap/blktapmain.c code, so you could have basically unlimited
> > > disks (pending memory requirements, of course).
> > 
> > http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01332.html
> > 
> > > Without replacing the call to register_chrdev() with one to
> > > __register_chrdev() (available only with 2.6.32 and newer) I
> > > can't see how you would get beyond 256 devices with the
> > > changes you propose.
> > 
> > I assumed this was already covered, based on the related bug 442723.
> > 
> > So, what should I do? I could simply bump the original macro from 100 to 256,
> > or I could reimplement register_chrdev() in blktap in a way that I can pass in
> > max_tap_dev:
> > 
> > - alloc_chrdev_region()
> > - cdev_alloc()
> > - cdev_init()
> > - cdev_add()
> > 
> > On the error path:
> > - cdev_del()
> > - unregister_chrdev_region()
> > 
> > However, some settings would have to be done by hand:
> > - setting the cdev->owner,
> > - setting the name of &cdev->kobject.
> > 
> > Is this bug worth that?
> 
> NO! ... RHEL5.7 ! .... wire to 256 & move on.

As that thread pointed out, > 256 minors is possible in 2.6.18, but it is not worth the churn.  You've done due diligence to try to solve this the right way, and we know what the right way would take.  That's always the situation we want to be in; know what the right answer is, and then we can decide whether it is worth it to go down that route.

In this case, I agree that that is more churn than we want in 5.7.  So wire it to 256 and be done with it.

Chris Lalancette
Comment 27 Jarod Wilson 2011-03-03 15:23:42 EST
in kernel-2.6.18-246.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 28 Jarod Wilson 2011-03-03 15:33:15 EST
in kernel-2.6.18-246.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 30 Yuyu Zhou 2011-04-26 08:53:10 EDT
Verified the bug on kernel-xen-2.6.18-257.el5, xen-3.0.3-129.el5.

Steps to verified:
1. Add "fs.aio-max-nr = 262144" to "/etc/sysctl.conf" in host, and execute
"sysctl -p".
2. Create a PV guest and create 256 image files in dom0
3. Attach the 256 images to guest one by one by script.

Results:
Return error since 235th images, all 234 images can be seen and works well in
guest. no other limits hit.

So change the Status to VERIFIED.
Comment 31 errata-xmlrpc 2011-07-21 06:26:32 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Note You need to log in before you can comment on or make changes to this bug.