Bug 1695879 - large number of luns reports "failed errno 24"/"Too many open files" when running lvm commands on RHEL7.6 [rhel-7.6.z]
Summary: large number of luns reports "failed errno 24"/"Too many open files" when running lvm commands on RHEL7.6 [rhel-7.6.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Marian Csontos
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1691277
Blocks:
 
Reported: 2019-04-03 20:21 UTC by RAD team bot copy to z-stream
Modified: 2019-07-23 01:54 UTC
CC List: 14 users

Fixed In Version: lvm2-2.02.180-10.el7_6.7
Doc Type: If docs needed, set a value
Doc Text:
On a system with around 1000 devices, LVM would fail to open devices because of the default open file limit. This fix avoids the problem by closing non-LVM devices before reaching the limit.
Clone Of: 1691277
Environment:
Last Closed: 2019-04-23 14:28:11 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2019:0814 (last updated 2019-04-23 14:28:15 UTC)

Description RAD team bot copy to z-stream 2019-04-03 20:21:57 UTC
This bug has been copied from bug #1691277 and has been proposed to be backported to 7.6 z-stream (EUS).

Comment 6 David Teigland 2019-04-04 14:38:53 UTC
I've not been able to reproduce this with or without lvmetad. I have a suspicion it could be a side effect of the udev problems; the EMFILE errors first appear immediately after those udev issues. Could you try setting obtain_device_list_from_udev=0 in lvm.conf and see if this still happens?
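
For reference, a minimal sketch of the setting being suggested; the surrounding section layout is illustrative and should be merged into your existing /etc/lvm/lvm.conf rather than copied wholesale:

    # /etc/lvm/lvm.conf (sketch only; merge into the existing "devices" section)
    devices {
        # Scan /dev directly instead of asking udev for the device list
        obtain_device_list_from_udev = 0
    }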

Comment 8 David Teigland 2019-04-04 16:06:26 UTC
There are other commits in the stable branch that fix this issue (so stable and 7.7 do not have this problem). So it looks like your testing has validated the original fix for bug 1691277 (there's no more problem when not using lvmetad), but it has also uncovered the need for additional fixes to make lvmetad work with this many devices.

One or more of the following commits from stable would need to be backported to 7.6.z, but it's not clear how many of them can be cherry-picked directly.  Some may depend on other unrelated changes.  This may become more backporting than is appropriate for zstream.


commit 9799c8da07b77844451c64bcbbce0d9d43ce2552
Author: David Teigland <teigland>
Date:   Tue Nov 6 16:03:17 2018 -0600

    devices: reuse bcache fd when getting block size
    
    This avoids an unnecessary open() on the device.

commit f7ffba204e06ae432ae2c7943cb41eec5b8e8bb1
Author: David Teigland <teigland>
Date:   Tue Jun 26 12:05:39 2018 -0500

    devs: use bcache fd for read ahead ioctl
    
    to avoid an unnecessary open of the device in
    most cases.

commit 73578e36faa78c616716617a83083cc3a31ba03f
Author: David Teigland <teigland>
Date:   Fri May 11 14:28:46 2018 -0500

    dev_cache: remove the lvmcache check when closing fd
    
    This is no longer used since devices are not held
    open in dev_cache.

commit 3e3cb22f2a115f71f883a75c7840ab271bd83454
Author: David Teigland <teigland>
Date:   Fri May 11 14:25:08 2018 -0500

    dev_cache: fix close in utility functions
    
    All these functions are now used as utilities,
    e.g. for ioctl (not for io), and need to
    open/close the device each time they are called.
    (Many of the opens can probably be eliminated by
    just using the bcache fd for the ioctl.)

commit ccab54677c9f92cf1bd11895251799c043a57602
Author: David Teigland <teigland>
Date:   Fri May 11 13:53:19 2018 -0500

    dev_cache: fix close in dev_get_block_size
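
As a rough sketch of what such a backport attempt could look like (branch names below are placeholders, not the actual RHEL dist-git layout, and conflicts are likely given the dependencies mentioned above):

    # Sketch only: apply the stable fixes, oldest first, onto a 7.6.z work branch.
    git checkout -b lvm2-7.6.z-backport origin/stable-7.6
    git cherry-pick ccab54677c9f 3e3cb22f2a11 73578e36faa7 \
                    f7ffba204e06 9799c8da07b7
    # Resolve any conflicts, rebuild, and retest with ~1000 devices.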

Comment 9 Marian Csontos 2019-04-09 12:42:17 UTC
To summarize, when there are many PVs in the system:

- async io, supposed to speed things up, cannot be used because of Bug 1656498,
- and lvmetad, supposed to speed things up, cannot be used because of this bug.

So sync io without lvmetad is the only option. Isn't that hurting performance badly?

Comment 10 David Teigland 2019-04-09 15:05:16 UTC
(In reply to Marian Csontos from comment #9)
> To summarize, when there are many PVs in the system:
> 
> - async io, supposed to speed things up, cannot be used because of Bug 1656498,

That shouldn't be related to the number of devices.  It's caused by other unknown software that's using all the aio contexts.  A user can simply increase the number of aio contexts on the system if they want lvm to use aio instead of falling back to sync io.
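
Purely as an illustration (not part of the original comment), the system-wide AIO context limit is controlled by the fs.aio-max-nr sysctl; the value below is an arbitrary example:

    # Sketch: check and raise the system-wide AIO context limit (value is an example)
    sysctl fs.aio-max-nr                    # show the current limit (typically 65536)
    sysctl -w fs.aio-max-nr=1048576         # raise it for the running system
    # To persist, place the same setting in a file under /etc/sysctl.d/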

> - and lvmetad, supposed to speed things up, cannot be used because of this bug.

Just to clarify, this specific bug appears to be fixed, but there are other issues mentioned in comment 8 that will cause similar problems at around 1000 devices.  That issue can also be avoided by simply increasing the open fd limit.  If this is a problem we could open a new bug to do zstream backports of some of the other commits in comment 8.
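
For illustration only, the open fd limit can be raised either for the shell running the LVM commands or persistently via /etc/security/limits.conf; the values below are examples:

    # Sketch: inspect and raise the open file descriptor limit (values are examples)
    ulimit -n                               # current soft limit, commonly 1024
    ulimit -n 4096                          # raise it for the current shell
    # Persistent example in /etc/security/limits.conf:
    #   root  soft  nofile  4096
    #   root  hard  nofile  8192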

> So sync io without lvmetad is the only option. Isn't that hurting performance badly?

Comment 11 Marian Csontos 2019-04-10 14:15:11 UTC
Martin, are you happy with the above explanation?

Comment 13 Marian Csontos 2019-04-11 12:48:25 UTC
David, could you provide a doc string, please?

Comment 15 kailas 2019-04-22 15:25:56 UTC
Hello,

When are we going to release the patch for this?

Comment 17 errata-xmlrpc 2019-04-23 14:28:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0814

