Bug 1463843 - storaged flooding messages with "Failed to update LVM volume group"
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: storaged
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Vratislav Podzimek
QA Contact: xhe@redhat.com
Keywords: Extras
Depends On:
Blocks:
Reported: 2017-06-21 17:26 EDT by Federico Sun
Modified: 2017-09-15 05:14 EDT
CC List: 3 users

See Also:
Fixed In Version: storaged-2.5.2-4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-05 05:56:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Federico Sun 2017-06-21 17:26:06 EDT
Description of problem:

In /var/log/messages, the storaged process is flooding the log with:

  storaged[13731]: LVM2 plugin: Too many open files

From lsof, a large number of 'pipe' file descriptors are open under storaged's PID:

$ grep -c 13731 lsof        # lsof output previously saved to a file named 'lsof'
82120
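
The descriptor count can also be read straight from /proc (a sketch, assuming storaged is still running as PID 13731 as above):

# total open fds of the storaged process, and how many of them are pipes
ls /proc/13731/fd | wc -l
ls -l /proc/13731/fd | grep -c pipe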

Version-Release number of selected component (if applicable):

RHEL 7.2
kernel 3.10.0-327.28.2.el7.x86_64
libstoraged-2.5.2-2.el7.x86_64
storaged-2.5.2-2.el7.x86_64
storaged-lvm2-2.5.2-2.el7.x86_64


Additional info:

This issue has been reported and fixed upstream:

* https://github.com/storaged-project/udisks/issues/45
* https://github.com/storaged-project/udisks/pull/90/commits/583cf3031f3c22cbb95ba507b21859772f8cdf05

But the patch does not appear to be included in the latest storaged packages for RHEL 7.3.
Comment 2 Vratislav Podzimek 2017-06-23 04:20:40 EDT
(In reply to Federico Sun from comment #0)
> This issue has been reported and fixed upstream:
> 
> * https://github.com/storaged-project/udisks/issues/45
> * https://github.com/storaged-project/udisks/pull/90/commits/583cf3031f3c22cbb95ba507b21859772f8cdf05
> 
> But the patch does not appear to be included in the latest storaged packages for RHEL 7.3.

I'll try to backport and ship it next week.
Comment 3 Vratislav Podzimek 2017-06-27 04:12:51 EDT
I have the patch backported.
Comment 8 xhe@redhat.com 2017-08-11 04:49:41 EDT
For the manual test I tried to reproduce the issue on RHEL-7.3 with storaged-2.5.2-2: I started the storaged service and created 1000 LVs (100M-500M per LV) from the same VG, then made filesystems on them and mounted them. Checking the "/proc/425/fd" directory afterwards, I still did not hit the issue. Please let me know which steps I missed.

*********** manual details *****************
# rpm -qa|egrep "storaged|kernel"
libstoraged-2.5.2-2.el7.x86_64
storaged-lvm2-2.5.2-2.el7.x86_64
storaged-2.5.2-2.el7.x86_64
kernel-3.10.0-514.el7.x86_64

# storagectl status
MODEL                     REVISION  SERIAL               DEVICE
--------------------------------------------------------------------------
IBM ServeRAID M5110e      3.19      0028f23c21ca1c6819f0ae9640600705 sda     
NETAPP LUN                811a      BVfC5+GmgIm0         dm-3 sdb sdf sdj sdn
NETAPP LUN                811a      BVfC5+GmgInR         dm-4 sdc sdg sdk sdo
NETAPP LUN                811a      BVfC5+GmgInT         dm-5 sdd sdh sdl sdp
NETAPP LUN                811a      BVfC5+GmgInV         dm-6 sde sdi sdm sdq

# create, format, and mount 1000 test LVs in VG rhel_storageqe-53
for i in {1..1000}
do
    lvcreate -L 100M -n "lvm_test$i" rhel_storageqe-53
    mkfs.ext3 "/dev/rhel_storageqe-53/lvm_test$i"
    mkdir "/lvm_test$i"
    mount "/dev/rhel_storageqe-53/lvm_test$i" "/lvm_test$i"
done
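
While the loop runs, the storaged descriptor count can be watched with something like this (a sketch, assuming the storaged PID 425 shown below):

# print the number of open fds of storaged once per minute
while true; do
    echo "$(date +%T) $(ls /proc/425/fd | wc -l) fds"
    sleep 60
done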

# ps -aux|grep storaged
root       425  0.1  0.1 465436 33684 ?        Ssl  03:06   0:05 /usr/libexec/storaged/storaged --no-debug
root     19634  0.0  0.0 112652   960 pts/1    S+   04:31   0:00 grep --color=auto storaged

# ls /proc/425/fd
0  1  10  11  12  2  3  4  5  6  7  8  9

# ll /proc/425/fd
total 0
lrwx------. 1 root root 64 Aug 11 03:06 0 -> /dev/null
lrwx------. 1 root root 64 Aug 11 03:06 1 -> /dev/null
lr-x------. 1 root root 64 Aug 11 03:06 10 -> anon_inode:inotify
lrwx------. 1 root root 64 Aug 11 03:06 11 -> socket:[76410]
lrwx------. 1 root root 64 Aug 11 03:06 12 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 2 -> /dev/null
lrwx------. 1 root root 64 Aug 11 03:06 3 -> socket:[76405]
lrwx------. 1 root root 64 Aug 11 03:06 4 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 5 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 6 -> socket:[40385]
lrwx------. 1 root root 64 Aug 11 03:06 7 -> anon_inode:[eventfd]
lr-x------. 1 root root 64 Aug 11 03:06 8 -> /proc/425/mountinfo
lr-x------. 1 root root 64 Aug 11 03:06 9 -> /proc/swaps
Comment 10 xhe@redhat.com 2017-08-28 07:10:52 EDT
I still could not reproduce it. For each of the 1000 LVs I renamed the LV twice, mounted and unmounted it, and slept 1 second.
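
The per-LV loop was roughly the following (a sketch reconstructed from the snip below; the lvm_N / /lvm_testN names are the ones used in this test):

# rename each LV twice, mount and umount it, then sleep 1 second
for i in {1..1000}
do
    lvrename rhel_storageqe-53 "lvm_$i" "lvm_test$i"
    lvrename rhel_storageqe-53 "lvm_test$i" "lvm_$i"
    mount "/dev/rhel_storageqe-53/lvm_$i" "/lvm_test$i"
    umount "/dev/rhel_storageqe-53/lvm_$i"
    sleep 1
done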

*************** snip **************
[1/1000]
  Renamed "lvm_1" to "lvm_test1" in volume group "rhel_storageqe-53"
  Renamed "lvm_test1" to "lvm_1" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_1 /lvm_test1
  umount  /dev/rhel_storageqe-53/lvm_1
  sleep 1
[.../1000]
  ...
[899/1000]
  Renamed "lvm_899" to "lvm_test899" in volume group "rhel_storageqe-53"
  Renamed "lvm_test899" to "lvm_899" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_899 /lvm_test899
  umount  /dev/rhel_storageqe-53/lvm_899
[900/1000]
  Renamed "lvm_900" to "lvm_test900" in volume group "rhel_storageqe-53"
  Renamed "lvm_test900" to "lvm_900" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_900 /lvm_test900
  umount  /dev/rhel_storageqe-53/lvm_900
  ...
[1000/1000]

# ps -aux|grep storaged
root      4735  0.4  0.0 455116 25636 ?        Ssl  05:39   0:26 /usr/libexec/storaged/storaged --no-debug
root     14207  0.0  0.0 112652   960 pts/4    S+   07:10   0:00 grep --color=auto storaged
root     28236  0.0  0.0 143864  1348 ?        S    03:07   0:00 journalctl -q --output=json --lines=10 --follow -- _SYSTEMD_UNIT=storaged.service + _SYSTEMD_UNIT=udisks2.service + _SYSTEMD_UNIT=dm-event.service + _SYSTEMD_UNIT=smartd.service + _SYSTEMD_UNIT=multipathd.service

#  ls -l /proc/4735/fd
total 0
lrwx------. 1 root root 64 Aug 28 05:39 0 -> /dev/null
lrwx------. 1 root root 64 Aug 28 05:39 1 -> /dev/null
lr-x------. 1 root root 64 Aug 28 05:39 10 -> anon_inode:inotify
lrwx------. 1 root root 64 Aug 28 05:39 11 -> socket:[10198018]
lrwx------. 1 root root 64 Aug 28 05:39 12 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 2 -> /dev/null
lrwx------. 1 root root 64 Aug 28 05:39 3 -> socket:[10131453]
lrwx------. 1 root root 64 Aug 28 05:39 4 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 5 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 6 -> socket:[10191481]
lrwx------. 1 root root 64 Aug 28 05:39 7 -> anon_inode:[eventfd]
lr-x------. 1 root root 64 Aug 28 05:39 8 -> /proc/4735/mountinfo
lr-x------. 1 root root 64 Aug 28 05:39 9 -> /proc/swaps
Comment 12 xhe@redhat.com 2017-08-28 22:17:01 EDT
I tested on 2.5.2-2, 2.5.2-3 and 2.5.2-4 and have not reproduced it so far.
I exercised the 1000 LVs from the same VG: in a 1000-iteration loop I renamed each LV twice, mounted and unmounted it, and slept 1 second per cycle.

I also tried running three processes in parallel, each in a 1000+ iteration loop, both with and without the 1-second sleep (roughly as sketched below):
1st process - umount lvm
2nd process - rename lvm twice
3rd process - mount lvm

My test is still running the 1000-LV remove-and-recreate pass at the moment; I will update the result when the whole run has finished.
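
A sketch of the parallel run (hypothetical wrapper functions; the LV and mount-point names follow the lvm_N / /lvm_testN pattern from the earlier loops):

# run the three loops concurrently against VG rhel_storageqe-53
umount_loop() { for i in {1..1000}; do umount "/dev/rhel_storageqe-53/lvm_$i" 2>/dev/null; done; }
rename_loop() { for i in {1..1000}; do
                    lvrename rhel_storageqe-53 "lvm_$i" "lvm_test$i"
                    lvrename rhel_storageqe-53 "lvm_test$i" "lvm_$i"
                done; }
mount_loop()  { for i in {1..1000}; do mount "/dev/rhel_storageqe-53/lvm_$i" "/lvm_test$i" 2>/dev/null; done; }

umount_loop & rename_loop & mount_loop &
wait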
Comment 16 xhe@redhat.com 2017-08-29 04:36:25 EDT
According to the upstream test result referenced in #c2, the fix resolves the original issue, so I did a SanityOnly verification (it is hard for me to reproduce without the customer environment).

I verified it as SanityOnly.
Comment 18 errata-xmlrpc 2017-09-05 05:56:15 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2597
