Bug 1463843

Summary: storaged flooding messages with "Failed to update LVM volume group"
Product: Red Hat Enterprise Linux 7
Reporter: Federico Sun <fsun>
Component: storaged
Assignee: Vratislav Podzimek <vpodzime>
Status: CLOSED ERRATA
QA Contact: xhe <xhe>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.2
CC: bdonahue, vpodzime, xhe
Target Milestone: rc
Keywords: Extras
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: storaged-2.5.2-4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-05 09:56:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Federico Sun 2017-06-21 21:26:06 UTC
Description of problem:

In /var/log/messages, the storaged process floods the log with:

  storaged[13731]: LVM2 plugin: Too many open files

lsof shows a large number of 'pipe' file descriptors held open by storaged's PID; counting occurrences of the PID in a captured lsof output file:

$ grep -c 13731 lsof
82120
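
For reference, a quick way to watch the leak on an affected host (a sketch; assumes the storaged PID of 13731 from above):

# count all open descriptors of the storaged process, and how many of them are pipes
ls /proc/13731/fd | wc -l
ls -l /proc/13731/fd | grep -c pipe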

Version-Release number of selected component (if applicable):

RHEL 7.2
kernel 3.10.0-327.28.2.el7.x86_64
libstoraged-2.5.2-2.el7.x86_64
storaged-2.5.2-2.el7.x86_64
storaged-lvm2-2.5.2-2.el7.x86_64


Additional info:

This issue has already been detected and fixed upstream:

* https://github.com/storaged-project/udisks/issues/45
* https://github.com/storaged-project/udisks/pull/90/commits/583cf3031f3c22cbb95ba507b21859772f8cdf05

But the patch doesn't seem to be included in the latest storaged packages for RHEL 7.3.
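
A quick check for whether an installed build already carries the backport (a sketch; per the "Fixed In Version" field above, the fix landed in storaged-2.5.2-4):

# the fd-leak fix ships in storaged-2.5.2-4 and later (per the Fixed In Version field)
rpm -q storaged storaged-lvm2 libstoraged
rpm -q --changelog storaged | head -n 20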

Comment 2 Vratislav Podzimek 2017-06-23 08:20:40 UTC
(In reply to Federico Sun from comment #0)
> This issue has already been detected and fixed upstream:
> 
> * https://github.com/storaged-project/udisks/issues/45
> * https://github.com/storaged-project/udisks/pull/90/commits/583cf3031f3c22cbb95ba507b21859772f8cdf05
> 
> But the patch doesn't seem to be included in the latest storaged packages for RHEL 7.3.

I'll try to backport and ship it next week.

Comment 3 Vratislav Podzimek 2017-06-27 08:12:51 UTC
I have the patch backported.

Comment 8 xhe@redhat.com 2017-08-11 08:49:41 UTC
For the manual test, I tried to reproduce this on RHEL-7.3 with storaged-2.5.2-2. I started the storaged service and created 1000 LVs (100M-500M per LV) from the same VG; I even created filesystems on these LVs and mounted them. Then I checked the "/proc/425/fd" directory and still didn't hit the issue. Please let me know which steps I missed.

*********** manual details *****************
# rpm -qa|egrep "storaged|kernel"
libstoraged-2.5.2-2.el7.x86_64
storaged-lvm2-2.5.2-2.el7.x86_64
storaged-2.5.2-2.el7.x86_64
kernel-3.10.0-514.el7.x86_64

# storagectl status
MODEL                     REVISION  SERIAL               DEVICE
--------------------------------------------------------------------------
IBM ServeRAID M5110e      3.19      0028f23c21ca1c6819f0ae9640600705 sda     
NETAPP LUN                811a      BVfC5+GmgIm0         dm-3 sdb sdf sdj sdn
NETAPP LUN                811a      BVfC5+GmgInR         dm-4 sdc sdg sdk sdo
NETAPP LUN                811a      BVfC5+GmgInT         dm-5 sdd sdh sdl sdp
NETAPP LUN                811a      BVfC5+GmgInV         dm-6 sde sdi sdm sdq

# create 1000 LVs in VG rhel_storageqe-53, put an ext3 filesystem on each, and mount it
for i in {1..1000}
do
    lvcreate -L 100M -n "lvm_test$i" rhel_storageqe-53
    mkfs.ext3 "/dev/rhel_storageqe-53/lvm_test$i"
    mkdir "/lvm_test$i"
    mount "/dev/rhel_storageqe-53/lvm_test$i" "/lvm_test$i"
done

# ps -aux|grep storaged
root       425  0.1  0.1 465436 33684 ?        Ssl  03:06   0:05 /usr/libexec/storaged/storaged --no-debug
root     19634  0.0  0.0 112652   960 pts/1    S+   04:31   0:00 grep --color=auto storaged

# ls /proc/425/fd
0  1  10  11  12  2  3  4  5  6  7  8  9

# ll /proc/425/fd
total 0
lrwx------. 1 root root 64 Aug 11 03:06 0 -> /dev/null
lrwx------. 1 root root 64 Aug 11 03:06 1 -> /dev/null
lr-x------. 1 root root 64 Aug 11 03:06 10 -> anon_inode:inotify
lrwx------. 1 root root 64 Aug 11 03:06 11 -> socket:[76410]
lrwx------. 1 root root 64 Aug 11 03:06 12 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 2 -> /dev/null
lrwx------. 1 root root 64 Aug 11 03:06 3 -> socket:[76405]
lrwx------. 1 root root 64 Aug 11 03:06 4 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 5 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 11 03:06 6 -> socket:[40385]
lrwx------. 1 root root 64 Aug 11 03:06 7 -> anon_inode:[eventfd]
lr-x------. 1 root root 64 Aug 11 03:06 8 -> /proc/425/mountinfo
lr-x------. 1 root root 64 Aug 11 03:06 9 -> /proc/swaps
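
For cleanup after this test, a minimal sketch assuming the mount points and LV names from the loop above:

# unmount, remove the mount points, and delete the 1000 test LVs
for i in {1..1000}
do
    umount "/lvm_test$i"
    rmdir "/lvm_test$i"
    lvremove -y "/dev/rhel_storageqe-53/lvm_test$i"
done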

Comment 10 xhe@redhat.com 2017-08-28 11:10:52 UTC
I still couldn't reproduce it. For each of the 1000 LVs I renamed the LV back and forth, mounted and unmounted it, and slept 1 second; a rough sketch of the driving loop is below, followed by the actual output.
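
A minimal sketch of that loop, reconstructed from the output below (VG and naming scheme taken from comment 8):

for i in {1..1000}
do
    echo "[$i/1000]"
    # rename the LV and rename it back
    lvrename rhel_storageqe-53 "lvm_$i" "lvm_test$i"
    lvrename rhel_storageqe-53 "lvm_test$i" "lvm_$i"
    # mount and unmount it
    mount "/dev/rhel_storageqe-53/lvm_$i" "/lvm_test$i"
    umount "/dev/rhel_storageqe-53/lvm_$i"
    sleep 1
done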

*************** snip **************
[1/1000]
  Renamed "lvm_1" to "lvm_test1" in volume group "rhel_storageqe-53"
  Renamed "lvm_test1" to "lvm_1" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_1 /lvm_test1
  umount  /dev/rhel_storageqe-53/lvm_1
  sleep 1
[.../1000]
  ...
[899/1000]
  Renamed "lvm_899" to "lvm_test899" in volume group "rhel_storageqe-53"
  Renamed "lvm_test899" to "lvm_899" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_899 /lvm_test899
  umount  /dev/rhel_storageqe-53/lvm_899
[900/1000]
  Renamed "lvm_900" to "lvm_test900" in volume group "rhel_storageqe-53"
  Renamed "lvm_test900" to "lvm_900" in volume group "rhel_storageqe-53"
  mount  /dev/rhel_storageqe-53/lvm_900 /lvm_test900
  umount  /dev/rhel_storageqe-53/lvm_900
  ...
[1000/1000]

# ps -aux|grep storaged
root      4735  0.4  0.0 455116 25636 ?        Ssl  05:39   0:26 /usr/libexec/storaged/storaged --no-debug
root     14207  0.0  0.0 112652   960 pts/4    S+   07:10   0:00 grep --color=auto storaged
root     28236  0.0  0.0 143864  1348 ?        S    03:07   0:00 journalctl -q --output=json --lines=10 --follow -- _SYSTEMD_UNIT=storaged.service + _SYSTEMD_UNIT=udisks2.service + _SYSTEMD_UNIT=dm-event.service + _SYSTEMD_UNIT=smartd.service + _SYSTEMD_UNIT=multipathd.service

#  ls -l /proc/4735/fd
total 0
lrwx------. 1 root root 64 Aug 28 05:39 0 -> /dev/null
lrwx------. 1 root root 64 Aug 28 05:39 1 -> /dev/null
lr-x------. 1 root root 64 Aug 28 05:39 10 -> anon_inode:inotify
lrwx------. 1 root root 64 Aug 28 05:39 11 -> socket:[10198018]
lrwx------. 1 root root 64 Aug 28 05:39 12 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 2 -> /dev/null
lrwx------. 1 root root 64 Aug 28 05:39 3 -> socket:[10131453]
lrwx------. 1 root root 64 Aug 28 05:39 4 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 5 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Aug 28 05:39 6 -> socket:[10191481]
lrwx------. 1 root root 64 Aug 28 05:39 7 -> anon_inode:[eventfd]
lr-x------. 1 root root 64 Aug 28 05:39 8 -> /proc/4735/mountinfo
lr-x------. 1 root root 64 Aug 28 05:39 9 -> /proc/swaps

Comment 12 xhe@redhat.com 2017-08-29 02:17:01 UTC
I tested it on 2.5.2-2, 2.5.2-3 and 2.5.2-4 and have not reproduced it so far.
I exercised the 1000 LVs from the same VG: in a 1000-iteration loop I renamed each LV twice, mounted and unmounted it, and slept 1 second per cycle.

I also tried running three processes at the same time over a 1000+ iteration loop, both with and without the 1 second sleep (a rough sketch follows the list):
1st process - umount lvm
2nd process - rename lvm twice
3rd process - mount lvm
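
A minimal sketch of that parallel run, assuming the same VG and naming scheme as in comment 8 (the exact script is not attached here):

# three concurrent loops hammering the same 1000 LVs (sketch; drop 'sleep 1' for the no-sleep case)
for i in {1..1000}; do umount "/dev/rhel_storageqe-53/lvm_$i"; sleep 1; done &
for i in {1..1000}; do
    lvrename rhel_storageqe-53 "lvm_$i" "lvm_test$i"
    lvrename rhel_storageqe-53 "lvm_test$i" "lvm_$i"
    sleep 1
done &
for i in {1..1000}; do mount "/dev/rhel_storageqe-53/lvm_$i" "/lvm_test$i"; sleep 1; done &
wait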

My test is still running the 1000-LV remove-and-recreate cycle at the moment; I will update the result when the whole run has finished.

Comment 16 xhe@redhat.com 2017-08-29 08:36:25 UTC
According to the upstream test result noted in #c2, the patch fixes the original issue, so I did a sanity-only check (it is hard for me to reproduce without the customer environment).

I verified it as SanityOnly.

Comment 18 errata-xmlrpc 2017-09-05 09:56:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2597