Bug 1379623

Summary: Do not limit rate of lvm2-pvscan@major:minor.service systemd unit runs - avoid "Start request repeated too quickly"
Product: Red Hat Enterprise Linux 7 Reporter: Peter Rajnoha <prajnoha>
Component: lvm2Assignee: Peter Rajnoha <prajnoha>
lvm2 sub component: LVM Metadata / lvmetad QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: agk, bhu, fsumsal, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, rbednar, zkabelac
Version: 7.3   
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.171-2.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 21:47:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peter Rajnoha 2016-09-27 09:00:52 UTC
By default, systemd limits the rate of service runs, from man 5 systemd.unit:

"By default, units which are started more than 5 times within 10 seconds are not permitted to start any more times until the 10 second interval ends."

However, using such limit for lvm2-pvscan@major:minor.service instances can cause lvmetad to not see the PV or vice versa - to still hold the PV in cache if it's already removed from system due to the fact that pvscan might not be run in case the rate limit is hit for the service.

For this reason, we need to disable this limit.

For example:

[0] fedora/~ # while true; do systemctl stop lvm2-pvscan@252:2.service; systemctl 
start lvm2-pvscan@252:2.service; done
Job for lvm2-pvscan@252:2.service failed. See "systemctl status lvm2-pvscan@252:2.service" and "journalctl -xe" for details.

[0] fedora/~ # systemctl status lvm2-pvscan@252:2.service
* lvm2-pvscan@252:2.service - LVM2 PV scan on device 252:2
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: inactive (dead) since Tue 2016-09-27 10:56:57 CEST; 23s ago
     Docs: man:pvscan(8)
  Process: 3487 ExecStop=/usr/sbin/lvm pvscan --cache %i (code=exited, status=0/SUCCESS)
  Process: 3478 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i (code=exited, status=0/SUCCESS)
 Main PID: 3478 (code=exited, status=0/SUCCESS)

Sep 27 10:56:57 fedora.virt systemd[1]: Stopped LVM2 PV scan on device 252:2.
Sep 27 10:56:57 fedora.virt systemd[1]: lvm2-pvscan@252:2.service: Start request repeated too quickly.
Sep 27 10:56:57 fedora.virt systemd[1]: Failed to start LVM2 PV scan on device 252:2.


See also bug #1331709 comment #107 where this caused an issue in real environment (anaconda installer).

The rate limiting is disabled with this patch:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=0a480c5c52bd34a1eb8337d59a7f4cc60317d2b1

Comment 1 Peter Rajnoha 2017-01-31 12:08:21 UTC
The StartLimitInterval directive is recognized under [Service] section in older versions of systemd while in newer ones, it's in [Unit] section so make sure the patch is properly edited for the version of systemd we have in RHEL7!!!

Comment 4 Frantisek Sumsal 2017-04-12 09:47:17 UTC
As Peter mentioned in comment #1, in the current RHEL 7 systemd version the StartLimitInterval directive is recognized under [Service] section (man systemd.service). Current lvm2 version includes this directive under [Unit] section, which triggers following warning (thanks mmalik for reporting this):

systemd[1]: [/usr/lib/systemd/system/lvm2-pvscan@.service:5] Unknown lvalue 'StartLimitInterval' in section 'Unit'

Comment 5 Roman Bednář 2017-05-16 09:38:10 UTC
Thank you for the comment, marking this as failed QA until fixed.
After moving 'StartLimitInterval' under [Service] section limit change applied correctly.


1) error in log when using the patched unit file:

May 16 11:16:03 virt-378 systemd: Reloading.
May 16 11:16:03 virt-378 systemd: [/usr/lib/systemd/system/lvm2-pvscan@.service:10] Unknown lvalue 'StartLimitInterval' in section 'Unit'
May 16 11:16:20 virt-378 systemd: Stopping LVM2 PV scan on device 252:2...
May 16 11:16:20 virt-378 systemd: Stopped LVM2 PV scan on device 252:2.
May 16 11:16:23 virt-378 systemd: Starting LVM2 PV scan on device 252:2...
May 16 11:16:23 virt-378 systemd: Started LVM2 PV scan on device 252:2.


2) limit is still applied by systemd

# while true; do systemctl stop lvm2-pvscan@252:2.service;systemctl start lvm2-pvscan@252:2.service;done
Job for lvm2-pvscan@252:2.service failed because start of the service was attempted too often. See "systemctl status lvm2-pvscan@252:2.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed lvm2-pvscan@252:2.service" followed by "systemctl start lvm2-pvscan@252:2.service" again.
Job for lvm2-pvscan@252:2.service failed because start of the service was attempted too often. See "systemctl status lvm2-pvscan@252:2.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed lvm2-pvscan@252:2.service" followed by "systemctl start lvm2-pvscan@252:2.service" again.
.......

3) edit unit file, move limit to service section

# cat /lib/systemd/system/lvm2-pvscan@.service 
[Unit]
Description=LVM2 PV scan on device %i
Documentation=man:pvscan(8)
DefaultDependencies=no
BindsTo=dev-block-%i.device
Requires=lvm2-lvmetad.socket
After=lvm2-lvmetad.socket lvm2-lvmetad.service
Before=shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i
ExecStop=/usr/sbin/lvm pvscan --cache %i
StartLimitInterval=0


4) reload unit files
# systemctl daemon-reload

5) try again, limit is overridden

# for i in {1..20}; do systemctl stop lvm2-pvscan@252:2.service;systemctl start lvm2-pvscan@252:2.service;done
# systemctl status lvm2-pvscan@252:2.service
● lvm2-pvscan@252:2.service - LVM2 PV scan on device 252:2
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Tue 2017-05-16 11:25:12 CEST; 3s ago
     Docs: man:pvscan(8)
  Process: 29729 ExecStop=/usr/sbin/lvm pvscan --cache %i (code=exited, status=0/SUCCESS)
  Process: 29738 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i (code=exited, status=0/SUCCESS)
 Main PID: 29738 (code=exited, status=0/SUCCESS)

May 16 11:25:12 virt-378.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting LVM2 PV scan on device 252:2...
May 16 11:25:12 virt-378.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started LVM2 PV scan on device 252:2.


NOTE:
Disregard my Comment 2. Enabling limits in system.conf does not have to be done, they're enabled by default.

Comment 7 Roman Bednář 2017-05-31 12:18:21 UTC
Verified. 'StartLimitInterval' has moved to [Service] section in lvm2-pvscan unit file (lvm2-2.02.171-2)

# cat /usr/lib/systemd/system/lvm2-pvscan@.service
[Unit]
Description=LVM2 PV scan on device %i
Documentation=man:pvscan(8)
DefaultDependencies=no
BindsTo=dev-block-%i.device
Requires=lvm2-lvmetad.socket
After=lvm2-lvmetad.socket lvm2-lvmetad.service
Before=shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i
ExecStop=/usr/sbin/lvm pvscan --cache %i
StartLimitInterval=0


3.10.0-671.el7.x86_64

lvm2-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-libs-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-cluster-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 17:15:46 CEST 2017
cmirror-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017

Comment 8 errata-xmlrpc 2017-08-01 21:47:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2222