Bug 1600161 - gnome-settings-daemon account plugin polling accountsservice in a continous loop
Summary: gnome-settings-daemon account plugin polling accountsservice in a continous loop
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gnome-settings-daemon   
(Show other bugs)
Version: 7.5
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Ray Strode [halfline]
QA Contact: Desktop QE
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1621561 1656436
TreeView+ depends on / blocked
 
Reported: 2018-07-11 14:51 UTC by jigar
Modified: 2019-03-07 15:15 UTC (History)
30 users (show)

Fixed In Version: gnome-settings-daemon-3.28.1-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
account: don't poll more frequently than notification period (5.05 KB, patch)
2018-11-09 15:56 UTC, Ray Strode [halfline]
no flags Details | Diff
test patch packages from halfline (3.64 MB, application/zip)
2018-11-30 16:22 UTC, Joe Wright
no flags Details

Description jigar 2018-07-11 14:51:30 UTC
Description of problem ( At Customer's end ) : On a VM cloned test environment, the process polkitd is constantly consuming about 15% of CPU.  This behavior is not seen in the primary environment (which was recently upgraded to 7.5 and used as a template to create a VM clone). Since the upgrade to 7.5, it is noticed that gnome-shell is very active (you can constantly see it popping up as top CPU consumer on the "top" window. our "nomachine" client also kept on freezing once the screensaver kicks in and locks the window. This behaviour started after the upgrade to 7.5 from 7.4.


Version-Release number of selected component (if applicable): 
polkit-0.112-14.el7

How reproducible: Frequently at customer's end

Actual results: Polkitd consuming 15% of CPU


Expected results: Polkitd shouldn't consume high CPU continuously 


Additional info:

- appears to be polling 'GetPasswordExpirationPolicy' constantly
- strace reflects the polling issue
- no centralized authentication in use
- confirmed that the frequent logins of root are automation and normal behaviour for their environment

Comment 2 David Rheinsberg 2018-07-12 14:57:55 UTC
(In reply to jigar from comment #0)
> Additional info:
> 
> - appears to be polling 'GetPasswordExpirationPolicy' constantly

Who polls this?

> - strace reflects the polling issue

Strace on what?

> - no centralized authentication in use

Can you elaborate what you mean with that?

> - confirmed that the frequent logins of root are automation and normal
> behaviour for their environment

Which "frequent logins of root"? What do you mean with that?

Comment 3 Yan Li 2018-07-18 00:24:30 UTC
It is gsd-account that keeps polling polkit for GetPasswordExpirationPolicy. 

Following is my analysis. This is a bug in gnome-settings-daemon:plugins/account/gsd-account-manager.c. A timeout is added to check the password expiration time, and a notification would be shown a few days before the password expires. The timeout queues itself back after being triggered. However, the timeout relies on manager->priv->last_notify_time, which is _ONLY_ updated when a notification is shown, but not after a check. Therefore, the timeout is added with 0 second and is triggered again instantly after it is added, causing the plugin to keep busy checking password expiration policy.

This bug should be reassigned to gsd maintainer.

Comment 4 Alessandro Suardi 2018-08-01 12:32:11 UTC
I can observe this on OL7.5 as well (2 laptops) and can add to observations at #3 that:

a) this issue appears to pop up for me after suspend-to-memory/resume cycles (but not in 100% cases)
b) strace'ing of /usr/libexec/gsd-account in a "normal" situation shows no ordinary activity on the daemon's threads (4 threads on restart_syscall)
c) strace'ing of /usr/libexec/gsd-account in a "bad" situation shows looping of TID 4098 as described in #3 above:


[asuardi@tippy ~]$ strace -fp 4098
strace: Process 4098 attached with 4 threads
[pid  4138] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid  4112] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid  4110] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid  4098] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid  4112] <... restart_syscall resumed> ) = 1
[pid  4112] recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\0010\0\0\0\0257\16\0005\0\0\0", 16}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 16
[pid  4112] poll([{fd=8, events=POLLIN}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
[pid  4112] recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\10\1g\0\6xxxxxx\0\0\0\0\0\5\1u\0\0\275\4\0\6\1s\0\6\0\0\0"..., 104}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 104
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid  4112] write(3, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  4098] <... restart_syscall resumed> ) = 1
[pid  4112] poll([{fd=8, events=POLLIN}], 1, 0 <unfinished ...>
[pid  4098] read(3,  <unfinished ...>
[pid  4112] <... poll resumed> )        = 0 (Timeout)
[pid  4098] <... read resumed> "\1\0\0\0\0\0\0\0", 16) = 8
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 3, -1 <unfinished ...>
[pid  4098] access("MARK: gsd-account on_got_password_expiration_policy: start ", F_OK) = -1 ENOENT (No such file or directory)
[pid  4098] access("MARK: gsd-account on_got_password_expiration_policy: end ", F_OK) = -1 ENOENT (No such file or directory)
[pid  4098] poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)
[pid  4098] write(6, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid  4112] <... poll resumed> )        = 1 ([{fd=6, revents=POLLIN}])
[pid  4098] <... write resumed> )       = 8
[pid  4112] futex(0x1639c80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4098] futex(0x1639c80, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  4112] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid  4098] <... futex resumed> )       = 0
[pid  4112] read(6,  <unfinished ...>
[pid  4098] futex(0x1639c80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4112] <... read resumed> "\1\0\0\0\0\0\0\0", 16) = 8
[pid  4112] futex(0x1639c80, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  4098] <... futex resumed> )       = 0
[pid  4112] futex(0x7f8238003f50, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4098] futex(0x1639c80, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  4098] futex(0x7f8238003f50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  4112] <... futex resumed> )       = 0
[pid  4098] <... futex resumed> )       = 1
[pid  4112] futex(0x7f8238003f50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  4098] poll([{fd=3, events=POLLIN}], 1, 25000 <unfinished ...>
[pid  4112] <... futex resumed> )       = 0
[pid  4112] sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\1\0\0\0\0\1\275\4\0\225\0\0\0\10\1g\0\0\0\0\0\1\1o\0\"\0\0\0"..., 168}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 168
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 3, 0) = 0 (Timeout)
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=8, revents=POLLIN}])
[pid  4112] recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\0010\0\0\0\0307\16\0005\0\0\0", 16}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 16
[pid  4112] poll([{fd=8, events=POLLIN}], 1, 0) = 1 ([{fd=8, revents=POLLIN}])
[pid  4112] recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\10\1g\0\6xxxxxx\0\0\0\0\0\5\1u\0\1\275\4\0\6\1s\0\6\0\0\0"..., 104}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 104
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid  4112] write(3, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  4098] <... poll resumed> )        = 1 ([{fd=3, revents=POLLIN}])
[pid  4112] poll([{fd=8, events=POLLIN}], 1, 0 <unfinished ...>
[pid  4098] read(3,  <unfinished ...>
[pid  4112] <... poll resumed> )        = 0 (Timeout)
[pid  4098] <... read resumed> "\1\0\0\0\0\0\0\0", 16) = 8
[pid  4112] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 3, -1 <unfinished ...>
[pid  4098] access("MARK: gsd-account on_got_password_expiration_policy: start ", F_OK) = -1 ENOENT (No such file or directory)
[pid  4098] access("MARK: gsd-account on_got_password_expiration_policy: end ", F_OK) = -1 ENOENT (No such file or directory)
[pid  4098] poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)
[pid  4098] write(6, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid  4112] <... poll resumed> )        = 1 ([{fd=6, revents=POLLIN}])
[pid  4098] <... write resumed> )       = 8
[pid  4112] futex(0x1639c80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  4098] futex(0x1639c80, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  4112] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)

[etc.]

d) [partly offtopic, but maybe can add further light to interactions between polkitd and this accounts daemon issue] restarting polkitd using instructions in comment #5 in bug 1235681 eventually causes my user to lose GNOME desktop permissions to:

  1. shutdown/suspend the laptop
  2. open up a VPN connection
 (both functionalities available in top right corner GNOME menu)


Relevant RPMS -

gnome-settings-daemon-3.26.2-9.el7.x86_64
accountsservice-0.6.45-7.el7.x86_64
 
This issue also appears to be reported on CentOS 7 systems (top CPU using processes are the same I can observe on my laptops): https://www.centos.org/forums/viewtopic.php?t=67547

Comment 5 Emanuel 2018-08-10 08:41:40 UTC
(In reply to Yan Li from comment #3)
> It is gsd-account that keeps polling polkit for GetPasswordExpirationPolicy. 
> 
> Following is my analysis. This is a bug in
> gnome-settings-daemon:plugins/account/gsd-account-manager.c. A timeout is
> added to check the password expiration time, and a notification would be
> shown a few days before the password expires. The timeout queues itself back
> after being triggered. However, the timeout relies on
> manager->priv->last_notify_time, which is _ONLY_ updated when a notification
> is shown, but not after a check. Therefore, the timeout is added with 0
> second and is triggered again instantly after it is added, causing the
> plugin to keep busy checking password expiration policy.
> 
> This bug should be reassigned to gsd maintainer.

Same issue here with several RHEL 7.5 servers and Yan Li's workaround works fine.

Comment 7 Paul F. Richards 2018-08-20 15:33:07 UTC
A user name mlogic on the centos forum has found a workaround that may or may not be suitable for your environment:

"A workaround that I've been using is to remove /etc/xdg/autostart/org.gnome.SettingsDaemon.Account.desktop until the above mentioned bug is fixed. The only function you will lose is the notification of expiring account password."

I can also verify this workaround resolved the high cpu usage of polkitd on two systems.

Comment 12 Ray Strode [halfline] 2018-11-09 14:48:59 UTC
easiest work around is to set a notify period of 0.


gsettings set org.gnome.settings-daemon.plugins.account notify-period 0

Comment 13 Tom Sorensen 2018-11-09 15:28:49 UTC
The work around in comment #12 is effective. Obviously it disables the notification in question, but not an issue for me. This reduced both polkitd and dbus-daemon CPU usage to 0, while previously they had been the top CPU users on my system.

Comment 14 Ray Strode [halfline] 2018-11-09 15:56 UTC
Created attachment 1503706 [details]
account: don't poll more frequently than notification period

At the moment, if an account has no reason to receive a notification,
the account plugin ends up polling continuously, desperately,
and unsuccessfully trying to find a reason to notify. That leads
to unnecessary CPU utilization.

The reason is an apparent think-o in the code.  The code tracks
the last time a notification was shown, so it knows when to show
the next notification later, even if the notification period, or
account policy is updated by the admin in the interim.

The problem is that it's wrong to look at the last notification
time if there's no reason to show a notification.  In that case
the wakeup is merely to poll updates on the account policy.

This commit addresses the problem by only looking at the previous
notification time, if it was within the current notification period.

Comment 15 Ray Strode [halfline] 2018-11-09 15:58:11 UTC
Yan Li, thank you for investigating this issue.

I believe attachment 1503706 [details] should address the problem, though I haven't tested it yet.

Comment 16 Jeong Han-Bit 2018-11-14 00:20:55 UTC
This problem also occurs with RHEL7.6.

Comment 22 Joe Wright 2018-11-30 16:22 UTC
Created attachment 1510185 [details]
test patch packages from halfline


Note You need to log in before you can comment on or make changes to this bug.