Bug 1837808 - NIS with systemd-logind.service: start operation timed out. Terminating. Got lookup error: io.systemd.TimedOut
Summary: NIS with systemd-logind.service: start operation timed out. Terminating. Got ...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: nss_nis
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-20 03:30 UTC by RobbieTheK
Modified: 2021-05-25 17:21 UTC
CC: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-05-25 17:21:38 UTC
Type: Bug
Embargoed:


Attachments
here is a strace on systemd, PID 1 and what happens when starting systemd-logind (133.23 KB, text/plain)
2020-05-20 04:29 UTC, RobbieTheK
strace on systemd-logind as it is trying to start (34.74 KB, text/plain)
2020-05-20 04:36 UTC, RobbieTheK
strace with -f (65.55 KB, text/plain)
2020-05-21 14:44 UTC, RobbieTheK


Links
GitHub systemd/systemd issue 15316 (open): "Unexpected error response from GetNameOwner(): Connection terminated" messages + boot takes a lot of time (last updated 2021-02-01 11:59:51 UTC)
Red Hat Bugzilla bug 1829572 (CLOSED): Login fails after upgrade to Fedora 32 (last updated 2021-02-22 00:41:40 UTC)

Description RobbieTheK 2020-05-20 03:30:40 UTC
Just upgraded to Fedora 32, still running NIS/ypserv, and now getting very slow logins; systemd-logind is not starting. I enabled debug logs and am seeing the messages below. I don't think the nss-nis bug from https://github.com/systemd/systemd/issues/7074 is back, but I made sure that IPAddress= is set (to nothing).

May 19 23:18:48 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state awaiting-reply → pending-timeout
May 19 23:18:48 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state pending-timeout → processing-timeout
May 19 23:18:48 ourserver systemd-logind[26154]: Got lookup error: io.systemd.TimedOut
May 19 23:18:48 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state processing-timeout → disconnected
May 19 23:18:48 ourserver systemd-logind[26154]: Couldn't add lingering user myuser, ignoring: Connection timed out
May 19 23:18:48 ourserver systemd-logind[26154]: n/a: varlink: setting state idle-client
May 19 23:18:48 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"localuser","service":"io.systemd.Multiplexer"}}
May 19 23:18:48 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply
May 19 23:19:33 ourserver systemd[1]: systemd-logind.service: start operation timed out. Terminating.
May 19 23:19:34 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state awaiting-reply → pending-timeout
May 19 23:19:34 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state pending-timeout → processing-timeout
May 19 23:19:34 ourserver systemd-logind[26154]: Got lookup error: io.systemd.TimedOut
May 19 23:19:34 ourserver systemd-logind[26154]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state processing-timeout → disconnected
May 19 23:19:34 ourserver systemd-logind[26154]: Couldn't add lingering user localguy, ignoring: Connection timed out
May 19 23:19:34 ourserver systemd-logind[26154]: User enumeration failed: Connection timed out
May 19 23:19:34 ourserver systemd-logind[26154]: Watching system buttons on /dev/input/event0 (Power Button)
May 19 23:19:34 ourserver systemd-logind[26154]: Watching system buttons on /dev/input/event1 (Barcode Reader )
May 19 23:19:34 ourserver systemd-logind[26154]: Watching system buttons on /dev/input/event3 (Barcode Reader  System Control)
May 19 23:19:34 ourserver systemd-logind[26154]: Watching system buttons on /dev/input/event4 (Barcode Reader  Consumer Control)
May 19 23:19:34 ourserver systemd-logind[26154]: systemd-logind running as pid 26154
May 19 23:19:34 ourserver systemd-logind[26154]: Bus n/a: changing state AUTHENTICATING → CLOSED
May 19 23:19:34 ourserver systemd-logind[26154]: systemd-logind stopped as pid 26154
May 19 23:19:34 ourserver systemd[1]: systemd-logind.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit systemd-logind.service has entered the 'failed' state with result 'timeout'.
May 19 23:19:34 ourserver systemd[1]: Failed to start Login Service.
-- Subject: A start job for unit systemd-logind.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit systemd-logind.service has finished with a failure.
--
-- The job identifier is 14622 and the job result is failed.
May 19 23:19:34 ourserver systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 2.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Automatic restarting of the unit systemd-logind.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
May 19 23:19:34 ourserver systemd[1]: Stopped Login Service.
-- Subject: A stop job for unit systemd-logind.service has finished
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A stop job for unit systemd-logind.service has finished.
--
-- The job identifier is 14731 and the job result is done.
May 19 23:19:34 ourserver systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
-- Subject: A start job for unit modprobe has finished successfully
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit modprobe has finished successfully.
--
-- The job identifier is 14838.
May 19 23:19:34 ourserver systemd[1]: Starting Login Service...
-- Subject: A start job for unit systemd-logind.service has begun execution
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit systemd-logind.service has begun execution.
--
-- The job identifier is 14731.
May 19 23:19:34 ourserver systemd-logind[26377]: Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
May 19 23:19:34 ourserver systemd-logind[26377]: Failed to read memory.max cgroup attribute, ignoring cgroup memory limit: No such file or directory
May 19 23:19:34 ourserver systemd-logind[26377]: Bus n/a: changing state UNSET → OPENING
May 19 23:19:34 ourserver systemd-logind[26377]: Bus n/a: changing state OPENING → AUTHENTICATING
May 19 23:19:34 ourserver systemd-logind[26377]: New seat seat0.
-- Subject: A new seat seat0 is now available
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: https://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A new seat seat0 has been configured and is now available.
May 19 23:19:34 ourserver systemd-logind[26377]: Preallocating VTs...
May 19 23:19:34 ourserver systemd-logind[26377]: VT changed to 1
May 19 23:19:34 ourserver systemd-logind[26377]: n/a: varlink: setting state idle-client
May 19 23:19:34 ourserver systemd-logind[26377]: /run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"myuser","service":"io.systemd.Multiplexer"}}
May 19 23:19:34 ourserver systemd-logind[26377]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply

Comment 1 RobbieTheK 2020-05-20 04:29:41 UTC
Created attachment 1690068 [details]
here is a strace on systemd, PID 1 and what happens when starting systemd-logind

I see these:
read(98, "0::/system.slice/systemd-logind."..., 1024) = 40
ioctl(98, TCGETS, 0x7ffdea3941c0)       = -1 ENOTTY (Inappropriate ioctl for device)

Comment 2 RobbieTheK 2020-05-20 04:36:06 UTC
Created attachment 1690069 [details]
strace on systemd-logind as it is trying to start

Comment 3 Zbigniew Jędrzejewski-Szmek 2020-05-20 06:58:26 UTC
I'll wager it's the same issue as in #1829572. Please test the linked PR.

*** This bug has been marked as a duplicate of bug 1829572 ***

Comment 4 RobbieTheK 2020-05-20 13:08:09 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #3)
> I'll wager it's the same issue as in #1829572. Please test the linked PR.
> 
> *** This bug has been marked as a duplicate of bug 1829572 ***

It does look the same but the fix does not work. I did:
cp /usr/lib/systemd/system/systemd-logind.service.d/nss_nis.conf /usr/lib/systemd/system/systemd-userdbd.service.d/nss_nis.conf
systemctl daemon-reload && systemctl restart systemd-userdbd

systemctl status systemd-userdbd
● systemd-userdbd.service - User Database Manager
     Loaded: loaded (/usr/lib/systemd/system/systemd-userdbd.service; static; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/systemd-userdbd.service.d
             └─nss_nis.conf
     Active: active (running) since Wed 2020-05-20 09:02:07 EDT; 1min 1s ago
TriggeredBy: ● systemd-userdbd.socket
       Docs: man:systemd-userdbd.service(8)
   Main PID: 90087 (systemd-userdbd)
     Status: "Processing requests..."
      Tasks: 6 (limit: 76721)
     Memory: 208.9M
        CPU: 814ms
     CGroup: /system.slice/systemd-userdbd.service
             ├─90087 /usr/lib/systemd/systemd-userdbd
             ├─90088 systemd-userwork
             ├─90089 systemd-userwork
             ├─90090 systemd-userwork
             ├─90091 systemd-userwork
             └─90092 systemd-userwork

May 20 09:02:07 ourserver systemd[1]: Starting User Database Manager...
May 20 09:02:07 ourserver systemd[1]: Started User Database Manager.
May 20 09:02:07 ourserver systemd-userdbd[90088]: yp_bind_client_create_v3: RPC: Unable to send
May 20 09:02:07 ourserver systemd-userdbd[90090]: yp_bind_client_create_v3: RPC: Unable to send
May 20 09:02:07 ourserver systemd-userdbd[90089]: yp_bind_client_create_v3: RPC: Unable to send
May 20 09:02:25 ourserver systemd-userdbd[90092]: yp_bind_client_create_v3: RPC: Unable to send

restart of systemd-logind still fails:

● systemd-logind.service - Login Service
     Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/systemd-logind.service.d
             └─nss_nis.conf
     Active: activating (start) since Wed 2020-05-20 09:05:26 EDT; 23s ago
       Docs: man:systemd-logind.service(8)
             man:logind.conf(5)
             https://www.freedesktop.org/wiki/Software/systemd/logind
             https://www.freedesktop.org/wiki/Software/systemd/multiseat
   Main PID: 90653 (systemd-logind)
      Tasks: 1 (limit: 76721)
     Memory: 1.4M
        CPU: 513ms
     CGroup: /system.slice/systemd-logind.service
             └─90653 /usr/lib/systemd/systemd-logind

May 20 09:05:27 ourserver systemd-logind[90653]: Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
May 20 09:05:27 ourserver systemd-logind[90653]: Failed to read memory.max cgroup attribute, ignoring cgroup memory limit: No such file or directory
May 20 09:05:27 ourserver systemd-logind[90653]: Bus n/a: changing state UNSET → OPENING
May 20 09:05:27 ourserver systemd-logind[90653]: Bus n/a: changing state OPENING → AUTHENTICATING
May 20 09:05:27 ourserver systemd-logind[90653]: New seat seat0.
May 20 09:05:27 ourserver systemd-logind[90653]: Preallocating VTs...
May 20 09:05:27 ourserver systemd-logind[90653]: VT changed to 1
May 20 09:05:27 ourserver systemd-logind[90653]: n/a: varlink: setting state idle-client
May 20 09:05:27 ourserver systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"kudyba","service":"io.systemd.Multiplexer"}}
May 20 09:05:27 ourserver systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply


May 20 09:05:46 ourserver  crond[90595]: pam_systemd(crond:session): Failed to get user record: Connection timed out
May 20 09:05:46 ourserver  CROND[90703]: (mailman) CMD (/usr/lib/mailman/cron/gate_news)
May 20 09:06:12 ourserver  systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state awaiting-reply → pending-timeout
May 20 09:06:12 ourserver  systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state pending-timeout → processing-timeout
May 20 09:06:12 ourserver  systemd-logind[90653]: Got lookup error: io.systemd.TimedOut
May 20 09:06:12 ourserver  systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state processing-timeout → disconnected
May 20 09:06:12 ourserver  systemd-logind[90653]: Couldn't add lingering user myuser, ignoring: Connection timed out
May 20 09:06:12 ourserver  systemd-logind[90653]: n/a: varlink: setting state idle-client
May 20 09:06:12 ourserver  systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"localuser","service":"io.systemd.Multiplexer"}}
May 20 09:06:12 ourserver  systemd-logind[90653]: /run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply

Comment 5 Zbigniew Jędrzejewski-Szmek 2020-05-20 13:29:18 UTC
I think it's userdbd that is the source of the problem. Please try the following:
1. run userdbctl. I expect that this will show the timeout / long delay.
2. if the delay is there, verify what exactly fails:
   sudo strace -p $(systemctl show --value -p MainPID systemd-userdbd)
   and rerun the query with userdbctl.

Comment 6 RobbieTheK 2020-05-20 13:56:18 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> I think it's userdbd that the source of the problem. Please try the
> following:
> 1. run userdbctl. I expect that this will show the timeout / long delay.
> 2. if the delay is there, verify what exactly fails:
>    sudo strace -p $(systemctl show --value -p MainPID systemd-userdbd)
>    and rerun the query with userdbctl.

Yes, it takes about a minute; I ran it 2x:
userdbctl
NAME   DISPOSITION   UID   GID REALNAME HOME  SHELL
root   intrinsic       0     0 -        /root /bin/sh
nobody intrinsic   65534 65534 -        /     /usr/sbin/nologin

userdbctl
NAME   DISPOSITION   UID   GID REALNAME HOME  SHELL
root   intrinsic       0     0 -        /root /bin/sh
nobody intrinsic   65534 65534 -        /     /

 ps -9 90087
    PID TTY      STAT   TIME COMMAND
  90087 ?        Ss     0:00 /usr/lib/systemd/systemd-userdbd

I had to wait a couple of minutes, but I finally got some results from strace:
strace -p $(systemctl show --value -p MainPID systemd-userdbd)
strace: Process 90087 attached

epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\fx\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96261, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96263, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96264, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96268, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\5x\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\6x\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96262, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\rx\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96269, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\vx\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96259, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96267, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96258, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96260, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96265, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\4x\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\nx\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96257, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=96266, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WNOHANG|WEXITED, NULL) = 0
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97510
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97511
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\21\0\0\0\0\0\0\0\1\0\0\0\1x\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
waitid(P_ALL, 0, {}, WNOHANG|WEXITED, NULL) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\346|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97512
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\350|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97513
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\351|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97514
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\352|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97515
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\353|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97516
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\354|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97517
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\355|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97518
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\356|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97519
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\357|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97520
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\360|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97521
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\361|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97522
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\362|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97523
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5, [{EPOLLIN, {u32=1987913200, u64=93962987447792}}], 4, -1) = 1
read(6, "\f\0\0\0\0\0\0\0\0\0\0\0\363|\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR2 TERM CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90c4e7ae50) = 97524
rt_sigprocmask(SIG_SETMASK, [INT USR2 TERM CHLD], NULL, 8) = 0
epoll_wait(5,

I take it the other straces I previously attached didn't provide anything helpful?

Comment 7 RobbieTheK 2020-05-20 15:11:20 UTC
Lennart's suggestion at https://lists.freedesktop.org/archives/systemd-devel/2020-May/044611.html got me around this problem.

In /usr/lib/systemd/system/systemd-userdbd.service I set IPAddress= (again, set to nothing; it had been set to "Any"). I restarted that service (after running systemctl daemon-reload); that also took about a minute. Then restarting systemd-logind.service worked, and logins are now quick.
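For reference, the same change can be made without editing the unit shipped under /usr/lib by using a drop-in override. This is only a sketch: the drop-in file name is made up, and it assumes the directive in question is IPAddressDeny= (the stock unit apparently ships IPAddressDeny=any, matching the "Any" value mentioned above):

mkdir -p /etc/systemd/system/systemd-userdbd.service.d
cat > /etc/systemd/system/systemd-userdbd.service.d/50-nis.conf <<'EOF'
[Service]
# Clear the unit's IP sandbox so the userdb worker processes can reach
# the portmapper/ypserv for NIS lookups.
IPAddressDeny=
EOF
systemctl daemon-reload
systemctl restart systemd-userdbd.service systemd-logind.service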

I'm just surprised your workaround didn't work. Any idea why?

Comment 8 RobbieTheK 2020-05-20 15:15:11 UTC
Any reason for these error messages in the logs after restarting systemd-logind?

May 20 11:12:50 secondary systemd-logind[382684]: New seat seat0.
May 20 11:12:50 secondary systemd-logind[382684]: User or group name "0" starts with a digit, accepting for compatibility.
May 20 11:12:50 secondary systemd-logind[382684]: Failed to add user by file name 0, ignoring: Invalid argument
May 20 11:12:50 secondary systemd-logind[382684]: User enumeration failed: Invalid argument

May 20 11:13:41 primary systemd[1]: Starting Login Service...
May 20 11:13:41 primary systemd-logind[110604]: New seat seat0.
May 20 11:13:41 primary systemd-logind[110604]: User or group name "199" starts with a digit, accepting for compatibility.
May 20 11:13:41 primary systemd-logind[110604]: Failed to add user by file name 199, ignoring: Invalid argument
May 20 11:13:41 primary systemd-logind[110604]: User or group name "6105" starts with a digit, accepting for compatibility.
May 20 11:13:41 primary systemd-logind[110604]: Failed to add user by file name 6105, ignoring

Comment 9 RobbieTheK 2020-05-20 17:00:29 UTC
Please see Lennart P.'s suggestions at https://lists.freedesktop.org/archives/systemd-devel/2020-May/044614.html

Comment 10 Zbigniew Jędrzejewski-Szmek 2020-05-21 14:08:16 UTC
> I had to wait a couple of minutes, but I finally got some results from strace:
> strace -p $(systemctl show --value -p MainPID systemd-userdbd)
> strace: Process 90087 attached

Sorry, my bad. That strace is not useful without -f, because userdbd uses child
processes to do lookups.
All we can see from this strace is that userdbd is waiting for a child to finish.

> I take it the other straces I previously attached didn't provide anything helpful?

Yes, the interesting things are happening in other processes.

> Any reason for these error messages in the logs after restarting systemd-logind?
> 
> May 20 11:12:50 secondary systemd-logind[382684]: New seat seat0.
> May 20 11:12:50 secondary systemd-logind[382684]: User or group name "0" starts with a digit, accepting for compatibility.
> May 20 11:12:50 secondary systemd-logind[382684]: Failed to add user by file name 0, ignoring: Invalid argument
> May 20 11:12:50 secondary systemd-logind[382684]: User enumeration failed: Invalid argument

That looks as if something is passing uids in place of user names.
There was some issue about this in systemd upstream, but I don't recall the details
off the top of my head. It's possible that some patch needs to be backported.

> Lennart's suggestion at https://lists.freedesktop.org/archives/systemd-devel/2020-May/044611.html got me around this problem.

I think it is fairly obvious that the sandbox is the problem. The question is
which of the constraints is the source of the problem. Turning all sandboxing off
is not a nice solution. That is why I wanted to see the strace and pinpoint the issue.
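
One way to narrow this down, sketched generically here (the directive shown in the drop-in is only an example of the kind of hardening setting to toggle, not a statement of what the unit actually contains):

systemctl cat systemd-userdbd.service    # show the unit and its drop-ins, including sandbox/hardening directives
systemctl edit systemd-userdbd.service   # add a drop-in that clears one directive at a time, e.g.:
#   [Service]
#   IPAddressDeny=
systemctl restart systemd-userdbd.service
userdbctl                                # retest; repeat with the next directive until the delay disappears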

Comment 11 RobbieTheK 2020-05-21 14:44:10 UTC
Created attachment 1690666 [details]
strace with -f

Perhaps these are a clue:
[pid 21765] execve("/usr/lib/systemd/systemd-userwork", ["systemd-userwork", "xxxxxxxxxxxxxxxx"], 0x55ac4b4c4f00 /* 10 vars */) = 0
[pid 21765] brk(NULL)                   = 0x56250f2fb000
[pid 21765] arch_prctl(0x3001 /* ARCH_??? */, 0x7ffee01dfc00) = -1 EINVAL (Invalid argument)
[pid 21765] access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
[pid 21765] openat(AT_FDCWD, "/usr/lib/systemd/tls/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 21765] stat("/usr/lib/systemd/tls/haswell/avx512_1/x86_64", 0x7ffee01dee60) = -1 ENOENT (No such file or directory)
[pid 21765] openat(AT_FDCWD, "/usr/lib/systemd/tls/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 21765] stat("/usr/lib/systemd/tls/haswell/avx512_1", 0x7ffee01dee60) = -1 ENOENT (No such file or directory)

Comment 12 Zbigniew Jędrzejewski-Szmek 2020-05-26 10:15:09 UTC
Unfortunately that strace is not useful, because it shows one of the userwork processes
idling while waiting for a job to come in. It seems possible that there's SELinux involved
too, which could be muddying the picture.

A more appropriate command that captures any already-spawned workers:
strace -f $(cat /sys/fs/cgroup/system.slice/systemd-userdbd.service/cgroup.procs|sed 's/^/-p /')

[pid 13691] connect(11, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 13661] connect(13, {sa_family=AF_INET6, sin6_port=htons(111), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28 <unfinished ...>
[pid 13660] connect(13, {sa_family=AF_INET6, sin6_port=htons(111), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28

In my testing, with nss_nis-3.1-5.fc33.x86_64, this connect() succeeds:

[pid 13958] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 13
[pid 13958] connect(13, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 13958] getsockname(13, {sa_family=AF_INET, sin_port=htons(46473), sin_addr=inet_addr("127.0.0.1")}, [28->16]) = 0
[pid 13958] close(13)                   = 0
[pid 13958] socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 13
[pid 13958] getsockname(13, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, [128->16]) = 0
[pid 13958] getsockopt(13, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
[pid 13958] openat(AT_FDCWD, "/etc/bindresvport.blacklist", O_RDONLY) = 14
[pid 13958] fstat(14, {st_mode=S_IFREG|0644, st_size=447, ...}) = 0
[pid 13958] read(14, "#\n# This file contains a list of"..., 4096) = 447
[pid 13958] read(14, "", 4096)          = 0
[pid 13958] close(14)                   = 0
[pid 13958] getsockname(13, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, [128->16]) = 0
[pid 13958] getpid()                    = 13958
[pid 13958] bind(13, {sa_family=AF_INET, sin_port=htons(990), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
[pid 13958] setsockopt(13, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=512*1024, rlim_max=512*1024}) = 0
[pid 13958] mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc5aaf2c000
[pid 13958] mmap(NULL, 25169920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc5a972b000
[pid 13958] getpeername(13, 0x7ffc2ac1dc40, [128]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 13958] connect(13, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 13958] getsockname(13, {sa_family=AF_INET, sin_port=htons(59024), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 0
[pid 13958] getsockopt(13, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] gettimeofday({tv_sec=1590487611, tv_usec=957327}, NULL) = 0
[pid 13958] getpid()                    = 13958
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] brk(NULL)                   = 0x2311000
[pid 13958] brk(0x233d000)              = 0x233d000
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] write(13, "\200\0\0Xs\3\231\323\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\4\0\0\0\3\0\0\0\0"..., 92) = 92
[pid 13958] poll([{fd=13, events=POLLIN}], 1, 60000) = 1 ([{fd=13, revents=POLLIN}])
[pid 13958] read(13, "\200\0\0\34s\3\231\323\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 65536) = 32
[pid 13958] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 13958] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 13958] close(13)                   = 0

I also saw the following denials:
May 26 11:30:41 workstation-uefi audit[13661]: AVC avc:  denied  { search } for  pid=13661 comm="systemd-userwor" name="yp" dev="vda3" ino=4972 scontext=system_u:system_r:systemd_userdbd_t:s0 tcontext=system_u:object_r:var_yp_t:s0 tclass=dir permissive=1
May 26 11:30:41 workstation-uefi audit[13661]: AVC avc:  denied  { name_bind } for  pid=13661 comm="systemd-userwor" src=693 scontext=system_u:system_r:systemd_userdbd_t:s0 tcontext=system_u:object_r:hi_reserved_port_t:s0 tclass=tcp_socket permissive=1
May 26 11:30:41 workstation-uefi audit[13661]: AVC avc:  denied  { name_connect } for  pid=13661 comm="systemd-userwor" dest=111 scontext=system_u:system_r:systemd_userdbd_t:s0 tcontext=system_u:object_r:portmap_port_t:s0 tclass=tcp_socket permissive=1
but I don't see them again. Not sure why.
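
If SELinux is suspected on an affected host, a generic check (not taken from this report; semanage comes from policycoreutils-python-utils) would be:

ausearch -m avc -ts recent | grep systemd-userwor   # any fresh denials against the userdb workers?
getenforce                                          # is the system enforcing at all?
semanage permissive -a systemd_userdbd_t            # make only this domain permissive for a test
# retest userdbctl / systemd-logind, then revert:
semanage permissive -d systemd_userdbd_t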

Anyway, please verify that the issue still occurs with all latest packages.

Comment 13 Fedora Program Management 2021-04-29 16:53:56 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 32 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Ben Cotton 2021-05-25 17:21:38 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

