Bug 1978610
| Summary: | tlog-rec-session loops when the user logs into the GUI | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> | ||||
| Component: | tlog | Assignee: | jstephen | ||||
| Status: | CLOSED ERRATA | QA Contact: | Scott Poore <spoore> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 8.4 | CC: | aboscatt, alrodrig, ctowsley, jstephen, nikolai.kondrashov, spoore | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | 8.6 | Flags: | pm-rhel:
mirror+
|
||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | tlog-12-1.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-05-10 13:53:11 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Thanks for the report, Renaud! Could you post e.g. the process tree/branch to which tlog-rec-session belongs to? What's running it and what does run under the shell it records? What's recorded by tlog-rec-session, wherever it is set up to record? Below is the tree:
systemd(1)───gdm(1375)───gdm-session-wor(3524)───gdm-wayland-ses(3566)───tlog-rec-sessio(3575)───gnome-session-b(3577)─┬─gnome-shell(3600)─┬─Xwayland(3648)─┬─{Xwayland}(3650)
...
Disabling wayland doesn't have any impact:
systemd(1)───gdm(5677)───gdm-session-wor(7087)───gdm-x-session(7126)───tlog-rec-sessio(7159)───gnome-session-b(7175)─┬─gnome-shell(7228)─┬─ibus-daemon(7269)─┬─ibus-dconf(7274)─┬─{ibus-dconf}(7275)
...
Thank you. First, I have to say, that last time I checked tlog shouldn't be used with graphical sessions, but I might be out of date. Check official administrator guide. The loop you posted is the normal recording loop. I.e. something is being read/written through that shell and then recorded. Could you find out what is recorded, to verify? Even though (I think) the GUI configuration is unsupported, we might still need to handle it. The documentation states tlog doesn't work for graphical terminals, but still I think tlog shouldn't start spinning in case it's used as the default shell, which is the standard way to enable tlog if I understand correctly. I don't see anything recorded there btw. The use case for this is customer wants to log everything for remote connections. It appears if the server has a GUI as well and the user logs in through the GUI / VNC, then havoc starts to happen. There's little sense in recording GUI sessions, since the standard shell often wouldn't normally even be started by the user, and if they want, they can just run a terminal emulator with the shell of their choice. Normally if you have a reason to run tlog-rec-session, for the same reason you don't want to run a GUI session. Otherwise, just to be absolutely sure that nothing is recorded (which would be strange), did you check the actual tlog-rec-session configuration used and the configured log destination? To reproduce the issue, I used the standard configuration file /etc/tlog/tlog-rec-session.conf
Then I amended to log to path /var/log/tlog:
"file": {
"path" : "/var/log/tlog"
},
And I'm still getting nothing at all.
Actually I don't know at all when it's supposed to show something.
Configuring to write to the journal, I see the session recording when executing a ssh session, but nothing when on the GUI.
This is weird and needs reproduction and debugging. I'm not the current maintainer, just the original author of tlog, and I'll leave it to the maintainer to handle. Thank you for filing this issue Renaud, I agree this high CPU issue should not occur for tlog-rec-session logins on the Graphical Interface, even though there are obvious limitations with recording graphical users (as stated here in comments). I need to investigate this further, however this will need to wait until next quarter due to other higher priority planned development tasks. So I checked with the customer and found out that tlog-rec-session was *not* the default shell for his users (so unlike my reproducer). The tlog infrastructure is set up through IDM using the recommended procedure. For the customer also, his users that connect through ssh then create a VNC session (the non-tech guys basically) get the issue, because tlog-rec-session is fired by gdm-session-worker, similar to my reproducer. BTW I disagree with moving the Severity to Low, there is available workaround for now. To degrade the severity, we need a safe workaround first. I confirmed with a customer that enabling session recording the "recommended way" (using sssd) leads to exact same issue when the user logs in through the GUI or a VNC session. So this needs to be handled with high priority. Hi Renaud, Thanks for the inputs, we will bring this topic to our group session on Wednesday and we will give you some update as soon as we can. Kindly Hello,
I had suggested low severity to match the severity of the attached customer case 02921035, I looked at the other case 02947406 and it does not appear to be related to this issue.
I will look into debugging this issue further based on the reproducer, hopefully the customer understands that recording will not work with graphical sessions.
If it helps, please note SSSD configuration allows for providing 'exclude_users' and 'exclude_groups' which could be used to exclude users/groups of VNC users.
exclude_users (string)
A comma-separated list of users to be excluded from recording, only applicable with 'scope=all'.
Default: Empty. No users excluded.
exclude_groups (string)
A comma-separated list of groups, members of which should be excluded from recording. Only applicable with 'scope=all'.
NOTE: using this option (having it set to anything) has a considerable performance cost, because each uncached request for a user requires
retrieving and matching the groups the user is member of.
Default: Empty. No groups excluded.
Verified. Version :: tlog-12-1.el8.x86_64 Results :: # su - test -c '/usr/local/bin/qecore-headless /tmp/whoami.sh' Locale charset is ANSI_X3.4-1968 (ASCII) Assuming locale environment is lost and charset is UTF-8 ATTENTION! Your session is being recorded! headless: Starting Display Manager headless: Setting environment variable TERM as 'xterm' headless: Running 'wayland' with desktop 'gnome' headless: Changing a11y value from 'false' to 'true' headless: Started the script with PID 42720 test 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l test headless: The user script finished with return code 0 headless: Stopping Display Manager Verified. Version :: tlog-12-1.el8.x86_64 Results :: :: [ 13:41:46 ] :: [ BEGIN ] :: Running 'su - test -c '/tmp/tmp.6OPY9FA5Lg/pyvirt/bin/qecore-headless /tmp/tmp.6OPY9FA5Lg/whoami.sh' > /tmp/tmp.6OPY9FA5Lg/whoami.sh.out' Locale charset is ANSI_X3.4-1968 (ASCII) Assuming locale environment is lost and charset is UTF-8 ATTENTION! Your session is being recorded! :: [ 13:43:25 ] :: [ PASS ] :: Command 'su - test -c '/tmp/tmp.6OPY9FA5Lg/pyvirt/bin/qecore-headless /tmp/tmp.6OPY9FA5Lg/whoami.sh' > /tmp/tmp.6OPY9FA5Lg/whoami.sh.out' (Expected 0, got 0) :: [ 13:43:25 ] :: [ BEGIN ] :: Running 'cat /tmp/tmp.6OPY9FA5Lg/whoami.sh.out' headless: Starting Display Manager headless: Setting environment variable TERM as 'xterm' headless: Running 'wayland' with desktop 'gnome' headless: Changing a11y value from 'false' to 'true' headless: Started the script with PID 41949 test 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l 0.0 -/usr/bin/tlog-rec-session -c /usr/bin/gnome-session -l test headless: The user script finished with return code 0 headless: Stopping Display Manager :: [ 13:43:25 ] :: [ PASS ] :: Command 'cat /tmp/tmp.6OPY9FA5Lg/whoami.sh.out' (Expected 0, got 0) :: [ 13:43:25 ] :: [ LOG ] :: Checking if TOPCPU(0.0) > 10.0% :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Duration: 99s :: Assertions: 5 good, 0 bad :: RESULT: PASS (Test) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (tlog bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1858 The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days |
Created attachment 1797112 [details] Script showing tlog-rec-session spinning in while loop Description of problem: When a user is configured with /usr/bin/tlog-rec-session as the login shell and he logs in using the GUI, tlog-rec-session spins on the CPU. Using GDB, I could find that the code seems to spin in a while loop: rec.c: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 886 /* 887 * Transfer I/O and window changes 888 */ 889 while (tlog_rec_exit_signum == 0) { : 931 /* Log the received data, if any */ 932 if (tlog_pkt_pos_is_in(&log_pos, &pkt)) { 933 /* If asked to log this type of packet */ 934 if (item_mask & (1 << tlog_rec_item_from_pkt(&pkt))) { 935 grc = tlog_sink_write(log_sink, &pkt, &log_pos, NULL); 936 if (grc == TLOG_RC_OK) { 937 log_pending = true; 938 } else if (grc != TLOG_GRC_FROM(errno, EINTR)) { 939 return_grc = grc; 940 TLOG_ERRS_RAISECS(grc, "Failed logging terminal data"); 941 } 942 } else { 943 tlog_pkt_pos_move_past(&log_pos, &pkt); 944 } 945 continue; 946 } 947 948 /* Read new data */ 949 tlog_pkt_cleanup(&pkt); 950 log_pos = TLOG_PKT_POS_VOID; 951 tty_pos = TLOG_PKT_POS_VOID; : -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- From above, I could find that on line 935 grc was set to TLOG_RC_OK, making the code enter condition block on line 937. Then it went to line 945 which starts the while loop on line 889 again. I could then verify using a systemtap script (attached for reference) that counts how much time line 937 is entered compared to line 948 (which effectively handles data) that the code was spinning in the loop: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # timeout 30s stap -v ./tlog.stp [...] Number of hits on line 937: 120592: 12601432 Number of hits on line 949: Pass 5: run completed in 10usr/70sys/28691real ms. -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- From above, on a 30 seconds run, the condition was entered 12 millions time! and line 949 was never reached. Version-Release number of selected component (if applicable): tlog-9-2.el8.x86_64 How reproducible: ALWAYS Steps to Reproduce: 1. Install a VM with GUI, create a user and set /usr/bin/tlog-rec-session as the shell for that user 2. Log in graphically Actual results: tlog-rec-session spins on CPU Expected results: no spinning