Bug 1912046 - BASH_FUNC environment variables reigning havoc and it just keeps getting worse
Summary: BASH_FUNC environment variables reigning havoc and it just keeps getting worse
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 33
Hardware: All
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-02 23:57 UTC by Gerald Cox
Modified: 2021-01-27 19:14 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-27 19:14:47 UTC
Type: Bug
Embargoed:
bcotton: fedora_prioritized_bug+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github sddm sddm issues 1335 0 None closed Plasma 5.20 has Black desktop screen when using wayland in conjunction with SDDM 2021-02-01 08:53:40 UTC
Github systemd systemd issues 14878 0 None closed exporting bash functions as env vars are not compatible with systemctl's `--import-environment` operation 2021-02-01 08:53:40 UTC
Github systemd systemd issues 16704 0 None closed UpdateActivationEnvironment fails when an environment variable contains the value \033 2021-02-01 08:53:40 UTC
Github systemd systemd issues 4446 0 None closed Allow all control characters in env vars set via SetEnvironment on the bus API 2021-02-01 08:53:40 UTC
KDE Software Compilation 404335 0 NOR REOPENED Don't rely on dbus-run-session or dbus-launch for starting plasma-wayland session 2021-02-01 08:53:40 UTC
Red Hat Bugzilla 1754395 0 unspecified CLOSED Qt applications lose system theme if launched via dbus activation (from flatpak applications or krunner) 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1897717 0 high CLOSED Plasma 5.20 has Black desktop screen when using wayland in conjunction with SDDM 2021-02-22 00:41:40 UTC

Internal Links: 1897717

Description Gerald Cox 2021-01-02 23:57:47 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1754395
https://bugzilla.redhat.com/show_bug.cgi?id=1897717
https://github.com/systemd/systemd/issues/4446 
https://github.com/sddm/sddm/issues/1335
 
I've been researching a black screen bug for about a month now and ultimately find that this problem has been going on in one form or another for years.  Here it is in a nutshell, from one of the comments:

> Cause of this bug is simple - systemd don't import ALL enviroment variables if any variable name is not POSIX compilant
> (it must conform to the regex [0-9A-Z_], see: https://github.com/systemd/systemd/blob/a859abf062cef1511e4879c4ee39c6036ebeaec8/src/basic/env-util.c#L19).
> 
> You can test it with this command: `env %=1 systemctl --user import-environment`, it will fail.
> 
> Now there is 2 known packages exporting those variables: environment-modules and Lmod.
> 
> Solution is simple - skip ONLY invalid variables while importing others.
> 
> I've created bug in the systemd, but still without answer - https://github.com/systemd/systemd/issues/14878.

So, what we basically have is two projects lmod and environment-modules that create apparently non-compliant env variables that systemd won't recognize.  This in turn has caused and is continuing to cause a cascading trail of havoc.  The correct way to fix this is either lmod and environment-modules change to a compliant naming scheme or systemd accept the names - or just skip the ones it doesn't want to recognize.  

It appears the root cause of the problem has been identified, yet as problems occur instead of fixing the root cause, each impact has been handled as a one-off with it's own circumvention.  As time goes on, other issues will continue to surface until the root cause is fixed.

Comment 1 Matthew Miller 2021-01-03 00:05:28 UTC
Hmmm, as I understand it, the reported problem is that if the environment contains non-posix-compliant variables (like "%"), systemctl import-environment will fail to import _anything_. If that's the case, I can't reproduce. If use "env %=1 AAAA=2", % isn't imported, but AAAA is.

Comment 2 Gerald Cox 2021-01-03 04:16:31 UTC
Matt, please read:  https://github.com/systemd/systemd/issues/14878

That is the basic issue explained in more detail, and it appears
we are at loggerheads at coming to an agreement on how to solve
the issue.  I'm definitely not the SME on this, but as I've
already mentioned, it makes absolutely no sense to build a workaround
into every application which is affected when the root cause should be
addressed.


I don't have Lmod on my system, but I do have environment-modules and if
I try to remove it, it wants to take 155 other applications with it, so
that doesn't appear to be an option either.

Comment 3 Matthew Miller 2021-01-03 16:35:36 UTC
Gerald, it looks like a fix was merged in that ticket? In https://github.com/systemd/systemd/pull/17188#issuecomment-708518052, Lennart says "Summary: terrible usecase, but will merge this anyway."

Like I said, I can't reproduce the minimal reproducer you've given, unless I misunderstand the problem. I have systemd-246.7-2.fc33 on my system. Can you also try the minimal case and test if you're having the KDE problems still?

Comment 4 Gerald Cox 2021-01-03 18:08:10 UTC
Hey Matt, thanks for your help with this.  I had missed https://github.com/systemd/systemd/pull/17188#issuecomment-708518052 
and from the description that "should" fix the issue.  I am also running systemd-246.7-2 but unfortuately, unless I am reading
this incorrectly, the fix was incorporated into systemd-247:  https://github.com/systemd/systemd/pull/17188#issuecomment-708518052
and 247 according to koji is only in rawhide - and yes, I'm still having the issue.  

I'll let kde upstream know and also see if I can figure out how to get 247 installed to test.  Hopefully, this will fix quite a
few problems and negate the need for hacky workarounds.

Comment 5 Zbigniew Jędrzejewski-Szmek 2021-01-03 19:20:52 UTC
Let me clarify:
- please ignore https://github.com/systemd/systemd/pull/17188. It was (mostly) reverted.
- https://github.com/keszybz/systemd/commit/a4ccce22d9552dc74b6916cc5ec57f2a0b686b4f makes systemctl
  ignore envvars that systemd doesn't like. This means that tools which call 'systemctl import-environment'
  will not fail.
  Other tools which call the DBus api directly need to do similar filtering.
- Lmod is not doing anything wrong. It is export bash functions as variables, which is a bit wonky, but allowed by POSIX.
  It actually makes sense for that use case.

So... if there's some particular tool which calls the Dbus api directly and tries to push unacceptable variables,
it needs to be patched not to do that. *Ideally* it would be patched to just push *only specific variables*,
because most variables only make sense for that program, for example $OLDPWD, $PWD, $SHLVL, $TERM, $SHELL.
Only variables that make sense *globally* for the user session should be pushed to the systemd environment block.

AFAIC, the issue has been resolved on the systemd side. The only thing that I plan to change (maybe)
is https://github.com/systemd/systemd/issues/4446. But it is not directly related to this issue.

Comment 6 Gerald Cox 2021-01-03 21:27:19 UTC
Thanks very much(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> Let me clarify:
> - please ignore https://github.com/systemd/systemd/pull/17188. It was
> (mostly) reverted.
> -
> https://github.com/keszybz/systemd/commit/
> a4ccce22d9552dc74b6916cc5ec57f2a0b686b4f makes systemctl
>   ignore envvars that systemd doesn't like. This means that tools which call
> 'systemctl import-environment'
>   will not fail.
>   Other tools which call the DBus api directly need to do similar filtering.
> - Lmod is not doing anything wrong. It is export bash functions as
> variables, which is a bit wonky, but allowed by POSIX.
>   It actually makes sense for that use case.
> 
> So... if there's some particular tool which calls the Dbus api directly and
> tries to push unacceptable variables,
> it needs to be patched not to do that. *Ideally* it would be patched to just
> push *only specific variables*,
> because most variables only make sense for that program, for example
> $OLDPWD, $PWD, $SHLVL, $TERM, $SHELL.
> Only variables that make sense *globally* for the user session should be
> pushed to the systemd environment block.
> 
> AFAIC, the issue has been resolved on the systemd side. The only thing that
> I plan to change (maybe)
> is https://github.com/systemd/systemd/issues/4446. But it is not directly
> related to this issue.

Yeah, I just tested with 247 and the problem was still there, and then I found
where it had been backed out.  Need to wait for a response from upstream kde.

Thank you very much for the clarification and quick response!

Comment 7 Göran Uddeborg 2021-01-03 22:03:08 UTC
FWIW, I can confirm removing environment-modules fixes the problem, at least for me.

Comment 8 Gerald Cox 2021-01-03 23:28:07 UTC
(In reply to Göran Uddeborg from comment #7)
> FWIW, I can confirm removing environment-modules fixes the problem, at least
> for me.

How did you remove it?  Isn't it required?  When I issue dnf remove environment-modules,
DNF also wants to remove 155 dependent packages.

I guess I don't understand why this problem has been going on for so long, and why it
appears so difficult to fix.  In the meantime, the bug reports just continue
to come in.

Comment 9 Gerald Cox 2021-01-03 23:58:52 UTC
(In reply to Gerald Cox from comment #8)
> (In reply to Göran Uddeborg from comment #7)
> > FWIW, I can confirm removing environment-modules fixes the problem, at least
> > for me.
> 
> How did you remove it?  Isn't it required?  When I issue dnf remove
> environment-modules,
> DNF also wants to remove 155 dependent packages.
> 
> I guess I don't understand why this problem has been going on for so long,
> and why it
> appears so difficult to fix.  In the meantime, the bug reports just continue
> to come in.

I removed temporarily with rpm -e environment-modules --nodeps
and I can also confirm it fixes the issue with 1897717, but
that's not really a solution for me since I have other programs that
need environment-modules.  Clearly the issue is with the BASH_FUNC
env variables - which circle us back to square one again.

Comment 10 Xavier Delaruelle 2021-01-04 09:20:53 UTC
If there is a new way to push shell function to the user environment, could you please give some hints on how this should be done. I would be happy to update the environment-modules package to help fix this issue.

Comment 11 Göran Uddeborg 2021-01-04 09:38:51 UTC
(In reply to Gerald Cox from comment #8)
> (In reply to Göran Uddeborg from comment #7)
> > FWIW, I can confirm removing environment-modules fixes the problem, at least
> > for me.
> 
> How did you remove it?  Isn't it required?  When I issue dnf remove
> environment-modules,
> DNF also wants to remove 155 dependent packages.

That wasn't a problem for me, but I guess this depends on what you have installed. In my case DNF only removed three packages: environment-modules itself, scl-utils since it requires modulecmd from environment-modules, and vim-filesystem since it wasn't needed any more. That was all.

Comment 12 Zbigniew Jędrzejewski-Szmek 2021-01-04 13:38:01 UTC
(In reply to Xavier Delaruelle from comment #10)
> If there is a new way to push shell function to the user environment, could
> you please give some hints on how this should be done. I would be happy to
> update the environment-modules package to help fix this issue.

I assume you mean bash functions, since for example zsh doesn't have the equivalent
functionality of inheriting shell functions through the environment afaik. Currently
such functions cannot be pushed through the systemd environment block. They need to be
(and are) created individually in various shells using .bash_profile and equivalent
mechanisms for other shells.

I think this is the correct solution, for two reasons. Firstly, such functions are
shell-specific, and we support multiple shells, and we don't want to clutter the
environment with an implementation for every installed shell, esp. that some shells
don't support inheriting functions through the environment anyway. Secondly, this kind
of setup is not appropriate for the global settings, which are used for non-shell
environments too. (So for example, even if we wanted some environment module settings
to apply to all processes of a given user, for systemd services which are invoked
directly without a shell being involved at any point, exported bash functions wouldn't
have any effect anyway.)

It's entirely possible that environment modules are already DTRT, and don't need to
change anything. As discussed above, the problem became *visible* when the environment
block including exported bash functions was pushed into systemd and systemd refused it.
Such variables need to be filtered out by the pusher.

tl;dr: Define shell-specific setup through shell-specific profile scripts.
       Only push variables which make sense globally to the systemd environment block.


> The only thing that I plan to change (maybe) is https://github.com/systemd/systemd/issues/4446.
> But it is not directly related to this issue.

https://github.com/systemd/systemd/pull/18129

Comment 13 Xavier Delaruelle 2021-01-04 14:06:46 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #12)
> It's entirely possible that environment modules are already DTRT, and don't
> need to
> change anything. As discussed above, the problem became *visible* when the
> environment
> block including exported bash functions was pushed into systemd and systemd
> refused it.
> Such variables need to be filtered out by the pusher.

Maybe to clarify a little bit, either environment-modules, Lmod or scl-utils defines:

* a bash shell function in a .sh file under /etc/profile.d
* a csh shell alias in a .csh file under /etc/profile.d

The goal is to get such shell function (or shell alias for the csh shell family) defined when a bash/sh or csh/tcsh shell is started whatever the way this shell is started.

Comment 14 Zbigniew Jędrzejewski-Szmek 2021-01-04 14:19:16 UTC
> Maybe to clarify a little bit, either environment-modules, Lmod or scl-utils defines:

... and exports (export -f funcname).

Comment 15 Gerald Cox 2021-01-04 18:36:12 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #12)

> tl;dr: Define shell-specific setup through shell-specific profile scripts.
>        Only push variables which make sense globally to the systemd
> environment block.

I really appreciate you taking the time to respond and clarify.

But for some reason, people aren't getting that memo or they are finding it
extremely difficult to accomplish.  This problem in one form or another has been 
going on for quite a while as evidenced by all the tickets that reference
it in one way or another and as time goes by the collateral damage keeps 
accumulating.  People aren't connecting the dots.  

Hopefully, this report will hammer home the root cause.

Comment 16 Zbigniew Jędrzejewski-Szmek 2021-01-04 20:37:21 UTC
> This problem in one form or another has been 
> going on for quite a while as evidenced by all the tickets that reference
> it in one way or another and as time goes by the collateral damage keeps 
> accumulating.

That is to a large extent my fault. I wanted to fix it in one way, and it took a long
time because I had a hard time convincing other upstream maintainers to the approach.
And then it turned out that my approach is infeasible, though for different reasons
than those raised in the discussion.

If you start counting the time from systemd-246.7 which had the patch for systemctl import-invironment,
it's not that long.

Comment 17 Xavier Delaruelle 2021-01-04 20:55:00 UTC
(In reply to Gerald Cox from comment #15)
> (In reply to Zbigniew Jędrzejewski-Szmek from comment #12)
> 
> > tl;dr: Define shell-specific setup through shell-specific profile scripts.
> >        Only push variables which make sense globally to the systemd
> > environment block.
> 
> I really appreciate you taking the time to respond and clarify.
> 
> But for some reason, people aren't getting that memo or they are finding it
> extremely difficult to accomplish.  This problem in one form or another has
> been 
> going on for quite a while as evidenced by all the tickets that reference
> it in one way or another and as time goes by the collateral damage keeps 
> accumulating.  People aren't connecting the dots.  
> 
> Hopefully, this report will hammer home the root cause.

To dig a bit further, the memo says "Define shell-specific setup through shell-specific profile scripts.", but systemd goes through the shell-specific profile scripts.

Is there a reference doc about how systemd is doing the user environment initialization on Fedora? I would be glad to help here on the environment-modules side.

Comment 18 Zbigniew Jędrzejewski-Szmek 2021-01-05 12:23:10 UTC
I pushed a pull request to beef up systemd's docs a bit: https://github.com/systemd/systemd/pull/18137.
It also deprecates a blanket 'systemctl import-environment' command and changes systemctl to emit
a warning when that is used.

It is fairly easy to describe how the environment is constructed by systemd, but the interaction
with other parts of the stack is complex and not documented well. I don't think the systemd
man pages are appropriate for this kind of documentation — we don't have the expertise and they
would become stale at some point. We could probably put a document under systemd.io if no better
place is found.

> To dig a bit further, the memo says "Define shell-specific setup through shell-specific profile scripts.",
> but systemd goes through the shell-specific profile scripts.

Systemd itself doesn't use the shell *at all*. This also means that it doesn't care about profile
scripts. Instead, various desktop environments execute in an environment where the shell profile
functions have been executed, and when they call 'import-environment' or the dbus-api equivalent
to propagate session-specific variables, various variables are pushed to the manager environment
block. I don't think this makes sense, because we end up with too many wrong things in the global
environment.

What I think should happen is that the desktop environment should export a few variables it
sets itself (emphasize *itself* here, if the desktop environment is assigning a variable, it knows
whether it belongs in the global session block): DESKTOP_SESSION, DISPLAY, GDMSESSION, GDM_LANG,
GNOME_SETUP_DISPLAY?, SESSION_MANAGER?, WAYLAND_DISPLAY, XAUTHORITY, XDG_SESSION_*, or equivalent
for other environments.

Whenever a shell process is created for the user, it should execute shell profile functions and
create set the shell-specific variables there.

I'm not an expert in bash profile initialization, but it seems that the right thing happens already:
when I clean up my systemd user environment block by removing e.g. all MODULE* variables, and
start a shell process as a service, I get them back:
$ systemd-run --user -t bash
(nested) $ set|grep ^MODULE
MODULEPATH=/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
MODULEPATH_modshare=/usr/share/modulefiles:1:/usr/share/Modules/modulefiles:1:/etc/modulefiles:1
MODULESHOME=/usr/share/Modules
MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl
MODULES_RUN_QUARANTINE='LD_LIBRARY_PATH LD_PRELOAD'

When I start a gnome-terminal, the shell in the gnome terminal also has the MODULE* variables.
So it seems that the environment-modules stuff is properly initialized without pushing anything
through the manager environment block.
I don't think there's anything to fix in environment modules. It's gnome-desktop/kde that might
need some tweaking.

Comment 19 Xavier Delaruelle 2021-01-06 06:16:15 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #18)
> Systemd itself doesn't use the shell *at all*. This also means that it
> doesn't care about profile
> scripts. Instead, various desktop environments execute in an environment
> where the shell profile
> functions have been executed, and when they call 'import-environment' or the
> dbus-api equivalent
> to propagate session-specific variables, various variables are pushed to the
> manager environment
> block. I don't think this makes sense, because we end up with too many wrong
> things in the global
> environment.

Thanks a lot for the clarification and doc updates. It better understand it now.

I can confirm that default bash shell configuration for user accounts correctly executes shell profile scripts (~/.bash_profile and ~/.bashrc evaluates /etc/bashrc which in turn evaluates /etc/profile.d/*.sh scripts).

Comment 20 Ben Cotton 2021-01-12 15:45:00 UTC
This bug will be discussed at the Fedora Prioritized Bugs meeting on 2021-01-13 at 1600 UTC in #fedora-meeting
https://apps.fedoraproject.org/calendar/base/#m9788

Comment 22 Gerald Cox 2021-01-14 20:51:23 UTC
Latest update from KDE:



    FYI, a patch merged into Plasma 5.21 that filters env names as well the env values. I'm hoping this fixes our issue.

Great news David! Can you point me toward the patch so I can advise the Fedora maintainers and see if they can backport so we can test it out? Also, I had opened https://bugzilla.redhat.com/show_bug.cgi?id=1912046 in Fedora to see if we could get the systemd and DE folks all on the same page to avoid this type of issue going forward. Part of the systemd response to that bug was:

What I think should happen is that the desktop environment should export a few variables it
sets itself (emphasize *itself* here, if the desktop environment is assigning a variable, it knows
whether it belongs in the global session block): DESKTOP_SESSION, DISPLAY, GDMSESSION, GDM_LANG,
GNOME_SETUP_DISPLAY?, SESSION_MANAGER?, WAYLAND_DISPLAY, XAUTHORITY, XDG_SESSION_*, or equivalent
for other environments.

Whenever a shell process is created for the user, it should execute shell profile functions and
create set the shell-specific variables there.

Is this the approach KDE is taking going forward or do you have a better idea or path? You can either respond in the fedora bug or here and I'll pass the word along. Fedora has made it a prioritized bug to attempt to ensure we have everyone's concurrence (as much as possible).

Thanks again for your quick attention and action. It is much appreciated!

Comment 23 Gerald Cox 2021-01-15 18:12:32 UTC
The patch referenced in comment #22 was backported by Rex (Thanks again for the quick
work on getting it applied!) and looks like it circumvents the current known issues.  
However, after reading the comments for the patch, here:  

https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/569

it doesn't appear to address the root cause.  As I understand it, if there
is another variable created that somehow isn't covered by the patch filters,
we're back to square one and stuff again starts breaking.

Unless anyone has a better idea the solution by Zbigniew seems very logical:

> The long-term solution is to only pass specific variables. DISPLAY, WAYLAND_DISPLAY, and so on. 
> The rule should be "The display manager sets a variable, 
> it may forward it to the environment block if appropriate. No other variables

Wouldn't it be easier to just pass what is needed, then to 
try to filter out something that is unknown?

Comment 24 Ben Cotton 2021-01-27 19:14:47 UTC
In today's Prioritized Bugs meeting, we agreed that since our two main desktop environments have addressed this upstream and zbyszek has said there are no addtional systemd changes to make, this bug can be considered closed. So I am closing it.


Note You need to log in before you can comment on or make changes to this bug.