Description of problem:
I have been noticing during F21 cloud and atomic testing that I sometimes lose my bash history. This seems to be triggered by reboots as can be shown below:

    [root@f21 ~]# cat .bash_history | wc -l
    2
    [root@f21 ~]# history | wc -l
    123
    [root@f21 ~]# reboot
    Connection to 172.24.4.228 closed by remote host.
    Connection to 172.24.4.228 closed.
    [root@localhost ~(keystone_admin)]#
    [root@localhost ~(keystone_admin)]# ssh -i key1.pem fedora.4.228
    Last login: Thu Nov 20 18:54:48 2014 from 172.24.4.225
    [fedora@f21 ~]$
    [fedora@f21 ~]$ sudo su -
    [root@f21 ~]# history | wc -l
    3

Version-Release number of selected component (if applicable):
This is from the F21 Cloud RC5 image:

    [fedora@f21 ~]$ rpm -q systemd
    systemd-216-12.fc21.x86_64

How reproducible:
Seems to be pretty consistent to me.

Steps to Reproduce:
See description.

Actual results:
My bash history is not saved.

Expected results:
My bash history is saved.

Additional info:
I think this is an issue with systemd but it could be some other component. Feel free to re-assign where appropriate. I think the fact that I use 'sudo su -' to become another user plays a role in this. I don't think it happens without doing that.

The following fedora bug seems to be the same issue from a time in the past:
https://bugzilla.redhat.com/show_bug.cgi?id=821254

A few suse bugs with similar descriptions:
https://bugzilla.novell.com/show_bug.cgi?id=671719
https://bugzilla.novell.com/show_bug.cgi?id=652633
Let me add to my recreation in the description by explaining how I log in to the system. I ssh as 'fedora' to the system and then 'sudo su -' to root:

    [root@localhost ~(keystone_admin)]# ssh -i key1.pem fedora.4.228
    [fedora@f21 ~]$ sudo su -
    <<<< LOTS OF COMMANDS RUN IN HERE >>>>>
    [root@f21 ~]# cat .bash_history | wc -l
    2
    [root@f21 ~]# history | wc -l
    123
    [root@f21 ~]# reboot
    Connection to 172.24.4.228 closed by remote host.
    Connection to 172.24.4.228 closed.
    [root@localhost ~(keystone_admin)]#
    [root@localhost ~(keystone_admin)]# ssh -i key1.pem fedora.4.228
    Last login: Thu Nov 20 18:54:48 2014 from 172.24.4.225
    [fedora@f21 ~]$
    [fedora@f21 ~]$ sudo su -
    [root@f21 ~]# history | wc -l
    3
Created attachment 987257 [details]
Output from journalctl -a

I am also having this problem on both i686 and x86_64. I had chunks of bash history missing every now and then and after reading this report I realized that this has been happening consistently after rebooting. I get the same misbehavior in local and remote sessions. It doesn't matter how I become root, "su", "sudo su" and "su -" make no difference. Last time it happened, I had just updated my selinux-policy packages and then ordered a reboot. I am attaching the output from journalctl -a for the same time frame.
I ran a few more tests and history loss does not happen exclusively on reboot, but sometimes also on poweroff (though, not always) and it's not just root's history that gets mangled; all users are affected.
About a day after my last comment, the problem seems to have gone away. Dusty, could you please check if you're still seeing it?
Created attachment 988850 [details]
yum.log excerpt

These are the packages that got updated in the meantime.
My recreator from comment 1 still has the same behavior even after updating all packages to latest as of tonight. This includes systemd-216-17.fc21.x86_64.
I too did a fresh install of F21 on a Lenovo g50-30 and after updating everything, the problem is there. I'll check my other systems again.
This is getting worse by the day. I spent quite a few hours working from the terminal yesterday, a lot of git, rsync and several other tools that I use on a daily basis. When I finished working, I quit the terminals and shut down the system from the gnome menu, strongly resisting the urge to just type "poweroff" in my root terminal. This morning, my history was intact, both in the root account and in my user. At noon, I powered down the system exactly like the night before; quitting the terminals and issuing the shutdown from within gnome. Now, I turned the machine back on, root account was missing the last 5-6 typed commands but my user had this:

    $ history
        1  su
        2  history
    $ cat .bash_history
    su

where there should have been more than a thousand commands...

I really don't know if systemd is to blame, I originally posted here to avoid opening a duplicate bug report. However, something is chewing off history files and it has become more than annoying, it is severely impacting my work flow. Could someone list the programs or systemd units that perform history-related tasks?
Should we close this as a duplicate of #1183194 which contains more useful links?
Since this bug was opened first I would vote that we add any pertinent information from that bug to this one and dup that one.
I'm sorry, I misread the bug numbers and thought it was the other way around.
*** Bug 1183194 has been marked as a duplicate of this bug. ***
I believe this is caused by a systemd bug: https://bugzilla.redhat.com/show_bug.cgi?id=1141137

See my dupe. I also included a reproducer there:

1. Install a clean Fedora 21
2. Open a terminal, open two tabs, 'su' in one
3. Run 'test01' in one terminal, 'test02' in the other
4. Reboot (by running 'reboot' in either terminal or using the GNOME 'Restart' button, it doesn't seem to matter)
5. Open a terminal, check history for both user and root and see if you see 'test01' and 'test02'
6. Open two tabs, su in one, run 'test03' in one, 'test04' in the other
7. GOTO 4 (at step 5 check for 'test03' and 'test04', obviously)

Loop as much as you like. Quite often, the history from one or both of the sessions running at reboot time is lost.

I also confirmed there that the bug does not affect F20 (when systemd didn't have this insta-kill behaviour).
*** Bug 1199026 has been marked as a duplicate of this bug. ***
See also: https://bugs.launchpad.net/ubuntu/+source/mosh/+bug/1446982
And F-22.
As Martin said in 1141137#c7, can we have 743970d reverted? It is very annoying to lose my bash history on every shutdown.
I also experienced this after upgrading to F22. I consider this a data integrity issue.
On Fedora 21 my ~/.bash_history has been truncated to length 0. :-( All I did was shut down and then power up.

    [colin@k8 ~]$ yum info systemd
    Installed Packages
    Name        : systemd
    Arch        : x86_64
    Version     : 216
    Release     : 25.fc21
    [colin@k8 ~]$ yum info bash
    Loaded plugins: fastestmirror, langpacks
    Installed Packages
    Name        : bash
    Arch        : x86_64
    Version     : 4.3.39
    Release     : 1.fc21
    Size        : 6.8 M
    [colin@k8 ~]$ ls -alrt /home/colin/.bash*
    -rw-r--r--. 1 colin colin   231 Oct  8  2014 /home/colin/.bashrc
    -rw-r--r--. 1 colin colin   193 Oct  8  2014 /home/colin/.bash_profile
    -rw-r--r--. 1 colin colin    18 Oct  8  2014 /home/colin/.bash_logout
    -rw-------. 1 colin colin 41385 Jun  1 23:57 /home/colin/.bash_history~
    -rw-------. 1 colin colin 40916 Jun  2 15:52 /home/colin/.bash_history.old
    -rw-------. 1 colin colin     0 Jul 27 18:17 /home/colin/.bash_history
    [colin@k8 ~]$
yeah, I think one of the #fedora-desktop guys saw the same when we were diagnosing this; I think there's some very unfortunate special case of this where, if the terminal/bash is killed at a very specific unfortunate point, the file gets truncated :/
I had mine truncated to 0 once too. It makes me a little concerned that the issue is not bash-specific and other files may have the same thing happen.
It's not at all bash-specific; it's just that bash makes us more likely to notice, as far as I can tell. It has happened to me multiple times; to the point where I make sure to keep frequent copies of my .bash_history around as backups. It has hit me quite a few times now. If I have a logged-in session and reboot in any other way than choosing to reboot when logging out of a graphical session (e.g. "shutdown -r now" in a root shell, or the one time my X session quit displaying properly on F22 and I had to "shutdown -r now" from a terminal session) it seems to pretty reliably truncate .bash_history. But I agree, we may have arbitrary other data corruption happening due to this that is more subtle and harder to trigger or trace down. There's no particular reason to assume that the only way this happens is a truncated file, nor that all impact would be so immediately visible. When did data integrity failures lose their no-ship status for Fedora?
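The backup habit described above can be sketched as a small script. This is only an illustration: `backup_history` and the directory layout are hypothetical names, and the one design point it shows is skipping the copy when the source is empty, so that a history file truncated to 0 bytes never counts as a fresh "good" backup.

```shell
#!/bin/sh
# Sketch: copy a history file to a dated backup, but only when it is
# non-empty, so a file truncated to 0 bytes is never recorded as a backup.
# backup_history and the paths below are hypothetical, for illustration only.
backup_history() {
    src=$1
    dest_dir=$2
    [ -s "$src" ] || return 1            # missing or empty: leave old backups alone
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
}

work=$(mktemp -d)
printf 'git status\nrsync -a src/ dst/\n' > "$work/history"

# First call: the file has content, so a backup is created.
backup_history "$work/history" "$work/backups" && rc_full=0 || rc_full=1

# Simulate the truncation reported in this bug, then try again.
: > "$work/history"
backup_history "$work/history" "$work/backups" && rc_empty=0 || rc_empty=1

count=$(ls "$work/backups" | wc -l)
count=$((count))                         # normalise any padding from wc
echo "full=$rc_full empty=$rc_empty backups=$count"
rm -rf "$work"
```

Run from cron or a shell startup file, something along these lines would keep the last good copy around; it does nothing about the underlying kill behaviour.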
Upstream bug: https://github.com/systemd/systemd/issues/317
"When did data integrity failures lose their no-ship status for Fedora?" So far as I'm aware there wasn't ever any such rule. There are, like, 10,000 complicated bits of software in Fedora. The chance that any given Fedora image contains some sort of data integrity issue is, basically, 100%. We have a release criterion: "All known bugs that can cause corruption of user data must be fixed or documented at Common F23 bugs." (the release number changes as appropriate), so this probably ought to be in common bugs (I sorta thought it was already). That criterion was introduced with the formal release criteria revision in F13, prior to that I don't believe there was a corresponding requirement of any kind. Anyhow, aside from that, I think you're working with a rather squishy definition of "the issue". The fact that systemd kills processes without giving them a real chance to shut down properly is of course not at all bash-specific, but I think the .bash_history truncation bug very likely *is* bash-specific, in the sense that the bash code is so written that if it happens to get killed at a specific point, the file gets truncated. There may of course be similar cases in other code, where terrible things might happen if a given process is killed at just the wrong time, but they wouldn't logically speaking be the same issue (as they'd be in completely different code, so you couldn't fix them by changing bash), and so far as I'm aware we haven't had any clear reports of such cases yet.
(In reply to Adam Williamson from comment #24)
> The chance that any given Fedora
> image contains some sort of data integrity issue is, basically, 100%.

Pure gold! That's the most accurate and the funniest thing I've read in a while. We should make it a Fedora motto or something. Or at least a signature. :-)

> The fact that systemd kills processes without giving them a real chance to
> shut down properly is of course not at all bash-specific, but I think the
> .bash_history truncation bug very likely *is* bash-specific, in the sense
> that the bash code is so written that if it happens to get killed at a
> specific point, the file gets truncated.

Is this maybe one of those "atomic rename" things that there was a storm about around the time ext4 was about to replace ext3?
(In reply to Adam Williamson from comment #24)
> So far as I'm aware there wasn't ever any such rule. There are, like, 10,000
> complicated bits of software in Fedora. The chance that any given Fedora
> image contains some sort of data integrity issue is, basically, 100%.

When I was FPL it was the rule that we didn't ship with known data corruption bugs. But that was FC1 and not all the rules were written down yet... So whether it changed in F13 or had informally been relaxed before that is probably lost to posterity. Thanks for the pointer to the criterion!

> Anyhow, aside from that, I think you're working with a rather squishy
> definition of "the issue".

I don't see where I wrote "the issue" as you quote. I think that you are right that #1141137 is the cause ("The right fix for the whole mess is to port systemd to the unified cgroup logic" as Lennart says in the upstream bug report) and should be fixed there to avoid corruption generally.

So are you saying that this issue should be for possibly working around the partially implemented cgroup feature by investigating why bash is truncating the history file, and we should discuss the general problem of possible unrecognized data corruption in #114113?

In that case, this issue (bash-specific) would still be a known, specific data corruption issue, I would think, so adding the CommonBugs keyword is a good idea based on the release criterion you quoted.
Just to get a clear picture of the case here: without knowing the bash code, I guess that bash is actually caching those commands in memory, and when the machine shuts down it has no time to write those commands to .bash_history?

"Systemd is a good test bench because now we can make applications work correctly", some would say. I've actually already heard/read this comment from multiple sources.

Fixing this "correctly" would require writing each line of the .bash_history file separately (with a possible forced sync for each write). Avoiding unnecessary I/O activity and having delayed writes are probably more common today. Now we are running this stuff on the Raspberry Pi 2, for example, with its particularly bad "SSD drives", and we want to avoid frequent small writes as much as possible. We also want to avoid unnecessary writes for performance/load reasons.

Because of the systemd behavior, shutdown is now effectively a forced power-off for many applications. There are two possibilities:

Fix to systemd:
a) Simple and fast fix. (+)
b) Preserve SSD lifetime. (+)
c) Have better system load/performance/responsiveness. (+)
d) Possibly a little bit slower (probably unnoticeable if implemented correctly) shutdown sequence. (-)

Fix to applications:
a) Investigation needed. (-)
b) Documentation change needed. (-)
c) Responsible parties should be notified. (-)
d) All responsible applications need to be changed. (-)
e) Reduced SSD lifetime. (-)
f) Have worse system load/performance/responsiveness. (-)
g) Possibly a little bit faster (probably unnoticeable) shutdown sequence. (+)
Thanks to all for their contributions to that enlightening discussion. What I have learned so far:

1) There exists a serious data corruption bug which already has a longstanding bugzilla entry: https://bugzilla.redhat.com/show_bug.cgi?id=1141137

2) The intermittent data loss that I have occasionally noticed for many months now has been repeatedly reported by others - see 1)

3) In the intervening 10+ months since 2014-09-12 until now there have been MANY reports of this serious problem:
http://lists.freedesktop.org/archives/systemd-devel/2014-October/024452.html
https://bugs.launchpad.net/ubuntu/+source/mosh/+bug/1446982
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1448259

4) Some sort of Serious Organisational Dysfunction seems to be manifesting itself here whereby The Fedora Project has completely failed to release to its users a package build that either reverts the already identified patch that introduced the error or includes improvements to fix it.

Did I miss out anything else important?
(In reply to colin from comment #28)
> Did I miss out anything else important?

Yes, RH reverted it in RHEL 7.2 systemd: https://bugzilla.redhat.com/show_bug.cgi?id=1199644
(In reply to Michael K Johnson from comment #26)
> (In reply to Adam Williamson from comment #24)
> > So far as I'm aware there wasn't ever any such rule. There are, like, 10,000
> > complicated bits of software in Fedora. The chance that any given Fedora
> > image contains some sort of data integrity issue is, basically, 100%.
>
> When I was FPL it was the rule that we didn't ship with known data corruption
> bugs.

At the time you were the FPL there was no functional QA sub-community in the project, so you had no real means of determining whether the distribution was being shipped with data corruption or not, nor was there any real testing being performed on the image being composed or being shipped...
Thank you Lukas. :-) That new piece of information suggests to me that there would be no point in me being bounced into a forced update from fc21 -> fc22 to get systemd-219 if http://koji.fedoraproject.org/koji/packageinfo?packageID=10477 has not been fixed yet.

p.s. Oops, I posted this in the wrong thread first. <shrug> Since mere BZ commenters are denied any way to fix their mistakes, I will just have to carry on like it never happened <whistles>
colin: basically, this hasn't been treated as an especially high-priority problem in Fedora; I'm not in any special position, but my personal take has been that it's more an annoyance than anything else (I run three systems that are subject to the bug as my main systems for daily use, though I worked around the bash case in my bash config). Again this is a personal take not anything official, but I think calling it a 'serious data corruption bug' is sort of pushing things a bit - "I'll look it up in my bash history!" is not, to my knowledge, a widely recommended personal data storage strategy :) I don't think anyone was particularly keen to gainsay the systemd maintainers and just revert the commit downstream, because it's never entirely safe to do that kind of thing. Just because you've identified that change A causes behaviour change B, and reverting change A is possible and does indeed stop behaviour change B, it doesn't follow that it's *all* that reverting change A does, or that reverting change A is safe. It's entirely possible that other, later changes assume that change A is in place, and reverting it from a later version of the code will have unexpected consequences. Obviously this isn't something anyone wants to risk lightly when it comes to systemd. Each time I've looked at this I kind of got the impression it was expected that a proper fix would be arriving in upstream shortly, but at this point that seems somewhat less likely, and I'm trying to get the systemd maintainers to sign off on reverting the change until a better fix is available upstream.
michael: sorry for any confusion, I think I was conflating your comment with a parallel github discussion.
Thank you for your full reply Adam. It has certainly been useful for people to document their knowledge of what has been going on - and it enabled me to examine the full history of the problem.

Since I do not have the free time to expend to create a custom build of a shipped package, just to fix it _for_myself_, I will just hurry-up-and-wait some more. I have waited this long for Fedora to release a fix, but at least I now have hope that a fix will appear in CentOS (via RHEL 7.2) which _should_ dovetail into my existing launch schedule. (It is actually the systemd-shutdown-in-a-container-corruption issue that I am more concerned about than the 'my-history-disappeared-again-for-the-nth-time' one.)

Once again - thanks for now creating enough of a breadcrumb trail in this thread that an external observer will be able to fully appreciate the history of the problem. Perhaps it will all be sorted out for Fedora 23. ;-)

That is all. Thanks again,
Colin
Adam,

You seem to be all over the place on this... on the one hand saying it's not that big of an issue, and just barely/not really a data integrity issue, and going so far as to be dismissive of anyone relying on .bash_history... yet you then say "I run three systems that are subject to the bug as my main systems for daily use, though I worked around the bash case in my bash config" (yay you!)

Have you ever heard the phrase "eat your own dog food"? You say it is not a big issue, and you replied to me in the systemd git issue queue that you knew about it but wouldn't get around to pushing a fix till F23... yet you found it annoying/exasperating enough to kludge a fix/workaround on your own. Yet the peons who are not as smart as you should be ok just waiting until Oct and F23 to get the issue fixed. If it's not a big deal, undo your bash config fix, and live with the pain point so you can objectively see what an issue it is...

At the very least, care to share your bash config so we can "work around" it until a fix finally does get released? Or maybe reach out to the bash maintainers and commit it officially for the betterment of the distro?

I think that clearly goes to colin's statement:
> Some sort of Serious Organisational Dysfunction seems to be manifesting itself here whereby The Fedora Project has completely failed to release to its users a package build that either reverts the already identified patch that introduced the error or included improvements to fix it.

You also state "I don't think anyone was particularly keen to gainsay the systemd maintainers and just revert the commit downstream", but that is simply untrue... The maintainers for Debian/Ubuntu/Red Hat 7.2 have all reverted it. They eat their own dog food.
You seem to take umbrage in the systemd git issue queue when I repeated what I had been told by Michael Chapman: "As with many other critical Fedora packages, the Fedora maintainers are also upstream maintainers and they typically only backport patches to Fedora that have already applied upstream, so the best place to start would be to get this fixed upstream: https://github.com/systemd/systemd"

You answered, "I'm not really sure what you mean when you keep saying Fedora 'rely on systemd directly as an upstream sources', implying that it's somehow different from any other distro. All distros use systemd as the upstream source for systemd, it couldn't really be any other way."

But yet this conversation seems to imply exactly that! Every other distro has recognized this as a BAD commit and reverted it. Yes or no: does Poettering have to sign off on it before it can be reverted for Fedora?

I DO agree with your statement, "Software is complex, and it's not generally a good idea to run around whacking on buttons that superficially look like they do the right thing."... but I feel the commit in question is the egregious "whacking on a button".
(In reply to sforsyt from comment #35)
> You seem to take umbrage in systemd git issue queue when I repeated what I
> had been told by Michael Chapman ," "As with many other critical Fedora
> packages, the Fedora maintainers are also upstream maintainers and they
> typically only backport patches to Fedora that have already applied
> upstream, so the best place to start would be to get this fixed upstream:
> https://github.com/systemd/systemd"

It was actually Michael Catanzaro, not me. :-)

Just to add my point of view, I think we need to remember this bug is not really about losing Bash history. That's just the most visible symptom of the bug.

systemd will SIGKILL *any* scope unit when that unit is stopped. The problem is not related to login scopes. It's certainly got nothing to do with the system being shut down or rebooted: in bug #1183194 I gave a demonstration of this using "systemd-run --scope".

Why should stopping a scope unit behave any differently to stopping a service unit? Why do I need to be ultra careful that the software I run through a transient scope unit doesn't lose data when SIGKILLed, when I don't need to be so careful when using a full service unit?

I think we do need to fully understand what reverting that commit will do. As I understand it, it's guarding against a possible-but-somewhat-unlikely race condition... and moreover, one that will resolve itself automatically after TimeoutStopSec (which is 120 seconds for scope units). But I have yet to have any confirmation from a core systemd developer of this; I know I did ask about it at least once on the mailing list.
Just for the record, that's not what "eat your own dogfood" means. But *more* importantly, bugs aren't the best place for that kind of discussion. In fact, really, since you seem to mostly be focusing on one of the people who is trying to help rather than focusing on the problem itself, I don't think there *is* a good place for that. However, if you want to discuss any broader issues of Fedora policies or approaches, it would be better to use the Fedora Devel or Fedora Test mailing lists -- let's keep this bug focused on the problem itself.
(In reply to sforsyt from comment #35)
> At the very least, care to share bash config so we can "work around" it
> until it finally does get released?

I believe this is what you're looking for: https://www.happyassassin.net/2015/01/16/bash-history-with-multiple-sessions/

I think it was mentioned somewhere in one of the links from #1183194.
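For readers who cannot follow the link, one commonly used mitigation is to have bash append each command to the history file right after it runs, rather than only when the shell exits. The fragment below is a minimal sketch using standard bash features (`shopt -s histappend`, `history -a`, `PROMPT_COMMAND`); it is an assumption that it resembles the linked configuration, and it is not copied from it.

```shell
# ~/.bashrc fragment (sketch; assumes an interactive bash session)

# On exit, append to the history file instead of overwriting it.
shopt -s histappend

# Before each prompt, append the history lines entered since the last
# write to $HISTFILE. Any PROMPT_COMMAND that is already set is kept.
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

With this in place, a shell that is killed abruptly should lose at most the command that was still running. It narrows the window for loss but is not a fix for the kill behaviour itself, and it should not be assumed to prevent a rewrite of an over-long history file at exit.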
(In reply to Michael Chapman from comment #36)
> Just to add my point of view, I think we need to remember this bug is not
> really about losing Bash history. That's just the most visible symptom of
> the bug.

That's not my understanding. My understanding is that #1141137 is the systemd behavior of sending SIGKILL right after SIGTERM (what I understand to be a partially-implemented feature from Lennart's description in the upstream bug at github), and this bug is about bash losing all its history because it can't handle a SIGKILL arriving almost immediately after a SIGTERM; a bash bug that was not generally relevant until systemd started to commonly do precisely that.

I believe that I contributed to improper aliasing of the two related but logically independent issues, and I apologize for that.
Sorry Michael Chapman for mis-attributing quote; thanks for the link Alexander.
(In reply to Michael K Johnson from comment #39)
> That's not my understanding. My understanding is that #1141137 is the
> systemd behavior of sending SIGKILL right after SIGTERM (what I
> understand to be a partially-implemented feature from Lennart's
> description in the upstream bug at github), and this bug is about
> bash losing all its history because it can't handle a SIGKILL
> arriving almost immediately after a SIGTERM; a bash bug that was not
> generally relevant until systemd started to commonly do precisely that.

Yeah, that's an excellent point. By "this bug" I was really referring to the systemd bug, not this bugzilla ticket.

But putting that aside, I think it'd be a good idea to treat "Bash's behaviour with ill-timed SIGKILLs" and "systemd's behaviour when stopping scope units" as separate problems. They can be solved independently. Perhaps this ticket should be reassigned to the "bash" component?
Is it really "Bash's behaviour with ill-timed SIGKILLs"? As I understand it, you send a SIGHUP to a process to tell it to exit and explicitly give it a chance to do any house-keeping/shutdown. If SIGKILL is sent immediately after SIGHUP (1 CPU cycle later), why bother sending SIGHUP at all? Just send the SIGKILL.

If anything, I would guess the SIGHUP is actually what causes the "problem": bash begins to shut down and opens a filehandle with ">" (truncating the existing file), but before it can write anything out it is sent SIGKILL. That's purely a guess and probably a gross oversimplification, but I think it is fairly typical of how a program would be coded to handle SIGHUP.
It's the PAM session behaviour/logic, not bash, that's (part of) the problem here.
(In reply to sforsyt from comment #42)
> Is it really "Bash's behaviour ill-timed SIGKILLs"?

I'd say so. No other signal need be involved.

    $ cat >/tmp/history
    one
    two
    three
    four
    five
    $ HISTFILE=/tmp/history HISTSIZE=5 strace bash
    execve("/usr/bin/bash", ["bash"], [/* 75 vars */]) = 0
    ...
    read(0, "e", 1) = 1
    write(2, "e", 1e) = 1
    read(0, "x", 1) = 1
    write(2, "x", 1x) = 1
    read(0, "i", 1) = 1
    write(2, "i", 1i) = 1
    read(0, "t", 1) = 1
    write(2, "t", 1t) = 1
    read(0, "\r", 1) = 1
    write(2, "\n", 1
    ) = 1
    ...
    stat("/tmp/history", {st_mode=S_IFREG|0600, st_size=24, ...}) = 0
    open("/tmp/history", O_WRONLY|O_APPEND) = 3
    write(3, "exit\n", 5) = 5
    close(3) = 0
    open("/tmp/history", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0600, st_size=29, ...}) = 0
    read(3, "one\ntwo\nthree\nfour\nfive\nexit\n", 29) = 29
    close(3) = 0
    open("/tmp/history", O_WRONLY|O_TRUNC) = 3
    write(3, "two\nthree\nfour\nfive\nexit\n", 25) = 25
    close(3) = 0
    ...

So if the Bash process is killed anywhere between those last open and write commands, the history file can end up completely empty.

Note that if Bash detects that the file doesn't need to be completely rewritten (i.e. it has fewer than HISTFILESIZE entries in it), it opens the file with O_WRONLY|O_APPEND and simply writes the new entries. This is safe even if Bash is unexpectedly killed.
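The window between the truncating open and the write can be reproduced without bash's history code at all. The following is a minimal sketch that mimics the last three syscalls from the trace with shell redirections on a throwaway temp file; it illustrates the window and does not use bash's actual implementation.

```shell
#!/bin/sh
# Sketch: opening a file with truncation empties it on disk before any new
# data is written. The path is a throwaway temp file, not a real history file.
hist=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\nexit\n' > "$hist"
size_before=$(wc -c < "$hist")
size_before=$((size_before))            # normalise any padding from wc

# Equivalent of open(..., O_WRONLY|O_TRUNC): the file is now 0 bytes long.
exec 3> "$hist"
size_in_window=$(wc -c < "$hist")       # a SIGKILL here leaves an empty file
size_in_window=$((size_in_window))

# Equivalent of the write() and close() that follow in the trace.
printf 'two\nthree\nfour\nfive\nexit\n' >&3
exec 3>&-
size_after=$(wc -c < "$hist")
size_after=$((size_after))

echo "before=$size_before in_window=$size_in_window after=$size_after"
# prints "before=29 in_window=0 after=25"
rm -f "$hist"
```

The 29 and 25 byte counts match the `read` and `write` sizes in the strace output; the 0 in the middle is the state a kill at the wrong moment would leave behind.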
(In reply to Michael Chapman from comment #44)
> Note that if Bash detects that the file doesn't need to be completely
> rewritten (i.e. it has fewer than HISTFILESIZE entries in it), it opens the
> file with O_WRONLY|O_APPEND and simply writes the new entries. This is safe
> even if Bash is unexpectedly killed.

I should clarify this. It's "safe" in that you don't lose existing entries in the history file, but you may get none, some or all of the new entries, depending on exactly when the process is killed and the size of data that needs to be added.

I don't consider that to be a Bash bug. Certainly though I think there could be an improvement in the way it handles the complete rewrite of the file, when that's necessary.
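One common way to make a complete rewrite safe against an ill-timed kill is to build the new contents in a temporary file in the same directory and then rename it over the original. The sketch below shows that general technique in shell; `trim_history` is a hypothetical helper, and nothing here is meant to describe how bash does or will handle its history file.

```shell
#!/bin/sh
# Sketch of a rewrite that never exposes an empty or partial file: write the
# new contents to a temp file next to the original, then rename it into place.
# rename() within one filesystem is atomic, so a process killed at any point
# leaves either the old file or the new one (plus, at worst, a stray temp file).
# trim_history is a hypothetical helper, for illustration only.
trim_history() {
    file=$1
    keep=$2
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Keep only the last $keep lines; the original is untouched so far.
    if tail -n "$keep" "$file" > "$tmp"; then
        mv -f "$tmp" "$file"            # the only step that changes $file
    else
        rm -f "$tmp"
        return 1
    fi
}

demo=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\nexit\n' > "$demo"
trim_history "$demo" 5
result=$(cat "$demo")
echo "$result"
rm -f "$demo"
```

This protects against the process being killed mid-rewrite. Surviving a sudden power loss as well would additionally need the new data flushed to disk before the rename, which is the "atomic rename" topic raised earlier in the thread.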
sforsyt: When I said 'anyone' I meant 'anyone in Fedora'.

Fedora can choose to patch this issue downstream. It has not yet done so. But that's simply a question of choice: you seemed to be suggesting that we were somehow prevented from doing so by some sort of technical or policy issue. This is not the case. There is no fundamental reason Fedora's systemd package cannot have downstream patches, it's simply the case that we have not chosen to patch this issue downstream (yet).

For the record, this is what I was told by a core systemd dev when asking if they would consider it a good idea to revert the commit downstream:

    it's broken either way but i'd rather take missing shell history over leaking without bounds

I'm asking if we can at least get a temporary fix which avoids the kills without causing unacceptable other consequences. I did note that there are consequences beyond the bash_history one.

The Fedora package *can* be patched without the say-so of upstream systemd maintainers and even by people other than the downstream packagers, but we tend to take the advice of the former seriously and it's generally much better to avoid the latter and do things by consensus.
(In reply to Adam Williamson from comment #46)
>
> The Fedora package *can* be patched without the say-so of upstream systemd
> maintainers and even by people other than the downstream packagers, but we
> tend to take the advice of the former seriously and it's generally much
> better to avoid the latter and do things by consensus.

I agree. But how long will the right fix take? A week, a month, a year? More? This is the problem here. I would simply follow RHEL until the right fix arrives. My 2 cents.
Please read comment #46.
Has there been any response from the systemd maintainer in the last 7 days on signing off on this?
Not since what I posted in #c46, no. Sorry. AFAIK that's still the status: upstream doesn't believe that reverting the commit is an appropriate fix, so we're waiting on something else.
Why is the discussion with "upstream" not taking place in public on Fedora's issue queue, but rather in private communication outside of the project's own issue queue? Many people have commented on and discussed this bug and its implications, arguing for/against reverting the patch... have the "maintainers" of systemd for Fedora made one comment explaining on here?

Earlier Adam said:
> Fedora can choose to patch this issue downstream. It has not yet done so. But that's simply a question of choice: you seemed to be suggesting that we were somehow prevented from doing so by some sort of technical or policy issue. This is not the case. There is no fundamental reason Fedora's systemd package cannot have downstream patches, it's simply the case that we have not chosen to patch this issue downstream (yet).

To me, the discussion (which I can't see as it's completely behind closed doors) seems to be proving the opposite. The fact that it has taken 21 days suggests either that there was a lot of internal discussion on this issue... or that it was simply beneath the maintainers' time to even respond to.

I missed it, but in C46 you said that upstream made this comment:
> but i'd rather take missing shell history over leaking without bounds

What "leaking without bounds"? I've not seen anyone suggest or describe this as a "leak"... what type of leak is he referring to? It blocks shutdown for a few minutes... I'd rather take THAT than data loss.

Many, many people have shown and experienced the data loss, as per the comments left on this ticket... to my knowledge a single complaint led to the commit that caused these issues. I've not seen multiple people complain that this affects them, nor claim that it is easily reproducible. In fact it has been the opposite: some have stated that it is a rare edge case that not everyone can reproduce or trigger.
Every other distribution that has an independent maintainer for systemd, one who can objectively decide what is best for the user base of its distribution, has reverted this commit due to the impact of data loss on its users. Those other distributions have not erupted in flames or come to a screeching halt because of "leaking without bounds"; in fact I don't think there has been a single issue/ticket created since those other distributions reverted this patch complaining of an issue (i.e. it was a very small subset that this issue affected). Earlier you said if no progress was made you would revert this commit for F23, to be released in Oct ... is that still your plan?
There aren't any 'closed doors'. The upstream discussion is happening on upstream's bug tracker. I usually talk to systemd devs on IRC, their IRC channel is public on freenode. "Those other distributions have not erupted in flames or come to a screaching halt because of "leaking without bounds"" As I understand it, the leak is a potential security issue, not a flaming eruption. If you have an authoritative technical analysis of the bug, please provide it.
Proposed as a Blocker for 23-final by Fedora user sforsythe using the blocker tracking app because: See also TTS 1141137.

The commit at issue for TTS 1141137 and 1170765 has demonstrated widespread data loss in multiple applications ... most notably bash history, but also in a wide range of other applications, as detailed by many different users in the two TTS listed above. The issue of this commit has been raised in every other major distribution (Redhat, ubuntu, debian, etc.) as well; all have unanimously agreed that the commit causes data loss and is unacceptable for their quality standards, and have reverted the commit.

I realize fedora is a testing ground and considered "bleeding edge". To experience a bug is one thing; to know that a bug that causes data loss exists and to ignore it is another. The package maintainers of systemd for fedora have not responded once to the issues or claims raised in the TTS. Rather, as I understand it, they have had internal "closed door" discussions with Adam Williamson. If Lennart, Kay, and the other systemd maintainers are beyond reproach and what they say goes ... that needs to be stated clearly and simply. The illusion of community input and feedback needs to be abandoned, and it needs to be stated explicitly that the direction and quality of fedora will be determined by the appointed Redhat employees.

Simply put ... this issue seriously stains the reputation and integrity of the fedora project, as it ignores data integrity. Some have tried to "wishy wash" it and say "well TECHNICALLY it's a data integrity issue, but not really". Yes, it is a data integrity issue. Easily verified, easily repeatable, and it affects a wide range of applications.
er, upstream's bug tracker and mailing list, I should say (there are bits in both places). I'm not closely following that, I'm relying on other people's reports.
Please don't make unwarranted extrapolations and assumptions from my comments. Look, I'm trying to help here, OK? Getting on a soapbox and making unwarranted assumptions is not *helping* you to get this bug addressed, it is *hurting*.
The commit in question makes one change:

- wait_for_exit = true;
+ /* wait_for_exit = true; */

Nowhere in this TTS or in any other has anyone suggested that there is a security issue. I have followed the systemd git issue queue and the systemd devel mailing list and I've never seen that stated once. If there is a security issue, why isn't it being disclosed so other downstream distributions can take appropriate action? It sounds suspiciously like a red herring. If someone with "authoritative technical analysis of the bug" believes there is a security issue ... where is the tts for that? Please provide it.
I'm telling you what the devs told me. That's all I know. Please stop assuming bad faith, here. Reverting the commit is easy. If there isn't actually a problem with doing it, what possible reason would the developers have to lie? Who does that benefit? Frankly I'm tired of trying to help with this bug, which is an awkward position to be in in the first place, and getting crapped on for my troubles.
The upstream issue queue, to my understanding, is on git. The only ticket I've seen on this issue is https://github.com/systemd/systemd/issues/317 which has not been updated in 23 days. No one there has made any mention of a security issue. The devel board is http://lists.freedesktop.org/archives/systemd-devel/ and I've not seen mention of it since Oct 2014: http://lists.freedesktop.org/archives/systemd-devel/2014-October/024452.html

I'm sorry you're feeling beat up ... but you're the only one that seems to be responding to the issue. These are the people I know as the "maintainers" for systemd in fedora, from https://admin.fedoraproject.org/pkgdb/package/systemd/ :

kay (Fedora devel, Fedora 23, Fedora 22, Fedora 21)
harald (Fedora devel, Fedora 23, Fedora 22, Fedora 21)
michich (Fedora devel, Fedora 23, Fedora 22, Fedora 21)
lennart (Fedora devel, Fedora 23, Fedora 22, Fedora 21)
systemd-maint (Fedora devel, Fedora 23, Fedora 22, Fedora 21)

I don't know who that last one goes to. To my knowledge none of them have responded to the issue on this TTS here. That is a "fedora" issue, if the maintainers are not present for the discussion and you have to act as a middle man for them. I listed the systemd git issue queue and the devel mailing list ... I have been following them but don't see any "discussion" of this issue. That is why I continue to discuss it here. Is there someplace else it can be discussed?
(In reply to Adam Williamson from comment #57) > I'm telling you what the devs told me. That's all I know. I really don't want to inflame this discussion even further, but I would like to say that I too am frustrated at the lack of communication from the systemd developers. It's not just this bug; I am CCed on several other systemd bugs in Fedora (e.g. [1] [2] [3]) that have had no communication from the systemd package's maintainers. It's great that we now hear that the systemd developers think the problem described in this bug should not be solved by simply reverting that one commit. But really, we should be getting this news from *them* (via systemd-maint or their own Bugzilla accounts), not indirectly through you, and certainly not after having to practically beg for the information. I really do hope there's a way we can improve communication between the systemd developers and its users in Fedora. This bug shows that the present situation simply isn't working well. [1] https://bugzilla.redhat.com/show_bug.cgi?id=995792 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1072368 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1243319
Discussed at 2015-08-24 blocker review meeting: http://meetbot-raw.fedoraproject.org/fedora-blocker-review/2015-08-24/f23-blocker-review.2015-08-24-16.03.log.txt . Accepted as a Final blocker per criterion "All known bugs that can cause corruption of user data must be fixed or documented at Common F23 bugs." We also note that we consider this bug sufficiently serious that we'd be unhappy with just documenting it as a resolution, this bug should be fixed.
*** Bug 1238456 has been marked as a duplicate of this bug. ***
*** Bug 1141137 has been marked as a duplicate of this bug. ***
This upstream commit looks relevant https://github.com/poettering/systemd/commit/e9db43d5910717a1084924c512bf85e2b8265375
Well, kinda, but I think part of this comment: https://github.com/systemd/systemd/pull/350#issuecomment-137005614 is about that. Note: "I also made a change as part of #1111 (e9db43d) that reenables waiting for cgroup empty events on the legacy hierarchy under certain conditions, specifically when the unit in question is not a delegation unit, and we are not running in a container (which won't do a thing about the session case though, as session scopes are delegation units)." so, I don't think it helps the major case we care about here.
I just pushed a batch of bugfixes for systemd to F23, along with the revert, which should fix this bug. I know that the revert is a trade-off rather than a fix, but I fear we don't have any other viable options. The build is already underway and a scratch build is here: http://koji.fedoraproject.org/koji/taskinfo?taskID=11217918
Thanks a lot. Can we please do the same for F22?
And for F21 maybe?
I changed my mind a bit about revert. I think more sensible would be to backport this change [1] instead. It should cover most cases we care about here, i.e. login sessions, since those are not delegation units.

[1] https://github.com/systemd/systemd/commit/e9db43d5910717a1084924c512bf85e2b8265375
(In reply to Michal Sekletar from comment #68) > I changed my mind a bit about revert. I think more sensible would be to > backport this change [1] instead. It should cover most cases we care about > here, i.e. login sessions, since those are not delegation units. Doesn't Lennart's comment directly contradict that? """ I also made a change as part of #1111 (e9db43d) that reenables waiting for cgroup empty events on the legacy hierarchy under certain conditions, ... (which won't do a thing about the session case though, as session scopes are delegation units). """ I am also worried about how he said this won't help SSH-parented scopes, since that is the *specific* case that bothers me the most: I log into a remote system, make some changes, run "reboot", and my Bash history doesn't get saved.
Hmm, you are probably right. But I am a bit puzzled, because on my machine

$ systemctl show -p Delegate session-1.scope
Delegate=no

Anyway, I will ask Lennart.
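For anyone who wants to repeat the Delegate check above on other units, here is a minimal sketch. The helper names is_delegated and report_delegate are made up for illustration; the only systemd call used is `systemctl show -p Delegate UNIT`, as in the comment above. Whether a given scope counts as a delegation unit may differ between systemd versions, so treat the result as a data point rather than a diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: reads "Key=Value" lines, as printed by
# `systemctl show -p Delegate UNIT`, on stdin and succeeds only if
# one of them is exactly "Delegate=yes".
is_delegated() {
    grep -qx 'Delegate=yes'
}

# Hypothetical helper: report whether the unit named in $1 is a
# delegation unit according to its Delegate property (needs systemd).
report_delegate() {
    unit=$1
    if systemctl show -p Delegate "$unit" | is_delegated; then
        printf '%s: delegation unit\n' "$unit"
    else
        printf '%s: not a delegation unit\n' "$unit"
    fi
}
```

For example, `report_delegate session-1.scope` on a machine where the property reads Delegate=no would report the scope as not a delegation unit.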
systemd-222-6.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-1e06faabb7
systemd-222-6.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with

$ su -c 'dnf --enablerepo=updates-testing update systemd'

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-1e06faabb7
systemd-222-6.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
Argh, so this is what it is. I arrived at the bug from reading F23 common bugs (contemplating upgrades and getting prepared), and realise this is why I occasionally felt that systemd was shutting down a bit more quickly than it should, *and* coming back up needing fsck and/or with various small things corrupted. FWIW I am seeing a pattern of systemd devs going for news-headline-worthy speed claims over anything else. "We shut down faster than anybody else!". A while back systemd was killing gdm if it took more than 45(?) seconds to start, risking putting older, slower systems in limbo, with no GUI and no console, killing gdm mid-boot; I just filed a bug a few hours ago on journald self-destructing, repeatedly corrupting the journal every 15 minutes when I had a big 'git gc' going... What's wrong with the idea of giving other pieces of software a reasonable amount of time to do their work, and patiently waiting a bit?
This isn't some kind of excessive optimization, it's more an unfortunate consequence of larger changes which aren't really anything to do with speed.
Hi all. Anyone know if an updated package to fix this is likely to get rolled out to fedora 22?
Hi there, I would also appreciate it if the fix were backported to FC22 as well. Currently all programs left open in my favorite desktop environment (KDE) are being closed after shutdown/reboot, which is very annoying. If this should be a separate issue, I apologize for the noise.
Lads. https://bugzilla.redhat.com/show_bug.cgi?id=1274537
I'm going to reopen this. I started losing history again on reboots. x86_64 VM running on ESXi.
Just adding that it started again with Fedora 24 (and 25) for me also, on real hardware (X1 laptop), so it's not specific to VM or ESXi.
Add Fedora 23 to the list. (Currently with systemd-222-16.fc23.x86_64)
Hello, I have a system shutdown script to shut down databases before poweroff/reboot. The script could not shut down the database in time because some database processes were killed before the database was shut down properly. The databases need to perform recovery every time the system starts up.
Just had this happen on an upgraded F25 VM. So, not fixed there either.
I think the cause is probably somewhat different at some point, but yes, this does appear to be happening again. All processes are also instakilled on *logout* from GNOME. Not sure about other desktops.
There really hasn't been any change to systemd particularly in F23 lately which looks like it should cause this. Can anyone provide any more precision on when they observed this behaviour changing? I've checked the latest F23, and it still has the reversion of the change that originally caused the bug, so it's definitely not the *same* issue again. F24 and F25 have different code because the 'unified hierarchy' thing landed there, but I don't know the ins and outs of that enough to know if that's a problem.
(In reply to Adam Williamson from comment #85) > There really hasn't been any change to systemd particularly in F23 lately > which looks like it should cause this. Can anyone provide any more precision > on when they observed this behaviour changing? I think it was about a week before I commented here, so somewhere around October 10. The last commands in root's history weren't what I remembered. I have upgraded this system to F25, but I have kept a Fedora 23 MATE VM, because I need scidavis. Should I run any tests in that one?
We accepted this as an F23 Final blocker, and several people are reporting it again on F25, so we really should consider it again for F25.
zbyszek, do you have an idea why this has started happening again?
For the record (as I may not be at the blocker bug meeting), I start as a +/-0 on this. I'm hoping we get more info by the time of / during the meeting.
Discussed at 2016-11-14 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-11-14/f25-blocker-review.2016-11-14-17.00.html . We agreed to delay the decision on this bug for further testing and input from developers. We may open a new bug and transfer the proposed blocker status there, as this does not seem to have the same root cause as the case from last year.
Sorry for not looking into this earlier. Too many bugs :( I think the following is happening: systemd will not wait for processes to terminate under certain conditions. In this case the relevant one is that Delegate=yes is set for the unit. Currently, gnome-terminal runs as a systemd --user unit, which means that from the POV of PID1, it is part of user@.service, which has Delegate=yes. So... we have the following conundrum: current code SIGKILLs graphical terminal sessions. If we simply always wait, systemd will "hang" (for 180s probably) in some cases during shutdown. Dunno, there's no nice solution here, but it's probably better to hang (if it happens rarely) than the current situation. I'll prep a patch to drop the Delegate=yes part of the condition for F25.
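The condition described above can be modelled roughly as follows. This is only a sketch of the single condition named in this comment (Delegate=yes means PID 1 does not wait), not systemd's actual code, which weighs more conditions. The function names user_manager_unit, wait_decision and check_user_manager are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: name of the per-user manager unit for a numeric UID.
user_manager_unit() {
    printf 'user@%s.service\n' "$1"
}

# Simplified model of the one condition from the comment above:
# given the value of a unit's Delegate property ("yes" or "no"),
# print whether PID 1 would wait for the unit's processes to exit.
# Assumption: every other condition systemd checks is ignored here.
wait_decision() {
    case "$1" in
        yes) echo "no-wait" ;;
        *)   echo "wait" ;;
    esac
}

# Query the real property for the current user's manager and feed it
# to the model (needs a running systemd).
check_user_manager() {
    unit=$(user_manager_unit "$(id -u)")
    value=$(systemctl show -p Delegate "$unit" | sed -n 's/^Delegate=//p')
    printf '%s: Delegate=%s -> %s\n' "$unit" "$value" "$(wait_decision "$value")"
}
```

Running `check_user_manager` inside a graphical session shows which branch the user manager (and so, per the explanation above, anything running under it such as gnome-terminal) would fall into under this simplified model.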
I've opened a new bug here since the long history could be kinda confusing in this one: https://bugzilla.redhat.com/show_bug.cgi?id=1394937 The original bug here clearly was fixed and remains so, so closing again.