Bug 2162708 - systemd-oomd kills my backup script (and other GNOME desktop things).
Summary: systemd-oomd kills my backup script (and other GNOME desktop things).
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-20 15:18 UTC by Anthony DeDominic
Modified: 2023-12-07 14:59 UTC
CC List: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-12-07 14:59:28 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
sadf -g -- -q PSI -s 08:10:00 > sa_PSI.svg (62.90 KB, image/svg+xml)
2023-01-20 15:18 UTC, Anthony DeDominic
logs of sys-backup service. (236.56 KB, text/plain)
2023-01-20 15:19 UTC, Anthony DeDominic
logs of systemd-oomd (78.94 KB, text/plain)
2023-01-20 15:19 UTC, Anthony DeDominic
backup script. (768 bytes, application/x-shellscript)
2023-01-20 15:20 UTC, Anthony DeDominic
backup service unit (174 bytes, text/x-systemd-unit)
2023-01-20 15:22 UTC, Anthony DeDominic

Description Anthony DeDominic 2023-01-20 15:18:53 UTC
Created attachment 1939400
sadf -g -- -q PSI -s 08:10:00 > sa_PSI.svg

Description of problem:
systemd-oomd occasionally kills my backup script when I send a full (non-incremental) snapshot to my backup drive. It also seems to kill unrelated GNOME desktop components; see the journalctl attachment.

Version-Release number of selected component (if applicable):
% oomctl --version
systemd 251 (251.10-588.fc37)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

How reproducible:
Intermittent; it reproduces more consistently when I send full snapshots to the backup drive while actively using my desktop.
I have not yet built a reliable reproducer, but I am working on it.

Steps to Reproduce:
1. Send a full snapshot between two disks, from either user.slice or system.slice
1.a btrfs send /big_snapshot | btrfs receive /mnt/some_other_disk
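For anyone scripting step 1.a, here is a minimal Python sketch of the same `send | receive` pipeline using only the standard library. The function name `run_pipeline` is hypothetical; it is not part of btrfs-progs or of the attached backup script. Its point is that it reports both exit codes, so a kill by systemd-oomd shows up as a negative return code (for example -9 for SIGKILL) instead of being hidden inside the shell pipeline.

```python
import subprocess
from typing import Sequence, Tuple


def run_pipeline(send_cmd: Sequence[str], recv_cmd: Sequence[str]) -> Tuple[int, int]:
    """Run `send_cmd | recv_cmd` and return (send_returncode, recv_returncode).

    A negative return code -N means the process was terminated by signal N,
    e.g. -9 for SIGKILL.
    """
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    # The receiver reads directly from the sender's stdout pipe.
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    # Close the parent's copy of the read end so the sender gets SIGPIPE
    # if the receiver exits early, just like in a shell pipeline.
    assert sender.stdout is not None
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc, recv_rc
```

For the reproduction above it would be called, with root privileges, as `run_pipeline(["btrfs", "send", "/big_snapshot"], ["btrfs", "receive", "/mnt/some_other_disk"])`.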

I hope to provide a better reproducer in the future.

Actual results:
systemd-oomd kills the systemd service I use for this. When the command is not placed into a separate systemd slice (via systemd-run) or run as a service, it kills the terminal that is running the command instead.
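The systemd-run workaround mentioned above could be scripted roughly as follows. This is a sketch under assumptions: `scoped_command` and the slice name `backup.slice` are hypothetical, while `--user`, `--scope` and `--slice=` are standard systemd-run options. It only changes where the command lands in the cgroup tree; whether systemd-oomd then leaves it alone depends on the ManagedOOM* settings of the enclosing units, which this sketch does not touch.

```python
from typing import List


def scoped_command(shell_cmd: str, slice_name: str, user: bool = True) -> List[str]:
    """Build a systemd-run argv that runs shell_cmd in its own transient scope.

    slice_name is a hypothetical slice such as "backup.slice"; this function
    does not set any ManagedOOM* properties on it.
    """
    argv = ["systemd-run"]
    if user:
        # Talk to the per-user service manager instead of the system manager.
        argv.append("--user")
    # --scope runs the command in the foreground inside a transient scope unit.
    argv += ["--scope", f"--slice={slice_name}", "--", "sh", "-c", shell_cmd]
    return argv
```

The returned list can be passed to `subprocess.run()`, or printed with `shlex.join()` for pasting into a terminal.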


Expected results:
Backups complete without being SIGKILLed.

Additional info:

Backup Drive [btrfs recv]: Model(ST8000DM004) P/N(2CX188) Serial(ZCT07N2M)
___
Connected via a Sabrent 3.5" hard drive USB UASP enclosure. It uses the UAS USB driver, not BOT (Bulk-Only Transport) mass storage.
This appears to be a "device-managed" SMR drive: searching for the serial number seems to confirm this, and the disk reports ATA TRIM support but no zoned namespaces.
Note that this disk is unpowered most of the time; powering it on immediately triggers my backups via a udev rule.

In terms of how this block device is configured on Linux:
USB UAS <-> dm-crypt (LUKS) <-> Btrfs

It was suggested to me in #fedora that the slow write speed of this drive may be causing the OOM condition. That is hard to say, but I have been using this backup system since before systemd-oomd (apparently) replaced the older facebook/oomd binary.
___
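For context on the #fedora suggestion: systemd-oomd acts on sustained memory pressure (PSI) and on swap usage rather than on free memory alone, which is how it can fire on a 64 GiB machine. The sketch below is a deliberately simplified model of a "pressure above a limit for a duration" rule, not systemd-oomd's actual implementation; the function name and the limit/duration values are illustrative assumptions, not Fedora's defaults.

```python
from typing import Iterable


def sustained_over_limit(
    samples: Iterable[float], limit: float, duration_s: int, interval_s: int = 1
) -> bool:
    """Return True if consecutive pressure samples (in percent) stay above
    `limit` for at least `duration_s` seconds.

    Simplified model only: one sample every `interval_s` seconds, and any
    sample at or below the limit resets the count.
    """
    needed = max(1, duration_s // interval_s)
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= needed:
            return True
    return False
```

For example, `sustained_over_limit(avg10_samples, limit=60.0, duration_s=30)` asks whether one-second samples stayed above 60% for 30 seconds straight; a single dip below the limit resets the count.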

Storage Drive [btrfs send]: ST3000VN007-2E4166_Z73030Z2
___
This is where the data is coming from.
ATA (AHCI) <-> dm-crypt (LUKS) <-> Btrfs
___


Attachments:
sa_PSI.svg:
A plot generated with: `sadf -g -- -q PSI -s 08:10:00 > sa_PSI.svg`
Unfortunately, the granularity is quite low; I wish it had 1-second ticks. The next time I attempt a backup, I will have sar write to a separate data file at 1-second intervals.

oomd_journalctl.txt
Output of systemd-oomd since I noticed my backup script getting killed. What I find interesting are the more recent entries: when I tried to replicate this, it killed seemingly random GNOME components in my user slice. I can confirm that, whatever else it kills, it also kills the terminal running the sudo btrfs send/recv commands I am using.

sys-backup_journalctl.txt
Log output for this service. PLEASE NOTE: some of the failures come from me tweaking my script after a backup once failed because the luks-backup-* UUID was already mapped, which confused me. I now believe this occurred because oomd killed my script, and I did not notice until very recently.

sys-backup.sh / sys-backup.service
How my backups work. If you want the udev rule, I can provide that as well.
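Regarding the wish for 1-second granularity in sa_PSI.svg: as an alternative to sar, the kernel's PSI counters can be sampled directly. This is a minimal Python sketch, assuming a kernel with PSI enabled so that `/proc/pressure/memory` exists; `parse_psi` and `sample_memory_pressure` are hypothetical helper names. Each line of that file has the form `some avg10=... avg60=... avg300=... total=...` (and likewise `full ...`), where the averages are percentages and `total` is cumulative stall time in microseconds.

```python
import time
from pathlib import Path
from typing import Dict, Iterator, Tuple


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse /proc/pressure/* content into {"some": {...}, "full": {...}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=1.50" or "total=123456".
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def sample_memory_pressure(
    seconds: int, path: str = "/proc/pressure/memory"
) -> Iterator[Tuple[float, float, float]]:
    """Yield (unix_time, some_avg10, full_avg10) roughly once per second."""
    for _ in range(seconds):
        psi = parse_psi(Path(path).read_text())
        yield time.time(), psi["some"]["avg10"], psi.get("full", {}).get("avg10", 0.0)
        time.sleep(1)
```

Running `for row in sample_memory_pressure(600): print(*row)` during a backup would log about ten minutes of one-second samples.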

Please ask if you need anything else. In the meantime, I will try to build a better reproducer, and I will likely turn systemd-oomd off for now.

Comment 1 Anthony DeDominic 2023-01-20 15:19:18 UTC
Created attachment 1939401
logs of sys-backup service.

Comment 2 Anthony DeDominic 2023-01-20 15:19:37 UTC
Created attachment 1939402
logs of systemd-oomd

Comment 3 Anthony DeDominic 2023-01-20 15:20:39 UTC
Created attachment 1939403
backup script.

Comment 4 Anthony DeDominic 2023-01-20 15:22:00 UTC
Created attachment 1939404
backup service unit

Comment 5 Anthony DeDominic 2023-01-20 15:34:38 UTC
Sorry, I forgot to mention other hardware details, if relevant:
64GiB memory
4GiB swap (on NVMe)

Comment 6 David C. Chipman 2023-01-21 16:09:02 UTC
I have noticed that OOMD kills my Xfce session when building the 6.1.7 kernel from source. I suspect we have the same problem, though I have only 8 GB of RAM.

Comment 7 Aoife Moloney 2023-11-23 01:01:22 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 8 Aoife Moloney 2023-12-07 14:59:28 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

