2277537 – frequent core dumps of smartctl at boot time

Summary: frequent core dumps of smartctl at boot time

Keywords:
Status:	CLOSED DUPLICATE of bug 2324086
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	smartmontools
Sub Component:
Version:	41
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Michal Hlavinka
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-04-27 20:18 UTC by M. Schlegel
Modified:	2024-11-05 23:52 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2024-11-05 23:52:40 UTC
Type:	---
Embargoed:
Dependent Products:

Description M. Schlegel 2024-04-27 20:18:01 UTC

I used 'coredumpctl list' to review the core dumps on my Fedora 40 system and found many repeated core dumps of smartctl from smartmontools-7.4-3.fc40.x86_64.

This is the latest 'coredumpctl info' listing:

           PID: 2600 (smartctl)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Sat 2024-04-27 13:08:59 EDT (2h 57min ago)
  Command Line: /usr/sbin/smartctl --all /dev/nvme0n1
    Executable: /usr/sbin/smartctl
 Control Group: /system.slice/system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice/dbus-:1.3-org.kde.kded.smart
          Unit: dbus-:1.3-org.kde.kded.smart
         Slice: system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice
       Boot ID: 2a7263aecd1a41eea19c7831a678d1b4
    Machine ID: 00ecfe8d976c4992b66770980e8d368a
      Hostname: msi
       Storage: /var/lib/systemd/coredump/core.smartctl.0.2a7263aecd1a41eea19c7831a678d1b4.2600.1714237739000000.zst (inaccessible)
       Message: Process 2600 (smartctl) of user 0 dumped core.
                
                Module libpcre2-8.so.0 from rpm pcre2-10.42-2.fc40.2.x86_64
                Module libselinux.so.1 from rpm libselinux-3.6-4.fc40.x86_64
                Stack trace of thread 2600:
                #0  0x00007f37905c8144 __pthread_kill_implementation (libc.so.6 + 0x98144)
                #1  0x00007f379057065e raise (libc.so.6 + 0x4065e)
                #2  0x00007f3790558902 abort (libc.so.6 + 0x28902)
                #3  0x00007f3790559767 __libc_message_impl.cold (libc.so.6 + 0x29767)
                #4  0x00007f37905d2175 malloc_printerr (libc.so.6 + 0xa2175)
                #5  0x00007f37905d450c _int_free (libc.so.6 + 0xa450c)
                #6  0x00007f37905d6dce free (libc.so.6 + 0xa6dce)
                #7  0x0000562a8d42a8f1 _ZN14drive_databaseD1Ev (smartctl + 0x658f1)
                #8  0x00007f3790572bb1 __run_exit_handlers (libc.so.6 + 0x42bb1)
                #9  0x00007f3790572c7e exit (libc.so.6 + 0x42c7e)
                #10 0x00007f379055a08f __libc_start_call_main (libc.so.6 + 0x2a08f)
                #11 0x00007f379055a14b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
                #12 0x0000562a8d3f6745 _start (smartctl + 0x31745)
                ELF object binary architecture: AMD x86-64

The drive info from 'inxi -aD' is:

inxi -aD
Drives:
  Local Storage: total: 953.87 GiB used: 71.86 GiB (7.5%)
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: A-Data model: SX6000LNP
    size: 953.87 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: 2K2020091093 fw-rev: V9002s45 temp: 29.9 C
    scheme: GPT
  SMART: yes health: PASSED on: 35d 16h cycles: 1,543
    read-units: 6,104,462 [3.12 TB] written-units: 10,949,869 [5.60 TB]



Reproducible: Sometimes

Steps to Reproduce:
1. Have the smartmontools package installed on the host
2. Boot the Fedora 40 host

Actual Results:  
1. smartctl crashes, leaving a core dump

Expected Results:  
1. smartctl should not crash

Crashes occur on normal boots. The one exception among the five most recent boots was the short boot performed by 'dnf5 offline reboot', listed as "-3" by 'journalctl --list-boots':
-3 5de9b0cf2f064d659eb349ea6df447f7 Fri 2024-04-26 13:12:38 EDT

# coredumpctl list |tail -5
Thu 2024-04-25 17:47:49 EDT  2441 1000 1000 SIGABRT present  /usr/bin/plasmashell                 27.0M
Fri 2024-04-26 12:24:18 EDT  2682    0    0 SIGABRT present  /usr/sbin/smartctl                  190.1K
Fri 2024-04-26 13:17:08 EDT  2606    0    0 SIGABRT present  /usr/sbin/smartctl                  190.3K
Fri 2024-04-26 17:27:35 EDT  2524    0    0 SIGABRT present  /usr/sbin/smartctl                  190.3K
Sat 2024-04-27 13:08:59 EDT  2600    0    0 SIGABRT present  /usr/sbin/smartctl                  190.5K

# journalctl --list-boots |tail -5
 -4 90a57b349c584bdfa15aa57c17ac2b33 Fri 2024-04-26 12:23:40 EDT Fri 2024-04-26 13:12:25 EDT
 -3 5de9b0cf2f064d659eb349ea6df447f7 Fri 2024-04-26 13:12:38 EDT Fri 2024-04-26 13:16:18 EDT
 -2 7123b1d0571349dea5df5f4093b06ea9 Fri 2024-04-26 13:16:31 EDT Fri 2024-04-26 14:58:10 EDT
 -1 6be890c9068c416abbd69d3925f87b45 Fri 2024-04-26 17:27:01 EDT Fri 2024-04-26 18:44:06 EDT
  0 2a7263aecd1a41eea19c7831a678d1b4 Sat 2024-04-27 13:08:20 EDT Sat 2024-04-27 16:08:03 EDT

Comment 1 M. Schlegel 2024-04-27 20:23:56 UTC

I suspect the short 'dnf offline reboot' boot that updated packages at Fri 2024-04-26 13:12:38 EDT shows no smartctl crash because it skips starting smartd.service: 'dnf5 offline reboot' does a minimal boot -> package upgrade -> full boot.
So the 13:17:08 crash belongs to the full boot "-2" that started at 2024-04-26 13:16:31 EDT.
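The reasoning above amounts to matching each smartctl crash time from 'coredumpctl list' against the boot windows from 'journalctl --list-boots'. A minimal Python sketch of that matching, assuming all timestamps are in the same local zone (EDT) and that a crash belongs to the boot whose first and last journal entries bracket it. The data is copied from the listings in the description; the names BOOTS, CRASHES, boot_for and crashes_by_boot are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Boot windows from 'journalctl --list-boots' (index -> first entry, last entry), EDT.
BOOTS = {
    -4: ("2024-04-26 12:23:40", "2024-04-26 13:12:25"),
    -3: ("2024-04-26 13:12:38", "2024-04-26 13:16:18"),
    -2: ("2024-04-26 13:16:31", "2024-04-26 14:58:10"),
    -1: ("2024-04-26 17:27:01", "2024-04-26 18:44:06"),
    0: ("2024-04-27 13:08:20", "2024-04-27 16:08:03"),
}

# smartctl crash timestamps from 'coredumpctl list', EDT.
CRASHES = [
    "2024-04-26 12:24:18",
    "2024-04-26 13:17:08",
    "2024-04-26 17:27:35",
    "2024-04-27 13:08:59",
]


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, FMT)


def boot_for(crash: str):
    """Return the boot index whose [first, last] window contains the crash, or None."""
    t = parse(crash)
    for idx, (start, end) in BOOTS.items():
        if parse(start) <= t <= parse(end):
            return idx
    return None


def crashes_by_boot() -> dict:
    """Map every boot index to the list of smartctl crash timestamps inside it."""
    result = {idx: [] for idx in BOOTS}
    for crash in CRASHES:
        idx = boot_for(crash)
        if idx is not None:
            result[idx].append(crash)
    return result


if __name__ == "__main__":
    for idx, crashes in sorted(crashes_by_boot().items()):
        print(f"{idx:>3}: {len(crashes)} smartctl crash(es) {crashes}")
```

Run as a script, it prints one line per boot: boot -3 (the 'dnf5 offline reboot' boot) has no crash, and the 13:17:08 crash falls in boot -2.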

Comment 2 M. Schlegel 2024-05-12 03:33:42 UTC

I ran the same command manually after boot completed, and no core dump occurs:

sudo /usr/sbin/smartctl --all /dev/nvme0n1

Comment 3 M. Schlegel 2024-06-27 15:51:19 UTC

These constant crashes are fairly annoying, so I've raised the severity to 'high', and I've temporarily turned off the 'smartd' service on the affected system.

Comment 4 M. Schlegel 2024-10-01 14:53:54 UTC

I've bumped this up to Fedora 41 and smartmontools-7.4-6.fc41.x86_64, since I've confirmed this also happens in the latest Fedora 41 pre-beta.

Comment 5 M. Schlegel 2024-10-01 14:56:18 UTC

The "coredumpctl info" for Fedora 41, kernel 6.11.0-63.fc41.x86_64 and smartmontools-7.4-6 shows:

           PID: 2338 (smartctl)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2024-10-01 10:28:21 EDT (27min ago)
  Command Line: /usr/sbin/smartctl --all /dev/nvme0n1
    Executable: /usr/sbin/smartctl
 Control Group: /system.slice/system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice/dbus-:1.3-org.kde.kded.smart
          Unit: dbus-:1.3-org.kde.kded.smart
         Slice: system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice
       Boot ID: c2d5998fcf6448889e7e4e2120e6e170
    Machine ID: 606b1acf646545ed8a19c9cf0245d31e
      Hostname: msi
       Storage: /var/lib/systemd/coredump/core.smartctl.0.c2d5998fcf6448889e7e4e2120e6e170.2338.1727792901000000.zst (inaccessible)
       Message: Process 2338 (smartctl) of user 0 dumped core.
                
                Module libpcre2-8.so.0 from rpm pcre2-10.44-1.fc41.1.x86_64
                Module libselinux.so.1 from rpm libselinux-3.7-5.fc41.x86_64
                Stack trace of thread 2338:
                #0  0x00007fc9dd651724 __pthread_kill_implementation (libc.so.6 + 0x72724)
                #1  0x00007fc9dd5f8d0e raise (libc.so.6 + 0x19d0e)
                #2  0x00007fc9dd5e0942 abort (libc.so.6 + 0x1942)
                #3  0x00007fc9dd5e17a7 __libc_message_impl.cold (libc.so.6 + 0x27a7)
                #4  0x00007fc9dd65b8a5 malloc_printerr (libc.so.6 + 0x7c8a5)
                #5  0x00007fc9dd65dcdc _int_free (libc.so.6 + 0x7ecdc)
                #6  0x00007fc9dd66060e free (libc.so.6 + 0x8160e)
                #7  0x00005594d11e5de1 _ZN14drive_databaseD1Ev (smartctl + 0x65de1)
                #8  0x00007fc9dd5fb461 __run_exit_handlers (libc.so.6 + 0x1c461)
                #9  0x00007fc9dd5fb52e exit (libc.so.6 + 0x1c52e)
                #10 0x00007fc9dd5e224f __libc_start_call_main (libc.so.6 + 0x324f)
                #11 0x00007fc9dd5e230b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x330b)
                #12 0x00005594d11b1755 _start (smartctl + 0x31755)
                ELF object binary architecture: AMD x86-64

Comment 6 Michal Hlavinka 2024-10-02 09:44:05 UTC

What are the exact steps to reproduce this? There is a smartmontools service called smartd.service, which runs /usr/sbin/smartd, but the report says /usr/sbin/smartctl, so something else must be starting it.

Comment 7 M. Schlegel 2024-10-02 12:27:11 UTC

1.  sudo systemctl enable smartd.service

is all I needed to reproduce it (i.e. having the smartd service enabled).

Comment 8 Michal Hlavinka 2024-10-14 21:30:35 UTC

The smartd service runs the smartd executable, not smartctl, so the crash can't originate from it.

Looking at the report, this line:
> Control Group: /system.slice/system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice/dbus-:1.3-org.kde.kded.smart

indicates that this has something to do with KDE. After some checking, the file
/usr/share/dbus-1/system-services/org.kde.kded.smart.service

comes from the plasma-disks package. My guess is that plasma-disks tries to do something that the SELinux policy does not allow; unfortunately, the core dump does not provide much information.
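The step from the Control Group line to the plasma-disks service file can be sketched in Python. This assumes the unit name follows the 'dbus-<unique name>-<bus name>' pattern seen in this report, that the unique name (':1.3' here) contains no '-', and that the service file is named after the bus name. The function names are hypothetical, and the sketch only computes the path; 'rpm -qf <path>' on the result shows which package owns it.

```python
from pathlib import PurePosixPath

# Control Group value copied from the 'coredumpctl info' output in this report.
CGROUP = r"/system.slice/system-dbus\x2d:1.3\x2dorg.kde.kded.smart.slice/dbus-:1.3-org.kde.kded.smart"

# Directory holding system-bus activatable service files.
SYSTEM_SERVICES = PurePosixPath("/usr/share/dbus-1/system-services")


def dbus_name_from_cgroup(cgroup: str) -> str:
    """Extract the bus name from a 'dbus-<unique name>-<bus name>' unit (assumed pattern)."""
    unit = cgroup.rstrip("/").rsplit("/", 1)[-1]  # last path component is the unit
    unit = unit.split("@", 1)[0]                  # drop an instance suffix such as '@0'
    unit = unit.removesuffix(".service")
    if not unit.startswith("dbus-"):
        raise ValueError(f"not a D-Bus activation unit: {unit!r}")
    # ':1.3-org.kde.kded.smart' -> (':1.3', 'org.kde.kded.smart')
    _unique_name, bus_name = unit[len("dbus-"):].split("-", 1)
    return bus_name


def service_file_for(cgroup: str) -> PurePosixPath:
    """Path of the service file, assuming it is named '<bus name>.service'."""
    return SYSTEM_SERVICES / f"{dbus_name_from_cgroup(cgroup)}.service"


if __name__ == "__main__":
    print(service_file_for(CGROUP))
```

The same function also accepts the unit name from the audit records in Comment 9 ('dbus-:1.3-org.kde.kded.smart@0').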

Is there any smartmontools related information in journalctl log?

You may also try booting with SELinux in permissive mode to see whether the crash still happens.

Comment 9 M. Schlegel 2024-10-14 21:51:31 UTC

>  Is there any smartmontools related information in journalctl log?

Sure:

journalctl -b 0 --no-pager -g smart

Oct 14 13:56:25 msi audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.kde.kded.smart@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 14 13:56:25 msi audit[2330]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=2330 comm="smartctl" exe="/usr/sbin/smartctl" sig=6 res=1
Oct 14 13:56:25 msi systemd-coredump[2346]: Process 2330 (smartctl) of user 0 dumped core.
                                            
                                            Module libpcre2-8.so.0 from rpm pcre2-10.44-1.fc41.1.x86_64
                                            Module libselinux.so.1 from rpm libselinux-3.7-5.fc41.x86_64
                                            Stack trace of thread 2330:
                                            #0  0x00007fa8b699a724 __pthread_kill_implementation (libc.so.6 + 0x72724)
                                            #1  0x00007fa8b6941d0e raise (libc.so.6 + 0x19d0e)
                                            #2  0x00007fa8b6929942 abort (libc.so.6 + 0x1942)
                                            #3  0x00007fa8b692a7a7 __libc_message_impl.cold (libc.so.6 + 0x27a7)
                                            #4  0x00007fa8b69a48a5 malloc_printerr (libc.so.6 + 0x7c8a5)
                                            #5  0x00007fa8b69a6cdc _int_free (libc.so.6 + 0x7ecdc)
                                            #6  0x00007fa8b69a960e free (libc.so.6 + 0x8160e)
                                            #7  0x000055b95189cde1 _ZN14drive_databaseD1Ev (smartctl + 0x65de1)
                                            #8  0x00007fa8b6944461 __run_exit_handlers (libc.so.6 + 0x1c461)
                                            #9  0x00007fa8b694452e exit (libc.so.6 + 0x1c52e)
                                            #10 0x00007fa8b692b24f __libc_start_call_main (libc.so.6 + 0x324f)
                                            #11 0x00007fa8b692b30b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x330b)
                                            #12 0x000055b951868755 _start (smartctl + 0x31755)
                                            ELF object binary architecture: AMD x86-64
Oct 14 13:56:26 msi abrt-notification[2704]: Process 2921 (smartctl) crashed in ??()
Oct 14 13:56:35 msi audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.kde.kded.smart@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
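The ANOM_ABEND audit record carries the same signal as the coredumpctl output (6, i.e. SIGABRT) plus the SELinux context the process ran in, which bears on the SELinux question in Comment 8. A minimal Python sketch that pulls those fields out, assuming the record has been trimmed to start at the record type and that its values are space-separated key=value pairs with optional double quotes; parse_audit_fields and summarize are hypothetical helpers.

```python
import shlex
import signal

# ANOM_ABEND record from the journal excerpt above, with the syslog prefix trimmed.
RECORD = ('ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 '
          'subj=system_u:system_r:unconfined_service_t:s0 pid=2330 '
          'comm="smartctl" exe="/usr/sbin/smartctl" sig=6 res=1')


def parse_audit_fields(record: str) -> dict:
    """Split an audit record into its type and key=value fields (quotes removed)."""
    tokens = shlex.split(record)
    fields = {"type": tokens[0]}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def summarize(record: str) -> str:
    """One-line summary: command, pid, signal name and SELinux type."""
    f = parse_audit_fields(record)
    sig_name = signal.Signals(int(f["sig"])).name  # 6 -> 'SIGABRT' on Linux
    selinux_type = f["subj"].split(":")[2]         # context is user:role:type:level
    return f'{f["comm"]} (pid {f["pid"]}) got {sig_name} in SELinux domain {selinux_type}'


if __name__ == "__main__":
    print(summarize(RECORD))
```

On Linux this prints "smartctl (pid 2330) got SIGABRT in SELinux domain unconfined_service_t".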

Comment 10 M. Schlegel 2024-11-05 23:17:39 UTC

I've stopped accumulating core dumps simply by removing the plasma-disks package, so you're right: it's not related to smartd.

Comment 11 M. Schlegel 2024-11-05 23:52:40 UTC

I will close this and open a new bug under the correct component, "plasma-disks": https://bugzilla.redhat.com/show_bug.cgi?id=2324086

*** This bug has been marked as a duplicate of bug 2324086 ***
