1900382 – rpm-ostree crashes on armv7hl during install transaction

Keywords:
Status:	CLOSED DUPLICATE of bug 1906184
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm-ostree
Sub Component:
Version:	33
Hardware:	armv7hl
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Colin Walters
QA Contact:	Fedora Extras Quality Assurance

Reported:	2020-11-22 17:25 UTC by jerome
Modified:	2020-12-09 21:15 UTC
CC List:	11 users
Last Closed:	2020-12-09 21:15:54 UTC
Type:	Bug

Description jerome 2020-11-22 17:25:43 UTC

Description of problem:

[root@localhost ~]# rpm-ostree install cockpit cockpit-podman cockpit-storaged cockpit-dashboard cockpit-ostree
Checking out tree 5947c35... done
Enabled rpm-md repositories: fedora updates fedora-cisco-openh264
rpm-md repo 'fedora' (cached); generated: 2020-10-19T23:26:56Z
rpm-md repo 'updates' (cached); generated: 2020-11-22T00:51:17Z
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2020-08-25T19:10:34Z
Importing rpm-md... done
Resolving dependencies... done
Checking out packages... done
Running pre scripts... done
error: Bus owner changed, aborting. This likely means the daemon crashed; check logs with `journalctl -xe`.
[root@localhost ~]# coredumpctl info
           PID: 3513 (rpm-ostree)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Sun 2020-11-22 17:07:13 UTC (43s ago)
  Command Line: /usr/bin/rpm-ostree start-daemon
    Executable: /usr/bin/rpm-ostree
 Control Group: /system.slice/rpm-ostreed.service
          Unit: rpm-ostreed.service
         Slice: system.slice
       Boot ID: c349b1eb7e1246b3b237981757aee70e
    Machine ID: 4d1fdddf42234ef6a3f89e72dfda9354
      Hostname: localhost.localdomain
       Storage: /var/lib/systemd/coredump/core.rpm-ostree.0.c349b1eb7e1246b3b237981757aee70e.3513.1606064833000000.zst
       Message: Process 3513 (rpm-ostree) of user 0 dumped core.
                
                Stack trace of thread 4160:
                #0  0x00000000b62f3d30 strchrnul (libc.so.6 + 0x80d30)


Version-Release number of selected component (if applicable):
Fedora 33.20201119.0 (IoT Edition)

coredumpctl available here : https://easyupload.io/3vfksw (available for 30 days)

Comment 1 Colin Walters 2020-11-22 18:20:34 UTC

What's the output of `rpm -q rpm-ostree`?

This is a likely dup of https://bugzilla.redhat.com/show_bug.cgi?id=1890577

Comment 2 jerome 2020-11-22 18:24:47 UTC

[root@localhost ~]# rpm -q rpm-ostree
warning: Found bdb Packages database while attempting sqlite backend: using bdb backend.
rpm-ostree-2020.8-1.fc33.armv7hl

Comment 3 Luca BRUNO 2020-11-23 09:00:00 UTC

Thanks for the report. It looks like this may be a different issue from what has been fixed in 2020.8.

Jérôme, can you please follow https://fedoraproject.org/wiki/StackTraces and report back the output of `thread apply all bt full` from a gdb session using the coredump above?
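
For anyone following along, the request above can be sketched roughly as follows. This assumes gdb is installed on the affected host together with debuginfo matching the installed rpm-ostree and glibc. The helper name `collect_backtrace` and the output paths are illustrative only and are not part of any tool.

```shell
#!/bin/sh
# Sketch: export a stored core dump and print every thread's backtrace.
# collect_backtrace is a hypothetical helper name; the paths are placeholders.
collect_backtrace() {
    pid=$1                                  # PID shown by `coredumpctl info`, e.g. 3513
    core=${2:-/var/tmp/rpm-ostree.core}     # where to write the extracted core

    # Write the core for that PID to a plain file.
    coredumpctl dump "$pid" --output="$core" || return 1

    # Run gdb non-interactively against the daemon binary and the core.
    gdb --batch -ex 'thread apply all bt full' /usr/bin/rpm-ostree "$core"
}

# Example: collect_backtrace 3513 > /var/tmp/rpm-ostree-bt.txt
```

The function is only defined here, not called; the example invocation in the last comment uses the PID from the `coredumpctl info` output in the description.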

Comment 4 james 2020-12-01 12:04:57 UTC

I have the same issue when attempting to install fail2ban on a Raspberry Pi 2B (armv7).
The output of `rpm -q rpm-ostree` is the same as above, as is the output of `coredumpctl info`.
I checked `journalctl -xe` as advised, and systemd-coredump reports that process xxx (rpm-ostree) of user 0 dumped core:
Stack trace of thread 2382:
#0 0x000000000b62b5d30 strchrnul (libc.so.6 + 0x80d30)

I am unable to install gdb, as rpm-ostree fails with the same error and stack trace.

Comment 5 Jonathan Lebon 2020-12-01 20:12:40 UTC

Can you install gdb and the same rpm-ostree version + debuginfo in a privileged container with bind mounts so that you can transfer the core dump and try there?
Otherwise, you can also use `rpm-ostree usroverlay` and then install `gdb-minimal` directly by RPM. (I think there's one other dep you'd also need to fetch by hand.)
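
The two routes suggested above might look something like this. It is a sketch under several assumptions: the image name, the `/cores` mount point and all three function names are made up for illustration. It also assumes the container image matches the host architecture and that the rpm-ostree build installed in the container is the same one that crashed (2020.8-1.fc33 in this report).

```shell
#!/bin/sh
# Route 1a (on the host): start a privileged container that can see the
# host's stored core dumps. Image name and mount point are assumptions.
start_debug_container() {
    podman run --rm -it --privileged \
        -v /var/lib/systemd/coredump:/cores:ro \
        registry.fedoraproject.org/fedora:33 /bin/bash
}

# Route 1b (inside the container): install tooling, unpack the core, and
# print backtraces. $1 is the path to a core.*.zst file under /cores.
debug_core_in_container() {
    core_zst=$1
    dnf install -y gdb zstd dnf-plugins-core rpm-ostree || return 1
    # Debuginfo must match the binaries that produced the core.
    dnf debuginfo-install -y rpm-ostree glibc || return 1
    zstd -d "$core_zst" -o /tmp/rpm-ostree.core || return 1
    gdb --batch -ex 'thread apply all bt full' \
        /usr/bin/rpm-ostree /tmp/rpm-ostree.core
}

# Route 2 (on the host): make /usr transiently writable, then install RPM
# files that were downloaded by hand beforehand.
overlay_install() {
    rpm-ostree usroverlay || return 1   # overlay is discarded on reboot
    rpm -Uvh "$@"
}
```

Nothing runs until one of the functions is called, for example `overlay_install ./gdb-minimal-*.rpm` after fetching the package files.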

Comment 6 Piotr Rogowski 2020-12-02 14:43:04 UTC

I'm experiencing the exact same issue (0x00000000b62a0d60 strchrnul (libc.so.6 + 0x80d60))

I set up gdb in a container, and here is the backtrace it generated:

Core was generated by `/usr/bin/rpm-ostree start-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0xb6350d30 in __argz_create_sep (string=<optimized out>, delim=2, argz=0xb38fe008, len=0xb6421e18) at argz-ctsep.c:47
#1  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame
#0  0xb6350d30 in __argz_create_sep (string=<optimized out>, delim=2, argz=0xb38fe008, len=0xb6421e18) at argz-ctsep.c:47
47                    --nlen;
(gdb) x/50x $sp
0xb38fdb00:     0xb38fdf42      0x00000001      0x00000001      0xb64ed754
0xb38fdb10:     0xb38fdf52      0x00000001      0xb38fdb3c      0xb6504cec
0xb38fdb20:     0x00000000      0x00000000      0x00000000      0x00000000
0xb38fdb30:     0x00000000      0x00000003      0x000005e8      0xb6422c78
0xb38fdb40:     0x00000000      0x00000000      0x00000020      0x00000000
0xb38fdb50:     0x00000000      0x00000000      0x00000000      0xb6421e18
0xb38fdb60:     0x00000000      0x00000002      0x0052e82d      0x00001034
0xb38fdb70:     0xb38ff580      0x00000000      0xffffffff      0x00000000
0xb38fdb80:     0xb640403c      0x00000000      0x00000000      0xb6327d90
0xb38fdb90:     0x00000011      0x0052e83e      0xb38fe39c      0xb6543e45
0xb38fdba0:     0xb642320c      0x00000001      0xb38fe1c4      0xb6328108
0xb38fdbb0:     0x00000000      0x00000000      0x00000000      0x00000000
0xb38fdbc0:     0x00000000      0x00000000
(gdb) info locals
rp = 0xfbad8000 <error: Cannot access memory at address 0xfbad8000>
wp = <optimized out>
nlen = 3012551580


It looks like a buffer overflow is corrupting the stack.
In my case it is triggered by installing k3s-selinux: https://rpm.rancher.io/k3s/stable/common/centos/7/noarch/k3s-selinux-0.2-1.el7_8.noarch.rpm
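
One quick sanity check on the locals above is to view `nlen` in hex. This is a minimal sketch using only shell arithmetic on values copied from the gdb session; the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the gdb output above; variable names are illustrative.
nlen=3012551580          # `nlen` from `info locals`
argz=$((0xb38fe008))     # `argz` argument of frame #0
sp=$((0xb38fdb00))       # first address shown by `x/50x $sp`

nlen_hex=$(printf '0x%x' "$nlen")
echo "nlen = $nlen_hex"

# Compare 4 KiB page numbers: a "length" on the same page as a pointer
# argument near $sp, and above $sp, is far more likely an address than a
# string length.
if [ $((nlen >> 12)) -eq $((argz >> 12)) ] && [ "$nlen" -gt "$sp" ]; then
    verdict="nlen looks like a stack address"
else
    verdict="nlen does not match the sampled stack range"
fi
echo "$verdict"
```

Here `nlen` prints as 0xb38fe39c, a value that also appears verbatim in the `x/50x $sp` dump. That fits the locals being garbage, whether from a corrupted stack or from debuginfo that does not match the crashed glibc.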

Comment 7 Piotr Rogowski 2020-12-02 15:23:37 UTC

I just noticed the request for this in a previous comment:

(gdb) thread apply all bt full

Thread 4 (Thread 0xb4cfd040 (LWP 847)):
#0  0xb63a54e4 in internal_fallocate64 (len=-5462859712275939329, offset=4294967295, fd=1) at ../sysdeps/posix/posix_fallocate64.c:36
        st = {st_dev = 2, __pad1 = 0, __st_ino = 2147483647, st_mode = 4294967295, st_nlink = 3670904576, st_uid = 11806224, st_gid = 3059451596, st_rdev = 754823547118, __pad2 = 0, st_size = -5310270675074856684, st_blksize = -1261450496, st_blocks = -5310232845147732544, st_atim = {tv_sec = 11814408, tv_nsec = 11785064}, st_mtim = {tv_sec = -1090696978, tv_nsec = -1236178984}, st_ctim = {tv_sec = 0, tv_nsec = 0}, st_ino = 13028863522405089280}
        increment = <optimized out>
#1  __GI___posix_fallocate64_l64 (fd=1, offset=<optimized out>, len=0) at ../sysdeps/unix/sysv/linux/posix_fallocate64.c:37
        res = -1271921144
#2  0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xb42ff040 (LWP 848)):
#0  0xb63a54e4 in internal_fallocate64 (len=52713615387525119, offset=4294967295, fd=1) at ../sysdeps/posix/posix_fallocate64.c:36
        st = {st_dev = 3, __pad1 = 11865296, __st_ino = 2147483647, st_mode = 4294967295, st_nlink = 3670904576, st_uid = 11865456, st_gid = 11865460, st_rdev = 50961783036135942, __pad2 = 0, st_size = -5310224031730021844, st_blksize = 11865296, st_blocks = 50743217547978744, st_atim = {tv_sec = 11865272, tv_nsec = -1090696698}, st_mtim = {tv_sec = 175, tv_nsec = -1233994756}, st_ctim = {tv_sec = 11814576, tv_nsec = 11872248}, st_ino = 13137395768631314950}
        increment = <optimized out>
#1  __GI___posix_fallocate64_l64 (fd=1, offset=<optimized out>, len=0) at ../sysdeps/unix/sysv/linux/posix_fallocate64.c:37
        res = 12273344
#2  0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xb4cfe020 (LWP 846)):
#0  0xb63a54e4 in internal_fallocate64 (len=-5507796649423929345, offset=4294967295, fd=1) at ../sysdeps/posix/posix_fallocate64.c:36
        st = {st_dev = 23369724511387651, __pad1 = 0, __st_ino = 2147483647, st_mode = 4294967295, st_nlink = 3670904576, st_uid = 11818592, st_gid = 1, st_rdev = 50894194226495489, __pad2 = 32, st_size = -5310270678279127040, st_blksize = 1, st_blocks = 22543235376498408, st_atim = {tv_sec = 11785272, tv_nsec = 4994756}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = -1236347052}, st_ino = 140643224740}
        increment = <optimized out>
#1  __GI___posix_fallocate64_l64 (fd=1, offset=<optimized out>, len=-4684501980372918760) at ../sysdeps/unix/sysv/linux/posix_fallocate64.c:37
        res = -1282383840
#2  0xb62eb1dc in __libc_start_main (main=0xbefd4e44, argc=-1237180904, argv=0xb62eb1dc <__libc_start_main+344>, init=<optimized out>, fini=0x5234c8 <__libc_csu_fini>, rtld_fini=0xb6fc909c <_dl_fini>, stack_end=0xbefd4e44) at libc-start.c:320
        __p = <optimized out>
        ptr = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1725690856, -1846021788, 5387368, 0, 4890756, 0, 0, 0, 5582608, 0 <repeats 17 times>, 119, 1, -1090695600, 35032, 520, 11802408, -1261443744, -1237178980, -1090695600, 0, 0, -1238065408, 0, -1237179604, 11780216, -1227823300, -1237180904, -1238349752, -1237097072, -1237174488, -1227839572, -1238349576, -1227823392, -1227823328, -1227823392, -1227823296, -1244734276, -1237174488, -1227839572, -1238349576, -1090695600, 0, -1225571436, -1226043696, -1225608956, -1226043696, 2, -1224962476}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0xb6febfb4, 0xbefd4e50}, data = {prev = 0x0, cleanup = 0x0, canceltype = -1224818764}}}
        not_first_call = -1282383840
#3  0x004aa0c8 in _start ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb38ff040 (LWP 2160)):
#0  0xb6350d30 in __argz_create_sep (string=<optimized out>, delim=2, argz=0xb38fe008, len=0xb6421e18) at argz-ctsep.c:47
        rp = 0xfbad8000 <error: Cannot access memory at address 0xfbad8000>
        wp = <optimized out>
        nlen = 3012551580
#1  0x00000000 in ?? ()
No symbol table info available.

Comment 8 james 2020-12-09 17:52:57 UTC

Hi, I may have logged in with different credentials.  I am james, who tried to install fail2ban.

I installed gdb-headless using rpm-ostree without problems; it is a dependency of gdb.
I then used `rpm-ostree usroverlay` and installed gdb after fetching the package with wget.
I ran `rpm-ostree install fail2ban`, and it crapped out as usual with the complaint that the bus owner had changed.
I then ran gdb and used the `bt` command, but there was no backtrace.
I su'd to root, ran gdb with `file` set to rpm-ostree, and ran `run install fail2ban`. The command crapped out as usual, and when I then ran `bt` to obtain a backtrace I was informed: no stack.
I am inexperienced with gdb. I wonder if I perhaps need to install debug symbols?
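
For context, the `coredumpctl info` output in the description shows the crashing process as `/usr/bin/rpm-ostree start-daemon` (unit rpm-ostreed.service), not the client command. That may be why gdb reports no stack after running the client. A sketch of attaching to the daemon instead, assuming gdb and debuginfo are available on the host; `attach_to_daemon` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: attach gdb to the rpm-ostree daemon rather than the client command.
# attach_to_daemon is a hypothetical helper name.
attach_to_daemon() {
    # Make sure the daemon is running (it normally starts on demand).
    systemctl start rpm-ostreed.service || return 1

    # Ask systemd for the daemon's main PID.
    pid=$(systemctl show -p MainPID --value rpm-ostreed.service)
    [ -n "$pid" ] && [ "$pid" != "0" ] || return 1

    # Attach, let the daemon continue, and dump all threads once it stops
    # on a signal such as SIGSEGV.
    gdb -p "$pid" -ex 'continue' -ex 'thread apply all bt full'
}

# In a second terminal, trigger the crash: rpm-ostree install fail2ban
```

The function needs root and leaves gdb at its interactive prompt after printing the backtraces.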

I have also opened a bug report for fedora-iot - https://pagure.io/fedora-iot/issue/38

Comment 9 Jonathan Lebon 2020-12-09 21:15:54 UTC

Paul got a good backtrace in 1906184 for this strchrnul fault. Closing as dupe of that one.

*** This bug has been marked as a duplicate of bug 1906184 ***
