Bug 1804993

Summary: abrt-dump-journal-xorg: Cannot read journal data
Product: [Fedora] Fedora Reporter: Chris Murphy <bugzilla>
Component: abrtAssignee: ekulik
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 32CC: abrt-devel-list, bugzilla, ekulik, fzatlouk, jakub, jmilan, mhabrnal, michal.toman, mkutlak, mmarusak, robatino
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-28 07:17:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1705305    
Attachments:
Description Flags
journal none

Description Chris Murphy 2020-02-20 03:35:57 UTC
Description of problem:

abrt-xorg.service fails to start


Version-Release number of selected component (if applicable):
$ rpm -qa | grep abrt | sort -n
abrt-2.14.0-2.fc32.x86_64
abrt-addon-ccpp-2.14.0-2.fc32.x86_64
abrt-addon-kerneloops-2.14.0-2.fc32.x86_64
abrt-addon-pstoreoops-2.14.0-2.fc32.x86_64
abrt-addon-vmcore-2.14.0-2.fc32.x86_64
abrt-addon-xorg-2.14.0-2.fc32.x86_64
abrt-cli-2.14.0-2.fc32.x86_64
abrt-dbus-2.14.0-2.fc32.x86_64
abrt-desktop-2.14.0-2.fc32.x86_64
abrt-gui-2.14.0-2.fc32.x86_64
abrt-gui-libs-2.14.0-2.fc32.x86_64
abrt-java-connector-1.1.4-1.fc32.x86_64
abrt-libs-2.14.0-2.fc32.x86_64
abrt-plugin-bodhi-2.14.0-2.fc32.x86_64
abrt-retrace-client-2.14.0-2.fc32.x86_64
abrt-tui-2.14.0-2.fc32.x86_64
gnome-abrt-1.3.1-1.fc32.x86_64
python3-abrt-2.14.0-2.fc32.x86_64
python3-abrt-addon-2.14.0-2.fc32.noarch


How reproducible:
Always


Steps to Reproduce:
1.
2.
3.

Actual results:

[chris@vm ~]$ systemctl --failed
  UNIT              LOAD   ACTIVE SUB    DESCRIPTION          
● abrt-xorg.service loaded failed failed ABRT Xorg log watcher

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
[chris@vm ~]$ systemctl status abrt-xorg.service
● abrt-xorg.service - ABRT Xorg log watcher
     Loaded: loaded (/usr/lib/systemd/system/abrt-xorg.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2020-02-19 20:24:35 MST; 2min 0s ago
    Process: 818 ExecStart=/usr/bin/abrt-dump-journal-xorg -fxtD (code=exited, status=1/FAILURE)
   Main PID: 818 (code=exited, status=1/FAILURE)
        CPU: 33ms

Feb 19 20:24:35 vm.local systemd[1]: Started ABRT Xorg log watcher.
Feb 19 20:24:35 vm.local abrt-dump-journal-xorg[818]: abrt-dump-journal-xorg: Cannot read journal data.
Feb 19 20:24:35 vm.local systemd[1]: abrt-xorg.service: Main process exited, code=exited, status=1/FAILURE
Feb 19 20:24:35 vm.local systemd[1]: abrt-xorg.service: Failed with result 'exit-code'.
[chris@vm ~]$ 



Expected results:

Service units enabled by default should not fail.


Additional info:

enforcing=0 so selinux isn't related

Comment 1 Fedora Blocker Bugs Application 2020-02-20 03:37:11 UTC
Proposed as a Blocker for 32-final by Fedora user chrismurphy using the blocker tracking app because:

 Final: All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.

Comment 2 Chris Murphy 2020-02-20 03:42:17 UTC
Created attachment 1664246 [details]
journal

Comment 3 František Zatloukal 2020-02-24 20:37:36 UTC
Discussed during the 2020-02-24 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made:

"All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2020-02-24/f32-blocker-review.2020-02-24-17.00.log.txt

Comment 4 ekulik 2020-02-25 07:55:48 UTC
I’ll take a look.

Comment 5 ekulik 2020-02-25 08:38:27 UTC
I am unable to reproduce the issue on Fedora-Workstation-Live-x86_64-32-20200224.n.0.

Comment 6 ekulik 2020-02-25 08:52:03 UTC
And the service is running after installation without me touching anything. I’m going to close this unless you can give me a clear reproducer.

Comment 7 ekulik 2020-02-25 08:58:24 UTC
(In reply to Chris Murphy from comment #2)
> Created attachment 1664246 [details]
> journal

This is full of noise.

Comment 8 ekulik 2020-02-25 09:31:16 UTC
Please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1433652#c13. Could be an issue with your storage configuraiton.

Comment 9 ekulik 2020-02-28 07:17:27 UTC
I’ll add a better error message to suggest verifying the state of the journal, but this cannot be blocked on without information from you.

Comment 10 Chris Murphy 2020-02-28 17:34:06 UTC
I can't reproduce with Fedora-Workstation-Live-x86_64-32-20200226.n.0.iso on baremetal. I did get two reproducers in VMs over a week ago. Clean install, reboot, and the unit fails.

re: bug 1433652#c13, I don't have a corrupt journal in this case, but if there were a corrupt journal I think abrt-xorg.service needs to be able to handle it gracefully, rather than the service unit failing. Journal corrupt is less common these days but it's still an ordinary event. Systemd devs have said it's not a big deal, the file can still be read, just the corrupt records are skipped over, and the journal is rotated out anyway.

If I find a more consistent reproducer, I'll reopen.

Comment 11 ekulik 2020-03-05 16:01:03 UTC
I can suggest adding -v, -vv or -vvv to the service unit and seeing what kind of error libsystemd is returning (genius idea to print the error at a higher log level, so it doesn’t normally appear), because it seems to be happening on that side exclusively. We only try to get some data from it and, if it returns an error, then it returns an error and we can only bail out. If we can narrow down the error, maybe we can add some fail-safes and not explode, but I do not have a reliable reproducer.

Comment 12 Chris Murphy 2020-03-05 19:13:02 UTC
Merely having a system and/or user journal corrupt, doesn't trigger this...


$ sudo journalctl --verify
[sudo] password for chris: 
Sorry, try again.
[sudo] password for chris: 
PASS: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/system~                             
4baf58: Invalid entry item (7/32 offset: 000000                                                                                       
4baf58: Invalid object contents: Bad message                                                                                          
File corruption detected at /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/user-1000~:4baf58 (of 8388608 bytes, 59%).
FAIL: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/user-1000~ (Bad message)
2f76f00: Invalid entry item (12/27 offset: 000000                                                                                     
2f76f00: Invalid object contents: Bad message                                                                                         
File corruption detected at /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/system~:2f76f00 (of 50331648 bytes, 98%).
FAIL: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/system~ (Bad message)
PASS: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/system.journal                                                                
PASS: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/user-1000~                          
PASS: /var/log/journal/26336922e1044e80ae4bd42e1d6b9099/user-1000.journal                                                             
$ systemctl status abrt-xorg.service
● abrt-xorg.service - ABRT Xorg log watcher
     Loaded: loaded (/usr/lib/systemd/system/abrt-xorg.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-03-05 12:08:52 MST; 1min 20s ago
   Main PID: 764 (abrt-dump-journ)
      Tasks: 1 (limit: 3459)
     Memory: 19.5M
        CPU: 36ms
     CGroup: /system.slice/abrt-xorg.service
             └─764 /usr/bin/abrt-dump-journal-xorg -fxtD

Mar 05 12:08:52 localhost.localdomain systemd[1]: Started ABRT Xorg log watcher.
$ 


It's fairly trivial to corrupt a systemd journal, just by forcing power off on a VM. Such sabotage probably makes any discovered bug not a blocker, but still a valid way of checking for proper error handling. Yet I still can't trigger it.

Comment 13 ekulik 2020-03-06 07:19:43 UTC
Okay, so I’ll just go through the code path and bump the log level for error messages, so we can get more information the next time this happens.