Bug 2178836
| Summary: | rasdaemon spewing diskerror_eventstore messages | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Andrew Schorr <ajschorr> |
| Component: | rasdaemon | Assignee: | Aristeu Rozanski <arozansk> |
| Status: | CLOSED MIGRATED | QA Contact: | Jiri Dluhos <jdluhos> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | CentOS Stream | CC: | bstinson, jwboyer |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-25 17:25:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Andrew Schorr
2023-03-15 20:32:31 UTC
rasdaemon is the userspace for HERM, which includes but is not limited to memory errors. I'll take a look at why it's generating stdout/stderr messages, and also see how to improve the recorded events. Thanks.

I'm happy if rasdaemon tells me about additional system errors beyond memory errors. But are these actually errors? The messages are inscrutable, and I'm not aware of any actual hardware problems with these drives. But if rasdaemon is trying to tell me about an actual problem, then I'd certainly like to understand what the issue is.

Some more examples from another system:

    bash-5.1$ ras-mc-ctl --errors | tail
    9123 2023-06-01 16:44:15 -0400 error: dev=0:2080, sector=578224128, nr_sector=32, error='unknown block error', rwbs='R', cmd='',
    9124 2023-06-01 16:44:15 -0400 error: dev=0:2080, sector=578224160, nr_sector=64, error='unknown block error', rwbs='R', cmd='',
    9125 2023-06-01 16:44:15 -0400 error: dev=0:2080, sector=578225016, nr_sector=160, error='unknown block error', rwbs='R', cmd='',
    9126 2023-06-01 16:44:15 -0400 error: dev=0:2080, sector=562617792, nr_sector=32, error='unknown block error', rwbs='R', cmd='',
    9127 2023-06-01 16:51:08 -0400 error: dev=0:0, sector=-1, nr_sector=8, error='operation not supported error', rwbs='N', cmd='',
    9128 2023-06-01 16:51:12 -0400 error: dev=0:0, sector=-1, nr_sector=8, error='critical target error', rwbs='N', cmd='',
    9129 2023-06-01 16:51:20 -0400 error: dev=0:0, sector=-1, nr_sector=8, error='operation not supported error', rwbs='N', cmd='',
    No MCE errors.
    bash-5.1$

How can I make sense of these error messages? Are these real or spurious problems? I don't know how to interpret them.

Thanks,
Andy

I just upgraded another system, and rasdaemon is spewing incessant error messages. I simply have no idea what they mean. Is there any way to find out what it's trying to tell me?
In the journal, I see errors like this:

    Aug 25 16:13:13 ti11 rasdaemon[22141]: rasdaemon: diskerror_eventstore: 0x55bb8bec8eb8
    Aug 25 16:13:13 ti11 rasdaemon[22141]: rasdaemon: register inserted at db
    Aug 25 16:13:13 ti11 rasdaemon[22141]: <...>-660 [002] 0.009848: block_rq_complete: 2023-08-25 16:13:13 -0400
    Aug 25 16:13:13 ti11 rasdaemon[22141]: rasdaemon: diskerror_eventstore: 0x55bb8bec8eb8
    Aug 25 16:13:13 ti11 rasdaemon[22141]: rasdaemon: register inserted at db
    Aug 25 16:13:13 ti11 rasdaemon[22141]: <...>-677 [003] 0.009848: block_rq_complete: 2023-08-25 16:13:13 -0400
    Aug 25 16:13:15 ti11 rasdaemon[22141]: rasdaemon: diskerror_eventstore: 0x55bb8bec8eb8
    Aug 25 16:13:15 ti11 rasdaemon[22141]: rasdaemon: register inserted at db
    Aug 25 16:13:15 ti11 rasdaemon[22141]: <...>-660 [002] 0.009848: block_rq_complete: 2023-08-25 16:13:15 -0400
    Aug 25 16:13:15 ti11 rasdaemon[22141]: rasdaemon: diskerror_eventstore: 0x55bb8bec8eb8
    Aug 25 16:13:15 ti11 rasdaemon[22141]: rasdaemon: register inserted at db
    Aug 25 16:13:15 ti11 rasdaemon[22141]: <...>-677 [003] 0.009848: block_rq_complete: 2023-08-25 16:13:15 -0400

And "ras-mc-ctl --errors" says this:

    10427 2023-08-25 16:13:13 -0400 error: dev=0:2096, sector=-1, nr_sector=0, error='I/O error', rwbs='N', cmd='',
    10428 2023-08-25 16:13:13 -0400 error: dev=0:2112, sector=-1, nr_sector=0, error='I/O error', rwbs='N', cmd='',
    10429 2023-08-25 16:13:15 -0400 error: dev=0:2096, sector=-1, nr_sector=0, error='I/O error', rwbs='N', cmd='',
    10430 2023-08-25 16:13:15 -0400 error: dev=0:2112, sector=-1, nr_sector=0, error='I/O error', rwbs='N', cmd='',

What does it mean? I'm forced to stop the rasdaemon service to avoid being buried in messages. Is there an actual problem here? How do I decode these messages?

Thanks,
Andy

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.
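A minimal Python sketch for digesting the `ras-mc-ctl --errors` entries quoted in this report follows. It rests on two assumptions that nothing in this bug confirms. First, that every disk entry has exactly the line shape quoted above. Second, a hypothesis about the odd `dev=0:NNNN` pairs: that the value came from the kernel's `block_rq_complete` tracepoint as an in-kernel `dev_t` (major number above bit 20, minor number in the low 20 bits, per `MINORBITS` in `include/linux/kdev_t.h`) and was then split using glibc's userspace `major()`/`minor()` layout, producing a bogus major of 0. The helper names (`glibc_makedev`, `kernel_split`, `reinterpret`, `summarize`) are hypothetical and are not part of rasdaemon. For the `rwbs` column, as far as I can tell from the kernel's `blk_fill_rwbs()`, `R`, `W`, `D` and `F` mark reads, writes, discards and flushes, while `N` covers any other operation, such as driver-private passthrough commands.

```python
import re
from collections import Counter

# Assumed line shape, copied from the ras-mc-ctl --errors output in this report:
# 10427 2023-08-25 16:13:13 -0400 error: dev=0:2096, sector=-1, nr_sector=0, error='I/O error', rwbs='N', cmd='',
LINE_RE = re.compile(
    r"dev=(?P<major>\d+):(?P<minor>\d+), sector=-?\d+, nr_sector=\d+, "
    r"error='(?P<error>[^']*)', rwbs='(?P<rwbs>[^']*)'"
)

MINORBITS = 20  # in-kernel dev_t layout: MAJOR << 20 | MINOR (include/linux/kdev_t.h)


def glibc_makedev(major: int, minor: int) -> int:
    """Rebuild the value that glibc's major()/minor() would split into this pair."""
    return ((minor & 0xFF) | ((major & 0xFFF) << 8)
            | ((minor & ~0xFF) << 12) | ((major & ~0xFFF) << 32))


def kernel_split(dev: int):
    """Split a value using the in-kernel dev_t layout."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)


def reinterpret(major: int, minor: int):
    """HYPOTHESIS ONLY: treat the printed pair as a kernel dev_t that was
    decoded with glibc's layout, and recover the kernel major:minor."""
    return kernel_split(glibc_makedev(major, minor))


def summarize(lines) -> Counter:
    """Count entries per (printed dev, reinterpreted dev, error, rwbs)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip the prompt, "No MCE errors." and non-disk lines
        major, minor = int(m["major"]), int(m["minor"])
        guess = "%d:%d" % reinterpret(major, minor)
        counts[(f"{major}:{minor}", guess, m["error"], m["rwbs"])] += 1
    return counts
```

Passing the lines of `ras-mc-ctl --errors` (for example `summarize(sys.stdin)` in a small wrapper script) collapses thousands of entries into a few rows. Under the hypothesis, the printed pairs 0:2080, 0:2096 and 0:2112 map to 8:32, 8:48 and 8:64. Major 8 is the SCSI disk driver, so on a typical system those would be the whole-disk nodes /dev/sdc, /dev/sdd and /dev/sde. That all three land on whole-disk minors makes the reading plausible, but it is still an inference. `ls -l /sys/dev/block/8:32` shows which device actually owns a given number on the machine in question.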
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of two footprints next to it and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.