Bug 1938222
| Summary: | Incorrect symbol translation from crash shown by 'struct blk_mq_ops' | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | John Pittman <jpittman> |
| Component: | crash | Assignee: | ltao |
| Status: | NEW --- | QA Contact: | xiaoying yan <yiyan> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.3 | CC: | dwysocha, hartsjc, lijiang, ruyang |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 8
Dave Wysochanski
2022-11-10 19:48:32 UTC
Still reproduces on the latest crash as of upstream 487551488b15 ("ppc64: still allow to move on if the emergency stacks info fails to initialize"). This narrows things down a bit.

It seems to me this type of issue has come up before. Lianbo, do you think this is a known issue that is just hard to fix? It has a workaround, so it is not critical, but as I recall it seems to come up periodically.

    [dwysocha@galvatron dwysocha]$ cat crashrc-bz1938222
    cd /cores/retrace/tasks/726657196/results
    #mod -s virtio_blk /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/kernel/drivers/block/virtio_blk.ko.debug
    mod -S /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/
    px ((struct request *)0xffff880fdb246000)->q->mq_ops
    #struct blk_mq_ops.queue_rq 0xffffffffc00a7160
    struct blk_mq_ops 0xffffffffc00a7160
    quit
    [dwysocha@galvatron dwysocha]$ /cores/crashext/gitlab-runner/bin/crash -i ./crashrc-bz1938222 /cores/retrace/tasks/726657196/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux | grep queue_rq | grep -q floppy; if [ $? -eq 0 ]; then echo FAIL; else echo PASS; fi
    FAIL

The workaround is to avoid using "mod -S" to load all module symbols and instead load just the one module with "mod -s":

    [dwysocha@galvatron dwysocha]$ /cores/crashext/gitlab-runner/bin/crash -i ./crashrc-bz1938222 /cores/retrace/tasks/726657196/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux | grep queue_rq | grep -q floppy; if [ $? -eq 0 ]; then echo FAIL; else echo PASS; fi
    PASS
    [dwysocha@galvatron dwysocha]$ cat crashrc-bz1938222
    cd /cores/retrace/tasks/726657196/results
    mod -s virtio_blk /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/kernel/drivers/block/virtio_blk.ko.debug
    #mod -S /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/
    px ((struct request *)0xffff880fdb246000)->q->mq_ops
    #struct blk_mq_ops.queue_rq 0xffffffffc00a7160
    struct blk_mq_ops 0xffffffffc00a7160
    quit
    [dwysocha@galvatron dwysocha]$ ls -l /cores/crashext/gitlab-runner/bin/crash
    lrwxrwxrwx. 1 gitlab-runner gss-eng-collab 55 Oct 11 08:38 /cores/crashext/gitlab-runner/bin/crash -> /cores/crashext/gitlab-runner/build/crash/crash-786e89a
    $ git log --oneline galvatron-prod | head
    786e89a7d3ff Rebase to upstream crash 8.0.1+ until 487551488b15 ppc64: still allow to move on if the emergency stacks info fails to initialize
    8b4dd899ef69 gitlab-ci: Add snappy to crash build commandline
    9ab348af531f gitlab-ci: Add new galvatron-x86 machine into production
    329ab120187a gitlab-ci: Add small aarch64 server to build / install
    37cc515f731e gitlab-ci: Add build / install for gitlab CI both dev and prod
    487551488b15 ppc64: still allow to move on if the emergency stacks info fails to initialize
    3b5e3e1583a1 Let "kmem" print task context with physical address
    60cb8650a012 Fix page offset issue when converting physical to virtual address
    ad1397a73594 Fix "kmem" failing to print task context when address is vmalloced stack
    4ea3a806d11f Fix for the invalid linux_banner pointer issue

Thank you for the comment, David. It can be reproduced with "mod -S", but it cannot be reproduced with "mod -s".

    $ cat .crashrc
    #1
    #mod -S /home/lijiang/bz1938222/3.10.0-693.2.2.el7.x86_64
    #2
    mod -s virtio /home/lijiang/bz1938222/3.10.0-693.2.2.el7.x86_64/kernel/drivers/virtio/virtio.ko.debug
    px ((struct request *)0xffff880fdb246000)->q->mq_ops
    struct blk_mq_ops 0xffffffffc00a7160
    quit

I would suggest using "mod -s" as a workaround for now. It's not a critical issue; it could be addressed as a low priority in the future, or the bug could be closed. Any thoughts? Thanks.

Lianbo, normally I'd agree this should probably be closed, and that may still be the right outcome. One thing I didn't mention in the previous comment, though, is that a driving force for investigating is how our vmcore system works. By default, our vmcore system loads into the crash session all modules that are listed as loaded at the time of the crash, and the workaround may not be obvious. I know a similar issue has come up before with support, though I don't know how often, or even whether the underlying cause is the same. So I'd lean towards at least a bit of investigation before closing. I haven't started looking into the code, but do you have any idea how complicated this might be? Is the problem likely to be in gdb, so that it would need to be patched there, or do you not know yet? If you don't have cycles to investigate, someone in support may be able to, since it impacts them and I know a few of them have done patches for crash-utility.

Thank you, Dave. As we discussed, it's not an urgent issue, but it still affects CEE's vmcore system (when all modules are loaded into the crash session with "mod -S"), so it would be good to fix it in the future. For now, I'm working on higher-priority issues; once those are completed, I will try to resolve it. In any case, if someone is willing to work on this bug and post a patch upstream, that would be welcome.
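For anyone picking this up, the PASS/FAIL one-liner from the first comment can be wrapped in a small shell helper so that the "mod -S" and "mod -s" rc files are easy to compare while testing a patch. This is only a minimal sketch: the function names `bz1938222_verdict` and `bz1938222_check` are hypothetical, it assumes the crash binary under test is reachable as `crash` on `PATH`, and, like the original one-liner, it treats any `queue_rq` line that mentions `floppy` as the incorrect symbol translation.

```shell
# Hypothetical helper (not part of crash): reads crash output on stdin.
# Prints FAIL if a queue_rq line mentions "floppy" (the wrong translation
# seen with "mod -S"), otherwise PASS. Same test as the original one-liner.
bz1938222_verdict() {
    # The pipeline's exit status is that of the last command, grep -q.
    if grep queue_rq | grep -q floppy; then
        echo FAIL
    else
        echo PASS
    fi
}

# Hypothetical wrapper: run crash non-interactively with an rc file and
# classify the result. Assumes the crash binary is on PATH as "crash".
#   $1 = rc file (should end with "quit"), $2 = vmcore, $3 = vmlinux
bz1938222_check() {
    crash -i "$1" "$2" "$3" | bz1938222_verdict
}
```

With the paths from the first comment, `bz1938222_check ./crashrc-bz1938222 /cores/retrace/tasks/726657196/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux` would be expected to print FAIL with the "mod -S" rc file and PASS with the "mod -s" variant, matching the results reported above; a fix should make both variants print PASS.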