Bug 1938222

Summary: Incorrect symbol translation from crash shown by 'struct blk_mq_ops'
Product: Red Hat Enterprise Linux 8
Reporter: John Pittman <jpittman>
Component: crash
Assignee: ltao
Status: NEW
QA Contact: xiaoying yan <yiyan>
Severity: medium
Priority: medium
Version: 8.3
CC: dwysocha, hartsjc, lijiang, ruyang
Target Milestone: rc
Keywords: Regression, Triaged
Hardware: x86_64
OS: Linux

Comment 8 Dave Wysochanski 2022-11-10 19:48:32 UTC
Case closed more than a year ago.  No work on this bug in over a year.

John, is this still an issue, or can this be closed (maybe INSUFFICIENT_DATA)?  Maybe it's fixed in upstream crash, or only reproducible on a small set of vmcores / kernels?

Comment 9 Dave Wysochanski 2022-11-19 08:30:20 UTC
Still reproduces on the latest crash, as of upstream commit 487551488b15 ("ppc64: still allow to move on if the emergency stacks info fails to initialize").
This narrows things down a bit.  It seems to me this type of issue has come up before.  Lianbo, do you think this is a known issue that is just hard to fix?
There is a workaround, so it's not critical, but it seems to come up periodically as I recall.


[dwysocha@galvatron dwysocha]$ cat crashrc-bz1938222 
cd /cores/retrace/tasks/726657196/results
#mod -s virtio_blk /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/kernel/drivers/block/virtio_blk.ko.debug
mod -S /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/
px ((struct request *)0xffff880fdb246000)->q->mq_ops
#struct blk_mq_ops.queue_rq 0xffffffffc00a7160
struct blk_mq_ops 0xffffffffc00a7160
quit
[dwysocha@galvatron dwysocha]$ /cores/crashext/gitlab-runner/bin/crash -i ./crashrc-bz1938222 /cores/retrace/tasks/726657196/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux | grep queue_rq | grep -q floppy; if [ $? -eq 0 ]; then echo FAIL; else echo PASS; fi
FAIL


The workaround is to avoid "mod -S" (which loads symbols for all modules) and instead load only the needed module with "mod -s":


[dwysocha@galvatron dwysocha]$ /cores/crashext/gitlab-runner/bin/crash -i ./crashrc-bz1938222 /cores/retrace/tasks/726657196/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux | grep queue_rq | grep -q floppy; if [ $? -eq 0 ]; then echo FAIL; else echo PASS; fi
PASS
[dwysocha@galvatron dwysocha]$ cat crashrc-bz1938222 
cd /cores/retrace/tasks/726657196/results
mod -s virtio_blk /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/kernel/drivers/block/virtio_blk.ko.debug
#mod -S /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/
px ((struct request *)0xffff880fdb246000)->q->mq_ops
#struct blk_mq_ops.queue_rq 0xffffffffc00a7160
struct blk_mq_ops 0xffffffffc00a7160
quit

[dwysocha@galvatron dwysocha]$ ls -l /cores/crashext/gitlab-runner/bin/crash
lrwxrwxrwx. 1 gitlab-runner gss-eng-collab 55 Oct 11 08:38 /cores/crashext/gitlab-runner/bin/crash -> /cores/crashext/gitlab-runner/build/crash/crash-786e89a

$ git log --oneline galvatron-prod | head
786e89a7d3ff Rebase to upstream crash 8.0.1+ until 487551488b15 ppc64: still allow to move on if the emergency stacks info fails to initialize
8b4dd899ef69 gitlab-ci: Add snappy to crash build commandline
9ab348af531f gitlab-ci: Add new galvatron-x86 machine into production
329ab120187a gitlab-ci: Add small aarch64 server to build / install
37cc515f731e gitlab-ci: Add build / install for gitlab CI both dev and prod
487551488b15 ppc64: still allow to move on if the emergency stacks info fails to initialize
3b5e3e1583a1 Let "kmem" print task context with physical address
60cb8650a012 Fix page offset issue when converting physical to virtual address
ad1397a73594 Fix "kmem" failing to print task context when address is vmalloced stack
4ea3a806d11f Fix for the invalid linux_banner pointer issue

Comment 10 lijiang 2022-11-21 05:14:43 UTC
Thank you for the comment, David.

It can be reproduced with the "mod -S", but cannot be reproduced with the "mod -s".

$ cat .crashrc
#1
#mod -S /home/lijiang/bz1938222/3.10.0-693.2.2.el7.x86_64
#2
mod -s virtio /home/lijiang/bz1938222/3.10.0-693.2.2.el7.x86_64/kernel/drivers/virtio/virtio.ko.debug

px ((struct request *)0xffff880fdb246000)->q->mq_ops
struct blk_mq_ops 0xffffffffc00a7160
quit

I would suggest using "mod -s" as a workaround for now.  It's not a critical issue; it could be addressed at low priority
in the future, or closed.  Any thoughts?

Thanks.

Comment 11 Dave Wysochanski 2022-11-21 12:17:24 UTC
Lianbo, normally I'd agree this probably should be closed, and that may still be the right outcome.

One thing I didn't mention in the previous comment is that a driving force for investigating is how our vmcore system works.  By default it loads into the crash session every module that was listed as loaded at the time of the crash, so the workaround may not be obvious.  I know a similar issue has come up before with support, though I don't know how often, or even whether the underlying cause is the same.  So I'd lean towards at least a bit of investigation before closing.  I haven't started looking into the code, but do you have any idea how complicated this might be?  Is the fix likely to be in gdb, so it would need to be patched there?  If you don't have cycles to investigate, someone in support may be able to, since it impacts them, and I know a few have contributed patches to crash-utility.
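
Since the vmcore system scripts its crash sessions, one possible mitigation is to generate per-module "mod -s" lines from the debuginfo tree instead of issuing a single "mod -S".  A sketch only (the directory layout mirrors the transcripts above; nothing here is part of the vmcore system's actual tooling):

```shell
#!/bin/sh
# Sketch: emit one "mod -s" command per module debuginfo file found,
# so an automated crash init file loads modules individually instead
# of using "mod -S" to load everything at once.
# DEBUGDIR defaults to the path used in the transcripts above;
# override it to point at any debuginfo module tree.
DEBUGDIR=${DEBUGDIR:-/cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64}

find "$DEBUGDIR" -name '*.ko.debug' | while read -r ko; do
    # Module name is the filename with the .ko.debug suffix stripped.
    name=$(basename "$ko" .ko.debug)
    printf 'mod -s %s %s\n' "$name" "$ko"
done
```

The output could be appended to the generated crashrc in place of the "mod -S" line, keeping the rest of the automation unchanged.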

Comment 12 lijiang 2022-11-22 02:25:15 UTC
Thank you, Dave.

As we discussed, it's not an urgent issue, but it still affects CEE's vmcore system (when loading all modules into the crash session with "mod -S"), so it would be good to fix in the future.

For now, I'm working on higher priority issues; once those are completed, I will try to resolve this one.

In any case, if someone is willing to work on this bug and post a patch upstream, that would be welcome.