Bug 1538906

Summary: several beaker jobs on aarch64 can not report panic
Product: [Retired] Beaker Reporter: Li Shuang <shuali>
Component: reports Assignee: Roman Joost <rjoost>
Status: CLOSED CURRENTRELEASE QA Contact: Matt Tyson 🤬 <mtyson>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 24 CC: achatter, dcallagh, jbastian, mjia, mtyson, rjoost
Target Milestone: 25.0 Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-19 04:17:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Dan Callaghan 2018-01-26 14:00:19 UTC
The exact wording of this oops message seems to differ from the ones we normally expect. In this case, the important line indicating that an oops happened is this one:

[ 1595.106182] Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP 

but the PANIC_REGEX setting in the lab controller settings is only going to match on "Oops: " (note the colon).

We can expand the pattern of course, but I am curious as to why this message on arm64 seems to be different and also what other possibilities there might be.
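
To make the mismatch above concrete, here is a minimal sketch in Python. It assumes a simplified stand-in for the relevant part of PANIC_REGEX (just the literal "Oops: "), not the full value from the lab controller settings, and the x86-style line is an illustrative example rather than a captured log. The helper name detects_panic is hypothetical.

```python
import re

# Assumption: a simplified stand-in for the "Oops: " alternative of PANIC_REGEX,
# not the complete setting used by the lab controller.
OLD_PATTERN = re.compile(r"Oops: ")

# Illustrative x86-style line (not taken from a real console log).
x86_line = "Oops: 0002 [#1] SMP"
# The arm64 line quoted in this bug.
arm64_line = ("[ 1595.106182] Internal error: Oops - SP/PC alignment "
              "exception: 8a000000 [#1] SMP")


def detects_panic(pattern, line):
    """Return True if the pattern is found anywhere in the console line."""
    return pattern.search(line) is not None


print(detects_panic(OLD_PATTERN, x86_line))    # the colon follows "Oops" directly
print(detects_panic(OLD_PATTERN, arm64_line))  # "Oops" is followed by " - ", not ":"
```

The arm64 message goes undetected because "Oops" is followed by " - SP/PC alignment exception" before any colon appears.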

Comment 2 Dan Callaghan 2018-01-26 14:03:15 UTC
Adding jbastian to CC. Jeff, you have probably seen more than your fair share of arm64 oops messages :-) so I was wondering if you have any opinion here.

Should we just add another pattern to Beaker's panic regex to match the "Internal error: Oops" string?

I feel like we are fighting a losing battle if kernel developers keep using arbitrary formatting and spelling for their oops messages, but maybe there is nothing better we can do.

Comment 3 Dan Callaghan 2018-01-26 14:14:31 UTC
I was curious what abrt does, since it also catches oops by reading kernel messages. It looks like it has this quite lengthy list of possible patterns it will match on:

https://github.com/abrt/abrt/blob/faf826e9b76f9a0de0c2b046080cf792f1232668/src/lib/kernel.c#L77

although nowhere can I see where it matches on the actual string "Oops" or anything like it. Maybe I am misreading that code.

Comment 4 Jeff Bastian 2018-01-31 22:37:52 UTC
Just from perusing the kernel source code, I see lots of arbitrary formatting on the ARM side:

$ find . -type f | xargs grep Oops
...
./arm/kernel/traps.c:		str = "Oops - BUG";
./arm/kernel/traps.c:	arm_notify_die("Oops - undefined instruction", regs, &info, 0, 6);
./arm/kernel/traps.c:	die("Oops - bad mode", regs, 0);
./arm/kernel/traps.c:	arm_notify_die("Oops - bad syscall", regs, &info, n, 0);
./arm/kernel/traps.c:	arm_notify_die("Oops - bad syscall(2)", regs, &info, no, 0);
./arm/kernel/traps.c:	panic("Oops failed to kill thread");
./arm/mm/alignment.c:	 * Oops, we didn't handle the instruction.
./arm/mm/fault.c: * Oops.  The kernel tried to access some page that wasn't present.
./arm/mm/fault.c:	die("Oops", regs, fsr);
./arm64/kernel/traps.c:	die("Oops - bad mode", regs, 0);
./arm64/kernel/traps.c:		die("Oops - BUG", regs, 0);
./arm64/mm/fault.c:	die("Oops", regs, esr);
./arm64/mm/fault.c:	arm64_notify_die("Oops - SP/PC alignment exception", regs, &info, esr);
...


The x86 Oops messages are wrapped in a single function to give consistent formatting:

./x86/mm/fault.c:	if (__die("Oops", regs, error_code))


The __die() function adds the colon:

int __die(const char *str, struct pt_regs *regs, long err)
{
    ...
    printk(KERN_DEFAULT
            "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter,
            ...


I suppose a safe regex would be to search for the string Oops surrounded by word boundaries, e.g., \bOops\b
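
As a rough check of the word-boundary idea, the sketch below runs \bOops\b against the arm64 line from this bug and two illustrative lines built from the strings in the grep output above. The sample lines other than the arm64 one are assumptions made up for the example, not captured console output.

```python
import re

# The word-boundary pattern suggested above.
WORD_PATTERN = re.compile(r"\bOops\b")

samples = [
    # The arm64 line quoted in this bug.
    "Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP",
    # Illustrative lines (assumptions, not real logs).
    "Oops: 0002 [#1] SMP",
    "Internal error: Oops - BUG: 0 [#1] SMP",
]

# Every sample contains "Oops" as a standalone word, so each one matches.
results = [WORD_PATTERN.search(line) is not None for line in samples]
print(results)

# A longer identifier that merely starts with "Oops" does not match,
# because there is no word boundary between "s" and "H".
print(WORD_PATTERN.search("OopsHandler registered") is not None)
```

The pattern is deliberately loose: it accepts any formatting around the word, which is what makes it robust to the per-architecture differences but also means any standalone "Oops" in the console output will match.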

Comment 5 Roman Joost 2018-02-01 05:45:51 UTC
At the risk of having missed the mark or clashing with Dan's progress, I put up a patch:

https://gerrit.beaker-project.org/c/5989/

Comment 7 Matt Tyson 🤬 2018-02-19 00:02:12 UTC
Injecting the following string into the dmesg log produces the expected failure during a Beaker run.

echo 'Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP' > /dev/kmsg

Comment 8 Roman Joost 2018-03-19 04:17:57 UTC
Beaker 25.0 has been released.

Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html

Comment 9 Dan Callaghan 2018-05-03 04:07:26 UTC
This new pattern is too broad: it now triggers if someone puts "Oops" into their test case name (bug 1572880). We need to find a narrower one.
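
The false positive can be reproduced with a small sketch. The test case name and the console line containing it are hypothetical, and the narrower pattern is only one possible candidate, assembled from the two strings discussed earlier in this bug ("Oops: " and "Internal error: Oops"); it is not necessarily the pattern Beaker ended up with.

```python
import re

# The broad word-boundary pattern discussed in this bug.
BROAD = re.compile(r"\bOops\b")
# Assumption: one possible narrower candidate, not Beaker's actual setting.
NARROW = re.compile(r"Oops: |Internal error: Oops")

# Hypothetical console line that merely echoes a test case name.
test_name_line = "Running test /kernel/misc/Oops-injection"
# The arm64 line quoted in this bug.
arm64_line = "Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP"
# Illustrative x86-style line (not a real log).
x86_line = "Oops: 0002 [#1] SMP"

for name, pattern in (("broad", BROAD), ("narrow", NARROW)):
    hits = [pattern.search(line) is not None
            for line in (test_name_line, arm64_line, x86_line)]
    print(name, hits)
```

The narrower candidate still has gaps: it would miss an oops message that uses neither prefix, and it would still fire if a test printed one of those exact strings to the console.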