Bug 1538906 - several beaker jobs on aarch64 can not report panic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: reports
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 25.0
Assignee: Roman Joost
QA Contact: Matt Tyson 🤬
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-26 05:44 UTC by Li Shuang
Modified: 2018-05-03 04:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-19 04:17:57 UTC
Embargoed:


Comment 1 Dan Callaghan 2018-01-26 14:00:19 UTC
It seems like the exact wording of this oops message is different to the normal ones we are expecting. In this case, the important line indicating an oops happened is this one:

[ 1595.106182] Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP 

but the PANIC_REGEX setting in the lab controller settings is only going to match on "Oops: " (note the colon).

We can expand the pattern of course, but I am curious as to why this message on arm64 seems to be different and also what other possibilities there might be.
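To illustrate the mismatch, here is a minimal Python sketch. The pattern is a simplified stand-in for the relevant part of PANIC_REGEX (the full setting is not quoted in this bug), and the x86-style sample line is an assumed example for comparison only.

```python
import re

# Simplified stand-in for the relevant alternative in PANIC_REGEX.
# Assumption: the real setting contains more alternatives than this one.
old_pattern = re.compile(r"Oops: ")

# Console line quoted above from the aarch64 job.
arm64_line = (
    "[ 1595.106182] Internal error: Oops - SP/PC alignment exception: "
    "8a000000 [#1] SMP"
)
# Assumed example of an x86-style oops line, for comparison.
x86_line = "Oops: 0002 [#1] SMP"

arm64_detected = old_pattern.search(arm64_line) is not None
x86_detected = old_pattern.search(x86_line) is not None

print("arm64 detected:", arm64_detected)
print("x86 detected:", x86_detected)
```

The arm64 line has " - " after "Oops" rather than ": ", so this pattern does not see it.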

Comment 2 Dan Callaghan 2018-01-26 14:03:15 UTC
Adding jbastian to cc. Jeff, you have probably seen more than your fair share of arm64 oops messages :-) so I was wondering if you have any opinion here.

Should we just add another pattern to Beaker's panic regex to match the "Internal error: Oops" string?

I feel like we are fighting a bit of a losing battle here if kernel developers keep using arbitrary formatting and spelling for their oops messages, but maybe there is nothing better we can do.

Comment 3 Dan Callaghan 2018-01-26 14:14:31 UTC
I was curious what abrt does, since it also catches oops by reading kernel messages. It looks like it has this quite lengthy list of possible patterns it will match on:

https://github.com/abrt/abrt/blob/faf826e9b76f9a0de0c2b046080cf792f1232668/src/lib/kernel.c#L77

although nowhere can I see where it matches on the actual string "Oops" or anything like it. Maybe I am misreading that code.

Comment 4 Jeff Bastian 2018-01-31 22:37:52 UTC
Just from perusing the kernel source code, I see lots of arbitrary formatting on the ARM side:

$ find . -type f | xargs grep Oops
...
./arm/kernel/traps.c:		str = "Oops - BUG";
./arm/kernel/traps.c:	arm_notify_die("Oops - undefined instruction", regs, &info, 0, 6);
./arm/kernel/traps.c:	die("Oops - bad mode", regs, 0);
./arm/kernel/traps.c:	arm_notify_die("Oops - bad syscall", regs, &info, n, 0);
./arm/kernel/traps.c:	arm_notify_die("Oops - bad syscall(2)", regs, &info, no, 0);
./arm/kernel/traps.c:	panic("Oops failed to kill thread");
./arm/mm/alignment.c:	 * Oops, we didn't handle the instruction.
./arm/mm/fault.c: * Oops.  The kernel tried to access some page that wasn't present.
./arm/mm/fault.c:	die("Oops", regs, fsr);
./arm64/kernel/traps.c:	die("Oops - bad mode", regs, 0);
./arm64/kernel/traps.c:		die("Oops - BUG", regs, 0);
./arm64/mm/fault.c:	die("Oops", regs, esr);
./arm64/mm/fault.c:	arm64_notify_die("Oops - SP/PC alignment exception", regs, &info, esr);
...


The x86 Oops messages are wrapped in a single function to give consistent formatting:

./x86/mm/fault.c:	if (__die("Oops", regs, error_code))


The __die() function adds the colon:

int __die(const char *str, struct pt_regs *regs, long err)
{
    ...
    printk(KERN_DEFAULT
            "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter,
            ...
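
As a rough illustration of where the colon comes from, the Python sketch below mirrors only the leading "%s: %04lx [#%d]" part of that printk format. The function name format_die_line is hypothetical, and the trailing %s flags (elided in the excerpt) are left out.

```python
def format_die_line(label, err, die_counter):
    """Mimic the leading part of the __die() printk format shown above.

    Hypothetical helper for illustration; the trailing flag strings that
    the kernel appends are omitted.
    """
    return "%s: %04x [#%d]" % (label, err & 0xFFFF, die_counter)


# The x86 caller passes the bare label "Oops", so the colon follows it directly.
x86_line = format_die_line("Oops", 0x2, 1)
print(x86_line)  # Oops: 0002 [#1]

# A longer label pushes the colon away from the word "Oops".
long_label_line = format_die_line("Oops - bad mode", 0, 1)
print(long_label_line)  # Oops - bad mode: 0000 [#1]
```

With a bare "Oops" label the output contains "Oops: ", while a label such as "Oops - bad mode" does not, which is why a pattern keyed on the colon misses the ARM variants.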


I suppose a safe regex would be to search for the string Oops surrounded by word boundaries, e.g., \bOops\b
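
A quick Python sketch of that suggestion, checked against the label strings from the grep output above. The non-matching sample "OopsCounter reset" is made up for illustration.

```python
import re

# The word-boundary pattern suggested above.
oops_word = re.compile(r"\bOops\b")

# Label strings taken from the grep output in this comment.
labels = [
    "Oops",
    "Oops - BUG",
    "Oops - undefined instruction",
    "Oops - bad mode",
    "Oops - bad syscall",
    "Oops - SP/PC alignment exception",
]

all_labels_match = all(oops_word.search(label) for label in labels)

# Made-up sample: "Oops" embedded in a longer word is not a whole-word match.
embedded_match = oops_word.search("OopsCounter reset") is not None

print("all labels match:", all_labels_match)
print("embedded word matches:", embedded_match)
```

Note that the pattern keys on the word alone, so it matches "Oops" anywhere in the console output, not only in kernel die messages.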

Comment 5 Roman Joost 2018-02-01 05:45:51 UTC
I may have missed the mark with this fix, or clashed with Dan's progress, but I have put up a patch:

https://gerrit.beaker-project.org/c/5989/

Comment 7 Matt Tyson 🤬 2018-02-19 00:02:12 UTC
Injecting the following string into the dmesg log results in an expected failure during a beaker run.

echo 'Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP' > /dev/kmsg

Comment 8 Roman Joost 2018-03-19 04:17:57 UTC
Beaker 25.0 has been released.

Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html

Comment 9 Dan Callaghan 2018-05-03 04:07:26 UTC
This new pattern is too broad: it now triggers if someone puts "Oops" into their test case name (bug 1572880). We need to find a narrower one.
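
The false positive can be reproduced with a short Python sketch. The test case name below is made up, and the narrower pattern is only one hypothetical candidate (it requires the error code and the "[#N]" die counter seen in the messages quoted earlier); it is not necessarily the pattern Beaker adopted.

```python
import re

# Word-boundary pattern discussed in comment 4.
broad = re.compile(r"\bOops\b")

# Hypothetical narrower candidate: "Oops", an optional " - <description>",
# then ": <hex code> [#<counter>]" as in the die messages quoted above.
narrow = re.compile(r"\bOops(?: - [^:\n]+)?: [0-9a-fA-F]+ \[#\d+\]")

# Made-up test case name containing the word "Oops".
test_name_line = "Running test /kernel/Oops-injection/sanity"
# Message quoted in comment 1.
arm64_line = "Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP"
# Assumed x86-style message, following the __die() format in comment 4.
x86_line = "Oops: 0002 [#1] SMP"

broad_false_positive = broad.search(test_name_line) is not None
narrow_false_positive = narrow.search(test_name_line) is not None
narrow_hits_arm64 = narrow.search(arm64_line) is not None
narrow_hits_x86 = narrow.search(x86_line) is not None

print(broad_false_positive, narrow_false_positive)
print(narrow_hits_arm64, narrow_hits_x86)
```

A candidate like this would still miss any oops message that lacks the "[#N]" counter, so it trades some coverage for fewer false alarms.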

