1509162 – Failing HTM tbegin for z Series guests despite claiming support.

Summary: Failing HTM tbegin for z Series guests despite claiming support.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	27
Hardware:	s390x
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	1499260
Blocks:	ZedoraTracker 1513894

Reported:	2017-11-03 08:53 UTC by Dan Horák
Modified:	2018-03-23 16:52 UTC
CC List:	37 users
Fixed In Version:
Clone Of:	1499260
Environment:
Last Closed:	2018-03-23 16:49:37 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
full boot log (23.02 KB, text/plain) 2017-11-03 09:11 UTC, Dan Horák
test patch against v4.13 (647 bytes, patch) 2017-11-08 21:50 UTC, IBM Bug Proxy
For more bugs within transactional execution (5.06 KB, patch) 2017-11-09 12:12 UTC, IBM Bug Proxy

Links
System	ID	Private	Priority	Status	Summary	Last Updated
IBM Linux Technology Center	161003	0	None	None	None	2017-11-06 14:55:01 UTC
Red Hat Bugzilla	1513894	1	None	None	None	2021-01-20 06:05:38 UTC

Internal Links: 1513894

Description Dan Horák 2017-11-03 08:53:39 UTC

+++ This bug was initially created as a clone of Bug #1499260 +++

I have prepared an F-27 installer image (from https://kojipkgs.fedoraproject.org/compose/branched/Fedora-27-20171004.n.0/compose/Everything/s390x/os/) and now I get a segfault when the kernel switches to user space.

================================================================================
[    3.786004] Key type big_key registered
[    3.787436] Key type encrypted registered
[    3.787720] Freeing unused kernel memory: 664K
[    3.787725] Write protected read-only-after-init data: 20k
[    3.787728] rodata_test: all tests were successful
[    3.790779] User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
[    3.790786] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[    3.790788] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[    3.790789] task: 00000000fafc8000 task.stack: 00000000fafc4000
[    3.790791] User PSW : 0705200180000000 000003ff93c14e70
[    3.790792]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[    3.790794] User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
[    3.790795]            0000000000000000 0000000000000002 0000000000000000 000003ff00000000
[    3.790796]            0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
[    3.790797]            000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
[    3.790805] User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
[    3.790805]            000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
[    3.790805]           #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
[    3.790805]           >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
[    3.790805]            000003ff93c14e74: a7080000            lhi     %r0,0
[    3.790805]            000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
[    3.790805]            000003ff93c14e7c: b2220000            ipm     %r0
[    3.790805]            000003ff93c14e80: 8800001c            srl     %r0,28
[    3.790819] Last Breaking-Event-Address:
[    3.790821]  [<000003ff93c14de4>] 0x3ff93c14de4
[    3.790950] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[    3.790950]
[    3.790952] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[    3.790953] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[    3.790953] Call Trace:

I have a guest running F-27 without problems. What changed is that the z/VM hypervisor (updated to 6.4 last weekend) now exposes the Transactional Execution (TE) bit to z/VM guests. That wasn't the case with the previously installed z/VM (6.1 or 6.3?).


Version-Release number of selected component (if applicable):
glibc-2.26-8.fc27.s390x

--- Additional comment from Carlos O'Donell on 2017-10-06 11:53:59 EDT ---

I find it suspicious that this is after a 'tbegin' instruction has started executing a transactional region.

Exactly what hardware is this and does it claim to support HWCAP_S390_TE?

--- Additional comment from Dan Horák on 2017-10-06 12:12:49 EDT ---

(In reply to Carlos O'Donell from comment #1)
> I find it suspicious that this is after a 'tbegin' instruction has started
> executing a transactional region.
> 
> Exactly what hardware is this and does it claim to support HWCAP_S390_TE?

It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings "Guest Transactional Execution support". That's a change in our environment since last week.

--- Additional comment from Carlos O'Donell on 2017-10-06 12:26:07 EDT ---

(In reply to Dan Horák from comment #2)
> It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings
> "Guest Transactional Execution support". That's a change in our environment
> since last week.

Is there any way to disable TE at the hardware level so the kernel doesn't report it and then see if this fixes the boot issue?

Otherwise I will have to rebuild F27 glibc for s390x with elision turned off until I get the upstream tunables in place.

--- Additional comment from Carlos O'Donell on 2017-10-06 12:26:53 EDT ---

(In reply to Carlos O'Donell from comment #3)
> Otherwise I will have to rebuild F27 glibc for s390x with elision turned off
> until I get the upstream tunables in place.

... if that's the issue.

--- Additional comment from Dan Horák on 2017-10-09 07:09:32 EDT ---

so with a glibc that correctly disables the lock elision (https://koji.fedoraproject.org/koji/taskinfo?taskID=22348456) the boot of the installation image continues correctly, without the segfault ...

--- Additional comment from Dan Horák on 2017-10-09 07:37:46 EDT ---

The installation then succeeds, it installs glibc-2.26-8.fc27.s390x from the Fedora repos and the installed system boots without an issue.

--- Additional comment from Dan Horák on 2017-10-12 06:03:07 EDT ---

For the record, the Rawhide compose has the same problem
(https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20171011.n.0/compose/Server/s390x/os/images/)

--- Additional comment from Carlos O'Donell on 2017-10-13 12:44:59 EDT ---

It goes without saying that we are very interested in the root cause analysis of this issue with input from IBM, since this should "just work" (tm).

--- Additional comment from Dan Horák on 2017-10-16 08:21:09 EDT ---

There is (only) one difference I'm aware of between the working and failing scenarios: the installer boot is initiated from the CMS shell using the virtual card reader, while the installed system starts from CP using a DASD.

Comment 1 Dan Horák 2017-11-03 08:54:41 UTC

Cloning the original bug to allow further investigation.

Comment 2 Dan Horák 2017-11-03 09:11:15 UTC

Created attachment 1347225 [details]
full boot log

Boot log from Rawhide installation started from zipl from a previous installation.

When the same glibc and kernel as in the installer are installed, there is no crash like the one seen when the installer runs. Installing the kernel after glibc ensures that the new glibc is included in the boot initramfs.

Comment 3 Dan Horák 2017-11-06 14:16:43 UTC

Switching to kernel, because
- the interruption code 0013 means a special-operation exception, and the PoP says for TBEGIN: "2. A special-operation exception is recognized and the operation is suppressed if the transactional-execution control, bit 8 of control register 0, is zero."

- the content of CR0 can be confirmed with
    00: d xg0
    00: CRG  0 =  0000000014846A12
after catching the exception with a trace (TR prog 13)
00:  -> 0000000001000582'  TBEGIN  E5600000FF0E    0000000000000000
00: *** 0000000001000582'      PROG    0013 -> 00000000008EF144"
00:               SPECIAL OPERATION

The problem can be reproduced with something like

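#include <stdio.h>

/*
 * Try to start a transaction with the same operands as the trapping
 * instruction in the log above (0xFF0E == 65294). Returns 0 if the
 * transaction started, otherwise the condition code (1-3) from TBEGIN.
 * The assembler must accept TBEGIN, e.g. build with -march=zEC12.
 */
static int tbegin(void)
{
    int ret;

    __asm__ __volatile__ (
        "  tbegin 0, 0xFF0E\n\t"
        "  jnz 0f\n\t"            /* CC != 0: no transaction */
        "  lhi %0, 0\n\t"         /* CC 0: transaction started */
        "  j 1f\n\t"
        "0: ipm %0\n\t"           /* extract the condition code into ret */
        "  srl %0, 28\n\t"
        "1:"
        : "=&d" (ret)
        :
        : "cc", "memory"
    );

    return ret;
}

/* End the transactional-execution region. */
static void tend(void)
{
    __asm__ __volatile__ (
        "       tend\n\t"
        :
        :
        : "cc", "memory"
    );
}

int main(void)
{
    int t;

    /* With the CR0 TE control bit clear, TBEGIN traps (code 0013). */
    t = tbegin();
    printf("tbegin=%d\n", t);

    tend();
    return 0;
}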

linked statically (or dynamically) and used as /init


CR0 content from another guest in the same z/VM instance
[sharkcz@devel11 ~]$ sudo vmcp d xg0
[sudo] password for sharkcz: 
CRG  0 =  0080000014846A12


I suspect the kernel doesn't explicitly enable TE (under some conditions) when it runs on a machine with TE available in the facility bits (like our zEC12 with z/VM 6.4.0).

Comment 4 Dan Horák 2017-11-08 17:40:23 UTC

The PPC kernel recently added a command-line option to disable transactional memory, see http://patchwork.ozlabs.org/patch/824763/
It might be useful for s390x to follow the same idea.

Comment 5 IBM Bug Proxy 2017-11-08 21:40:26 UTC

------- Comment From h.carstens.com 2017-11-08 16:38 EDT-------
(In reply to comment #7)
> PPC kernel added recently a command line option to disable transactional
> memory - see http://patchwork.ozlabs.org/patch/824763/
> It might be useful for s390x to follow the same idea.

Yes, indeed. This bug has shown that it would be useful.

Comment 6 IBM Bug Proxy 2017-11-08 21:50:40 UTC

Created attachment 1349586 [details]
test patch against v4.13


------- Comment on attachment From h.carstens.com 2017-11-08 16:41 EDT-------


I only had a short look at the code and suspect that the culprit is the missing initial enabling of the transactional execution facility early at startup.
Later on, the corresponding control register bit is set at every task switch.

Could you please try this, and let me know if it fixes the problem?

The patch is against vanilla v4.13 but should easily apply to your kernel sources.

Comment 7 Dan Horák 2017-11-09 11:14:47 UTC

yes, the problem is gone, both with the minimal test case from comment #3 and with Rawhide installer images recreated with an updated kernel.

Comment 8 Dan Horák 2017-11-09 11:23:56 UTC

Shouldn't something like this be applied as well?

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 164a1e16b53e..ad6e4a4c65dc 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -764,7 +764,7 @@ static int __init setup_hwcaps(void)
        /*
         * Transactional execution support HWCAP_S390_TE is bit 10.
         */
-       if (test_facility(50) && test_facility(73))
+       if (test_facility(50) && test_facility(73) && __ctl_get_bit(0, 55))
                elf_hwcap |= HWCAP_S390_TE;
 
        /*


And what about KVM or QEMU? Or arch/s390/kernel/ptrace.c (it checks for MACHINE_HAS_TE)?

Do you have an explanation for why we saw the problem only during installation, but not in already installed guests?

Comment 9 IBM Bug Proxy 2017-11-09 12:11:47 UTC

------- Comment From h.carstens.com 2017-11-09 07:08 EDT-------
(In reply to comment #10)
> yes, the problem is away, both with minimal test case from comment #3 and
> also with Rawhide installer images recreated with an updated kernel.

Thanks for testing. After looking a bit deeper into the code, however, the fix looks incomplete. I will mark the current patch obsolete and attach a new one. There are more bugs with respect to transactional execution.

> Shouldn't something like this be applied as well?
> * Transactional execution support HWCAP_S390_TE is bit 10.
> */
> -       if (test_facility(50) && test_facility(73))
> +       if (test_facility(50) && test_facility(73) && __ctl_get_bit(0, 55))
> elf_hwcap |= HWCAP_S390_TE;

As a cleanup this should be converted to something like

if (MACHINE_HAS_TE)

But that doesn't fix any bug.

> And how about KVM or qemu? Or arch/s390/kernel/ptrace.c (it checks for
> MACHINE_HAS_TE)?

KVM + QEMU are fine. This is a "simple" kernel bug.

> Would you have an explanation why we saw the problem only in a installation,
> but did not in already installed guests?

Yes, as soon as the system runs, control register 0 is updated on every task switch and the facility gets enabled (if the next process is not a kernel thread). So you will see this behavior only at startup, unless you use ptrace requests that disable it again... which is yet another bug. See the new patch.

------- Comment From h.carstens.com 2017-11-09 07:11 EDT-------
Could you give the new patch also a try, please?

Comment 10 IBM Bug Proxy 2017-11-09 12:12:03 UTC

Created attachment 1349861 [details]
For more bugs within transactional execution


------- Comment on attachment From h.carstens.com 2017-11-09 07:10 EDT-------


There are several bugs with control register handling with respect to
transactional execution:

- on task switch update_per_regs() is only called if the next task has
  an mm (is not a kernel thread). This, however, is incorrect. It
  breaks, e.g., user mode helper handling, where the kernel creates
  a kernel thread and then execve's a user space program. Control
  register contents related to transactional execution won't be
  updated on execve. If the previous task ran with transactional
  execution disabled then the new task will also run with
  transactional execution disabled, which is incorrect. Therefore call
  update_per_regs() unconditionally within switch_to().

- on startup the transactional execution facility is not enabled for
  the idle thread. This is not really a bug, but an inconsistency with
  other facilities. Therefore enable the facility if it is available.

- on fork the new thread's per_flags field is not cleared. This means
  that a child process inherits the PER_FLAG_NO_TE flag. This flag can
  be set with a ptrace request to disable transactional execution for
  the current process. It should not be inherited by new child
  processes in order to be consistent with the handling of all other
  PER-related debugging options. Therefore clear the per_flags field in
  copy_thread_tls().

Comment 11 Dan Horák 2017-11-09 16:17:57 UTC

> ------- Comment From h.carstens.com 2017-11-09 07:11 EDT-------
> Could you give the new patch also a try, please?

yes, all looks still good with the new patch

Comment 12 Dan Horák 2017-11-09 16:32:57 UTC

And from the "Cc: <stable.org> # v3.7+" note in the patch, I assume we need the fix in RHEL-7 kernel as well, right?

Hanns, will you manage it from the IBM side for the whole enterprise part, or shall I initiate the process by cloning this bug to RHEL?

Comment 13 IBM Bug Proxy 2017-11-10 10:40:26 UTC

------- Comment From h.carstens.com 2017-11-10 05:32 EDT-------
Yes, a backport to RHEL-7 is also needed.

Comment 14 Hanns-Joachim Uhl 2017-11-11 13:39:05 UTC

(In reply to Dan Horák from comment #12)
> And from the "Cc: <stable.org> # v3.7+" note in the patch, I
> assume we need the fix in RHEL-7 kernel as well, right?
> 
> Hanns, will you manage it from the IBM side for the whole enterprise part,
> or shall I initiate the process by cloning this bug to RHEL?
.
... Dan, I will check at our side how to proceed for RHEL ...

Comment 15 Hanns-Joachim Uhl 2017-11-11 13:41:53 UTC

(In reply to IBM Bug Proxy from comment #10)
> Created attachment 1349861 [details]
> For more bugs within transactional execution
> 
> 
.
fyi ... the upstream git commit in the s390 tree is
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556
("s390: fix transactional execution control register handling")
.

Comment 16 Dan Horák 2017-11-14 19:48:21 UTC

and merged to mainline as https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556

Comment 17 Laura Abbott 2018-02-20 19:55:50 UTC

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  As kernel maintainers, we try to keep up with bugzilla but due to the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.
 
Fedora 27 has now been rebased to 4.15.3-300.f27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 18 Laura Abbott 2018-03-23 16:49:37 UTC

*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you did actually update, we apologize for the inconvenience (there are a lot of bugs). If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

admiller
airlied
ajax
aoliva
arjun
brueckner
bskeggs
bugproxy
codonell
dan
dj
ewk
extras-qa
fweimer
gmarr
hannsj_uhl
hdegoede
herrold
ichavero
itamar
jakub
jarodwilson
jeremy
jglisse
john.j5live
jonathan
josef
kernel-maint
law
linville
mboddu
mchehab
mfabian
mjg59
pfrankli
siddhesh
steved