Bug 1499260 - Failing HTM tbegin for z Series guests despite claiming support.
Summary: Failing HTM tbegin for z Series guests despite claiming support.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 27
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Depends On:
Blocks: ZedoraTracker F27FinalFreezeException 1509162
Reported: 2017-10-06 13:42 UTC by Dan Horák
Modified: 2017-11-06 14:56 UTC (History)

Fixed In Version:
Clone Of:
: 1509162 (view as bug list)
Environment:
Last Closed: 2017-10-26 22:30:44 UTC


Attachments
full boot log (19.49 KB, text/plain)
2017-10-06 13:42 UTC, Dan Horák

Description Dan Horák 2017-10-06 13:42:37 UTC
Created attachment 1335333
full boot log

I have prepared an F-27 installer image (from https://kojipkgs.fedoraproject.org/compose/branched/Fedora-27-20171004.n.0/compose/Everything/s390x/os/) and now I get a segfault when the kernel switches to user space.

================================================================================
[    3.786004] Key type big_key registered
[    3.787436] Key type encrypted registered
[    3.787720] Freeing unused kernel memory: 664K
[    3.787725] Write protected read-only-after-init data: 20k
[    3.787728] rodata_test: all tests were successful
[    3.790779] User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
[    3.790786] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[    3.790788] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[    3.790789] task: 00000000fafc8000 task.stack: 00000000fafc4000
[    3.790791] User PSW : 0705200180000000 000003ff93c14e70
[    3.790792]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[    3.790794] User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
[    3.790795]            0000000000000000 0000000000000002 0000000000000000 000003ff00000000
[    3.790796]            0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
[    3.790797]            000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
[    3.790805] User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
[    3.790805]            000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
[    3.790805]           #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
[    3.790805]           >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
[    3.790805]            000003ff93c14e74: a7080000            lhi     %r0,0
[    3.790805]            000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
[    3.790805]            000003ff93c14e7c: b2220000            ipm     %r0
[    3.790805]            000003ff93c14e80: 8800001c            srl     %r0,28
[    3.790819] Last Breaking-Event-Address:
[    3.790821]  [<000003ff93c14de4>] 0x3ff93c14de4
[    3.790950] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[    3.790950]
[    3.790952] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[    3.790953] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[    3.790953] Call Trace:



I have a guest running F-27 without problems. What changed is that the z/VM hypervisor was updated to 6.4 last weekend and now exposes the Transactional Execution (TE) bit to z/VM guests. That wasn't the case with the previously installed z/VM (6.1 or 6.3?).
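
For readers following the log, here is a minimal sketch of how the fault line and the marked instruction can be decoded. The helper names (`parse_fault_line`, `decode_sil`) and the regular expression are illustrative assumptions, not kernel tooling. The field split assumes the 6-byte SIL instruction layout (16-bit opcode, 4-bit base, 12-bit displacement, 16-bit immediate), and `ilc` is taken to be the instruction length in halfwords. In the Linux s390 program-check table, code 0x13 is the special-operation exception.

```python
import re

# Pattern for the "User process fault" line as it appears in the log above
# (with the console line wrap already joined). Illustrative only.
FAULT_RE = re.compile(
    r"User process fault: interruption code (?P<code>[0-9a-f]{4}) "
    r"ilc:(?P<ilc>\d) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_fault_line(line):
    """Extract interruption code, ilc and mapping info from the fault line."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "code": int(m.group("code"), 16),
        "ilc": int(m.group("ilc")),
        "object": m.group("obj"),
        "base": int(m.group("base"), 16),
        "size": int(m.group("size"), 16),
    }


def decode_sil(hex_bytes):
    """Split a 6-byte SIL-format instruction (given as hex) into its fields."""
    raw = int(hex_bytes, 16)
    return {
        "opcode": raw >> 32,
        "base": (raw >> 28) & 0xF,
        "disp": (raw >> 16) & 0xFFF,
        "imm": raw & 0xFFFF,
    }


fault = parse_fault_line(
    "[    3.790779] User process fault: interruption code 0013 ilc:3 "
    "in libpthread-2.26.so[3ff93c00000+1b000]"
)
insn = decode_sil("e5600000ff0e")  # bytes of the '#'-marked tbegin

# The PSW address points past the faulting instruction; step back by
# ilc halfwords (2 bytes each) and subtract the mapping base.
psw_addr = 0x3FF93C14E70
fault_offset = psw_addr - 2 * fault["ilc"] - fault["base"]

print(hex(fault["code"]), hex(insn["opcode"]), hex(insn["imm"]), hex(fault_offset))
```

The computed offset is the position of the `tbegin` inside libpthread-2.26.so, which can be compared against a disassembly of the library.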


Version-Release number of selected component (if applicable):
glibc-2.26-8.fc27.s390x

Comment 1 Carlos O'Donell 2017-10-06 15:53:59 UTC
I find it suspicious that this is after a 'tbegin' instruction has started executing a transactional region.

Exactly what hardware is this and does it claim to support HWCAP_S390_TE?
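
One way to answer the HWCAP question from inside a guest is sketched below. It rests on two assumptions that are not stated in this report: that the kernel lists Transactional Execution as the token `te` on the `features` line of /proc/cpuinfo, and that `HWCAP_S390_TE` has the value 1024. The function names are hypothetical.

```python
# Assumed value of HWCAP_S390_TE; check the s390 hwcap headers before relying on it.
HWCAP_S390_TE = 1024


def hwcap_has_te(hwcap):
    """Return True if an AT_HWCAP bitmask has the (assumed) TE bit set."""
    return bool(hwcap & HWCAP_S390_TE)


def cpuinfo_has_te(cpuinfo_text):
    """Return True if the 'features' line of /proc/cpuinfo text lists 'te'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "features":
            return "te" in value.split()
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read /proc/cpuinfo, returning an empty string if it is unavailable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("TE advertised:", cpuinfo_has_te(read_cpuinfo()))
```

Run inside the affected guest, this only shows what the kernel advertises to user space. It does not show whether `tbegin` actually works there.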

Comment 2 Dan Horák 2017-10-06 16:12:49 UTC
(In reply to Carlos O'Donell from comment #1)
> I find it suspicious that this is after a 'tbegin' instruction has started
> executing a transactional region.
> 
> Exactly what hardware is this and does it claim to support HWCAP_S390_TE?

It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings "Guest Transactional Execution support". That's a change in our environment since last week.

Comment 3 Carlos O'Donell 2017-10-06 16:26:07 UTC
(In reply to Dan Horák from comment #2)
> (In reply to Carlos O'Donell from comment #1)
> > I find it suspicious that this is after a 'tbegin' instruction has started
> > executing a transactional region.
> > 
> > Exactly what hardware is this and does it claim to support HWCAP_S390_TE?
> 
> It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings
> "Guest Transactional Execution support". That's a change in our environment
> since last week.

Is there any way to disable TE at the hardware level so the kernel doesn't report it and then see if this fixes the boot issue?

Otherwise I will have to rebuild F27 glibc for s390x with elision turned off until I get the upstream tunables in place.
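
As a rough illustration of what a runtime switch could look like once tunables are available: later glibc releases (2.27 and newer, to my knowledge) expose lock elision as the `glibc.elision.enable` tunable, set through the `GLIBC_TUNABLES` environment variable. That name does not come from this report and does not apply to the glibc 2.26 build discussed here. The helper below is a hypothetical sketch that builds an environment for a child process.

```python
import os


def with_elision_disabled(env=None):
    """Return a copy of env with glibc.elision.enable=0 in GLIBC_TUNABLES.

    Existing tunables are kept; any previous glibc.elision.enable entry
    is replaced. Tunables are colon-separated name=value pairs.
    """
    base = dict(os.environ if env is None else env)
    entries = [e for e in base.get("GLIBC_TUNABLES", "").split(":") if e]
    entries = [e for e in entries if not e.startswith("glibc.elision.enable=")]
    entries.append("glibc.elision.enable=0")
    base["GLIBC_TUNABLES"] = ":".join(entries)
    return base
```

The result can be passed as `env=` to `subprocess.run`. Whether such a setting would reach PID 1 in an installer image is a separate question that this sketch does not address.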

Comment 4 Carlos O'Donell 2017-10-06 16:26:53 UTC
(In reply to Carlos O'Donell from comment #3)
> (In reply to Dan Horák from comment #2)
> > (In reply to Carlos O'Donell from comment #1)
> > > I find it suspicious that this is after a 'tbegin' instruction has started
> > > executing a transactional region.
> > > 
> > > Exactly what hardware is this and does it claim to support HWCAP_S390_TE?
> > 
> > It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings
> > "Guest Transactional Execution support". That's a change in our environment
> > since last week.
> 
> Is there any way to disable TE at the hardware level so the kernel doesn't
> report it and then see if this fixes the boot issue?
> 
> Otherwise I will have to rebuild F27 glibc for s390x with elision turned off
> until I get the upstream tunables in place.

... if that's the issue.

Comment 5 Dan Horák 2017-10-09 11:09:32 UTC
With a glibc that correctly disables lock elision (https://koji.fedoraproject.org/koji/taskinfo?taskID=22348456), the boot of the installation image continues correctly, without the segfault.

Comment 6 Dan Horák 2017-10-09 11:37:46 UTC
The installation then succeeds: it installs glibc-2.26-8.fc27.s390x from the Fedora repos, and the installed system boots without an issue.

Comment 7 Dan Horák 2017-10-12 10:03:07 UTC
For the record, the Rawhide compose has the same problem
(https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20171011.n.0/compose/Server/s390x/os/images/)

Comment 8 Carlos O'Donell 2017-10-13 16:44:08 UTC
Fixed scratch build with --disable-lock-elision:
https://koji.fedoraproject.org/koji/taskinfo?taskID=22424738

Comment 9 Carlos O'Donell 2017-10-13 16:44:59 UTC
It goes without saying that we are very interested in the root cause analysis of this issue with input from IBM, since this should "just work" (tm).

Comment 10 Carlos O'Donell 2017-10-14 05:26:39 UTC
Scratch build passes and final libpthread.so.0 has no tbegin/tend.

Final F27 build here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=22435457
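
A check like "no tbegin/tend" can be approximated by scanning a text disassembly of the library. The sketch below assumes input shaped like `objdump -d` output, where the mnemonic appears as its own whitespace-separated token. The function name is hypothetical, and `tbeginc` and `tabort` are added to the set for illustration beyond the two mnemonics named in the comment.

```python
import re

# Transactional-execution mnemonics to look for.
TX_MNEMONICS = {"tbegin", "tbeginc", "tend", "tabort"}


def count_tx_instructions(disassembly):
    """Count transactional-execution mnemonics in objdump-style text."""
    counts = {}
    for line in disassembly.splitlines():
        for token in re.split(r"\s+", line.strip()):
            if token in TX_MNEMONICS:
                counts[token] = counts.get(token, 0) + 1
                break  # at most one instruction per disassembly line
    return counts
```

An empty result for the disassembly of libpthread.so.0 would be consistent with a build that has lock elision disabled.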

Comment 11 Carlos O'Donell 2017-10-14 22:18:29 UTC
(In reply to Carlos O'Donell from comment #10)
> Scratch build passes and final libpthread.so.0 has no tbegin/tend.
> 
> Final F27 build here:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=22435457

Dan,

Can you please check these builds and see if they work and get back to me quickly? The sooner I hear back the faster I'll put this into a Bodhi update for F27.

Thanks!

Comment 12 Fedora Update System 2017-10-15 21:37:53 UTC
glibc-2.26-14.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0d3fdd3d1f

Comment 13 Fedora Update System 2017-10-15 21:38:46 UTC
glibc-2.26-14.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0d3fdd3d1f

Comment 14 Dan Horák 2017-10-16 12:21:09 UTC
There is (only) one difference I'm aware of between the working and failing scenarios: the installer boot is initiated from the CMS shell using the virtual card reader, while the installed system starts from CP using a DASD.

Comment 15 Fedora Blocker Bugs Application 2017-10-16 12:34:30 UTC
Proposed as a Freeze Exception for 27-final by Fedora user sharkcz using the blocker tracking app because:

 The installer doesn't boot on an s390x system when glibc with lock elision is used.

Comment 16 Dan Horák 2017-10-16 13:23:26 UTC
(In reply to Carlos O'Donell from comment #11)
> (In reply to Carlos O'Donell from comment #10)
> > Scratch build passes and final libpthread.so.0 has no tbegin/tend.
> > 
> > Final F27 build here:
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=22435457
> 
> Dan,
> 
> Can you please check these builds and see if they work and get back to me
> quickly? The sooner I hear back the faster I'll put this into a Bodhi update
> for F27.

Thanks for the update. The installer image boots with your glibc build; karma will follow.

Comment 17 Fedora Update System 2017-10-17 02:23:57 UTC
glibc-2.26-14.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0d3fdd3d1f

Comment 18 Adam Miller 2017-10-17 17:39:13 UTC
+1 FE

Comment 19 Mohan Boddu 2017-10-17 17:40:43 UTC
+1 FE

Comment 20 Fedora Update System 2017-10-21 16:14:30 UTC
glibc-2.26-15.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0d3fdd3d1f

Comment 21 Fedora Update System 2017-10-21 19:25:07 UTC
glibc-2.26-15.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0d3fdd3d1f

Comment 22 Geoffrey Marr 2017-10-23 21:49:17 UTC
Discussed during the 2017-10-23 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException was made as this breaks install boot on a non-blocking arch and can't be fixed via an update.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-10-23/f27-blocker-review.2017-10-23-16.00.txt

Comment 23 Fedora Update System 2017-10-24 20:08:29 UTC
glibc-2.26-15.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Adam Williamson 2017-10-26 22:30:44 UTC
Went stable, closing bug. Please re-open if anything somehow still needs doing here.

