Bug 1691430 - dnf.exceptions.Error: Incorrect or unknown "arch": armv7hcnl
Summary: dnf.exceptions.Error: Incorrect or unknown "arch": armv7hcnl
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 32
Hardware: aarch64
OS: Linux
urgent
unspecified
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2019-03-21 15:15 UTC by Paul Whalen
Modified: 2020-02-14 07:29 UTC (History)

Fixed In Version: libdnf-0.35.3-6.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-24 17:09:49 UTC


Attachments (Terms of Use)
anaconda log (5.23 KB, text/plain)
2019-03-21 15:21 UTC, Paul Whalen

Description Paul Whalen 2019-03-21 15:15:11 UTC
Description of problem:
Armv7 virt install on aarch64 fails with:

Traceback (most recent call last):
  File "/sbin/anaconda", line 615, in <module>
    display.setup_display(anaconda, opts)
  File "/usr/lib/python3.7/site-packages/pyanaconda/display.py", line 346, in setup_display
    anaconda.initInterface()
  File "/usr/lib/python3.7/site-packages/pyanaconda/anaconda.py", line 288, in initInterface
    self._intf = TextUserInterface(self.storage, self.payload)
  File "/usr/lib/python3.7/site-packages/pyanaconda/anaconda.py", line 100, in payload
    self._payload = klass(self.ksdata)
  File "/usr/lib/python3.7/site-packages/pyanaconda/payload/dnfpayload.py", line 304, in __init__
    self._configure()
  File "/usr/lib/python3.7/site-packages/pyanaconda/payload/dnfpayload.py", line 660, in _configure
    self._base = dnf.Base()
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 93, in __init__
    self._conf = conf or self._setup_default_conf()
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 152, in _setup_default_conf
    conf = dnf.conf.Conf()
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 213, in __init__
    self.arch = hawkey.detect_arch()
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 80, in __setattr__


Version-Release number of selected component (if applicable):
dnf-4.2.1-1.fc30.noarch


How reproducible:
Every time

Comment 1 Paul Whalen 2019-03-21 15:21:42 UTC
Created attachment 1546574 [details]
anaconda log

Comment 2 Paul Whalen 2019-03-21 15:44:34 UTC
This is happening on an F30 aarch64 host when attempting to install an F30 armv7 guest. An F29 aarch64 host with an F30 armv7 guest works fine.

Comment 3 Jeremy Linton 2019-04-23 21:25:20 UTC
This problem seems to have worked its way into the F29 repos too. I just `dnf upgraded` an F29 armv7 guest, and now it's doing this too.

dnf is 4.2.2-2.fc29.

Worse yet, it seems --forcearch armv7hl doesn't work around the problem.

Comment 4 Jeremy Linton 2019-04-23 21:35:36 UTC
I just hacked up the _BASEARCH_MAP mapping in dnf/rpm/__init__.py, and now I can forcearch it.

Comment 5 Paul Whalen 2019-05-06 15:32:57 UTC
This seems to affect Seattle hardware, not reproducible on a Mustang.

Comment 6 David Rheinsberg 2019-05-28 07:30:15 UTC
This makes `dnf` refuse operation on all my `armv7hl` machines. This affects both F29 and F30.

I investigated and I assume this is triggered by a change in `libdnf` which generalized the architecture detection on ARM. It now produces `armv7hcnl` for my machines, because it detects NEON and AES support. Before, it would return `armv7hl`.

I opened a PR against `dnf` to properly detect `armv7hcnl` as architecture:

    https://github.com/rpm-software-management/dnf/pull/1404

As a workaround I use the following command to patch dnf on a running system:

    sed -i "s/'armv7hnl', 'armv8hl'/'armv7hnl', 'armv7hcnl', 'armv8hl'/" /usr/lib/python3.7/site-packages/dnf/rpm/__init__.py

(It is safe to run this multiple times. It will only have an effect the first time it is run.)
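
To illustrate why the one-line patch above helps, here is a minimal sketch of an inverted arch-to-basearch lookup. The table is a heavily trimmed stand-in, not the real contents of dnf/rpm/__init__.py: only the neighbouring `'armv7hnl', 'armv8hl'` entries are taken from the sed expression, and the names `ARCHES_BY_BASEARCH`, `invert`, `basearch`, and `PATCHED` are assumptions made for illustration.

```python
# Hypothetical, trimmed stand-in for an arch -> basearch table.
# Only the adjacent 'armv7hnl', 'armv8hl' entries come from the sed
# expression above; everything else here is illustrative.
ARCHES_BY_BASEARCH = {
    "aarch64": ("aarch64",),
    "armhfp": ("armv7hl", "armv7hnl", "armv8hl"),
}


def invert(table):
    """Turn {basearch: (arch, ...)} into {arch: basearch}."""
    return {arch: base for base, arches in table.items() for arch in arches}


def basearch(arch, table=ARCHES_BY_BASEARCH):
    """Look up the basearch for a detected arch; unknown arches raise."""
    try:
        return invert(table)[arch]
    except KeyError:
        raise ValueError('Incorrect or unknown "arch": %s' % arch) from None


# Same effect as the sed edit: list 'armv7hcnl' right after 'armv7hnl'.
PATCHED = dict(ARCHES_BY_BASEARCH)
PATCHED["armhfp"] = ("armv7hl", "armv7hnl", "armv7hcnl", "armv8hl")
```

With the unpatched table, `basearch("armv7hcnl")` raises; with `PATCHED` it resolves like the other hard-float ARM arches. This only covers the lookup step, not the package compatibility check.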

Comment 7 Yaakov Selkowitz 2019-05-30 23:56:52 UTC
That doesn't seem to be sufficient.  On F29 and F30 VMs hosted on a Seattle, editing dnf/rpm/__init__.py is enough to get metadata to download, but then trying to install or update anything fails because "package ____.armv7hl does not have a compatible architecture".

Comment 8 David Rheinsberg 2019-05-31 04:58:13 UTC
(In reply to Yaakov Selkowitz from comment #7)
> That doesn't seem to be sufficient.  On F29 and F30 VMs hosted on a Seattle,
> editing dnf/rpm/__init__.py is enough to get metadata to download, but then
> trying to install or update anything fails because "package ____.armv7hl
> does not have a compatible architecture".

Correct. You still need `--forcearch=armv7hl`. With that, everything works fine for me. If someone figures out how to fix that properly, please go ahead ;)

Comment 9 Neal Gompa 2019-06-04 10:57:09 UTC
(In reply to David Rheinsberg from comment #8)
> (In reply to Yaakov Selkowitz from comment #7)
> > That doesn't seem to be sufficient.  On F29 and F30 VMs hosted on a Seattle,
> > editing dnf/rpm/__init__.py is enough to get metadata to download, but then
> > trying to install or update anything fails because "package ____.armv7hl
> > does not have a compatible architecture".
> 
> Correct. You still need `--forcearch=armv7hl`. With that, everything works
> fine for me. If someone figures out how to fix that properly, please go
> ahead ;)

Fixing that requires rpm to declare 32-bit arm arches to be compatible with aarch64 in the same way that 32-bit x86 is compatible with x86_64.

Comment 10 Peter Robinson 2019-06-04 12:48:03 UTC
(In reply to Neal Gompa from comment #9)
> (In reply to David Rheinsberg from comment #8)
> > (In reply to Yaakov Selkowitz from comment #7)
> > > That doesn't seem to be sufficient.  On F29 and F30 VMs hosted on a Seattle,
> > > editing dnf/rpm/__init__.py is enough to get metadata to download, but then
> > > trying to install or update anything fails because "package ____.armv7hl
> > > does not have a compatible architecture".
> > 
> > Correct. You still need `--forcearch=armv7hl`. With that, everything works
> > fine for me. If someone figures out how to fix that properly, please go
> > ahead ;)
> 
> Fixing that requires rpm to declare 32-bit arm arches to be compatible with
> aarch64 in the same way that 32-bit x86 is compatible with x86_64.

Why? They're not compatible. And as a result they're reported completely differently. The 64-bit variant is reported as aarch64, whereas the 32-bit variant is reported as armv7l, armv8l, etc.

Comment 11 Neal Gompa 2019-06-04 12:55:15 UTC
(In reply to Peter Robinson from comment #10)
> (In reply to Neal Gompa from comment #9)
> > (In reply to David Rheinsberg from comment #8)
> > > (In reply to Yaakov Selkowitz from comment #7)
> > > > That doesn't seem to be sufficient.  On F29 and F30 VMs hosted on a Seattle,
> > > > editing dnf/rpm/__init__.py is enough to get metadata to download, but then
> > > > trying to install or update anything fails because "package ____.armv7hl
> > > > does not have a compatible architecture".
> > > 
> > > Correct. You still need `--forcearch=armv7hl`. With that, everything works
> > > fine for me. If someone figures out how to fix that properly, please go
> > > ahead ;)
> > 
> > Fixing that requires rpm to declare 32-bit arm arches to be compatible with
> > aarch64 in the same way that 32-bit x86 is compatible with x86_64.
> 
> Why? They're not compatible. And they're as a result reported completely
> differently. The 64 bit variant is reported as aarch64, where as the 32 bit
> variant is reported as armv7l armv8l etc

It's not that black-and-white. There are AArch64 systems that do support running 32-bit ARM code (the ARM builders we use in Mageia are such systems). OpenMandriva and openSUSE have similar builders in place too, so that they can use more performant hardware to build for 32-bit ARM.

Unfortunately, RPM is not able to determine this compatibility at runtime, so mock with --forcearch is used in such cases so that we can do builds for 32-bit ARM on AArch64.

Comment 12 Peter Robinson 2019-06-04 18:37:44 UTC
> > Why? They're not compatible. And they're as a result reported completely
> > differently. The 64 bit variant is reported as aarch64, where as the 32 bit
> > variant is reported as armv7l armv8l etc
> 
> It's not that black-and-white. There are AArch64 systems that do support
> running 32-bit ARM code (the ARM builders we use in Mageia are such

It is actually very black and white but you're conflating two completely different issues.

The actual problem:
The architecture ISAs, unlike x86 and x86_64, are incompatible. More on that below.

Your answer:
Some vendor SoCs ship both ISAs in the same piece of silicon to enable application compatibility by being able to run both ISAs side by side giving an appearance of compatibility when there isn't.

The x86 -> x86_64 ISAs are compatible, the latter initially being purely 64-bit instructions added to the x86 instructions. An instruction superset, if you will.

The "arm" and "aarch64" ISAs are not, and they're certainly not a sub/superset like x86. There are components around the ISA that are compatible or mostly compatible, such as FPV and SIMD, but the core ISAs are incompatible, with things like registers being widely different (that's why the kernel has two separate arm/arm64 directories, unlike, say, powerpc or x86, where they're combined).

The silicon, whether it be the Cortex-Axx references from Arm, or third party designs from Arm licensees can choose which components they put in the silicon because it adds cost and power and other such issues.

There's "ARMv8" silicon that has purely aarch64 ISA components (e.g. Marvell ThunderX2), both arm and aarch64 ISAs (Cortex-A57), or just arm (Cortex-A32).

> Unfortunately, RPM is not able to determine this compatibility at runtime,
> so mock with --forcearch is used in such cases so that we can do builds for
> 32-bit ARM on AArch64.

Which is why it should only attempt this if the device reports armv7l or armv8l from uname -a; if it reports aarch64, it should be assumed to be aarch64 and hence incompatible.

This is actually going to get even worse in newer upcoming chips where they can possibly run EL0 as arm (32-bit userspace) but not EL1 (32-bit kernel).

This has clearly regressed due to trying to make assumptions around the two architectures that are naive at best or just wrong.
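
The rule described above (trust what uname reports, and never treat aarch64 as 32-bit ARM) could be sketched roughly as follows. This is an illustration only, not code from rpm or libdnf; `ARM32_MACHINES` and `arm_family` are hypothetical names, and `platform.machine()` corresponds to the machine field of uname.

```python
import platform

# Machine strings named in the comment above as 32-bit ARM reports.
ARM32_MACHINES = ("armv7l", "armv8l")


def arm_family(machine=None):
    """Classify a uname machine string; defaults to the running kernel."""
    machine = machine or platform.machine()
    if machine == "aarch64":
        # Reported as aarch64: assume aarch64, never 32-bit compatible.
        return "aarch64"
    if machine in ARM32_MACHINES:
        return "arm32"
    return "other"
```

A caller would only go on to probe 32-bit ARM feature flags when `arm_family()` returns `"arm32"`.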

Comment 13 Gerd Hoffmann 2019-06-05 06:50:57 UTC
> Fixing that requires rpm to declare 32-bit arm arches to be compatible with
> aarch64 in the same way that 32-bit x86 is compatible with x86_64.

Hmm?  I don't think so.  rpm (or dnf?) needs to learn that armv7hcnl is a
superset of armv7hl and thus armv7hl rpms will work just fine on armv7hcnl
machines.

comment 6 explains this.

To compare with x86:  It's like dnf/rpm knowing that i386 rpms will work just
fine on i686 machines because i686 is a i386 superset.
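
A minimal sketch of the superset idea, assuming a simple parent chain between arches. The table below is hypothetical and simplified (the x86 chain skips intermediate arches); it is not the data rpm or libsolv actually ship, and `COMPAT_PARENT`, `compatible_arches`, and `can_install` are illustrative names.

```python
# Hypothetical compatibility table: each arch points at the next, less
# specific arch whose packages it can also install. Simplified on purpose.
COMPAT_PARENT = {
    "armv7hcnl": "armv7hnl",
    "armv7hnl": "armv7hl",
    "armv7hl": "noarch",
    "i686": "i386",
    "i386": "noarch",
}


def compatible_arches(arch):
    """Return arch followed by every arch it inherits compatibility from."""
    chain = [arch]
    while chain[-1] in COMPAT_PARENT:
        chain.append(COMPAT_PARENT[chain[-1]])
    return chain


def can_install(package_arch, machine_arch):
    """True if a package built for package_arch fits machine_arch."""
    return package_arch in compatible_arches(machine_arch)
```

Under this table an armv7hl package is installable on an armv7hcnl machine, but not the other way round, which mirrors the i386-on-i686 comparison.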

Comment 14 Panu Matilainen 2019-06-05 08:41:40 UTC
> Which is why it should only attempt this if the device reports armv7l or armv8l from uname -a, if it reports aarch64 it
> should be assumed it's aarch64 and hence incompatible.
>
> This is actually going to get even worse in newer upcoming chips where they can possibly run EL0 as arm (32-bit userspace)
> but not EL1 (32-bit kernel).

They really should've named it "aargh"...

Comment 15 Jaroslav Rohel 2019-06-20 11:09:10 UTC
The reported bug "dnf.exceptions.Error: Incorrect or unknown "arch": armv7hcnl" was fixed.
More info is in comment 6; PR https://github.com/rpm-software-management/dnf/pull/1404 was merged.

I will close the bug. If there is another problem, please open a new bug report.

Comment 16 Fedora Update System 2019-07-04 13:50:21 UTC
FEDORA-2019-58c2d3f1aa has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-58c2d3f1aa

Comment 17 Fedora Update System 2019-07-05 00:45:55 UTC
dnf-4.2.7-1.fc30, libdnf-0.35.1-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-58c2d3f1aa

Comment 18 Paul Whalen 2019-07-19 17:16:53 UTC
This still fails, now with incompatible arch errors:


 Problem 1: conflicting requests
  - package kernel-lpae-5.3.0-0.rc0.git3.1.fc31.armv7hl does not have a compatible architecture
  - nothing provides kernel-lpae-core-uname-r = 5.3.0-0.rc0.git3.1.fc31.armv7hl+lpae needed by kernel-lpae-5.3.0-0.rc0.git3.1.fc31.armv7hl
  - nothing provides kernel-lpae-modules-uname-r = 5.3.0-0.rc0.git3.1.fc31.armv7hl+lpae needed by kernel-lpae-5.3.0-0.rc0.git3.1.fc31.armv7hl
 Problem 2: conflicting requests
  - package grubby-deprecated-8.40-34.fc31.armv7hl does not have a compatible architecture

Comment 19 Peter Robinson 2019-07-19 18:05:23 UTC
pkratoch: all these arm changes need to be reverted, they're incorrect and breaking things. Or at the very least made configurable so the distros that want incorrect implementations can opt into them, we in Fedora (I'm speaking as both the Fedora Arm lead and as part of the actual Arm community) do not want this and it's causing us significant support load so please revert all the Arm changes and engage with the Arm community in Fedora on subsequent changes.

Comment 20 Pavla Kratochvilova 2019-07-22 07:16:11 UTC
Peter, I can only see PR https://github.com/rpm-software-management/dnf/pull/1404 associated with this bug. Is it sufficient to revert only this PR or are there more Arm changes you wish to revert?

Comment 21 Peter Robinson 2019-07-22 11:07:15 UTC
There were at least two changes: the first was an incorrect "enhancement" to running ARMv7 on aarch64, which is when we started seeing the errors in this bug report, and then there was the fix to this bug. All should go.

Comment 22 Pavla Kratochvilova 2019-07-22 11:59:23 UTC
Do you know, by any chance, which commit caused this? And if not, is it ok, if for now I make a Fedora 30 update with only the second patch reverted and revert the first one after I can discuss it with Jaroslav Rohel? If I understand this correctly, the first patch is in F30 now, the second is not, so such an update would not change anything.

Comment 23 Peter Robinson 2019-07-22 14:12:48 UTC
No, I don't, but if you just revert the second it'll still be broken.

Comment 24 Fedora Update System 2019-07-23 07:21:17 UTC
FEDORA-2019-672a74d688 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

Comment 25 Pavla Kratochvilova 2019-07-23 07:25:25 UTC
This bug, of course, shouldn't have been added to the new update to Fedora 30. Sorry about that, I removed it now. Also, I am moving this back to NEW.

Comment 26 Jun Aruga 2019-08-04 00:12:25 UTC
I faced this issue when running Fedora ARM 32 container image "arm32v7/fedora": https://hub.docker.com/r/arm32v7/fedora/ with QEMU on Travis CI.

https://travis-ci.org/junaruga/fedora-workshop-multiarch/jobs/567414034#L994
> Error: Incorrect or unknown "arch": armv7hcnl

Can we add a unit test to check this, by adding a 32-bit ARM case to the dnf upstream project's CI or by adding a %check section to dnf.spec?

Comment 27 Jun Aruga 2019-08-05 09:10:54 UTC
> https://travis-ci.org/junaruga/fedora-workshop-multiarch/jobs/567414034#L994
> > Error: Incorrect or unknown "arch": armv7hcnl

I created a reproducer that you can run locally.

Prepare the Dockerfile below.

```
$ cat Dockerfile 
FROM arm32v7/fedora

RUN uname -m
RUN rpm -q rpm --qf "%{arch}\n"
RUN rpm -q dnf

RUN ARCH=$(rpm -q rpm --qf "%{arch}") && \
  dnf -y --forcearch "${ARCH}" upgrade && \
  dnf -y --forcearch "${ARCH}" install gcc
```

Install the qemu-user-static RPM if you have not installed it yet. [1]

```
$ sudo dnf install qemu-user-static
```

Then you will see the /proc/sys/fs/binfmt_misc/qemu-$arch files installed by the RPM on your local machine.
You can now run containers for other architectures locally. My environment is x86_64.

```
$ ls /proc/sys/fs/binfmt_misc/qemu-*

$ uname -m
x86_64

$ podman run --rm -t arm32v7/fedora uname -m
armv7l
```

Then build the Dockerfile above like this to see the error message.

```
$ podman build --rm -t my-fedora-armv7hl .
...
Installed:
  gnupg2-smime-2.2.13-1.fc29.armv7hl     grubby-8.40-18.fc29.armv7hl           
  libxkbcommon-0.8.2-1.fc29.armv7hl      pinentry-1.1.0-4.fc29.armv7hl         
  trousers-0.3.13-11.fc29.armv7hl        libsecret-0.18.7-1.fc29.armv7hl       
  xkeyboard-config-2.24-5.fc29.noarch    trousers-lib-0.3.13-11.fc29.armv7hl   

Complete!
Error: Incorrect or unknown "arch": armv7hcnl
Error: error building at STEP "RUN ARCH=$(rpm -q rpm --qf "%{arch}") &&   dnf -y --forcearch "${ARCH}" upgrade &&   dnf -y --forcearch "${ARCH}" install   gcc": error while running runtime: exit status 1
```

The container image has the following environment.

```
$ uname -m
armv7l

$ rpm -q rpm --qf "%{arch}\n"
armv7hl

$ rpm -q dnf
dnf-4.0.9-2.fc29.noarch
```

* [1] The qemu-user-static RPM installs /proc/sys/fs/binfmt_misc/qemu-* files.
  The files are not removed even when the RPM is removed by "dnf remove qemu-user-static".
  The files are not harmful, but if you want to remove them, you can run the command below at your own risk.

  ```
  # find /proc/sys/fs/binfmt_misc -type f -name 'qemu-*' -exec sh -c 'echo -1 > {}' \;
  ```

  I reported the issue to the qemu project.
  qemu-user-static: qemu-user-static works even after "dnf remove qemu-user-static"
  https://bugzilla.redhat.com/show_bug.cgi?id=1732178

Comment 28 Jaroslav Rohel 2019-08-07 14:15:30 UTC
Comment 21

>There was at least two changes, the first one where we started to get these errors, which are an incorrect "enhancement" to running ARMv7 on aarch64,

Do you mean the commit "Improve ARM detection" ( https://github.com/rpm-software-management/libdnf/pull/442 )? The commit unifies detection. It is a good idea.
And the same algorithm was added to RPM: https://github.com/rpm-software-management/rpm/commit/8c3a7b8fa92b49a811fe36b60857b12f5d7db8a8 .

I'm not sure whether armv7 with crypto instructions physically exists. Maybe it is a problem only in "qemu" virtualization -> emulation of a physically nonexistent CPU.
I created a utility with the same detection algorithm and tried it with "qemu-arm" hosted on x86-64. The problem occurs only if "qemu-arm" is started without a CPU definition or with the definition "any" or "max" (max in this case means all features).

#qemu-arm -cpu cortex-a7 /home/containers/fedora30_arm/arch_detect
Result 'armv7hnl'

#qemu-arm -cpu cortex-r5f /home/containers/fedora30_arm/arch_detect
Result 'armv7hl'

#qemu-arm -cpu any /home/containers/fedora30_arm/arch_detect
Result 'armv7hcnl'

#qemu-arm -cpu max /home/containers/fedora30_arm/arch_detect
Result 'armv7hcnl'

#qemu-arm /home/containers/fedora30_arm/arch_detect
Result 'armv7hcnl'

Anyway, from my point of view "armv7hcnl" is "armv7hnl" with the crypto instruction set added. Isn't it?
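
For readers following along, the way such an arch string is put together can be sketched as below. This is not the `arch_detect` utility or the libdnf code; it is a hypothetical helper (`compose_arm_arch`) that takes already-detected features as booleans and uses the letter meanings given in this thread (h = hardware FPU, c = crypto, n = NEON, trailing l = little endian).

```python
def compose_arm_arch(version, has_fpu, has_crypto, has_neon, endian="l"):
    """Build an rpm-style ARM arch string from already-detected features.

    Letters are appended in the order h (FPU), c (crypto), n (NEON),
    followed by the endianness letter.
    """
    suffix = ""
    if has_fpu:
        suffix += "h"
    if has_crypto:
        suffix += "c"
    if has_neon:
        suffix += "n"
    return "armv%d%s%s" % (version, suffix, endian)


# FPU only, FPU + NEON, and FPU + crypto + NEON on an ARMv7 CPU:
print(compose_arm_arch(7, True, False, False))  # armv7hl
print(compose_arm_arch(7, True, False, True))   # armv7hnl
print(compose_arm_arch(7, True, True, True))    # armv7hcnl
```

The third call shows how a CPU model that advertises every feature ends up with the unexpected "armv7hcnl" name.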

Comment 29 Jaroslav Rohel 2019-08-08 10:13:01 UTC
I suggest adding the new "armv7hcnl" architecture (as a superset of "armv7hnl") to "rpm" and "libsolv". What do you think?

Maybe "armv7hcnl" is nonsense, and "qemu-arm" or "uname" wrongly reports armv7 with crypto instead of armv8.

Summary of the situation:
Some time ago the "Improve ARM detection" patches were merged into "libdnf" (https://github.com/rpm-software-management/libdnf/pull/442) and into "rpm" (https://github.com/rpm-software-management/rpm/commit/8c3a7b8fa92b49a811fe36b60857b12f5d7db8a8).
The patches unify detection of all armv* architecture subtypes and add support for detecting crypto extensions.
I don't know whether ARMv7 with the crypto extension physically exists, but "qemu-arm" with the default (unspecified) CPU is now detected (according to my tests in comment #28) as "armv7hcnl" (FPU, crypto, NEON, little endian).
The "armv7hcnl" architecture was not supported in libdnf. A patch was added to "dnf" (https://github.com/rpm-software-management/dnf/pull/1404/files) which adds the "armv7hcnl" architecture and assumes that "armv7hcnl" is a superset of "armv7hnl".
But similar architecture tables exist in more places. I found one in the libsolv project ("libsolv/src/poolarch.c"). I also found "arch_canon" in the rpm project ("rpm/rpm.c"). This is a problem for DNF. There is probably a workaround using "--forcearch armv7hnl".

Comment 30 Peter Robinson 2019-08-08 12:43:31 UTC
(In reply to Jaroslav Rohel from comment #29)
> I suggest to add the new "armv7hcnl" (as superset of "armv7hnl")
> architecture into the "rpm" and "libsolv". What do you think about it?
> 
> Maybe the "armv7hcnl" is nonsense. And the "qemu-arm" or the "uname" reports
> wrongly armv7 with crypto instead armv8.

No, please don't add this at all; the "c" component is actually garbage in this context. It's not an extension that is available on ARMv7; see some of the explanation in comment 12.

Comment 31 Jaroslav Rohel 2019-08-09 07:14:36 UTC
Reply to Peter Robinson comment #30

DNF on "qemu-arm" is (in some configurations) broken now. We want to fix it.
I see 2 solutions:
1. Add the new "armv7hcnl" architecture (as a superset of "armv7hnl").
2. Change the detection algorithm to detect the crypto extension only if the ARM version is >= 8. This means "armv7hnl" will be detected instead of "armv7hcnl".

I considered both solutions. I'm going to do the second one -> crypto will be detected only if the ARM version is >= 8.
OK?

Comment 32 Jaroslav Rohel 2019-08-09 08:20:09 UTC
PR https://github.com/rpm-software-management/libdnf/pull/771
The crypto extension is detected only on arm version >= 8.
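
The rule stated here can be sketched in a few lines. This is an illustration of the gating idea only, not the code from the pull request; `effective_crypto` and `arch_name` are hypothetical helpers that take already-detected features as arguments and assume a little-endian system.

```python
def effective_crypto(arm_version, crypto_reported):
    """Honour a reported crypto extension only on ARM version 8 or later."""
    return crypto_reported and arm_version >= 8


def arch_name(arm_version, fpu, crypto_reported, neon):
    """Compose a little-endian ARM arch name with the crypto flag gated."""
    flags = "h" if fpu else ""
    flags += "c" if effective_crypto(arm_version, crypto_reported) else ""
    flags += "n" if neon else ""
    return "armv%d%sl" % (arm_version, flags)
```

With this gate, an ARMv7 guest that reports crypto support is named "armv7hnl" rather than "armv7hcnl", while an ARMv8 one still gets the "c" flag.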

Comment 33 Peter Robinson 2019-08-13 16:42:41 UTC
> DNF on "qemu-arm" is (in some configuration) broken now. We want to fix it.
> I see 2 solutions:
> 1. adding the new "armv7hcnl" (as superset of "armv7hnl")
> 2. change of the detection algorithm to detect crypto extension only if arm
> version >= 8. This means "armv7hnl" will be detected instead "armv7hcnl".
> 
> I considered both solutions. I'm going to do second one -> crypto will be
> detected only if arm version >= 8.
> OK?

3. Revert all the changes made around this so it's not special cased at all.

This specific feature is not something that should be optimised for at compile time, especially on ARMv7. It should be detected at run time. Please don't do this.

Comment 34 Paul Whalen 2019-08-23 19:36:46 UTC
This is still broken. Is there a plan to move it forward?

Comment 35 Kevin Fenzi 2019-10-07 17:27:49 UTC
I'm hitting this trying to install armv7 builders for fedora. :( Pretty please a fix soon would be most welcome.

Comment 36 Peter Robinson 2019-10-16 15:03:17 UTC
So I'm just testing a possible fix to this.

Comment 38 Fedora Blocker Bugs Application 2019-10-16 18:48:24 UTC
Proposed as a Blocker for 31-final by Fedora user pbrobinson using the blocker tracking app because:

 There's been a regression on Arm architectures which is now affecting clean installs on ARMv7 on certain newer hardware, so this can't be fixed post-release with an update. There are fixes for both armv7 and aarch64.

Not sure what the best blocker is here but these are possible:

Install problems on virt environments including the next generation builder hardware:
https://fedoraproject.org/wiki/Fedora_31_Beta_Release_Criteria#self-hosting-virtualization
The release must be able to host virtual guest instances of the same release

https://fedoraproject.org/wiki/Fedora_31_Beta_Release_Criteria#Kickstart_delivery
Install via kickstart

Comment 39 Adam Williamson 2019-10-16 18:58:45 UTC
So this is a gigantic thread and not particularly clear exactly what the problem is at a glance. Can we get a simple outline of what the problem is and exactly when and how it would be encountered? thanks.

Comment 40 Kevin Fenzi 2019-10-16 19:02:40 UTC
A virt-install of an armv7 guest on an aarch64 host fails to install. Or, if you installed F29 with no updates (before the changes that caused this bug) and then tried to update, it would work the first time (using the old dnf) and then be broken forever.

I do not know how widespread the aarch64 hardware is, but I know first-hand that it affects Lenovo Ampere eMAG hardware (a popular enterprise aarch64 solution), because we have a bunch of them here to replace our existing hardware with.

Comment 41 Peter Robinson 2019-10-16 19:35:56 UTC
(In reply to Adam Williamson from comment #39)
> So this is a gigantic thread and not particularly clear exactly what the
> problem is at a glance. Can we get a simple outline of what the problem is
> and exactly when and how it would be encountered? thanks.

It would be encountered primarily in VMs and in containers, but could also possibly be encountered on RPi4 (when we actually support it) and other newer HW types. Basically, some "features" were added to the dnf/rpm stack which are in fact bugs; no one from the Fedora or Red Hat Arm teams was engaged on the PRs to verify their correctness.

I initially thought the issue was isolated to the Seattle platform, which is not widely available, but it eventually became clear it was a much wider problem. I had outlined the solution (to revert the changes) and assumed it had been dealt with, but infra-eng escalated it to me this week due to failures installing build VMs on the new Arm build HW, so I prioritised it. So it is primarily armv7, but there were also some aarch64 issues in the overall changes that affect aarch64 on F-31+.

Comment 42 Gerd Hoffmann 2019-10-17 06:42:21 UTC
It seems to be fixed in F30 (my armv7 VM @ seattle updates fine again without any trickery; see comment 6 and comment 8).

Comment 43 Peter Robinson 2019-10-17 07:28:43 UTC
(In reply to Gerd Hoffmann from comment #42)
> It seems to be fixed in f30 (my armv7 vm @ seattle updates fine again
> without any trickery (see comment 6 and comment 8).

We're still getting reports of issues with F-30 so that's interesting.

Comment 44 Gerd Hoffmann 2019-10-17 08:41:05 UTC
(In reply to Peter Robinson from comment #43)
> (In reply to Gerd Hoffmann from comment #42)
> > It seems to be fixed in f30 (my armv7 vm @ seattle updates fine again
> > without any trickery (see comment 6 and comment 8).
> 
> We're still getting reports of issues with F-30 so that's interesting.

Maybe because you don't get the machine out of the broken state without manual intervention?

My versions:
  kraxel@arm-b32 ~# rpm -q python3-hawkey python3-dnf
  python3-hawkey-0.35.5-2.fc30.armv7hl
  python3-dnf-4.2.11-2.fc30.noarch

Looking at /usr/lib/python3.7/site-packages/dnf/rpm/__init__.py I see armv7hcnl being included there,
but it's not from manual patching (comment 6), it's the packaged version according to "rpm --verify".

Comment 45 Daniel Mach 2019-10-17 11:52:58 UTC
(In reply to Peter Robinson from comment #37)
> Pull requests filed upstream for this issue:
> 
> https://github.com/rpm-software-management/rpm/pull/901
> https://github.com/rpm-software-management/libdnf/pull/818
> https://github.com/rpm-software-management/dnf/pull/1506

We'll review the pull-requests.

I'm a little bit concerned about the timing.
This is proposed as F31 blocker and if it gets approved, it should be fixed ASAP which means little time for testing.
The changes aren't big, but they may impact rpm/dnf stack on Fedora and also other distros.

I'm proposing to make downstream patches fixing critical Fedora issues first.
That should give us enough time to properly discuss and develop everything in upstream.

Could someone summarize the failing scenarios on Fedora and help me understand what the minimal fix would look like?
We definitely don't want to break more things than we fix.

Comment 46 Peter Robinson 2019-10-17 12:57:30 UTC
(In reply to Daniel Mach from comment #45)
> (In reply to Peter Robinson from comment #37)
> > Pull requests filed upstream for this issue:
> > 
> > https://github.com/rpm-software-management/rpm/pull/901
> > https://github.com/rpm-software-management/libdnf/pull/818
> > https://github.com/rpm-software-management/dnf/pull/1506
> 
> We'll review the pull-requests.
> 
> I'm a little bit concerned about the timing.
> This is proposed as F31 blocker and if it gets approved, it should be fixed
> ASAP which means little time for testing.

I asked for the changes to be reverted in comment 19 back in July, but it was ignored; subsequently it was escalated to me this week, when I investigated.

> The changes aren't big, but they may impact rpm/dnf stack on Fedora and also
> other distros.

Yet the Arm people in Fedora weren't consulted on the impact of these on us when they were initially proposed for merge.

> I'm proposing to make downstream patches fixing critical Fedora issues first.

That's fine for me, at least we don't end up with uninstallable and hence unsupportable on key new hardware on primary architectures.

> That should give us enough time to properly discuss and develop everything
> in upstream.

If the architecture changes had been highlighted to the Fedora architecture leads when they were proposed, we wouldn't have got into this situation in the first place.

> Could someone summarize failing scenarios on Fedora and help me understand
> how the minimal fix would look like?
> We definitely don't want to break more things than we fix.

See comments above.

Comment 47 Matthew Miller 2019-10-17 13:12:55 UTC
+1 blocker, especially given comment #40 and the hardware we have.

Comment 48 Peter Robinson 2019-10-17 13:16:01 UTC
> > I'm proposing to make downstream patches fixing critical Fedora issues first.
> 
> That's fine for me, at least we don't end up with uninstallable and hence
> unsupportable on key new hardware on primary architectures.

I can do PRs for the key fixes in src.fp.o if that makes it easier.

Comment 49 Adam Williamson 2019-10-17 16:04:42 UTC
I think so, yeah. I'm willing to take more or less on trust that this is a blocker if you and Kevin say so, but please do provide the minimal fixes that need to go in the frozen release, and defer anything else to post-release updates. Thanks!

Comment 50 Adam Williamson 2019-10-17 19:54:08 UTC
Discussed at 2019-10-17 Fedora 31 go/no-go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2019-10-17/f31-final-go_no_go-meeting.2019-10-17-17.00.html . Accepted as a blocker under Beta criterion "The release must be able to host virtual guest instances of the same release" applied to affected ARM platforms.

Comment 51 Panu Matilainen 2019-10-18 10:54:41 UTC
Just for the record, I've no objections whatsoever to revert any ARM-related changes in Fedora rpm that *need* reverting to resolve this blocker swiftly. Once that is out of the way, we can let upstream discussions run their course.

Comment 52 Panu Matilainen 2019-10-18 12:02:30 UTC
Currently building rpm for rawhide + f31 with the problematic armv8 variants removed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=38364945
https://koji.fedoraproject.org/koji/taskinfo?taskID=38364959

Is that sufficient to close this, or are dnf-side updates needed too?

Comment 53 Peter Robinson 2019-10-18 12:22:48 UTC
Both libdnf and dnf changes are needed too.

Comment 54 Fedora Update System 2019-10-18 12:34:53 UTC
FEDORA-2019-801cba6c72 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-801cba6c72

Comment 55 Panu Matilainen 2019-10-18 12:36:54 UTC
(In reply to Peter Robinson from comment #53)
> Both libdnf and dnf changes are needed too.

Ack. Rpm should be taken care of (at least the most pressing need) by the update just submitted.

Comment 56 Daniel Mach 2019-10-18 12:39:29 UTC
(In reply to Peter Robinson from comment #53)
> Both libdnf and dnf changes are needed too.

The libdnf update is definitely needed, because the fix changes arch detection in the dnf stack.
I don't think that the dnf code change is needed to fix the blocker, because it's only a mapping of detected arch (libdnf code) to basearch.

I've checked the pull requests and they seem to contain only changes relevant to resolving the blocker.
Since our CI is not working at the moment, I need to manually verify that the code compiles and the tests pass.
Then we'll release a new libdnf build.
I believe that the dnf code change can be delivered any time later.
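
To illustrate the split described in this comment (arch detection on the libdnf side, arch-to-basearch mapping on the dnf side), here is a minimal sketch of such a mapping. It is not dnf's actual code: the table contents, the `basearch` helper name, and the use of `ValueError` are assumptions chosen for illustration. Only the identifiers `armv7hl`, `armv7hnl`, `armv7hcnl`, and `armhfp` come from this bug.

```python
# Hypothetical, deliberately incomplete table: basearch -> detected arches.
BASEARCH_TABLE = {
    "armhfp": ("armv7hl", "armv7hnl"),
    "aarch64": ("aarch64",),
}

# Invert the table once so each detected arch maps to its basearch.
ARCH_TO_BASEARCH = {
    arch: base for base, arches in BASEARCH_TABLE.items() for arch in arches
}


def basearch(detected_arch):
    """Map a detected arch to its basearch; fail loudly on unknown arches."""
    try:
        return ARCH_TO_BASEARCH[detected_arch]
    except KeyError:
        # An identifier missing from the table cannot be mapped at all.
        raise ValueError(
            'Incorrect or unknown "arch": {}'.format(detected_arch)
        ) from None


print(basearch("armv7hnl"))
```

Under these assumptions, a detected arch that is absent from the table (such as `armv7hcnl`) cannot be mapped, which is why the detection fix matters more than the mapping itself.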

Comment 57 Fedora Update System 2019-10-18 17:36:06 UTC
rpm-4.15.0-3.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-801cba6c72

Comment 58 Fedora Update System 2019-10-21 09:17:57 UTC
FEDORA-2019-8d9a447e0d has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-8d9a447e0d

Comment 59 Fedora Update System 2019-10-21 14:35:15 UTC
libdnf-0.35.3-6.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-8d9a447e0d

Comment 60 Fedora Update System 2019-10-21 18:54:20 UTC
rpm-4.15.0-3.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 61 Paul Whalen 2019-10-23 16:38:12 UTC
Testing the F31 RC 1.8 compose, this still fails with:

An unknown error has occurred
  package libreport-filesystem-2.10.1-2.fc31.noarch is intended for a different architecture
  package dnf-data-4.2.9-5.fc31.noarch is intended for a different architecture
  package libX11-common-1.6.8-3.fc31.noarch is intended for a different architecture
  package kbd-misc-2.0.4-14.fc31.noarch is intended for a different architecture

Comment 62 Adam Williamson 2019-10-23 16:46:28 UTC
I checked that RC-1.8 did get the updated rpm (it has 4.15.0-5.fc31) and libdnf (it has 0.35.3-6.fc31). So there must still be more to do somehow?

Comment 63 Fedora Update System 2019-10-23 20:27:49 UTC
FEDORA-2019-fec5c2fbb8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-fec5c2fbb8

Comment 64 Paul Whalen 2019-10-24 02:04:41 UTC
Fixed in rpm-4.15.0-6.fc31 and F31 RC 1.9.

Comment 65 Fedora Update System 2019-10-24 17:09:49 UTC
libdnf-0.35.3-6.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 66 Fedora Update System 2019-10-24 17:09:58 UTC
rpm-4.15.0-6.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 67 Paul Whalen 2019-11-14 16:42:34 UTC
Reopening, this is back in rawhide. Composes are failing with:

package * is intended for a different architecture

Full logs:
https://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20191112.n.0/logs/armhfp/buildinstall-Everything.armhfp.log

Comment 68 Adam Williamson 2019-11-14 18:20:59 UTC
So I think I have a theory here. pbrobinson attempted to add a patch that reverts the upstream change that introduces the possibility of 'armv7hcnl':

https://src.fedoraproject.org/rpms/rpm/blob/master/f/0001-Revert-Improve-ARM-detection.patch

but on master branch, he neglected to actually add the patch to the spec file. (On F31 branch he did). So rpm-4.15.0-6.fc32 still has the problematic code that can result in this 'armv7hcnl' identifier, which I think we don't want (the reversion makes it impossible for that to be produced, AFAICS).

We figured out today that what changed on 2019-11-12 - when this compose failure started happening - is we started running the composes on newer (ampere) boxes rather than older (moonshot) boxes. So my belief here is that, with the problematic new code, our identifier comes out as 'armv7hcnl' on the ampere HW, but not the moonshot HW. There are various bits of code in rpm which are set up to deal with 'armv7hl' as an arch identifier, but which do not handle 'armv7hcnl', so that's probably the issue here. Applying the patch so the new code really gets reverted should mean we get 'armv7hl' on the ampere HW, and hopefully should mean the compose works.

So, I think we need to do a rpm-4.15.0-7.fc32 with the patch actually applied and hopefully that will solve things. I've just fired that build: https://koji.fedoraproject.org/koji/taskinfo?taskID=38998093

Once that's done, we can refire the compose and see what happens.

Comment 69 Adam Williamson 2019-11-14 18:30:46 UTC
Sorry, we'll get 'armv7hnl'. The 'n' is fine; it's the 'c' that other code doesn't expect to show up.
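
The theory in comments 68 and 69 can be sketched as a lookup in an arch compatibility chain. This is a simplified illustration, not rpm's real implementation: the `ARCH_COMPAT` table, its contents, and both helper functions are hypothetical. It only shows why an identifier that no table knows about could cause even noarch packages to be rejected as "intended for a different architecture".

```python
# Hypothetical compatibility table: each arch names the arch it falls back to.
ARCH_COMPAT = {
    "armv7hnl": "armv7hl",
    "armv7hl": "noarch",
}


def compatible_arches(machine_arch):
    """Return the arches installable on machine_arch, best match first."""
    if machine_arch not in ARCH_COMPAT:
        # Unknown identifier: no chain exists, so nothing matches.
        return []
    chain = [machine_arch]
    while chain[-1] in ARCH_COMPAT:
        chain.append(ARCH_COMPAT[chain[-1]])
    return chain


def can_install(package_arch, machine_arch):
    """True if a package built for package_arch fits machine_arch."""
    return package_arch in compatible_arches(machine_arch)


for arch in ("armv7hnl", "armv7hcnl"):
    print(arch, can_install("noarch", arch))
```

In this sketch `armv7hnl` resolves through `armv7hl` down to `noarch`, while `armv7hcnl` has no entry and therefore matches nothing.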

Comment 70 Ben Cotton 2020-02-11 15:43:35 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

