Bug 1420523 - Cloud base image composes fail in current Rawhide
Summary: Cloud base image composes fail in current Rawhide
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: lorax
Version: 26
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Brian Lane
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Duplicates: 1424812
Depends On:
Blocks: F26AlphaBlocker 1420146
Reported: 2017-02-08 21:29 UTC by Adam Williamson
Modified: 2017-03-02 23:44 UTC (History)

Fixed In Version: lorax-26.6-1
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-02 23:44:44 UTC
Type: Bug


Description Adam Williamson 2017-02-08 21:29:19 UTC
The Cloud base images always fail to build in current Rawhide composes, e.g.: https://koji.fedoraproject.org/koji/taskinfo?taskID=17670764

The problem looks to be a size one (see screenshot.ppm): "Not enough space in file systems for the current software selection. An additional 295 MiB is needed."

Dennis says this likely needs a tweak in the Pungi config.

On https://fedoraproject.org/wiki/Releases/26/ReleaseBlocking , these images are still listed as release blocking. Therefore I'm marking this as an automatic blocker: "Bugs which entirely prevent the composition of one or more of the release-blocking images required to be built for a currently-pending (pre-)release" - https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Automatic_blockers . From other discussions though, we may not actually want these images to be release blocking any more; I'll try and start a discussion about that.

Comment 1 Adam Williamson 2017-02-08 21:38:04 UTC
Reported to Pagure pungi-fedora (as requested by Dennis): https://pagure.io/pungi-fedora/issue/130

Comment 2 Adam Williamson 2017-02-12 15:57:38 UTC
A kickstart change has been made which ought to fix this, I believe, but in recent composes, the buildinstall-Cloud task is failing instead:

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20170212.n.0/logs/x86_64/buildinstall-Cloud.x86_64.log

2017-02-12 07:26:37,013: doing post-install configuration
2017-02-12 07:26:37,035: running runtime-postinstall.tmpl
2017-02-12 07:26:37,151: command output:
error: Failed to initialize NSS library

2017-02-12 07:26:37,151: command returned failure (1)
2017-02-12 07:26:37,152: template command error in runtime-postinstall.tmpl:
2017-02-12 07:26:37,152:   runcmd chroot /var/tmp/lorax.sxfb6lkt/installtree /bin/rpm -qa --pipe tee /root/lorax-packages.log
2017-02-12 07:26:37,153:   subprocess.CalledProcessError: Command '['chroot', '/var/tmp/lorax.sxfb6lkt/installtree', '/bin/rpm', '-qa', '--pipe', 'tee /root/lorax-packages.log']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/sbin/lorax", line 273, in <module>
    main()
  File "/usr/sbin/lorax", line 133, in main
    remove_temp=True, verify=opts.verify)
  File "/usr/lib/python3.6/site-packages/pylorax/__init__.py", line 292, in run
    rb.postinstall()
  File "/usr/lib/python3.6/site-packages/pylorax/treebuilder.py", line 145, in postinstall
    self._runner.run("runtime-postinstall.tmpl", configdir=configdir_path)
  File "/usr/lib/python3.6/site-packages/pylorax/ltmpl.py", line 220, in run
    self._run(commands)
  File "/usr/lib/python3.6/site-packages/pylorax/ltmpl.py", line 239, in _run
    f(*args)
  File "/usr/lib/python3.6/site-packages/pylorax/ltmpl.py", line 515, in runcmd
    stdout = runcmd_output(cmd)
  File "/usr/lib/python3.6/site-packages/pylorax/executils.py", line 347, in runcmd_output
    return execWithCapture(cmd[0], cmd[1:], **kwargs)
  File "/usr/lib/python3.6/site-packages/pylorax/executils.py", line 249, in execWithCapture
    reset_handlers=reset_handlers, reset_lang=reset_lang)[1]
  File "/usr/lib/python3.6/site-packages/pylorax/executils.py", line 201, in _run_program
    raise subprocess.CalledProcessError(proc.returncode, argv, output)
subprocess.CalledProcessError: Command '['chroot', '/var/tmp/lorax.sxfb6lkt/installtree', '/bin/rpm', '-qa', '--pipe', 'tee /root/lorax-packages.log']' returned non-zero exit status 1.

Comment 3 Dennis Gilmore 2017-02-20 20:54:03 UTC
changes in the most recent version of nss have broken the ability to compose fedora. as a workaround we have untagged the latest nvr.

Comment 4 Kai Engert (:kaie) (inactive account) 2017-02-20 21:00:11 UTC
Very strange.
Did 3.29 break, but 3.28.x works?

Could you please name the NVRs that work for you?
Thanks

Comment 5 Kai Engert (:kaie) (inactive account) 2017-02-20 21:07:35 UTC
Does this bug occur without lorax, simply by executing "rpm -qa" ?
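
One way to check this outside lorax is to run the same query directly in the install tree. A minimal sketch, assuming root privileges and an existing install tree; the helper names `rpm_query_argv` and `rpm_query_works` are hypothetical and not part of lorax or rpm:

```python
import subprocess


def rpm_query_argv(chroot_dir):
    """Build the command line for running `rpm -qa` inside the given chroot."""
    return ["chroot", chroot_dir, "/bin/rpm", "-qa"]


def rpm_query_works(chroot_dir):
    """Run the query and return (succeeded, stderr text). Needs root for chroot."""
    proc = subprocess.run(
        rpm_query_argv(chroot_dir),
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr
```

If the second element of the result contains "Failed to initialize NSS library", the failure is reproducible without lorax's template runner.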

Comment 6 Kai Engert (:kaie) (inactive account) 2017-02-20 21:25:02 UTC
I'll give you some information about a recent issue with NSS, although I had assumed it's unrelated to RPM.

We had recently discovered that NSS 3.28 had introduced an incompatible change of a binary API. If an application was built against earlier version of NSS, and upgraded to NSS 3.28, the application could be broken.

We reverted the bad change upstream.

Broken upstream versions are:
- 3.28
- 3.28.1
- 3.28.2
- 3.29

Fixed upstream versions are:
- 3.28.3
- 3.29.1

This means that an application built against 3.28, 3.28.1, 3.28.2, or 3.29 and then linked to 3.28.3 or 3.29.1 must be rebuilt to ensure it's back to using the old ABI.

However, based on source inspection, I assumed that the list of affected applications/packages is limited to java-1.8.0-openjdk and nss-pem.

I couldn't find any use of the affected data structures in the RPM sources, so I currently believe that RPM isn't affected by this ABI issue. Am I wrong?
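
The rebuild rule described in this comment can be written as a small check. A minimal sketch, assuming the broken and fixed sets are exactly the versions listed here; `needs_rebuild` is a hypothetical helper, and it only matters for packages that actually use the affected data structures:

```python
# Upstream NSS releases listed in this comment as carrying the incompatible ABI change.
BROKEN_ABI = {"3.28", "3.28.1", "3.28.2", "3.29"}
# Upstream NSS releases listed in this comment as having reverted the change.
FIXED_ABI = {"3.28.3", "3.29.1"}


def needs_rebuild(built_against: str, runs_with: str) -> bool:
    """Return True if a package built against a broken-ABI release is now
    linked to a fixed release and should therefore be rebuilt."""
    return built_against in BROKEN_ABI and runs_with in FIXED_ABI


if __name__ == "__main__":
    print(needs_rebuild("3.28.1", "3.29.1"))
    print(needs_rebuild("3.27", "3.29.1"))
```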

Comment 7 Kamil Dudka 2017-02-21 09:22:51 UTC
(In reply to Dennis Gilmore from comment #3)
> changes in the most recent version of nss have broken the ability to compose
> fedora. as a workaround we have untagged the latest nvr.

I guess this implies I cannot easily rebuild nss-pem against the latest nss any more?

(In reply to Kai Engert (:kaie) from comment #6)
> We had recently discovered that NSS 3.28 had introduced an incompatible
> change of a binary API. If an application was built against earlier version
> of NSS, and upgraded to NSS 3.28, the application could be broken.

Based on which technical data are you assuming that the broken rpm and some ABI change that broke java are the same issues?

Do we have some steps to reproduce this bug locally?

I believe that nss maintainers tried to install the updated nss packages on their staging systems before pushing it to rawhide.  So just installing the latest nss will likely not be sufficient to trigger this bug?

Comment 8 Kai Engert (:kaie) (inactive account) 2017-02-21 10:00:49 UTC
(In reply to Kamil Dudka from comment #7)
> (In reply to Kai Engert (:kaie) from comment #6)
> > We had recently discovered that NSS 3.28 had introduced an incompatible
> > change of a binary API. If an application was built against earlier version
> > of NSS, and upgraded to NSS 3.28, the application could be broken.
> 
> Based on which technical data are you assuming that the broken rpm and some
> ABI change that broke java are the same issues?

I don't know if they are the same issue.

I provided the information in case it's related.


> Do we have some steps to reproduce this bug locally?

That's my question, too.


FYI, I have a local rawhide VM, which was using the NSS 3.29 build, and the system was working for me. The graphical desktop was starting, and I could even run firefox.

I want to help, but I don't know how to trigger the failure with rpm.

Comment 9 Kai Engert (:kaie) (inactive account) 2017-02-21 14:01:03 UTC
Good news. I'm able to reproduce the bug.

I've downloaded a mirror of the latest-rawhide tree from the download server. jkonecny pointed me to the config files that are used by lorax. I've tweaked those files to install only a small subset of the files, because the latest-rawhide tree is missing a lot of files that lorax wants to install, like tmux and gdb (I don't know why).

With nss-*3.28.1*.rpm files, I get past the step "running runtime-postinstall.tmpl" and can run "rpm -qa" fine.

Then I tweaked my local tree. Because changing the repodata files is complicated, I used a symbolic link approach, and simply linked the nss-*3.28.1*.rpm files to newer 3.29.x files.

I was able to reproduce the failure using both 3.29 and 3.29.1.
That means, the issue isn't related to the ABI issue I had mentioned in comment 6, which means "rpm" isn't affected by the ABI issue, which is good news.

So the issue must be a change between 3.28.x and 3.29.x.


Then Daiki Ueno guessed that the issue might be related to 
  https://bugzilla.mozilla.org/show_bug.cgi?id=889116

And indeed, I was able to confirm that's the issue.

NSS now requires that /dev/urandom is present on Linux, but the chroot environment created by lorax doesn't have that.


Can lorax be fixed to set up /dev/urandom?
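
The missing device can be detected before any NSS-using command is run in the tree. A minimal sketch, assuming only that the chroot root directory is known; `has_urandom` is a hypothetical helper, not a lorax function:

```python
import os
import stat


def has_urandom(chroot_dir: str) -> bool:
    """Return True if <chroot_dir>/dev/urandom exists and is a character device."""
    path = os.path.join(chroot_dir, "dev", "urandom")
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or missing /dev directory: no usable device node.
        return False
    return stat.S_ISCHR(mode)


if __name__ == "__main__":
    import sys

    # Pass the install tree path as the first argument; defaults to the host root.
    tree = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(f"{tree}: /dev/urandom present = {has_urandom(tree)}")
```

A regular file named urandom does not count, since NSS needs the real device.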

Comment 10 Kai Engert (:kaie) (inactive account) 2017-02-21 14:38:58 UTC
I suggest mounting /dev inside the chroot environment.

I've submitted a pull request at https://github.com/rhinstaller/lorax/pull/189
That change makes it work in my reduced test case.

Comment 11 Kai Engert (:kaie) (inactive account) 2017-02-21 14:39:41 UTC
I'm reassigning to lorax. Do you agree to apply a fix to lorax?

Comment 12 Kai Engert (:kaie) (inactive account) 2017-02-21 14:53:45 UTC
FWIW, I'd like to ask that once you fix this bug in lorax, you please move back to the newer NSS 3.29.1 packages, because some other rawhide packages already depend on it, and we need to ensure that java-1.8.0-openjdk and nss-pem are rebuilt into a sane state.

Comment 13 Kai Engert (:kaie) (inactive account) 2017-02-21 16:30:37 UTC
Brian said he doesn't want to mount all of /dev
and suggested using mknod to create the device node for /dev/urandom only.
I confirmed that works, too.

mknod -m 444 /dev/random c 1 8
mknod -m 444 /dev/urandom c 1 9
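
For reference, the same two nodes can be created from Python. This is a minimal sketch, not the actual lorax change: `plan_nodes` and `create_nodes` are hypothetical names, the chroot path is an assumption supplied by the caller, and creating the nodes needs root.

```python
import os
import stat

# (name, major, minor), matching the mknod commands above.
DEVICES = [("random", 1, 8), ("urandom", 1, 9)]


def plan_nodes(chroot_dir):
    """Return (path, mode, device) tuples for the nodes to create."""
    plan = []
    for name, major, minor in DEVICES:
        path = os.path.join(chroot_dir, "dev", name)
        mode = stat.S_IFCHR | 0o444  # character device, read-only for everyone
        plan.append((path, mode, os.makedev(major, minor)))
    return plan


def create_nodes(chroot_dir):
    """Create the device nodes inside the chroot; requires root privileges."""
    os.makedirs(os.path.join(chroot_dir, "dev"), exist_ok=True)
    for path, mode, device in plan_nodes(chroot_dir):
        if not os.path.exists(path):
            os.mknod(path, mode, device)
            # os.mknod applies the umask, so set the permissions explicitly,
            # which is what `mknod -m 444` does.
            os.chmod(path, 0o444)
```

Splitting the plan from the privileged step lets the device numbers be checked without root.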

Comment 14 Adam Williamson 2017-02-21 16:42:22 UTC
Holy crap, mknod? Let's party like it's 1999!

Comment 15 Kai Engert (:kaie) (inactive account) 2017-02-21 17:11:37 UTC
(In reply to Adam Williamson from comment #14)
> Holy crap, mknod? Let's party like it's 1999!

lorax sets up an OS environment from scratch, and in 2017 a good randomness source should be present.

Comment 16 Adam Williamson 2017-02-21 17:16:27 UTC
sure, I wasn't criticising, just found it funny :)

Comment 17 Kevin Fenzi 2017-02-21 17:25:56 UTC
*** Bug 1424812 has been marked as a duplicate of this bug. ***

Comment 18 Kai Engert (:kaie) (inactive account) 2017-02-22 15:14:40 UTC
Brian, thanks a lot for fixing Rawhide F26 !

Would it be possible to deliver this fix into stable F24 + F25, too?

I guess that's blocking an upgrade to NSS 3.29.x (which will be required for Firefox 53).

Comment 19 Brian Lane 2017-02-22 16:09:41 UTC
(In reply to Kai Engert (:kaie) from comment #18)
> Brian, thanks a lot for fixing Rawhide F26 !
> 
> Would it be possible to deliver this fix into stable F24 + F25, too?
> 
> I guess that's blocking an upgrade to NSS 3.29.x (which will be required for
> Firefox 53).

I'll do that today.

Comment 20 Fedora End Of Life 2017-02-28 11:13:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 21 Adam Williamson 2017-03-02 23:44:44 UTC
This is clearly resolved, as we have Cloud image composes again in recent Rawhide.

