Bug 1243736 - Layered Docker Image Build Service
Summary: Layered Docker Image Build Service
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Changes Tracking
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Adam Miller
QA Contact: Pete Travis
URL:
Whiteboard: ChangeAcceptedF23, SystemWideChange, ...
Duplicates: 1292478
Depends On:
Blocks:
 
Reported: 2015-07-16 07:49 UTC by Jan Kurik
Modified: 2016-09-29 11:26 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-09-29 11:26:06 UTC
Type: ---
Embargoed:



Description Jan Kurik 2015-07-16 07:49:21 UTC
This is a tracking bug for Change: Layered Docker Image Build Service
For more details, see: https://fedoraproject.org//wiki/Changes/Layered_Docker_Image_Build_Service

Fedora currently ships a Docker base image, but Docker supports a layering concept.
There are some applications like Cockpit which we would like to ship as layered applications.

Comment 1 Jan Kurik 2015-08-06 09:05:23 UTC
Removing from the scope of F23 as we have already passed the deadline when this change needs to be in MODIFIED state.

Comment 2 Jan Kurik 2015-08-06 09:06:04 UTC
Oh, sorry. This stays in the F23 scope.

Comment 3 Jan Kurik 2015-09-01 14:54:26 UTC
This message is a reminder that Fedora 23 Change Checkpoint: 100% Code Complete Deadline is on 2015-Sep-08 [1].

Expected bug state is ON_QA - Change has to be code complete and possible to test in the Beta release.

Status will be provided to FESCo right after the deadline. If, for any reason, your Change is not in the required state, let me know and we will try to find a solution.  It's an important milestone, as the contingency plan may be put into effect if the Change misses this deadline.

In case of any questions, don't hesitate to ask Wrangler (jkurik). Thank you.

[1] https://fedoraproject.org/wiki/Releases/23/Schedule

Comment 4 Dennis Gilmore 2015-09-02 20:03:27 UTC
We are delaying this to Fedora 24

Comment 5 Adam Miller 2015-09-02 21:01:00 UTC
My apologies for the radio silence. I've been relatively head-down on this work as much as possible. All involved upstreams have been very helpful and responsive along the way, but the container build toolchain is a space where there's a lot of very active development, many moving parts, and sometimes things changing out from under one another. Here's the saga so far:

We were considering a different backend for the koji component, focused directly on atomic-reactor instead of a full OpenShift install (there were concerns that running a full PaaS just for build tooling was a lot of overhead, but given the momentum behind it, the development effort going into it, and the upstream acceptance, it ended up being the best option)

Proof-of-concept atomic-reactor koji plugin
    https://pagure.io/koji-atomic-reactor-plugin
    http://koji.stg.fedoraproject.org/koji/taskinfo?taskID=90001193
Discussion upstream:
https://lists.projectatomic.io/projectatomic-archives/atomic-devel/2015-July/msg00039.html

Issues along the way:

osbs uses system:anonymous no matter what username passed in
    https://github.com/projectatomic/osbs-client/issues/164

OSBS Ansible Playbook failing deployment of OSv3 Node
    https://github.com/projectatomic/ansible-osbs/issues/11

OpenShift's Ansible playbook failing on pre-existing host
    https://github.com/openshift/openshift-ansible/issues/293

atomic-reactor fails on RHEL7/CentOS7 (python-requests too old)
    https://github.com/projectatomic/atomic-reactor/issues/225
    - This has been fixed, but now it fails because the version is too new.

atomic-reactor fails on Fedora 22 (python-requests too new)
    https://github.com/projectatomic/atomic-reactor/issues/226

I ended up maintaining my own atomic-reactor COPR (and a handful of other packages) where workarounds were necessary
    https://copr.fedoraproject.org/coprs/maxamillion/atomic-reactor/

SELinux permission denied for atomic-reactor builds
    https://github.com/projectatomic/atomic-reactor/issues/206

"No valid build json" error related to SELinux issue in atomic-reactor
    https://github.com/projectatomic/atomic-reactor/issues/228

Deployment notes on multi-tenant/multi-namespaced OpenShift for OSBS deploy (upstream does not recommend this, we're not going that route)
    https://gist.github.com/maxamillion/620d29ebb4894eb262e8

Deployment notes on dedicated OpenShift deploy for OSBS (this doc is continuing to be updated as work is being done)
    https://gist.github.com/maxamillion/7e71f252830d08da4e3a 

The OpenShift Ansible deployment documentation was incorrect, which
led to a wild goose chase of what was going wrong with the
deployment. Opened an issue upstream and submitted a fix.
    https://github.com/openshift/openshift-ansible/issues/403
    https://github.com/openshift/openshift-ansible/pull/404

Found that OpenShift is logging RSA private keys. For the sake of
OSBS this likely doesn't impact us, since users won't be on the
builder and therefore won't have access to the log, but it's still
concerning and I reported it upstream.
    https://github.com/openshift/origin/issues/3951

atomic-reactor added backports-lzma but didn't update their spec, this
caused failures when testing tito built rpms from git
    https://github.com/projectatomic/atomic-reactor/issues/247

Another issue found with backports-lzma from inside the container,
this is still ongoing upstream to decide on a proper fix
    https://github.com/projectatomic/atomic-reactor/pull/246

Build issue with upstream atomic-reactor
- Added dependencies to runtime without adding them to the internal
docker build container or the spec file.
    https://github.com/projectatomic/atomic-reactor/issues/247
    https://github.com/projectatomic/atomic-reactor/pull/252
    https://github.com/projectatomic/atomic-reactor/pull/246
Also some changes made to the project file layout and not updated in the spec
    https://github.com/projectatomic/atomic-reactor/issues/250
    https://github.com/projectatomic/atomic-reactor/pull/251

Ambiguous naming in the openshift-ansible repo (concern for Fedora Infra)
  Unfortunately the names 'master' and 'node' are used to identify
host groups in their ansible configs, which is extremely vague. This
issue has been brought up, but the fix broke something else, so the
commit has been reverted and this will be revisited.
    https://github.com/openshift/openshift-ansible/pull/423

atomic-reactor dep issue on RHEL7 because python-websocket-client version too old
    https://github.com/projectatomic/atomic-reactor/issues/262

Dusty Mabe found an issue where docker changed default functionality,
and it turns out that the docker fedora package is in violation of all
sorts of guidelines. I've become a co-maintainer and plan to work on
fixing it asap. It's in bad shape; more on this as I find time to work on
it.
    https://bugzilla.redhat.com/show_bug.cgi?id=1252168
    https://github.com/release-engineering/koji-containerbuild/issues/9

Started some work on breaking out the docker package properly, found
that docker-selinux had no license. This has been resolved upstream.
    https://github.com/fedora-cloud/docker-selinux/issues/6

atomic-reactor fails on python-six version for EL7
    https://github.com/projectatomic/atomic-reactor/issues/286

docker 1.7 is broken because of Red Hat patches (not upstream'd)
    https://github.com/docker/docker/issues/12487#issuecomment-135550256

docker 1.8 is broken because of Red Hat patches (not upstream'd)
    https://bugzilla.redhat.com/show_bug.cgi?id=1258037

Fix up some openshift origin spec file problems found in osbs testing
with v1.0.5.
    https://github.com/openshift/origin/pull/4413

RFE docs on koji-containerbuild: what to do after initial install (I
ended up figuring this out; I plan to submit a PR with the added docs as
soon as I find the time).
       https://github.com/release-engineering/koji-containerbuild/issues/11

fix paths for koji-containerbuild spec file
       https://github.com/release-engineering/koji-containerbuild/pull/12

Comment 6 Jan Kurik 2015-09-03 11:38:53 UTC
Change moved from Category:ChangeAcceptedF23 to Category:ChangePageIncomplete.

Comment 7 Matthew Miller 2015-09-03 14:56:02 UTC
Should this be resubmitted to fesco as an F24 change?

Comment 8 Tomas Tomecek 2015-09-04 07:48:28 UTC
Current status is that Fedora will have a full OSBS deployment (openshift with a builder image with atomic-reactor preinstalled inside), as described in e.g. [1]. I think it would be a good idea to write down the precise architecture of the final solution (how will authentication work, what workflow for building images will be used, how will the builder image be updated, will there be stage and prod? ...).

Adam, are you planning to write such a design and if so, can we (osbs upstream) assist with that somehow?

[1] https://github.com/projectatomic/osbs-client/blob/master/docs/osbs_instance_setup.md

Comment 9 Adam Miller 2015-09-04 15:43:34 UTC
For the sake of posterity: the proof-of-concept atomic-reactor koji plugin was evaluated and decided against, not because it was a bad option but because we would have had to re-implement a lot of the functionality that the koji-containerbuild plugin already delivers, and by the time we were done it would not have been a much simpler approach, if at all. The koji-containerbuild plugin provides introspection into the resulting built image in order to verify that the contents of the image originated from the koji build system, and provides a manifest of the included content, among other reproducibility features that the release engineering team is interested in. So, for the sake of not re-inventing the wheel specifically for Fedora and to align with others in the upstream community, it was decided to go with koji-containerbuild backed by OSBS, which comprises OpenShift V3, osbs-client, atomic-reactor, and docker. The above log is a listing of various issues found along the way with each of these very actively developed components.

Comment 10 Adam Miller 2015-09-04 15:45:13 UTC
Tomas,
    I would love to work together on the final design; there are some aspects of the authentication that need hashing out, which I would appreciate your help with. I'll sync with you off-BZ, and once we have something we can update here with information.

Comment 11 Tomáš Hozza 2015-09-07 09:21:44 UTC
So just to double check, the change is moved to F24 due to incompleteness?


(In reply to Matthew Miller from comment #7)
> Should this be resubmitted to fesco as an F24 change?

Yes, this is the case with any change that is moved to the next release.

Comment 12 Adam Miller 2015-09-18 15:31:27 UTC
Yes, this change has been moved to F24.

Comment 13 Jan Kurik 2015-12-17 14:50:08 UTC
*** Bug 1292478 has been marked as a duplicate of this bug. ***

Comment 14 Amanda Carter 2016-01-13 21:56:39 UTC
=Status update=

Completed
 - OpenShift Origin change is completed
 - Internal changes to support v2 completed
 - OSBS running on F23
 - distgit changes to support containers
 - fedpkg changes to support containers
 - pkgdb changes to support containers


Currently working on
 - Proposal for a containers committee to make decisions about packaging, naming, and other standards
 - Filing RFEs with Taskotron & to test containers
 - Packaging pulp & pulp plugins for Fedora
 - Ansible for OSBS


Next up
 - Deploying the koji-containerbuild plugin
 - rhpkg changes to support containers


Risks / notes
 - Due to timelines and complexity, deploying Pulp in prod is at risk for F24 Alpha; we have a contingency plan for this if needed and will target it at a later date

Comment 15 Jan Kurik 2016-02-24 14:26:19 UTC
On 2016-Feb-23, we have reached Fedora 24 Change Checkpoint: Completion deadline (testable).

At this point, all accepted changes should be substantially complete, and testable. Additionally, if a change is to be enabled by default, it must be so enabled at Change Completion deadline.

Change tracking bug should be set to the MODIFIED state to indicate it achieved completeness.

Incomplete and non-testable Changes will be reported to FESCo at the 2016-Feb-26 meeting.  Contingency plan for System Wide Changes, if planned for Alpha (or in case of serious doubts regarding Change completion), will be activated.

Comment 16 Adam Miller 2016-02-25 18:08:26 UTC
Current status:
 - The Fedora OSBS System is deploy-able via Fedora Infrastructure Ansible Playbooks
 - Testing osbs deployment is in place
 - OSBS builds can be completed (however not yet tied to koji)
 - docker-distribution registry is deploy-able via Fedora Infrastructure Ansible Playbooks
 - Testing docker-distribution registry is in place
 - koji-containerbuild plugin has a patch for Ansible playbooks, but there are Infrastructure networking concerns that need to be sorted out to allow the builder network to talk to OSBS
 - Request for Resources has been placed for both Stage and Production environment (these will be deployed with same Ansible Roles as testing environment just with environment configuration changes as needed)
 - distgit has been "namespaced" to allow for docker/ and rpms/ namespaces (allowing builds for both via fedpkg)

Things not yet done:
 - fedpkg still needs some config entries to enable 'fedpkg container-build'
 - deployment to stage and prod (pending network configs, as listed above)
 - testing the system end-to-end

Risks:
 - Once everything is deployed and in place, it might not work

Fallback plan:
 - Worst case scenario, this gets pushed to F25 since it won't impact any current Fedora deliverables. Its goal is to enable new deliverables in the future.
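
As a rough illustration of the distgit namespacing in the status list above, the sketch below maps a namespaced repository path to the fedpkg subcommand that would build it. The function name and the mapping table are hypothetical; only the docker/ and rpms/ namespaces and 'fedpkg container-build' come from this bug, and 'build' is assumed to be the subcommand used for RPMs.

```python
# Hypothetical mapping from distgit namespace to fedpkg subcommand.
# 'container-build' is named in this bug; 'build' for RPMs is an assumption.
BUILD_SUBCOMMAND = {
    "rpms": "build",
    "docker": "container-build",
}


def fedpkg_subcommand(repo_path):
    """Return the fedpkg subcommand for a path like 'docker/cockpit'."""
    namespace, sep, name = repo_path.partition("/")
    if not sep or not name:
        raise ValueError("expected '<namespace>/<name>', got %r" % repo_path)
    if namespace not in BUILD_SUBCOMMAND:
        raise ValueError("unknown namespace %r" % namespace)
    return BUILD_SUBCOMMAND[namespace]
```

With this model, 'docker/cockpit' resolves to 'container-build' and 'rpms/cockpit' to 'build', while a path with no namespace is rejected rather than guessed at.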

Comment 17 Adam Miller 2016-03-11 19:55:07 UTC
Everything is deployed, we are testing but running into a potential NSS bug. 
- https://github.com/release-engineering/koji-containerbuild/issues/25


Also, patches are in for fedpkg/rpkg
- https://pagure.io/rpkg/pull-request/36
- https://pagure.io/fedpkg/pull-request/12

Comment 18 Adam Miller 2016-03-24 14:49:05 UTC
Just an update here, we're having to rework the way that the OSBS/Koji integration functions because of some changes with docker v2 images.

    We have a new plan, all builds will be Content Generators:

    - koji-containerbuild still starts OSBS build for builds explicitly
      requested by developers, but does *not* create a Koji Build for it,
      only the createContainer Koji Task. It will fetch logs from the OSBS
      build, but not the 'docker save' output.

    - osbs-client always enables the koji_promote plugin for prod builds,
      regardless of whether chain rebuild triggers are enabled (need to
      figure out how to handle scratch builds)

    - koji_promote plugin runs CG import for all builds, not just for
      automated rebuilds

    I'm going to be taking a pass at patching koji-containerbuild,
    osbs-client, and atomic-reactor while twaugh is afk on PTO this week.

        https://github.com/projectatomic/atomic-reactor/issues/435
        https://github.com/projectatomic/osbs-client/issues/366
        https://github.com/release-engineering/koji-containerbuild/issues/27
        https://github.com/projectatomic/osbs-client/issues/367

    I now have patches in my own branch of each project and am iterating as
    I test the code (hope to have this done within a day or two).

        https://github.com/maxamillion/osbs-client/tree/content-generator
        https://github.com/maxamillion/koji-containerbuild/tree/content-generator
        https://github.com/maxamillion/atomic-reactor/tree/content-generator

    I'm doing builds in my COPR space here (in case anyone is interested):

        https://copr.fedorainfracloud.org/coprs/maxamillion/atomic-reactor

    Request for koji account to enable Content Generator Imports
        https://fedorahosted.org/rel-eng/ticket/6380
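
The plan above can be read as a small decision table. The sketch below is only a model of that plan, not code from koji-containerbuild, osbs-client, or atomic-reactor; the function and key names are invented for illustration. Scratch builds are left unimplemented because the plan says their handling still needs to be figured out, and treating koji_promote as off for non-prod builds is an assumption.

```python
def plan_build(is_prod, rebuild_triggers_enabled=False, is_scratch=False):
    """Model what each component does for one build under the new plan."""
    if is_scratch:
        # Open question in the plan: scratch-build handling is undecided.
        raise NotImplementedError("scratch build handling is still undecided")

    # rebuild_triggers_enabled is accepted but deliberately ignored:
    # koji_promote is enabled for prod builds regardless of triggers.
    # Assumption: it stays off for non-prod builds (the plan does not say).
    promote = bool(is_prod)

    return {
        # koji-containerbuild starts the OSBS build as a Koji Task only.
        "koji_task": "createContainer",
        "plugin_creates_koji_build": False,
        # It fetches the OSBS build logs, but not the 'docker save' output.
        "fetch_logs": True,
        "fetch_docker_save_output": False,
        # osbs-client side of the plan.
        "koji_promote_enabled": promote,
        # koji_promote runs the Content Generator import whenever it runs.
        "cg_import": promote,
    }
```

The point of the model is that the Koji Build record now comes from the Content Generator import rather than from the plugin itself, and that toggling rebuild triggers does not change the outcome.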

Comment 19 Adam Miller 2016-03-24 14:50:17 UTC
Also as a note, I'm working around the pycurl error for now by just calling the osbs-client tools via a subprocess call. This is just a temporary fix to not block forward progress on everything else.

https://github.com/maxamillion/koji-containerbuild/tree/shell-out
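
For readers unfamiliar with the approach, shelling out looks roughly like the sketch below. It is not the code in the shell-out branch; the helper name is invented, and the command line is supplied by the caller because the exact osbs-client invocation is not given here.

```python
import subprocess


def run_cli(argv):
    """Run an external command; return (exit code, stdout, stderr) as text."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str instead of bytes
    )
    out, err = proc.communicate()  # wait for exit and collect both streams
    return proc.returncode, out, err
```

A caller would pass the command as a list of arguments, check the exit code, and parse whatever the tool prints, instead of importing the client library into its own process.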

Comment 20 Adam Miller 2016-04-04 16:16:13 UTC
We have working builds with content generator import:

        $ fedpkg --config ~/.fedpkg-stg.conf container-build
        Created task: 90071655
        Task info:
               http://koji.stg.fedoraproject.org/koji/taskinfo?taskID=90071655
        Watching tasks (this may be safely interrupted)...
        90071655 buildContainer (noarch): free
        90071655 buildContainer (noarch): free
            -> open (buildvm-01.stg.phx2.fedoraproject.org)
              90071657 createContainer (x86_64): free
              90071657 createContainer (x86_64): free
                -> open (buildvm-01.stg.phx2.fedoraproject.org)
              90071657 createContainer (x86_64): open
                (buildvm-01.stg.phx2.fedoraproject.org) -> closed
              0 free  1 open  1 done  0 failed
              90071655 buildContainer (noarch): open
                (buildvm-01.stg.phx2.fedoraproject.org) -> closed
              0 free  0 open  2 done  0 failed

        90071655 buildContainer (noarch) completed successfully

    Content Generator Import:
        http://koji.stg.fedoraproject.org/koji/buildinfo?buildID=741588

Pull Requests upstream to enable this:
    https://github.com/projectatomic/atomic-reactor/pull/445
    https://github.com/projectatomic/osbs-client/pull/378
    https://github.com/release-engineering/koji-containerbuild/pull/32

There will still be some deployment work needed into the Fedora Infra, but I already have the Request For Resources filed and fulfilled so there are no blockers.

Comment 21 Adam Miller 2016-04-20 14:22:25 UTC
This has been completely ansible-ized and deployed on the new stage VMs (also can push-button destroy and redeploy it all). Testing end-to-end builds is passing. We need to get some patches into Fedora proper before going to production and that is being worked on now.

Comment 22 Jan Kurik 2016-04-20 15:02:15 UTC
On 2016-Apr-19 we reached the "Change Checkpoint: 100% Code Complete Deadline" milestone for the Fedora 24 release. At this point all Changes that are not at least in the "ON_QA" state should be brought to FESCo for review. Please update the state of this bug to "ON_QA" if it is already 100% completed. Please let me know in case you have any trouble with the implementation and the Change needs any help or review.

Thanks, Jan

Comment 23 Adam Miller 2016-04-21 13:12:03 UTC
This is code complete, deployed, and testable. Marking ON_QA.

