Bug 2237396 - Failed to create task for container: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown [NEEDINFO]
Status: NEW
Product: Fedora
Classification: Fedora
Component: containerd
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Assignee: Sergio Basto
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedFreezeException
Duplicates: 2239849 2246041 2246836
Reported: 2023-09-05 09:42 UTC by Sandro Mani
Modified: 2023-11-29 19:17 UTC (History)
sergio: needinfo? (zebob.m)
dustymabe: needinfo? (zebob.m)



Description Sandro Mani 2023-09-05 09:42:15 UTC
I suspect that since containerd-1.6.23-1.fc40.x86_64, containers fail to start with errors like

failed to create task for container: failed to start shim: mkdir /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/b0312fe2d99e00886fb055a449856fe7bf67edefa3da7d4d86b4594226d7e471: file exists: unknown

or, after removing /var/run/docker/containerd

Failed to create task for container: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown

I once managed to get docker/moby working again by downgrading to containerd-1.6.19-2.fc39.x86_64, but after that one time the downgrade no longer helped.

I'm not sure where to look next; any pointers appreciated.



Reproducible: Always

Comment 1 Sergio Basto 2023-09-05 09:57:08 UTC
Hi Sandro,

Unfortunately this is not easy to fix; I think we need to update containerd to 1.7.x.

Can you check whether containerd 1.7.0 from https://copr.fedorainfracloud.org/coprs/sergiomb/docker2/builds/ works for you?

We have a lot of stuff to update ...

More info about the current state here:

https://src.fedoraproject.org/rpms/containerd/pull-request/15 
https://bodhi.fedoraproject.org/updates/FEDORA-2023-aaa2b3d20b

Comment 2 Sandro Mani 2023-09-05 10:17:27 UTC
Thanks for your quick reply. Unfortunately I get the same errors with containerd-1.7.0-5.fc39.x86_64

Comment 3 Sandro Mani 2023-09-14 19:27:49 UTC
Actually, a correction: with containerd-1.7.0-5.fc39.x86_64 it works. If I can help in any way with getting things updated in Fedora proper, let me know.

Comment 4 Sergio Basto 2023-09-22 13:43:39 UTC
*** Bug 2239849 has been marked as a duplicate of this bug. ***

Comment 5 Geoffrey Marr 2023-09-25 23:46:02 UTC
Discussed during the 2023-09-25 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as we would like a clearer rationale for why it would be beneficial to grant an FE to this bug, given that containerd is not used by podman and probably isn't on any media aside from CoreOS. Our question: what does an FE achieve here that a 0-day update would not?

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-09-25/f39-blocker-review.2023-09-25-16.02.txt

Comment 6 Geoffrey Marr 2023-09-25 23:47:01 UTC
Note: this is being considered as a Freeze Exception and *not* a Blocker Bug as stated above.

Comment 7 Dusty Mabe 2023-09-26 02:06:53 UTC
I'm not sure if this bit of info would or should play into the calculation, but for upgrading systems people theoretically upgrade from a working `docker` setup (f38) to one that won't work (f39).

Comment 8 Sergio Basto 2023-09-26 16:21:35 UTC
But F38 (and F37) also have the same problems. Is anyone going to fix it? ATM I don't have time.

Comment 9 Sandro Mani 2023-09-26 16:35:12 UTC
What needs to be done? Is it safe to just import all the builds from [1] to fedora? If so, I can help.


[1] https://copr.fedorainfracloud.org/coprs/sergiomb/docker2/builds/

Comment 10 Dusty Mabe 2023-09-26 16:42:13 UTC
(In reply to Sergio Basto from comment #8)
> but F38 (and F37) also have the same problems , anyone is going fix it ? ,
> ATM I don't have time .

The nuance here is that the updates for f37 and f38 never made it to `stable` repos. So users on those platforms are probably using `docker` without issue right now.

- https://bodhi.fedoraproject.org/updates/FEDORA-2023-5c4718e547
- https://bodhi.fedoraproject.org/updates/FEDORA-2023-aaa2b3d20b

When they switch to F39 is when they would have trouble.

Comment 11 Sergio Basto 2023-09-28 12:23:26 UTC
> (In reply to Dusty Mabe from comment #10)
> (In reply to Sergio Basto from comment #8)
> > but F38 (and F37) also have the same problems , anyone is going fix it ? ,
> > ATM I don't have time .
> 
> The nuance here is that the updates for f37 and f38 never made it to
> `stable` repos. So users on those platforms are probably using `docker`
> without issue right now.
> 
> - https://bodhi.fedoraproject.org/updates/FEDORA-2023-5c4718e547
> - https://bodhi.fedoraproject.org/updates/FEDORA-2023-aaa2b3d20b
> 
> When they switch to F39 is when they would have trouble.

You need to build containerd 1.6 against golang-github-containerd-ttrpc 1.1.0 for it to work. Can you try https://koji.fedoraproject.org/koji/buildinfo?buildID=2236784 ? It has:

DEBUG util.py:444:   golang-github-containerd-ttrpc-devel                              noarch  1:1.1.0-5.fc39                           build   38 k


@zebob.m should have a better picture of the status of containerd 1.7 than I do.
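
To confirm which ttrpc version a given build was compiled against, the build log can be searched for the devel package line quoted above (it looks like a line from the build's mock root.log). A minimal sketch, assuming the column layout shown in that line (package, arch, version, repo, size); the function name `ttrpc_version_from_log` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a build log on stdin and print the
# epoch:version-release of golang-github-containerd-ttrpc-devel, if present.
ttrpc_version_from_log() {
    awk '{
        # Stop two fields before the end so that $(i + 2) always exists.
        for (i = 1; i < NF - 1; i++)
            if ($i == "golang-github-containerd-ttrpc-devel") {
                print $(i + 2)   # fields after the name: arch, then version
                exit
            }
    }'
}

# Usage with the line quoted in this comment:
printf '%s\n' 'DEBUG util.py:444:   golang-github-containerd-ttrpc-devel                              noarch  1:1.1.0-5.fc39                           build   38 k' \
    | ttrpc_version_from_log
# prints 1:1.1.0-5.fc39
```

With a downloaded log file the call would be `ttrpc_version_from_log < root.log`.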

Comment 12 Dusty Mabe 2023-09-28 14:21:42 UTC
(In reply to Sergio Basto from comment #11)
> > 
> > When they switch to F39 is when they would have trouble.
> 
> you need build containerd 1.6 with golang-github-containerd-ttrpc 1.1.0 to

For Fedora CoreOS we do have the ability to pin on older RPMs and we
pinned on 1.6.19-1.fc39 in https://github.com/coreos/fedora-coreos-config/pull/2624

We didn't do 1.6.19-2.fc39, which was the one done by releng for the
F39 mass rebuild because I didn't see it in the list of updates for
F39: https://bodhi.fedoraproject.org/updates/?search=&packages=containerd&releases=F39

I'll bump our pin to 1.6.19-2.fc39 now, but this doesn't help the
rest of Fedora that isn't Fedora CoreOS. In order to do that we'd
need to file a ticket with releng to untag the 1.6.23-1.fc39 build
so that 1.6.19-2.fc39 is the latest.

> @zebob.m should have a better status of containd 1.7 than me

I'll add needinfo to see if they can weigh in here.

Comment 13 Adam Williamson 2023-10-03 22:57:52 UTC
"I'm not sure if this bit of info would or should play into the calculation, but for upgrading systems people theoretically upgrade from a working `docker` setup (f38) to one that won't work (f39)."

That does not really affect FE status, since being an FE does not guarantee a bug will be fixed. It simply *allows* for it. If there is a fix but we don't give it an FE, it would go out on day 0, and people upgrading on release day would still get the fixed version. If there wasn't a fix, the bug having an FE wouldn't make any difference.

There's a weak argument to give it an FE to avoid a regression for people upgrading *before* release day, but that's kinda the same for any bug, so...

Comment 14 Robert-André Mauchin 🐧 2023-10-04 00:43:05 UTC
For containerd we have the following review requests:

containerd:

github.com/container-orchestrated-devices/container-device-interface
github.com/containerd/cgroups/v3
github.com/containerd/nri

golang-github-container-orchestrated-devices-device-interface https://bugzilla.redhat.com/show_bug.cgi?id=2229406
golang-github-containerd-cgroups-3 https://bugzilla.redhat.com/show_bug.cgi?id=2229425


golang-github-containerd-nri:

github.com/containers/common
github.com/r3labs/diff/v3
github.com/sters/yaml-diff

golang-github-containers-common https://bugzilla.redhat.com/show_bug.cgi?id=2229811
golang-github-r3labs-diff-3 https://bugzilla.redhat.com/show_bug.cgi?id=2229465
golang-github-sters-yaml-diff https://bugzilla.redhat.com/show_bug.cgi?id=2229466


golang-github-containers-common:

github.com/containers/storage
github.com/containers/image/v5
github.com/disiqueira/gotree/v3

golang-github-containers-storage https://bugzilla.redhat.com/show_bug.cgi?id=2229483
golang-github-containers-image-5 https://bugzilla.redhat.com/show_bug.cgi?id=2229518
golang-github-disiqueira-gotree-3 https://bugzilla.redhat.com/show_bug.cgi?id=2229481


golang-github-containers-storage:

github.com/google/go-intervals
github.com/mistifyio/go-zfs/v3

golang-github-google-intervals https://bugzilla.redhat.com/show_bug.cgi?id=2229478
golang-github-mistifyio-zfs-3 https://bugzilla.redhat.com/show_bug.cgi?id=2229479


golang-github-containers-image-5:

dario.cat/mergo
github.com/containers/libtrust
github.com/cyberphone/json-canonicalization
github.com/proglottis/gpgme
github.com/vbauerster/mpb/v8

golang-dario-mergo https://pagure.io/releng/fedora-scm-requests/issue/55275 B+
golang-github-containers-libtrust https://bugzilla.redhat.com/show_bug.cgi?id=2229469
golang-github-cyberphone-json-canonicalization https://bugzilla.redhat.com/show_bug.cgi?id=2229470
golang-github-proglottis-gpgme https://bugzilla.redhat.com/show_bug.cgi?id=2229471
golang-github-vbauerster-mpb-8 https://bugzilla.redhat.com/show_bug.cgi?id=2229477
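
Since each package in the list above can only be built once its own dependencies are available, the review requests imply a build order. A rough sketch using `tsort` from coreutils, with the edges transcribed from this comment as `dependency package` pairs. It assumes no missing dependencies beyond those listed here, which may well not hold; the function name `build_order` is hypothetical.

```shell
#!/bin/sh
# Each line is "<dependency> <package that needs it>", transcribed from the
# review-request list in this comment. tsort prints one valid order in
# which every dependency comes before the packages that need it.
build_order() {
    tsort <<'EOF'
golang-github-container-orchestrated-devices-device-interface containerd
golang-github-containerd-cgroups-3 containerd
golang-github-containerd-nri containerd
golang-github-containers-common golang-github-containerd-nri
golang-github-r3labs-diff-3 golang-github-containerd-nri
golang-github-sters-yaml-diff golang-github-containerd-nri
golang-github-containers-storage golang-github-containers-common
golang-github-containers-image-5 golang-github-containers-common
golang-github-disiqueira-gotree-3 golang-github-containers-common
golang-github-google-intervals golang-github-containers-storage
golang-github-mistifyio-zfs-3 golang-github-containers-storage
golang-dario-mergo golang-github-containers-image-5
golang-github-containers-libtrust golang-github-containers-image-5
golang-github-cyberphone-json-canonicalization golang-github-containers-image-5
golang-github-proglottis-gpgme golang-github-containers-image-5
golang-github-vbauerster-mpb-8 golang-github-containers-image-5
EOF
}

build_order
```

The exact order of independent packages is up to `tsort`; the only guarantee is that containerd comes last, because everything else in the list is a dependency of something.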

Comment 15 Sergio Basto 2023-10-05 18:49:20 UTC
I have reviewed golang-github-container-orchestrated-devices-device-interface https://bugzilla.redhat.com/show_bug.cgi?id=2229406 and
golang-github-sters-yaml-diff https://bugzilla.redhat.com/show_bug.cgi?id=2229466 

Since containerd-1.7.0 seems to work better, for now I'm trying to bring just containerd-1.7.0 to Fedora (containerd-1.7.4 needs even more new packages).
I have started building containerd 1.7.0 in a new copr repo: https://copr.fedorainfracloud.org/coprs/sergiomb/docker4/builds/

From my tests I wrote this report: https://sergiomb.fedorapeople.org/docker_report3.txt

To build containerd we need golang-opentelemetry-contrib [1], and we need to update k8s.io/apimachinery and k8s.io/cri-api [2].


[1]
to unbootstrap golang-opentelemetry-contrib needs:
  golang-github-docker 	23.0.4-1
  new package  golang-github-moby-pubsub  1.0.0-1 (
  new package  golang-github-azure-ansiterm 0-0.1.20230420git306776e
  golang-github-docker-devel-23.0.4-1 built
  No matching package to install: 'golang(github.com/brunoscheufler/aws-ecs-metadata-go)'
  new package golang-github-brunoscheufler-aws-ecs-metadata-0-0.1.20230425git67e37ae


[2]
# update golang-github-googleapis-gnostic to 0.5.5-1 (not sure if it is needed)
package rename from golang-github-googleapis-gnostic to golang-github-google-gnostic and update to 0.6.9-1
  new package golang-sigs-k8s-json-0-0.1.20230426gitbc3834c
  new package golang-github-flowstack-jsonschema-0.1.2-1

update golang-k8s-apimachinery to 1.26.4
update and bootstrap golang-k8s-apiserver to 1.26.4 (apiserver-kubernetes)
update golang-k8s-client to 1.26.4 with bootstrap %global __requires_exclude %{?__requires_exclude:%{__requires_exclude}|}^golang\\(.*\\)$
update golang-k8s-cri-api to 1.26.4
update golang-k8s-klog to 2.90.1-1.fc39

Comment 16 Adam Williamson 2023-10-06 19:14:10 UTC
-4 FE in https://pagure.io/fedora-qa/blocker-review/issue/1345 , marking rejected. We feel there's a lot going on here and it'd be more suited to a 0-day update.

Comment 17 lists 2023-10-19 17:38:19 UTC
I also have this issue with F39 workstation beta.

Downgrading helps:

```
sudo dnf install ~/Downloads/containerd-1.6.19-2.fc39.x86_64.rpm
```
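
For anyone who needs the same workaround without hunting for the RPM by hand, the steps might look like the sketch below. It assumes the `koji` client is installed and that the build `containerd-1.6.19-2.fc39` is still available in Koji; passing `--exclude=containerd` to later upgrades is one way to keep dnf from pulling 1.6.23 back in. The function only prints the commands so they can be reviewed before running, and `print_downgrade_steps` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands for the downgrade
# workaround described in this comment. NVR and arch default to the
# build mentioned in this bug.
print_downgrade_steps() {
    nvr="${1:-containerd-1.6.19-2.fc39}"
    arch="${2:-x86_64}"
    cat <<EOF
koji download-build --arch=$arch $nvr
sudo dnf install ./$nvr.$arch.rpm
sudo dnf upgrade --exclude=containerd
EOF
}

print_downgrade_steps
```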

Comment 18 Sergio Basto 2023-10-29 16:46:27 UTC
*** Bug 2246836 has been marked as a duplicate of this bug. ***

Comment 19 Nathan G. Grennan 2023-11-03 23:48:50 UTC
I just ran into this after upgrading from Fedora 38 to Fedora 39 and containerd-1.6.23-1.fc39.x86_64 was installed. Downgrading to containerd-1.6.19-2.fc39.x86_64 fixed it.

I then looked at my other, working Fedora 39 Kubernetes server, where I had the docker-ce repo enabled. It had containerd.io-1.6.24-3.1.fc39.x86_64, and using that version on the original server, the one with the issue, works.

Comment 20 Nathan G. Grennan 2023-11-05 17:14:20 UTC
(In reply to Nathan G. Grennan from comment #19)
> I just ran into this after upgrading from Fedora 38 to Fedora 39 and
> containerd-1.6.23-1.fc39.x86_64 was installed. Downgrading to
> containerd-1.6.19-2.fc39.x86_64 fixed it.
> 
> I then looked at my other working Fedora 39 and Kubernetes server where I
> had the docke-ce repo. With it I had containerd.io-1.6.24-3.1.fc39.x86_64,
> and then using that version on the original server I had an issue with works.

I ended up finding that using containerd.io-1.6.24-3.1.fc39.x86_64 on this system only half worked, so I downgraded to containerd-1.6.19-2.fc39.x86_64. On the other system the actual runtime is docker, but on this system it is moby-engine. Mixing containerd.io from docker with moby-engine caused a different error with kubelet that prevented it from coming up properly. At least with Fedora 39 I can just use the kubernetes packages that come with Fedora instead of having to build my own.
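
The versions reported so far in this bug can be summarised as a small lookup. This is only a sketch: the helper name `containerd_report` is hypothetical, and the verdicts simply restate what commenters observed for those exact builds, not a tested compatibility matrix.

```shell
#!/bin/sh
# Hypothetical helper: map a package NVR to what this bug thread reports
# about it. Anything not mentioned in the thread is "unknown".
containerd_report() {
    case "$1" in
        containerd-1.6.23-1.*)       echo "reported broken (ttrpc marshal error)" ;;
        containerd-1.6.19-2.*)       echo "reported working after downgrade" ;;
        containerd-1.7.0-5.*)        echo "reported working (COPR build)" ;;
        containerd.io-1.6.24-3.1.*)  echo "reported half working with moby-engine" ;;
        *)                           echo "unknown" ;;
    esac
}

# Example: check the installed package (needs rpm, so it is left commented
# out to keep the sketch free of side effects).
# containerd_report "$(rpm -q containerd)"
```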

Comment 21 William Oprandi 2023-11-09 16:36:58 UTC
Same error with a `docker build` command since the F39 upgrade.

Comment 22 Sergio Basto 2023-11-13 22:46:13 UTC
*** Bug 2246041 has been marked as a duplicate of this bug. ***

Comment 23 Everard Brown 2023-11-13 23:48:39 UTC
(In reply to Nathan G. Grennan from comment #19)
> I just ran into this after upgrading from Fedora 38 to Fedora 39 and
> containerd-1.6.23-1.fc39.x86_64 was installed. Downgrading to
> containerd-1.6.19-2.fc39.x86_64 fixed it.

I just upgraded from Fedora 38 to Fedora 39 and can confirm, downgrading to containerd-1.6.19-2.fc39.x86_64 fixes the issue.

The actual package I used can be found here:
https://koji.fedoraproject.org/koji/packageinfo?packageID=25455

Comment 24 Benjamin Evans 2023-11-15 12:25:03 UTC
(In reply to bugzilla.redhat from comment #23)
> (In reply to Nathan G. Grennan from comment #19)
> > I just ran into this after upgrading from Fedora 38 to Fedora 39 and
> > containerd-1.6.23-1.fc39.x86_64 was installed. Downgrading to
> > containerd-1.6.19-2.fc39.x86_64 fixed it.
> 
> I just upgraded from Fedora 38 to Fedora 39 and can confirm, downgrading to
> containerd-1.6.19-2.fc39.x86_64 fixes the issue.
> 
> The actual package I used can be found here:
> https://koji.fedoraproject.org/koji/packageinfo?packageID=25455

Exactly the same for me. Downgrading also fixed the issue.

Comment 25 Jan Pazdziora 2023-11-16 11:30:00 UTC
What is the plan? If, per comment 15, the upgrade of containerd to 1.7 needs some dependencies that are not in Fedora right now, can we just downgrade to containerd-1.6.19-2 while bumping the epoch, to provide people with the revert they are after?
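
The epoch bump would work because RPM compares the epoch before version and release, so `1:1.6.19-2` sorts above `0:1.6.23-1` even though its version is lower. A small sketch of that ordering, assuming GNU `sort -V` as a stand-in for RPM's own comparison (the two agree for simple dotted versions like these, but not in general); `newer_evr` is a hypothetical helper and expects both arguments to carry an explicit `epoch:` prefix.

```shell
#!/bin/sh
# Hypothetical helper: print whichever of two epoch:version-release strings
# is newer. The epoch is compared numerically first; only on a tie are the
# version-release parts compared, using sort -V as an approximation.
newer_evr() {
    e1=${1%%:*}; vr1=${1#*:}
    e2=${2%%:*}; vr2=${2#*:}
    if [ "$e1" -gt "$e2" ]; then echo "$1"
    elif [ "$e1" -lt "$e2" ]; then echo "$2"
    elif [ "$(printf '%s\n%s\n' "$vr1" "$vr2" | sort -V | tail -n 1)" = "$vr1" ]; then echo "$1"
    else echo "$2"
    fi
}

newer_evr 0:1.6.23-1.fc39 0:1.6.19-2.fc39   # same epoch: prints 0:1.6.23-1.fc39
newer_evr 0:1.6.23-1.fc39 1:1.6.19-2.fc39   # epoch wins: prints 1:1.6.19-2.fc39
```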

Comment 26 Sergio Basto 2023-11-16 11:49:16 UTC
I'm waiting for some help. I think there is some confusion: we can't build containerd 1.7 without its dependencies, and rolling back ttrpc could break other things, I guess.

Comment 27 Jan Pazdziora 2023-11-16 12:04:05 UTC
Asking two questions about the plan, over a month after the last status comment, causes confusion?

Is there an indication that the help you are waiting for is actually on its way? I wouldn't want to see us still waiting for help many months later.

Is the breakage caused by downgrade from 1.6.23 to 1.6.19 bigger than the current situation when the 1.6.23 setup is broken?

Comment 28 Stefan van der Eijk 2023-11-16 12:28:18 UTC
I added the https://copr.fedorainfracloud.org/coprs/sergiomb/docker2/ repo and installed the containerd package from there. Haven't seen adverse behavior.

Interesting to see that a package which is part of the distribution is supported like this --> that breakage is being accepted. If podman is the preferred way to go, then why not remove containerd and provide a comprehensive guide to migrate to podman?

Comment 29 Marc Dionne 2023-11-16 22:15:27 UTC
I second the question as to what the plan is; is a downgrade to 1.6.19 not feasible because of issues with other packages?

I realize that the trend is going away from docker, but many of us have existing tools and scripts that rely on docker as it exists today, and having "docker run" work is basic and important in my circles, where the word of mouth atm is to hold off on f39.

Comment 30 Sergio Basto 2023-11-16 23:52:40 UTC
(In reply to Stefan van der Eijk from comment #28)
> I added the https://copr.fedorainfracloud.org/coprs/sergiomb/docker2/ repo
> and installed the containerd package from there. Haven't seen adverse
> behavior.

I did that ( https://copr.fedorainfracloud.org/coprs/sergiomb/docker2/ ) with a very big effort, which cost me some family and work problems because I was very tired and even exhausted.

I know very little or nothing about the Go language. Go, as it is packaged, has thousands of dependencies, some of them circular, and package reviews then stall on the smallest details.

I want to bring containerd 1.7.0 to Fedora for a start. I think the issue for the docker environment is getting rid of cgroupsv2 and using cgroupsv3; I think it was a security issue, but I'm just speculating. Now it seems to me that updates are slower and calmer, but we need to move everything to containerd 1.7 and docker 24.0.0.

I made a new repo, https://copr.fedorainfracloud.org/coprs/sergiomb/docker4/builds/ , with a refresh of the builds that we need; it is not finished yet.


> Interesting to see that a package which is part of the distribution is
> supported like this --> that breakage is being accepted. If podman is the
> preferred way to go, then why not remove containerd and provide a
> comprehensive guide to migrate to podman?

I use docker at work and my employer doesn't want to change, so working with podman is not an option.

Comment 31 Adam Williamson 2023-11-17 00:58:32 UTC
Sergio: I think the idea is, since updating to 1.7.0 definitely is not trivial but people kinda need docker to work now, we should *first* ship a downgrade to 1.6.19 just to get things working again for folks, then we have time to get to 1.7.0 whenever we can get the pile of go deps packaged up.

Alternatively - maybe someone could figure out exactly what change between 1.6.19 and 1.6.23 broke things here, and we can see if it's feasible to just revert that? Unfortunately it seems like the patches in the package are a bit sensitive to version bumps, now I've started trying to do this...

Comment 32 Sergio Basto 2023-11-17 01:24:49 UTC
(In reply to Adam Williamson from comment #31)
> Sergio: I think the idea is, since updating to 1.7.0 definitely is not
> trivial but people kinda need docker to work now, we should *first* ship a
> downgrade to 1.6.19 just to get things working again for folks, then we have
> time to get to 1.7.0 whenever we can get the pile of go deps packaged up.
> 
> Alternatively - maybe someone could figure out exactly what change between
> 1.6.19 and 1.6.23 broke things here, and we can see if it's feasible to just
> revert that? Unfortunately it seems like the patches in the package are a
> bit sensitive to version bumps, now I've started trying to do this...

Ah, for that (I have already explained it several times, but maybe I missed doing so in this bug report): we need to roll back golang-github-containerd-ttrpc to 1.1.0 and rebuild containerd against it.

I already did that once: https://src.fedoraproject.org/rpms/golang-github-containerd-ttrpc/c/e755e8bd10893cad974c506e4350ea7ccf2d34e6?branch=rawhide (see https://src.fedoraproject.org/rpms/golang-github-containerd-ttrpc/commits/rawhide ).

Maybe we should try it again :) because the only packages that depend on it are cadvisor and containerd!

Comment 33 Allie 2023-11-17 10:59:48 UTC
Also running into this issue, would greatly appreciate a fix 😇

Comment 34 alexandru.lesi 2023-11-20 16:29:25 UTC
For folks who critically need to get things working again, I wanted to share this post about reinstalling docker, which worked for me as an alternative to downgrading: https://discussion.fedoraproject.org/t/docker-failed-to-create-shim-task-ttrpc/95902/3

Comment 35 William Oprandi 2023-11-20 20:32:17 UTC
There is an official package (moby-engine). Installing from the docker repository can only be a workaround.

