Identifying Platform-Specific OCI Artifacts #1216

Open
Wwwsylvia opened this issue Oct 31, 2024 · 13 comments

@Wwwsylvia

Identifying Platform-Specific OCI Artifacts

Hello OCI Community,

We are the maintainers of the ORAS project. We are considering adding platform information to the manifest when producing artifacts to support multi-arch scenarios, such as distributing multi-arch binaries. This would allow the manifest to contain information about the specific platforms that the artifacts are intended for, similar to how container image configs include platform properties.

To address this, we have identified two potential approaches and believe it would be beneficial to discuss them with the community.

Approach 1: Adding Platform Annotations in the Manifest

One approach is to introduce new annotations in the manifest to store platform information. For instance, the architecture and OS information could be placed in org.opencontainers.image.platform.architecture and org.opencontainers.image.platform.os annotations, respectively. Additional details like OS version, OS features, and variant could be included in org.opencontainers.image.platform.osversion, org.opencontainers.image.platform.osfeatures, and org.opencontainers.image.platform.variant.

For example, the manifest annotations for a linux/amd64 artifact would look like this:

{
  "org.opencontainers.image.platform.architecture": "amd64",
  "org.opencontainers.image.platform.os": "linux"
}

The complete manifest containing such annotations would then look like this:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.empty.v1+json",
      "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
      "size": 2
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-10-22T15:41:20Z",
    "org.opencontainers.image.platform.architecture": "amd64",
    "org.opencontainers.image.platform.os": "linux"
  }
}

This approach is straightforward to implement for both producers and consumers, and the annotations are easy for humans to read. It also lets end users query or filter manifests by platform, based on these annotations, when listing manifests. Additionally, the annotations can be applied to platform-specific Image Indexes.
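As a rough illustration of the filtering described above, the following Python sketch selects manifests by the two proposed annotation keys. The keys are the ones proposed in this issue and are not defined by the image-spec today; the helper name `matches_platform` and the sample manifests are hypothetical.

```python
# Annotation keys as proposed in this issue (not part of the image-spec today).
ARCH_KEY = "org.opencontainers.image.platform.architecture"
OS_KEY = "org.opencontainers.image.platform.os"


def matches_platform(manifest: dict, os_name: str, arch: str) -> bool:
    """Return True if the manifest annotations declare the given OS and architecture."""
    annotations = manifest.get("annotations") or {}
    return annotations.get(OS_KEY) == os_name and annotations.get(ARCH_KEY) == arch


# Hypothetical, trimmed-down manifests as a client might see them when listing.
manifests = [
    {"artifactType": "application/vnd.unknown.artifact.v1",
     "annotations": {ARCH_KEY: "amd64", OS_KEY: "linux"}},
    {"artifactType": "application/vnd.unknown.artifact.v1",
     "annotations": {ARCH_KEY: "arm64", OS_KEY: "linux"}},
    {"artifactType": "application/vnd.unknown.artifact.v1"},  # no platform annotations
]

linux_amd64 = [m for m in manifests if matches_platform(m, "linux", "amd64")]
print(len(linux_amd64))
```

A manifest without the annotations simply does not match, so existing artifacts would be unaffected by such a filter.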

Approach 2: Adding a Platform Field in the Config Descriptor

Another approach is to add a platform field in the config descriptor to indicate the platform of the manifest. The resulting config descriptor would be similar to the multi-arch manifest descriptor in an Image Index. The config data can be empty or in any custom form.

For example, suppose the config data is empty; the config descriptor with platform information would look like this:

{
  "mediaType": "application/vnd.oci.empty.v1+json",
  "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
  "size": 2,
  "data": "e30=",
  "platform": {
    "architecture": "amd64",
    "os": "linux"
  }
}

The complete manifest containing such a config descriptor would then look like this:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30=",
    "platform": {
      "architecture": "amd64",
      "os": "linux"
    }
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.empty.v1+json",
      "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
      "size": 2
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-10-22T15:41:20Z"
  }
}

This approach allows consumers to easily extract platform information from the manifest content, and it also makes it simple for producers to add this detail. However, since the config and manifest are separate objects, there might be concerns about storing platform information for the manifest in the config descriptor.
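A minimal Python sketch of how a consumer might read the platform from the config descriptor shown above, without fetching any blob. The function names are hypothetical. The second helper checks that the embedded `data` field agrees with the declared `size` and `digest`, which holds for the empty JSON config `{}` used in the example.

```python
import base64
import hashlib

# Config descriptor copied from the example above.
config_descriptor = {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30=",
    "platform": {"architecture": "amd64", "os": "linux"},
}


def platform_string(descriptor: dict):
    """Format the descriptor's platform as os/arch[/variant], or None if absent."""
    platform = descriptor.get("platform")
    if not platform:
        return None
    parts = [platform["os"], platform["architecture"]]
    if platform.get("variant"):
        parts.append(platform["variant"])
    return "/".join(parts)


def embedded_data_is_consistent(descriptor: dict) -> bool:
    """Check that the base64 `data` field matches the declared size and sha256 digest."""
    raw = base64.b64decode(descriptor["data"])
    digest = "sha256:" + hashlib.sha256(raw).hexdigest()
    return len(raw) == descriptor["size"] and digest == descriptor["digest"]


print(platform_string(config_descriptor))
print(embedded_data_is_consistent(config_descriptor))
```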

Note

Although the Go package mentions that the platform field of a descriptor should only be used when referring to manifests, the image spec (descriptor.md) does not restrict it to manifests.

Alternative Considered: Embedding Platform Information in the Config Data

We also considered embedding the platform information directly into the config data, following the same approach used for container images. This way, consumer clients (like ORAS) can extract the platform details from the artifact config just as they do for container images.

The config payload containing platform information would look like this:

{
  "architecture": "amd64",
  "os": "linux"
}

The config descriptor would look like this:

{
  "mediaType": "application/vnd.unknown.config.v1+json",
  "digest": "sha256:9d99a75171aea000c711b34c0e5e3f28d3d537dd99d110eafbfbc2bd8e52c2bf",
  "size": 37
}

The complete manifest containing such a config would then look like this:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.unknown.config.v1+json",
    "digest": "sha256:9d99a75171aea000c711b34c0e5e3f28d3d537dd99d110eafbfbc2bd8e52c2bf",
    "size": 37
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.empty.v1+json",
      "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
      "size": 2
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-10-22T15:41:20Z"
  }
}
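For reference, a config descriptor like the one above is derived from the config payload bytes. The Python sketch below assumes the payload is serialized as compact JSON (no whitespace); that is an assumption, and it yields a 37-byte payload, which matches the `size` in the example. The digest depends on the exact bytes, so it is computed rather than compared against the value shown. The helper name `describe_config` is hypothetical.

```python
import hashlib
import json


def describe_config(payload: dict, media_type: str) -> dict:
    """Build a descriptor (mediaType, digest, size) for a JSON config payload."""
    # Assumption: compact separators. Any other serialization changes size and digest.
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(raw).hexdigest(),
        "size": len(raw),
    }


descriptor = describe_config(
    {"architecture": "amd64", "os": "linux"},
    "application/vnd.unknown.config.v1+json",
)
print(descriptor["size"], descriptor["digest"])
```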

However, we identified several problems with this approach:

  1. It requires an extra request for the consumer to fetch the config blob in order to get the platform details.
  2. Since a config can be any blob, not necessarily JSON, and can be extremely large, all consumers would need to maintain a list of acceptable config media types and apply a size limit for security reasons.
  3. It would be challenging for existing OCI artifact producers to embed the platform field if they already have their config utilized.
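To make the first two problems concrete, here is a Python sketch of the checks a consumer would likely need before trusting a config blob for platform details. The allow-list contents, the 4 MiB limit, and the `fetch_blob` callable are all hypothetical; `fetch_blob` stands in for the extra registry request mentioned in item 1.

```python
import json

# Hypothetical allow-list and size cap; every consumer would have to pick its own.
ALLOWED_CONFIG_MEDIA_TYPES = {"application/vnd.unknown.config.v1+json"}
MAX_CONFIG_SIZE = 4 * 1024 * 1024  # arbitrary limit chosen for illustration


def should_fetch_config(descriptor: dict) -> bool:
    """Decide from the descriptor alone whether the config is safe to fetch and parse."""
    return (
        descriptor.get("mediaType") in ALLOWED_CONFIG_MEDIA_TYPES
        and 0 < descriptor.get("size", 0) <= MAX_CONFIG_SIZE
    )


def platform_from_config(descriptor: dict, fetch_blob):
    """Fetch the config blob (the extra request) and extract os/architecture, or None."""
    if not should_fetch_config(descriptor):
        return None
    raw = fetch_blob(descriptor["digest"])
    if len(raw) != descriptor["size"]:
        return None  # a real client would also verify the digest here
    config = json.loads(raw)
    if not isinstance(config, dict):
        return None
    os_name, arch = config.get("os"), config.get("architecture")
    if not os_name or not arch:
        return None
    return {"os": os_name, "architecture": arch}


# In-memory stand-in for a registry blob store, keyed by the digest from the example.
digest = "sha256:9d99a75171aea000c711b34c0e5e3f28d3d537dd99d110eafbfbc2bd8e52c2bf"
blobs = {digest: b'{"architecture":"amd64","os":"linux"}'}
descriptor = {
    "mediaType": "application/vnd.unknown.config.v1+json",
    "digest": digest,
    "size": 37,
}
platform = platform_from_config(descriptor, blobs.__getitem__)
print(platform)
```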

Summary

To summarize, here are the pros and cons for the two main options:

| Option | Pros | Cons |
|---|---|---|
| Option 1: Adding Platform Annotations in the Manifest | Straightforward to implement; annotations are human-friendly to read; enables querying/filtering based on annotations; can be applied to Image Index | Introduces new annotation definitions in the image-spec |
| Option 2: Adding a Platform Field in the Config Descriptor | Straightforward to implement | Potential concerns about storing platform info for the manifest in the config descriptor |

Overall, the ORAS community favors Approach 1 due to its numerous advantages.

Request for Comments

We would love to hear your thoughts and insights on the approaches we've proposed! If you have any alternative approaches or suggestions, please share them with us.

Thank you!

Related issues on ORAS:

@sajayantony
Member

@Wwwsylvia could you also help elaborate with some example use cases. Maybe @FeynmanZhou can talk about the use and types of artifacts. This question came up on the call.

@jonjohnsonjr
Contributor

Have you considered using an image index?

@sudo-bmitch
Contributor

I'm not a fan of annotations because the platform field has the potential to grow and include fields that would need to be serialized into a string (see the proposals in wg-image-compatibility for more background).

Of the options, the config descriptor makes more sense to me. But I have concerns that they are both bad options. My biggest concern with adding it to the manifest is repeating the same data multiple times in different places. In addition to being redundant, there's the risk they don't match, or implementations pick which they populate or which they consume.

  1. It requires an extra request for the consumer to fetch the config blob in order to get the platform details.

This assumes the config blob isn't small enough to be embedded in the data field.

  3. It would be challenging for existing OCI artifact producers to embed the platform field if they already have their config utilized.

Presumably these existing use cases didn't need the platform field when they were created or already have other options.

Have you considered using an image index?

Every use case I can think of comes back to this, and it's currently the approach used by regclient. If it is a standalone artifact that is referenced directly, presumably the client already knows it is the artifact they wanted for their platform. Adding platform data implies there could be multiple platforms where an index would be used.

For referrers, you wouldn't need the platform in the referrers response because the artifact is either directly associated to the appropriate platform specific image, or if a platform lookup is needed, then an index would have the subject populated in it and clients would query the referrers response (itself an index) to find their artifact type, pull the artifact index that has the subject defined, and then pull the platform specific artifact. In each case, there will be an index to lookup the platform, either for pulling the platform specific image to query, or to pull the platform specific artifact.
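A short Python sketch of the index-based lookup described above: the client walks the `manifests` array of an Image Index and picks the descriptor whose `platform` matches. The function name and the digests are placeholders for illustration, not values from a real registry.

```python
def select_manifest(index: dict, os_name: str, arch: str, variant=None):
    """Return the first manifest descriptor in the index matching the platform, or None."""
    for descriptor in index.get("manifests", []):
        platform = descriptor.get("platform") or {}
        if platform.get("os") != os_name or platform.get("architecture") != arch:
            continue
        if variant is not None and platform.get("variant") != variant:
            continue
        return descriptor
    return None


# Hypothetical index of platform-specific artifacts; digests are placeholders.
artifact_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:" + "a" * 64, "size": 410,
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:" + "b" * 64, "size": 410,
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

chosen = select_manifest(artifact_index, "linux", "arm64")
print(chosen["digest"] if chosen else "no match")
```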

@qweeah

qweeah commented Nov 1, 2024

My biggest concern with adding it to the manifest is repeating the same data multiple times in different places.

@sudo-bmitch Do you mean the platform information in the config layer might be different from the config.platform in the manifest? If so, would it be better if we place platform info "platform":{"architecture": "amd64","os": "linux"} into the config.data in the manifest, like:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.example.v1+json",
    "digest": "sha256:75a30a7ed1a18b1fcfe59e831c35cb8eb7a629d3966804fa97b0c4fd533e8a10",
    "size": 50,
    "data": "InBsYXRmb3JtIjp7ImFyY2hpdGVjdHVyZSI6ICJhbWQ2NCIsIm9zIjogImxpbnV4In0="
  },
 ...
}
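To see what a consumer would get from the `data` field in this example, the Python sketch below decodes it. The decoded bytes are 50 bytes long, matching `size`, and contain the bare `"platform": {...}` member without enclosing braces, so a strict JSON parser rejects them as a standalone document. Wrapping the bytes in braces is done here only for illustration and is not something the example itself specifies.

```python
import base64
import json

# The data value from the example manifest above.
data = "InBsYXRmb3JtIjp7ImFyY2hpdGVjdHVyZSI6ICJhbWQ2NCIsIm9zIjogImxpbnV4In0="

raw = base64.b64decode(data)
print(raw.decode("utf-8"))
print(len(raw))

# The decoded payload is a member, not a complete JSON object.
try:
    json.loads(raw)
    is_json_document = True
except json.JSONDecodeError:
    is_json_document = False

# Illustration only: adding the enclosing braces makes it parseable.
wrapped = json.loads(b"{" + raw + b"}")
print(is_json_document, wrapped["platform"])
```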

@shizhMSFT
Contributor

if a platform lookup is needed, then an index would have the subject populated in it and clients would query the referrers response (itself an index) to find their artifact type, pull the artifact index that has the subject defined, and then pull the platform specific artifact.

This looks like a third approach.

@shizhMSFT
Contributor

My biggest concern with adding it to the manifest is repeating the same data multiple times in different places. In addition to being redundant, there's the risk they don't match, or implementations pick which they populate or which they consume.

Today we do have duplicated data for images as the platform info exists in both image index and the image config. In case they don't match, we can recreate the image index using the image config as the source of truth.

@Wwwsylvia
Author

Thanks @jonjohnsonjr and @sudo-bmitch for the prompt response!

or if a platform lookup is needed, then an index would have the subject populated in it and clients would query the referrers response (itself an index) to find their artifact type, pull the artifact index that has the subject defined, and then pull the platform specific artifact. In each case, there will be an index to lookup the platform, either for pulling the platform specific image to query, or to pull the platform specific artifact.

Could you help elaborate on this a bit more? Do you mean an Image Index containing a list of multi-platform manifests is needed anyway, or is it something else?

@sudo-bmitch
Contributor

Without more details / examples of the problem you are solving, I'm envisioning two possible solutions:

  • Runnable Index

    • Image amd64 <- artifact for amd64 with subject to platform specific image
    • Image arm64 <- artifact for arm64 with subject to platform specific image
  • Runnable Index <- artifact Index (containing platform specific artifacts) and subject to top level index

    • Image amd64
    • Image arm64

I think the former is the better approach in general since platform specific artifacts are accessible if the platform specific image is directly referenced. But in both cases, a platform specific image is dereferenced before the platform specific artifact is retrieved.

If you don't have a subject/referrer, and are pushing an artifact with platform specific variants, then an Index is appropriate. That same template would be used for artifacts that have multiple variants that are not based on platform, it just happens that OCI makes it even easier when the selector is based on platform and not an annotation.
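The first layout above can be sketched in Python as a two-step lookup: resolve the platform-specific image from the runnable index, then look up artifacts whose subject is that image. The `referrers_by_subject` dictionary is a hypothetical in-memory stand-in for the referrers response, and all digests and the artifact type are placeholders.

```python
def find_platform_artifact(index, referrers_by_subject, os_name, arch, artifact_type):
    """Dereference the platform-specific image first, then pick its referrer by type."""
    for image in index.get("manifests", []):
        platform = image.get("platform") or {}
        if platform.get("os") != os_name or platform.get("architecture") != arch:
            continue
        # The artifact descriptor needs no platform field: it is found via the image.
        for artifact in referrers_by_subject.get(image["digest"], []):
            if artifact.get("artifactType") == artifact_type:
                return artifact
    return None


# Placeholder digests for the two platform-specific images and their artifacts.
AMD64_IMAGE = "sha256:" + "1" * 64
ARM64_IMAGE = "sha256:" + "2" * 64

runnable_index = {
    "manifests": [
        {"digest": AMD64_IMAGE, "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": ARM64_IMAGE, "platform": {"architecture": "arm64", "os": "linux"}},
    ]
}
referrers_by_subject = {
    AMD64_IMAGE: [{"digest": "sha256:" + "c" * 64,
                   "artifactType": "application/vnd.unknown.artifact.v1"}],
    ARM64_IMAGE: [{"digest": "sha256:" + "d" * 64,
                   "artifactType": "application/vnd.unknown.artifact.v1"}],
}

artifact = find_platform_artifact(
    runnable_index, referrers_by_subject, "linux", "arm64",
    "application/vnd.unknown.artifact.v1",
)
print(artifact["digest"] if artifact else "no match")
```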

@sudo-bmitch
Contributor

Do you mean the platform information in the config layer might be different from the config.platform in the manifest? If so, would it be better if we place platform info "platform":{"architecture": "amd64","os": "linux"} into the config.data in the manifest...

Nope, I don't want to redefine the empty blob contents to include data.

Today we do have duplicated data for images as the platform info exists in both image index and the image config.

Indeed, there's duplication in several places, but I'd like to avoid making it worse where feasible.

I think there's a good case to be made for moving platform into the image manifest as a top level field, removing it from the config, along with removing the layer digests from the config. But that is a heavy lift and very breaking change (v2 of the spec) that I don't think anyone is ready for right now.

@Wwwsylvia
Author

  • Runnable Index

    • Image amd64 <- artifact for amd64 with subject to platform specific image
    • Image arm64 <- artifact for arm64 with subject to platform specific image

@sudo-bmitch Just wanted to confirm, did you mean a structure like this?

And the platform field would be included in the subject descriptor of the artifact manifest? Something like:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "subject": {
    "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
    "digest": "sha256:79d4fa4e64e8bee2a7f54813297eec1daed518db9bde667f8daea7b9e652e717",
    "size": 410,
    "platform": {
        "architecture": "amd64",
        "os": "linux"
    }
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.empty.v1+json",
      "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
      "size": 2
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-10-22T15:41:20Z"
  }
}

@sudo-bmitch
Contributor

@Wwwsylvia that's the correct picture, but at that point I believe including the platform anywhere in the artifact manifest would be attempting to solve a problem that doesn't exist. We don't need the platform in attestation and signature artifacts for platform specific images today: because the query for the referrer came from a platform specific image, the one running the query already knows the platform.

Is there a use case where the platform is needed in the artifact manifest?

@tianon
Member

tianon commented Nov 7, 2024

@Wwwsylvia could you also help elaborate with some example use cases. Maybe @FeynmanZhou can talk about the use and types of artifacts. This question came up on the call.

To echo, this is hard to review in the abstract without real-world use cases -- can we please get some examples of cases where having this kind of data is important?

@Wwwsylvia
Author

Thanks all for your comments! @FeynmanZhou is currently preparing the use cases and will share them later.
