Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3: Disable strong integrity checksums #12264

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ebyhr
Copy link
Contributor

@ebyhr ebyhr commented Feb 14, 2025

The recent AWS SDK bump introduced strong integrity checksums, and broke compatibility with many S3-compatible object storages (pre-2025 Minio, Vast, Dell EC etc).

In Trino project, we received the error report (Missing required header for this request: Content-Md5) from several users and had to disable the check temporarily. We recommend disabling it in Iceberg as well. I faced this issue when I tried upgrading Iceberg library to 1.8.0 in Trino.

Relates to trinodb/trino#24954 & trinodb/trino#24713

@github-actions github-actions bot added the AWS label Feb 14, 2025
@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from e629e83 to 83185e7 Compare February 14, 2025 05:02
// TODO Remove me once all of the S3-compatible storage support strong integrity checks
@SuppressWarnings("deprecation")
static AwsRequestOverrideConfiguration disableStrongIntegrityChecksums() {
return AwsRequestOverrideConfiguration.builder().signer(AwsS3V4Signer.create()).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, is the issue in compatibility is that older versions of Minio/3rd party object storage solutions expect Content-MD5 and in the new SDK we are not sending that and so the service rejects the request? Still feels like there should be a different way to force setting MD5

Copy link
Contributor Author

@ebyhr ebyhr Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting WHEN_REQUIRED for checksum calculation/validation doesn't resolve as far as I tested. Calculating MD5 for PutObjectRequest looks feasible, but I'm not sure how to do it for DeleteObjectsRequest.

@wendigo Do you happen to know any other approaches?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the same issue on the PyIceberg: apache/iceberg-python#1546 How about making this configurable?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr no. The only workaround that I have found is to force previous signature and it works against Dell ECS, old Minio versions and Ozone. I haven't tested Vast since I don't have access to it

@@ -149,4 +151,10 @@ static void configurePermission(
Function<ObjectCannedACL, S3Request.Builder> aclSetter) {
aclSetter.apply(s3FileIOProperties.acl());
}

// TODO Remove me once all of the S3-compatible storage support strong integrity checks
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intent but this doesn't feel like a practical way of going about this just because there's many out there and "all" just seems like a moving goalpost.

@@ -29,15 +29,18 @@
import software.amazon.awssdk.services.s3.S3ClientBuilder;

public class MinioUtil {
public static final String LATEST_TAG = "latest";
// This version doesn't support strong integrity checks
static final String LEGACY_TAG = "RELEASE.2024-12-18T13-15-44Z";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try and roll back all of the changes in the PR and then only update the TAG to RELEASE.2025-01-20T14-49-07Z (see also minio/minio#20845 (comment)). I've tried it locally and that fixed the issue described in #12237 when running those tests locally with Docker Desktop

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra it does, but it won't still work with some other S3-compatible storages like Vast, Dell ECS, so upgrading Minio to compatible version doesn't solve the issue

Copy link
Contributor Author

@ebyhr ebyhr Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra We intentionally specified pre-2025 here. Using a newer tag hides the actual problem.

@nastra
Copy link
Contributor

nastra commented Feb 19, 2025

I think we should revert the AWS SDK version (#12339) for 1.8.1 and then properly fix it for 1.9.0. @ebyhr does that make sense?

@wendigo
Copy link

wendigo commented Feb 19, 2025

@nastra that will work :)

@RussellSpitzer
Copy link
Member

@ebyhr How hard would it be for us to get some integration tests with one of these systems into the Iceberg project? Seems like we should have some coverage for these other S3-Compat systems. I'd also be ok with a separate project that we just use as a canary before release.

@nk1506
Copy link
Contributor

nk1506 commented Feb 21, 2025

How hard would it be for us to get some integration tests with one of these systems into the Iceberg project? Seems like we should have some coverage for these other S3-Compat systems. I'd also be ok with a separate project that we just use as a canary before release.

Hi @RussellSpitzer,
@jbonofre had a similar vision, which led to the creation of Project Iceland with the goal of providing integration test coverage for various catalogs. Initially, the plan was to cover all catalog types, but given the recent focus on REST and Polaris, we are currently prioritizing REST-based catalogs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants