Fix incorrect partitionValues_parsed with id & name column mapping in Delta Lake #24129
@@ -161,7 +161,7 @@ public void write(CheckpointEntries entries, TrinoOutputFile outputFile)
}
List<DeltaLakeColumnHandle> partitionColumns = extractPartitionColumns(entries.metadataEntry(), entries.protocolEntry(), typeManager);
List<RowType.Field> partitionValuesParsedFieldTypes = partitionColumns.stream()
Let's add a corresponding test in io.trino.plugin.deltalake.transactionlog.checkpoint.TestCheckpointWriter.
Can you share the scenarios you want to cover in that class? I intentionally avoided it. Neither TestCheckpointWriter nor TestCheckpointEntryIterator is suitable for verifying the partitionValues_parsed field, because AddFileEntry doesn't hold the value.
I was thinking about a test similar to io.trino.plugin.deltalake.transactionlog.checkpoint.TestCheckpointEntryIterator#testReadAddEntriesPartitionPruning, with corresponding resource files.
@@ -183,7 +183,7 @@ public RowType getAddEntryType(
List<DeltaLakeColumnHandle> partitionColumns = extractPartitionColumns(metadataEntry, protocolEntry, typeManager);
if (!partitionColumns.isEmpty()) {
List<RowType.Field> partitionValuesParsed = partitionColumns.stream()
.map(column -> RowType.field(column.columnName(), typeManager.getType(getTypeSignature(DeltaHiveTypeTranslator.toHiveType(column.type())))))
Please add a test in io.trino.plugin.deltalake.transactionlog.checkpoint.TestCheckpointEntryIterator.
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeBasic.java
ee3627b to a60bbd3
@@ -161,7 +161,7 @@ public void write(CheckpointEntries entries, TrinoOutputFile outputFile)
 }
 List<DeltaLakeColumnHandle> partitionColumns = extractPartitionColumns(entries.metadataEntry(), entries.protocolEntry(), typeManager);
 List<RowType.Field> partitionValuesParsedFieldTypes = partitionColumns.stream()
-.map(column -> RowType.field(column.columnName(), column.type()))
+.map(column -> RowType.field(column.basePhysicalColumnName(), column.type()))
Looks good now
"PartitionValues": {
"col-6d32b73c-d46b-47f3-aeee-b4ce2231c81f": "30"
},
"PartitionValues_parsed": {
"Col456d32b73c45d46b4547f345aeee45b4ce2231c81f": 30
}
Note the missing dashes.
try (TestTable table = new TestTable(
getQueryRunner()::execute,
"test_checkpoint",
"(x int, part int) WITH (checkpoint_interval = 3, column_mapping_mode = '" + columnMappingMode + "', partitioned_by = ARRAY['part'])")) {
Can you also add a test for other types (like Date) that have different representations in PartitionValues and PartitionValues_parsed?
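To make the Date case concrete, here is a minimal sketch of the two representations, assuming the usual Delta convention that partitionValues holds every value as a string while partitionValues_parsed holds it in the column's native type (for a Parquet DATE, days since the epoch). The class and method names are hypothetical and are not part of Trino.

```java
import java.time.LocalDate;

// Hypothetical illustration only; none of these names exist in Trino.
public class PartitionDateRepresentation
{
    // partitionValues: the date is serialized as an ISO-8601 string (yyyy-MM-dd)
    static String asPartitionValue(LocalDate date)
    {
        return date.toString();
    }

    // partitionValues_parsed: the same date as a typed value,
    // shown here as the number of days since 1970-01-01
    static long asParsedEpochDay(String partitionValue)
    {
        return LocalDate.parse(partitionValue).toEpochDay();
    }

    public static void main(String[] args)
    {
        String raw = asPartitionValue(LocalDate.of(2024, 1, 15));
        System.out.println("partitionValues:        \"" + raw + "\"");
        System.out.println("partitionValues_parsed: " + asParsedEpochDay(raw));
    }
}
```

A test for such a type would need to check both forms, since an int partition column like the one above looks nearly identical in the two fields and would not catch a conversion mistake.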
testPartitionValuesParsedCheckpoint(ColumnMappingMode.NONE);
}

private void testPartitionValuesParsedCheckpoint(ColumnMappingMode columnMappingMode)
Should we also have a product test in TestDeltaLakeColumnMappingMode to check reading and writing checkpoints by Trino and Delta?
(Changed to draft to avoid an accidental merge)
Description
We should write physical column names in the partitionValues_parsed field of checkpoint files.
Fixes #24121
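As an illustration of the intent, the sketch below picks the field names for partitionValues_parsed from a logical-to-physical name mapping. It is a simplified stand-in, not the Trino implementation: the class, the method, and the map-based model are hypothetical, and the pairing of part with the physical name shown is assumed for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; Trino uses DeltaLakeColumnHandle, not a plain map.
public class PartitionValuesParsedNames
{
    // Returns the field names to write into partitionValues_parsed.
    // With id or name column mapping, the physical name differs from the logical one;
    // without mapping, the logical name is used as-is.
    static List<String> parsedFieldNames(Map<String, String> logicalToPhysical, List<String> partitionedBy)
    {
        return partitionedBy.stream()
                .map(logical -> logicalToPhysical.getOrDefault(logical, logical))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        // Assumed mapping for the example: logical column "part" and its physical name
        Map<String, String> mapping = Map.of("part", "col-6d32b73c-d46b-47f3-aeee-b4ce2231c81f");

        // Column mapping enabled: the physical name is written
        System.out.println(parsedFieldNames(mapping, List.of("part")));

        // No column mapping: the logical name is written unchanged
        System.out.println(parsedFieldNames(Map.of(), List.of("part")));
    }
}
```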
Release notes