[SPARK-50286][SQL] Correctly propagate SQL options to WriteBuilder #48822
Conversation
```diff
@@ -44,7 +46,7 @@ object V2Writes extends Rule[LogicalPlan] with PredicateHelper {

   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
     case a @ AppendData(r: DataSourceV2Relation, query, options, _, None, _) =>
-      val writeBuilder = newWriteBuilder(r.table, options, query.schema)
+      val writeBuilder = newWriteBuilder(r.table, r.options.asScala.toMap ++ options, query.schema)
```
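Worth noting the merge semantics here (a minimal, self-contained illustration; the key and values are made up):

```scala
// Scala's Map `++` is right-biased: on duplicate keys the right-hand operand
// wins, so in `r.options.asScala.toMap ++ options` the per-write options
// override the relation (SQL-provided) options.
val merged = Map("split-size" -> "from-sql") ++ Map("split-size" -> "from-write")
assert(merged("split-size") == "from-write")
```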
can we add an assert that only one of them can be non-empty?
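A minimal sketch of what such an assert could look like, using the names bound in the `AppendData` match above (hypothetical wording and placement, not the final change):

```scala
// Hypothetical guard inside the AppendData case: a plan built through SQL
// carries options on the relation (r.options), while one built through the
// DataFrame API carries them as write options (options), so both being
// non-empty at once is unexpected.
assert(r.options.isEmpty || options.isEmpty,
  "Only one of relation options and write options is expected to be set.")
```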
```scala
import org.apache.spark.sql.execution.CommandResultExec
import org.apache.spark.sql.execution.datasources.v2._

class DataSourceV2OptionSuite extends DatasourceV2SQLBase {
```
Suggested change:

```diff
-class DataSourceV2OptionSuite extends DatasourceV2SQLBase {
+class DataSourceV2OptionSQLSuite extends DatasourceV2SQLBase {
```
since this is testing SQL API only.
This is a good catch!
+1, LGTM. Thank you for the fix, @pan3793.
Thanks, I did not realize this. It looks from @cloud-fan's comment that only one can be set?
```scala
    }
  }

  test("SPARK-36680, SPARK-50286: Supports Dynamic Table Options for SQL Insert Overwrite") {
```
This is my fault, but we could optionally change the first JIRA IDs in these tests to SPARK-49098, as it's the one that added the support for inserts?
@szehon-ho yes, as mentioned in the description, DataFrame's API
Wait, I forgot one thing. In fact, I submitted a PR to Iceberg to support this feature, but unfortunately, that patch doesn't seem to be getting attention. @szehon-ho, do you think we can re-open this PR and get it in? If so, the assumption would not hold,
and we should define the priority. I think it should be:
1. options set by SQL (carried by `r.options`)
2. options set by the DataFrame API (carried by `writeOptions`)
3. options derived from session configs (`spark.datasource.*`)
Currently, if there are duplicated options, 2 overrides 3; see `spark/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala`, lines 141 to 142 at commit c1968a1.
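The merge referenced there behaves like the following (a self-contained illustration with made-up values, mirroring the filter-then-concatenate shape of those lines, not a verbatim quote):

```scala
// Session-derived defaults ("3") are dropped for any key that was also set
// explicitly via .option(...) ("2"), so explicit write options win on
// duplicate keys.
val sessionOptions = Map("split-size" -> "from-session-conf", "mode" -> "x")
val extraOptions   = Map("split-size" -> "from-option-call")
val finalOptions =
  sessionOptions.filter { case (k, _) => !extraOptions.contains(k) } ++ extraOptions
assert(finalOptions("split-size") == "from-option-call")
```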
@cloud-fan, do you think the proposed priority makes sense, or do you have any new ideas?
yea 3 should have lower priority.
What changes were proposed in this pull request?
SPARK-49098 introduced a SQL syntax to allow users to set table options on DSv2 write cases, but unfortunately, the options set by SQL are not propagated correctly to the underlying DSv2 `WriteBuilder`.
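For illustration, the two forms in question look roughly like this (the table name and option key are placeholders, and the exact SQL form of the dynamic table options syntax from SPARK-36680/SPARK-49098 is an assumption):

```scala
// SQL path: options attached with the dynamic table options syntax are
// carried on the relation (r.options).
spark.sql("INSERT INTO testcat.ns.tbl WITH (`write.split-size` = 10) SELECT * FROM src")

// DataFrame path: options attached via .option(...) are carried as writeOptions.
spark.table("src").writeTo("testcat.ns.tbl").option("write.split-size", "10").append()
```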
From the user's perspective, the above two are equivalent, but the internal implementations differ slightly. Both of them construct an `AppendData` plan, but the SQL `options` are carried by `r.options`, while the `DataFrame` API `options` are carried by `writeOptions`. Currently, only the latter is propagated to the `WriteBuilder`, and the former is silently dropped. This PR fixes the issue by merging those two `options`.

An additional question: if the user only uses the SQL or `DataFrame` API to construct the query, only one "options" will be filled, but if the user assembles the `LogicalPlan` directly, there is a chance that `r.options` and `writeOptions` contain duplicated pairs; which one should take effect?

Why are the changes needed?
Correctly propagate SQL options to `WriteBuilder`, to complete the feature added in SPARK-49098, so that DSv2 implementations like Iceberg can benefit.

Does this PR introduce any user-facing change?
No, it's an unreleased feature.
How was this patch tested?
UTs added by SPARK-36680 and SPARK-49098 are updated to also check that SQL `options` are correctly propagated to the physical plan.

Was this patch authored or co-authored using generative AI tooling?
No.