{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715221937.0","currentOid":""},"activityList":{"items":[{"before":"97bf1ee9f6f76d49df50560bf792135308f289a9","after":"91da2caa409cb156a970fea0fc8355fcd8c6a2e6","ref":"refs/heads/master","pushedAt":"2024-05-14T15:39:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48263] Collate function support for non UTF8_BINARY strings\n\n### What changes were proposed in this pull request?\ncollate(\"xx\", \"\") does not work when there is a config for default collation set which configures non UTF8_BINARY collation as default.\n\n### Why are the changes needed?\nFixing the compatibility issue with default collation config and collate function.\n\n### Does this PR introduce _any_ user-facing change?\nCustomers will be able to execute collation(, ) function even when default collation config is configured to some other collation than UTF8_BINARY. We are expanding the surface area for cx.\n\n### How was this patch tested?\nAdded tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46574 from nebojsa-db/SPARK-48263.\n\nAuthored-by: Nebojsa Savic \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48263] Collate function support for non UTF8_BINARY strings"}},{"before":"47006a493f98ca85196194d16d58b5847177b1a3","after":"97bf1ee9f6f76d49df50560bf792135308f289a9","ref":"refs/heads/master","pushedAt":"2024-05-14T15:37:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite\n\n### What changes were proposed in this pull request?\nThe pr aims to remove workaround for ParquetIOSuite.\n\n### Why are the changes needed?\nAfter https://github.com/apache/spark/pull/46562 is completed, the reason why the ut `SPARK-7837 Do not close output writer twice when commitTask() fails` failed due to different event processing time sequence no longer exists, so we remove the previous workaround here.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\n- Manually test.\n- Pass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46577 from panbingkun/SPARK-47301_FOLLOWUP.\n\nAuthored-by: panbingkun \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite"}},{"before":"6e22f1108bf4c0d28b03f2618e308cde6fc7faa0","after":"a848e2790cba0b7ee77d391dc534146bd35ee50a","ref":"refs/heads/branch-3.4","pushedAt":"2024-05-14T15:35:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects\n\nSpecial case escaping for 
### [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite
*master · 2024-05-14 15:37 UTC · pushed by Wenchen Fan (cloud-fan)*

Removes the earlier workaround in ParquetIOSuite. After https://github.com/apache/spark/pull/46562, the test `SPARK-7837 Do not close output writer twice when commitTask() fails` no longer fails due to differing event-processing order, so the workaround is unnecessary. No user-facing change; verified manually and by GA.

Closes #46577 from panbingkun/SPARK-47301_FOLLOWUP. Authored by panbingkun; signed off by Wenchen Fan.

### [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects
*master · 2024-05-14 15:31 UTC, cherry-picked to branch-3.5 (15:33) and branch-3.4 (15:35) · pushed by Wenchen Fan (cloud-fan)*

Special-cases escaping for MySQL and fixes redundant escaping of the `'` character. When `startsWith`, `endsWith`, and `contains` are pushed down, they are converted to `LIKE`, which requires adding escape characters to these expressions. MySQL, however, uses `ESCAPE '\\'` syntax instead of `ESCAPE '\'`, which caused errors when pushing down. User-facing change: yes. Tested with tests for each existing dialect.

Closes #46437 from mihailom-db/SPARK-48172. Authored by Mihailo Milosevic; signed off by Wenchen Fan (cherry-picked from commit 47006a493f98).
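The dialect change itself is not shown in the feed; the sketch below only illustrates the escaping concern, using a hypothetical helper rather than Spark's actual `JdbcDialect` API:

```scala
object LikeEscapeSketch {
  // Escape LIKE wildcards and the escape character itself with a backslash.
  def escapeLike(s: String): String = s.flatMap {
    case c @ ('%' | '_' | '\\') => s"\\$c"
    case c                      => c.toString
  }

  def main(args: Array[String]): Unit = {
    val prefix = escapeLike("50%_off") // "50\%\_off"
    // Most dialects accept ESCAPE '\'; MySQL expects the backslash doubled inside the literal.
    println(s"generic: col LIKE '$prefix%' ESCAPE '\\'")
    println(s"mysql:   col LIKE '$prefix%' ESCAPE '\\\\'")
  }
}
```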
### [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remaining child is just BroadcastQueryStageExec
*master · 2024-05-14 09:33 UTC · pushed by Wenchen Fan (cloud-fan)*

A new approach to fixing [SPARK-39551](https://issues.apache.org/jira/browse/SPARK-39551). The problem occurred in `AQEPropagateEmptyRelation` when one join side is empty and the other side is a `BroadcastQueryStageExec`. This change avoids doing the propagation in that case, rather than reverting all of the `queryStagePreparationRules` results. It is a bug fix with no user-facing change. The existing test `SPARK-39551: Invalid plan check - invalid broadcast query stage` was manually verified to pass without the original fix and with this change, and a new unit test was added:

```scala
test("SPARK-48155: AQEPropagateEmptyRelation check remained child for join") {
  withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
    val (_, adaptivePlan) = runAdaptiveAndVerifyResult(
      """
        |SELECT /*+ BROADCAST(t3) */ t3.b, count(t3.a) FROM testData2 t1
        |INNER JOIN (
        |  SELECT * FROM testData2
        |  WHERE b = 0
        |  UNION ALL
        |  SELECT * FROM testData2
        |  WHERE b != 0
        |) t2
        |ON t1.b = t2.b AND t1.a = 0
        |RIGHT OUTER JOIN testData2 t3
        |ON t1.a > t3.a
        |GROUP BY t3.b
      """.stripMargin
    )
    assert(findTopLevelBroadcastNestedLoopJoin(adaptivePlan).size == 1)
    assert(findTopLevelUnion(adaptivePlan).size == 0)
  }
}
```
Before this change, the adaptive plan still contained the `Union` and a `SortMergeJoin` feeding the `BroadcastNestedLoopJoin`; after it, the empty stream side collapses to an empty `LocalTableScan` and only the `BroadcastQueryStage` remains under the `BroadcastNestedLoopJoin`.

### [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates
*master · 2024-05-14 07:53 UTC · pushed by Wenchen Fan (cloud-fan)*

Pushes throwable predicates down through aggregates and adds a unit test for "can't push down nondeterministic filter through aggregate". If a filter can be pushed through an `Aggregate`, it references only the grouping keys; since the aggregate cannot reduce the set of grouping keys, the filter sees no new data after being pushed down, so pushing a throwable filter through the aggregate does not change which exceptions are thrown. No user-facing change; covered by a UT.

Closes #44975 from zml1206/SPARK-46707-FOLLOWUP. Authored by zml1206; signed off by Wenchen Fan.
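A minimal sketch of the reasoning above, with made-up column names: the predicate references only the grouping key, so evaluating it below the aggregate cannot expose it to any new inputs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object PushDownThroughAggregate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("pushdown-agg").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10), (2, 20), (2, 30)).toDF("k", "v")
    // The predicate references only the grouping key `k`. An aggregate never produces new values
    // of `k`, so evaluating the (potentially throwing) division below the aggregate sees exactly
    // the same inputs as evaluating it above the aggregate.
    val q = df.groupBy($"k").count().where(expr("100 / k > 10"))
    q.explain(true) // the optimized plan is expected to show the Filter below the Aggregate
    spark.stop()
  }
}
```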
### [SPARK-48267][SS] Regression e2e test with SPARK-47305
*master · 2024-05-14 06:40 UTC, cherry-picked to branch-3.5 (06:47) · pushed by Jungtaek Lim (HeartSaVioR)*

Adds an end-to-end regression test for SPARK-47305. As of commit cae2248bc13 (pre-Spark 4.0), the query in the new unit test analyzes to the same shape in batch 0 and batch 1 (a union of the two stream sides joined with a `Range`-based reference table), but optimizes differently. Batch 0 keeps the batch side:

```
WriteToDataSourceV2 MicroBatchWrite[epoch: 0, writer: ...]
+- Join Inner
   :- StreamingDataSourceV2ScanRelation[value#1] MemoryStreamDataSource
   +- Project
      +- Filter (1 = id#12L)
         +- Range (1, 5, step=1, splits=Some(2))
```

whereas in batch 1 the batch side is pruned out, which goes through the `PruneFilters` path:

```
WriteToDataSourceV2 MicroBatchWrite[epoch: 1, writer: ...]
+- Join Inner
   :- StreamingDataSourceV2ScanRelation[value#3] MemoryStreamDataSource
   +- LocalRelation
```
The batch-1 plan is reached through the following optimization sequence:

1. The left stream side is collapsed into an empty local relation.
2. The union is replaced with the subtree for the right stream side, since the left stream side is simply an empty local relation.
3. The value of the `code` column is now known to be `null`, and this is propagated into the join criteria (`null = ref_code`).
4. The join criteria are extracted from the join and pushed to the batch side.
5. The `ref_code` column can never be null, so the filter is optimized to `filter false`.
6. `filter false` triggers `PruneFilters`, where SPARK-47305 fixed a bug: before that fix, the new empty local relation was incorrectly marked as streaming.

These details are intentionally not recorded as code comments, since optimization results are subject to change across Spark versions. The PR for SPARK-47305 only added a unit test for the fix and was not end-to-end for the workload that hit the issue; given the complexity of the optimizer, an end-to-end reproducer (even simplified) is worth keeping as a regression test. No user-facing change; tested with the new UT.

Closes #46569 from HeartSaVioR/SPARK-48267. Authored and signed off by Jungtaek Lim.
### [SPARK-48157][SQL] Add collation support for CSV expressions
*master · 2024-05-14 06:17 UTC · pushed by Wenchen Fan (cloud-fan)*

Introduces collation awareness for the CSV expressions `from_csv`, `schema_of_csv`, and `to_csv`. Users can now pass collated strings as arguments to these functions. Covered by end-to-end SQL tests.

Closes #46504 from uros-db/csv-expressions. Authored by Uros Bojanic; signed off by Wenchen Fan.
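A short usage sketch of the three CSV expressions touched by this change; the collation aspect itself depends on the session's collation settings and is not shown here:

```scala
import org.apache.spark.sql.SparkSession

object CsvExpressions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-exprs").getOrCreate()
    // The three expressions covered by the change; with it, collated string arguments work too.
    spark.sql("SELECT schema_of_csv('1,abc')").show(false)
    spark.sql("SELECT from_csv('1,abc', 'a INT, b STRING')").show(false)
    spark.sql("SELECT to_csv(named_struct('a', 1, 'b', 'abc'))").show(false)
    spark.stop()
  }
}
```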
### [SPARK-48229][SQL] Add collation support for inputFile expressions
*master · 2024-05-14 06:08 UTC · pushed by Wenchen Fan (cloud-fan)*

Introduces collation awareness for the inputFile expression `input_file_name`, so collated strings can be used with it. Covered by end-to-end SQL tests.

Closes #46503 from uros-db/input-file-block. Authored by Uros Bojanic; signed off by Wenchen Fan.
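A small sketch of `input_file_name` for context, assuming a local session and a temporary Parquet directory:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object InputFileNameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("input-file-name").getOrCreate()
    import spark.implicits._

    val dir = java.nio.file.Files.createTempDirectory("input-file-demo").toString
    Seq("a", "b", "c").toDF("value").write.mode("overwrite").parquet(dir)
    // input_file_name() reports the file each row was read from; with the change its result
    // also cooperates with collated string comparisons.
    spark.read.parquet(dir).select($"value", input_file_name().as("file")).show(false)
    spark.stop()
  }
}
```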
### [SPARK-48265][SQL] Infer window group limit batch should do constant folding
*master · 2024-05-14 05:44 UTC, cherry-picked to branch-3.5 (05:45) · pushed by Wenchen Fan (cloud-fan)*

After `PropagateEmptyRelation` removes an empty `Union` branch, the plan can end up with a doubled local limit:

```
GlobalLimit 21
+- LocalLimit 21
   +- LocalLimit 21
      +- Project [item_id#647L]
         +- Filter (xxxx)
            +- Relation db.table[,... 91 more fields] parquet
```

After `EliminateLimits` in the "Infer window group limit" batch this becomes:

```
GlobalLimit 21
+- LocalLimit least(21, 21)
   +- Project [item_id#647L]
      +- Filter (xxxx)
         +- Relation db.table[,... 91 more fields] parquet
```

which cannot work because the batch is missing a `ConstantFolding` pass. This is a bug fix with no user-facing change.

Closes #46568 from AngersZhuuuu/SPARK-48265. Authored by Angerszhuuuu; signed off by Wenchen Fan.

### [SPARK-48027][SQL][FOLLOWUP] Add comments for the other code branch
*master · 2024-05-14 05:26 UTC · pushed by Wenchen Fan (cloud-fan)*

Adds comments for the other code branch, which were missing from https://github.com/apache/spark/pull/46263. No user-facing change.

Closes #46536 from beliefer/SPARK-48027_followup. Authored by beliefer; signed off by Wenchen Fan.

### [SPARK-48266][CONNECT] Move package object `org.apache.spark.sql.connect.dsl` to test directory
*master · 2024-05-14 05:10 UTC · pushed by Hyukjin Kwon (HyukjinKwon)*

As discussed in https://github.com/apache/spark/pull/46559, the package object `org.apache.spark.sql.connect.dsl` is now used only for testing, so it moves from `src/main/scala/org/apache/spark/sql/connect/dsl` to `src/test/scala/org/apache/spark/sql/connect/dsl`. Code used only for testing should live in the test directory. No user-facing change; passes GitHub Actions.

Closes #46567 from LuciferYang/SPARK-48266. Authored by yangjie01; signed off by Hyukjin Kwon.

### [SPARK-48241][SQL][3.5] CSV parsing failure with char/varchar type columns
*branch-3.5 · 2024-05-14 05:07 UTC · pushed by Wenchen Fan (cloud-fan)*

Backport of SPARK-48241 to branch-3.5; see the master entry below for the full description. A new test case is added in `CSVSuite`.

Closes #46565 from liujiayi771/branch-3.5-SPARK-48241. Authored by joey.ljy; signed off by Wenchen Fan.
### [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework
*master · 2024-05-14 04:04 UTC · pushed by Gengliang Wang (gengliangwang)*

Migrates `error/warn/info` calls with variables in the `common` module (Java side) to the structured logging framework, and converts all dependencies on `org.slf4j.Logger`/`org.slf4j.LoggerFactory` to `org.apache.spark.internal.Logger`/`org.apache.spark.internal.LoggerFactory`, so that importing the SLF4J classes directly in Java code can later be prohibited entirely. This enhances Spark's logging system through structured logging. No user-facing change; passes GA.

Closes #46493 from panbingkun/common_java_sl. Authored by panbingkun; signed off by Gengliang Wang.
### [SPARK-41794][FOLLOWUP] Add `try_remainder` to Python API references
*master · 2024-05-14 03:52 UTC · pushed by Ruifeng Zheng (zhengruifeng)*

Adds `try_remainder` to the Python API references; new methods should be listed there. Documentation-only change, verified by CI.

Closes #46566 from zhengruifeng/doc_try_remainder. Authored and signed off by Ruifeng Zheng.

### [SPARK-48259][CONNECT][TESTS] Add 3 missing methods in dsl
*master · 2024-05-14 02:16 UTC · pushed by Ruifeng Zheng (zhengruifeng)*

Adds three missing methods to the Connect test DSL so they can be used in tests. Test-only change, verified by CI.

Closes #46559 from zhengruifeng/missing_3_func. Authored and signed off by Ruifeng Zheng.

### [SPARK-48260][SQL] Disable output committer coordination in one test of ParquetIOSuite
*master · 2024-05-13 23:52 UTC · pushed by Gengliang Wang (gengliangwang)*

The `ParquetIOSuite` test `SPARK-7837 Do not close output writer twice when commitTask() fails` is flaky because of a race condition. The test injects an error into the task-committing step, and the job can fail in two ways: (1) the task gets the driver's permission to commit but the commit fails, which triggers a stage failure because it may mean data duplication (see https://github.com/apache/spark/pull/36564); or (2) task retry is disabled in the test, so `TaskSetManager` aborts the stage. Both failures are reported by sending an event to `DAGScheduler`, so the final job failure depends on which event is processed first. That would normally not matter, but the test checks the error class. This change fixes the flaky test by running the test case in a new suite with output committer coordination disabled. No user-facing change; verified by GA and a manual local run.

Closes #46562 from gengliangwang/fixParquetIO. Authored and signed off by Gengliang Wang.
### [SPARK-44953][CORE] Log a warning when shuffle tracking is enabled alongside another DA supported mechanism
*master · 2024-05-13 20:36 UTC · pushed by asfgit*

Logs a warning when shuffle tracking is enabled alongside another dynamic-allocation supported mechanism. Some users enable both shuffle tracking and another mechanism (such as shuffle decommission/migration) and are then confused when their jobs do not scale down (https://issues.apache.org/jira/browse/SPARK-44953). User-facing change: yes, users now see the warning when both are enabled.

Closes #45454 from zwangsheng/SPARK-44953. Authored by zwangsheng; signed off by Holden Karau.

### [SPARK-41794][SQL] Add `try_remainder` function and re-enable column tests
*master · 2024-05-13 17:43 UTC · pushed by Gengliang Wang (gengliangwang)*

While re-enabling the ANSI-mode tests for Spark Connect, it turned out there was no `try_*` equivalent for the remainder operation. This patch adds the `try_remainder` function in Scala, Python, and Spark Connect, with the required testing. User-facing change: yes, `try_remainder` behaves according to ANSI semantics for division by zero. Tested with new UT and E2E tests.

Closes #46434 from grundprinzip/grundprinzip/SPARK-41794. Lead-authored by Martin Grund, co-authored by Hyukjin Kwon; signed off by Gengliang Wang.
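A minimal usage sketch of the new function, assuming a build that contains this change:

```scala
import org.apache.spark.sql.SparkSession

object TryRemainderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("try-remainder").getOrCreate()
    // Unlike `%` under ANSI mode, try_remainder returns NULL for a zero divisor instead of failing.
    spark.sql("SELECT try_remainder(7, 3) AS ok, try_remainder(7, 0) AS div_by_zero").show()
    spark.stop()
  }
}
```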
### [SPARK-48241][SQL] CSV parsing failure with char/varchar type columns
*master · 2024-05-13 14:42 UTC · pushed by Wenchen Fan (cloud-fan)*

Selecting from a CSV table containing char or varchar columns failed:

```
spark-sql (default)> show create table test_csv;
CREATE TABLE default.test_csv (
  id INT,
  name CHAR(10))
USING csv
```
```
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct).
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
```

For char and varchar types, Spark converts them to `StringType` in `CharVarcharUtils.replaceCharVarcharWithStringInSchema` and records `__CHAR_VARCHAR_TYPE_STRING` in the metadata. The error occurs because the `StringType` columns in `UnivocityParser`'s `dataSchema` and `requiredSchema` are not consistent: the `StringType` in the `dataSchema` carries that metadata while the metadata in the `requiredSchema` is empty. The fix retains the metadata when resolving the schema. No user-facing change; a new test case is added in `CSVSuite`.

Closes #46537 from liujiayi771/csv-char. Authored by joey.ljy; signed off by Wenchen Fan.
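A reproducer shaped after the commit message, assuming a local session with the default warehouse; before the fix, the final SELECT is the statement that hits the `requiredSchema`/`dataSchema` requirement error:

```scala
import org.apache.spark.sql.SparkSession

object CsvCharVarcharRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-char").getOrCreate()
    spark.sql("CREATE TABLE test_csv (id INT, name CHAR(10)) USING csv")
    spark.sql("INSERT INTO test_csv VALUES (1, 'spark')")
    // Before the fix this SELECT failed the requiredSchema-subset-of-dataSchema check.
    spark.sql("SELECT * FROM test_csv").show()
    spark.sql("DROP TABLE test_csv")
    spark.stop()
  }
}
```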
### [SPARK-47681][FOLLOWUP] Fix schema_of_variant(decimal)
*master · 2024-05-13 14:37 UTC · pushed by Wenchen Fan (cloud-fan)*

https://github.com/apache/spark/pull/46338 found that `schema_of_variant` sometimes could not handle variant decimals correctly and included a fix, but the fix was incomplete and `schema_of_variant` can still fail on some inputs. The reason is that `VariantUtil.getDecimal` calls `stripTrailingZeros`: for an input decimal `10.00` the resulting scale is -1 with an unscaled value of 1, and a negative decimal scale is not allowed by Spark. The correct approach is to construct a `Decimal` from the `BigDecimal` and read its precision and scale, as `VariantGet` already does. The change also removes a duplicated expression in `VariantGet` that was computed twice. These are bug fixes required to process decimals correctly, with no user-facing change. More unit tests are added; some (e.g., `check("10.00", "DECIMAL(2,0)")`) would fail without this change, and the others improve test coverage.

Closes #46549 from chenhao-db/fix_decimal_schema. Authored by Chenhao Li; signed off by Wenchen Fan.
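The negative-scale behaviour can be seen with plain `java.math.BigDecimal`, independent of Spark's variant code:

```scala
object NegativeScaleDemo {
  def main(args: Array[String]): Unit = {
    val d = new java.math.BigDecimal("10.00")
    val stripped = d.stripTrailingZeros()
    // "10.00" strips to 1E+1: unscaled value 1 with scale -1, and Spark rejects negative scales.
    println(s"stripped=$stripped scale=${stripped.scale()} unscaled=${stripped.unscaledValue()}")
    // Reading precision and scale from the original BigDecimal keeps a valid non-negative scale.
    println(s"precision=${d.precision()} scale=${d.scale()}") // precision=4, scale=2
  }
}
```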
### [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression
*master · 2024-05-13 14:28 UTC · pushed by Wenchen Fan (cloud-fan)*

Adds more testing for `RewriteWithExpression` around `Window` operators, which can be fragile with `WindowExpression`s. No user-facing change; unit tests.

Closes #46492 from kelvinjian-db/SPARK-48206-window. Authored by Kelvin Jiang; signed off by Wenchen Fan.

### [SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference tests at test_parity_types.py
*master · 2024-05-13 14:22 UTC · pushed by Hyukjin Kwon (HyukjinKwon)*

Enables some array-inference tests in test_parity_types.py for better Spark Connect test coverage. Test-only change, verified by CI.

Closes #46550 from HyukjinKwon/SPARK-48250. Authored and signed off by Hyukjin Kwon.

### [SPARK-48031][SQL] Support view schema evolution
*master · 2024-05-13 14:22 UTC · pushed by Wenchen Fan (cloud-fan)*

Adds a schema-binding clause to CREATE VIEW:

```
CREATE [ OR REPLACE ] [ TEMPORARY ] VIEW [ IF NOT EXISTS ] view_name
    [ column_list ]
    [ schema_binding ]
    [ COMMENT view_comment ]
    [ TBLPROPERTIES clause ]
    AS query

schema_binding
    WITH SCHEMA { BINDING | [TYPE] EVOLUTION | COMPENSATION }

column_list
    ( { column_alias [ COMMENT column_comment ] } [, ...] )
```

The binding can also be changed later:

```
ALTER VIEW view_name schema_binding
```
The clause optionally specifies how the view adapts when the schema of its query changes due to changes in the underlying object definitions; it is not supported for temporary views. The semantics are:

- **BINDING** — the view becomes invalid if the query column list changes, except when the column list includes a star clause and only additional columns appear (they are ignored), or when column types change in a way that allows them to be safely cast under implicit casting rules.
- **COMPENSATION** — like BINDING, but type changes are tolerated if the columns can be cast under explicit cast rules. This is the default behavior.
- **TYPE EVOLUTION** — the view adopts any type changes in the query column list into its own definition when such a change is detected upon referencing the view.
- **EVOLUTION** — behaves like TYPE EVOLUTION and additionally adopts changes in column names and added or dropped columns when the view has no explicit column list. The view is only invalidated if the query can no longer be parsed or the optional view column_list no longer matches the number of expressions in the query select list.

A new SQL config, `spark.sql.defaultViewSchemaBinding`, controls the default behavior when the underlying schema changes; valid values are `COMPENSATION` (any supported casts) and `DISABLED` (disable the feature). Schema changes are frequent, especially when ingesting data: types may need widening, and columns or struct fields get added. The traditional SCHEMA BINDING behavior makes schema evolution hard because it invalidates views aggressively, causing errors that require user intervention, so letting views tolerate changes in the underlying schema improves uptime. User-facing change: yes, new grammar plus a config to influence its default. New tests are added.

Closes #46267 from srielau/SPARK-48031-view-evolution. Authored by Serge Rielau; signed off by Wenchen Fan.
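A minimal sketch of the new syntax on a persistent (non-temporary) view, assuming a build that contains this change:

```scala
import org.apache.spark.sql.SparkSession

object ViewSchemaEvolutionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("view-evolution").getOrCreate()
    spark.sql("CREATE TABLE src (id INT, name STRING) USING parquet")
    // The view adopts type and column changes of the underlying query when it is referenced.
    spark.sql("CREATE VIEW v WITH SCHEMA EVOLUTION AS SELECT * FROM src")
    spark.sql("SELECT * FROM v").show()
    // The binding can be tightened again later.
    spark.sql("ALTER VIEW v WITH SCHEMA BINDING")
    spark.stop()
  }
}
```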
### [SPARK-48257][BUILD] Polish POM for Hive dependencies
*master · 2024-05-13 13:59 UTC · pushed by YangJie (LuciferYang)*

Cleans up the POM for Hive dependencies: (1) `org.apache.hive` and `${hive.group}` coexist in `pom.xml`, so they are unified to `${hive.group}`; (2) the `hive23.version` and `hive.version.short` definitions date from the Spark 3.0 era, when Hive 1.2 and Hive 2.3 had to be distinguished, and are removed as obsolete; (3) outdated comments are updated or removed, e.g. the comment for the Hive LOG4J exclusion (Spark has already switched to LOG4J2) and the comments for the Hive Parquet/Jetty exclusions are generalized. No user-facing change; passes CI.

Closes #46558 from pan3793/SPARK-48257. Authored by Cheng Pan; signed off by yangjie01.

### Revert "[SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference tests at test_parity_types.py"
*master · 2024-05-13 11:13 UTC · pushed by Hyukjin Kwon (HyukjinKwon)*

Reverts commit 13b0d1aab36740293814ce54e38cb4d86f8b762d.

### [SPARK-48254][BUILD] Enhance Guava version extraction rule in `dev/test-dependencies.sh`
*master · 2024-05-13 09:59 UTC · pushed by Hyukjin Kwon (HyukjinKwon)*

Enhances the Guava version extraction regex in `dev/test-dependencies.sh`. The existing regex `^[0-9.]+$` only matches older Guava versions such as 14.0.1, while recent Guava versions carry a `-jre` or `-android` suffix, for example `33.1.0-jre`. No user-facing change; manually tested by upgrading Guava to `33.1.0-jre`.

Closes #46555 from pan3793/SPARK-48254. Authored by Cheng Pan; signed off by Hyukjin Kwon.
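The exact regex adopted by the script is not shown in the feed, so the widened pattern below is only a hypothetical illustration of the idea:

```scala
object GuavaVersionRegex {
  def main(args: Array[String]): Unit = {
    val oldPattern = "^[0-9.]+$"
    // Hypothetical widened pattern; the regex actually adopted in dev/test-dependencies.sh may differ.
    val newPattern = "^[0-9.]+(-(jre|android))?$"
    for (v <- Seq("14.0.1", "33.1.0-jre", "33.1.0-android")) {
      println(s"$v old=${v.matches(oldPattern)} new=${v.matches(newPattern)}")
    }
  }
}
```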