Releases · jorgecarleitao/arrow2

This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

v0.17.0 (latest)

Released 16 Jul 01:59 by sundy-li, commit 73ed7c8

What's Changed

Fixed writing nested parquet by @jorgecarleitao in #1390
Fixed error in writing sliced binary by @jorgecarleitao in #1391
Fixed broken guide link by @kjschiroo in #1395
Changed methods to slice arrays by @jorgecarleitao in #1396
Fixed writing of sliced arrays to parquet by @jorgecarleitao in #1397
Simplified code via DRY by @jorgecarleitao in #1398
Improved API of getting mutable from Buffer by @jorgecarleitao in #1399
Simplified code by @jorgecarleitao in #1401
Improved support for date64 written by pyarrow to parquet by @jorgecarleitao in #1402
Fixed nested boolean offset by @ritchie46 in #1404
Added apply_validity and set_validity to mutable utf8 array by @Arty-Maly in #1406
Fixed ahash dependency for wasm by @hzuo in #1407
Added cast for FixedSizeBinary to (Large)Binary by @ritchie46 in #1403
Updated base64 to 0.21 by @WindSoilder in #1408
Fixed statistics writing flag and corrected null_count in dictionaries by @ritchie46 in #1414
Added convenience accessor array.get by @ozgrakkurt in #1416
Re-exported the bloom_filter module from parquet2 crate by @ozgrakkurt in #1420
Added support for MapArray read and write to parquet by @b41sh in #1419
Added support for decimal256 read/write in parquet by @TCeason in #1412
Added support for JSON serialization of dictionary by @ritchie46 in #1424
Added MapScalar by @b41sh in #1428
Changed JSON serialization to encode float::Inf as null by @SimonSchneider in #1427
Added set_len method to Buffer by @haixuanTao in #1374
Fixed issue with Time32/Time64 datatype in csv reader by @christophe-petitjean in #1425
Made num_values public by @b41sh in #1431
Made len/len_proxy consistent with Offsets by @ritchie46 in #1434
Added memory-mapping of &[u8] as BooleanArray by @ritchie46 in #1436
Added impl_mutable_array_mut_validity macro for mutable arrays by @Arty-Maly in #1435
Added buffer interoperability with arrow-rs by @tustvold in #1437
Changed async ipc writer to accept schema by value by @ritchie46 in #1439
Updated multiversion and supported wider registers by @ritchie46 in #1440
Updated dependencies by @ritchie46 in #1441
Added interoperability with arrow-schema by @tustvold in #1442
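JSON has no literal for infinity or NaN, so a writer has to pick some representation for non-finite floats; #1427 picks `null` for infinity. The std-only Rust sketch below illustrates that policy. It is not arrow2's serializer: `write_f64` and `to_json_array` are hypothetical helpers, and also mapping NaN to `null` is an assumption of the sketch.

```rust
use std::fmt::Write;

/// Hypothetical helper: appends `value` to `buf` as a JSON number, writing
/// `null` for values JSON cannot represent (+inf, -inf and, by assumption, NaN).
fn write_f64(buf: &mut String, value: f64) {
    if value.is_finite() {
        // Display for f64 yields a plain decimal such as `1.5` or `-2`
        write!(buf, "{}", value).expect("writing to a String cannot fail");
    } else {
        buf.push_str("null");
    }
}

/// Hypothetical helper: serializes a slice of floats as a JSON array.
fn to_json_array(values: &[f64]) -> String {
    let mut json = String::from("[");
    for (i, v) in values.iter().enumerate() {
        if i > 0 {
            json.push(',');
        }
        write_f64(&mut json, *v);
    }
    json.push(']');
    json
}
```

For example, `to_json_array(&[1.5, f64::INFINITY, -2.0])` produces `[1.5,null,-2]`, which any JSON parser accepts, whereas emitting `inf` would not be valid JSON.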

New Contributors

@kjschiroo made their first contribution in #1395
@WindSoilder made their first contribution in #1408
@TCeason made their first contribution in #1412
@haixuanTao made their first contribution in #1374
@christophe-petitjean made their first contribution in #1425

Full Changelog: v0.16.0...v0.17.0

Contributors

christophe-petitjean, b41sh, and 11 other contributors

v0.16.0

Released 09 Feb 05:26 by jorgecarleitao, commit ac28bc9

A new release is here! Thank you to everyone who contributed to it! 🙇

Full Changelog

Breaking changes:

Made IPC writer take owned schema #1361 (ritchie46)
Correctly update child-offsets in GrowableUnion #1360 (jleibs)

Fixed bugs:

Invalid parquet file written for nested structures (mixing lists with structs) #1325
Fix incorrect downcast in estimated_size_bytes #1351 (jleibs)
fix(parquet): nested struct/list writing #1347 (ritchie46)
Fixed csv infer_schema on empty fields #1342 (tripokey)

Enhancements:

Added support for take of FixedSizeListArray #1386 (kylebarron)
Renamed factory argument on parquet read functions to reader_factory #1380 (ozgrakkurt)
Made some structs and functions public #1375 (b41sh)
Added Utf8Array::apply_validity #1367 (Arty-Maly)
Added set/get scratches #1363 (ritchie46)
Amortized intermediate allocations in IPC writer #1362 (ritchie46)
Improved clippy #1353 (jorgecarleitao)

Documentation updates:

Fixed typo in OffsetsBuffer docs #1373 (DzenanJupic)
Update README.md to fix capitalization and spelling #1338 (yerke)

Testing updates:

Added toolchain.toml #1349 (ritchie46)

New Contributors

@yerke made their first contribution in #1338
@jleibs made their first contribution in #1351
@tripokey made their first contribution in #1342
@Arty-Maly made their first contribution in #1367
@DzenanJupic made their first contribution in #1373
@kylebarron made their first contribution in #1386

Contributors

jleibs, yerke, and 4 other contributors

v0.15.0

Released 18 Dec 17:46 by jorgecarleitao, commit efa630b

A new release is here, adding a number of new features and improvements to arrow2. Thank you to everyone who contributed to it!

This release adds support for a new format, the "record" JSON format, contributed by @AnIrishDuck; a new trait, TryExtendFromSelf, to efficiently concatenate an array into an existing mutable array; and multiple performance improvements by @sundy-li and @ritchie46. Finally, a new API, OffsetsBuffer and Offsets, proposed by @ritchie46, allows creating variable-sized arrays without having to check the offsets again.
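One way to read the Offsets/OffsetsBuffer change: offsets are validated once when they are constructed, after which a variable-sized array built on them only needs to compare the last offset against its values. A minimal std-only sketch of that pattern follows; `Offsets` and `BinaryLike` here are hypothetical stand-ins, not arrow2's definitions or signatures.

```rust
use std::ops::Range;

/// Hypothetical offsets type: its invariants (non-empty, first offset >= 0,
/// monotonically non-decreasing) are checked once, in `try_new`.
#[derive(Debug, Clone, PartialEq)]
struct Offsets(Vec<i32>);

impl Offsets {
    fn try_new(offsets: Vec<i32>) -> Result<Self, String> {
        match offsets.first() {
            None => return Err("offsets must not be empty".to_string()),
            Some(&first) if first < 0 => return Err("first offset is negative".to_string()),
            Some(_) => {}
        }
        if offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err("offsets must be non-decreasing".to_string());
        }
        Ok(Offsets(offsets))
    }

    /// Number of slots: one fewer than the number of offsets.
    fn len(&self) -> usize {
        self.0.len() - 1
    }

    /// The last offset, i.e. how many value bytes must exist.
    fn last(&self) -> usize {
        *self.0.last().unwrap() as usize
    }

    /// Byte range of slot `i`; `start <= end` holds by construction.
    fn range(&self, i: usize) -> Range<usize> {
        self.0[i] as usize..self.0[i + 1] as usize
    }
}

/// Hypothetical variable-sized array: because `Offsets` is already valid,
/// construction only compares the last offset with the values' length.
struct BinaryLike {
    offsets: Offsets,
    values: Vec<u8>,
}

impl BinaryLike {
    fn try_new(offsets: Offsets, values: Vec<u8>) -> Result<Self, String> {
        if offsets.last() > values.len() {
            return Err("last offset exceeds values length".to_string());
        }
        Ok(BinaryLike { offsets, values })
    }

    fn value(&self, i: usize) -> &[u8] {
        &self.values[self.offsets.range(i)]
    }
}
```

With offsets `[0, 2, 2, 5]` over the bytes `abcde`, the three slots are `ab`, an empty slot, and `cde`.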

This release also features a number of contributions from first-time contributors:

@benesch made their first contribution in #1271
@RinChanNOWWW made their first contribution in #1287
@datapythonista made their first contribution in #1290
@sandflee made their first contribution in #1286
@Samrose-Ahmed made their first contribution in #1279
@jondo2010 made their first contribution in #1300
@cyr made their first contribution in #1318
@universalmind303 made their first contribution in #1321

Thank you, everyone, for the great work this year, and happy festivities!

Full Changelog

Breaking changes:

Added values' capacity to MutableBinaryArray::reserve #1277
Removed from_data from all arrays #1328 (jorgecarleitao)
Added Offsets and OffsetsBuffer #1316 (jorgecarleitao)
Bumped parquet2 dependency #1304 (ritchie46)
Added data_pagesize_limit to write parquet pages #1303 (sundy-li)
Bumped arrow-format to 0.8 #1298 (Xuanwo)
Improved iterators #1270 (jorgecarleitao)

New features:

Added TryExtendFromSelf #1278 (jorgecarleitao)
Added support for JSON ser/de records layout #1275 (AnIrishDuck)

Fixed bugs:

Parquet writes all values of sliced arrays? #1323
Avro schema: Invalid record names #1269
Fixed writing nested/sliced arrays to parquet #1326 (ritchie46)
Fixed failing to accept dictionary full of nulls #1312 (ritchie46)
Added support for Extension types in ffi #1300 (jondo2010)
Fixed error in memory usage of sliced binary/list/utf8arrays #1293 (ritchie46)
Fixed descending ordering when specifying nulls first #1286 (sandflee)
Added avro record names when converting arrow schema to avro #1279 (Samrose-Ahmed)

Enhancements:

Fixed clippy #1336 (jorgecarleitao)
Improved UnionArray #1331 (jorgecarleitao)
Bumped json-deserializer version #1321 (universalmind303)
Removed flushing during arrow IPC writing to improve performance when using a buffered writer #1318 (cyr)
Improved performance of check_indexes #1313 (ritchie46)
Improved performance of checking offsets (time reduced by ~64-73%) #1305 (ritchie46)
Added reserve to pushable containers in parquet extend_from_decoder #1301 (ritchie46)
Optimized slicing #1285 (jorgecarleitao)
Improved ZipValidity iterators #1284 (ritchie46)
Added MutableBinaryValuesArray #1276 (jorgecarleitao)

Documentation updates:

Fixed link from the API to the guide #1290 (datapythonista)

Contributors

jondo2010, AnIrishDuck, and 9 other contributors

v0.14.1

Released 27 Sep 06:30 by jorgecarleitao, commit 828d976

A couple of backward-compatible bug fixes and improvements that everyone benefits from :)

Thank you @cjermain, @shaeqahmed and @ozgrakkurt! 🙇

Full Changelog

Fixed bugs:

Potential bug in reading lists from avro? #1252
Removed un-used code #1258 (jorgecarleitao)
Fixed error reading unbounded Avro list #1253 (jorgecarleitao)
Add missing call to try_push_valid for nested avro deserialization #1248 (shaeqahmed)

Enhancements:

Bump json_deserializer version to 0.4.1 #1261 (cjermain)
Fixed clippy for 1.60 #1259 (jorgecarleitao)
Added BinaryArray::into_mut and double-ended support for its iterator #1255 (ozgrakkurt)

Testing updates:

Improved test for nullable struct read from Avro #1250 (jorgecarleitao)

Contributors

cjermain, shaeqahmed, and ozgrakkurt

v0.14.0

Released 12 Sep 06:13 by jorgecarleitao, commit 96df5e1

Another release of arrow2 is here!

Besides API improvements to reading IPC and parquet, there are two main new features: the ability to memory-map Arrow files (see https://jorgecarleitao.github.io/arrow2/v0.14.0/guide/io/ipc_mmap.html) and support for decimal 256.

The following people made their first contribution to the crate:

@daniel-martinez-maqueda-sap made their first contribution in #1204
@AnIrishDuck made their first contribution in #1211
@samkaufman made their first contribution in #1213
@teymour-aldridge made their first contribution in #1225
@poga made their first contribution in #1234
@knil-sama made their first contribution in #1237

Thank you everyone for all the issues, PRs and ideas!

Full Changelog

Breaking changes:

Removed Count (parquet statistics) #1217 (jorgecarleitao)
Exposed parquet indexed page filtering to FileReader #1216 (jorgecarleitao)
Simpler IPC API #1208 (jorgecarleitao)
Migrated Avro code to avro-schema repo #1199 (jorgecarleitao)
Added support for decimal 256 #1194 (jorgecarleitao)

New features:

Added support for decoding delta-length-encoded binary (parquet) #1228 (jorgecarleitao)
Added support to read and write Parquet's delta-bitpacked (integer encoding) #1226 (jorgecarleitao)
Added support for parquet sidecar to FileReader #1215 (jorgecarleitao)
Write 64bit aligned IPC files #1201 (jorgecarleitao)
Added support to mmap IPC format #1197 (jorgecarleitao)
Added MutableStructArray #1196 (hohav)
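For #1201, "64-bit aligned" means each buffer in the written IPC file starts at a byte offset that is a multiple of 8, which is achieved by appending zero bytes after each buffer. A minimal std-only sketch of that padding arithmetic follows; `padding_for` and `write_aligned` are hypothetical names, not arrow2 functions.

```rust
/// Zero bytes needed to round `len` up to a multiple of `alignment`
/// (`alignment` must be a power of two, e.g. 8 for 64-bit alignment).
fn padding_for(len: usize, alignment: usize) -> usize {
    assert!(alignment.is_power_of_two());
    (alignment - len % alignment) % alignment
}

/// Appends `buffer` to `out` and zero-pads so the next buffer also starts at
/// a multiple of 8. Returns the offset at which `buffer` was written.
fn write_aligned(out: &mut Vec<u8>, buffer: &[u8]) -> usize {
    // Pad first in case `out` was not aligned to begin with.
    out.resize(out.len() + padding_for(out.len(), 8), 0);
    let start = out.len();
    out.extend_from_slice(buffer);
    out.resize(out.len() + padding_for(out.len(), 8), 0);
    start
}
```

Writing a 5-byte buffer followed by a 3-byte buffer places them at offsets 0 and 8, with three zero bytes of padding in between.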

Fixed bugs:

Stack overflow in parquet RowGroupReader with groups_filter #1206
Fixed comparison and validity kernels #1243 (ritchie46)
Fixed reading nested stats #1240 (jorgecarleitao)
FileSink now closes the underlying writer. #1213 (samkaufman)
Fixed JSON infer order #1212 (jorgecarleitao)
Fixed StackOverflow in skipping many parquet row groups #1210 (jorgecarleitao)
Fix escaped like wildcards #1204 (daniel-martinez-maqueda-sap)
Removed println :( #1203 (jorgecarleitao)

Enhancements:

Added schema to FileReader #1246 (jorgecarleitao)
Simpler nested parquet read #1241 (jorgecarleitao)
Removed unneeded code #1229 (jorgecarleitao)
Improved MutableStruct::push #1223 (hohav)
Reduced binary size #1221 (jorgecarleitao)
Added utf8 <> binary cast #1220 (jorgecarleitao)
Split parquet compression backend features #1207 (ritchie46)
Improved API of mmap #1205 (ritchie46)
Added MutableArray::reserve #1202 (jorgecarleitao)
Delayed dict #1185 (jorgecarleitao)

Documentation updates:

Fixed guide and improved examples #1247 (jorgecarleitao)
Added documentation on parquet compatibility under TimeUnit. #1238 (TurnOfACard)
Fixed typo in error message for impl StructArray #1237 (knil-sama)
Fixed incorrect command in doc for generating ORC files #1234 (poga)
Improved github page generation #1233 (jorgecarleitao)
Fix a typo in the docs #1225 (teymour-aldridge)
Fix some doc links/typos #1211 (AnIrishDuck)

Testing updates:

Fixed clippy warnings #1227 (jorgecarleitao)
Updated integration test #1214 (jorgecarleitao)

Contributors

poga, samkaufman, and 4 other contributors

v0.13.1

Released 04 Aug 05:53 by jorgecarleitao, commit 350b690

Full Changelog

Thanks @daniel-martinez-maqueda-sap!

Fixed bugs:

Fix escaped like wildcards #1204 (daniel-martinez-maqueda-sap)
Removed println :( #1203 (jorgecarleitao)

Contributors

daniel-martinez-maqueda-sap

v0.13.0

Released 31 Jul 20:19 by jorgecarleitao, commit 3f3febf

A new version (0.13) is now available on crates.io! 🎉🎉🎉🎉

This is another large release of arrow2. Among the many, many changes (see below), it is worth noting:

Added copy-on-write API to perform operations in place, improving performance of expressions like (a + b) * 2 by 2-10x
Added support to read from Apache ORC format
Added support for projection and limit pushdown when reading from Arrow IPC format
Added support for f16
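Copy-on-write here means that a kernel may mutate an array's buffer in place when nothing else references it, and only allocates a copy when the buffer is shared. In an expression like (a + b) * 2, the intermediate a + b is uniquely owned, so the multiplication can reuse its allocation. The std-only sketch below shows the pattern with `Arc::make_mut`; it is an analogy rather than arrow2's API, and `add` and `mul_scalar` are hypothetical kernels.

```rust
use std::sync::Arc;

/// Hypothetical kernel: element-wise sum into a freshly allocated buffer.
fn add(a: &[i64], b: &[i64]) -> Arc<Vec<i64>> {
    assert_eq!(a.len(), b.len());
    Arc::new(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

/// Hypothetical copy-on-write kernel: multiplies every value by `k`.
/// `Arc::make_mut` clones the Vec only if another `Arc` shares it;
/// otherwise the existing allocation is mutated in place.
fn mul_scalar(mut values: Arc<Vec<i64>>, k: i64) -> Arc<Vec<i64>> {
    for v in Arc::make_mut(&mut values).iter_mut() {
        *v *= k;
    }
    values
}
```

`mul_scalar(add(&a, &b), 2)` therefore performs one allocation instead of two, while an array that is still referenced elsewhere is left untouched.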

Thank you to the numerous contributors, both via PRs and issues, that resulted in this fantastic release 🙇

Breaking changes:

Made nested argument of array_to_pages non-owning #1174
Replaced Result by panic in boolean comparison #1159 (jorgecarleitao)
Improved dictionary invariants #1137 (jorgecarleitao)
Change signature of PrimitiveScalar::value to return reference #1129 (ncpenke)
Removed need to pass encodings by value #1123 (ritchie46)
Removed unused NativeType::to_ne_bytes #1112 (jorgecarleitao)
Avoid clone in with_validity #1104 (jorgecarleitao)
Reduced need of unsafe in FFI #1100 (jorgecarleitao)
Removed Buffer::into_mut and make_mut functions #1089 (jorgecarleitao)
Renamed Bitmap::null_count to Bitmap::unset_bits #1087 (jorgecarleitao)
Made chunk_size optional in parquet's column_iter_to_arrays #1055 (jorgecarleitao)
Migrated from Arc<dyn Array> to Box<dyn Array> #1042 (jorgecarleitao)
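#1087 renames Bitmap::null_count to Bitmap::unset_bits, naming the operation after what it counts: zero bits. In an Arrow validity bitmap, bits are LSB-ordered and a set bit marks a valid slot, so in that use the unset count equals the null count. A minimal std-only sketch over a raw byte slice follows; `unset_bits` here is a hypothetical free function, not arrow2's method.

```rust
/// Counts the unset (0) bits among the first `len` bits of an LSB-first
/// bitmap. For a validity bitmap, this is the number of nulls.
fn unset_bits(bytes: &[u8], len: usize) -> usize {
    assert!(len <= bytes.len() * 8, "bitmap shorter than `len` bits");
    let full_bytes = len / 8;
    // Set bits in the whole bytes
    let mut set: usize = bytes[..full_bytes].iter().map(|b| b.count_ones() as usize).sum();
    let rem = len % 8;
    if rem > 0 {
        // Keep only the low `rem` bits of the trailing partial byte
        let mask = (1u8 << rem) - 1;
        set += (bytes[full_bytes] & mask).count_ones() as usize;
    }
    len - set
}
```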

New features:

Added support to read ORC #1189 (jorgecarleitao)
Added support for limit pushdown to IPC reading #1135 (jorgecarleitao)
Added support to write and read Intervals from and to parquet #1122 (jorgecarleitao)
Added support to write FixedSizeBinary to Avro #1118 (jorgecarleitao)
Added support for projections in reading IPC streams #1097 (joshuataylor)
Added support to write parquet _metadata sidecar #1063 (jorgecarleitao)
Added cow APIs (2x-10x vs non-cow) #1061 (jorgecarleitao)
Added support to read and write f16 #1051 (jorgecarleitao)

Fixed bugs:

Fixed "not implemented" error when reading plain pages after dictionary pages for fixed-length binary from parquet #1192 (jorgecarleitao)
Fixed error in decoding nested multi-page columns from parquet #1188 (jorgecarleitao)
Fixed error in counting items in nested parquet #1182 (jorgecarleitao)
Fixed reading stats from int96 parquet #1181 (jorgecarleitao)
Fixed limit pushdown in parquet #1180 (jorgecarleitao)
Used FnOnce for PrimitiveArray::apply_validity #1176 (ritchie46)
Released memory on predicate with 0% selectivity #1163 (ritchie46)
Fixed error in reading Struct<List<...>> from parquet #1150 (jorgecarleitao)
Fixed IPC projection #1149 (ritchie46)
Fixed casting dictionary keys #1143 (ritchie46)
Fixed reading arrays from parquet with required children #1140 (jorgecarleitao)
Fixed panic in deserializing nested statistics #1139 (jorgecarleitao)
Aligned name of FixedSizeBinaryArray::values_iter #1117 (jorgecarleitao)
Fixed error in FixedSizeListArray::new_null #1114 (jorgecarleitao)
Fixed panic in writing dictionaries to parquet #1113 (jorgecarleitao)
Fixed error in reading chunked parquet #1108 (jorgecarleitao)
Raise error when invalid fields are passed to flight #1093 (jorgecarleitao)
Made IPC projection not sort projection #1082 (jorgecarleitao)
Fixed error in chunked_mut bitmap #1081 (jorgecarleitao)
Fixed panic in bitmap assign_mut #1078 (ritchie46)
Panic-free read of IPC files #1075 (jorgecarleitao)
Bumped parquet2 (minor) requirement #1071 (jorgecarleitao)
Fixed divide by zero on reading empty row group #1062 (jorgecarleitao)
Fixed missing validation of number of encodings passed when writing to parquet #1057 (jorgecarleitao)

Enhancements:

Improved performance of reading Binary from parquet #1190 (ritchie46)
Bumped to latest nightly #1186 (gyscos)
Improved error message #1179 (jorgecarleitao)
Added support to read and write nested dictionaries to parquet #1175 (jorgecarleitao)
Added MutableUtf8Array::into_data #1170 (ritchie46)
Added Default for Utf8Array #1169 (ritchie46)
fix(parquet): allow to read other logical types from parquet #1168 (sundy-li)
fix(parquet): enforce ParquetTimeUnit::Nanoseconds for PhysicalType::Int96 #1167 (sundy-li)
Added constructor MutableFixedSizeListArray::new_from #1161 (hohav)
Removed unneeded Default constraint #1157 (hohav)
Improved checks to safety invariants in FFI #1154 (jorgecarleitao)
Removed un-needed indirection #1153 (jorgecarleitao)
Softened generic constraint of Buffer #1152 (sundy-li)
Used ahash by default #1148 (ritchie46)
Reduced bound checks [#1142](https://github.com/j...

v0.12.0

Released 05 Jun 12:09 by jorgecarleitao, commit 6608071

A new version of arrow2 is now available on crates.io. 🎉🎉🎉

See below for all the great things in this release 🚀. But before that, thank you so much to everyone who contributed to it: 🙇

@ahmedriza, @dexterduck, @GPSnoopy, @HaoYang670, @SimonSchneider, @TurnOfACard, @aptr322, @arxra, @b41sh, @cjermain, @dbr, @jorgecarleitao, @ritchie46

Breaking changes:

Require one encoding per parquet column on write #1012
Bumped parquet2 #1035 (jorgecarleitao)
Improved performance of deserializing JSON (2x) #1024 (jorgecarleitao)
Remove from_trusted_len_* from Buffer #1020 (jorgecarleitao)
Bumped arrow-format #1011 (jorgecarleitao)
Replace fn Offset::is_large() as const Offset::IS_LARGE #1002 (HaoYang670)
Renamed ArrowError to Error #993 (jorgecarleitao)

New features:

Added support to deserialize MapArray from parquet #1045 (jorgecarleitao)
Added support for random access reads from IPC #1034 (jorgecarleitao)
Added support for custom sort build_compare_fn #1016 (b41sh)
Added support to write nested parquet #1007 (jorgecarleitao)
Added support for deserializing JSON from iterator #989 (cjermain)

Fixed bugs:

Writing of ListArray does not preserve all values #1008
Writing a two-dimensional list to a parquet file failed #992
Writing to Parquet fails for extension types that contain lists #830
Fixed using lower limit than size of first parquet row group #1046 (arxra)
Fixed error in consuming sliced FixedSizeBinary from C data interface (FFI) #1026 (jorgecarleitao)
Fixed lexsort limit equal or greater than row_count #1021 (b41sh)
Fixed error in reading nested parquet structs #1015 (jorgecarleitao)
Fixed panic on debug print of invalid timezones #1013 (jorgecarleitao)
Treat empty timezone string as no-timezone #1009 (dbr)
Fixed encoding of NaN to json #990 (SimonSchneider)
Fixed error in writing ListArray to parquet #984 (jorgecarleitao)
Fixed decoding Binary Plain pages with dictionary pages #982 (aptr322)

Enhancements:

Added Debug and PartialEq for MapArray #1043 (jorgecarleitao)
Exposed compression levels for parquet #1041 (ritchie46)
Added .arced/.boxed to arrays #1040 (jorgecarleitao)
Added utility to create encodings #1018 (jorgecarleitao)
Made parquet_to_arrow_schema public #1006 (martingallagher)
Sped up min_max_boolean for the case where all values are null #1005 (HaoYang670)
Simplified min_max_string and min_max_binary #1004 (HaoYang670)
Added support for Decimal in build_compare #998 (GPSnoopy)
Removed accidentally quadratic null_count #991 (ritchie46)
Aligned MutableDictionaryArray with MutablePrimitiveArray via TryPush #981 (TurnOfACard)

Documentation updates:

Cleaned docs for BinaryArray #1047 (jorgecarleitao)
Improved API docs for MutableBitmap #1025 (jorgecarleitao)
Improved docs for bitmap #1022 (jorgecarleitao)
Improved API docs for PrimitiveArray and Utf8Array #1017 (jorgecarleitao)
Fixed dev guide #1003 (jorgecarleitao)

Testing updates:

Added more tests #1029 (jorgecarleitao)
Moved coverage reporting to cargo-llvm-cov #1028 (jorgecarleitao)
Added more tests (increase coverage) #1027 (jorgecarleitao)
Moved tests from lib to tests #1001 (jorgecarleitao)
Allowed feature-specific test runs #985 (jorgecarleitao)

Contributors

dbr, b41sh, and 11 other contributors

v0.11.2

Released 05 May 20:45 by jorgecarleitao, commit d035964

Full Changelog

New features:

Added support to append to existing IPC Arrow file #972 (jorgecarleitao)
Added pop to utf8/binary/fixedSize MutableArray #966 (ygf11)
Added support for union scalars #930 (ncpenke)

Fixed bugs:

Added support to read nested binary from parquet #978 (jorgecarleitao)
Fixed empty reader panic for NDJSON type infer #974 (Roberto-XY)
Prevented stack overflow in large parquet files #973 (ritchie46)
Fixed API bug in async read of IPC metadata #969 (jorgecarleitao)
Fixed writing required list to parquet #968 (jorgecarleitao)

Enhancements:

Added support for deserializing LargeList and UInt data types from Parquet #979 (b41sh)
Made reading of IPC dictionaries lazy #971 (jorgecarleitao)
Allowed creating IPC FileWriter without writing to the file #970 (jorgecarleitao)

v0.11.0

Released 27 Apr 21:12 by jorgecarleitao, commit 837be9f

Arrow2 v0.11.0 is out!! 🎉🎉🎉

This release mainly focuses on further improving parquet support. In particular, it contains the main ingredients for reading indexed parquet pages, which allow skipping the deserialization of individual pages, and parquet files are now written with page indexes. Some work remains to improve the frontend API for skipping pages via statistics; this is left for the next version.
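Page indexes store per-page statistics (such as min and max), which is what lets a reader skip pages without deserializing them. As an illustration of the pruning that statistics-based skipping enables, here is a minimal std-only sketch; `PageStats` and `pages_to_read` are hypothetical, the sketch handles only an inclusive integer range predicate, and it ignores nulls and missing statistics.

```rust
/// Hypothetical min/max statistics for one data page of an integer column.
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
}

/// Returns the indices of pages that may contain values in `lo..=hi`.
/// A page whose [min, max] range does not overlap the predicate cannot
/// match, so it can be skipped without being decompressed or decoded.
fn pages_to_read(pages: &[PageStats], lo: i64, hi: i64) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, stats)| stats.max >= lo && stats.min <= hi)
        .map(|(i, _)| i)
        .collect()
}
```

With pages covering 0-9, 10-19 and 20-29, the predicate `12..=21` only needs the last two pages.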

This version also contains multiple bug fixes.

Thanks to everyone who contributed to this release (individual PRs below)! 🙇

Changelog

Full Changelog

Breaking changes:

Refactored parquet statistics deserialization #962 (jorgecarleitao)
Made GroupFilter Send + Sync #947 (jorgecarleitao)

New features:

Added support for non-ordered projections to IPC reading #961 (jorgecarleitao)
Added support for reading indexed parquet pages #923 (jorgecarleitao)

Fixed bugs:

Parquet regression: exceptions.ArrowErrorException: NotYetImplemented("Can't read Dictionary(UInt32, LargeUtf8, false) from parquet") #955
Reading Parquet binary column panics during deserialization: `attempt to subtract with overflow` #944
Reading Parquet file written by pyarrow with lz4 compression fails with OutOfSpec("Thrift out of range") #940
Issues when trying to create a parquet file with FixedSizeListArray #691
Fixed bug in writing csv with buffer resizing #965 (ritchie46)
Fixed bug in reading binary parquet #945 (jorgecarleitao)
Fixed error in writing FixedSizeListArray to parquet #941 (jorgecarleitao)
Fixed support to read dict nested binary parquet #924 (jorgecarleitao)

Enhancements:

Reduced memory usage in reading parquet #964 (jorgecarleitao)
Simpler IPC code #939 (jorgecarleitao)
Avoided allocating strings when writing to csv #935 (ritchie46)
Removed un-needed generic parameter #927 (jorgecarleitao)
Updated to odbc-api 0.36.0 #925 (pacman82)

Documentation updates:

Fixed example of parallel read via rayon #958 (jorgecarleitao)
Fixed guide deployment #931 (jorgecarleitao)
Typo fix #919 (bkmgit)

Testing updates:

Fixed patch of integration tests #960 (jorgecarleitao)
Added test for MapArray #942 (jorgecarleitao)
Fixed wrong clippy warning #938 (jorgecarleitao)
