Iceberg Table Maintenance in Apache Impala
2023-2024 spring
Software
Topic description
Apache Impala is an open-source, distributed analytic query engine for Big Data. Apache Iceberg is a modern, high-performance open table format for huge analytic tables.
If an Iceberg table is frequently written to in small batches (for example, from a streaming source), many small data files are created, which degrades read performance. Frequent row-level deletes add to the problem: they create delete files that have to be merged with the data files on every read.
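To illustrate why delete files make reads more expensive, here is a minimal Python sketch of merge-on-read with position deletes. It is not Impala or Iceberg code: `PositionDelete` and `read_with_deletes` are hypothetical names, and rows are modeled as a plain list, assuming each delete entry names a data file and a zero-based row position in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical stand-in for one entry of a position delete file."""
    data_file: str  # path of the data file the delete applies to
    position: int   # zero-based row ordinal within that data file


def read_with_deletes(data_file, rows, deletes):
    """Return the live rows of one data file, filtering deleted rows at read time."""
    # Every read has to collect the deleted positions for this file first.
    deleted = {d.position for d in deletes if d.data_file == data_file}
    return [row for pos, row in enumerate(rows) if pos not in deleted]


rows = ["a", "b", "c", "d"]
deletes = [
    PositionDelete("f1.parquet", 1),
    PositionDelete("f1.parquet", 3),
    PositionDelete("f2.parquet", 0),
]
live = read_with_deletes("f1.parquet", rows, deletes)
```

The filtering step is repeated on every scan, so its cost grows with the number of accumulated delete files.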
The solution is to compact the small files into larger ones and, in the same process, to merge the delete files into the data files.
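The file-selection part of compaction can be sketched as a simple bin-packing problem. The Python below is only an illustration under assumptions, not the approach Impala takes: `plan_compaction` is a hypothetical helper, file sizes are given as a plain dictionary, and a first-fit-decreasing heuristic is used to group files up to an assumed target size.

```python
def plan_compaction(file_sizes, target_size):
    """Group small files into bins whose total size stays within target_size.

    file_sizes: dict mapping file path -> size in bytes (hypothetical input).
    Files at or above target_size are left untouched.
    """
    small = sorted(
        ((path, size) for path, size in file_sizes.items() if size < target_size),
        key=lambda item: item[1],
        reverse=True,
    )
    bins = []  # each bin is [total_size, [paths]]
    for path, size in small:
        for b in bins:
            if b[0] + size <= target_size:  # first bin with enough room
                b[0] += size
                b[1].append(path)
                break
        else:
            bins.append([size, [path]])
    # In this sketch a bin holding a single file is not worth rewriting.
    return [paths for _, paths in bins if len(paths) > 1]


sizes = {"a": 10, "b": 60, "c": 50, "d": 200, "e": 30}
plan = plan_compaction(sizes, target_size=100)
```

Each returned group would be rewritten into one larger file. A real implementation would also apply the pending deletes while rewriting, and might rewrite a lone file as well if it has delete files attached.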
Goals:
- introduce new syntax for running table maintenance, which performs the following tasks:
  - rewrite small files into larger ones
  - merge delete files into the data files
External partner: Cloudera Hungary Kft.
Maximum number of participants:
1 person