Release Note 3.0.2 #41558

Open
gavinchou opened this issue Oct 8, 2024 · 2 comments

gavinchou commented Oct 8, 2024

This version is a production-ready release. We strongly recommend using it instead of earlier 3.0.x versions (x < 2) for compute-storage decoupled mode.

Behavioral Changes

Storage

  • Limited the number of tablets in a single backup task to prevent FE memory overflow. #40518
  • The SHOW PARTITIONS command now displays the CommittedVersion of partitions. #28274

Other

  • The default printing mode (asynchronous) of fe.log now includes file line number information. If performance issues are encountered due to line number output, please switch to BRIEF mode. #39419
  • The default value of the session variable ENABLE_PREPARED_STMT_AUDIT_LOG has been changed from true to false, and the audit log of prepare statements will no longer be printed. #38865
  • The default value of the session variable max_allowed_packet has been adjusted from 1MB to 16MB to align with MySQL 8.4. #38697
  • The JVM of FE and BE defaults to using the UTF-8 character set. #39521

New Features

Storage

  • Backup and recovery now support clearing tables or partitions that are not in the backup. #39028

Compute-Storage Decoupled

  • Support for parallel recycling of expired data on multiple tablets. #37630
  • Support for changing storage vaults through ALTER statements. #38685 #37606
  • Support for importing a large number of tablets (5000+) in a single transaction (experimental feature). #38243
  • Support for automatically aborting pending transactions caused by reasons such as node restarts, solving the issue of pending transactions blocking decommission or schema change. #37669
  • A new session variable enable_segment_cache has been added to control whether to use segment cache during queries (default is true). #37141
  • Resolved the issue of not being able to import a large amount of data during schema changes in compute-storage decoupled mode. #39558
  • Support for adding multiple follower roles of FE in compute-storage decoupled mode. #38388
  • Support for using memory as file cache to accelerate queries in environments with no disks or low-performance HDDs. #38811

Lakehouse

  • A new Lakesoul Catalog has been added (see the Apache Doris docs).
  • A new system table catalog_meta_cache_statistics has been added to view the usage of various metadata caches in External Catalog. #40155

Asynchronous Materialized Views

Query Optimizer

  • Support for is [not] true/false expressions. #38623
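
To illustrate what `IS [NOT] TRUE/FALSE` expressions evaluate to, here is a minimal Python sketch of SQL-standard three-valued logic, with `None` standing in for NULL. The function names are hypothetical, and the sketch assumes Doris follows the standard semantics; it is not Doris code.

```python
from typing import Optional


def is_true(value: Optional[bool]) -> bool:
    """Model of `expr IS TRUE`: NULL (None) gives False, never NULL."""
    return value is True


def is_not_true(value: Optional[bool]) -> bool:
    """Model of `expr IS NOT TRUE`: holds for both FALSE and NULL."""
    return value is not True


def is_false(value: Optional[bool]) -> bool:
    """Model of `expr IS FALSE`: NULL (None) gives False."""
    return value is False


def is_not_false(value: Optional[bool]) -> bool:
    """Model of `expr IS NOT FALSE`: holds for both TRUE and NULL."""
    return value is not False


# A column containing TRUE, FALSE and NULL.
rows = [True, False, None]
kept_by_is_not_true = [v for v in rows if is_not_true(v)]
kept_by_is_false = [v for v in rows if is_false(v)]
```

Unlike a plain comparison such as `expr = true`, these predicates never return NULL, so NULL rows are kept by `IS NOT TRUE` and `IS NOT FALSE`.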

Query Execution

  • A new CRC32 function has been added. #38204
  • New aggregate functions skew and kurt have been added. #41277
  • Profiles are now persisted to the FE's disk to retain more profiles. #33690
  • A new system table workload_group_privileges has been added to view permission information related to workload groups. #38436
  • A new system table workload_group_resource_usage has been added to monitor resource statistics of workload groups. #39177
  • Workload groups now support limiting reads of local IO and remote IO. #39012
  • Workload groups now support cgroupv2 to limit CPU usage. #39374
  • A new system table information_schema.partitions has been added to view some table creation attributes. #40636
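
As a rough reference for the new CRC32, skew and kurt functions, the Python sketch below computes the same kinds of quantities. It assumes CRC32 is the standard CRC-32 checksum (as in zlib) and that skew and kurt are the population skewness and excess kurtosis; the exact definitions used by Doris (sample vs. population, excess vs. raw kurtosis) are not stated in these notes, so treat the formulas as assumptions. All function names are hypothetical.

```python
import zlib
from typing import Optional, Sequence


def crc32_of(text: str) -> int:
    """Standard CRC-32 of the UTF-8 bytes, as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def _central_moment(values: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values: Sequence[float]) -> Optional[float]:
    """Population skewness m3 / m2^1.5; None if undefined."""
    if not values:
        return None
    m2 = _central_moment(values, 2)
    if m2 == 0:
        return None
    return _central_moment(values, 3) / m2 ** 1.5


def kurtosis(values: Sequence[float]) -> Optional[float]:
    """Population excess kurtosis m4 / m2^2 - 3; None if undefined."""
    if not values:
        return None
    m2 = _central_moment(values, 2)
    if m2 == 0:
        return None
    return _central_moment(values, 4) / m2 ** 2 - 3.0


checksum = crc32_of("123456789")
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_skew = skewness(sample)
sample_kurt = kurtosis(sample)
```

For symmetric data such as `sample`, the skewness is zero; a negative excess kurtosis indicates lighter tails than a normal distribution.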

Semi-Structured Data Management

Other

  • Support for using the SHOW statement to display BE's configuration information, such as SHOW BACKEND CONFIG LIKE ${pattern}. #36525
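
The `${pattern}` in the statement above is a LIKE pattern. As a sketch of how such a pattern selects configuration names, the Python below implements the usual SQL LIKE wildcards (`%` for any run of characters, `_` for exactly one). It assumes the statement follows ordinary LIKE matching; the helper name and the list of names are illustrative only.

```python
import re
from typing import List, Pattern


def like_to_regex(pattern: str) -> Pattern[str]:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including empty
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def filter_like(names: List[str], pattern: str) -> List[str]:
    """Return the names that match the LIKE pattern, in input order."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.match(name)]


# Illustrative configuration names, not a real BE listing.
config_names = ["enable_file_cache", "file_cache_path", "enable_all_http_auth"]
matches = filter_like(config_names, "%file_cache%")
```

Note that `_` in a LIKE pattern is itself a wildcard, so `%file_cache%` would also match a name such as `fileXcache`.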

Improvements

Load

  • Improved the import efficiency of routine load when encountering frequent EOFs from Kafka. #39975
  • The stream load result now includes ReceiveDataTimeMs, the time taken to read HTTP data, which helps quickly identify slow stream loads caused by network issues. #40735
  • Optimized the routine load timeout logic to avoid frequent timeouts during inverted index and mow writes. #40818
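
One way to use ReceiveDataTimeMs is to compare it with the total load time. The Python sketch below parses a stream load result and reports what share of the time went to receiving HTTP data. The JSON values are made up, `LoadTimeMs` is assumed to be the total-time field of the result, and the 50% threshold is an arbitrary choice for illustration.

```python
import json

# Made-up result for illustration; real responses contain more fields.
sample_result = (
    '{"Status": "Success", "LoadTimeMs": 1200, "ReceiveDataTimeMs": 900}'
)


def receive_share(result_json: str) -> float:
    """Fraction of the total load time spent reading HTTP data."""
    result = json.loads(result_json)
    load_ms = result.get("LoadTimeMs", 0)
    receive_ms = result.get("ReceiveDataTimeMs", 0)
    if load_ms <= 0:
        return 0.0
    return receive_ms / load_ms


share = receive_share(sample_result)
# Arbitrary threshold: flag the load as network-bound above 50%.
network_bound = share > 0.5
```

A high share suggests the client or network is the bottleneck rather than the write path on the BE.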

Storage

  • Support for batch addition of partitions. #37114

Compute-Storage Decoupled

  • Added the meta-service HTTP interface /MetaService/http/show_meta_ranges to facilitate the statistics of KV distribution in FDB. #39208
  • The meta-service/recycler stop script ensures that the process fully exits before returning. #40218
  • Support for using the session variable version_comment (Cloud Mode) to display the current deployment mode as compute-storage decoupled. #38269
  • Fixed the detailed message returned when transaction submission fails. #40584
  • Support for using one meta-service process to provide both metadata services and data recycling services. #40223
  • Optimized the default configuration of file_cache to avoid potential issues when not set. #41421 #41507
  • Improved query performance by batch retrieving the version of multiple partitions. #38949
  • Delayed the redistribution of tablets to avoid query performance issues caused by temporary network fluctuations. #40371
  • Optimized the read-write lock logic in the balance. #40633
  • Enhanced the robustness of file cache in handling TTL filenames during restarts/crashes. #40226
  • Added the BE HTTP interface /api/file_cache?op=hash to facilitate the calculation of the hash file names of segment files on disk. #40831
  • Unified the naming so that "compute group" can be used to refer to BE groups (formerly "cloud cluster"). #40767
  • Optimized the waiting time for obtaining locks when calculating delete bitmaps in primary key tables. #40341
  • When there are many delete bitmaps in primary key tables, optimized the high CPU consumption during queries by pre-merging multiple delete bitmaps. #40204
  • Support for managing FE/BE nodes in compute-storage decoupled mode through SQL statements, hiding the logic of direct interaction with meta-service when deploying in compute-storage decoupled mode. #40264
  • Added a script for rapid deployment of FDB. #39803
  • Optimized the output of SHOW CACHE HOTSPOT to unify the column name style with other SHOW statements. #41322
  • When using a storage vault as the storage backend, disallowed the use of latest_fs() to avoid binding different storage backends to the same table. #40516
  • Optimized the timeout strategy for calculating delete bitmaps when importing mow tables. #40562 #40333
  • The enable_file_cache in be.conf is now enabled by default in compute-storage decoupled mode. #41502
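
The pre-merging of delete bitmaps mentioned in the list above can be pictured with a small Python sketch. It models each delete bitmap as a set of row ids keyed by version and merges all bitmaps visible at a read version once, so that each row check consults a single set. The data layout and function names are hypothetical and much simpler than the actual BE implementation.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical layout: version -> row ids marked deleted at that version.
delete_bitmaps: Dict[int, Set[int]] = {
    2: {0, 5},
    3: {7},
    5: {5, 9},
}


def merge_up_to(bitmaps: Dict[int, Set[int]], read_version: int) -> Set[int]:
    """Union of all delete bitmaps visible at read_version."""
    merged: Set[int] = set()
    for version, deleted_rows in bitmaps.items():
        if version <= read_version:
            merged |= deleted_rows
    return merged


def visible_rows(row_ids: Iterable[int], merged: Set[int]) -> List[int]:
    """Rows that survive after applying the pre-merged bitmap."""
    return [row for row in row_ids if row not in merged]


# Merge once per query, then filter every row against one set.
merged_v3 = merge_up_to(delete_bitmaps, 3)
rows_at_v3 = visible_rows(range(10), merged_v3)
```

Merging once up front replaces a lookup in every per-version bitmap for every row, which is where the CPU saving would come from when there are many bitmaps.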

Lakehouse

  • When reading tables in CSV format, the session variable keep_carriage_return can now be used to control how the \r symbol is read. #39980
  • The default maximum memory of BE's JVM has been adjusted to 2GB (affecting only new deployments). #41403
  • Hive Catalog has added hive.recursive_directories_table and hive.ignore_absent_partitions properties to specify whether to recursively traverse data directories and whether to ignore missing partitions. #39494
  • Optimized the Catalog refresh logic to avoid generating a large number of connections during refresh. #39205
  • SHOW CREATE DATABASE and SHOW CREATE TABLE for external data sources now display location information. #39179
  • The new optimizer supports inserting data into JDBC external tables using the INSERT INTO statement. #41511
  • MaxCompute Catalog now supports complex data types. #39259
  • Optimized the logic for reading and merging data shards of external tables. #38311
  • Optimized some refresh strategies for metadata caches of external tables. #38506
  • Paimon tables now support pushing down IN/NOT IN predicates. #38390
  • Compatible with tables created in Parquet format by Paimon version 0.9. #41020

Asynchronous Materialized Views

  • Building asynchronous materialized views now supports the use of both immediate and starttime. #39573
  • Asynchronous materialized views based on external tables will refresh the metadata cache of the external tables before refreshing the materialized views, ensuring construction based on the latest external table data. #38212
  • Partition incremental construction now supports rolling up according to weekly and quarterly granularities. #39286
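
Rolling partitions up to weekly or quarterly granularity amounts to mapping each base partition date to the start of its week or quarter. The Python sketch below shows that mapping; it assumes weeks start on Monday, which may differ from the convention Doris uses, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List


def week_bucket(day: date) -> date:
    """Start of the week containing day (assumes weeks start on Monday)."""
    return day - timedelta(days=day.weekday())


def quarter_bucket(day: date) -> date:
    """First day of the quarter containing day."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def roll_up(
    days: Iterable[date], bucket: Callable[[date], date]
) -> Dict[date, List[date]]:
    """Group daily partition dates under their rolled-up partition."""
    groups: Dict[date, List[date]] = defaultdict(list)
    for day in days:
        groups[bucket(day)].append(day)
    return dict(groups)


daily = [date(2024, 9, 30), date(2024, 10, 1), date(2024, 10, 8)]
by_week = roll_up(daily, week_bucket)
by_quarter = roll_up(daily, quarter_bucket)
```

With this mapping, a change in one daily base partition only invalidates the single weekly or quarterly partition it rolls up into.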

MySQL Compatibility

Query Optimizer

  • The aggregate function GROUP_CONCAT now supports the use of both DISTINCT and ORDER BY. #38080
  • Optimized the collection and use of statistical information, as well as the logic for estimating row counts and cost calculations, to generate more efficient and stable execution plans.
  • Window function partition data pre-filtering now supports cases containing multiple window functions. #38393
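
To show what combining DISTINCT and ORDER BY in GROUP_CONCAT means, here is a small Python model of the aggregation: NULLs are skipped, duplicates are removed, the remaining values are sorted, and the result is joined with a separator. The function name, its parameters, and the explicit separator are choices made for this sketch, not the Doris signature or its default.

```python
from typing import Any, Callable, Iterable, Optional


def group_concat(
    values: Iterable[Optional[Any]],
    separator: str,
    distinct: bool = False,
    order_key: Optional[Callable[[Any], Any]] = None,
) -> Optional[str]:
    """Model of GROUP_CONCAT([DISTINCT] expr [ORDER BY ...])."""
    items = [v for v in values if v is not None]  # NULL inputs are ignored
    if distinct:
        items = list(dict.fromkeys(items))  # de-duplicate, keep first seen
    if order_key is not None:
        items = sorted(items, key=order_key)
    if not items:
        return None  # aggregate over no non-NULL values yields NULL
    return separator.join(str(v) for v in items)


column = ["b", "a", "b", None, "c"]
plain = group_concat(column, ",")
distinct_sorted = group_concat(
    column, ",", distinct=True, order_key=lambda v: v
)
```

Applying DISTINCT before ORDER BY, as here, gives each value once and in a deterministic order.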

Query Execution

  • Reduced query latency by running prepare pipeline tasks in parallel. #40874
  • Display Catalog information in Profile. #38283
  • Optimized the computational performance of IN filtering conditions. #40917
  • Supported cgroupv2 in K8S to limit Doris's memory usage. #39256
  • Optimized the performance of converting strings to datetime types. #38385
  • When a string is a decimal number, support casting it to an int, which will be more compatible with certain behaviors of MySQL. #38847

Semi-Structured Data Management

  • Optimized the performance of inverted index matching. #41122
  • Temporarily prohibited the creation of inverted indexes with tokenization on arrays. #39062
  • explode_json_array now supports binary JSON types. #37278
  • IP data types now support bloomfilter indexes. #39253
  • IP data types now support row storage. #39258
  • Nested data types such as ARRAY, MAP, and STRUCT now support schema changes. #39210
  • When creating MTMV, automatically truncate KEYs encountered in VARIANT data types. #39988
  • Lazy loading of inverted indexes during queries to improve performance. #38979
  • Added the inverted index file size for open file. #37482
  • Reduced access to object storage interfaces during compaction to improve performance. #41079
  • Added three new Query Profile Metrics related to inverted indexes. #36696
  • Reduced cache overhead for non-PreparedStatement SQL to improve performance. #40910
  • Pre-warming cache now supports inverted indexes. #38986
  • Inverted indexes are now cached immediately after writing. #39076

Compatibility

  • Fixed the issue of Thrift ID incompatibility on the master with branch-2.1. #41057

Other

  • BE HTTP API now supports authentication; set config::enable_all_http_auth to true (default is false) when authentication is required. #39577
  • Optimized the user permissions required for the REFRESH operation. Permissions have been relaxed from ALTER to SHOW. #39008
  • Reduced the range of nextId when calling advanceNextId(). #40160
  • Optimized the caching mechanism for Java UDFs. #40404

Bug Fixes

Load

  • Fixed the issue where abortTransaction did not handle return codes. #41275
  • Fixed the issue where afterCommit/afterAbort was not called when committing or aborting a transaction failed in compute-storage decoupled mode. #41267
  • Fixed the issue where Routine Load could not work properly when modifying consumer offsets in compute-storage decoupled mode. #39159
  • Fixed the issue of repeatedly closing file handles when obtaining error log file paths. #41320
  • Fixed the issue of incorrect job progress caching for Routine Load in compute-storage decoupled mode. #39313
  • Fixed the issue where Routine Load could get stuck when failing to commit transactions in compute-storage decoupled mode. #40539
  • Fixed the issue where Routine Load kept reporting data quality check errors in compute-storage decoupled mode. #39790
  • Fixed the issue where Routine Load did not check transactions before committing in compute-storage decoupled mode. #39775
  • Fixed the issue where Routine Load did not check transactions before aborting in compute-storage decoupled mode. #40463
  • Fixed the issue where cluster keys did not support certain data types. #38966
  • Fixed the issue of transactions being repeatedly committed. #39786
  • Fixed the issue of use after free with WAL when BE exits. #33131
  • Fixed the issue where WAL playback did not skip completed import transactions in compute-storage decoupled mode. #41262
  • Fixed the logic for selecting BE in group commit in compute-storage decoupled mode. #39986 #38644
  • Fixed the issue where BE might crash when group commit was enabled for insert into. #39339
  • Fixed the issue where insert into with group commit enabled might get stuck. #39391
  • Fixed the issue where not enabling the group commit option during import might result in a table not found error. #39731
  • Fixed the issue of transaction submission timeouts due to too many tablets. #40031
  • Fixed the issue of concurrent opens with Auto Partition. #38605
  • Fixed the issue of import lock granularity being too large. #40134
  • Fixed the issue of coredumps caused by zero-length varchars. #40940
  • Fixed the issue of incorrect index Id values in log prints. #38790
  • Fixed the issue of memtable shifting not closing brpc streaming. #40105
  • Fixed the issue of inaccurate bvar statistics during memtable shifting. #39075
  • Fixed the issue of multi-replication fault tolerance during memtable shifting. #38003
  • Fixed the issue of incorrect message length calculations for Routine Load with multiple tables in one stream. #40367
  • Fixed the issue of inaccurate progress reporting for Broker Load. #40325
  • Fixed the issue of inaccurate data scan volume reporting for Broker Load. #40694
  • Fixed the issue of concurrency with Routine Load in compute-storage decoupled mode. #39242
  • Fixed the issue of Routine Load jobs being canceled in compute-storage decoupled mode. #39514
  • Fixed the issue of progress not being reset when deleting Kafka topics. #38474
  • Fixed the issue of updating progress during transaction state transitions in Routine Load. #39311
  • Fixed the issue of Routine Load switching from a paused state to a paused state. #40728
  • Fixed the issue of Stream Load records being missed due to database deletion. #39360

Storage

  • Fixed the issue of missing storage policies. #38700
  • Fixed the issue of errors during cross-version backup and recovery. #38370
  • Fixed the NPE issue with ccr binlog. #39909
  • Fixed potential issues with duplicate keys in mow. #41309 #39791 #39958 #38369 #38331
  • Fixed the issue of not being able to write after backup and recovery in high-frequency write scenarios. #40118 #38321
  • Fixed the issue of data errors potentially triggered by deleting empty strings and schema changes. #41064
  • Fixed the issue of incorrect statistics due to column updates. #40880
  • Limited the size of tablet meta pb to prevent BE crashes due to oversized meta. #39455
  • Fixed the potential column misalignment issue with the new optimizer in begin; insert into values; commit. #39295

Compute-Storage Decoupled

  • Fixed the issue where the tablet distribution might be inconsistent across multiple FEs in compute-storage decoupled mode. #41458
  • Fixed the issue where TVF might not work in multi-computing group environments. #39249
  • Fixed the issue where compaction used resources that had already been released when BE exited in compute-storage decoupled mode. #39302
  • Fixed the issue where automatic start-stop might cause FE replay to get stuck. #40027
  • Fixed the issue where the BE status and the stored status in meta-service were inconsistent. #40799
  • Fixed the issue where the FE->meta-service connection pool could not automatically expire and reconnect. #41202 #40661
  • Fixed the issue where some tablets might repeatedly undergo unexpected balance processes during rebalance. #39792
  • Fixed the issue where storage vault permissions were lost after FE restarted. #40260
  • Fixed the issue where tablet row counts and other statistical information might be incomplete due to FDB scan range pagination. #40494
  • Fixed the performance issue caused by a large number of aborted transactions associated with the same label. #40606
  • Fixed the issue where commit_txn did not automatically re-enter, maintaining consistent behavior between compute-storage decoupled and integrated modes. #39615
  • Fixed the issue where the number of projected columns increased when dropping columns. #40187
  • Fixed the issue where delete statements did not correctly handle return values, causing data to still be visible after deletion. #39428
  • Fixed the coredump issue caused by rowset metadata competition during file cache preheating. #39361
  • Fixed the issue where the entire cache space would be used up when TTL cache enabled LRU eviction. #39814
  • Fixed the issue where temporary files could not be recycled when importing commit rowset failed with HDFS storage backend. #40215

Lakehouse

  • Fixed some issues with predicate pushdown in JDBC Catalog. #39064
  • Fixed the issue of not being able to read when Struct type columns are missing in Parquet format. #38718
  • Fixed the issue of FileSystem leaks on the FE side in some cases. #38610
  • Fixed the issue of metadata cache information being inconsistent when Hive/Iceberg tables write back in some cases. #40729
  • Fixed the issue of unstable partition ID generation for external tables in some cases. #39325
  • Fixed the issue of external table queries selecting BE nodes in the blacklist in some cases. #39451
  • Optimized the timeout for batch retrieval of external table partition information to avoid occupying threads for a long time. #39346
  • Fixed the issue of memory leaks when querying Hudi tables in some cases. #41256
  • Fixed the issue of connection pool connection leaks in JDBC Catalog in some cases. #39582
  • Fixed the issue of BE memory leaks in JDBC Catalog in some cases. #41041
  • Fixed the issue of not being able to query Hudi data on Alibaba Cloud OSS. #41316
  • Fixed the issue of not being able to read empty partitions in MaxCompute. #40046
  • Fixed the issue of poor performance when querying Oracle through JDBC Catalog. #41513
  • Fixed the issue of BE crashes when querying Deletion Vector of Paimon tables after enabling file cache features. #39877
  • Fixed the issue of not being able to access Paimon tables on HDFS clusters with HA enabled. #39806
  • Temporarily disabled the Page Index filtering feature of Parquet to avoid potential issues. #38691
  • Fixed the issue of not being able to read unsigned types in Parquet files. #39926
  • Fixed the issue of potential infinite loops when reading Parquet files in some cases. #39523

MySQL Compatibility

Asynchronous Materialized Views

  • Fixed the issue where partition construction might select the wrong table to track partitions if both sides have the same column names. #40810
  • Fixed the issue where transparent rewrite partition compensation might result in incorrect results. #40803
  • Fixed the issue where transparent rewrite did not take effect on external tables. #38909
  • Fixed the issue where nested materialized views might not refresh properly. #40433

Synchronous Materialized Views

  • Fixed the issue where creating synchronous materialized views on MOW tables might result in incorrect query results. #39171

Query Optimizer

  • Fixed the issue where existing synchronous materialized views might not be usable after upgrading. #41283
  • Fixed the issue of not correctly handling milliseconds when comparing datetime literals. #40121
  • Fixed the issue of potential errors in conditional function partition pruning. #39298
  • Fixed the issue where MOW tables with synchronous materialized views could not perform delete operations. #39578
  • Fixed the issue where the nullable of slots in JDBC external table query predicates might be incorrectly planned, causing query errors. #41014

Query Execution

  • Fixed the memory leak issue caused by the use of runtime filters. #39155
  • Fixed the issue of excessive memory usage by window functions. #39581
  • Fixed a series of function compatibility issues during rolling upgrades. #41023 #40438 #39648
  • Fixed the issue of incorrect results with encryption_function when used with constants. #40201
  • Fixed the issue of errors when importing single-table materialized views. #39061
  • Fixed the issue of incorrect partition result calculations for window functions. #39100 #40761
  • Fixed the issue of incorrect calculations for topn when null values are present. #39497
  • Fixed the issue of incorrect results with the map_agg function. #39743
  • Fixed the issue of incorrect messages returned by cancel. #38982
  • Fixed the issue of BE core dumps caused by encrypt and decrypt functions. #40726
  • Fixed the issue of queries getting stuck due to too many scanners in high-concurrency scenarios. #40495
  • Supported time types in runtime filters. #38258
  • Fixed the issue of incorrect results with window funnel functions. #40960

Semi-Structured Data Management

  • Fixed the issue of match function errors when no indexes were present. #38989
  • Fixed the issue of crashes when ARRAY data types were used as parameters for array_min/array_max functions. #39492
  • Fixed the issue of nullable with the array_enumerate_uniq function. #38384
  • Fixed the issue of bloomfilter indexes not being updated when adding or deleting columns. #38431
  • Fixed the issue of es-catalog parsing exceptions with array data. #39104
  • Fixed the issue of improper predicate push-down in es-catalog. #40111
  • Fixed the issue of exceptions caused by modifying input data with map() and struct() functions. #39699
  • Fixed the issue of index compaction crashes in special cases. #40294
  • Fixed the issue of ARRAY type inverted indexes missing nullbitmaps. #38907
  • Fixed the issue of incorrect results with the count() function on inverted indexes. #41152
  • Fixed the issue of incorrect results with the explode_map function when using aliases. #39757
  • Fixed the issue of VARIANT type not being able to use row storage for exceptional JSON data. #39394
  • Fixed the issue of memory leaks when returning ARRAY results with VARIANT type. #41358
  • Fixed the issue of changing column names with VARIANT type. #40320
  • Fixed the issue of potential precision loss when converting VARIANT type to DECIMAL type. #39650
  • Fixed the issue of nullable handling with VARIANT type. #39732
  • Fixed the issue of sparse column reading with VARIANT type. #40295

Other

  • Fixed the compatibility issue between new and old audit log plugins. #41401
  • Fixed the issue where users could see processes of others in certain cases. #39747
  • Fixed the issue where users with permissions could not export. #38365
  • Fixed the issue where create table like required create permissions for the existing table. #37879
  • Fixed the issue where some features did not verify permissions. #39726
  • Fixed the issue of not correctly closing connections when using SSL. #38587
  • Fixed the issue where executing ALTER VIEW operations in some cases caused FE to fail to start. #40872

gavinchou commented Oct 13, 2024

Thanks to all who contributed to this release:

924060929 BePPPower BiteTheDDDDt ByteYue CalvinKirs Ceng23333 ChenPeng2013 DarvenDuan Gabriel39 HappenLee Jibing-Li Johnnyssc Lchangliang LiBinfeng-01 Mryange SWJTU-ZhangLei TangSiyang2001 Toms1999 Vallishp WinkerDu Yukang-Lian Yulei-Yang airborne12 amorynan biohazard4321 bobhan1 caiconghui cambyzju catpineapple cjj2010 csun5285 dataroaring deardeng eldenmoon elon-X englefly feiniaofeiafei felixwluo freemandealer gavinchou glzhao89 hello-stephen htyoung hubgeter hust-hhb jacktengg justfortaste kaijchen kaka11chen liaoxin01 liutang123 lsy3993 luwei16 morningman morrySnow mrhhsg mymeiyi nextdreamblue platoneko qidaye qzsee seawinde smallx sollhui starocean999 superdiaodiao suxiaogang223 w41ter wangbo wangshuo128 wsjz wuwenchi wyxxxcat xiaokang xinyiZzz xzj7019 yagagagaga yiguolei yujun777 zclllyybb zddr zfr9527 zhangstar333 zhannngchen zhiqiang-hhhh zy-kkk zzzxl1993
