Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry
### What changes were proposed in this pull request? This PR propose to improve concurrency performance for `FunctionRegistry`. ### Why are the changes needed? Currently, `SimpleFunctionRegistryBase` adopted the `mutable.Map` caching function infos. The `SimpleFunctionRegistryBase` guarded by this so as ensure security under multithreading. Because all the mutable state are related to `functionBuilders`, we can delegate security to `ConcurrentHashMap`. `ConcurrentHashMap ` has higher concurrency activity and responsiveness. After this change, `FunctionRegistry` have better perf than before. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? GA. The benchmark test. ``` object FunctionRegistryBenchmark extends BenchmarkBase { override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { runBenchmark("FunctionRegistry") { val iters = 1000000 val threadNum = 4 val functionRegistry = FunctionRegistry.builtin val names = FunctionRegistry.expressions.keys.toSeq val barrier = new CyclicBarrier(threadNum + 1) val threadPool = ThreadUtils.newDaemonFixedThreadPool(threadNum, "test-function-registry") val benchmark = new Benchmark("SimpleFunctionRegistry", iters, output = output) benchmark.addCase("only read") { _ => for (_ <- 1 to threadNum) { threadPool.execute(new Runnable { val random = new Random() override def run(): Unit = { barrier.await() for (_ <- 1 to iters) { val name = names(random.nextInt(names.size)) val fun = functionRegistry.lookupFunction(new FunctionIdentifier(name)) assert(fun.map(_.getName).get == name) functionRegistry.listFunction() } barrier.await() } }) } barrier.await() barrier.await() } benchmark.run() } } } ``` The benchmark output before this PR. ``` Java HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201 on Mac OS X 10.14.6 Intel(R) Core(TM) i5-5350U CPU 1.80GHz SimpleFunctionRegistry: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ only read 54858 55043 261 0.0 54858.1 1.0X ``` The benchmark output after this PR. ``` Java HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201 on Mac OS X 10.14.6 Intel(R) Core(TM) i5-5350U CPU 1.80GHz SimpleFunctionRegistry: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ only read 20202 20264 88 0.0 20202.1 1.0X ``` ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes apache#44976 from beliefer/SPARK-46937. Authored-by: beliefer <[email protected]> Signed-off-by: beliefer <[email protected]>
- Loading branch information