Replies: 1 comment
This is super helpful, thank you Shane! For 0.1 I compared ICU4X 0.1 against ICU4C 67 using a harness and got size diffs for DateTimeFormat, PluralRules, Locale and Unicode Set: https://docs.google.com/spreadsheets/d/1SnbL7OEshSOv7QrEIxUK8FA0FYzuZOZ-r9g8r5oLwAQ/edit#gid=0 In 0.1, DTF was missing some important pieces such as TimeZones, which take a lot of data, but we knew we were designing the system so that it does not have to load them when not in use, so I think the results can be assumed to give us a ballpark as well. The tests were performed by writing a small Rust/C++ app, plugging in a single component at a time, compiling, and measuring the impact.
In #66 I am updating the numbers to 0.2 vs ICU4C 69. I hope to have some results within a week.
A couple of customers have recently asked for a code size analysis of ICU4X versus ICU4C, so I thought I would share some ballpark figures.
Methodology
I compared ICU4X FixedDecimalFormat against ICU4C NumberFormatter. This is not a completely fair comparison, for reasons listed in the Analysis section, but it gets us a ballpark. I compiled code_line_diff.rs and a port of that example to C++.
ICU4X
I ran the following command from the current tip of the main branch (acf3886):
$ RUSTFLAGS="-C panic=abort -C opt-level=s" cargo +nightly-2021-02-28 build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release --example code_line_diff
Note that the s opt level is used; this tells the compiler to optimize for size.
There were two changes I made to the code file:
Removed #[cfg(debug_assertions)] on line 47 to force the println! to be included in the binary
Replaced icu_testdata::get_provider() with icu_provider::inv::InvariantDataProvider
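To illustrate why the #[cfg(debug_assertions)] attribute matters for a size measurement, here is a minimal sketch. It is not the code from code_line_diff.rs; the summary function and its output text are hypothetical stand-ins. It only shows the general Rust behavior that a statement carrying this attribute is left out of builds where debug assertions are off, which is the default for --release.

```rust
/// Hypothetical stand-in for whatever value the example formats.
fn summary(count: usize) -> String {
    format!("{} lines changed", count)
}

fn main() {
    let text = summary(3);

    // Compiled only when debug assertions are enabled (the default for a
    // plain `cargo build`, and off by default for `cargo build --release`).
    // In a release build this statement is not part of the binary at all.
    #[cfg(debug_assertions)]
    println!("debug only: {}", text);

    // Without the attribute, the call is always compiled in, so the
    // formatting and output code it needs counts toward the binary size.
    println!("always: {}", text);
}
```

Dropping the attribute therefore keeps the formatted output in the release binary, so the code that produces it is not optimized away before measuring.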
The WASM numbers were taken in the same way, except with wasm32-unknown-unknown instead of x86_64-unknown-linux-gnu, and I took a sample with only icu_provider::inv::InvariantDataProvider, since icu_testdata::get_provider() is not well defined in WASM.
ICU4C
I used the following code_line_diff.cpp:
To compile this code, I first compiled ICU4C with static linking and then used the following command:
As with Rust, the -Os opt level is used.
stubdata.cpp is a file I used to remove libicudata from the resulting binary:
To get the size for only the data loading code by itself, I used the following cpp file, compiled the same way as above:
Computing binary size
To compute the binary size for x86_64 executables, I used the following command on the resulting executable:
This command prints a "Total" line, which is what is reported in the table below.
For WASM executables, I took the file size of the .wasm file.
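For the WASM case, reading the file size can be scripted. The following is a rough sketch rather than the tooling actually used here: the file name code_line_diff.wasm is a hypothetical default, and reporting kB as 1000 bytes is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the on-disk size of a file in bytes.
fn file_size_bytes(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // Hypothetical default path; pass the real .wasm artifact as the first argument.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "code_line_diff.wasm".to_string());

    match file_size_bytes(Path::new(&path)) {
        // Assumes 1 kB = 1000 bytes when printing the rounded figure.
        Ok(bytes) => println!("{}: {} bytes ({:.1} kB)", path, bytes, bytes as f64 / 1000.0),
        Err(err) => eprintln!("could not read {}: {}", path, err),
    }
}
```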
Raw Results
The following figures are binary code size, without data.
* These values are computed by taking the difference of the other two measurements in the row.
Analysis
First, NumberFormatter is a "kitchen sink" class that does much more than FixedDecimalFormat. This likely accounts for most of the difference in core library size. However, ICU4C does not currently have the ability to split monolithic classes such as NumberFormatter (and many others) into smaller pieces. (Disclaimer: I wrote both FixedDecimalFormat and NumberFormatter and can speak to their different design considerations.)
On the data loading code size, it is interesting that ICU4C is about 5x the size of ICU4X (548 kB vs 104 kB). Almost all of the ICU4X code size for data loading is dedicated to Serde deserialization, which we are actively working to optimize (#78). The ICU4C code size for data loading is largely dominated by locale handling functionality, including horizontal and vertical fallbacks. ICU4X's data pipeline is designed to avoid the need for horizontal fallbacks, but vertical fallbacks are not yet implemented and will likely be needed.
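As a quick check of the "about 5x" figure, the arithmetic on the two data loading sizes quoted above (548 kB and 104 kB) can be written out. The function name is only illustrative.

```rust
/// Ratio of the larger measurement to the smaller one.
fn size_ratio(larger_kb: f64, smaller_kb: f64) -> f64 {
    larger_kb / smaller_kb
}

fn main() {
    // Data loading code sizes quoted in the analysis, in kB.
    let icu4c_kb: u32 = 548;
    let icu4x_kb: u32 = 104;

    let ratio = size_ratio(f64::from(icu4c_kb), f64::from(icu4x_kb));
    println!("ratio: {:.2}x", ratio); // prints "ratio: 5.27x"
    println!("difference: {} kB", icu4c_kb - icu4x_kb); // prints "difference: 444 kB"
}
```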
My best explanation for the code size difference between x86_64 and wasm32 is that wasm32 does not include any file I/O for data loading: file I/O is unavailable in WASM, so the compiler is likely dropping that code entirely.
Conclusions