[BugFix] Fix parsing integer batch size within export #1004
base: gh/vmoens/18/base
ghstack-source-id: 73e7dd429770e1c383b3b2a1c28dbbf661d65f07
Pull Request resolved: #1004
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 38.5420μs | 19.7824μs | 50.5499 KOps/s | 49.6138 KOps/s | |
test_plain_set_stack_nested | 68.7990μs | 19.4874μs | 51.3153 KOps/s | 49.0805 KOps/s | |
test_plain_set_nested_inplace | 85.4400μs | 21.0734μs | 47.4532 KOps/s | 45.0632 KOps/s | |
test_plain_set_stack_nested_inplace | 74.0590μs | 21.1939μs | 47.1834 KOps/s | 45.4684 KOps/s | |
test_items | 26.1410μs | 4.1291μs | 242.1842 KOps/s | 237.3341 KOps/s | |
test_items_nested | 0.5688ms | 0.3730ms | 2.6812 KOps/s | 2.7413 KOps/s | |
test_items_nested_locked | 0.5030ms | 0.3726ms | 2.6835 KOps/s | 2.7477 KOps/s | |
test_items_nested_leaf | 0.1286ms | 68.6555μs | 14.5655 KOps/s | 14.3974 KOps/s | |
test_items_stack_nested | 0.5022ms | 0.3745ms | 2.6704 KOps/s | 2.7293 KOps/s | |
test_items_stack_nested_leaf | 0.1286ms | 70.9499μs | 14.0944 KOps/s | 14.1513 KOps/s | |
test_items_stack_nested_locked | 0.5940ms | 0.3759ms | 2.6605 KOps/s | 2.7490 KOps/s | |
test_keys | 31.0890μs | 3.5671μs | 280.3401 KOps/s | 281.3400 KOps/s | |
test_keys_nested | 0.1796ms | 99.9570μs | 10.0043 KOps/s | 9.9581 KOps/s | |
test_keys_nested_locked | 1.8124ms | 0.1049ms | 9.5329 KOps/s | 9.5227 KOps/s | |
test_keys_nested_leaf | 0.1537ms | 84.6151μs | 11.8182 KOps/s | 11.9151 KOps/s | |
test_keys_stack_nested | 0.1642ms | 0.1006ms | 9.9375 KOps/s | 9.9904 KOps/s | |
test_keys_stack_nested_leaf | 0.1472ms | 82.8254μs | 12.0736 KOps/s | 11.9905 KOps/s | |
test_keys_stack_nested_locked | 0.1761ms | 0.1053ms | 9.4929 KOps/s | 9.6317 KOps/s | |
test_values | 10.1710μs | 1.0480μs | 954.2378 KOps/s | 962.7476 KOps/s | |
test_values_nested | 0.1317ms | 75.0840μs | 13.3184 KOps/s | 13.6575 KOps/s | |
test_values_nested_locked | 0.1391ms | 74.0021μs | 13.5131 KOps/s | 13.6190 KOps/s | |
test_values_nested_leaf | 0.1125ms | 61.2305μs | 16.3317 KOps/s | 16.4462 KOps/s | |
test_values_stack_nested | 0.1499ms | 75.7689μs | 13.1980 KOps/s | 13.5130 KOps/s | |
test_values_stack_nested_leaf | 0.1239ms | 60.0065μs | 16.6649 KOps/s | 16.5073 KOps/s | |
test_values_stack_nested_locked | 0.1327ms | 75.6589μs | 13.2172 KOps/s | 13.4496 KOps/s | |
test_membership | 2.0428μs | 0.7150μs | 1.3986 MOps/s | 1.4000 MOps/s | |
test_membership_nested | 18.7350μs | 2.8040μs | 356.6350 KOps/s | 367.3173 KOps/s | |
test_membership_nested_leaf | 31.2570μs | 2.8205μs | 354.5507 KOps/s | 367.9096 KOps/s | |
test_membership_stacked_nested | 27.0910μs | 2.8082μs | 356.0976 KOps/s | 369.3306 KOps/s | |
test_membership_stacked_nested_leaf | 40.0350μs | 2.8203μs | 354.5684 KOps/s | 371.5639 KOps/s | |
test_membership_nested_last | 21.5700μs | 4.1540μs | 240.7310 KOps/s | 254.7662 KOps/s | |
test_membership_nested_leaf_last | 27.0500μs | 4.0975μs | 244.0494 KOps/s | 255.7240 KOps/s | |
test_membership_stacked_nested_last | 34.8150μs | 10.7535μs | 92.9930 KOps/s | 253.9894 KOps/s | |
test_membership_stacked_nested_leaf_last | 57.1480μs | 10.5687μs | 94.6186 KOps/s | 255.1744 KOps/s | |
test_nested_getleaf | 55.0130μs | 10.7740μs | 92.8163 KOps/s | 95.2586 KOps/s | |
test_nested_get | 35.6270μs | 10.1341μs | 98.6767 KOps/s | 97.8167 KOps/s | |
test_stacked_getleaf | 51.9970μs | 10.8244μs | 92.3840 KOps/s | 92.2174 KOps/s | |
test_stacked_get | 53.2500μs | 10.2194μs | 97.8528 KOps/s | 96.0961 KOps/s | |
test_nested_getitemleaf | 52.5580μs | 11.1762μs | 89.4759 KOps/s | 88.5681 KOps/s | |
test_nested_getitem | 54.4420μs | 10.3866μs | 96.2782 KOps/s | 91.9748 KOps/s | |
test_stacked_getitemleaf | 50.5840μs | 11.2344μs | 89.0120 KOps/s | 89.3670 KOps/s | |
test_stacked_getitem | 34.1540μs | 10.3995μs | 96.1580 KOps/s | 96.8501 KOps/s | |
test_lock_nested | 83.1001ms | 0.5714ms | 1.7501 KOps/s | 2.0184 KOps/s | |
test_lock_stack_nested | 0.5482ms | 0.4423ms | 2.2611 KOps/s | 2.1270 KOps/s | |
test_unlock_nested | 83.8251ms | 0.4898ms | 2.0416 KOps/s | 2.3680 KOps/s | |
test_unlock_stack_nested | 0.5848ms | 0.3579ms | 2.7941 KOps/s | 2.5649 KOps/s | |
test_flatten_speed | 0.3175ms | 90.1992μs | 11.0866 KOps/s | 11.3896 KOps/s | |
test_unflatten_speed | 0.6707ms | 0.4686ms | 2.1339 KOps/s | 2.1408 KOps/s | |
test_common_ops | 6.3078ms | 1.0765ms | 928.9386 Ops/s | 886.7283 Ops/s | |
test_creation | 23.1840μs | 2.0909μs | 478.2715 KOps/s | 478.8113 KOps/s | |
test_creation_empty | 46.3570μs | 15.2282μs | 65.6675 KOps/s | 56.7594 KOps/s | |
test_creation_nested_1 | 0.2100ms | 18.4522μs | 54.1942 KOps/s | 47.7274 KOps/s | |
test_creation_nested_2 | 0.1620ms | 22.9222μs | 43.6258 KOps/s | 39.5244 KOps/s | |
test_clone | 65.8640μs | 17.0727μs | 58.5732 KOps/s | 57.3704 KOps/s | |
test_getitem[int] | 1.0355ms | 16.5308μs | 60.4932 KOps/s | 57.7392 KOps/s | |
test_getitem[slice_int] | 0.1383ms | 31.5632μs | 31.6825 KOps/s | 31.2822 KOps/s | |
test_getitem[range] | 0.2126ms | 57.4919μs | 17.3938 KOps/s | 16.6669 KOps/s | |
test_getitem[tuple] | 0.1407ms | 25.5269μs | 39.1744 KOps/s | 38.0632 KOps/s | |
test_getitem[list] | 0.1786ms | 53.0200μs | 18.8608 KOps/s | 18.0920 KOps/s | |
test_setitem_dim[int] | 74.6900μs | 32.8488μs | 30.4425 KOps/s | 29.2600 KOps/s | |
test_setitem_dim[slice_int] | 0.1146ms | 62.2202μs | 16.0719 KOps/s | 15.7617 KOps/s | |
test_setitem_dim[range] | 0.1343ms | 83.2617μs | 12.0103 KOps/s | 11.4831 KOps/s | |
test_setitem_dim[tuple] | 79.0590μs | 49.6216μs | 20.1525 KOps/s | 20.1118 KOps/s | |
test_setitem | 0.1204ms | 28.2045μs | 35.4553 KOps/s | 34.2134 KOps/s | |
test_set | 0.1083ms | 27.4115μs | 36.4810 KOps/s | 34.9232 KOps/s | |
test_set_shared | 1.3110ms | 0.2118ms | 4.7212 KOps/s | 4.6673 KOps/s | |
test_update | 0.1428ms | 33.4392μs | 29.9051 KOps/s | 28.7465 KOps/s | |
test_update_nested | 0.1310ms | 44.0220μs | 22.7159 KOps/s | 22.1692 KOps/s | |
test_update__nested | 0.1067ms | 35.3321μs | 28.3029 KOps/s | 28.5885 KOps/s | |
test_set_nested | 0.1021ms | 30.0506μs | 33.2772 KOps/s | 31.4557 KOps/s | |
test_set_nested_new | 0.1009ms | 35.5562μs | 28.1245 KOps/s | 27.5339 KOps/s | |
test_select | 0.1292ms | 53.3357μs | 18.7492 KOps/s | 17.7269 KOps/s | |
test_select_nested | 0.1457ms | 60.4124μs | 16.5529 KOps/s | 15.5329 KOps/s | |
test_exclude_nested | 0.1670ms | 77.0049μs | 12.9862 KOps/s | 12.7989 KOps/s | |
test_empty[True] | 0.5134ms | 0.3190ms | 3.1352 KOps/s | 3.1098 KOps/s | |
test_empty[False] | 7.7320μs | 1.2727μs | 785.7050 KOps/s | 827.5798 KOps/s | |
test_unbind_speed | 0.5270ms | 0.3015ms | 3.3164 KOps/s | 3.2257 KOps/s | |
test_unbind_speed_stack0 | 0.4310ms | 0.2879ms | 3.4731 KOps/s | 3.3317 KOps/s | |
test_unbind_speed_stack1 | 97.2526ms | 0.7900ms | 1.2658 KOps/s | 1.3295 KOps/s | |
test_split | 89.0874ms | 2.1889ms | 456.8563 Ops/s | 449.6076 Ops/s | |
test_chunk | 3.1147ms | 2.0317ms | 492.2060 Ops/s | 449.6004 Ops/s | |
test_creation[device0] | 0.2828ms | 0.1187ms | 8.4253 KOps/s | 8.6325 KOps/s | |
test_creation_from_tensor | 3.6866ms | 0.1192ms | 8.3874 KOps/s | 8.5766 KOps/s | |
test_add_one[memmap_tensor0] | 0.1938ms | 7.5083μs | 133.1868 KOps/s | 137.8180 KOps/s | |
test_contiguous[memmap_tensor0] | 16.8710μs | 1.9104μs | 523.4640 KOps/s | 537.8001 KOps/s | |
test_stack[memmap_tensor0] | 37.3700μs | 5.6493μs | 177.0139 KOps/s | 177.5228 KOps/s | |
test_memmaptd_index | 1.1470ms | 0.4072ms | 2.4559 KOps/s | 2.5098 KOps/s | |
test_memmaptd_index_astensor | 1.2238ms | 0.4890ms | 2.0449 KOps/s | 2.0958 KOps/s | |
test_memmaptd_index_op | 1.4619ms | 0.9758ms | 1.0248 KOps/s | 985.0547 Ops/s | |
test_serialize_model | 0.1270s | 0.1214s | 8.2362 Ops/s | 8.3265 Ops/s | |
test_serialize_model_pickle | 0.4603s | 0.3927s | 2.5467 Ops/s | 2.5104 Ops/s | |
test_serialize_weights | 0.1263s | 0.1141s | 8.7634 Ops/s | 7.5141 Ops/s | |
test_serialize_weights_returnearly | 0.1761s | 0.1608s | 6.2180 Ops/s | 6.2331 Ops/s | |
test_serialize_weights_pickle | 0.6051s | 0.4463s | 2.2408 Ops/s | 2.5372 Ops/s | |
test_serialize_weights_filesystem | 0.1490s | 0.1429s | 6.9974 Ops/s | 6.9807 Ops/s | |
test_serialize_model_filesystem | 0.1620s | 0.1495s | 6.6902 Ops/s | 6.1022 Ops/s | |
test_reshape_pytree | 0.1273ms | 38.6397μs | 25.8801 KOps/s | 25.4779 KOps/s | |
test_reshape_td | 0.1123ms | 45.9621μs | 21.7571 KOps/s | 20.8746 KOps/s | |
test_view_pytree | 80.6310μs | 38.8247μs | 25.7568 KOps/s | 25.5150 KOps/s | |
test_view_td | 0.1401ms | 52.9638μs | 18.8808 KOps/s | 18.7321 KOps/s | |
test_unbind_pytree | 97.0720μs | 36.2219μs | 27.6076 KOps/s | 28.0252 KOps/s | |
test_unbind_td | 0.3619ms | 45.1836μs | 22.1319 KOps/s | 22.0007 KOps/s | |
test_split_pytree | 96.9110μs | 38.9820μs | 25.6529 KOps/s | 26.7646 KOps/s | |
test_split_td | 0.4471ms | 57.7893μs | 17.3042 KOps/s | 16.5645 KOps/s | |
test_add_pytree | 0.1267ms | 45.9991μs | 21.7395 KOps/s | 22.0554 KOps/s | |
test_add_td | 0.2135ms | 77.4837μs | 12.9059 KOps/s | 12.2225 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1088ms | 58.7833μs | 17.0116 KOps/s | 16.8283 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2975ms | 0.1801ms | 5.5514 KOps/s | 5.6220 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1286ms | 58.2946μs | 17.1543 KOps/s | 16.9319 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2826ms | 0.1442ms | 6.9339 KOps/s | 7.0179 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 56.6960μs | 22.2979μs | 44.8473 KOps/s | 44.7865 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1352ms | 67.3926μs | 14.8384 KOps/s | 14.5036 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1474ms | 75.2525μs | 13.2886 KOps/s | 13.1963 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1409ms | 67.9936μs | 14.7073 KOps/s | 14.5719 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2948ms | 0.1751ms | 5.7104 KOps/s | 5.7402 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3586ms | 0.1904ms | 5.2521 KOps/s | 5.2515 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1098ms | 47.5585μs | 21.0268 KOps/s | 20.4597 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1409ms | 69.3090μs | 14.4281 KOps/s | 13.9754 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2856ms | 0.1754ms | 5.7027 KOps/s | 5.7140 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.4494ms | 0.2925ms | 3.4186 KOps/s | 3.5461 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3641ms | 0.2025ms | 4.9373 KOps/s | 4.8849 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2975ms | 0.1742ms | 5.7409 KOps/s | 5.6865 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1367ms | 61.6051μs | 16.2324 KOps/s | 15.9178 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.3221ms | 47.1525μs | 21.2078 KOps/s | 20.8523 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5096ms | 0.2311ms | 4.3278 KOps/s | 4.3636 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3147ms | 0.1784ms | 5.6043 KOps/s | 5.5455 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1935ms | 0.1025ms | 9.7553 KOps/s | 9.6987 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1326ms | 57.6592μs | 17.3433 KOps/s | 17.6862 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1487ms | 75.3510μs | 13.2712 KOps/s | 12.8677 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1607ms | 67.2171μs | 14.8772 KOps/s | 14.5479 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.2981ms | 0.1973ms | 5.0693 KOps/s | 5.0475 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 1.8216ms | 1.6661ms | 600.2194 Ops/s | 613.7995 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2961ms | 0.1931ms | 5.1784 KOps/s | 5.2348 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.3466ms | 1.1084ms | 902.2325 Ops/s | 938.4428 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.6988ms | 0.4197ms | 2.3829 KOps/s | 2.4213 KOps/s | |
test_compile_assign_and_add_stack[eager] | 5.1835ms | 3.6639ms | 272.9300 Ops/s | 271.9522 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1127ms | 34.1144μs | 29.3132 KOps/s | 27.8742 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5745ms | 49.3461μs | 20.2650 KOps/s | 20.5062 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 93.2440μs | 30.3686μs | 32.9287 KOps/s | 32.5524 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 91.3410μs | 29.4037μs | 34.0093 KOps/s | 35.2423 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1092ms | 29.9655μs | 33.3717 KOps/s | 33.1807 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 82.0840μs | 28.8389μs | 34.6753 KOps/s | 35.0715 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1388ms | 73.7836μs | 13.5531 KOps/s | 13.1991 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.5843ms | 28.4332μs | 35.1702 KOps/s | 35.1585 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1397ms | 69.4012μs | 14.4090 KOps/s | 14.6128 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 88.6260μs | 23.4339μs | 42.6732 KOps/s | 43.5208 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1512ms | 68.0684μs | 14.6911 KOps/s | 14.4374 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 95.3490μs | 23.3439μs | 42.8378 KOps/s | 43.8131 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1522ms | 72.4697μs | 13.7989 KOps/s | 13.4824 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.8124ms | 27.9231μs | 35.8126 KOps/s | 35.7625 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1211ms | 67.8713μs | 14.7338 KOps/s | 14.7870 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.1830ms | 23.5389μs | 42.4829 KOps/s | 43.7517 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1357ms | 67.7184μs | 14.7670 KOps/s | 14.6891 KOps/s | |
test_compile_indexing[int-pytree-eager] | 89.0070μs | 23.0255μs | 43.4302 KOps/s | 43.3968 KOps/s | |
test_mod_add[eager] | 88.2950μs | 23.3155μs | 42.8900 KOps/s | 39.5736 KOps/s | |
test_mod_add[compile] | 93.6960μs | 39.1258μs | 25.5586 KOps/s | 25.3350 KOps/s | |
test_mod_add[compile-overhead] | 0.1005ms | 39.0315μs | 25.6203 KOps/s | 25.3702 KOps/s | |
test_mod_wrap[eager] | 0.3978ms | 0.2104ms | 4.7531 KOps/s | 4.7347 KOps/s | |
test_mod_wrap[compile] | 0.3439ms | 0.2381ms | 4.1998 KOps/s | 4.3303 KOps/s | |
test_mod_wrap[compile-overhead] | 0.3783ms | 0.2351ms | 4.2529 KOps/s | 4.4118 KOps/s | |
test_mod_wrap_and_backward[eager] | 15.0106ms | 11.8807ms | 84.1699 Ops/s | 89.7438 Ops/s | |
test_mod_wrap_and_backward[compile] | 19.6489ms | 12.7381ms | 78.5044 Ops/s | 90.2389 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 18.9460ms | 12.6595ms | 78.9918 Ops/s | 90.9444 Ops/s | |
test_seq_add[eager] | 0.1801ms | 88.1814μs | 11.3403 KOps/s | 10.7672 KOps/s | |
test_seq_add[compile] | 0.1553ms | 66.2891μs | 15.0854 KOps/s | 15.2111 KOps/s | |
test_seq_add[compile-overhead] | 0.1329ms | 65.7982μs | 15.1980 KOps/s | 15.3163 KOps/s | |
test_seq_wrap[eager] | 0.6736ms | 0.3771ms | 2.6515 KOps/s | 2.6006 KOps/s | |
test_seq_wrap[compile] | 1.2529ms | 0.2717ms | 3.6808 KOps/s | 3.5045 KOps/s | |
test_seq_wrap[compile-overhead] | 1.2391ms | 0.2726ms | 3.6684 KOps/s | 3.7482 KOps/s | |
test_func_call_runtime[False-eager] | 0.7123ms | 0.5377ms | 1.8598 KOps/s | 1.9127 KOps/s | |
test_func_call_runtime[False-compile] | 0.6971ms | 0.5043ms | 1.9829 KOps/s | 2.0120 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6250ms | 0.5055ms | 1.9784 KOps/s | 2.0410 KOps/s | |
test_func_call_runtime[True-eager] | 1.0700ms | 0.7549ms | 1.3247 KOps/s | 1.3410 KOps/s | |
test_func_call_runtime[True-compile] | 0.8856ms | 0.5204ms | 1.9215 KOps/s | 1.9800 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6391ms | 0.5169ms | 1.9347 KOps/s | 1.9722 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8098ms | 0.5335ms | 1.8745 KOps/s | 1.8728 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.7087ms | 0.5056ms | 1.9780 KOps/s | 2.0164 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6721ms | 0.5071ms | 1.9722 KOps/s | 2.0351 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.4571ms | 0.9034ms | 1.1069 KOps/s | 1.1395 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.9247ms | 0.7586ms | 1.3183 KOps/s | 1.3709 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.2547ms | 0.7603ms | 1.3153 KOps/s | 1.3582 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6181ms | 1.9168ms | 521.7104 Ops/s | 542.2596 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.9771ms | 1.9675ms | 508.2506 Ops/s | 525.3640 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 3.1061ms | 1.9930ms | 501.7450 Ops/s | 529.3039 Ops/s | |
test_distributed | 0.3023ms | 0.1242ms | 8.0537 KOps/s | 7.7870 KOps/s | |
test_tdmodule | 29.4040μs | 17.0262μs | 58.7332 KOps/s | 58.9595 KOps/s | |
test_tdmodule_dispatch | 68.0480μs | 33.6856μs | 29.6863 KOps/s | 27.7089 KOps/s | |
test_tdseq | 38.8720μs | 19.6929μs | 50.7798 KOps/s | 49.7676 KOps/s | |
test_tdseq_dispatch | 71.7240μs | 40.0761μs | 24.9525 KOps/s | 24.7712 KOps/s | |
test_instantiation_functorch | 1.7291ms | 1.5919ms | 628.1860 Ops/s | 637.6976 Ops/s | |
test_instantiation_td | 2.1782ms | 1.2030ms | 831.2598 Ops/s | 861.5557 Ops/s | |
test_exec_functorch | 0.4276ms | 0.1888ms | 5.2955 KOps/s | 5.3236 KOps/s | |
test_exec_functional_call | 0.3077ms | 0.1756ms | 5.6963 KOps/s | 5.6768 KOps/s | |
test_exec_td | 0.2721ms | 0.1696ms | 5.8974 KOps/s | 5.7662 KOps/s | |
test_exec_td_decorator | 0.4890ms | 0.2238ms | 4.4683 KOps/s | 4.3538 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1306ms | 0.6615ms | 1.5117 KOps/s | 1.5717 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.9338ms | 0.6569ms | 1.5223 KOps/s | 1.5713 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.7715ms | 0.5167ms | 1.9355 KOps/s | 2.0499 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7855ms | 0.5185ms | 1.9287 KOps/s | 2.0441 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.3561ms | 0.6392ms | 1.5644 KOps/s | 1.6177 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9912ms | 0.6422ms | 1.5570 KOps/s | 1.6313 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.9990ms | 0.5348ms | 1.8698 KOps/s | 1.9710 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7133ms | 0.5296ms | 1.8884 KOps/s | 1.9805 KOps/s | |
test_to_module_speed[True] | 1.6254ms | 1.2831ms | 779.3677 Ops/s | 770.3320 Ops/s | |
test_to_module_speed[False] | 2.0428ms | 1.2569ms | 795.6280 Ops/s | 794.0504 Ops/s | |
test_tc_init | 96.0100μs | 43.1498μs | 23.1751 KOps/s | 23.2218 KOps/s | |
test_tc_init_nested | 0.1802ms | 86.3518μs | 11.5805 KOps/s | 11.4341 KOps/s | |
test_tc_first_layer_tensor | 30.9880μs | 1.6386μs | 610.2749 KOps/s | 662.9586 KOps/s | |
test_tc_first_layer_nontensor | 28.6430μs | 4.8365μs | 206.7592 KOps/s | 208.0672 KOps/s | |
test_tc_second_layer_tensor | 29.7160μs | 2.9564μs | 338.2515 KOps/s | 361.6508 KOps/s | |
test_tc_second_layer_nontensor | 29.9060μs | 6.1351μs | 162.9966 KOps/s | 164.5984 KOps/s | |
test_unbind | 0.4918s | 13.6038ms | 73.5090 Ops/s | 76.5935 Ops/s | |
test_full_like | 17.7971ms | 8.1308ms | 122.9894 Ops/s | 132.7128 Ops/s | |
test_zeros_like | 3.2583ms | 2.8792ms | 347.3206 Ops/s | 354.2205 Ops/s | |
test_ones_like | 3.7530ms | 3.4125ms | 293.0438 Ops/s | 293.3948 Ops/s | |
test_clone | 6.7312ms | 5.1586ms | 193.8520 Ops/s | 198.3409 Ops/s | |
test_squeeze | 65.8530μs | 12.1194μs | 82.5124 KOps/s | 79.2930 KOps/s | |
test_unsqueeze | 0.3712ms | 96.4651μs | 10.3664 KOps/s | 10.4398 KOps/s | |
test_split | 0.3941ms | 0.1994ms | 5.0152 KOps/s | 4.9451 KOps/s | |
test_permute | 0.4633ms | 0.2284ms | 4.3774 KOps/s | 4.3784 KOps/s | |
test_stack | 32.3185ms | 25.1218ms | 39.8060 Ops/s | 40.8482 Ops/s | |
test_cat | 28.8153ms | 24.8583ms | 40.2280 Ops/s | 40.0390 Ops/s |
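
The Change column carries no values in the table above. As a rough sketch, a relative change can be derived from the Ops and Ops on Repo HEAD columns. This assumes that Change means (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, which the source does not state. The helper names `parse_ops` and `relative_change` are hypothetical and not part of the benchmark tooling. The three sample rows are copied from the table.

```python
# Hypothetical helpers: not part of the project's benchmark tooling.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '92.9930 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Throughput change of this run relative to repo HEAD, as a fraction.

    Assumed definition: (ops - ops_head) / ops_head.
    """
    head = parse_ops(ops_head)
    return (parse_ops(ops) - head) / head


# (Ops, Ops on Repo HEAD) pairs copied from the table above.
rows = {
    "test_plain_set_nested": ("50.5499 KOps/s", "49.6138 KOps/s"),
    "test_membership_stacked_nested_last": ("92.9930 KOps/s", "253.9894 KOps/s"),
    "test_memmaptd_index_op": ("1.0248 KOps/s", "985.0547 Ops/s"),
}

for name, (ops, head) in rows.items():
    print(f"{name}: {relative_change(ops, head):+.1%}")
```

Converting units before dividing matters for rows such as `test_memmaptd_index_op`, where the two columns are reported in KOps/s and Ops/s respectively.
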
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1398ms | 14.0510μs | 71.1694 KOps/s | 71.6220 KOps/s | |
test_plain_set_stack_nested | 40.8610μs | 14.0714μs | 71.0659 KOps/s | 70.5660 KOps/s | |
test_plain_set_nested_inplace | 44.4510μs | 14.9467μs | 66.9044 KOps/s | 66.6646 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1871ms | 14.9939μs | 66.6940 KOps/s | 66.6082 KOps/s | |
test_items | 29.3310μs | 2.8550μs | 350.2578 KOps/s | 347.6205 KOps/s | |
test_items_nested | 0.3789ms | 0.3287ms | 3.0421 KOps/s | 3.0848 KOps/s | |
test_items_nested_locked | 0.3891ms | 0.3311ms | 3.0205 KOps/s | 3.0425 KOps/s | |
test_items_nested_leaf | 77.6720μs | 55.6077μs | 17.9831 KOps/s | 17.8809 KOps/s | |
test_items_stack_nested | 0.3918ms | 0.3327ms | 3.0060 KOps/s | 3.0146 KOps/s | |
test_items_stack_nested_leaf | 86.0220μs | 56.5979μs | 17.6685 KOps/s | 17.4597 KOps/s | |
test_items_stack_nested_locked | 0.3873ms | 0.3337ms | 2.9963 KOps/s | 3.0367 KOps/s | |
test_keys | 37.5410μs | 3.4019μs | 293.9538 KOps/s | 274.8335 KOps/s | |
test_keys_nested | 96.9030μs | 55.8942μs | 17.8910 KOps/s | 17.6889 KOps/s | |
test_keys_nested_locked | 2.5347ms | 62.1416μs | 16.0923 KOps/s | 16.1348 KOps/s | |
test_keys_nested_leaf | 74.1420μs | 46.9139μs | 21.3157 KOps/s | 21.3000 KOps/s | |
test_keys_stack_nested | 84.8720μs | 56.7365μs | 17.6254 KOps/s | 17.6302 KOps/s | |
test_keys_stack_nested_leaf | 74.2420μs | 46.9471μs | 21.3006 KOps/s | 20.5544 KOps/s | |
test_keys_stack_nested_locked | 0.1166ms | 61.4624μs | 16.2701 KOps/s | 16.1109 KOps/s | |
test_values | 5.4752μs | 0.8714μs | 1.1476 MOps/s | 1.1753 MOps/s | |
test_values_nested | 72.4920μs | 40.4113μs | 24.7456 KOps/s | 24.2922 KOps/s | |
test_values_nested_locked | 70.5510μs | 42.2952μs | 23.6434 KOps/s | 23.3334 KOps/s | |
test_values_nested_leaf | 67.9020μs | 34.9982μs | 28.5729 KOps/s | 28.0646 KOps/s | |
test_values_stack_nested | 78.5910μs | 41.3561μs | 24.1802 KOps/s | 23.8290 KOps/s | |
test_values_stack_nested_leaf | 71.7220μs | 35.8991μs | 27.8559 KOps/s | 27.6089 KOps/s | |
test_values_stack_nested_locked | 85.0520μs | 43.0954μs | 23.2044 KOps/s | 22.8338 KOps/s | |
test_membership | 1.5476μs | 0.5040μs | 1.9842 MOps/s | 1.9828 MOps/s | |
test_membership_nested | 19.1605μs | 1.9089μs | 523.8555 KOps/s | 530.4487 KOps/s | |
test_membership_nested_leaf | 13.4055μs | 1.8915μs | 528.6671 KOps/s | 531.3800 KOps/s | |
test_membership_stacked_nested | 29.7810μs | 1.9695μs | 507.7383 KOps/s | 522.3695 KOps/s | |
test_membership_stacked_nested_leaf | 32.6010μs | 1.9825μs | 504.4088 KOps/s | 516.6656 KOps/s | |
test_membership_nested_last | 38.4010μs | 2.8505μs | 350.8172 KOps/s | 351.9531 KOps/s | |
test_membership_nested_leaf_last | 26.1300μs | 2.8229μs | 354.2396 KOps/s | 355.6954 KOps/s | |
test_membership_stacked_nested_last | 29.0310μs | 3.1879μs | 313.6906 KOps/s | 234.5753 KOps/s | |
test_membership_stacked_nested_leaf_last | 29.9410μs | 3.2153μs | 311.0106 KOps/s | 237.3361 KOps/s | |
test_nested_getleaf | 35.0010μs | 6.1846μs | 161.6922 KOps/s | 161.9744 KOps/s | |
test_nested_get | 27.3600μs | 5.7342μs | 174.3936 KOps/s | 172.6007 KOps/s | |
test_stacked_getleaf | 35.0400μs | 6.0353μs | 165.6916 KOps/s | 164.5749 KOps/s | |
test_stacked_get | 33.0910μs | 5.6195μs | 177.9531 KOps/s | 174.2211 KOps/s | |
test_nested_getitemleaf | 33.8610μs | 6.1483μs | 162.6457 KOps/s | 161.1655 KOps/s | |
test_nested_getitem | 33.0800μs | 5.7548μs | 173.7666 KOps/s | 172.0980 KOps/s | |
test_stacked_getitemleaf | 37.6710μs | 6.0543μs | 165.1723 KOps/s | 163.3039 KOps/s | |
test_stacked_getitem | 33.9910μs | 5.7794μs | 173.0291 KOps/s | 173.8919 KOps/s | |
test_lock_nested | 5.0900ms | 0.4207ms | 2.3771 KOps/s | 2.3530 KOps/s | |
test_lock_stack_nested | 0.4354ms | 0.3843ms | 2.6023 KOps/s | 2.6160 KOps/s | |
test_unlock_nested | 0.7607ms | 0.3583ms | 2.7913 KOps/s | 2.7656 KOps/s | |
test_unlock_stack_nested | 0.3725ms | 0.3240ms | 3.0863 KOps/s | 3.1061 KOps/s | |
test_flatten_speed | 0.1495ms | 69.6921μs | 14.3488 KOps/s | 14.2443 KOps/s | |
test_unflatten_speed | 0.3385ms | 0.2808ms | 3.5614 KOps/s | 3.4071 KOps/s | |
test_common_ops | 1.5521ms | 1.2773ms | 782.9121 Ops/s | 731.5360 Ops/s | |
test_creation | 33.5810μs | 1.4832μs | 674.1959 KOps/s | 667.4252 KOps/s | |
test_creation_empty | 45.7410μs | 15.5172μs | 64.4445 KOps/s | 65.0973 KOps/s | |
test_creation_nested_1 | 46.3510μs | 17.3372μs | 57.6794 KOps/s | 57.3238 KOps/s | |
test_creation_nested_2 | 65.6110μs | 19.8274μs | 50.4352 KOps/s | 49.8204 KOps/s | |
test_clone | 59.8920μs | 29.6163μs | 33.7652 KOps/s | 34.1279 KOps/s | |
test_getitem[int] | 1.3547ms | 16.2531μs | 61.5269 KOps/s | 56.8697 KOps/s | |
test_getitem[slice_int] | 0.1198ms | 27.6207μs | 36.2047 KOps/s | 32.4668 KOps/s | |
test_getitem[range] | 0.2343ms | 0.1131ms | 8.8418 KOps/s | 8.8031 KOps/s | |
test_getitem[tuple] | 0.1205ms | 23.6230μs | 42.3316 KOps/s | 40.7698 KOps/s | |
test_getitem[list] | 0.2026ms | 0.1022ms | 9.7876 KOps/s | 9.2848 KOps/s | |
test_setitem_dim[int] | 70.6020μs | 46.3912μs | 21.5558 KOps/s | 19.4015 KOps/s | |
test_setitem_dim[slice_int] | 97.1420μs | 69.5090μs | 14.3866 KOps/s | 14.2352 KOps/s | |
test_setitem_dim[range] | 0.1595ms | 0.1307ms | 7.6490 KOps/s | 7.5977 KOps/s | |
test_setitem_dim[tuple] | 0.1034ms | 63.2289μs | 15.8156 KOps/s | 15.7832 KOps/s | |
test_setitem | 84.7130μs | 42.5314μs | 23.5120 KOps/s | 23.7585 KOps/s | |
test_set | 0.1153ms | 41.6547μs | 24.0069 KOps/s | 24.1831 KOps/s | |
test_set_shared | 0.3733ms | 52.0617μs | 19.2080 KOps/s | 19.3457 KOps/s | |
test_update | 0.3018ms | 50.0804μs | 19.9679 KOps/s | 19.8293 KOps/s | |
test_update_nested | 0.1190ms | 57.2637μs | 17.4631 KOps/s | 17.5418 KOps/s | |
test_update__nested | 0.1036ms | 60.4565μs | 16.5408 KOps/s | 16.6118 KOps/s | |
test_set_nested | 0.1019ms | 44.0056μs | 22.7244 KOps/s | 22.6947 KOps/s | |
test_set_nested_new | 0.1119ms | 47.5709μs | 21.0212 KOps/s | 21.3297 KOps/s | |
test_select | 0.1103ms | 61.1058μs | 16.3651 KOps/s | 16.2440 KOps/s | |
test_select_nested | 82.4820μs | 42.0248μs | 23.7955 KOps/s | 23.5172 KOps/s | |
test_exclude_nested | 0.1022ms | 58.8460μs | 16.9935 KOps/s | 16.8691 KOps/s | |
test_empty[True] | 0.2960ms | 0.2412ms | 4.1465 KOps/s | 4.0987 KOps/s | |
test_empty[False] | 4.1951μs | 0.7357μs | 1.3593 MOps/s | 1.3490 MOps/s | |
test_to | 71.3820μs | 24.8164μs | 40.2960 KOps/s | 38.6131 KOps/s | |
test_to_nonblocking | 61.9120μs | 24.1171μs | 41.4643 KOps/s | 39.3731 KOps/s | |
test_unbind_speed | 0.3135ms | 0.2823ms | 3.5428 KOps/s | 3.5121 KOps/s | |
test_unbind_speed_stack0 | 0.3618ms | 0.2816ms | 3.5516 KOps/s | 3.5456 KOps/s | |
test_unbind_speed_stack1 | 93.3092ms | 0.7086ms | 1.4113 KOps/s | 1.5315 KOps/s | |
test_split | 95.4190ms | 2.1751ms | 459.7553 Ops/s | 436.5296 Ops/s | |
test_chunk | 95.2672ms | 2.1528ms | 464.5174 Ops/s | 428.9869 Ops/s | |
test_creation[device0] | 0.2907ms | 0.1267ms | 7.8901 KOps/s | 7.5570 KOps/s | |
test_creation_from_tensor | 0.3594ms | 0.1303ms | 7.6749 KOps/s | 7.4000 KOps/s | |
test_add_one[memmap_tensor0] | 0.2198ms | 8.9249μs | 112.0467 KOps/s | 106.3867 KOps/s | |
test_contiguous[memmap_tensor0] | 33.0310μs | 2.2021μs | 454.1068 KOps/s | 447.1720 KOps/s | |
test_stack[memmap_tensor0] | 51.4410μs | 6.8241μs | 146.5391 KOps/s | 142.9476 KOps/s | |
test_memmaptd_index | 1.1631ms | 0.4293ms | 2.3293 KOps/s | 2.2891 KOps/s | |
test_memmaptd_index_astensor | 0.7244ms | 0.4791ms | 2.0874 KOps/s | 2.0080 KOps/s | |
test_memmaptd_index_op | 1.4160ms | 1.0275ms | 973.2404 Ops/s | 912.6920 Ops/s | |
test_serialize_model | 0.1316s | 0.1299s | 7.6977 Ops/s | 7.6824 Ops/s | |
test_serialize_model_pickle | 1.3515s | 1.2121s | 0.8250 Ops/s | 0.8228 Ops/s | |
test_serialize_weights | 0.2253s | 0.1426s | 7.0132 Ops/s | 7.0324 Ops/s | |
test_serialize_weights_returnearly | 0.2336s | 56.9592ms | 17.5564 Ops/s | 17.6422 Ops/s | |
test_serialize_weights_pickle | 1.3718s | 1.2164s | 0.8221 Ops/s | 0.8217 Ops/s | |
test_reshape_pytree | 63.6120μs | 35.8947μs | 27.8593 KOps/s | 27.4861 KOps/s | |
test_reshape_td | 74.9420μs | 42.1493μs | 23.7252 KOps/s | 23.3973 KOps/s | |
test_view_pytree | 66.3510μs | 35.4150μs | 28.2367 KOps/s | 27.5089 KOps/s | |
test_view_td | 85.0620μs | 46.0091μs | 21.7348 KOps/s | 20.8806 KOps/s | |
test_unbind_pytree | 63.9920μs | 35.0768μs | 28.5089 KOps/s | 27.9981 KOps/s | |
test_unbind_td | 0.5109ms | 43.7185μs | 22.8736 KOps/s | 22.9630 KOps/s | |
test_split_pytree | 0.5287ms | 47.0563μs | 21.2511 KOps/s | 21.3861 KOps/s | |
test_split_td | 0.1476ms | 55.9662μs | 17.8679 KOps/s | 17.5668 KOps/s | |
test_add_pytree | 0.1001ms | 57.7554μs | 17.3144 KOps/s | 17.5550 KOps/s | |
test_add_td | 0.1640ms | 96.5171μs | 10.3609 KOps/s | 11.0197 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.4282ms | 0.2127ms | 4.7012 KOps/s | 4.6114 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.1979ms | 0.1514ms | 6.6038 KOps/s | 6.6746 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1830ms | 0.1453ms | 6.8841 KOps/s | 6.8742 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2527ms | 0.1855ms | 5.3896 KOps/s | 5.4415 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 50.8910μs | 21.9777μs | 45.5008 KOps/s | 43.5984 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 90.6420μs | 44.2572μs | 22.5952 KOps/s | 22.5008 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2377ms | 63.1173μs | 15.8435 KOps/s | 15.6026 KOps/s | |
test_compile_copy_nested[pytree-eager] | 86.7320μs | 49.0089μs | 20.4045 KOps/s | 20.4098 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3861ms | 0.3217ms | 3.1088 KOps/s | 3.1230 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.2824ms | 0.2099ms | 4.7632 KOps/s | 4.7392 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1843ms | 0.1287ms | 7.7682 KOps/s | 7.6369 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1101ms | 59.8019μs | 16.7219 KOps/s | 15.7429 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.3951ms | 0.3221ms | 3.1045 KOps/s | 3.1058 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6945ms | 0.6423ms | 1.5570 KOps/s | 1.6058 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.2947ms | 0.2476ms | 4.0386 KOps/s | 4.0024 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3825ms | 0.3248ms | 3.0785 KOps/s | 3.0800 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1159ms | 69.4929μs | 14.3900 KOps/s | 13.7655 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1734ms | 0.1308ms | 7.6478 KOps/s | 7.4615 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6038ms | 0.5336ms | 1.8741 KOps/s | 1.8880 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3989ms | 0.3222ms | 3.1035 KOps/s | 3.1115 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 67.5010μs | 18.5081μs | 54.0304 KOps/s | 55.1974 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 64.4020μs | 26.7970μs | 37.3176 KOps/s | 37.1119 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1107ms | 69.4912μs | 14.3903 KOps/s | 14.5702 KOps/s | |
test_compile_copy_flat[pytree-eager] | 79.6920μs | 51.6724μs | 19.3527 KOps/s | 19.5388 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3169ms | 0.8121ms | 1.2314 KOps/s | 1.1100 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.4347ms | 3.2951ms | 303.4788 Ops/s | 300.5918 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.3125ms | 0.8151ms | 1.2269 KOps/s | 1.1244 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.5630ms | 3.3262ms | 300.6429 Ops/s | 304.4343 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1528ms | 0.1093ms | 9.1467 KOps/s | 8.8319 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.1952ms | 65.8807μs | 15.1790 KOps/s | 15.0117 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1496ms | 0.1034ms | 9.6672 KOps/s | 9.5485 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1467ms | 44.2961μs | 22.5754 KOps/s | 22.2333 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1588ms | 0.1086ms | 9.2080 KOps/s | 9.3104 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 92.9220μs | 44.2527μs | 22.5975 KOps/s | 22.4984 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1989ms | 0.1379ms | 7.2541 KOps/s | 7.1707 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1634ms | 25.4665μs | 39.2673 KOps/s | 38.1116 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1672ms | 0.1318ms | 7.5883 KOps/s | 7.3971 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 56.6620μs | 20.3289μs | 49.1910 KOps/s | 46.8778 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1838ms | 0.1331ms | 7.5104 KOps/s | 7.2176 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 56.7810μs | 20.4175μs | 48.9777 KOps/s | 47.1847 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1812ms | 0.1394ms | 7.1743 KOps/s | 7.1279 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.4911ms | 24.5580μs | 40.7199 KOps/s | 38.7132 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1966ms | 0.1340ms | 7.4601 KOps/s | 7.3090 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.1541ms | 22.5983μs | 44.2511 KOps/s | 46.9272 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1854ms | 0.1338ms | 7.4711 KOps/s | 7.4841 KOps/s | |
test_compile_indexing[int-pytree-eager] | 65.9520μs | 20.5969μs | 48.5509 KOps/s | 47.6069 KOps/s | |
test_mod_add[eager] | 81.6420μs | 32.0422μs | 31.2088 KOps/s | 30.4546 KOps/s | |
test_mod_add[compile] | 0.3827ms | 69.8231μs | 14.3219 KOps/s | 13.9641 KOps/s | |
test_mod_add[compile-overhead] | 0.2627ms | 0.1364ms | 7.3301 KOps/s | 7.0108 KOps/s | |
test_mod_wrap[eager] | 0.3235ms | 0.2443ms | 4.0935 KOps/s | 4.0007 KOps/s | |
test_mod_wrap[compile] | 1.4681ms | 0.2998ms | 3.3359 KOps/s | 3.1661 KOps/s | |
test_mod_wrap[compile-overhead] | 7.6595ms | 4.0040ms | 249.7505 Ops/s | 248.9984 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.4577ms | 1.3667ms | 731.7052 Ops/s | 687.6753 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.5795ms | 1.3348ms | 749.1638 Ops/s | 686.2619 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3432ms | 0.9067ms | 1.1029 KOps/s | 971.2357 Ops/s | |
test_seq_add[eager] | 0.1498ms | 97.6527μs | 10.2404 KOps/s | 10.1878 KOps/s | |
test_seq_add[compile] | 0.1477ms | 81.0903μs | 12.3319 KOps/s | 12.1919 KOps/s | |
test_seq_add[compile-overhead] | 0.1535ms | 0.1148ms | 8.7102 KOps/s | 8.5528 KOps/s | |
test_seq_wrap[eager] | 0.4456ms | 0.3875ms | 2.5808 KOps/s | 2.5402 KOps/s | |
test_seq_wrap[compile] | 0.3812ms | 0.3176ms | 3.1487 KOps/s | 3.1004 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3023ms | 0.2229ms | 4.4871 KOps/s | 4.4311 KOps/s | |
test_func_call_runtime[False-eager] | 0.8167ms | 0.7386ms | 1.3540 KOps/s | 1.3303 KOps/s | |
test_func_call_runtime[False-compile] | 0.8794ms | 0.7999ms | 1.2502 KOps/s | 1.2299 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4139ms | 0.3626ms | 2.7579 KOps/s | 2.7281 KOps/s | |
test_func_call_runtime[True-eager] | 0.9725ms | 0.9013ms | 1.1095 KOps/s | 1.0722 KOps/s | |
test_func_call_runtime[True-compile] | 0.9312ms | 0.8344ms | 1.1985 KOps/s | 1.1780 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4542ms | 0.3984ms | 2.5100 KOps/s | 2.4984 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8102ms | 0.7407ms | 1.3501 KOps/s | 1.2517 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9490ms | 0.8051ms | 1.2421 KOps/s | 1.2227 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4387ms | 0.3664ms | 2.7295 KOps/s | 2.7347 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1212ms | 1.0030ms | 996.9759 Ops/s | 983.8462 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9491ms | 0.8624ms | 1.1595 KOps/s | 1.1391 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.4832ms | 0.4234ms | 2.3617 KOps/s | 2.3428 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5686ms | 2.0924ms | 477.9122 Ops/s | 475.5572 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9772ms | 0.8818ms | 1.1341 KOps/s | 1.1198 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.4791ms | 0.4309ms | 2.3205 KOps/s | 2.3269 KOps/s | |
test_distributed | 2.2133ms | 0.2002ms | 4.9944 KOps/s | 8.9291 KOps/s | |
test_tdmodule | 80.4520μs | 15.0300μs | 66.5335 KOps/s | 63.5575 KOps/s | |
test_tdmodule_dispatch | 57.8110μs | 28.7745μs | 34.7530 KOps/s | 34.6011 KOps/s | |
test_tdseq | 42.6210μs | 16.0971μs | 62.1231 KOps/s | 63.1077 KOps/s | |
test_tdseq_dispatch | 56.8020μs | 32.5273μs | 30.7434 KOps/s | 31.2041 KOps/s | |
test_instantiation_functorch | 2.4227ms | 1.8886ms | 529.5004 Ops/s | 522.7627 Ops/s | |
test_instantiation_td | 1.7868ms | 1.2015ms | 832.2859 Ops/s | 826.2625 Ops/s | |
test_exec_functorch | 0.2819ms | 0.2080ms | 4.8078 KOps/s | 4.6742 KOps/s | |
test_exec_functional_call | 0.2703ms | 0.2120ms | 4.7172 KOps/s | 4.6576 KOps/s | |
test_exec_td | 0.2799ms | 0.2180ms | 4.5862 KOps/s | 4.5472 KOps/s | |
test_exec_td_decorator | 0.6798ms | 0.2584ms | 3.8697 KOps/s | 3.7960 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7645ms | 0.6906ms | 1.4479 KOps/s | 1.4324 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7468ms | 0.6868ms | 1.4561 KOps/s | 1.4434 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.7086ms | 0.5804ms | 1.7230 KOps/s | 1.6704 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6687ms | 0.6078ms | 1.6451 KOps/s | 1.7065 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.4322ms | 0.6822ms | 1.4659 KOps/s | 1.4666 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8429ms | 0.6807ms | 1.4691 KOps/s | 1.4720 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7100ms | 0.6085ms | 1.6434 KOps/s | 1.6749 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7492ms | 0.6256ms | 1.5985 KOps/s | 1.6477 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.8495ms | 8.4518ms | 118.3179 Ops/s | 117.7615 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.9342ms | 8.4537ms | 118.2908 Ops/s | 117.7776 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.4434ms | 8.1908ms | 122.0881 Ops/s | 120.7464 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.3043ms | 8.1979ms | 121.9827 Ops/s | 119.8967 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.8267ms | 19.7100ms | 50.7356 Ops/s | 50.6794 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.7671ms | 19.8264ms | 50.4379 Ops/s | 50.1700 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 20.7505ms | 19.6091ms | 50.9968 Ops/s | 51.3185 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.6557ms | 19.5184ms | 51.2338 Ops/s | 51.1055 Ops/s | |
test_to_module_speed[True] | 1.2098ms | 0.9383ms | 1.0657 KOps/s | 1.0593 KOps/s | |
test_to_module_speed[False] | 1.3441ms | 0.9228ms | 1.0837 KOps/s | 1.0953 KOps/s | |
test_tc_init | 62.3120μs | 32.5688μs | 30.7042 KOps/s | 30.8415 KOps/s | |
test_tc_init_nested | 0.1038ms | 66.6339μs | 15.0074 KOps/s | 15.5366 KOps/s | |
test_tc_first_layer_tensor | 5.3887μs | 0.6797μs | 1.4713 MOps/s | 1.4640 MOps/s | |
test_tc_first_layer_nontensor | 33.0610μs | 2.2435μs | 445.7403 KOps/s | 441.3346 KOps/s | |
test_tc_second_layer_tensor | 47.2713μs | 1.3843μs | 722.3918 KOps/s | 730.4920 KOps/s | |
test_tc_second_layer_nontensor | 31.7110μs | 2.9376μs | 340.4139 KOps/s | 341.8278 KOps/s | |
test_unbind | 0.1956s | 12.2958ms | 81.3286 Ops/s | 90.4173 Ops/s | |
test_full_like | 0.6570ms | 0.5756ms | 1.7373 KOps/s | 1.7427 KOps/s | |
test_zeros_like | 0.2836ms | 0.1980ms | 5.0506 KOps/s | 5.0494 KOps/s | |
test_ones_like | 0.2333ms | 0.1979ms | 5.0529 KOps/s | 5.0547 KOps/s | |
test_clone | 0.4779ms | 0.4149ms | 2.4102 KOps/s | 2.4117 KOps/s | |
test_squeeze | 38.1210μs | 9.8297μs | 101.7323 KOps/s | 99.6491 KOps/s | |
test_unsqueeze | 0.2800ms | 75.0819μs | 13.3188 KOps/s | 13.1423 KOps/s | |
test_split | 0.2596ms | 0.1534ms | 6.5206 KOps/s | 6.3078 KOps/s | |
test_permute | 0.2385ms | 0.1743ms | 5.7369 KOps/s | 5.5181 KOps/s | |
test_stack | 1.2546ms | 0.8439ms | 1.1850 KOps/s | 1.1658 KOps/s | |
test_cat | 1.2476ms | 1.2314ms | 812.0726 Ops/s | 811.7995 Ops/s |
ghstack-source-id: 18a5798c5377d3e5b65e7b6c87d59917c474fd64 Pull Request resolved: #1004
x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert eager_test.batch_size == export_test.batch_size |
@ezyang this test fails when using dynamic shapes: the eager batch size is [5] but the exported one is [].
This happens with both strict=False and strict=True.
The batch size [s0] becomes [] when using dynamic shapes and when the 2nd output shape mismatches the 1st.
We do get a warning though:
W0920 10:19:28.564000 20340 torch/fx/experimental/symbolic_shapes.py:5136] Ignored guard Eq(s0, 5) == False, this could result in accuracy problems
So, there's something a bit nontrivial going on here. In torch.compile eager, if we produce a fresh TensorDict and that TensorDict holds a list of dynamic ints, then in the residual bytecode we have to construct the TensorDict and also put in the freshly computed dynamic shapes from the FX graph (that has some int outputs now). So actually building a TensorDict isn't just a matter of putting in the right tensors, you also have to put some ints in too. Does this work?
Assuming this does work, export has to be set up to do the same thing. It wouldn't be surprising if it weren't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.
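To make the pytree point concrete, here is a minimal pure-Python sketch of how a flatten/unflatten round trip that only swaps the leaves can leave a stale batch size behind. Everything in it (`FakeTensor`, `FakeTD`, `flatten`, `unflatten`) is a hypothetical stand-in, not the TensorDict or `torch.export` implementation, and it illustrates the suspected mechanism only; it does not reproduce the exact [] result reported above.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor: only the shape matters here."""
    shape: tuple


@dataclass
class FakeTD:
    """Hypothetical container: named leaves plus an explicit batch size."""
    leaves: dict
    batch_size: tuple


def flatten(td):
    # Leaves are what gets traced; everything else lands in a static context.
    keys = list(td.leaves)
    context = (keys, td.batch_size)
    return [td.leaves[k] for k in keys], context


def unflatten(leaves, context):
    # The batch size is read back from the context, never from the new leaves.
    keys, batch_size = context
    return FakeTD(dict(zip(keys, leaves)), batch_size)


# Context captured once, with batch dimension 5.
traced = FakeTD({"x": FakeTensor((5, 100)), "y": FakeTensor((5, 100))}, (5,))
_, context = flatten(traced)

# Later call with batch dimension 2: only the leaves change.
new_leaves = [FakeTensor((2, 100)), FakeTensor((2, 100))]
rebuilt = unflatten(new_leaves, context)
print(rebuilt.batch_size)  # (5,) even though the leaves now have batch dimension 2
```

In this toy model the only way to get the right answer is to treat the batch size as an output that is recomputed on every call, which is the "put some ints in too" requirement described above.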
If you want a workaround, perhaps batch size can store the rank instead of the size and lazily compute the size from a tensor if it's not set? Better to fix things though. Just not sure what you expect to work and not work.
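A rough sketch of what that workaround could look like, in plain Python so it runs without PyTorch. The `LazyBatchTD` class, its `batch_dims` argument and the `FakeTensor` stand-in are all hypothetical names for illustration and are not part of the TensorDict API; the idea is only that the container remembers how many leading dimensions are batch dimensions and derives the sizes from its leaves on access.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor: only the shape matters here."""
    shape: tuple


class LazyBatchTD:
    """Hypothetical container that stores the batch rank, not the batch size."""

    def __init__(self, leaves, batch_dims):
        self.leaves = dict(leaves)
        self.batch_dims = batch_dims  # number of leading batch dimensions

    @property
    def batch_size(self):
        # Computed from the current leaves on every access, so it cannot go stale.
        if not self.leaves:
            raise ValueError("cannot infer batch size without leaves")
        first = next(iter(self.leaves.values()))
        size = tuple(first.shape[: self.batch_dims])
        for name, leaf in self.leaves.items():
            if tuple(leaf.shape[: self.batch_dims]) != size:
                raise ValueError(f"leaf {name!r} disagrees on the batch dimensions")
        return size


td = LazyBatchTD({"x": FakeTensor((5, 100)), "y": FakeTensor((5, 100))}, batch_dims=1)
size_before = td.batch_size  # (5,)

# Swapping in leaves with a different batch dimension updates the result.
td.leaves = {"x": FakeTensor((2, 100)), "y": FakeTensor((2, 100))}
size_after = td.batch_size  # (2,)
```

The trade-off is that an empty container has no batch size at all, and every access pays for a consistency check over the leaves.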
> Assuming this does work, export also has to be setup to do the same thing as well. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.
TensorDict is pytree-able, but you can deactivate that; this is what the comment is about (don't deactivate it or the test will fail).
Here's what works and what doesn't:

import torch
from tensordict import TensorDict

class Test(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return TensorDict(
            {
                "x": x,
                "y": y,
            },
            batch_size=x.shape[0],
        )

test = Test()
x, y = torch.zeros(5, 100), torch.zeros(5, 100)
# Export with both dimensions of x and y marked as dynamic
result = torch.export.export(test, args=(x, y), strict=False, dynamic_shapes={
    "x": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
    "y": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
})
export_mod = result.module()

x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert torch.Size([5]) == eager_test.batch_size == export_test.batch_size  # Works because x and x_new have the same shape

x_new, y_new = torch.zeros(2, 100), torch.zeros(2, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert torch.Size([2]) == eager_test.batch_size == export_test.batch_size  # Fails! now export_test.batch_size is torch.Size([])
So it's weird behaviour: the SymInt just vanishes into thin air in the second case.
Stack from ghstack (oldest at bottom):