Using or not using Pandas #147
SemyonSinchenko started this conversation in General
Replies: 0 comments
Using pandas and pyarrow may improve the performance of collect operations (like `columns_to_list`). On the other hand, both pandas and pyarrow are optional dependencies of PySpark SQL. Should we use them or not? And if we should, is it a good idea to isolate every call to these libraries so that the other functions keep working when they are not installed? @MrPowers @jeffbrennan
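One possible way to isolate the optional dependencies is to detect them lazily and use them only inside the functions that need them. This is a minimal sketch, assuming a collect helper shaped like `columns_to_list` that turns one column into a Python list. The names `optional_import`, `requires`, `column_to_list` and `_column_to_list_arrow` are hypothetical and are not the project's existing API.

```python
import functools
import importlib
from types import ModuleType
from typing import Any, Callable, Optional


@functools.lru_cache(maxsize=None)
def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, otherwise None (result is cached)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def requires(*modules: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: raise a clear ImportError only when the function is called."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            missing = [m for m in modules if optional_import(m) is None]
            if missing:
                raise ImportError(
                    f"{func.__name__} requires optional dependencies: {', '.join(missing)}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def column_to_list(df: Any, col_name: str) -> list:
    """Hypothetical helper: collect one column of a PySpark DataFrame to a list."""
    if optional_import("pandas") is not None and optional_import("pyarrow") is not None:
        return _column_to_list_arrow(df, col_name)
    # Fallback path that needs nothing beyond PySpark itself.
    return [row[0] for row in df.select(col_name).collect()]


@requires("pandas", "pyarrow")
def _column_to_list_arrow(df: Any, col_name: str) -> list:
    # toPandas() transfers data via Arrow only when
    # spark.sql.execution.arrow.pyspark.enabled is set to "true".
    return df.select(col_name).toPandas().iloc[:, 0].tolist()
```

With this layout the module imports cleanly on a plain PySpark installation: only `_column_to_list_arrow` touches pandas, and it is chosen only when both pandas and pyarrow are importable. Whether the Arrow path is actually faster depends on enabling Arrow, for example with `spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")`, and on the data size, so it would be worth benchmarking before making it the default.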