Discussion about implicit modification of PySpark Classes #74
Replies: 2 comments
-
I am a little mixed here. I like [...], and we should definitely get rid of all the [...]. At the same time, I hate monkey patching and think it's an antipattern, so I'm torn, haha.
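For context, the monkey-patching pattern under discussion looks roughly like this (a minimal sketch; the exact body of `isFalsy` and quinn's actual implementation may differ):

```python
from pyspark.sql import Column
from pyspark.sql import functions as F

def isFalsy(self: Column) -> Column:
    """Sketch of one extension: true when the column is null or False."""
    return self.isNull() | (self == F.lit(False))

# Monkey patching: assigning onto the shared pyspark.sql.Column class means
# every Column in the process gains an isFalsy() method as a side effect
# of importing the extensions module.
Column.isFalsy = isFalsy
```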
-
I was thinking again about this topic, and I see that it is very dangerous to use. The current implementation depends on the order of imports. For example, this code block will work well:

```python
from pyspark.sql import functions as F
from quinn.extensions import *

source_df.withColumn("is_stuff_falsy", F.col("has_stuff").isFalsy())
```

Let's imagine that we have another file:

```python
from pyspark.sql import Column

def modify_col(input: Column) -> Column:
    ...
```

If we import this file after the import of quinn, we will break everything. The following code block won't work:

```python
from pyspark.sql import functions as F
from quinn.extensions import *
from .my_function import *

source_df.withColumn("is_stuff_falsy", F.col("has_stuff").isFalsy())
```

The same happens if we mix imports from quinn and from pyspark.sql in the same file: the result will depend on the order of imports. It is a really bad practice that may create a lot of confusion...
-
This is a place for discussing the topics from #26, #6, and #35, also mentioned in #42.
For me, the pattern of implicitly modifying existing PySpark classes via `import *` goes against the Zen of Python: "Explicit is better than implicit."
An example is:
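(A sketch, mirroring the snippet from the earlier comment; the point is that the method's origin is invisible at the call site:)

```python
from pyspark.sql import functions as F
from quinn.extensions import *  # implicitly patches pyspark.sql.Column

# Nothing on this line reveals that isFalsy() is not part of PySpark;
# it works only because of the wildcard import above.
source_df.withColumn("is_stuff_falsy", F.col("has_stuff").isFalsy())
```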
My suggestion is the following: raise a `DeprecationWarning` with a suggestion to use explicit functions instead of `import *`; see the sketch below.
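A minimal sketch of such a deprecation shim (the warning text and the replacement advice are assumptions, not quinn's actual code):

```python
import warnings
from pyspark.sql import Column
from pyspark.sql import functions as F

def isFalsy(self: Column) -> Column:
    # Keep the patched method working for now, but warn on every call
    # and point users toward an explicit function.
    warnings.warn(
        "Column.isFalsy() (added by `from quinn.extensions import *`) is "
        "deprecated; use an explicit quinn function instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return self.isNull() | (self == F.lit(False))

Column.isFalsy = isFalsy
```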