try dill to support lambda functions? #22
Comments
+1. Adding lambda support would allow identical syntax between the regular and parallel versions of apply, and in my experiments it was as simple as @slimtom95 says.
Pandarallel uses concurrent.futures.ProcessPoolExecutor, which itself uses pickle, and lambda functions are not pickleable... A good idea could be to use pathos, but today pathos does not support concurrent.futures: uqfoundation/pathos#90.
Pandarallel 1.3.0 now supports lambda functions.
Thanks! I've made it work with lambdas from Jupyter. One issue, though: does it take some sort of snapshot of the state of the code the first time it runs? With

```python
def myfunc(x):
    pass

df['ip_long'].parallel_map(lambda x: myfunc(x))
```

it works fine. Then if I re-run the cell with

```python
def myfunc2(x):
    pass

df['ip_long'].parallel_map(lambda x: myfunc2(x))
```

I get the following traceback:

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandarallel/series.py", line 23, in worker_map
res = getattr(series[chunk], map_func)(arg, **kwargs)
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas/core/series.py", line 3382, in map
arg, na_action=na_action)
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas/core/base.py", line 1218, in _map_values
new_values = map_f(values, mapper)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "<ipython-input-16-96ab38ec54e8>", line 34, in <lambda>
NameError: name 'myfunc2' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
<ipython-input-16-96ab38ec54e8> in <module>()
32
33
---> 34 df['ip_long'].parallel_map(lambda x: myfunc2(x))
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandarallel/utils.py in wrapper(*args, **kwargs)
61 """Please see the docstring of this method without `parallel`"""
62 try:
---> 63 return func(*args, **kwargs)
64
65 except _PlasmaStoreFull:
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandarallel/series.py in closure(data, arg, **kwargs)
36
37 with ProcessingPool(nb_workers) as pool:
---> 38 result_workers = pool.map(Series.worker_map, workers_args)
39
40 result = pd.concat([
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pathos/multiprocessing.py in map(self, f, *args, **kwds)
135 AbstractWorkerPool._AbstractWorkerPool__map(self, f, *args, **kwds)
136 _pool = self._serve()
--> 137 return _pool.map(star(f), zip(*args)) # chunksize
138 map.__doc__ = AbstractWorkerPool.map.__doc__
139 def imap(self, f, *args, **kwds):
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/multiprocess/pool.py in map(self, func, iterable, chunksize)
264 in a list that is returned.
265 '''
--> 266 return self._map_async(func, iterable, mapstar, chunksize).get()
267
268 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/multiprocess/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
NameError: name 'myfunc2' is not defined
Same error here.
Greetings.

I know pickle doesn't support lambda function serialization, but another serialization library, dill, does. And there is also a multiprocessing library, multiprocess, which uses dill in place of pickle.

I'm new here, so there may be reasons lambda functions aren't supported, such as an upstream dependency. I just wanted to mention these in case they weren't noticed before.

Regards
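A short sketch of what dill adds over pickle, as suggested above. This assumes the third-party dill package is installed (the import is guarded in case it is not), and the variable names are illustrative:

```python
import pickle

square = lambda x: x * x

# pickle serializes functions by reference (module + name), and a
# lambda's name '<lambda>' cannot be looked up again, so this fails
try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError):
    print("pickle cannot serialize the lambda")

# dill (third-party, also used by multiprocess) serializes the code
# object itself, so lambdas round-trip by value
try:
    import dill
    restored = dill.loads(dill.dumps(square))
    assert restored(4) == 16
    print("dill round-trips the lambda")
except ImportError:
    print("dill is not installed")
```

This by-value serialization is what lets multiprocess-based pools accept lambdas that the stdlib multiprocessing pool rejects.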