The DolphinDB Py plugin has the branches release 200 and release130. Each plugin version corresponds to a DolphinDB server version. You're looking at the plugin documentation for release200. If you use a different DolphinDB server version, please refer to the corresponding branch of the plugin documentation.
- 1. Prerequisites
- 2. Install the Plugin
- 3. Methods
- 4. Examples
- 4.1 Load Plugin
- 4.2 Data Type Conversion to and from Python
- 4.3 Print Default Module Search Path Using Built-In Python Module
- 4.4 Import NumPy and Call Static Methods
- 4.5 Import sklearn and Call Instance Methods
- 4.6 Import Module and Call Static Methods
- 4.7 Convert DataFrame to Table with Index Retained
- 5. Data Conversion
The DolphinDB Py Plugin is implemented based on python C-API protocol, and you can call third-party Python libraries in DolphinDB. The plugin uses pybind11 library.
- libpython3.7m.so
You can specify the Python version using DPYTHON:STRING
when building the plugin with cmake. Currently only python3.6, 3.7, and 3.8 are supported.
- libDolphinDB.so
Please specify the absolute path of libpython3.7m.so for the configuration parameter globalDynamicLib in the configuration file dolphindb.cfg:
globalDynamicLib=/path_to_libpython3.7m.so/libpython3.7m.so
If you are using a different Python version, please specify it as:
globalDynamicLib=/DolphinDB/server/plugins/py/libpython3.6m.so.1.0
Note: The development requires the Python runtime environment. The runtime environment (which can be checked with sys.path
) should be the same as the installation environment.
Install CMake:
sudo apt-get install cmake
Build the project:
mkdir build
cd build
cmake -DPYTHON:STRING=3.7 ../
make
Note: Please make sure the file libDolphinDB.so is under the GCC search path before compilation. You can add the plugin path to the library search path LD_LIBRARY_PATH
or copy it to the build directory. You can specify the Python version using -DPYTHON:STRING
. Currently only python3.6, 3.7, and 3.8 are supported.
After successful compilation, the file libPluginPy.so is generated under the directory.
loadPlugin("/path_to_PluginPy/PluginPy.txt");
Note: Please install Python libraries NumPy and pandas before loading the plugin. Otherwise, it will raise an error:
terminate called after throwing an instance of 'pybind11::error_already_set'
what(): ModuleNotFoundError: No module named 'numpy'
Syntax
py::toPy(obj)
Arguments
- obj: a DolphinDB object
Details
Convert an object of DolphinDB data type to a Python object. For the data type conversion, see 5.1 From DolphinDB to Python.
Examples
x = 1 2 3 4;
pyArray = py::toPy(x);
y = 2 3 4 5;
d = dict(x, y);
pyDict = py::toPy(d);
Syntax
py::fromPy(obj, [addIndex=false])
Arguments
- obj: a Python object
- addIndex: this parameter is required only when obj is a pandas.DataFrame. True: the index of obj will become the first column of the result table after the conversion. False (default): the index of obj will be discarded in the conversion.
Details
Convert a Python object into a DolphinDB object. For supported data types, see 5.2 From Python to DolphinDB. For an example of converting a pandas.DataFrame into a DolphinDB table while keeping the index, see 4.7 Convert DataFrame to Table with Index Retained.
Examples
x = 1 2 3 4;
l = py::toPy(x);
re = py::fromPy(l);
re;
Syntax
py::importModule(moduleName)
Arguments
- moduleName: a STRING indicating the name of the module to be imported.
Details
Import a Python module (or a submodule). The module must have been installed in your environment. Use the pip3 list
command to check all the installed modules and use the pip3 install
command to install modules.
Examples
np = py::importModule("numpy"); //import numpy
linear_model = py::importModule("sklearn.linear_model"); //import the linear_model submodule of sklearn
Note: To import your own Python module, place the module file under the same directory as the DolphinDB server or under the "lib" directory (which you can check by calling the sys.path
function in Python). For examples, see 4.6 Import Module and Call Static Methods.
Syntax
py::cmd(command)
Arguments
- command: a STRING indicating a python script.
Details
Run a Python script in DolphinDB.
Examples
sklearn = py::importModule("sklearn"); // import sklearn
py::cmd("from sklearn import linear_model"); // import the linear_model submodule from sklearn
Syntax
py::getObj(module, objName)
Arguments
- module: a module which is already imported to DolphinDB, e.g., the return of
py::importModule
. - objName: a STRING indicating the name of the object you want to get.
Details
Get all imported submodules of a module or get the attributes of an object. If a submodule is not imported to DolphinDB yet, call py::cmd
to import it from the module.
Examples
np = py::importModule("numpy"); //import numpy
random = py::getObj(np, "random"); ////get the random submodule of numpy
sklearn = py::importModule("sklearn"); //import sklearn
py::cmd("from sklearn import linear_model"); //import a submodule from sklearn
linear_model = py::getObj(sklearn, "linear_model"); //get the imported submodule
Note:
- The "random" submodule is automatically imported when numpy is imported, so you can directly get the "random" submodule with
py::getObject
. - When sklearn is imported, its submodules are not imported. Therefore, to get the submodule of sklearn, you must first import the submodule through
py::cmd("from sklearn import linear_model")
before calling thepy::getObject
method. If you only want the submodule, it's more convenient to simply use thelinear_model=py::importModule("sklearn.linear_model")
statement.
Syntax
py::getFunc(module, funcName, [convert=true])
Arguments
- module: the Python module imported previously. For example, the return value of method
py::importModule
orpy::getObj
. - funcName: a STRING scalar indicating the function to be obtained
- convert: whether to convert the results of the function into DolphinDB data types automatically. The default is true.
Details
Return a static method from a Python module. The function object can be executed directly in DolphinDB and its arguments can be of DolphinDB data types. Currently, keyword arguments are not supported.
Examples
np = py::importModule("numpy"); //import numpy
eye = py::getFunc(np, "eye"); //get function eye
np = py::importModule("numpy"); //import numpy
random = py::getObj(np, "random"); //get submodule random
randint = py::getFunc(random, "randint"); //get function randint
Syntax
py::getInstanceFromObj(obj, [args])
Arguments
- obj: the Python object obtained previously. For example, the return value of method
py::getObj
. - args: the arguments to be passed to the instance. It is an optional argument.
Details
Construct an instance based on the Python object obtained previously. You can access the attributes and methods of the instance with ".
", which returns value of DolphinDB data type if it is convertible, otherwise returns a Python object.
Examples
sklearn = py::importModule("sklearn");
py::cmd("from sklearn import linear_model");
linearR = py::getObj(sklearn,"linear_model.LinearRegression")
linearInst = py::getInstanceFromObj(linearR);
py::getInstance(module, objName, [args])
Arguments
- module: the Python module imported previously. For example, the return value of method
py::importModule
orpy::getObj
. - objName: a STRING scalar indicating the target object.
- args: the arguments to be passed to the instance. It is an optional argument.
Details
Get an instance from a Python module. You can access the attributes and methods of the instance with ".", which returns value of DolphinDB data type if it is convertible, otherwise returns a Python object.
Examples
linear_model = py::importModule("sklearn.linear_model"); //import submodule linear_model of sklearn
linearInst = py::getInstance(linear_model,"LinearRegression")
Note: The method py::getFunc
obtains the static methods from a module. To call an instance method, please use py::getInstanceFromObj
or py::getInstance
to obtain the instance and acccess the method with ".
".
Syntax
py::reloadModule(module)
Arguments
- module: the Python module imported previously. For example, the return value of method
py::importModule
.
Details
If a module is modified after being imported, please execute py::reloadModule
instead of py::importModule
to use the modified module.
Examples
model = py::importModule("fibo"); //fibo is a module implemented in Section 4.6
model = py::reloadModule(model); //reload the module if fibo.py is modified
loadPlugin("/path/to/plugin/PluginPy.txt");
use py;
x = 1 2 3 4;
y = 2 3 4 5;
d = dict(x, y);
pyDict = py::toPy(d);
Dict = py::fromPy(pyDict);
Dict;
sys = py::importModule("sys");
path = py::getObj(sys, "path");
dpath = py::fromPy(path);
dpath;
np = py::importModule("numpy"); //import numpy
eye = py::getFunc(np, "eye"); //get the eye function of numpy
re = eye(3); //create a diagonal matrix using eye
re;
random = py::getObj(np, "random"); //get the randome submodule of numpy
randint = py::getFunc(random, "randint"); //get the randint function of random
re = randint(0,1000,[2,3]); //execute randint
re;
//one way to get the LinearRegression method
linear_model = py::importModule("sklearn.linear_model"); //import the linear_model submodule from sklearn
linearInst = py::getInstance(linear_model,"LinearRegression")
//the other way
sklearn = py::importModule("sklearn"); //import sklearn
py::cmd("from sklearn import linear_model"); //import the linear_model submodule from sklearn
linearR = py::getObj(sklearn,"linear_model.LinearRegression")
linearInst = py::getInstanceFromObj(linearR);
X = [[0,0],[1,1],[2,2]];
Y = [0,1,2];
linearInst.fit(X, Y); //call the fit function
linearInst.coef_; //output:[0.5,0.5]
linearInst.intercept_; // output: 1.110223E-16 ~ 0
Test = [[3,4],[5,6],[7,8]];
re = linearInst.predict(Test); //call the predict function
re; //output: [3.5, 5.5, 7.5]
datasets = py::importModule("sklearn.datasets");
load_iris = py::getFunc(datasets, "load_iris"); //get the static function load_iris
iris = load_iris(); //call load_iris
datasets = py::importModule("sklearn.datasets");
decomposition = py::importModule("sklearn.decomposition");
PCA = py::getInstance(decomposition, "PCA");
py_pca=PCA.fit_transform(iris['data'].row(0:3)); //train the fir three rows of irir['data']
py_pca.row(0); //output:[0.334781147691283, -0.011991887788418, 2.926917846106032e-17]
Note: Use the row
function to access the rows of a matrix in DolphinDB. As shown in the above example, iris['data'].row(0:3)
retrieves the first three rows from iris['data']
. To retrieve the first three columns, use iris['data'][0:3]
.
In this case we have implemented the python module with two static methods: fib(n)
prints the Fibonacci series from 0 to n; fib2(n)
returns the Fibonacci series from 0 to n.
We save the module as fibo.py and copy it to the directory where DolphinDB server is located (or to the library path printed by sys.path
).
def fib(n): # write Fibonacci series up to n
a, b = 0, 1
while a < n:
print(a, end=' ')
a, b = b, a+b
print()
def fib2(n): # return Fibonacci series up to n
result = []
a, b = 0, 1
while a < n:
result.append(a)
a, b = b, a+b
return result
Then load the plugin and use the module after importing it:
loadPlugin("/path/to/plugin/PluginPy.txt"); //load the Py plugin
fibo = py::importModule("fibo"); //import the module fibo
fib = py::getFunc(fibo,"fib"); //get the fib function
fib(10); //call fib function, print 0 1 1 2 3 5 8
fib2 = py::getFunc(fibo,"fib2"); //get the fib2 function in the module
re = fib2(10); //call the fib2 function
re; //output: 0 1 1 2 3 5 8
When calling Python functions that return a DataFrame, you can set convert=false of the getFunc
function if you want to keep the index as the first column of the converted table. Use the fromPy
function to convert the result to a table (set addIndex=true).
Implement a function returned pandas.DataFrame. Save the result as demo.py and copy it to the directory where DolphinDB server is located (or to the library path printed by sys.path
).
import pandas as pd
import numpy as np
def createDF():
index=pd.Index(['a','b','c'])
df=pd.DataFrame(np.random.randint(1,10,(3,3)),index=index)
return df
Load the plugin and import the module demo. Use the function to keep the index of the DataFrame as the first column of the result.
loadPlugin("/path/to/plugin/PluginPy.txt"); //load the Py plugin
model = py::importModule("demo");
func1 = py::getFunc(model, "createDF", false)
tem = func1()
re = py::fromPy(tem, true)
Data Form Conversion
DolphinDB Form | Python Form |
---|---|
vector | NumPy.array |
matrix | NumPy.array |
set | Set |
dictionary | Dictionary |
table | pandas.DataFrame |
Data Type Conversion
DolphinDB Type | Python Type |
---|---|
BOOL | bool |
CHAR | int64 |
SHORT | int64 |
INT | int64 |
LONG | int64 |
DOUBLE | float64 |
FLOAT | float64 |
STRING | String |
DATE | datetime64[D] |
MONTH | datetime64[M] |
TIME | datetime64[ms] |
MINUTE | datetime64[m] |
SECOND | datetime64[s] |
DATETIME | datetime64[s] |
TIMESTAMP | datetime64[ms] |
NANOTIME | datetime64[ns] |
NANOTIMESTAMP | datetime64[ns] |
DATEHOUR | datetime64[s] |
vector | NumPy.array |
matrix | NumPy.array |
set | Set |
dictionary | Dictionary |
table | pandas.DataFrame |
- DolphinDB CHAR types are converted into Python int64 type.
- As the temporal types in Python pandas are datetime64[ns], all DolphinDB temporal types [are converted into datetime64ns] type. MONTH type such as 2012.06M is converted into 2012-06-01 (the first day of the month). TIME, MINUTE, SECOND and NANOTIME types do not include information about date. 1970-01-01 is automatically added during conversion. For example, 13:30m is converted into 1970-01-01 13:30:00.
- NULLs of logical, temporal and numeric types are converted into NaN or NaT; NULLs of string types are converted into empty strings. If a vector contains NULL values, the data type may change. For example, if a vector of Boolean type contains NULL values, NULL will be converted to NaN. As a result, the vector's data type will be converted to float64, and the TRUE and False values will be converted to 1 and 0, respectively.
Data Form Conversion
Python Form | DolphinDB Form |
---|---|
Tuple | vector |
List | vector |
Dictionary | dictionary |
Set | set |
NumPy.array | vector(1d) / matrix(2d) |
pandas.DataFrame | table |
Data Type Conversion
Python Type | DolphinDB Type |
---|---|
bool | BOOL |
int8 | CHAR |
int16 | SHORT |
int32 | INT |
int64 | LONG |
float32 | FLOAT |
float64 | DOUBLE |
String | STRING |
datetime64[M] | MONTH |
datetime64[D] | DATE |
datetime64[m] | MINUTE |
datetime64[s] | DATETIME |
datetime64[h] | DATEHOUR |
datetime64[ms] | TIMESTAMP |
datetime64[us] | NANOTIMESTAMP |
datetime64[ns] | NANOTIMESTAMP |
- The numpy.array will be converted to DolphinDB vector (1d) or matrix (2d) based on its dimension.
- As the only temporal data type in Python pandas is datetime64, all temporal columns of a DataFrame are converted into NANOTIMESTAMP type
- When a pandas.DataFrame is converted to DolphinDB table, if the column name is not supported in DolphinDB, it will be adjusted based on the following rules:
- If special characters except for letters, digits or underscores are contained in the column names, they are converted to underscores.
- If the first character is not a letter, "c" is added as the first character of the column name.