
Introduction to DSLParser

mgqa34 edited this page Feb 20, 2020 · 1 revision

DSLParser

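The DSLParser module parses the inference file generated by offline training and builds the DAG of the DSL modeling flow, which is then supplied to the online inference process. The steps are described below.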

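*1 Deserialize the input dsl_json serialized object to obtain the dsl json object.

*2 Get the component relationship dictionary from the dsl json object. An example of its format: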

{'dataio_0': {'CodePath': 'federatedml/util/data_io.py/DataIO',
              'input': {'data': {'data': ['args.eval_data']},
                        'model': ['pipeline.dataio_0.dataio']},
              'module': 'DataIO',
              'output': {'data': ['train']}},
'hetero_feature_binning_0': {'CodePath': 'federatedml/feature/hetero_feature_binning/hetero_binning_guest.py/HeteroFeatureBinningGuest',
                            'input': {'data': {'data': ['dataio_0.train']},
                                      'model': ['pipeline.hetero_feature_binning_0.binning_model']},
                            'module': 'HeteroFeatureBinning',
                            'output': {'data': ['transform_data']}},
'hetero_feature_selection_0': {'CodePath': 'federatedml/feature/hetero_feature_selection/feature_selection_guest.py/HeteroFeatureSelectionGuest',
                              'input': {'data': {'data': ['hetero_feature_binning_0.transform_data']},
                                        'model': ['pipeline.hetero_feature_selection_0.selected']},
                              'module': 'HeteroFeatureSelection',
                              'output': {'data': ['train']}},
'one_hot_0': {'CodePath': 'federatedml/feature/one_hot_encoder.py/OneHotEncoder',
              'input': {'data': {'data': ['hetero_feature_selection_0.train']},
                        'model': ['pipeline.one_hot_0.one_hot_encoder']},
              'module': 'OneHotEncoder',
              'output': {'data': ['output_data']}},
'hetero_lr_0': {'CodePath': 'federatedml/linear_model/logistic_regression/hetero_logistic_regression/hetero_lr_guest.py/HeteroLRGuest',
                'input': {'data': {'eval_data': ['one_hot_0.output_data']},
                          'model': ['pipeline.hetero_lr_0.hetero_lr']},
                'module': 'HeteroLR',
                'output': {'data': ['train']}}}

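The key information here is { "component name": "algorithm module name", "input": {} }. Each graph node is represented as ("component name", "algorithm module name"), and the "input" information describes the edge dependencies of the graph. For a more detailed description of the DSL, see the FATE repository.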
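Steps 1 and 2 and the node representation can be sketched in Python as follows. This is only an illustration: the top-level "components" key, the trimmed sample DSL, and the helper name extract_nodes are assumptions and are not taken from the DSLParser source.

```python
import json

# Hypothetical, trimmed-down serialized DSL. The top-level "components"
# key is an assumption about where the component dictionary lives.
dsl_json = json.dumps({
    "components": {
        "dataio_0": {
            "module": "DataIO",
            "input": {"data": {"data": ["args.eval_data"]}},
        },
        "hetero_feature_binning_0": {
            "module": "HeteroFeatureBinning",
            "input": {"data": {"data": ["dataio_0.train"]}},
        },
    }
})


def extract_nodes(serialized_dsl):
    """Deserialize the DSL and return (component name, module name) pairs."""
    components = json.loads(serialized_dsl)["components"]
    return [(name, spec["module"]) for name, spec in components.items()]


nodes = extract_nodes(dsl_json)
```

Each pair in nodes corresponds to one node of the DAG; the "input" entries are left untouched here because they only matter when the edges are built.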

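*3 Using the component relationship dictionary from step 2, initialize one graph node per "component name". At the same time, parse the "data" key under "input" to extract the upstream dependencies and build the set of directed edges (upstream -> self). Once the node set and edge set are complete, run a topological sort to obtain topoRankComponent, the array of modeling-flow nodes in topological order.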
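A minimal sketch of step 3, assuming that each data reference has the form "component.output_name" and that references whose prefix is not a known component (such as "args.eval_data") are external inputs and add no edge. The sort below uses Kahn's algorithm; the text does not say which topological-sort variant DSLParser uses, and the function names are hypothetical.

```python
from collections import deque

# Trimmed sample, deliberately listed out of order.
components = {
    "hetero_lr_0": {"input": {"data": {"eval_data": ["one_hot_0.output_data"]}}},
    "one_hot_0": {"input": {"data": {"data": ["dataio_0.train"]}}},
    "dataio_0": {"input": {"data": {"data": ["args.eval_data"]}}},
}


def upstream_components(spec, known):
    """Return the upstream component names referenced in input.data."""
    ups = []
    for refs in spec.get("input", {}).get("data", {}).values():
        for ref in refs:
            name = ref.split(".", 1)[0]  # "dataio_0.train" -> "dataio_0"
            if name in known and name not in ups:
                ups.append(name)
    return ups


def topo_rank(components):
    """Kahn's algorithm over the edges (upstream -> self)."""
    indegree = {name: 0 for name in components}
    downstream = {name: [] for name in components}
    for name, spec in components.items():
        for up in upstream_components(spec, components):
            downstream[up].append(name)
            indegree[name] += 1

    queue = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(components):
        raise ValueError("cycle detected in the DSL component graph")
    return order


topo_rank_component = topo_rank(components)
```

The cycle check is an addition for safety: a valid DSL should describe a DAG, so every component should appear in the result exactly once.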

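*4 In addition, initialize componentModuleMap, which maps each component to its algorithm module name, and upInputs, which holds each component's upstream inputs.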
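The two lookup structures from step 4 might be built roughly as below. The text does not specify the exact shape of upInputs, so this sketch assumes a flat list of upstream data references per component; the helper name build_lookup_tables is hypothetical.

```python
# Trimmed sample of the component relationship dictionary.
components = {
    "dataio_0": {
        "module": "DataIO",
        "input": {"data": {"data": ["args.eval_data"]}},
    },
    "hetero_lr_0": {
        "module": "HeteroLR",
        "input": {"data": {"eval_data": ["dataio_0.train"]}},
    },
}


def build_lookup_tables(components):
    """Return (component -> module name, component -> upstream data refs)."""
    component_module_map = {}
    up_inputs = {}
    for name, spec in components.items():
        component_module_map[name] = spec["module"]
        refs = []
        # Collect every reference under input.data, whatever its key
        # ("data", "eval_data", ...). Assumed shape: a flat list.
        for data_refs in spec.get("input", {}).get("data", {}).values():
            refs.extend(data_refs)
        up_inputs[name] = refs
    return component_module_map, up_inputs


component_module_map, up_inputs = build_lookup_tables(components)
```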