Commit

Merge pull request #744 from myhloli/para-split-v3
fix(para_split_v3): refine list block detection in paragraph splitting
myhloli authored Oct 15, 2024
2 parents 0d83fb7 + 81b9fd7 commit f50bc87
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion magic_pdf/para/para_split_v3.py
@@ -166,7 +166,7 @@ def __is_list_or_index_block(block):
line[ListLineTag.IS_LIST_END_LINE] = True
line_start_flag = True
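# A special indented ordered list: the start line is not flush with the left edge and begins with a digit; the end line ends with IS_LIST_END_LINE, and the number of end lines matches the number of start lines
- elif num_start_count == flag_end_count:  # For simplicity, ignore the not-flush-left case for now
+ elif num_start_count >= 2 and num_start_count == flag_end_count:  # For simplicity, ignore the not-flush-left case for now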
for i, line in enumerate(block['lines']):
if lines_text_list[i][0].isdigit():
line[ListLineTag.IS_LIST_START_LINE] = True
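
The effect of the added `num_start_count >= 2` guard can be illustrated with a small standalone sketch. It is not the code from `para_split_v3.py`: the helper names, the `END_FLAGS` tuple, and the way the two counters are computed are assumptions made for illustration, and the real function reaches this `elif` only after earlier branches have been tried. The sketch only shows that a bare equality check also holds when both counts are 0 or 1, which is presumably the false positive this commit targets.

```python
# Hypothetical end-of-item punctuation; the real set in para_split_v3.py may differ.
END_FLAGS = (".", "。", ";", ";")


def count_markers(lines_text_list):
    """Return (num_start_count, flag_end_count) for the text of each line in a block."""
    # Lines that begin with a digit are candidate list-item start lines.
    num_start_count = sum(1 for text in lines_text_list if text and text[0].isdigit())
    # Lines that end with closing punctuation are candidate list-item end lines.
    flag_end_count = sum(1 for text in lines_text_list if text.endswith(END_FLAGS))
    return num_start_count, flag_end_count


def old_condition(lines_text_list):
    """Condition before the commit: the two counts only have to be equal."""
    num_start_count, flag_end_count = count_markers(lines_text_list)
    return num_start_count == flag_end_count


def new_condition(lines_text_list):
    """Condition after the commit: at least two digit-led lines are also required."""
    num_start_count, flag_end_count = count_markers(lines_text_list)
    return num_start_count >= 2 and num_start_count == flag_end_count


single_sentence = ["2024 was a good year."]              # counts: (1, 1)
plain_paragraph = ["This line has no markers", "nor does this one"]  # counts: (0, 0)
numbered_list = ["1. First item.", "2. Second item."]    # counts: (2, 2)

for block in (single_sentence, plain_paragraph, numbered_list):
    print(block, old_condition(block), new_condition(block))
```

Under these assumptions, the single sentence and the plain paragraph satisfy the old condition but not the new one, while the two-item numbered list satisfies both.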