fix(ocr_mkcontent): improve handling of single-character content
- Add a digit check for single-character content: a lone digit no longer takes the no-space branch, so it is separated from the following text by a space
myhloli committed Nov 13, 2024
1 parent 963d0be commit 2de1d0e
Showing 1 changed file with 1 addition and 1 deletion.
magic_pdf/dict2md/ocr_mkcontent.py (2 changes: 1 addition & 1 deletion)
@@ -168,7 +168,7 @@ def merge_para_with_text(para_block):
 # If the preceding line ends with a "-" hyphen, no trailing space should be added
 if __is_hyphen_at_line_end(content):
     para_text += content[:-1]
-elif len(content) == 1 and content not in ['A', 'I', 'a', 'i']:
+elif len(content) == 1 and content not in ['A', 'I', 'a', 'i'] and not content.isdigit():
     para_text += content
 else:  # In Western-language text, consecutive content pieces must be separated by spaces
     para_text += f"{content} "
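To make the effect of this one-line change concrete, here is a minimal, self-contained Python sketch of the joining rule in the diff. It is a simplification under stated assumptions: `join_spans` and its `digit_check` flag are hypothetical names introduced for illustration; the real `merge_para_with_text` loop over lines and spans (and any language handling around this branch) is not shown in the commit and is omitted; and `is_hyphen_at_line_end` is a hypothetical stand-in, because the body of the private `__is_hyphen_at_line_end` helper is not part of this diff.

```python
import re


# Hypothetical stand-in for the private __is_hyphen_at_line_end helper,
# whose body is not shown in this commit: treat content ending in a
# letter followed by "-" as a hyphenated line break.
def is_hyphen_at_line_end(content: str) -> bool:
    return bool(re.search(r"[A-Za-z]-$", content))


def join_spans(contents: list[str], digit_check: bool = True) -> str:
    """Concatenate span texts with the rule from the diff (hypothetical helper).

    digit_check=False mimics the pre-commit condition;
    digit_check=True mimics the post-commit condition.
    """
    para_text = ""
    for content in contents:
        if is_hyphen_at_line_end(content):
            # Drop the trailing hyphen so the word joins the next piece.
            para_text += content[:-1]
        elif (
            len(content) == 1
            and content not in ["A", "I", "a", "i"]
            and not (digit_check and content.isdigit())
        ):
            # Single non-word character: append without a trailing space.
            para_text += content
        else:
            # Western-language text: follow each piece with a space.
            para_text += f"{content} "
    return para_text


if __name__ == "__main__":
    spans = ["Figure", "3", "shows", "results"]
    print(repr(join_spans(spans, digit_check=False)))  # old condition
    print(repr(join_spans(spans)))                     # new condition
```

Under the old condition, `join_spans(["Figure", "3", "shows", "results"], digit_check=False)` yields `"Figure 3shows results "`, with the digit glued to the next word; the new condition yields `"Figure 3 shows results "`. Note that `str.isdigit()` is Unicode-aware: `"²".isdigit()` is `True`, so superscript digits now also receive a trailing space, while CJK numerals such as `"三"` return `False` and still take the no-space branch.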