Merge pull request #943 from myhloli/dev
fix(ocr_mkcontent): improve handling of single-character content #937
myhloli authored Nov 13, 2024
2 parents 37b36df + 2de1d0e commit 927fc6c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion magic_pdf/dict2md/ocr_mkcontent.py
@@ -168,7 +168,7 @@ def merge_para_with_text(para_block):
 # If the previous line ends with a hyphen, do not append a trailing space
 if __is_hyphen_at_line_end(content):
     para_text += content[:-1]
-elif len(content) == 1 and content not in ['A', 'I', 'a', 'i']:
+elif len(content) == 1 and content not in ['A', 'I', 'a', 'i'] and not content.isdigit():
     para_text += content
 else:  # In Western-text contexts, content items need a space separator
     para_text += f"{content} "
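
To illustrate what the changed condition does, here is a minimal, self-contained sketch of the joining rule. It is not the repository's code: `append_content` and `join_spans` are hypothetical names, and `is_hyphen_at_line_end` is an assumed stand-in for `__is_hyphen_at_line_end`, whose body does not appear in this diff.

```python
import re


def is_hyphen_at_line_end(content: str) -> bool:
    # Assumed stand-in for the repository's __is_hyphen_at_line_end helper:
    # treats text ending in "<letters>-" as a word hyphenated across lines.
    return bool(re.search(r"[A-Za-z]+-$", content))


def append_content(para_text: str, content: str) -> str:
    # Hypothetical wrapper around the branch logic shown in the diff.
    if is_hyphen_at_line_end(content):
        # Drop the trailing hyphen and add no space, so the word rejoins.
        return para_text + content[:-1]
    if (
        len(content) == 1
        and content not in ["A", "I", "a", "i"]
        and not content.isdigit()  # condition added by this commit
    ):
        # Single non-word, non-digit character: append without a space.
        return para_text + content
    # Everything else, now including single digits, gets a trailing space.
    return para_text + f"{content} "


def join_spans(spans):
    # Hypothetical driver: fold a list of span strings into one paragraph.
    text = ""
    for span in spans:
        text = append_content(text, span)
    return text


print(repr(join_spans(["Figure", "3", "shows"])))
print(repr(join_spans(["exam-", "ple"])))
```

Under the old condition a lone digit fell into the no-space branch, so `["Figure", "3", "shows"]` would have been joined as `Figure 3shows`. Note that `str.isdigit()` also returns true for some non-ASCII digit characters, so those are spaced as well.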