IMOOCSpider's issues #10

Open
xiaohua9 opened this issue Jul 28, 2019 · 0 comments

Comments

@xiaohua9

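I learned a great deal from reading your code; it was a pleasure to read.

I would also like to report one issue here.
  • Problem:
    You store the page source in the string variable `result`. The trouble is that it uses the fixed-length `String` type. I looked at the page you crawl: it has nearly a thousand lines, call it n. With your approach, n-1 blocks of memory are wasted. Note: pages with several thousand lines of source are not uncommon.
  • Proposed solutions:
    1. Replace the fixed-length string with a variable-length string, which avoids the memory waste described above. (However, if you stay with string storage there is a further problem: a page with enough data will exceed the string's maximum capacity.)
    2. Introduce file operations and use a local file as temporary storage for the page source. This is a little more work, but it is safer and more robust.

Just my humble opinion; apologies if I have misjudged anything.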
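
The two proposals can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the spider is written in Java and reads the page line by line, and it assumes "variable-length string" means something like `StringBuilder`. The class name `PageSourceBuffer` and the methods `readAll` and `spoolToTempFile` are hypothetical names introduced only for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; not part of IMOOCSpider.
public class PageSourceBuffer {

    // Proposal 1: accumulate lines in a growable StringBuilder instead of
    // concatenating onto a String, which builds a new String on every append.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Proposal 2: stream lines into a temporary file so the whole page
    // never has to be held in memory at once.
    static Path spoolToTempFile(Reader source) throws IOException {
        Path tmp = Files.createTempFile("page-source-", ".html");
        try (BufferedReader reader = new BufferedReader(source);
             BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the HTTP response stream.
        String html = "<html>\n<body>hello</body>\n</html>\n";
        System.out.println(readAll(new StringReader(html)).length());

        Path tmp = spoolToTempFile(new StringReader(html));
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8).size());
        Files.delete(tmp);
    }
}
```

In the real spider, the `Reader` would wrap the response stream rather than a `StringReader`. The first variant still keeps the full page in memory, so only the file-based variant addresses the capacity concern raised in point 1.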
