This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Follow up #9

Open. Wants to merge 31 commits into base branch master.
31 commits
- 77763db  decouple jpype dependencies when installing konlpy (shaynekang, May 5, 2016)
- 11d93f9  move jpype requirements to install_requires (shaynekang, May 5, 2016)
- 54b61f4  both JPype1 and JPype1-py3 doesn't support Windows. (shaynekang, May 5, 2016)
- 8d181ec  Merge branch 'master' of https://github.com/konlpy/konlpy into jpype-… (shaynekang, May 5, 2016)
- 952458d  update document (shaynekang, May 5, 2016)
- 871f361  fix installation error on python 2.6 (shaynekang, May 5, 2016)
- e573cb8  fix misspelling (shaynekang, May 5, 2016)
- 8c692c3  Update mecab.sh (taehwoi, Jun 4, 2016)
- 93aff9e  removing sudo & check python3 on mecab install script (theeluwin, Jun 9, 2016)
- 020bcc5  Update README.rst (e9t, Jun 27, 2016)
- aa4e225  Resolve #11 (e9t, Nov 14, 2016)
- 3e7182b  Update CONTRIBUTING.rst (e9t, Nov 14, 2016)
- 10db1a5  Upgrade MeCab dictionary version to 2.0.1 (e9t, Nov 14, 2016)
- 0ac7529  Elaborate error message (e9t, Nov 14, 2016)
- 9e785d8  Update CONTRIBUTING.rst (e9t, Nov 14, 2016)
- 7ce3078  Update MeCab download script (e9t, Nov 14, 2016)
- d8611cf  Fix PEP8 styling error (e9t, Nov 14, 2016)
- acac5e9  Increase Java memory allocation (Xmx) (e9t, Nov 14, 2016)
- 2b68672  Update CONTRIBUTING.rst (e9t, Nov 14, 2016)
- 4af8357  Merge pull request #101 from theeluwin/master (e9t, Nov 14, 2016)
- 4e61e00  Merge branch 'master' of github.com:konlpy/konlpy (e9t, Nov 14, 2016)
- 0a38562  Merge branch 'master' of https://github.com/indiofish/konlpy into ind… (e9t, Nov 14, 2016)
- 95388c5  Merge branch 'indiofish-master' (e9t, Nov 14, 2016)
- fae0b69  Merge branch 'jpype-dependency' of https://github.com/shaynekang/konl… (e9t, Nov 14, 2016)
- 6be80b8  Merge branch 'shaynekang-jpype-dependency' (e9t, Nov 14, 2016)
- 95a656d  Remove --use-mirrors from .travis.yml (e9t, Nov 14, 2016)
- 4e611d8  Update reference link (Dec 13, 2016)
- c553f8f  Merge pull request #126 from Swalloow/docs (e9t, Dec 13, 2016)
- 1abd256  Add join parameter (pinetree408, Feb 15, 2017)
- d0fd8e8  Merge pull request #135 from pinetree408/enhancement/add-join-parameter (e9t, Feb 18, 2017)
- 3fbe76c  Merge branch 'master' of https://github.com/konlpy/konlpy (lifefeel, Jun 5, 2017)
4 changes: 2 additions & 2 deletions .travis.yml
@@ -7,8 +7,8 @@ python:
   - "3.4"

 before_install:
-  - if [[ $TRAVIS_PYTHON_VERSION == 2* ]]; then pip install -r requirements.txt --use-mirrors; fi
-  - if [[ $TRAVIS_PYTHON_VERSION == 3* ]]; then pip install -r requirements-py3.txt --use-mirrors; fi
+  - if [[ $TRAVIS_PYTHON_VERSION == 2* ]]; then pip install -r requirements.txt; fi
+  - if [[ $TRAVIS_PYTHON_VERSION == 3* ]]; then pip install -r requirements-py3.txt; fi
   - pip install coveralls
   - pip install pytest-cov

24 changes: 19 additions & 5 deletions CONTRIBUTING.rst
@@ -23,20 +23,34 @@ KoNLPy is an open-source project.
2. If the same issue has already been raised, and
- it has been resolved (closed): the problem has most likely been fixed in the latest release. It is also a good idea to look through the thread to see how others solved it.
- it has not been resolved yet (open): please describe your situation in a comment. The more people with the same problem gather in one place, the faster it can be solved.
3. If the same issue has not been raised yet, click the "New Issue" button to create a new issue. When you create a new issue, including your OS, package versions, and similar details helps us resolve the problem quickly.


3. Proposing/resolving issues
-----------------------------

- On `GitHub Issues <https://github.com/konlpy/konlpy/issues>`_ you can propose ways to improve the code, or discuss and resolve issues that others have proposed.
- So that your contribution can be attributed accurately, please send a pull request if possible.
- When writing code, please keep the following in mind.
1. Use 4 spaces instead of tabs
2. For anything not specifically covered in this document, follow the conventions used in other parts of the code (and please update this document for everyone's convenience)
3. Write descriptive commit messages
4. Code submitted in a PR falls under KoNLPy's open-source license
5. If you are asked to change part of the code after sending a PR, amend the commit with ``git commit --amend``
- After you finish writing code, please make sure it passes all tests.
1. If you modified Java code::

# Install `Apache Ant <http://ant.apache.org/manual/install.html>`_
make java

1. Whenever you changed even a single line of code::

pip install -r requirements-dev.txt
pip3 install -r requirements-dev.txt
make build # create tar.gz
make check # check code styles
make testall # run tests

- Before sending a PR, please check the following.
1. Code submitted in a PR falls under KoNLPy's open-source license
1. If you are asked to change part of the code after sending a PR, amend the commit with ``git commit --amend``


4. Editing documentation
@@ -55,7 +69,7 @@ Setup docs
1. Fork and clone KoNLPy::

git clone git@github.com:[your_github_id]/konlpy.git

2. Include the following lines in your `~/.bashrc`::

export LC_ALL=en_US.UTF-8
7 changes: 5 additions & 2 deletions Makefile
@@ -11,11 +11,14 @@
#
# TODO: use flake8 and/or pylint

build:
python setup.py sdist --formats=gztar,zip

check:
check-manifest
pyroma dist/konlpy-*tar.gz
pep8 --ignore==E501 konlpy/*.py
pep8 --ignore==E501 konlpy/*/*.py
pep8 --ignore=E501 konlpy/*.py
pep8 --ignore=E501 konlpy/*/*.py

testpypi:
sudo python setup.py register -r pypitest
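The `check` target fix replaces `--ignore==E501` with `--ignore=E501`. Long options written as `--option=value` are split at the first `=`, so the doubled form passes `=E501` as the value, and E501 (line too long) was presumably never actually ignored. A minimal sketch using the standard `argparse` module as a stand-in for pep8's own option parser (the `pep8-sketch` parser below models only `--ignore` and is not pep8's real CLI):

```python
import argparse

# Stand-in parser: only an --ignore option, loosely modeled on pep8's CLI.
parser = argparse.ArgumentParser(prog="pep8-sketch")
parser.add_argument("--ignore", default="")

# "--option=value" is split at the FIRST "=", so a doubled "=" leaks into the value.
buggy = parser.parse_args(["--ignore==E501"])
fixed = parser.parse_args(["--ignore=E501"])

print(repr(buggy.ignore))  # '=E501'
print(repr(fixed.ignore))  # 'E501'

# Treating the value as a comma-separated list of codes, "=E501" matches nothing.
buggy_codes = [code.strip() for code in buggy.ignore.split(",")]
fixed_codes = [code.strip() for code in fixed.ignore.split(",")]
print("E501" in buggy_codes, "E501" in fixed_codes)  # False True
```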
4 changes: 2 additions & 2 deletions README.rst
@@ -30,8 +30,8 @@ KoNLPy

 KoNLPy is a Python package for natural language processing of the Korean language.

-- English documentation: http://konlpy.org/en
-- 한국어 문서: http://konlpy.org/ko
+- English documentation: http://konlpy.org/en/latest
+- 한국어 문서: http://konlpy.org/ko/latest

 Links
 ------
5 changes: 5 additions & 0 deletions description.py
@@ -0,0 +1,5 @@
+__title__ = 'KoNLPy'
+__version__ = '0.4.3'
+__author__ = 'Lucy Park'
+__license__ = 'GPL v3'
+__copyright__ = 'Copyright 2015 Lucy Park'
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -29,7 +29,7 @@ def __getattr__(cls, name):
 # documentation root, use os.path.abspath to make it absolute, like shown here.

 sys.path.insert(0, os.path.abspath('..'))
-from konlpy import __version__
+from description import __version__

 # -- General configuration -----------------------------------------------------

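`docs/conf.py` now takes `__version__` from the new top-level `description.py` instead of `from konlpy import __version__`, presumably so the version can be read without importing the `konlpy` package, which depends on JPype. A related, import-free variant is sketched below: it reads the dunder fields from the file's source with the standard `ast` module. `read_metadata` is a hypothetical helper, not part of KoNLPy, and the file contents are inlined so the example runs on its own.

```python
import ast

# Contents of description.py as added in this PR.
DESCRIPTION_PY = """\
__title__ = 'KoNLPy'
__version__ = '0.4.3'
__author__ = 'Lucy Park'
__license__ = 'GPL v3'
__copyright__ = 'Copyright 2015 Lucy Park'
"""


def read_metadata(source):
    """Hypothetical helper: collect literal ``__dunder__ = ...`` assignments
    from module source without executing or importing it."""
    metadata = {}
    for node in ast.parse(source).body:
        # Only simple single-target assignments such as ``__version__ = '0.4.3'``.
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if (isinstance(target, ast.Name)
                    and target.id.startswith("__") and target.id.endswith("__")):
                metadata[target.id] = ast.literal_eval(node.value)
    return metadata


meta = read_metadata(DESCRIPTION_PY)
print(meta["__version__"])  # 0.4.3
```

In a real build the source would come from reading `description.py` on disk; nothing in the file is executed.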
2 changes: 1 addition & 1 deletion docs/examples/multithreading.rst
@@ -23,5 +23,5 @@ Check out how much faster it gets!

 .. note::
 - Some useful references on concurrency with Python:
-- 장혜식, `"파이썬은 멀티코어 줘도 쓰잘데기가 없나요?"에 대한 파이썬 2.6의 대답 <http://openlook.org/blog/2008/06/28/python-multiprocessing/>`_, 2008.
+- 장혜식, `"파이썬은 멀티코어 줘도 쓰잘데기가 없나요?"에 대한 파이썬 2.6의 대답 <http://highthroughput.org/wp/python-multiprocessing/>`_, 2008.
 - 하용호, `파이썬으로 클라우드 하고 싶어요 <http://www.slideshare.net/devparan/h3-2011-c6-python-and-cloud>`_, 2011.
19 changes: 4 additions & 15 deletions docs/install.rst
@@ -18,9 +18,8 @@ Ubuntu

 .. sourcecode:: bash

-$ sudo apt-get install g++ openjdk-7-jdk python-dev python3-dev # Install Java 1.7 or up
-$ pip install JPype1 # Python 2.x
-$ pip3 install JPype1-py3 # Python 3.x
+# Install Java 1.7 or up
+$ sudo apt-get install g++ openjdk-7-jdk python-dev python3-dev

 2. Install KoNLPy

@@ -57,9 +56,6 @@ CentOS
 $ make # Build
 $ sudo make altinstall

-$ pip install JPype1 # Python 2.x
-$ pip3 install JPype1-py3 # Python 3.x
-
 2. Install KoNLPy

 .. sourcecode:: bash
@@ -78,21 +74,14 @@ CentOS
 Mac OS
 ------

-1. Install dependencies
-
-.. sourcecode:: bash
-
-$ pip install JPype1 # Python 2.x
-$ pip3 install JPype1-py3 # Python 3.x
-
-2. Install KoNLPy
+1. Install KoNLPy

 .. sourcecode:: bash

 $ pip install konlpy # Python 2.x
 $ pip3 install konlpy # Python 3.x

-3. Install MeCab (*optional*)
+2. Install MeCab (*optional*)

 .. sourcecode:: bash

4 changes: 2 additions & 2 deletions konlpy/jvm.py
@@ -40,14 +40,14 @@ def init_jvm(jvmpath=None):
     jvmpath = jvmpath or jpype.getDefaultJVMPath()

     # NOTE: Temporary patch for Issue #76. Erase when possible.
-    if sys.platform=='darwin'\
+    if sys.platform == 'darwin'\
             and jvmpath.find('1.8.0') > 0\
             and jvmpath.endswith('libjvm.dylib'):
         jvmpath = '%s/lib/jli/libjli.dylib' % jvmpath.split('/lib/')[0]

     if jvmpath:
         jpype.startJVM(jvmpath, '-Djava.class.path=%s' % classpath,
                                '-Dfile.encoding=UTF8',
-                               '-ea', '-Xmx768m')
+                               '-ea', '-Xmx1024m')
     else:
         raise ValueError("Please specify the JVM path.")
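The `init_jvm` changes are a PEP 8 spacing fix in the macOS workaround for Issue #76 and a larger maximum heap (`-Xmx768m` to `-Xmx1024m`). The sketch below pulls the path rewrite and the JVM argument list into two hypothetical pure functions, `patch_jvmpath` and `jvm_options`, so they can be exercised without starting a JVM. The macOS JDK path in the example is made up for illustration.

```python
import sys


def patch_jvmpath(jvmpath, platform=sys.platform):
    """Hypothetical helper mirroring the Issue #76 patch in init_jvm."""
    if platform == 'darwin' \
            and jvmpath.find('1.8.0') > 0 \
            and jvmpath.endswith('libjvm.dylib'):
        # Keep the prefix before the first '/lib/' and load libjli.dylib instead.
        jvmpath = '%s/lib/jli/libjli.dylib' % jvmpath.split('/lib/')[0]
    return jvmpath


def jvm_options(classpath, max_heap='1024m'):
    """Hypothetical helper: the options init_jvm passes to jpype.startJVM."""
    return ['-Djava.class.path=%s' % classpath,
            '-Dfile.encoding=UTF8',
            '-ea',
            '-Xmx%s' % max_heap]  # maximum Java heap size (768m before this PR)


# Made-up example path for a Java 1.8.0 install on macOS.
mac_path = ('/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk'
            '/Contents/Home/jre/lib/server/libjvm.dylib')
print(patch_jvmpath(mac_path, platform='darwin'))
# prints /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jli/libjli.dylib
print(patch_jvmpath(mac_path, platform='linux') == mac_path)  # True
```

Passing `platform` explicitly (defaulting to `sys.platform`) is a design choice for the sketch, so that the macOS branch can be checked on any OS.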
21 changes: 13 additions & 8 deletions konlpy/tag/_hannanum.py
@@ -16,9 +16,12 @@
 tag_re = '(.+?\\/\\w+)\\+?'


-def parse(result, flatten=False):
-    def parse_opt(opt):
-        return [tuple(u.rsplit('/', 1)) for u in re.findall(tag_re, opt.strip())]
+def parse(result, flatten=False, join=False):
+    def parse_opt(opt, join=False):
+        if join:
+            return [u for u in re.findall(tag_re, opt.strip())]
+        else:
+            return [tuple(u.rsplit('/', 1)) for u in re.findall(tag_re, opt.strip())]

     if not result:
         return []
@@ -28,10 +31,10 @@ def parse_opt(opt):
     parts = utils.partition(elems, index)

     if flatten:
-        return sum([parse_opt(opt) for part in parts
+        return sum([parse_opt(opt, join=join) for part in parts
                     for opt in list(filter(None, part))[1:]], [])
     else:
-        return [[parse_opt(opt) for opt in list(filter(None, part))[1:]]
+        return [[parse_opt(opt, join=join) for opt in list(filter(None, part))[1:]]
                 for part in parts]


@@ -71,21 +74,23 @@ def analyze(self, phrase):
         result = self.jhi.morphAnalyzer(phrase)
         return parse(result)

-    def pos(self, phrase, ntags=9, flatten=True):
+    def pos(self, phrase, ntags=9, flatten=True, join=False):
         """POS tagger.

         This tagger is HMM based, and calculates the probability of tags.

         :param ntags: The number of tags. It can be either 9 or 22.
-        :param flatten: If False, preserves eojeols."""
+        :param flatten: If False, preserves eojeols.
+        :param join: If True, returns joined sets of morph and tag.
+        """

         if ntags == 9:
             result = self.jhi.simplePos09(phrase)
         elif ntags == 22:
             result = self.jhi.simplePos22(phrase)
         else:
             raise Exception('ntags in [9, 22]')
-        return parse(result, flatten=flatten)
+        return parse(result, flatten=flatten, join=join)

     def nouns(self, phrase):
         """Noun extractor."""
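The new `join` flag only changes how each `morph/TAG` match is returned. Assuming Hannanum's analysis strings look like `morph/TAG+morph/TAG` (the shape `tag_re` is written to match; the sample below is illustrative, not real tagger output), a standalone copy of the nested `parse_opt` behaves as follows:

```python
import re

# tag_re as defined in konlpy/tag/_hannanum.py.
tag_re = '(.+?\\/\\w+)\\+?'


def parse_opt(opt, join=False):
    """Standalone copy of the nested parse_opt() from this diff."""
    if join:
        # Keep each match as a single 'morph/TAG' string.
        return [u for u in re.findall(tag_re, opt.strip())]
    else:
        # Split each match at the last '/' into a (morph, tag) tuple.
        return [tuple(u.rsplit('/', 1)) for u in re.findall(tag_re, opt.strip())]


sample = '아버지/N+가/J'  # illustrative analysis string
print(parse_opt(sample))             # [('아버지', 'N'), ('가', 'J')]
print(parse_opt(sample, join=True))  # ['아버지/N', '가/J']
```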
19 changes: 14 additions & 5 deletions konlpy/tag/_kkma.py
@@ -46,10 +46,12 @@ def nouns(self, phrase):
         if not nouns: return []
         return [nouns.get(i).getString() for i in range(nouns.size())]

-    def pos(self, phrase, flatten=True):
+    def pos(self, phrase, flatten=True, join=False):
         """POS tagger.

-        :param flatten: If False, preserves eojeols."""
+        :param flatten: If False, preserves eojeols.
+        :param join: If True, returns joined sets of morph and tag.
+        """

         sentences = self.jki.morphAnalyzer(phrase)
         morphemes = []
@@ -63,10 +65,17 @@ def pos(self, phrase, flatten=True):
                 if flatten:
                     for k in range(eojeol.size()):
                         morpheme = eojeol.get(k)
-                        morphemes.append((morpheme.getString(), morpheme.getTag()))
+                        if join:
+                            morphemes.append(morpheme.getString() + '/' + morpheme.getTag())
+                        else:
+                            morphemes.append((morpheme.getString(), morpheme.getTag()))
                 else:
-                    morphemes.append([(eojeol.get(k).getString(), eojeol.get(k).getTag())
-                                      for k in range(eojeol.size())])
+                    if join:
+                        morphemes.append([eojeol.get(k).getString() + '/' + eojeol.get(k).getTag()
+                                          for k in range(eojeol.size())])
+                    else:
+                        morphemes.append([(eojeol.get(k).getString(), eojeol.get(k).getTag())
+                                          for k in range(eojeol.size())])

         return morphemes

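Kkma returns Java objects, so the flatten/join branches of `Kkma.pos` are easiest to see with stand-ins. In this sketch, `StubMorpheme` (exposing the same `getString()`/`getTag()` calls the diff uses) and `collect` are hypothetical, plain Python lists replace the Java `size()`/`get()` containers, and the tags are illustrative.

```python
class StubMorpheme(object):
    """Hypothetical stand-in for a Kkma morpheme (a Java object in KoNLPy)."""

    def __init__(self, string, tag):
        self._string = string
        self._tag = tag

    def getString(self):
        return self._string

    def getTag(self):
        return self._tag


def collect(eojeols, flatten=True, join=False):
    """Mirror of the flatten/join branches in Kkma.pos over plain lists."""
    def fmt(m):
        if join:
            return m.getString() + '/' + m.getTag()  # 'morph/TAG'
        return (m.getString(), m.getTag())           # (morph, tag)

    morphemes = []
    for eojeol in eojeols:
        if flatten:
            morphemes.extend(fmt(m) for m in eojeol)
        else:
            # Keep one inner list per eojeol.
            morphemes.append([fmt(m) for m in eojeol])
    return morphemes


eojeols = [[StubMorpheme('아버지', 'NNG'), StubMorpheme('가', 'JKS')],
           [StubMorpheme('방', 'NNG'), StubMorpheme('에', 'JKM')]]
print(collect(eojeols, join=True))
# ['아버지/NNG', '가/JKS', '방/NNG', '에/JKM']
print(collect(eojeols, flatten=False, join=True))
# [['아버지/NNG', '가/JKS'], ['방/NNG', '에/JKM']]
```

Factoring the per-morpheme formatting into one `fmt` function keeps the `join` check in a single place. The PR repeats it in both the flattened and non-flattened branches instead.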
24 changes: 16 additions & 8 deletions konlpy/tag/_komoran.py
@@ -15,14 +15,20 @@
 __all__ = ['Komoran']


-def parse(result, flatten):
-    def _parse(token):
-        return [tuple(s[1:].rsplit('/', 1)) for s in re.findall('\+.+?/[A-Z]+', token)]
+def parse(result, flatten, join=False):
+    def _parse(token, join=False):
+        if join:
+            return [s[1:] for s in re.findall('\+.+?/[A-Z]+', token)]
+        else:
+            return [tuple(s[1:].rsplit('/', 1)) for s in re.findall('\+.+?/[A-Z]+', token)]

     if sys.version_info[0] < 3:
-        parsed = [[tuple(r.rsplit('/', 1)) for r in sublist] for sublist in result]
+        if join:
+            parsed = [[r for r in sublist] for sublist in result]
+        else:
+            parsed = [[tuple(r.rsplit('/', 1)) for r in sublist] for sublist in result]
     else:
-        parsed = [_parse(i) for i in result[1:-1].split(', ')]
+        parsed = [_parse(i, join=join) for i in result[1:-1].split(', ')]

     if flatten:
         return sum(parsed, [])
@@ -49,17 +55,19 @@ class Komoran():
     :param dicpath: The path of dictionary files. The KOMORAN system dictionary is loaded by default.
     """

-    def pos(self, phrase, flatten=True):
+    def pos(self, phrase, flatten=True, join=False):
         """POS tagger.

-        :param flatten: If False, preserves eojeols."""
+        :param flatten: If False, preserves eojeols.
+        :param join: If True, returns joined sets of morph and tag.
+        """

         if sys.version_info[0] < 3:
             result = self.jki.analyzeMorphs(phrase, self.dicpath)
         else:
             result = self.jki.analyzeMorphs3(phrase, self.dicpath).toString()

-        return parse(result, flatten)
+        return parse(result, flatten, join=join)

     def nouns(self, phrase):
         """Noun extractor."""
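On Python 3, `parse` strips the surrounding brackets from the analyzer's `toString()` output, splits on `', '`, and extracts `+morph/TAG` pieces with a regex. The exact string Komoran produces is not shown in this diff. The sample below uses an assumed format inferred from the parsing code, and the non-flattened `return parsed` is also an assumption, because that line falls outside the hunk.

```python
import re

# Same pattern as in _komoran.py, written as a raw string to avoid the
# invalid-escape warning that newer Pythons emit for '\+' in a plain literal.
MORPH_RE = r'\+.+?/[A-Z]+'


def _parse(token, join=False):
    if join:
        # Drop the leading '+' and keep 'morph/TAG'.
        return [s[1:] for s in re.findall(MORPH_RE, token)]
    else:
        return [tuple(s[1:].rsplit('/', 1)) for s in re.findall(MORPH_RE, token)]


def parse(result, flatten, join=False):
    """Python 3 branch of parse() from this diff (non-flattened return assumed)."""
    parsed = [_parse(i, join=join) for i in result[1:-1].split(', ')]
    if flatten:
        return sum(parsed, [])
    return parsed


# Assumed shape of analyzeMorphs3(...).toString(); illustrative only.
result = '[+아버지/NNG+가/JKS, +방/NNG+에/JKB]'
print(parse(result, flatten=True, join=True))
# ['아버지/NNG', '가/JKS', '방/NNG', '에/JKB']
print(parse(result, flatten=False))
# [[('아버지', 'NNG'), ('가', 'JKS')], [('방', 'NNG'), ('에', 'JKB')]]
```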
26 changes: 16 additions & 10 deletions konlpy/tag/_mecab.py
@@ -24,13 +24,16 @@
          'indexed']  # indexed expression


-def parse(result, allattrs=False):
-    def split(elem):
+def parse(result, allattrs=False, join=False):
+    def split(elem, join=False):
         if not elem: return ('', 'SY')
         s, t = elem.split('\t')
-        return (s, t.split(',', 1)[0])
+        if join:
+            return s + '/' + t.split(',', 1)[0]
+        else:
+            return (s, t.split(',', 1)[0])

-    return [split(elem) for elem in result.splitlines()[:-1]]
+    return [split(elem, join=join) for elem in result.splitlines()[:-1]]


 class Mecab():
@@ -64,26 +67,27 @@ class Mecab():
     """

     # TODO: check whether flattened results equal non-flattened
-    def pos(self, phrase, flatten=True):
+    def pos(self, phrase, flatten=True, join=False):
         """POS tagger.

         :param flatten: If False, preserves eojeols.
+        :param join: If True, returns joined sets of morph and tag.
         """

         if sys.version_info[0] < 3:
             phrase = phrase.encode('utf-8')
             if flatten:
                 result = self.tagger.parse(phrase).decode('utf-8')
-                return parse(result)
+                return parse(result, join=join)
             else:
-                return [parse(self.tagger.parse(eojeol).decode('utf-8'))
+                return [parse(self.tagger.parse(eojeol).decode('utf-8'), join=join)
                         for eojeol in phrase.split()]
         else:
             if flatten:
                 result = self.tagger.parse(phrase)
-                return parse(result)
+                return parse(result, join=join)
             else:
-                return [parse(self.tagger.parse(eojeol).decode('utf-8'))
+                return [parse(self.tagger.parse(eojeol).decode('utf-8'), join=join)
                         for eojeol in phrase.split()]

     def morphs(self, phrase):
@@ -102,4 +106,6 @@ def __init__(self, dicpath='/usr/local/lib/mecab/dic/mecab-ko-dic'):
             self.tagger = Tagger('-d %s' % dicpath)
             self.tagset = utils.read_json('%s/data/tagset/mecab.json' % utils.installpath)
         except RuntimeError:
-            raise Exception('Invalid MeCab dictionary path: "%s"\nInput the correct path when initiializing class: "Mecab(\'/some/dic/path\')"' % dicpath)
+            raise Exception('The MeCab dictionary does not exist at "%s". Is the dictionary correctly installed?\nYou can also try entering the dictionary path when initializing the Mecab class: "Mecab(\'/some/dic/path\')"' % dicpath)
+        except NameError:
+            raise Exception('Install MeCab in order to use it: http://konlpy.org/en/latest/install/')
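MeCab emits one `surface<TAB>features` line per morpheme, followed by an `EOS` line. `parse` keeps the surface form and the first feature field (the POS tag), and `join=True` glues them into `morph/TAG`. The sketch below copies the new `parse` and runs it on an illustrative mecab-ko-dic style sample, not output captured from a real run. One related detail in the hunk above: the Python 3 non-flattened branch of `Mecab.pos` still calls `.decode('utf-8')`. If the Python 3 binding returns `str`, as the undecoded flattened branch implies, that call would raise `AttributeError`, because Python 3 `str` has no `decode` method.

```python
def parse(result, allattrs=False, join=False):
    """Copy of the join-aware parse() from konlpy/tag/_mecab.py in this diff."""
    def split(elem, join=False):
        if not elem:
            return ('', 'SY')
        s, t = elem.split('\t')  # surface form, comma-separated features
        if join:
            return s + '/' + t.split(',', 1)[0]  # 'morph/TAG'
        else:
            return (s, t.split(',', 1)[0])       # (morph, tag)

    # The last line of MeCab output is the 'EOS' marker, so it is skipped.
    return [split(elem, join=join) for elem in result.splitlines()[:-1]]


# Illustrative mecab-ko-dic style output, not captured from a real run.
sample = ('아버지\tNNG,*,F,아버지,*,*,*,*\n'
          '가\tJKS,*,F,가,*,*,*,*\n'
          'EOS\n')

print(parse(sample))             # [('아버지', 'NNG'), ('가', 'JKS')]
print(parse(sample, join=True))  # ['아버지/NNG', '가/JKS']
```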