Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python framework improvements #12

Open
ghamerly opened this issue Jul 10, 2018 · 0 comments
Open

Python framework improvements #12

ghamerly opened this issue Jul 10, 2018 · 0 comments
Labels
enhancement New feature or request python-bindings

Comments

@ghamerly
Copy link
Owner

Suggestions from @cur33 . These can (and should) be broken into separate issues as is appropriate, and handled separately.

Proposed changes to core library:

    - Change Dataset::print to accept any ostream, default to cout, in order to
      simplify addition of __str__ method to py_dataset.cpp


Design decisions:

    - The method of struct initialization recommended by the Python/C API docs is
      to use C99 designated initializers, which are very useful but not available
      in C++. Instead, I followed this approach:

        - for structs with sufficiently few attributes, like PyModuleDef, I used
          a standard initialization approch by providing values for all members
          (except trailing ones)

        - for structs with many attributes, like PyTypeObject, I created
          init_*_members() procedures that would set only the needed members,
          which are called in the PyInit_fastkmeans() function

      However, having seen other extension modules such as those in numpy which
      simply provide 0 or NULL values for all unused struct members, I'm
      starting to think that might just be simpler and more maintainable than
      I'd originally thought.

    - The Kmeans::initialize method needs an unsigned short * as an argument, so rather
      than jumping through a lot of hoops to provide this simple functionality,
      I made a simple Assignment class that wraps an unsigned short *. An
      Assignment object is returned by the fastkmeans.init_centers* methods and
      then passed to the [SomeKmeansType].initialize method.

    - The Dataset type defined in py_dataset.cpp is technically a mapping type
      in order to allow the `a_dataset[i, j]` syntax. The Assignment type is a
      normal sequence type since it only needs a single value passed to the
      bracket operator.


Possible future enhancements:

    - It generally seems frowned upon to directly manipulate custom object
      types, e.g. NaiveObject instances. Rather, to access specific struct
      fields, one should provide macros or (usually inline) functions like
      `NaiveObject_SOME_FIELD(PyObject *)` that will return the fields desired.
      This way, a method that accepts a custom object instance does not need to
      convert the PyObject * to a custom object *, and does not need to know
      implementation details of the custom object type.
      
      This should be easy to change because I only had to directly manipulate
      Dataset and Assignment object pointers in one or two methods.

    - Originally, and primarily because Python does not natively have the concept
      of ``abstract base classes'', I followed the method of simply copying the
      file for one Kmeans subclass and altering it for each other subclass.
      However, I have since realized that the `NotImplementedError` could be
      used to simulate base classes and allow for inheritance. This would make
      the code base much more maintainable and easier to read. However, it might
      be overkill since the project is not huge.
      

    - Since the original structure of the project merely contained all the
      source files in the `src` directory, I kept it that way, but now that the
      number of files has increased quite a bit, it might make more sense to
      move the Python module files into a separate directory. This shouldn't
      create a problem for the Python module to find the header files of the C++
      classes and functions, because the setup.py file can just be altered to
      set the `include_dir` keyword argument to be something like '..' instead
      of '.'.

@ghamerly ghamerly added enhancement New feature or request python-bindings labels Jul 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python-bindings
Projects
None yet
Development

No branches or pull requests

1 participant