diff --git a/.vscode/settings.json b/.vscode/settings.json index 8da9619..8587de7 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -47,9 +47,11 @@ "repotoken", "shogo", "sidewalk", + "snivilised", "staticcheck", "structcheck", "stylecheck", + "Taskfile", "thelper", "tparallel", "typecheck", diff --git a/README.md b/README.md index 63591a5..a2c1829 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# 🌟 pants: ___Go template for library modules___ +# 🐜 pants: ___ants based worker pool___ [![A B](https://img.shields.io/badge/branching-commonflow-informational?style=flat)](https://commonflow.org) [![A B](https://img.shields.io/badge/merge-rebase-informational?style=flat)](https://git-scm.com/book/en/v2/Git-Branching-Rebasing) @@ -26,135 +26,155 @@

- + go.dev

-## 🔰 Introduction +## 📚 Introduction -This project is a template to aid in the startup of Go library module projects. +This module provides worker pool functionality based upon ___ants___. Please refer to the documentation already available at [🐜🐜🐜 ___ants___](https://github.com/panjf2000/ants). The documentation here focuses on the functionality provided that augments the underlying ___ants___ implementation. -## 📚 Usage +Included in this repo are some executable examples that help explain how the pool works and demonstrate some key characteristics that will aid in understanding how to use this package correctly. For a more detailed explanation of the _Options_, the reader is encouraged to read the ants documentation. -## 🎀 Features +## 🎀 Additional Features -

- - -

-+ unit testing with [Ginkgo](https://onsi.github.io/ginkgo/)/[Gomega](https://onsi.github.io/gomega/) -+ implemented with [🐍 Cobra](https://cobra.dev/) cli framework, assisted by [🐲 Cobrass](https://github.com/snivilised/cobrass) -+ i18n with [go-i18n](https://github.com/nicksnyder/go-i18n) -+ linting configuration and pre-commit hooks, (see: [linting-golang](https://freshman.tech/linting-golang/)). - -## 🔨 Developer Info +The ___ants___ implementation was chosen because it has already proven itself in production, has a wide install base and addresses scalability and reliability issues. However, after a review of its features, it was discovered that it lacked a few supplementary features, including the following: -By using this template, there is no need to use the cobra-cli to scaffold your application as this has been done already. It should be noted that the structure that is generated by the cobra-cli has been significantly changed in this template, mainly to remove use of the __init()__ function and to minimise use of package level global variables. For a rationale, see [go-without-package-scoped-variables](https://dave.cheney.net/2017/06/11/go-without-package-scoped-variables). ++ __no top level client defined context__: this means there is no way for the client to cancel an operation using idiomatic ___Go___ techniques. ++ __no job return error__: that is to say, whenever a job is executed, there is no notification of whether it executed successfully or not. Rather, it has been implemented on a _fire and forget_ basis. ++ __no job output__: similar to the lack of an error result for each job, there is no way for the result of an operation to be collated; eg the client may request that the pool perform some task that produces a result. In the ants implementation, there is no native way to return an output for each job. 
++ __no input channel__: the client needs direct access to the pool instance in order to submit tasks with a function call. An input channel offers benefits including, but not limited to, reduced coupling. With an input channel, the client can pass this channel to another entity capable of generating a workload without having direct access to the pool itself; all that entity needs to do is write to the channel. -### 📝 Checklist of required changes +## 💫 ManifoldFuncPool -The following is list of actions that must be performed before using this template. Most of the changes concern changing the name `astrolib` to the name of the new application. As the template is instantiated from github, the new name will automatically replace the top level directory name, that being ___astrolib___. +### 🚀 Quick start -➕ The following descriptions use owner name ___pandora___ and repo name ___maestro___ as an example. That is to say the client has instantiated ___astrolib___ template into github at url _github.com/pandora/maestro_ +#### 📌 Create pool with output -#### 🤖 Automated changes +```go + pool, err := pants.NewManifoldFuncPool( + ctx, func(input int) (int, error) { + // client implementation; output = something -Automated via `automate-checklist.sh` script. When the user instantiates the repo, a github actions workflow is executed which applies changes to the clients repo automatically. The following description describes the changes that are applied on the user's behalf and the workflow is automatically deleted. However, there are other changes that should be made. These compose the manual checklist and should be heeded by the user. + return output, nil + }, &wg, + pants.WithSize(PoolSize), + pants.WithOutput(OutputChSize, CheckCloseInterval, TimeoutOnSend), + ) +``` -##### ✅ Rename import statements +Creates an _int_ based manifold worker pool. 
The ___ManifoldFuncPool___ is a generic type whose type parameters represent the input type _I_ and the output type _O_. In this example, the input and output types are both _int_ as denoted by the signature of the manifold function: -+ `rename import paths`: global search and replace ___snivilised/pants___ to ___pandora/maestro___ +> func(input int) (int, error) -##### ✅ Identifiers +NB: It is not mandatory to require workers to send outputs. If the ___WithOutput___ option is not specified, then an output is still produced, but it is ignored. -+ `change astrolibTemplData`: perform a refactor rename (_Rename Symbol_) to ___maestroTemplData___ +#### 📌 Submit work -##### ✅ Global search replace astrolib to maestro +There are 2 ways to submit work to the pool, either directly or by input channel: -Will take care of the following required changes: ++ direct(Post): -+ `change module name`: update the module name inside the .mod file in the root directory -+ `change ApplicationName`: modify to reflect the new application name. This application name is incorporated into the name of any translation files to be loaded. -+ `update BINARY_NAME`: Inside _Taskfile.yml_, change the value of ___BINARY_NAME___ to the name of the client application. -+ `update github action workflows`: change the name of the workflows in the .yaml files to replace ___astrolib___ to ___Maestro___ (note the change of case, if this is important). +```go + pool.Post(ctx, 42) -##### ✅ Localisation/Internationalisation + ... + pool.Conclude(ctx) +``` -+ `change the names of the translation files`: eg change ___astrolib.active.en-GB.json___ to ___maestro.active.en-GB.json___ +Sends a job to the pool with int based input value 42. Typically, the Post would be issued multiple times as need demands. At some point we are done submitting work. The end of the workload needs to be communicated to the pool. This is the purpose of invoking Conclude. 
-##### ✅ Miscellaneous automated changes ++ via input channel(Source): -+ `reset version files`: this is optional because the release process automatically updates the version number according to the tag specified by the user, but will initially contain the version number which reflects the current value of astrolib at the time the client project is instantiated. -+ `change SOURCE_ID`: to "github.com/pandora/maestro" +```go + inputCh := pool.Source(ctx, &wg) + inputCh <- 42 -#### 🖐 Manual changes + ... + close(inputCh) +``` -The following documents manual changes required. Manual checklist: +Sends a job to the pool with int based input value 42, via the input channel. At the end of the workload, all we need to do is close the channel; we do not need to invoke ___Conclude___ explicitly as this is done automatically on our behalf as a result of the channel closure. -##### ☑️ Structural changes +#### 📌 Consume outputs -+ `github actions workflow`: If the client does not to use github actions workflow automation, then these files ([ci-workflow](.github/workflows/ci-workflow.yml), [release-workflow](.github/workflows/release-workflow.yml), [.goreleaser.yaml](./.goreleaser.yaml)), should be deleted. +Outputs can be consumed simply by invoking ___pool.Observe___, which returns a channel: -+ `rename the widget command`: rename __widget-cmd.go__ and its associated test __widget_test.go__ to whatever is the first command to be implemented in the application. The widget command can serve as a template as to how to define a new command, without having to start from scratch. It will be easier for the user to modify an existing command, so just perform a case sensitive search and replace for ___widget/Widget___ and replace with ___Foo/foo___ where foo represents the new command to be created. 
+```go + select { + case output := <-pool.Observe(): + fmt.Printf("🍒 payload: '%v', id: '%v', seq: '%v' (e: '%v')\n", + output.Payload, output.ID, output.SequenceNo, output.Error, + ) + case <-ctx.Done(): + return + } +``` -+ `review bootstrap.go`: this will need to be modified to invoke creation of any custom commands. The `execute` method of __bootstrap__ should be modified to invoke command builder. Refer to the `widget` command to see how this is done. +Each output is represented by a ___JobOutput___ which contains a _Payload_ field representing the job's result and some supplementary meta data fields, including a sequence number and a job ID. -#### ☑️ Github changes +It is possible to range over the output channel as illustrated: -Unfortunately, github doesn't copy over the template project's settings to the client project, so these changes must be made manually: +```go + for output := range pool.Observe() { + fmt.Printf("🍒 payload: '%v', id: '%v', seq: '%v' (e: '%v')\n", + output.Payload, output.ID, output.SequenceNo, output.Error, + ) + } +``` -Under `Protect matching branches` +This will work in success cases, but what happens if a worker send timeout occurs? The worker will send a cancellation request and the context will be cancelled as a result. But since the range operator is not pre-empted as a result of this cancellation, it will continue to block, waiting for either more content or channel closure. If the main Go routine is blocking on a WaitGroup, which it almost certainly should be, the program will deadlock on the wait. For this reason, it is recommended to use a select statement as shown. -+ `Require a pull request before merging` ✅ _ENABLE_ -+ `Require linear history` ✅ _ENABLE_ -+ `Do not allow bypassing the above settings` ✅ _ENABLE_ +#### 📌 Monitor the cancellation channel -Of course, its up to the user what settings they use in their repo, these are just recommended as a matter of good practice. 
+Currently, the only reason for a worker to request a cancellation is that it is unable to send an output. Any cancellation request must be addressed by the client; this means invoking the cancel function associated with the context. -#### ☑️ Code coverage +The client can delegate this responsibility to a predefined function in pants, ___StartCancellationMonitor___: -+ `coveralls.io`: add maestro project +```go + if cc := pool.CancelCh(); cc != nil { + pants.StartCancellationMonitor(ctx, cancel, &wg, cc, func() { + fmt.Print("🔴 cancellation received, cancelling...\n") + }) + } +``` -#### ☑️ Miscellaneous changes +Note that the client can pass in a callback function, which is invoked if cancellation occurs. Also, note that there is no need to increment the wait group as that is done internally. -+ `replace README content` -+ `update email address in copyright statement`: The __root.go__ file contains a placeholder for an email address, update this comment accordingly. -+ `create .env file`: Add any appropriate secrets to a newly created .env in the root directory and to enable the __deploy__ task to work, define a __DEPLOY_TO__ entry that defines where builds should be deployed to for testing -+ `install pre-commit hooks`: just run ___pre-commit install___ -+ `update translation file`: Inside _Taskfile.yml_, add support for loading any translations that the app will support. By default, it deploys a translation file for __en-US__ so this needs to be updated as appropriate. +## 📐 Design -### 🌐 l10n Translations +In designing the augmented functionality, it was discovered that there could conceivably be more than 1 abstraction, depending on the client's needs. From the perspective of ___snivilised___ projects, the key requirement was to have a pool that could execute jobs and, for each one, return an error and an output. The name given to this implementation is the ___ManifoldFuncPool___. -This template has been setup to support localisation. 
The default language is `en-GB` with support for `en-US`. There is a translation file for `en-US` defined as __src/i18n/deploy/astrolib.active.en-US.json__. This is the initial translation for `en-US` that should be deployed with the app. +In ___ants___, there are 2 main worker pool implementations, ___Pool___ and ___PoolWithFunc___. -Make sure that the go-i18n package has been installed so that it can be invoked as cli, see [go-i18n](https://github.com/nicksnyder/go-i18n) for installation instructions. ++ ___Pool___: accepts new jobs represented by a function. Each function can implement any logic, so the pool is in fact able to execute a stream of heterogeneous tasks. ++ ___PoolWithFunc___: the pool is created with a pre-defined function and accepts new jobs specified as an input to this pool function. So every job the pool executes runs the same functionality but with a different input. -To maintain localisation of the application, the user must take care to implement all steps to ensure translate-ability of all user facing messages. Whenever there is a need to add/change user facing messages including error messages, to maintain this state, the user must: +___ManifoldFuncPool___ is based on the ___PoolWithFunc___ implementation. However, ___PoolWithFunc___ does not return either an output or an error. ___ManifoldFuncPool___ adds this behaviour by allowing the client to define a function (_manifold function_) whose signature allows for an input of a specific type, along with an output and error. ___ManifoldFuncPool___ therefore provides a mapping from the _manifold function_ to the ants function (_PoolWithFunc_). -+ define template struct (__xxxTemplData__) in __src/i18n/messages.go__ and corresponding __Message()__ method. All messages are defined here in the same location, simplifying the message extraction process as all extractable strings occur at the same place. 
Please see [go-i18n](https://github.com/nicksnyder/go-i18n) for all translation/pluralisation options and other regional sensitive content. +As previously mentioned, ___pants___ could provide many more worker pool abstractions, eg there could be a ___ManifoldTaskPool___ based upon the ___Pool___ implementation. However, ___ManifoldTaskPool___ is not currently defined as there is no established need for one. Similarly, pants could provide a ___PoolWithFunc___ based pool whose client function only returns an error. Future versions of ___pants___ could provide these alternative implementations if such a need arises. -For more detailed workflow instructions relating to i18n, please see [i18n README](./resources/doc/i18n-README.md) +### Context -### 🧪 Quick Test +The ___NewManifoldFuncPool___ constructor function accepts a context that works in exactly the way one would expect. Any internal Go routine works with this context. If the client cancels this context, then this will be propagated to all child Go routines including the workers in the pool. -To check the app is working (as opposed to running the unit tests), build and deploy: +### Cancellation -> task tbd +The need to send output back to the client for each job presents us with an additional problem. Once the need for output has been declared via use of the ___WithOutput___ option, there is an obligation on the client to consume it. Failure to consume it will result in the eventual blockage of the entire worker pool; the pool will get to a state where all workers are blocking on their attempt to send the output, the output buffer is full and new incoming requests can no longer be dispatched to workers, as they are all busy, resulting in deadlock. This may just be a programming error, but it would be undesirable for the pool to simply end up in deadlock. -(which performs a test, build then deploy) +This has been alleviated by the use of a timeout mechanism. 
The ___WithOutput___ option takes a timeout parameter defined as a ___time.Duration___. When the worker times out attempting to send the output, it will then send a cancellation request back to the client via a separate cancellation channel (obtained by invoking ___ManifoldFuncPool.CancelCh___). -NB: the `deploy` task has been set up for windows by default, but can be changed at will. +Since context cancellation should only be initiated by the client, the onus is on them to cancel the context. However, the way in which this would be done amounts to some boilerplate code, so ___pants___ also provides this as a function ___StartCancellationMonitor___, which starts a Go routine that monitors the cancellation channel for requests and on seeing one, cancels the associated context. This results in all child Go routines abandoning their work when they are able and exiting gracefully. This means that we can avoid the deadlock and leaked Go routines. -Check that the executable and the US language file __maestro.active.en-US.json__ have both been deployed. Then invoke the widget command with something like +### Conclude -> maestro widget -p "P?\" -t 30 +The pool needs to close the output channel so the consumer knows to exit its read loop, but it can only do so once it's clear there are no more outstanding jobs to complete and all workers are idle. We can't close the channel prematurely as that would result in a panic when a worker attempts to send the output. _Conclude_ signifies to the worker pool that no more work will be submitted. When submitting to the pool directly using the Post method, the client must call this method. Failure to do so will result in a pool that never ends. When the client elects to use an input channel, by invoking Source, then Conclude will be called automatically as long as the input channel has been closed. Failure to close the channel will again result in a never ending worker pool. 
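The closing rule described above can be illustrated with a minimal sketch that uses only the standard library. It does not use ___pants___ or ___ants___; the names `closeWhenIdle` and `runJobs`, the atomic pending counter and the polling interval are assumptions made purely for illustration and are not the library's implementation. The point is only that the output channel is closed after the last send has completed, so a ranging consumer terminates and no worker panics by sending on a closed channel.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// closeWhenIdle polls at the given interval and closes out once no jobs
// are pending. It must only be started after the last job was submitted.
func closeWhenIdle(out chan int, pending *atomic.Int64, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		if pending.Load() == 0 {
			close(out) // safe: every send has already completed
			return
		}
	}
}

// runJobs squares each input on its own goroutine and returns the sum of
// all outputs read from the output channel.
func runJobs(inputs []int) int {
	out := make(chan int, len(inputs))

	var pending atomic.Int64
	pending.Add(int64(len(inputs)))

	for _, in := range inputs {
		go func(n int) {
			out <- n * n
			pending.Add(-1) // decrement only after the send has completed
		}(in)
	}

	go closeWhenIdle(out, &pending, time.Millisecond)

	sum := 0
	for v := range out { // ends when closeWhenIdle closes the channel
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(runJobs([]int{1, 2, 3}))
}
```

Closing the channel any earlier, for example straight after submitting the last job, would risk a send on a closed channel, which is the panic the paragraph above warns about.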
-Optionally, the user can also specify the ___directory___ flag: +___WithOutput___ is used to customise aspects of the output. Typically, the use of the ___WithOutput___ option looks like this: -> maestro widget -p "P?\" -t 30 -d foo-bar.txt +> pants.WithOutput(OutputChSize, CheckCloseInterval, TimeoutOnSend) -... where ___foo-bar.txt___ should be replaced with a file that actually exists. +___OutputChSize___: defines the size of the output channel -This assumes that the the project name is `maestro`, change as appropriate. +___CheckCloseInterval___: is internally required by ___pool.Conclude___. To counter the problem described above, ___Conclude___ needs to check periodically whether it's safe to close the output channel, which is implemented within another Go routine. ___CheckCloseInterval___ denotes the amount of time it will wait before checking again. -Since the `widget` command uses `Cobrass` option validation to check that the file specified exists, the app will fail if the file does not exist. This serves as an example of how to implement option validation with `Cobrass`. +___TimeoutOnSend___: denotes the timeout used when the pool attempts to send to the output channel.
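The timeout and cancellation handshake that these options configure can be sketched with the standard library alone. The following is a hedged illustration, not the ___pants___ implementation: `trySend`, `monitor` and `unconsumedOutputCancels` are hypothetical names, and the channel types are assumptions. A worker that cannot deliver its output within the timeout raises a cancellation request, and a monitor (playing the role this README assigns to ___StartCancellationMonitor___) cancels the context on the client's behalf.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// trySend attempts to deliver v on out, giving up after timeout or when
// ctx is done. It reports whether the value was delivered.
func trySend(ctx context.Context, out chan<- int, v int, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case out <- v:
		return true
	case <-timer.C:
		return false
	case <-ctx.Done():
		return false
	}
}

// monitor cancels the context on the first cancellation request and
// exits quietly if the context ends for any other reason.
func monitor(ctx context.Context, cancel context.CancelFunc, requests <-chan struct{}) {
	select {
	case <-requests:
		cancel()
	case <-ctx.Done():
	}
}

// unconsumedOutputCancels simulates a client that never reads the output
// channel and reports whether the context ended up cancelled.
func unconsumedOutputCancels() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	out := make(chan int)              // unbuffered and never read
	requests := make(chan struct{}, 1) // hypothetical cancellation channel

	go monitor(ctx, cancel, requests)

	// The "worker": on a send timeout, ask the client side to cancel.
	if !trySend(ctx, out, 42, 10*time.Millisecond) {
		requests <- struct{}{}
	}

	select {
	case <-ctx.Done():
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("context cancelled:", unconsumedOutputCancels())
}
```

Keeping the cancel call inside the monitor, rather than inside the worker, mirrors the design choice stated earlier: only the client side initiates context cancellation, while workers merely request it.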