Bump version.
jondegenhardt committed Oct 7, 2018
1 parent 1e1bd00 commit 4532610
Showing 2 changed files with 3 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -115,6 +115,8 @@ See the [tsv-summarize reference](docs/ToolReference.md#tsv-summarize-reference)

* Weighted line order randomization - This extends the previous method to weighted random sampling by the use of a weight taken from each line. The weight field is specified with the `-w|--weight-field` option.

* Sampling with replacement - All lines are read into memory, then lines are selected one at a time at random and output. Lines can be output multiple times. Output continues until `-n|--num` samples have been output.

* Bernoulli sampling - Sampling can be done in streaming mode by using the `-r|--rate` option. This specifies the desired portion of lines that should be included in the sample. For example, `-r 0.1` specifies that 10% of lines should be included in the sample. In this mode lines are read one at a time, a random selection choice is made for each line, and selected lines are output immediately. All lines have an equal likelihood of being output.

* Distinct sampling - This is another streaming mode form of sampling. However, instead of each line being subject to an independent selection choice, lines are selected based on a key contained in each line. A portion of keys are randomly selected for output, and every line containing a selected key is included in the output. Consider a query log with records consisting of <user, query, clicked-url> triples. It may be desirable to sample records for one percent of the users, but include all records for the selected users. Distinct sampling is specified using the `-k|--key-fields` and `-r|--rate` options.
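
The three sampling modes described above (sampling with replacement, Bernoulli sampling, and distinct sampling) can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation: the function names, the zero-based `key_field` index, the `seed` parameter, and the use of a SHA-256 hash to decide which keys are selected are all assumptions made for the example.

```python
import hashlib
import random
from typing import Iterable, Iterator, List


def sample_with_replacement(lines: Iterable[str], num: int,
                            rng: random.Random) -> List[str]:
    """Read all lines into memory, then draw `num` lines; repeats are allowed."""
    pool = list(lines)
    if not pool:
        return []
    return [rng.choice(pool) for _ in range(num)]


def bernoulli_sample(lines: Iterable[str], rate: float,
                     rng: random.Random) -> Iterator[str]:
    """Streaming: each line is kept independently with probability `rate`."""
    for line in lines:
        if rng.random() < rate:
            yield line


def distinct_sample(lines: Iterable[str], key_field: int, rate: float,
                    seed: int = 0, delim: str = "\t") -> Iterator[str]:
    """Streaming: keep every line whose key lands in the selected portion of keys.

    Hypothetical approach: hash the key to a value in [0, 1) and keep the line
    when that value is below `rate`. The same key always hashes to the same
    value, so a key's lines are either all kept or all dropped.
    """
    for line in lines:
        key = line.split(delim)[key_field]  # zero-based index (assumption)
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        if bucket < rate:
            yield line
```

The point of the distinct-sampling sketch is that the selection decision depends only on the key, never on the individual line. For the query log example, a user's records are therefore either all included or all excluded.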
2 changes: 1 addition & 1 deletion common/src/tsvutils_version.d
@@ -1,4 +1,4 @@
-enum string tsvutilsVersion = "v1.2.1";
+enum string tsvutilsVersion = "v1.2.2";

string tsvutilsVersionNotice (string toolName)
{
