Commit

v0.2.4, add head
shenwei356 committed May 8, 2016
1 parent 779b3bf commit 54706b5
Showing 8 changed files with 157 additions and 36 deletions.
19 changes: 9 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# fakit - a cross-platform and efficient suit for FASTA/Q file manipulation
# fakit - a cross-platform and efficient toolkit for FASTA/Q file manipulation

Documents : [http://shenwei356.github.io/fakit](http://shenwei356.github.io/fakit)

@@ -21,7 +21,7 @@ running environment also make them less friendly to common users.

fakit is a cross-platform, efficient, and practical FASTA/Q manipulations tool
that is friendly for researchers to complete wide ranges of FASTA file processing.
The suite supports plain or gzip-compressed input and output
The toolkit supports plain or gzip-compressed input and output
from either standard stream or files,
therefore, it could be easily used in command-line pipe.

@@ -33,7 +33,7 @@ therefore, it could be easily used in command-line pipe.
(see [download](http://shenwei356.github.io/fakit/download/))
- **Fast** (see [benchmark](/#benchmark)),
**multiple-CPUs supported**.
- **Practical functions supported by 18 subcommands** (see subcommands and
- **Practical functions supported by 19 subcommands** (see subcommands and
[usage](http://shenwei356.github.io/fakit/usage/) )
- **Well documented** (detailed [usage](http://shenwei356.github.io/fakit/usage/)
and [benchmark](http://shenwei356.github.io/fakit/benchmark/) )
@@ -78,15 +78,13 @@ Rename head | Yes | Yes | -- | -- | Yes
executable binary files **for most popular operating systems** are freely available
in [release](https://github.com/shenwei356/fakit/releases) page.

Just [download](https://github.com/shenwei356/fakit/releases) gzip-compressed
executable file of your operating system, and uncompress it with `tar -zxvf *.tar.gz` command,
rename it to `fakit.exe` (Windows) or `fakit` (other operating systems) for convenience.
Just [download](https://github.com/shenwei356/fakit/releases) compressed
executable file of your operating system, and uncompress it with `tar -zxvf *.tar.gz` command.

You may need to add executable permission with `chmod a+x fakit`.

You can also add the directory of the executable file to environment variable
You can add the directory of the executable file to environment variable
`PATH`, so you can run `fakit` anywhere.


1. For Windows, the simplest way is to copy it to `C:\WINDOWS\system32`.

2. For Linux, type:
@@ -102,7 +100,7 @@ For Go developer, just one command:

## Subcommands

18 in total.
19 in total.

**Sequence and subsequence**

@@ -129,6 +127,7 @@ For Go developer, just one command:
- `common` find common sequences of multiple files by id/name/sequence
- `split` split sequences into files by id/seq region/size/parts
- `sample` sample sequences by number or proportion
- `head` print first N FASTA/Q records

**Edit**

2 changes: 1 addition & 1 deletion benchmark/README.md
@@ -9,7 +9,7 @@ Datasets and results are described at [http://shenwei356.github.io/fakit/benchma
Softwares

1. [fakit](https://github.com/shenwei356/fakit). (Go).
Version [v0.2.2](https://github.com/shenwei356/fakit/releases/tag/v0.2.2).
Version [v0.2.4](https://github.com/shenwei356/fakit/releases/tag/v0.2.4).
1. [fasta_utilities](https://github.com/jimhester/fasta_utilities). (Perl).
Version [3dcc0bc](https://github.com/jimhester/fasta_utilities/tree/3dcc0bc6bf1e97839476221c26984b1789482579).
Lots of dependencies to install.
2 changes: 1 addition & 1 deletion doc/docs/benchmark.md
@@ -4,7 +4,7 @@
## Softwares

1. [fakit](https://github.com/shenwei356/fakit). (Go).
Version [v0.2.3](https://github.com/shenwei356/fakit/releases/tag/v0.2.3).
Version [v0.2.4](https://github.com/shenwei356/fakit/releases/tag/v0.2.4).
1. [fasta_utilities](https://github.com/jimhester/fasta_utilities). (Perl).
Version [3dcc0bc](https://github.com/jimhester/fasta_utilities/tree/3dcc0bc6bf1e97839476221c26984b1789482579).
Lots of dependencies to install.
22 changes: 8 additions & 14 deletions doc/docs/download.md
@@ -6,24 +6,15 @@

## Current Version

- [fakit v0.2.3](https://github.com/shenwei356/fakit/releases/tag/v0.2.3)
- reduce memory usage by avoiding data copies when converting `string` to `[]byte`
- speed up reverse-complement by avoiding repeated function calls

- [fakit v0.2.4](https://github.com/shenwei356/fakit/releases/tag/v0.2.4)
- add subcommand `head`

## Installation

`fakit` is implemented in [Golang](https://golang.org/) programming language,
executable binary files **for most popular operating system** are freely available
in [release](https://github.com/shenwei356/fakit/releases) page.

Just [download](https://github.com/shenwei356/fakit/releases) gzip-compressed
executable file of your operating system, and uncompress it with `tar -zxvf *.tar.gz` command,
rename it to `fakit.exe` (Windows) or `fakit` (other operating systems) for convenience.

You may need to add executable permission with `chmod a+x fakit`.
Just [download](https://github.com/shenwei356/fakit/releases) compressed
executable file of your operating system, and uncompress it with `tar -zxvf *.tar.gz` command.

You can also add the directory of the executable file to environment variable
You can add the directory of the executable file to environment variable
`PATH`, so you can run `fakit` anywhere.

1. For Windows, the simplest way is to copy it to `C:\WINDOWS\system32`.
@@ -41,6 +32,9 @@ For Go developer, just one command:

## Previous Versions

- [fakit v0.2.3](https://github.com/shenwei356/fakit/releases/tag/v0.2.3)
- reduce memory usage by avoiding data copies when converting `string` to `[]byte`
- speed up reverse-complement by avoiding repeated function calls
- [fakit v0.2.2](https://github.com/shenwei356/fakit/releases/tag/v0.2.2)
- reduce memory occupation of subcommands that use FASTA index
- [fakit v0.2.1](https://github.com/shenwei356/fakit/releases/tag/v0.2.1)
57 changes: 51 additions & 6 deletions doc/docs/usage.md
@@ -119,7 +119,7 @@ Usage
```
fakit -- a cross-platform and efficient suit for FASTA/Q file manipulation
Version: 0.2.3
Version: 0.2.4
Author: Wei Shen <[email protected]>
@@ -134,7 +134,8 @@ Available Commands:
faidx create FASTA index file
fq2fa covert FASTQ to FASTA
fx2tab covert FASTA/Q to tabular format (with length/GC content/GC skew)
grep search sequences by pattern(s) of name or sequence motifs
grep search sequences by pattern(s) of name or sequence motifs
head print first N FASTA/Q records
locate locate subsequences/motifs
rename rename duplicated IDs
replace replace name/sequence/by regular expression
@@ -960,6 +961,12 @@ Flags:

Examples

1. Sample by proportion

$ zcat hairpin.fa.gz | fakit sample -p 0.1 -o sample.fa.gz
[INFO] sample by proportion
[INFO] 2814 sequences outputed

1. Sample by number

$ zcat hairpin.fa.gz | fakit sample -n 1000 -o sample.fa.gz
@@ -968,11 +975,9 @@ Examples

***To reduce memory usage when splitting big files, we can use the flag `--two-pass`***

1. Sample by proportion
***We can also use `fakit sample -p` followed by `fakit head -n`:***

$ zcat hairpin.fa.gz | fakit sample -p 0.1 -o sample.fa.gz
[INFO] sample by proportion
[INFO] 2814 sequences outputed
$ zcat hairpin.fa.gz | fakit sample -p 0.1 | fakit head -n 1000 -o sample.fa.gz

1. Set rand seed to reproduce the result

@@ -985,6 +990,46 @@ Examples
Note that when sampling FASTQ files, make sure to use the same random seed with
flag `-s` (`--rand-seed`).
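
The note about reusing the random seed can be illustrated with a small Go sketch. It is not fakit's sampling code: `sampleIndexes` is a hypothetical helper that keeps each record with probability `p`, and the paired-file scenario and the seed value 11 are assumptions made for illustration. The point it shows is that two runs with the same seed keep the same record positions, which is presumably what keeps two related FASTQ files in sync.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleIndexes is a hypothetical helper (not part of fakit). It returns the
// 0-based positions of the records kept when each of n records is retained
// with probability p, using a generator seeded with seed.
func sampleIndexes(n int, p float64, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	kept := make([]int, 0, n)
	for i := 0; i < n; i++ {
		if rng.Float64() < p {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	// Assumed scenario: two files with 20 records each, sampled separately
	// but with the same seed, so the same positions are kept in both.
	first := sampleIndexes(20, 0.3, 11)
	second := sampleIndexes(20, 0.3, 11)
	fmt.Println(first)
	fmt.Println(second)
}
```

With different seeds the two slices would generally differ, and records sampled from the two files would no longer correspond to each other.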

## head

Usage

```
print first N FASTA/Q records
Usage:
fakit head [flags]
Flags:
-n, --number int print first N FASTA/Q records (default 10)
```

Examples

1. FASTA

$ fakit head -n 1 hairpin.fa.gz
>cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop
UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC
UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA

1. FASTQ

$ fakit head -n 1 reads_1.fq.gz
@HWI-D00523:240:HF3WGBCXX:1:1101:2574:2226 1:N:0:CTGTAG
TGAGGAATATTGGTCAATGGGCGCGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGACTG
CCCTACGGGTTGTAAACTTCTTTTATAAAGGAATAAAGTGAGGCACGTGTGCCTTTTTGT
ATGTACTTTATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGA
TCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGT
+
HIHIIIIIHIIHGHHIHHIIIIIIIIIIIIIIIHHIIIIIHHIHIIIIIGIHIIIIHHHH
HHGHIHIIIIIIIIIIIGHIIIIIGHIIIIHIIHIHHIIIIHIHHIIIIIIIGIIIIIII
HIIIIIGHIIIIHIIIH?DGHEEGHIIIIIIIIIIIHIIHIIIHHIIHIHHIHCHHIIHG
IHHHHHHH<GG?B@EHDE-BEHHHII5B@GHHF?CGEHHHDHIHIIH



## replace

Usage
2 changes: 1 addition & 1 deletion doc/site
Submodule site updated from 006cf5 to 24341f
83 changes: 83 additions & 0 deletions fakit/cmd/head.go
@@ -0,0 +1,83 @@
// Copyright © 2016 Wei Shen <[email protected]>
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

package cmd

import (
	"runtime"

	"github.com/brentp/xopen"
	"github.com/shenwei356/bio/seq"
	"github.com/shenwei356/bio/seqio/fastx"
	"github.com/spf13/cobra"
)

// headCmd represents the head command
var headCmd = &cobra.Command{
	Use:   "head",
	Short: "print first N FASTA/Q records",
	Long: `print first N FASTA/Q records
`,
	Run: func(cmd *cobra.Command, args []string) {
		config := getConfigs(cmd)
		alphabet := config.Alphabet
		idRegexp := config.IDRegexp
		chunkSize := config.ChunkSize
		bufferSize := config.BufferSize
		lineWidth := config.LineWidth
		outFile := config.OutFile
		seq.AlphabetGuessSeqLenghtThreshold = config.AlphabetGuessSeqLength
		seq.ValidateSeq = false
		runtime.GOMAXPROCS(config.Threads)

		number := getFlagPositiveInt(cmd, "number")

		files := getFileList(args)

		outfh, err := xopen.Wopen(outFile)
		checkError(err)
		defer outfh.Close()

		i := 0
		for _, file := range files {
			fastxReader, err := fastx.NewReader(alphabet, file, bufferSize, chunkSize, idRegexp)
			checkError(err)
			for chunk := range fastxReader.Ch {
				checkError(chunk.Err)

				for _, record := range chunk.Data {
					i++
					record.FormatToWriter(outfh, lineWidth)

					if number == i {
						fastxReader.Cancel()
						return
					}
				}
			}
		}
	},
}

func init() {
	RootCmd.AddCommand(headCmd)
	headCmd.Flags().IntP("number", "n", 10, "print first N FASTA/Q records")
}
6 changes: 3 additions & 3 deletions fakit/cmd/root.go
@@ -32,10 +32,10 @@ import (
// RootCmd represents the base command when called without any subcommands
var RootCmd = &cobra.Command{
	Use:   "fakit",
	Short: "a cross-platform and efficient suit for FASTA/Q file manipulation",
	Long: `fakit -- a cross-platform and efficient suit for FASTA/Q file manipulation
	Short: "a cross-platform and efficient toolkit for FASTA/Q file manipulation",
	Long: `fakit -- a cross-platform and efficient toolkit for FASTA/Q file manipulation
Version: 0.2.3
Version: 0.2.4
Author: Wei Shen <[email protected]>

0 comments on commit 54706b5
