10X v3 Unable to parse UMI #105

Open
namitc opened this issue Apr 19, 2020 · 4 comments
@namitc

namitc commented Apr 19, 2020

Hi,

I'm trying to run the dropest pipeline on a 10x dataset, but it fails with an "Unable to parse UMI in " error. Below are the command, a sample BAM record, and the config file that I'm using.

./dropest -m -V -b -o sample3_dropest -g ~/rnaseq/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf -L eiEIBA -c configs/10x.xml ~/rnaseq/sample3/outs/possorted_genome_bam.bam

A00524:70:HHF7HDRXX:2:1234:20238:34037 256 1 12067 0 91M * 0 0 GGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCA FF:FFFFFF:FFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7 HI:i:2 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:TTGGCATA QT:Z:F:F,,,,F CR:Z:TACATTCTGCAACACT CY:Z:FFFFFFFFFFFFFFFF UR:Z:GGGGACACTCAC UY:Z:FFFFFFFFFFFF UB:Z:GGGGACACTCAC RG:Z:sample3:0:1:HHF7HDRXX:2
What am I doing wrong?
10x.txt
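
A quick way to see what UMI and barcode lengths the BAM actually carries is to read them off the optional tags of a record. The following is a minimal sketch, not part of dropEst: `parse_sam_tags` is a hypothetical helper, and the tag string is copied from the alignment record above.

```python
def parse_sam_tags(fields):
    """Turn SAM optional fields such as 'UR:Z:GGGG' into {'UR': 'GGGG'}."""
    tags = {}
    for field in fields:
        # Split only twice: values (e.g. RG or QT) may contain ':' themselves.
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return tags


# Optional fields copied from the alignment record in the post above.
record_tags = (
    "NH:i:7 HI:i:2 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:TTGGCATA QT:Z:F:F,,,,F "
    "CR:Z:TACATTCTGCAACACT CY:Z:FFFFFFFFFFFFFFFF UR:Z:GGGGACACTCAC "
    "UY:Z:FFFFFFFFFFFF UB:Z:GGGGACACTCAC RG:Z:sample3:0:1:HHF7HDRXX:2"
).split()

tags = parse_sam_tags(record_tags)
umi_length = len(tags["UR"])      # raw UMI
barcode_length = len(tags["CR"])  # raw cell barcode
print(f"raw UMI length: {umi_length}, raw cell barcode length: {barcode_length}")
```

For this record the raw UMI is 12 bases and the raw cell barcode is 16 bases, which can then be compared against `umi_length` and `barcode2_length` in the config.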

@lixin4306ren

Got the same problem. I guess they didn't provide a config.xml file for 10x v3.

@chlee-tabin

JFYI, I am using the following 10x_v3.xml to process all my 10x v3 datasets, and it has worked fine for me. I found it somewhere in the issue threads of this GitHub repository.

<config>
    <!-- droptag -->
    <TagsSearch>
        <protocol>10x</protocol>
        <BarcodesSearch>
            <barcode1_length>8</barcode1_length>
            <barcode2_length>16</barcode2_length>
            <umi_length>12</umi_length>
            <r1_rc_length>0</r1_rc_length>
        </BarcodesSearch>

        <Processing>
            <min_align_length>10</min_align_length>
            <reads_per_out_file>10000000</reads_per_out_file>
            <poly_a_tail>AAAAAAAA</poly_a_tail>
        </Processing>
    </TagsSearch>

    <!-- dropest -->
    <Estimation>
        <Merge>
            <barcodes_file>../data/barcodes/10x_aug_2016_split</barcodes_file>
            <barcodes_type>const</barcodes_type>
            <min_merge_fraction>0.2</min_merge_fraction>
            <max_cb_merge_edit_distance>2</max_cb_merge_edit_distance>
            <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
            <min_genes_after_merge>100</min_genes_after_merge>
            <min_genes_before_merge>20</min_genes_before_merge>
        </Merge>

        <PreciseMerge>
            <max_merge_prob>1e-5</max_merge_prob>
            <max_real_merge_prob>1e-7</max_real_merge_prob>
        </PreciseMerge>
        <BamTags> <!-- Optional. Tags, which are used to parse .bam file (-f option) or to print tagged .bam file (-b or -F options). Default values correspond to 10x protocol. -->
            <cb>CB</cb> <!-- Cell barcode. Default: CB. -->
            <cb_raw>CR</cb_raw> <!-- Cell barcode raw. Used only for bam output. Default: CR. -->
            <umi>UB</umi> <!-- UMI. Default: UB. -->
            <umi_raw>UR</umi_raw> <!-- UMI raw. Used only for bam output. Default: UR. -->
            <gene>GX</gene> <!-- Gene id. Default: GX. -->
            <cb_quality>CQ</cb_quality> <!-- Cell barcode quality. Default: CQ. -->
            <umi_quality>UQ</umi_quality> <!-- UMI quality. Default: UQ. -->
            <Type> <!-- Tag, which contain type of read. If not specified, all reads with gene info are considered as exonic -->
                <tag>XF</tag>
                <intronic>INTRONIC</intronic> <!-- Value corresponding to intronic reads. Default value for bam output is INTRONIC. -->
                <intergenic>INTERGENIC</intergenic> <!-- Value corresponding to intergenic reads. All reads, which has gene id and intergenic mark are considered as intergenic. Default value for bam output is INTERGENIC. -->
                <exonic>EXONIC</exonic> <!-- Value corresponding to exonic reads. If not specified, all reads with other tags, which has gene id are considered as exonic. Default value for bam output is EXONIC. -->
            </Type>
        </BamTags>
    </Estimation>
</config>
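
The `<BamTags>` section names the tags the config expects in a tagged BAM, so it can be useful to compare them with the tags present on a record. Below is a rough sketch under stated assumptions: the XML fragment is the `<BamTags>` block above with comments and the `<Type>` element removed, and the set of tag names is taken from the single record posted at the top of the thread. One record proves little on its own (this one is a secondary alignment with mapping quality 0), so treat the result as a hint rather than a diagnosis.

```python
import xml.etree.ElementTree as ET

# <BamTags> from the config above, abridged (comments and <Type> removed).
bam_tags_xml = """
<BamTags>
    <cb>CB</cb>
    <cb_raw>CR</cb_raw>
    <umi>UB</umi>
    <umi_raw>UR</umi_raw>
    <gene>GX</gene>
    <cb_quality>CQ</cb_quality>
    <umi_quality>UQ</umi_quality>
</BamTags>
"""

# Tag names present on the alignment record posted at the top of the thread.
record_tag_names = {
    "NH", "HI", "AS", "nM", "RE", "li", "BC", "QT",
    "CR", "CY", "UR", "UY", "UB", "RG",
}

# Map each configured role (cb, umi, ...) to the BAM tag it expects.
configured = {child.tag: child.text for child in ET.fromstring(bam_tags_xml)}

# Roles whose configured tag does not appear on the record.
missing = {role: tag for role, tag in configured.items()
           if tag not in record_tag_names}

for role, tag in sorted(missing.items()):
    print(f"{role}: tag {tag} not found on this record")
```

On the posted record, `UB`, `UR` and `CR` are present, while `CB`, `GX`, `CQ` and `UQ` are not.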

@chlee-tabin

It looks like the only difference between 10x.xml and 10x_v3.xml is the following change:

diff 10x.xml 10x_v3.xml
8c8
<             <umi_length>10</umi_length>
---
>             <umi_length>12</umi_length>
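
The same one-line change can be applied programmatically. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; `set_umi_length` is a hypothetical helper, and the config string is an abridged copy of the `<TagsSearch>` section with the v2 value of 10. Note that ElementTree drops XML comments when parsing by default, so for a commented config file a manual edit may be preferable.

```python
import xml.etree.ElementTree as ET

# Abridged 10x.xml: only the <TagsSearch> part that holds umi_length.
config_xml = """
<config>
    <TagsSearch>
        <protocol>10x</protocol>
        <BarcodesSearch>
            <barcode1_length>8</barcode1_length>
            <barcode2_length>16</barcode2_length>
            <umi_length>10</umi_length>
            <r1_rc_length>0</r1_rc_length>
        </BarcodesSearch>
    </TagsSearch>
</config>
"""


def set_umi_length(xml_text, new_length):
    """Return the config XML with <umi_length> set to new_length."""
    root = ET.fromstring(xml_text)
    node = root.find("./TagsSearch/BarcodesSearch/umi_length")
    if node is None:
        raise ValueError("umi_length not found in config")
    node.text = str(new_length)
    return ET.tostring(root, encoding="unicode")


updated = set_umi_length(config_xml, 12)
print(updated)
```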

@dalhoomist

I can confirm I am having this issue again, and I am not sure what changed. Previously, correcting the umi_length worked; now I am getting the same error despite the correction. Any pointers on how to fix this would be greatly appreciated.
