gvcnt - Tumblr blog

gvcnt

- Never say no to the Panda!

10K posts

Bioinformatics, data science, python, R, statistics, papers observations, art, food, cute stuff


gvcnt · 8 years ago

Text

A little about Anaconda

Anaconda environment for a specific Python version

IPython and Jupyter Notebook don't support Python 3.6 yet, so here are the commands to build an environment with Python 3.5 and use it in the notebooks:

conda create -n python35 python=3.5 ipython ipykernel

source activate python35

python -m ipykernel install --user --name python35 --display-name "Python 3.5"
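Once the kernel is registered, a quick check from inside a notebook can confirm that the cells really run on the intended interpreter. This is only a minimal sketch: the helper name `running_on` is made up for illustration, and it assumes the notebook was started with the "Python 3.5" kernel created above.

```python
import sys


def running_on(major, minor, version_info=None):
    """Return True if the interpreter matches the given major.minor version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (major, minor)


# Inside a notebook using the "Python 3.5" kernel this should print True.
print(running_on(3, 5))
```

If it prints False, the notebook is probably still attached to the default kernel, and switching kernels from the notebook menu is the first thing to try.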

credits:

create an environment

import error

pykernel

#python

10 notes · View notes

gvcnt · 8 years ago

Text

How to create your BED file manually

Using UCSC table browser

About bed format (UCSC link, bedtools link)

BED files are used to define capture regions in the assembly and can be generated by hand (Table Browser) or automatically (plastid). They are basically tab-separated text files saved with the .bed extension.
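Because a BED file is just tab-separated text, it can be read with a few lines of Python. The sketch below is an illustration rather than a full parser: it keeps only the three mandatory columns plus the optional name, the names `BedRecord`, `parse_bed_line` and `interval_length` are made up here, and the example line is invented. One detail worth remembering is that BED starts are 0-based and ends are exclusive.

```python
from collections import namedtuple

# Only the three mandatory BED columns plus the optional name column.
BedRecord = namedtuple("BedRecord", ["chrom", "start", "end", "name"])


def parse_bed_line(line):
    """Parse one tab-separated BED line; return None for headers and blanks."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split("\t")
    name = fields[3] if len(fields) > 3 else None
    # BED starts are 0-based and ends are exclusive (half-open intervals).
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def interval_length(record):
    """Length in bases; no +1 is needed because the end is exclusive."""
    return record.end - record.start


# Invented example line, not taken from a real annotation.
rec = parse_bed_line("chr1\t100\t200\texample_region\n")
print(rec.chrom, rec.start, rec.end, interval_length(rec))
```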

This post uses information from a Biostars post.

Download a BED file for the canonical transcripts (normally used as intervals for variant calling)

Assembly: Feb. 2009 (GRCh37/hg19);

Track: UCSC Genes;

Table: knownCanonical;

If you want specific genes, go to identifiers (names/accessions) and click paste or upload (e.g. BRCA1, BRCA2, EGFR, DMD, CFTR); to select all genes, just skip this step;

Output format: selected fields from primary and related tables;

Click get output;

Select fields from hg19.knownCanonical: chrom, chromStart, chromEnd, transcript;

Select fields from hg19.kgXref: geneSymbol, refseq;

Click get output.

Now you have the canonical transcripts and their RefSeq accessions, which can be used to filter the positions at exon level.

Download a BED file for exons in specific genes or in all genes (normally used for BAM coverage detection)

Assembly: Feb. 2009 (GRCh37/hg19);

Track: UCSC Genes or RefSeq Genes (preferable);

Table: knownGene (UCSC) or refGene (RefSeq);

If you want specific genes, go to identifiers (names/accessions) and click paste or upload (e.g. BRCA1, BRCA2, EGFR, DMD, CFTR); to select all genes, just skip this step;

Output format: BED - browser extensible data;

Click get output;

Select Coding Exons (for exome sequencing, for example, but you can choose Exons plus splicing regions or other options).

It will give you all CDS exons present in all possible transcripts for the genes you selected, or for all genes.

Now you can filter these exons down to the canonical transcripts that you downloaded in the first step.
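The final filtering step can be sketched in Python. This is a rough illustration under two assumptions that should be checked against your own downloads: the canonical table is tab-separated with the field order chrom, chromStart, chromEnd, transcript, geneSymbol, refseq, and the name column of the exon BED file starts with the transcript accession followed by `_cds_`. The function names and the placeholder accessions (`TX_A`, `REF_A`, `REF_B`, `GENE_A`) are made up.

```python
def load_canonical_ids(lines, column):
    """Collect transcript accessions from the canonical-transcript table.

    `column` is the 0-based index of the accession column: 3 for the UCSC
    transcript ID or 5 for the RefSeq accession, assuming the field order
    chrom, chromStart, chromEnd, transcript, geneSymbol, refseq.
    """
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > column and fields[column]:
            ids.add(fields[column])
    return ids


def filter_exons(bed_lines, canonical_ids):
    """Keep exon BED lines whose name starts with a canonical accession."""
    kept = []
    for line in bed_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue
        # Assumed name layout: <accession>_cds_<n>_..., so split on "_cds_".
        accession = fields[3].split("_cds_")[0]
        if accession in canonical_ids:
            kept.append(line)
    return kept


# Placeholder rows, not real annotations.
canonical = load_canonical_ids(
    ["#chrom\tchromStart\tchromEnd\ttranscript\tgeneSymbol\trefseq",
     "chr1\t100\t900\tTX_A\tGENE_A\tREF_A"],
    column=5,
)
exons = ["chr1\t100\t200\tREF_A_cds_0_0_chr1_101_f\t0\t+",
         "chr1\t150\t250\tREF_B_cds_0_0_chr1_151_f\t0\t+"]
print(filter_exons(exons, canonical))
```

Use `column=3` when the exons come from knownGene and `column=5` when they come from refGene, so that the accessions being compared are of the same kind.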

Create BED files with Python using the plastid library

Plastid example

Installation with conda:

conda install -c bioconda plastid

#ucsc #bed #bioinformatics

2 notes · View notes

gvcnt · 8 years ago

Text

Paper annotations #1

tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine

Intro

Each SNP record in dbSNP (Database for Short Genetic Variations) is assigned a stable and unique variant accession identifier (RSID), which is linked to aggregated information (associated gene, functional consequences and allele frequency).

The NHGRI-EBI GWAS Catalog is a genome-wide collection of genetic variants, observed in different individuals, that are associated with a trait [1].

For genomic variant information in cancer, COSMIC contains expert-curated data of somatic mutations [2].

CIViC is an open-access, open-source knowledgebase for expert-crowdsourced clinical interpretation of variants in cancer [3].

DisGeNET is a recent platform integrating information on gene-variation-disease associations from several public data sources and the literature [4].

"The first version of tmVar is a high-performance software for external evaluations comparing formats in the PubMed article and re-writing them in HGVS formats (e.g. p.Pro12Ala). However, HGVS names can still be ambiguous: one can often be linked to multiple RSIDs (e.g. rs767209585 and rs773973301 are both associated with p.Pro12Ala). Indeed, on average, one protein mutation in HGVS name maps to more than ten RSIDs".

Why not use the HGVS genomic nomenclature? HGVS isn't just the protein nomenclature; it also covers the gene, the genomic location and the protein location.

"in this work we first extended tmVar to automatically normalize the variant mentions and map them to standard dbSNP RS numbers."

Does it include variants that are not present in dbSNP and could be considered rare?

Using the human gold standard, they compared tmVar 2.0 against SETH, another automated tool for text-mining mutations [5], and reached nearly 90% in F-measure.

about F1
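As a reminder, the F-measure (F1) is the harmonic mean of precision and recall. A small sketch of the arithmetic follows; the function name is made up and the counts in the example call are invented, not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return the F1 score: the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts: 90 correct mentions, 10 spurious, 10 missed.
print(f_measure(90, 10, 10))
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low, which is why it is a common single-number summary for text-mining tools.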

"Our analysis includes: (i) comparing the text-mined PMID-RSID pairs with annotated dbSNP data, (ii) analyzing variants curated in ClinVar and (iii) discovering novel connections between variants, gene and diseases"

"Our investigation revealed 161 178 missing RSID-PMID links in dbSNP and 41 889 RSIDs not found in ClinVar. Moreover, our results also include over 120 000 rare variants (MAF 0.01) in nearly 4000 genes across the genome which are presumed to be deleterious and are not frequently found in the general population."

MAF alone isn't enough to consider a variant pathogenic; maybe more information was taken into account.

Materials and methods

"tmVar applies ML approach to tag mutation mentions in free txt, detecting terms that represent variants of multiple types (SNV, insertion, deletion, etc) and sequence context (genomic, transcript and protein) and returns its results in HGVS form".

"Before we performed normalization, we first built a comprehensive lexicon containing all possible mappings between variant mentions and RSIDs, harvested from three difference sources: dbSNP, Clinvar and PubMed".

Two main strategies were used to find the corresponding RSID: pattern matching ('[Gene/Protein] ([DNAMutation] with [RSID])') and searching the lexicon for a list of candidate RSIDs. For disambiguation, they use global information from the entire article and/or variant-associated gene information, obtained with GNormPlus, an end-to-end, open-source system that handles both gene mention and identifier detection.

The population frequency data come from 1000 Genomes Phase 3, ExAC, NHLBI GO ESP and gnomAD.
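The lexicon lookup with gene-based disambiguation can be pictured with a toy sketch. This is not tmVar's implementation: the lexicon layout and the function name are assumptions, and the gene labels GENE_A and GENE_B are placeholders rather than real annotations. Only the two RSIDs and the p.Pro12Ala mention come from the example quoted earlier.

```python
# Toy lexicon: variant mention -> list of (RSID, gene) candidates.
# GENE_A and GENE_B are placeholders, not the real genes of these RSIDs.
LEXICON = {
    "p.Pro12Ala": [
        ("rs767209585", "GENE_A"),
        ("rs773973301", "GENE_B"),
    ],
}


def candidate_rsids(mention, genes_in_article, lexicon=LEXICON):
    """Return the RSIDs for a mention, narrowed by genes seen in the article."""
    candidates = lexicon.get(mention, [])
    matching = [rsid for rsid, gene in candidates if gene in genes_in_article]
    # Fall back to every candidate when the gene context does not help.
    return matching or [rsid for rsid, _ in candidates]


print(candidate_rsids("p.Pro12Ala", {"GENE_A"}))
print(candidate_rsids("p.Pro12Ala", set()))
```

The second call shows why the protein-level name alone stays ambiguous: without gene context, both candidates come back.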

Results

"The tmVar RS results (62452 RS numbers in 9782 genes) were categorized using dbSNP and ClinVar annontations along multiple facets, including functional consequences (syn, non-syn, etc) based on RefSeq mRNA annotations, minor allele frequency (MAF), and clinical significance in order to prioritize their biological significance and assess their clinical impact".

Discussion

According to Table 4, OSIRIS had better results than tmVar 2.0, so OSIRIS could be used together with tmVar 2.0.

"our results could be used by other computational methods in bioinformatics research such as connecting genotypes with phenotypes and/or modeling gene-disease-variant relations [DisGeNeT][6]"

#annotation #study #bioinformatics #papers

2 notes · View notes

gvcnt · 10 years ago

Photo

Math is beautiful. Gorgeous!

926 notes · View notes

gvcnt · 10 years ago

Photo

Plasmids, DNA art, Science art, watercolor print, science illustration, microbiology, bacteria, microbes, biology art, DNA, virus, giclee

16 notes · View notes

gvcnt · 10 years ago

Photo

Life Nr.1 by c4ligo

502 notes · View notes

gvcnt · 10 years ago

Photo

I drew a big ol’ Steven Universe scramble, I really like how it turned out! If you come see me at Otakon I’m gonna be selling it as a print.

Steven Universe is a good show and I’m super glad kids have it! Some of the episodes are hit or miss for me ~*~AS A CRITIC~*~, but that’s every show, and when the writing is on point it’s the best cartoon airing (My favorite episodes have definitely been The Test and Keystone Motel, BUT, THERE’S A LOT OF GOOD ONES…). I hope we get more of it and more things like it down the line: longform serial storytelling with a unique world and aesthetic. OKAY, that’s all, see you later.

P.S. read my webcomic Paranatural if you haven’t already :^)

85K notes · View notes

gvcnt · 10 years ago

Photo