Bioinformatics and Biostatistics

The Bioinformatics and Biostatistics Service is organized into four main areas:

Next Generation Sequencing data analysis.
Sequence analysis and structure prediction.
Biostatistical analysis.
Support for CIB researchers in accessing CSIC's scientific computing resources.

Next Generation Sequencing data analysis

The Bioinformatics Service provides scientific and technical support to CIB research groups that need to analyze Next Generation Sequencing (high-throughput sequencing) data, in any of its applications:

Analysis of gene expression data generated using Next Generation Sequencing (RNA-Seq).

- Alignment to a reference genome/transcriptome and detection of splice sites.

- Transcript assembly and isoform expression quantification.

- Analysis of alternative splicing.

- Gene expression quantification by high-throughput sequencing.

- Differential expression.

- Detection of fusion transcripts.

Analysis of non-coding RNAs and small RNAs (small RNA-Seq).

- Search in specific databases (microRNAs, piRNAs, endo-siRNAs).

- Identification and classification of non-coding RNAs.

- Quantification of small RNAs.

- Sequence annotation and localization in the genome.

Genome-wide identification of protein binding sites by chromatin immunoprecipitation (ChIP-Seq) and of accessible chromatin regions (ATAC-seq).

- Detection of peaks (genome binding/occupancy profiling).

- Relative enrichment of ChIP-seq peaks, which locate the DNA binding sites of a protein of interest, and their annotation (gene, promoter...).

- Analysis of motifs in ChIP-seq data and creation of sequence logos representing motifs.

- Genome-wide motif discovery.
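The height of each column in a sequence logo is commonly its information content in bits. As a rough illustration of that idea, the Python sketch below computes per-position information content from a set of already aligned, equal-length binding sites. It is only a sketch under stated assumptions: the function name `information_content` and the example sites are hypothetical, no small-sample correction is applied, and it is not the algorithm used by any particular motif discovery tool.

```python
import math
from collections import Counter


def information_content(sites, alphabet="ACGT"):
    """Return the information content (bits) of each column of aligned sites."""
    # Assumption: all sites are already aligned and have the same length.
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column)
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        result.append(max_bits - entropy)
    return result


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATA", "TATA", "TACA", "TAGA"]
bits = information_content(sites)
for position, value in enumerate(bits, start=1):
    print(position, round(value, 2))
```

Fully conserved positions reach the maximum of 2 bits for DNA, while variable positions score lower, which is what makes conserved motif positions stand out in a logo.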

Identification of SNPs and indels genome-wide or in regions of interest (DNA-Seq).

- Detection of variants: mutations, polymorphisms and insertions/deletions genome-wide.

- Effects of variants on genes.

- Detection of insertion sites of exogenous sequences (viral sequences, plasmids...).

Small genome assembly from high-throughput sequencing data.

- De novo assembly: when a reference genome is not available.

- Mapping: assembling reads against a reference sequence/genome.

- Transcriptome assembly: de novo or reference-based from RNA-seq reads.

High-throughput sequencing of complex microbial samples (Metagenomics).

- Searching sequences against very large databases and identifying them with BLAST.

Microarray data analysis and functional enrichment.

Sequence Analysis and Molecular Modeling

This service provides support for the following topics:

Analysis of biological sequences.
Implementation of AI algorithms: Machine Learning, Deep Learning.
Structure prediction.
Functional and evolutionary analysis from protein and genetic sequences.
Protein functional prediction.
Reconstructing and analysing phylogenetic relationships from biological sequences.

Biostatistics

The service provides statistical analysis support to research groups on demand.

We recommend an experimental design that fits the purpose of the experiment by establishing its basic elements (experimental unit, samples, replication, sources of variation, etc.), on which the statistical analysis of the data will be based.
Depending on the statistical analysis used, we help interpret the results, providing theoretical support to pick out the “useful information” in the statistical software output.
We help researchers understand statistical analyses they are not familiar with.
Preferred statistical software: R or SAS/STAT® packages, on either Linux or Windows computing platforms.

Support for CIB researchers in accessing CSIC's scientific computing resources

Accessing the CESGA Supercomputing Facility.
Accessing the Drago cluster facility of CSIC.

To apply for support, please fill in the online form.

(If you have trouble with the form above, please fill in this one instead.)

For further information please contact: bioinformatica email

Current rates


Person in Charge

Mario García Lacoba

Technical Staff
Guillermo Padilla Alonso
Mario García Lacoba
Ruth Matesanz Rodríguez

Next Generation Sequencing data analysis

Methods

Quality assessment of NGS reads: general statistics of sequence data, distribution of nucleotides along the read position, analysis of possible contaminants.
Custom scripting on demand to meet researchers' requirements: Perl, shell scripts, R.
Ad hoc database creation (for any purpose: alignments, searches, etc.).
In silico creation of modified genomes (viral and plasmid insertions) based on previously published genomes.
Searches for nucleotide and peptide patterns in biological sequence files.
Formatting and parsing (structural analysis) of very large files.
Computation and processing of very large files.
Storage of raw and processed data files on our servers.
Plots and graphs: expression quantification, sequencing coverage, gene models, mRNA isoforms, dispersion of data, principal component analyses, etc.
Manipulation of biological sequences in any of their forms (DNA, RNA, protein).
Support in experimental design for high-throughput sequencing.
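As an illustration of the read quality assessment listed above, the distribution of nucleotides along the read position can be computed directly from FASTQ records. The Python sketch below is not the Service's own pipeline: the function name `base_composition` and the in-memory example reads are hypothetical, and it assumes plain four-line FASTQ records (sequences not wrapped over several lines).

```python
from collections import Counter
from io import StringIO


def base_composition(fastq_handle):
    """Return a list of Counters, one per read position, counting each base."""
    counts = []
    for line_number, line in enumerate(fastq_handle):
        # Assumption: four-line FASTQ records, so the sequence is line 2 of 4.
        if line_number % 4 != 1:
            continue
        sequence = line.strip().upper()
        # Grow the list when a longer read is seen.
        while len(counts) < len(sequence):
            counts.append(Counter())
        for position, base in enumerate(sequence):
            counts[position][base] += 1
    return counts


# Hypothetical example data: three short reads in FASTQ format.
example = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGA\n+\nIIII\n"
    "@read3\nTCG\n+\nIII\n"
)

composition = base_composition(example)
for position, counter in enumerate(composition, start=1):
    total = sum(counter.values())
    fractions = {base: round(n / total, 2) for base, n in sorted(counter.items())}
    print(position, fractions)
```

For a real file, the same function could be called on an open file handle instead of the `StringIO` object; strong deviations from a flat base distribution along the read often point to adapters or other contaminants.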

Available software

We are always open to finding and testing new software, as our services are highly customized to suit researchers' needs and dedicated to helping CIB researchers understand the biological processes underlying their experiments by applying computational and numerical methodologies to the study of large-scale experimental datasets.

The following computational methods/tools are provided (among others):

Short-read sequence aligners:

Bowtie1 (http://bowtie-bio.sourceforge.net/)
Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2)
BWA (http://bio-bwa.sourceforge.net/)

Sequence aligners:

BLAST
BLAST+
Clustal Omega (http://www.clustal.org/omega)

Sequence Alignment tools (format conversion, variant discovery, etc.):

Samtools (http://sourceforge.net/projects/samtools/files/)
SnpEff (SNP Effect Predictor) (http://snpeff.sourceforge.net/)

Quality assessment and quality control tools:

FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)

Genome assemblers:

MIRA assembler (http://www.chevreux.org/projects_mira.html)
Velvet (www.ebi.ac.uk/~zerbino/velvet/)

RNA-seq analysis programs:

TopHat (splice junction mapper) (http://tophat.cbcb.umd.edu/)
Cufflinks (quantification, transcriptome assembly and differential expression) (http://cufflinks.cbcb.umd.edu/): cuffmerge, cuffcompare, cuffdiff
Trinity (de novo transcript assembly) (http://trinityrnaseq.sourceforge.net/)
TopHat Fusion (detection of fusion transcripts) (http://tophat.cbcb.umd.edu/fusion_index.html)
FusionMap (detection of fusion transcripts) (http://www.omicsoft.com/fusionmap/)
RSEM (gene expression profiling and differential expression) (http://deweylab.biostat.wisc.edu/rsem/)

ChIP-Seq analysis programs:

MACS (peak finding) (http://liulab.dfci.harvard.edu/MACS/)
MEME-ChIP (motif discovery) (http://ebi.edu.au/ftp/software/MEME/index.html)

Sequence Analysis and Structure prediction service [Mario García, Ruth Matesanz]

Available software

Discovery Studio: Small molecule and macromolecular modeling and simulation for drug design.

Sybyl: molecular modeling from protein sequences.

Schrödinger Suite: software for comprehensive protein modeling and for protein-protein docking prediction and simulation.

Biostatistics [Guillermo Padilla Alonso]

We provide the software packages that best fit researchers' needs: our work is based on SAS (Statistical Analysis System), and we also look for alternatives that could be useful to service users (SPSS, R, and so on).
