Publication archive

Publications

A numbered archive of Di Genoma Lab papers spanning cancer genomics, genome assembly, long-read sequencing, human genetics, microbial systems, and computational methods.

Total records: 52
Latest year: 2026
Selected papers: 4
Selected papers

Work that anchors the lab’s direction

A compact selection from genome assembly, cancer multi-omics, sex chromosome evolution, and translational tumor genomics.

Full archive

Numbered publication list

Reverse chronological list with publication venue, author preview, compact abstracts, and direct DOI links.

2026
01
Jan 2026 bioRxiv

Ultra-deep sequencing reveals intra-host diversity and co-infection-driven evolution of SARS-CoV-2

Carol Moraga, Francisco Kirhman, Barbara Bernal, Sofia Poblete, Pastor Jullian, et al.

Abstract

This preprint uses ultra-deep whole-genome sequencing of SARS-CoV-2 samples from Chile to characterize intra-host variation, low-frequency mutations, co-infections, and variant-level evolutionary dynamics. The study shows how high-depth viral sequencing can reveal early diversification processes that are invisible to consensus-only surveillance.

Viral genomics, SARS-CoV-2, Evolution, Preprint
02
Jan 2026 Frontiers in Cell and Developmental Biology

Transcriptional regulatory network analysis uncovers modular gene control and potential key regulators in diabetic cardiomyopathy

Fernanda Fredericksen, Victor Aliaga-Tobar, Sebastian Aedo-Cares, Jorge Torres, Sebastian Leiva-Navarrete, et al.

Abstract

This study reconstructs transcriptional regulatory networks in diabetic cardiomyopathy to identify modular gene-control programs and candidate regulatory drivers. The analysis highlights coordinated transcriptional modules linked to disease biology and nominates potential key regulators for future mechanistic and translational studies.

Regulatory genomics, Cardiomyopathy, Network analysis, Systems biology
03
Jan 2026 bioRxiv

Accurate haplotype-resolved de novo assembly of human genomes with RFhap

Damariz Gonzalez, Gabriel Cabas, Juan Francisco Miquel, Carol Moraga, Francisca Salas, et al.

Abstract

RFhap is a reproducible workflow for haplotype-resolved de novo assembly of human genomes using parental short reads and child long reads. The method assigns long reads to parental haplotypes and evaluates downstream assemblies with haplotype-aware metrics, improving the resolution of complex human genomic variation.

Genome assembly, Haplotype resolution, Human genomics, Preprint
04
Jan 2026 Biological Research

A recent and rapid genome expansion driven by the amplification of transposable elements in the Neotropical annual killifish Garcialebias charrua

Felipe Gajardo-Escobar, Camilo Valdivieso, Alex Di Genova, Luisa Pereiro, Maria Jose Arezo, et al.

Abstract

By comparing two annual killifish genomes, this study shows that Garcialebias charrua experienced a recent and unusually large genome expansion driven mainly by repetitive elements. The analysis links transposable element bursts with genome structure, gene context, and evolutionary adaptation in Neotropical annual fishes.

Genome evolution, Transposable elements, Fish genomics, Comparative genomics
2025
05
Jul 2025 medRxiv

A clinically relevant morpho-molecular classification of lung neuroendocrine tumours

Alexandra Sexton-Oates, Émilie Mathian, Noah Candeli, Yuliya Lim, Catherine Voegele, et al.

Abstract

This preprint integrates genomics, transcriptomics, methylation, spatial profiling, morphology, and clinical data to refine the classification of lung neuroendocrine tumours. The proposed morpho-molecular groups expose clinically relevant tumour states and potential therapeutic opportunities, including markers connected to prognosis and treatment selection.

Cancer genomics, Lung neuroendocrine tumours, Classification, Preprint
06
Apr 2025 bioRxiv

Effects of Willow and Pine tree growth in bacterial abundance, interactions and metabolic profile in a copper mine tailing

Jaime Ortega, Gabriel Gálvez, Gladis Serrano, Jorge Torres, Victor Aliaga-Tobar, et al.

Abstract

This preprint examines how willow and pine growth alters the physicochemical and microbial environment of copper mine tailing soils. Vegetated sites showed reduced copper stress, more neutral pH, shifts in bacterial abundance and interaction networks, and higher predicted microbial metabolic activity compared with non-vegetated tailing soils.

Environmental genomics, Microbiome, Mining environments, Preprint
07
Jan 2025 Science

The Silene latifolia genome and its giant Y chromosome

Carol Moraga, Catarina Branco, Quentin Rougemont, Paris Veltsos, Pavel Jedlicka, et al.

Abstract

This study reports a high-quality assembly of the male Silene latifolia genome and resolves one of the largest known plant Y chromosomes. The work links repeat accumulation, recombination suppression, and chromosome restructuring to the origin of giant sex chromosomes and identifies candidate sex-determining loci on the Y.

Plant genomics, Sex chromosomes, Genome assembly, Evolution
08
Jan 2025 Journal of Clinical Oncology

TERT expression and clinical outcome in pulmonary carcinoids

Lisa Werr, Christoph Bartenhagen, Carolina Rosswog, Maria Cartolano, Catherine Voegele, et al.

Abstract

Across discovery and validation cohorts, this work shows that TERT expression separates pulmonary carcinoids into biologically and clinically distinct groups. High TERT expression is associated with more aggressive disease behavior and poorer outcome, supporting TERT as a practical prognostic marker in these tumors.

Cancer genomics, Pulmonary carcinoid, Biomarkers, Translational oncology
09
Jan 2025 International Journal of Biological Macromolecules

Production of polyhydroxyalkanoates by Halomonas sp. HG01, a halophilic bacterium from northern Peru, using various carbon sources: metabolic and genomic analysis

Cesar W. Guzman-Moreno, Juliana Cardinali-Rezende, Alex Di Genova, Mariana Ferrarini, Marie-France Sagot, et al.

Abstract

This work combines fermentation experiments and genome analysis to characterize Halomonas sp. HG01 as a robust producer of polyhydroxyalkanoates from multiple carbon sources. The study defines polymer yield, substrate range, and biosynthetic potential for a halophilic strain relevant to scalable biopolymer production.

Microbial genomics, Biopolymers, PHA, Metabolic analysis
10
Jan 2025 Scientific Reports

Impact of Amerindian ancestry on clinical outcomes in Crohn's disease and ulcerative colitis in a Latino population

Tamara Pérez-Jeldres, María Leonor Bustamante, Danilo Alvares, Manuel Alvarez-Lobos, Lajos Kalmer, et al.

Abstract

This study analyzes the relationship between Amerindian ancestry and clinical outcomes in a Latino inflammatory bowel disease cohort. The results suggest that ancestry proportions are associated with distinct Crohn's disease and ulcerative colitis phenotypes, supporting the value of population-aware genomic analysis in underrepresented Latin American cohorts.

Population genomics, Clinical genomics, Latino populations, Inflammatory bowel disease
11
Jan 2025 Environmental Microbiome

Genome-resolved metagenomics and evolutionary analysis reveal conserved metabolic adaptations in extremophile communities from a copper mining tailing

Moises A. Rojas, Gladis Serrano, Jorge Torres, Jaime Ortega, Gabriel Gálvez, et al.

Abstract

This study applies genome-resolved metagenomics to microbial communities from a copper mining tailing in central Chile. By reconstructing metagenome-assembled genomes and analyzing metabolic pathways and selection signals, the work identifies conserved sulfur, copper, and iron adaptation programs in extremophile communities.

Metagenomics, Evolution, Extremophiles, Mining environments
2024
12
Dec 2024 bioRxiv

Photosynthetic and Genetic Adaptations Underpinning the Resilience of Cistanthe longiscapa in the Atacama Desert

Omar Sandoval-Ibañez, Patricio Tapia-Reyes, Anibal Riveros, Ricardo Yusta, Shengxin Chang, et al.

Abstract

This preprint investigates physiological and genomic adaptations that allow Cistanthe longiscapa to survive in the Atacama Desert. The study links photosynthetic performance, CAM-related signals, genome features, and stress-response gene family expansions to plant resilience under extreme environmental conditions.

Plant genomics, Atacama Desert, Adaptation, Preprint
13
Sep 2024 medRxiv

A workflow for clinical profiling of BRCA genes in Chilean breast cancer patients via targeted sequencing

Evelin Gonzalez, Rodrigo Moreno Salinas, Manuel Muñoz, Soledad Lantadilla Herrera, Mylene Cabrera Morales, et al.

Abstract

This work presents a reproducible Nextflow workflow for targeted sequencing analysis of BRCA1 and BRCA2 in Chilean breast cancer patients. The pipeline integrates germline variant calling, annotation, and clinical interpretation to support precision oncology implementation in local public-health settings.

Cancer genomics, BRCA, Targeted sequencing, Chilean genomics
14
Jun 2024 BMC Genomics

Genomes of the Orestias pupfish from the Andean Altiplano shed light on their evolutionary history and phylogenetic relationships within Cyprinodontiformes

Pamela Morales, Felipe Gajardo, Camilo Valdivieso, Moises A. Valladares, Alex Di Genova, et al.

Abstract

We generated genome resources for three Orestias species from the Andean Altiplano and used them to reassess the position of this group within Cyprinodontiformes. The analysis shows that Orestias is more closely related to South American lineages than to Cyprinodontidae, and supports Orestiidae as the appropriate family-level placement.

Genomics, Phylogenomics, Evolution, Fish
2023
15
Apr 2023 Nature Genetics

Multiomic analysis of malignant pleural mesothelioma identifies molecular axes and specialized tumor profiles driving intertumor heterogeneity

Alex Di Genova, Lise Mangiante, Nicolas Alcala, Alexandra Sexton-Oates, Abel Gonzalez-Perez, et al.

Abstract

Malignant pleural mesothelioma (MPM) is an aggressive cancer with rising incidence and challenging clinical management. Through a large series of whole-genome sequencing data, integrated with transcriptomic and epigenomic data using multiomics factor analysis, we demonstrate that the current World Health Organization classification only accounts for up to 10% of interpatient molecular differences. Instead, the MESOMICS project paves the way for a morphomolecular classification of MPM based on four dimensions: ploidy, tumor cell morphology, adaptive immune response and CpG island methylator profile. We show that these four dimensions are complementary, capture major interpatient molecular differences and are delimited by extreme phenotypes that—in the case of the interdependent tumor cell morphology and adapted immune response—reflect tumor specialization. These findings unearth the interplay between MPM functional biology and its genomic history, and provide insights into the variations observed in the clinical behavior of patients with MPM.

Genomics, Mesothelioma, Cancer, Computational biology
16
Jan 2023 Frontiers in Molecular Biosciences

Editorial: Applications of biological networks in biomedicine

Vinicius Maracaja-Coutinho, Alex Di Genova, Anne Siegel, Mauricio Latorre

Abstract

This editorial introduces a collection on biological network applications in biomedicine, emphasizing how network-based models can connect molecular interactions, disease mechanisms, and computational analysis. The article frames network biology as a practical strategy for integrating multi-scale biomedical data.

17
Jan 2023 GigaScience

A molecular phenotypic map of malignant pleural mesothelioma

Alex Di Genova, Lise Mangiante, Alexandra Sexton-Oates, Catherine Voegele, Lynnette Fernandez-Cuesta, et al.

Abstract

Malignant pleural mesothelioma (MPM) is a rare, understudied cancer associated with exposure to asbestos. So far, MPM patients have benefited marginally from the genomics medicine revolution due to the limited size or breadth of existing molecular studies. In the context of the MESOMICS project, we have performed the most comprehensive molecular characterization of MPM to date, with the underlying dataset made of the largest whole-genome sequencing series yet reported, together with transcriptome sequencing and methylation arrays for 120 MPM patients. We first provide comprehensive quality controls for all samples, of both raw and processed data. Due to the difficulty in collecting specimens from such rare tumors, a part of the cohort does not include matched normal material. We provide a detailed analysis of data processing of these tumor-only samples, showing that all somatic alteration calls match very stringent criteria of precision and recall. Finally, integrating our data with previously published multiomic MPM datasets (n = 374 in total), we provide an extensive molecular phenotype map of MPM based on the multitask theory. The generated map can be interactively explored and interrogated on the UCSC TumorMap portal (https://tumormap.ucsc.edu/?p=RCG_MESOMICS/MPM_Archetypes). This new high-quality MPM multiomics dataset, together with the state-of-the-art bioinformatics and interactive visualization tools we provide, will support the development of precision medicine in MPM that is particularly challenging to implement in rare cancers due to limited molecular studies.

2022
18
Jan 2022 Genomics

Genome sequencing and transcriptomic analysis of the Andean killifish Orestias ascotanensis reveals adaptation to high-altitude aquatic life

Alex Di Genova, Gino Nardocci, Rodrigo Maldonado-Agurto, Christian Hodar, Camilo Valdivieso, et al.

Abstract

Orestias ascotanensis (Cyprinodontidae) is a teleost pupfish endemic to springs feeding into the Ascotan saltpan in the Chilean Altiplano (3,700 m.a.s.l.) and represents an opportunity to study adaptations to high-altitude aquatic environments. We have de novo assembled the genome of O. ascotanensis at high coverage. Comparative analysis of the O. ascotanensis genome showed an overall process of contraction, including loss of genes related to G-protein signaling, chemotaxis and signal transduction, while there was expansion of gene families associated with microtubule-based movement and protein ubiquitination. We identified 818 genes under positive selection, many of which are involved in DNA repair. Additionally, we identified novel and conserved microRNAs expressed in O. ascotanensis and its closely related species, Orestias gloriae. Our analysis suggests that positive selection and expansion of genes that preserve genome stability are a potential adaptive mechanism to cope with the increased solar UV radiation to which high-altitude animals are exposed.

DNA repair, Altiplano, Desert pupfish, High-altitude
2021
19
Nov 2021 BMC Biology

The transposable element-rich genome of the cereal pest Sitophilus oryzae

Nicolas Parisot, Carlos Vargas-Chávez, Clément Goubert, Patrice Baa-Puyoulet, Séverine Balmand, et al.

Abstract

The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions.

Evolution, Genome, Immunity, Coleoptera
20
Apr 2021 Nature Biotechnology

Efficient hybrid de novo assembly of human genomes with WENGAN

Alex Di Genova, Elena Buena-Atienza, Stephan Ossowski, Marie-France Sagot

Abstract

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24–80.64 Mb), few assembly errors (contig NGA50: 11.8–59.59 Mb), good consensus quality (QV: 27.84–42.88) and high gene completeness (BUSCO complete: 94.6–95.2%), while consuming low computational resources (CPU hours: 187–1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).

21
Jan 2021 Biodemography and Social Biology

The Chilean socio-ethno-genomic cline

E. Barozet, C. Y. Valenzuela, L. Cifuentes, R. A. Verdugo, L. Herrera, et al.

Abstract

Studies of the current Chilean population performed using classical genetic markers have established that the Chilean population originated primarily from the admixture of European people, particularly Spaniards, and Amerindians. A socioeconomic-ethno-genetic cline was established soon after the conquest. Spaniards born in Spain or Chile occupied the highest Socioeconomic Strata, while Amerindians belonged to the lowest. The intermediate strata consisted of people with different degrees of ethnic admixture; the larger the European admixture, the higher the Socioeconomic Level. The present study of molecular genomic markers sought to calculate the percentage of Amerindian admixture and revealed a finer distribution of this cline, as well as differences between two Amerindian groups: Aymara and Mapuche. The use of two socioeconomic classifications - Class and Socioeconomic Level - reveals important differences. Furthermore, Self-reported Ethnicity (self-assignment to an ethnic group) and Self-reported Ancestry (self-recognition of Amerindian ancestors) show variations and differing relationships between socioeconomic classifications and genomic Amerindian Admixture. These data constitute a valuable input for the formulation of public healthcare policy and show that the notions of Ethnicity, Socioeconomic Strata and Class should always be a consideration in policy development.

Chile, Ethnicity, Gene Frequency, Genetic Markers
2020
22
Nov 2020 International Journal of Biological Macromolecules

The relevance of enzyme specificity for coenzymes and the presence of 6-phosphogluconate dehydrogenase for polyhydroxyalkanoates production in the metabolism of Pseudomonas sp. LFM046

Juliana Cardinali-Rezende, Alex Di Genova, Rafael A. T. P. S. Nahat, Alexander Steinbüchel, Marie-France Sagot, et al.

Abstract

Reconstruction of a genome-based metabolic model is a useful approach for assessing the metabolic pathways, genes and proteins involved in environmental fitness or pathogenic potential, as well as for developing biotechnological processes. Pseudomonas sp. LFM046 was selected as a good polyhydroxyalkanoates (PHA) producer from carbohydrates and plant oils. Its complete genome sequence and metabolic model were obtained. Analysis revealed that the gnd gene, encoding 6-phosphogluconate dehydrogenase, is absent in the Pseudomonas sp. LFM046 genome. In order to improve the knowledge about LFM046 metabolism, the coenzyme specificities of different enzymes were evaluated. Furthermore, the heterologous expression of gnd genes from Pseudomonas putida KT2440 (NAD+ dependent) and Escherichia coli MG1655 (NADP+ dependent) in LFM046 was carried out and provoked a delay in cell growth and a reduction in PHA yield, respectively. The results indicate that the adjustment in the cyclic Entner-Doudoroff pathway may be an interesting strategy for it and other bacteria to simultaneously meet divergent cell needs during cultivation phases of growth and PHA production.

6-phosphogluconate dehydrogenase, Entner-Doudoroff pathway, Pentose phosphate pathway, PHA production
23
Aug 2020 Scientific Reports

Mycoplasma hyopneumoniae J elicits an antioxidant response and decreases the expression of ciliary genes in infected swine epithelial cells

Scheila G. Mucha, Mariana G. Ferrarini, Carol Moraga, Alex Di Genova, Laurent Guyon, et al.

Abstract

Mycoplasma hyopneumoniae is the most costly pathogen for swine production. Although several studies have focused on the host-bacterium association, little is known about the changes in gene expression of swine cells upon infection. To improve our understanding of this interaction, we infected swine epithelial NPTr cells with M. hyopneumoniae strain J to identify differentially expressed mRNAs and miRNAs. The levels of 1,268 genes and 170 miRNAs were significantly modified post-infection. Up-regulated mRNAs were enriched in genes related to redox homeostasis and antioxidant defense, known to be regulated by the transcription factor NRF2 in related species. Down-regulated mRNAs were enriched in genes associated with cytoskeleton and ciliary functions. Bioinformatic analyses suggested a correlation between changes in miRNA and mRNA levels, since we detected down-regulation of miRNAs predicted to target antioxidant genes and up-regulation of miRNAs targeting ciliary and cytoskeleton genes. Interestingly, most down-regulated miRNAs were detected in exosome-like vesicles suggesting that M. hyopneumoniae infection induced a modification of the composition of NPTr-released vesicles. Taken together, our data indicate that M. hyopneumoniae elicits an antioxidant response induced by NRF2 in infected cells. In addition, we propose that ciliostasis caused by this pathogen is partially explained by the down-regulation of ciliary genes.

Transcriptomics, Gene expression, Pathogens, Bacteriology
24
Aug 2020 BMC Plant Biology

Identification of SNPs and InDels associated with berry size in table grapes integrating genetic and transcriptomic approaches

Claudia Muñoz-Espinoza, Alex Di Genova, Alicia Sánchez, José Correa, Alonso Espinoza, et al.

Abstract

Berry size is considered one of the main selection criteria in table grape breeding programs, due to consumer preferences. However, berry size is a complex quantitative trait under polygenic control, and its genetic determination is not yet fully understood. The aim of this work was to perform marker discovery using a transcriptomic approach, in order to identify and characterize SNP and InDel markers associated with berry size in table grapes. We used an integrative analysis based on RNA-Seq, SNP/InDel search and validation on table grape segregants and varieties with different genetic backgrounds.

RNA-Seq, Berry size, InDel, Marker-assisted selection
25
Apr 2020 Biological Research

Development of a small panel of SNPs to infer ancestry in Chileans that distinguishes Aymara and Mapuche components

Ricardo A. Verdugo, Alex Di Genova, Luisa Herrera, Mauricio Moraga, Mónica Acuña, et al.

Abstract

Current South American populations trace their origins mainly to three continental ancestries, i.e. European, Amerindian and African. Individual variation in relative proportions of each of these ancestries may be confounded with socio-economic factors due to population stratification. Therefore, ancestry is a potential confounder variable that should be considered in epidemiologic studies and in public health plans. However, there are few studies that have assessed the ancestry of the current admixed Chilean population. This is partly due to the high cost of genome-scale technologies commonly used to estimate ancestry. In this study we have designed a small panel of SNPs to accurately assess ancestry in the largest sampling to date of the Chilean mestizo population (n = 3349) from eight cities. Our panel is also able to distinguish between the two main Amerindian components of Chileans: Aymara from the north and Mapuche from the south.

Chile, Admixture, Ancestry, Aymara
2019
26
Aug 2019 Scientia Horticulturae

RNA-Seq analysis and transcriptome assembly of raspberry fruit (Rubus idaeus "Heritage") revealed several candidate genes involved in fruit development and ripening

Dante Travisany, Anibal Ayala-Raso, Alex Di Genova, Liliam Monsalve, Maricarmen Bernales, et al.

Abstract

Using the Illumina HiSeq™ 2000 sequencing platform (100 bp double-end reads), we performed transcriptome analysis of flower (F), green (G) and pink (P) fruit stages of red raspberry. The transcriptome was obtained by de novo assembly of 298 million high-quality reads with the Trinity assembler. Of the 41,650 high-quality transcripts, 18,171 coding sequences were successfully characterized using databases such as UniProtKB, NCBI Non-Redundant, KEGG, Gene Ontology, and InterProScan. A total of 2,409 transcripts were further identified as differentially expressed genes (DEGs) between the three libraries generated, and 253 DEGs were found between different fruit stages. Singular enrichment analysis of gene ontology (GO) detected an important group of DEGs over-expressed during fruit development and associated with ethylene, auxin conjugation, abscisic acid response, brassinosteroid biosynthesis and signaling, and cell-wall disassembly processes. Our transcriptome data provide valuable insights into genes involved in the ripening process of Rubus fruit, as a representative of non-model fruit species, and may help in developing these cultivars with improved fruit quality characteristics in the years to come.

RNA-seq, Abscisic acid, Auxin, Brassinosteroids
27
Feb 2019 Scientific Reports

Whole Genome Sequence, Variant Discovery and Annotation in Mapuche-Huilliche Native South Americans

Elena A. Vidal, Tomás C. Moyano, Bernabé I. Bustos, Eduardo Pérez-Palma, Carol Moraga, et al.

Abstract

Whole human genome sequencing initiatives help us understand population history and the basis of genetic diseases. Current data mostly focus on Old World populations, and information on the genomic structure of Native Americans, especially those from the Southern Cone, is scant. Here we present annotation and variant discovery from high-quality complete genome sequences of a cohort of 11 Mapuche-Huilliche individuals (HUI) from Southern Chile. We found approximately 3.1 × 10^6 single nucleotide variants (SNVs) per individual and identified 403,383 (6.9%) novel SNV events. Analyses of large-scale genomic events detected 680 copy number variants (CNVs) and 4,514 structural variants (SVs), including 398 and 1,910 novel events, respectively. Global ancestry composition of HUI genomes revealed that the cohort represents a sample from a marginally admixed population from the Southern Cone, whose main genetic component derives from Native American ancestors. Additionally, we found that HUI genomes contain variants in genes associated with 5 of the 6 leading causes of noncommunicable diseases in Chile, which may have an impact on the risk of prevalent diseases in Chilean and Amerindian populations. Our data represent a useful resource that can contribute to population-based studies and to the design of early diagnostics or prevention tools for Native and admixed Latin American populations.

Genetics, Genetic variation
28
Jan 2019 Evolutionary Applications

Comparing genomic signatures of domestication in two Atlantic salmon (Salmo salar L.) populations with different geographical origins

Maria E. López, Laura Benestan, Jean-Sebastien Moore, Charles Perrier, John Gilbey, et al.

Abstract

Selective breeding and genetic improvement have left detectable signatures on the genomes of domestic species. The elucidation of such signatures is fundamental for detecting genomic regions of biological relevance to domestication and improving management practices. In aquaculture, domestication was carried out independently in different locations worldwide, which provides opportunities to study the parallel effects of domestication on the genome of individuals that have been selected for similar traits. In this study, we aimed to detect potential genomic signatures of domestication in two independent pairs of wild/domesticated Atlantic salmon populations of Canadian and Scottish origins, respectively. Putative genomic regions under divergent selection were investigated using a 200K SNP array by combining three different statistical methods based either on allele frequencies (LFMM, Bayescan) or haplotype differentiation (Rsb). We identified 337 and 270 SNPs potentially under divergent selection in wild and hatchery populations of Canadian and Scottish origins, respectively. We observed little overlap between results obtained from different statistical methods, highlighting the need to test complementary approaches for detecting a broad range of genomic footprints of selection. The vast majority of the outliers detected were population-specific, but we found four candidate genes that were shared between the populations. We propose that these candidate genes may play a role in the parallel process of domestication. Overall, our results suggest that genetic drift may have overridden the effect of artificial selection and/or point toward a different genetic basis underlying the expression of similar traits in different domesticated strains. Finally, it is likely that domestication may predominantly target polygenic traits (e.g., growth) such that its genomic impact might be more difficult to detect with methods assuming selective sweeps.

Salmo salar, selective sweeps, single nucleotide polymorphisms
2018
29
Jul 2018 Proceedings of the Royal Society B: Biological Sciences

Genomic variation underlying complex life-history traits revealed by genome sequencing in Chinook salmon

Shawn R. Narum, Alex Di Genova, Steven J. Micheletti, Alejandro Maass

Abstract

A broad portfolio of phenotypic diversity in natural organisms can buffer against exploitation and increase species persistence in disturbed ecosystems. The study of genomic variation that accounts for ecological and evolutionary adaptation can represent a powerful approach to extend understanding of phenotypic variation in nature. Here we present a chromosome-level reference genome assembly for Chinook salmon (Oncorhynchus tshawytscha; 2.36 Gb) that enabled association mapping of life-history variation and phenotypic traits for this species. Whole-genome re-sequencing of populations with distinct life-history traits provided evidence that divergent selection was extensive throughout the genome within and among phylogenetic lineages, indicating that a broad portfolio of phenotypic diversity exists in this species that is related to local adaptation and life-history variation. Association mapping with millions of genome-wide SNPs revealed that a genomic region of major effect on chromosome 28 was associated with phenotypes for premature and mature arrival to spawning grounds and was consistent across three distinct phylogenetic lineages. Our results demonstrate how genomic resources can enlighten the genetic basis of known phenotypes in exploited species and assist in clarifying phenotypic variation that may be difficult to observe in naturally occurring organisms.

genomics, evolution, ecology
30
May 2018 GigaScience

Fast-SG: an alignment-free algorithm for hybrid assembly

Alex Di Genova, Gonzalo A Ruz, Marie-France Sagot, Alejandro Maass

Abstract

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.

31
Jan 2018 Mycology

The genome sequence of the soft-rot fungus Penicillium purpurogenum reveals a high gene dosage for lignocellulolytic enzymes

Wladimir Mardones, Alex Di Genova, María Paz Cortés, Dante Travisany, Alejandro Maass, et al.

Abstract

The high lignocellulolytic activity displayed by the soft-rot fungus Penicillium purpurogenum has made it a target for the study of novel lignocellulolytic enzymes. We have obtained a reference genome of 36.2 Mb of non-redundant sequence (11,057 protein-coding genes). The 49 largest scaffolds cover 90% of the assembly, and Core Eukaryotic Genes Mapping Approach (CEGMA) analysis reveals that our assembly captures almost all protein-coding genes. RNA-seq was performed and 93.1% of the reads aligned to the assembled genome. These data, plus the independent sequencing of a set of genes of lignocellulose-degrading enzymes, validate the quality of the genome sequence. P. purpurogenum shows a higher number of proteins with CAZy motifs, transcription factors and transporters as compared to other sequenced Penicillia. These results demonstrate the great potential for lignocellulolytic activity of this fungus and the possible use of its enzymes in related industrial applications.

RNA-seq, genome sequencing, CAZymes, Illumina
2017
32
Jul 2017 BMC Systems Biology

Reconstruction of the microalga Nannochloropsis salina genome-scale metabolic model with applications to lipid production

Nicolás Loira, Sebastian Mendoza, María Paz Cortés, Natalia Rojas, Dante Travisany, et al.

Abstract

Nannochloropsis salina (= Eustigmatophyceae) is a marine microalga which has become a biotechnological target because of its high capacity to produce polyunsaturated fatty acids and triacylglycerols. It has been used as a source of biofuel, pigments and food supplements, like Omega 3. Only some Nannochloropsis species have been sequenced, but none of them benefit from a genome-scale metabolic model (GSMM), able to predict its metabolic capabilities.

Genome-scale Metabolic model, Microalgæ, Nannochloropsis salina, TAG
33
May 2017 Scientific Reports

Global gene expression analysis provides insight into local adaptation to geothermal streams in tadpoles of the Andean toad Rhinella spinulosa

Luis Pastenes, Camilo Valdivieso, Alex Di Genova, Dante Travisany, Andrew Hart, et al.

Abstract

The anuran Rhinella spinulosa is distributed along the Andes Range at altitudes that undergo wide daily and seasonal variation in temperature. One of the populations inhabits geothermal streams, a stable environment that influences life history traits such as the timing of metamorphosis. To investigate whether this population has undergone local adaptation to this unique habitat, we carried out transcriptome analyses in animals from two localities in two developmental stages (prometamorphic and metamorphic) and exposed them to two temperatures (20 and 25 °C). RNA-Seq, de novo assembly and annotation defined a transcriptome revealing 194,469 high quality SNPs, with 1,507 genes under positive selection. Comparisons among the experimental conditions yielded 1,593 differentially expressed genes. A bioinformatics search for candidates revealed a total of 70 genes that are highly likely to be implicated in the adaptive response of the population living in a stable environment, compared to those living in an environment with variable temperatures. Most importantly, the population inhabiting the geothermal environment showed decreased transcriptional plasticity and reduced genetic variation compared to its counterpart from the non-stable environment. This analysis will help to advance the understanding of the molecular mechanisms that account for the local adaptation to geothermal streams in anurans.

Evolutionary genetics, Gene expression
34
Apr 2017 Aquaculture

Genome wide association study for resistance to Caligus rogercresseyi in Atlantic salmon (Salmo salar L.) using a 50K SNP genotyping array

Katharina Correa, Jean P. Lhorente, Liane Bassini, María E. López, Alex Di Genova, et al.

Abstract

The sea louse (Caligus rogercresseyi) is an external parasite and considered one of the most important health problems in the salmon farming industry. Resistance to conventional chemical treatments has been demonstrated. Sufficient additive genetic variation has been determined to include selection for resistance to this parasite in Atlantic salmon breeding programs. The aim of this study was to perform a Genome Wide Association Study in order to dissect the genetic factors involved in the resistance to C. rogercresseyi, one of the most important species of sea lice in the Chilean salmon farming. 2628 Atlantic salmon smolts, which had been experimentally infested with C. rogercresseyi, were genotyped using a 50K SNP array. Genome Wide Association Analysis was conducted using a polygenic model. A heritability of 0.12 for resistance to this louse species was estimated using genomic information. This result was consistent with estimates from previous studies which used pedigree records in the same population. Only one SNP, located on chromosome 21, was significant at a local level, explaining 0.5% of the phenotypic variance and 4% of the genomic heritability for sea lice resistance. This SNP is located in an intronic region of a predicted gene which codes for Collagen alpha-1. Our results suggest that resistance to C. rogercresseyi can be considered a polygenic trait, controlled by many variants of relatively small effect. Thus the incorporation of genomic information through genomic selection could be the most appropriate approach for breeding purposes. Statement of relevance: Caligus resistance has a polygenic genetic architecture.

Atlantic salmon, Genome wide association study, Pathogen resistance, Sea lice
2016
35
Oct 2016 Bioresource Technology

The bioleaching potential of a bacterial consortium

Mauricio Latorre, María Paz Cortés, Dante Travisany, Alex Di Genova, Marko Budinich, et al.

Abstract

This work presents the molecular foundation of a consortium of five efficient bacteria strains isolated from copper mines currently used in state of the art industrial-scale biotechnology. The strains Acidithiobacillus thiooxidans Licanantay, Acidiphilium multivorum Yenapatur, Leptospirillum ferriphilum Pañiwe, Acidithiobacillus ferrooxidans Wenelen and Sulfobacillus thermosulfidooxidans Cutipay were selected for genome sequencing based on metal tolerance, oxidation activity and bioleaching of copper efficiency. An integrated model of metabolic pathways representing the bioleaching capability of this consortium was generated. Results revealed that greater efficiency in copper recovery may be explained by the higher functional potential of L. ferriphilum Pañiwe and At. thiooxidans Licanantay to oxidize iron and reduced inorganic sulfur compounds. The consortium had a greater capacity to resist copper, arsenic and chloride ion compared to previously described biomining strains. Specialization and particular components in these bacteria provided the consortium a greater ability to bioleach copper sulfide ores.

Bioleaching, Bacterial consortium, Metabolic pathways, Metal resistance
36
May 2016 Nature

The Atlantic salmon genome provides insights into rediploidization

Sigbjørn Lien, Ben F. Koop, Simen R. Sandve, Jason R. Miller, Matthew P. Kent, et al.

Abstract

The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.

Genome, Genome evolution
37
Apr 2016 BMC Plant Biology

Transcriptome profiling of grapevine seedless segregants during berry development reveals candidate genes associated with berry weight

Claudia Muñoz-Espinoza, Alex Di Genova, José Correa, Romina Silva, Alejandro Maass, et al.

Abstract

Berry size is considered as one of the main selection criteria in table grape breeding programs. However, this is a quantitative and polygenic trait, and its genetic determination is still poorly understood. Considering its economic importance, it is relevant to determine its genetic architecture and elucidate the mechanisms involved in its expression. To approach this issue, an RNA-Seq experiment based on Illumina platform was performed (14 libraries), including seedless segregants with contrasting phenotypes for berry weight at fruit setting (FST) and 6–8 mm berries (B68) phenological stages.

RNA-seq, Berry weight, Candidate genes, Functional genomics
38
Jan 2016 Molecular Ecology Resources

Genomewide single nucleotide polymorphism discovery in Atlantic salmon (Salmo salar): validation in wild and farmed American and European populations

J. M. Yáñez, S. Naswa, M. E. López, L. Bassini, K. Correa, et al.

Abstract

A considerable number of single nucleotide polymorphisms (SNPs) are required to elucidate genotype–phenotype associations and determine the molecular basis of important traits. In this work, we carried out de novo SNP discovery accounting for both genome duplication and genetic variation from American and European salmon populations. A total of 9 736 473 nonredundant SNPs were identified across a set of 20 fish by whole-genome sequencing. After applying six bioinformatic filtering steps, 200 K SNPs were selected to develop an Affymetrix Axiom® myDesign Custom Array. This array was used to genotype 480 fish representing wild and farmed salmon from Europe, North America and Chile. A total of 159 099 (79.6%) SNPs were validated as high quality based on clustering properties. A total of 151 509 validated SNPs showed a unique position in the genome. When comparing these SNPs against 238 572 markers currently available in two other Atlantic salmon arrays, only 4.6% of the SNP overlapped with the panel developed in this study. This novel high-density SNP panel will be very useful for the dissection of economically and ecologically relevant traits, enhancing breeding programmes through genomic selection as well as supporting genetic studies in both wild and farmed populations of Atlantic salmon using high-resolution genomewide information.

next-generation sequencing, Salmo salar, genomic selection, pseudo-tetraploid
2015
39
Oct 2015 BMC Genomics

Genome-wide association analysis reveals loci associated with resistance against Piscirickettsia salmonis in two Atlantic salmon (Salmo salar L.) chromosomes

Katharina Correa, Jean P. Lhorente, María E. López, Liane Bassini, Sudhir Naswa, et al.

Abstract

Piscirickettsia salmonis is the causal agent of Salmon Rickettsial Syndrome (SRS), which affects salmon species and causes severe economic losses. Selective breeding for disease resistance represents one approach for controlling SRS in farmed Atlantic salmon. Knowledge concerning the architecture of the resistance trait is needed before deciding on the most appropriate approach to enhance artificial selection for P. salmonis resistance in Atlantic salmon. The purpose of the study was to dissect the genetic variation in the resistance to this pathogen in Atlantic salmon.

Atlantic salmon, Pathogen resistance, Genome Wide Association Analysis, Salmon Rickettsial Syndrome
40
Jul 2015 Nucleic Acids Research

The BioMart community portal: an innovative alternative to large, centralized data repositories

Damian Smedley, Syed Haider, Steffen Durinck, Luca Pandini, Paolo Provero, et al.

Abstract

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.

41
Feb 2015 Tree Genetics & Genomes

Transcriptome sequencing of Prunus sp. rootstocks roots to identify candidate genes involved in the response to root hypoxia

María José Arismendi, Rubén Almada, Paula Pimentel, Adriana Bastias, Ariel Salvatierra, et al.

Abstract

Root hypoxia in fruit trees affects growth, vegetative development, and reproductive development, which is reflected in low productivity, poor fruit quality, and premature decay of trees. Using Illumina Hiseq2000, we performed transcriptome analysis of roots from two different rootstocks, ‘Mariana 2624’ and ‘Mazzard F12/1,’ which are tolerant and sensitive to hypoxia, respectively. Transcriptomes from control and hypoxia-stressed plants (6, 24, and 72 h) were compared, using Prunus persica (L.) as reference genome. Hypoxic conditions altered the transcription in both genotypes. There were a high number of common differentially expressed genes (DEG) between the two genotypes for each sampling time, but also exclusive DEG for each genotype, with a few DEG that presented opposite modes of regulations during the hypoxia treatment. An important group of DEGs exclusively upregulated in the tolerant genotype are associated to enzymes of posttranslational protein modifications, such as leucine-rich repeat (LRR), kinases and ubiquitin-protein ligases, regulation of transcription, and process of oxide reduction. Singular enrichment analysis of gene ontology (GO), detected at least 115 GOs involved in the response to root hypoxia in the sensitive and/or tolerant genotypes. At least 25 GOs were identified as part of the baseline differences between the genotypes, most GO were disturbed in the sensitive genotype. The contribution from the baseline gene expression to the differential response between the Prunus genotypes is evidence that the resistant genotype is already “prepared” for a hypoxia event. An example are GO BP:0042221 of response to chemical stimulus; BP:0006979 of response to oxidative stress; MF:0016209 of antioxidant activity; MF:0016684 of oxidoreductase activity, acting on peroxide as acceptor; and MF:0004601 of peroxidase activity, which were disturbed only in the sensitive genotype, but not in the tolerant.

RNA-Seq, Hypoxia, qRT-PCR, Prunus
2014
42
Nov 2014 Research in Microbiology

A new genome of Acidithiobacillus thiooxidans provides insights into adaptation to a bioleaching environment

Dante Travisany, María Paz Cortés, Mauricio Latorre, Alex Di Genova, Marko Budinich, et al.

Abstract

Acidithiobacillus thiooxidans is a sulfur oxidizing acidophilic bacterium found in many sulfur-rich environments. It is particularly interesting due to its role in bioleaching of sulphide minerals. In this work, we report the genome sequence of At. thiooxidans Licanantay, the first strain from a copper mine to be sequenced and currently used in bioleaching industrial processes. Through comparative genomic analysis with two other At. thiooxidans non-metal mining strains (ATCC 19377 and A01) we determined that these strains share a large core genome of 2109 coding sequences and a high average nucleotide identity over 98%. Nevertheless, the presence of 841 strain-specific genes (absent in other At. thiooxidans strains) suggests a particular adaptation of Licanantay to its specific biomining environment. Among this group, we highlight genes encoding for proteins involved in heavy metal tolerance, mineral cell attachment and cysteine biosynthesis. Several of these genes were located near genetic motility genes (e.g. transposases and integrases) in genomic regions of over 10 kbp absent in the other strains, suggesting the presence of genomic islands in the Licanantay genome probably produced by horizontal gene transfer in mining environments.

Comparative genomics, Bioleaching, Adaptation
43
Jan 2014 BMC Plant Biology

Whole genome comparison between table and wine grapes reveals a comprehensive catalog of structural variants

Alex Di Genova, Andrea Miyasaka Almeida, Claudia Muñoz-Espinoza, Paula Vizoso, Dante Travisany, et al.

Abstract

Grapevine (Vitis vinifera L.) is the most important Mediterranean fruit crop, used to produce both wine and spirits as well as table grape and raisins. Wine and table grape cultivars represent two divergent germplasm pools with different origins and domestication history, as well as differential characteristics for berry size, cluster architecture and berry chemical profile, among others. ‘Sultanina’ plays a pivotal role in modern table grape breeding providing the main source of seedlessness. This cultivar is also one of the most planted for fresh consumption and raisins production. Given its importance, we sequenced it and implemented a novel strategy for the de novo assembly of its highly heterozygous genome.

‘Sultanina’ genome, Structural variants, Vitis vinifera L.
2013
44
Dec 2013 BMC Genomics

Identification of two putative reference genes from grapevine suitable for gene expression analysis in berry and related tissues derived from RNA-Seq data

Mauricio González-Agüero, Miguel García-Rojas, Alex Di Genova, José Correa, Alejandro Maass, et al.

Abstract

Data normalization is a key step in gene expression analysis by qPCR. Endogenous control genes are used to estimate variations and experimental errors occurring during sample preparation and expression measurements. However, the transcription level of the most commonly used reference genes can vary considerably in samples obtained from different individuals, tissues, developmental stages and under variable physiological conditions, resulting in a misinterpretation of the performance of the target gene(s). This issue has been scarcely approached in woody species such as grapevine.

Candidate Reference Gene, Phenological Stage, Reference Gene, Stable Reference Gene
45
Nov 2013 G3 Genes|Genomes|Genetics

Construction of Reference Chromosome-Scale Pseudomolecules for Potato: Integrating the Potato Genome with Genetic and Physical Maps

Sanjeev Kumar Sharma, Daniel Bolser, Jan de Boer, Mads Sønderkær, Walter Amoros, et al.

Abstract

The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new sequence-tagged site marker-based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished by the use of a diploid segregating population, which was genotyped with several types of molecular genetic markers to construct a new ~936 cM linkage map comprising 2469 marker loci. In silico anchoring approaches used genetic and physical maps from the diploid potato genotype RH89-039-16 (RH) and tomato. This combined approach has allowed 951 superscaffolds to be ordered into pseudomolecules corresponding to the 12 potato chromosomes. These pseudomolecules represent 674 Mb (~93%) of the 723 Mb genome assembly and 37,482 (~96%) of the 39,031 predicted genes. The superscaffold order and orientation within the pseudomolecules are closely collinear with independently constructed high density linkage maps. Comparisons between marker distribution and physical location reveal regions of greater and lesser recombination, as well as regions exhibiting significant segregation distortion. The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal “pseudomolecules”.

46
May 2013 Neurochemistry International

Cdk5 regulates Rap1 activity

Elias Utreras, Daniel Henriquez, Erick Contreras-Vallejos, Cristina Olmos, Alex Di Genova, et al.

Abstract

Rap1 signaling is important for migration, differentiation, axonal growth, and during neuronal polarity. Rap1 can be activated by external stimuli, which in turn regulates specific guanine nucleotide exchange factors such as C3G, among others. Cdk5 functions are also important to neuronal migration and differentiation. Since we found that pharmacological inhibition of Cdk5 by using roscovitine reduced Rap1 protein levels in COS-7 cells and also C3G contains three putative phosphorylation sites for Cdk5, we examined whether the Cdk5-dependent phosphorylation of C3G could affect Rap1 expression and activity. We co-transfected C3G and tet-OFF system for p35 over-expression, an activator of Cdk5 activity into COS-7 cells, and then we evaluated phosphorylation in serine residues in C3G by immunoprecipitation and Western blot. We found that p35 over-expression increased C3G-serine-phosphorylation while inhibition of p35 expression by tetracycline or inhibition of Cdk5 activity with roscovitine decreased it. Interestingly, we found that MG-132, a proteasome inhibitor, rescued Rap1 protein levels in the presence of roscovitine. Besides, C3G-serine-phosphorylation and Rap1 protein levels were reduced in brain from Cdk5−/− as compared with the Cdk5+/+ brain. Finally, we found that p35 over-expression increased Rap1 activity while inhibition of p35 expression by tetracycline or roscovitine decreased Rap1 activity. These results suggest that Cdk5-mediated serine-phosphorylation of C3G may control Rap1 stability and activity, and this may potentially impact various neuronal functions such as migration, differentiation, and polarity.

C3G, Cyclin-dependent kinase 5, Neurones, Rap1
47
Apr 2013 Genomics

Bioinformatic survey for new physiological substrates of Cyclin-dependent kinase 5

Daniel A. Bórquez, Cristina Olmos, Sebastián Álvarez, Alex Di Genova, Alejandro Maass, et al.

Abstract

Cyclin-dependent kinase 5 (Cdk5) is a proline-directed serine/threonine kinase predominantly active in the nervous system where it regulates several processes such as neuronal migration, cytoskeletal dynamics, axonal guidance, and neurotransmission. We constructed a position specific scoring matrix (PSSM) based on a dataset of sites shown to be phosphorylated both in vivo and in vitro by Cdk5. This dataset was curated manually through an exhaustive search of published experimental data. We then used this PSSM to perform a search in the mouse proteome through Scansite, a web-based tool for matching sequence patterns in large databases. Considering a stringent cut-off score of 0.5, we identified 354 new putative sites present in 291 proteins. In order to assess the robustness of our results, ten random subsets (of 80 sites each) of the original dataset were used to construct new PSSMs, which were then used as input for a new Scansite search, leading to the recovery of 81% of the 354 sites by at least 5 PSSMs. In order to reduce the number of false positives in our sequence-based approach, we evaluated which of these predicted sites were phosphorylated in vivo as determined by multiple phosphoproteomics studies carried out through mass spectrometry and available in the PhosphoSitePlus database. This step resulted in a very promising list of 132 putative phosphorylation sites for Cdk5, of which, 51 are specifically phosphorylated in brain tissue, and some are involved in functions regulated by Cdk5 such as axonal growth, synaptic plasticity and neurotransmission. Other phosphorylation sites in our list suggest that Cdk5 might regulate processes through mechanisms not previously recognized such as the control of mRNA splicing.

Cyclin-dependent kinase 5, Neurotransmission, Position specific scoring matrix, Protein phosphorylation
2012
48
Nov 2012 Journal of Bacteriology

Draft Genome Sequence of the Sulfobacillus thermosulfidooxidans Cutipay Strain, an Indigenous Bacterium Isolated from a Naturally Extreme Mining Environment in Northern Chile

Dante Travisany, Alex Di Genova, Andrea Sepúlveda, Roberto A. Bobadilla-Fazzini, Pilar Parada, et al.

Abstract

Sulfobacillus thermosulfidooxidans strain Cutipay is a mixotrophic, acidophilic, moderately thermophilic bacterium isolated from mining environments of the north of Chile, making it an interesting subject for studying the bioleaching of copper. We introduce the draft genome sequence and annotation of this strain, which provide insights into its mechanisms for heavy metal resistance.

49
Feb 2012 BioMetals

Genome wide identification of Acidithiobacillus ferrooxidans (ATCC 23270) transcription factors and comparative analysis of ArsR and MerR metal regulators

Christian Hödar, Pablo Moreno, Alex di Genova, Mauricio Latorre, Angélica Reyes-Jara, et al.

Abstract

Acidithiobacillus ferrooxidans is a chemolithoautotrophic acidophilic bacterium that obtains its energy from the oxidation of ferrous iron, elemental sulfur, or reduced sulfur minerals. This capability makes it of great industrial importance due to its applications in biomining. During the industrial processes, A. ferrooxidans survives stressful conditions in its environment, such as an extremely acidic pH and high concentration of transition metals. In order to gain insight into the organization of A. ferrooxidans regulatory networks and to provide a framework for further studies in bacterial growth under extreme conditions, we applied a genome-wide annotation procedure to identify 87 A. ferrooxidans transcription factors. We classified them into 19 families that were conserved among diverse prokaryotic phyla. Our annotation procedure revealed that the A. ferrooxidans genome contains several members of the ArsR and MerR families, which are involved in metal resistance and detoxification. Analysis of their sequences revealed known and potentially new mechanisms to coordinate gene expression in response to metal availability. A. ferrooxidans inhabits some of the most metal-rich environments known, thus transcription factors identified here seem to be good candidates for functional studies in order to determine their physiological roles and to place them into A. ferrooxidans transcriptional regulatory networks.

Metal resistance, Acidithiobacillus ferrooxidans, ArsR family, MerR family
2011
50
Jul 2011 Nature

Genome sequence and analysis of the tuber crop potato

Xun Xu, Shengkai Pan, Shifeng Cheng, Bo Zhang, Desheng Mu, et al.

Abstract

Potato (Solanum tuberosum L.) is the world’s most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.

Genome evolution, Genomic analysis, Plant genetics
51
Jan 2011 Database

SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss

Alex Di Génova, Andrés Aravena, Luis Zapata, Mauricio González, Alejandro Maass, et al.

Abstract

SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/

52
Jan 2011 Database

BioMart Central Portal: an open database network for the biological community

Jonathan M. Guberman, J. Ai, O. Arnaiz, Joachim Baran, Andrew Blake, et al.

Abstract

BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org.