Data from: Signatures of convergence in Neotropical cichlid fish

Leal-Cardin, Mariana1 ; Bracamonte, Seraina E.1 ; Aldegunde, Javier1 ; Magalhaes, Isabel S.2 ; Ornelas-García, Claudia Patricia3 ; Barluenga, Marta1

Published Sep 02, 2024 on Dryad. https://doi.org/10.5061/dryad.qv9s4mwpq

Abstract

Convergent evolution of similar phenotypes suggests some predictability in the evolutionary trajectories of organisms, due to strong and repeated selective pressures, and/or developmental constraints. In adaptive radiations, particularly in cichlid fish radiations, convergent phenotypes are commonly found within and across geographical settings. Cichlids show major repeated axes of morphological diversification. Recurrent changes in body patterns reveal adaption to alternative habitats, and modifications of the trophic apparatus respond to the exploitation of different food resources. Here we compare morphologically and genetically two Neotropical cichlid assemblages, the Mexican desert cichlid, and the Nicaraguan Midas cichlid, with similar polymorphic body and trophic adaptations despite their independent evolution. We found a common morphological axis of differentiation in trophic structures in both cichlid radiations, but two different axes of differentiation in body shape, defining two alternative limnetic body patterns. Adaptation to limnetic habitats implied regulation of immune functions in the Midas cichlid, while morphogenesis and metabolic functions in the desert cichlid. Convergent phenotypic adaptions could be associated to divergent gene regulation.

README: Signatures of convergence in Neotropical cichlid fish

Description of the data and file structure

This repository contains analysis scripts of the article "Signatures of convergence in Neotropical cichlid fish" published in Molecular Ecology.
R scripts are provided for the morphometric analyses of lower pharyngeal jaw shape and body shape. For the gene expression analyses, scripts from quality control of raw reads to GO enrichment analyses are provided. See section Code/Software for more information.

Sharing/Access information

Raw reads were deposited in a short read archive on NCBI in the BioProject PRJNA1148343: https://www.ncbi.nlm.nih.gov/sra/PRJNA1148343
The reference genome was retrieved from NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013435755.1

Code/Software

R code for the analyses of landmark configurations:

lower pharyngeal jaw shape: LPJ_shape_analyses.R
body shape: Body_shape_analyses.R

Analysis pipeline for differential gene expression and gene ontology enrichment:

quality control of raw reads with FastQC: bash script 01_fastqc_raw.sh
summary of quality control with MultiQC: bash script 02_multiqc_raw.sh
read trimming with Trimmomatic: bash script 03_trim_reads.sh
quality control of trimmed reads with FastQC: bash script 04_fastqc_trimmed.sh
summary of quality control with MultiQC: bash script 05_multiqc_trimmed.sh
summary of Trimmomatic trimming statistics across samples: python3 script trim_stats.py
index reference genome with STAR RNA-seq aligner: bash script 06_index_genome.sh
map trimmed reads to reference genome with STAR: bash script 07_map_reads.sh
re-map trimmed reads to reference genome with STAR using information about newly identified splice junctions: bash script 08_remap_reads.sh
summary of STAR mapping statistics across samples: python3 script map_stats.py
check mapping quality with RSeQC (insert size, splice junctions, RPKM saturation): bash script 09a_mapqc.sh
check gene body coverage of mapped reads with RSeQC: bash script 09b_mapqc.sh
assemble mapped reads into transcripts for each sample with StringTie: bash script 10_assemble_transcripts.sh
merge assembled transcripts of all samples into consensus transcriptome with StringTie: bash script 11_merge_assemblies.sh
quantify reads per gene per sample with featureCounts: R script count_reads.R called from within bash script 12_count_reads.sh
extract fasta sequences from reference genome for each transcript with AGAT: bash script 13_extract_fasta.sh
make a gene to transcript map for Trinotate, identify open reading frames with TransDecodoer, and blast transcripts and corresponding proteins against UniProtKB Swiss-Prot database: python3 script annotation_prot_nuc.py called from within bash script 14_functional_annotation.sh
load gene-transcript-map and blast results into Trinotate and extract annotations and gene ontology terms for each gene
infer differentially expressed genes with DESeq2: R markdown gene_expression.Rmd, loads R function volcano_plot_shrink_keepcol.R for volcano plots
assign annotation to gene lists from differential expression inference: python3 script DEG_annotation.py
identify enriched gene ontology terms among differentially expressed genes with topGO: R markdown GO_enrichment.Rmd

count_reads.py was used to count raw reads and surviving reads after Trimmomatic
volcano_plot_shrink_keepcol.R holds a function for easy plotting of volcano plots, which was called from within the R markdown gene_expression.Rmd

Data from: Signatures of convergence in Neotropical cichlid fish

Data files

Abstract

README: Signatures of convergence in Neotropical cichlid fish

Description of the data and file structure

Sharing/Access information

Code/Software

Methods

Works referencing this dataset