Data from: Signatures of convergence in Neotropical cichlid fish
Data files
Sep 02, 2024 version files 367.51 KB
-
01_fastqc_raw.sh
-
02_multiqc_raw.sh
-
03_trim_reads.sh
-
04_fastqc_trimmed.sh
-
05_multiqc_trimmed.sh
-
06_index_genome.sh
-
07_map_reads.sh
-
08_remap_reads.sh
-
09a_mapqc.sh
-
09b_mapqc.sh
-
10_assemble_transcripts.sh
-
11_merge_assemblies.sh
-
12_count_reads.sh
-
13_extract_fasta.sh
-
14_functional_annotation.sh
-
annotation_prot_nuc.py
-
Body_shape_analyses.R
-
count_reads.R
-
DEG_annotation.py
-
gene_expression.Rmd
-
GO_enrichment.Rmd
-
LPJ_shape_analyses.R
-
map_stats.py
-
README.md
-
trim_stats.py
-
volcano_plot_shrink_keepcol.R
Abstract
Convergent evolution of similar phenotypes suggests some predictability in the evolutionary trajectories of organisms, due to strong and repeated selective pressures, and/or developmental constraints. In adaptive radiations, particularly in cichlid fish radiations, convergent phenotypes are commonly found within and across geographical settings. Cichlids show major repeated axes of morphological diversification. Recurrent changes in body patterns reveal adaption to alternative habitats, and modifications of the trophic apparatus respond to the exploitation of different food resources. Here we compare morphologically and genetically two Neotropical cichlid assemblages, the Mexican desert cichlid, and the Nicaraguan Midas cichlid, with similar polymorphic body and trophic adaptations despite their independent evolution. We found a common morphological axis of differentiation in trophic structures in both cichlid radiations, but two different axes of differentiation in body shape, defining two alternative limnetic body patterns. Adaptation to limnetic habitats implied regulation of immune functions in the Midas cichlid, while morphogenesis and metabolic functions in the desert cichlid. Convergent phenotypic adaptions could be associated to divergent gene regulation.
README: Signatures of convergence in Neotropical cichlid fish
Description of the data and file structure
This repository contains analysis scripts of the article "Signatures of convergence in Neotropical cichlid fish" published in Molecular Ecology.
R scripts are provided for the morphometric analyses of lower pharyngeal jaw shape and body shape. For the gene expression analyses, scripts from quality control of raw reads to GO enrichment analyses are provided. See section Code/Software for more information.
Sharing/Access information
Raw reads were deposited in a short read archive on NCBI in the BioProject PRJNA1148343: https://www.ncbi.nlm.nih.gov/sra/PRJNA1148343
The reference genome was retrieved from NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013435755.1
Code/Software
R code for the analyses of landmark configurations:
- lower pharyngeal jaw shape: LPJ_shape_analyses.R
- body shape: Body_shape_analyses.R
Analysis pipeline for differential gene expression and gene ontology enrichment:
- quality control of raw reads with FastQC: bash script 01_fastqc_raw.sh
- summary of quality control with MultiQC: bash script 02_multiqc_raw.sh
- read trimming with Trimmomatic: bash script 03_trim_reads.sh
- quality control of trimmed reads with FastQC: bash script 04_fastqc_trimmed.sh
- summary of quality control with MultiQC: bash script 05_multiqc_trimmed.sh
- summary of Trimmomatic trimming statistics across samples: python3 script trim_stats.py
- index reference genome with STAR RNA-seq aligner: bash script 06_index_genome.sh
- map trimmed reads to reference genome with STAR: bash script 07_map_reads.sh
- re-map trimmed reads to reference genome with STAR using information about newly identified splice junctions: bash script 08_remap_reads.sh
- summary of STAR mapping statistics across samples: python3 script map_stats.py
- check mapping quality with RSeQC (insert size, splice junctions, RPKM saturation): bash script 09a_mapqc.sh
- check gene body coverage of mapped reads with RSeQC: bash script 09b_mapqc.sh
- assemble mapped reads into transcripts for each sample with StringTie: bash script 10_assemble_transcripts.sh
- merge assembled transcripts of all samples into consensus transcriptome with StringTie: bash script 11_merge_assemblies.sh
- quantify reads per gene per sample with featureCounts: R script count_reads.R called from within bash script 12_count_reads.sh
- extract fasta sequences from reference genome for each transcript with AGAT: bash script 13_extract_fasta.sh
- make a gene to transcript map for Trinotate, identify open reading frames with TransDecodoer, and blast transcripts and corresponding proteins against UniProtKB Swiss-Prot database: python3 script annotation_prot_nuc.py called from within bash script 14_functional_annotation.sh
- load gene-transcript-map and blast results into Trinotate and extract annotations and gene ontology terms for each gene
- infer differentially expressed genes with DESeq2: R markdown gene_expression.Rmd, loads R function volcano_plot_shrink_keepcol.R for volcano plots
- assign annotation to gene lists from differential expression inference: python3 script DEG_annotation.py
- identify enriched gene ontology terms among differentially expressed genes with topGO: R markdown GO_enrichment.Rmd
- count_reads.py was used to count raw reads and surviving reads after Trimmomatic
- volcano_plot_shrink_keepcol.R holds a function for easy plotting of volcano plots, which was called from within the R markdown gene_expression.Rmd
Methods
Different morphotypes of Herichthys minckleyi and Amphilophus spp. were fished, measured, photographed in a standard position for morphometric analyses. The lower pharyngeal jaw (LPJ) was collected for morphometric analyses and spleen tissue was collected for gene expression analyses. We used the TPSdig2 pipeline to record landmarks of the body morphology and LPJ morphology. We analysed the landmark configuration with the R package geomorph and we used a linear discriminant analyses of principal components to evaluate the posterior probability for each individual to be reassigned to its original cluster (i.e. morphotype). We analysed meristic measurements of head and trunk proportions with an ANOVA in R.
We extracted RNA from spleen for paired-end sequencing on an Illumina HiSeq2000. We clipped adapters of demultiplexed reads with Trimmomatic, mapped them to the Amphilophus reference genome with STAR RNA-seq aligner and used StringTie to generate a consensus transcriptome. We quantified reads with featureCounts of the R package Rsubread and used the DESeq2 pipeline to infer differential expression. We extracted transcript sequences from the reference genome with AGAT, identified peptide-coding regions with Transdecoder and blasted them against the Swiss-Prot database from UniProtKB. We extracted Gene Ontology (GO) terms for each gene with Trinotate and used the R package topGO to identify enriched GO terms.