Origin of minicircular mitochondrial genomes in red algae
Data files
May 24, 2023 version files 569.50 MB
Abstract
Eukaryotic organelle genomes are generally of conserved size and gene content within phylogenetic groups. However, significant variation in genome structure may occur. Here, we report that the Stylonematophyceae red algae contain multipartite circular mitochondrial genomes (i.e., minicircles) which encode one or two genes bounded by a specific cassette and a conserved constant region. These minicircles were visualized using fluorescence microscopy and scanning electron microscopy, confirming their circularity. Mitochondrial gene sets are reduced in these highly divergent mitogenomes. A newly generated chromosome-level nuclear genome assembly of Rhodosorus marinus reveals that most mitochondrial ribosomal subunit genes have been transferred to the nuclear genome. Hetero-concatemers resulting from recombination between minicircles, together with a unique inventory of genes responsible for mitochondrial genome stability, may explain how the transition from a typical mitochondrial genome to minicircles occurs. Our results offer insight into minicircular organelle genome formation and highlight an extreme case of mitochondrial gene inventory reduction.
Methods
Sample preparation
Culture strains of Tsunamia transpacifica JAW4874, Rufusia pilicola O7031, Stylonema alsidii JAW4424, Chroodactylon ornatum JAW4256, Chroothece mobilis SAG104.79, Rhodosorus marinus CCMP1338, and Bangiopsis subsimplex UTEX LB2854 were obtained from J.A. West (School of Biosciences 2, University of Melbourne, Parkville, Victoria 3010, Australia), F.D. Ott (905 NE Hilltop Drive, Topeka, Kansas 66617, USA), The Culture Collection of Algae at Göttingen University, Germany (SAG), The National Center for Marine Algae and Microbiota (NCMA), and the Culture Collection of Algae at The University of Texas at Austin, USA (UTEX), respectively. DY-V medium (with sea salt added to 5 ppt) was used for culturing Rufusia pilicola. Chroothece mobilis and Chroodactylon ornatum were cultured in L1+DY-V (1:1 ratio) medium. The other samples were cultured in L1 medium. Culture flasks were kept under a white LED lamp (7.76 µmol photon m⁻² s⁻¹) at 20°C in a 12:12 light-dark cycle.
DNA and RNA extraction
Samples were collected either on 10 µm membrane filters or by centrifugation (30 min, 7830 rpm). Harvested cells were frozen in liquid nitrogen before grinding. Genomic DNAs for Illumina short-read sequencing were extracted using the Exgene Plant SV Kit (General Biosystems, Seoul, Korea) and cleaned up using the DNeasy® PowerClean® Pro Cleanup Kit (QIAGEN, Hilden, Germany). For long-read sequencing, genomic DNA was extracted using the manual CTAB protocol with a customized lysis buffer 1. Harvested samples were placed in a 2 ml tube with a bullet and frozen in liquid nitrogen. The samples were then machine ground. After grinding, samples were resuspended in 600 µl of CTAB isolation buffer (1% 2-mercaptoethanol added right before use) and incubated at 65°C for 20 min. When samples were completely thawed, the bullets were removed from the tubes and 6 µl of RNase A was added. After incubation, the tubes were centrifuged at 14,000 rpm for 20 min. Without disturbing the pellets, the samples were transferred into another 2 ml tube and mixed with one volume of phenol:chloroform:isoamyl alcohol (25:24:1, v/v) before centrifugation at 14,000 rpm for 20 min. The aqueous phase was then mixed with one volume of chloroform in a new 2 ml tube and centrifuged at 14,000 rpm for 15 min. After centrifugation, one volume of 100% isopropanol was added and the mixture was incubated at -20°C for 30 min. Samples were then centrifuged at 14,000 rpm for 20 min. Precipitated DNA was washed with 70% ethanol and centrifuged again to remove the ethanol. Finally, the DNA was air-dried and dissolved in 50 µl of AE buffer from the Exgene Plant SV Kit. Total RNA of R. marinus was extracted using the RNeasy® Plant Mini Kit (QIAGEN, Hilden, Germany).
Whole genome sequencing and genome assembly
Library preparation and whole genome sequencing for both short-read and long-read sequencing were carried out by DNA Link Inc. (Seoul, Korea). For short-read sequencing, libraries were prepared using the TruSeq Nano DNA Prep Kit (550 bp protocol) and sequenced on the Illumina HiSeq2500 platform using 100 bp paired-end reagents. Long-read sequencing was carried out on the Oxford Nanopore platform (ONT GridION) for R. marinus (6 kb size selection) and on the Pacific Biosciences (PacBio) High-Fidelity (HiFi) sequencing platform for C. ornatum (no size selection). RNA-seq for R. marinus was done on the Illumina NovaSeq 6000 platform. Raw short-read data were assembled using SPAdes 3.14.1 2 with the ‘--careful’ option, and long-read data were assembled using NextDenovo 2.5.0 (https://github.com/Nextomics/NextDenovo) for the nuclear genome of R. marinus. Assembled NextDenovo contigs were polished three times with Pilon 1.22 3 using short-read mapping data generated by bowtie2 2.3.5.1 4.
For mitogenome assemblies from long-read data, reads with BLAST hits to mitochondrial CDSs were used. The program miniasm 0.3 (r179) 5 was used for the R. marinus mitogenome and IPA 1.3.1 (https://github.com/PacificBiosciences/pbipa) was used for C. ornatum. In addition, reads with BLAST hits to the NCR were used to search for “empty” minicircle reads that do not contain a CDS; however, no contigs were assembled, suggesting that the collected reads are fragments of CDS-containing reads. Because minicircles share long conserved regions that short reads cannot discriminate, we used long-read data and NextPolish 1.4.0 (https://github.com/Nextomics/NextPolish) to polish the miniasm-derived contigs. We did not polish the IPA contigs, because HiFi sequencing generates highly accurate reads. The remaining SNPs and ambiguities were manually corrected using mapping data of CDS-containing long reads. For C. ornatum, each sequence from step 10 of IPA (10-assemble/p_ctg.fasta) was considered a minicircle sequence, because the subsequent steps of the IPA assembler (polish and purge dups) did not function correctly.
For the short-read data, sorted and verified mitochondrial genes (see below) were used as seeds for NOVOPlasty 4.2 6. The resulting NOVOPlasty contigs were then de novo assembled in Geneious (Biomatters, Auckland, New Zealand). Any assembled contig that encodes a mitochondrial gene was considered part of the mitochondrial genome. These contigs were polished (-SNP & Indel) with Pilon 1.22 3, using short-read mapping data generated by bowtie2 2.3.5.1 4. Trinity 2.11.0 7 was used to assemble RNA sequencing data.
Sorting and verifying mitochondrial contigs
BLAST 2.2.31+ 8 was used to search for mitochondrial genes. Because mitochondrial gene sequences of the Stylonematophyceae were absent in the National Center for Biotechnology Information (NCBI) database, mitochondrial protein sequences from several red algae species were searched against assembled SPAdes contigs with e-value 1e-05 using tBLASTn. All the matched sequences were translated (Genetic code 4 9) and aligned against NCBI protein database (nr). Sequences that have eukaryotic taxa in the top 100 matches were considered as candidate genes. Those that only had prokaryotic taxa in the top 100 matches with significantly low identity or query coverage were also selected as possible candidates.
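The translation step above relies on genetic code 4, which differs from the standard code in reading UGA as tryptophan rather than stop. As a minimal sketch (the function names are illustrative and not part of the original pipeline), the codon table can be derived from the standard one and applied to a candidate sequence:

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code in TCAG codon order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(uga_is_trp=True):
    """Return a codon -> amino acid map; code 4 reassigns TGA to Trp (W)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AAS))
    if uga_is_trp:
        table["TGA"] = "W"
    return table


def translate(seq, table):
    """Translate a nucleotide sequence in frame 1; unknown codons become X."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(table.get(seq[i:i + 3], "X"))
    return "".join(protein)


code4 = build_codon_table()
standard = build_codon_table(uga_is_trp=False)
```

Translating the same sequence with both tables shows where an in-frame UGA would otherwise truncate a mitochondrial protein.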
To exclude bacterial contigs from possible mitochondrial contigs, genomic features such as GC content, read coverage, and tBLASTn result (top match and identity) of the contig were used as criteria for selection. CDSs of each contig were compared against NCBI protein database (nr) using default parameters. These candidate contigs were verified manually using phylogenetic analysis. Using translated CDS in candidate contigs as queries, protein sequences from nr database were searched by MMSeqs2 10 (Version: 330ea3684fd3f985d0127ffe8ca5b3f13053c619) with maximum sensitivity and e-value 1e-05.
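The contig screening by GC content and read coverage can be illustrated with a short sketch. The thresholds are hypothetical parameters, because the text does not state cutoff values, and the tBLASTn criterion is left out for brevity:

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def flag_candidates(contigs, gc_range, min_coverage):
    """Keep contig names whose GC content and coverage pass the cutoffs.

    contigs: dict mapping name -> (sequence, mean read coverage).
    gc_range and min_coverage are assumed, user-chosen thresholds.
    """
    low, high = gc_range
    kept = []
    for name, (seq, coverage) in contigs.items():
        if low <= gc_content(seq) <= high and coverage >= min_coverage:
            kept.append(name)
    return kept
```

Contigs that pass such a screen would still need the manual phylogenetic verification described above.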
Nuclear gene prediction
RNA-seq reads were mapped against the assembled nuclear genome of R. marinus using hisat2 (2.2.1) 11 and STAR 2.7.7a 12 (--outFilterScoreMinOverLread 0.45 --outFilterMatchNminOverLread 0.45). The mapping information was used as the training set for ab initio gene modeling with BRAKER 2.1.5 13. Completeness was measured using BUSCO 3.0.2 with the ‘eukaryote_odb9’ database 14, following Cho et al. (2023) 15. RAD52 was not found in the C. crispus proteome, and contaminant assemblies were found in the transcriptome assembly of C. ornatum. Therefore, we chose to generate a transcriptome assembly and perform gene modeling using the available RNA-seq data (see Supplementary Table S1). We used Trinity 2.11.0 7 to obtain the transcriptome assembly. cd-hit 4.8.1 16, 17 was used to cluster sequences with similarity over 95%, and predicted proteins were generated using TransDecoder 5.5.0 (https://github.com/TransDecoder/TransDecoder). BUSCO 14 values were: C. crispus, C:97.4% [S:27.4%, D:70.0%], F:2.6%, M:0.0%, n:303; and C. ornatum, C:96.7% [S:20.5%, D:76.2%], F:1.3%, M:2.0%, n:303. Consequently, we predicted several novel genes that are not present in existing red algal data.
Comparative analysis of CDSs
The mitochondrial genomes of 23 red algae representing Cyanidiophyceae, Compsopogonophyceae, Porphyridiophyceae, Rhodellophyceae, Bangiophyceae, and Florideophyceae were downloaded from the NCBI nucleotide database (nt) and used for the comparison (Supplementary Table S1). Translated sequences of 11 CDSs (atp6, atp9, cob, cox1, cox2, cox3, nad1, nad2, nad4, nad5-f, and nad5-s) were aligned with MAFFT 7.310 18 and concatenated. From the concatenated alignment, amino acid similarity was calculated in Geneious 10.2.3 using the BLOSUM62 matrix 19 with threshold 1, and nucleotide identity was calculated as well. A maximum-likelihood phylogenetic tree was built using IQ-TREE 1.6.8 20. Optimal evolutionary models were automatically chosen after model selection 21. Pairwise dN/dS calculation was performed with ParaAT 2.0 22 and KaKs Calculator 2.0 23.
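To illustrate the concatenation and pairwise comparison, the sketch below joins per-gene alignments and computes plain pairwise identity. This is a simplification: the analysis above used Geneious with BLOSUM62-based similarity, whereas this example only counts identical residues, and the data layout (a dict of gene to species to aligned sequence) is an assumption.

```python
def concatenate(alignments, gene_order):
    """Concatenate per-gene alignments; species missing a gene get gaps."""
    species = sorted({sp for aln in alignments.values() for sp in aln})
    parts = {sp: [] for sp in species}
    for gene in gene_order:
        aln = alignments[gene]
        width = len(next(iter(aln.values())))  # aligned length of this gene
        for sp in species:
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def pairwise_identity(a, b):
    """Identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


toy = {
    "atp6": {"sp1": "MK-L", "sp2": "MKAL"},
    "cob": {"sp1": "AV", "sp2": "AI"},
}
concat = concatenate(toy, ["atp6", "cob"])
```

In the toy data, five columns are gap-free in both sequences and four of them match, giving an identity of 0.8.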
For species identification of seven Stylonematophyceae, we downloaded rbcL sequences of 16 Stylonematophyceae and one Compsopogonophyceae from NCBI and aligned with those of our samples using MAFFT 7.310 18. IQ-TREE 1.6.8 20 was used to construct a maximum likelihood phylogenetic tree. Optimal evolutionary models were automatically chosen after model selection 21.
Detecting mitochondrial tRNA, rRNA, and other sequences in the nuclear genome
We searched for tRNA genes using ARAGORN 1.2.38 24. rRNA genes were initially searched for in the long-read assemblies of R. marinus and C. ornatum using barrnap 0.9 (https://github.com/tseemann/barrnap) and RNAmmer 1.2 25, but no matches were found. We then looked for raw reads with BLASTn hits to collected red algal rRNA sequences under various e-value thresholds (up to 100), but the results were not useful. Next, we examined RNA transcripts of R. marinus. Long reads were mapped to each assembled transcript, collected, and assembled using minimap2 (2.17-r941) 26, 27 and miniasm 0.3 (r179) 5. Transcripts whose assembled contig was predicted to have circular topology were sorted, and the alignments were manually inspected. The LSU rRNA of C. ornatum was detected using BLASTn with the LSU rRNA of R. marinus as a query, but the SSU rRNA could not be detected in this way. The SSU rRNA minicircle has a constant region that is shared among most of the other minicircles in R. marinus. Thus, we collected reads with BLASTn matches to constant regions and filtered out reads with BLASTn matches to coding sequences. After the assembly (described above), only one contig with circular topology remained, an SSU rRNA minicircle of C. ornatum. Polishing was performed using the procedure described above.
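The read selection used to recover the SSU rRNA minicircle amounts to a set difference: reads that hit a constant region minus reads that hit any coding sequence. A minimal sketch follows, assuming tab-separated BLASTn output in which the read identifier is in the first column (the column choice and function names are illustrative, not taken from the original scripts):

```python
def read_ids(tabular_lines, column=0):
    """Collect read identifiers from tab-separated BLAST hit lines."""
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[column])
    return ids


def reads_without_cds(constant_region_lines, cds_lines):
    """Reads matching a constant region but no coding sequence."""
    return read_ids(constant_region_lines) - read_ids(cds_lines)


constant_hits = ["read1\tconst\t98.0", "read2\tconst\t97.5", "read3\tconst\t99.1"]
cds_hits = ["read1\tcox1\t96.0", "read3\tatp6\t95.2"]
selected = reads_without_cds(constant_hits, cds_hits)
```

The selected reads would then go into the assembly and polishing steps described above.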
For coding genes, we downloaded gene sets of six red algae and the transcriptome assembly of two red algae (Supplementary Table S1). As described above, we predicted gene sets for C. crispus, Bangiopsis sp. CCMP1999, and C. ornatum. We aligned all the mitochondrial genes from 30 species against gene sets of each red alga using DIAMOND 0.9.36.137 28 to find EGT-derived genes. Then we aligned all the gene sets against the NCBI protein database (nr). Hit queries and subjects were aligned using MAFFT 7.310 18 and a phylogenetic tree was constructed using IQ-TREE 1.6.8 20.
Genes that control genome stability were selected based on existing data 29, 30, 31, 32. Proteins were downloaded from NCBI and aligned against the red algal gene sets using BLASTp (e-value 1e-03). Hit identity, alignment length, query length, and subject taxon from BLASTp results against the NCBI database, as well as the alignment and phylogenetic tree, were taken into account to determine the presence of a gene. Specifically, for RAD52, sequences of RAD52, its paralogs/homologs RAD59, RTI1, and RAD22, and the related proteins MGM101 and RDM1 were collected from the NCBI database as queries, and a maximum e-value of 10 was used. For NTG1, the addition of different gene families resulted in long and gap-rich alignments. To prevent this issue, domain regions that exhibited few gaps were extracted and used for reconstructing the phylogenetic tree. Seven additional red algal genomes were used for identification of RAD52, MSH1, and NTG1 (Supplementary Table S8 and Supplementary Notes 11). InterProScan 5.52-86.0 33 was used for domain prediction.
Transit peptides were identified using TargetP 2.0 34 (-org pl). Statistical tests were performed using the Wilcoxon rank sum test in R 4.0.3. Complete mitogenome sequences were used as BLASTn queries against the nuclear genome to find NUMTs. Hits with an e-value under 1e-04 were considered NUMTs 35, 36.
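The NUMT criterion can be expressed as a simple filter over BLASTn hits. The sketch below assumes the standard 12-column tabular output (BLAST+ `-outfmt 6`); the output format actually used is not stated, so the column positions are an assumption.

```python
def parse_hits(lines):
    """Yield subject, subject coordinates, and e-value from outfmt 6 lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        yield {
            "sseqid": cols[1],
            "sstart": int(cols[8]),
            "send": int(cols[9]),
            "evalue": float(cols[10]),
        }


def numt_hits(lines, max_evalue=1e-4):
    """Keep hits whose e-value is strictly below the cutoff."""
    return [hit for hit in parse_hits(lines) if hit["evalue"] < max_evalue]


example = [
    "mito\tchr1\t95.0\t200\t10\t0\t1\t200\t5000\t5199\t1e-50\t300",
    "mito\tchr2\t80.0\t30\t6\t0\t1\t30\t100\t129\t0.01\t40",
]
numts = numt_hits(example)
```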
Raw long-read length distribution
Raw long reads of R. marinus, C. ornatum, P. purpureum, and G. chorda were aligned against mitochondrial, plastid, and nuclear CDSs of each species (Supplementary Table S1). Plastid CDSs for R. marinus and C. ornatum were manually annotated. For P. purpureum (NC_023133.1) and G. chorda (NC_031149.1), plastid genomes were downloaded from the NCBI nucleotide database (nt). Reads with a hit length over 80% of the CDS length and identity over 90% were used. For intron-rich genes (e.g., R. marinus nuclear genes), a hit length over 200 bp was used as the criterion. Finally, to reduce noise, only CDSs with considerable coverage (the cutoff differed from genome to genome) were used for the analysis.
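The read filter described above can be written as a single predicate. In this sketch the 90% identity cutoff is assumed to remain in force for intron-rich genes, which the text does not state explicitly; the function name is illustrative.

```python
def keep_hit(hit_length, identity, cds_length, intron_rich=False):
    """Apply the hit-length and identity cutoffs to one read-to-CDS hit.

    hit_length and cds_length are in bp; identity is a percentage.
    For intron-rich genes the length cutoff is a fixed 200 bp instead of
    80% of the CDS length (identity cutoff kept: an assumption).
    """
    if identity <= 90.0:
        return False
    if intron_rich:
        return hit_length > 200
    return hit_length > 0.8 * cds_length
```

For example, an 850 bp hit at 95% identity passes for a 1,000 bp CDS, while an 800 bp hit does not because the cutoff is strict.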
Polymerase chain reaction (PCR) and quantitative PCR (qPCR)
R. marinus total DNA was used to confirm chimeric minicircles; primers were designed to target the ends of CDSs and to point toward the NCR, so that NCRs were amplified. R. marinus cDNA was used to confirm the trans-spliced nad5 transcript; primers were designed to target the 3’ end of nad5-f and the 5’ end of nad5-s. R. marinus cDNA synthesis was performed using the First Strand cDNA Synthesis kit (random hexamer primer; Thermo Scientific, Massachusetts, USA). PCR was performed using the AccuPower® PCR PreMix kit (BIONEER, Daejeon, Korea). All primers were designed using a modified version of Primer3 (2.3.7) built into Geneious 10.2.3. PCR conditions consisted of initial denaturation at 95°C for 3 min, followed by 35 cycles of denaturation at 95°C for 30 sec, annealing at 55°C for 30 sec, and extension at 72°C, and a final 7 min extension step at 72°C. The extension step was 4.5 min for the former (chimeric minicircle PCR) and 1 min for the latter (nad5 transcript PCR). PCR products were purified with the LaboPass™ PCR kit (Cosmo Genetech, Seoul, Korea). Purified PCR products were sequenced using the Sanger method by Macrogen Inc. (Seoul, Korea).
The SsoFast™ EvaGreen® Supermix (Bio-Rad, California, USA) was used for the qPCR assays (two replicates), which were run on a CFX96™ system (Bio-Rad, California, USA). Primers were designed to amplify gene-specific 150 bp fragments and were tested in advance to check for primer-dimer formation in a no-template control (NTC). Each tube contained 5 µl of supermix, 0.2 µl each of forward and reverse primers, 3.6 µl of nuclease-free deionized water, and 1 µl of template DNA (final volume 10 µl). Probes for Southern hybridization were used as template DNA for the standards (see Supplementary Table S2 for more information). Starting from a concentration of 1 ng/µl, seven 10-fold serial dilutions were prepared. The starting concentration of the sdhB standard was 0.01 ng/µl because sdhB is an EGT-derived gene. For target samples, 0.06 ng of genomic (g)DNA extracted using the CTAB method were shaken. The concentration of gDNA was measured using a Qubit® 2.0 Fluorometer and the Qubit™ dsDNA BR Assay Kit (Invitrogen, Massachusetts, USA), and all samples underwent the same treatment. Quantitation cycle (Cq) values were calculated in Bio-Rad CFX Manager 2.1 (Cq determination mode = Single Threshold). Copy number and PCR efficiency were calculated using the equations from refs. 37 and 38, respectively. qPCR conditions consisted of initial denaturation at 95°C for 3 min, followed by 50 cycles of denaturation at 95°C for 5 sec and annealing and extension at 60°C for 15 sec.
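The standard-curve arithmetic can be sketched as follows. The formulas are the commonly used forms (copies from DNA mass assuming an average of 650 g/mol per base pair, and efficiency E = 10^(-1/slope) - 1 from the slope of Cq against log10 copies); they are assumed to correspond to the cited equations and should be checked against refs. 37 and 38. The Cq values in the example are synthetic.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole
BP_MASS = 650.0           # g/mol per base pair (assumed average for dsDNA)


def copies_from_mass(mass_ng, length_bp):
    """Number of dsDNA molecules in mass_ng of a fragment of length_bp."""
    return mass_ng * 1e-9 * AVOGADRO / (length_bp * BP_MASS)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope):
    """Amplification efficiency; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Interpolate copy number from a Cq value on the standard curve."""
    return 10 ** ((cq - intercept) / slope)


# Synthetic standards: 1 ng down to 1e-6 ng of a 150 bp fragment,
# with Cq values generated for ideal doubling.
standards = [copies_from_mass(10.0 ** -k, 150) for k in range(7)]
cqs = [40.0 - math.log2(c) for c in standards]
slope, intercept = fit_line([math.log10(c) for c in standards], cqs)
```

With ideal doubling the fitted slope is about -3.32, which corresponds to an efficiency of 1.0 (100%).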
Probe synthesis
All fragments for the minicircle DNA genes (atp6, atp9, cob, cox1, cox2, cox3, nad1, nad5-s, and nad5-f), the LSU rDNA gene, and the sdhB nuclear gene were prepared from gDNA by PCR with specific primers (Supplementary Table S2) and were purified using the LaboPass™ PCR kit (Cosmo Genetech, Seoul, Korea) prior to labeling. The digoxigenin (DIG)-labeled probes for Southern blot were synthesized using the DIG-High Prime DNA Labeling and Detection Starter Kit I (Roche Diagnostics, Mannheim, Germany), according to the manufacturer’s instructions.
Southern blot analysis
For the Southern blot analysis, 1 µg of total DNA from R. marinus was either left undigested or digested with a restriction enzyme, chosen for each minicircle, that has a single restriction site outside the targeted region (Supplementary Fig. 2b). The digestion products were separated by 1% agarose gel electrophoresis in TAE buffer and transferred overnight to a positively charged nylon membrane (Cat. No. 11209299001, Sigma-Aldrich, St. Louis, MO) through capillary blotting with 10X saline-sodium citrate (SSC) buffer. After transfer, the membrane was auto-crosslinked using the Stratagene UV-Stratalinker. The crosslinked membrane was prehybridized, hybridized with the DIG-labeled probes, and then washed. Finally, the hybridized DNA probes were immunodetected with anti-digoxigenin-AP (Fab fragments) and visualized with the colorimetric substrates NBT/BCIP using the DIG-High Prime DNA Labeling and Detection Starter Kit I (Roche Diagnostics, Mannheim, Germany) according to the manufacturer's instructions. Blot images were stored by photocopying the wet filters.
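Choosing an enzyme that cuts a minicircle once, outside the probed region, can be checked in silico. The sketch below is one possible way to do so and is not taken from the original workflow. It assumes that the criterion means exactly one recognition site on the whole circle, lying outside the target, and that the target interval (0-based, end-exclusive) does not span the sequence origin. Sites that wrap around the origin of the circular sequence are detected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(circle, site):
    """0-based start positions of a recognition site on a circular sequence."""
    circle, site = circle.upper(), site.upper()
    n, k = len(circle), len(site)
    extended = circle + circle[:k - 1]   # lets matches wrap past the origin
    motifs = {site, revcomp(site)}       # both strands; same for palindromes
    return [p for p in range(n) if extended[p:p + k] in motifs]


def cuts_once_outside(circle, site, target_start, target_end):
    """True if the site occurs exactly once and does not overlap the target."""
    positions = find_sites(circle, site)
    if len(positions) != 1:
        return False
    n = len(circle)
    covered = {(positions[0] + i) % n for i in range(len(site))}
    return not any(target_start <= p < target_end for p in covered)


linear_case = "AAGAATTCAAAACCCCGGGG"   # EcoRI site (GAATTC) at position 2
wrapped_case = "TTCAAAACCCCGGGGAAGAA"  # the same site spanning the origin
```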
Microchannel and surface preparation
Polydimethylsiloxane (PDMS) devices and positively charged surfaces were prepared as previously described 39. A PDMS microchannel was used for stretching DNA molecules. The microchannel template for PDMS was made by overlapping two layers on silicon wafer (Sogang NanoFab Facility, Korea). Thereafter, the PDMS prepolymer mixture was poured onto the microchannel template and incubated at 65 °C for 12 h. The PDMS microchannels were oxidized in an air plasma generator for 30 s at 100 W (Femto Science Cute Basic, Korea). Finally, the PDMS devices were washed and stored in deionized water. Positively charged surfaces were used for DNA molecule immobilization. In particular, silicon wafer for SEM visualization has a 30 nm uniformly oxidized layer on the surface (Wafer Market, Korea).
DNA molecule visualization under FM and SEM
DNA molecules pre-mixed with FP-DBP 40, 41 were stained with 5 % polyvinylpyrrolidone (PVP, molecular weight (MW): 40 000) solution. Stained DNA molecules were elongated and immobilized on a positively charged surface using a PDMS microfluidic device. DNA molecules were imaged under an FM. The microscopy system consisted of an inverted microscope (Olympus IX70, Japan) equipped with 100× Olympus UPlanSApo oil immersion objectives and an illuminated LED light source (SOLA SM 2 light engine, Lumencor, OR). Fluorescence images were captured using a scientific complementary metal-oxide semiconductor (sCMOS) camera (PRIME; Photometrics, AZ) and stored in 16-bit TIFF format generated by Micro-Manager software. In addition, DNA molecules were imaged using field emission SEM (FE-SEM; JSM-7100F, JEOL, Japan). Circular and supercoiled DNA molecules that appear as dots under the FM were confirmed under the SEM. The length of circular and supercoiled DNA molecules was manually measured using ImageJ 42. A plasmid of known length (5.2 kb) was used to measure the degree of DNA stretch, which affects the length of visualized DNA molecules 39, 43.
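The plasmid calibration can be illustrated with a small calculation. This sketch assumes that the 5.2 kb plasmid and the sample molecules are stretched to the same degree and that contour length scales linearly with sequence length; the measurement values are hypothetical.

```python
from statistics import mean


def kb_per_micrometer(plasmid_lengths_um, plasmid_kb=5.2):
    """Scale factor from measured contour lengths of the reference plasmid."""
    return plasmid_kb / mean(plasmid_lengths_um)


def estimate_size_kb(measured_um, scale):
    """Convert a measured contour length (in µm) to an estimated size in kb."""
    return measured_um * scale


# Hypothetical ImageJ measurements of the reference plasmid (µm).
scale = kb_per_micrometer([1.2, 1.3, 1.4])
size = estimate_size_kb(1.0, scale)
```

Here the reference plasmid averages 1.3 µm, so 1 µm of contour length corresponds to about 4 kb.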
References
1. Ahn J-S, Woo S-O, Kim JH, Oh Y-S, Oak JH, Yum S-S. Optimization of RNA purification method from Ecklonia cava Kjellman (Laminariales, Phaeophyceae). ALGAE 19, 123-127 (2004).
2. Nurk S, et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. Journal of Computational Biology 20, 714-737 (2013).
3. Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLOS ONE 9, e112963 (2014).
4. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-359 (2012).
5. Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103-2110 (2016).
6. Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research 45, e18-e18 (2017).
7. Grabherr MG, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29, 644-652 (2011).
8. Boratyn GM, et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Research 41, W29-W33 (2013).
9. Boyen C, Leblanc C, Bonnard G, Grienenberger JM, Kloareg B. Nucleotide sequence of the cox3 gene from Chondrus crispus: evidence that UGA encodes tryptophan and evolutionary implications. Nucleic Acids Research 22, 1400-1403 (1994).
10. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35, 1026-1028 (2017).
11. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357-360 (2015).
12. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
13. Hoff K, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767-769 (2015).
14. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212 (2015).
15. Cho CH, et al. Genome-wide signatures of adaptation to extreme environments in red algae. Nature Communications 14, 10 (2023).
16. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006).
17. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152 (2012).
18. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30, 772-780 (2013).
19. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 89, 10915 (1992).
20. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32, 268-274 (2015).
21. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14, 587-589 (2017).
22. Zhang Z, et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochemical and Biophysical Research Communications 419, 779-781 (2012).
23. Zhang Z, Li J, Zhao X-Q, Wang J, Wong GK-S, Yu J. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics, Proteomics & Bioinformatics 4, 259-263 (2006).
24. Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Research 32, 11-16 (2004).
25. Lagesen K, Hallin P, Rødland EA, Stærfeldt H-H, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Research 35, 3100-3108 (2007).
26. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094-3100 (2018).
27. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572-4574 (2021).
28. Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods 18, 366-368 (2021).
29. Maréchal A, Brisson N. Recombination and the maintenance of plant organelle genome stability. New Phytologist 186, 299-317 (2010).
30. Mishra A, Saxena S, Kaushal A, Nagaraju G. RAD51C/XRCC3 facilitates mitochondrial DNA replication and maintains integrity of the mitochondrial genome. Molecular and Cellular Biology 38, e00489-00417 (2018).
31. Pannunzio NR, Watanabe G, Lieber MR. Nonhomologous DNA end-joining for repair of DNA double-strand breaks. Journal of Biological Chemistry 293, 10512-10523 (2018).
32. Odahara M. Factors affecting organelle genome stability in Physcomitrella patens (2020).
33. Blum M, et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Research 49, D344-D354 (2021).
34. Almagro Armenteros JJ, et al. Detecting sequence signals in targeting peptides using deep learning. Life Science Alliance 2, e201900429 (2019).
35. Richly E, Leister D. NUMTs in sequenced eukaryotic genomes. Molecular Biology and Evolution 21, 1081-1084 (2004).
36. Pamilo P, Viljakainen L, Vihavainen A. Exceptionally high density of NUMTs in the honeybee genome. Molecular Biology and Evolution 24, 1340-1346 (2007).
37. Whelan JA, Russell NB, Whelan MA. A method for the absolute quantification of cDNA using real-time PCR. Journal of Immunological Methods 278, 261-269 (2003).
38. Rasmussen R. Quantification on the LightCycler. In: Rapid Cycle Real-Time PCR: Methods and Applications (eds Meuer S, Wittwer C, Nakagawara K-I). Springer Berlin Heidelberg (2001).
39. Kim T, et al. Counting DNA molecules on a microchannel surface for quantitative analysis. Talanta 252, 123826 (2023).
40. Lee S, et al. DNA binding fluorescent proteins for the direct visualization of large DNA molecules. Nucleic Acids Research 44, e6-e6 (2016).
41. Park J, et al. Single-molecule DNA visualization using AT-specific red and non-specific green DNA-binding fluorescent proteins. Analyst 144, 921-927 (2019).
42. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9, 671-675 (2012).
43. Lee S, et al. Investigation of various fluorescent protein–DNA binding peptides for effectively visualizing large DNA molecules. RSC Advances 6, 46291-46298 (2016).