This archive contains data for the paper 'Adélie penguin population diet monitoring by analysis of food DNA in scats' and the scripts used to process that data.

The archive contains four directories:

1. AdelieDietPopulationResults
   The processed data. Each directory contains a combination of files for a population, a sex, or all scats analysed in the study. Each file with a name ending in _aggregated.txt is for an individual scat and is human-readable. Each population directory contains a 'population_summaries' directory, which has .csv files summarising the results, along with pie charts and bar charts of the same summaries.

2. FASTQfiles
   The raw data. These are the files produced by the IonTorrent.

3. SampleKeyFiles
   .csv spreadsheet-type files that list the forward and reverse primer tag combinations corresponding to each Adélie penguin scat sample in the FASTQ file with the same name prefix. For example, if you use the software in the last folder, 'SSUdietPipeline', to process the FASTQ file AAD49.FASTQ and use AAD49_samples.csv in that process, the final files will all be named correctly for each Adélie penguin scat included in the run.

4. SSUdietPipeline
   Scripts for processing the FASTQ files and producing the results in directory 1. These scripts run in Python on Linux and depend upon the following software:
     R
     RPY
     Usearch
     BLAST

The process requires these files:

1. A settings file. This contains paths to the other files listed below, plus other settings. It has to be edited so that the paths are all correct for the given system. The settings can be altered for different primer sets if required.
2. The custom indexes used to identify forward and reverse primers. These are in the files 'ftags' and 'rtags'.
3. A file with a list of taxa to aggregate sequences to. This can be derived from the 'training' mode output, which gives a list of all closest matches in the database and their taxonomy.
4. Files for a reference database formatted with EMBL taxonomy.
   The database used is included in the archive.
5. A renaming file, which is a .csv file listing the names of the samples and their forward + reverse primer index name combination. An example renaming file is included.

Usage:

Stage 1, 'training':
   python /home/SSUdietPipeline/PopulationDietSummaryMaster.py -mode training -in /home/simon/PenguinDietIonTorrentRuns/filename.fastq

Stage 2, 'summary':
   python /home/SSUdietPipeline/PopulationDietSummaryMaster.py -mode summary -in /home/simon/PenguinDiet/ -re /home/renamingFile.csv -taxa /home/agggregationTaxa
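
The two stages can also be assembled from a small Python helper, which may be convenient when processing several runs. The following is only a sketch, not part of the pipeline: the function names are hypothetical, the flags (-mode, -in, -re, -taxa) are taken from the usage lines, the summary stage is assumed to take -mode with a leading dash as the training stage does, and the '_samples.csv' suffix is inferred from the single AAD49 example.

```python
import shlex
from pathlib import PurePosixPath

# Path to the master script, copied from the usage lines in this README.
MASTER_SCRIPT = "/home/SSUdietPipeline/PopulationDietSummaryMaster.py"


def sample_key_name(fastq_path):
    """Guess the sample key file name for a FASTQ file.

    Assumption: key files share the FASTQ name prefix and end in
    '_samples.csv', as in the AAD49.FASTQ / AAD49_samples.csv example.
    """
    return PurePosixPath(fastq_path).stem + "_samples.csv"


def training_command(fastq_path, script=MASTER_SCRIPT):
    """Build the stage 1 ('training') argument list for one FASTQ file."""
    return ["python", script, "-mode", "training", "-in", fastq_path]


def summary_command(input_dir, renaming_csv, taxa_file, script=MASTER_SCRIPT):
    """Build the stage 2 ('summary') argument list.

    Assumption: -mode is written with a leading dash, as in stage 1.
    """
    return [
        "python", script,
        "-mode", "summary",
        "-in", input_dir,
        "-re", renaming_csv,
        "-taxa", taxa_file,
    ]


def as_shell_line(command):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # Print the commands rather than running them, so paths can be checked first.
    print(as_shell_line(training_command("/home/simon/PenguinDietIonTorrentRuns/filename.fastq")))
    print(as_shell_line(summary_command(
        "/home/simon/PenguinDiet/", "/home/renamingFile.csv", "/home/agggregationTaxa")))
```

Once the printed lines look right for the local system, either list could be passed to subprocess.run(command, check=True) to execute the stage.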