Hypermedia-based software architecture enables test-driven development
Data files
Oct 16, 2023 version files 3.82 KB
- README.md (2.54 KB)
- SLOC.csv (383 B)
- Test_coverage.csv (616 B)
- Test_timing.csv (277 B)
Abstract
Objectives:
Using agile software development practices, develop and evaluate a software architecture and implementation for reliable management of bioinformatic data that is stored in the cloud.
Materials and Methods:
CORE (Comprehensive Oncology Research Environment) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Optimal testing completeness is a tradeoff between code coverage and software development costs. We hypothesized this architecture would permit developing tests that can be executed repeatedly for all microservices, maximizing code coverage while minimizing effort.
Results:
After one-and-a-half years of development, the CORE Browser backend had 121 tests designed for repeated execution and 875 custom tests that were executed 3,031 times, providing 78% code coverage.
Discussion:
Links, the repeating pattern of hypermedia architecture, permit CORE Browser to implement tests that can be executed repeatedly by every microservice to achieve high code coverage. Code coverage correlates with software reliability. Other benefits of this architecture include permitting access to bucket data from outside the application and separating management of bioinformatic data from analysis.
Conclusion:
Architectural choices are important enablers of modern software development practices, such as TDD. Updating software architecture may be a critical next step in agile transformation after an engineering team implements the structural changes on which most such transformations focus.
Keywords:
High-throughput nucleotide sequencing, software, data management, cloud computing.
https://doi.org/10.5061/dryad.pvmcvdnrv
This data submission includes the automated testing results described in the manuscript:
- Test_coverage.csv: per-component and overall test coverage described in the Results and Table 2.
- Test_timing.csv: per-component and overall wall clock time to run the tests, averaged from 5 trials.
- SLOC.csv: per-component and overall source lines of code (SLOC) in the project.
Test coverage was computed as the number of statements executed divided by the total number of statements present in the component's source code directory, using cov version 4.0.0, pytest 7.2.1, and Python 3.10.10 on Windows 10.
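As a rough illustration of this definition (not the tooling actually used), statement coverage can be sketched as executed statements divided by total statements. The function name `coverage_percent` and the sample counts are hypothetical and are not taken from the dataset.

```python
def coverage_percent(executed_statements: int, total_statements: int) -> float:
    """Return statement coverage as a percentage of total statements."""
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")
    if not 0 <= executed_statements <= total_statements:
        raise ValueError("executed_statements must be between 0 and total_statements")
    return 100.0 * executed_statements / total_statements


# Hypothetical counts for one component, not values from Test_coverage.csv.
print(f"{coverage_percent(150, 200):.1f}%")
```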
SLOC was calculated using pygount 1.5.1 on Windows 10 as the number of lines in the text of each component's source code excluding comment lines.
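The idea of counting source lines while excluding comment lines can be sketched as follows. This is a deliberately simplified stand-in, not pygount's algorithm: it assumes `#`-style comments, treats only whole-line comments as comments, and additionally skips blank lines (an assumption). The function name `count_sloc` and the sample source are hypothetical.

```python
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor comment-only."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are not counted
        if stripped.startswith(comment_prefix):
            continue  # whole-line comment
        count += 1  # code line (a trailing comment does not exclude it)
    return count


# Hypothetical source text used only to exercise the function.
sample = '''# module comment
import os

def main():
    # say hello
    print("hello")  # trailing comment, still a code line
'''
print(count_sloc(sample))
```

A language-aware tool is preferable for real measurements, since a prefix check like this one misclassifies cases such as docstrings or comment markers inside string literals.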
Description of the data and file structure
Test_coverage.csv
- Component: the name of the software component (described in Table 1 of the manuscript).
- Storage: the type of data storage used by the component, allowing comparisons of test run time by type of data storage. N/A means not applicable: the component uses no storage.
- # Unit Tests: the number of unit tests executed for the component.
- # Integration Tests: the number of integration tests executed for the component.
- # Tests: the total number of tests for the component.
- % Coverage: the percentage of statements in the component's source code directory that were executed by the tests.
The file also contains totals for each numerical column and the overall average test coverage.
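A reader who wants to sanity-check the per-component counts could do so along these lines. The sketch assumes that # Tests equals # Unit Tests plus # Integration Tests, which the column descriptions suggest but do not state outright. The sample rows are hypothetical, and the totals and average rows in the real file would need to be skipped before parsing.

```python
import csv
import io

# Hypothetical rows using the column names described above; not real data.
SAMPLE = """Component,Storage,# Unit Tests,# Integration Tests,# Tests,% Coverage
service-a,S3,10,5,15,80
service-b,N/A,4,0,4,70
"""


def inconsistent_components(csv_text: str) -> list[str]:
    """Return components whose unit + integration counts differ from # Tests."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        unit = int(row["# Unit Tests"])
        integration = int(row["# Integration Tests"])
        total = int(row["# Tests"])
        if unit + integration != total:
            bad.append(row["Component"])
    return bad


print(inconsistent_components(SAMPLE))
```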
Test_timing.csv
- Trial #: the trial number.
- Time elapsed (secs): elapsed time to run all tests in seconds.
- Time elapsed (mins): elapsed time to run all tests in minutes.
Below the trial rows, the file also contains summary statistics for elapsed time in minutes: mean, standard deviation (stdev), alpha, sample size (number of trials), confidence value, min, and max. Summary statistics for elapsed time in seconds were not calculated and are listed as N/A (not applicable).
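The summary statistics listed above could be reproduced with a sketch like the one below. It assumes that the confidence value is the half-width of a normal-based confidence interval (as spreadsheet CONFIDENCE functions compute it), that stdev is the sample standard deviation, and that alpha is 0.05. None of these choices is confirmed by this description, and the trial times shown are hypothetical.

```python
import math
import statistics


def summarize(minutes: list[float], alpha: float = 0.05) -> dict[str, float]:
    """Compute summary statistics for elapsed times in minutes."""
    n = len(minutes)
    mean = statistics.mean(minutes)
    stdev = statistics.stdev(minutes)  # assumption: sample standard deviation
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    confidence = z * stdev / math.sqrt(n)  # half-width of the interval
    return {
        "mean": mean,
        "stdev": stdev,
        "alpha": alpha,
        "sample size": n,
        "confidence value": confidence,
        "min": min(minutes),
        "max": max(minutes),
    }


# Hypothetical elapsed times for 5 trials, not values from Test_timing.csv.
trials = [10.0, 12.0, 11.0, 13.0, 14.0]
for name, value in summarize(trials).items():
    print(f"{name}: {value:.3f}")
```

With only 5 trials, a t-based interval would be wider than this normal-based one, so the result is sensitive to which method produced the value in the file.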
SLOC.csv
- Component: the name of the software component (described in Table 1 of the manuscript).
- # Custom tests: the number of automated tests specific to the component (as opposed to the shared tests described in the manuscript).
- SLOC: the number of lines of text in the component's source code directory excluding comment lines.
The file also contains totals for each numerical column.
