
Research funding for male reproductive health and infertility in the UK and USA [2016 – 2019]

Cite this dataset

Gumerova, Eva; De Jonge, Christopher; Barratt, Christopher (2022). Research funding for male reproductive health and infertility in the UK and USA [2016 – 2019] [Dataset]. Dryad.


There is a paucity of data on research funding levels for male reproductive health (MRH). We investigated the research funding for MRH and infertility by examining publicly accessible web databases from UK and USA government funding agencies. Information on the funding was collected from the UKRI-GTR, the NIHR’s Open Data Summary, and the USA’s NIH RePORT web databases. Funded projects between January 2016 and December 2019 were recorded and funding support was divided into three research categories: (i) male-based; (ii) female-based; and (iii) not-specified. Between January 2016 and December 2019, UK agencies awarded a total of £11,767,190 to 18 projects for male-based research and £29,850,945 to 40 projects for female-based research. There was no statistically significant difference in the median funding grant awarded between the male-based and female-based categories (p=0.56, W=392). The USA NIH funded 76 projects totalling $59,257,746 for male-based research and 99 projects totalling $83,272,898 for female-based research. Again, there was no statistically significant difference in the median funding grant awarded between the two research categories (p=0.83, W=3834). This is the first study examining funding granted by the main government research agencies of the UK and USA for MRH. These results should stimulate further discussion of the challenges of tackling male infertility and reproductive health disorders and formulating appropriate investment strategies.


Experimental Design:

Publicly accessible UK Research and Innovation (UKRI), National Institute for Health Research (NIHR), and National Institutes of Health (NIH) funding agency databases covering awards from January 2016 to December 2019 were examined (see Supplementary Table 1). Following the inclusion and exclusion criteria outlined in Supplementary Tables 2 and 3, funding data were collected on research proposals investigating infertility and reproductive health. For simplicity, these are referred to collectively as ‘infertility research’. As the primary focus of this research is on infertility, the data were divided into three main categories: (i) male-based, (ii) female-based, and (iii) not-specified (Supplementary Table 2). The first two groups covered projects whose primary aim, based on the information presented in the research abstracts, timeline summaries and/or impact statements, was male- or female-focussed. “Not-specified” includes research projects that either did not specify a primary focus on male or female, or explicitly stated a focus on both. The process was conducted and reviewed by E.G. with C.L.R.B. Total funding for all three groups, funding over time, and comparison with overall funding for a particular agency were examined. Briefly, E.G. retrieved the primary data and produced the first set of data for discussion with C.L.R.B. Both went through the complete list, discussed each study/project, and decided (a) whether it should be included, and (b) which category it fell under (male-based, female-based, or not-specified). The abstracts, which were almost always available and provided by each research study, were all examined and scrutinised by E.G. and C.L.R.B. together. Clear disagreement between E.G. and C.L.R.B. was very rare; where it occurred, the project was not included.

UK Data Collection:

From April 2018 the UK research councils, Innovate UK, and Research England are reported under one organization, the UKRI (2019). The councils independently fund research projects according to their respective visions and missions; however, until 2018/19, their annual funding expenditures were reported under the UKRI’s annual reports and budgets. The UKRI’s Gateway to Research (UKRI-GTR) web database allows users to analyse the information provided on taxpayer-funded research. Relevant search terms such as “male infertility” or “female reproductive health” (see Supplementary Table 2) were applied with appropriate database filters (Supplementary Table 1). The project award relevance was determined by assessing the objectives in project abstracts, timeline summaries, and planned impacts. Supplementary Tables 1, 2 and 3 provide the search filters and the reference criteria for inclusion/exclusion utilized for analysis. The UKRI-GTR provides the total funding granted to the projects within a designated period.

Data obtained from the NIHR had minor differences. The NIHR has 6 datasets. The Open Data Summary View dataset was used as it provided details on funded projects, grants, summary abstracts, and project dates. As with the UKRI data, specific search terms and filters were applied to the NIHR Excel datasheet to sift out irrelevant projects (Supplementary Tables 1-3).

The UKRI councils and NIHR report their annual expenditure and budgets for 1st April to 31st March. Thus, each project falls under the funding period in which its research activities begin (e.g. if a project’s research activities run from May 20th, 2017, to March 20th, 2019, the project is categorized under the funding period 2017/18). The projects collected began their investigations between January 2016 and December 2019, therefore 5 consecutive funding periods were examined (2015/16, 2016/17, 2017/18, 2018/19, and 2019/20). The UK data collection period ran from October 2020 to December 2020.
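The funding-period rule can be sketched in a few lines. This is an illustrative Python snippet, not the authors' code (their analysis was done in RStudio); the function name `uk_funding_period` is hypothetical, and the only rule it encodes is the 1st April to 31st March period stated in the text.

```python
from datetime import date


def uk_funding_period(start: date) -> str:
    """Return the April-to-March funding period containing `start`, e.g. '2017/18'.

    Hypothetical helper: a project is assigned to the period in which
    its research activities begin.
    """
    # Dates from January to March belong to the period that opened the previous April.
    first_year = start.year if start.month >= 4 else start.year - 1
    return f"{first_year}/{(first_year + 1) % 100:02d}"


# The example from the text: activities beginning on May 20th, 2017.
print(uk_funding_period(date(2017, 5, 20)))  # 2017/18
```

Applied to the earliest and latest start dates in scope (January 2016 and December 2019), the same rule yields 2015/16 and 2019/20, which is why five funding periods are covered.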

USA Data Collection:

The NIH has a Research Portfolio Online Reporting Tools (RePORT) site providing access to their research activities, such as previously funded research, active research projects, and information on NIH’s annual expenditures. The RePORT-Query database has similar features to the UKRI-GTR and NIHR databases, such as providing information on project abstracts, research impact, start and end dates, funding grants, and type of research. As with the UK data collection, appropriate search terms were entered with the database filters applied, following the same inclusion and exclusion criteria (Supplementary Tables 1, 2, and 3).

The UK and US agencies present data on funded research under different calendar and funding periods because the US’ federal tax policy requires federal bodies to report all funding expenses under a fiscal year (FY). The NIH’s FY follows a calendar period from October 1st to September 30th (e.g., FY2016 comprises funding activity from October 1st, 2015, to September 30th, 2016). Projects running over one calendar period are reported several times under consecutive fiscal years and the funds are divided according to the annual period of the project’s activity.
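As a small illustration of the fiscal-year convention, the Python sketch below maps a date to its NIH fiscal year and lists the fiscal years a project spans. The function names are hypothetical and this is not the authors' code; it only encodes the October 1st to September 30th rule described above.

```python
from datetime import date


def nih_fiscal_year(d: date) -> int:
    """Return the NIH fiscal year containing `d` (FY runs October 1st to September 30th)."""
    # October to December already belong to the next numbered fiscal year.
    return d.year + 1 if d.month >= 10 else d.year


def fiscal_years_spanned(start: date, end: date) -> list:
    """List every fiscal year in which a project running from `start` to `end` is active."""
    return list(range(nih_fiscal_year(start), nih_fiscal_year(end) + 1))


# October 1st, 2015 is the first day of FY2016.
print(nih_fiscal_year(date(2015, 10, 1)))  # 2016
```

A project active across several of these years would appear once per fiscal year in the database, with its funds split accordingly.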

During data collection, 74 projects were found to be active with incomplete funding sums, as the NIH divides the grants according to the budgeting period of every FY. The NIH is in the process of granting funds for FY2021, so projects ending in 2020 or 2021 provide a complete funding sum. For the active projects ending after 2021, incomplete funding data are provided. It is assumed the funding will increase in value by the time the research ends, but the final awarded sum is unknown. To remain consistent with the UK data, the funding granted to each project is totalled as one figure and recorded under the FY in which the project first began research, whether the project is active or completed. Thus US funding is referred to as “Current Total Funding”. In the RePORTER database, the NIH presents the same research project multiple times, once for every funded fiscal year, with consecutive project reference IDs. Therefore, for simplicity, we only included the first project reference ID. For more information on deciphering NIH's project IDs, see For the USA, the initial data collection period ran from October 2020 to December 2020 but then restarted for a brief period in January 2021 to add up the remaining funding values for some of the active research projects.
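The "Current Total Funding" bookkeeping can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' procedure: the record keys (`study`, `project_id`, `fiscal_year`, `amount`), the placeholder IDs, and the amounts are all hypothetical and are not taken from the dataset or from the NIH's real ID format.

```python
def current_total_funding(records):
    """Collapse per-fiscal-year records into one entry per study.

    Each record is a dict with the hypothetical keys 'study', 'project_id',
    'fiscal_year' and 'amount'. The result keeps the project ID and fiscal
    year of the earliest record and the sum of all amounts reported so far.
    """
    totals = {}
    for rec in records:
        entry = totals.get(rec["study"])
        if entry is None:
            totals[rec["study"]] = {
                "project_id": rec["project_id"],
                "first_fy": rec["fiscal_year"],
                "total": rec["amount"],
            }
            continue
        entry["total"] += rec["amount"]
        # Keep the ID belonging to the first funded fiscal year.
        if rec["fiscal_year"] < entry["first_fy"]:
            entry["first_fy"] = rec["fiscal_year"]
            entry["project_id"] = rec["project_id"]
    return totals


# Made-up records: one study reported in three fiscal years, one in a single year.
example = [
    {"study": "A", "project_id": "A-02", "fiscal_year": 2018, "amount": 250_000},
    {"study": "A", "project_id": "A-01", "fiscal_year": 2017, "amount": 300_000},
    {"study": "A", "project_id": "A-03", "fiscal_year": 2019, "amount": 200_000},
    {"study": "B", "project_id": "B-01", "fiscal_year": 2019, "amount": 120_000},
]
summary = current_total_funding(example)
```

For a project that is still active, the total produced this way is only the sum reported to date, which matches the caveat that the final awarded figure is unknown.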

Data Analysis:

The data were divided into three main groups and organized by the funding period or FY in which the project was first awarded. RStudio (Version 1.3.1093) was used for the data analysis. Box-and-whisker plots are presented with rounded P-values. Kruskal-Wallis and Wilcoxon Rank Sum tests were performed to assess statistical significance. Because the data were independently collected and were not assumed to follow a normal distribution, these rank-based, non-parametric tests were used.
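To make the rank-based comparison concrete, here is a minimal Python sketch of a two-sided Wilcoxon Rank Sum test using only the standard library. It is not the authors' R code. It reports W as the rank sum of the first group minus n1(n1+1)/2, the convention used by R's `wilcox.test`, and computes the p-value from the normal approximation with tie and continuity corrections; which settings the authors actually used (for example exact versus approximate p-values) is not stated in the source, and the function names and sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, two-sided p) for a Wilcoxon Rank Sum test of x against y."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    diff = w - n1 * n2 / 2
    if diff == 0:
        return w, 1.0

    # Variance of W under the null hypothesis, reduced when values are tied.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))

    z = (diff - math.copysign(0.5, diff)) / sigma  # continuity correction
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, min(p, 1.0)


# Made-up grant sums for two categories (not values from the dataset).
male = [120_000, 450_000, 300_000, 90_000]
female = [200_000, 510_000, 150_000, 275_000, 640_000]
w_stat, p_value = rank_sum_test(male, female)
```

With small samples and no ties, R defaults to an exact p-value, so this approximation is expected to differ slightly from R's output in that case.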

Research Project Details Included in the Collection Datasets:

For both the UK and USA data, we included the following details:

  • The project (or study) titles
  • The Project IDs (also referred to as Project Reference or Project Number) 
  • The project Start and End Dates
  • The project's Status (identified by the end dates or if explicitly stated in the database)
  • The Funding Organisation (for the UK) and Admin Institute (for the USA) that are funding the research
  • The project Category (i.e. Research Grants or Fellowships)
  • The Amount Granted (for the USA, the funding values were summed up to the most recent awarding date). 

Rearranging/Processing Data for Analysis:

After data collection was completed, the data were processed into a simpler format in Notepad so that the statistical analyses could be performed in RStudio. Only the essential details were included, organised so that RStudio would recognise and analyse the information effectively and efficiently. The project Type (male, female or not-specified), the funding sum for the respective research project Type, and the funding period (UK) / FY (USA) were included. These details were then arranged appropriately to produce box-and-whisker plots with P-values, perform the chosen statistical tests, and produce the data statistics in RStudio. As mentioned earlier, the funding periods/fiscal years were added following the timeframes set out by the respective countries.

Usage notes

When going through the Excel sheets:

  • For the UK, the data collection was divided between UKRI-GTR and NIHR as they used different databases and reported different Project ID/Reference values for an individual to look up. The funding awarded for UKRI project MR/N022556/1 is for a Core centre rather than an individual project. E.G. and C.L.R.B. decided to include it because the funding received to run the MRC core centre supports active projects run by the group of researchers and thus goes into furthering infertility and reproductive health research.
  • For the USA, an additional column was added: The Last Fiscal Year of Funding Reported. Since the data collection period ran from October 2020 to December 2020, and for a brief time in January 2021, several projects are active and have incomplete funding because of how the US reports funding activity and divides its Fiscal Year. Thus, the projects found to be 'active' at the time of data collection and analysis (between December 2020 and early February 2021) had their most recent awarding fiscal year added (as shown in the RePORTER database). Once the data analysis was performed, the datasheet was revisited to check whether the currently active projects had updated their funding. We only included the first project reference ID of the research study.

The funding periods/fiscal years were added to the scripts used for the data analysis.

The .txt files were used in the statistical analyses and plot production. The files were imported into RStudio using Import Dataset with the "From Text (readr)" option. When importing all .txt files, the Delimiter was set to Tab.

For the UKdata_for_plots and USdata_for_plots files, the columns were set to be the following:

  • 'Type' was set as Factor and was listed as: 'Male-based, Female-based, Not-Specified'
  • The 'Funding Period' / 'Fiscal Year' was set as Character
  • The 'Funds' was set as Numeric

For the UKdata_for_stats_tests and USdata_for_stats_tests .txt files, the three columns (male, female, and not-specified) were all set as Numeric.
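For readers working outside RStudio, the plot files can be read as ordinary tab-delimited text. The Python sketch below is an illustration under stated assumptions: the header names ('Type', 'Funding Period', 'Funds') are taken from the column descriptions above but their exact spelling in the files is assumed, the sample rows are made up, and the helper names are hypothetical.

```python
import csv
import io
from statistics import median

# The three factor levels listed for the 'Type' column.
TYPE_LEVELS = ("Male-based", "Female-based", "Not-Specified")

# Made-up rows in the assumed layout; not values from the dataset.
SAMPLE = (
    "Type\tFunding Period\tFunds\n"
    "Male-based\t2016/17\t100000\n"
    "Male-based\t2017/18\t200000\n"
    "Female-based\t2016/17\t350000\n"
)


def load_plot_rows(handle):
    """Read tab-delimited rows, keeping the period as text and the funds as a number."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Type"] not in TYPE_LEVELS:
            raise ValueError(f"unexpected Type: {row['Type']!r}")
        rows.append(
            {
                "Type": row["Type"],
                "Funding Period": row["Funding Period"],  # kept as character data
                "Funds": float(row["Funds"]),
            }
        )
    return rows


def median_by_type(rows):
    """Return the median of 'Funds' for each Type present in the rows."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["Type"], []).append(row["Funds"])
    return {kind: median(funds) for kind, funds in grouped.items()}


rows = load_plot_rows(io.StringIO(SAMPLE))
medians = median_by_type(rows)
```

To read a real file, an opened file handle such as `open("UKdata_for_plots.txt", newline="")` would take the place of the `io.StringIO` sample, assuming the file carries that name and a header row.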