Key generic technology prediction in patent citation using graph neural networks
Data files
Jan 11, 2024 version files (11.16 MB total)
- A)Table_of_key_generic_indicators_for_nodes_(partial_1).csv (2.31 MB)
- B)Table_of_key_generic_indicators_for_nodes_(partial_2).csv (4.08 MB)
- C)patent.content (3.63 MB)
- D)patent.cites (1.13 MB)
- E)Graph_neural_network_modeling_highest_accuracy_for_different_dimensions.csv (99 B)
- F)Prediction_effects_of_key_generic_technologies.csv (243 B)
- README.md (4.34 KB)
Abstract
With the rapid advancement of the Fourth Industrial Revolution, international competition in technology and industry is intensifying. However, in the era of big data and large-scale science, accurately identifying key technology areas and innovation trends has become exceptionally difficult. This paper constructs a patent indicator evaluation system based on the key and generic dimensions of patent citation, integrates graph neural network modeling to predict key generic technologies, and confirms the effectiveness of the method using the field of genetic engineering as an example. According to the LDA topic model, the main technical R&D directions in genetic engineering are genetic analysis and detection technologies, the application of microorganisms in industrial production, virology research involving vaccine development and immune responses, high-throughput sequencing and analysis technologies in genomics, targeted drug design and molecular therapeutic strategies, and genetically modified crop improvement. The accuracy of predicting key generic technologies with graph neural networks reaches 97.67%. Drawing on patent citation theory and graph neural network models, this paper considers both the structural and technical attributes of cited patents, providing theoretical and empirical evidence for technology prediction and offering both theoretical and practical value.
README: Key generic technology prediction in patent citation using graph neural networks
This README file was generated on 2023-11-25 by Mingli Ding.
GENERAL INFORMATION
- Author Information (Investigators' Contact Information): Name: Mingli Ding; Wangke Yu; Shuhua Wang. Institution: Jingdezhen Ceramic University. Address: Jingdezhen, Jiangxi, China. Email: mlding1@163.com
- Date of data collection: 2013-2022
DATA & FILE OVERVIEW
- File List:
A) Table of key generic indicators for nodes (partial 1).csv
B) Table of key generic indicators for nodes (partial 2).csv
C) patent.content
D) patent.cites
E) Graph neural network modeling highest accuracy for different dimensions.csv
F) Prediction effects of key generic technologies.csv
DATA-SPECIFIC INFORMATION FOR: Table of key generic indicators for nodes (partial 1).csv
- Number of variables: 10
- Number of cases/rows: 72489
- Variable List:
- technical coverage: number of national economic classifications
- patent families: number of patent families
- patent family citation: patent family average annual citation frequency
- patent cooperation: whether there are more than two applicants who have jointly applied for a patent.
- enterprise-enterprise cooperation: whether more than two enterprises have jointly applied for a patent.
- industry-university-research cooperation: whether any enterprises have applied for a patent jointly with universities or research institutions.
- claims: number of claims
- citation frequency: average annual citation frequency
- layout countries: number of layout countries
- patent age: age of patents
DATA-SPECIFIC INFORMATION FOR: Table of key generic indicators for nodes (partial 2).csv
- Number of variables: 10
- Number of cases/rows: 72489
- Variable List:
- technical convergence: number of secondary IPC (International Patent Classification) codes
- cited countries: number of cited countries
- inventors: number of inventors
- citations: number of forward citations
- homologous countries/areas: number of homologous countries/areas
- degree centrality: the extent to which a node is directly connected to the other nodes in the network; the simplest indicator of a node's influence.
- closeness centrality: how close a node is to all other nodes in the network, indicating whether it sits at the core of the technological network.
- betweenness centrality: how often a node lies on the shortest paths between other nodes; a high value characterizes a node with strong control over resources in the network.
- eigenvector centrality: a measure of node importance based on the idea that a node's centrality is a function of the centrality of its neighboring nodes.
- PageRank: an algorithm originally used to rank the importance of web pages; it assigns each node a positive real number indicating its importance, and these values together form a vector.
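As a rough illustration of two of the network indicators listed above, the following sketch computes degree centrality and PageRank on a toy citation network using only the Python standard library. The patent IDs, the damping factor of 0.85, and the iteration count are assumptions made for the example; the README does not state which software or parameters were used to produce the values in the dataset, and a graph library would normally be used instead.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality: number of distinct neighbours divided by (n - 1)."""
    neighbours = defaultdict(set)
    for cited, citing in edges:
        neighbours[cited].add(citing)
        neighbours[citing].add(cited)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration; each citing patent passes rank to the patents it cites."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: set() for node in nodes}
    for cited, citing in edges:
        out_links[citing].add(cited)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by patents that cite nothing is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for citing, targets in out_links.items():
            for cited in targets:
                new_rank[cited] += damping * rank[citing] / len(targets)
        rank = new_rank
    return rank

# Toy network (hypothetical IDs): P1 is cited by P2, P3 and P4; P2 is cited by P3.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
dc = degree_centrality(toy_edges)
pr = pagerank(toy_edges)
```

In this toy network the most-cited patent, P1, is connected to every other node and also receives the largest PageRank score.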
DATA-SPECIFIC INFORMATION FOR: patent.content
- Number of variables: 22
- Number of cases/rows: 72489
- Variable List:
- ID: ID number of patents
- variables 2-21: the same as the 20 variables in file A) and file B)
- label: CORE (the patent is key generic technology) or NON (the patent is not key generic technology).
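A minimal sketch of reading one patent.content record, assuming the 22 fields (ID, 20 indicators, label) are separated by whitespace and that the indicators are stored as integers. The delimiter, the integer encoding, the function name `parse_content_line`, and the example record are all assumptions for illustration and should be checked against the actual file.

```python
def parse_content_line(line, n_features=20):
    """Split one patent.content record into (patent_id, features, label)."""
    fields = line.split()  # assumption: whitespace-delimited fields
    if len(fields) != n_features + 2:
        raise ValueError(f"expected {n_features + 2} fields, got {len(fields)}")
    patent_id, *raw_features, label = fields
    if label not in {"CORE", "NON"}:
        raise ValueError(f"unexpected label: {label!r}")
    return patent_id, [int(value) for value in raw_features], label

# Hypothetical record: an invented ID, 20 indicator values, and a class label.
example = "CN000000001A " + " ".join(["1", "0"] * 10) + " CORE"
patent_id, features, label = parse_content_line(example)
```

Validating the field count and the label up front makes a wrong delimiter assumption fail loudly instead of silently producing misaligned features.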
DATA-SPECIFIC INFORMATION FOR: patent.cites
- Number of variables: 2
- Number of cases/rows: 72489
- Variable List:
- source: the ID number of the cited patent
- target: the ID number of citing patent
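The two columns above can be turned into a directed citation graph. The sketch below assumes one whitespace-separated "cited citing" pair per line and no header row; both are assumptions about the file layout, and `read_citations` and the sample IDs are hypothetical names for illustration.

```python
from collections import defaultdict

def read_citations(lines):
    """Return {citing_id: set of cited_ids} from lines of "cited citing" pairs."""
    cites = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        cited, citing = fields  # source column = cited patent, target column = citing patent
        cites[citing].add(cited)
    return dict(cites)

# Hypothetical sample: P1 is cited by P2 and P3, and P2 is cited by P3.
sample = ["P1 P2", "P1 P3", "P2 P3", ""]
graph = read_citations(sample)
```

For the real file, the same function could be called on an open file handle, for example `read_citations(open("patent.cites"))`.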
DATA-SPECIFIC INFORMATION FOR: Graph neural network modeling highest accuracy for different dimensions.csv
- Number of variables: 4
- Number of cases/rows: 3
- Variable List:
- dimensions of graph network: 4
- dimensions of graph network: 8
- dimensions of graph network: 12
- dimensions of graph network: 16
DATA-SPECIFIC INFORMATION FOR: Prediction effects of key generic technologies.csv
- Number of variables: 10
- Number of cases/rows: 3
- Variable List:
- epochs: 100
- epochs: 200
- epochs: 300
- epochs: 400
- epochs: 500
- epochs: 600
- epochs: 700
- epochs: 800
- epochs: 900
- epochs: 1000
Methods
These datasets were obtained from the Incopat patent database and cover cited patents (2013-2022) in the field of genetic engineering.
Details for the datasets are provided in the README file.
This directory contains the selection of the patent datasets.
1) Table of key generic indicators for nodes (partial 1).csv
This file consists of 10 indicators of patents: technical coverage, patent families, patent family citation, patent cooperation, enterprise-enterprise cooperation, industry-university-research cooperation, claims, citation frequency, layout countries, and patent age.
2) Table of key generic indicators for nodes (partial 2).csv
This file consists of 10 indicators of patents: technical convergence, cited countries, inventors, citations, homologous countries/areas, degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and PageRank.
3) patent.content
The content file contains descriptions of the patents in the following format: <ID_number> <technical_attributes> + <class_label>. The first entry in each line contains the unique string ID number of the patent, followed by binary values indicating whether the patent's value for each indicator exceeds the average of that indicator (1) or not (0). Finally, the last entry in the line contains the class label of the patent.
4) patent.cites
Each line contains two patent ID numbers. The first entry is the ID number of the patent being cited and the second entry is the ID number of the patent that contains the citation. The direction of the link is from right to left: if a line reads "patent1 patent2", then the link is "patent2->patent1".
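The binary encoding used in patent.content (1 when a patent's value exceeds the average of the corresponding indicator, 0 otherwise) can be sketched as follows. The function name `binarize_by_mean` and the raw values are hypothetical, and the handling of values exactly equal to the average (encoded as 0 here) is an assumption, since the description only says "exceeds".

```python
from statistics import mean

def binarize_by_mean(rows):
    """Map each value to 1 if it exceeds its column mean, else 0."""
    columns = list(zip(*rows))
    averages = [mean(column) for column in columns]
    return [
        [1 if value > avg else 0 for value, avg in zip(row, averages)]
        for row in rows
    ]

# Hypothetical raw values for three patents and two indicators (claims, inventors).
raw = [[10, 2], [20, 4], [30, 9]]
encoded = binarize_by_mean(raw)
```

Here the column averages are 20 and 5, so only the third patent exceeds the average on both indicators.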
5) Graph neural network modeling highest accuracy for different dimensions.csv
This file shows the best accuracies of the GCN, SAGE, and GAT models at different graph network dimensions (4, 8, 12, and 16).
6) Prediction effects of key generic technologies.csv
This file shows the accuracies of the GCN, SAGE, and GAT models at different numbers of training epochs (100 to 1000, in steps of 100).
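To give a sense of what a GCN does with the node indicators and the citation links, here is a minimal sketch of a single graph convolution step using the standard symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python. It is not the authors' model: the toy graph, the feature values, the fixed weights, and the reading of "dimension" as the width of the layer's output are all assumptions made for illustration.

```python
import math

def gcn_layer(adjacency, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization by node degree.
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)] for i in range(n)]

    def matmul(a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]

# Toy undirected graph with 3 nodes and edges 0-1 and 1-2 (hypothetical).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 input indicators per node
hidden_dim = 4  # assumption: "dimension" means the layer's output width
weights = [[0.1 * (i + j + 1) for j in range(hidden_dim)] for i in range(2)]
output = gcn_layer(adjacency, features, weights)
```

Each row of `output` is a 4-dimensional representation of one node that mixes its own indicators with those of its neighbors; in a trained model the weights would be learned over the epochs rather than fixed.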