Use of natural language processing tools for the analysis and development of scientific engineering articles

Authors

DOI:

https://doi.org/10.15381/risi.v17i2.29906

Keywords:

Scientific articles, key bigrams, key phrases, natural language, text summarizing, text similarity

Abstract

Tools that extract useful information from scientific texts without requiring a full reading are valuable, for example when determining the topics covered in scientific articles, establishing an author's line of research from their publications, or drafting an article's summary and keywords. The objective of this research is therefore to use natural language processing techniques to extract useful information from scientific engineering articles. The twenty-two articles published by one author serve as the base corpus for the analysis, which is divided into a general analysis of all the articles and a particular analysis of each published article. The general analysis yielded the key words and bigrams: the most frequent words were “data”, “energy”, and “model”, and the most important bigrams were “solar photovoltaic”, “explanatory variables”, and “renewable energies”. The particular analysis showed that the hierarchical bigrams for each article are a good approximation of its keywords. It also showed a high similarity between the summaries generated with natural language techniques for the articles published in 2024 and those articles' own summaries, with the summary generated by GPT2 showing the highest similarity. The key phrases obtained with SGRank made it possible to determine the topic of the respective articles.
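
The abstract does not say which libraries or similarity measure were used, so the following is only a minimal sketch of two of the ideas it describes: ranking frequent words and bigrams, and scoring how similar two texts are. It uses the Python standard library only. The function names, the tiny stop-word list, and the sample sentence are illustrative assumptions, and cosine similarity over word counts is a stand-in chosen for simplicity. It does not reproduce the GPT2 summarization or the SGRank key-phrase extraction.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real analysis
# would use a much fuller list for the language of the corpus).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "with", "on", "are", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_words(text, n=3):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def top_bigrams(text, n=3):
    """Return the n most frequent pairs of adjacent tokens with their counts.

    Pairs are formed after stop-word removal, so a bigram may span
    a removed stop word or a sentence boundary.
    """
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:])).most_common(n)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the analyzed articles.
    sample = "Solar photovoltaic energy data. Solar photovoltaic model data."
    print(top_words(sample))
    print(top_bigrams(sample))
    print(cosine_similarity(sample, "Photovoltaic energy model"))
```

Applied to an article's generated summary and its published abstract, a score like `cosine_similarity` gives one rough way to compare them; the comparison reported in the article may rest on a different measure.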

Published

2024-12-31

Issue

Section

Articles

How to Cite

[1]
“Use of natural language processing tools for the analysis and development of scientific engineering articles”, Rev.Investig.sist.inform., vol. 17, no. 2, pp. 5–15, Dec. 2024, doi: 10.15381/risi.v17i2.29906.