Introduction
In the rapidly evolving landscape of scientific research, bibliometric analysis has
become an indispensable tool for understanding the structure, impact, and development
of academic knowledge.[1] With the continuous expansion of global research output, traditional methods of
literature review and academic assessment often fall short in providing comprehensive
insights. Bibliometric analysis, through its systematic approach, enables researchers,
policymakers, and institutions to evaluate the progression of research topics.[2]
Whether analyzing the evolution of a research topic, measuring institutional research
output, assessing journal impact, or evaluating an author's contribution, bibliometric
methods provide objective and reproducible insights.[3] Some databases provide such analyses on their Web sites. For example, in
Web of Science (WOS), insights about citations, top authors, or institutions can be
obtained in a few clicks. In addition to WOS, Scopus also provides bibliometric analytics
on its Web site.[4] However, these are subscription-based databases, and researchers in many resource-limited settings
may have difficulty accessing them.[5] In contrast, PubMed is a freely accessible database of biomedical scientific literature.[6]
With this background, this article provides a brief technical guide on how to conduct
a bibliometric analysis of a topic, an institution, a journal, and an author using
data obtained from PubMed and analyzed in Biblioshiny and VOSviewer.[7]
Materials and Methods
Tools Required
For bibliometric analysis, researchers need two basic tools: Biblioshiny and VOSviewer.
The installation steps described here are for a Windows PC; the steps on an Apple
Macintosh computer are broadly similar.
Biblioshiny: To run Biblioshiny, researchers need to install R and Rstudio Desktop.
R is a powerful language and environment specifically designed for statistical computing
and graphics. It offers a comprehensive suite of statistical and graphical techniques,
making it widely used among statisticians, data analysts, and researchers.[8] Rstudio is an integrated development environment designed specifically for R, aimed
at enhancing the experience of statistical computing and data analysis.[9]
[Fig. 1] shows Rstudio, which provides a user-friendly interface that includes features
such as a script editor, console, workspace viewer, and tools for plotting and debugging,
making it easier to write and manage R code.
Fig. 1 Screenshot of Rstudio software showing the console on the lower left, where code
is written, and the Packages tab on the lower right, from which various packages can
be installed.
Biblioshiny is a Web-based application that provides a user-friendly graphical interface
for performing bibliometric analysis using the Bibliometrix R package.[10] Designed to simplify complex bibliometric workflows, Biblioshiny allows users to
import, analyze, and visualize scientific publication data without the need for coding.[7] It supports a wide range of data formats from major databases like Scopus, WOS,
and PubMed.
Biblioshiny is accessed through a Web browser (e.g., Firefox, Google Chrome, or
Edge). Hence, to run Biblioshiny, the computer must have a Web browser, R, and Rstudio
installed.
To run the Bibliometrix package, researchers should open Rstudio and install the bibliometrix
package with the following code:
- Code to install: install.packages("bibliometrix")
- Code to load the package: library(bibliometrix)
- Code to launch Biblioshiny: biblioshiny()
This will open the Biblioshiny Web interface in the default Web browser.
The software is now ready for analysis. Researchers need to keep Rstudio running
in the background while analyzing data in the Web browser.
VOSviewer: VOSviewer is a free software tool designed for constructing and visualizing
bibliometric networks.[11] These networks can be based on coauthorship, keyword cooccurrence, or other relations.[12] One of the key strengths of VOSviewer lies in its ability to handle large data sets
and present complex relationships in an intuitive and visually appealing manner. The
software provides various layout and clustering techniques to identify patterns and
groupings within the data, making it especially useful for mapping the intellectual
structure of research fields. This tool has also been explored for text analysis.[13] It can be downloaded and run without installation; however,
Java 8 or a later version must be installed on the computer to run the
software. Java is a high-level, object-oriented programming language known for its
platform independence and robustness. It is widely used for building Web applications,
mobile apps, enterprise software, and bioinformatics.[14] Java can be downloaded free of charge from its official Web site. Because VOSviewer is a standalone Java
application, simply clicking on the VOSviewer application file (e.g., VOSviewer.exe
on a Windows computer) will open it.
Data Collection from PubMed
To collect data from PubMed for a bibliometric analysis, researchers typically begin
by formulating a well-defined search strategy using relevant keywords. Researchers
can plan the keywords based on commonly used words and phrases and on Medical
Subject Headings (MeSH) terms.[15] Boolean operators (e.g., AND, OR) are used to combine search terms in PubMed.[16] This step is crucial: any error or weakness
in the search strategy will produce misleading results. Hence, the search strategy
should be finalized only after several rounds of piloting and consensus among team members.
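The Boolean logic described above (OR between synonyms of one concept, AND between different concepts) can be illustrated with a small string builder. This is a hypothetical helper; the names tag, any_of, and all_of are illustrative and are not part of PubMed or any library. It only assembles a query string to paste into the PubMed search bar. The extra synonym "accelerometry" and the MeSH term "Diabetes Mellitus" are illustrative choices, not a validated search strategy.

```python
def tag(term, field=None):
    """Attach a PubMed field tag such as tiab or mh; quote multi-word phrases."""
    quoted = f'"{term}"' if " " in term else term
    return f"{quoted}[{field}]" if field else quoted


def any_of(*terms):
    """Synonyms for one concept are combined with OR and wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*concepts):
    """Different concepts are combined with AND."""
    return " AND ".join(concepts)


# Topic example; "accelerometry" and the MeSH term are illustrative additions
query = all_of(
    any_of(tag("actigraphy", "tiab"), tag("accelerometry", "tiab")),
    any_of(tag("diabetes", "tiab"), tag("Diabetes Mellitus", "mh")),
)
print(query)
# (actigraphy[tiab] OR accelerometry[tiab]) AND (diabetes[tiab] OR "Diabetes Mellitus"[mh])

# Institution example: name variants joined by OR, city added with AND
institution = all_of(
    any_of(tag("All India Institute of Medical Sciences", "Affiliation"),
           tag("AIIMS", "Affiliation")),
    tag("Deoghar", "Affiliation"),
)
print(institution)
# ("All India Institute of Medical Sciences"[Affiliation] OR AIIMS[Affiliation]) AND Deoghar[Affiliation]
```

Keeping each concept in its own OR group makes it easy to pilot the strategy by adding or removing one synonym at a time.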
On the search results page, researchers need to click the “Save” button to expand
the available options ([Fig. 2]) and then select “All results” to save all the results, or choose to save only the
results displayed on the current page or only selected items. There are five formats
for saving the data: Summary (text), PubMed, PubMed Identifier (PMID),[17] Abstract (text), and comma separated value (CSV). Biblioshiny
and VOSviewer require the PubMed format, whereas the CSV format is more suitable
for screening studies. Therefore, researchers should save the results in both
formats and store the files for further use. Clicking the “Create file” button
saves the file on the computer.
Fig. 2 A screenshot showing a PubMed search result page where the user is saving “all results”
in “PubMed format” by clicking the “create file” button.
For screening, researchers should open the CSV file in spreadsheet software such
as Microsoft Excel, and the PubMed format file (a plain text file that can be opened
with any text editor) in Microsoft Notepad. They can review the titles of the saved
results in the Excel file and mark studies that should be excluded from the analysis.
To remove a specific study, researchers can copy its PMID and search for it (Ctrl + F)
in the PubMed format file. The data for each study start with the PMID field and typically
end with the SO field (which stands for source). To delete a particular study,
researchers should select and remove everything from the PMID line to the SO line of that study.
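For large data sets, the manual deletion described above can be scripted. The sketch below is a hypothetical helper, not part of PubMed, Biblioshiny, or VOSviewer. It assumes that each record in the PubMed format file begins with a line starting with "PMID-", that the CSV export has a "PMID" column, and that the researcher has added a column named "Exclude" (an assumption of this example) containing "yes" for studies to drop.

```python
import csv
import io


def split_records(medline_text):
    """Split PubMed-format text into records; each record starts at a 'PMID-' line."""
    records, current = [], []
    for line in medline_text.splitlines():
        if line.startswith("PMID-"):
            if current:
                records.append(current)
            current = [line]
        elif current:
            current.append(line)
    if current:
        records.append(current)
    # Drop trailing blank lines so records can be rejoined uniformly
    return ["\n".join(rec).strip() for rec in records]


def record_pmid(record):
    """Read the PMID from the first line, e.g. 'PMID- 11111111'."""
    return record.split("\n", 1)[0].split("-", 1)[1].strip()


def pmids_marked_for_exclusion(csv_text, column="Exclude"):
    """Collect PMIDs whose hypothetical 'Exclude' column says 'yes'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["PMID"].strip() for row in reader
            if (row.get(column) or "").strip().lower() == "yes"}


def remove_records(medline_text, exclude):
    """Return PubMed-format text without the records whose PMID is in exclude."""
    kept = [r for r in split_records(medline_text) if record_pmid(r) not in exclude]
    return "\n\n".join(kept) + "\n"


sample = """PMID- 11111111
TI  - First study.
SO  - J Example. 2020.

PMID- 22222222
TI  - Second study.
SO  - J Example. 2021.
"""
screening = "PMID,Title,Exclude\n11111111,First study,\n22222222,Second study,yes\n"
cleaned = remove_records(sample, pmids_marked_for_exclusion(screening))
print(cleaned)
```

The cleaned text would then be saved as a new .txt file and imported in place of the original export.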
Data Analysis
Biblioshiny: To perform an analysis in Biblioshiny, researchers need to open Rstudio
and enter the command library(bibliometrix) followed by biblioshiny(). This will
launch the Biblioshiny Web interface in the default Web browser. Researchers should
click on the “Data” tab and then select “Import or Load.” On the right side of the
window, they should choose “Import raw file(s),” set the database to “PubMed,”
and keep the author name format as “surname and initials.” Next, they need to click
the “Browse” button to select the file for analysis ([Fig. 3]). After selecting the downloaded PubMed format file and waiting for it to upload, clicking
the “Start” button will load the file into Biblioshiny. A window will display the
data quality, and researchers need to click “Save.” After a few seconds, a tick mark
will confirm that the data have loaded successfully. The window can then be closed,
and the data are ready for analysis.
Fig. 3 A screenshot of Biblioshiny where data import button (Import or Load) is showing
on the left side and import options (file types, database, author name format) are
showing on the right side of the image.
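Biblioshiny performs the import itself; the sketch below only illustrates the structure of the PubMed format that it reads. Under our own assumptions, each line carries a tag of up to four characters, padded with spaces and followed by "- " and a value; long values continue on lines indented by six spaces; and tags such as AU (author) may repeat. The function name parse_medline is hypothetical and is not the Bibliometrix import routine.

```python
def parse_medline(text):
    """Parse PubMed-format text into a list of {tag: [values]} dictionaries."""
    records, current, last_tag = [], None, None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate records
        if line.startswith("      ") and current is not None and last_tag:
            # Continuation line: extend the most recent value of the previous tag
            current[last_tag][-1] += " " + line.strip()
        elif len(line) >= 6 and line[4:6] == "- ":
            tag, value = line[:4].strip(), line[6:].strip()
            if tag == "PMID":
                current = {}
                records.append(current)
            if current is not None:
                # Repeatable tags (AU, FAU, MH, AD, ...) accumulate into a list
                current.setdefault(tag, []).append(value)
                last_tag = tag
    return records


# Hypothetical record; names and journal are invented for illustration
sample = """PMID- 11111111
TI  - Actigraphy-based sleep assessment in adults with type 2
      diabetes.
AU  - Doe J
AU  - Roe R
TA  - J Example
DP  - 2021 Mar
SO  - J Example. 2021;1(1):1-5.
"""
records = parse_medline(sample)
print(records[0]["AU"])  # ['Doe J', 'Roe R']
print(records[0]["TI"])  # ['Actigraphy-based sleep assessment in adults with type 2 diabetes.']
```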
Researchers can apply filters, such as publication year or language, using the options
on the left side of the screen (see [Fig. 4]). To run an analysis, they need to click
on the relevant analysis button on the left panel. In the top right corner of the
results section ([Fig. 4]), there are three buttons: play (to run the analysis), plus (to add the result to the report for export), and
download (to save the image). If an analysis does not appear immediately, clicking
the play button and waiting a few moments should display it. To include results in
the Excel export, researchers need to click the plus (+) button; to save images, they
should use the download button (↓). After completing the analysis and selecting the desired
reports to export, clicking the “Report” button will navigate to the report page. From
here, researchers can click the “Export Report” button, and the selected reports will be
saved as a spreadsheet file (Excel format).
Fig. 4 A screenshot of Biblioshiny where the most relevant sources (i.e., journals) under
sources of an analysis is shown.
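Analyses such as the most relevant sources and annual scientific production are essentially document counts. The sketch below approximates them from parsed PubMed records, assuming each record is a dictionary that maps MEDLINE tags to lists of values, with TA (journal title abbreviation) and DP (date of publication) used as the source and year fields. It is an illustration under these assumptions, not Biblioshiny's implementation, and its results may differ from Biblioshiny's tables.

```python
from collections import Counter

# Hypothetical parsed records: MEDLINE tag -> list of values
records = [
    {"TA": ["J Example"], "DP": ["2021 Mar"]},
    {"TA": ["J Example"], "DP": ["2022"]},
    {"TA": ["Other J"], "DP": ["2022 Jan 5"]},
]


def most_relevant_sources(records, top=10):
    """Count documents per journal (TA field), most productive first."""
    counts = Counter(r["TA"][0] for r in records if r.get("TA"))
    return counts.most_common(top)


def annual_production(records):
    """Count documents per year, taken from the first four characters of DP."""
    years = Counter(r["DP"][0][:4] for r in records if r.get("DP"))
    return dict(sorted(years.items()))


print(most_relevant_sources(records))  # [('J Example', 2), ('Other J', 1)]
print(annual_production(records))      # {'2021': 1, '2022': 2}
```

If every record comes from one journal, most_relevant_sources returns a single entry, which is why source-level rankings are uninformative for a single-journal data set.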
Some researchers may encounter an issue with Biblioshiny in which, after the data are loaded,
the browser page becomes unresponsive and further analysis cannot be performed. In
such cases, the problem can usually be resolved by installing the chromote package,
which provides an interface to a headless Chrome Web browser. To do this, researchers need
to type the following command in the Rstudio console: install.packages("chromote").
This installs the required dependency, and the issue should be resolved.
VOSviewer: As VOSviewer does not require installation, researchers can run the
software by clicking on the “VOSviewer.exe” file. On the landing page, researchers should
click “Create” from the File tab, select “Create a map based on bibliographic
data,” and click “Next.” As the file comes from a bibliographic database, researchers need to choose
“Read data from bibliographic database files” and proceed by clicking “Next.” Then,
they need to click on the PubMed tab, select the saved PubMed format file, and
click “Next” to choose the type of analysis and counting method.
At this stage ([Fig. 5]), researchers need to select either “Co-authorship” (to analyze author collaboration
networks) or “Co-occurrence” (to analyze keyword networks). For example, to visualize
a coauthorship network, they need to select “Co-authorship,” choose the counting method
as “Full counting,” and optionally check the box to “Ignore documents with a large
number of authors,” specifying the desired threshold.
Fig. 5 A screenshot of VOSviewer showing the “create” button on the left side and pop-up
window for choice filling on the type of analysis, unit of analysis, and counting
method.
Clicking on “Next” will open a screen where researchers can define inclusion criteria
based on the minimum number of documents per author. Another “Next” click will display
a list of authors, along with the number of their documents and their total link strength. Finally,
clicking the “Finish” button will generate the network visualization ([Fig. 6]). Researchers can switch between different visualization types like “Network Visualization,”
“Overlay Visualization,” and “Density Visualization” by using the tabs provided. To
save an image of the network for future use, researchers should click the “Screenshot” button;
a pop-up window will appear that allows saving the network in various formats, such as
JPG, PDF, and TIFF.
Fig. 6 A screenshot of VOSviewer showing the coauthorship network, visualization tab (on
the upper portion), and options to modify the visualizations (scale, size variation,
background, etc.) on the right side.
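VOSviewer's internal algorithms are not described in this article. The sketch below is a simplified illustration, under our own assumptions, of what full counting, a minimum-documents threshold, and total link strength mean for a coauthorship network. The parameter names min_docs and max_authors are hypothetical stand-ins for the corresponding VOSviewer options, and links are computed only among authors who pass the threshold.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(author_lists, min_docs=1, max_authors=None):
    """Build a full-counting coauthorship network from per-document author lists."""
    # Optionally skip documents with many authors (hypothetical max_authors threshold)
    docs = [sorted(set(authors)) for authors in author_lists
            if max_authors is None or len(set(authors)) <= max_authors]
    # Number of documents per author
    doc_counts = Counter(a for doc in docs for a in doc)
    keep = {a for a, n in doc_counts.items() if n >= min_docs}
    # Full counting: every coauthored document adds 1 to the link of each author pair
    links = Counter()
    for doc in docs:
        kept = [a for a in doc if a in keep]
        for pair in combinations(kept, 2):
            links[pair] += 1
    # Total link strength: sum of an author's link weights (among kept authors only)
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return {a: doc_counts[a] for a in sorted(keep)}, dict(links), dict(strength)


# Hypothetical author lists, one list per document
documents = [["Doe J", "Roe R"], ["Doe J", "Roe R", "Poe E"], ["Doe J"]]
counts, links, strength = coauthorship_network(documents, min_docs=2)
print(counts)    # {'Doe J': 3, 'Roe R': 2}
print(links)     # {('Doe J', 'Roe R'): 2}
print(strength)  # {'Doe J': 2, 'Roe R': 2}
```

Raising min_docs shrinks the network to more productive authors, which is the same trade-off researchers face when choosing the threshold in VOSviewer.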
Data of Author, Institution, Journal, and Topic
The steps to conduct bibliometric analysis using Biblioshiny and VOSviewer are summarized
in [Table 1]. Neither of the two software tools provides dedicated options to analyze data by
topic, institution, journal, or author. Instead, the analysis depends entirely on
the input data supplied by the researchers. Therefore, researchers must collect and
curate the specific data they wish to analyze. The following section briefly describes
the data collection methods for analyzing research by topic, institution, journal,
or author.
Table 1
Steps of bibliometric analysis at a glance
| Step | Action | Brief |
|---|---|---|
| 1 | Define objective | Choose the analysis focus (topic, author, journal, or institution) |
| 2 | Formulate search strategy | Use PubMed to search published articles; other databases can also be searched if researchers have access |
| 3 | Save data from PubMed | Save the search results in both PubMed format and CSV |
| 4 | Clean and curate data | Use Excel for screening and Notepad to remove entries using PMIDs |
| 5 | Set up Biblioshiny | Install R, Rstudio, and the Bibliometrix package |
| 6 | Set up VOSviewer | Ensure Java is installed; download VOSviewer (no installation needed) |
| 7 | Analyze with Biblioshiny | Launch with biblioshiny() in Rstudio, which opens the interface in a Web browser; import the PubMed format file; run the analysis |
| 8 | Visualize with VOSviewer | Click “Create” in the File tab, choose “Create a map based on bibliographic data,” and follow the “Next” prompts, customizing thresholds as needed |
| 9 | Save visual outputs | Save visual outputs (JPG, PDF, etc.) from Biblioshiny and VOSviewer for further use |
| 10 | Interpret and report | Analyze patterns and trends; include graphs and metrics in reports |
Abbreviations: CSV, comma separated value; PMID, PubMed Identifier.
For example, if researchers want to search for a particular author, such as “Sudip
Bhattacharya,” they can simply enter the name in the PubMed search bar without using
any field tags. However, a more targeted search would be: Sudip Bhattacharya[Author].
Here, [Author], in square brackets, is the PubMed field tag for author searches.[18]
To search for publications from a particular institution, researchers should first
identify the different formats in which the institution's name is commonly cited.
For example, to retrieve articles from All India Institute of Medical Sciences, Deoghar,
the following query can be used: ((All India Institute of Medical Sciences[Affiliation])
OR (AIIMS[Affiliation])) AND (Deoghar[Affiliation]). Here, the full name and the abbreviation
of the institution are combined with OR, and the city name is added with AND.
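Because Boolean AND operates on whole records, a query like the one above may also retrieve a record in which one author's affiliation mentions AIIMS in another city and a different author's affiliation mentions Deoghar. One way to screen such records after export is to check whether a single affiliation (AD) string contains both a name variant and the city. The sketch below is a hypothetical helper under that assumption, and the affiliation strings are invented examples.

```python
def affiliation_matches(affiliations, name_variants, city):
    """Return the affiliation strings that mention both a name variant and the city."""
    hits = []
    for ad in affiliations:
        text = ad.lower()
        if city.lower() in text and any(v.lower() in text for v in name_variants):
            hits.append(ad)
    return hits


variants = ["All India Institute of Medical Sciences", "AIIMS"]

# Invented record: the institution and the city appear in different affiliations
mixed = [
    "All India Institute of Medical Sciences, New Delhi, India.",
    "Department of Zoology, Example College, Deoghar, India.",
]
print(affiliation_matches(mixed, variants, "Deoghar"))  # []

# Invented record: one affiliation contains both the institution and the city
matched = ["Example Department, AIIMS, Deoghar, India."]
print(affiliation_matches(matched, variants, "Deoghar"))
```

Records for which the helper returns an empty list would be candidates for manual review rather than automatic exclusion.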
Similarly, for bibliometric analysis of a journal, researchers should use both the full
title and the abbreviated title of the journal. For example, to analyze the
Indian Journal of Radiology and Imaging, they can use: "Indian Journal of Radiology and Imaging"[Journal] OR "Indian J Radiol
Imaging"[Journal]. To date, this journal has a total of 1,647 articles in PubMed,
from 2008 (48 articles) to 2025 (55 articles as of April 23, 2025). The data
can be saved as described in [Fig. 2].
For topic-based searches, if keywords are already framed, they can be directly used
in PubMed. For instance, to find papers on actigraphy-based research in diabetes,
restricted to titles and abstracts ([tiab] is the field tag for title and abstract), the following
query can be used: "actigraphy"[tiab] AND "diabetes"[tiab]. If the keywords and tags are
not yet formulated, researchers can use the “Advanced” option to build a search string
step-by-step ([Fig. 2]).[19]
Once the data is collected, the analysis process follows the same steps as described
earlier. However, certain limitations may apply depending on the nature of the data
set. For example, if data from a single journal is analyzed in Biblioshiny, the source
(i.e., journal) field will contain only one entry. Similarly, if data for a single
author is analyzed, that author will dominate metrics such as “most relevant author.”
Discussion
Searching bibliometric data on an author, institution, journal, and topic is essential
for gaining a comprehensive understanding of research trends, productivity, and impact
of the author, institution, journal, and research field.[20] An author-level analysis highlights key contributors, their collaborations, research
field, and productivity over time. Commonly, an author's impact is quantified by
indices such as the h-index, i10-index, or g-index.[21] However, broader data, such as collaborations with other authors and countries, research
focus, and engagement with trending topics, provide a more comprehensive picture
of an author and may help in assessing a researcher's research output.
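The indices mentioned above can be computed from a list of per-paper citation counts. Because PubMed format files do not include citation counts, the numbers below are hypothetical and would need to come from another source. The h-index is the largest h such that h papers have at least h citations each; the i10-index is the number of papers with at least 10 citations; and the g-index, in this sketch, is the largest g such that the g most-cited papers together have at least g² citations, capped at the number of papers (some definitions relax this cap).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations.

    This sketch caps g at the number of papers; some definitions allow larger g.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


papers = [10, 8, 5, 4, 3]  # hypothetical citation counts for one author
print(h_index(papers), i10_index(papers), g_index(papers))  # 4 1 5
```

Here four papers have at least four citations each (h = 4), while the cumulative total of 30 citations for all five papers exceeds 5² = 25 (g = 5); the g-index gives more weight to highly cited papers than the h-index does.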
Analyzing data by institution reveals an institution's leading research areas, the authors
contributing the most publications, the journals in which its articles are published,
and other metrics. This may support the preparation of an institution's annual research
output report.[22] Journal editors often do not need to compute journal-level metrics themselves, as many
publishers provide such reports as part of their services. However, many journals
that use open-source publishing platforms or are run with limited manpower may not
be able to generate bibliometric reports frequently enough to assess their growth.
For them, the methods described in this article can help, and the analysis offers
insights beyond citation counts alone.[23] Topic-based searches help identify research hotspots, emerging themes, and gaps
in knowledge. Many authors now conduct and publish bibliometric analyses to inform
ongoing scientific discourse on particular topics.[24]
Beyond PubMed, several other bibliographic databases are widely used in bibliometric
research, offering broader or more specialized coverage depending on the research
question. Scopus (by Elsevier) and WOS (by Clarivate Analytics) are the most prominent
among them, known for their comprehensive indexing across disciplines, including medicine,
social sciences, and engineering.[25] These databases provide advanced filtering options, citation tracking, and citation
metrics such as the h-index, which are crucial for in-depth bibliometric assessments.
Dimensions.ai, a newer entrant, is gaining popularity due to its integration of grants,
clinical trials, patents, and policy documents alongside traditional publications,
offering a more holistic research landscape.[26]
In addition, a variety of specialized tools support bibliometric analysis and visualization.
CiteSpace is a platform for identifying research trends and knowledge evolution over
time.[27] Meanwhile, commercial platforms like InCites (by Clarivate)[28] and SciVal (by Elsevier)[29] offer institution-level metrics and benchmarking tools but require subscriptions.
Each tool and database has its own strengths, and selecting among them depends on
the research objective, available resources, and desired depth of analysis.
The novelty of this article lies in its hands-on, beginner-friendly approach to bibliometric
analysis using entirely free and open-access resources. Unlike most existing tutorials
that assume prior expertise or rely on subscription-based databases, this guide was
designed specifically to support researchers in resource-constrained environments.
However, a key limitation is the exclusive reliance on a single database (PubMed),[30] which may lead to incomplete coverage of interdisciplinary topics or articles indexed
in other major bibliographic sources like Scopus or WOS. Additionally, some advanced
functionalities available in commercial databases, such as citation counts and top-cited
articles, are not addressed here, making this guide more suitable for basic to intermediate
bibliometric analysis.
Web Sites