Literature databases are commonly used to assess publications in a given subject, discipline, country, or world region, a practice known as bibliometric analysis. This protocol details how to use the PubMed, Scopus, and Web of Science databases to perform bibliometric analysis.
The literature databases PubMed, Scopus, and Web of Science differ in their coverage, focus, and the tools they provide. PubMed focuses mainly on the life sciences and biomedical disciplines, whereas Scopus and Web of Science are multidisciplinary. The protocol described here was used to search for publications by Jordanian authors from 2013 to 2017, and how to conduct this type of search in each database is explained in detail. The Scopus search returned the highest number of documents (11,444), followed by the Web of Science search (10,943). PubMed returned fewer documents (4,363) because of its narrower scope and coverage. The results also show (1) the yearly trend in the number of publications, (2) the disciplines with the most publications, (3) the countries of collaboration, and (4) the number of open access publications. In terms of tools, PubMed offers a sophisticated keyword optimization service (Medical Subject Headings, or MeSH), whereas Scopus and Web of Science both provide search analysis tools that can produce representative figures. Finally, the features of each database are explained in detail, and several indices that can be extracted from the search results are provided. This study provides a basis for using literature databases for bibliometric analysis.
Traditionally, researchers have used literature databases to perform literature reviews for their studies1. Another use of the literature emerged at the end of the 19th century, when researchers began to analyze the body of literature itself, a practice that has grown slowly since2. In the last few decades, the digitization of literature and the creation of online literature databases have given researchers the opportunity to analyze the body of literature and research performance easily and efficiently, for example, the research performance of a document3, a subject4, a discipline5, a country6, or even a world region7. This type of analysis is known as bibliometric analysis. Heartsill Young defined bibliometric analysis as the use of statistical methods to analyze a body of literature to reveal its historical development8. In other words, bibliometrics is the quantitative study of published units on the basis of citation and text analysis9.
Different databases are used for bibliometric analysis, and each has different characteristics and provides different services10. Currently, the most commonly used literature databases are Web of Science and Scopus, which cover almost all disciplines and are available only by subscription11, and PubMed, a freely available database for the biomedical and life sciences10. Google Scholar may be an easy tool to use, but it should not currently be used for bibliometric analysis because of deficiencies such as its unclear scope and coverage, its lack of citation analysis tools, and its inclusion of non-peer-reviewed, non-scientific content12,13. Moreover, Google Scholar lacks tools for performing advanced searches and keyword optimization14.
Several previous studies have compared the features of these literature databases for literature review purposes3,5,10,12,13,15,16,17. In this study, by contrast, we describe how the PubMed, Scopus, and Web of Science databases are used to perform a bibliometric analysis and compare the pros and cons of each. Because bibliometric analysis can be applied to the research output of almost any discipline, the target audience is any researcher who intends to analyze publication trends. As an example, the publication trend of Jordan as a country is analyzed with each database. Jordan was chosen because a bibliometric analysis of a country (as opposed to a subject) is not very straightforward. In addition, Jordan specifically is poorly studied bibliometrically, as "Jordan" can be both an author name and a country name. We explain how to overcome this challenge in the search.
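One general way to keep a country name from matching author surnames is to restrict it to an affiliation or country field. The sketch below builds such field-restricted queries for the 2013-2017 window; it is an illustration under stated assumptions, not the exact query used in this protocol. It assumes the PubMed field tags [ad] (Affiliation) and [dp] (Date of Publication), the Scopus advanced-search codes AFFILCOUNTRY and PUBYEAR, and the Web of Science advanced-search tags CU (Country/Region) and PY (Year Published); these should be checked against each database's current help pages. The function name `country_query` is hypothetical. An affiliation match can still pick up non-country uses of the word (for example, an institution or street named Jordan), so results still need checking.

```python
def country_query(database: str, country: str, start: int, end: int) -> str:
    """Return a field-restricted query for one country and an inclusive year range.

    Hypothetical helper: the field tags follow each database's advanced-search
    syntax as we understand it; verify them against current documentation.
    """
    if database == "pubmed":
        # [ad] = Affiliation field; [dp] accepts a start:end year range
        return f"{country}[ad] AND {start}:{end}[dp]"
    if database == "scopus":
        # AFFILCOUNTRY limits the match to the affiliation country;
        # PUBYEAR > and < are strict comparisons, so widen the bounds by one year
        return f"AFFILCOUNTRY({country}) AND PUBYEAR > {start - 1} AND PUBYEAR < {end + 1}"
    if database == "wos":
        # CU = Country/Region; PY accepts a start-end year range
        return f"CU={country} AND PY={start}-{end}"
    raise ValueError(f"unknown database: {database!r}")


if __name__ == "__main__":
    for db in ("pubmed", "scopus", "wos"):
        print(f"{db:>6}: {country_query(db, 'Jordan', 2013, 2017)}")
```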
NOTE: The search methods are described below, and an example search is provided for each method. The part related specifically to bibliometric analysis is also supplied.
1. PubMed
2. Scopus
3. Web of Science
Results from PubMed search
A total of 4,363 documents were retrieved by the PubMed search conducted in this study. Free full text was available for 1,767 documents (40.5%). The number of documents published per year was 532 in 2013, 663 in 2014, 811 in 2015, 952 in 2016, and 1,405 in 2017.
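The totals and percentages implied by these yearly counts can be recomputed directly, which is also a quick check on transcription. The sketch below uses only the numbers reported above; the year-over-year growth and the compound annual growth rate (CAGR) are extra summary indices added here for illustration, not figures reported in this study.

```python
# Yearly PubMed document counts reported above (2013-2017)
yearly = {2013: 532, 2014: 663, 2015: 811, 2016: 952, 2017: 1405}
free_full_text = 1767

total = sum(yearly.values())               # should equal the 4,363 retrieved documents
oa_share = 100 * free_full_text / total    # free full text share, in percent

# Year-over-year growth, in percent, relative to the previous year
years = sorted(yearly)
growth = {y: 100 * (yearly[y] - yearly[prev]) / yearly[prev]
          for prev, y in zip(years, years[1:])}

# Compound annual growth rate between the first and last year
n_years = years[-1] - years[0]
cagr = 100 * ((yearly[years[-1]] / yearly[years[0]]) ** (1 / n_years) - 1)

print(f"total: {total}, free full text: {oa_share:.1f}%")
for y, g in growth.items():
    print(f"{y}: {g:+.1f}% vs previous year")
print(f"CAGR {years[0]}-{years[-1]}: {cagr:.1f}%")
```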
The results reveal that 1,008 (23.8%) documents discussed cancer-related issues, whereas only 53 (1.2%) discussed AIDS-related topics. In addition, 150 (3.5%) documents were published in dentistry journals and 275 (6.5%) in nursing journals.
Results from Scopus search
The Scopus search conducted in the current study returned a total of 11,444 documents, including 10,974 (95.9%) articles and 470 (4.1%) reviews. Only 652 (5.7%) of the documents were open access.
Figure 4 shows the yearly trend in Jordanian publications over the 5-year interval. In terms of countries of collaboration in the Scopus search (Figure 5), the United States of America (USA) is the country Jordanian researchers collaborate with most often (1,553 publications), followed by Saudi Arabia (1,176 publications) and the United Kingdom (723 publications).
Figure 6 details the 10 most common disciplines in which Jordanians have published. Based on the Scopus search, medicine is the most common discipline (2,441 publications), followed by engineering (1,837 publications) and social sciences (1,468 publications). The University of Jordan contributed to 3,346 (29.3%) of the total publications over the five years, followed by Jordan University of Science and Technology with 2,396 publications (21.0%) and Hashemite University with 1,347 publications (11.8%).
Results from Web of Science search
The Web of Science search returned a total of 10,943 documents published in Jordan. Of these, 87 are highly cited papers and 14 are hot papers. In total, 2,879 documents were open access: 2,547 were gold open access, 170 were green published, and 162 were green accepted (manuscript deposited in a repository upon acceptance, before publication).
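The three open access routes reported above add up exactly to the open access total (2,547 + 170 + 162 = 2,879), which suggests that each document was counted under only one route in this search. A consistency check like the sketch below, using only the reported counts, is worth running on any exported breakdown before computing shares; if the routes overlapped, their shares could not simply be added.

```python
# Web of Science open access counts reported above
wos_total = 10943
oa_total = 2879
oa_routes = {"Gold": 2547, "Green published": 170, "Green accepted": 162}

# Consistency check: the routes add up to the open access total
# only if every document was counted under exactly one route
routes_sum = sum(oa_routes.values())
if routes_sum != oa_total:
    print(f"warning: routes sum to {routes_sum}, not {oa_total}; categories may overlap")

print(f"open access: {oa_total} of {wos_total} ({100 * oa_total / wos_total:.1f}%)")
for route, n in oa_routes.items():
    print(f"{route}: {n} ({100 * n / oa_total:.1f}% of open access documents)")
```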
Figure 7 shows the yearly trend in Jordanian publications over the 5-year interval, and Figure 8 details the countries of collaboration. According to the Web of Science search, the USA is the country Jordanians collaborate with most often (929 publications), followed by France (860 publications) and Austria (429 publications). Figure 9 details the 10 most common disciplines in which Jordanians published. According to the Web of Science search, engineering is the most common discipline (1,315 publications), followed by mathematics (1,263 publications) and computer science (828 publications).
Figure 1: The report for the PubMed search with color annotation for each section in the report.
Figure 2: The report for the Scopus search with color annotation for each section in the report.
Figure 3: The report for the Web of Science search with color annotation for each section in the report.
Figure 4: The yearly trend in publications in Jordan during the 5-year period, as extracted from Scopus.
Figure 5: The countries Jordanians tend to co-author publications with, as extracted from Scopus.
Figure 6: The disciplines that Jordanian publications generally cover, as extracted from Scopus.
Figure 7: A bar chart showing the yearly publication trend in Jordan in the years 2013-2017, as extracted from Web of Science.
Figure 8: A bar chart showing the countries Jordanians tend to collaborate with in the years 2013-2017, as extracted from Web of Science.
Figure 9: A tree map showing the 10 disciplines in which Jordanians most often published during the years 2013-2017, as extracted from Web of Science.
Operator function | PubMed | Scopus | Web of Science | Example |
Both terms must appear | AND | AND | AND | |
At least one of the terms must appear | OR | OR | OR | |
The term after the operator must not appear | NOT | AND NOT | NOT | |
Both terms must appear within n words of each other, in any order | X | W/n | NEAR/n | Jordan W/2 Cancer → finds results in which "Jordan" and "Cancer" are within 2 words of each other |
The first term must precede the second by at most n words (order respected) | X | Pre/n | X | Jordan Pre/2 Cancer → finds results in which "Jordan" precedes "Cancer" by at most 2 words |
Finds words with the specified stem, followed by any other characters | X | * or ? | * | Jordan* or Jordan? → also returns results for "Jordanian" |
Finds words with the specified stem followed by at most one more character | X | X | $ or ? | Jordan$ or Jordan? → returns results for "Jordans" but not for "Jordanian" |
Searches for the phrase within the quotation marks, still interpreting any operators inside them | X | “” | “” | “Cancer in Jordan?” → searches for “cancer in Jordan” or “cancer in Jordans” |
Searches for the exact phrase within the braces, treating any operators inside them as literal characters | X | {} | X | {Cancer in Jordan?} → searches only for “cancer in Jordan?”, interpreting the question mark as a question mark |
Table 1: Operators for performing the specified functions in each database. Operators in PubMed must be in upper case, unlike those in Scopus and Web of Science. X = not available.
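Because the same logical operation is written differently across databases, a query meant for all three may need to be translated. The sketch below transcribes the Boolean and proximity rows of Table 1 into a lookup table; the names `OPERATORS` and `combine` are hypothetical. Operators are kept in upper case so that they also satisfy PubMed's requirement.

```python
from typing import Optional

# Operator syntax transcribed from Table 1; None marks an operator that is not available
OPERATORS = {
    "pubmed": {"and": "AND", "or": "OR", "not": "NOT", "near": None},
    "scopus": {"and": "AND", "or": "OR", "not": "AND NOT", "near": "W/{n}"},
    "wos": {"and": "AND", "or": "OR", "not": "NOT", "near": "NEAR/{n}"},
}


def combine(database: str, op: str, left: str, right: str, n: Optional[int] = None) -> str:
    """Join two search terms with the database-specific form of `op`."""
    syntax = OPERATORS[database][op]
    if syntax is None:
        raise ValueError(f"operator {op!r} is not available in {database}")
    if op == "near":
        if n is None:
            raise ValueError("a proximity search needs a word distance n")
        syntax = syntax.format(n=n)
    return f"{left} {syntax} {right}"


if __name__ == "__main__":
    for db in OPERATORS:
        print(db, "|", combine(db, "not", "Cancer", "Leukemia"))
    print(combine("scopus", "near", "Jordan", "Cancer", 2))  # Jordan W/2 Cancer
    print(combine("wos", "near", "Jordan", "Cancer", 2))     # Jordan NEAR/2 Cancer
```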
Outcome measures | PubMed | Scopus | Web of Science |
Documents each year | √ | √ | √ |
Publications in a specific journal | √ | √ | √ |
Publications per author | √ | √ | √ |
Institutional affiliation | √ | √ | √ |
Country of authors | √ | √ | √ |
Number of open access publications (Gold OA) | √ | √ | √ |
Number of open access publications (Green OA) | X | X | √ |
Publications per each document type | √ | √ | √ |
Subject area | √ | √ | √ |
Publications by specific publishers | √ | X | X |
Publications for specific MeSH terms | √ | X | √ |
Web of Science categories | X | X | √ |
Funding agency | X | X | √ |
Publications on a specific gender | √ | X | X |
Publications on a specific age group | √ | X | X |
Publications by a unique PubMed ID | √ | X | √ |
Publications managed by a specific editor | X | X | √ |
Highly cited papers: papers in the top 1% by citations in their subject area over the last 10 years | X | X | √ |
Hot papers in the field: papers that have been highly cited in the most recent two months compared with the norm (average citations of peer papers) | X | X | √ |
Table 2: Outcome measures and search filters that are available for each literature database. Researchers may refer to each database's instructions for further details on using each filter.
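The "highly cited papers" measure in Table 2 is defined by a citation percentile within each subject area. The sketch below illustrates that idea on a list of (paper ID, subject area, citations) tuples. It is a simplified, hypothetical reimplementation for illustration only: Clarivate's own thresholds are also set by publication year, the 10-year window is assumed to have been applied when selecting the input, and ties here are broken arbitrarily.

```python
import math
from collections import defaultdict


def top_percent(papers, percent=1.0):
    """Return IDs of papers ranked in the top `percent` by citations within their subject area.

    papers: iterable of (paper_id, subject_area, citations) tuples, assumed to be
    already restricted to the desired publication window.
    """
    by_subject = defaultdict(list)
    for paper_id, subject, citations in papers:
        by_subject[subject].append((citations, paper_id))

    selected = set()
    for items in by_subject.values():
        # Highest citation counts first; ties are broken by paper ID, arbitrarily
        items.sort(reverse=True)
        k = max(1, math.ceil(len(items) * percent / 100))  # keep at least one paper
        selected.update(paper_id for _, paper_id in items[:k])
    return selected


if __name__ == "__main__":
    # Made-up demo data: 200 papers in one area, one paper in another
    demo = [(f"a{i}", "Engineering", i) for i in range(200)] + [("b0", "Mathematics", 5)]
    print(sorted(top_percent(demo)))
```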
Characteristic | PubMed | Scopus | Web of Science |
Covered disciplines | Life sciences and biomedical disciplines | All disciplines | All disciplines |
Focus | Life sciences and biomedical disciplines | Physical, health, life, and social sciences | Science, technology, social sciences, arts and humanities |
Coverage starts from (year) | 1966 | 1970 | 1900* |
Free/Paid | Free | Paid | Paid |
Ownership | National Institutes of Health | Elsevier | Clarivate |
Professional term indexing | Yes | No | No |
Associated data search | No | No | Yes |
Old data coverage | No | No | Yes |
Figure production | No | Yes | Yes |
Open access assessment | Gold open access | Gold open access | Green and gold open access |
User-friendly interface | + | ++ | +++ |
Availability of operators | + | +++ | ++ |
* Coverage depends on institutional subscription |
Table 3: Comparing the characteristics of PubMed, Scopus, and Web of Science. Information in this table is based on this study's data and the information provided by each database10,22,23,24.
In this study, the steps for using the PubMed, Scopus, and Web of Science databases to perform a bibliometric analysis were provided. The results indicated that Web of Science is the friendliest and easiest tool to use for bibliometric analysis; however, its drawback is that its services are not free. PubMed is devoted to the biomedical sciences and is linked to several other National Library of Medicine (NLM) tools that can help optimize the analysis of biomedical subjects. Medical Subject Headings (MeSH) is a professional indexing tool: when a new article is added to the PubMed database, experts review it for the main topics it discusses and assign it a list of MeSH terms. PubMed's main drawback, however, is that it requires good knowledge of how to use it. Searching the Web of Science Core Collection retrieves all articles published in journals indexed in the Science Citation Index Expanded (SCIE), the Social Sciences Citation Index (SSCI), the Arts and Humanities Citation Index (AHCI), and the newly added Emerging Sources Citation Index (ESCI), and users can choose which of these databases within Web of Science to search18. In addition, two other databases, for books and conferences, are also included19. Scopus is generally easy to use and covers more journals than the other two services20, but it is also a paid service. Table 3 further details and compares the characteristics of PubMed, Scopus, and Web of Science.
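MeSH-based PubMed searches can also be scripted through NCBI's public E-utilities interface. The sketch below only builds an ESearch request URL and parses the hit count from a JSON response; the query string is a hypothetical example combining a MeSH heading with affiliation and date tags, not the search used in this study. Scripted access should follow NCBI's usage guidelines, including its request-rate limits.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str) -> str:
    """Build an ESearch URL that asks PubMed only for the number of matching records."""
    # retmax=0: return the count without the list of PMIDs
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


def hit_count(response_text: str) -> int:
    """Extract the record count from an ESearch JSON response (the count is sent as a string)."""
    return int(json.loads(response_text)["esearchresult"]["count"])


if __name__ == "__main__":
    # Hypothetical query: MeSH heading "Neoplasms" + Jordanian affiliation + 2013-2017
    term = '"Neoplasms"[MeSH Terms] AND Jordan[ad] AND 2013:2017[dp]'
    print(esearch_url(term))
    # Fetching requires network access, e.g.:
    # from urllib.request import urlopen
    # print(hit_count(urlopen(esearch_url(term)).read().decode()))
```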
As shown in the results, the Scopus and Web of Science searches identified different disciplines as those in which Jordanians most commonly publish. To examine the reasons behind this discrepancy, the research area (discipline) classification of each database was analyzed. The Scopus search yielded 27 research areas, and a publication can be classified into one or more of them. The Web of Science search, in contrast, yielded 140 research areas, but each Web of Science publication was classified into only one of them (no publication was classified into more than one research area). For example, the single Scopus research area "Medicine" corresponds to 27 research areas in Web of Science, listed below (percentages give the contribution of each research area to the total of 10,936 publications retrieved by the Web of Science search):
Internal medicine (2.5%), neurology (2.2%), oncology (2.2%), surgery (1.4%), endocrinology (1.1%), pediatrics (1.1%), psychiatry (1%), experimental medicine (1%), cardiovascular system (0.9%), infectious diseases (0.9%), radiology (0.9%), orthopedics (0.7%), obstetrics and gynecology (0.7%), immunology (0.6%), rehabilitation (0.6%), hematology (0.6%), urology (0.5%), respiratory (0.4%), ophthalmology (0.3%), gastroenterology (0.3%), complementary medicine (0.3%), dermatology (0.2%), morphology (0.2%), rheumatology (0.2%), anesthesiology (0.2%), emergency medicine (0.1%), allergy (0.1%).
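Summing the shares listed above shows why "Medicine" ranks first in Scopus but not in Web of Science: together the 27 areas account for roughly a fifth of the Web of Science records (well above the 1,315 engineering publications), yet the largest single area holds only 2.5%. The sketch below uses the rounded percentages reported above, so the total is approximate. Adding shares this way is valid only because each Web of Science record is assigned to exactly one research area, as described above; overlapping Scopus areas cannot be summed like this without double counting.

```python
# Share (%) of the 10,936 Web of Science records in each research area that maps
# onto the single Scopus subject area "Medicine" (rounded values from the list above)
medicine_crosswalk = {
    "Internal medicine": 2.5, "Neurology": 2.2, "Oncology": 2.2, "Surgery": 1.4,
    "Endocrinology": 1.1, "Pediatrics": 1.1, "Psychiatry": 1.0,
    "Experimental medicine": 1.0, "Cardiovascular system": 0.9,
    "Infectious diseases": 0.9, "Radiology": 0.9, "Orthopedics": 0.7,
    "Obstetrics and gynecology": 0.7, "Immunology": 0.6, "Rehabilitation": 0.6,
    "Hematology": 0.6, "Urology": 0.5, "Respiratory": 0.4, "Ophthalmology": 0.3,
    "Gastroenterology": 0.3, "Complementary medicine": 0.3, "Dermatology": 0.2,
    "Morphology": 0.2, "Rheumatology": 0.2, "Anesthesiology": 0.2,
    "Emergency medicine": 0.1, "Allergy": 0.1,
}

# Adding shares is valid only because each Web of Science record has one area
combined_share = sum(medicine_crosswalk.values())
largest_area = max(medicine_crosswalk, key=medicine_crosswalk.get)
approx_records = round(10936 * combined_share / 100)

print(f"{len(medicine_crosswalk)} areas, combined share about {combined_share:.1f}%")
print(f"largest single area: {largest_area} ({medicine_crosswalk[largest_area]}%)")
print(f"approximate number of records: {approx_records}")
```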
As explained earlier in the protocol, researchers may download search results in CSV or XLSX format, and several tools are available to further analyze and map these results. These tools apply the concept of science mapping, or bibliometric mapping, which is a spatial representation of how disciplines, fields, documents, or authors are related24,25:
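As a minimal illustration of the kind of data such mapping tools work from, the sketch below counts exported records per year and counts how often pairs of countries appear on the same record, the co-occurrence data from which collaboration maps are typically drawn. It assumes a CSV export with a "Year" column (check the header of your own export) and a per-record list of countries already extracted from the affiliation field; the column name, sample rows, and function names are assumptions, and dedicated mapping tools apply further normalization and layout steps not shown here.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def yearly_counts(csv_text: str, year_column: str = "Year") -> Counter:
    """Count exported records per publication year, skipping rows with no year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        int(row[year_column]) for row in reader if (row.get(year_column) or "").strip()
    )


def cooccurrence(records) -> Counter:
    """Count how often each unordered pair of items (e.g., countries) shares a record."""
    pairs = Counter()
    for items in records:
        # set() removes duplicates within a record; sorting makes (a, b) the same as (b, a)
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    sample_csv = "Title,Year\nPaper A,2016\nPaper B,2017\nPaper C,2017\n"  # made-up rows
    print(yearly_counts(sample_csv))

    sample_countries = [  # made-up records
        ["Jordan", "USA"],
        ["Jordan", "Saudi Arabia", "USA"],
        ["Jordan", "USA"],
    ]
    for pair, n in cooccurrence(sample_countries).most_common():
        print(pair, n)
```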
In addition, researchers can combine data obtained from the three databases (PubMed, Scopus, and Web of Science) with data from other sources, including the World Bank and the Organization for Economic Co-operation and Development (OECD), to calculate several other valuable indices. Because yearly publications and the authors' countries of affiliation are available as outcome measures in all three databases, the following indices can be measured:
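As a hypothetical illustration of this kind of index, two widely used normalizations divide a country's publication count by its population or by its gross domestic product (GDP), both of which the World Bank publishes, and a third expresses the country's output as a share of a reference total. The sketch below shows only this arithmetic; the function names and inputs are placeholders, not real data for Jordan.

```python
def per_million_population(publications: int, population: float) -> float:
    """Publications per million inhabitants."""
    return publications / (population / 1_000_000)


def per_billion_gdp(publications: int, gdp_usd: float) -> float:
    """Publications per billion US dollars of GDP."""
    return publications / (gdp_usd / 1_000_000_000)


def share_percent(publications: int, reference_total: int) -> float:
    """Country output as a percentage of a reference total (e.g., regional or world output)."""
    return 100 * publications / reference_total


if __name__ == "__main__":
    # Placeholder inputs for illustration only; substitute real World Bank/OECD figures
    pubs, population, gdp = 500, 2_000_000, 15_000_000_000
    print(per_million_population(pubs, population))  # 250.0
    print(per_billion_gdp(pubs, gdp))                 # about 33.3
    print(share_percent(pubs, 20_000))                # 2.5
```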
According to the United Nations Statistical Yearbook27, the world is divided into 9 regions based on geographical, scientific, and economic considerations. These regions are Western Europe, Eastern Europe, the United States of America (USA), Canada, Latin America and the Caribbean, Africa, Japan, Asia (excluding Japan), and Oceania.
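Aggregating country-level collaboration counts into these regions requires a country-to-region lookup. The partial mapping below is a hypothetical example applied to the Web of Science collaboration counts reported above; placing the United Kingdom, France, and Austria in Western Europe and Saudi Arabia in Asia (excluding Japan) is our assumption. Summing country counts gives only an upper bound on regional publication counts, because a publication with co-authors from two countries in the same region is counted twice.

```python
from collections import Counter

# Hypothetical, partial country -> region lookup using the nine regions above.
# A real analysis must assign every country that appears in the export.
REGION = {
    "United States": "USA",
    "Canada": "Canada",
    "Japan": "Japan",
    "Saudi Arabia": "Asia (excluding Japan)",
    "United Kingdom": "Western Europe",
    "France": "Western Europe",
    "Austria": "Western Europe",
}


def by_region(country_counts):
    """Sum per-country counts into regions; also return countries with no region."""
    totals = Counter()
    unassigned = []
    for country, n in country_counts.items():
        region = REGION.get(country)
        if region is None:
            unassigned.append(country)
        else:
            totals[region] += n  # may double count papers with two same-region partners
    return totals, unassigned


if __name__ == "__main__":
    wos_collaborations = {"United States": 929, "France": 860, "Austria": 429}
    totals, missing = by_region(wos_collaborations)
    print(totals.most_common(), missing)
```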
Researchers aiming to perform bibliometric analysis with these databases should be aware of their limitations. In almost all disciplines, Scopus and Web of Science cover fewer than half of the journals listed in Ulrich's Periodicals Directory28. This means that, although Scopus and Web of Science select the journals they index based on quality, they do not cover all journals in any discipline. Moreover, non-English-language journals are under-represented, as these databases focus on English-language journals28. Another limitation that may be encountered during the analysis is incomplete information about an article (e.g., a missing author country of affiliation), which can introduce errors into the results. This can be avoided by manually searching for the author. However, this issue was not addressed in the analysis conducted in this study, since previous studies have estimated the information missing because of it to be insignificant (less than 5%)6.
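Before relying on country-level counts, the share of records missing the relevant field can be measured and compared with the level below 5% that previous studies reported as insignificant6. The sketch below assumes records already parsed into dictionaries with a hypothetical "country" key; the records it flags would be the candidates for a manual author search.

```python
def _is_missing(record, field):
    """True if `field` is absent, None, or blank in `record`."""
    return not str(record.get(field) or "").strip()


def missing_rate(records, field: str = "country") -> float:
    """Percentage of records whose `field` is absent or blank."""
    records = list(records)
    if not records:
        return 0.0
    missing = sum(1 for r in records if _is_missing(r, field))
    return 100 * missing / len(records)


def records_to_check(records, field: str = "country"):
    """Records that would need a manual search for the author's affiliation."""
    return [r for r in records if _is_missing(r, field)]


if __name__ == "__main__":
    sample = [  # made-up records
        {"title": "A", "country": "Jordan"},
        {"title": "B", "country": ""},
        {"title": "C", "country": "Jordan; USA"},
        {"title": "D"},
    ]
    rate = missing_rate(sample)
    print(f"{rate:.1f}% missing")
    if rate >= 5:  # threshold taken from the text above
        for r in records_to_check(sample):
            print("check manually:", r["title"])
```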
The authors have nothing to disclose.
The authors would like to thank the Deanship of Scientific Research for funding the video production for this study. The authors would also like to thank Dr. Aseel Zabin, Department of English Language and Literature, The University of Jordan, for the English language review of this study.
Clarivate | N/A | Web of Science provider; access was provided through the University of Jordan's subscription. |
Elsevier | N/A | Scopus provider; access was provided through the University of Jordan's subscription. |