Monash Health Library



Text mining uses Natural Language Processing (NLP) to examine large collections of documents, books, articles, websites, emails, survey results and reports. It aids the discovery of new information and supports the literature search process.

  • Tools are used to analyse documents and identify facts, relationships, word usage and frequency that would otherwise remain buried in the mass of textual big data.
  • It can be used during the literature searching process to help build a list of keywords and search terms related to a particular research topic.
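
As a rough illustration of the word frequency idea above, the following Python sketch counts how often each word appears across a handful of titles. The titles, the short stop-word list and the function name term_frequencies are made up for this example; real text mining tools use far more sophisticated processing.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "with", "on", "is"}


def term_frequencies(documents):
    """Count how often each non-stop word occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Lower-case the text and keep runs of letters only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up titles, for illustration only.
titles = [
    "Text mining of clinical notes for adverse drug events",
    "Mining clinical text to support systematic review searching",
    "Adverse drug events in older adults: a systematic review",
]

# Print the five most frequent candidate search terms.
for term, count in term_frequencies(titles).most_common(5):
    print(term, count)
```

Words that recur across many relevant records are candidates for keywords to add to a search strategy.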

Elsevier has published a 1m 50s video that neatly explains text mining.

The text mining tools below can inform, improve and refine your search terms and strategy by mining medical databases for keywords and MeSH subject headings. Note: most of these tools rely on PubMed data, but none of them is as up to date as PubMed itself, because PubMed does not give third-party tools access to all of its records in real time.

When used as an analysis tool, text mining cuts down the time researchers need to spend reading textual information. This means key information can be identified quickly, an increasing benefit considering the continuous growth in the volume of published information.

The following tools for analysing and exploring language in texts are freely available.

Using EndNote for subject heading frequency counts

Source:  CADTH Text Mining Opportunities: White Paper, Appendix A, pp.34-35

EndNote allows you to calculate subject heading frequencies across a collection of records at an aggregated level. PubReMiner does this too, but only for MEDLINE, so EndNote is a good option for other databases such as Embase.


Procedure

Export relevant records from PubMed, MEDLINE on Ovid, Embase on Ovid, or any other database that uses a controlled vocabulary and includes that information in the data imported into EndNote. Then examine the syntax of the subject headings in the keyword field. For MeSH terms from PubMed or MEDLINE on Ovid, note the asterisk that marks major subject headings and the / that separates subheadings from subject headings; note also that there are no semi-colons.

  • Before exporting records to EndNote, you can set up some preprocessing for the subject headings that will be imported (into the keyword field, in this case) for better results, depending on how you want the headings to be tabulated:
    • Begin with an empty EndNote library
    • In EndNote 20, go to Library > Define Term Lists > Keywords
    • Select the correct delimiters — for example, if you want to treat the subheadings separately from the subject headings in PubMed or MEDLINE on Ovid, tick the box next to the “/” symbol to make sure that subheadings will be split apart from the subject headings and counted separately
    • Click on “Update List”
    • Click “OK”
  • Import the records for analysis into the new EndNote library
  • If you want to ignore the asterisk (i.e., treat major subject headings and minor subject headings in the same way), you can perform a Find and Replace to eliminate the asterisks from the keyword field:
    • Go to Edit > Find and Replace
    • Select the Keyword field (if that is the field into which the subject headings were in fact imported)
    • Find the * (uncheck Match Words) and Replace with blank
    • Click Change
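
The effect of these preprocessing choices can be sketched in Python. This is only an illustration of the logic, not how EndNote works internally; the function name split_heading, its parameters and the sample heading strings are assumptions made for this example, and real exports may format headings differently.

```python
def split_heading(raw, keep_major_marker=False, split_subheadings=True):
    """Normalise one subject heading string taken from the keyword field.

    keep_major_marker=False drops the asterisk, so major and minor
    headings are treated the same way.
    split_subheadings=True splits on "/", so subheadings are counted
    separately from the subject heading.
    """
    text = raw.strip()
    if not keep_major_marker:
        text = text.replace("*", "")
    parts = text.split("/") if split_subheadings else [text]
    # Discard empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


# Made-up heading strings, for illustration only.
print(split_heading("*Neoplasms/drug therapy"))
print(split_heading("*Neoplasms/drug therapy", split_subheadings=False))
print(split_heading("*Neoplasms/drug therapy", keep_major_marker=True))
```

Deciding on the delimiter and the asterisk before counting matters because each choice changes which strings are treated as the same term.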

To perform the frequency analysis:

  • Go to Tools > Subject Bibliography
  • Select Keywords from the list and click OK
  • Choose Select All and click OK
  • Click on Layout > Terms > Subject Terms Only
  • Change the number of lines between entries by removing ^p^p next to Suffix (this will reduce the length of the saved or printed document)
  • Change the display order to frequency by selecting By Term Count - Descending and click OK
  • Print or Save
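
For readers who prefer to work outside EndNote, the same kind of tally can be sketched in Python. The records below are made up, the function name heading_frequencies is hypothetical, and counting each heading once per record is an assumption about how the tally should behave rather than a description of EndNote's internals.

```python
from collections import Counter

# Hypothetical records: each inner list holds the terms from one record's keyword field.
records = [
    ["Humans", "Neoplasms", "Data Mining"],
    ["Humans", "Data Mining", "Natural Language Processing"],
    ["Humans", "Neoplasms"],
]


def heading_frequencies(records):
    """Return (heading, record count) pairs, most frequent first."""
    counts = Counter()
    for keywords in records:
        # Use a set so a heading is counted once per record.
        counts.update(set(keywords))
    # Sort by count descending, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for heading, count in heading_frequencies(records):
    print(f"{heading} ({count})")
```

Sorting by descending count puts the most common subject headings at the top, mirroring the By Term Count - Descending display order.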

Monash Health Library has a wide range of online journal subscriptions, including text mining journals, for staff and students to access for research and study purposes.


     

Our journal subscriptions are fully integrated with the Read App to bring you easy, mobile access to medical journals and the latest articles. The app uses your preferences to provide a personalised digital newsfeed based on the specialities, favourite journals and keyword searches you set up. By connecting to Monash Health subscriptions under Settings > Institutional settings, you will be automatically connected to articles in full text.
