semanticClimate/FORCE11TG-semanticClimate


FORCE11TG-semanticClimate | 2026

semanticClimate: Open Tools for Knowledge Extraction from Scholarly Publications

Overview

Scholarly publications play a vital role in developing hypotheses, research projects, reports, theses, and evidence-based policies. However, despite the rapid growth of scientific literature, much of this knowledge remains locked within unstructured formats such as PDFs and lengthy reports, limiting its discoverability, reuse, synthesis, and policy impact. The increasing volume of publications also makes it challenging for researchers to stay updated with emerging evidence. Furthermore, access to literature is often constrained by repository download limits and publisher restrictions on bulk retrieval.

The current scholarly communication model is largely centered on individual papers: users search for a publication, download it, and manually read and extract relevant information. This approach is increasingly insufficient for addressing large-scale research questions that require systematic analysis of thousands of documents.

The semanticClimate approach moves beyond traditional document access by making scholarly content semantically accessible. This enables not only human readers—including those who rely on audio or alternative formats—but also machines to discover, analyze, and connect knowledge automatically. Such semantic enrichment supports the creation of machine-readable corpora, knowledge graphs, and AI-assisted literature review workflows.

To support this vision, semanticClimate promotes a suite of open-source, Python-based toolkits for large-scale literature retrieval and corpus creation (pygetpapers), document processing (amilib), and semantic extraction of entities such as species, locations, chemical compounds, and other climate-relevant concepts through document analysis and named entity recognition (NER) workflows. Together, these tools provide open and reproducible infrastructure for large-scale evidence synthesis, interdisciplinary research, and the transformation of scholarly knowledge beyond static PDF and text formats.
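To make the idea of semantic extraction concrete, here is a minimal sketch of dictionary-based term annotation in plain Python. It is not the API of pygetpapers, amilib, or docanalysis. The names `TERM_DICTIONARY` and `annotate`, the two categories, and the sample sentence are all assumptions made up for illustration. Real workflows would likely use much larger dictionaries and trained NER models.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary (category -> terms); illustrative only,
# not taken from any semanticClimate resource.
TERM_DICTIONARY = {
    "chemical": ["carbon dioxide", "methane"],
    "location": ["Arctic", "Sahel"],
}


def annotate(text, dictionary=TERM_DICTIONARY):
    """Return {category: sorted list of dictionary terms found in text}."""
    hits = defaultdict(set)
    for category, terms in dictionary.items():
        for term in terms:
            # Whole-word, case-insensitive match for each dictionary term.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits[category].add(term)
    return {category: sorted(found) for category, found in hits.items()}


sample = "Methane emissions in the Arctic are rising faster than carbon dioxide."
annotations = annotate(sample)
print(annotations)
```

Applying the same function to every document in a corpus gives a simple term-by-document index, which is one possible starting point for the machine-readable corpora described above.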

Beyond technology development, the project contributes to building open knowledge infrastructure and strengthening research capacity. By providing accessible tools, workflows, and training resources, semanticClimate supports students, early-career researchers, librarians, and domain experts in developing skills for open, machine-readable, and AI-enabled scholarship.

Objectives

  • Extract knowledge from scholarly publications using semantic tools
  • Convert research outputs into machine-readable formats
  • Support AI-assisted literature reviews
  • Create reusable semantic resources and knowledge graphs
  • Promote open scholarship and reproducible research
  • Provide training materials and community learning resources
  • Foster collaboration between researchers, librarians, data scientists, and students

Tools and Technologies

This project explores and develops workflows using:

  • Python
  • Jupyter Notebooks
  • amilib
  • pygetpapers
  • docanalysis
  • Wikidata
  • Knowledge Graph Technologies
  • Natural Language Processing (NLP)
  • Large Language Models (LLMs)
  • GitHub for Open Collaboration

Deliverables

  • Extraction of a structured climate ontology as a knowledge graph.
  • Interrogation of 15,000 pages of the IPCC AR6 reports (and, we hope, emerging releases of AR7) and the current open scientific literature.
  • Enrichment with trusted knowledge (IPCC, publications, Wikipedia).
  • Development of scholarly technology: semanticCorpus (2026), which holds a searchable collection of articles and links with metadata management.
  • Development of an encyclopedia/knowledge_graph: a generally accessible technology that can give non-experts answers within an hour.
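
As a rough illustration of what representing extracted knowledge as a knowledge graph can mean, the sketch below stores subject-predicate-object triples in memory and answers two simple lookups. The class name `TripleStore`, the predicate names, and the example triples are hypothetical. The project's own graph tooling and ontology are not shown in this README.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to `subject` by `predicate`."""
        pairs = self._by_subject.get(subject, [])
        return [o for p, o in pairs if p == predicate]

    def subjects(self, predicate, obj):
        """All subjects linked to `obj` by `predicate`."""
        return [
            s
            for s, pairs in self._by_subject.items()
            if (predicate, obj) in pairs
        ]


# Illustrative triples only; the predicate names are assumptions.
graph = TripleStore()
graph.add("methane", "instance_of", "greenhouse gas")
graph.add("carbon dioxide", "instance_of", "greenhouse gas")
graph.add("methane", "mentioned_in", "IPCC AR6")

print(graph.objects("methane", "instance_of"))
print(sorted(graph.subjects("instance_of", "greenhouse gas")))
```

A production graph would more likely use persistent identifiers (for example Wikidata IDs) as nodes rather than plain strings, so that entities from different documents can be linked reliably.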

Learning Resources

  • Beginner tutorials
  • AI-assisted literature review workflows
  • Keyphrase extraction
  • Knowledge graph demonstrations
  • FORCE11 community resources
  • Training materials from the semanticClimate community

Citation

If you use materials from this repository, please cite the project and acknowledge the FORCE11 Task Group.

Related Resources

LICENSE

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ | License information: LICENSE
