FORCE11TG-semanticClimate | 2026

Visit the FORCE11 Task Group page

semanticClimate: Open Tools for Knowledge Extraction from Scholarly Publications

Overview

Scholarly publications play a vital role in developing hypotheses, research projects, reports, theses, and evidence-based policies. However, despite the rapid growth of scientific literature, much of this knowledge remains locked within unstructured formats such as PDFs and lengthy reports, limiting its discoverability, reuse, synthesis, and policy impact. The increasing volume of publications also makes it challenging for researchers to stay updated with emerging evidence. Furthermore, access to literature is often constrained by repository download limits and publisher restrictions on bulk retrieval.

The current scholarly communication model is largely centered on individual papers: users search for a publication, download it, and manually read and extract relevant information. This approach is increasingly insufficient for addressing large-scale research questions that require systematic analysis of thousands of documents.

The semanticClimate approach moves beyond traditional document access by making scholarly content semantically accessible. This enables not only human readers—including those who rely on audio or alternative formats—but also machines to discover, analyze, and connect knowledge automatically. Such semantic enrichment supports the creation of machine-readable corpora, knowledge graphs, and AI-assisted literature review workflows.

To support this vision, semanticClimate promotes a suite of open-source, Python-based toolkits for large-scale literature retrieval and corpus creation (pygetpapers), document processing (amilib), and semantic extraction of entities such as species, locations, chemical compounds, and other climate-relevant concepts through document analysis and named entity recognition (NER) workflows. Together, these tools provide open, reproducible infrastructure for large-scale evidence synthesis, interdisciplinary research, and the transformation of scholarly knowledge beyond static PDF and text formats.
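The retrieval, processing and extraction chain described above can be illustrated with a minimal, self-contained sketch of its last step. It does not call pygetpapers, amilib or docanalysis. It assumes the documents have already been retrieved and converted to plain text, and it swaps in simple dictionary matching as a stand-in for a trained NER model. The dictionary terms, categories and document IDs are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary standing in for curated vocabularies
# (species, locations, chemicals); terms and categories are illustrative only.
DICTIONARY = {
    "carbon dioxide": "chemical",
    "methane": "chemical",
    "Arctic": "location",
    "Sahel": "location",
    "Zostera marina": "species",
}

def build_pattern(terms):
    # Longest terms first so multi-word entries win over shorter overlaps;
    # \b anchors restrict matches to whole words.
    alternatives = sorted(terms, key=len, reverse=True)
    return re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )

def extract_entities(corpus, dictionary):
    """Return {doc_id: {category: sorted canonical terms}} for each document."""
    pattern = build_pattern(dictionary)
    lookup = {term.lower(): term for term in dictionary}  # case-insensitive -> canonical
    results = {}
    for doc_id, text in corpus.items():
        found = defaultdict(set)
        for match in pattern.finditer(text):
            term = lookup[match.group(1).lower()]
            found[dictionary[term]].add(term)
        results[doc_id] = {cat: sorted(terms) for cat, terms in found.items()}
    return results

# Hypothetical plain-text documents, as if already retrieved and converted.
corpus = {
    "doc1": "Methane emissions from Arctic permafrost exceed carbon dioxide estimates.",
    "doc2": "Seagrass (Zostera marina) meadows store carbon.",
}
print(extract_entities(corpus, DICTIONARY))
```

Dictionary matching is transparent and reproducible but misses variants and unseen names, which is why NER workflows typically complement or replace it.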

Beyond technology development, the project contributes to building open knowledge infrastructure and strengthening research capacity. By providing accessible tools, workflows, and training resources, semanticClimate supports students, early-career researchers, librarians, and domain experts in developing skills for open, machine-readable, and AI-enabled scholarship.

Objectives

Extract knowledge from scholarly publications using semantic tools
Convert research outputs into machine-readable formats
Support AI-assisted literature reviews
Create reusable semantic resources and knowledge graphs
Promote open scholarship and reproducible research
Provide training materials and community learning resources
Foster collaboration between researchers, librarians, data scientists, and students

Tools and Technologies

This project explores and develops workflows using:

Python
Jupyter Notebooks
amilib
pygetpapers
docanalysis
Wikidata
Knowledge Graph Technologies
Natural Language Processing (NLP)
Large Language Models (LLMs)
GitHub for Open Collaboration
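Wikidata appears in the list above as a source of identifiers for linking extracted terms. As a hedged sketch (this is not semanticClimate code), the snippet below builds a request URL for the public MediaWiki `wbsearchentities` action, which returns candidate Wikidata items for a search string. It also pulls item IDs, labels and descriptions from the decoded JSON response. It only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint for Wikidata.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wikidata_search_url(term, language="en", limit=5):
    """Build a wbsearchentities URL asking Wikidata for items matching `term`."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"

def parse_candidates(response):
    """Extract (item ID, label, description) tuples from a decoded JSON response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]

print(wikidata_search_url("Zostera marina"))
# https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Zostera+marina&language=en&limit=5&format=json
```

Fetching the URL needs network access and, per Wikimedia's API etiquette, a descriptive User-Agent header. Choosing the right item among homonyms still needs contextual or human checking.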

Deliverables

Extraction of a structured climate ontology as a knowledge graph.
Interrogation of the 15,000 pages of the IPCC AR6 reports (and, we hope, of emerging AR7 releases) and of the current open scientific literature.
Enrichment with trusted knowledge (IPCC, publications, Wikipedia).
Development of scholarly technology: semanticCorpus (2026), a searchable collection of articles and links with metadata management.
Development of encyclopedia/knowledge_graph, a generally accessible technology that can give non-experts answers within an hour.
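A knowledge graph, as named in the first deliverable, is commonly stored as subject, predicate, object triples. The sketch below is a hypothetical, in-memory illustration of that data structure and of pattern queries over it. It is not the semanticClimate ontology or its storage layer; the predicate names and the "doc1" identifier are invented for the example. Production systems would typically use RDF tooling or a graph database.

```python
from collections import defaultdict

class TripleStore:
    """A minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)  # index for fast subject lookups

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        if subject is None:
            candidates = self.triples
        else:
            candidates = self.by_subject.get(subject, set())
        return sorted(
            t for t in candidates
            if (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

# Illustrative triples; predicate names and "doc1" are hypothetical.
kg = TripleStore()
kg.add("methane", "is_a", "greenhouse gas")
kg.add("carbon dioxide", "is_a", "greenhouse gas")
kg.add("permafrost thaw", "releases", "methane")
kg.add("methane", "mentioned_in", "doc1")

# Two-hop question: which greenhouse gases are released by permafrost thaw?
released = {o for _, _, o in kg.query(subject="permafrost thaw", predicate="releases")}
ghgs = {s for s, _, _ in kg.query(predicate="is_a", obj="greenhouse gas")}
print(sorted(released & ghgs))  # ['methane']
```

Keeping a link such as "mentioned_in" from each concept back to its source document is what lets an answer be traced to the publications that support it.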

Learning Resources

Beginner tutorials
AI-assisted literature review workflows
Keyphrase extraction
Knowledge graph demonstrations
FORCE11 community resources
Training materials from the semanticClimate community
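Keyphrase extraction, listed above as a learning resource, can be approached in many ways. The sketch below uses a simple frequency-based n-gram heuristic. It is an illustrative stand-in, not the method used in the semanticClimate tutorials. The tiny stopword list, the scoring (count multiplied by phrase length), and the rule that drops phrases contained in a higher-ranked one are all assumptions.

```python
import re
from collections import Counter

# Deliberately tiny, hypothetical stopword list; real pipelines use curated lists.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "are", "for",
             "by", "on", "from", "with", "as", "that", "must"}

def candidate_phrases(text, max_len=3):
    """Yield n-grams (1..max_len words) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)

def top_keyphrases(text, k=5, max_len=3):
    counts = Counter(candidate_phrases(text, max_len))
    # Keep repeated phrases only; weight by length so multi-word phrases rank higher.
    scored = {p: c * len(p.split()) for p, c in counts.items() if c > 1}
    ranked = sorted(scored, key=lambda p: (-scored[p], p))
    selected = []
    for phrase in ranked:
        # Skip phrases already contained (word-aligned) in a higher-ranked selection.
        if any(f" {phrase} " in f" {s} " for s in selected):
            continue
        selected.append(phrase)
        if len(selected) == k:
            break
    return selected

text = ("Sea level rise threatens coastal cities. Sea level rise is accelerating, "
        "and coastal cities must adapt to sea level rise.")
print(top_keyphrases(text))  # ['sea level rise', 'coastal cities']
```

Frequency heuristics like this need no training data, which suits teaching, but they ignore domain vocabularies; matching against curated terms (for example IPCC glossary entries) is a natural next step.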

Citation

If you use materials from this repository, please cite the project and acknowledge the FORCE11 Task Group.

Related Resources

LICENSE

Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/). See the LICENSE file for license information.
