MarcusGraetsch/WebsiteMachine

WebsiteMachine: Automated AI-Assisted Content Producer

Project Overview

WebsiteMachine is an experimental platform that automates the collection, extraction, and summarization of web articles using a combination of AI agents and automation tools. The goal is to streamline the pipeline from gathering interesting article URLs, through extracting their main content, to generating concise summaries with large language models (LLMs). The summarized content can later be published through a CMS or website.

Workflow & Architecture

  1. Collect URLs: Send interesting article URLs to a dedicated email address.
  2. MailReader: Reads new emails via IMAP and extracts URLs from the message body.
  3. EmailExtractor: Processes Thunderbird mailbox files to extract URLs and related metadata into CSV files.
  4. Webscraper: Fetches the main article text (title, body, author, date, source, etc.) from each URL, avoiding ads and unrelated content.
  5. Text Summarizer (AI Agent): Uses LLMs (e.g., OpenAI via LangChain) to generate summaries of the extracted articles.
  6. (Planned): Store results in a database for later publishing via a CMS (e.g., WordPress or Drupal).
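The MailReader step above hinges on pulling article links out of free-form message bodies. A minimal sketch of that URL extraction, where the regex pattern and helper name are illustrative assumptions rather than the repository's actual code:

```python
import re

# Hypothetical pattern: match http(s) URLs up to whitespace or angle brackets.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def extract_urls(body: str) -> list[str]:
    """Return de-duplicated URLs found in an email body, preserving order."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;)")  # drop trailing punctuation from prose
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

print(extract_urls("Check https://example.com/article and https://example.com/article."))
# → ['https://example.com/article']
```

Stripping trailing punctuation matters because URLs at the end of a sentence would otherwise be captured with a spurious dot and fail when fetched later.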

Main Components

  • AI-Workers/Text_Summarizer: Scripts and Docker setup for summarizing text using LLM APIs (currently OpenAI, via LangChain).
  • AI-Workers/dalai: Placeholder for future AI/image generation tools.
  • EmailExtractorfromThunderbirdINBOX: Python scripts to extract URLs from Thunderbird mailboxes.
  • MailReader: Script to read emails via IMAP and extract relevant information.
  • Webscraper_differentVersion: Scripts to fetch and parse article HTML/text from URLs in CSVs, using BeautifulSoup and pandas.
  • Website_Text_Extractor: Extracts headlines and content from web pages and saves them to a PostgreSQL database.
  • K8_deployment: Kubernetes and Ansible deployment scripts (for future scaling and integration).
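As an illustration of the scraping components above, here is a minimal BeautifulSoup sketch that keeps the article title and paragraph text while discarding navigation and script markup. The function name and tag heuristics are assumptions for illustration, not the repository's exact logic:

```python
from bs4 import BeautifulSoup

def extract_article(html: str) -> dict:
    """Extract title and paragraph text from raw article HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove markup that typically carries ads, menus, and boilerplate.
    for tag in soup(["script", "style", "nav", "aside", "footer"]):
        tag.decompose()
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Prefer an <article> element; fall back to the whole body.
    container = soup.find("article") or soup.body or soup
    body = "\n".join(p.get_text(strip=True) for p in container.find_all("p"))
    return {"title": title, "body": body}
```

In practice the HTML would come from a `requests.get(url).text` call over the URLs collected in the CSV files; feeding the parser a string directly keeps the sketch self-contained.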

Technologies Used

  • Python, Docker, LangChain, OpenAI API, BeautifulSoup, pandas, requests, PostgreSQL, Kubernetes, Ansible.

Setup & Usage

  • Clone the repository and review each component's folder for setup instructions.
  • Install required Python packages (see Dockerfiles or scripts for dependencies).
  • Security Note: API keys and passwords are currently hardcoded in some scripts. Replace these with your own credentials and consider using environment variables or secure key management.
  • To summarize articles, ensure you have a valid OpenAI API key and required dependencies.
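One way to follow the security note above is to read credentials from the environment and fail fast when they are missing, rather than hardcoding them in scripts. A minimal sketch (the helper name is illustrative):

```python
import os

def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read a credential from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

In Docker setups the same variable can be supplied via `docker run -e OPENAI_API_KEY=...` or an env file, so the key never lands in the repository.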

Current Status

  • The project is a work-in-progress and not yet production-ready.
  • Many components are prototypes or placeholders for future features.

Future Plans

  • Modularize and secure API key/credential handling.
  • Improve extraction logic for broader website compatibility.
  • Add robust error handling and logging.
  • Integrate database storage and web publishing (e.g., to a CMS).
  • Expand AI agent capabilities (e.g., image generation, multi-model support).

Contributing

Contributions are welcome! Please open issues or submit pull requests for improvements, bug fixes, or new features.


Disclaimer: This project is experimental and should not be used with sensitive data or credentials in its current form. Always secure your API keys and personal information.
