MarcusGraetsch/WebsiteMachine

WebsiteMachine: Automated AI-Assisted Content Producer

Project Overview

WebsiteMachine is an experimental platform that automates the collection, extraction, and summarization of web articles using a combination of AI agents and automation tools. The goal is to streamline the pipeline from gathering interesting article URLs, through extracting their main content, to generating concise summaries with large language models (LLMs). The summarized content can later be published through a CMS or website.

Workflow & Architecture

  1. Collect URLs: Send interesting article URLs to a dedicated email address.
  2. MailReader: Reads new emails via IMAP and extracts URLs from the message body.
  3. EmailExtractor: Processes Thunderbird mailbox files to extract URLs and related metadata into CSV files.
  4. Webscraper: Fetches the main article text (title, body, author, date, source, etc.) from each URL, avoiding ads and unrelated content.
  5. Text Summarizer (AI Agent): Uses LLMs (e.g., OpenAI via LangChain) to generate summaries of the extracted articles.
  6. (Planned): Store results in a database for later publishing via a CMS (e.g., WordPress or Drupal).
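The MailReader step above hinges on pulling article links out of free-form message bodies. A minimal sketch of that URL extraction, where the regex pattern and helper name are illustrative assumptions rather than the repository's actual code:

```python
import re

# Hypothetical pattern: match http(s) URLs up to whitespace or angle brackets.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def extract_urls(body: str) -> list[str]:
    """Return de-duplicated URLs found in an email body, preserving order."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;)")  # drop trailing punctuation from prose
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

print(extract_urls("Check https://example.com/article and https://example.com/article."))
# → ['https://example.com/article']
```

Stripping trailing punctuation matters because URLs at the end of a sentence would otherwise be captured with a spurious dot and fail when fetched later.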

Main Components

  • AI-Workers/Text_Summarizer: Scripts and Docker setup for summarizing text using LLM APIs (currently OpenAI, via LangChain).
  • AI-Workers/dalai: Placeholder for future AI/image generation tools.
  • EmailExtractorfromThunderbirdINBOX: Python scripts to extract URLs from Thunderbird mailboxes.
  • MailReader: Script to read emails via IMAP and extract relevant information.
  • Webscraper_differentVersion: Scripts to fetch and parse article HTML/text from URLs in CSVs, using BeautifulSoup and pandas.
  • Website_Text_Extractor: Extracts headlines and content from web pages and saves them to a PostgreSQL database.
  • K8_deployment: Kubernetes and Ansible deployment scripts (for future scaling and integration).
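As an illustration of the scraping components above, here is a minimal BeautifulSoup sketch that keeps the article title and paragraph text while discarding navigation and script markup. The function name and tag heuristics are assumptions for illustration, not the repository's exact logic:

```python
from bs4 import BeautifulSoup

def extract_article(html: str) -> dict:
    """Extract title and paragraph text from raw article HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove markup that typically carries ads, menus, and boilerplate.
    for tag in soup(["script", "style", "nav", "aside", "footer"]):
        tag.decompose()
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Prefer an <article> element; fall back to the whole body.
    container = soup.find("article") or soup.body or soup
    body = "\n".join(p.get_text(strip=True) for p in container.find_all("p"))
    return {"title": title, "body": body}
```

In practice the HTML would come from a `requests.get(url).text` call over the URLs collected in the CSV files; feeding the parser a string directly keeps the sketch self-contained.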

Technologies Used

  • Python, Docker, LangChain, OpenAI API, BeautifulSoup, pandas, requests, PostgreSQL, Kubernetes, Ansible.

Setup & Usage

  • Clone the repository and review each component's folder for setup instructions.
  • Install required Python packages (see Dockerfiles or scripts for dependencies).
  • Security Note: API keys and passwords are currently hardcoded in some scripts. Replace these with your own credentials and consider using environment variables or secure key management.
  • To summarize articles, ensure you have a valid OpenAI API key and required dependencies.
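One way to follow the security note above is to read credentials from the environment and fail fast when they are missing, rather than hardcoding them in scripts. A minimal sketch (the helper name is illustrative):

```python
import os

def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read a credential from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

In Docker setups the same variable can be supplied via `docker run -e OPENAI_API_KEY=...` or an env file, so the key never lands in the repository.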

Current Status

  • The project is a work-in-progress and not yet production-ready.
  • Many components are prototypes or placeholders for future features.

Future Plans

  • Modularize and secure API key/credential handling.
  • Improve extraction logic for broader website compatibility.
  • Add robust error handling and logging.
  • Integrate database storage and web publishing (e.g., to a CMS).
  • Expand AI agent capabilities (e.g., image generation, multi-model support).

Contributing

Contributions are welcome! Please open issues or submit pull requests for improvements, bug fixes, or new features.


Disclaimer: This project is experimental and should not be used with sensitive data or credentials in its current form. Always secure your API keys and personal information.
