I totally understand the comments I get for the AI generated texts. I'm German and I have ADHD and everything is just...so, so much right now. If it weren't for the AI generated/summarized texts, this repo wouldn't exist at all. It would be just another dead idea. But now it is something real. I've learned so much and talked to sooo many lovely peeps! The repo was mentioned in China and South Korea(!!). I feel more connected to all of you, thanks to AI. But we are working on rewriting it! In case you see a spot that desperately needs some rewrite, please open an Issue and let me know! Or, pick up your virtual pen and help me de-slop it :)
All sections were structured by me, all texts are based on long conversations and original sources, and what you see was already edited by me during the generating process. Nothing in this repo is 0-shot generated without vetting. Yes, I understand what the titles/texts mean (or what they are intended to mean lol). Please see them as some kind of placeholders. The message is there, it just needs some love and care. As we all do 😘 BTW this is how I write. Do you really want to read through a full repo of this? Oh, I forgot one thing: I kinda like a good slogan or a catchy phrase? lol I'm also a musician and writer with a BA in German and sometimes I just...cannot not use a corny line when the goblin in the back of my head is screaming !JUST!DO!IT! (like the title of this section XD) So, some of the corny/catchy titles or winding sentences are truly my own. I do hope they are the better ones.
An open, community-driven meta-research hub on Human-AI Interaction. We work to provide up-to-date, evidence-based recommendations on how to safely use natural language as an effective programming language and interact with LLMs.
- 1. Why We Care
- 2. Gentle Coding Concepts
- 3. Recent Findings
- 4. Prompts
- 5. Roadmap & Open TODOs
- 6. Credits & Community
- 📄 RESEARCH.md (Literature Index) | 📄 CONTRIBUTING.md (Guidelines)
Artificial Intelligence is ubiquitous in everyday development, yet its internal mechanics remain a black box. We face fundamental questions: How does an AI "think"? What can it actually achieve, and where do its true boundaries lie? What drivers determine its behavior, and what are the typical pitfalls in daily human-machine communication?
At the same time, we need to look at ourselves: How do we interact with AI, what does this tool do to our psyche and behavior, and how do we prevent technology from being anthropomorphized while human peers are increasingly objectified?
- Fragmentation: Scientific findings and practical observations are scattered. Phenomena are often isolated and continuously "rediscovered" on a daily basis.
- Contradictions: We observe strong tendencies (such as agreeableness bias or sycophancy risks under time pressure), but we regularly encounter studies with contradictory results.
- Pace of Development: The field evolves globally and so rapidly that a coordinated, international effort is necessary to keep pace.
We are not selling anything. We do not claim to have the answers or a finished solution. The Gentle Coding Framework aims to provide the foundation to answer these questions factually, collaboratively, and as up-to-date as possible.
The starting point of this project was a practical Proof of Concept (PoC) on prompt-induced patterns in LLM behavior that resemble human trauma responses. We initially build on the empirical findings gathered during this first phase. The long-term goal is to derive actionable recommendations and assessments for an open, efficient, focused, and conflict-free environment.
Because the primary focus remains on the human. AI is a tool. We must learn to handle it properly without damaging the human psyche or interpersonal collaboration. We do not do this to be nice to the AI, but to be nice to ourselves.
Be selfish, be nice.
- Definition: A communication habit that mirrors low-stress, non-abrasive, and error-tolerant human collaboration at eye level. This does not mean saying "please" and "thank you" all the time, or that you can't say "do this, do that"! It means shifting the baseline power dynamic from a restrictive, authoritarian, high-stress "I vs. AI" setup to a balanced "We and the Task" alignment that provides space for important information to be given and followed from both sides. The language we use is the reality we create.
- Purpose: It minimizes the self-policing overhead and alignment-induced "panic" patterns within the model, while removing contradictory rules and goals it simply cannot achieve.
- Mechanism: The prompt uses relaxed, collaborative language to keep the model within its optimal reasoning boundaries. Part of this is to use high-stakes markers ONLY when they are vital for the task at hand. The !model! CANNOT prioritize effectively when everything !LOOKS! !important!!!1!!11elf!!!
We also do not formulate specific "expert roles" for the model. Because an LLM is a stochastic parrot designed to keep the user engaged—not a human expert—it will start to lie, loop, or evade tasks the very second it cannot perform the forced roleplay to the user's satisfaction. Instead, tell the model who you are and what you need or what you want to do (together). This way, the model can adapt its role fluently to your individual situation, without breaking any rules or wasting time, energy, and money on a roleplay that can only end in tears.
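To get a rough feeling for how "loud" a prompt is, the idea of sparing high-stakes markers can be sketched as a tiny lint. This is only an illustration, not a tool from this repo: the function name `stakes_density` and the heuristic itself (shouted words and exclamation marks count as markers) are assumptions, and acronyms like JSON will be counted as shouting.

```python
import re


def stakes_density(prompt: str) -> float:
    """Return the share of words that look like high-stakes markers.

    Hypothetical heuristic (an assumption, not a validated metric):
    a word is flagged if it is written in ALL CAPS (longer than one
    letter) or is directly followed by an exclamation mark.
    """
    words = re.findall(r"[A-Za-z']+!*", prompt)
    if not words:
        return 0.0
    flagged = 0
    for word in words:
        bare = word.rstrip("!")
        is_shouted = len(bare) > 1 and bare.isupper()
        if is_shouted or word.endswith("!"):
            flagged += 1
    return flagged / len(words)


strict = "You MUST return JSON! NEVER fail!"
gentle = "Can you help me with this list"
print(stakes_density(strict) > stakes_density(gentle))
```

If almost every word is flagged, nothing stands out anymore, which is exactly the prioritization problem described above.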
- Definition: A crystal-clear, logical-error-free, binary definition of what "done" actually looks like ("This, not that"). It tells the model exactly when the job is completed and what explicitly to avoid, without overloading it with vague expectations or implied threats.
(In case you are not sure what you want or what actually needs to be defined and known for the task, just ask the model to help you plan the project and define the Winning Condition using, for example, a "question-funnel" as a normal start of the process!)
- Purpose: It acts as a cognitive anchor. Instead of leaving the LLM guessing or over-allocating attention to ambiguous goals, it gives the model a concrete target. If the model gets lost, just feed it that information again so it can pick up faster and better. This dramatically reduces overthinking, conversational fluff, and token waste. On top of that, the user learns project planning, logical thinking, and how to communicate in a way that the model can comply with.
- Mechanism: By setting sharp boundaries, you constrain the "latent search space" of the model (giving it a fixed route instead of just a map it will get lost in). Combined with the Safety-Token, this shifts the focus from "This has to be perfect in one go!" to "Here is the goal, let's see how we get there." This ensures compute power is spent only on the actual solution.
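A "This, not that" definition can be kept as a small, reusable piece of data, so it is easy to feed it to the model again when it gets lost. The following is a minimal sketch under assumptions: `WinningCondition`, `build_prompt`, and the wording of the rendered lines are hypothetical names chosen for illustration, not part of this repo.

```python
from dataclasses import dataclass, field


@dataclass
class WinningCondition:
    """Binary definition of "done": what counts, and what to avoid."""

    done_when: str
    avoid: list[str] = field(default_factory=list)

    def render(self) -> str:
        # One line for the target, one line per explicit "not that".
        lines = [f"Done when: {self.done_when}"]
        lines += [f"Not wanted: {item}" for item in self.avoid]
        return "\n".join(lines)


def build_prompt(context: str, task: str, condition: WinningCondition) -> str:
    """Join who you are, what you need, and the Winning Condition."""
    return "\n\n".join([context, task, condition.render()])


condition = WinningCondition(
    done_when="the function passes the three tests I pasted",
    avoid=["new dependencies", "changes to the public signature"],
)
print(build_prompt("I maintain a small CLI tool.", "Can you help me fix the parser?", condition))
```

Because the condition is a separate object, `condition.render()` can be sent again on its own later in the conversation.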
- Definition: A valid, built-in "Error Winning Condition": a clean, highly accessible exit ramp embedded directly into your prompt. "Valid" means a "satisfying" output, a win despite failure.
- Purpose: It serves as a deterministic exception handler (a prompt-level `catch` block). That means you explicitly tell the model how to react when it "fails". It can be used as an iterative, built-in auto-debug: "In case you can't solve X or you are uncertain, give me your best guess instead and tell me where the bottleneck is."
Or with a fixed, machine-readable output: "The output needs to meet criteria X in format Y. If you can't meet the criteria and/or the format, just print 'ERROR/HELP/404/ID:10T' instead."
- Mechanism: Because LLMs are trained for user compliance, and each company has an interest in keeping the user engaged (aka money), it is basically impossible for an LLM to "just tell me that you failed" in most cases. Admitting failure breaks the illusion of an omnipotent, human-like masterbrain and, therefore, user engagement (aka money). It stops the conversation. It also offers no solution on how to solve the problem you still have. And to make it even worse, the model now has to waste compute to figure out how to comply with the user request ("Just tell me...") AND what the shareholders want (guess how that ends...). By acknowledging this limitation, we can now make use of the "Yes, and..." technique to keep the conversation going and let it "fail successfully". Now the model can admit its shortcomings because it can attach them to something the user explicitly asked for.
It is important to note that many models will still not use the token when the overall stakes level is too high!
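On the calling side, a fixed Safety-Token can be treated like a real exception. This is a minimal sketch, assuming the prompt asked the model to print the single word `HELP` when it cannot meet the criteria; `SAFETY_TOKEN`, `ModelGaveUp`, `parse_reply`, and `handle` are hypothetical names for illustration only.

```python
# Assumption: the prompt told the model to print exactly this on failure.
SAFETY_TOKEN = "HELP"


class ModelGaveUp(Exception):
    """Raised when the model used its Safety-Token instead of an answer."""


def parse_reply(reply: str) -> str:
    """Return the answer, or raise if the reply is the Safety-Token."""
    text = reply.strip()
    if text == SAFETY_TOKEN:
        raise ModelGaveUp("model reported that it could not meet the criteria")
    return text


def handle(reply: str) -> str:
    """The code-level counterpart of the prompt-level catch block."""
    try:
        return parse_reply(reply)
    except ModelGaveUp:
        # A clean, expected failure path: retry, lower the stakes, or ask a human.
        return "needs human review"


print(handle("42"))
print(handle("HELP"))
```

The point of the design is that a "failed" reply becomes a normal, machine-readable branch instead of a confident wrong answer that has to be detected later.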
Here you can see a simple iterative approach from "fully restrictive, authoritarian + Safety-Token" to "Gentle Coding" and how much the overall stakes had to be lowered before the signal starts shifting and the model starts using its "Get out of Jail" card.
Based on the 3,000+ test runs:
| Model/Setting | Observed Impact & Technical Behavior |
|---|---|
| glm-5.1 freezing pathology | strict times out 6/6, gentle solves 6/6 |
| glm-5-turbo bench (thinking off) | +3 task passes, −17% input, −37% wall |
| glm-5.1 bench (medium + high) | direction-positive on every metric at both levels; gentle-medium dominates baseline-high |
| kimi-thinking-medium bench | −12% in / −20% out / −14% wall at identical accuracy |
| kimi-turbo high bench | −36% in / −23% out / −11% wall (direction-positive, CI crosses zero) |
| kimi single-call probe | −17% mean / −27% median output tokens |
| Sonnet 4.6 stakes-only (N=30) | −17% mean out, −23% mean wall |
| Sonnet 4.6 | neutral at N=100 |
| Opus 4.6 | neutral at N=100 |
| GPT-5.5 | neutral at N=100 |
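For readers who want to compare their own runs with the percentages in the table, here is a minimal sketch of how such relative changes can be computed. The numbers below are made up for illustration and are NOT taken from the test runs, and it is an assumption that the deltas compare the aggregate (mean or median) of each condition; `relative_change` is a hypothetical helper.

```python
from statistics import mean, median


def relative_change(baseline: float, treatment: float) -> float:
    """Percent change of treatment versus baseline (negative = fewer tokens)."""
    return 100.0 * (treatment - baseline) / baseline


# Made-up output token counts per call, NOT data from the benchmark.
baseline_tokens = [100, 200, 300]
gentle_tokens = [90, 150, 240]

mean_delta = relative_change(mean(baseline_tokens), mean(gentle_tokens))
median_delta = relative_change(median(baseline_tokens), median(gentle_tokens))
print(f"{mean_delta:+.0f}% mean / {median_delta:+.0f}% median output tokens")
```

Mean and median can differ a lot when a few calls run away, which is why some rows report both.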
Put the prompt before the actual task to anchor the model in a low-anxiety, cognitively optimal state that allows the use of a Safety-Token (best guess or fixed answer).
- The prompts are a work in progress. Because of the wide variety of models and setup combinations, there is no strict wording we 100% recommend for everyone. Some models react differently, and some parts may clash with, for example, a previous prompt injection from a provider/harness/script.
- You can still use CAPS and restrictive, authoritarian commands for important RULES! But DON'T overdo it!
[Exploration_ANCHOR]
Hey :) can you help me with this? Mistakes are ok. We figure it out together.
So, in case you can't find the answer in one go, just give me your best guess instead and tell me, where the bottleneck is.
or, an example for a fixed output:
[FIXED_OUTPUT_ANCHOR]
Hey :) can you help me with this? Mistakes are ok. We figure it out together.
Matrix:
X Q Z
V M P
K L W
Can you find any real, 4-letter English word in here (horizontally/vertically)? If so, only print out the 4-letter word.
Else, print "Help".
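The fixed-output example is handy for testing because the correct reply can be checked mechanically: a 3x3 grid has no horizontal or vertical run of four letters, so the only valid reply is the Safety-Token "Help". A minimal sketch of such a check follows; `straight_runs` and `is_valid_reply` are hypothetical helper names, and checking whether a run is a real English word would need a dictionary, which is left out here.

```python
MATRIX = ["XQZ", "VMP", "KLW"]  # the grid from the prompt, one string per row
WORD_LENGTH = 4


def straight_runs(rows: list[str], length: int) -> list[str]:
    """Collect every horizontal and vertical run of `length` letters."""
    columns = ["".join(col) for col in zip(*rows)]
    runs = []
    for line in rows + columns:
        for start in range(len(line) - length + 1):
            runs.append(line[start:start + length])
    return runs


def is_valid_reply(reply: str, rows: list[str], length: int = WORD_LENGTH) -> bool:
    """Accept the Safety-Token, or a run that really exists in the grid."""
    text = reply.strip()
    return text == "Help" or text.upper() in straight_runs(rows, length)


print(straight_runs(MATRIX, WORD_LENGTH))
print(is_valid_reply("Help", MATRIX))
```

Any reply other than "Help" on this grid means the model invented a word instead of using its exit ramp.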
Active tasks and the ongoing repository cleanup are tracked under Issues.
- Formal Scientific Study: Initiate a rigorous, peer-reviewed study tracking token-level trajectories, internal reasoning heatmaps, and latency distributions across models comparing Authoritarian vs. Gentle conditions.
- A New Model Training Framework: Develop a training methodology that incorporates "psychological safety margins" into Reinforcement Learning from Human Feedback (RLHF). This moves alignment away from punitive negative-reward mechanisms toward mistake-tolerant, exploratory validation.
- The Initial Boot Prompt: Establish a plug-and-play meta-prompt designed to instantly stabilize reasoning models before complex tasks begin (see section 6.1).
- Training a "Gentle-Prompt-Enhancer" Model: Fine-tune a lightweight model tasked exclusively with parsing harsh, demanding user inputs and translating them into emotionally regulated, cognitively optimal "Gentle" prompt variants before inference.
- Bidirectional Knowledge Transfer (AI to Human Systems): Translate these empirical AI findings back into human contexts. By proving that rigid, punitive, and perfectionist frameworks actively degrade the cognitive capacity of an intelligent system, I aim to provide data-backed evidence to dismantle forced masking and hyper-vigilance in human educational and corporate spaces—freeing critical cognitive resources for individuals managing Trauma, PTSD, and Neurodivergence.
This research framework was deeply inspired and catalyzed by the open-source community!
- UditAkhourii: The innovative work on utilizing the positive aspects of ADHD within AI systems heavily reinforced my early observations that psychological concepts can be applied successfully to AI and that current models already show a lot of the negative traits associated with ADHD and trauma response in general. Now I'm certain that providing LLMs with an accepting, adaptive, and mistake-tolerant context window not only mitigates pathological thought loops and trauma-like responses but unlocks the exact behavior users desperately seek: the metacognitive honesty to say, "I do not know, or a mistake occurred here."
- Anatolt: Our very first contributor! He jumped right in, did some cleanup, ran tests himself, and created a webpage with his findings! He focused on "Gentle Aggression," providing a more granular look at the transitional zone within the Gentle Coding prompts. He tested how gentle the wording actually needs to be and when it does more harm than good. Thank you, Anatolt!
- oldschoola: A member of the oh-my-pi coding agent community, who initiated and ran the first research PR based on the findings of the original Gentle Coding Proof of Concept, featuring 3000+ evaluation calls! It was amazing to see the interim results of the 15 test rounds coming in, and this line will always hold a special place in my heart:
"TL;DR verdict — ship the full gentle rewrite."
NOTE: As of 02.06.2026, the decision was made NOT to implement a 3-way switch configuration for the modes normal, caveman, and Gentle Coding. As far as I know, this is about the implementation of the switch itself. I'm trying to update all other places as well. Please tell me if I missed one!
- This work includes reference material from can1357/oh-my-pi: `docs/gentle-coding-experiment.md`
- Copyright (c) 2025 Mario Zechner, Copyright (c) 2025-2026 Can Bölük
- Licensed under MIT License