5 changes: 5 additions & 0 deletions .changeset/tui-readmediafile-inline-image.md
@@ -0,0 +1,5 @@
---
"@moonshot-ai/kimi-code": patch
---

Render inline images via Kitty/iTerm2 graphics protocol when expanding ReadMediaFile tool results in the TUI.
44 changes: 42 additions & 2 deletions apps/kimi-code/src/tui/components/messages/tool-renderers/media.ts
@@ -13,8 +13,8 @@
* message.
*/

-import type { Component } from '@earendil-works/pi-tui';
-import { Text } from '@earendil-works/pi-tui';
+import type { Component, ImageTheme } from '@earendil-works/pi-tui';
+import { getCapabilities, Image, Text } from '@earendil-works/pi-tui';
import chalk from 'chalk';

import type { ChipProvider } from './chip';
@@ -28,6 +28,7 @@ export interface ReadMediaSummary {
bytes?: number;
url?: string;
originalSize?: string;
base64?: string;
}

const PATH_TAG_RE = /^<(image|video)\s+path="([^"]+)">$/;
@@ -56,6 +57,7 @@ export function parseReadMediaOutput(output: string): ReadMediaSummary | null {
let bytes: number | undefined;
let url: string | undefined;
let originalSize: string | undefined;
let base64: string | undefined;
let foundMedia = false;

for (const raw of parsed) {
@@ -88,6 +90,7 @@ export function parseReadMediaOutput(output: string): ReadMediaSummary | null {
if (data && data[1] !== undefined && data[2] !== undefined) {
mimeType = data[1];
bytes = bytesFromBase64(data[2]);
base64 = data[2];
} else {
url = u;
}
@@ -104,6 +107,7 @@ export function parseReadMediaOutput(output: string): ReadMediaSummary | null {
if (bytes !== undefined) summary.bytes = bytes;
if (url !== undefined) summary.url = url;
if (originalSize !== undefined) summary.originalSize = originalSize;
if (base64 !== undefined) summary.base64 = base64;
return summary;
}

@@ -121,6 +125,15 @@ function metaSegments(summary: ReadMediaSummary): string[] {
return segs;
}

function parseOriginalSize(size?: string): { width: number; height: number } | undefined {
  if (size === undefined) return undefined;
  const match = /^(\d+)x(\d+)px$/.exec(size);
  if (match && match[1] !== undefined && match[2] !== undefined) {
    return { width: parseInt(match[1], 10), height: parseInt(match[2], 10) };
  }
  return undefined;
}

export const readMediaChip: ChipProvider = (_toolCall, result) => {
if (result.is_error) return '';
const summary = parseReadMediaOutput(result.output);
@@ -132,6 +145,9 @@ export const readMediaChip: ChipProvider = (_toolCall, result) => {
return `${summary.kind} (${meta.join(', ')})`;
};

const MAX_IMAGE_ROWS = 12;
const MAX_IMAGE_WIDTH = 60;

export const readMediaSummary: ResultRenderer = (toolCall, result, ctx) => {
if (result.is_error) return renderTruncated(toolCall, result, ctx);
const summary = parseReadMediaOutput(result.output);
@@ -148,5 +164,29 @@ export const readMediaSummary: ResultRenderer = (toolCall, result, ctx) => {
if (meta.length > 0) tail.push(meta.join(', '));
if (summary.url !== undefined) tail.push(summary.url);
out.push(new Text(` ${dim(tail.join(' · '))}`, 0, 0));

  // Render inline image on terminals that support Kitty / iTerm2 graphics protocols.
  if (summary.kind === 'image' && summary.base64 !== undefined) {
    const caps = getCapabilities();
    if (caps.images === 'kitty' || caps.images === 'iterm2') {
      const theme: ImageTheme = {
        fallbackColor: (s: string) => chalk.hex(ctx.colors.textDim)(s),
      };
      const dims = parseOriginalSize(summary.originalSize);
P2: Parse the dimensions format emitted by ReadMediaFile

For real ReadMediaFile results with known image dimensions, core emits the leading system text as `Original dimensions: WxH pixels.` (packages/agent-core/src/tools/builtin/file/read-media.ts), but this new inline-image path only matches the older `WxHpx` summary format before passing dimensions to Image. As a result, the renderer almost always constructs the image without the original pixel size, so the terminal cannot reliably reserve space for the image or scale it at the intended aspect ratio. Either accept the current `Original dimensions` wording or source the dimensions from the actual output format before rendering.
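
A minimal sketch of the first option, trying each known wording in turn. The `Original dimensions: WxH pixels.` pattern is taken from the comment above and has not been checked against read-media.ts; `PixelSize`, `SIZE_PATTERNS`, and `parsePixelSize` are hypothetical names, not part of this PR.

```typescript
interface PixelSize {
  width: number;
  height: number;
}

// Patterns are tried in order. The second wording is an assumption based on
// the review comment; adjust it to whatever read-media.ts actually emits.
const SIZE_PATTERNS: RegExp[] = [
  /^(\d+)x(\d+)px$/, // older summary format, e.g. "800x600px"
  /Original dimensions:\s*(\d+)\s*x\s*(\d+)\s*pixels/, // assumed current wording
];

function parsePixelSize(text?: string): PixelSize | undefined {
  if (text === undefined) return undefined;
  const trimmed = text.trim();
  for (const re of SIZE_PATTERNS) {
    const m = re.exec(trimmed);
    if (m && m[1] !== undefined && m[2] !== undefined) {
      return { width: parseInt(m[1], 10), height: parseInt(m[2], 10) };
    }
  }
  return undefined;
}
```

Keeping the patterns in one array means a future wording change touches a single line rather than the rendering path.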


      const image = new Image(
        summary.base64,
P2: Avoid sending huge images to the terminal

When a supported Kitty/iTerm2 terminal expands a large ReadMediaFile image, this path passes the full base64 payload from the data URL directly into the inline image component, even though ReadMediaFile permits files up to 100 MB (packages/agent-core/src/tools/builtin/file/read-media.ts). The maxHeightCells/maxWidthCells options cap the displayed cell size, but they do not reduce the base64 payload being emitted. Expanding a large screenshot or photo can therefore push tens or hundreds of MB of escape-sequence data through the TUI and make it appear hung. Gate inline rendering on summary.bytes, or generate a smaller thumbnail before constructing Image.
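
The byte-size gate could look roughly like the sketch below. The 2 MiB threshold is an arbitrary placeholder, and `MediaSummaryLike`, `MAX_INLINE_IMAGE_BYTES`, and `shouldRenderInline` are hypothetical names that only mirror the fields of `ReadMediaSummary` used here.

```typescript
// Hypothetical cap on the decoded image size; tune it for the target terminals.
const MAX_INLINE_IMAGE_BYTES = 2 * 1024 * 1024;

// Stand-in for the subset of ReadMediaSummary this check needs.
interface MediaSummaryLike {
  kind: 'image' | 'video';
  bytes?: number;
  base64?: string;
}

function shouldRenderInline(
  summary: MediaSummaryLike,
  maxBytes: number = MAX_INLINE_IMAGE_BYTES,
): boolean {
  if (summary.kind !== 'image' || summary.base64 === undefined) return false;
  // Prefer the recorded byte count; otherwise estimate it from the base64
  // length (every 4 base64 characters encode at most 3 bytes).
  const bytes = summary.bytes ?? Math.floor((summary.base64.length * 3) / 4);
  return bytes <= maxBytes;
}
```

A renderer would check `shouldRenderInline(summary)` before constructing `Image` and fall back to the text-only summary otherwise. This only skips oversized payloads; it does not produce a thumbnail.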


        summary.mimeType ?? 'image/png',
        theme,
        {
          maxHeightCells: MAX_IMAGE_ROWS,
          maxWidthCells: MAX_IMAGE_WIDTH,
          filename: summary.path ?? 'image',
        },
        dims ? { widthPx: dims.width, heightPx: dims.height } : undefined,
      );
      out.push(image);
    }
  }

  return out;
};