🩺 GUI-DR: GUI Domain-Randomization for generating diagnostic GUI grounding evaluation data

GUI-DR is part of a collaborative effort on Software Control Agents between Manifold Research and Fig.

Overview

GUI-DR is a data augmentation pipeline built on domain randomization principles.

GUI grounding models often rely on visual primitives (shape, position, color) rather than functional semantics, and fixed-scene benchmarks do not reveal how they degrade under distribution shift. Using Mind2Web MHTML archives, GUI-DR varies visual scenes and instructions along controlled axes to generate data to evaluate or finetune models for use cases such as GUI grounding.

📢 Updates

2026-04: Initial release of GUI-Perturbed, the technical report, and the GUI-DR data generation pipeline.

💾 Installation

Requirements: Python ≥ 3.11 and the Mind2Web data, placed under mm_mind2web/ (layout described in After installing).

git clone https://github.com/ManifoldRG/GUI-DR.git
cd GUI-DR

Install with either uv or pip, as shown below. Versions are pinned in uv.lock (see pyproject.toml). Playwright browsers are required to run the pipeline.

uv

Install uv, then:

uv sync
uv run playwright install

Use uv run python … from the repo root (or source .venv/bin/activate and run python as usual). Example: uv run python src/main.py --split test_task.

pip + venv

python3.11 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .
playwright install
python src/main.py --split test_task

After installing (all options)

Expected layout of mm_mind2web/ (gitignored) at the repo root:
- mm_mind2web/data/<split>-*.parquet
- mm_mind2web/task/<task_uid>/processed/dom_content.json
- mm_mind2web/task/<task_uid>/processed/snapshots/*.mhtml

Parquets: download Multimodal-Mind2Web from Hugging Face into mm_mind2web/data/ (e.g. train-*.parquet, test_task-*.parquet).

Raw dump: the Mind2Web raw dump provides task trees containing processed/dom_content.json and processed/snapshots/*.mhtml. Symlink or copy them so that each task sits at mm_mind2web/task/<task_uid>/…. Optional: scripts/globus_mind2web_downloader.sh handles the Globus transfer (needs an endpoint and a .env file).

Parquet rows and task_uid paths must refer to the same tasks.
(Optional) For debug logging or scripts that use Globus/API keys, copy .env.example to .env and set any variables you need.
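
The mm_mind2web/ layout described above can be checked before a run. The following is a minimal sketch, not part of the repository: the function name check_layout is hypothetical, and it only verifies that the expected files exist. It does not open the parquets, so it cannot confirm that parquet rows and task_uid paths refer to the same tasks.

```python
from pathlib import Path


def check_layout(root: Path, split: str) -> list[str]:
    """Return human-readable problems found under `root` (empty list if none).

    Hypothetical helper: it mirrors the layout listed in this README only.
    """
    problems = []

    # Parquet shards are expected at <root>/data/<split>-*.parquet
    if not list((root / "data").glob(f"{split}-*.parquet")):
        problems.append(f"no parquet files for split '{split}' in {root / 'data'}")

    # Each task is expected at <root>/task/<task_uid>/processed/...
    task_root = root / "task"
    task_dirs = sorted(p for p in task_root.iterdir() if p.is_dir()) if task_root.is_dir() else []
    if not task_dirs:
        problems.append(f"no task directories in {task_root}")

    for task_dir in task_dirs:
        processed = task_dir / "processed"
        if not (processed / "dom_content.json").is_file():
            problems.append(f"{task_dir.name}: missing processed/dom_content.json")
        if not list((processed / "snapshots").glob("*.mhtml")):
            problems.append(f"{task_dir.name}: no processed/snapshots/*.mhtml")
    return problems


if __name__ == "__main__":
    for line in check_layout(Path("mm_mind2web"), "test_task") or ["layout looks OK"]:
        print(line)
```

Run it from the repo root; an empty result means every task directory has both a dom_content.json and at least one MHTML snapshot.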

🚀 Quick Start

The commands below run the test_task split with the Style variant, which is enabled by default.

uv (from repo root):

uv run python src/main.py --split test_task

pip (venv activated):

python src/main.py --split test_task

Outputs are written to outputs/run_<timestamp>_test_task/<task_uid>/, which contains screenshots/ and trajectory.json. For other variants, see Generating data.

🧪 Generating data

One run produces one variant. Choose flags to match the variant you want. Run from the repo root with the venv active (pip) or prefix with uv run (uv).

By variant

# Original (no perturbations)
python src/main.py --split test_task --enable_zoom false --enable_dense_info false --enable_style_variants false

# Precision (viewport zoom 0.7×)
python src/main.py --split test_task --enable_zoom true --zoom_level 0.7 --enable_dense_info false --enable_style_variants false

# Style (colors, fonts, restyling) — default
python src/main.py --split test_task --enable_style_variants true --enable_zoom false --enable_dense_info false

# Text Shrink (reduced font size)
python src/main.py --split test_task --enable_dense_info true --enable_style_variants false --enable_zoom false
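
Since one run produces one variant, building a full evaluation set means running the four commands above in sequence. A minimal sketch of a driver script follows; the flag values are copied from those commands, while VARIANT_FLAGS, build_command, and run_all_variants are hypothetical names that do not exist in the repository.

```python
import subprocess
import sys

# Flag sets copied from the per-variant commands; one entry per variant.
VARIANT_FLAGS = {
    "original": {"enable_zoom": "false", "enable_dense_info": "false",
                 "enable_style_variants": "false"},
    "precision": {"enable_zoom": "true", "zoom_level": "0.7",
                  "enable_dense_info": "false", "enable_style_variants": "false"},
    "style": {"enable_style_variants": "true", "enable_zoom": "false",
              "enable_dense_info": "false"},
    "text_shrink": {"enable_dense_info": "true", "enable_style_variants": "false",
                    "enable_zoom": "false"},
}


def build_command(variant: str, split: str = "test_task") -> list[str]:
    """Build the argv list for one variant, using the current interpreter."""
    cmd = [sys.executable, "src/main.py", "--split", split]
    for flag, value in VARIANT_FLAGS[variant].items():
        cmd += [f"--{flag}", value]
    return cmd


def run_all_variants(split: str = "test_task") -> None:
    """Run the pipeline once per variant; stops at the first failing run."""
    for variant in VARIANT_FLAGS:
        subprocess.run(build_command(variant, split), check=True)
```

Call run_all_variants() from the repo root with the venv active, or start the script with uv run python so that sys.executable points at the project environment.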

Arguments

| Argument | Default | Description |
|---|---|---|
| `--split`, `-s` | `train` | Split: `train`, `test_domain`, `test_task`, `test_website`. |
| `--enable_zoom` | `False` | Enable viewport zoom (Precision). |
| `--zoom_level` | `0.7` | Zoom level: `0.7`, `0.5`, or `0.3`. |
| `--enable_dense_info` | `False` | Enable text shrink. |
| `--enable_style_variants` | `True` | Enable style randomization. |

Output

outputs/run_<timestamp>_<split>/<task_uid>/ contains screenshots/ and trajectory.json. Use one run per variant when building evaluation data or downstream tooling.
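
Downstream tooling usually starts by walking this directory tree. The sketch below assumes only the layout stated above; the function names are hypothetical, and because the schema of trajectory.json is not described here, the code loads it as generic JSON and reports its top-level type instead of reading specific keys.

```python
import json
from pathlib import Path


def iter_task_outputs(outputs_dir: Path, split: str):
    """Yield (run_dir, task_dir) for every task folder that has a trajectory.json."""
    for run_dir in sorted(outputs_dir.glob(f"run_*_{split}")):
        if not run_dir.is_dir():
            continue
        for task_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
            if (task_dir / "trajectory.json").is_file():
                yield run_dir, task_dir


def summarize(outputs_dir: Path, split: str) -> list[dict]:
    """Collect one summary row per task: run name, task_uid, screenshot count."""
    rows = []
    for run_dir, task_dir in iter_task_outputs(outputs_dir, split):
        trajectory = json.loads((task_dir / "trajectory.json").read_text(encoding="utf-8"))
        shots_dir = task_dir / "screenshots"
        n_shots = sum(1 for p in shots_dir.iterdir() if p.is_file()) if shots_dir.is_dir() else 0
        rows.append({
            "run": run_dir.name,
            "task_uid": task_dir.name,
            "screenshots": n_shots,
            "trajectory_type": type(trajectory).__name__,
        })
    return rows
```

For example, summarize(Path("outputs"), "test_task") returns one row per task across all test_task runs, which makes it easy to spot tasks with no screenshots.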

Pipeline overview

Input: Parquet files for the split, plus per-task dom_content.json and MHTML snapshots in mm_mind2web/.

Flow: Load parquet → for each task, load MHTML snapshots in order → per step: optionally inject UI modifications (style / zoom / text shrink) → resolve target element from parquet → capture screenshot and bbox → write trajectory.json and screenshots.

flowchart TB
  subgraph input [Input]
    Parquet[parquet files]
    MHTML[MHTML snapshots]
  end
  subgraph pipeline [Pipeline]
    ActionProc[action_processor]
    MHTMLProc[MHTMLProcessor]
    Inject[inject UI mods]
    Locate[locate element]
    Screenshot[screenshot + bbox]
  end
  subgraph output [Output]
    Trajectory[trajectory.json]
    Screens[screenshots/]
  end
  Parquet --> ActionProc
  MHTML --> ActionProc
  ActionProc --> MHTMLProc
  MHTMLProc --> Inject
  Inject --> Locate
  Locate --> Screenshot
  Screenshot --> Trajectory
  Screenshot --> Screens
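
The per-step part of this flow (load snapshot, inject, locate, screenshot) can be illustrated with Playwright's Python API. This is a rough sketch under stated assumptions, not the code in src/: capture_step and bbox_to_xyxy are hypothetical names, the perturbation is passed in as plain CSS, and the target is given as a CSS selector, whereas the pipeline resolves the target element from the parquet data in a way this README does not describe. It also assumes Chromium can open the MHTML snapshot from a file:// URL.

```python
from pathlib import Path


def bbox_to_xyxy(box):
    """Convert Playwright's {x, y, width, height} dict to [x1, y1, x2, y2]."""
    if box is None:  # Playwright returns None when the element is not visible
        return None
    return [box["x"], box["y"], box["x"] + box["width"], box["y"] + box["height"]]


def capture_step(mhtml_path, selector, out_png, css=None):
    """Render one MHTML snapshot, optionally inject CSS, and return the target bbox."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(mhtml_path).resolve().as_uri())
        if css:
            # Stand-in for the style / zoom / text-shrink injection step.
            page.add_style_tag(content=css)
        box = page.locator(selector).first.bounding_box()
        page.screenshot(path=str(out_png))
        browser.close()
    return bbox_to_xyxy(box)
```

Injecting before locating matters: the bounding box has to be measured on the perturbed page so that it matches the saved screenshot.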

Perturbations

| Variant | Config | Implementation |
|---|---|---|
| Original | All off | No injection. |
| Style | `enable_style_variants=True` | randomization, generator, templates. |
| Precision | `enable_zoom=True`, `zoom_level` ∈ {0.7, 0.5, 0.3} | zoom. |
| Text Shrink | `enable_dense_info=True` | dense_info. |

Instructions are generated per step from the parquet field target_action_reprs via generate_step_instruction. Configuration lives in config; injection logic lives in injection.
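
The mapping from configuration to variant in the table above can be written down as a small data structure. The sketch below is illustrative only: PerturbationConfig is a hypothetical class rather than the project's config module, and rejecting combinations of perturbations is an assumption that follows the "one run produces one variant" convention, not confirmed pipeline behavior.

```python
from dataclasses import dataclass

ALLOWED_ZOOM_LEVELS = (0.7, 0.5, 0.3)


@dataclass(frozen=True)
class PerturbationConfig:
    """Hypothetical config object; defaults follow the Arguments table."""
    enable_style_variants: bool = True
    enable_zoom: bool = False
    zoom_level: float = 0.7
    enable_dense_info: bool = False

    def __post_init__(self):
        # zoom_level only matters when zoom is enabled.
        if self.enable_zoom and self.zoom_level not in ALLOWED_ZOOM_LEVELS:
            raise ValueError(f"zoom_level must be one of {ALLOWED_ZOOM_LEVELS}")

    def variant(self) -> str:
        """Return the variant name from the Perturbations table."""
        enabled = [
            name
            for name, on in (
                ("Style", self.enable_style_variants),
                ("Precision", self.enable_zoom),
                ("Text Shrink", self.enable_dense_info),
            )
            if on
        ]
        if not enabled:
            return "Original"
        if len(enabled) > 1:
            # Assumption of this sketch: a run enables a single perturbation.
            raise ValueError("enable a single perturbation per run")
        return enabled[0]
```

With the defaults, PerturbationConfig().variant() gives "Style", matching the default run described in Quick Start.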

Data & resources

| Resource | Description |
|---|---|
| GUI-Perturbed | Released evaluation data (screenshots, instructions, ground-truth bboxes). |
| Baseline result viewer | Streamlit Space: baseline 7B GUI grounding predictions on original vs perturbed screenshots. |

Dataset summary

| Aspect | Description |
|---|---|
| Source | Mind2Web MHTML archives (real web pages, DOM preserved). |
| Visual variants | Original, Style, Precision (zoom 0.7), Text Shrink. ~390 screens per variant. |
| Schema | `visual_variant`, `instruction_type`, `task_id`, `step_index`, `instruction`, `gt_bbox`, `screenshot`. See the dataset card. |
| Instructions | Direct (constructed from `target_action_reprs`) and relational (present in the released dataset). |

Use this repo to reproduce or extend the data; use the Hugging Face dataset for evaluation.
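
Because every row carries task_id, step_index, instruction_type, and visual_variant, the same step can be lined up across variants to measure how a model degrades. A minimal sketch over plain dict rows follows; the function names are hypothetical, and the variant strings used in the usage note are assumptions, so check the dataset card for the actual values.

```python
from collections import defaultdict


def group_by_step(rows):
    """Map (task_id, step_index, instruction_type) -> {visual_variant: row}."""
    grouped = defaultdict(dict)
    for row in rows:
        key = (row["task_id"], row["step_index"], row["instruction_type"])
        grouped[key][row["visual_variant"]] = row
    return dict(grouped)


def complete_groups(grouped, variants):
    """Keep only steps that have a row for every requested variant."""
    return {
        key: by_variant
        for key, by_variant in grouped.items()
        if all(name in by_variant for name in variants)
    }
```

For example, complete_groups(group_by_step(rows), ["original", "style"]) keeps the steps that can be compared directly between the two variants.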

Evaluation

Download the GUI-Perturbed dataset to evaluate your models. An evaluation script will be released soon.
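
Until the official script is available, a simple stand-in is click-point accuracy: a prediction counts as correct when the predicted point falls inside the ground-truth box. The sketch below is not the project's evaluation script. It assumes gt_bbox is laid out as [x1, y1, x2, y2] in the same pixel coordinates as the prediction, which should be verified against the dataset card.

```python
def point_in_bbox(point, bbox):
    """True if (x, y) lies inside or on the edge of [x1, y1, x2, y2]."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(predictions, gt_bboxes):
    """Fraction of predicted points that land inside their ground-truth box."""
    if len(predictions) != len(gt_bboxes):
        raise ValueError("predictions and gt_bboxes must have the same length")
    if not gt_bboxes:
        return 0.0
    hits = sum(point_in_bbox(p, b) for p, b in zip(predictions, gt_bboxes))
    return hits / len(gt_bboxes)
```

Computing this separately per visual_variant gives a direct view of how accuracy changes between the original and perturbed screenshots.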

Limitations

- Perturbation realism: We prioritize diagnostic coverage over photorealism; some variants may look synthetic but still reveal reliance on color, position, or layout.
- Instruction diversity: The pipeline produces direct referring expressions; relational phrasings appear in the released dataset; broader natural-language diversity is future work.
- Web only: Desktop, mobile, and cross-application flows are out of scope.

❓ FAQ

Where do I get the Mind2Web data?

See the Mind2Web project for data access. Place it under mm_mind2web/ with the structure described in Installation.

Contributing

We welcome contributions: new perturbation types, bug reports, and improvements. Open an issue or pull request, or reach out on our Discord server.

📄 Citation

If you find GUI-Perturbed or this pipeline useful, please cite the dataset and technical report series.

@dataset{gui_perturbed_2026,
  title   = {GUI-Perturbed: A Domain-Randomized Dataset for GUI Grounding},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://huggingface.co/datasets/figai/GUI-Perturbed},
  note    = {Built on Mind2Web (Deng et al., 2023)}
}

@software{gui_dr_code_2026,
  title   = {GUI-DR: GUI Domain-Randomization for generating diagnostic GUI grounding evaluation data},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://github.com/ManifoldRG/GUI-DR},
  note    = {Data augmentation pipeline for GUI-Perturbed}
}

@online{gui_perturbed_technical_report_2026,
  title   = {GUI-Perturbed: A Domain Randomization Dataset for GUI Grounding},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://blog.fig.inc/gui-perturbed-a-domain-randomization-dataset-for-gui-grounding},
  note    = {Part 1: Dataset \& methodology}
}
