
ManifoldRG/GUI-DR



🩺 GUI-DR: GUI Domain-Randomization for generating diagnostic GUI grounding evaluation data


Dataset & Methodology Model Robustness Evaluation

GUI-DR is part of a collaborative effort on Software Control Agents between Manifold Research and Fig.


Overview

GUI-DR is a data augmentation pipeline built on domain randomization principles.

GUI grounding models often rely on visual primitives (shape, position, color) rather than functional semantics, and fixed-scene benchmarks do not reveal how they degrade under distribution shift. Using Mind2Web MHTML archives, GUI-DR varies visual scenes and instructions along controlled axes to generate data to evaluate or finetune models for use cases such as GUI grounding.

[Figure: gui-dr-diagram]


📢 Updates


💾 Installation

Requirements: Python ≥ 3.11. Download Mind2Web data under mm_mind2web/.

git clone https://github.com/ManifoldRG/GUI-DR.git
cd GUI-DR

Install with uv or pip below. Versions are pinned in uv.lock (see pyproject.toml). Playwright browsers are required to run the pipeline.

uv

Install uv, then:

uv sync
uv run playwright install

Use uv run python … from the repo root (or source .venv/bin/activate and run python as usual). Example: uv run python src/main.py --split test_task.

pip + venv

python3.11 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .
playwright install
python src/main.py --split test_task

After installing (all options)

  1. mm_mind2web/ (gitignored) at the repo root:

    • mm_mind2web/data/<split>-*.parquet
    • mm_mind2web/task/<task_uid>/processed/dom_content.json
    • mm_mind2web/task/<task_uid>/processed/snapshots/*.mhtml

    Parquets: Multimodal-Mind2Web on Hugging Face → mm_mind2web/data/ (e.g. train-*.parquet, test_task-*.parquet).

    Raw dump: Task trees with processed/dom_content.json and processed/snapshots/*.mhtml per Mind2Web raw dump. Symlink or copy so each task is mm_mind2web/task/<task_uid>/…. Optional: scripts/globus_mind2web_downloader.sh for Globus transfer (needs endpoint + .env).

    Parquet rows and task_uid paths must refer to the same tasks.

  2. (Optional) For debug logging or scripts that use Globus/API keys, copy .env.example to .env and set any variables you need.
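The layout above can be sanity-checked before a run. The following is a minimal sketch using only the standard library; the function names `check_task_dir` and `check_layout` are hypothetical and not part of this repository. It verifies only that the expected files exist. It does not confirm that parquet rows and task_uid paths refer to the same tasks, which would require reading the parquet files.

```python
from pathlib import Path


def check_task_dir(task_dir: Path) -> list[str]:
    """Return problems found in one mm_mind2web/task/<task_uid>/ tree."""
    problems = []
    processed = task_dir / "processed"
    if not (processed / "dom_content.json").is_file():
        problems.append(f"{task_dir.name}: missing processed/dom_content.json")
    snapshots = processed / "snapshots"
    if not snapshots.is_dir() or not any(snapshots.glob("*.mhtml")):
        problems.append(f"{task_dir.name}: no processed/snapshots/*.mhtml")
    return problems


def check_layout(root: Path, split: str) -> list[str]:
    """Check the directory layout described above for one split."""
    problems = []
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.glob(f"{split}-*.parquet")):
        problems.append(f"no data/{split}-*.parquet files")
    task_root = root / "task"
    task_dirs = sorted(p for p in task_root.iterdir() if p.is_dir()) if task_root.is_dir() else []
    if not task_dirs:
        problems.append("no task/<task_uid>/ directories")
    for task_dir in task_dirs:
        problems.extend(check_task_dir(task_dir))
    return problems
```

Calling `check_layout(Path("mm_mind2web"), "test_task")` from the repo root returns an empty list when the expected files are in place, and one message per missing item otherwise.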


🚀 Quick Start

The commands below run the test_task split with the Style variant, which is enabled by default.

uv (from repo root):

uv run python src/main.py --split test_task

pip (venv activated):

python src/main.py --split test_task

Outputs are written to outputs/run_<timestamp>_test_task/<task_uid>/, which contains screenshots/ and trajectory.json. For other variants, see Generating data.


🧪 Generating data

One run produces one variant. Choose flags to match the variant you want. Run from the repo root with the venv active (pip) or prefix with uv run (uv).

By variant

# Original (no perturbations)
python src/main.py --split test_task --enable_zoom false --enable_dense_info false --enable_style_variants false

# Precision (viewport zoom 0.7×)
python src/main.py --split test_task --enable_zoom true --zoom_level 0.7 --enable_dense_info false --enable_style_variants false

# Style (colors, fonts, restyling), the default
python src/main.py --split test_task --enable_style_variants true --enable_zoom false --enable_dense_info false

# Text Shrink (reduced font size)
python src/main.py --split test_task --enable_dense_info true --enable_style_variants false --enable_zoom false
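Because one run produces one variant, building all four means invoking the script four times. A minimal sketch of a driver follows; the flag values are copied from the commands above, while `VARIANT_FLAGS`, `build_command`, and `run_all` are hypothetical names introduced here and are not part of the repository.

```python
import shlex
import subprocess
import sys

# Flag values mirror the per-variant commands above.
# The dictionary keys are labels chosen for this sketch only.
VARIANT_FLAGS = {
    "original": {"enable_zoom": "false", "enable_dense_info": "false",
                 "enable_style_variants": "false"},
    "precision": {"enable_zoom": "true", "zoom_level": "0.7",
                  "enable_dense_info": "false", "enable_style_variants": "false"},
    "style": {"enable_style_variants": "true", "enable_zoom": "false",
              "enable_dense_info": "false"},
    "text_shrink": {"enable_dense_info": "true", "enable_style_variants": "false",
                    "enable_zoom": "false"},
}


def build_command(variant: str, split: str = "test_task") -> list[str]:
    """Build the argv list for one variant, using the current interpreter."""
    argv = [sys.executable, "src/main.py", "--split", split]
    for flag, value in VARIANT_FLAGS[variant].items():
        argv += [f"--{flag}", value]
    return argv


def run_all(split: str = "test_task") -> None:
    """Run the variants one after another; stops at the first failing run."""
    for variant in VARIANT_FLAGS:
        subprocess.run(build_command(variant, split), check=True)


# Print the commands without executing them.
for name in VARIANT_FLAGS:
    print(shlex.join(build_command(name)))
```

Call `run_all()` from the repo root with the venv active to execute the runs. Under uv, save the sketch as a script and start it with `uv run python` so that `sys.executable` points at the project environment.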

Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --split, -s | train | Split: train, test_domain, test_task, test_website. |
| --enable_zoom | False | Enable viewport zoom (Precision). |
| --zoom_level | 0.7 | Zoom level: 0.7, 0.5, or 0.3. |
| --enable_dense_info | False | Enable text shrink. |
| --enable_style_variants | True | Enable style randomization. |
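The boolean flags take an explicit value (`--enable_zoom false`), so a parser must convert the strings "true" and "false" itself. The sketch below shows one way to express the arguments and defaults listed above with `argparse`. It is an illustration under that assumption, not the repository's actual parser, and `str2bool` and `build_parser` are hypothetical names.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'true'/'false' style strings to bool; reject anything else."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented arguments and their defaults."""
    parser = argparse.ArgumentParser(description="GUI-DR argument sketch")
    parser.add_argument("--split", "-s", default="train",
                        choices=["train", "test_domain", "test_task", "test_website"])
    parser.add_argument("--enable_zoom", type=str2bool, default=False)
    parser.add_argument("--zoom_level", type=float, default=0.7,
                        choices=[0.7, 0.5, 0.3])
    parser.add_argument("--enable_dense_info", type=str2bool, default=False)
    parser.add_argument("--enable_style_variants", type=str2bool, default=True)
    return parser
```

A custom converter is used because `type=bool` would treat any non-empty string, including "false", as `True`.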

Output

outputs/run_<timestamp>_<split>/<task_uid>/ contains screenshots/ and trajectory.json. Use one run per variant when building evaluation data or downstream tooling.
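Downstream tooling usually starts by walking a run directory. The sketch below relies only on the layout stated above (one folder per task_uid holding screenshots/ and trajectory.json). It makes no assumption about the contents of trajectory.json beyond it being valid JSON, and `summarize_run` and `load_trajectory` are hypothetical helper names.

```python
import json
from pathlib import Path


def summarize_run(run_dir: Path) -> dict[str, dict]:
    """Map each task_uid folder to whether it has a trajectory and how many screenshots."""
    summary = {}
    for task_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        screenshots_dir = task_dir / "screenshots"
        num_screenshots = (
            sum(1 for p in screenshots_dir.iterdir() if p.is_file())
            if screenshots_dir.is_dir()
            else 0
        )
        summary[task_dir.name] = {
            "has_trajectory": (task_dir / "trajectory.json").is_file(),
            "num_screenshots": num_screenshots,
        }
    return summary


def load_trajectory(task_dir: Path):
    """Load trajectory.json for one task; the schema is left to the caller."""
    with open(task_dir / "trajectory.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Running `summarize_run` on each variant's run directory is a quick way to confirm that all runs cover the same tasks before comparing them.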

Pipeline overview

Input: Parquet files for the split, plus per-task dom_content.json and MHTML snapshots in mm_mind2web/.

Flow: Load parquet → for each task, load MHTML snapshots in order → per step: optionally inject UI modifications (style / zoom / text shrink) → resolve target element from parquet → capture screenshot and bbox → write trajectory.json and screenshots.

flowchart TB
  subgraph input [Input]
    Parquet[parquet files]
    MHTML[MHTML snapshots]
  end
  subgraph pipeline [Pipeline]
    ActionProc[action_processor]
    MHTMLProc[MHTMLProcessor]
    Inject[inject UI mods]
    Locate[locate element]
    Screenshot[screenshot + bbox]
  end
  subgraph output [Output]
    Trajectory[trajectory.json]
    Screens[screenshots/]
  end
  Parquet --> ActionProc
  MHTML --> ActionProc
  ActionProc --> MHTMLProc
  MHTMLProc --> Inject
  Inject --> Locate
  Locate --> Screenshot
  Screenshot --> Trajectory
  Screenshot --> Screens

Perturbations

| Variant | Config | Implementation |
| --- | --- | --- |
| Original | All off | No injection. |
| Style | enable_style_variants=True | randomization, generator, templates. |
| Precision | enable_zoom=True, zoom_level ∈ {0.7, 0.5, 0.3} | zoom. |
| Text Shrink | enable_dense_info=True | dense_info. |

Instructions are generated per step from the parquet target_action_reprs column via generate_step_instruction. Configuration lives in config and injection logic in injection.
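To make the idea of injected perturbations concrete, here is a minimal sketch of how a style or zoom modification could be expressed as a CSS string. This is an assumption for illustration only: the repository's randomization, generator, templates, and zoom modules may work quite differently, and `FONT_FAMILIES`, `random_style_css`, and `zoom_css` are hypothetical names. Seeding the generator keeps a given perturbation reproducible across runs.

```python
import random

# Hypothetical font pool; the real pipeline's choices are not documented here.
FONT_FAMILIES = ["Georgia, serif", "Verdana, sans-serif", "Courier New, monospace"]

ALLOWED_ZOOM_LEVELS = (0.7, 0.5, 0.3)


def random_style_css(seed: int) -> str:
    """Return a reproducible CSS snippet that changes fonts and link/button colors."""
    rng = random.Random(seed)
    hue = rng.randrange(360)
    font = rng.choice(FONT_FAMILIES)
    return (
        f"body {{ font-family: {font} !important; }}\n"
        f"button, a {{ color: hsl({hue}, 70%, 35%) !important; }}"
    )


def zoom_css(level: float) -> str:
    """Return a CSS rule that scales the page to one of the documented zoom levels."""
    if level not in ALLOWED_ZOOM_LEVELS:
        raise ValueError(f"zoom level must be one of {ALLOWED_ZOOM_LEVELS}")
    return f"html {{ zoom: {level}; }}"
```

With Playwright's Python API, a string like this could be applied to a loaded page through `page.add_style_tag(content=css)` before the screenshot is taken.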


Data & resources

| Resource | Description |
| --- | --- |
| GUI-Perturbed | Released evaluation data (screenshots, instructions, ground-truth bboxes). |
| Baseline result viewer | Streamlit Space: baseline 7B GUI grounding predictions on original vs perturbed screenshots. |

Dataset summary

| Aspect | Description |
| --- | --- |
| Source | Mind2Web MHTML archives (real web pages, DOM preserved). |
| Visual variants | Original, Style, Precision (zoom 0.7), Text Shrink. ~390 screens per variant. |
| Schema | visual_variant, instruction_type, task_id, step_index, instruction, gt_bbox, screenshot. See the dataset card. |
| Instructions | Direct (constructed from target_action_reprs); relational (in released schema). |

Use this repo to reproduce or extend the data; use the Hugging Face dataset for evaluation.


Evaluation

Download the GUI-Perturbed dataset to evaluate your models. An evaluation script will be released soon.


Limitations

  • Perturbation realism - We prioritize diagnostic coverage over photorealism; some variants may look synthetic but still reveal reliance on color, position, or layout.
  • Instruction diversity - The pipeline produces direct referring expressions. Relational phrasings appear in the released dataset, and broader natural-language diversity is future work.
  • Web only - Desktop, mobile, and cross-application flows are out of scope.

❓ FAQ

Where do I get the Mind2Web data?

See the Mind2Web project for data access. Place it under mm_mind2web/ with the structure described in Installation.


Contributing

We welcome contributions: new perturbation types, bug reports, and improvements. Open an issue or pull request, or reach out on our Discord server.


📄 Citation

If you find GUI-Perturbed or this pipeline useful, please cite the dataset and technical report series.

@dataset{gui_perturbed_2026,
  title   = {GUI-Perturbed: A Domain-Randomized Dataset for GUI Grounding},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://huggingface.co/datasets/figai/GUI-Perturbed},
  note    = {Built on Mind2Web (Deng et al., 2023)}
}

@software{gui_dr_code_2026,
  title   = {GUI-DR: GUI Domain-Randomization for generating diagnostic GUI grounding evaluation data},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://github.com/ManifoldRG/GUI-DR},
  note    = {Data augmentation pipeline for GUI-Perturbed}
}

@online{gui_perturbed_technical_report_2026,
  title   = {GUI-Perturbed: A Domain Randomization Dataset for GUI Grounding},
  author  = {Wang, Yangyue and Sikka, Harsh and Mathur, Yash and Zhou, Tony and Nyachhyon, Jinu and Guruprasad, Pranav},
  year    = {2026},
  url     = {https://blog.fig.inc/gui-perturbed-a-domain-randomization-dataset-for-gui-grounding},
  note    = {Part 1: Dataset \& methodology}
}
