fix: update file size calculation in dataset info #689

Open
Felipedino wants to merge 2 commits into develop from fix/size-dataset

@Felipedino

Collaborator

This pull request updates how dataset file size is calculated and displayed in the dataset visualization component. Instead of reporting the DataFrame's in-memory size, the backend now reports a file size derived from the Arrow table's byte count, so the value shown to users better reflects the stored data.

Backend: File size calculation update

  • Changed the backend to report file_size_mb using the Arrow table's byte size instead of the DataFrame's memory usage, providing a more accurate file size metric. (DashAI/back/dataloaders/classes/dashai_dataset.py)
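A minimal sketch of the kind of calculation described above, not the actual code in dashai_dataset.py. It assumes the table object exposes an `nbytes` attribute (as a `pyarrow.Table` does) and that megabytes are binary (1024 * 1024 bytes) rounded to two decimals; the names `HasNBytes`, `file_size_mb`, `build_general_info`, and `FakeTable` are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


class HasNBytes(Protocol):
    """Anything exposing an `nbytes` attribute, e.g. a pyarrow.Table."""

    nbytes: int


# Assumption: binary megabytes. The PR does not state which divisor it uses.
BYTES_PER_MB = 1024 * 1024


def file_size_mb(table: HasNBytes, ndigits: int = 2) -> float:
    """Convert the table's byte count to megabytes."""
    return round(table.nbytes / BYTES_PER_MB, ndigits)


def build_general_info(table: HasNBytes) -> dict:
    """Hypothetical shape of the metadata block that carries the size."""
    return {"file_size_mb": file_size_mb(table)}


@dataclass
class FakeTable:
    """Stand-in for an Arrow table so the sketch runs without pyarrow."""

    nbytes: int
```

With a real Arrow table, `build_general_info(arrow_table)` would be called the same way, since only the `nbytes` attribute is read.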

Frontend: File size display update

  • Updated the frontend to use the new file_size_mb field for displaying file size in the dataset visualization header, instead of the previous memory_usage_mb field. (DashAI/front/src/components/DatasetVisualization.jsx)

Copilot AI review requested due to automatic review settings June 7, 2026 17:19

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the dataset metadata contract so the UI displays a dataset “size” value sourced from backend metadata computed from the underlying Arrow table rather than a Pandas DataFrame memory-usage estimate.

Changes:

  • Backend: replaces general_info.memory_usage_mb with general_info.file_size_mb computed from arrow_table.nbytes.
  • Frontend: switches the dataset visualization header to read general_info.file_size_mb for display.
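Because the key is renamed rather than added, any consumer still reading `memory_usage_mb` would find it missing. The sketch below, written in Python as a stand-in for the consumer logic, shows one way a reader of `general_info` could prefer the new key and fall back to the old one. The fallback is an assumption for illustration; the PR itself simply switches the field, and the function names here are hypothetical.

```python
from typing import Optional


def displayed_size_mb(general_info: dict) -> Optional[float]:
    """Return the size to display, preferring the new key over the old one."""
    for key in ("file_size_mb", "memory_usage_mb"):
        value = general_info.get(key)
        if value is not None:
            return float(value)
    return None


def format_size(general_info: dict) -> str:
    """Render the size for a header, or a placeholder when it is absent."""
    size = displayed_size_mb(general_info)
    return "unknown" if size is None else f"{size:.2f} MB"
```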

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| DashAI/front/src/components/DatasetVisualization.jsx | Switches the header’s displayed size field from memory_usage_mb to file_size_mb. |
| DashAI/back/dataloaders/classes/dashai_dataset.py | Changes the computed dataset size metadata to use Arrow table byte size and emits it as file_size_mb. |

Comment thread DashAI/front/src/components/DatasetVisualization.jsx
Comment thread DashAI/back/dataloaders/classes/dashai_dataset.py