Dump raw training data for the LLM-jp-3 series #46

@hkiyomaru

Description

Dump raw training data for the LLM-jp-3 series. Each training instance should include at least the following fields:

  • token_ids: A list of token IDs for the training instance
  • training_step: Training step at which the training instance was processed
  • dataset: Name of the dataset from which the instance was sourced
  • document_ids: IDs of the documents associated with the training instance
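
As a rough illustration of the fields listed above, one dumped instance could be represented as a single JSON Lines record. This is only a sketch under assumptions: the issue does not specify an output format, so JSONL, the `TrainingInstance` class, the helper names, and the choice of strings for `document_ids` are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class TrainingInstance:
    """One dumped training instance (hypothetical schema based on the field list)."""
    token_ids: List[int]      # token IDs for the training instance
    training_step: int        # step at which the instance was processed
    dataset: str              # name of the source dataset
    document_ids: List[str]   # assumed to be strings; the issue does not say


def to_jsonl_line(instance: TrainingInstance) -> str:
    """Serialize one instance as a single JSON object without a trailing newline."""
    return json.dumps(asdict(instance), ensure_ascii=False)


def from_jsonl_line(line: str) -> TrainingInstance:
    """Parse one JSON Lines record back into a TrainingInstance."""
    return TrainingInstance(**json.loads(line))


# Example values are made up for illustration only.
example = TrainingInstance(
    token_ids=[1, 2, 3],
    training_step=0,
    dataset="example_dataset",
    document_ids=["doc-0"],
)
record = to_jsonl_line(example)
```

Keeping one record per line would let the dump be written and read as a stream, which may matter at the scale of a full training run; other formats could serve equally well.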
