RewardBench: the first evaluation tool for reward models.
Updated Feb 16, 2026 - Python
Free and open source code of the https://tournesol.app platform. Meet the community on Discord https://discord.gg/WvcSG55Bf3
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Multi-Agent DPO Data Synthesis Factory: a multi-agent framework for automatically synthesizing preference-training data | red-team attack → multi-persona review → final adjudication → DPO preference pairs
A Survey of Direct Preference Optimization (DPO)
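Several entries above center on Direct Preference Optimization. As a minimal illustration (not code from any repository listed here; the function name and toy log-probabilities are made up), the DPO objective reduces to a logistic loss over per-response log-probability margins against a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference does, scaled by the temperature beta.
    chosen_margin = beta * (logp_chosen - ref_logp_chosen)
    rejected_margin = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the margin gap (standard logistic loss).
    logits = chosen_margin - rejected_margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Policy already prefers the chosen response, so the loss is modest.
print(round(dpo_loss(-5.0, -9.0, -6.0, -8.0), 4))  # → 0.5981
```

Swapping the chosen and rejected arguments raises the loss, which is exactly the gradient signal that pushes probability mass toward preferred responses.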
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refer to our project website at https://sites.google.com/view/san-navistar.
Python-based GUI for collecting chemists' feedback on molecules
Accelerate LLM preference tuning via prefix sharing with a single line of code
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)
PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms
[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
Official PyTorch implementation of "LPOI: Listwise Preference Optimization for Vision Language Models" (ACL 2025 Main)
Code for "Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model" as published at CVPR 2021.
Official implementation of "Learning User Preferences for Image Generation Models"
This repository contains the source code for our paper: "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation", accepted to IROS-2022. For more details, please refer to our project website at https://sites.google.com/view/san-fapl.