Incorporated CLIP embeddings

Hi, thanks for sharing this great work!

In this scenario, the CLIP image embedding and the text embeddings are each passed through a shared linear layer and then added together, weighted by the hyperparameter $\alpha$.
<img width="803" alt="image" src="https://github.com/xyynafc/FaceStudio/assets/26890721/b340b102-e112-455c-9bc6-b930d98e4219">
<img width="405" alt="image" src="https://github.com/xyynafc/FaceStudio/assets/26890721/c3073afe-f7a4-401e-9134-190d66720582">
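To make my reading of this fusion concrete, here is a minimal numpy sketch. It assumes the image embedding is broadcast across all 77 text tokens and that $\alpha$ forms a convex combination `alpha * image + (1 - alpha) * text`; neither detail is confirmed by the paper. The function name `fuse_embeddings` and the output width of 768 are hypothetical.

```python
import numpy as np

def fuse_embeddings(text_emb, img_emb, weight, bias, alpha):
    """Hypothetical fusion: shared linear layer, then an alpha-weighted sum.

    text_emb: [batch, 77, 1024] CLIP text token embeddings
    img_emb:  [batch, 1024] pooled CLIP image embedding
    weight:   [1024, d_out] shared linear weight (hypothetical)
    bias:     [d_out] shared linear bias (hypothetical)
    """
    text_proj = text_emb @ weight + bias   # [batch, 77, d_out]
    img_proj = img_emb @ weight + bias     # [batch, d_out]
    # Numpy equivalent of torch's unsqueeze(1).expand(-1, 77, -1):
    # add a token axis and repeat the single image vector for every token.
    img_proj = np.broadcast_to(img_proj[:, None, :], text_proj.shape)
    # Assumed convex combination controlled by alpha.
    return alpha * img_proj + (1.0 - alpha) * text_proj

rng = np.random.default_rng(0)
text = rng.standard_normal((2, 77, 1024))
img = rng.standard_normal((2, 1024))
W = rng.standard_normal((1024, 768)) * 0.02
b = np.zeros(768)

out = fuse_embeddings(text, img, W, b, alpha=0.5)
print(out.shape)  # (2, 77, 768)
```

Under this assumption, `alpha=0.0` recovers the pure text conditioning and `alpha=1.0` gives every token the same projected image vector.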

**Questions**
- CLIP text embeddings would have shape [batch, 77, 1024] and CLIP image embeddings [batch, 1024]. How are they added after passing through the shared linear layer? (I guess the CLIP image embedding is unsqueezed and expanded to the token length of 77.)
- How is $\alpha$ set during training? (At inference, $\alpha$ presumably controls the influence of each type of guidance.)
- If classifier-free guidance is applied, what guidance scale do you use at inference? And what `ucg_rate` (the probability of replacing the condition with a zero embedding) do you use during training?
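For context on the last question, this is the usual classifier-free guidance recipe as I understand it, sketched in numpy: during training each sample's condition is zeroed with probability `ucg_rate`, and at inference the conditional and unconditional noise predictions are combined with a guidance scale. The function names and the values `ucg_rate=0.1` and `scale=7.5` are illustrative assumptions, not FaceStudio's settings.

```python
import numpy as np

def drop_condition(cond, ucg_rate, rng):
    """Training time: zero each sample's condition with probability ucg_rate."""
    keep = rng.random(cond.shape[0]) >= ucg_rate  # one draw per batch element
    mask = keep.reshape((-1,) + (1,) * (cond.ndim - 1))
    return cond * mask

def cfg_combine(eps_uncond, eps_cond, scale):
    """Inference time: extrapolate from the unconditional toward the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
cond = np.ones((8, 77, 768))
dropped = drop_condition(cond, ucg_rate=0.1, rng=rng)

eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = cfg_combine(eps_u, eps_c, scale=7.5)  # every entry equals 7.5
```

With `scale=1.0` this reduces to the plain conditional prediction; larger scales push the sample further toward the condition.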

Thanks!