
Incorporated CLIP embeddings #3

@yjhong89

Hi, thanks for sharing this great work!

In this scenario, the CLIP image embedding and the CLIP text embeddings are each passed through a shared linear layer and then added, combined using the hyper-parameter $\alpha$.

Questions

  • CLIP text embeddings have shape [batch, 77, 1024] and CLIP image embeddings have shape [batch, 1024]. How are they added after the shared linear layer? (My guess: unsqueeze the CLIP image embedding and expand it to the token length, 77.)
  • How is $\alpha$ set during training? (At inference, $\alpha$ can presumably control the influence of each type of guidance.)
  • If classifier-free guidance is applied, what guidance scale do you use at inference? And what ucg_rate, i.e. the probability of replacing the embedding with a zero embedding, do you use?
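
To make these questions concrete, here is a minimal NumPy sketch of what I am guessing at. Everything in it is an assumption rather than the repository's actual code: the function names are hypothetical, the shared layer is a plain weight matrix without bias, the blend is taken to be convex (`alpha` on the image branch, `1 - alpha` on the text branch), and the classifier-free guidance step uses the standard formula.

```python
import numpy as np

def combine_embeddings(text_emb, img_emb, weight, alpha):
    """Blend CLIP text and image embeddings after a shared linear layer.

    text_emb: [batch, tokens, dim]; img_emb: [batch, dim];
    weight: [dim, out_dim], a hypothetical shared projection (bias omitted).
    """
    text_proj = text_emb @ weight      # [batch, tokens, out_dim]
    img_proj = img_emb @ weight        # [batch, out_dim]
    img_proj = img_proj[:, None, :]    # [batch, 1, out_dim], like unsqueeze(1)
    # Broadcasting repeats the image vector across all token positions.
    # The convex weighting is an assumption, not confirmed by the source.
    return alpha * img_proj + (1.0 - alpha) * text_proj

def drop_condition(cond, ucg_rate, rng):
    """Zero out the whole conditioning of a sample with probability ucg_rate."""
    keep = rng.random(cond.shape[0]) >= ucg_rate   # [batch] boolean mask
    return cond * keep[:, None, None]

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance combination of two predictions."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
text_emb = rng.standard_normal((2, 77, 1024))
img_emb = rng.standard_normal((2, 1024))
weight = rng.standard_normal((1024, 1024)) * 0.02

cond = combine_embeddings(text_emb, img_emb, weight, alpha=0.5)
print(cond.shape)  # (2, 77, 1024)
```

Under these assumptions, `alpha=0` reduces to text-only conditioning and `alpha=1` to the image embedding repeated over all 77 positions. Is this close to what you do?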

Thanks!
