fix(embeddinglinker) - Embedding Mention Mask Fix and Embed Cui via longest name. #421

Closed
adam-sutton-1992 wants to merge 3 commits into main from embedding_cui_longest_name

Conversation

@adam-sutton-1992
Contributor

Two fixes here, which together lead to a performance improvement.

  1. Provide mention masks for the embed_cui and name methods. I previously (and erroneously) thought that mean pooling over every token would be the correct pooling for name and CUI embeddings, as the name is the only text there. However, the encoded sequence also contains the special tokens, which throws off scoring. For example, if you detected "glucose", embedded it with no context, and scored it against an embedded "glucose", the result could be quite far off: you would expect a similarity of 1, but this returned 0.875.
  2. Embed CUIs via their longest name rather than their preferred name, which solves issues with disambiguation. As an example, the name "glucose" is ambiguous: it could be a measurement (SNOMED: 33747003) or a substance (67079006). When embedding via preferred names, the preferred name for the measurement is "Glucose measurement, blood", while the preferred name for the substance is simply "glucose". The model will therefore always choose the substance over the measurement, because the detected name matches that embedding exactly under cosine similarity. Using the longest names makes this an easier task, improving performance.
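
To make the two points above concrete, here is a minimal, self-contained sketch in plain Python. It is not the code from this PR: the helper names (`masked_mean_pool`, `cosine`, `longest_name`), the two-dimensional toy vectors, and the synonym lists are all hypothetical and chosen only for illustration. A real encoder produces contextual vectors, so the exact numbers will differ from the 0.875 quoted above.

```python
import math


def masked_mean_pool(token_vectors, mask):
    """Average only the token vectors whose mask entry is 1."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def longest_name(names):
    """Pick the longest name by character count (first one wins on ties)."""
    return max(names, key=len)


# Toy encoding of the bare name "glucose": special token, name token, special token.
# The vectors are made up; they only show the effect of pooling over special tokens.
special = [1.0, 0.0]
glucose = [0.0, 1.0]
name_tokens = [special, glucose, special]

# Pooling over everything (all-ones mask) mixes the special tokens in.
unmasked = masked_mean_pool(name_tokens, [1, 1, 1])
# Pooling with a mention mask keeps only the name token.
masked = masked_mean_pool(name_tokens, [0, 1, 0])

# A detected mention "glucose", already pooled over its own tokens only.
mention_vec = glucose

print("unmasked similarity:", cosine(unmasked, mention_vec))
print("masked similarity:  ", cosine(masked, mention_vec))

# Hypothetical synonym lists keyed by CUI, for illustration only.
cui_names = {
    "33747003": ["Glucose measurement, blood", "blood glucose"],
    "67079006": ["glucose", "Glucose (substance)"],
}
names_to_embed = {cui: longest_name(names) for cui, names in cui_names.items()}
print(names_to_embed)
```

In this toy setup the masked name embedding matches the mention exactly, while the unmasked one does not, and picking the longest name means the substance concept is no longer embedded as the bare string "glucose".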

With these changes, the metrics become:

Epoch: 0, Prec: 0.2862910530810839, Rec: 0.7398836198247609, F1: 0.4128382160850905.

This is an improvement over previous comparable runs, where recall was around 0.66.
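
As a quick sanity check, the reported F1 can be recomputed from the precision and recall above, assuming the standard harmonic-mean definition of F1 (the PR does not state which definition the evaluation uses):

```python
# Values copied from the epoch 0 metrics line.
precision = 0.2862910530810839
recall = 0.7398836198247609
reported_f1 = 0.4128382160850905

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print("recomputed F1:", f1)
print("difference from reported:", abs(f1 - reported_f1))
```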

@adam-sutton-1992
Contributor Author

Closed, as it included the transformer NER changes, which aren't ready yet.

@adam-sutton-1992 adam-sutton-1992 deleted the embedding_cui_longest_name branch April 26, 2026 17:42