fix(embedding_linker) Mention_Mask fix and CuiEmbed by longest name #422

Open
adam-sutton-1992 wants to merge 2 commits into main from embedding_cui_longest_name

adam-sutton-1992 commented Apr 26, 2026

This PR contains two fixes that together improve performance.

  1. Provide mention masks for the embed_cui and name methods. I previously (and erroneously) thought that plain mean pooling was the correct pooling for name and CUI embeddings, since the name is the only text there. However, the input also contains special tokens, which throws off scoring: if you detected, say, "glucose", embedded it with no context, and scored it against an embedded "glucose" name, you would expect a similarity of 1, but this returned 0.875. The existing mention mask method ran in two for loops, which took a long time for 3 million names and 800k CUIs, so it is now a tensor-based operation.
  2. Embed CUIs via their longest name rather than their preferred name, which solves issues with disambiguation. As an example, the name "glucose" is ambiguous: it could be a measurement (SNOMED: 33747003) or a substance (67079006). When embedding via preferred names, the preferred name for the measurement is "Glucose measurement, blood", while for the substance it is simply "glucose". For a "glucose" mention the model will therefore always choose the substance over the measurement, because of the cosine similarities. Embedding via the longest names makes this an easier task, improving performance.
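
The two changes above can be illustrated with a minimal sketch. This is not the code from this PR: it uses NumPy as a stand-in for the tensor operations, and the function names (`masked_mean_pool`, `cosine`, `longest_name`) and the toy 2-dimensional embeddings are hypothetical, chosen only to show why pooling over special tokens lowers the similarity and how a longest-name selection might look.

```python
import numpy as np

def masked_mean_pool(token_embeddings, mention_mask):
    """Mean-pool only the positions where mention_mask is 1.

    token_embeddings: (batch, seq_len, dim) float array
    mention_mask:     (batch, seq_len) array of 0/1
    Vectorised over the batch, so no Python loops are needed.
    """
    mask = mention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty masks
    return summed / counts

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def longest_name(names):
    """Pick the longest name; ties are broken alphabetically (an arbitrary choice)."""
    return max(sorted(names), key=len)

# Toy sequence: [special token, "glucose", special token] (made-up vectors).
tokens = np.array([[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]])
all_tokens = np.array([[1, 1, 1]])    # plain mean pooling, special tokens included
mention_only = np.array([[0, 1, 0]])  # mention mask: only the "glucose" token

mention_vec = masked_mean_pool(tokens, mention_only)[0]
name_vec_unmasked = masked_mean_pool(tokens, all_tokens)[0]
name_vec_masked = masked_mean_pool(tokens, mention_only)[0]

sim_unmasked = cosine(mention_vec, name_vec_unmasked)  # below 1: special tokens leak in
sim_masked = cosine(mention_vec, name_vec_masked)      # identical pooling on both sides

# Names of the measurement concept, as given in the description above.
chosen = longest_name(["glucose", "Glucose measurement, blood"])
```

In this toy setup the masked name embedding matches the mention embedding exactly, while the unmasked one is pulled toward the special-token vectors. The longest-name selection picks "Glucose measurement, blood" for the measurement concept, so it no longer competes with the substance on the bare string "glucose".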

With these changes, the metrics are:

Epoch: 0, Prec: 0.2862910530810839, Rec: 0.7398836198247609, F1: 0.4128382160850905.

This is an improvement on previous comparable runs, where recall was around 0.66.
