Thanks for your code. I have a question about the implementation of the position embedding. It seems that the position encoding is randomly initialized and then updated during training, just like the token embedding. What confuses me is: how does this approach learn position-specific information?
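For reference, this is roughly how I read the setup, written as a minimal pure-Python sketch. The names `token_table`, `position_table`, `embed`, `position_table_grad` and the sizes are my own hypothetical choices, not taken from your code. It assumes a learned absolute position table that is added to the token embedding, with both tables randomly initialized.

```python
import random

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE, MAX_LEN, D_MODEL = 100, 16, 8


def init_table(rows, dim, seed):
    """Randomly initialize a (rows x dim) lookup table, like an embedding layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


# Both tables start random and would be trained the same way.
token_table = init_table(VOCAB_SIZE, D_MODEL, seed=0)
position_table = init_table(MAX_LEN, D_MODEL, seed=1)


def embed(token_ids):
    """Input embedding = token row (indexed by token id) + position row (indexed by position)."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


def position_table_grad(upstream_grads):
    """Gradient w.r.t. position_table, given dL/d(embedding) for each sequence.

    upstream_grads: list of sequences, each a list of per-position gradient vectors.
    Since embedding[pos] = token_table[tok] + position_table[pos], row `pos` of
    the position table accumulates only the gradients that arrive at position `pos`.
    """
    grad = [[0.0] * D_MODEL for _ in range(MAX_LEN)]
    for seq in upstream_grads:
        for pos, g in enumerate(seq):
            grad[pos] = [a + b for a, b in zip(grad[pos], g)]
    return grad
```

In this reading, the same token gets a different input vector at each position, and row `i` of the position table is only ever updated by gradients from position `i`. Is that what lets it pick up position-specific information, in the same way a token row picks up information about its token?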