Pre-lude Blog
Actor and Audience Embeddings: Personalization Without Guesswork
Abstract
This post describes the two-encoder setup that produces dense actor and audience embeddings, shows how contrastive learning makes those embeddings predictive, and explains how PRISM blends latent profiles with retrieved text evidence.
PRISM uses separate encoders for the actor (the speaker) and the audience. Each encoder produces a dense latent representation that captures its participant's communicative tendencies and response patterns. Contrastive learning trains the encoders so that these representations are predictive of interaction outcomes: actor–audience pairs whose interactions had similar outcomes should have similar actor-to-audience embedding distances.
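To make the training signal concrete, here is a minimal sketch of one way such an objective could look. It is not PRISM's actual loss: the function names (`encode`, `pair_contrastive_loss`, `batch_loss`), the linear toy encoder, the Euclidean distance, and the `margin` parameter are all assumptions chosen for illustration. The idea is a margin-based pairwise loss: two interactions with the same outcome are pushed toward equal actor-to-audience distances, while two interactions with different outcomes are pushed at least `margin` apart in distance.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def encode(weights: Sequence[Vector], features: Vector) -> List[float]:
    """Toy linear encoder (assumption): one weight row per output dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between an actor and an audience embedding."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_contrastive_loss(d_i: float, d_j: float, same_outcome: bool,
                          margin: float = 1.0) -> float:
    """Margin-based loss on the gap between two embedding distances.

    Same outcome: penalize any gap between the two distances.
    Different outcome: penalize gaps smaller than `margin`.
    """
    gap = abs(d_i - d_j)
    if same_outcome:
        return gap ** 2
    return max(0.0, margin - gap) ** 2


def batch_loss(actor_embs: Sequence[Vector], audience_embs: Sequence[Vector],
               outcomes: Sequence[int], margin: float = 1.0) -> float:
    """Average the pairwise loss over every pair of interactions in a batch."""
    dists = [euclidean(a, u) for a, u in zip(actor_embs, audience_embs)]
    total, count = 0.0, 0
    for i in range(len(dists)):
        for j in range(i + 1, len(dists)):
            total += pair_contrastive_loss(
                dists[i], dists[j], outcomes[i] == outcomes[j], margin
            )
            count += 1
    return total / count if count else 0.0
```

In a real system the two encoders would be trained networks and the loss would be minimized by gradient descent; the sketch only shows what quantity would be minimized under these assumptions.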
The latent profiles are then combined with evidence retrieved from the dialogue context: specific text spans that instantiate the general profile in the current moment. This blending lets the system draw on the stable, generalizable information in the profile while remaining sensitive to the specific dynamics of the current exchange.
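The blending step can be sketched in the same spirit. The post does not specify how PRISM retrieves or combines evidence, so everything below is an assumption: spans are assumed to be already embedded in the same space as the profile, retrieval is assumed to be top-k by cosine similarity to the profile, and the blend is assumed to be a simple interpolation controlled by a hypothetical `alpha` parameter.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(profile: Vector, span_embs: Sequence[Vector],
                   k: int) -> List[Tuple[int, float]]:
    """Return (span index, score) for the k spans most similar to the profile."""
    scores = [cosine(profile, e) for e in span_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


def blend(profile: Vector, span_embs: Sequence[Vector],
          k: int = 2, alpha: float = 0.5) -> List[float]:
    """Interpolate the profile with a softmax-weighted mean of retrieved spans.

    alpha = 1.0 keeps only the stable profile; alpha = 0.0 keeps only the
    evidence from the current exchange.
    """
    hits = retrieve_top_k(profile, span_embs, k)
    if not hits:
        return list(profile)
    # Softmax over similarity scores (shifted by the max for stability).
    top = max(score for _, score in hits)
    exps = [math.exp(score - top) for _, score in hits]
    total = sum(exps)
    weights = [e / total for e in exps]
    evidence = [
        sum(w * span_embs[i][d] for w, (i, _) in zip(weights, hits))
        for d in range(len(profile))
    ]
    return [alpha * p + (1 - alpha) * e for p, e in zip(profile, evidence)]
```

A fixed `alpha` is the simplest possible design choice; a learned gate that depends on how strongly the retrieved spans match the profile would be a natural alternative, though the post does not say which PRISM uses.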