''Americans have grown accustomed,'' wrote Time, ''to weighing the word of one defendant against that of one plaintiff, the steadfast denial against the angry accusation: he said, she said.''
The quote appeared in the NYT's popular "On Language" column, which ran from 1979 to 2009 and covered popular etymology. It comes from the April 12, 1998 edition, which unpacked the then newly popular "he-said she-said." The article cited Georgetown linguistics professor Deborah Tannen, who explained that the phrase is about
"cross-gender discourse" -- different understandings of the same words.
The idea was that when a woman says, ''I have a problem at work,'' she means ''Listen to my problem''; a man takes that same phrase to mean ''Tell me how to solve the problem.''
The phrase also gave its title to the 1991 movie "He Said, She Said," starring Kevin Bacon and Elizabeth Perkins. Screenwriter Brian Hohlfeld explained
''Our film was about gender differences, but now the phrase seems to be more about who is telling the truth and who isn't.''
He's right. The latest meaning has nothing to do with dialogue (he said and then she said) and no longer refers to missed communication. Now the phrase most often means ''testimony in direct conflict,'' with an implication that truth is therefore undiscoverable.
We see this he-said she-said problem arising in AI in two ways.
First, some AI memory applications save only the AI's interpretations of original utterances, not the utterances themselves. Second, AI applications see only a limited slice of our context.
Both lead to a "he-said she-said" environment where truth is undiscoverable, either because opportunities for interpretation are lost or because the full context was never there to begin with.
In this post we'll unpack both of these dynamics and share what Crosshatch is doing about them.
You mean you remembered all that?
On June 13, 2023, OpenAI released GPT-3.5 Turbo 16k, a model with a 16k-token context window. This came after GPT-3.5 shipped with a small, 4k-token context window.
Small context windows spurred exploration into technologies that could manage AI’s apparently limited context.
Vector databases rose.
Two days later, LangChain hosted an event on AI memory, exploring approaches like
- Rolling conversation summary buffers
- Reflection: using AI to decide what’s important and storing it to a (vector) database
We see this reflection mechanism deployed in the popular open-source mem0 and, apparently, in OpenAI's implementation of memory. Mem0 lets developers pass conversations through an LM that maps conversation history to "memories" or "facts." For instance, "memories" can be produced by this default prompt (see their GitHub repo)
Deduce the facts, preferences, and memories from the provided text.
Just return the facts, preferences, and memories in bullet points:
Natural language text: {user_input}
User/Agent details: {metadata}
Constraint for deducing facts, preferences, and memories:
- The facts, preferences, and memories should be concise and informative.
- Don't start by "The person likes Pizza". Instead, start with "Likes Pizza".
- Don't remember the user/agent details provided. Only remember the facts, preferences, and memories.
Deduced facts, preferences, and memories:
and a chosen LM.
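To make the pattern concrete, here is a minimal sketch of this reflection mechanism in Python. The `call_llm` and `FactStore` pieces are illustrative stand-ins (not mem0 internals); the point is simply that the conversation is mapped to bullet-point "facts" before anything is stored, and only those facts survive.

```python
# A minimal sketch of the "reflection" memory pattern: the raw conversation is
# passed through an LM with a fact-deduction prompt, and only the LM's
# interpretation (not the original utterance) gets persisted.
# `call_llm` and `FactStore` are illustrative stand-ins, not mem0 internals.

DEDUCTION_PROMPT = (
    "Deduce the facts, preferences, and memories from the provided text.\n"
    "Just return the facts, preferences, and memories in bullet points:\n"
    "Natural language text: {user_input}\n"
    "User/Agent details: {metadata}"
)


def call_llm(prompt: str) -> str:
    # Stand-in for any chat-completion call (OpenAI, Anthropic, a local model...).
    # A real implementation would send `prompt` to the chosen LM.
    return "- Has a problem at work\n- Wants help solving it"


class FactStore:
    # Stand-in for the vector database that deduced "facts" are written to.
    def __init__(self) -> None:
        self.facts: list[str] = []

    def add(self, fact: str) -> None:
        self.facts.append(fact)


def write_memories(conversation: str, metadata: dict, store: FactStore) -> list[str]:
    # Write time: the LM interprets the conversation once, here and now.
    bullets = call_llm(DEDUCTION_PROMPT.format(user_input=conversation, metadata=metadata))
    facts = [line.lstrip("- ").strip() for line in bullets.splitlines() if line.strip()]
    for fact in facts:
        store.add(fact)  # only the interpretation survives; the utterance does not
    return facts


store = FactStore()
write_memories("I have a problem at work.", {"user_id": "alice"}, store)
print(store.facts)  # ['Has a problem at work', 'Wants help solving it']
```

Note that nothing in `store.facts` records that the user actually said "I have a problem at work," only the model's write-time reading of it.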
Of course, context windows have grown since last June. Commercial models (with context caching!) now have context windows of over a million tokens. That's enough to fit about eight copies of J.R.R. Tolkien's The Hobbit.
"Context windows to infinity” have seemed to relax the urgency of AI memory to a standard engineering problem, balancing latency, cost and quality.
These tradeoffs appear to be quite real, and for some applications, the quality lost by restricting memory to just these AI-found facts is too high.
Last week Chrys Bader and Sean Dadashi launched Rosebud, an AI-powered journaling app for iOS and Android.
Fast Company covered its launch,
What if your journal could not only hear you, but also understand and respond to you?
Rosebud founder Chrys Bader reported that pre-compressing "deduced" memories is too lossy:
In the context of chat, we find storing "facts" to be too lossy. Pre-compressing what someone says and storing it as bite-sized facts loses a lot of the nuance and context.
That's why at @joinrosebud we always share the original context with the AI, and while token-usage is much higher, the results are incredible. What would be an improvement is having a specialized, low latency LLM filtering layer to extract the most relevant memories from a larger dataset.
Chrys frames this as a question of
context at write v. context at read
To us, storing a "fact" instead of the underlying utterance strips the original of its fullest expression in other contexts. Chrys continued, saying that fixing context at write time
exercises judgment at the time of utterance, without knowing how it will be used in the future. Retaining the original utterance allows for more future dynamism and contextual relevance
Our sense is that memory for AI is no different from memory for the internet. If you see a user take an action on an object in a context (e.g., send a message with various metadata), record that, just as we do for the web. This extended, CDP-style representation seems well motivated both in practice and philosophically.
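As a rough illustration, the record we have in mind looks less like a deduced fact and more like a plain event row: who did what, to which object, in which context. The field names below are our own, purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


# An illustrative sketch of a CDP-style event record: who did what, to which
# object, in which context. Stored verbatim, interpreted later.
@dataclass
class Event:
    actor: str                     # who acted, e.g. a user id
    verb: str                      # what they did, e.g. "sent_message"
    obj: str                       # the original object/utterance, unmodified
    context: dict = field(default_factory=dict)  # app, channel, device, etc.
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


event = Event(
    actor="user_123",
    verb="sent_message",
    obj="I have a problem at work.",  # the utterance itself, not a "fact" about it
    context={"app": "journal", "thread": "evening-checkin"},
)
```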
This is an example of the he-said she-said problem: a language model's first interpretation of an utterance, shaped by its memory-deduction prompt, might not be the right one. Recall the NYT:
The idea was that when a woman says, ''I have a problem at work,'' she means ''Listen to my problem''; a man takes that same phrase to mean ''Tell me how to solve the problem.''
Using write-time memory, you risk losing the opportunity to interpret the original "I have a problem at work" as you individually might, depending on the context you have about the nature of the original utterance. Instead, the memory layer might incorrectly presume the utterance means "asking for help" when all the user wanted was for someone to listen.
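Keeping the raw utterance around defers that judgment: each application can interpret the same stored event with its own context at read time. A small sketch, with an illustrative prompt and a stand-in LM call:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a chat-completion call, as in the earlier sketch.
    return "(model response)"


def interpret_at_read_time(raw_utterance: str, app_context: str) -> str:
    # Read time: the consuming application decides what the utterance means,
    # using whatever context it has right now. Prompt and function names are
    # illustrative, not a production implementation.
    prompt = (
        f"App context: {app_context}\n"
        f"User once said: {raw_utterance!r}\n"
        "What does this mean for how the app should respond?"
    )
    return call_llm(prompt)


# Two apps can read the same stored utterance and reach different,
# equally valid interpretations:
utterance = "I have a problem at work."
interpret_at_read_time(utterance, "reflective journaling: listen, don't fix")
interpret_at_read_time(utterance, "task manager: propose concrete next steps")
```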
This is why, at Crosshatch, we use a memory representation we call the Context Layer: it eschews storing "facts" and honors data provenance.
We just store what happened and leave the application to interpret it. Context is contingent. [So is identity!] Opinionated memory layers like mem0 or OpenAI memory produce memories that are images of the underlying context, the memory prompts, and the LM, and thus yield facts that are at best not necessary (in the modal-logic sense) and at worst outright incorrect!
<claude-sonnet> In modal logic terms, these 'facts' fail to achieve the status of necessary truths - they aren't invariant across all possible interpretations or contexts. At best, they're contingent truths, dependent on a particular (and likely hidden!) interpretation at a specific time. At worst, they could be outright falsehoods when applied in contexts different from their original interpretation.</claude-sonnet>
Data silos also reflect our he-said she-said reality, where different applications only know one side of our story.
Forget me? They can’t even remember me!
Interactions with AI today are like using e-commerce sites in 1994.
In 1994, to buy something online you had to do it in one uninterrupted session.
If you left, the site wouldn’t remember you and you’d have to start all over again. Sites had “amnesia” as the NYT put it in a 2001 piece.
Today, any meaningful engagement with an (AI-powered) app needs to happen within just one app or OS.
If you start with one – providing it some of your context – but want to pick up with another, that context doesn’t come with you. It is stuck with the original app. Even if you go to another app (using the same underlying language model!!) you’ve got to start from scratch!
Our labor to provide apps with context is a sunk cost: an investment of time that can’t be recovered.
It's like dating; it's just hard to start over again with a new person, as USV described last week.
Bill Gurley described this dynamic in a 2003 blog post, "In Search of the Perfect Business Model":
What if a company could generate a higher level of satisfaction with each incremental usage? What if a customer were more endeared to a vendor with each and every engagement? What if a company were always more likely to grab a customer’s marginal consumption as the value continued to increase with each incremental purchase?
This may be the nirvana of capitalism – increased marginal customer utility. Imagine the customer finding more value with each incremental use.
The customer who abandons increasing marginal customer utility would experience "switching loss."
The strategy that Bill Gurley described becomes particularly pronounced with AI, where marginal customer utility increases each time we provide an app with a little more context.
We hope we can be pardoned for our distaste for ‘capitalist nirvana’ as a strategic limiting of consumer choice.
I was going to. But then I asked myself, 'Why?'
Today, state (memory) is bound to a particular application or OS. It does not vary or grow independently of the application that governs it. An application's memory grows only when we put in the work in that application.
But why should it work this way?
We engage with tons of different applications, all of which manage their own records of their interactions with us – whether in a CDP, database, memory abstraction or vector store.
State should grow independently of any single application’s use.
- We buy stuff at Walmart.
- We go for a run in a park.
- We chat with Rosebud.
- We confirm a reservation on Resy.
- We like a photo on Instagram.
Most of this took work from us. Engagement costs time. Creating data is labor. But this state can only be activated by the application where it took place.
We find this incredibly disappointing. So we're doing something about it.
Crosshatch provides a way for developers to personalize applications with data their users share. We believe the consumer context layer should update as we take action across applications, not just within the single app we happen to be using. This can unlock a world where AI activations of user context are truly contextual and ambient.
Today, Crosshatch lets users bring context with them, starting by sourcing context from apps whose dev tools already encourage developers to build new services with app data.
When a developer adds Crosshatch, the data connection is unidirectional. Crosshatch allows users to bring data with them, but doesn’t require applications to give up data they collect on the basis of user engagement.
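To make "unidirectional" concrete, here is a simplified sketch of the idea. The names and types are invented for this post (not our production API): the consuming app gets read access to events the user chooses to share, and nothing flows back to the source apps.

```python
from dataclasses import dataclass


# Illustrative-only sketch of a unidirectional context grant. The names and
# types here are invented for this post; they are not a production API.
@dataclass(frozen=True)
class ContextGrant:
    user_id: str
    source_apps: tuple[str, ...]   # where the user's events were created
    scopes: tuple[str, ...]        # which kinds of events may be read
    read_only: bool = True         # data flows to the consuming app, never back


grant = ContextGrant(
    user_id="user_123",
    source_apps=("pinterest", "chrome_history"),
    scopes=("saves", "visits"),
)

# A consuming app can read events covered by the grant (hypothetical read path):
#     events = read_events(grant)
# There is no corresponding write path back to the source apps.
```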
Many people ask us if Crosshatch is actually contrary to the interests of the apps from which we source data.
This is a reasonable question, but – perhaps counter-intuitively – we actually don’t think so.
Rather, we think Crosshatch will have the opposite effect by making usage of connected apps more impactful: with Crosshatch, whatever you do in a connected app can be projected anywhere else on the internet.
- Make Zillow more like your Pinterest.
- Make Beli more like your Instagram and Chrome browsing.
So use Pinterest more, because your activity there can appear anywhere else you want it.
We recognize the work that goes into building a great user experience. We think data that’s created as a part of application usage is rightfully observed by the business that deployed the application. Users increasingly understand this dynamic and are okay with it, these days sharing more online than prior generations.
Taken all together, however, we believe that allowing users to bring application data to new apps is actually a net-positive for both users and apps.
At the end of the day, we are students of economics – we get the whole switching-cost dynamic and why businesses want it. Crosshatch can't just be good for users; it has to provide value for both businesses and users.
But normatively we hold firm that raising the walls to switch is not pro-user.
If AI is a path to accelerate – let us accelerate. Let's unify context and let users decide how and what to do with it, with whom.
Let’s deliver a higher level of satisfaction with each incremental usage ... across the entire internet.
Let’s build an internet more you.