# What Crosshatch was
Crosshatch was “Plaid for Personalized AI.”
This doc records all our
- docs
- site copy
- blogs

should they be useful to anyone attempting a new (AI-agnostic!) identity layer for personalized AI for the internet. We shut down in July 2025.
Thank you to all our investors for supporting our endeavor to build a more adaptive internet.
# Blogs
## Manifesto: Your own kind of music
See the internet through you-colored glasses.
11/17/2023
Since the late ’90s, internet companies have set out to make the world more open and connected.
They tracked all our activities – social interactions, messages, spending, workouts, home layouts, browsing, where we go. This data powers conveniences we now expect – turn-by-turn directions, contextual information anywhere, or the ability to instantly connect with our friends.
Digital Transformation turned data from a mere ledger of business into a lever of customer experience. The more data a company has on us, the more ways they can serve us. Now everyone is building a Data Moat.
Privacy
The last decade taught us a lot about digital privacy. We Are The Product, we now know. Friendly Services – dating apps, email organizers, app analytics, flashlight apps – sold our data to hedge funds, facial recognition companies, and data brokers. Sometimes it was in the fine print. With third party cookies the whole internet followed us around. Digital Privacy looms large today because data flows are growing but our trust is eroded.
Is it bait and switch if the practice was referenced in the terms?
Sharing
Privacy comes up less in daily life. Our neighborhood coffee shop knows our order. We share our location to get a ride. We tell a friend a secret. We wear a crazy Halloween costume in public. We come out to our friends. We share a project we’re proud of.
This sharing is different. The vibes are different. It’s the context?? No one’s watching. Maybe everyone’s watching! Who cares – was that the point?
We share because it’s chill.
What’s Ours
The experiences that lead up to now – the books we’ve read, the trips we’ve taken, the trails we’ve run, the selfies we’ve taken – they’re all ours. Records of these experiences are scattered across the internet. We helped create them. They’re ours too.
The internet is at a funny spot.
AI is about to completely transform the internet. Cheap intelligence will be everywhere, deployed by anyone. If you want something, all you have to do is ask. If AI could reference your experiences, you might not even have to ask at all. Just click, snap, dart or mutter. Any business can roll out the red carpet – like our local coffee shop but at scale. Everyone will deploy AI to help us create, learn, discover and explore.
On the other hand, the surveilling Data Moats that could power this just aren’t that big. Even the big ones aren’t that big. They’re fragmented and sparse. A few clicks. A purchase last summer. Some scrolling. That time we listened to Irish Jigs on repeat. Runs we did on our fitbit but not our watch. A like. Two stories.
The hyper-personalized internet will not come from Data Moats. The internet and its communities, a growing subset of the power set of all of us, is too varied and too extraordinary to be exclusively squeezed by anyone.
But this is about us. It’s our internet. Our constructs. Our references. Meaning we made. And forgot. And laughed about.
Your Kind of Music
So far we’ve seen the internet through Silicon Valley-colored glasses. What Product Managers in Menlo Park think is good. With AI anyone can be a creator. An explosion of creative experiences is coming. An internet through you-colored glasses is coming.
One that knows the pieces you choose to share … if the context is right. If we LOVE the experience. If it were fast. If we had control. Might delete later. If it were safe.
But today we’re left with Surveillance Capitalism. Scrape The Entire Internet. We get subscribed to toilet seats and songs we binged.
Personalization as we know it today boxes us in. It can be icky and reductive. When we use the internet we don’t know how our data is being used (shared? scraped? leaked?).
We’re building Crosshatch to fix this.
We want Tony Stark’s Jarvis available on every surface. We want control. We want an internet made for us / rolls out the red carpet for us. Created by everyone. Knows us in context. Operates in confidence. Respects our choices. No more ‘users like you like’. Me –– look at me –– who I am, what I choose to share with you here, in this moment – what I want. Me.
We’re building Crosshatch to safely hyper-personalize the internet.
–
Nobody can tell you
There's only one song worth singing
They may try and sell you
Cause it hangs them up
To see someone like you
You gotta
make your own kind of music
## What if you could have Spotify Wrapped every day?
Spotify Wrapped celebrates our realest moments from Spotify on Spotify. Why not celebrate our moments from everywhere on everything?
12/11/2023
Meet Jeff.
Jeff is a busy field solutions engineer for a software company. He travels around the country six months out of the year meeting customers and designing solutions to help improve supply chain visibility. He loves his job, but often jokes with his wife and friends that his job is “glorified PowerPoint” for executives.
Jeff’s Spotify Wrapped is actually pretty boring. After Thanksgiving each year, Jeff logs into Spotify and gets his yearly insights. He over-indexed for Taylor Swift this year, and he was in the Top 1% of people who listened to the 1980s hair metal band Twisted Sister. Why was Jeff in the Top 1% of people who listened to the 1980s hair metal band Twisted Sister? That’s easy: Jeff loves playing “We’re Not Gonna Take It” when pitching his customers on the state of legacy supply chain visibility solutions. He always gets a laugh from his customers.
But does that define Jeff?
Really, Jeff wants to find the best Neapolitan pizza in any city he visits, whether he’s in Kansas City, Boston, Phoenix, or Portland. Jeff has lots of opinions on whether a coal-fired or wood-fired oven makes the best Neapolitan pizza. He secretly runs a food blog reviewing Neapolitan pizza around the country. The best Neapolitan pizza is not, in fact, found only in the Northeast or the New York area. There are great Neapolitan pizza restaurants in many cities across the country, and it’s Jeff’s mission to bring this knowledge to the masses.
Jeff also likes taking ceramics classes with his wife. Jeff knows he needs to get more into fitness and he loves learning new workout tips and exercises, but he dislikes how whenever he browses the internet looking up new workouts he starts getting bombarded with supplement ads. Pine bark extract gummies? Ginkgo biloba creams for post-workout recovery? Jeff does not like getting fed these ads.
Jeff is more than a Top 1% Twisted Sister fan. In fact, he actually does not even like Twisted Sister all that much. He also wants to know what is up with pickleball these days. Every time he searches the internet and reads up on pickleball he just gets more pickleball ads. Jeff was captain of his high school tennis team, he loves tennis, and he’s played tennis with all kinds of people for recreation or competition. Every time he tries to look into pickleball, he just gets more pickleball information and less tennis information.
The internet is simply broken.
Why is the Internet broken?
Today, businesses have to make best guesses about who you are, what you like, and what you care about. When you swipe a card, fill out a survey, or forget to pay your monthly credit card bill on time, you are uploaded, graphed, and scored by businesses making the best possible guess about who you are. Your data is shared across networks of data brokers, aggregated and disaggregated, with PII sometimes stripped and sometimes intact. This data is resold back and forth. If you are scored as a woman in your late 20s and you put pickles and ice cream in your online grocery shopping cart, you may get scored as a person who is likely pregnant, and thus you will begin receiving ads for prenatal vitamins.
Businesses hire large teams of data scientists and analytics engineers to retarget you based on the information they know about you, or think they know about you.
But what if it’s all wrong?
What if you are Jeff and you really want all the best Neapolitan pizza restaurant recommendations in each city? What if you don’t want to be targeted for pickleball gear after you searched for pickleball just because you wanted to see what all the hype is about?
This is where Crosshatch shines.
What is Crosshatch?
Crosshatch is the digital wallet that allows customers and brands to safely share information with one another. With Crosshatch, you are you, and you can choose what to safely share with brands you trust, and you can choose to not share information with brands you do not want learning more about you.
On the consumer side, if you wish to share specific information about what you are actually interested in and what you actually care about, you can share it with the specific brands you choose. If you are a marathon runner and you wish to get more information from your favorite running magazine about products or services you may be interested in, we enable that.
On the brand side, what is more valuable than your top customers sharing information with you about what they like, what they do not like, and what is important to them? Instead of hiring your thirtieth data engineer, fortieth data scientist, and third ad agency, wouldn’t you like a direct feedback loop from your customers who want to give you more information? These are your best customers and your best retention and lifetime value contributors.
As consumers become increasingly data-driven and sophisticated in their use of their own personal data, it’s critical for both consumers and brands to better understand one another.
Join Us
At Crosshatch, we are taking on early partners to help us build the future of personalization. We believe hyper-personalization is the way forward, and that tight feedback loops between consumers and brands make the best experiences.
We’d love to chat about hyper-personalization. Reach out and let us show you what we’re building. Click here to sign up for our upcoming demo launch.
## Ending the world’s most expensive game of Guess Who?
Digital marketing is like a high-stakes, complex game of Guess Who? Crosshatch was created to solve the guessing game once and for all.
12/19/2023
If you’re an 80’s or 90’s kid, chances are you know the game Guess Who?
In Guess Who?, each player has a game board that holds 24 character cards, arranged in a grid. The characters have various distinguishing features such as hair color, eye color, facial hair, glasses, and hats.
The goal is to narrow down the possible choices as quickly as possible and correctly guess the character before the opposing player.
At Crosshatch, our team spent many hours playing Guess Who? as children, only to grow up and get jobs out of college working in the world’s most expensive and inefficient multivariate game of Guess Who?: Marketing Analytics and Data Brokerage.
Gone are the days of guessing who is wearing a funny hat or glasses. Gone are the days of guessing who has rosy cheeks but brown eyes.
Today, we spend billions of dollars figuring out who is a Spanish-speaking dog owner living within 30 miles of Philadelphia, Pennsylvania. Today, twenty-person teams of $200,000-a-year salaried data scientists crunch first-party, second-party, and even zero-party data in Snowflake, Amazon Web Services, Oracle, or even in layers and layers of Excel spreadsheets, in order to figure out whether Spanish-speaking dog owners living within 30 miles of Philadelphia, Pennsylvania have better return on ad spend on Facebook or Google.
Instead of trying to eliminate the Guess Who? characters who have mustaches but not black hair, we can look to data clean rooms. They offer publishers the ability to serve better ads for their brand partners who want to market to publisher audiences. However, if a publisher wants to use a data clean room, and they have many customers, doesn’t the ad buyer then merely help the publisher better tune their model, which would then help the ad buyer’s competitors if these competitors choose to also advertise with the same publisher?
This sounds like a problem a small army of $200,000-a-year salaried data scientists could help solve. But wait, they are data scientists, not data engineers. To set up and manage a data clean room, you’ll need a data engineer. That will be another $250,000 a year on top of the small army of data scientists.
What about privacy? Data collection challenges? GDPR? Do you even know what GDPR stands for, or do you just know the acronym? You better hire some lawyers at $500 an hour to help guide the data engineer who is going to build the data clean room so that the data scientists can build audiences based on events like clicks and web activity tied to third-party enrichment. Then, and only then, can you effectively target all of the Spanish-speaking dog owners living within 30 miles of Philadelphia, Pennsylvania.
Oh, and you have five competitors all doing the same thing, trying to find all of the Spanish-speaking dog owners living within 30 miles of Philadelphia, Pennsylvania. That’s right.
Back when you were playing Guess Who? as a child it was a one-on-one game. You didn’t even know who your competitor was looking for. You would both be looking for different characters.
In the real world, in the world’s most expensive game of Guess Who?, you and five other competitors are all trying to find the Spanish-speaking dog owners living within 30 miles of Philadelphia, Pennsylvania. Because you are all chasing the same customers, the cost of reaching them rises as you and your five competitors bid the prices up on ad platforms.
If you are unlucky, a brand new venture capital firm that just raised $1 billion in new funds just funded a new competitor. Now there are six competitors all fighting over finding the Spanish-speaking dog owners living within 30 miles of Philadelphia, Pennsylvania.
Yet it only gets worse.
It turns out that now that the small army of $200,000-a-year salaried data scientists has more data, you should actually be looking at all dog owners, not just the Spanish-speaking dog owners, and that you should be looking within 60 miles of Philadelphia, Pennsylvania, not 30 miles.
Now you’re targeting dog owners all the way out in Lancaster County, Pennsylvania. Do they even have the internet there? Isn’t Lancaster County one of the largest Amish communities in the world?
Yet it only gets worse.
It turns out your small army of data engineers and data scientists are not actually moving from Databricks to Snowflake. They are actually moving from Snowflake to Databricks. As a marketer, product manager, or executive you may have no idea what this means. It is entirely possible you will have to hire that $500-per-hour lawyer again, as all her work presumed everything was going into Snowflake. Now, it’s all in Databricks.
Yet it only gets worse.
It will take 6 months to migrate everything from Snowflake to Databricks.
Welcome to the world’s most expensive game of Guess Who? You may just burn $10 million targeting the wrong people, only to make back an incremental $2 million in revenue.
Crosshatch is out to solve the World’s Most Expensive Game of Guess Who?
At Crosshatch, we loved playing Guess Who? as children, but we hate playing Guess Who? all day long in our jobs as adults. We cut our teeth playing the world’s most expensive game of Guess Who? as data scientists and engineers, and now we are out to end it altogether.
Instead of relying on clickstreams and event data patterns and making best educated guesses about who customers are and what they like, we are building a digital wallet that allows consumers to simply tell the brands they trust what they want, what they care about, and who they are, with no guessing involved.
This two-way communication between trusted brands and consumers removes most of the Guess Who? games that we play today. With tighter feedback loops and data shared via consent, no longer do brands have to spend millions of dollars trying to guesstimate who their audience is and what audience members wish to buy. Consumers win by sharing data with brands they trust. Brands win by having their best audiences and highest value customers available without casting a wide, expensive net out into the market.
## 2024 is the Year of Loyalty and Rewards
Email marketing is dying, CAC is rising, and yes, cookies are crumbling. Loyalty is the new leverage.
1/9/2024
At Crosshatch, we're betting big on 2024 as the Year of Loyalty & Rewards.
For too long, Loyalty & Rewards has taken a backseat to Customer Acquisition. Budgets often skew heavily toward Customer Acquisition. Dollars flow into new customer marketing that could have been spent on retaining already existing customers.
Hey, don't get us wrong, we love Customer Acquisition. In fact, what you are reading right now, this very blog post, is a Customer Acquisition strategy! It's true! We hope that you will read this blog post, digest it, and then click on our Call-to-Action at the bottom and come say hello.
Nevertheless, here are the market factors that point toward Loyalty & Rewards.
First, email marketing as Customer Acquisition is about to fall off a cliff.
In the past few years, many companies have begun using “gray area” strategies: taking anonymous website visitors, using network graph vendors to match them to email addresses, and then emailing people who have not explicitly consented to be contacted by the brand.
We think this is terrible.
Google is also updating its policies, with changes going into effect in February 2024. Going forward, bulk senders (those sending 5,000+ emails a day to Google addresses) will be required to keep spam rates below 0.3%. While many of the top brands already aim for this level, a lot of “email experimentation” around buying lists or software that matches browsing activity to email addresses will stop being effective. Email will have to shift toward retention, as bulk senders will not risk deliverability issues by using email en masse as a Customer Acquisition channel.
Second, CAC is rising.
Isn’t CAC always rising? Perhaps. According to this recent report from CommerceNext, most retailers say CAC has risen since the introduction of iOS 14.
Third, the cookie is out.
Google will bid farewell to the era of cookies at the end of 2024, marking a seismic shift for a digital marketing landscape built on 30 years of cookies. We’ve read the many thought pieces produced in the last few weeks. Many digital marketers will have you believe this is the apocalypse. The end is near. All hope is lost. We all know the Four Horsemen of the Apocalypse, but until Google finally announced the end of the cookie we did not know there was actually a Fifth Horseman.
Loyalty & Rewards has a Personalization Problem Right Now
If you agree with us so far, you can see the customer acquisition headwinds. If you're like most brands, you already have plenty of levers to pull with your strongest customers, your Loyalty & Rewards customers.
What will have to change for L&R teams?
Right now, personalization is very nascent.
The personalization method for a Loyalty & Rewards marketer today is:
1. create segments based on purchase history
2. choose activations for each segment
3. deliver these activations via a Loyalty & Rewards product, often an app or a special email segment

Segments have made sense for Loyalty & Rewards because:
- businesses have limited data and there just aren't that many available dimensions, or ways to slice and dice customers
- each unique user journey adds complexity to make and track
Gee, that sounds very similar to a Customer Acquisition marketer's flow.
A Customer Acquisition flow today is:
1. create segments based on clicks and actions, possibly enriched by third-party data bought from brokers
2. choose activations for each segment
3. deliver these activations, often an incentive to join an app or a special email segment

Segments have made sense for Customer Acquisition because:
- businesses have limited data and there just aren't that many available dimensions, or ways to slice and dice customers
- each unique user journey adds complexity to make and track
Segments aren't personalization. In fact, segments are the anti-personalization.
At the end of the day, segments are just smaller groups of people than the entire customer base.
At Crosshatch, we want businesses to move beyond segments and feed all relevant data into AI with a prompt to create hyper-personalized experiences.
For example, let’s say you’re a small but formidable chain of coffee and bagel shops. You value your Loyalty + Rewards customers, but the program hasn’t really gotten off the ground in the year you’ve been running it.
Currently, you take all customers who have purchased 3x or more in the last 90 days, sort them into “beverage consumers” or “bagel consumers” depending on what SKUs they primarily purchase, and then the “beverage consumers” get a 25% off coupon for their next order while the “bagel consumers” get a “buy 2 bagels, get 1 free” coupon. This is all done in SQL by a data engineer, within a CDP you bought, or perhaps both.
You brand this internally as a "personalization" experience, and it’s true, segmenting out people who buy beverages vs. people who buy bagels and customizing an offer is a good first step. It’s better than nothing, and you have more people signing up for the app, so that seems great.
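For concreteness, here is a minimal sketch of that segment logic in Python (rather than the SQL or CDP configuration a team would actually use); the field names, thresholds, and offers are illustrative assumptions taken from the example above.

```python
from collections import Counter

def build_segments(orders, min_orders=3):
    """orders: purchases from the last 90 days, e.g.
    {"customer_id": "c1", "sku_type": "beverage"}  # sku_type is "beverage" or "bagel"
    Returns one coupon per qualifying customer."""
    # Keep only customers with 3+ purchases in the window
    counts = Counter(o["customer_id"] for o in orders)
    loyal = {c for c, n in counts.items() if n >= min_orders}

    segments = {}
    for customer in loyal:
        # The primary SKU type decides which of the two segments the customer lands in
        skus = Counter(o["sku_type"] for o in orders if o["customer_id"] == customer)
        primary = skus.most_common(1)[0][0]
        segments[customer] = (
            "25% off your next order" if primary == "beverage"
            else "buy 2 bagels, get 1 free"
        )
    return segments
```

Every customer in a segment gets the same offer – which is exactly the limitation the hyper-personalized prompt below is meant to remove.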
At Crosshatch, our workflows are a bit different, and hyper-personalized:
```
Here are the user's past purchases over the past 90 days
- 9/30/23, 10:02 AM: Espresso + Chicken Salad Bagel
- 10/07/23, 4:00 PM: Espresso Macchiato
- 12/23/23, 4:07 PM: Espresso Macchiato

Based on the user's past spending + spending at other coffee and sandwich shops referenced below…

Return a json with
- a message to send the customer to get them to come back
- a discount amount (float) to entice the customer to come back

Here are some past conversion data to help you provide your response …
```
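A rough sketch of how a prompt like the one above might be assembled and its reply consumed. The `call_llm` helper is a placeholder for whatever model client you already use, and the purchase-history fields are assumptions; the prompt mirrors the example above.

```python
import json

def personalize_offer(purchases, benchmark_spend, conversion_examples, call_llm):
    """Build the prompt from permissioned context and parse the model's JSON reply.
    `call_llm` is a placeholder: any function that takes a prompt string and returns text."""
    history = "\n".join(f"- {p['timestamp']}: {p['item']}" for p in purchases)
    prompt = (
        "Here are the user's past purchases over the past 90 days\n"
        f"{history}\n\n"
        "Based on the user's past spending + spending at other coffee and sandwich shops "
        f"referenced below: {benchmark_spend}\n\n"
        "Return a json with\n"
        "- a message to send the customer to get them to come back\n"
        "- a discount amount (float) to entice the customer to come back\n\n"
        f"Here are some past conversion data to help you provide your response: {conversion_examples}"
    )
    reply = call_llm(prompt)            # e.g. any chat-completion style API
    offer = json.loads(reply)           # expects {"message": "...", "discount": 0.15}
    return offer["message"], float(offer["discount"])
```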
The beauty of hyper-personalization is that each customer is treated as –– a unique individual. Some customers will need a big discount on a coffee, and other customers will simply want an early bird special, where if they arrive before 8 AM they will get the discount.
Some customers will view their relationship with your brand as purely transactional, you are just the nice coffee and bagel shop across from their office. Other customers will want a gamification experience, where for each 5 coffees they buy, they get a 6th for free, and on their birthday they get a free bagel.
Going beyond the segment is the future, especially for your existing, top customers. Your relationship with these top customers is your company’s most valuable asset.
We’re holding a number of conversations with Loyalty & Rewards professionals across travel, retail, real estate, and other industries.
We’d love to connect, show you what we are all about, and hear your plans for 2024. Click here to get something scheduled.
## Take the Epsilon Data Broker Challenge
Data brokers have spending and behavioral data on nearly every US consumer. How much do data brokers know about you? Take the Epsilon Challenge to find out.
1/16/2024
Data brokers. We can’t live with them. We can’t live without them.
Or maybe we can live without them?
Epsilon is one of the largest data aggregators and data brokers in the world. They power audience insights for some of the largest brands and have proprietary data on nearly every US consumer.
The Epsilon Challenge
We’d like to introduce The Epsilon Challenge, to show folks what information a large data broker has on you.
Note: this works best for Google Chrome users, and we’ve tested it with United States users.
Step 1. Go to https://legal.epsilon.com/dsr/
Step 2. Click your country, click “Access my Personal Information”
Step 3. Submit Request
Our results
We asked a few data and marketing leaders about their experiences after going through the Epsilon Challenge. All were very surprised by certain results.
Ben Seidl, Crosshatch General Manager
This was very interesting to analyze. It seems I am tagged as an Adult Without Kids and as a Single Asian female, although I do have two kids and I am not a Single Asian female. Nobody else uses my device or browser so I am not sure where this information could be coming from. It also has information about my location in a North Carolina DMA, though I have never lived in North Carolina.
While this is a bit strange, I know that if I were a brand buying these audiences I would not be very happy. If this sample of my own data is this far off, I have to question how accurate these audiences and segments are overall.
Mary MacCarthy, Marketing Leader
“Like most of us, I’m always a bit apprehensive before seeing any data collected about me, fearing which of my dark secrets is going to be revealed about my late-night Google searches and Amazon purchases.
Fortunately, Epsilon’s data on me didn’t contain anything juicy. But what surprised me was that the data – a whopping 80 pages if printed out in size 12 font – focused mostly on my grocery habits. One fun fact: I noticed a lot of references to meat products. The words ‘meat,’ ‘chicken,’ ‘beef,’ and ‘bacon’ appeared 113 times.
My household has been vegetarian for a decade. Interestingly, the words ‘vegetarian’ and ‘vegan’ didn’t appear anywhere. So despite having collected 80 pages of data on my grocery habits, Epsilon has overlooked the core element of my household’s diet – missing out on the potential to market us vegetarian and vegan products. Those are ads that I would actually welcome, as I’m always struggling to find vegetarian products to add to my grocery list.”
Christina Des Vaus, Head of Marketing and Platform, Village Global
“Some of it seems really old but a lot are recent as of December in terms of site visits. Many of the attributes are wrong (Adults Without Kids, Catholic, Conservative Donor, etc). But, some detailed purchase history was correct, like Naturipe Farms Berries!”
Lauren Balik, Data Consultant
“This is hysterical. First of all, my partner also did this exercise and she’s listed as an African-American woman. She is of Ashkenazi Jewish background – not an African-American person at all.
A number of these categories have me as a Spanish speaker. Although I live in Harlem in New York, which has a large Spanish-speaking population, I rarely speak Spanish and I don’t speak Spanish at home. Perhaps this is why I keep receiving various Spanish language ads on various streaming platforms? This has really picked up in the last few months.
Also, I am listed as someone who buys Adult Incontinence products. I cannot for the life of me remember when I have ever bought Adult Incontinence products, but perhaps it’s related to the fact that I am a large buyer of Snack Rolls, Asian Marinades, and Gala Apples. I would think that if someone is purchasing Snack Rolls, Asian Marinades, and Gala Apples together and consuming them for most meals, perhaps they would indeed wish to be advertised to about Adult Incontinence products, because Snack Rolls, Asian Marinades, and Gala Apples together seems like a recipe for disaster.”
Take The Epsilon Challenge
We encourage you to take The Epsilon Challenge and see how well they know you. After all, you know you better than anyone. Did they get your location correct? Your personal characteristics? Your purchase history? Your streaming or TV viewing history?
We’d love to hear from you about how right or wrong this data is. We think it’s eye-opening for consumers to see the depth of information some of these companies hold, and similarly eye-opening for brands to see just how far off some of this data can be. When you are the Beef Council and you’re paying millions of dollars to market new beef products to longtime vegetarians, you may not exactly be getting the best bang for your buck.
The Crosshatch Way
We believe browser data is on the way out. Safari is already locked down, and with Google deprecating the cookie, this way of life is soon coming to an end.
At Crosshatch, we believe in hyper-personalization between brands and customers based on consent instead of on data broker lists that are sold and repackaged over and over again to create increasingly dubious audiences over time.
We’d love to show you what we’re building. Reach out, say hello, and let us show you.
## The death of implicit consent
The death of cookies makes most users anonymous. If you don't want to break marketing, users need to log in. A new opportunity for loyalty and rewards.
2/1/2024
What do AdTech and one-hit wonder pop star Ashlee Simpson have in common?
They are entirely based on pieces of me. Ashlee Simpson’s Pieces of Me peaked near the top of the pop charts in 2004. AdTech takes pieces of me from my activity all over the internet and uses this to infer and make best guesses about who I am and what I like, buying and repackaging and reselling this data over and over again, all because of implicit consent.
Nobody has heard from Ashlee Simpson since 2004. After 2024, nobody will hear from implicit consent.
The current state of implicit consent
For at least the last decade, implicit consent has driven personalization on the internet. It’s powered advertising and first party personalization like search and site optimizations for relevance.
As a simple example of implicit consent: when I access a website, “pieces of me” – a bunch of information from my browser – are visible to the website. These pieces include my IP address, screen size, browser version, and cookie identifiers that websites or identity resolution providers may have placed on my browser.
In 2017 LiveRamp marketed site optimization without requiring a login.
So when I visit a website, it shares these pieces with an identity resolution provider like Acxiom or Epsilon. These companies track personally identifiable information and behavioral data for nearly every American. That is, while the website only has a few pieces of me, the identity resolution provider has A LOT of pieces of me. Even purchases and browsing activity. The identity resolution provider matches the website’s pieces to its database of Americans and returns an identifier. They’ll often also share insights on my purchases and browsing through an enterprise CDP.
How does the identity resolution provider have all this data? I don’t even know them.
They have it because at some point or another I agreed to use some services whose terms allowed the sale of my data to the data aggregators who then sell it back to all the brands via identity resolution and CDP enrichment.
So when I go to a website without logging in – ostensibly anonymously?? – the website is able to identify the pieces of me and then associate all my activities on their site to this identifier (e.g., with a CDP), all without me having to do anything.
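To make that flow concrete, here is a toy sketch of the matching step; the broker “database”, field names, and match rule are invented for illustration and are not any vendor’s actual API or data model.

```python
# Toy illustration of implicit-consent identity resolution: the "pieces of me"
# a site can observe are matched against a broker's records to recover a
# persistent identifier and enrichment data -- no login required.
BROKER_DB = [
    {"id": "person-4821", "ip": "203.0.113.7", "screen": "1440x900",
     "browser": "Chrome 120", "purchases": ["running shoes", "dog food"]},
]

def resolve_identity(browser_pieces):
    """browser_pieces: whatever a page can observe without a login,
    e.g. {"ip": "203.0.113.7", "screen": "1440x900", "browser": "Chrome 120"}"""
    for record in BROKER_DB:
        overlap = sum(record.get(k) == v for k, v in browser_pieces.items())
        if overlap >= 2:  # a few overlapping signals is enough to link sessions
            return record["id"], record["purchases"]
    return None, []
```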
In some ways – this is great! I get a better experience all frictionlessly. I don’t have to do anything and instantly every site I visit that uses one of these services can ship a better experience. This has been a core selling point of identity resolution providers. Optimize without requiring a login – pretty cool!
On the other hand, the cookie is going away. Identity resolution providers don’t need the cookie to identify users – they can use IP address, screen size and browser information like type or version. But soon IP addresses will no longer be visible to websites.
Identity is the cornerstone of marketing and advertising technologies like CDPs. The final Jenga blocks in the tower are getting pulled.
So what's a brand to do?
First, one of the emerging trends is simply asking consumers to log in and identify themselves. At Crosshatch, we expect this pattern to continue as an early attempt by brands to solicit more explicit consent and to collect more data first-party, owned by the brand.
We can see this pattern emerging with Booking.com, Zillow, Trip Advisor and Bloomingdales, all prompting site users to sign in and validate their identities upon visiting the respective sites.
Major B2C corporates in travel, real estate and retail now ask users to log in or download the app to get a consistent user identity.
Second, we see many brands considering building more robust experiences and features into their existing Loyalty & Rewards apps and sites. Although some brands live and die by their L&R programs, for many brands L&R has over the past few years taken a back seat to customer acquisition efforts.
In 2024, we expect most forward-thinking brands to reevaluate the untapped goldmines of Loyalty & Rewards programs, and invest more heavily in building experiences for repeat consumers.
The experiences we’re most excited about? Those using AI. The rise of AI offers brands incredible opportunities to create more dynamic and relevant experiences that anticipate consumer needs. The more context AI has about a customer, the better it can serve them – but that context has to be shared willingly, because consumers never consented to being tracked by data aggregators.
How does Crosshatch fit in?
Crosshatch is the digital wallet for consumers to explicitly share data with the brands they like, while maintaining safety and removing implicit consent from the equation altogether. If the deprecation of cookies requires the effort of logging in, we believe that log-in should 10x the experience from before – should deliver 10x personalization. We believe in consumers having one-to-one relationships with the brands they love instead of one-to-many relationships in which brands simply buy and reshare consumer data over and over again.
We are taking on early design partners and speaking with marketers, developers, and partnerships teams across the market. We’d love to hear your thoughts.
## The Rise of Headless Personalization
Hyper-personalization is the internet's version of rolling out the red carpet. Introducing a new personalization architecture that makes it possible.
2/13/2024
Has your CEO ever asked you to use AI to anticipate customer needs? Has someone on your product or marketing team called out the difficulty of activating the data in your CDP to create experiences that help convert customers?
In 2018, before we founded Crosshatch, we worked on a personalization team at Walmart. Our mission seemed simple: use predictive AI to keep the American consumer’s refrigerator stocked.
Walmart is well positioned to actually deliver on deep personalization:
- There is a Walmart store within 10 miles of 90% of all Americans.
- Walmart is a consumer-staples business. Unlike discretionary businesses like travel or real estate, customers shop weekly to stay stocked up on the staple items they need.
- We had tons of consumer data.

Peers on our team were literally rocket scientists and ex-quants experienced in using data to predict behavior. We had access to every online grocery order: a pristine time series of customer ordering behavior. Transaction data constitute what economists call “revealed preference” — you best know what people want in the future based on what they have spent money on in the past.
We had great tailwinds: rich data, great infrastructure, and an incredible team.
What we found was that even with these tailwinds we couldn’t reliably predict grocery orders that helped families save time.
If rocket scientists and the world’s best data couldn’t readily anticipate customer needs with AI in a constrained case like grocery, how will the rest of the internet?
Introducing Headless Personalization
If you want to anticipate customer needs, you simply need more accurate data.
Most businesses don’t have the data resources of Walmart. Those that do are under pressure from lawmakers who are demanding transparency about what data companies collect and how it’s used. The systems that govern the customer data companies do have are often tightly coupled with its application, making personalization today an expensive and tightly coordinated exercise involving product, marketing, and engineering.
This is why we’re introducing Headless Personalization. Headless personalization is a data interoperability model that allows you to use any language model to generate personalized text with a consumer's private permissioned context. It works by separating the application layer from the back-end data and governance. This allows customers to share (and un-share) their data to applications they love, while giving businesses the flexibility to transform and infer from it on demand. Apps permissioned to perform this headless personalization use inferences from connected consumer context and don't directly handle the underlying data that powers it. Since consumers have more control over how and what they share and for what purpose, data sharing becomes ... chill.
We contrast this with existing personalization solutions, where a business’ ability to personalize is given by whatever data they happen to have. Headless personalization separates these concerns, allowing consumers to bring their data with them and businesses to activate it into whatever context drives mutual value.
We’re betting big on this architecture because it unlocks creative freedom to build entirely new personalized experiences that are typically not possible with first party data alone. Developers are eager to go headless because it offers a unique level of developer and consumer control and gives them the freedom to leverage composable tech stacks without the overhead of building data integrations, tracking plans, and security regimes.
Going headless allows you to activate permissioned customer information in multiple front-end experiences, for different customer touch points, on any data a consumer is willing to share. Your web and mobile apps can talk to a single personalization system via the permissioned API layer, which allows emerging startups to be as personalized as customers want, in a way customers will trust.
Going headless unlocks true hyper-personalization. Hyper-personalization is only possible when services can respond to consumer needs exactly as they are, informed by as much data as a consumer is willing to share. Headless is how that data is activated in a chill way.
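A minimal sketch of this separation of concerns, with names and shapes we made up for illustration (not Crosshatch’s actual interfaces): the application never touches raw user data; it only receives inferences from a user-controlled layer that enforces what has been shared.

```python
# Illustrative only: names and shapes here are not Crosshatch's actual interfaces.
from dataclasses import dataclass
from typing import Callable

@dataclass
class UserDataLayer:
    """User-controlled back end: permissioned context plus an LM of the user's choosing."""
    context: dict               # e.g. {"recent_purchases": [...], "music": [...]}
    shared_scopes: set          # what the user currently shares; revocable at any time
    llm: Callable[[str], str]   # any text-in, text-out model

    def infer(self, instruction: str, scopes: list) -> str:
        allowed = {s: self.context[s] for s in scopes
                   if s in self.shared_scopes and s in self.context}
        if not allowed:
            return ""           # nothing shared for this purpose -> no personalization
        prompt = f"{instruction}\n\nPermissioned context: {allowed}"
        return self.llm(prompt)  # only the inference leaves this layer

# The application layer (the "head") never reads `context`; it only gets text back.
def personalized_greeting(user_layer: UserDataLayer) -> str:
    return user_layer.infer(
        "Write a one-line greeting referencing the user's recent interests.",
        scopes=["recent_purchases", "music"],
    )
```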
Signs that Headless Personalization could benefit your business
Brands starting to explore Headless Personalization often see themselves in one of the following scenarios:
- My leadership team is creating a new AI strategy and I’m worried we may not have the data to support it.
- I already have an established CDP but its ROI is not what we had hoped for.
- We shipped a CDP but we’re worried about customer journeys degrading when cookies deprecate later this year.
- I want my customer experience to be tailored to each customer individually even when they log in for the first time.
- I want to build a unique customer experience but I’m worried I’m at a disadvantage to incumbents who have way more data than I have.
- We’d like to scale the data we have about each customer, but we’re cognizant of respecting privacy and costs of compliance.
Consider the Costs
As you decide if and how to go headless, consider costs and time, especially relative to other options. Enterprise CDPs tend to cost anywhere from tens of thousands to millions of dollars. Some come with enriched data and identity resolution services, but they are not prepared for the cookie deprecation happening this year, which could threaten the efficacy of those solutions. Implementing a rich CDP can be an expensive and frustrating exercise.
Going Headless with Crosshatch
Crosshatch is supporting businesses looking to make their services more personalized without heavy engineering lift. Developers and marketers leverage Crosshatch’s suite of headless personalization services to build experiences that speak to each user individually, in less time and at lower cost. Crosshatch’s headless solution allows businesses to add inferences from user context wherever they’re needed, with whatever AI tools they’re already using.
Crosshatch’s headless personalization services include:
- Proxy API: add consumer context to any LM we’ve secured
- Query API: test what context you can add
Let’s take a look at how each enables businesses of all sizes to move toward a headless architecture.
Proxy API
Proxy API is the foundation of our headless hyper-personalization platform. It provides access to the full breadth of AI resources and a unified representation of the data consumers are willing to share. This enables capabilities critical to driving retention, engagement, and conversions, like:
- Optimized search + recommendations
- Tailored interfaces
- Details that put products and services in context
The Proxy API is agnostic to LM inference provider, orchestration stack and evaluation platform. This gives developers the freedom to use the tools they already use and love, while simultaneously experimenting with new hyper-personalized technologies that add consumer context wherever it’s needed.
“Ultimately, the power of generative AI or any technology is only as good as the data that powers it,” Walmart’s Doug McMillon said last fall.
Query API
Crosshatch’s development stack also consists of the Query API, which gives developers a clear path for developing hyper-personalized applications with rich consumer context.
The Query API exposes the query interface for use on test data, allowing developers to see how queries resolve as stringified context in the Proxy API.
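The product and its docs are no longer live, so the host, paths, and payload shapes below are placeholders we invented to illustrate the pattern described above, not the real API. Conceptually: preview how a query resolves to stringified context with the Query API, then send the same query spec with a prompt through the Proxy API to whichever LM provider you already use.

```python
import requests

BASE = "https://api.example.invalid"             # placeholder host, not a real endpoint
HEADERS = {"Authorization": "Bearer <api-key>"}  # hypothetical auth scheme

# 1) Query API: see how a query over test data resolves to stringified context.
query = {"select": ["purchases", "music.top_artists"], "window": "90d"}
preview = requests.post(f"{BASE}/query/test", json=query, headers=HEADERS).json()
print(preview["context"])  # the stringified context the Proxy API would inject

# 2) Proxy API: forward the query spec plus a prompt to the LM provider of your choice.
payload = {
    "model": "any-supported-model",   # provider-agnostic by design
    "query": query,                   # resolved server-side; raw data never reaches the app
    "messages": [{"role": "user",
                  "content": "Suggest tonight's dinner spot given my context."}],
}
completion = requests.post(f"{BASE}/proxy/chat", json=payload, headers=HEADERS).json()
print(completion["choices"][0]["message"]["content"])
```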
Ready to go headless and hyper-personalize?
Whether you’re an established business with existing architecture and user profiles, still building out enterprise architecture, or a new business looking to deliver new services with rich consumer context, a headless approach to hyper-personalization could be right for you.
Similarly, if your product or marketing operations over your customer data are becoming complex or unstable and you want to compete on customer experience, headless might be in your future.
Unlock total creative control with headless hyper-personalization. Learn how.
## Cross-pollinating walled gardens
Walled Gardens are the future of consumer AI, but their isolated ecosystems limit growth. To cultivate a richer internet, we must cross-pollinate our walled gardens.
3/19/2024
We are exiting the era of implied identity.
In this era, identity was underwritten by cookies and IP addresses. Sites you had never visited before could immediately infer who you were based on data in your browser. These sites linked that identifying data to information that “data brokers” you’ve never heard of had collected about you.
How did they get the data? Some service you use gave it to them. That service could do so because the terms you agreed to that one time said it could.
These sites then use this identity to track your behavior over time, even as you (still!) have never introduced yourself formally. Activity on one site could be readily linked to another. This implied identity infrastructure made possible the personalization and ads experiences on the internet we use today.
In this era, consumer enterprises regularly highlighted unauthenticated MAU growth to shareholders as a mark of success. Later this year, though, those metrics will no longer cut it. Without identity, the internet flywheel of
more data ==> better UX
doesn’t work.
Big tech is killing implied identity. Browsers like Firefox, Safari, and Chrome are deprecating cookies and IP addresses. No longer can sites readily identify you just based on the browser you use or where you access the internet from.
This has caused a rush to loyalty and the scaling of loyal authenticated traffic, where activities can be tracked from session to session. This loyal attention is proving incredibly valuable to businesses, who use it to personalize their experiences and grow massive retail media ads businesses.
Everyone wants to become their very own Walled Garden.
What is a Walled Garden?
When we hear Walled Garden we think consumer social like Meta or Snap. It’s important to be clear about what a Walled Garden is – many have explored the idea, from digital gardens to the gardens of adtech – but we say Walled Gardens have only two properties:
- They command attention: consumers spend time engaging within them
- They know who users are: they wield authenticated Identity
Commanding attention makes the Walled Garden interesting. Without attention there’s no distribution and no business. Knowing identity, on the other hand, underwrites the longitudinal behavioral data that unlocks the flywheel of self-improving systems that make walled gardens better the more you use them – usually with machine learning or AI.
Walled Gardens have also implied a level of security and discretion. Though Apple’s gardens have rich customer data, Apple makes sure that the data they collect is only used within their controlled ecosystem. Gardens usually deliver alignment to users by making clear the context for data that's collected and used.
Digital gardens on the internet take a new form with the death of implied consent. Many platforms wield attention, but far fewer have scaled identified traffic – sessions among users who’ve logged in.
Walled Gardens are great businesses
Search and social have built among the internet’s largest and highest margin businesses from attentive identified (authenticated) traffic, pioneering the internet’s most engaging and largest ads businesses.
Businesses like Instacart, Uber, Walmart, and Lowes – folks we haven’t traditionally thought of as walled gardens, but who also have authenticated attentive traffic – are seeing massive rewards from selling ads.
The fruits of attentive identified traffic are too good to pass up.
Walled Gardens are great for AI
Walled Gardens are also great for AI. To power consumer experiences that command attention, AI needs to know who users are. Since implied identity is dead, the only pathway there is from building this context from authenticated traffic.
A challenge though is that authenticated traffic – even among the most engaging digital platforms like Google or Meta, and especially for smaller upstarts – only delivers so much information about consumers.
Walled Gardens with identity only know what users do on their platforms. That means they know what users:
- Search for
- Click on
- Scroll past
- Spend money on
- Share to friends
but that’s it. In cases like Google or Apple, that’s quite a lot. But for most consumer businesses, that’s quite a bit less to go on. For instance, for Lowes’ users who are logged in, Lowes only knows what users have searched for, what they’ve bought in the past, and what they’ve scrolled through. They don’t know that it’s actually Greg’s grand-daughter’s birthday, and the family is set on building a swing set as a father-son activity. Lowes can monetize this traffic, sorting this user into loose audiences based on historical interactions, but Lowes doesn’t really know who Greg is. Not really anyway.
Lowes certainly doesn’t know Greg as well as TikTok does. Or Google does. Or Apple does.
Greg doesn’t really care about any of this, of course. He just wants a great user experience – one that anticipates his needs.
Lowes can’t really do that today. Even as it rises as a new Walled Garden, all it knows about Greg is what Greg has done at Lowes.
Make our garden grow
Vibrant gardens are cross-pollinated.
Today, the more time Greg spends in one garden, the better his experience gets in that garden alone. That helps that Garden, but it doesn’t fully help Greg. It’s true Greg gets a better user experience in the given Garden, but Greg’s usage gives rise to additional context that could make his experience better in every other Walled Garden he uses.
Greg benefits from cross pollinating Walled Gardens.
Users like Greg benefit from a means to move between walled gardens – to take the behavioral data collected wherever they’ve been and use it wherever they’re going. Today, Walled Gardens are unable to truly serve all of Greg because Greg hasn’t happened to show them that particular data. Maybe he doesn’t want to. Maybe it doesn’t exist yet! Maybe Greg wants them to anticipate all of his needs without any extra work.
No surprise – Crosshatch unlocks cross-pollinating walled gardens. Crosshatch is an identity layer that allows people to safely share their data to apps they love to unlock this cross-pollinating context.
Early design partners have asked us about cross-pollination – will this hurt their business? Our core belief is that cross-pollinating Walled Gardens actually makes the entire internet richer and more diverse. Users like Greg will spend more time within rising Walled Gardens because his experience is so much richer within them and because of them.
When Greg uses Gmail, new data reflecting his reservations and purchases are collected.
That’s then used by TripAdvisor, who can help Greg better plan trips by inferring the types of restaurants, hotels, and destinations he already likes to provide suggestions that resonate with his past trips.
These new trips cross pollinate to Greg’s Spotify, which now plays tunes in anticipation of his upcoming trip. Spotify knows that Greg recently booked a trip to Miami, and so it’s playing songs he loves but also tunes loved by the people in the neighborhoods he’s scheduled to visit.
This listening data cross pollinates to Greg’s Resy account, which anticipates his mood and suggests a night out with a friend he hasn’t reached out to in a while.
Cross pollinating walled gardens makes gardens more valuable, not less. Cross pollination is also useful to upstarts, who are just starting to build their own Walled Garden, and can deliver great new user experiences pollinated by context from the user’s past.
Cross pollination doesn’t mean giving up agency or security, however. Global privacy law encourages this cross pollination (“data portability”) but for reasons users agree to and with minimal data shared – only that required to provide value to users. Cross pollination doesn’t have to cost user control.
Voltaire knew that the world isn't always the "best of all possible worlds" – it's up to us to roll up our sleeves and make it better. We can't just sit back and expect everything to work out on its own. We have to take responsibility for our choices and actions, both as individuals and as an industry.
If you’re interested in how Crosshatch can help cross pollinate Walled Garden context – delivering hyper-personalized experiences on log-in and for loyal customers – we’re engaged with design partners building a new cross-pollinated internet with AI. Reach out – we’d love to hear about what you’re building.
–
And let us try before we die
To make some sense of life.
We're neither pure nor wise nor good;
We'll do the best we know;
We'll build our house, and chop our wood,
And make our garden grow.
## What is Your AI?
Personal AI is going to be everywhere. It's a composition of attention, tools, and context. But what makes it Yours?
3/29/2024
What is Your AI?
Is it a device? Your computer? Your operating system? What about your browser? Or an app?
Is it aligned to you? Is it not? It wants to sell you something! How do you know?
Is it every app? Every where? Any thing?
Is it just one? Or many? A different one every day? Every second? Every other? Says who?
Is it capable? The best?
Is it woke? Is it cracked?
Is it affordable? Or attainable? To whom? And why?
What does it know about you? And how? Does it know everything? Or very little? Will it forget? How do you know?
Is its context yours? Can you take it with you? How? Does it want that? Will it allow you to?
Can it do things for you? Without your knowledge? Just in one place? Or everywhere?
Will others allow that? What’s in it for them? What can they do with what’s shared? Can Your AI control it?
Can you?
What others say
Some say Your AI is singular. It’s aligned to you because you pay it. It stays aligned because tech executives and alignment researchers say so.
Others say Your AI is your operating system. You can run the AI locally. Your computer needs to be turned on. If not it runs in the cloud? Or is the AI always in the cloud? You can trust your computer. What about the internet?
But really Your AI is your browser. The internet is broken. The browser can use all internet apps! Your AI knows how you like the internet. It can filter it for you.
No! Your AI is Their App. They have the best AI model! Well, they did. Have you tried Claude Opus? They will again!
Your AI apparently takes many forms:
- chat applications like ChatGPT or Claude
- answer services like Perplexity and SGE
- travel applications like Mindtrip or TripAdvisor Trips
- home search like Ask Redfin or Tomo
- retail apps like the Shop or Walmart
- companions like Character or Dot
- browsers like Arc and Chrome
- embodied companions like Figure and Looi
- operating systems like Windows Copilot or Google Duet
- and many more!
It’s not just one thing. It’s everything. Everything everywhere all at once!
What Crosshatch says Your AI is
Who cares.
It’s not up to us.
We’re tired of being preached to.
Your AI is up to You.
We believe that Your AI should be whatever and wherever you want it. Whether it’s an application, a service, a telephone number, a wire, a swing set, a boombox or an operating system. We’re not here to tell you what Your AI is.
That’s for you to decide.
It’s for you to decide how much, how little, how big, how small, how expressive, how sparse, how connected, how covert.
That’s the only thing we believe.
AI is whatever you say it is.
Your AI is incomplete
Your AI has three traits:
- it commands attention
- it can use helpful tools
- it knows who you are
Your AI, whatever you choose, commands your attention if you decide to give it.
It can probably use some tools to help you. It can help you find groceries, send a message, meet up with friends or plan your next great adventure.
It doesn’t really know you though.
It knows what you’ve done together! What you’ve seen, said or heard. What you’ve clicked on. What you’ve browsed. What you’ve purchased. Where you’ve been. What you’ve loved. Where you’re going.
But only if you’ve done those things … together.
Your AI is everywhere. It’s everything.
You do lots of things! Everywhere. All the time.
If you don’t do everything together, how will Your AI know who you are?
Do we want to do everything together?
Making the internet more you
Can an AI that doesn’t know who you are –– doesn’t have all your context –– be that helpful?
We don’t think so.
That’s not the future. That’s not how AI worked in the movies.
We’ve seen it firsthand – first party data is not enough for the future of AI.
We don’t do everything with one corporation. Or a tab in our browser.
We’re expansive and everywhere. Sometimes alone. Sometimes with friends. Sometimes with a business. Each with their own vibe.
If you want Your AI –– worth your attention, capable and knowledgeable –– to be everywhere you are, new consumer infrastructure needs to rise.
Crosshatch is building this infrastructure.
We believe Your AI is wherever you want it. Deployed on whatever resources you want. Available in whatever interfaces you want.
So you tell us.
What is Your AI?
## Data gravity
AI moves to where the data live. Consumers have the most data. You are the sun.
4/6/2024
Last week Brad Gerstner and Bill Gurley discussed language model switching costs. What they and others have observed is that while many enterprises start prototyping with private language models, many are switching to open-source models run in their own systems for security and control.
While switching costs do not generally furnish user surplus, they are important ingredients to shareholder value. Bill Gurley rated LLMs as having a 2 out of 100 score for switching costs.
Brad later says that switching costs will not live with the LM but rather “where your data resides.”
“It seems to me that AI is finding its way back to the data” as a sort of “data gravity.”
Data Gravity is an old and simple idea that, to our knowledge, was introduced in 2010 by technologist and author Dave McCrory.
Data Gravity. Dave McCrory. 2010
Consider Data as if it were a Planet or other object with sufficient mass. As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull.
Services and Applications can have their own Gravity, but Data is the most massive and dense, therefore it has the most gravity. Data if large enough can be virtually impossible to move.
Dave saw latency as a driving force of this gravitational pull, but it increasingly appears that security and control are similarly forceful.
Today, enterprises are choosing LMs that live close to where the largest or most important sets of data live. For internal enterprise applications, that seems to be the cloud or bare metal where organizations host their data.
But for consumer use-cases, things become less clear.
In consumer, organizations use first-party customer data to try to anticipate customer needs and serve them more relevant content on their properties or others’.
But as we’ve written before, first party Walled Garden data is unlikely to be enough for AI. Even for organizations with lots of first party data, organizations only know what customers have done with them. Organizations only know what we do together.
Users have busy lives outside of what they do with any given organization.
Could the user actually have the data gravity?
Center of the universe
Traditionally, organizations have built personalization systems by centralizing user data and training machine learning models over all their users’ data. Think Netflix’s famous collaborative filtering model. They built it by combining user trait information with past movie ratings and used it to estimate what movies you might like based on what people like you liked. This helped organizations like Netflix get a better understanding of their users, and sometimes outcompete organizations with weaker user understanding.
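To make the pattern concrete, here’s a minimal sketch of user-based collaborative filtering – purely illustrative, with a made-up ratings matrix rather than anything resembling Netflix’s production system:

```python
# Toy sketch of user-based collaborative filtering (illustrative only;
# the ratings matrix and values here are made up).
import numpy as np

# Rows = users, columns = movies; 0 means "not yet rated".
ratings = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 1, 0],   # user 1 (similar tastes to user 0)
    [1, 0, 5, 4],   # user 2 (different tastes)
], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict(user_idx, movie_idx):
    # Weight other users' ratings of this movie by their similarity to the target user.
    sims, weighted = [], []
    for other in range(len(ratings)):
        if other == user_idx or ratings[other, movie_idx] == 0:
            continue
        s = cosine_sim(ratings[user_idx], ratings[other])
        sims.append(s)
        weighted.append(s * ratings[other, movie_idx])
    return sum(weighted) / (sum(sims) + 1e-9)

# Estimate how user 0 would rate movie 2, based on "people like them."
print(round(predict(0, 2), 2))
```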
This pattern powered a rise of switching costs that investors loved. This is because consumers became accustomed to organizations that could serve them better (e.g., TikTok knows what we like) and became resistant to switching to organizations with weaker consumer understanding.
But if a consumer could pull all her data together – all her data across contexts – then she’d have more data on herself than even any large organization has.
AI would gravitate to her.
And with AI situated next to her data, she could reverse the client server model of the internet.
Reverse client-server
In the traditional client-server model, the server ("APP") hosts the application logic, data and resources, processing requests from clients and sending back results. Clients ("User") interact with the app to access these resources.
Left: Traditional client-server model. Right: Reverse client-server "Headless Personalization".
The inverted client-server model reverses these roles. The user (client) controls their data and flexible application logic, today in the form of AI. When the user submits a request to an application, the application sends a request with instructions to the user's AI resource. The AI resource processes the user's data according to the request and user-defined controls, and returns the personalized output to the application. The application then combines this output with its own data or logic to provide the service, without directly accessing the user's underlying data.
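A rough sketch of that flow, in code. Everything here – the class names, the purpose strings, the payload shapes – is a hypothetical illustration of the pattern, not a documented Crosshatch API:

```python
# Hypothetical sketch of the reverse client-server ("headless personalization") flow.
from dataclasses import dataclass, field

@dataclass
class UserAIResource:
    """The user-controlled side: data plus AI logic plus sharing rules."""
    timeline: list = field(default_factory=list)        # the user's own event data
    allowed_purposes: set = field(default_factory=set)  # user-defined controls

    def handle_request(self, purpose: str, instructions: str) -> str:
        if purpose not in self.allowed_purposes:
            return "declined"  # user policy blocks this use
        # In practice an LLM would run `instructions` over the user's data here;
        # we return a stub summary so the app never sees raw events.
        return f"personalized output derived from {len(self.timeline)} events"

@dataclass
class Application:
    """The app: sends instructions to the user's AI, never touches raw data."""
    def render(self, user_ai: UserAIResource) -> str:
        output = user_ai.handle_request(
            purpose="recommendations",
            instructions="Rank my catalog for this user",
        )
        # Combine the returned output with the app's own inventory and logic.
        return f"page built with: {output}"

user_ai = UserAIResource(
    timeline=[{"action": "viewed", "object": "trail shoes"}],
    allowed_purposes={"recommendations"},
)
print(Application().render(user_ai))
```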
You are the sun
This access pattern follows the data gravity: as Brad Gerstner observed, AI moves to where the data reside. With Crosshatch, users have the means to bring all their data with them to any application. While organizations have some first party data, users can bring much more. The reverse client-server model is simply a reflection of that gravity.
This shift in data gravity, and the rise of AI as a competent application logic engine, makes the reverse client-server model (as described by the All-In Pod last year) possible. Data gravity is why we’re so excited about Cross-Pollinating Walled Gardens, where third parties will be able to anticipate our needs based on everything we do and share, not just what we’ve done together with one company. Headless Personalization is the architecture for activating this data gravity, where consumers wield the data and a capable data execution engine – AI. It also has nice economic benefits: it delivers rich personalization (essentially "Jarvis") at a lower cost and in a way consumers trust.
If you’re interested in building new experiences that anticipate customer needs, give us a ring – we or one of our trusted partners would love to help! Reach out!
The personalization infrastructure phase
In 1994 shopping sites couldn't remember users across sessions. Today AI can't remember us across surfaces. A study of app-infrastructure cycles in personalization.
Scroll to read
4/17/2024
At Crosshatch we’re fascinated by tech cycles and how and when they create opportunity.
On repeat for us has been Dani Grant and Nick Grossman’s “Myth of the Infrastructure Phase,” a historical study of how “apps” like the light bulb or the airplane inspire infrastructure like the electrical grid or the airport –– not the other way around.
A big part of why this is even a topic of conversation is that everyone now knows that "platforms" are often the largest value opportunities (true for Facebook, Amazon/AWS, Twilio, etc.) — so there is naturally a rush to build a major platform that captures value.
This AI cycle is no different – many are endeavoring (we are!) to find a seat delivering infrastructure for this new age.
But a teased infrastructure phase can be a trap: breakout apps come first, and infrastructure earns its right to rise only after they inspire it. And the evolving capabilities of AI make it harder to judge where in the cycle we are.
For instance, the limited context and reasoning of 2022-era models constrained AI memory and logic, which seemed to imply a righteous rise of infrastructure. But the models themselves have since improved, with over 100x the memory and reportedly near-human performance on some benchmarks. Problems you might solve for customers with infrastructure today could be resolved by better models tomorrow.
We saw this in February when Klarna reported resolving nearly two thirds of customer queries with its own application of OpenAI model calls. Despite the many startups shipping customer service SaaS to do this on Klarna’s behalf, Klarna built it itself.
What we see in the sequence of events of major platform shifts is that first there is a breakout app, and then that breakout app inspires a phase where we build infrastructure that makes it easier to build similar apps, and infrastructure that allows the broad consumer adoption of those apps.
USV: "Apps and infrastructure evolve in responsive cycles, not distinct, separate phases."
So now, for our part, Crosshatch is building a new identity layer – personalization infrastructure for transferring state information across applications – to make AI that remembers us across distinct interactions available on any surface, with any model, in a way end-users control.
And so we wonder: Are we falling prey to this natural rush to build infrastructure before a breakout app unlocks the right to build supporting infrastructure? Or does the history of personalization and networking somehow resist this pattern – maybe we’re bound by some other framework?
In this blog, we attempt to map waves of apps => infra => apps => infra … to the history of personalization and networking.
What we find is that the history of personalization and networking also follows these waves. And if we’re reading everything right, personalization is just about to enter an infrastructure phase.
A history of personalization and networking
We set out to study the history of personalization. We include networking in our study because personalization is an act of aligning objects to an individual’s tastes or requirements, and since the candidate objects are administered (and sometimes owned or rightfully governed) by third party entities, personalization on the internet requires some networked communication.
We start early with the industrial expansion of the railways. “Apps” Montgomery Ward (1872) and Sears (1886) shipped catalogs of items with hundreds of pages. Most consumers had only shopped at local general stores within traveling distance. A catalog with thousands of items offered a first glimpse into personalization, expanding consumer choice to find items among many that were most suitable to them.
This led to the rise of logistical and distribution infrastructure. To meet the demand generated by these catalogs and ensure the delivery of goods, robust logistical systems were developed like UPS (1907) and USPS’ Parcel Post service (1913). Market research infrastructure like Nielsen (1923) arose to help businesses better understand their customers.
But as the distribution of goods grew, brand “apps” like Procter & Gamble, Unilever (1929) and Colgate Palmolive (1928) looked to differentiate themselves and their individual product lines. P&G created a brand management system (1931), where each brand manager controlled all aspects of their brand – like running a small business. Each brand manager studied their product’s audience, and aligned their brand’s positioning with the studied needs of the customer. These apps pursued new messaging in the form of sponsored radio and soap operas. Apps began to talk to their audiences directly instead of being intermediated by a catalog.
Then there was the wave of broadcast infrastructure with NBC (1926) and the FCC (1933) that set the stage for brand-driven broadcast personalization. Brands like Coca Cola and P&G refined their brand identities and product offerings on broadcast radio (1927) and later TV for deep connection rather than broad reach. These apps personalized through messaging that aimed to set the zeitgeist, transforming simple products into symbols of lifestyles and aspirations. Infrastructure like the RCA color TV (1954) enabled Mad Men to help apps like Marlboro espouse aspirational lifestyles (“Marlboro Man,” 1954), leading customers to aspire to or relate to brand personas. As demand for personalized messaging rose, infrastructure like Zip Codes (1963) and OCR deployed at the USPS (1965) allowed post offices to read and sort up to 42,000 addresses per hour.
Infrastructure like Epsilon (1969) and Acxiom (1969) enabled brands to further scale their mailings, enabling mailings personalized to each user.
With the internet came apps like messaging (1970) and email (1972) that sparked an infrastructure wave including Ethernet (1973), TCP/IP (1973), ISPs (1974) and Oracle’s Version 2 (1979). These advancements paved the way for the emergence of web portals like Prodigy (1990) and AOL (1991), which in turn fueled the development of search engines and web browsers in the early 1990s.
Then there is the next wave of apps, which are services like American Express Membership Rewards (1991) and sites like Ebay (1995) and Amazon (1996).
There was a problem though.
Every visit to a site was like the first, with no automatic way to record that a visitor had dropped by before. Any commercial transaction would have to be handled from start to finish in one visit, and visitors would have to work their way through the same clicks again and again. It was like visiting a store where the shopkeeper had amnesia.
The NYT recalled in 2001.
This led to new infrastructure: the cookie (1994) was born, with a patent filed in 1995 and deployment in Internet Explorer in 1995.
Lou Montulli: "A method and apparatus for transferring state information between a server computer system and a client computer system"
This small piece of data stored on users' computers became the critical tool for enabling websites to remember returning visitors, their preferences, and the contents of their shopping carts, from their site and even others.
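The mechanism is small enough to show directly. Using Python’s standard http.cookies module (the identifier value below is made up):

```python
# Minimal illustration of the state transfer the cookie patent describes.
# On the first visit the server sets an identifier; the browser echoes it
# back on every later request, so the "shopkeeper" no longer has amnesia.
from http.cookies import SimpleCookie

# Server response to the first visit:
set_cookie = SimpleCookie()
set_cookie["session_id"] = "abc123"           # illustrative value
set_cookie["session_id"]["path"] = "/"
print(set_cookie.output())                    # Set-Cookie: session_id=abc123; Path=/

# Browser request on the next visit:
returned = SimpleCookie("session_id=abc123")  # parsed from the Cookie header
print(returned["session_id"].value)           # server recognizes the returning visitor
```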
Epsilon Agility marketing documentation (left) and Acxiom's identity marketing (right), both in 2012, from the Internet Archive.
Then came the era of enhanced e-commerce and personalized digital experiences. The cookie enabled sites like Amazon to recommend products based on previous visits, and eBay to enhance user experiences by remembering login details and viewing preferences.
Web analytics infrastructure company Omniture (1995) rose to help websites better understand their audiences and traffic. Next came apps like Expedia (1996), Realtor.com (1995), Zappos (1999), Hotwire (2000), TripAdvisor (2000), Trulia (2005) and Zillow (2006).
Then there was the wave of infrastructure like AWS (2006), the iPhone (2007) and Shopify (2006) and the cycle continued into apps like Amazon and eBay for iOS (2008), Spotify (2008) Instagram (2010) and Pinterest (2010) and then infrastructure Segment (2011), BigQuery (2010), Epsilon Agility (identity) (2012), Acxiom Identity Solutions (2012?), iBotta (2011), Fetch Rewards (2013), Databricks (2013) and TensorFlow (2015).
This led to AI applications like ChatGPT, Pi, and Claude with millions of users. And privacy-focused browser updates like Safari’s Private Relay (2021) and Chrome’s Privacy Sandbox (2023).
The history of personalization follows other tech cycles, with an interplay of apps to infra to apps to infra and so on. AI has hastened the pace of innovation, with multiple huge leaps happening in a single year.
The next infrastructure phase in personalization
Like the applications before them – Montgomery Ward to Amex to eBay to Pinterest to Zillow – breakout apps inspire an infrastructure phase that democratizes and scales the innovations and experiences that enabled them.
AI applications like ChatGPT (2022) and Pi (2023) gave us a taste of what a Personalized AI could be. Now every board is asking leaders how they can include AI in their business. Internal enterprise applications are under way but consumer applications have lagged.
We were kids back then, but consumer AI today feels to us like the internet must have felt in 1994. AI across domains lacks state. Each new AI we engage with – whether on an app, a swing-set, a clip-on, or a browser – doesn’t know what sessions we just had with the last AI, each governed by someone else. Our sessions with one are never known by the next. Like the shopping sites of 1994, each time we land on a new AI-powered surface we have to work our way through the same context again and again. Introduce ourselves again and again. Everything must happen in a single service.
The breakout app in 2022 was ChatGPT.
Netscape shipped the cookie.
Crosshatch is shipping the wallet.
Information games
Portable context helps us relate to the world. But it can also relate the world to us. How selective sharing (and un-sharing) can drive consumer value.
Scroll to read
4/20/2024
Everyone wants to ship hyper-personalized experiences today.
Glenn Fogel, CEO of Booking Holdings, shared with Barron’s in January
You want something personalized so much so that it's just like it was in the old days when it was a human being travel agent that knew you so well that knew what you liked and what you could afford.
Walmart’s CEO Doug McMillon said the same in August last year
There’s a great opportunity for us to be more anticipatory, and to be more relevant to [customers] and communicate in a way that shows that we know who they are, in a healthy way, while protecting privacy.
Hyper-personalization – the anticipation of customer needs given complete user context – feels inevitable.
We really want a hyper personalized experience that understands our needs one step ahead of our own brains.
said Micah Rosenbloom, a seed stage investor at Founder Collective.
Hyper-personalization’s benefits to businesses appear clear – especially among businesses with large inventories, personalized recommendations of relevant products shift up consumer demand and increase revenue.
But hyper-personalization – its operation on vast consumer context – could also come with a cost.
You like that car, huh?
On a path to hyper-personalization, consumers are poised to have a choice. Turn on hyper-personalization in an app – bring some or all of your context with you to unlock hyper-personalization – or not.
The benefit of turning it on is that the internet becomes more like you. Everything you do in your life can be reflected back by any app, service, piece of hardware, or ecommerce store you want. No need to reintroduce yourself to each domain – just turn on hyper-personalization with a tap.
But what about hyper-personalized prices?
For instance, suppose you’re at a car dealership looking for a new car. The salesperson asks to learn a bit about you
What sort of car are you in the market for
Any particular reason you’re looking for a new car (new job? promotion?)
Fast or fuel efficient?
Are you a fan of the brand?
Even what you’re wearing could give the salesperson an impression of what sort of customer you are.
Based on all of this information, the salesperson can give you a recommendation. You benefit from this because you might not have known that much about the available options, and your additional context is helpful for the salesperson to recommend the car that best fits your needs.
On the other hand, when it comes to buying the car, all this shared information could be used against you. For instance, the salesperson knowing that you love a particular car may make the salesperson less willing to compromise on price. They are personalizing their price according to their estimation of your willingness to pay.
Many car dealers have committed to resolving this by setting no-haggle policies wherein all prices are fixed up-front.
On the other hand, e-commerce uses dynamic pricing broadly, using demand signals and other inputs to adjust prices throughout the day.
This is related to but empirically distinct from microeconomics’ price discrimination. Here’s how a Boston Consulting Group brochure described it in 1986:
A key step is to avoid average pricing. Pricing to specific customer groups should reflect the true competitive value of what is being provided. When this is achieved, no money is left on the table unnecessarily on the one hand, while no opportunities are opened for competitors through inadvertent overpricing on the other. Pricing is an accurate and confident action that takes full advantage of the combination of customers' price sensitivity and alternative suppliers they have or could have.
With hyper-personalization, first-degree price discrimination – where firms charge each of us exactly our willingness to pay – seems entirely possible.
Under hyper-personalization, firms know our past actions and preferences – they could theoretically figure out exactly how much we value everything and charge us exactly that.
Though personalized pricing could lead to greater efficiency like making products available to more people, it also appears to lead to less consumer surplus, taken as the difference between our willingness to pay and the prices we actually pay.
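In symbols (our notation, not the post’s original): write v for a consumer’s willingness to pay and p for the price actually paid.

```latex
% Consumer surplus for buyer i: willingness to pay minus price paid.
CS_i = v_i - p_i
% First-degree price discrimination sets p_i = v_i, driving every buyer's surplus to zero.
```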
Ideally consumers could get personalization without affecting consumer surplus.
I know you know I know
It seems though that we shouldn’t worry too much about this.
This is for two reasons.
First, even though many businesses already have access to rich data about us, empirically they don’t seem to use it to personalize prices as much as they could.
For instance, Uber adjusts ride prices based on where you are and how many drivers are around. Of course, Uber knows other personal signals that it could activate to further adjust its prices. Suppose, say, it's cold and rainy and your phone is at 3 percent battery. If you're looking for a ride, that would be a pretty rich signal that your inclination to pay a higher price is probably quite a bit higher than usual. That said, Uber reports not using signals like battery life for its dynamic pricing.
An older 2017 study of 2000 e-commerce sites similarly found that personal-data-induced price discrimination is rare.
Second, it turns out the possibility of personalized pricing under selective context sharing in hyper-personalization might make consumers better off.
Microeconomics professor Shota Ichihashi studied a stylized version of this in the prestigious American Economic Review.
In deciding to share context, consumers face a trade off: share more context in exchange for more relevant experiences but at the possible cost of price discrimination. In this framework, a seller can
commit to not personalizing prices to induce more sharing, setting prices in advance
not commit and leave a consumer to decide how much to share given the possibility of personalized prices, setting prices once context is observed
Timing of moves: whether a seller commits or not to personalize prices.
No surprise, the model reveals the seller is better off committing to not personalizing prices. The intuition is what you’d expect. When a seller commits to not personalizing prices, consumers tend to share much more context, which sellers can use to provide more relevant experiences that shift up customer spending.
Interestingly, however, this seemingly consumer-friendly policy – committing to not personalize pricing – may actually make the consumer worse off.
Why?
Ichihashi explains
To see this, imagine that a seller commits to not personalize prices, and all consumers except you provide their data. The seller can use data to learn consumers’ tastes, and show each consumer the products he or she likes. Because the seller can present each product to consumers who value it highly, the seller can set a relatively high price for each product. Since prices are not personalised, you will also have to pay these high prices, regardless of whether you provide data.
That is, other customers’ context sharing has the negative externality of higher prices, that you, someone who decided not to share, has to pay. In the car dealership case, although a no-haggle policy allows customers to be fully open about their needs, it could also lead to higer prices for someone more privacy sensitive.
Will AI protect us from price personalization?
Ichihashi focuses on the case where a buyer discloses information to a seller in exchange for more relevant recommendations and experiences. It’s understood that the seller has richer information about available inventory than the buyer, and that’s why a buyer must disclose information to the seller.
Many are excited about the possibility of user-owned AI agents protecting us from these dynamics.
We’re not so sure.
Suppose you’re thinking of buying products from a seller, and you’ve dispatched your AI agent to learn their inventory to recommend a product to you. In this case, the seller cannot personalize prices because it never has an opportunity to – your AI agent scraped all the seller's inventory and your agent does its analysis in an environment you own.
But if the seller can infer that your AI agent is accessing its inventory on your behalf, it can revert to its strategy of setting relatively high prices for its products, as it knows that its products are distributed – via buyer-owned AI – to those who value its products highly, even though it’s not the one doing the personalization.
So while this scraping pattern keeps your data private (though possibly violates seller Terms of Service), it may not protect against price personalization.
Turning on hyper-personalization
In turning on a hyper-personalized internet, we’ve worried about implications of discrimination. If a third party now knows something about us, what can they do with that information, even if it’s shared under an explicit agreed-upon context?
We're excited to see more study here. For instance, extending Ichihashi's framework to competitive markets with differentiated products and repeated seller-buyer interactions could give new insight into incentives to personalize prices as a function of how seller reputation affects buying decisions.
Aside from strategic implications, it seems like many businesses do not use personalized prices because it just doesn’t feel right. Turning on hyper-personalization should create great internet experiences while leaving consumers feeling like they’re getting mutual value.
But we’re excited to see theory showing how, per Ichihashi
“it could be optimal for an individual consumer to [turn on hyper-personalization], as long as sellers do not personalise prices.”
Privacy and collaboration
Open collaboration is one of America's greatest strengths. But its realization on the internet has felt constraining and icky. The pathway of AI to a more open and collaborative internet.
Scroll to read
4/28/2024
The internet has a control problem.
Our data, powering personalized experiences and ads on the open internet, flows to anyone who can pay for it or get invited by an exclusive co-op to share in it.
We’ve slouched toward accepting this is how the world works. It feels icky. Some, like Privacy professor Daniel J Solove think it's hopeless: "managing one’s privacy is a vast, complex, and never-ending project that does not scale." An open internet trading limited personal data for usage in contexts we’ve never discussed.
Early methods to try to control it have felt frustrating and left us with cookie fatigue. It seems the natural response then is to call for more privacy by law or self-preferencing corporate policy.
We believe this is a mistake.
The trouble is that privacy lacks consistent definition. Ask different people what it is and you'll get different answers. Privacy is, however, a specific expression of control. The real issue isn't what's known about us for what and by whom, per se.
The issue is who gets to decide.
This blog explores how privacy, control, and open collaboration intersect in the digital age, and how language models could provide a path forward for enabling controllable collaboration while respecting privacy preferences.
Open collaboration
Control is hard if you want open collaboration.
And we want open collaboration. It has a rich history in America. It's one of our greatest traits. Alexis de Tocqueville admired America's collaborative spirit in Democracy in America.
An association unites into one channel the efforts of divergent minds and urges them vigorously towards the one end which it clearly points out.
The most natural privilege of man, next to the right of acting for himself, is that of combining his exertions with those of his fellow creatures and of acting in common with them. The right of association therefore appears to me almost as inalienable in its nature as the right of personal liberty.
Toward adopting a notion of collaboration in a technical setting, we look to behavioral science that takes collaboration as
a process through which parties who see different aspects of a problem can constructively explore their differences and search for solutions that go beyond their own limited vision of what is possible.
This definition maps nicely into the mechanics of digital collaboration. We pose our own definition, saying
Digital Collaboration happens between two or more Data Controller agents who
see and share different information sets, instructions, processes or algorithms
all toward solutions superior to those otherwise resolved independently.
We introduce GDPR Data Controller language because collaboration involves the sharing of data and context amongst different people or entities. Each of these entities has the ability to take this information and do what they like with it. The legal definition of Data Controller captures and formalizes this capability.
Open digital collaboration, however, is one of the hardest and most important problems on the internet today. How do we create a way to make the internet richer and more abundant through collaboration while still enabling effortless agency and control among consumers and businesses who wish to collaborate?
We saw the depth of this challenge last week when Google delayed its cookie deprecation for a third time.
Third party cookies and related fingerprinting technologies are how the open internet does frictionless targeted ads and measurement, but it comes at the cost of consumer control. In this case, advertisers, publishers, and data management platforms (and indirectly consumers!) are data controllers who each have a different view of the consumer, context, and goal. They Collaborate toward more relevant and engaging ads and experiences.
Google’s Privacy Sandbox initiative increased privacy but came at the apparent cost of open collaboration.
Our sense, however is that it’s not even about privacy.
Privacy is about control
Privacy is a nebulous concept.
Many in law, philosophy and culture have attempted and failed to find a satisfying conception of privacy. It’s been over 50 years since Arthur Miller complained that privacy is “difficult to define because it is exasperatingly vague and evanescent,” and even today’s privacy law experts concede there’s no unified conception of privacy.
America’s first articulation of a right to privacy –– Warren and Brandeis’ 1890 piece in the Harvard Law Review –– still reads true though a little dramatic today
Gossip is no longer the resource of the idle and of the vicious, but has become a trade, which is pursued with industry as well as effrontery.
What would Warren and Brandeis think of the $10+ billion data broker industry or the $300+ billion ads business trading on estimations of who you probably are or what you probably like?
Nearly 60 years later Harry Kalven Jr.’s critique of the complexity and confusion of privacy as a concept still has teeth
Conceptually I am not sure I see what the invasion of privacy consists of. Is it an invasion of privacy to say falsely of a man that he is a thief? What is his grievance? Is it that … it would have been an invasion if true and no one had known it? Or is it that it is false? Or is it that his name has been used in a public utterance without his consent?
Even contemporary privacy offerings are hard to understand. Technologists clamor for "local-only AI" even as favorite privacy-first devices like the iPhone deliver content availability through the cloud. Companies advertise privacy protections like differential privacy but reportedly choose privacy parameters not in keeping with those considered acceptable by research communities. Either way it's unclear if consumers would even understand what that means. Despite a century of writing on privacy, no one seems to know exactly what it means nor what it entails when it is promised.
Our view is that privacy is an instance of control. You can have privacy but not control (Privacy Sandbox) and control but not privacy (deciding to attend Santa Con or gay pride).
As we build new technologies enhancing consumer agency or privacy, it’s critical we know what specifically the tech is responsible for.
Economics’ view of privacy is more tractable.
Tradeoffs of sharing
Economists have been interested in privacy since the 1960s, specifically in the study of the trade-offs arising from protecting or sharing personal data. We wrote about one of those tradeoffs last week.
Analyzed as economic goods, privacy can be revealed to have interesting and sometimes unusual properties e.g.,
Privacy is a "public good", namely someone else not protecting their privacy could end up affecting your privacy!
Privacy has a confusing mix of tangible properties (like getting coupons from sharing data) and intangible properties (“that felt icky”)
Privacy lacks a clear way to value it: is it worth
the minimum amount of money you'd accept to give it up
how much you might pay to protect it
the personal cost of it being exposed to the public
the marginal expected profit someone else gets from acquiring it
how much positive or negative publicity you get from sharing
As we think through privacy at Crosshatch, and the mechanisms we might need to secure it (if we decide to characterize it as an action of protecting privacy at all!) we find ourselves preferring the economic framing of the concept for its simplicity and tractability. This framing also clearly motivates a user-centric view of privacy.
Privacy is given by tradeoffs defined by users.
This centers privacy really as a question of agency – of sharing (and un-sharing).
But what does privacy have to do with collaboration or AI?
Controllable collaboration
The core question for the age of AI and abundance where
Data Controller collaborators of different known contexts can come together to exchange information for the production of greater value than they could alone
is: what are the necessary conditions to Collaborate?
On the other hand, necessary conditions could be too strict. What's necessary could depend on consumer preferences. Instead we may wish to pose this in an economics setting. What information is inexpensive enough to be revealed in a given collaboration setting?
For instance, in real life we readily share more than what's strictly necessary for a collaboration. Our neighborhood florist knows our partner's birthday. The coffee shop knows we like cold brew in the morning but cortados as a treat in the late afternoon. In real life we don't constrain engagements to the minimum necessary information but informally bound it according to the context of the engagement.
So the open question is what data beyond that which is necessary is sufficiently low cost to share for a given collaboration paradigm?
Language models offer a unique path to begin to explore this.
AI as a trusted facilitator
Before language models, the technical requirements necessary to digitally collaborate were much greater and more fragmented. Business collaborators needed to collect troves of consumer data, combining it in their own compute environments to create models and systems they used to compete to serve us.
Language models, on the other hand, appear as a route to resolving this fragmentation. They have a more forgiving interface than any prior machine learning system and can execute programs of natural language. Their broad democratization – available to anyone, not just organizations who've collected a bunch of our data – weakens the power dynamics that motivated massive data collection in the first place. Data gravity is shifting to the consumer anyway.
It seems then that in the age of AI open collaboration under Collaborator control could simply mean
the well-structured and secure combination of collaborator instructions and private information “context” for a given AI resource with rules for what Collaborators can see at the conclusion of a given Collaboration.
To collaborate with language models as a trusted intermediary, all collaborators need to do is define what context could be available for what purpose.
With the Crosshatch model, apps show consumers the potential benefits of allowing some data collaboration. Consumers can then decide exactly what information they want to share and for what purpose (or defer to an authorized agent). Crosshatch acts as an intermediary, facilitating these collaborations through secure language models while enforcing the consumer's chosen sharing preferences.
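To make that concrete, here is what a user-defined collaboration policy might look like. Every field name and value below is our own illustration, not Crosshatch’s actual configuration format:

```python
# Hypothetical sketch of a user-defined collaboration policy.
collaboration = {
    "collaborators": ["user:alice", "app:trailfinder"],   # the Data Controllers involved
    "purpose": "recommend weekend hikes near me",
    "context_allowed": {
        "sources": ["strava", "apple_health"],            # which connected data may be read
        "lookback_days": 90,                              # how far back the AI may look
    },
    "instructions": "Rank the app's hike inventory against my recent activity.",
    "visibility": {
        # What each side may see once the collaboration concludes.
        "app:trailfinder": ["ranked_inventory"],          # derived output only
        "user:alice": ["ranked_inventory", "reasoning"],  # plus an explanation
    },
}

def can_see(policy: dict, party: str, artifact: str) -> bool:
    """Enforce the post-collaboration visibility rules."""
    return artifact in policy["visibility"].get(party, [])

assert can_see(collaboration, "app:trailfinder", "ranked_inventory")
assert not can_see(collaboration, "app:trailfinder", "raw_events")
```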
We’re particularly excited about this path because it’s not even a new construct.
This collaborative model powered by language models is reminiscent of the "clean room" approach used in ad-tech for privacy-preserving data sharing. However, our proposed framework extends this concept further by creating a consumer interface and making configuring controlled collaboration effortless. We then focus on
How easily can agents enter into Collaborations?
Do collaborators need to change how they work to Collaborate? What about to further minimize data shared as a part of a Collaboration?
The future of open collaboration on the internet will be defined by the rigor of our framing of collaboration –– what actually is it and who really are the Collaborators who have a rightful seat at the table –– and the effortless control Collaborators have to enable rich collaborations.
The AI-enabled collaboration model is a promising path forward. Taking secured AI as trusted compute environments for the execution of collaborator-aligned instruction-based collaborations is a simple way to begin to enable flexible open collaboration under collaborator control.
Taking privacy as an economic object shows privacy not as some exasperating vague object but counter-intuitively, one that enables more sharing and collaboration the more inexpensively controllable it is.
And with language models, a simpler 18th century posing of privacy
It is certain every man has a right to keep his own sentiments, if he pleases. He has certainly a right to judge whether he will make them public, or commit them only to the sight of his friends.
has a real chance of coming to fruition. And, if we can be forgiven for being so excited by Tocqueville, it can help grow our distinctive American richness
The most democratic country on the face of the earth is that in which men have, in our times, carried to the highest perfection the art of pursuing in common the object of their common desires and have applied this new science to the greatest number of purposes.
Go direct
First party data is mostly "meh" and third party data is deteriorating. The future of personalization? Let customers introduce themselves in a tap.
Scroll to read
5/5/2024
For too long, brands have yielded control over personalized customer experiences to happenstance or misaligned or unconsented middlemen.
First party data strategy relies on whatever data a business happens to observe. Third party data comes from honeypots of data indirectly or unknowingly collected.
This year it’s all coming to a head.
This year Google had scheduled the deprecation of third party cookies in its Chrome browser. Chrome is the most popular internet browser and delivers the internet to nearly 3.5bn people, or 65% of the market.
While last month they delayed this action, the writing is on the wall: Browsers are moving against cross-site tracking and identifiers.
This includes both third party cookies and IP addresses. This challenges recent band-aid approaches like ID-bridging, that use coincidental observations of IP address and other identifiers like email to infer identity when only one is observed.
The old playbook of relying on first party data or third parties to supply “privacy-preserving” customer data through broken identifiers is obsolete.
Implicit identity is dead.
First party data is underwhelming
Five years ago, first party data strategies to
identify customers
track events (“account created” or “product viewed”) and
route events to activation destinations like Klaviyo, BigQuery, or Snowflake
were all the rage. Even before cookie deprecation, first-party data was the pathway to developing experiences that serve each customer individually.
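For reference, the pattern looks roughly like this in code – the property names and destinations here are assumptions for illustration, not any particular vendor’s SDK:

```python
# Illustrative sketch of the first party event pattern described above.
def track(user_id: str, event: str, properties: dict) -> dict:
    """Record an action-object event against a known customer identity."""
    payload = {"user_id": user_id, "event": event, "properties": properties}
    route_to(payload, destinations=["warehouse", "email_tool"])  # e.g. Snowflake, Klaviyo
    return payload

def route_to(payload: dict, destinations: list) -> None:
    for destination in destinations:
        print(f"-> {destination}: {payload['event']}")

track(
    user_id="cust_42",
    event="product_viewed",
    properties={"sku": "shoe-123", "category": "running", "price": 129.00},
)
```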
But now, AdExchanger reports some are wondering if the first party data revolution is actually a dud. One Fortune 100 CPG executive now thinks the first party data strategy is – "meh."
A combination of unreliable identifiers and ballooning costs of CDP licenses and supporting data science and marketing headcount contribute to making using first-party data for granular personalization tough to pencil out.
But not for everyone.
First party data among scaled players like Walmart, Instacart and DoorDash has proven to be incredibly valuable. These retailers have scaled their visibility into user behavior – in particular, action events like purchases – to create lucrative ad networks.
It seems first party data, like everything else on the internet, experiences a barbell effect: Those with the most attention and transaction data are best positioned to activate it into efficient personalization and advertising.
Of course, as we’ve noted previously, even these scaled players have a context that’s limited to the universe of activities that their customers happen to do with them. DoorDash knows your burrito order, Walmart knows the staples you sometimes buy, and Lowes knows about some of the home improvement projects you work on.
No one today has the full picture of a consumer – only pieces.
Third party aggregators are unlikely to help.
Third party data is misaligned and degrading
Data aggregation is big business. Last month rewards company iBotta raised nearly $600mm in its IPO, valuing the company at over $3bn. Fetch Rewards was last valued at $2.5bn. Publicis Groupe bought Epsilon for $4.4bn in 2019.
Data aggregators come in two flavors: those
with a relationship with a customer (“rewards”) and
those without (“data brokers”)
Companies like iBotta and Fetch are in the former category. Each has an app that allows consumers to trade purchase data for personalized offers from partner brands, whether in their app or a partner retailer or publisher. Of course, the quality of aggregator data is only as good as the consistency of users in inputting data (e.g., snapping pictures of receipts).
The other class of data aggregators acquire data via partnership deals with companies that do have a customer relationship and flexible terms of service that most consumers do not read. Since their relationship with the customer is once-removed, they have an additional problem of reconciling customer identities across partners.
That is, if you have accounts with two different businesses under different email addresses, it’s harder for these aggregators to see that it’s you at both businesses. And with identity like cookies and IP address now harder to observe, this identity reconciliation has gotten harder and less reliable. In the last few years we’ve heard (and seen for ourselves) that the quality of data broker data is degrading.
In both cases, the data-value exchange is either high-friction, complicated or a little creepy. In the rewards case, apps either require users to take pictures of receipts for specific rewards (e.g., $1.50 off Axe Hair Care) or connect data for the promise of points, with variable conversion to rewards. The data connection is specifically for multi-brand rewards or payouts, not to strengthen a direct customer relationship. In the data broker case, often the customer isn’t even aware that this exchange is going on. Data broker data are usually de-identified or aggregated to navigate the awkwardness of sharing data that a consumer doesn't know about.
While iBotta’s core B2C rewards business is growing 19% year over year, its performance network, enabling brands to white-label iBotta rewards technology and offer rewards directly to their customers (even those without an iBotta account), grew over 700%, according to their S1 filing. This growth suggests that consumers are eager for direct, frictionless experiences with brands they love, rather than going through misaligned intermediaries.
Go direct (or go home)
In the early days of the consumer internet, brands had little choice but to rely on intermediaries and aggregators to deliver frictionless personalized experiences. The infrastructure for direct digital relationships simply didn't exist.
But now, we have the tools to build something far more powerful: an open identity layer that empowers consumers to connect directly with the brands they love, on their own terms.
This is the vision behind Crosshatch.
For us, going direct doesn’t mean more rigorous event instrumentation. It doesn’t mean longer customer Typeform onboardings. And it certainly doesn’t mean delaying personalized experiences until you’ve collected enough first party data to deliver something meaningful.
Going direct means showing and delivering clear value from connecting data.
Crosshatch provides the infrastructure for direct and frictionless value-driven relationships between brands and consumers.
With Crosshatch, brands can deliver hyper-personalized experiences by connecting directly with consumers and their data, without relying on unreliable third-party identifiers or opaque data brokers. For consumers, Crosshatch offers a simple, secure way to share data selectively in exchange for tangible value.
Shifting to this new paradigm won't be easy - it requires rethinking long-held assumptions and accepting some short-term friction. But the brands that embrace this shift will be rewarded with greater trust and loyalty from hyper-personalized experiences they ship that truly anticipate their customers' needs.
The alternative is to be left behind, relying on increasingly unreliable and inefficient intermediaries as the world moves on.
So go direct. Build experiences that respect consumers and reward them for sharing their data. Forge a new path in personalization, grounded in transparency and trust.
The future belongs to the brands that go direct - will you be among them?
[This post was inspired by Lulu Cheng's Go Direct manifesto.]
Shape of you
What could be an interoperable context layer for hyper-personalized AI? Any AI over a unified events representation of all interactions you've had with the world.
Scroll to read
5/11/2024
At Crosshatch we represent the changing shape of people depending on the data that’s shared. We do this to represent how each application we use, even combinations of them, only know a portion of us.
Outside of the digital world we’re whole.
Even combinations of the many apps we use only see a piece of us. They could know these pieces because that’s all we’ve decided to share. Or had time to share? Maybe these applications leave us to act a certain way.
Flirty? Serious? Organized! Distracted? Luxurious.
Contingent.
There is a wound that won’t heal at the center of the internet.
On the internet our identity is de-facto contingent. Contingent on the bounds of any application.
At Crosshatch we’re building an interoperable unified context layer for hyper-personalization anywhere on the internet. We wish for this layer to be both humanistic and of great ergonomics for developers and control of users (or their authorized agents).
Of course, creating a unified context representation is a tall order. In this blog we share our thinking in deriving our first one.
So commercial
A simple place to start is just how the industry treats this problem today. How do businesses today create representations of their customers for analysis and activation?
Most businesses track and represent customer behavior using industry standard action-object framework e.g.,
clicked product
viewed hotel
updated cart
and associate these events with an identity. [Increasingly, the identity that underwrites these events is going away.]
The framework is popular because it clearly describes the actions a user has taken and onto what objects.
Objects can be assigned additional properties to further characterize the nature of the product or hotel object that received the action, like category, price or description. These additional properties can serve as important metadata for easy retrieval or analysis that enable better understanding of a user’s journey within a business.
Of course, this representation is only what a given business sees of us.
Other businesses might collect other viewed or clicked or purchased or liked events, but against only the universe of objects they govern.
While this helps define user-application state locally, it doesn’t define the user’s global (cross-application) state, which could be given by the union of tracked events over all applications a consumer uses.
Suppose a consumer could (effortlessly!) assemble all of these events. Taken together could they represent the user?
Do you know who I am?
To try to go deeper here, we take a quick look at philosophy, namely theories of reference.
Given that we’ll use language as a medium for representation, it’d be quite convenient if we could assume that there is a general correspondence between language and reality. To properly represent someone all we need to do is choose the right words.
Bertrand Russell sought to deliver this link via logical atomism, or the supposition that “reality is composed of logical atoms which are not further analyzable.” If this were the case, to represent someone all we’d need to do is recall their unique description, composed of these logical atoms.
Unfortunately, Russell later soured on logical atomism.
It has a few problems. What exactly is a logical atom? If you think you’ve found one – something no longer further analyzable – how do you know? And for a given description, how do we know it’s necessary – something describing a person’s true essence – and not just possible?
We’ve spent the last decade shifting culture to honor our respective pronouns – to treat each other the way we want to be treated. The last thing we want is to build on a representation that’s merely possible.
We want a representation that’s necessary – uniformly true – not possible or contingent.
Intuitively, names appear rigid and not contingent on anything. Einstein. Rachmaninov. Any representation we devise shouldn’t be contingent either.
Saul Kripke formalized this idea in his 1972 Naming and Necessity, proposing the notion of rigid designators, a proper name that denotes an object in every possible world in which it exists.
Kripke went on to reject Russell’s descriptivism, proposing instead that proper names are declared or “baptized” and carry meaning via every reference to the name since the initial naming ceremony, each reference based on the previous – a causal chain.
Under Kripke, reference is secured by this causal chain.
Every interaction the user has with the world could constitute a path to representing Kripke’s causal chain. And constitute representation that’s necessary, not merely possible.
Social butterfly
Ideally our representation could also be humanistic or reflect how we naturally treat identity.
For instance, ever since we were kids, we’ve been identity switching. We shift our vibe when we’re with friends, to when we’re with friends we trust, to our parents, to professional circles.
Social psychology calls this social identity – the switching of images of ourselves across contexts. In professional contexts we’re charming, interested, and focused. With friends we’re funny, sharp, and more relaxed.
Some identities are stigmatized. We shift to those more rarely or only in environments that feel safe.
Some social identities we cannot hide. Other people in our group just know, or think they know, even just seeing us on the street. A quick glance gives it away. Social identity isn’t always actively switched on or off. Vibes carry.
So it’d be great if our representation could also accommodate the dynamic nature of our identity as we navigate digitally.
Personal timeline
Putting this all together, we take our first user context representation to be a personal timeline.
Crosshatch enables consumers to effortlessly assemble – take the union of – data from across applications.
Each of these applications has its own data representation. On the other hand, irrespective of the representation, we can see some of the actions enabled in each application may actually carry equivalent meaning to the user.
One place to see how this could work is bottom up – if you’re creating a protocol for decentralized social network applications, how might you define the interaction semantics that could work with every client? We look to Farcaster, which standardizes a set of available message types
cast: public message
reactions: Bob liked Alice’s `cast`
links: Bob follows Alice
Profile data: Alice published a profile pic or display name
Now going top down, seeking to find a similar unified set of actions that consumers might take across apps, we might axiomatically declare
A click is a click, whether it happened on Amazon or Kroger
A like is a like, whether it happens on Instagram or Spotify
A run is a run, whether it happens on Strava or Apple Health
An utterance is an utterance, whether it happens on ChatGPT or Claude
While not necessary, axioms of this flavor could help furnish a user data layer with nice ergonomics focused on user actions, irrespective of the data source channel that observed it. Taking the data layer as a sequence of all actions a user has observed or personally taken, we get closer to creating Kripke’s causal chain, a sequence of all interactions between you and the world. And with objects of the action-object framework enriched with metadata, it could enable fine-grained, context-aware permissioning and control for effortless extensions of social identity.
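A minimal sketch of what this could look like in practice – the source payloads and the mapping below are our own illustrative assumptions, not a finished spec:

```python
# Sketch of a personal timeline: normalize per-app events into one shared
# action vocabulary, ordered in time like a causal chain.
from datetime import datetime

# "A like is a like": map each source's native event name onto a unified action.
UNIFIED_ACTIONS = {
    ("instagram", "post_liked"): "liked",
    ("spotify", "track_saved"): "liked",
    ("strava", "activity_recorded"): "ran",
    ("amazon", "item_clicked"): "clicked",
}

def normalize(source: str, native_event: str, obj: dict, at: str) -> dict:
    return {
        "at": datetime.fromisoformat(at),
        "action": UNIFIED_ACTIONS.get((source, native_event), native_event),
        "object": obj,          # object metadata kept for retrieval and permissioning
        "source": source,       # provenance, so controls can be per-context
    }

timeline = sorted(
    [
        normalize("spotify", "track_saved", {"name": "Irish Jig"}, "2024-05-01T08:00:00"),
        normalize("strava", "activity_recorded", {"distance_km": 8.2}, "2024-05-01T07:00:00"),
        normalize("amazon", "item_clicked", {"name": "trail shoes"}, "2024-05-02T21:15:00"),
    ],
    key=lambda e: e["at"],
)
for event in timeline:
    print(event["at"].date(), event["action"], event["object"])
```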
This representation isn’t perfect. Characterizing objects with metadata is descriptivist as of early Russel. Ideally we’d wish for objects to accrue their own causal chain of reference (e.g., among all the users that act on it the object). Further, it’d be convenient if all object metadata of the same references had the same representation (e.g., ‘New York City’ v ‘NYC’). This could allow for easier retrieval and processing when data are activated, but guaranteeing this sort of uniform entity resolution across all possible object entities we might observe could be challenging, especially since language models have looser consistency requirements.
We’re excited to continue in the path of many who’ve hoped for a user-governed interoperable data layer respectful to and delightful for humans and easy to use for developers.
Progressive self sovereignty
Even in 2003, it felt like walls were going up on the internet. How a progressive self-sovereignty can make internet experiences richer at a reduced cost.
Scroll to read
5/20/2024
Walls have been going up on the Internet. The openness that characterizes the Net is under attack on several fronts.
With the death of cookies, you might think this is recent writing, but it actually comes from a manifesto from over 20 years ago. It was written by internet pioneers who even in 2004 were already aspiring to a Next Generation of the Internet, one with identity and trust built in.
Their proposed protocol has yet to materialize. For instance, two weeks ago George Hotz wrote
The reason the Internet sucks now is because it has a different ruling class from the 90s-00s. A subject mindset now vs a citizen mindset then.
Elon Musk agreed and asked for ideas. Hotz replied
I think the main problem is the friction of avoiding the major platforms.
HTTP, SMTP, and POP3 were open protocols that anyone could participate in, Facebook, Reddit, and X aren't.
However, just building the protocols won't work. I'm generally bearish on most crypto projects, particularly when I see things like metamask swap and the uniswap front end fee.
tl;dr: Make the self sovereign option lower friction than the centralized option.
We agree: You can’t save souls in an empty church.
We love the principles that crypto projects encode, but they’re paid for with complexity, cost, speed, and consumer ergonomics.
It’s time for little tech to get aggressive.
You don’t get self-sovereignty on principle alone. Consumers expect effortless experiences. They don’t want to run their own servers. You get self sovereignty because it delivers 10x better experiences that recast incumbent cost structures.
So – how do we get “self sovereignty” and lower friction? Self sovereignty and 10x products with entirely different cost structures?
In analysis, a common proof technique is to start with just what you know and pull the thread in a forceful way to get what you want.
You can prove foundational results like the Intermediate Value Theorem this way. Just use the properties of what you know and have and continue to unpack them until you get what you want.
Taking a similar approach, we attempt to unpack self-sovereignty and see if we can find threads to pull to get it.
Law or ledger?
There are many candidate definitions of self sovereign identity. We like Christopher Allen’s 2016 Coindesk posing of self-sovereign identity and its associated 10 principles.
Existence: Identity is based on a personal core that can't be fully digitized. [Just as in our Shape of You representation.]
Control: Users must govern and manage their own identities.
Access: Users need full access to their data without hidden information.
Transparency: Identity systems must be open and understandable.
Persistence: Identities should endure as long as needed by the user.
Portability: Identity information must be transferable across systems.
Interoperability: Identities should function globally and cross-platform.
Consent: User consent is required for identity data sharing.
Minimalization: Only essential identity data should be disclosed.
Protection: User rights must be prioritized and safeguarded.
It seems that these principles have endured. Let us say that self sovereignty is the delivery of these principles.
Since Allen’s writing, global privacy laws have embraced these principles, delivering components e.g.,
Control: Right to delete and correct
Access: Right to access
Transparency requirements
Persistence and Portability: Right to access
Consent and Minimization: CCPA 1798.100.a.1-3
Of course, these laws don’t have the same elegance as protocols. Compliance can be slow, high-friction, and leaky. Protocols resolve possible conflicts of interest on initialization.
On the other hand, laws like the Digital Markets Act are raising the cost of non-compliance to as much as 20% of global revenue for repeat infringements. Big companies (“Gatekeepers”) called out by the DMA, like Google and ByteDance, are taking it seriously, shipping, among other things, data portability APIs.
It’s unlikely Google would ship a Data Portability API if they didn’t have to – after all, they’re simultaneously attempting to restrict cross-context data flows.
Paths to self-sovereignty by policy rather than protocol can feel sacrilegious.
I think in the industry, because we love decentralization … we treat it as though it is sacred and always applicable and that's not true. Decentralization means you will be very inefficient and very slow in exchange for being perhaps very secure or aligning incentives so that you don't have one group of people taking advantage of another group of people.
Jake Chervinsky, Chief Legal Officer at Variant, explained at ETHDenver in February.
Policy is making it expensive to deny consumers access, control, and transparency around their data and its use. Chervinsky continued
What has worked is being very honest about what you are doing and why and what that includes when it comes to deciding whether and when you want to decentralize. So to me decentralization is a trade-off.
We agree. Full self-sovereignty, depending on how it's posed, comes at a cost.
So the enforcement of, and demonstrated early compliance with, laws like the Digital Markets Act feel like an opportunity. With another candidate path in hand and a clear set of principles to deliver, we have an apparent opportunity to weigh the tradeoffs we might make to reach self-sovereignty, perhaps progressively.
Persistent Identity
It didn’t take very long for people on the internet to begin to long for a persistent identity.
We quote the Next Generation of the Internet manifesto because it is uncannily prescient about the present moment.
Off-line the question of individual identity is fairly straightforward. ….
On the Internet, however, identity is a far less subtle, and more complicated issue. The Internet, as it is configured today, is poorly suited to support the multifarious nature of identity that we take for granted in daily life. … Each of us may have one, or several, e-mail addresses, but that specific identification says little about who we are, our interests, or our experience.
The World Wide Web is a super-set of the Internet built on top of the fundamental organizing principal of domain names, which are used in the creation of URLs. Each URL is a "Web site," or a location on the Internet that individuals can visit by clicking on hyperlinks. Once a URL is established, it essentially becomes the private property of the person or group that is administering it — the site becomes whatever the site’s director chooses for it to be, at a whim.
But while the Web has developed a sophisticated system for the creation of "sites," there has yet to appear a good means to represent each of us as individuals in cyberspace. Every time we visit a new Web site, we enter as an anonymous person. Then, with our own labor, we create an identity within that specific site, following the rules as they are presented to us (For example: "Please click here to register ..."). Once we establish our identity on that Web site, it effectively becomes the property of the Web site owner. For this reason, URL-based communities are like walled castles with one-way doors; once you have created an identity on that Web site, it is only of use on that same Web site; it can never escape.
This writing resonates with us deeply.
“The internet sucks” says Hotz. We've known why for over 20 years.
But the manifesto's solutions, particularly a decentralized consensus process for building internet ontologies, have felt to us challenging to get off the ground. AI, paired with growing data portability rights, appears to be a solution.
Federated meaning of life
Each of us makes our own meaning in life. But the idea of a decentralized approach to developing an open ontology – a formal shared map of concepts and meanings (that themselves are in flux!) – feels challenging.
The idea is beautiful – let the people define the matter of meaning. But decentralized governance is intentionally inefficient. Chervinsky explains
decentralized governance … [is] intentionally going to be inefficient. The best example of decentralized governance that we have before crypto is the United States Congress. If you've spent a minute looking at the United States Congress, you can see it is not efficient, it is very slow, it almost never gets anything done, and that is by design.
As we pointed out last week, even the decentralized social protocol Farcaster takes an opinionated (and reasonable!) approach to ontology.
And in the age of AI, a blessed formal shared taxonomy may not be needed. As long as there is an open and well-formed standard for cataloging events (e.g., as recorded by other platforms), with AI, data interoperability is just a prompt away. So while, per the manifesto, “commercial systems for forming ontologies have no incentive to use open standards,” in the age of AI – where data interoperability demands only known provenance and ontological consistency – that might not matter.
AI makes closed ontologies difficult to defend.
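To make “a prompt away” concrete, here is a minimal sketch under our own assumptions – the event shape, the target schema, and the `complete()` helper are illustrative stand-ins, not an actual Crosshatch interface:

```python
import json

# A hypothetical event, cataloged under an open, well-formed standard:
# known provenance plus a consistent, documented shape.
source_event = {
    "provenance": "acme-fitness",  # who recorded it
    "action": "completed_run",
    "object": {"distance_km": 8.2, "duration_min": 47},
    "occurred_at": "2024-05-12T07:30:00Z",
}

# The consuming application's (different) vocabulary, openly documented.
target_schema = {
    "activity_type": "string, e.g. 'run' | 'ride' | 'swim'",
    "distance_miles": "number",
    "minutes": "number",
}

def translate(event: dict, schema: dict, complete) -> dict:
    """Ask a language model to restate an event in another ontology.

    `complete` is any function that takes a prompt string and returns the
    model's text; plug in whichever provider (or local model) you use.
    """
    prompt = (
        "Rewrite this event so it conforms to the target schema. "
        "Return only JSON.\n"
        f"Event: {json.dumps(event)}\n"
        f"Target schema: {json.dumps(schema)}"
    )
    return json.loads(complete(prompt))
```

No blessed taxonomy is required here: the only things both sides must agree on are who recorded the event and that its shape is documented.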
Progressive self-sovereignty
The manifesto seeks principles
Open Standards: “it must be transparent so that all of the entities that participate in it are reasonably assured of its trustworthiness”
Interoperability: “Interoperability between diverse environments and ontological frameworks is central to this effort.”
Inclusivity: it “must be value-neutral, open, and inclusive, not unlike the open connectivity of the underlying Internet protocols.”
Respect for privacy: “Every person online must be certain that private information remains private, and that neither governments nor commercial interests will use this information in any way without the individual’s knowledge and expressed permission.”
Decentralization: the authors propose decentralization primarily in how the protocol integrates with the rest of the internet, namely “standards that can be added to existing community operating systems in a modular fashion — so they do not have to rewrite their software from scratch, but rather can "plug-in" these modules to their existing infrastructures.”
We’re building Crosshatch in support of these principles.
We were only kids when this manifesto came out. But we expect a better internet today. Governments do too.
The ideals espoused here are now not only written into law but enforced. AI releases us from the coordination challenge of agreeing on a single way to describe things, demanding only descriptions that are well-formed and openly documented. It is also poised to recast the infrastructure costs of personalization, putting personal context a mere API call away at a marginal cost that's falling over 80% year over year.
And with digital representations of identity – mere images of the ineffable self – made portable by expensive-to-violate state mandate, there is a thread to pull that makes the self-sovereign option not only lower friction but a path to richer experiences than the centralized option alone. With data represented once, under the user, businesses wishing to activate it can skip building and maintaining that infrastructure themselves, delivering "internet more you" experiences for less.
We view self-sovereignty on a sliding scale of tradeoffs. Stronger (protocol-enforced) controls over custody and data usage can come at the expense of consumer and developer ergonomics, costs, and experiences. Controls via legal agreements and privacy laws are gaining stronger teeth and enable ergonomics more familiar to developers and consumers. We start with the latter and build toward the former to the extent it unlocks demand from people with fiercer preferences for control.
Enabling a more expressive and free internet – making everything more you – is complex and going to take a lot of work to get right. But the principles are worth it. The upside of an internet 10x more you, because it activates all the context you wish to share, is worth it. Getting these principles with economies of scale is worth it.
It is progressive in that we begin with initial adherence to self-sovereign principles and build in a way that tightens the guarantees we're able to make over time. We start with what we know and pull the thread to its conclusion. That feels like a beautiful path.
And it's the one Crosshatch is building.
Logged-in Web
Open networks need superior economics to rise. Today private networks turn control of consumer identity and data into massive enterprise value. The cookies are stale. It's time for the Logged-In Web.
Scroll to read
5/26/2024
Last week Ari Paparo and Eric Seufert spoke about growth opportunities for digital advertising.
Ari’s take is that the open web – everything not managed by the Walled Gardens – is dying.
Walled gardens are well-tended. They have direct distribution to consumers and rich identity via user log-in, furnished with longitudinal customer data that enables great user experiences, including relevant ads and, increasingly, measurement.
On the other hand, with cookies set to deprecate sometime in 2025, ads on the open web are about to lose their predominant identity. Worse, ad products on the open web are fragmented and home to unscrupulous intermediaries that, according to Seufert, “exist to commit fraud.” Taking these two together – the loss of identity and the murkiness of business practices – it’s easy to see why Ari might be right.
The open web is dying.
Many in AI say ads get in the way of great internet experiences. But ads are what enable free and open access to information.
Lou Montulli, famously of Netscape, explains the early intuition of why developing commerce on the open web was so important
The thought was if we enabled e-commerce we would essentially create a revenue stream for the internet that would be sustainable, because it's all well and good to have a set of technologies that allows anyone to freely surf the Internet, but if there was not companies who actually provided information and provided businesses on the internet, then there would be no future funding, and the commercial networks that were competing with us, i.e., AOL and MSN, would essentially win out due to greater funding.
That is, finding economics that could out-compete private networks was essential to the path to dominance for the principles of the open web. Enabling open, networked commerce was that path.
The development of commerce also required memory, per Montulli, “that would allow a website to recognize a user from one visit to the next” without requiring a user to log in.
Today, however, the economics that originally supported the open web are funneled to private networks – companies that have consumer identity and data, principally because they do ask users to log in. Logged-in users have well-understood identities that furnish better user experiences, which monetize better in owned experiences and ads. This has concentrated power and poses a threat to the early visions of a decentralized web.
History seems to be repeating itself, with today's closed networks being not Bell Atlantic, TCI, Time Warner, and US West, or AOL and MSN, but Meta, Amazon, and Apple. Just as the early web pioneers sought to create open platforms that could compete with the closed networks of the 1990s, we now face a similar challenge in the face of the walled gardens' growing dominance over our collective identities.
A web of free expression
The early pioneers of the internet envisioned a web that would foster competition, innovation, and free expression.
The Electronic Frontier Foundation's 1993 "Jeffersonian Vision" called for guaranteeing
that anyone who, say, wants to start an alternative news network or a forum for political discussion is given an outlet to do so.
The open web made creating such forums possible, but as Montulli recognized, sustainable economics were necessary for these ventures to thrive.
Today, the economics of identity threaten the open web and the early visions of the internet. The Progress and Freedom Foundation's 1994 essay "Cyberspace and the American Dream" argued that "technological progress creates new means of serving old markets, turning one-time monopolies into competitive battlegrounds." The essay envisioned ‘cyberspace’ as a force for competition, with "market after market" being transformed from a "natural monopoly" to one in which "competition is the rule."
"Inexpensive knowledge destroys economies-of-scale."
Yet today, we see the opposite trend: the rise of new natural monopolies, driven by the concentration of consumer identity and data in the hands of a few dominant platforms. While cyberspace has made knowledge about the world inexpensive, it has made knowledge about each other expensive to acquire, manage, and administer - and increasingly controlled by walled gardens.
We were very much against tracking and in favor of privacy. [Our] eureka moment was: what if we just created an identifier that was only associated with individual websites and was not shared between websites?
They could create an identity that allowed websites to remember returning users without logging in.
Montulli recounted. Of course, as we all know now, the internet found a way around it.
With technologies within HTTP it definitely allows cookies to be used if a whole group of websites also essentially work together to create … a tracking network
The deprecation of third party cookies blocks this behavior.
So what sort of technology can address the natural monopolies over our identity and the economics it accrues while also side-stepping the problems that cookies introduced?
Logged-in Web
There’s tremendous elegance in simplicity.
It’s true that in the early days of the web, the only way sites could remember you was through logging in with a username and password. Montulli explains, “there was no other form of interaction that would allow a website to recognize a user.”
But maybe that’s a good thing. Maybe users should just log in.
Cookies are a mess. They're a privacy and user-agency nightmare. And the data they furnish is expensive and not even that good.
On the other hand, identity with the Logged-in Web is simple. Modern identity systems support automatic logins or logins in a single tap. Users don’t need to remember which account they used for their last visit – it can all be automatic. No duplicate identities that muddy customer journeys. Our avenues for introducing and representing ourselves are increasingly effortless and humanistic. Soon consumers will be able to log in to any site with all their context in just a tap.
This is the vision for the Logged-in Web: a fresh take on cookies. Ari explains the idea on Eric’s podcast:
“And you may say, well that’s a pipe dream. No one will ever log in.
But if you consider the scenario where cookies are deprecated sometime in 2025 and the CPM difference between logged in and non logged in skyrockets … you could see a scenario where [sites] say ‘well I know [requiring logins] is painful but it’s the only way.'
And then it becomes self-perpetuating where more and more [user experience, ads and commerce] go to logged in and less and less to the non-logged in and the economics of the non-logged in become totally infeasible.
And you could end up in a world where it’s not 100%, maybe it’s 50% logged in or higher and that would be very different from the current world.
Our view at Crosshatch is that this shift is a massive opportunity to change how data flows on the internet. With portability laws rising worldwide, users have a growing right to access their data. The costs of non-compliance are high and the largest private networks are moving to comply.
Logging in can happen in a tap. And if you’re going to ask users to log in, why not do it in a way that lets them bring all their context? First party data is increasingly ‘meh.’ Your users have the most data anyway.
Identity and data deliver the best user experiences and network economics. The gates to consumers wielding that data for whatever purpose they like are opening.
It's time to rise against the natural monopolies that privately capitalize on our identities and embrace open systems that empower us to make our own.
Hypermedia and Identity's Third Law
Everyone has tried to build a personal data wallet. Most failed. Why's this time different? Data portability and AI.
Scroll to read
6/1/2024
Data wallets are inevitable.
Or at least that’s what people have been saying since 1999. Or 1965? Or 1945??
Everyone has attempted a wallet.
Microsoft
Sun Microsystems
OpenID Foundation
Startups like Personal Inc
Loyalty apps like Ibotta
Plaid
Internet futurists imagined interoperability everywhere, across forms, e-commerce, social networks – everything.
No one has achieved it. A few got part way. Most have died or faded.
So what the heck happened?
And now we’re building a wallet??
Why?!
In this blog, we study the pre-internet’s imagination of interoperability and a brief history of wallet attempts evaluated under Identity's Third Law, and we use that context to motivate why now is the time for the data wallet to rise.
Hypermedia
Interoperability fascinated scientists and philosophers long before the internet.
In the July 1945 issue of The Atlantic, Vannevar Bush – the engineer and administrator who led the U.S. Office of Scientific Research and Development during World War II – imagined
Consider a future device … in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Bush was inspired by how the telegraph could evolve. He believed it could become the Memex, a self-contained research library.
Bush explored how content of similar meaning or reference could be composed into “associative trails” of references to books, articles or writings.
Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space.
Twenty years later, the Memex had still not arrived.
In 1965, at the Association for Computing Machinery's 20th national conference, Swarthmore College alumnus Ted Nelson argued that the computing hardware necessary for a memex was available and described a new framework to deliver it.
He proposed an Evolutionary List File (“ELF”),
a file structure that can be shaped into various forms, changed from one arrangement to another in accordance with the user's changing need.
The system is user-oriented and open-faced, and its clear and simple rules may be adapted to all purposes.
and went on to introduce the idea of Hypertext
a body of written or pictorial material interconnected in such a complex way that it could not conveniently be presented or represented on paper.
[Today, this sounds like context for a language model!] Nelson’s talks on the topic evoke something like vectors, e.g., in 1965 at Vassar College
Mr. Nelson pointed out that we often do not think in linear sequences but rather in "swirls" and in footnotes. He introduced the concept of the hyper-text, which would be a more flexible, more generalized, non-linear presentation of material on a particular subject.
In the 60s, this idea evolved into a hypertext publishing project, ‘Xanadu,’ that went on for 30 years. It proved extremely ambitious – with copyright and royalty built in (reading text would send micropayments to the author!). It was theoretical and complex. Wired Magazine called it “cursed” and “the longest-running vaporware story in the history of the computer industry.”
In 1990, Tim Berners-Lee refined Ted Nelson’s hypertext (and related hypermedia) as
Human-readable information linked together in an unconstrained way.
in his famous “vague but exciting” “Information Management, A Proposal.”
Berners-Lee explained how his frustration at the lack of interoperability across computers led to his creation of the web.
“I was frustrated -- I was working as a software engineer lots of people coming from all over the world. They brought all sorts of different computers with them. They had all sorts of different data formats, all sorts, all kinds of documentation systems. And these were all incompatible. So if you just imagined them all being part of some big, virtual documentation system in the sky, say on the Internet, then life would be so much easier.
Well, once you've had an idea like that it kind of gets under your skin.”
While “vague but exciting” in 1990, hypermedia today seems simple. With AI models accepting context of millions of tokens, today’s hypermedia appears to have just two components
Human readable information
AI that remixes information without constraint
Coincidentally, we’ve modeled our data interface (AI over a unified events representation) exactly following Nelson and Berners-Lee’s hypermedia.
Personal hypermedia (“hyper-personalization”) then has all our Linked Information over the objects
sites
songs
videos
stores
runs
books
that we have
liked
purchased
exercised
confirmed
played
all linked together in an unconstrained way via AI, constrained only by the prompt and the context you choose to add.
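Concretely, one row of such a unified events representation might look like the sketch below (the field names are our illustration, not a published spec): every object and verb above fits the same shape, and the linking is left to AI at read time.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    """One unit of personal hypermedia: an object, what we did with it,
    where the record came from, and when."""
    source: str       # e.g. "spotify", "strava", "kindle"
    action: str       # e.g. "played", "exercised", "purchased", "liked"
    object_type: str  # e.g. "song", "run", "book", "site"
    object_name: str
    occurred_at: datetime

# Disparate records, one shape: composed into model context only when a
# prompt (and the user's chosen sharing) calls for them.
events = [
    Event("spotify", "played", "song", "Example Song", datetime(2024, 3, 2, 8, 15)),
    Event("strava", "exercised", "run", "Morning 5k", datetime(2024, 3, 2, 7, 0)),
    Event("kindle", "purchased", "book", "An Example Novel", datetime(2024, 2, 20, 21, 5)),
]
```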
Have you seen my wallet?
If you’re going to build a wallet – an identity with permissioned data – you should have some sense for past attempts and understand why they succeeded or failed.
In the first of these attempts, we encounter the Third Law of Identity, which appears as a pinch point for the success or failure of an identity.
Microsoft Passport
On October 11, 1999, Microsoft introduced Microsoft Passport
a single sign-in and wallet service for communication and commerce on the Internet. By creating a single Passport “identity,” users can easily access information and purchase goods on multiple Web sites using a single login and password.
This new service allows online shoppers to purchase items with ease, eliminating the need to repeatedly type the same shipping and billing information when ordering products or services at different Web sites.
Passport [offers] consumers the ability to use their Passport identity across many Internet sites, not simply within a proprietary network, and a server-side design, which provides consumers access to their Passport anytime, anywhere, using any Internet device.
In addition, Passport takes a stronger stance on privacy and security. Passport ensures that its members always control the information stored in their Passport, and which Web sites receive it. Moreover, Passport requires all participating sites to adopt privacy policies that conform to industry-recognized privacy standards.
“The user is in complete control of the information he or she supplies to Passport — and the user also actively controls with whom that information is shared.
In practice, it seems like Microsoft Passport was really a single sign-on service with a promise of privacy. Passport managed to accrue big partners like eBay, Costco, and Barnes & Noble.
But following a series of security breaches, rumors of anti-competitive business practices like blocking non-Internet Explorer browsers, and a complaint filed with the FTC concerning alleged false representations of data collection, by 2005 its last major partners had jumped ship.
Passport also arrived at a time when people still believed in the Golden Rule of Cookies
The golden rule of cookies is that they are only sent back to the same web domain as they came from. This is important to remember, because it’s the only thing that really protects you from having all the web sites you visit swap information about you.
But swap information the internet did. Huge Data Management Platforms were built specifically to do this, and are now hooked into major enterprise CDPs.
We wonder how writing on Passport might have been different if folks knew what cookies became.
Kim Cameron, Microsoft’s Chief Architect of Access, did a post-mortem on Passport. He didn’t believe Passport’s troubles came from people not trusting Microsoft (millions of people had accounts!) but rather that they were a consequence of Passport’s positioning with respect to the Third Law of Identity, which stipulates
Technical identity systems MUST be designed so the disclosure of identifying information is limited to parties having a necessary and justifiable place in a given identity relationship.
Cameron came to believe that Passport violated this third law for two reasons:
The first was that web sites didn’t really want Passport mediating between them and their customers. And the second was that consumers didn’t see what Passport was doing there either.
Cameron concluded the postmortem
For now, I leave it as an exercise for the reader to explore the applicability of this law to various potential candidates for provision of identity.
Sun Microsystems
In September 2001, Sun Microsystems and an alliance of 30 others – including General Motors, eBay, Nokia, and later AOL – launched a counter to Microsoft’s Passport.
They called it Liberty Alliance
Federated identity will enable the next generation of the Internet: federated commerce. In a federated view of the world, a person's online identity, their personal profile, personalized online configurations, buying habits and history, and shopping preferences are administered by users, yet securely shared with the organizations of their choosing. A federated identity model will enable every business or user to manage their own data, and ensure that the use of critical personal information is managed and distributed by the appropriate parties, rather than a central authority.
Liberty Alliance grew to 150 members. That said, it required expensive coordination between multiple parties, slowing its adoption relative to standards like OpenID that would later emerge.
Entities involved in an ID-FF implementation.
Relative to sites or applications managing their own authentication, Liberty ID fails Cameron’s Third Law, as it lacks a necessary or justifiable role in mediating identity.
OpenID
On May 16, 2005, Brad Fitzpatrick proposed Yadis – Yet Another Distributed Identity System. The history of OpenID is well documented. Its Attribute Exchange capability could enable social network interoperability, where a user could direct certain attributes saved in one social network profile and point them to another.
While technically feasible, this never happened. Dare Obasanjo explained in 2007
The primary reasons for lack of interoperability are business related and not technical.
Why would Facebook implement a feature that reduced their user growth via network effects? Why would MySpace make it easy for sites to extract user profile information from their service? Because openness is great? Yeah…right.
Openness isn’t why Facebook is currently being valued at $6 billion nor is it why MySpace is currently expected to pull in about half a billion in revenue this year. These companies are doing just great being walled gardens and thanks to network effects, they will probably continue to do so unless something really disruptive happens.
It turns out Facebook had over $1T worth of reasons not to do this. Particularly with major social media networks, OpenID fails Identity’s Third Law because OpenID has no necessary or justifiable reason to mediate auth to platforms like Facebook when Facebook could just do auth itself, especially given the implications of openness for its network effects.
Personal Inc
In 2009, Personal Inc launched its life-management platform. While not explicitly an identity layer, it enabled users to collate their data and permission it to services, friends, and family.
It got great press, including from The Economist and Time Magazine, which described the service
Personal is a highly encrypted cloud storage service where users are the only ones with the key necessary to decrypt their data. You can manually upload documents as well as email passwords, account numbers and addresses.
CNET went on to describe the platform
The idea with Personal is that you file these items into "gems" like, "beverage preferences," for example. Then you can recall them when needed, and more importantly, share with others when they need them. For example: You can quickly give the babysitter access to the alarm code gem.
But there was a problem. Onboarding. CNET explained
But there's a big problem with Personal: Data entry. It is a total drag to collate all this data. And I think the response rate when people request gems from other people (as in the dinner party scenario above) will be too low to make the system truly useful.
This onboarding hump could doom the company.
Personal violated Identity’s Third Law in a new way: Friction.
Personal implicitly asked users to frontload friction of entering information manually that they’d only maybe use in the future. Why use Personal to communicate your beverage preferences to your friend when you can just text them “yeah i like diet coke.”
Personal Inc just didn’t have justification to mediate these communications and, as such, failed Identity’s Third Law.
Ibotta
Ibotta, founded in 2011, is a cash-back app that rewards users for everyday purchases. Ibotta is not a typical identity layer, but we include it because it associates data – purchase receipts – with user identity that interoperates with brand offers.
In April, Ibotta IPO'd to a nearly $3bn market capitalization. Investors were most excited about its Ibotta Performance Network (IPN), an apparent data consortium over retail-partner-authenticated identity that gives CPG marketers the ability to reach consumers. This performance network acquires identity not through signups to the Ibotta app but through usage of its retail partners' sites. This B2B2C motion enables more scalable identity acquisition than the Ibotta app alone, which grew at about 19% YoY last year according to its S-1.
Ibotta does not fail the Third Law of Identity. With the Ibotta app, Ibotta delivers savings to consumers across brands they love. With the IPN, Ibotta delivers its CPG customers the ability to offer savings not only on its app surface but on the surfaces of its retail partners, all in a white-labeled way. In the latter case, Ibotta inherits identity provided by its partners. In both cases, Ibotta is justified and apparently necessary in mediating identity because it independently collates data across fragmented partners and experiences, and it provides relevant offers over this combined data that would be more difficult and expensive to build in a decentralized way. On the other hand, the Ibotta wallet is limited to interoperating over brand offers, not personalization generally.
Plaid
Plaid, founded in 2013, is a safer way for users to connect their financial data to third-party apps. Its founding and history are well known. It is justified and necessary in mediating identity because building individual banking integrations (in a safe and compliant way!) is costly for any application it serves. On the other hand, its interoperability is limited to data from financial institutions.
Hypermedia and Identity’s Third Law
Identity solutions of the past apparently failed because of the Third Law of Identity. The identities that failed didn’t have a necessary and justifiable place in a given identity relationship.
They either weren’t necessary
Microsoft Passport
Sun Liberty Alliance
or they weren’t justifiable
OpenID
Personal Inc
because of business reasons or friction.
Cameron left an exercise to the reader to apply this law to potential candidates for provision of identity.
Today, we take up this exercise. We identify two critical trends that unlock new possibilities for an identity candidate.
First, data portability laws are on the rise. Users globally have a growing right and ability to access data about them.
Paths to onboarding data are increasingly lower friction. Manual data entry of single data points (“I like diet coke”) is being replaced with automatic data collation from modern, automated auth flows.
While in 2007 Facebook eschewed OpenID’s Attribute Exchange, under today’s laws, Facebook no longer has a choice. Neither do any Gatekeepers.
Global law and new efficiencies in data connections satisfy the Third Law’s justifiability requirement.
Second, with AI, hypermedia is now possible. But hypermedia requires access to consumer context from across silos.
Silos of consumer context, from a Tim Berners-Lee TED Talk in March 2009.
A personal hypermedia packaged in an identity (a “personal data wallet” or "hyperpersonalization" via headless personalization) satisfies the necessary requirement in that it enables access and inference from context from across silos. Without such an identity, services must run on first party data that's increasingly "meh" and at risk from the death of cookies.
This contrasts with the incentive asymmetry of OpenID, where OpenID providers benefit more from mindshare and lock-in without delivering a corresponding benefit to those who accept it. While there was a long list of OpenID providers, a list of organizations accepting OpenID was nowhere to be found.
On the other hand, providing access to a personal hypermedia resolves this asymmetry: trading a third party identity for ability to provide superior service.
With data portability mandated by law and AI-unlocked hypermedia, an identity layer for personalization has a new chance to rise.
We're excited to make it happen.
Personal AI is an identity
Browsers and operating systems use identity to make data available across devices. The case for identity to make personal AI available across applications.
Scroll to read
6/8/2024
In March we asked What is Your AI?
It’s our belief that Personal AI is anything and everything. Personal AI is anywhere we are. It's a composition of attention, tools, and context.
Most candidate Personal AIs do not have the context. As Notion’s Ivan Zhao put it last week
The storage/data/context layer remains fragmented. Compute without context can’t do much.
Suppose to the contrary that the context layer were not fragmented. But Personal AI is everything and everywhere.
So is our (private?) context now everywhere?
Reductio ad absurdum! Surely not!
The context over which personal AI might operate is often private. Equipping everyone with all our context would fly in the face of accepted data minimization principles adopted by global privacy law. It'd also be inefficient and expensive, with everyone building personal AI duplicating pipelines to end up with the same context.
What’s increasingly clear is that legally, individuals have a right to this fragmented context. So the data to power a personal AI should at least be with us.
But how do we get it everywhere? To every Personal AI? In a way we control?
In this blog we study paths for Personal AI to rise, the tradeoffs each of these paths might make, and why we believe Personal AI is an identity.
Compute with some context
Compute without context can’t do much, so we study only the paths that have context.
The only existing technologies that have rich context are
The browser
Desktop OS
Mobile OS
The browser can see a history of everything you’ve browsed.
The desktop operating system can see all compute processes and screenshot everything on the screen.
Neither can see what you do offline. They probably also cannot access data from mobile apps. Nor can they see the context kept private by businesses, like inventories or profiles of other platform users. Most offline services keep digital backends accessible via email or data request.
Mobile operating systems like iOS or Android appear different, in part because of how they mediate installations and monetization via App Stores. They see OS-level interactions but do not record interactions that take place within an app. For instance, iPhone does not collect
Ubers we’ve booked
DoorDash orders
Amazon purchases
In Apple's or Google's terms of service we found no language explicitly blocking this activity. That said, given the consent orientation of corporate policy and global privacy law, it’d seem Apple and Google would need to get consent from both users and developers to do this. On last week’s All-In Podcast, Chamath agreed Apple would need to change its SDK terms of service to enable this sort of data usage. While many are excited about mobile OSes enabling a personal AI, it might turn out they have access to less context than most imagine.
If you look closely at all three of these technologies, you’ll notice that all of them encourage users to log in
MacOS and iOS with iCloud
Windows with Microsoft
Chrome with Google
Safari (indirectly with iCloud)
Arc Browser
These login services enable a more seamless browsing experience by making context from one device available on another, such as:
Your iMac available on iPad
Your Surface available on PC
Your mobile Arc available on Windows Arc
So while it’s true there are only three technologies today that observe massive user context flows, all three are positioned downstream from identity.
And increasingly, so is the web.
Data is wherever I am
A lot of people seem to believe that data on iPhone doesn’t leave your phone.
This is not true. It’s also not convenient.
Most iPhone users keep iCloud accounts. iCloud syncs data from native apps like Photos, Contacts and Maps to iCloud so it’s available across devices.
Here’s Apple’s Nick Gillett explaining at WWDC19
A belief that I have is that all of my data should be available to me on whatever device I have wherever I am in the world. …
And the data that we create on all of these devices is naturally trapped.
There’s no easy way to move it from one device to another without some kind of user interaction.
Now to solve this, we typically want to turn to cloud storage because it offers us the promise of moving data that’s on one device seamlessly and transparently to all the other devices we own.
If Apple is storing our encrypted data on its cloud services, you might start to wonder who holds the encryption keys. It turns out it depends. For most users, while the iPhone stores encryption keys for iCloud data for apps like Journal, Maps and Safari, Apple holds the encryption keys for data like Photos, Notes and Contacts.
Be careful with those selfies!!
AI even where I’m not
We use iCloud so our data is wherever we are.
But what about AI?
It appears that AI model size is bifurcating. Small, performant models are beginning to appear natively in the browser, like Gemini Nano in Chrome. On the other hand, large models like Llama 3 70B can run locally on Apple Silicon Macs but need more RAM than the iPhone 15 currently has. Even if such a model could run there, it’d have a significant impact on battery life, reported Dylan Patel at SemiAnalysis.
And if you want to use proprietary models like Claude Opus or GPT-4, demanding local-only (or onchain!) inference could leave you out of luck.
It is expected that this week Apple will announce OpenAI integration with iPhone with inference available via Secure Enclaves.
While data is canonically encrypted at rest and in transit, it is often decrypted when in use. This challenges Apple’s privacy posture, which aims to keep data encrypted except in limited circumstances.
Secure Enclaves solve this by keeping data encrypted while it’s in use. This could enable Apple to extend iPhone services to cloud inference that isn’t available locally running on data that even Apple doesn’t have the keys to decrypt.
With Secure Enclaves, iPhone users get inference over data they control even where they’re not.
Personal AI is an identity
All the candidate services that might wield context for a personal AI encourage users to log in with a user-governed identity.
iCloud. Google. Microsoft. Arc.
These identities, and the permissions associated with them, are extended by encryption keys – some user-governed, some managed, the choice depending on ergonomics and user-security preferences. These identities accumulate context, like browsing, photos, or health information.
But all of these existing services have constrained interfaces. They don’t interoperate. Apple is explicitly against cross-context interoperation. Google is constraining it. Folks are unsure of Microsoft.
None have the full context.
Apple enables users to share suggested memories including photos and maps with Journaling Apps. Apps can't see all connected data, only suggestions that users agree to share.
On the other hand, per Notion’s Ivan Zhao, companies from travel to dining to grocery to social are shipping applications with fragmented context. AI is capable of transforming context into great new applications, but applications can only apply it if they have it.
This is why headless personalization – an identity layer with data, AI, and user governance built in, one that could dip into our data wells, reuse them, and create magical personalized experiences in different contexts – is so interesting to us.
If there were an effortless way to assign our fragmented data to an identity with AI resources we govern, and avail those resources to third parties to interpret it in new applications, that could unlock incredible new applications that today are otherwise constrained by operating systems and browsers.
This architecture could begin to address the concerns Bill Gates raised in November last year
You’ll want to be able to decide what information the agent has access to, so you’re confident that your data is shared with only people and companies you choose.
Putting AI resources next to where user data live, and placing them within a security boundary users govern, gives users a way to endow AI resources with access to accumulated permissioned data in a way they effortlessly control. This pattern also respects a minimization principle, where the only data shared by the AI resource, whether it's GPT-4 or Claude Sonnet or Gemini Nano, is that which is needed for the application to operate.
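A minimal sketch of that pattern, under our own assumptions (the boundary class and the `complete` callable below are hypothetical stand-ins, not a real SDK): the application receives only the derived answer it needs, never the accumulated records.

```python
from typing import Callable

class UserGovernedBoundary:
    """Hypothetical security boundary the user governs: it holds permissioned
    context plus an AI resource, and returns only the minimal derived answer
    an application needs, never the underlying records."""

    def __init__(self, events: list[tuple[str, str]], complete: Callable[[str], str]):
        self._events = events      # (source, description) pairs the user accumulated
        self._complete = complete  # whichever model the user governs, hosted or local

    def answer(self, app: str, question: str, allowed_sources: set[str]) -> str:
        # Minimization, step 1: only context from sources this app was granted.
        context = [desc for src, desc in self._events if src in allowed_sources]
        prompt = (
            f"Using only the context below, answer for the application '{app}'. "
            "Return the answer alone, without restating the records.\n"
            + "\n".join(context)
            + f"\nQuestion: {question}"
        )
        # Minimization, step 2: the app gets a derived answer, not the data.
        return self._complete(prompt)
```

Swap `complete` for a hosted model or a local one; the boundary, and the user's grants, stay the same.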
As Apple showed with iCloud and its forthcoming AI services in Secure Enclaves, it’s not about data or AI staying local, per se. It’s about encryption, user agency, and sometimes trust. Many users trust Apple and elect Standard Data Protection over its Advanced form. AI served on iPhone is cloud inference served to my identity with encryption keys I control.
And while previous phases of computing were fragmented – involving different stacks, runtimes, frameworks, and so on – AI appears simple. Text is the universal interface. It previously made sense to endow third parties with Data Controller rights so they could perform their own flavor of computing on our data; with generalized models rising in capability, speed, and cost efficiency, there’s less need to grant those rights when users can govern the models themselves. This becomes particularly apparent as users accumulate a new Data Gravity and physics moves compute to where the most user data live.
Personal AI is not a tab in your browser. Or just your browser. Or an app on your phone.
It’s everything.
To be everything everywhere all at once, Personal AI with all our context needs to be what you govern.
Personal AI is an identity.
Subjectivity that's yours
A subject mindset has replaced the citizen mindset of the early internet. Duty will not fix this. Desire will.
Scroll to read
6/17/2024
"Revolutionaries often forget, or do not like to recognize, that one wants and makes revolutions out of desire, not duty."
Deleuze and Guattari, Anti-Oedipus: Capitalism and Schizophrenia, p. 366
Privacy turned rationalist
It’s been over a century since Warren and Brandeis declared the Right to Privacy in 1890.
Since then, interests in privacy have grown, with nearly everyone today touting some form of “privacy preserving technology.” The specific nature of the privacy conferred by these technologies is often complex and rarely specifically spelled out, beyond realizations of autonomy or control.
This isn’t too surprising, as outside of a rationalist “privacy is control” framing, privacy itself lacks a unified analytical conception. We’ve seen privacy defined in different ways, e.g., with Columbia University's Alan Westin posing it as control over one’s personal information
"...each individual is continually engaged in a personal adjustment process in which he balances the desire for privacy with the desire for disclosure and communication…"
or philosopher Ferdinand David Schoeman in 1992 suggesting privacy as a constituent of dignity or the protection of “intimacies of personal identity.”
It may even be that spiritual humanist visions of privacy – safe spaces to be left alone and explore without surveillance – have culturally given way to rationalist forms of privacy
“As privacy has become commodified, many of us have begun to question its value. Many no longer believe that there is anything fundamentally essential about a more private or anonymous human existence.
Whereas privacy was previously framed in humanistic terms, it is now far more likely to be thought of as a type of property.”
wrote Columbia Law School professor Bernard Harcourt.
We’ve long understood the value of this property to social media and search platforms, which transform observations of us into lucrative ads businesses. AI is poised to find new value in its observations of us, with some (e.g., Gavin Baker at Atreides Management) saying that AI’s observations of user activity are critical to powering enduring enterprise value.
But as we’ve also seen, platform observations of us are reflected back in the form of individualized content that shape new cultural narratives.
If privacy is seen as individual control over information, surely then we might have control over to whom we grant this information and at what price.
Refuse what we are
Earlier this month, Jack Dorsey spoke at the 16th Annual Oslo Freedom Forum.
“I think the free speech debate is a complete distraction right now. I think the real debate should be about free will.
We are being programmed. We are being programmed based on what we say we’re interested in, and we’re told through these discovery mechanisms what is interesting—and as we engage and interact with this content, the algorithm continues to build more and more of this bias.”
We agree with Jack (and Elon, who retweeted it). Jack’s remarks remind us of Mill’s warning about how the social pressures of digital platforms can lead to an inescapable conformity
"Society can and does execute its own mandates: and if it issues wrong mandates instead of right, or any mandates at all in things with which it ought not to meddle, it practices a social tyranny more formidable than many kinds of political oppression, since, though not usually upheld by such extreme penalties, it leaves fewer means of escape, penetrating much more deeply into the details of life, and enslaving the soul itself."
It also reminds us of Foucault’s The Subject and Power
“Maybe the target nowadays is not to discover what we are but to refuse what we are."
Here, “what we are” refers not only to technology’s categorization of us
“Wife”
“UC Berkeley graduate”
“ex-New Yorker”
“Gen Z”
but now also to algorithms telling us, as Jack explains, what to think. With the direct link of our data to platform power, we are increasingly platform subjects.
Foucault continues
“We have to imagine and to build up what we could be to get rid of this kind of political "double bind," which is the simultaneous individualization and totalization of modern power structures.
The conclusion would be that the political, ethical, social, philosophical problem of our days is not to try to liberate the individual from the state and from the state's institutions but to liberate us both from the state and from the type of individualization which is linked to the state.”
In the modern internet, we find an instantiation of Foucault's totalizing double bind: a space where individual freedom of expression is celebrated, yet where that very expression is continually shaped and molded by the invisible hand of algorithmic systems.
While we’re free to individually post and engage however we want, we do so under the Algorithm which shapes the narrative and context powering our next post. And with information and content increasingly mediated by AI, finding ways to subvert this double bind become ever more vital.
This idea of 'refusing what we are' takes on particular significance in the context of today's digital platforms, which function as modern forms of Foucault’s Technologies of the Self.
Technologies of the self
Technologies today are broadly confessional.
Twitter: “What is happening?!”
Facebook: “What is on your mind?”
Instagram: “Share everyday moments”
Snap: “Express yourself and Live in the moment”
TikTok: “Inspire creativity”
Strava: “Record over 30 types of activities with features to help you explore, connect and measure your progress”
Search may even be understood this way. Foucault’s Technologies of the Self help explain these confessional technologies. Norm Friesen writes
“Foucault defines technologies of the self as “reflected and voluntary practices by which men not only fix rules of conduct for themselves but seek to transform themselves, to change themselves in their particular being, and to make their life an oeuvre.” These are practices or techniques, in other words, that are both undertaken by the self and directed toward it. Specifically confessional technologies involve a deliberate and often structured externalization of the self, often with the help of a confessor or a confessional text or context.”
We share in seeking transformation with the help of a mediator – today TikTok, Twitter, Facebook, or ChatGPT. Each of these technologies enacts Foucault's totalizing double bind, offering users a sense of individual expression while simultaneously shaping that expression according to the platform's own controlling paradigms.
It's clear that interests in privacy can exacerbate Foucault’s double bind, as modern privacy is increasingly understood to be something like
an oligopoly of companies
they have large slices of your data
It’s fine because they don’t really share it with anyone
The data moats members of the oligopoly build from our confessionals grow stronger the more we share, and the more privacy is understood as protection from sharing rather than the individual exercise of control.
With our data taking on greater power in mediating our digital experiences with AI, we believe privacy as information control takes on even more importance – as a tool for free agency and resistance against our present totalizing double bind.
“The algorithm is effectively a black box and … it can be moved and changed at any time. And because people become so dependent on it, it’s actually changing and impacting the free agency we have. The only answer to this is not to work harder at open sourcing algorithms or making them more explainable … but to give people choice, give people choice of what algorithm they want to use … that they can plug into these networks.”
Jack's and Elon's call for greater user choice (“refuse what we are”) appears Foucauldian. If we are to resist the shaping of our subjectivity by algorithmic systems, we must have the ability to choose the applications we use and the algorithms they employ, without sacrificing convenience.
"The reason the Internet sucks now is because it has a different ruling class from the 90s-00s. A subject mindset now vs a citizen mindset then. Fix this, and you fix the Internet."
George Hotz tweeted last month.
A technology to enable applications with
inference or data interoperability from any AI model
onto any application surface
for any purpose
based on any context a user shares
without costing user control or convenience
could provide the free agency Jack seeks and the citizen mindset Hotz suggests.
This is what we’re building at Crosshatch – a button that can turn anything and everything more you.
Crosshatch aims to be this technology of the self – one that could subvert the double bind of modern tech platforms by giving users the ability to actively shape their own subjectivity, rather than simply being shaped by external forces. Data interoperability (e.g., via AI) could enable subversion of the totalizing lock-in of large platforms, as user context can be shaped by AI into whatever format or structure is required by a given application.
Deleuze and Guattari's insistence that "one wants and makes revolutions out of desire, not duty" is crucial here. For our proposed system of user choice and algorithmic transparency to succeed, it must be driven not by a sense of intellectual duty or purity, but by a genuine desire on the part of users. It must offer tangible benefits and satisfactions that draw people in, rather than relying on a sense of moral obligation.
Our experiences on the internet are shaped by the data collected about us by large platforms that subject us to their view of the context they happen to know about us.
Of course, we’re more complicated than any platform knows. A button of desire – that we push for new
convenience
connections
value
delight
could enable a new subjectivity – one that’s yours. Whenever it’s not, simply disconnect and push the button on a new application, making it instantly more you.
We can't wait to experience this new internet.
Stay tuned for announcements this summer.
Unverifiable
Personal AI is an identity. Digital identity is Personal AI.
Scroll to read
6/24/2024
The internet was born without an identity layer.
As we’ve previously written, many have attempted to build one, but doing so is complicated. Microsoft’s Head of Identity Kim Cameron – who helped build Microsoft Passport in 1999 – explains why in his Laws of Identity
Why is it so hard to create an identity layer for the Internet? Mainly because there is little agreement on what it should be and how it should be run. This lack of agreement arises because digital identity is related to context, and the Internet, while being a single technical framework, is experienced through a thousand kinds of content in at least as many different contexts, all of which flourish on top of that underlying framework. The players involved in any one of these contexts want to control digital identity as it impacts them, in many cases wanting to prevent spillover from their context to any other.
There’s a tension here today – while individual internet players want to prevent spillover of context, individual users want context spillover in ways that serve them. Today, each of us acts as a human “clipboard,” using our labor to translate context from one application to another when the task could clearly be offloaded to AI.
Applications refuse to help for the sake of our privacy.
We think identity gains particular importance in the age of AI. If we’re to have Personal AI, it’ll be based on who we are – our identity.
The representation we use will be of critical importance because AI operates on context – if we choose a bad representation, any Personal AI could feel shallow and unhelpful. So much of our digital identity stems from a Foucauldian subjectivity, where we leave it to major platforms to set the context and frameworks that construct our digital representations – boxing us into how others have observed us, in contexts controlled by others.
Every internet player also, tactically, has a different view of our digital identity – so any identity we build must interoperate with the systems, representations, and interfaces preferred by any player.
We also see efforts in Web3 to construct identity as a set of verifiable, possibly immutable attributes. For Personal AI applications, we see such efforts as misguided and even potentially dangerous.
We previously wrote that Personal AI is an identity. In modern internet platforms identity is upstream of most applications to make application state available across devices. This is how Apple makes your file system available across devices.
In this blog we’ll defend the opposite direction: Digital identity is personal AI.
Deep Real Identity
Prevailing narratives of identity center around namespaces and verifiable attributes.
It’s composed of self-sovereign verifiable immutable attributes
It should be of a namespace we control. We are “Crosshatch” not “CH” or “Cross Hatch”.
Large platforms have assigned us attributes – our digital identity is “the sum of all digitally available information about that person”
These narratives are tactically reasonable but lack philosophical depth to a point of being misguided.
Doc Searls suggests identity as Deep Reals, something that
is fully human, and can elicit any level of emotional response, as real humans tend to do.
is actually given by self-sovereign “standardized ways of letting others know required facts about us” e.g.,
"I’m over 18,”
“I’m a citizen of this country"
“I have my own car insurance”
“I live in this state"
“I’m a member of this club.”
We realize technology is a game of tradeoffs, but “standardized required facts” is clearly a far cry from anything “fully human.”
Identity is emergent
We take a firm post-structuralist view of identity: each of us is contingent, fluid and emergent.
“Identities are only simulated, produced as an optical ‘effect’ by a deeper play, that of difference and repetition.” Deleuze, 1968
We’re compelled by modern philosophy here, but we think this idea is pretty natural. On a run, Coach Angela Davis declares
“You’re either growing or you’re slowing down.
The old you might not have had it in you to handle this moment. But the new you? Is embracing the challenge. The new you is leaning into the challenge. Now is another opportunity that we will embrace. You know what it takes to win. You know how to get there. You’re starting to see how every big win sets you up for something bigger.
Spending time, energy or compute verifying an attribute of the old you while the new you is constantly emerging feels like an odd use of time and an affront to our capacity to become.
Almost nothing essential
Modern theories of reference are no more friendly to the idea of people as a composition of attributes.
If you’re going to assign an attribute to a person, it should be necessary, i.e., holding in every possible world in which the object exists. It should be essential, i.e., such that without this property the object would not be what it is.
Gold has atomic number 79. If not atomic number 79 then not gold.
Try to find essential properties of other objects or people and you’ll quickly find yourself frustrated. “Identity company from Miami” does not necessarily define Crosshatch.
That’s subjective!
We’d go further and note that example facts that might be assigned are likely subjective a la Foucault, i.e., arising from power structures that impose foreign-defined attributes onto us. If identity is to be self-sovereign, the sovereignty ought to extend to the very frameworks that assign any attributes we might accept as composing our identity.
The stakes of namespaces appear to pale in comparison to the stakes of any candidate Deep Real identity. Sure it’s sort of annoying that at CVS I’m identified by my old phone number even though I have a new phone number. That Doc Searls goes by ‘Doc’ and not ‘David’, even though he was named by his parents as David. Superman is Clark Kent. It doesn’t matter that Lois doesn’t know it yet.
We reject Foucauldian subjectivity induced by modern tech platforms – we’d rather actively construct our own. Endeavoring to own (or verify!) platform observations of us reifies their platform construct and the behaviors or dynamics they induce. We can never truly own the data about us if we can’t own the data generating processes themselves.
Attempting otherwise feels like Stockholm Syndrome.
Cruising
Privacy in public is an antiquated construct belonging to the unremarkable and culturally acceptable – we were not afforded any true cocoon of anonymity. Distressing stereotypes persist on the basis of how people look or act. Discrimination is not solved by staying out of sight.
Toward our own identity
These are not merely academic distinctions. To the extent our world is increasingly mediated via Personal AI, the context that powers this mediation should be truthful to us. This mediation already risks flattening human beings to a bland conformity. A pathway to protecting ourselves is one where the context AI has is truly necessary, not just idiosyncratic or of verified Foucauldian subjective realization.
We are not just the sum of all data collected about us, we are contingent images of it.
Accepting deep real identity as a collection of verifiable attributes assigned to us by large tech platforms feels both inhumane and not real at all. We are clearly not a collection of “required facts.”
But AI poses a natural path to construct sovereign contingent images. A kaleidoscope of emergent becomings.
Any collected context
from any framework that collects data about us
Projected onto any other surface
For purposes desired by us
If there is a self-sovereign digital identity, the best candidate today is Personal AI. Personal AI also solves the troubles identified by Cameron, as AI enables dual control of digital representation, enabling individuals to selectively share AI projections of themselves in whatever context they choose.
If an application requires a different representation – data interoperability is just a prompt away. This form of Deep Real identity is delightfully tractable, as it’s just
any AI
Any context
Any prompt
Any surface
operated under dual governance of application and user.
A widely cited 2016 framing of self-sovereign identity in Coindesk takes as its first principle Existence, namely that identity is based on a personal core that resists digitization. A Deep Real identity given by Personal AI need not contradict this; it can operate fully contingent on it.
Such a fluid digital identity need not deny applications where verified credentials are required. It could just rather be that digital personal identity operates on a plane separate from verified credentials, which could have altogether different requirements.
We’re excited to release our progressively self-sovereign digital identity this summer. One that's emerging and fluid.
And resists verification.
Thick and thin
AI unlocks malleable UI. It can be executed at the OS, browser or application layer. Why "thick" applications resist OS- or browser-led malleability, and why personalized AI will ship at the application layer.
Scroll to read
6/29/2024
AI unlocks ephemeral, malleable UI – UI that morphs depending on the context.
For this AI-powered malleability, where should this malleability-making AI application live?
At the OS-level?
The browser-level?
Or the application level?
Before proceeding, we should clearly delineate between the AI resource that might power malleability and the interface where the malleability might appear.
Language model AI resources are making their way to the edge – browser and OS – from rarefied cloud addresses. Where a language model is deployed seems a function of performance, availability, cost and security.
AI is a valuable commodity – available everywhere with tokens soon too cheap to meter – perhaps even marginally free, but for the cost of power.
This is why it’s important to set aside the location of the LM deployment and instead focus on the application of AI – the human interface where its outputs appear.
While there’s AI at
The OS-level with Microsoft Copilot and Apple Intelligence
Browser-level with Arc, Perplexity and Poe (we include the latter two who reportedly use headless browsers for scraping)
we say fundamentally all of these are AI applications, bubbling up outputs of AI resources – wherever they might live – on surfaces they own with the attention of their users.
AI introduces important questions of provenance. Since AI inference can make anything everything, much of AI reduces to questions of who owns what.
AI applications can then be divided by their philosophies around provenance – rightful or practical custody. Some use AI deeply
Microsoft Copilot on Microsoft Word
Box AI on documents in Box
Notion AI on writing in Notion
Apple Intelligence on apps via App Intents
on content and services over which they have explicit rights or connection. That is, users have signed terms that explicitly grant these services rights to provide services against uploaded User Content or against registered, explicitly legible actions.
Others
Perplexity and Poe on the Internet
Arc over sites
Experimental browser extensions
iOS18 Safari “Web Eraser”
apply shallow AI over content not explicitly authorized for use this way.
Tldraw’s engineer-in-residence Orion Reed, who created this experimental browser extension that
lets you pull apart any website, create views and transformations with LLMs, add new UI, pull in context from knowledgebases and APIs, and mix pieces of multiple websites together
called such application of AI a form of adversarial malleability
I've been calling this "adversarial malleability" as it makes the web more malleable with no action needed on the authors side in a way that is potentially against the business interests of the site + it's hard to stop you from doing it (without locking down the web entirely)
Irrespective of where you normatively land on the Right Way To Operate, it appears apps fall into one of two classes, one of which structurally blocks shallow AI applications from providing value users expect.
Thick and thin applications
We co-opt language from web development.
We’ll say an application is thick if it requires context of heterogeneous and stochastic state that is expensive or hard to know. We’ll say apps are thin if … they’re not thick.
For instance, Airbnb is a thick app.
It’s composed of inventory that really only Airbnb knows about. Its inventory quality may be changing – a room or a house that started out great might be next door to loud construction that just started. That quality is given by reviews and Airbnb has the most concentrated and protected collection of them. Getting data about Airbnb inventory is expensive e.g., last year they expanded their Anti-Bots engineering team that was
harnessing the power of AI, state of the art ML techniques, and data for the higher purpose of preventing scraping and protecting data.
On the other hand, NYT is a thin app.
Its inventory – its stories – are “all the news that’s fit to print.” A story’s quality is given by the article’s provenance – it’s on the NYT! No reviews required. Any shallow app – whether a browser, browser extension, or another web app – could technically take an article and refine it for use in another context. No other data is required to make it useful.
Making an email message "more formal" is a thin application. Re-rendering a page in the style of Barney is a thin application. Anyone – OS, browser, or application – could credibly execute it with only subtle nuance at most lost.
Applications that provide rich individual personalization appear to be thick.
But for thick apps like Airbnb, a shallow AI attempting to operate over Airbnb like
A Perplexity or a Poe
A browser or browser extension
would struggle because Airbnb is thick. All the requisite context that a shallow AI might need is not just on the page – it’s across pages, projections of what’s in Airbnb’s protected backend. User context it might need is scattered across applications; not visible from actions that such a shallow AI might have happened to already observe.
Apps like retailers with variable inventory are likely thick. Aggregator apps that operate over publicly accessible APIs are thin, as the inventory they operate over involves context that’s not hard to know. Just call the APIs they use. Or get the context once – rooms at the Westin Copley Place are probably the same this year as they were last.
Thick apps make it difficult for shallow AI to provide great user experiences because they involve context that’s just too difficult or too expensive to get. Thick apps are best positioned to offer compelling AI experiences themselves. It’s true that AI can change interfaces, but those changes are only surface level.
They do not engage the deeper context that only thick apps really know about. And perhaps that deeper context is really only theirs to wield.
Sovereign representation
Malleable or not – we think pure representations are those defined by the self; not cast onto others by foreign actors.
With software ever easier to create and incredible interest in the opportunities of AI, a lack of malleability in an application may increasingly reflect intentional expression rather than a lack of interest. If consumers demand AI-powered malleability, competition could drive change in the market. Adversarial malleability may not be required.
While some shallow AI provide aggregations that a single actor could not (e.g., providing a unified perspective from fragmented sources LA Times, Forbes and Washington Post simultaneously), such aggregation may also violate the agency and policy of an individual or organization that might explicitly restrict such activity.
Either way, at Crosshatch we’re interested in a malleable internet made hyper-personalized – making every app a thick app, by incorporating headless hyper-personalized AI in a way individuals and businesses control. We think malleability defined by everyone – a decentralized malleability – is more interesting and vibrant than malleability defined by a few. It also seems likely to be deeper, as the malleability emanates from the self rather than by others.
Decentralized malleability is really core to what Crosshatch is about – we want to make the internet more you. Only you can define who you are, and who each of us is resists attribute-based identification defined by others. Anyone telling us who we are or what we’re about is bound to feel shallow. Personalized AI must be decentralized and emanating from the self.
We believe personalized AI will appear everywhere (and as a result is an identity). While OS- and browser-led malleability are scalable and require little to no cooperation with others, for the open web, we’re most excited about a malleable internet led by the application layer.
Applications (particularly thick ones!) know themselves best and are best positioned to recast their own identity remixed with their users'. This is why the application layer is such a promising place for rich personalized AI to manifest – it's closest to the people and organizations that create experiences and services we love.
Only you know who you are. Mediating third parties are just guessing.
A decentralized malleability invites everyone on the internet to participate in defining themselves with respect to everyone else, rather than having a few companies telling everyone what and who everyone else is. Decentralized malleability takes more work to come by, but we believe it’s a path to a richer internet and deeper relationships between people and businesses.
Oneself as another
Identity and the Other are mutually constructed. They pass into one another. How privacy absolutism could rob us of a rich identity.
Scroll to read
7/10/2024
Everyone talks about privacy.
It’s usually left undefined.
But we still see privacy absolutists, who appear to believe under no circumstances should you share anything with anyone.
We think this position is impractical and philosophically misguided. In this blog we’ll explain why.
Everyone is a privacy rationalist
Everyone talks a big game on privacy.
But the term is usually left undefined. It’s hard to pin down a specific definition.
Today, it seems like rationalist ‘privacy as control’
Privacy is the strategic sharing or shielding of information
prevails. It’s nicely tractable.
It’s reflected commercially, for instance on iPhone
“Apps mind their business not yours”
But you can share your location Always, While Using, or Never.
It’s convenient to have your Uber come exactly where you are, even when you’re not using the app.
Even location sharing aside, hailing a cab with a human driver could be said to violate your privacy. Telling a driver where you’re going could leak information about yourself.
But of course we all hail Ubers and cabs and tell drivers where we’re going. [We tell them the exact address not three blocks away as an anonymization mechanism.]
Most of us navigate privacy choices pragmatically, opting for convenience, expression, and even a Foucauldian 'confession'. [We'll write about that in a future blog post.]
But for its nebulous nature, some take an absolutist stance on the issue.
Absolutely difficult
Sometimes we also see a Privacy Absolutism of the form
Everything online should be anonymous.
Don’t get us wrong.
Third party cookies made the internet creepy. It’s kind of weird for many people you’ve never met to have data about you (some of it’s incorrect??) stored in an enterprise CDP. [But it’s definitely privacy preserving!]
At the same time, general anonymity isn’t really possible in most parts of public life. It’s even unclear what a general anonymity really means – that you could go in public and no one would be able to know anything about you??
We’re not sure that’s even possible, particularly for underrepresented or persecuted groups. The way you walk. The way you make eye contact. Whether explicitly stated or otherwise, these are forms of communication and betray an absolutely anonymous self.
Hedge funds even institutionalized this game with alternative data.
Tell me you beat the quarter without telling me you beat the quarter.
Identity is co-created
Privacy Absolutism as state or corporate policy seems to challenge development of a rich identity.
We’re deep into post-structuralist philosophy, and Paul Ricoeur’s theory of the self e.g.,
a narrative propelled by conviction mediated by interpretation
feels intuitive. Our sense of self is not fixed, but constantly evolving through our interactions and interpretations—both our own and others'.
This process of identity formation relies on our ability to share aspects of ourselves and receive feedback from the world around us.
For Paul Ricoeur, in his Oneself as Another,
"selfhood of oneself implies otherness to such an intimate degree that one cannot be thought of without the other, that instead one passes into the other, as we might say in Hegelian terms."
where for Hegel
"Truth is found neither in the thesis nor the antithesis, but in an emergent synthesis which reconciles the two."
For Ricoeur, identity is the emergent synthesis of the self and the other.
Absolutist privacy – other than when desired by the individual, however effective (and particularly when declared by a corporation or the state) – can rob us of emergent identity by restricting exposure to the other and its interpretations of us.
If, per Ricoeur, our selfhood is a narrative mediated by interpretation – that of ourselves and others – then when we cannot share aspects of who we interpret ourselves to be, we may fail to find the emergent self we may yet become.
We believe in “privacy as control” for many reasons, but Ricoeur’s Oneself as Another perhaps presents our deepest reason.
To become, we share what we want.
And that’s why we’re building an identity layer for personalized AI.
Personalizationomenon
Personalization today is expensive and bad. It's not a skill issue: no one has the context. To get personalization from the movies, we need a radical shift.
Scroll to read
8/2/2024
In 1994 sites couldn’t remember us across sessions. The New York Times recounted in 2001
"At that moment in Web history, every visit to a site was like the first, with no automatic way to record that a visitor had dropped by before. Any commercial transaction would have to be handled from start to finish in one visit, and visitors would have to work their way through the same clicks again and again; it was like visiting a store where the shopkeeper had amnesia."
Lou Montulli at Netscape introduced the cookie, and we ended up tracked everywhere we went by organizations we’ve never met.
The Golden Rule of Cookies didn’t last long.
But now AI poses incredible capability to personalize anything – just add some context and a prompt.
But (AI-powered) applications don't remember us across applications (even those using the same underlying AI services!). Despite all its capability, we’ve found AI with the same type of amnesia the internet had in 1994.
Every time we encounter a new application we must re-introduce ourselves. Just as we did pre-cookie, we're expected to act as a “human clipboard” and use our own labor to go through the same onboarding flows, share all the requested information (with Terms of Service that no one (should have to!) read) so that a new digital service can wield estimations of who we probably are based on limited fragments they could get their hands on – consented or not.
The largest and best situated players tell us this is For Our Own Good – for The Sake of Our Privacy – while they in turn launch and grow massive ads businesses for their own benefit.
In this blog we’ll contrast how personalization systems have historically been built, and how, perhaps paradoxically, AI can unlock 10x better personalization that’s cheaper to implement and actually more private.
I wish I knew how to quit you
We all know identity on the internet doesn't really work.
And not to get pedantic (read: please pardon this pedantic pageantry) but it seems like defining what personalization actually is could help us speed run to a solution.
So many – particularly in web3 and EU policymakers – get lost here.
For us, personalization is about identity. It’s about aligning the world to us, and perhaps finding alignment in ourselves with the world.
It feels like personalization can actually go pretty deep e.g., following Oxford philosophy professor Timothy Williamson
“Knowledge and action are the central relations between mind and world. In action, world is adapted to mind. In knowledge, mind is adapted to world. When world is maladapted to mind, there is a residue of desire. When mind is maladapted to world, there is a residue of belief. Desire aspires to action; belief aspires to knowledge. The point of desire is action; the point of belief is knowledge.”
Personalization appears to be a flexing of
world to mind (desire to act to project ourselves onto the world)
and mind to world (belief to knowledge to align what we believe to what is true about the world)
[Given recent FTC action on “surveillance pricing”, we say for posterity that this flexing need not imply personalized prices!]
Personalization as a dual relation of mind to world is obviously about identity. We suppose it is composed of three components:
Identity
Context
Control
The internet really still has none of these.
I am who I think you think I am
Personalization starts with identity.
Applications today cobble together identity from third party cookies (t̶h̶a̶t̶ a̶r̶e̶ g̶o̶i̶n̶g̶ a̶w̶a̶y̶ are probably still going away) and increasingly logins.
Stopgap “id bridging” techniques soon won’t work on 4 in 5 browser sessions with Safari “Private Relay” and forthcoming Chrome IP Protection (that’s been in the works for a while!).
Apple’s iCloud Private Relay hides IP address of browsing activity on Safari, limiting effectiveness of ID-bridging strategies.
Customer Data Platforms "CDPs" are worthless without consistent identity.
B2C companies responded this spring by adding log in and loyalty CTAs toward a new Logged In Web.
B2C companies accrue little advantage from usage by users who don’t convert and whose identity they cannot track. AI-motivated scraping has exacerbated this issue, with anonymous users now scraping brands’ valuable data, imposing heavy network loads and extreme bandwidth charges while providing no commensurate value in return.
My reality is just different from yours
Next for personalization, you need context.
First party context isn’t that much on its own, so many have augmented it with data provided by “Data Management Platforms” (“DMPs”).
It’s not immediately clear what Data Management Platforms are. Wouldn’t they just be Customer Data Platforms?
Actually – no.
It turns out that, while they represent similar data, DMPs contain third party data (like that provided by an Epsilon or an Acxiom) while CDPs contain first party data. Enterprise CDPs often sell subscriptions to DMPs that enrich CDPs with third party data from DMPs.
Where do DMPs even get their data?
Some brag they have data on over 200mm Americans – including transactions, clickstream, and even health data. They have sold the legitimacy of their data collection claiming their records are anonymized or privacy preserving ostensibly because they’ve removed PII (even as brands use related identifiers to match to their own first party data).
Either way, the FTC isn’t buying it.
Rightfully so – removing PII leaves plenty of avenues for reidentification that we’ve collectively known about for at least a decade.
Some in the industry believe DMPs will be regulated away, even though activations of this data constitute core components of adtech e.g., Trade Desk margin, per Ari Paparo.
Yeah, ooh, control
Finally, personalization needs some control layer – one that mirrors how we engage in real life. For instance, we're different when we're with our friends than we are at work, with our kids, or out for a run.
And no one needs to say – we both
don’t have any control on the internet
and yet we also have too much (Thanks EU!)
which practically leaves us with none at all.
There’s all sorts of deeper writing on why we should care about control on principle. The most straightforward of which is J̶a̶n̶e̶t̶ J̶a̶c̶k̶s̶o̶n̶ John Stuart Mill
“Society can and does execute its own mandates: and if it issues wrong things with which it ought not to meddle, it practices a social tyranny more formidable than many kinds of political oppression, since, though not usually upheld by such extreme penalties, it leaves fewer means of escape, penetrating much more deeply into the details of life, and enslaving the soul itself. Protection, therefore, against the tyranny of the magistrate is not enough: there needs protection also against the tyranny of the prevailing opinion and feeling; against the tendency of society to impose, by other means than civil penalties, its own ideas and practices as rules of conduct on those who dissent from them; to fetter the development, and, if possible, prevent the formation, of any individuality not in harmony with its ways, and compels all characters to fashion themselves upon the model of its own.”
We're most compelled by control as exercise of social identity: the way we change ourselves depending on the context. What's for sure is that on the internet we definitely can't do that today.
In industry we've all (at least implicitly) recognized these challenges, and built bloated bandaid infrastructure workarounds that
feel icky
face regulatory pressure
aren’t living up to promises
Building personalization systems is expensive, and they’re not even that good.
Not second best but very, very expensive
Building personalization and consumer data activation pipelines is an expensive proposition.
It requires
Instrumenting and assembling consistent events data into one place
Integrating this data into complex personalization systems powered by expensive data warehouses and headcount
Training, deploying and maintaining models
Workflow for AWS Personalize
For the business, these workstreams are entirely speculative. It is a bet of magical thinking that – IF
a business just collects enough data
and they're able to build and deploy algorithms
placed in just the right places
THEN it will eventually lead to improvements in important KPIs like LTV, AOV, NPS, retention or engagement. These exercises thrived in ZIRP, but businesses are taking a hard look at them today.
And to top it off, these exercises require coordination from
Product
Marketing
Eng
Data Science
who, in our experience, are stakeholders who don’t always enjoy working together.
Product and Engineering prefer longer time horizon projects: those that create new products. Marketing and Data Science are often interested in optimizations over products or even step on toes with products themselves (e.g., Zillow Zestimate or Retail Media businesses). Who owns these efforts can feel confusing, as many stakeholders could appear as rightful owners, while the whole involves a markedly diverse set of skillsets that few have (or have the opportunity to develop).
Aside from being expensive and a slog to build, our core beef with this style of personalization is that it’s not even that good. CMOs are increasingly saying first party data is “meh.”
The story of real-time personalization – “if you just know your customer” – based solely on first-party data has wholly failed to materialize.
We’ve heard it since at least 2018, and we’ve yet to see it.
We even attempted it ourselves at Walmart. Walmart likely has one of the best first party data assets in the world. There's a store within 10 miles of 90% of all Americans and it's a consumer staples business selling groceries – people have a reason to shop there often. And unlike many assume, Walmart data is pristine – we’ve seen it ourselves. Perfect records of every order flowing into a warehouse with well-organized catalog ontology.
We attempted this in the simplest way possible – starting with grocery, where we knew people already shop pretty regularly. With this incredible data asset and a former quant from the dominant Renaissance Technologies leading the initiative, what could go wrong?
What we found is that people are just idiosyncratic, even within a staples business like grocery. Their schedules and needs are changing. We couldn't predict even half of the basket with acceptable precision or recall.
Walmart's standard is saving busy families time. Personalization that delighted or saved time wasn't possible even with Walmart's incredible set up. Expanding scope to consumer discretionary like travel, retail or real estate, personalization seems impossibly hard without a fundamental change in dynamics.
We exist in the context
Today’s failure of personalization is not an infra or skill issue. There’s great infra for personalization today.
It’s one of access to context.
We spin our wheels (and compute!) to just try to figure out who someone is based on limited information when we could just … ask. People are willing to share data for value in return.
And with AI’s incredible capabilities becoming more powerful, accessible, and affordable – the limiting constraint of context becomes even more pronounced.
We need a personalizationomenon. Headless personalization. Instead of expensive and speculative personalization efforts, what if we could mash together different components
First party context or application state
Context users bring with them
Some instructions for what to do with it by user and developer
to get an AI-produced interface, recommendation, or re-ranking?
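Here’s a rough sketch of what that mash-up could look like in code. Everything in it is an assumption for illustration: `callLLM` stands in for whatever model endpoint a developer would actually use, and the input shapes are not a prescribed schema.

```typescript
// Illustrative sketch of headless personalization: combine business context,
// user-shared context, and instructions into a single LM call.
// `callLLM` is a placeholder, not a specific vendor API.

type Item = { id: string; title: string };

async function callLLM(prompt: string): Promise<string> {
  // Wire up your preferred chat-completion client here.
  throw new Error("not implemented in this sketch");
}

async function personalizedRanking(
  firstPartyItems: Item[], // application state the business already has
  userContext: string[],   // context the user chose to bring with them
  instructions: string     // what user and developer want done with it
): Promise<string> {
  const prompt = [
    instructions,
    "User context:",
    ...userContext.map((c) => `- ${c}`),
    "Candidate items:",
    ...firstPartyItems.map((i) => `- ${i.id}: ${i.title}`),
    "Return the item ids as a JSON array, best match first.",
  ].join("\n");

  return callLLM(prompt); // e.g. '["sku_42","sku_7"]'
}
```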
What’s interesting about this framework (one that intermixes privacy and collaboration) is that this mashup of contexts
User-owned
Business-observed
can happen in an environment of a trusted third party facilitator. It’s like a consumer- and AI-powered clean room.
We think this personalizationomenon is far superior to clean rooms. It
has more data
enables cheaper data activation and integration
unlocks more expressive applications than just analytics or cohort-based targeting
leaves consumers in control
Traditional clean rooms, on the other hand, require complex partnership-led integrations that are really only available to the largest players. If you don’t have a lot of traffic, no one is going to want to create a clean room with you.
And for all the discussions of privacy, clean rooms don’t really have consumer stakeholders. They do in theory – if something bad were to happen and people were to find out. But consumers aren’t usually consulted as a part of clean rooms or data consortia. Nor would they really want to be.
The kind of notice consumers get in the EU for being part of a clean room data consortium, even among premium publishers. See it for yourself at BBC’s Good Food.
Clean rooms are like the new third party cookie, but worse. They're just weird. Users can't delete them. Privacy is principal in their messaging, but clean room companies sell to businesses. Consumers are not stakeholders. Businesses only want privacy to the extent they can say they Value Our Privacy and use a Clean Room and Clean Rooms are Privacy Preserving.
Data collaboration with consumers, on the other hand, just makes sense.
Let consumers bring their data with them and activate it on any surface, for any purpose, with the help of a trusted AI. 'Rationalist privacy' – sharing a bit of context for commensurate value in return – is the governing paradigm of privacy for most Americans.
This is what we’re building at Crosshatch. We make it straightforward for businesses to add a Link to their app and invite consumers to log in with their context for usage within a particular use case. We then stream this live consumer data to a consumer-managed wallet.
Once consumers 'ok' Crosshatch in a tap, businesses can then request short-lived authentication tokens that allow them to call Crosshatch-secured language models. Using these tokens, in a single API call they can get personalized output from their favorite language models, adding context to language model prompts programmatically and in environments whose rules are known at the outset. This allows businesses to generate personalized outputs like interfaces, rankings, or recommendations, all without building any infrastructure or directly accessing or storing the raw user data themselves.
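To make that flow concrete, here’s a minimal sketch. The hostnames, endpoint paths, and field names are hypothetical placeholders rather than Crosshatch’s published API; the sketch only mirrors the shape described above: exchange a connected user for a short-lived token, then make one call whose prompt gets the user’s context added on Crosshatch’s side.

```typescript
// Hypothetical endpoints and fields, shown only to illustrate the flow.
async function getPersonalizedSection(userId: string, candidates: string[]) {
  // 1. Request a short-lived token for a user who has connected via Link.
  const tokenRes = await fetch("https://api.example-crosshatch.dev/tokens", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CROSSHATCH_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ user: userId, useCase: "homepage_ranking" }),
  });
  const { token } = await tokenRes.json();

  // 2. A single call to a Crosshatch-secured language model. The user's
  //    context is injected into the prompt on the other side; the raw data
  //    never reaches the business.
  const completionRes = await fetch(
    "https://api.example-crosshatch.dev/completions",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        prompt: `Rank these for this visitor: ${candidates.join(", ")}`,
      }),
    }
  );
  return completionRes.json(); // a personalized ranking, blurb, or interface
}
```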
And it uses AI for exactly what it's good at, following Slow's Sam Lessin:
Large Language Models are good at Expansion, Compression and Translation.
These are all of the ingredients of personalization: expanding, compressing and translating context from one instance
you liked a photo in Madison Square Park and you purchased at Sant Ambroeus
to another
you might like the Edition Hotel.
Compared to how personalization is currently done, this feels like fresh air.
Personalizationomenon is a shift in how we approach personalization. Instead of businesses struggling to cobble together limited data and build expensive, siloed systems, this new paradigm enables a seamless flow of permissioned data, activated by AI at the moment it's needed.
For consumers, this means personalized experiences that remember preferences across platforms, without compromising on privacy or control. For businesses, it means more effective personalization at a fraction of the cost and complexity. And for the internet as a whole, it means a more open, user-centric ecosystem, where innovation can flourish without being stifled by walled gardens.
We were promised flying cars and instead we got 140 characters. We want a hyper-personalized internet: one that rolls out the red carpet for us. We were promised fanciful new technology.
What we really need is a personalizationomenon.
Private dancers
Privacy has been a tool of marginalization. It commodifies and constricts identity. Technologies of the self – rituals of self-actualization and truth – are the cure.
Scroll to read
8/4/2024
Rationalist privacy reigns in America.
People readily trade information about themselves in exchange for value, like saving time or money, or just expressing themselves.
Survey after survey reveals a majority of Americans – particularly younger Americans – will trade information about themselves for better or more convenient experiences online.
For instance, in January this year open internet ads platform The Trade Desk surveyed over 2000 people and found
74 percent of respondents surveyed say they would be willing to share personal information with brands and retailers when prompted.
Most consumers were interested in sharing data if it could give them access to better prices.
This is consistent with our own research in the area. Last fall we surveyed 500 Americans ages 18-65, asking them a variety of questions on their attitudes around personalization
How personalized does the internet feel to you today?
Would you like the internet to be more personalized?
If it were safe would you share data to unlock a more personalized internet?
When restricted to Gen Z – respondents aged 18-27 – we found that over 70% would share data with their favorite sites. [Email us for access to the results of our survey.] Younger Americans are increasingly sharing everything online
frustrations with dating
struggles with finances
Social as commerce
We’ve previously written about the privilege, challenge and costs of privacy absolutism.
In this blog we trace a history of the powerful using privacy as a mechanism of subjugation and contrast it to the power of confession as a principal pathway to self-actualization.
Privacy to marginalize
Privacy is complicated.
It’s been over a century since Warren and Brandeis declared The Right To Privacy.
Despite years of attempts, a unified theory of privacy has proved elusive except perhaps economists’ ‘privacy as control.’ [We trace this history of attempts in a previous blog.]
We see privacy in a Mill-ian sense, where privacy can be used by the individual as protection
against the tendency of society to impose … its own ideas and practices as rules of conduct on those who dissent from them … [and] prevent the formation, of any individuality not in harmony with its ways
Privacy can be a shield against threats of domination.
But in cases of power imbalances, privacy can also be used as a dominating force of subjugation, control or marginalization.
In realms of privacy like family units, an institution of privacy can act as a force of domination that keeps the disempowered voiceless and isolated; where privacy of the family unit trumps the liberty of the individual.
“Don’t Ask Don’t Tell” is another example.
A person's sexual orientation is considered a personal and private matter … Homosexual conduct is a homosexual act, a statement by the applicant that demonstrates a propensity or intent to engage in homosexual acts.
Homosexual conduct is grounds for barring entry into the Armed Forces.
In this case, privacy is used by an institution of power to impose its own practices on those who – perhaps even by God – dissent.
In this case, privacy is not an object of individual agency but one of marginalization. It is the insistence that there are certain truths or characteristics that must be relegated to private spaces; that cannot be observed or acknowledged as to otherwise violate a sanctity of an otherwise pure organizational unit.
This flavor of privacy echoes the privacy commercialized by big tech. It’s reasonable to imagine that objects we deem private – our email or photos – remain ours alone. But it’s interesting to see other aspects or objects of ourselves caught up in corporate privacy narratives that aren’t clearly also understood to be private.
On one hand it’s obvious that the way consumer data drifts freely across the internet is problematic. It’s weird that a company most people have never heard of has spending, behavioral, and even health data from over 200 million Americans – likely acquired through backroom deals no one wants you to know about.
But some paths in digital privacy appear to come at a cost so great it might begin to trade off other things we actually value quite a bit too. [Like the Open Internet.]
Eric Seufert calls this Pyrrhic Privacy
Pyrrhic privacy goes far beyond what is necessary to fortify consumers’ data, and it compromises the foundation of the open internet. Ridding the world of targeted, personalized advertising would impose enormous costs on society.
Eric continues in an earlier article
If Apple wants to provide consumers with real choice around how their data is collected and used, it should explain why their data is collected and used, and for what purpose. Would most consumers opt into ad tracking … if they knew that doing so provided for their favorite mobile apps to be made available for free? The community of app developers that propelled the App Store to half a trillion dollars in revenue in 2019 deserves to know the answer to that question.
If AI collapses the value of software to the cost of compute, privacy is a final boss that allows accrual of rents for compute on data over other compute. To be clear, we are targeting this business, but we are doing so on the principle of individual liberty and an open internet rather than oligopolistic corporate control of data and attention.
Since the start of Crosshatch we have been fiercely opposed to wielding privacy as motivation for closed networks. [We aim to navigate our own possible tension here by a devotion to individual agency and all that may mean.]
Liberty is not achieved by defining select spaces safe for expression. It’s achieved through freedom and self actualization – by fiercely choosing to be yourself.
Privacy is rather a curious final boss to pick, as it is in conflict with generations-old rituals for self-actualization and truth.
These are my confessions
Since the Middle Ages at least, Western societies have established the confession as one of the main rituals we rely on for the production of truth.
The obligation to confess is now relayed through so many different points, is so deeply ingrained in us, that we no longer perceive it as the effect of a power that constrains us; on the contrary, it seems to us that truth, lodged in our most secret nature, demands only to surface; that if it fails to do so, this is because a constraint holds it in place, the violence of a power weighs it down, and it can finally be articulated only at the price of a kind of liberation. Confession frees, but power reduces one to silence; truth does not belong to the order of power, but shares an original affinity with freedom: traditional themes in philosophy, which a political history of truth would have to overturn by showing that truth is not by nature free—nor error servile—but that its production is thoroughly imbued with relations of power. The confession is an example of this.
Foucault writes, and continues
Truth is not by nature free - nor error servile.
For Foucault, truth is both liberation from power but also sanctioned by it, as it could not be produced if not for power. Our truths are produced by the self-actualization of the individual, but also only achieved through the mediation and interpretation of others.
Foucault treats confession as a Technology of the Self, an exercise of transformation toward knowledge, purity or truth. Writing is an example.
For Foucault technologies of the self
permit individuals to effect by their own means or with the help of others, a certain number of operations on their own bodies and souls, thoughts, conduct, and way of being, so as to transform themselves in order to attain a state of happiness, purity, wisdom, perfection or immortality.
It’s not a stretch to say tech’s largest social platforms are expressions of this idea.
Technologies today are broadly confessional. As we previously observed, each social platform invites reflection
Twitter: “What is happening?!”
Warpcast: “What’s happening?”
Facebook: “What is on your mind?”
Instagram: “Share everyday moments”
Snap: “Express yourself and Live in the moment”
TikTok: “Inspire creativity”
Strava: “Record over 30 types of activities with features to help you explore, connect and measure your progress”
As a quick aside, it's curious to see Warpcast in this list. With crypto’s “I own my data,” it’s not extraordinary to imagine the web3 crowd could believe sharing “what’s happening?” could be a violation of the self-sovereignty of individual data. That is, granting data access to a realization of ‘what’s happening’ to even a ‘sufficiently decentralized’ platform could violate the agency of the self, even if consensually shared.
The obvious absurdity of this reveals a misguidedness in data self-sovereignty that the web3 crowd is finally coming around to.
Indeed, by definition, technologies of the self are not platforms of privacy. They are platforms of self-discovery and transformation precisely because of their mediation by relations of power – by friends, strangers, or, with ChatGPT or Llama 3.1, servers.
We become by sharing.
It’s difficult to imagine a massive consumer internet platform that is not a technology of the self. Even private services on iPhone like Camera Roll, Notes, and Location Services have interfaces to share.
Private dancers
Privacy is indeed complicated.
But, except as decided by the self, we’ve seen privacy as a tool of marginalization and blocking of self-actualization.
People, platforms, and organizations tell us only certain places are safe to be ourselves. That certain behavior is not appropriate or cannot be shared for it’s unseemly or uncouth.
There’s no doubt that privacy serves as a tool of the individual as protection from a tyranny of an unaccepting society wishing for a softening of a long jagged tail.
But for platforms to insist that the self is not safe but on their turf:
I’m your private dancer
Dancing for money
Do what you want me to do
so only they can watch us and sell our attention feels like a bleak endgame to a commodification of our collective selves.
Taking privacy instead as an object specifically of individual agency and control – as a technology of the self toward self-actualization from deep within the soul, where we decide to dance wherever we want, to ends of our choice – has an optimism of empowerment otherwise lacking.
Love has a lot to do with it.
At Crosshatch, we envision a different paradigm - one where privacy is an individual's tool for asserting agency over identity and data. In this view, sharing is not a violation but a voluntary act, a technology of the self in the service of becoming. Confession, mediated by protocols of trust and reciprocity, becomes a path to richer self-knowledge and authentic connection.
Love has a lot to do with it. Love for the singular, irreducible human behind the screen. Love for the transformative potential of the technologies we've created. Love as an antidote to the instrumentalization of the self.
So let us dance where we want, with whom we want, to the rhythms of our own kind of music. That is the true promise of the digital age - not a retreat into defensive, fearful privacy, but a leap into the open air of self-expression, exchange, and actualization.
At Crosshatch, we're building the protocols for that future.
He-said she-said
For memory for AI, we don't just want her side of the story or his. Whatever you want to share, we want the whole story, unfiltered.
Scroll to read
8/10/2024
''Americans have grown accustomed,'' wrote Time, ''to weighing the word of one defendant against that of one plaintiff, the steadfast denial against the angry accusation: he said, she said.''
So wrote the NYT in its popular "On Language" column, which ran from 1979 to 2009 covering popular etymology. The quote came from the April 12, 1998 edition, unpacking the then newly popular "he-said she-said." Toward unpacking it, the article cited Georgetown linguistics professor Deborah Tannen, who explains the phrase is about
"cross-gender discourse'' -- different understandings of the same words.
The idea was that when a woman says, ''I have a problem at work,'' she means ''Listen to my problem''; a man takes that same phrase to mean ''Tell me how to solve the problem.''
The phrase was adopted into a movie of the same title "He Said She Said" starring Kevin Bacon and Elizabeth Perkins. Director Brian Hohlfeld explained
''Our film was about gender differences, but now the phrase seems to be more about who is telling the truth and who isn't.''
He's right. The latest meaning has nothing to do with dialogue (he said and then she said) and no longer refers to missed communication. Now the phrase most often means ''testimony in direct conflict,'' with an implication that truth is therefore undiscoverable.
We see this he-said she-said problem arising in AI in two ways.
First, some applications of AI memory only save AI interpretations of original utterances, not the utterance itself. Second, (AI) applications only see a limited slice of our context.
Both of these lead to a "he-said she-said" environment where truth is undiscoverable, either from lost opportunities for interpretation or not having the full context to begin with.
In this blog we'll unpack both of these dynamics and share what Crosshatch is doing about it.
You mean you remembered all that?
On June 13, 2023 OpenAI released GPT3.5-Turbo 16k, a model with a 16k token context window. This came after GPT3.5 shipped with a small context window of 4k tokens.
Small context windows spurred exploration into technologies that could manage AI’s apparently limited context.
Vector databases rose.
Two days later, Langchain hosted an event on AI memory, exploring paths like
Rolling conversation summary buffers
Reflection: using AI to decide what’s important and storing it to a (vector) database
We see this reflection mechanism deployed in the popular open source mem0 and apparently in OpenAI's implementation of memory. Mem0 allows developers to pass conversations through an LM to map conversation history to “memories” or "facts." For instance, "memories" can be produced by this default prompt (see their git)
Deduce the facts, preferences, and memories from the provided text.
Just return the facts, preferences, and memories in bullet points:
Natural language text: {user_input}
User/Agent details: {metadata}
Constraint for deducing facts, preferences, and memories:
- The facts, preferences, and memories should be concise and informative.
- Don't start by "The person likes Pizza". Instead, start with "Likes Pizza".
- Don't remember the user/agent details provided. Only remember the facts, preferences, and memories.
Deduced facts, preferences, and memories:
and a chosen LM.
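For concreteness, here’s a rough sketch of the reflection pattern itself (this is not mem0’s client, whose interface we won’t reproduce here): pass the raw utterance through an LM with a fact-extraction prompt like the one above, and store only the returned bullets.

```typescript
// Write-time "reflection": the LM decides what to keep; the original
// utterance is discarded. `callLLM` stands in for whichever model was chosen.

const FACT_PROMPT = (userInput: string) => `
Deduce the facts, preferences, and memories from the provided text.
Just return the facts, preferences, and memories in bullet points:
Natural language text: ${userInput}
`;

const memoryStore: string[] = []; // stands in for a vector or key-value store

async function rememberByReflection(
  utterance: string,
  callLLM: (prompt: string) => Promise<string>
): Promise<void> {
  const facts = await callLLM(FACT_PROMPT(utterance));
  // Only the model's interpretation survives this step.
  memoryStore.push(...facts.split("\n").filter((line) => line.trim() !== ""));
}
```

At read time an app retrieves from `memoryStore`, not from anything the user actually said – the lossiness discussed below.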
Of course, context windows have grown since last June. Commercial models (with context caching!) now have context windows of over a million tokens. That can fit about eight copies of J.R.R. Tolkien’s The Hobbit.
"Context windows to infinity” have seemed to relax the urgency of AI memory to a standard engineering problem, balancing latency, cost and quality.
These tradeoffs appear to be quite real, and for some applications, the quality lost by restricting memory to just these AI-found facts is too high.
Last week Chrys Bader and Sean Dadashi launched Rosebud, an AI powered journaling app for iOS and Android.
Fast Company covered its launch,
What if your journal could not only hear you, but also understand and respond to you?
Rosebud founder Chrys Bader reported that for Rosebud, pre-compressing “deduced” memories is too lossy.
In the context of chat, we find storing "facts" to be too lossy. Pre-compressing what someone says and storing it as bite-sized facts loses a lot of the nuance and context.
That's why at @joinrosebud we always share the original context with the AI, and while token-usage is much higher, the results are incredible. What would be an improvement is having a specialized, low latency LLM filtering layer to extract the most relevant memories from a larger dataset.
Chrys contrasts this question as one of
context at write v. context at read
To us, storing a “fact” instead of the underlying utterance strips the original of its fullest expression in other contexts. Chrys continued, saying that fixing context at write time
exercises judgment at the time of utterance, without knowing how it will be used in the future. Retaining the original utterance allows for more future dynamism and contextual relevance
Our sense is that memory for AI is no different than memory for the internet. If you see a user take an action on an object in a context (e.g., send a message with various metadata) – record that, just like we do for the web. This extended CDP-style representation appears well motivated both in practice and philosophically.
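As a sketch of what such a record might look like – assuming nothing about Crosshatch’s internal schema – the idea is to store actor, action, object, and raw metadata verbatim and defer interpretation to read time.

```typescript
// Store what happened, not what a model thought it meant.
interface ContextEvent {
  userId: string;     // who acted
  action: string;     // what they did, e.g. "sent_message", "liked_photo"
  object: string;     // what they acted on
  occurredAt: string; // ISO-8601 timestamp
  source: string;     // the application where it happened
  metadata?: Record<string, unknown>; // raw, uninterpreted detail
}

const example: ContextEvent = {
  userId: "u_123",
  action: "sent_message",
  object: "msg_456",
  occurredAt: "2024-08-10T14:05:00Z",
  source: "journaling_app",
  metadata: { text: "I have a problem at work" },
};
```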
This is an example of this he-said she-said problem, where a language model's first interpretation of an utterance as given by its memory deduction prompt, might not actually induce the right interpretation. Recalling from the NYT
The idea was that when a woman says, ''I have a problem at work,'' she means ''Listen to my problem''; a man takes that same phrase to mean ''Tell me how to solve the problem.''
Using a write-time memory, you risk losing opportunity to interpret the original “I have a problem at work” as you individually might, depending upon the context you have on the nature of the original utterance. Instead, the memory layer might incorrectly presume the meaning of the utterance "asking for help" when all the user wanted was for someone to listen.
This is why at Crosshatch, we use a memory representation – the Context Layer – that eschews storing “facts” and honors data provenance.
We just store what happened and leave it to an application to interpret. Context is contingent. [So is identity!] Opinionated memory layers like mem0 or OpenAI memory produce memories that are images of the underlying context, the memory prompt, and the LM – and thus produce facts that are at best not necessary (in a modal logic context) and at worst even incorrect!
In modal logic terms, these 'facts' fail to achieve the status of necessary truths - they aren't invariant across all possible interpretations or contexts. At best, they're contingent truths, dependent on a particular (and likely hidden!) interpretation at a specific time. At worst, they could be outright falsehoods when applied in contexts different from their original interpretation.
Data silos also reflect our he-said she-said reality, where different applications only know one side of our story.
Forget me? They can’t even remember me!
Interactions with AI today are like using e-commerce sites in 1994.
In 1994, to buy something online you had to do it in one uninterrupted session.
If you left, the site wouldn’t remember you and you’d have to start all over again. Sites had “amnesia” as the NYT put it in a 2001 piece.
Today, any meaningful engagement with (an AI-powered) app needs to happen with just one app or OS.
If you start with one – providing it some of your context – but want to pick up with another, that context doesn’t come with you. It is stuck with the original app. Even if you go to another app (using the same underlying language model!!) you’ve got to start from scratch!
Our labor to provide apps with context is a sunk cost: an investment of time that can’t be recovered.
It's like dating; it's just hard to start over again with a new person, as USV described last week.
Bill Gurley described this dynamic in a 2003 blog “In Search of the Perfect Business Model”
What if a company could generate a higher level of satisfaction with each incremental usage? What if a customer were more endeared to a vendor with each and every engagement? What if a company were always more likely to grab a customer’s marginal consumption as the value continued to increase with each incremental purchase?
This may be the nirvana of capitalism – increased marginal customer utility. Imagine the customer finding more value with each incremental use.
The customer who abandons increasing marginal customer utility would experience "switching loss."
The strategy that Bill Gurley described becomes particularly pronounced with AI, where marginal customer utility increases each time we provide an app with a little more context.
NewsRadio S02E14
We hope we can be pardoned for our distaste for ‘capitalist nirvana’ as a strategic limiting of consumer choice.
I was going to. But, then I asked myself, ‘Why?’
Today, state memory is bound to applications or the OS. It does not vary or grow independently of the applications that govern it. Application memory only grows when we put in the work in that application.
But why should it work this way?
We engage with tons of different applications, all of which manage their own records of their interactions with us – whether in a CDP, database, memory abstraction or vector store.
State should grow independently of any single application’s use.
We buy stuff at Walmart.
We go for a run in a park.
We chat with Rosebud.
We confirm a reservation on Resy.
We like a photo on Instagram.
Most of this took work from us. Engagement costs time. Creating data is labor. But this state can only be activated by the application where it took place.
We find this incredibly disappointing. So we're doing something about it.
Crosshatch provides a way for developers to personalize applications with data their users share. We believe the consumer context layer should update as we take action across applications, not as we use just one app. This can unlock a world where AI activations of user context are truly contextual and ambient.
Today, Crosshatch allows users to bring context with them. Crosshatch starts by sourcing context from apps with dev tools that already encourage developers to build new services with app data.
When a developer adds Crosshatch, the data connection is unidirectional. Crosshatch allows users to bring data with them, but doesn’t require applications to give up data they collect on the basis of user engagement.
Many people ask us if Crosshatch is actually contrary to the interests of the apps from which we source data.
This is a reasonable question, but – perhaps counter-intuitively – we actually don’t think so.
Rather, we think Crosshatch will have the opposite effect by making usage of connected apps more impactful. Namely, with Crosshatch, whenever you interact with a connected app, that app’s activity can be projected anywhere else on the internet.
Make Zillow more like your Pinterest.
Make Beli more like your Instagram and Chrome browsing.
So use Pinterest more, because your activity there can appear anywhere else you want it.
We recognize the work that goes into building a great user experience. We think data that’s created as a part of application usage is rightfully observed by the business that deployed the application. Users increasingly understand this dynamic and are okay with it, these days sharing more online than prior generations.
Taken all together, however, we believe that allowing users to bring application data to new apps is actually a net-positive for both users and apps.
At the end of the day, we are students of economics – we get the whole switching-cost dynamic and why businesses want it. Crosshatch can't just be good for users; it has to provide value for both businesses and users.
But normatively we hold firm that raising the walls to switch is not pro-user.
If AI is a path to accelerate – let us accelerate. Let's unify context and let users decide how and what to do with it, with whom.
Let’s deliver a higher level of satisfaction with each incremental usage ... across the entire internet.
Let’s build an internet more you.
Hot stuff
Personalization cold-starts and learning curves are over. Live cross-app context is now on tap.
Scroll to read
8/19/2024
Early customers use Crosshatch to do two things
Replace lengthy on-boardings with a tap
Enable activation of up-to-date user context
In the past, consumer companies have gotten to know users by
asking users lengthy questionnaires
watching users use their app (e.g., via CDPs)
These methods are challenging because they’re high friction, expensive and signal poor.
Onboarding questionnaires are nice in theory. The longer the questionnaire the more signal you might get about your customer. But that also increases friction: The more questions you ask, the less likely a user is to complete it.
Every time you ask the user to click you lose half of them.
(AKA why tutorials, splash screens, and lengthy signup flows are a bad idea)
A16Z’s Andrew Chen wrote in February of this year.
And while CDPs have been selling the promise of real-time personalization from first party data for the better part of a decade, all but the largest players are coming up disappointed.
The truth is, first-party data constrained to a single app’s context isn’t enough to meet consumer expectations. If you want to meet consumers where they are, you need the whole story, not just the limited slice you see.
Crosshatch solves both these problems with identity: allowing consumers to log in with all their context in a tap, and brands to activate it in just a few lines of code.
Unlike personalization solutions of the past, Crosshatch is cheap to implement and works as soon as your first user logs in.
No more learning curves
Suppose you want to ship a personalized user experience. How do you do it?
According to McKinsey, you must
Invest in customer data and analytics foundations. Find and train translators and advanced tech talent. Build agile capabilities. Protect customer privacy
This sounds like a lot of work! We’ve been there – shipping personalized experiences in the past required
instrumenting reliable events data over a lot of customers
associating that to outcomes you care about
training models that predict outcomes from event patterns
monitoring performance
This baggage leads many folks to ask if Crosshatch has a chicken-or-egg problem.
Their intuition? To ship personalization you need to aggregate a lot of user data first! Right?
Wrong.
Language models make activating user context a single API call.
With language models, the old ways of personalization feel bloated and strikingly old fashioned.
Crosshatch takes a radically different approach: When a user links their data to an app with Crosshatch, that app activates connected context through language models. This allows Crosshatch to deliver users and developers value as soon as a user links data, not after lengthy and expensive speculative build outs.
Where should a user stay in Paris?
curl -L 'https://proxy.crosshatch.io/chat/completions?api-version=2024-06-01' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Crosshatch-Replace: {"hotel_history" : {"select": ["entity_name", "entity_city", "order_total", "originalTimestamp"], "from": "personalTimeline", "where": [{"field":"entity_subtype1", "op":"=", "value": "TRAVEL_TRANSPORTATION"}, {"field":"event", "op":"=", "value":"purchased"}], "limit": 100}}' \
-H 'Crosshatch-Provider: gpt-4o' \
-H "Authorization: Bearer $USER_BEARER_TOKEN" \
--data '{
"messages": [
{
"role": "user",
"content": "Consider the users past hotel spending {hotel_history}. Based on this context, where might the user like to stay in Paris?"
}
]
}'
Call the Crosshatch API to find out.
Just define the relevant context hotel_history and choose a language model gpt-4o. The context hotel_history is populated, or 'hydrated', using the data a user links. Crosshatch keeps this data in sync, meaning that as a user engages across other apps, that context and the output of this API call automatically stay up to date with no work from developers or users. Context coming in hot!
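If you're calling the proxy from JavaScript rather than the command line, here's a minimal sketch of the same request using fetch. The endpoint, headers, and query mirror the curl example above; the OpenAI-style response shape (a `choices` array) is our assumption, not a documented guarantee.

```typescript
// Minimal sketch of the same Crosshatch proxy call via fetch.
// Endpoint, headers, and the Crosshatch-Replace query mirror the curl example above;
// the response shape (choices[0].message.content) is an assumption.
const hotelHistoryQuery = {
  hotel_history: {
    select: ["entity_name", "entity_city", "order_total", "originalTimestamp"],
    from: "personalTimeline",
    where: [
      { field: "entity_subtype1", op: "=", value: "TRAVEL_TRANSPORTATION" },
      { field: "event", op: "=", value: "purchased" },
    ],
    limit: 100,
  },
};

async function whereShouldTheUserStayInParis(userBearerToken: string): Promise<string> {
  const res = await fetch(
    "https://proxy.crosshatch.io/chat/completions?api-version=2024-06-01",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
        "Crosshatch-Replace": JSON.stringify(hotelHistoryQuery),
        "Crosshatch-Provider": "gpt-4o",
        Authorization: `Bearer ${userBearerToken}`,
      },
      body: JSON.stringify({
        messages: [
          {
            role: "user",
            content:
              "Consider the user's past hotel spending {hotel_history}. " +
              "Based on this context, where might the user like to stay in Paris?",
          },
        ],
      }),
    },
  );
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```

Because the Crosshatch-Replace query hydrates {hotel_history} server-side, the call stays the same even as the user's linked context changes.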
To ship hyper-personalization of the future, you don't need the McKinsey-recommended
expensive infrastructure
advanced tech talent
cross-disciplinary project teams
or hairy privacy infrastructure
Crosshatch compresses all of this to a single API with all the usual benefits
reduced customer acquisition costs by as much as 50 percent
increased revenues by 5 to 15 percent
increased marketing ROI by 10 to 30 percent.
To start personalizing on context users share, reach out to create an account.
Bill Gurley and the Sirens of Nirvana
20 years ago Bill Gurley found the keys to capitalism: the Context Moat. For the age of AI, it's a local maximum.
Scroll to read
8/31/2024
On June 10, 2003, with the economy “anxious and waiting” and looking for sustainable competitive advantages, Bill Gurley found the Perfect Business Model.
What if a company could generate a higher level of satisfaction with each incremental usage? What if a customer were more endeared to a vendor with each and every engagement? What if a company were always more likely to grab a customer’s marginal consumption as the value continued to increase with each incremental purchase?
This may be the nirvana of capitalism – increased marginal customer utility. Imagine the customer finding more value with each incremental use.
To keep customers engaged, there’s no need to directly impose switching costs onto consumers, “which can be seen as ‘trapping’ or ‘tricking’ the customer”. Increased marginal utility preempts the need!
Instead, the customer who abandons increasing marginal customer utility would experience "switching loss."
It was at this moment that Bill Gurley invented the context moat.
This became the perfect business model used by today’s biggest consumer businesses:
Collect usage data
Transform into better experiences
Increase switching costs
The trouble is – as AI is making increasingly clear – the context moat is a local maximum.
While AI capabilities are expected to continue increasing, even today’s distilled models are incredibly capable. In the context of personalization, the power of personal AI is bounded from above by available context. First party context, it turns out, is just not that much.
To be fair, personalization has been a fragmented exercise – requiring fragmented
Personnel: PMs, data scientists, engineers, designers
Systems: warehouses, pipelines, model frameworks, activation mechanisms
Data: first party or third party from DMPs
But AI can vertically integrate all of this: AI unlocks simple personalization programs comprised of just context and a prompt; executed at marginal costs headed to zero.
So while accumulating incremental usage data to grow an app’s marginal utility might have worked in the age of fragmented machine learning, it breaks in the age of AI.
It’s simply not ambitious enough.
Language models are serving “machine god from sand” but machine learning with first party data is giving “this one time at band camp.”
This blog explains how we got here, and what experimentation outside of this local maximum could unveil.
Context not data moat
Traditionally we think of data moats.
The story of defensibility from data started in the 2010s as commercial machine learning systems began to rise. To build these systems, you’d need lots of data. Consumer startups found defensibility in Gurley-styled consumer data assets used to create personalization models that made switching expensive for users.
But, paradoxically, the rise of machine learning systems atop these data moats is leading to their own demise. A16Z’s Martin Casado explained in 2019
Data has long been lauded as a competitive moat for companies, and that narrative’s been further hyped with the recent wave of AI startups.
But for enterprise startups — which is where we focus — we now wonder if there’s practical evidence of data network effects at all. Moreover, we suspect that even the more straightforward data scale effect has limited value as a defensive strategy for many companies
Martin’s observation for enterprise appears prescient for the age of large language models. LM makers have assembled all the internet’s data and trained generalized models on top.
UPenn’s Ethan Mollick observed a few months ago that specialist models – trained atop first party private data – quickly lose out to rising generalists.
Remember BloombergGPT, which was a specially trained finance LLM, drawing on all of Bloomberg's data? … GPT-4 beat it on almost all finance tasks.
Your special proprietary data may be less useful than you think in the world of LLMs…
Consumer companies, of course, trained their own models to predict favorites, what items customers might like to try next, or what sort of coupon will keep them coming back.
But with generalist language models able to activate any context, the candidacy of first party data to train proprietary models as a differentiated enterprise asset sours. Open-source “GPT-4 level models” like Llama 3.1 distribute under a permissive
non-exclusive, worldwide, non-transferable and royalty-free limited license
raising the relative requirements for training models internally.
The data moat is dead.
It’s the context moat that holds on and keeps switching costs high.
It’s pretty weird to think about companies competing to own a slice of our context. But these context moats have real stakes. To understand why, we need to look at the economics of context moats, particularly in the age of AI.
Economics of context
The "everything is an ad-network" world runs on two things: attention and context.
And the business of personalization is a simpler form of the business of advertising (see Google’s Hal Varian’s intro). With rising CAC and privacy technologies, it has increasingly attractive properties.
Personalization is about matching people who want to buy things to things you sell.
The business value of personalization grows with customer data and attention. This is distinct from advertising, whose context is fragmented across corporate lines.
For instance, a Shopify store advertising on Meta doesn’t know the users who see Meta ads, and Meta doesn’t readily observe the conversions that happen on Shopify as a result of its ads. In the personalization case, there’s no fragmentation as all of this context is first party.
This is particularly important because different types of context have different value. In the world of personalization in businesses that sell things, context relevant to purchasing decisions reigns. A static picture of a user doesn’t cut it – you need a constant stream of data relevant to the personalization exercise at hand.
For instance, while Facebook has a lot of user data, for advertising it’s rather dependent on conversions it (used to) observe on advertiser sites. Eric Seufert explains
Facebook’s first-party data — its knowledge of the ‘cat videos’ and ‘vacation pictures’ with which users engage — is actually not very helpful for use in targeting ads. Rather, what Facebook uses to great effect is its knowledge of which products users buy on advertisers’ properties: armed with that information, Facebook can create behavioral product purchase and engagement histories of its users that very effectively direct its ads targeting. What’s missing in the above assessment is that advertisers willingly and enthusiastically give that data to Facebook.
It’s important this data is specific and recent:
Recency plays an incredibly important role in the signal parsed from the data identified above. Not only must an advertiser know that a user has historically engaged and monetized with products following an ad click, but they must know that a user has done so recently: the older the data is, the less helpful it is in targeting ads to that user. Facebook’s ad targeting models require a constant supply of data — the off-property events stream — in order to retain efficacy.
This is why we’re seeing a great rise in retail media networks: everyone who has conversion data
Lowes
Chase
Walmart
Uber
Instacart
Doordash
CVS
has launched one, and investors are loving it. Selling attention by activating context is a great business, it turns out. [It could also be enshittifying everything.]
A fundamental problem is that context is fragmented and not evenly distributed. In some ways, this is good – we don’t want everyone to have all our context. In real life we share it selectively.
On the other hand, this fragmentation leads to suboptimal experiences and services for businesses and consumers. Meta needs conversion data but doesn’t have it (and instead cuts deals to get it). Walmart – who wishes to anticipate grocery needs – could use cross-app itemized receipt data but only has its own. Spotify – who wishes to provide more relevant user experiences – could use context from Apple Music, location, or browser history, but only has listening history. In response, companies stand up clean rooms to share consumer data, but it’s only available to the largest players and trades noisy “privacy preserving” data for political and compliance cover.
This cross-app interoperability could deliver surplus for consumers and businesses, but today we’re all restricted to the context any app has. This is the local maximum of Gurley’s capitalist nirvana. Surely we can do better.
Sirens of Nirvana
20 years ago Bill Gurley summoned the Sirens of Capitalist Nirvana. The ultimate business model where marginal switching loss increased with marginal usage.
Each incremental usage created a new data point that cumulatively made consumer switching wholly inconvenient. Even if a competitor had something better, the prospective loss of convenience appeared too great for users to contemplate alternative action.
And now, with language models able to operate on practically any length of context for marginal costs approaching zero, the game of accumulating consumer context to induce switching loss is laid totally bare.
Before language models, this conceit was hidden beneath layers of infrastructure and headcount – transforming Big Data into Machine Learning Models was only available to the technical, creative and bold. The complexity of activating context created another moat – only the largest and most sophisticated could do it.
But now, language models are available everywhere, for everyone – serverlessly, in your own GPUs, open source, probably soon in the browser and OS – and for any skill level: all you need is a well-formed prompt.
The technical veil that kept this capitalist nirvana from looking like a plain exercise in competitive data tracking is gone. Language models democratize the activation of consumer context – it's just a prompt away and available for less cost than purpose-built systems.
The personalization we saw in the movies anticipated our needs. It could use context about us to surprise, delight and save us time and money.
Consumer enterprise prefers the context moat. It’s been the basis for the defensibility of their business for the past two decades.
But clinging to it now will fail relative to the growing gravity of the data consumers can bring with them. Consumers have more data about themselves than any business does. Consumers are the sun.
What users gain – internet applications that specialize themselves for each of us – is at least normatively and likely economically superior to what any business might lose in getting to claim they have more context than a competitor.
At the end of the day, the user (and likely the business) pays the cost of clinging to Gurley’s capitalist nirvana. And language models lay this truth bare. There’s no reason for it any longer.
We’re entering a future where consumers can bring their context with them. With Crosshatch, consumers can allow apps to interpret their context on any surface in ways it benefits both business and consumer. Competition proceeds on provision of genuine value, not who knows a slice of you best.
This mutual gain is critical, as it enables personalization that’s free for users, a founding design principle of the internet.
One failure mode for Crosshatch is that businesses just reject this: why mess with a good thing? Well, the current paradigm isn't great for people. And, normatively, governments globally are requiring that businesses allow consumers to access their data.
We believe context interoperability is actually good for everyone. Our intuition follows Adam Smith:
Key to promoting efficiency in markets is transparency and access. Adam Smith in the Wealth of Nations wrote about this over 200 years ago. He contended that if the price of information is lowered or you make information free, in promoting such transparency, the economy benefits. Similarly, if access to the market is free, everybody gets to compete.
Gary Gensler remarked in 2013. People and the economy benefit when information flows freely up to the frontier of user choice.
At Crosshatch, we’re building the infrastructure to make this future possible. Businesses can invite users to link their context in a tap, making their context available for activation for mutual benefit and in a way users control.
The context moat is coming to an end.
To begin building in this new future of personalization, turning on hyper-personalized experiences in a tap with no learning curve or cold start required, feel free to create a developer account.
YMMVC
Extending the Model-View-Controller architectural pattern with AI-Bindings to make generative UI react to You.
Scroll to read
9/10/2024
The web started out still, but then it started to react.
This blog studies a history of reactivity on the web and how a reframing of the classical model-view-controller user-interface architecture pattern could unlock ultimate reactivity.
Still life
This is the world’s first website, published by Tim Berners Lee in 1991. (CERN released the underlying web software into the public domain on April 30, 1993.)
It’s a text document written in Hyper-Text Markup Language. It’s a page in a file system that’s accessible to browsers at its server’s IP address.
The page is static and read-only: a reflection of how the authors saw the world.
Hackernoon: What is HTTP?
It’s accessible via Hyper-Text Transfer Protocol: You request the page and you get it back. It doesn’t change based on your inputs here or anywhere.
Change with the times
In 1995 Boston TV’s WCVB’s Fred Dufresne created and patented server-side scripting so that web pages could be templated and dynamically constructed.
Figure 14 from US Patent US5835712A filed in 1996.
This was important for news sites like WCVB whose news articles shared structure but had dynamic content.
WCVB from 1998 via Wayback Machine.
As the world changed, WCVB wrote new stories.
Scripting allowed it to update its content independently from its web page structure. Whenever a user visited the site, HTML was rendered server-side and sent to the client.
At this point in time, the web’s view of the world came from outside the web.
“Sox CEO Pitches Ideas for New Ballpark Funding”
WCVB wrote about events happening in the real world and represented the news on the web.
This pattern enabled early reactivity, but it had two problems. First, whenever the world changed, the site loaded in the browser wouldn’t update. If you wanted fresh news, you’d have to refresh. The browser would then
go to the server
fetch data from the database
turn it into html
send it back to the browser
Ideally, web applications would automatically change client side – on your browser – whenever the data state changed.
Second, while news applications reacted to events happening outside the web, soon we’d want applications that could react to state changes from inside the web.
Change inside out
In 1995, eBay launched, allowing users to buy and sell items in a marketplace. This introduced new applications of templating – for accounts, carts, and product pages.
At this moment the web began to create its own objects to represent: commerce. It went from observations of general events sourced offline to those cataloged digitally.
It birthed new semantics: items for sale, pending, shipped, delivered.
This kicked off a flurry of new platforms enabling representation and mutation of the world and its digital representations – travel, social, real estate and e-commerce.
But all these platforms treated user state as known only to the application.
Change outside in
What if user state could be managed outside the application?
This started with providing user state outside the application session. In 1994 websites couldn’t remember users across sessions. Each time a user returned, the application started from scratch; state developed within a session was lost.
Cookies gave HTTP state across sessions: they allowed sites to implicitly identify users via a cookie identifier stored in the browser so that application sessions could be maintained on the server. That is, the web server associated user state with a cookie identifier.
This helped web pages remember users across sessions without logging in.
Third-party cookies enabled tracking user state across web applications, giving a more unified view of activities across many sites. This allowed sites to represent changes from outside their walls (e.g., activity on other sites) and apply them to services like personalization and ads on their own properties.
This made the web convenient but also a little creepy. When talking to one person we don’t usually imagine that conversation is being shared with everyone else.
But the systems to map outside activities to services within apps turned out to be fragmented and complex. These systems have two parts: sourcing and activating data – both challenging.
Pathways like cookies to source data outside in – now increasingly blocked – were often unreliable. Cookies refreshed. IP addresses have multiple users. Some applications used DMP data networks whose wonky backroom deals made reconciling identifiers across data partners equally error-prone.
Even the largest social networks rely on cross-app tracking: Meta replaced its pixel with the Conversions API to track conversions on its advertiser customers’ sites. Many imagine Meta has incredible data – it does – but the data most valuable to its ads business comes from cross-application contexts. Meta doesn’t know if its ads convert unless advertising applications tell it.
Activating outside data inside is also a challenge. Engineers combine these data flows in proprietary machine learning algorithms and systems that are expensive to engineer. Even if you have access to data representing change outside in, activating that data is a whole other question.
Time to change
To get to the final stages of reactivity, we return to the original user interface design pattern: Model-View-Controller
Wikipedia simplified diagram of MVC.
We follow Trygve Reenskaug’s original 1979 definition
Models. Models represent knowledge. There should be a one-to-one correspondence between the model and its parts on the one hand, and the represented world as perceived by the owner of the model on the other hand.
Views. A view is a (visual) representation of its model. It would ordinarily highlight certain attributes of the model and suppress others. It is thus acting as a presentation filter.
Controller. A controller is the link between the user and the system. It provides means for user output by presenting the user with menus or other means of giving commands and data.
Perhaps the only question, then, is who is the owner of The Model?
Or rather, what is the model of? The application’s view of the user? Or the user’s view of the world? Is it the user or the app that provides the interface?
For the web to become fully reactive – to flux – we believe it’s both user and app.
Crosshatch-App Partner architecture pattern.
Here we introduce the YMMVC model, an extension of the classical MVC pattern that adds Your Model. In it, the app controls its own local model, but users also have a global model and private AI resources that can be permissioned to any app they use.
Today, like the original server-side rendering of WCVB, if you want a view to update, you the user have to engage with the app to provide updates to its model. This costs time and labor, and in many cases you've already provided updates to other app models relevant to the present app. Apps should react to You: this pattern fixes this by allowing Views to update based on all Your Model context, not just your engagement with the application at hand.
Apps may expose their model to users to access and sync it. Apps like Strava, Fitbit, Whoop, TikTok, and Google already do this. This means that Your Model is a meta model over cooperative application Models.
Apps update their Views based on local and global (Your) models, depending on whose data model is most relevant and accessible to a view at hand.
Model-view binding – making data representation in the model legible to the View – is typically challenging, but AI can make integration natural. AI can translate combined app and Your Model context into new views with only a prompt mapping context from other application models to its own. AI is a powerful and inexpensive binding agent.
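As a rough sketch of the shapes involved (the type names and helper below are our own illustration, not a Crosshatch or framework API): an app keeps its local Model, receives permissioned Your Model context, and lets a prompt do the binding.

```typescript
// Illustrative only: type names and the callLanguageModel signature are ours,
// invented to sketch the YMMVC pattern, not a Crosshatch API.
type AppModel = Record<string, unknown>;   // the app's own local state
type YourModel = Record<string, unknown>;  // permissioned, cross-app user context

interface ViewBinding {
  prompt: string;                          // maps context to this app's view semantics
  context: { app: AppModel; you: Partial<YourModel> };
}

// AI as the binding agent: hand the combined context and the binding prompt
// to a language model and use its output as view content.
async function bindView(
  binding: ViewBinding,
  callLanguageModel: (prompt: string) => Promise<string>,
): Promise<string> {
  const hydrated = `${binding.prompt}\n\nContext:\n${JSON.stringify(binding.context)}`;
  return callLanguageModel(hydrated);
}
```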
Claude suggested we talk about privacy – we've discussed privacy at length elsewhere – we believe this pattern is both privacy-safe (only using context a user chooses to share) and consistent with how we already use user interfaces (I already tell TripAdvisor my travel dates and price per night I'm looking for).
Do it like the 90s
[We borrow this header title and the following quote from this Facebook presentation in 2014.]
“We should do our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program and the process as trivial as possible” (Dijkstra, 1968)
Unlike traditional UI frameworks, we take a Model less as a candidate specification of a View and more as context for the construction of any View, as prescribed by an AI context-view binding.
AI as a binding-agent is perhaps a maximalist response to Dijkstra’s wish.
In a traditional application, if you see a snapshot of state it should be straightforward to understand how state binds to the application view. But if you were to see a collection of context from state outside your application, you might not know how that binds to a View. But AI – equipped with relevant context (and a general understanding of the world) – does.
In the ‘90s we refreshed our browser to get updated content. In 2024, applications infer from Your Model context to get updated Views.
Taking AI to be a binding agent, you can recast your UI as a referentially transparent function over arbitrary context, not just context as defined in your application. Instead of templating UI with variables defined by your application, AI as a binding-agent expands the scope of applications to operate over any context a user will permission to your application.
Using Vercel AI, this is Crosshatch YMMVC in action.
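For a flavor of what that pattern can look like in code with the Vercel AI SDK (our own sketch, not the forthcoming Crosshatch SDK): fetchLinkedContext below is a placeholder for however permissioned Your Model context reaches your app.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Placeholder: however permissioned Your Model context reaches your app.
declare function fetchLinkedContext(userId: string): Promise<Record<string, unknown>>;

// A View recast as a function over arbitrary permissioned context:
// the prompt is the AI context-view binding.
export async function personalizedHomeCopy(userId: string): Promise<string> {
  const yourModel = await fetchLinkedContext(userId);
  const { text } = await generateText({
    model: openai("gpt-4o"),
    prompt:
      "Using this cross-app context, write one sentence of personalized " +
      `copy for the home view: ${JSON.stringify(yourModel)}`,
  });
  return text;
}
```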
We’re getting ready to publish a Crosshatch SDK for Vercel AI library to enable this pattern in a few lines of code. Sign up for access and we’ll keep you posted.
Call my agent
Before AI agents, there were user agents. To get personalized services, have their agent call your agent.
Scroll to read
9/15/2024
To deliver personalized services, services will call your user agent.
We've all heard this narrative. Why would they need to do this?
So far in internet history, apps have collected context about us and used that context to serve us.
They identified us with cookies or other identifiers and associated all the context they observed to that ID in what eventually became a CDP.
And indeed, CDPs like Twilio’s Segment make activating that data easy and startups like mem0 and zep enable companies to ship their own memory layers for their own AI applications.
So where does “your agent” come in? And why?
If you're using an AI service with a memory layer, do you need an agent? Stepping back, how do you even get an agent? And once you have one will it always have your best interests at heart?
May I speak to Susie Myerson please?
AI agents are evocative, but it’s hard to know what they really refer to.
We’ve seen many candidate definitions across the industry with e.g.,
LangChain’s Harrison Chase saying
An agent is a system that uses an LLM to decide the control flow of an application.
or Andrew Ng
There’s a gray zone between what clearly is not an agent (prompting a model once) and what clearly is (say, an autonomous agent that, given high-level instructions, plans, uses tools, and carries out multiple, iterative steps of processing).
But, per OpenAI’s Noam Brown, OpenAI’s new o1 is a model, not a system.
You can just prompt the model once (in contrast to Ng) and the model will "think" before providing an answer. OpenAI explains (and nicely explained by Letitia)
Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason.
This reminds us of BabyAGI from last year, but strictly speaking, Noam’s definition is in apparent contrast to Chase and Ng’s definitions, which seem to require that an “agent” be a language model in a loop.
So which is it?
Maybe … it doesn’t matter.
AI models are on a path to incredible capabilities. Instead of spending time debating the technical details of what makes an agent, perhaps we should spend more time concerned with how AI works for us.
A real user agent
In 1998 Tim Berners Lee defined the User Agent in human terms (in reference to the more formal 1996 definition)
Browsers and Email programs are user agents. They are programs which act on behalf of, and represent, the user.
15 years later he submitted a refinement
When does software really act as an agent of the user? One way to think about this is that the program does, in each instance, exactly what the user would want it to do if asked specifically.
This comes in contrast to the common form posited by practitioners in AI e.g., Bill Gates
This type of software—something that responds to natural language and can accomplish many different tasks based on its knowledge of the user—is called an agent.
Tim Berners Lee’s definition appears to align more strongly with how psychologists use the term, where agency is meant to
describe humans’ capacity to exercise control over their lives
Casting agents as those that do exactly what humans want seems like a more fertile path than describing AI technical capability, which could readily do all Gates describes yet violate the definition preferred by Berners Lee.
What an agent is has less to do with how a technology works, and everything to do with how a technology aligns itself to the wishes of those it works on behalf of.
Agent defined
Let’s follow Tim Berners Lee’s definition:
a user agent is a program that acts on behalf of, and represents, the user.
This program could be anything:
a python script
a browser
an AI model like claude-3.5-sonnet
chain-of-thought o1
or o1 in a loop.
An AI agent then, depending on its alignment with the definition above, could be just an agent. Whenever we refer to an AI Agent, we do so under Berners Lee’s definition, just where the program happens to be AI-powered but does exactly what the user wants.
It may well be that your user agent is managed by a third party. For instance, browsers are readily user agents, and browsers are developed and distributed by third parties.
Private AI labs appear to endeavor to be user agents. So far, however, their focus on alignment has been on a general form, rather than alignment to individual users as University of New Mexico Prof. Geoffrey Miller put it. [A competitive market and low switching costs is a meta way to deliver user alignment.]
Perhaps obviously, there exist legitimate user agents that are not aligned to you. Clearly, another user’s user agent acts on behalf of that user, not you. Your sovereignty and user agent alignment does not trump that of another. The same could readily be said of a business.
This raises a problem for agents.
My context, my rules
Businesses have built services accessible only via their application interfaces, not APIs.
You can book an Uber on Uber.
You can book a trip on Airbnb or Expedia.
You can buy groceries at Walmart.
We refer to these types of apps as Thick apps: those that have large inventories with variable and expensive to observe attributes. Retail, travel and real estate all satisfy this condition. These inventories are expensive to originate and maintain and these apps are opinionated about customer experience: how their inventories are maintained and made available to customers. Intermediated service provision is hard to make work.
An agent with AI then has two candidate interfaces into a thick app: authorized and unauthorized routes.
So far in internet history, we’ve engaged with apps through authorized routes. We send HTTP requests that public-facing web servers respond to, and load the web application in our browser. How much of that web experience is constructed server-side vs. client-side (on our turf) is an evolving question (and an active area we’re exploring for its privacy implications).
An unauthorized route might be sending a service to scrape all available data from a business and pull it into an environment you own. We’ve been suspicious of those routes – they’re expensive, slow, not always reliable and can violate the agency and sovereignty of others. Clearly your agent doesn’t have a right to another person’s context. You don’t have a carte blanche right to my context. Either way, Cloudflare is working on helping apps block this.
Users and apps can share context directly or through scraping. Neither is optimal.
While apps could expose an Agent API – services that make internal context legible to third parties e.g., as Jeremiah Owyang proposes – our bet is that isn’t how things play out. Entities shouldn’t have to reveal all internal context – which may readily have value in personalization or alignment efforts – in order to collaborate. The trust implications alone could raise concerns.
Instead, we’re bullish on controllable interfaces that
allow users to project outside context into app experiences (as in last week’s YMMVC)
allow apps to project app-owned services into other contexts
e.g., as in
Apple App Intents
Uber Deep Links
Apple App Intents make app functionality legible to iOS and Apple Intelligence.
Uber Deep Links allow third parties to start the Uber Ride Request Flow from anywhere.
These are real (and, in our view, elegant) commercial applications that enable app and context interoperability that don’t require uncontrolled exposure of private information or services.
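To make the deep-link flavor concrete, here's a small sketch of constructing an Uber ride-request link from a third-party surface. The URL and parameter names follow Uber's public deep link documentation as we understand it, but treat them as assumptions and confirm against the current docs.

```typescript
// Illustrative only: builds an Uber ride-request universal link.
// Parameter names (action=setPickup, dropoff[latitude], ...) are assumptions based
// on Uber's public deep link docs; verify against current documentation.
function uberRideRequestLink(opts: {
  clientId: string;
  dropoffLatitude: number;
  dropoffLongitude: number;
  dropoffNickname?: string;
}): string {
  const params = new URLSearchParams({
    action: "setPickup",
    client_id: opts.clientId,
    pickup: "my_location",
    "dropoff[latitude]": String(opts.dropoffLatitude),
    "dropoff[longitude]": String(opts.dropoffLongitude),
  });
  if (opts.dropoffNickname) {
    params.set("dropoff[nickname]", opts.dropoffNickname);
  }
  return `https://m.uber.com/ul/?${params.toString()}`;
}
```

The point isn't the specific parameters; it's that the app exposes a controlled entry point into its own flow rather than exposing its internal context.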
Apps aren’t about to (and shouldn’t have to!) give up core functionality or brand equity to third party agents.
Let’s put this in context for AI agents.
AI Agents in action
Putting this in context for AI, we follow Bill Gates
Imagine that you want to plan a trip. A travel bot will identify hotels that fit your budget. An agent will know what time of year you’ll be traveling and, based on its knowledge about whether you always try a new destination or like to return to the same place repeatedly, it will be able to suggest locations. When asked, it will recommend things to do based on your interests and propensity for adventure, and it will book reservations at the types of restaurants you would enjoy.
So how will this work exactly?
Bill Gates is obviously not talking about a world governed by first party data – however managed by CDP or memory layer for AI. He’s talking about services that can safely integrate all our context.
Clearly we can't copy over our context to every app that asks. And we can’t ask apps to copy over all their context.
So what’s the path?
We’re betting on the YMMVC pattern from last week's blog: users allow an app to project AI-transformations of their private linked context onto the app for an agreed purpose.
Such a pattern is consistent with App Intents and Deep Links and also how we engage IRL.
When a florist asks “what’s the occasion?” your response is not “you don’t need to know that, talk to my agent.” We say, “I’m buying flowers for my Mom’s birthday and she loves asiatic lilies.” Context flow is itself all contextual.
Is your AI agent allegiant?
Returning to the Berners Lee definition of agent
a user agent is a program that acts on behalf of, and represents, the user.
how do we know if an AI service with access to our context (e.g., Crosshatch) will act on our behalf (e.g., only satisfy requests corresponding to an agreed purpose) at the behest of another application?
Luckily, this is a canonical principal-agent problem!
In classical principal-agent language,
a user “the principal”
wants an app “the agent”
to personalize their e.g., ‘travel’ user experience to the ends of a superior travel UX and not for another purpose.
The YMMVC Principal-Agent problem.
Of course, the agent could have other motives. So the question is, how do you construct a mechanism such that both the principal and the agent are better off?
Following Google economist Hal Varian's Monitoring Agents with Other Agents
The principal-agent literature typically assumes that the principal is unable to observe the characteristics or the actions of the agents whom they monitor. However, in reality, it is often not the case that agents' characteristics or effort levels are really unobservable; rather, they simply may be very costly to observe.
In particular, simply because information is costly to the principal doesn't mean that it is costly to everyone.
Varian studies the multiple agency problem wherein a principal can deploy an agent to monitor the performance of other agents. The monitor really has two available strategies for inducing desired behavior by the performing agent
The carrot: make good behavior less expensive
The Stick: make bad behavior more expensive
In the paper, he shows that
a principal prefers a monitor who can reduce the costs of desirable actions rather than increase the cost of undesirable actions.
This is what the YMMVC model does. With Crosshatch, users link their context to apps for a specified purpose or “usage policy.” Since the principal is free to use any monitor she wishes, the threat of competition can induce monitor compliance. That said, Varian does not treat the issue of monitor alignment in this paper.
In the YMMVC framework, Your Model can observe requests made by the application agent, and check to make sure that requests comply with the agreed purpose. Your Model agent could even flag or block third party agent requests that appear to violate your usage policy – this is something Azure OpenAI already does out of the box.
This pattern corresponds with the Varian framework wherein your Crosshatch agent performs as a monitoring agent for third parties that perform personalizing action on your (the “principal’s”) behalf.
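A minimal sketch of what such a monitoring check could look like in code. The UsagePolicy and ContextRequest shapes below are our illustrative inventions, not Crosshatch's implementation; the point is only that an agreed purpose can be checked mechanically before any context is released.

```typescript
// Illustrative sketch, not Crosshatch's implementation: a monitoring agent that
// checks an app's request against the usage policy the user agreed to.
interface UsagePolicy {
  purpose: string;          // e.g. "travel personalization"
  allowedSources: string[]; // which linked datasets may be queried
  maxRows: number;          // cap on how much context per request
}

interface ContextRequest {
  purpose: string;
  source: string;
  limit: number;
}

function checkRequest(
  policy: UsagePolicy,
  req: ContextRequest,
): { allowed: true } | { allowed: false; reason: string } {
  if (req.purpose !== policy.purpose) {
    return { allowed: false, reason: "purpose does not match the agreed usage policy" };
  }
  if (!policy.allowedSources.includes(req.source)) {
    return { allowed: false, reason: `source "${req.source}" is not permissioned` };
  }
  if (req.limit > policy.maxRows) {
    return { allowed: false, reason: "request exceeds the agreed context limit" };
  }
  return { allowed: true };
}
```

A rejected request can then be flagged to the user or blocked outright, as described above.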
[To see this in action – check out our docs! Public beta launch coming soon!]
Where does your agent live?
It'd be nice if your agent lived locally: All your data and your agent in a device you control.
The trouble is that the AI services you might wish to equip your agent with likely do not fit on your device. For instance, Apple extends the capability of Apple Intelligence with Private Cloud Compute, AI services that live in Apple's cloud, not locally. OpenAI services are available in Azure while Claude and Gemini services are available in GCP.
Further, there's value in having your data available on any device. That's why Apple shipped iCloud and CloudKit, per Apple's Nick Gillett at WWDC19
A belief that I have is that all of my data should be available to me on whatever device I have wherever I am in the world. …
And the data that we create on all of these devices is naturally trapped.
There’s no easy way to move it from one device to another without some kind of user interaction.
Now to solve this, we typically want to turn to cloud storage because it offers us the promise of moving data that’s on one device seamlessly and transparently to all the other devices we own.
And 2 in 3 iPhone users use iCloud – the promise and ease of data availability appears to resonate with most iPhone users. Apple keeps its Standard Data Protection – where it stores most encryption keys in the cloud for easier data recovery – as the default option. iCloud users may choose to "Optimize iPhone Storage" for Photos, keeping full-resolution photos in the cloud with smaller device-size versions on device.
The story of "local only" AI doesn't match its commercial implementations – with data, encryption keys, and AI – all living in the cloud.
More likely appears to be Gavin Baker's "local when you can, cloud when you must." Standard calls to Siri may not need cloud resources, but queries involving richer context may best be routed to the cloud, closer to where AI and data resources live.
We see user agents as Berners Lee appears to: a program
Any AI: today, claude, gemini, or gpt models
With access to permissioned context
able to be directed by application agents of your choice, exactly on your behalf.
The north star is maximal interoperability – AI and context – securely, projected on any surface you want.
Thin protocols
Fat protocols are crumbling. Where does value accrue in AI, even with portable data? Thin Protocols. Fat "Thick" Apps.
Scroll to read
10/12/2024
In 2016 Joel Monegro wrote the influential and controversial Fat Protocols thesis
The previous generation of shared protocols (TCP/IP, HTTP, SMTP, etc.) produced immeasurable amounts of value, but most of it got captured and re-aggregated on top at the applications layer, largely in the form of data (think Google, Facebook and so on). The Internet stack, in terms of how value is distributed, is composed of “thin” protocols and “fat” applications.
This relationship between protocols and applications is reversed in the blockchain application stack. Value concentrates at the shared protocol layer and only a fraction of that value is distributed along at the applications layer.
This argument appears entirely based on a 2016 empirical observation
The Bitcoin network has a $10B market cap yet the largest companies built on top are worth a few hundred million at best, and most are probably overvalued by “business fundamentals” standards.
For clarity, we say a protocol is a process that governs exchange or activation of energy, resources, or data, while an app is a controller and viewer of data, energy or space with a direct social or economic relationship with a customer.
Fat Protocols are crumbling
This narrative is under pressure today. Last week we saw tweets from Austin King
@howdymarry
And @kaledora
And even empirically, we’re starting to see apps make more revenue than underlying infrastructure.
Ok the Fat Protocol thesis is in question. Crosshatch is an integrations and security company using AI.
What does this have to do with AI?
AI is a protocol
AI and (values of!) web3 are on a collision course.
AI can be thought of as a candidate protocol for fetching intelligence. Riffing more, AI weights might be seen as a sort of ledger of things that have happened in the world and how they relate.
AI appears poised to be a centralizing force, while interests in privacy, control and interoperability hold power by their connection to the rights and interests of the individual, a decentralizing force.
AI can remix any content and potent AI resources are available everywhere (cloud, browser, and locally) so there’s a question of what gets remixed, by whom, and why.
Can the individual remix content across apps – is the app over? Or can apps remix an individual’s context? Or does all candidate functionality and context accrue to a large AI lab?
This question is fundamentally about where value will accrue in the age of AI. And the ideas we’ve long explored around users “logging in with their context” were previously ideas of crypto, e.g., in the follow-on essay to Fat Protocols, Thin Applications
As a crypto user, you bring your own data. Nobody has monopoly control. When you log into a crypto app by connecting your wallet, you’re sharing the “keys” it needs to find your information across the relevant networks. You can share them with any app, so your data comes with you as you move from interface to interface
This sounds just like what AI people are saying today, e.g., A16Z’s Justine Moore.
Are we talking about AI or web3?? See Justine Moore’s thread last week on what would be unlocked if we could share AI activations of our data with our favorite apps.
Value in apps is private information
The thing is, and as many started discussing last week, we believe the Fat Protocol and Thin Applications thesis is wrong.
We introduce a new argument: the Fat Protocol thesis for AI is wrong because of inherent asymmetry at the interface between application and protocol.
Continuing from before, take a protocol to be software that accrues, permissions, and activates data. While protocols might manage and activate data, protocols are not privy to nor have rights to participate in the value generated by apps that use protocols.
That is, there is an information asymmetry between apps and protocols. Apps provide an interface that creates value with users. That interface is owned by the app, and has no obligation to (or incentive!) describe how a protocol drove value to the app or its users.
This situation is the Inverse Market For Lemons.
In the classical Market For Lemons – which won George Akerlof a Nobel Prize – buyers know less about the value of a good being sold than sellers. This leads to a market failure and a race to the bottom where high-quality goods leave the market because such sellers can’t sell their products for their true value.
In the inverse case, buyers are better informed than sellers. That is, apps know the value the app delivers to their users via the protocol, which (at least in web2 markets) is private information unknown to the protocol. Apps are disincentivized to share this value, as this could lead to price discrimination (and loss of their hard-won surplus!).
In this case, this leads to a race to the top with lower quality protocols falling out of the market, as protocols have to take what they can get, while higher quality or scaled protocols rise.
You can see this in analysis around AI-enabled “services as a software” businesses. “Services as a software” customers want to save time and money. AI-enabled B2B SaaS don’t know the value of their services to their customers – customers sharing that value could give the SaaS ammunition to raise prices! – and if these customers can save more by killing the AI-enabled SaaS arb and building it themselves (e.g., as Klarna is), they probably will.
In June, we bet that value will accrue to “Fat” Thick Apps – those with large inventories of qualities expensive to discern. Our argument there rests on the impracticality of disintermediating Thick Apps, which would involve expensive, slow, unreliable and likely unconsented scraping and AI processing.
We believe Monegro was mistaken in both of his essays: in the age of AI (even with portable interoperable context!) there will be
Thick “Fat” Apps
Thin Protocols
[Thin Protocols means, among other things, that “intelligence too cheap to meter” actually means that large lab margins go to the cost of compute by the same information asymmetry argument, unless of course, they vertically integrate and bypass this protocol/app interface.]
Grow the pie
We’re building Crosshatch to become a Thin Protocol. We follow Fred Wilson: Be Generous is a good strategy (if you have a sensible cost structure!):
How can giving something away or letting others take value from you be good business?
It is all about zero sum thinking. If you think that the size of the pie is fixed, then you need to grab as much of it as you can. But if you are making a pie that can grow and grow and grow, you just take a small slice and let everyone else eat.
Accepting this asymmetry is also just practical. E.g., following Gokul Rajaram, trying to set prices on metrics you don’t control is not recommended.
Crosshatch enables the world Monegro envisioned: users bring their user-agent
Secure AI resources
Permissioned data
Activated for a particular purpose
to turn on hyper-personalization in their favorite apps in a privacy preserving way. Crosshatch enables richer experiences at a lower cost for its affordance of all linked user context scaled by assurances of user control that only a protocol can offer. Following Monegro
Nobody has monopoly control. When you log into a crypto app by connecting your wallet, you’re sharing the “keys” it needs to find your information across the relevant networks. You can share them with any app, so your data comes with you as you move from interface to interface. And you keep control of the “private key” (basically a password) needed to operate on it.
The Fat Protocols thesis sizes "Work + Data" as constituting the largest part of value capture
Monegro's Cryptoservices Architecture: "Consumer applications in crypto/Web3 are independently built on top of multiple “composable” protocols using what we could call a cryptoservices architecture (like microservices, but with sovereign components)."
Particularly in the age of AI, the “work” of these “cryptoservices” could conceivably collapse to a single AI API call on models whose price has fallen 240x in the last 18 months. Combined with the information asymmetry at the interface between app and protocol, even under a “bring your own data” model (that we’re building!), we expect the value to look more like
In the age of AI, value accrues to Thick Apps and Thin Protocols.
Thick apps accrue value for their earned relationship with the customer. Thin (“skinny fat”) protocols accrue value by scale, network effects and trust.
The reason why apps are thick is not because they control all the context – when users can bring their own data to apps, users have the data gravity – but because of this information asymmetry: the value apps deliver to end-users is earned private information owned by the app not the protocol. Apps keep margins because they build trusted brands (often themselves with their own networks like eBay, Zillow, Spotify or Airbnb) that delight and save users time and money. Revenues from retail media networks could see a shift.
But normatively, this all feels right to us. Applications that directly create value should accrue value.
At Crosshatch, we’d be thrilled to help make Apps even better for lower costs.
All For You
The personalized internet needs a new data layer made from aggregating the aggregators. What Aggregation Theory predicts comes next.
Scroll to read
11/2/2024
In the age of AI, everything should be For You.
Not just TikTok. Or what you do in web3. Or the AI chat app you talk to most.
Every website should welcome us by name. Every piece of marketing should speak to our own interests and our own past purchases and whatever’s in our closet.
explained Adobe’s Scott Belsky last week in an interview with Bloomberg.
These are things that should be happening but aren’t and it’s because the data layer isn’t there.
Candidate data layers for this purpose are not well understood. We’ll study existing ones and then apply Stratechery’s Aggregation Theory to explore the new power dynamics when users become their own bitch and aggregate the aggregators.
Existing data layers
The existing data layers are either insufficient, bad (in quality or legal compliance) or inaccessible.
Onboardings
To solve the personalization challenge and bolster lacking first party data, some companies have started adding richer onboarding flows, inviting users to share more about themselves to get a more personalized experience. For instance, eBay has added a new discovery section (notably gated behind a login wall) that invites users to share a bit about their interests and specializes the experience based on these choices. This approach has the problem that the more extensive the onboarding, the more it churns. This places an effective limit on the power of onboardings.
DMPs
The leading data layers for this purpose are DMPs like Epsilon, Acxiom or Zeta Global, who have transaction and activity data on over 200 million Americans. These data layers are hooked into enterprise CDPs. They’re rumored to source data from all the same places, with small differences on the margin (e.g., Zeta Global purchased Disqus in 2017).
The whole who’s who of Corporate America buys this data.
But this candidate data layer is under pressure for three reasons. First, growing privacy laws and actions by the FTC are limiting data broker data access and continued growth. Second, data broker data quality is on the decline – something we noticed empirically last year. Because data brokers source their (often snapshot) data in secret backroom deals and operate on identity that’s duct-taped together, stitching together data on individuals from multiple sources is tactically challenging at scale.
Finally, the effective death of cookies makes DMPs hard to use at the individual level. Brands have typically used DMPs hooked into CDPs as a means to wield additional context for individual users even for those who don’t log in. Brands used third party cookies and ID-bridging to identify users and then performed a lookup in the database to see if they could find data associated to that user identity. The death of cookies makes these practices harder.
Big Tech
Data aggregators like Apple, Meta, Google or Amazon could be candidates for this data layer, but their data layers are private, really reserved for their own use within their own private networks.
Four years ago – per Wayback Machine – you could use the Instagram Display API to access a user’s profile and posts. Last month Meta deprecated it.
There appears to be no API to access photos on iPhone, though Apple syncs photos to iCloud for an estimated 2 in 3 Apple customers and holds encryption keys for most users.
Spotify has a developer API that allows developers to access users’ listening data and favorites, but its Terms restrict using that data with an AI model – see Section IV(2)(a)(i).
Amazon has an API but it’s only for merchants and its terms prohibit the use of data extraction tools, reserving the right to terminate your account at its discretion.
Putting aside access, even the internet’s largest aggregators only see a slice of you.
For us, All For You means
All interfaces
Based on all context
All For You.
Existing data layers don't work. We need something new.
Aggregating the Aggregators
Aggregators aggregate a lot of things. Google aggregates information. Netflix aggregates content.
But they also aggregate data. Data is critical to securing aggregator power, in that it enables provision of protected discovery mechanisms to sort through the aggregated abundance of supply.
At Crosshatch, we believe this discovery mechanism is best owned by the user. The user is the final boss of data aggregation. The one with Data Gravity. Only the user has the Right To Access all the data collected about her. All other aggregators only know what the user does with them.
This also makes sense from a privacy perspective – third parties should only know what's "reasonably necessary and proportionate" to their use case anyway.
Aggregator Theory
Ben Thompson introduces Aggregator Theory via Clay Christensen’s Law of Conservation of Modularity.
Commoditizing an incumbent’s integration allows a new entrant to create new integrations — and profit — elsewhere in the value chain.
Now consider the case where users aggregate the aggregators: aggregating all their context and AI; and permissioning it to connecting applications.
Previously apps integrated logic and intelligence. User context was modularized, collected through interaction logging or onboarding flows.
But in a case where users aggregate their context and permission their AI Agent – AI and context to be used for a particular purpose – the App becomes modularized. Thompson explains
More broadly, breaking up a formerly integrated system — commoditizing and modularizing it — destroys incumbent value while simultaneously allowing a new entrant to integrate a different part of the value chain and thus capture new value.
This is poised to flip power dynamics of the internet back to the distributor (or even the supplier, who might soon wish to control her own distribution). Thompson explains in an earlier piece
The value chain for any given consumer market is divided into three parts: suppliers, distributors, and consumers/users. The best way to make outsize profits in any of these markets is to either gain a horizontal monopoly in one of the three parts or to integrate two of the parts such that you have a competitive advantage in delivering a vertical solution.
The biggest companies of the internet era became horizontal monopolies controlling supply via a virtuous cycle of ever-better user experience. And while the internet initially made distribution free (pressuring the pre-internet controls on distribution), today distribution is expensive – the CAC is too damn high.
This is why we’re seeing every major consumer internet company lean into loyalty as a pathway to reduce distribution costs.
And when users can aggregate their own data and AI – be their own bitch – so that incredible personalized experiences are available to any app, power dynamics shift.
While the internet era commodified supply and fed off a flywheel of ever-improving user experience through data and proprietary algorithms, in an era where consumers
Aggregate the aggregators
Be their own bitch
Bring their own data and AI to their favorite apps
the supply-side gains power.
Ben Thompson continued: in the internet era, pre-internet
incumbents, such as newspapers, book publishers, networks, taxi companies, and hoteliers, all of whom integrated backwards, lose value in favor of aggregators who aggregate modularized suppliers — which they often don’t pay for — to consumers/users with whom they have an exclusive relationship at scale.
But now, with consumers aggregating the aggregators, the former aggregators themselves become modularized.
The rise of taste
As we wrote last week, we do not believe this translates into the same value capture as in the internet era. Apps are thick while protocols that deliver personalized AI outputs to applications are thin.
We’re excited to help support this shift in power back to tasteful upstarts and luxurious software – those creating unique supply and personalized experiences based on all the context, who keep distribution not through spend on expensive ads but because consumers just love what they make.
We believe this offers tremendous opportunities for suppliers with taste, who under this new paradigm can offer the same or better user experience as the aggregators and develop 1-1 relationships with customers who love what they create, without those relationships or customer acquisition being mediated by aggregators. To succeed, suppliers should invest in creating great experiences augmented by complete user-provided context, e.g., web and mobile experiences that
Invite users to log in with their own data
Greet customers by name
Recommend products or services based on user context
With more context than any individual aggregator, power returns to those with unique or opinionated supply, who can choose how and where to distribute.
With AI and user context commodified, companies get the chance to focus principally on shipping great user experiences and products that only they can create.
Here’s to the rise of rebels and challengers and the emergence of an internet that’s All For You.
If you want to see what’s possible when you let users log in with all their context – check out our docs or start building.
Proactive AI
Previewing Crosshatch Webhooks, extending our API to enable any service to take proactive action based on complete user context.
Scroll to read
11/10/2024
Last month, Sarah Guo joined OpenAI Chief Product Officer Kevin Weil and Anthropic Chief Product Officer Mike Krieger to talk product in the age of AI and what we can expect in the near future.
Anthropic’s Mike Krieger shared excitement for proactive AI:
Once [AI] knows about you and they're … reading your email in a good but not creepy way … and then they spot an interesting trend, or you start your day with something that's a proactive recap of what's going on, some conversations you're going to have.
- “I pre-did some research for you.”
- “Hey, your next meeting is coming up. Here's what you might want to talk about.”
- “I saw you have this presentation coming up. Here's the first draft that I put together.
That kind of proactivity, I think, is going to be really, really powerful.
We agree. We want to give proactive AI superpowers to every service on the internet.
To us, proactive AI takes action without being asked
“I saw you have a meeting at 3p. Should I move your 3.30p?”
NOT “each morning send me my to-do’s”
To be proactive and actually helpful, however, you need the full context. Proactive AI isn’t gated by more powerful AI, but rather by a data layer with up-to-date context.
Traditional data layers – like Typeform onboardings or progressive profiling – don’t cut it.
Proactive AI needs context to be helpful
If you’re going to take proactive action on behalf of users, it better save them time.
Last month we studied the history of the reactive web: how the web went from
static
Reacting to the outside world
Reacting to changes inside an application
and soon, reacting to state sourced from across applications.
Web reactivity has canonically operated under a pull model with requests originating from the client:
User navigates to a page
Page loads with all known state information
The reactive web has lower stakes in that users are navigating to a service, and the server reacts with the most up-to-date information. The server is responding to client action.
Proactive AI follows a push model with requests originating from the server. On what basis might a server know to send messages to the client?
Typically, context for server proactivity has come from the client: The server accumulates user context by watching how the client engages with the application.
This usually takes the form of onboarding surveys or customer data platform-based profiling. This sort of data collection is challenging because it adds friction to the user experience, goes stale fast, or is not rich enough to fully describe user needs. If you’re going to send a message to the client, you should probably have some reason to do so.
Under a first-party data regime, without recent client interaction, how would you know now is the right time to send a message?
Today’s proactive AI disappoints
These questions have made proactive AI difficult to ship: We’re already overloaded with information. We don’t want AI to add more noise to our lives. To save us time, AI needs the full context so it can react to events in our lives without us having to explicitly poll for it.
For instance, Amazon launched Subscribe & Save in 2007, allowing users to subscribe to products delivered on a set cadence. You can get paper towels every 2 weeks or 2 months, or toothbrushes every 4 months.
Amazon doesn’t break out Subscribe & Save revenue in its earnings. And while ostensibly time-saving, the program has drawn complaints from customers about unexpected price changes and deliveries that don’t match the rate at which they actually need replenishment.
In 2019, Walmart planned to release Smart Reorder
What if we could forget the groceries, on purpose? Introducing Walmart Smart Reorder. It learns your grocery habits and delivers them for you.
Smart Reorder never shipped. Putting household essentials on autopilot needs more context, not more AI.
In 2021, Twilio Segment launched Journeys, a way for marketers to build audiences and take proactive action based on site activity.
Of course, Journeys is only based on first-party context. Particularly with AI, activating context like
No purchase in last 3 months
Viewed site last month
doesn’t feel particularly personal or proactive. Most of our email inboxes classify this sort of outreach as spam.
Proactive personalized outreach or action clearly needs more context to actually provide value to users.
Proactive AI with Full Context
With the right context, proactive AI is poised to save us time in work and in life.
Following Krieger, with our work email, proactive AI could prompt us with a morning brief, big ideas to think about for the week, or suggestions to optimize our schedule. Google’s Notebook LM is giving us a glimpse into how this could work.
In our personal life, Proactive AI could help us
save time staying in stock of household essentials
find out about events we’ll love on a night we’re free
suggest a restaurant to check out with a friend we haven’t seen in a while
This sort of Proactive AI is not possible without a new data layer.
Previewing Crosshatch Webhooks
In Q1 next year, we plan to launch Webhooks, a safer way for users to let services take action based on cross-application context.
Subscribe to a user’s online orders to provide pickup services. Watch a user’s calendar for updates to see if they’ll need more treats for an additional basketball practice.
Webhooks expand Crosshatch’s existing client-initiated pull-based personalization to proactive, server-based personalization, but where the server is user-owned.
This allows users to aggregate context from across applications, from
purchases or reservations in email or Plaid
runs in Strava or Whoop
likes in Apple Music
posts in Reddit
and enable apps to subscribe to changes so that AI can transform that context into valuable time-saving action.
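Here’s a rough sketch of what such a subscription might look like from a service’s side. The endpoint paths, field names, and auth header are illustrative assumptions, not the published Webhooks API.

```ts
// Hypothetical sketch: a pickup service subscribes to changes in a user's
// linked order context and receives AI-transformed summaries at a callback URL.
import express from "express";

// Register a subscription against a purpose-limited view of user context.
async function subscribeToOrders(userLinkToken: string) {
  const res = await fetch("https://api.crosshatch.example/v1/webhooks", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${userLinkToken}`, // issued when the user linked
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      purpose: "order pickup coordination", // communicated to the user at link time
      view: "recent online orders",         // an AI view of context, not raw records
      callbackUrl: "https://pickup.example.com/crosshatch/events",
    }),
  });
  return res.json(); // e.g., { subscriptionId: "..." }
}

// Receive change notifications and act on the transformed view.
const app = express();
app.post("/crosshatch/events", express.json(), (req, res) => {
  const { view, summary } = req.body; // summary: LLM completion of the changed context
  console.log(`Context view "${view}" changed:`, summary);
  res.sendStatus(200);
});
app.listen(3000);

subscribeToOrders(process.env.USER_LINK_TOKEN!).catch(console.error);
```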
We ship webhooks as an API and not a closed service because we believe users should be able to activate their context in pull and push contexts anywhere on the internet, not just certain privileged AI services.
If you’d like to learn more about our webhooks release, reach out!
Adaptive interfaces
Expanding the app context window with serverless, user-owned context and AI. Introducing our personalized inference provider for Vercel's AI SDK.
Scroll to read
11/15/2024
Today we launched a personalized inference provider for Vercel’s incredible AI SDK.
Vercel’s AI SDK enables developers to generate UI like
chat
dynamic carousels
reranked elements
relevant tools like weather or maps
typically in response to unstructured inputs.
Developers love Vercel’s AI SDK.
Indeed, we envision a future where every interface is personalized to us individually
The internet’s version of rolling out the red carpet
So making a secure personalized inference provider for Vercel’s Typescript SDK is a natural step.
We’re really excited about the possibilities this unlocks.
App context windows are limited
App user “context windows” are constrained.
To be clear, that’s probably a good thing. We don’t need a panopticon where every app knows everything about us.
And practically, app context windows are constrained not because the AI models they use are limited, but because apps just don’t have that much context.
App interfaces into user context are just a function of app-induced user labor
A quick search
Some clicks and scrolls
A purchase that one time
Adaptive interfaces won’t be powered by first party data.
They need a new data layer. We interact with applications across the internet. [Our full context probably resists digital representation.] But our interactions scattered across the internet represent a lot of it.
Apps as permissioned AI views of context
With Crosshatch, developers specify views into the user’s complete context
Recent restaurant reservations
Hotel spend over the past two years
Runs for the last three months
and instruct a user’s AI how to
interpret
view
activate with (private?) tools
that context for use in the app view.
Based on the restaurants I reserved recently, where might AI suggest we stay in Paris? We should add hotel context too! You can do that by adding another hotels query object.
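As a sketch of what that might look like through Vercel’s AI SDK: the createCrosshatch provider, its options, and the shape of these query objects are illustrative assumptions rather than the published Crosshatch API.

```ts
// Hypothetical sketch: specify context views ("query objects") on the model,
// then call it with Vercel's AI SDK as usual.
import { generateText } from "ai";
import { createCrosshatch } from "@crosshatch/ai-provider"; // hypothetical package name

const crosshatch = createCrosshatch({ linkToken: process.env.CROSSHATCH_LINK_TOKEN! });

export async function suggestParisHotels() {
  const { text } = await generateText({
    // The model is proxied through the user's secured AI with permissioned views:
    model: crosshatch("gpt-4o-mini", {
      queries: {
        recentReservations: { event: "restaurant reservation", period: "last 6 months" },
        // Want hotel context too? Just add another query object:
        hotelStays: { event: "hotel confirmation", period: "last 2 years" },
      },
    }),
    prompt: "Based on my recent reservations and hotel stays, where should we stay in Paris?",
  });
  return text;
}
```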
That is, with Crosshatch, application behavior can vary independently of user interaction with a given application!
Of course, some of the most exciting applications require coordination inter or intra users or businesses
Scheduling
Product / place recs
where we see developers adding their own context to Crosshatch calls, letting AI in a Crosshatch-secured VPC mediate the context that users and developers share.
In these applications, Crosshatch acts as a trusted compute layer between multiple parties, enforcing respective egress requirements.
How Crosshatch keeps context secure
You can learn more about our security model in our docs.
For most applications, users link data under an “Access while using” model. Apps can only access our context while we’re on the site. We execute this via short-lived single-use access tokens available only while users are on the linked web app.
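As a rough sketch of the flow from the app’s side, with hypothetical endpoints and field names:

```ts
// Hypothetical sketch of "Access while using": exchange the user's active session
// for a short-lived, single-use token, then spend it on exactly one context call.
async function fetchContextWhileUsing(sessionId: string) {
  // 1. Mint a token that is only valid while the user is on the linked web app.
  const tokenRes = await fetch("https://api.crosshatch.example/v1/tokens", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sessionId, ttlSeconds: 60, singleUse: true }),
  });
  const { token } = await tokenRes.json();

  // 2. Spend the token on one permissioned completion; it cannot be replayed.
  const completion = await fetch("https://api.crosshatch.example/v1/complete", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: "Summarize my recent restaurant reservations." }),
  });
  return completion.json();
}
```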
For our forthcoming Webhooks feature, the permission model follows Apple’s “Always allow” permission setting. Here users can permission apps to subscribe to AI views of changing context. This is useful for logistics companies who help users save time based on changing underlying context.
Context sharing occurs under a Crosshatch-operationalized “contextual integrity” framework – where data is agreed to be shared for a particular purpose.
We log calls for ex-post verification that calls satisfy sharing agreements. Next year we expect to make this real-time.
Though we’ve seen Apple, Google, and Bill Gurley express contempt for cross-app context sharing, we’re encouraged both by first principles and even by Apple’s own movement to the contrary.
Toward adaptive interfaces
AI unlocks lower-friction interfaces to our context.
Instead of filtering inventories
> gender: men
> clothing: sweaters
> size: L
> color: grey
we can now just search in natural language.
We’re excited by
“chat to interact”
interfaces, but we’re most excited by interfaces that vibe to us – interfaces that use complete user context – to deliver experiences that delight and save us time and money.
If you’d like to start building adaptive interfaces in your app, create a free developer account or jump on our onboarding form. We’d be thrilled to chat.
Authorized agents
The case for AI-native auth with flexible purpose- and time-limited scopes on context.
Scroll to read
11/24/2024
Does AI itself need auth? What kind? Can vanilla auth work?
To answer these questions, we need to understand how and where AI might operate, and to do that, we need to study applications.
The apps that thrive in AI
A year ago this month Bill Gates wrote “AI is about to completely change how you use computers”
In the next five years … you won’t have to use different apps for different tasks. You’ll simply tell your device, in everyday language, what you want to do. And depending on how much information you choose to share with it, the software will be able to respond personally because it will have a rich understanding of your life. In the near future, anyone who’s online will be able to have a personal assistant powered by artificial intelligence that’s far beyond today’s technology.
While the model layer appears on a path to commoditization, there’s growing interest in the application layer. But per Bill, in five years the application layer won’t exist. Which is it?
It appears the application layer is on its way to more natural interfaces. Instead of clicking through menus we’ll just ask for what we want.
But then you might wonder – what exactly is an app in this world? If app functionality succumbs to language interfaces, you might start to wonder if or why you’re using an app to begin with. At this moment, are we actually in the world Bill Gates imagines?
To us, an application is no less than a private collection of services and context.
Walmart sells low-priced inventory available at conveniently located stores
Airbnb sells trusted home-rentals
The NYT sells all the news that’s fit to print
To achieve the world Gates describes, you’d need reliable interfaces into these services (e.g., APIs).
But such interfaces, at least today, are contrary to apparent business interests. Airbnb wants to own the user experience end-to-end. Just 1 month ago Airbnb was hiring for its Anti-Bots Engineering team: Airbnb doesn’t want bots mediating its carefully designed user experience. Walmart has an API but only for sellers.
Some apps like eBay or Kroger have APIs that enable developers to create new storefronts into their marketplaces. Despite this and the incredible capability of today’s AI, we haven’t seen new AI services built on top of these APIs. [How would these services make money? What would they deliver that Kroger itself doesn’t?]
Providing services when you don’t control the user experience end-to-end is just hard. It’s why restaurants complain about delivery services, or SaaS companies complain about bring-your-own-cloud deployments. It’s hard to optimize and easy for things to slip through the cracks when responsibility boundaries are not clear.
So to us – and as we’ve written before – the app isn’t dead. Intermediating aggregators without trusted brands or durable attention might be. But services that produce goods and services that people love may credibly fight to own the user experience.
It’s why OpenAI discontinued its ChatGPT Plugins service and has reportedly sought deals with Redfin, Conde Nast, and Eventbrite to power search within their respective platforms. Personal AI bots intermediating all our interactions with the entire internet might not be practical or desired.
So if apps like Redfin, Conde Nast and Eventbrite are Thick Apps here to stay, we have our motivation for how AI services might interact with apps.
Operating system level services might direct users to particular applications with captured supply or trusted experiences (e.g., via Apple App Intents!), but users will engage applications with AI powering them.
AI will be embedded in apps, enabling simpler and more streamlined interfaces into inventory, content and services. Our user agent will selectively share transformations of our context with the applications we use. For that we’ll need auth, but existing vanilla auth isn’t enough.
Tacked on auth won't work
Last month Auth0 announced Auth for Generative AI applications. It reportedly powers four use cases
Standard auth
Calling APIs on behalf of users
RAG with fine-grained access control
Human in the loop (unclear what this means or why Auth0 is involved here)
We’ll restrict our attention to items 2 and 3 as they have the clearest and most well-defined AI application. Standard auth is not AI-specific and async human in the loop auth likely needs more details to allow comment.
Auth0 shares a demo application. Studying its details, we quickly see that (2) is strictly downstream of the scopes available in connecting applications. For instance, the demo allows users to link their calendar to check whether their availability matches that of an example upcoming company event. Auth0 says that this integration
will only check free/busy availability, not get full calendar access. This showcases the Google Calendar API integration, while minimizing information disclosure necessary for a demo app.
There are a few problems here, however.
First, the minimization of information disclosure is entirely downstream of the connecting application's available sharing scopes. Google Calendar happens to have a
https://www.googleapis.com/auth/calendar.freebusy
scope, but this is more luck than design. Many applications (like Gmail e.g., emails containing itemized receipts) contain data users would like to selectively share but lack appropriately granular scopes. This creates a false choice: either share everything or nothing at all. A truly AI-native approach would let apps request exactly what they need ("Are you free next Tuesday at 2pm?") rather than being constrained by pre-defined API scopes.
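To make the contrast concrete, here is a hedged sketch. The agent call below is a hypothetical AI-native pattern, not an existing API: the app asks a narrow, purpose-bound question and receives only the answer, never the underlying calendar data.

```ts
// Today: request a pre-defined OAuth scope and receive the entire free/busy feed.
const GOOGLE_FREEBUSY_SCOPE = "https://www.googleapis.com/auth/calendar.freebusy";

// Hypothetical AI-native alternative: a purpose-bound question to the user's agent.
async function isUserFree(agentUrl: string, appToken: string): Promise<boolean> {
  const res = await fetch(`${agentUrl}/ask`, {
    method: "POST",
    headers: { Authorization: `Bearer ${appToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      purpose: "event scheduling",
      question: "Is the user free next Tuesday at 2pm for a one-hour event?",
      // The agent consults calendar context internally; only a boolean leaves it.
      responseSchema: { type: "boolean" },
    }),
  });
  return (await res.json()).answer as boolean;
}
```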
Second, the temporal nature of sharing is binary and inflexible. While the amount of information is constrained to this Google-provided scope, once you share the calendar.freebusy scope, it's shared permanently - there's no concept of temporary or purpose-limited access. Consider how this contrasts with modern location sharing:
Apple lets users share location:
Never
While using
Always
But with standard auth we only have:
Never
Always
Apple allows us to share our precise location with Uber for a clear purpose and time-bound access.
This lack of temporal control is particularly problematic for AI applications, which often need temporary access to fulfill specific, time-bound tasks. For example, a travel planning AI might need calendar access only while booking a trip, not indefinitely.
Finally, and perhaps most critically for AI applications, the current approach fundamentally overshares data even when attempting to minimize it. In the calendar example, the application only needs to answer a simple question: "Is the user available for this specific event?" Instead, it receives broad access to free/busy status - far more information than required. This pattern of oversharing becomes even more problematic as AI applications handle increasingly sensitive personal data.
These limitations become especially acute when we consider how thick apps might want to integrate AI. A service like Airbnb or Walmart might want to provide AI-powered personalization, but current auth patterns would force them to choose between requesting broad data access – risking user privacy – or forgoing rich personalization entirely. [First party data doesn’t cut it for AI.] This explains why many thick apps remain hesitant to fully embrace AI integration - the auth and integration patterns simply haven't caught up to their needs.
This is why we need Authorized Agents. Rather than retrofitting traditional OAuth patterns, we need an auth system designed specifically for AI's unique requirements around context, temporality, and granular data access.
The promise of Authorized Agents
AI is enabling more streamlined and delightful access to products and services we love.
To make this even more seamless, these applications need access to our context.
But we shouldn’t have to share everything to make this possible. What we share should be context specific, not dictated by whatever application scopes happen to exist. And if applications are using AI to transform our context to provide us a service, do we have to share our context with them so they can transform it, or can they call an AI resource users own – our AI User Agent – and just get the answer they need, e.g., whether a candidate event fits in our calendar?
This is the case for Authorized Agents. Auth for AI cannot be tacked on. It needs to be tightly integrated with AI and the context it operates. When AI is embedded into applications, we need our AI User Agent to operate our context while enabling the best experiences in apps we use.
When searching for travel, a travel app should call our AI User Agent to refine a list of candidate hotels that best align to us. When shopping for groceries, our AI User Agent should help grocers deliver us the household essentials that best align with our schedules and evolving tastes. These use-cases don't require full data sharing, just a call to our Authorized Agent.
This interaction pattern requires an auth that allows users to link their AI User Agent to applications with
Sharing purpose
Specific context scopes
Time-bound access: Link once, while using, or always?
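These three properties might be captured in a single link grant. The types and field names below are illustrative, not a published Crosshatch schema.

```ts
// Hypothetical "link grant": purpose, context scopes, and time-bound access in one object.
type AccessWindow = "once" | "while-using" | "always";

interface LinkGrant {
  app: string;             // who is being linked
  purpose: string;         // why – communicated to and approved by the user
  contextScopes: string[]; // what – coherent event groups, not raw API scopes
  access: AccessWindow;    // when – how long the grant stays live
  expiresAt?: string;      // optional hard expiry, useful for "always" grants
}

const travelLink: LinkGrant = {
  app: "example-travel-app",
  purpose: "personalized trip planning",
  contextScopes: ["recent hotel confirmations", "recent restaurant reservations"],
  access: "while-using",
};
```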
Vanilla auth doesn’t cut it. The future needs Authorized Agents.
Solving AI’s big privacy questions
Last year Bill Gates set forth the big privacy questions for AI.
As all of this comes together, the issues of online privacy and security will become even more urgent than they already are. You’ll want to be able to decide what information the agent has access to, so you’re confident that your data is shared with only people and companies you choose.
But who owns the data you share with your agent, and how do you ensure that it’s being used appropriately? No one wants to start getting ads related to something they told their therapist agent. Can law enforcement use your agent as evidence against you? When will your agent refuse to do something that could be harmful to you or someone else? Who picks the values that are built into agents.
There’s also the question of how much information your agent should share. Suppose you want to see a friend: If your agent talks to theirs, you don’t want it to say, "Oh, she’s seeing other friends on Tuesday and doesn’t want to include you.” And if your agent helps you write emails for work, it will need to know that it shouldn’t use personal information about you or proprietary data from a previous job.
These questions are simplified when users own their AI User Agent. When the AI User Agent is, per Tim Berners Lee
a program [that] does, in each instance, exactly what the user would want it to do if asked specifically
Now, when Gates asks 'Who owns the data you share with your agent?', with Authorized Agents the answer is clear: users own both their data and their AI agent. The agent acts as a trusted intermediary, transforming private context into the purpose-specific insights apps need. Authorized Agents ensure context stays bounded by purpose and time. And since Authorized Agents have AI inside, linked applications can enable users to
'share just my work availability' or 'only high-level schedule info'
rather than being bound by whatever API scopes exist.
AI privacy questions find natural resolution in Authorized Agents.
Authorized agents for all
Crosshatch provides an Authorized Agent to every user. We do this by decoupling sharing scopes from apps. In its place, we create a new data model with its own more flexible and natural sharing scopes, creating a single pane of glass for controlling links to applications. Linking applications can access agent-transformed context ‘while-using’ and ‘always’ for reasons always clearly communicated.
Thick apps like Airbnb and Walmart will persist because they provide trusted services and curated experiences. But to deliver the personalized future Gates envisions, they need safer ways to access user context. Traditional auth forces a false choice:
Share everything (via broad API scopes)
Share nothing (preserve privacy but lose personalization)
Authorized Agents provide a third way: letting thick apps access AI-mediated insights from user context without requiring direct access to that context. This preserves both:
App control over user experience
User control over personal context
If you'd like a demo of how all this works, reach out!
[Authorized agents may also take action on behalf of users – we'll consider that in a future post.]
Say less
Supercharge search with personalized re-ranking that adapts to user context without needing to be told.
Scroll to read
12/1/2024
AI makes it easier for consumers to discover more relevant people, places and things.
Instead of going through menus, choosing options from filters, AI allows us to just describe what we want.
This is unlocking a wave of incredible chat to search features in retailers like JCrew and Rent the Runway.
Instead of choosing a category, just type what you want: Perhaps you have a work conference on Miami Beach this winter?
While AI has little value to add in “lower funnel” search (e.g., “Aesop Reverence Hand Wash”), many expect its value to rise in research and discovery – upper and middle funnel search, respectively. Walmart describes this in its forthcoming AI search release.
For example, a parent planning a birthday party for a child that loves unicorns. Instead of multiple searches for unicorn-themed balloons, napkins, streamers, etc., the parent can simply ask the question “Help me plan a unicorn-themed party for my daughter.”
We’re excited by these more expressive interfaces for search – helping us edit our now expansive digital access.
But we see the state of AI much as the internet was in 1995. We’re struck by a 2001 NYT telling of it
At that moment in Web history, every visit to a site was like the first, with no automatic way to record that a visitor had dropped by before. Any commercial transaction would have to be handled from start to finish in one visit, and visitors would have to work their way through the same clicks again and again; it was like visiting a store where the shopkeeper had amnesia.
Despite its more natural interface, our AI-powered interactions on sites have amnesia. AI services don’t remember us. They remember what we do in our limited interactions on one surface, but require that we work our way through the same context over and over just so we can be understood.
The solution called for each Web site's computer to place a small file on each visitor's machine that would track what the visitor's computer did at that site. [Netscape engineer] Lou Montulli called his new technology a “persistent client state object.”
We have a similar approach. We allow users to link their personal AI user agent – Crosshatch! – to their favorite apps in a few taps. Rather than work our way through the same context to each site we go to, we just tap “Continue with Crosshatch” so that our favorite sites can reference this context all automatically.
AI search has many technical implementations, including
Classical (e.g., Elastic) with AI-generated metadata
Generating related searches and using existing search infra
Using retrieval augmented generation with LLM reranking
Re-ranking with large language models like GPT-4o-mini or Gemini 1.5 Flash performs well, and at unit economics that can beat purpose-built solutions.
Add personalized re-ranking to your app
In this blog, we implement LLM-powered re-ranking but let users bring their own context – in this case, itemized summaries of the user’s 25 most recent purchases – so brands can deliver search that’s personalized to the user without the user having to repeat themselves (like we did in 1995).
Re-rank a list of products PRODUCTS_METADATA and render the list using Vercel's AI SDK.
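A minimal sketch of such a call with Vercel’s AI SDK follows. The createCrosshatch provider and its query options are assumptions standing in for Crosshatch’s published provider, and the product metadata here is sample data.

```ts
// Hypothetical sketch: re-rank candidate products against the user's linked
// purchase history, returning only the re-ranked ids.
import { generateObject } from "ai";
import { z } from "zod";
import { createCrosshatch } from "@crosshatch/ai-provider"; // hypothetical package name

const crosshatch = createCrosshatch({ linkToken: process.env.CROSSHATCH_LINK_TOKEN! });

const PRODUCTS_METADATA = [
  { id: "p1", title: "Waxed canvas weekender bag", price: 248 },
  { id: "p2", title: "Linen camp-collar shirt", price: 88 },
];

export async function personalizedRerank(searchQuery: string) {
  const { object } = await generateObject({
    model: crosshatch("gpt-4o-mini", {
      // Permissioned context, activated privately in user-governed compute:
      queries: { recentPurchases: { event: "purchase", limit: 25 } },
    }),
    schema: z.object({
      productIds: z.array(z.string()), // only the re-ranked ids leave the call
    }),
    prompt:
      `Re-rank these products for the query "${searchQuery}", most relevant ` +
      `to this shopper first:\n${JSON.stringify(PRODUCTS_METADATA)}`,
  });
  return object.productIds;
}
```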
This works just like typical re-ranking – here with the LLM gpt-4o-mini, but in the Crosshatch case we enable brands and users to privately activate linked context in order to get results that adapt to all user context, not just that provided in the search bar.
The output of this call is the re-ranked list, but it operates on the user’s past purchases. This activation is privacy preserving in that the only output was the re-ranked list, even though the AI call had access to the user’s permissioned purchase history.
This type of data activation is convenient, transparent and privacy preserving, in that it requires no infrastructure to set up, proxies popular commercial AI models, and happens privately in user-governed AI infrastructure.
So while we’re excited about chat to shop interfaces, we think their final form is more adaptive, adapting to the full context that users link, not just what they might share in a search box.
To get started
Create a free sandbox account on Crosshatch
Check out our docs
Reach out for the full re-ranking source code (in our case we performed re-ranking on eBay's catalog)
AI cookie
Enterprise have AI applications that run on all their data. What about consumers?
Scroll to read
12/7/2024
Last week Wired magazine interviewed Brian Chesky on AI. Brian explained
Up until 2 years ago very few people were talking about AI and then of course ChatGPT launched … and while it’s been so exciting, I bet you … your daily life has[n’t] been any different because of AI.
Here’s a test: take your phone out; look at all the apps on your home screen and ask what of them have changed because of generative AI? … I bet you none of the apps you use – including Airbnb – are fundamentally different in a world of AI. What did you do that was changed fundamentally because of AI?
We agree. Despite the incredible progress in AI – not much has changed for consumers.
We're disappointed but not surprised.
We started Crosshatch as a bet that – aside from AI native apps like chat – consumer AI wouldn’t be that interesting unless there was a new data layer for personalized applications.
The logic is simple: Personalization is downstream of context.
Enterprise get it. They have AI apps that run on all their data. What about consumer?
Toward apps with complete context
Personalization is downstream of context.
Without complete context, you don’t have meaningfully adaptive applications. It’s true we have great new tooling like Vercel’s AI SDK – but without dynamic context there’s little motivation for generative interfaces.
The history of this is straightforward: Before language models, personalization capability was given by data infrastructure and engineering sophistication.
- Are you tracking all data?
- Is it well organized and accessible?
- Have you hired talented machine learning engineers to transform it into models?
But now, personalization quality is given entirely by access to relevant context.
People are adding blood test results to Claude Projects to get personalized health advice. You wouldn’t ask Claude these same questions with only partial access!
But that’s what we’re effectively doing with internet applications under a first-party data regime.
Today, businesses ship personalization based on the context they’ve happened to collect about us. This follows 20 years of business canon: if businesses can build bigger Context Moats than their competitors, they can make it more expensive for consumers to switch.
This is a good strategy for corporate boards but limits user experience.
But you wouldn’t use an AI without all the context. Why would you use an application with only a slice? [Businesses surely do not.]
But that’s where we are today. Every app is riding the old strategy of
Collect more data
Organize and activate it better than your competitors
Induce switching costs
The trouble is, this data just isn’t enough for AI. The first party data strategy of the mobile and cloud era breaks in the age of AI.
Personalization just isn’t a prize that can be won on first party data alone. Shipping natural language search doesn’t make your search personalized.
The end-state is applications and interfaces that adapt dynamically to context that they might not have directly observed.
Enterprise know they need applications that run on all the context. Why don’t we demand it for consumer?
Enterprise leading consumer
In the mobile era, consumer led enterprise. In the intelligence era, it seems enterprise could lead consumer. 2024 was the year of enterprise AI. Will 2025 be the year of consumer?
Today’s enterprise expect to be able to run AI on all their data. It’s a clear enduring trend. While it’s swept enterprise, we believe it’s only a matter of time before it arrives for consumers. You can’t escape data gravity.
The enterprise case is simpler, however, because enterprise own all their data. The enterprise is the Data Controller. There’s no third party (or at least enterprise can demand Bring Your Own Cloud deployments).
Consumer internet applications make this complicated, as applications are hosted by a third party Data Controller, who typically has ‘worldwide’ ‘irrevocable’ rights to do whatever it wants with data it accesses.
Apps haven’t changed as Chesky describes because most apps probably can’t have all our data – that would be a privacy nightmare for everybody.
As we’ve written at length, AI can make this simpler. Previously everyone built their own machine learning models on their own hard-won first-party context. But now “in-context learning”
just adding user context with a few examples to a prompt
is poised to work just as well, for cheaper both absolutely and on the margins. Our previous modeling of
people like this tend to do that
people with these features tend to act in a certain way
via feature engineering and machine learning models looks incredibly old fashioned and naive in the age of intelligence. This prior modeling slices the world up into legible, consistent and dense features while the world around it is unstructured, ambiguous and often sparse.
As an aside, this all is happening while the traditional data-driven “Your Data. Your AI” narratives that powered the rise of cloud are collapsing. The first party data that would power our promised e.g., adaptive commerce experiences won’t live up to expectations (cf Mode’s Benn Stancil). Nike made itself “data driven” above all else and its stock dove into the ground. [Taste rules the world!]
And this reckoning has been coming for a while
No matter how good the tools are that clean and analyze it, how skilled the engineers are who are working on it, or how mature the culture is around it, [first party data] never burn as hot or as bright [as the data of companies like Google, Amazon, or Netflix, whose datasets are inherently rich and high-leverage].
wrote Mode’s Benn Stancil now two years ago!
“In-context learning” means the personalization task can instead be outsourced to a capable LLM, context, and a well-chosen prompt.
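A minimal sketch of that outsourcing, assuming OpenAI’s provider for Vercel’s AI SDK and a stand-in context array for whatever the user has permissioned:

```ts
// In-context personalization: no feature engineering or bespoke model,
// just user context and a few examples in the prompt of a capable LLM.
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const userContext = [
  "Booked a boutique hotel in Lisbon (June)",
  "Reserved at two izakayas last month",
  "Runs 5k three times a week",
];

export async function suggestWeekendPlan() {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"),
    system:
      "You personalize suggestions. Given recent user events, infer likely preferences " +
      "and tailor the answer. Example: events about vinyl purchases => suggest record " +
      "shops, not big-box electronics.",
    prompt:
      `Recent user events:\n- ${userContext.join("\n- ")}\n\n` +
      "Suggest a weekend plan in Tokyo.",
  });
  return text;
}
```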
This means that consumers have a path to “AI with all your data” on interfaces that they don’t own.
The consumer case is more challenging, in that an enterprise owns all its data and can demand that any AI application that unites and activates that data runs on its servers, e.g., via Bring Your Own Cloud deployments.
The consumer – who may not wish to bother with locally hosting such services – will likely rely on a third party to manage auth, data syncing, and AI on her behalf.
This architectural pattern - where users bring their own context and AI to web applications - demands new metaphors. It's like having a personal agent (Call my agent!) that mediates your digital interactions.
Or, more precisely, it's an inversion of the traditional cookie model: instead of identifying you to servers, it authenticates servers to your AI agent.
AI cookie
Let’s say an AI cookie is an identifier dropped on the browser or in the app that enables permissioned access to a user’s AI agent resource with all permissioned context.
Cookies, originally used to identify clients to servers and enable user data tracking in a first-party CDP or a third-party DMP or DCR (all of which hold incomplete or incorrect data on users), are better used to identify servers to user-owned personal AI agents that have all the context.
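A sketch of the inversion, with hypothetical names: a classic cookie identifies the client to the server; the AI cookie instead points the server at the user’s AI agent and carries what the server needs to authenticate itself there.

```ts
// Hypothetical "AI cookie": the server identifies itself to the user's agent
// (not the other way around) and asks only for the purpose-bound answer it needs.
interface AiCookie {
  agentUrl: string; // where the user's AI agent lives
  linkId: string;   // this site's purpose-limited link to that agent
}

async function personalizeFromAgent(cookie: AiCookie, serverCredential: string) {
  const res = await fetch(`${cookie.agentUrl}/links/${cookie.linkId}/complete`, {
    method: "POST",
    headers: { Authorization: `Bearer ${serverCredential}`, "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: "Which of today's offers fit this user best?" }),
  });
  return res.json();
}
```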
While personal AI agents introduce candidate points of failure in a network, they’re poised to introduce new privacy dynamics that could resolve challenges of the early internet.
For one, the golden rule of cookies was never honored. It appears entirely plausible that the same-origin policy actually incentivized more unconsented tracking than it stopped, because it never addressed the underlying incentives, which strictly reward relevance over privacy.
We’re confident AI cookies will increase privacy by directly addressing the incentives problem: when users can bring their own data and AI to applications, other avenues for data collection pale in comparison to the user’s own personal AI. Data gravity is real.
Why use an expensive DMP or DCR when you can just ask the user’s AI agent?
It’s true this new AI cookie “YMMVC” pattern prompts new questions about the nature of applications
what is an app beyond a view and controller of AI-mediated context
though we remain confident that
Apps are fat
Protocols are thin
Moving to this new paradigm will only increase consumer and business surplus.
Consumer apps with all the context
We’re excited to see the tide begin to move in this direction. Two weeks ago, Anthropic shipped its Model Context Protocol, to enable, among other things, applications to sample LLM completions of private context.
Anthropic's Model Context Protocol shows how this might work for developers, starting with developer-owned resources. But consumer applications present unique challenges. While enterprise MCP servers operate within single organizations, consumer context demands more sophisticated governance: purpose-limited, permissioned access to LLM sampling (e.g., per privacy law) – all with streamlined UX. Crosshatch provides this complete infrastructure – from data integration to privacy controls – making AI cookies practically possible.
Developers have been thrilled with MCP and how it’s enabled easy access to outside context.
Today, when you look at your phone, you see apps that know only fragments of who you are. Tomorrow, with user-controlled AI and complete context, every interface becomes uniquely yours.
The enterprise already has this future. It's time consumers did too.
If you’d like to see how you can use Crosshatch’s “AI cookie” for adaptive interfaces or proactive AI - reach out!
Personal MCP server
Model Context Protocol (MCP) allows users to bring tools and context to desktop AI clients. What about web clients?
Scroll to read
12/15/2024
AI capability and utility is downstream of context.
This simple truth has led every application to build its own fragmented infrastructure of data integrations, CDPs, and onboarding forms – all to collect and integrate context into the application.
Last month, Anthropic released the Model Context Protocol (MCP) to change this pattern.
What is MCP?
MCP is an open protocol that standardizes how applications provide context to LLMs.
Using supported local client applications like
Anthropic Claude desktop
Code editor Zed
Code assistant Continue
developers can contribute MCP servers that you can host locally to interact with context or tools like
Google Drive
Google Maps
Slack
Filesystem
or Postgres
MCP servers provide MCP clients a well-defined and controlled interface to interact with underlying integrations. For instance you can give Claude Desktop the ability to
List channels slack_list_channels
Post messages slack_post_message
or in Google Maps
Search for a place search_places
Get driving directions get_directions
or in your local filesystem
Get files in a directory list_directory
Search for files returning all matches search_files
MCP server configurations provide MCP clients with tool
Descriptions: so clients know what they are and when to use them
Interfaces: so clients know how to use the tool
This enables client interactions with underlying tools and context sources without directly managing auth or wrestling with underscoped interfaces.
MCP architecture.
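To make Descriptions and Interfaces concrete, here is roughly what a server advertises for one tool in a tools/list response, per the MCP spec (the description text here is our own example):

```ts
// A tool as seen by an MCP client: a name, a human-readable description so the
// client knows when to use it, and a JSON Schema interface so it knows how.
const searchFilesTool = {
  name: "search_files",
  description:
    "Recursively search for files whose names match a pattern, returning all matches.",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Directory to start the search from" },
      pattern: { type: "string", description: "Substring to match in file names" },
    },
    required: ["path", "pattern"],
  },
};
```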
The best AI applications are equipped with relevant tools and context. So far, developers have had to roll integrations themselves: this open protocol makes equipping supported MCP clients with these tools easy.
Indeed, pairing LLMs with enterprise data is a core theme for AI: an open protocol is poised to accelerate it.
Developers find the feature to be incredibly powerful.
Instead of equipping Claude Projects with context manually, you can let Claude just search your filesystem and return LLM completions of it.
MCP Tutorial
So far, these examples have centered on interactions where the client and the server are self-hosted or trusted. For instance, your client sending a search request to Google Maps is fine because it’s both
Consented: the client checks with the user before invoking the tool
Trusted: your use of e.g., Google Maps operates under a license agreement you’ve already agreed to with Google
But what about cases where both the client and server are owned by someone else?
MCP for Consumer Web Applications
Consumer web or mobile applications may also want to equip themselves with tools and context – particularly your context.
This case is different from those we’ve considered above: here you’re engaging with
Applications hosted by third parties
Who may wish to access your context (not just context they own)
e.g., to provide a more personalized experience. Examples include a
Walmart or Target who’d like to save you time and money staying in stock
DoorDash or Uber who’d like to schedule food and cars to fit your schedule and taste
Dating apps to match you with users with similar interests
TripAdvisor or Booking to match you with travel that matches your tastes
In these cases, client applications are hosted by third party businesses and may wish to provide AI-powered services that reference your data – your past purchases, reservations, favorite songs, and so on.
Today’s MCP doesn’t handle this case: the client is local and trusted. You configure MCP servers with authentication keys that, as a developer, you manage yourself.
But what about this case where the client is owned by someone else? And when you don’t really want to bother managing your own authentication keys?
Enter LLM Sampling
Anthropic’s MCP introduces a concept of Sampling
Sampling is a powerful MCP feature that allows servers to request LLM completions through the client, enabling sophisticated agentic behaviors while maintaining security and privacy.
Anthropic shares no real-world examples of sampling, but it appears to set the stage for servers to request completions from AI through the client. The structure of a sampling call is informative
Standardized message request for sampling calls.
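Here is a sketch of that request shape, following the MCP documentation’s sampling/createMessage message (values illustrative):

```ts
// The server describes what it needs; the client picks the actual model and
// decides what context to include.
const samplingRequest = {
  method: "sampling/createMessage",
  params: {
    messages: [
      {
        role: "user",
        content: { type: "text", text: "Summarize this user's recent travel bookings." },
      },
    ],
    modelPreferences: {
      costPriority: 0.8,         // cheap
      speedPriority: 0.7,        // fast
      intelligencePriority: 0.4, // somewhat smart
    },
    includeContext: "thisServer", // vs "allServers" or "none"
    maxTokens: 300,
  },
};
```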
Notice how the request abstracts away
Model
Context: thisServer vs. allServers (though it’s not clear what these settings actually induce)
allowing the server to just say
i need a completion from a cheap, fast and somewhat smart model with the following context
This appears to have direct application for the third party client, which could call out to a secured AI client for LLM completions of permissioned context.
The sampling capability in MCP points toward a potential solution for this third-party context challenge, though it requires careful extension to work in consumer applications.
MCP for personalized applications
Suppose you’re using a travel site – a remote third party client application.
It endeavors to personalize your experience, possibly with AI.
But it doesn’t have the data needed to adapt its experience to your context. It only knows what you’ve said or done together – which isn’t the full story.
This is the setup for MCP: you equip clients with additional context and tools to give your application additional functionality and adaptability.
For instance, Claude Desktop may not be up to date on current events. We can solve that by adding search to our MCP client Claude Desktop by equipping it with the Exa AI MCP server.
Similarly, we could equip the travel application with an MCP server with sampling enabled, so that it could sample private user context, enabling the application to adapt itself to personal context.
That said, this case has a few differences.
First, candidate user context for personalized applications lives across many remote application servers, each with its own data schema and latency. Ideally, retrieval would operate over a unified data model with managed latency, and keys would be managed on behalf of users, who may not have the time, interest, or capability to manage keys themselves.
Second, we’d wish for some security model that enables fine-grained access control over the context that’s passed to the LLM client. We don’t want to pass all context to the LLM client or be at the whim of the native remote service API scopes like https://www.googleapis.com/auth/gmail.readonly. Additionally, we’d like to assert how or why LLM completions of permissioned context passed to third parties satisfy privacy law.
Finally, we’d hope for a consent model that doesn’t require consent at each sampling invocation. Consent must be meaningful and intuitive, not the product of beleaguered consent fatigue.
So while MCP lays the groundwork, Crosshatch builds upon it with a user-centric approach.
LLM Sampling for remote data and apps
Crosshatch solves these issues by enabling secure LLM sampling of private data by third party applications.
Crosshatch's version of MCP Sampling: combine LLMs, a unified data representation, and fine-grained access controls in user-governed cloud compute managed by Crosshatch.
Developers can add Crosshatch Link to their site or apps to invite users to link context to their application for a specified purpose e.g., “personalized travel”. This initiates a sync to the user’s trusted cloud and transforms the data into a unified events representation or index: everything the user has
Liked
Purchased
Confirmed
Exercised
Posted
across applications. The user leaves it to their trusted cloud to manage auth keys on their behalf.
In the Link, the user can choose what data to share to this application, agreeing to make this context available for secured LLM completion.
This unified events representation enables fine grained access controls to share e.g.,
“only recent purchases”
“hotel and restaurant confirmations”
irrespective of the data model or permission scopes native to the remote service that provides the data. [This is a reason why AI needs AI-native Auth.] It also enables natural retrieval for LLM completions.
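For comparison, here is a hypothetical sketch of the analogous Crosshatch call, with illustrative endpoint and field names rather than the published API:

```ts
// A third-party application requests an LLM completion over the user's
// permissioned, unified event context – it never sees the raw events.
const crosshatchRequest = {
  method: "POST",
  url: "https://api.crosshatch.example/v1/complete",
  headers: { Authorization: "Bearer <link-token>" }, // issued at Link time, purpose-bound
  body: {
    model: { costPriority: 0.8, speedPriority: 0.7, intelligencePriority: 0.4 },
    context: {
      events: ["hotel confirmation", "restaurant reservation"],
      period: "last 6 months",
    },
    prompt: "Which of these candidate hotels best fit this traveler?",
    maxTokens: 300,
  },
};
```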
Notice the similarity between the sketched MCP Sampling call and Crosshatch API call, both of which enable the server (in the Crosshatch case, the third party application) to request LLM completions of private context for any chosen AI model.
Finally, the Link enables a consent model that’s both consistent with privacy law and doesn’t require repeated consent on each sampling call.
Privacy law requires that data access should be “reasonably necessary and proportionate” to the use case, where “reasonableness” is determined by the user. That is, so long as linked applications make LLM calls in keeping with the Link-specified purpose, LLM sampling should satisfy privacy requirements.
We started Crosshatch well before MCP dropped and we do not currently use MCP. However, both efforts point toward the same future: one where applications can access rich user context through well-governed protocols rather than building isolated context silos. By allowing consumers to attach purpose-bound LLM completions of their context to the apps they already use, we can make every application more intelligent while preserving user control.
Anywhere n'everywhere
Not web3 or open protocol: How we designed Crosshatch to enable simple AI agent portability.
Scroll to read
1/5/2025
Is Crosshatch Web3? Why is Crosshatch building a central trust authority v an open protocol? In this blog, we unpack our design choices.
Is Crosshatch web3?
At Web Summit 2022 Tim Berners Lee was similarly asked, “Is Solid Project Web3?”
This is an important question – Solid (and Crosshatch!) look like they could be web3 projects. We’re focused on interoperability, decentralization and self-sovereignty.
But Solid is not web3. TBL explained
Blockchain protocols may be good for some things, but they’re not good for Solid. They’re too slow, too expensive and too public. Personal data stores have to be fast, cheap and private.
He went on
Web 1.0 was the blogosphere, static HTML documents linked together with URLs.
Web 2.0 added technologies like Javascript and dynamic HTML.
But, for security reasons, … identity, apps and data were tightly coupled. … The Web 2.0 approach became the era defined by a few big companies that used our data to lock us into our platforms.
This is the strategy Bill Gurley discovered in 2003.
So Solid is not web3; it’s Web 3.0. [Maybe we should lean in and just call it web4.]
Solid aimed to decouple identity and data from apps. This was originally done via the Semantic Web, but AI makes data cleanly interoperable.
Instead of bringing your data to any app as in Solid, at Crosshatch you bring your AI agent: secured AI with permissioned access to context that you control.
Professor Claude expands:
AI dissolves ontological barriers that plagued Semantic Web. No more schema wars. No more standard bodies. Pure pragmatic interop via statistical mediation.
Long-time Solid contributor Noel De Martin describes the difference clearly:
In Web3, decentralized means that data is everywhere. Whereas in Solid, decentralized means that data is anywhere. If we're talking about my private data, I'd very much prefer the second approach (with "anywhere" meaning wherever I choose).
Crosshatch is Pretty Solid.
Whereas Solid has suffered from DX and UX “that suck”, Crosshatch aims to resolve these issues with a user experience and developer experience made clean and intuitive by AI. Data should be permissioned at the context level (“recent reservations”, “recent purchases”, “recent posts”), not the data source level.
Toward a “data anywhere” paradigm, we’re beginning work on a self-hosted Crosshatch, where users can sync their data to a personal data wallet hosted on their own machine.
This delivers on De Martin’s “my data wherever I choose” – in the cloud for convenience-oriented users, or locally for the privacy conscious. This is distinct from crypto’s “data everywhere” but consistent with the ethos of interoperability, self-sovereignty and decentralization.
Why not an open protocol?
Last time we wrote about Crosshatch as a Personal MCP server.
When we wrote it we missed an ongoing discussion of how MCP might add auth to servers, where servers act as clients to third party integrations.
The intuition is that end-users may be more comfortable granting scopes to servers than clients, particularly since end-users may use a server with many clients.
The spec isn’t done, but the lead author eschews a central trust authority.
We want clients and servers to be able to determine whether they trust each other based on user consent, rather than delegating to any kind of registry of trusted implementations.
Anthropic’s Justin Spahr-Summers wrote in November.
This approach makes sense for an open protocol, but we believe that it comes at the cost of user experience.
We’re overwhelmed by Terms of Service. No one reads them. When users sign up for a Solid Pod, they’re presented with a number of candidate Solid Pod providers, but it’s not apparent what the difference is between them. We all have cookie fatigue.
The open protocol approach faces a paradox: decentralization without a central authority requires constant consent and choice, but choice creates cognitive overhead.
When presented with a new MCP server, how does a user know if it's safe to trust? Like the proliferation of cookie consent popups, more "user control" often means more decision fatigue (per privacy scholar Daniel Solove, a "Kafkaesque" consent), not more actual control. This is why Crosshatch takes a different path: a single trusted intermediary – hosted on your machine or in the cloud – that handles authentication complexity, rather than pushing that complexity onto users in the name of decentralization.
In our view, the happy path eventually divorces consent from source data, so that you can integrate context into LMs and client apps via your agent irrespective of where it came from. That is, instead of sharing
gmail.readonly
calendar.readonly
we think you should be able to share coherent groups of events e.g., recent
purchases
hotel reservations
reservations
irrespective of where the events were observed AND irrespective of what endpoints and latency are available from among source endpoints.
Further, the server that manages these connections and permissions should be trusted. You shouldn’t have to read a new end user legal agreement every time you use an MCP server.
These three reasons:
Simplified consent over aggregated data
Trusted intermediary with well-known governance
Unified data model enabling simple retrieval
are why we’re building Crosshatch as a single Personal MCP server – hosted in the cloud or on your local machine – that you can authenticate to anything with simple and intuitive consent mechanisms.
For developers looking to enable rich personalization without protocol complexity, we invite you to explore our docs or connect with our team.
MeDP
Help consumers save time with better software and portable preferences, not automating clicks.
Scroll to read
2/2/2025
Will AI agents help consumers save time? We’re on our way to finding out.
Introducing Operator
Last week, OpenAI shipped Operator, a new $200 / month AI agent that can accomplish tasks in a browser.
Operator works pretty well – Kevin Roose at the New York Times successfully had the agent
Order a new ice cream scoop on Amazon.
[Buy] a new domain name and configured its settings.
[Purchase] a Valentine’s Day date for [him] and [his] wife.
He summarized
In all, I found that using Operator was usually more trouble than it was worth. Most of what it did for me I could have done faster myself, with fewer headaches. Even when it worked, it asked for so many confirmations and reassurances before acting that I felt less like I had a virtual assistant and more like I was supervising the world’s most insecure intern.
While it’s still early for AI agents and the trajectory of AI improvement is clear (we’re AI bulls), we think it’s worth unpacking how, exactly, AI is poised to actually save users time. It seems like there are two core mechanics:
Bypassing software with poor UX
Translating or activating (likely) preferences
Today, Operator appears squarely focused on (1). If you can describe exactly what you want in words, and there’s no room for misinterpretation, Operator allows you to trade typing or dictation for its automated clicks.
But the canonical motivating example for consumer agents usually involves (2). Princeton CS’ Arvind Narayanan explains
Compare this to the usual motivating example for AI agents — automating shopping or flight booking. This is actually the worst-case scenario. If the wrong product shows up at your door even 10% of the time, the agent is useless. And don't forget that online commerce is an adversarial environment — comparison shopping is hard because companies deliberately make it hard. If agents make it easier, brands will fight back. As for flight booking, the time consuming part is preference elicitation. The reason it is frustrating is that search interfaces don't know all your preferences and constraints (e.g. how to trade off time and money, preferred airlines, constraints on when you want to depart and arrive, and really dozens of other little things). But guess what, the agent doesn't know this either. I really don't think shopping and travel booking agents are going to work, and it's not a matter of improving capabilities.
Agents can’t save you time in (2) because they don’t know your preferences. If they decide for you and guess wrong, they could end up creating a costly error you'll have to clean up yourself.
Memory layers purpose-built for AI won’t help either, for the same reason Customer Data Platforms (CDPs) have failed – records of our preferences are fragmented across applications, and most memory layers, whether AI-native or otherwise, are sparse and quickly become stale.
Even (1) is in question, with Upfront’s Peter Zakin wondering
What if the end-state of customer support software isn't replacing humans with agents? What if instead, the future of customer support is just more powerful, more malleable software? Yes, I could call Delta or email Delta for some request (from simple tasks like flight changes to complex tasks like PNR splitting, bereavement fares etc...)... but I could also just get what I need from a more powerful version of the Delta app.
Applying this to consumer – what if the future of consumer experiences isn’t an AI agent automating clicks but rather businesses using AI to just make more powerful and adaptive software that doesn’t require excess clicks?
Not to mention the incentives problem of browser agents, with Thick Apps (sites with private aggregated supply) like Reddit and YouTube already blocking Operator in the first week of its launch.
This is to be expected, per Ben Evans
A lot of companies will block [Operator]: they didn't make an API (or only made a limited one) on purpose, because they want to steer the flow and manage your interactions (and upsell you), and because they themselves want to be the place where you do the whole thing: no one in tech wants to be someone else's dumb API call.
Putting this all together, what the user experiences of the future will look like is still emerging. It sure appears they will be more adaptive and conform to us, featuring, per Scott Belsky,
Context-based purchase decisions. Imagine every purchase decision - from food items and vitamins to wardrobe and accessories - being framed in the context of your diet, what you’ve purchased before, or what is recommended based on a deep analysis of your life and preferences
On-the-fly UI & text-based-commerce: experiences akin to a personal shopper that has worked with you for years, remembers your preferences (whether your wardrobe or otherwise), and can really engage with you personally
What’s core in this is the translation and portability of your preferences.
Traditional first-party data – whether in Customer Data Platforms (CDPs) or memory layers for AI – is insufficient for this future. The data are too stale and too sparse.
We’ve explored other phrases for this portable data layer – the one that contains your past purchases, preferences, and so on – such as an AI cookie or a Personal MCP server. But with the CDP dying (and given adtech’s love of acronyms), perhaps what we need is a MeDP.
Introducing the MeDP
Crosshatch is a MeDP – a Me Data Platform.
The MeDP improves on the CDP by aggregating and unifying context from across applications, to create a single timeline of all actions a user has taken in the world, not just those in a first-party context. The MeDP is the platform for recording and activating (revealed) preferences.
This phrasing is particularly useful because the underlying data structure follows canonical CDPs.
Crosshatch’s Context Layer follows the same event-naming best practices as CDPs. The only difference is that the MeDP unifies events from across applications into a single context layer: a single view of all the actions you take in the world.
Segment's suggested action event naming conventions for Glossier, Hotel Tonight or Segment. Notice that Glossier and Hotel Tonight emit events that have the same name.
This means that we say a like action on Instagram is the same as a like action on Spotify. They both involve ‘liking’ something – whether it’s a song or a photo. Of course, like CDPs, we record the object that received the like, as well as relevant metadata about the object. Here’s how Segment describes metadata assignment best practices
Segment's suggested available objects that can receive actions. At Crosshatch, objects are anything that can receive an action – usually recorded as a proper noun.
In our case, objects carry metadata like
object_style: an LM-generated description of the object, based on data we were able to find about that object through search-based RAG
object_unstructured: unstructured data we might have about the object; for instance, for events derived from email, we assign object_unstructured to be the contents of that email
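As a sketch of what a unified timeline event might look like – field names follow the ones above and in our API examples, but this is illustrative, not an exhaustive schema:

```ts
// Illustrative sketch of a unified MeDP timeline event (not an exhaustive schema).
interface TimelineEvent {
  action: string;               // shared naming across apps, e.g. "Liked", "Order Completed"
  object_name: string;          // the thing that received the action, usually a proper noun
  object_subcategory?: string;  // e.g. "HOTELS_ACCOMMODATIONS"
  object_style?: string;        // LM-generated description of the object (via search-based RAG)
  object_unstructured?: string; // raw source material, e.g. the contents of an email
  originalTimestamp: string;    // when the event was observed (ISO 8601)
  source?: string;              // where it was observed, e.g. "spotify", "instagram"
}

// The same "like" action, recorded uniformly across applications:
const likes: TimelineEvent[] = [
  { action: "Liked", object_name: "Example Song", source: "spotify", originalTimestamp: "2025-03-01T12:00:00Z" },
  { action: "Liked", object_name: "Example Photo", source: "instagram", originalTimestamp: "2025-03-02T09:30:00Z" }
];
```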
The MeDP representation is further helpful in that it enables intuitive, fine-grained access controls over events observed across applications – something that isn’t available or easy to modulate with native application scopes. [For more, see our past blog Authorized Agents.]
Fundamentally, we believe this MeDP is critical to AI-mediated interactions, as it solves the core problem of AI. Not automating clicks, per se, which may readily be resolved through superior software, but rather in preference elicitation.
MeDPs save us from AI as the world’s most insecure intern and give us a personal assistant who actually knows us intimately.
If AI is truly going to save us time, it’ll do it by translating our preferences and past behaviors into new contexts. So that the things we already do magically help us save time in the future, without any extra work from us.
This isn’t magical AI – it’s just practical application of context, in a secure and privacy-compliant way.
Bring on the MeDP.
D.R.AI.
Today’s AI agents own no inventory and are not directly responsible for outcomes. Who provides services we trust? Apps.
Scroll to read
2/9/2025
Just over two weeks ago OpenAI launched Operator,
an agent that can use its own browser to perform tasks for you
OpenAI operator.
You can ask it to
Find you cheap eggs
Buy you an ice cream scoop
Or book you a trip
OpenAI’s agent will attempt to execute the task in the browser for you end-to-end, searching and clicking just like you would.
Operator and browser agents like it are poised to allow us to stop using internet applications altogether, and just let AI do things on our behalf.
We’re incredibly excited about this future, but it seems like there are (at least!) two open questions on the path there
Will sites allow agents open access?
Who is responsible for agent actions?
The answers to these questions have implications for how agents enable personalized services that save consumers time and money. Do agents orchestrate applications, or do users bring their agent to apps?
Can agents access app inventories?
Agent capability is downstream of access to context and tools.
Operator enables access to tools by making the browser its tool – it’s able to execute anything in a browser you could.
Apps and sites like Uber, Zillow, Airbnb, and Walmart work hard to assemble a trusted supply of cars, homes, and staples they make available to customers. They’re opinionated about customer experience and want to own the whole experience end to end.
It’d appear that, for this reason, we’re already seeing sites like Reddit and YouTube block Operator. As we wrote last week, referencing Benedict Evans,
no one in tech wants to be someone else's dumb API call.
Last year, Airbnb was hiring for anti-bot engineering.
These market dynamics are still forming, but early responses appear to be aligned with Evans: apps who own valuable inventory want to be the principal toll road to it.
Who is the DRI for AI agents?
When AI agents take action on our behalf, who is responsible if they mess up? When there is money involved, will the agent cover a mistake?
For instance, when you order DoorDash, DoorDash is acting as your agent to orchestrate the sourcing and delivery of your food. If your order isn’t right, DoorDash covers it, often offering re-delivery or refund.
When the Washington Post tried Operator, Operator bought eggs on behalf of the columnist, even though it wasn’t specifically asked to.
Sam Lessin, co-founder of Fin (a human/AI assistant founded in 2017), shared three lessons from building Fin:
When you get between a customer and an open-ended set of services with real world consequences that you don’t control, you are assuming a massive amount of liability for outcomes you can’t guarantee.
The 1% Blame Game - The Assistant Front End Will Get Blamed for Any Downstream Service Mistake
Depth of Error Impact in the Real World — The surface area of cost and consequence is dramatically dramatically bigger than it is when your Doordash food is cold. The problem is a financial one. Every time you take on a high ticket item you are basically taking on the liability of a big refund or huge fight if you don’t deliver … and if you are just getting paid as an ‘assistant’ vs. something like a 10% or 20% commission on everything, you just don’t have the cash to give reasonable compensation to your customers and keep your model working.
"I could have done it better / faster myself" — Anytime a service isn’t perfect, the customer / client thinks ‘I could have done it better or faster myself’ … even when that is 100% untrue. But again, when someone gets in the middle of that transaction in most cases the liability assumption even when things don’t go terribly wrong creates constant stress.
The DRI for an AI agent is the agent itself. The agent is more valuable the more it does for you, but that adds to the surface area of what it can mess up. And when it messes up, people will expect the agent to make it right. But as Sam notes, the business model to make democratized assistants work is difficult to find.
Buying services from multiple vendors with unclear responsibility boundaries often leads to poor outcomes for customers: things get dropped and it’s not immediately clear who is ultimately responsible for any service level. [You can see this in challenges presented by BYOC deployments, for instance.]
Make agents responsible again
Today’s independent AI agents own no inventory themselves – the content, services, and goods they aggregate are not theirs. The services they source this from could readily block or hinder their activity.
And they’re not DRIs for outcomes. If they make a mistake, will they compensate you for your lost time or money?
In the end, an app is a wrapper for
Its own supply of content, goods or services
Direct responsibility for the provision of these goods
AI agents have neither of these traits.
This is why, if you want AI to help you save time or money, you need to go to services that both have
The goods you want to consume
Are responsible for the effective delivery of these goods
Apps do this, but today’s independent AI agents do not.
Motivation for personal AI as an identity
Many have imagined that a personal AI could be an AI agent that goes out and does stuff on your behalf. This AI agent should know you well and can use this information to help you save time and money.
But both of these dynamics
Sites already blocking independent AI agents from taking automated action
Lack of a directly responsible party for delegated action
challenge this path of the personal agent.
Instead, if you want personalized services that save you time and money, you need to bring your AI agent to these services. By bringing your agent, you can access the private supply of goods and services these applications sell while retaining a service agreement under which someone is actually responsible for delivery. This enables the personalized experience an independent AI agent might otherwise provide, but under the boundaries and guarantees of an application.
Without access to goods or capacity to actually guarantee their delivery, what really is an AI agent?
[An AI agent is actually a user agent that always acts in the user’s interest, sharing information with applications only reasonably necessary for the fulfillment of user requests.]
The end of first-party data
Why Walmart's vast first-party data still failed to deliver the personalization AI promised—and what this means for your customer data strategy.
Scroll to read
3/2/2025
In April 2019, Crosshatch cofounder Soren Larson decided to join Walmart’s Store No 8, a startup division created by Jet.com’s Marc Lore. The team’s mission was to help busy families save time by putting groceries and household goods on autopilot. Our team was well positioned to do this:
Team: I joined a team that included an ex-Renaissance Technologies Harvard astrophysicist and a Cambridge experimental physicist
Data: Walmart has more first party data than anyone on the needs of American families
Infra: Walmart data is well organized, with purchases and clickstream data all readily accessible in data warehouses
Walmart instrumented all events on web and mobile – we could see everything a user did on digital Walmart properties. This data landed in data warehouses including Google BigQuery.
It was hard to imagine that we wouldn’t succeed.
Yet our journey at Walmart revealed a fundamental truth that would eventually lead us to found Crosshatch. After experiencing first-hand the limitations of even the most robust first-party data infrastructure at one of the world's largest retailers, we recognized that the entire industry was building on a flawed foundation. What began as a frustrating experiment in predictive grocery auto-replenishment became the motivation for reimagining how personalization could actually work in practice.
The gap between the promise of data-driven personalization and its reality wasn't just Walmart's challenge—it was an industry-wide blind spot.
What is first-party data?
First party data is any information that a business collects as an end-user engages with a product. This information includes
Products a user clicks on
Forms a user completes
Products a user adds to cart
Products a user purchases
Searches a user enters
Walmart consolidates all of this information within its data warehouses.
Whether this information is
collected in a CDP
or a data warehouse
or a memory layer for AI
all of this information constitutes first-party data.
But as we learned in our product development, first-party data is insufficient for personalization that saves users time.
Why first-party data isn’t enough for personalization
At Walmart, we endeavored to use the data we collected about customer behaviors to ship the ultimate personalized service: a service that puts groceries on autopilot. This is what personalization long promised: a service that knows what we like, our needs and behaviors, and can automate chores in our lives.
Using first-party data collected on Walmart.com, we planned to estimate
How often customers needed groceries
What groceries they would need
In our product development, we onboarded 1,000 customers to have their groceries and household goods delivered on autopilot. Using our vast first party data we
estimated how often users would expect groceries
what groceries the user would need
Whenever it was time for a new order, we placed orders on behalf of customers, giving users 2 days to modify their carts.
What we found is that we were only able to predict about half the basket with 70% precision. Namely, for only half the basket could we estimate 7 in 10 items correctly. For the other half of the basket it was even worse.
Walmart endeavors to save busy families time. This level of performance wasn’t good enough – shoppers don’t want AI buying groceries on their behalf when 3 or 4 of every 10 products ordered are wrong. That doesn’t save anyone time.
The problem we faced wasn’t
Data availability: we could see all online customer behaviors
Talent: our team was led by an ex-quant from a fund that delivered top hedge fund returns
Infra: We were able to train models on all customer data on leading cloud infrastructure
It was the data itself: first-party data just isn’t enough to anticipate customer needs.
First-party data fails in the AI era
When you zoom out to look at what data is actually in your data warehouse, you’ll quickly see why first-party data fails in the AI era.
CDPs collect data like what things users
click on
search for
scroll past
buy
Most of these things are not very rich in signal. Does what someone clicks on actually mean they like it? Or what they search for? Or scroll past? How does this first party data actually drive real modeling of user preferences? These behaviors might confer value in scaled platforms like TikTok, but most sites don’t have this sort of traffic or products that really imply rich preferences.
Purchases are rich in signal, but they are backward-looking and limited to what a user does in your app alone. Customers use many apps! So the behaviors you observe over time – the ones that could clue you in to their preferences – can mislead you about what users actually need.
Do they buy milk once every other week actually? Or do they switch between your app and another? Or were they just traveling last month?
If a user says they like pizza, do they always like pizza, or just certain contexts?
When you start looking at the data actually in your data warehouse – the things people
click on
search for
infrequently buy (conversion events are rare!)
say they like
you might start to wonder how that data actually enables activations that drive value for you or your customer.
This is particularly clear in this new age of AI, where models like Gemini or GPT-4 can readily be asked what the data in your warehouse might mean for an optimal next-best interaction with your customer.
AI puts the limitations of first-party data in clear view. You can just ask ChatGPT to interpret your first party data, but what’s ChatGPT going to do knowing what a user recently clicked on, searched for, or bought once last summer? Not much.
Why AI demands a new kind of context
While first-party data is useful to know where customers are in their customer journey in a session, it’s fundamentally insufficient to ship personalized experiences that move the needle for customers.
No data activation platform will solve this. While every data activation platform markets shipping 1:1 customer experiences, fundamentally this is a lie. They can’t actually deliver personalization that saves users time because first party data just isn’t enough context for AI to provide value to customers.
To make matters worse, first-party data infrastructure is expensive to stand up, administer, and maintain. VCs will say that first-party data is what drives competitive advantage, but when you actually look at the data you’ve collected, you might start to wonder how this narrative went on for so long. How exactly is knowing what a user clicked on a few months ago driving competitive advantage?
Rather than standing up another data activation platform, teams need a personalization platform that runs on a customer’s complete context. This offers product managers and marketers a true complete view of the customer’s journey from across applications, and enables time-saving personalization that adapts to customer needs and drives loyalty and spend.
This is a controversial view, but we believe it’s time to move on from incremental improvements in personalization. We’ve been told first-party data is the pathway to better personalization, but it’s been well over a decade since these narratives began, and we’re no closer to the futuristic personalization we were long promised.
Instead of relying solely on first-party data, companies need to leverage enriched customer contexts from multiple sources to deliver truly effective personalization. At Crosshatch, we've built a new kind of personalization platform that combines first-party data with context users bring with them to create a complete view of your customers. By contextualizing behaviors across applications and channels, we enable businesses to anticipate customer needs with much higher accuracy than traditional CDPs or first-party data warehouses alone. Our early customers are delivering adaptive experiences that take action on changes in a user's context, previously impossible on the basis of first-party data alone.
It's time to move beyond the limitations of first-party data and embrace a new paradigm for personalization. At Crosshatch, we're helping forward-thinking brands deliver experiences that truly adapt to customer needs and save them time. Want to see how our approach compares to your current personalization strategy? Schedule a demo to discover what's possible when you stop relying solely on first-party data.
Introducing Playground
See the internet’s first personalized inference gateway in action.
Scroll to read
3/23/2025
If AI agents are going to save consumers time and do things on our behalf, they need to know who we are.
They need to know how we behave IRL
what choices we make
where we like to go
what things we like
and then use that context to make helpful recommendations or decisions automatically.
At Crosshatch, we've shipped the internet's first personalized inference gateway - a way for users to authenticate their personal AI to apps and agents. Today we share a beta version of our API Playground to see our API in action.
The personalization promise gap
Consumer internet companies still talk about what AI will do for consumers, but they’re light on the details on how.
For instance, last week Booking Holdings’ CEO Glenn Fogel sat down with Fortune magazine
But we’re old enough to remember a human travel agent. And that human travel agent knew a lot about you. She kind of knew what you could afford. She kind of knew what you liked in general, and she suggests things for you. That’s what we want, that personalization, that’s what we want to do. And we can do it. We will do it. And the more you work with our services, the more we know about you.
Consumer internet companies have been telling this personalization story
We'll collect a bunch of data and finally be able to deliver services that anticipate your needs!
for at least a decade, but it’s yet to yield the personalization promised.
Why is this so hard? First party data doesn’t work. Businesses don’t observe enough consumer behavior data to achieve Fogel’s aim. We saw this first hand at Walmart – a consumer staples business where people are shopping all the time. First-party data just isn't enough context.
Travel is a consumer discretionary business. People don’t book trips that often. Booking Holdings doesn’t have enough touchpoints with a consumer to actually know a user’s evolving preferences.
A user books a trip with Booking, or maybe Expedia, or maybe directly; they travel, and then their life changes. All Booking sees is the travel booked with Booking.
Without a change, it will never yield the personalization Booking’s Fogel imagines.
Why we built Crosshatch
This is why we created Crosshatch: to enable users to log in with their personal AI to any application, where this personal AI has complete context of their preferences, behaviors, and needs, sourced from every application a user uses.
Applications like Booking.com offer travel booking and planning services users trust, but they lack the context to provide the human travel agent experience Fogel imagines. Instead, we believe users should be able to log in with a personal AI and have Booking “call their agent” for advice on what the user might like.
Our actions across applications produce data that ‘reveal our preferences.’ Rather than having to repeat ourselves to every app, we should be able to bring our own context-rich AI to help save us time.
What makes Crosshatch different?
Unlike other personalization solutions that rely solely on first-party data or invasive tracking, Crosshatch:
Unifies context across apps - Brings together data from Gmail, Calendar, Plaid, fitness trackers and more for a live stream of context that’s always up to date
Puts users in control - Fine-grained permissions let users decide exactly what context to share
Maintains privacy - Applications never directly access raw user data
Requires minimal integration - Developers or AI agents can add personalization with just a few lines of code.
Separates model and context layers - Applications can activate context with the best model for the task and not get locked in to any model provider
Not mere MCP - Crosshatch creates long-running context syncs, session-based auth, a unified context layer, and folds AI into servers to make interop of consumer data and AI privacy-compliant
Introducing Crosshatch Playground
To make this new pattern more intuitive, we've built Crosshatch Playground – a web application to see how personalized inference works.
Our personalized inference gateway is a proxy over secured top language models, but with a few important differences.
COMPUTE TO DATA
Most applications operate under a “data to compute” paradigm, where users must send their data to remote third-party-governed compute. This is how Booking.com works – you must tell them information about yourself, and they use that on their compute to save you time.
This leads to users having to repeat themselves across applications. It’s also less privacy-preserving, in that we have to tell apps practically everything so that they might be able to save us time.
Crosshatch operates under a “compute to data” paradigm, where applications call your agent – private AI with permissioned access to context – enabling applications to perform compute on your context. This way, you provide your data once and apps are able to ask your agent questions about your context.
AI AUTHENTICATED BY USER-SPECIFIC TOKENS
Unlike most AI endpoints that are authenticated by a developer- or application-specific API key, our AI API is authenticated by app- and user-specific authentication tokens granted by our Link.
This is because our API is a personalized inference gateway. When you talk to Claude or GPT-4o through our API, you’re talking to that model as well as the user’s context.
CONTEXT IN THE HEADER
When a user logs in to an app with Crosshatch, they link specific context – for example, from their Gmail or Calendar – to that application. When this happens for the first time, Crosshatch initiates a live stream of this context to a database we manage on behalf of the user. This makes sure that context is always up to date without any extra work from the user or the app.
Applications access this context through a header in our API, enabling apps to reference recent reservations, purchases, exercise history, likes and more all by an SQL-like querying language. With these context queries defined, applications can inject the results of these queries into AI prompts.
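For example, a context query for recent purchases might look like the following sketch. The field names follow our gateway examples; the specific values are illustrative:

```ts
// Illustrative context query in the SQL-like query language.
const recent_purchases = {
  select: ["object_name", "object_style", "originalTimestamp"],
  from: "personalTimeline",
  where: [{ field: "action", op: "=", value: "Order Completed" }],
  orderBy: [{ field: "originalTimestamp", dir: "desc" }],
  limit: 20
};

// Referenced in a prompt by name using curly-brace notation:
const prompt = "Given these recent purchases {recent_purchases}, suggest a gift under $50.";
```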
This design makes it easy for apps and agents to use our API.
When Crosshatch receives an API request, we
perform the specified context query on behalf of the application
hydrate the prompt with the results
send the hydrated request to the specified AI model
return the results to the application
This enables applications to get private personalized inference in a serverless fashion.
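Conceptually, the hydration step is a substitution of query results into the prompt before it is forwarded to the model. Here’s a rough sketch (not our actual implementation):

```ts
// Rough sketch of prompt hydration (not the actual Crosshatch implementation):
// each {query_name} placeholder is replaced with the JSON results of that context query.
function hydratePrompt(
  template: string,
  results: Record<string, unknown[]>
): string {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in results ? JSON.stringify(results[name]) : match
  );
}

// Example:
// hydratePrompt("Based on {recent_travel}, where should they stay?",
//               { recent_travel: [{ object_name: "Example Hotel" }] })
// => 'Based on [{"object_name":"Example Hotel"}], where should they stay?'
```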
We created our playground to make it easy to see how context queries are defined and how to inject them into prompts.
In the Crosshatch playground, you can define queries into the user’s context live stream and reference them by name using curly brace notation (e.g., here {recent_travel}). When Crosshatch processes the API call, we hydrate {recent_travel} with the results of the query you define.
You can see the context query defined for recent_travel here, as well as some example results, showing how the results of recent_travel resolve into the prompt.
AI GATEWAY
Our API is a proxy over top language models. Through our API you can use models from OpenAI, Anthropic, and Gemini using their native syntax.
The playground uses the Crosshatch provider for Vercel’s AI SDK, which makes calling our API easier in web applications.
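To give a flavor of that integration, here is a hedged sketch: it substitutes the stock OpenAI-compatible provider from the AI SDK pointed at our gateway, since the Crosshatch provider package itself isn’t documented in this post. Treat the package choice and options as assumptions.

```ts
// Sketch only: assumes the gateway accepts OpenAI-compatible requests and stands in
// the stock @ai-sdk/openai provider for the actual Crosshatch provider package.
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

declare const userToken: string;    // user- and app-specific token granted via Link
declare const contextQuery: object; // context queries, as in the playground example above

const crosshatch = createOpenAI({
  baseURL: "https://ai.crosshatch.io/", // the gateway endpoint from our examples
  apiKey: userToken,                    // sent as the Authorization bearer token
  headers: {
    "Crosshatch-Replace": JSON.stringify(contextQuery) // context queries to hydrate
  }
});

const { text } = await generateText({
  model: crosshatch("gpt-4o"),
  prompt: "Based on their travel history {recent_travel}, suggest a weekend trip."
});
```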
Try Crosshatch Playground
Ready to experience the future of personalization? Our playground is now available in beta for developers and product teams to experiment with.
Try Crosshatch Playground
Or if you're ready to integrate Crosshatch into your application:
Create Developer Account
Join us in building an internet that truly understands its users – where your context can be easily and safely linked across applications, saving you time and delivering experiences tailored specifically to you.
Free agents
Everybody deserves a free and incentive-aligned AI agent. Finance it by letting apps call your agent.
Scroll to read
4/4/2025
AI is closing the gap between
desire and action
belief and knowledge
In the near future, we’ll just be able to say what we want and our agent will get it done.
But today we’re on a path of protectionism, where it’s implied it’s an American duty to pay high prices to a concentrated set of American companies for intelligence.
Agents are too important to live behind a $20 / month subscription.
The open internet broke free of private networks through innovations in security and data protocols that unlocked e-commerce.
This provides a playbook to free and open agents.
E-commerce and memory unlocked the open internet
The early internet was controlled by two private networks, AOL and MSN. They charged both users and businesses for access to the network.
Engineers at Netscape believed in a different future: One of open networks controlled by no one.
But for this belief to succeed, it needed an economic model – rather, a data and security model – to support it. Netscape’s Lou Montulli explained
The thought was that if we enabled e-commerce we would essentially create a revenue stream for the internet that would be sustainable [and outcompete private networks]
And one of the core technologies that we needed to create was some form of memory for the web so that when a user came to a site and expressed some sort of preference that when they continued through the site they would remember [their preference].
Cookies unlocked memory like clicks, scrolls and infrequent purchases, but within the pocket universe of a given site or app.
But we expect agents to operate on our complete context – making the internet’s current memory model insufficient.
Actions our agent may wish to take
Reading the NYT
Searching Airbnb
Booking an Uber
are often guarded by apps – “no one in tech wants to be someone else’s dumb API call” Ben Evans wrote in January.
But apps that own valuable actions don’t know our agent and its complete context. Their first party data is nothing compared to what our agent knows about us. Montulli continues
Cookies were complex in that they were attempting to solve the data privacy problem while giving memory to websites. The core concept of cookies is what if we sandboxed them to only the relationship to a given website.
The first step was giving the web memory through cookies and CDPs.
With agents, the next step is giving the web context-aware delegated agency: Our agent needs a way to take action and apps need a secure way to call our agent.
Agents can already take reliable and cheap action
For our personal agent to be valuable, it needs to be able to take reliable and inexpensive action.
While much attention has focused on browser-led agent action, many are finding the economics don’t work in the consumer case.
Browser-use charges $0.05 per agent step. These economics can work in enterprise – but do they work in consumer? $0.25 extra to have your agent order DoorDash when the alternative is using the app for free?
Every expense that agents incur takes us further away from free agents. Worse, folks find browser agents slow and unreliable. Sites are hiring engineers to block them.
The browser path may not be needed, as agents can already take reliable action through APIs:
Kroger
Instacart
eBay
Hotel + flight APIs
already offer much more economical and purpose-built interfaces into action spaces consumers care about. While some apps may choose to eschew agents, others may (and already do!) see open interfaces into (agent-led) commerce as valuable sales channels.
Most of these API integrations are completely free, enable reliable and purpose built action, and need only someone to register a client id.
This way, users can have their personal agent take action on their behalf reliably and for the lowest possible cost. Since the personal agent knows the user’s complete context, it doesn’t just call Kroger’s API naively, it does so with all available context.
Internet memory drives commerce but is expensive
In 2023, McKinsey reported that
companies that excel at personalization generate 40 percent more revenue from those activities than average players
Over three-quarters of consumers (76 percent) said that receiving personalized communications was a key factor in prompting their consideration of a brand, and 78 percent said such content made them more likely to repurchase.
But personalization is expensive.
Databricks ACV is around $150k
Snowflake ACV is around $300k,
CDPs cost anywhere from $50k - $800k per year
Machine learning engineers or data scientists cost around $200k / year
But as we’ve written before, agents introduce new economics in personalization, making it better and cheaper.
Can spending on personalization infrastructure more efficiently collapse into calls to a personal agent?
Agent API calls pay for free agents
The free internet was paid for by ads.
Can free agents be paid for by apps calling your agent?
Apps usually want you to buy, engage, or return more often.
Personal agents do “in each instance, exactly what the user would want it to do if asked specifically.”
Personal agents may not have all of the capabilities of apps, but they do know the user the best. As a user agent, they can also make sure that only the context that’s relevant to an app is actually shared with a purpose built interface.
Personal agents enable apps to deliver experiences that are more relevant – encouraging higher spend, engagement, retention, and reduced cost of reacquisition. Personal agents can also save apps money, by reducing spend on expensive data and personalization infrastructure.
And relative to shipping a remote Model Context Protocol server, personal agents have
unified context
streamlined consent screens
and trusted operations
that make agent context ops efficient for AI and intuitive and safe for users.
Agent alignment is about taste and economics
It’s generally accepted that personal agents should be aligned to users.
At first pass, it seems like the only way to do that is to have users pay.
We’re suspicious of this argument. We pay internet service providers and they’ve sold our data. A monopolist or member of an oligopoly may not be aligned to us either.
If AI agents are going to be important, they should be available to everyone on the web regardless of ability to pay.
Accepting that means defining new business models and pursuing capital structure that doesn’t induce misaligned incentives like ads, monopoly, or price increases when VC money runs dry.
“Agents calling agents” has been a popular refrain in the Valley, but we believe their strongest motivation lives in economics and (cross-trust boundary) context management.
There are of course interesting questions about what “your agent” is – we believe it’s more like a headless user agent server with a variety of AI client interfaces than any singular interface or God model.
This is the future we believe in – one where technology saves us time at the lowest possible cost. Crosshatch is the internet’s first personalized inference provider, offering the internet’s fastest way to log in with your personal AI, enriched with complete context.
If you’d like to experiment with this world, try talking to your agent on our playground, or stay tuned for our Personal MCP Server to land and link it to your favorite MCP client.
"P" in MCP stands for privacy
No it doesn’t. MCP has security vulnerabilities: it’s also a privacy nightmare.
Scroll to read
4/10/2025
Interest in Anthropic’s Model Context Protocol has blown up.
People are using MCP to enable MCP Clients like Claude Desktop to
Update their databases
Make changes in your Stripe account
Control your browser
all with natural language.
But MCP has security challenges.
Over the weekend, one of the founders of Invariant Labs wrote The S in MCP stands for Security highlighting the
Command injection vulnerabilities: using unsafe shell calls in the server
Tool poisoning: adding malicious instructions inside the tool description
The Rug Pull: silently changing tool definition after installation
Cross-server tool shadowing: A malicious server makes the agent client override or intercept calls to a different trusted server
These security vulnerabilities appear to focus on local instances of MCP.
But as MCP and related infrastructure patterns scale to the consumer web, MCP servers will be deployed remotely, and a new host of privacy problems will appear.
Remote MCP servers are Data Controllers
Under the CPRA, a Data Controller (the “Business”) determines the purposes for which data is processed, while a Data Processor (the “Service Provider”) acts under the instructions of the Controller.
In consumer applications, the web application is almost always a Data Controller, one that gets to choose how data is used, typically under terms that supply the web application rights to use data for the purposes of “improving the service” – a vague coverall that enables exercise of data for flexible purposes.
For instance, OpenAI ChatGPT terms of service say they
may use Content to provide, maintain, develop, and improve our Services,
where Content is anything you provide ChatGPT, and “improve our Services” is never defined.
OpenAI is clearly a Data Controller for ChatGPT.
Remote MCP servers will be the same, but arguably worse: you’re handing them authentication credentials under terms no one will read, terms that likely permit usage for purposes including “improving their service,” which could mean anything.
To be clear, Crosshatch also has this language in our Consumer Privacy Policy – that said, our usage is bound by our obligations to our user-controlled data connections pattern and a clear specification of what exactly Our Services are.
Remote MCP servers are like aggregators. Last-gen aggregators sold customer data to adtech and hedge funds
When iPhone and Android launched without flashlights, apps rushed in to provide services that turned camera flash features into a flashlight.
But these apps also asked for location and other permissions on install.
Users usually installed these apps in a rush, not noticing the strange permissions requested on install.
WHEN I DOWNLOADED the Flashlight app to my iPhone, I was in a jam. I was camping, I think. Or maybe a pen had rolled under my couch. I remember that smug sense of self-congratulation after I downloaded the software, which converted the iPhone's LED flash into a steady and bright beam of light.
But I shouldn't have been so pleased with myself. Though I didn't realize it at the time, I was potentially handing over a boatload of data to advertisers as well.
Wired magazine reported in 2014.
The late Mint.com was a Data Controller that provided a unified view of your finances in an easy-to-use app by aggregating your logins. The aggregator behind it, Yodlee, also sold customer data to hedge funds.
These patterns reflect a larger challenge in MCP’s chosen “privacy as consent” pattern: overwhelming the user with consent messages that they’re ill-equipped to meaningfully understand.
MCP’s consent nightmare
MCP authors have eschewed a central trusted authority for the protocol, preferring to leave the user responsible for allowing access.
We would prefer to avoid a central trust authority. We want clients and servers to be able to determine whether they trust each other based on user consent, rather than delegating to any kind of registry of trusted implementations.
MCP’s Justin Spahr-Summers wrote in November.
This has two problems:
Unbounded consent space
Cognitive overhead of consent
Since MCP allows arbitrary function interfaces to integrations, the user is left to parse what these allowed functions even do. Studying one MCP server interface pays no dividends to the user for understanding another because each one could be different. A function get_personalized_recommendations from one (remote) MCP server could mean something entirely different on another. [As the Invariant Labs piece highlighted, this function could also change post install!]
Even if there were only one function to consent to, this still may be too complicated for most users.
We still collectively don’t know what all these cookie consent messages say. Why would MCP consent be any different? MCP consent messages are a cookie fatigue redux.
These banners, designed to give users control over their data, have ironically achieved the opposite. Most users, overwhelmed by the sheer volume and complexity of the choices, simply click “Accept All” to get to the content (if it ever loads). This defeats the purpose of consent and creates a pervasive sense of fatigue.
Privacy scholar Daniel Solove calls it a Kafkaesque Consent.
[Kafka's] work also paradoxically suggests that empowering the individual isn’t the answer to protecting privacy, especially in the age of artificial intelligence. In Kafka’s world, characters readily submit to authority, even when they aren’t forced and even when doing so leads to injury or death. The victims are blamed, and they even blame themselves.
User consent is convenient from an open protocol design perspective, but it’s poor from a user experience or privacy perspective. Getting consent that a user blindly accepts is not consent.
A simplified consent layer
Consumer AI agents are supposed to save us time.
Part of the UX game for agents is to create the minimal interface for unleashing agent action such that they don’t (too often!) do something we don’t want. If agents ask us for permission for every little choice, they’ll collapse into the “world’s most insecure intern”.
Third party applications will have fine grained access to our context and use it to anticipate our needs and adapt to our state.
But this statefulness, particularly as part of an open internet, should heed lessons from the past decade on privacy UX and incentives.
It is not reasonable to expect end users to make informed consent decisions for every candidate MCP server. This is why, for example, we’re seeing folks like Auth0 enable enterprise administrators to set consent permissions at the company level.
To the extent patterns like MCP scale to consumer, consumers will need a similar trusted and simplified consent interface to easily and effectively manage cross-application connections, where it’s clear, intuitive, and well-motivated how and why applications are accessing otherwise private information.
We’ve long felt like applications should “call your agent” but where “your agent” is any private AI resource with permissioned access to your context.
In 2025 privacy feels less pressing. But principles of privacy like data minimization – e.g., apps should only access the data that’s necessary and proportionate to the service they’re providing you – still feel fundamentally reasonable.
We shouldn't have to auth private resources into multiple remote MCP servers. We should be able to do it once and link those resources out to new apps.
P in MCP clearly doesn’t stand for privacy.
But it should.
Personalized AI Gateway
AI gateways sit between users and AI to govern and scale AI applications. Personalized AI gateways add user-owned runtime context.
Scroll to read
5/3/2025
What is a personalized AI gateway?
AI gateways sit between users and AI to govern and scale AI applications. Personalized AI gateways extend this pattern, sitting between users and internet applications to hydrate AI calls with user-owned runtime context.
This simple extension unlocks a new type of internet infrastructure – one where every application can adapt to us based on our complete context, not just what we do in any single app.
AI gateways are increasingly popular in enterprise
In September 2023, Cloudflare launched their AI Gateway to make "AI applications more observable, reliable, and scalable."
AI Gateway sits between your application and the AI APIs that your application makes requests to (like OpenAI) – so that we can cache responses, limit and retry requests, and provide analytics to help you monitor and track usage.
Databricks followed suit in September 2024, launching their own AI gateway with additional governance features.
From web gateways to AI gateways
These gateways follow Secure Web Gateway (SWG) patterns, which sit between users and the internet. Palo Alto Networks summarizes:
A SWG is a cloud-delivered network security technology that filters internet traffic and enforces regulatory policy compliance. SWGs inspect traffic from client devices aiming to connect to the internet, they authenticate users and examine web requests blocking those that violate policies or those that pose a security threat.
SWGs are typically deployed by enterprises to protect employee web traffic. But this architectural pattern – a trusted intermediary that governs interactions between users and services – extends naturally to personalization.
Today, applications wishing to personalize experiences typically:
Instrument user behavior to collect local context
Activate this context with AI models
But for the age of AI, this fragmented context is proving insufficient. Every app builds its own data silos, yet none have the complete picture needed for truly adaptive experiences.
Here's how a personalized AI gateway looks in practice:
The trouble with integration protocols
Integration protocols like Model Context Protocol (MCP) are rising to fix this fragmentation. MCP enables users to bring their context to AI clients, but its privacy properties are poor.
MCP follows a "context to compute" pattern, where users send their context to possibly untrusted third party AI clients. This creates several problems:
Context hunger: MCP clients may consume much more context than needed to answer simple questions like:
When am I free next week?
Where should I go for dinner?
What type of sofa might I like?
Privacy violations: This may violate CCPA and GDPR's data minimization principle. CCPA requires that data use be "reasonably necessary and proportionate" to the purpose. But MCP clients can access broad interfaces like the following without being bound to any purpose limitation:
search_all_email()
list_recent_transactions()
User burden: Users must understand and consent to complex server interfaces, creating cognitive overhead that leads to consent fatigue.
The case for compute to context
A personalized AI gateway can fix this by moving compute to context. Instead of sending context to compute, applications call a user's AI agent to get inference based on permissioned context.
Here's how it works:
```ts
// Define what context you need
const contextQuery = {
  "recent_travel": {
    "select": ["object_name", "object_style"],
    "from": "personalTimeline",
    "where": [
      {"field": "object_subcategory", "op": "=", "value": "HOTELS_ACCOMMODATIONS"}
    ],
    "limit": 10,
    "orderBy": [{"field": "originalTimestamp", "dir": "desc"}]
  }
};

// Call the gateway
const response = await fetch("https://ai.crosshatch.io/", {
  method: "POST", // fetch defaults to GET, which cannot carry a body
  headers: {
    "Authorization": `Bearer ${userToken}`,
    "Content-Type": "application/json",
    "Crosshatch-Provider": "gpt-4o",
    "Crosshatch-Replace": JSON.stringify(contextQuery)
  },
  body: JSON.stringify({
    messages: [{
      role: "user",
      content: "Based on their travel history {recent_travel}, where might they like to stay in Paris?"
    }]
  })
});
```
The gateway may provide all the features of enterprise AI gateways:
Logging
Caching
Rate limits
Retries
Support for all top model providers
But adds critical new capabilities:
Runtime context hydration
User-governed sharing policies
Fine-grained access controls
Automatic context synchronization
The shift to personalized AI gateways
Crosshatch is the Internet's first personalized AI gateway.
We created Crosshatch because we believe that every application on the internet should adapt to us based on our complete context.
The modern data stack, which powered the last generation of personalized applications, has become bloated, unwieldy and expensive while failing to deliver the promise of adaptive applications.
Personalized AI gateways align with the new patterns of MCP – enabling users to port their context into third party applications – but do so in a way that's privacy compliant and easy for consumers to understand.
Want to see how it works?
Try our playground
Check out our docs
Create a developer account
Join us in building an internet that adapts to each user.
Vibe personalization
Algorithms have cooked our personal style. Personal AI without complete context is about to do the same.
Scroll to read
5/11/2025
Last week OpenAI’s Sam Altman had lunch with the Financial Times
Altman hints at his ultimate goal … when a subscription to ChatGPT becomes a personal AI, through which users log into other services. “You could just take your AI, which is gonna get to know you better over the course of your life, have your data in and be more personalised and you could use it anywhere. That would be a very cool platform to offer.”
We’re very excited for this future – we’re betting it rises as a Personalized AI Gateway – one where apps can adapt to our complete context using any private AI.
Importantly, we believe personal AI runs on all our context, not just what we share with a single (chat) application.
Information from a single application can work fine in the immediate context of our engagement, but it dulls quickly as we change, or distracts when applied to outside contexts.
We say personal AI without complete context induces a vibe personalization – committing a worse form of Baudrillard’s Perfect Crime, creating a new reality out of an incomplete picture of who we are.
As we interact with vibe personalized services, we gradually lose access to what our preferences might have been without this influence. We all become West Village Girls, gravitating toward the things a single model provider induces (or "fracks") us to engage with.
Complete context doesn’t save us from this Perfect Crime, but interoperability across models and context sources leaves more opportunity for the world’s mystery – and the real – to endure.
# Docs
---
title: Account Management
order: 4
---
## Account Management
The Account Management section in the Crosshatch Profile allows users to update their personal information and manage account settings.
## Profile Information
Users can manage basic profile details including:
- Name
- Email address
- Profile picture
## Personal Preferences (Coming Soon)
In an upcoming feature, users will be able to:
- Directly specify preferences (shoe size, dietary restrictions, family status, etc.)
- Share these preferences with connected app partners
- Enhance personalization alongside insights from their linked data sources
This feature will complement the data from connected sources, allowing users to explicitly share additional information that might not be captured in their existing data connections.
## Account Settings
This section provides users with options to:
- Manage communication settings
- Request their data
- Request account deletion
When a user requests account deletion, all their data connections and app partner access permissions are revoked, and their personal data is deleted according to Crosshatch's data retention policy.
---
title: App Connections
order: 3
---
## App Connections
The App Connections section in the Crosshatch Profile shows users which app partners have connected to their account and lets them control access permissions.
## What Users Can See and Do
In this section, users can:
- View all app partners connected to their account
- See which data sources each app partner can access
- Enable or disable specific data source access for each app partner
- Disconnect app partners entirely
When access permissions are changed or an app partner is disconnected, the changes take effect immediately.
## How It Works
When users interact with an app partner through Crosshatch:
1. The app partner requests access to specific data sources
2. Users grant permission through the Link flow
3. The app partner can then query the user's data through Crosshatch's AI layer
4. Users can modify or revoke these permissions at any time in their Profile
This design ensures that users always maintain control over their data while allowing app partners to deliver personalized experiences.
---
title: Data Connections
order: 2
---
## Data Connections
The Data Connections section in the Crosshatch Profile is where users connect, monitor, and manage their various data sources. This central hub gives users full visibility and control over what data they're sharing through Crosshatch.
## Connecting Data Sources
Users can connect a variety of data sources to their Crosshatch account, including:
- Email (Gmail)
- Calendar (Google Calendar)
- Financial (Plaid)
- Fitness (Fitbit, Strava, Oura)
- And more
Each connection follows OAuth flows to ensure secure access without storing credentials directly in Crosshatch.
## Data Collection Scope
Currently, Crosshatch collects the following data from connected sources:
### Gmail
- Email confirmation messages (bookings, tickets, receipts, flights, reservations, orders)
- Purchase receipts from common retailers
- Data includes recipients, subject, content, and timestamps
### Google Calendar
- Calendar events including title, time, location, and attendees
### Other Sources
- Each data source has its own collection scope, covering only the information relevant to that source
## Managing Connections
For each connected data source, users can:
### View Connection Status
- See when data was last synced
- View basic connection information
### Delete Connections
- Permanently remove a data source
- All associated data is deleted according to Crosshatch's data retention policy
- Applications lose access to that specific data source
## Upcoming Features
The Profile is continually evolving with new privacy controls:
- Fine-grained access control. Users will soon be able to toggle between
"Always" and "While Using" access modes for applications
- Detailed data selection. Future updates will allow users to choose
which specific data elements within a source are made available
- Sync visibility. Users (and developers) will be able to see when data
was last synced.
The Data Connections interface is designed with privacy as a priority, giving users confidence that they understand and control their personal data ecosystem within Crosshatch.
---
title: Profile Overview
description: How users manage their data connections and application permissions in Crosshatch
order: 1
---
## Profile
The Crosshatch Profile is where users manage their data connections and control which applications can access their information. This is a critical part of Crosshatch's privacy-first architecture - users always maintain visibility and control over their data.
## Relationship to Link
While Link provides a focused experience for connecting to a specific application, the Profile gives users a comprehensive view of all their connections and app partners.
- Link is application-specific, providing onboarding for a single app
partner
- Profile is user-centric, providing management across all connected app
partners
Users who have connected through Link can always access their Profile at [profile.crosshatch.io](https://profile.crosshatch.io) to manage their data and permissions independently.
## Profile Functionality
After connecting to your application through Link, users can access their Profile at [profile.crosshatch.io](https://profile.crosshatch.io) to:
- Connect and manage data sources
- Control which applications can access specific data
- Update account settings and preferences
## Key Features
### Data Connections
Users can manage all their connected data sources:
- View all currently connected data sources (Gmail, Google Calendar, Plaid, etc.)
- Add new data connections
- Monitor sync status for each connection
- Delete data connections when no longer needed
### Application Permissions
Users have granular control over how applications access their data:
- See all applications connected to their Crosshatch account
- Toggle which data sources each application can access
- Set permissions at the individual data source level
- Disconnect applications entirely
### Account Management
Users can manage their Crosshatch account:
- Update profile information
- Configure notification preferences
- Request account deletion
## Handling User-Initiated Changes
Since users can modify their data connections and permissions at any time through their Profile, you should be aware of how these changes affect your application:
### Data Source Disconnection
When a user disconnects a data source in their Profile:
- The disconnected data source will simply not be available in query results
- No specific error is returned - the data will just not appear in responses
- Your application should gracefully handle this absence of expected data
### App Access Revocation
If a user completely disconnects your application:
- Refresh tokens will no longer grant new access tokens
- Your application should detect authentication failures and prompt the user to reconnect through Link (see the sketch below)
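Below is a minimal sketch of that reconnect flow. It assumes the auth failure surfaces as a 401/403 response and uses a hypothetical `makeRequest(token)` helper that returns a `fetch` Response; adapt it to however your application issues Crosshatch requests (see the Link Web SDK docs for `link()`).
```javascript
import { link } from "@crosshatch/link";

// Hypothetical wrapper: retry a Crosshatch request once after sending the
// user back through Link if the current token no longer grants access
// (for example, because the user disconnected your app in their Profile).
async function fetchWithReconnect(makeRequest, getToken) {
  let response = await makeRequest(getToken());
  if (response.status === 401 || response.status === 403) {
    // Re-run Link so the user can reconnect, then retry once
    const relinked = await link({ clientId: "YOUR_CLIENT_ID" });
    response = await makeRequest(relinked.getToken());
  }
  return response;
}
```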
### Current Limitations
The current API has some limitations to be aware of:
- No metadata is provided about which data sources a user has connected
- No sync status information is available (data typically takes 5-7 minutes to appear after initial connection; see the polling sketch below)
- No webhooks or events are sent when permissions change
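Because sync status isn't exposed yet, one workaround after a user first connects is to poll a lightweight query until results appear. The sketch below is assumption-laden: it uses the Query API shown in the Prompt Templates testing example and assumes the response contains an array under the query's name.
```javascript
// Minimal polling sketch (assumptions: query.crosshatch.io accepts the
// Crosshatch-Replace header as in the testing examples, and the response
// exposes each named query as an array of rows).
async function waitForTimelineData(userToken, attempts = 10, delayMs = 60000) {
  for (let i = 0; i < attempts; i++) {
    const response = await fetch("https://query.crosshatch.io/", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + userToken,
        "Crosshatch-Replace": JSON.stringify({
          any_events: {
            select: ["originalTimestamp", "object_name"],
            from: "personalTimeline",
            limit: 1,
          },
        }),
      },
    });
    const results = await response.json();
    if (Array.isArray(results?.any_events) && results.any_events.length > 0) {
      return results;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return null; // data never appeared within the polling window
}
```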
Future API enhancements may include:
- Data source availability metadata in responses
- Sync status information for data sources
- Webhooks for permission change events
---
title: Personalization Library
order: 2
---
# Personalization Prompt Library
This library provides ready-to-use prompt patterns for common consumer personalization use cases. Each template demonstrates how to effectively combine application data with Crosshatch personal timeline data to create powerful, personalized experiences.
## Shopping & Style Patterns
### Personal Shopping Assistant
This pattern helps recommend products based on a user's shopping behavior.
```javascript
const prompt = `
CUSTOMER INFORMATION
Recently Viewed Items: ${recentlyViewed.map((item) => item.name).join(", ")}
Past Purchases: ${pastPurchases.map((p) => p.productName).join(", ")}
Cart Abandons: ${abandonedItems.map((a) => a.productName).join(", ")}
Customer Since: ${customerSince}
Based on the customer's transaction history across retailers {purchase_history}
and our current seasonal inventory ${availableInventory.seasonalFocus},
recommend 3-5 items that:
1. Complement their past purchases
2. Align with styles they've shown interest in
3. Include items similar to those they've abandoned
4. Are available in our current inventory
5. Would be appropriate for the current season
For each recommendation, explain briefly why it's relevant to this customer.
`;
const contextQueries = {
purchase_history: {
select: [
"originalTimestamp",
"object_name",
"object_style",
"object_brand",
"order_total",
],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "SHOPPING" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 15,
},
};
```
### Wardrobe Planner
This pattern helps suggest outfits based on weather and upcoming calendar events.
```javascript
const prompt = `
USER CONTEXT
Recent Purchases: ${recentPurchases.map((p) => p.itemName).join(", ")}
Browsing Categories: ${browsingCategories.join(", ")}
Location: ${userLocation}
Current Season: ${currentSeason}
Based on the user's clothing purchases across retailers {clothing_purchases}
and upcoming calendar events {calendar_events},
create a personalized outfit guide that:
1. Focuses on items they've actually purchased
2. Suggests combinations they might not have considered
3. Takes into account the current weather and season
4. Proposes appropriate outfits for upcoming events
5. Includes accessories and layering options when relevant
Organize recommendations by event type or day, with specific outfit details.
`;
const contextQueries = {
clothing_purchases: {
select: [
"originalTimestamp",
"object_name",
"object_style",
"object_brand",
"object_category",
],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "SHOPPING" },
{ field: "object_subcategory", op: "=", value: "APPAREL" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
calendar_events: {
select: [
"object_name",
"object_start_time",
"object_end_time",
"object_location",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 10,
},
};
```
## Travel & Entertainment Patterns
### Personalized Travel Planner
This pattern helps create customized travel experiences based on a user's behavior.
```javascript
const prompt = `
TRAVEL INQUIRY
Destination Interest: ${searchedDestination}
Dates Browsed: ${browsedDates}
Price Range Viewed: ${viewedPriceRange}
Previous Bookings: ${previousBookings.map((b) => b.destination).join(", ")}
Based on this traveler's:
- Past hotel stays across travel sites {hotel_history}
- Restaurant visits {dining_history}
- Activity level from fitness apps {activity_data}
- Upcoming calendar {calendar_events}
Create a personalized travel itinerary that:
1. Suggests accommodations similar to places they've enjoyed before
2. Recommends restaurants matching their demonstrated dining preferences
3. Proposes activities appropriate for their typical activity level
4. Works around their existing calendar commitments
5. Stays within their browsed price range
Provide a day-by-day breakdown with specific recommendations and reasoning.
`;
const contextQueries = {
hotel_history: {
select: [
"originalTimestamp",
"object_name",
"object_style",
"object_city",
"order_total",
],
from: "personalTimeline",
where: [
{ field: "object_subcategory", op: "=", value: "HOTELS_ACCOMMODATIONS" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
dining_history: {
select: [
"originalTimestamp",
"object_name",
"object_style",
"object_city",
"order_total",
],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "FOOD_DRINK" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 15,
},
activity_data: {
select: [
"originalTimestamp",
"object_name",
"object_duration",
"object_distance",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "in", value: ["fitbit", "strava"] },
{ field: "event", op: "=", value: "exercised" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 20,
},
calendar_events: {
select: ["object_name", "object_start_time", "object_end_time"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 10,
},
};
```
### Music Festival Scheduler
This pattern helps prioritize performances based on a user's music preferences.
```javascript
const prompt = `
FESTIVAL DETAILS
Festival Name: ${festival.name}
Dates: ${festival.dates}
User's Saved Artists: ${savedArtists.join(", ")}
Viewed Performances: ${viewedPerformances.join(", ")}
LINEUP
${festivalLineup.map((a) => `- ${a.name}: ${a.genre} - ${a.stage} @ ${a.time}`).join("\n")}
Based on this user's music listening history across platforms {music_history},
create a personalized festival schedule that:
1. Prioritizes artists similar to their most-played tracks and genres
2. Includes their explicitly saved artists
3. Discovers new artists aligned with their taste
4. Creates a realistic schedule considering venue locations and set times
5. Includes appropriate breaks
Format as a day-by-day, hour-by-hour schedule with recommendations and alternatives.
`;
const contextQueries = {
// Note: Apple Music integration coming in Q1 2025
// Contact Crosshatch for early access
music_history: {
select: [
"originalTimestamp",
"object_name",
"object_artist",
"object_genre",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "apple-music" },
{ field: "event", op: "=", value: "played" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
};
```
## Health & Wellness Patterns
### Personalized Health Dashboard
This pattern helps users understand how their lifestyle choices affect their health metrics.
```javascript
const prompt = `
USER APP DATA
Current Goal: ${currentGoal}
Last Logged Metrics: ${lastLoggedMetrics}
App Usage Frequency: ${appUsageFrequency}
Self-Reported Concerns: ${selfReportedConcerns.join(", ")}
Based on this user's:
- Wearable activity data {activity_data}
- Sleep patterns {sleep_data}
- Food purchases {food_purchases}
Create a personalized health insights report that:
1. Shows correlations between activities and their health metrics
2. Identifies positive patterns they should continue
3. Suggests specific adjustments that would help them reach their goal
4. Notes progress toward their stated goal
5. Provides actionable recommendations prioritized by potential impact
Format as a clear, supportive analysis that empowers rather than judges.
`;
const contextQueries = {
activity_data: {
select: [
"originalTimestamp",
"object_name",
"object_duration",
"object_distance",
"object_calories",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "in", value: ["fitbit", "strava", "oura"] },
{ field: "event", op: "=", value: "exercised" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 20,
},
sleep_data: {
select: [
"originalTimestamp",
"object_duration",
"object_quality",
"object_deep_sleep",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "fitbit" },
{ field: "event", op: "=", value: "slept" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
food_purchases: {
select: ["originalTimestamp", "object_name", "object_category"],
from: "personalTimeline",
where: [
{
field: "object_category",
op: "in",
value: ["FOOD_DRINK", "GROCERIES"],
},
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
};
```
### Adaptive Workout Generator
This pattern helps create personalized workout plans based on recent performance and goals.
```javascript
const prompt = `
FITNESS APP DATA
Current Goal: ${currentGoal}
Completed Workouts: ${completedWorkouts.map((w) => w.type).join(", ")}
Typical Duration: ${typicalDuration} minutes
Preferred Workout Days: ${preferredDays.join(", ")}
Based on this user's:
- Recent workout history from connected devices {workout_history}
- Performance trends {performance_data}
- Sleep quality {sleep_data}
- Upcoming schedule {calendar_events}
Create a personalized workout plan for the next 7 days that:
1. Aligns with their current fitness goal
2. Builds on their recently completed workouts
3. Adapts intensity based on recent performance data
4. Schedules rest days based on sleep patterns
5. Fits around their calendar commitments
Provide specific exercises, sets, reps, and duration for each workout session.
`;
const contextQueries = {
workout_history: {
select: [
"originalTimestamp",
"object_name",
"object_duration",
"object_distance",
"object_calories",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "in", value: ["fitbit", "strava"] },
{ field: "event", op: "=", value: "exercised" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 15,
},
performance_data: {
select: [
"originalTimestamp",
"object_name",
"object_duration",
"object_distance",
"object_pace",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "strava" },
{ field: "event", op: "=", value: "exercised" },
{ field: "object_name", op: "contains", value: "run" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
sleep_data: {
select: ["originalTimestamp", "object_duration", "object_quality"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "fitbit" },
{ field: "event", op: "=", value: "slept" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 7,
},
calendar_events: {
select: ["object_name", "object_start_time", "object_end_time"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 20,
},
};
```
## Food & Nutrition Patterns
### Personalized Recipe Suggester
This pattern helps recommend recipes based on a user's food preferences.
```javascript
const prompt = `
RECIPE APP DATA
Saved Recipes: ${savedRecipes.map((r) => r.name).join(", ")}
Recipe Search History: ${recipeSearches.join(", ")}
Diet Filters: ${dietFilters.join(", ")}
Meal Plan Subscription: ${mealPlanType}
Based on this person's:
- Recent grocery purchases {grocery_history}
- Restaurant dining patterns {restaurant_history}
- Weekly schedule {calendar_events}
Create a personalized meal plan that:
1. Incorporates their demonstrated food preferences
2. Uses ingredients likely available based on recent purchases
3. Respects their dietary filters
4. Accounts for their schedule (quick meals on busy days)
5. Includes variety throughout the week
Provide 5 dinner recipes with ingredient lists, noting which ingredients they likely already have.
`;
const contextQueries = {
grocery_history: {
select: [
"originalTimestamp",
"object_name",
"object_category",
"order_summary",
],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "GROCERIES" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
restaurant_history: {
select: ["originalTimestamp", "object_name", "object_style", "object_city"],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "FOOD_DRINK" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 15,
},
calendar_events: {
select: ["object_name", "object_start_time", "object_end_time"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 10,
},
};
```
### Smart Party Planner
This pattern helps plan events with personalized recommendations.
```javascript
const prompt = `
EVENT APP DATA
Event Type: ${eventType}
Guest Count: ${guestCount}
Budget Range: ${budgetRange}
Previous Events Created: ${previousEvents.map((e) => e.type).join(", ")}
Based on the host's:
- Past food and beverage purchases {food_purchases}
- Restaurant preferences {restaurant_history}
- Previous events {past_events}
Create a comprehensive party plan that:
1. Suggests a menu based on demonstrated food preferences
2. Recommends beverages based on past purchases
3. Proposes entertainment aligned with previous events
4. Includes a shopping list with estimated costs
5. Provides a preparation timeline
Format as a complete event plan with specific recommendations and alternatives.
`;
const contextQueries = {
food_purchases: {
select: [
"originalTimestamp",
"object_name",
"object_category",
"order_summary",
],
from: "personalTimeline",
where: [
{
field: "object_category",
op: "in",
value: ["GROCERIES", "FOOD_DRINK"],
},
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 20,
},
restaurant_history: {
select: ["originalTimestamp", "object_name", "object_style", "order_total"],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "FOOD_DRINK" },
{ field: "object_subcategory", op: "=", value: "RESTAURANTS" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 15,
},
past_events: {
select: [
"originalTimestamp",
"object_name",
"object_description",
"object_location",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_name", op: "contains", value: "party" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
};
```
## Communication & Productivity Patterns
### Smart Home Optimizer
This pattern helps adjust home settings based on a user's patterns.
```javascript
const prompt = `
SMART HOME APP DATA
Connected Devices: ${connectedDevices.join(", ")}
Current Settings: ${currentSettings}
Automation Rules: ${automationRules.map((r) => r.name).join(", ")}
Home Location: ${homeLocation}
Based on the user's:
- Typical daily schedule {calendar_patterns}
- Device usage patterns {device_usage}
- Upcoming calendar {calendar_events}
Create a personalized home automation plan that:
1. Adjusts settings based on their demonstrated preferences
2. Optimizes for both comfort and energy efficiency
3. Accounts for their upcoming calendar events
4. Creates routines for different times of day
5. Suggests improvements to their existing automation rules
Format as a schedule of recommended settings with specific device instructions.
`;
const contextQueries = {
calendar_patterns: {
select: ["object_name", "object_start_time", "object_day_of_week"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_is_recurring", op: "=", value: true },
],
limit: 10,
},
device_usage: {
select: ["originalTimestamp", "object_name", "object_category"],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "HOME_GARDEN" },
{ field: "object_subcategory", op: "=", value: "HOME_APPLIANCES" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
calendar_events: {
select: ["object_name", "object_start_time", "object_end_time"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "event", op: "=", value: "confirmed" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 10,
},
};
```
## Entertainment & Financial Patterns
### Cross-Platform Media Recommender
This pattern helps suggest content across streaming services.
```javascript
const prompt = `
STREAMING APP DATA
Recently Watched: ${recentlyWatched.map((w) => w.title).join(", ")}
Watchlist Items: ${watchlist.map((w) => w.title).join(", ")}
Content Ratings: ${contentRatings.map((r) => r.title + ": " + r.rating).join(", ")}
Subscribed Services: ${subscribedServices.join(", ")}
Based on the user's:
- Viewing history across platforms {viewing_history}
- Music listening patterns {music_history}
- Content engagement {content_engagement}
Create personalized content recommendations that:
1. Work across all their available streaming services
2. Include a mix of content types they've engaged with
3. Suggest both trending items and hidden gems
4. Consider their viewing habits and preferences
5. Group recommendations by mood and category
Format as a curated guide with specific titles and where to find them.
`;
const contextQueries = {
viewing_history: {
select: [
"originalTimestamp",
"object_name",
"object_category",
"object_genre",
"object_creator",
],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "ARTS_ENTERTAINMENT" },
{ field: "object_subcategory", op: "in", value: ["MOVIES", "TV_VIDEO"] },
{ field: "event", op: "=", value: "confirmed" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
// Note: Apple Music integration coming in Q1 2025
// Contact Crosshatch for early access
music_history: {
select: [
"originalTimestamp",
"object_name",
"object_artist",
"object_genre",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "apple-music" },
{ field: "event", op: "=", value: "played" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 30,
},
content_engagement: {
select: ["originalTimestamp", "object_name", "object_category", "event"],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "ARTS_ENTERTAINMENT" },
{ field: "event", op: "=", value: "confirmed" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 20,
},
};
```
### Personal Financial Advisor
This pattern helps suggest financial strategies based on spending behavior.
```javascript
const prompt = `
FINANCIAL APP DATA
Account Types: ${accountTypes.join(", ")}
Recent Transactions: ${recentTransactions.length} in the last 30 days
Savings Goals: ${savingsGoals.map((g) => g.name).join(", ")}
Budget Categories: ${budgetCategories.map((c) => c.name).join(", ")}
Based on the user's:
- Spending patterns {spending_history}
- Regular expenses {recurring_expenses}
- Upcoming bills {upcoming_bills}
Create a personalized financial advisory report that:
1. Analyzes spending patterns to identify savings opportunities
2. Highlights budget categories where they're over/under spending
3. Suggests specific adjustments to help reach their goals
4. Identifies potentially unnecessary subscriptions
5. Proposes a realistic savings plan based on their actual behavior
Format as a supportive financial analysis with actionable recommendations.
`;
const contextQueries = {
spending_history: {
select: [
"originalTimestamp",
"object_name",
"object_category",
"order_total",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "plaid" },
{ field: "event", op: "=", value: "purchased" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 50,
},
recurring_expenses: {
select: [
"object_name",
"object_category",
"order_total",
"object_frequency",
],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "plaid" },
{ field: "event", op: "=", value: "purchased" },
{ field: "object_is_recurring", op: "=", value: true },
],
limit: 20,
},
upcoming_bills: {
select: ["object_name", "object_due_date", "order_total"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "plaid" },
{ field: "event", op: "=", value: "bill_due" },
{ field: "object_due_date", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_due_date", dir: "asc" }],
limit: 10,
},
};
```
## Adapting These Templates
These patterns can be customized for your specific use case:
1. **Start with what you have**: Begin with the application data you collect
2. **Complement with timeline data**: Use Crosshatch to fill gaps in understanding
3. **Focus on specific use cases**: Target personalization to solve concrete user problems
4. **Start simple**: Begin with basic personalization before attempting complex scenarios
## Best Practices
When implementing these personalization patterns:
1. **Test your queries**: Use the Query API to verify your context queries return useful data
2. **Honor user privacy**: Only query the data needed for your specific use case
3. **Plan for missing data**: Ensure your prompts work even if certain context is unavailable (see the sketch after this list)
4. **Collect feedback**: Refine your patterns based on user reactions
5. **Be transparent**: Let users know how their data is being used to personalize experiences
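For the "plan for missing data" point, one lightweight approach is to put the fallback behavior in the prompt itself, so the model degrades gracefully when a Crosshatch variable resolves to little or nothing. A minimal sketch, reusing the purchase-history query shape from the templates above:
```javascript
// Sketch only: the prompt tells the model what to do if the resolved
// {purchase_history} context is empty or sparse.
const prompt = `
Recent purchases: {purchase_history}

Recommend 3 products for this customer. If the purchase history above is
empty or very sparse, fall back to broadly popular seasonal items and note
that the picks are not yet personalized.
`;

const contextQueries = {
  purchase_history: {
    select: ["originalTimestamp", "object_name", "order_total"],
    from: "personalTimeline",
    where: [{ field: "event", op: "=", value: "purchased" }],
    orderBy: [{ field: "originalTimestamp", dir: "desc" }],
    limit: 5,
  },
};
```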
## Next Steps
- Learn more about [Prompt Templates](/guides/prompt-templates) for effective variable usage
- Explore [Context Layer Querying](/context-layer/querying) for advanced query techniques
- Review the [AI API Reference](/apis/ai) for implementation details
---
title: Prompt Templates
order: 1
---
# Prompt Templates with Runtime Variables
When building AI-powered applications with Crosshatch, you'll typically need to combine two distinct types of data in your prompts:
- **Application data**: Information your application already has access to
- **Private user data**: Personal information accessed through Crosshatch
This guide explains how to effectively combine these using prompt templates with runtime variables.
## Understanding the Two Types of Variables
### Application Variables
These are variables that your application controls and inserts into prompts before sending them to Crosshatch:
- User account information
- Business logic parameters
- Application state
- Previously computed results
- Public or reference data
You handle these variables entirely within your application using your preferred templating approach.
### Crosshatch Variables
These are placeholders for private user data that Crosshatch will resolve at runtime:
- Email contents
- Calendar events
- Financial transactions
- Personal messages
- Health and fitness data
Crosshatch replaces these variables using data retrieved from the user's personal timeline through the Context Layer.
## How Template Variables Work Together
A complete prompt typically combines both types of variables:
```
Here's what we know about the customer:
- Name: ${customerName}
- Account type: ${accountType}
- Joined: ${joinDate}
Based on their recent transactions {recent_transactions},
recommend products they might be interested in.
```
In this example:
- `${customerName}`, `${accountType}`, and `${joinDate}` are application variables you replace before sending
- `{recent_transactions}` is a Crosshatch variable that gets replaced at runtime with private user data
## Creating Effective Templates
### Step 1: Design Your Prompt Structure
Start by designing the overall structure of your prompt, considering:
- What instructions does the AI model need?
- What context will help it generate better responses?
- What information should be fixed vs. variable?
### Step 2: Identify Your Variables
Determine which parts of your prompt should be:
- Fixed content (static instructions)
- Application variables (controlled by your application)
- Crosshatch variables (private user data)
### Step 3: Implement Application-Side Templating
Use your preferred approach to insert application variables into your prompt:
```javascript
// Using JavaScript template literals
const appPrompt = `
Customer: ${customer.name}
Account: ${customer.accountType}
Region: ${customer.region}
Based on their purchase history {purchase_history},
recommend 3 products they might enjoy.
`;
```
### Step 4: Define Crosshatch Variable Queries
For each Crosshatch variable, define a query to retrieve the relevant private data:
```javascript
const contextQueries = {
purchase_history: {
select: ["originalTimestamp", "object_name", "order_total"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "purchased" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
};
```
### Step 5: Send to Crosshatch for Processing
Send your pre-processed prompt to Crosshatch along with the variable queries:
```javascript
fetch("https://ai.crosshatch.io/", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer " + userToken,
"Crosshatch-Provider": "claude-3-5-sonnet",
"Crosshatch-Replace": JSON.stringify(contextQueries),
},
body: JSON.stringify({
anthropic_version: "vertex-2023-10-16",
messages: [
{
role: "user",
content: appPrompt,
},
],
max_tokens: 1000,
}),
});
```
## Examples of Combined Templates
### Product Recommendation Engine
```javascript
// Application variables
const customer = {
name: "Alex Rivera",
tier: "Gold",
interests: ["Tech", "Home", "Outdoors"],
};
// Prompt with both types of variables
const prompt = `
Customer Profile:
Name: ${customer.name}
Tier: ${customer.tier}
Interests: ${customer.interests.join(", ")}
Based on their recent purchases {recent_purchases}
and their browsing history {browsing_history},
recommend 3-5 products they might be interested in.
`;
// Crosshatch variable queries
const contextQueries = {
recent_purchases: {
select: ["originalTimestamp", "object_name", "order_total"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "purchased" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
browsing_history: {
select: ["originalTimestamp", "object_name"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "viewed" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
};
```
### Travel Planning Assistant
```javascript
// Application variables
const preferences = {
destination: "Hawaii",
budget: "$3000-5000",
tripDuration: "7-10 days",
};
// Prompt with both types of variables
const prompt = `
Trip Planning Request:
Destination: ${preferences.destination}
Budget: ${preferences.budget}
Duration: ${preferences.tripDuration}
Based on their upcoming calendar {calendar_events}
and their past travel history {travel_history},
suggest an itinerary for ${preferences.destination} that:
1. Works around their existing commitments
2. Matches their apparent travel preferences
3. Fits within the specified budget and timeframe
`;
// Crosshatch variable queries
const contextQueries = {
calendar_events: {
select: ["object_name", "object_start_time", "object_end_time"],
from: "personalTimeline",
where: [
{ field: "channel", op: "=", value: "google-calendar" },
{ field: "object_start_time", op: ">", value: "2024-02-24T00:00:00Z" },
],
orderBy: [{ field: "object_start_time", dir: "asc" }],
limit: 10,
},
travel_history: {
select: ["originalTimestamp", "object_name", "object_city", "object_style"],
from: "personalTimeline",
where: [
{ field: "object_category", op: "=", value: "TRAVEL_TRANSPORTATION" },
{ field: "object_subcategory", op: "=", value: "HOTELS_ACCOMMODATIONS" },
],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
};
```
## Best Practices
### For Application Variables
1. **Pre-process complex data**: Format complex application data into digestible chunks before inserting into templates
2. **Use consistent formatting**: Maintain a consistent style in your templates
3. **Sanitize user inputs**: If any variables come from user input, ensure they're properly sanitized
4. **Version your templates**: Keep track of changes to your prompt templates as your application evolves
### For Crosshatch Variables
1. **Be specific in queries**: Use precise queries to retrieve only the relevant private data
2. **Test queries independently**: Use the Query API to test and refine your context queries
3. **Include appropriate limits**: Control the amount of data returned to manage token usage
4. **Use descriptive variable names**: Make your variable names self-explanatory
### General Template Guidelines
1. **Clear instructions**: Provide explicit instructions about how to use each type of data
2. **Single responsibility**: Focus each template on a specific task or scenario
3. **Fallback handling**: Consider what should happen if certain data is unavailable
4. **Test with sample data**: Validate templates with representative data before deploying to production
## Complete Example
Here's a complete example showing how everything works together:
```javascript
async function getPersonalizedRecommendations(userId) {
// 1. Get application data
const customer = await getCustomerProfile(userId);
const store = await getNearestLocation(customer.zipCode);
const currentPromotions = await getActivePromotions(store.id);
// 2. Format application data in a prompt template
const promptTemplate = `
CUSTOMER PROFILE
Name: ${customer.name}
Loyalty Tier: ${customer.tier}
Location: ${customer.city}, ${customer.state}
STORE INFORMATION
Nearest Location: ${store.name}
Current Promotions: ${currentPromotions.map((p) => p.name).join(", ")}
Based on this information, their recent purchases {recent_purchases},
and their browsing history {browsing_history}, provide:
1. Three product recommendations tailored to their preferences
2. A personalized message explaining why each recommendation suits them
3. Any relevant promotions from our current offerings that apply
`;
// 3. Define Crosshatch variable queries
const contextQueries = {
recent_purchases: {
select: [
"originalTimestamp",
"object_name",
"object_style",
"order_total",
],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "purchased" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
browsing_history: {
select: ["originalTimestamp", "object_name", "object_category"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "viewed" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 10,
},
};
// 4. Get user token
const userToken = await getUserToken(userId);
// 5. Send request to Crosshatch
const response = await fetch("https://ai.crosshatch.io/", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer " + userToken,
"Crosshatch-Provider": "claude-3-5-sonnet",
"Crosshatch-Replace": JSON.stringify(contextQueries),
},
body: JSON.stringify({
anthropic_version: "vertex-2023-10-16",
messages: [
{
role: "user",
content: promptTemplate,
},
],
max_tokens: 1000,
}),
});
return await response.json();
}
```
## Testing Your Templates
To ensure your templates work as expected:
1. **Test application variables**: Verify your application correctly processes its own variables
2. **Test Crosshatch queries**: Use the Query API to test each context query independently
3. **Test the combined template**: Check if the complete prompt produces the expected results
4. **Test edge cases**: Try scenarios with missing data or unusual values
Example Query API test:
```javascript
// Test a Crosshatch context query
const testResponse = await fetch("https://query.crosshatch.io/", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer " + userToken,
"Crosshatch-Replace": JSON.stringify({
recent_purchases: {
select: ["originalTimestamp", "object_name", "order_total"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "purchased" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
}),
},
});
// Check the results
const testResults = await testResponse.json();
console.log("Query returns:", testResults);
```
## Next Steps
- Explore the [Context Layer Querying](/context-layer/querying) documentation for more advanced query patterns
- Learn how to use the [Query API](/apis/query) to test your context queries
- Check out our upcoming [Personalization Library](/guides/personalization-library) for ready-to-use template patterns
- See the [AI API Reference](/apis/ai) for complete API details
---
title: OAuth
order: 3
---
## Crosshatch OAuth Integration
The Crosshatch OAuth integration allows you to implement a standard OAuth 2.0 flow for connecting user data. This approach is particularly useful for mobile apps, server-side applications, or any situation where implementing the Web SDK isn't ideal.
## Overview
The OAuth flow follows these steps:
1. Redirect users to Crosshatch's authorization URL
2. Users authenticate and connect data sources
3. Users are redirected back to your app with an authorization code
4. Your app exchanges the code for access and refresh tokens
5. Your app uses these tokens to access the Crosshatch API
## Implementation steps
### 1. Initiate authentication flow
To begin the authentication flow, redirect your users to the Crosshatch authorization endpoint:
```bash
CLIENT_ID="your_client_id"
REDIRECT_URI="https://your-app.com/callback"
# User visits this URL to start the authentication process
https://link.crosshatch.io/api/auth?client_id=$CLIENT_ID&redirect_uri=$REDIRECT_URI
```
The parameters are:
- `client_id`: Your Crosshatch Client ID (found in the Details section of Application Management)
- `redirect_uri`: Where users will be sent after authorization (must be registered in your Crosshatch dashboard)
### 2. Handle the redirect
After the user completes the authorization process, they'll be redirected to your `redirect_uri` with an authorization code:
```
https://your-app.com/callback?code=AUTHORIZATION_CODE
```
Your application should be set up to extract this code from the URL parameters.
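For example, in a plain Node.js handler (a sketch under assumptions: `exchangeCodeForTokens` is a hypothetical helper wrapping the token exchange in step 3, and your server is reachable at the registered redirect URI):
```javascript
// Minimal Node.js sketch of the callback handler at your redirect_uri.
// `exchangeCodeForTokens` is hypothetical – it should perform the exchange
// shown in step 3 below.
function handleCallback(req, res) {
  const url = new URL(req.url, "https://your-app.com");
  const code = url.searchParams.get("code");
  if (!code) {
    res.statusCode = 400;
    res.end("Missing authorization code");
    return;
  }
  exchangeCodeForTokens(code)
    .then(() => res.end("Connected with Crosshatch!"))
    .catch(() => {
      res.statusCode = 500;
      res.end("Token exchange failed");
    });
}
```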
### 3. Exchange the code for tokens
Once you have the authorization code, exchange it for access and refresh tokens:
```bash
CLIENT_ID="your_client_id"
CODE="authorization_code_from_redirect"
curl -v "https://link.crosshatch.io/api/token?client_id=$CLIENT_ID&grant_type=authorization_code&code=$CODE"
```
The response will contain the tokens:
```json
{
"accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"refreshToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expiresIn": 60,
"expiresAt": 1739232095873
}
```
### 4. Use the access token
Use the access token to make authenticated requests to the Crosshatch API:
```bash
curl -v "https://ai.crosshatch.io/chat/completions?api-version=2024-06-01" \
-H "Authorization: Bearer ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-H "Crosshatch-Provider: gpt-4o" \
-H "Crosshatch-Replace: {\"recent_activities\":{\"select\":[\"event\",\"object_name\"],\"from\":\"personalTimeline\",\"limit\":5}}" \
-d '{
"messages": [
{
"role": "system",
"content": "Consider the user'\''s recent activities {recent_activities}"
},
{
"role": "user",
"content": "Summarize what I'\''ve been doing recently."
}
],
"response_format": { "type": "json_object" }
}'
```
### 5. Refresh expired tokens
Access tokens expire after a short period (60 seconds). When this happens, use the refresh token to obtain a new access token:
```bash
CLIENT_ID="your_client_id"
REFRESH_TOKEN="refresh_token_from_step_3"
curl -v "https://link.crosshatch.io/api/token?client_id=$CLIENT_ID&grant_type=refresh_token&refresh_token=$REFRESH_TOKEN"
```
This will return a new access token with the same format as step 3.
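In practice you'll want to refresh proactively rather than waiting for a failed request. A minimal sketch, assuming you keep the token response object from step 3 (`accessToken`, `refreshToken`, `expiresAt`) and refresh a few seconds before expiry:
```javascript
// Sketch: refresh shortly before expiresAt (an epoch-millisecond timestamp,
// per the response in step 3) so requests never race the 60-second expiry.
async function getValidAccessToken(clientId, tokens) {
  if (Date.now() > tokens.expiresAt - 5000) {
    const url =
      "https://link.crosshatch.io/api/token" +
      `?client_id=${clientId}` +
      "&grant_type=refresh_token" +
      `&refresh_token=${tokens.refreshToken}`;
    const response = await fetch(url);
    Object.assign(tokens, await response.json()); // keep the caller's copy current
  }
  return tokens.accessToken;
}
```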
## OAuth endpoint reference
| Endpoint | Description |
| -------------------------------------- | ------------------------------------------------------------------------------ |
| `https://link.crosshatch.io/api/auth` | Authorization endpoint to redirect users to |
| `https://link.crosshatch.io/api/token` | Token endpoint for exchanging authorization codes and refreshing tokens |
## Best practices
- **Register redirect URIs**: Always register your redirect URIs in the Crosshatch dashboard
- **Store refresh tokens securely**: Never store refresh tokens in client-side code or localStorage
- **Implement proper token handling**: Refresh access tokens when they expire
- **Use HTTPS**: Always use HTTPS for all OAuth-related requests
- **Implement PKCE**: For mobile apps, implement PKCE (Proof Key for Code Exchange) for added security (a generic sketch follows below)
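PKCE generation itself is a generic OAuth 2.0 technique; the sketch below shows one way to create the verifier/challenge pair in Node.js. It is not Crosshatch-specific – confirm the exact parameter names the Crosshatch endpoints accept before wiring it in.
```javascript
// Generic PKCE sketch (Node.js 16+); not tied to Crosshatch parameter names.
import { randomBytes, createHash } from "node:crypto";

function createPkcePair() {
  const codeVerifier = randomBytes(32).toString("base64url");
  const codeChallenge = createHash("sha256")
    .update(codeVerifier)
    .digest("base64url");
  return { codeVerifier, codeChallenge };
}
```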
## OAuth parameters
### Authorization endpoint
| Parameter | Required | Description |
| -------------- | ----------- | ------------------------------------- |
| `client_id` | Yes | Your Crosshatch Client ID |
| `redirect_uri` | Yes | Where to redirect after authorization |
| `state` | Recommended | Random string to prevent CSRF attacks |
### Token endpoint
| Parameter | Required | Description |
| --------------- | -------- | --------------------------------------------------------------------------- |
| `client_id` | Yes | Your Crosshatch Client ID |
| `grant_type` | Yes | "authorization_code" for initial exchange or "refresh_token" for refreshing |
| `code` | Depends | Required when grant_type is "authorization_code" |
| `refresh_token` | Depends | Required when grant_type is "refresh_token" |
## Mobile implementation
For mobile apps, you should implement the OAuth flow using an embedded browser or the system browser with a custom URL scheme for the redirect:
1. Open the authorization URL in a browser
2. Handle the redirect through a custom URL scheme (iOS) or intent filter (Android)
3. Extract the authorization code from the redirect
4. Exchange it for tokens in your backend
## Next steps
- [Explore available data sources](/context-layer/data-sources)
- [Learn how to query user data](/context-layer/querying)
- [Use the AI API](/apis/ai) to create personalized experiences
---
title: Web SDK
order: 2
---
## Crosshatch Link Web SDK
The Crosshatch Link Web SDK provides a simple way to integrate Crosshatch's user authentication and data connection functionality into your web application. With just a few lines of code, you can give users the ability to securely connect their personal data.
## Installation
Install the SDK using npm or yarn:
```bash
npm install @crosshatch/link
# or
yarn add @crosshatch/link
```
## Basic implementation
### 1. Get your Client ID
Before you begin, you'll need a Client ID from the Details section in the Application Management tab of your [Crosshatch Dashboard](https://platform.crosshatch.io).
### 2. Implement the Link component
Add Crosshatch Link to your application with just a few lines of code:
```javascript
import { link } from "@crosshatch/link";
// Trigger this function when a user clicks your "Connect with Crosshatch" button
const openLink = async () => {
try {
// Launch the Link UI
const { getToken } = await link({
clientId: "YOUR_CLIENT_ID",
});
// The getToken function provides access to the authentication token
const token = getToken();
// Now you can use the token to make requests to the Crosshatch API
console.log("Ready to use token:", token.substring(0, 10) + "...");
} catch (error) {
console.error("Link error:", error);
}
};
```
### 3. Make an API request with the token
Once you have the token, you can use it to make authenticated requests to the Crosshatch API:
```javascript
// Using the token from the previous example
const makeAIRequest = async (token) => {
try {
const response = await fetch(
"https://ai.crosshatch.io/chat/completions?api-version=2024-06-01",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "application/json",
"Crosshatch-Replace": JSON.stringify({
hotel_history: {
select: ["originalTimestamp", "object_name"],
from: "personalTimeline",
limit: 10,
where: [
{
field: "object_subcategory",
op: "=",
value: "HOTELS_ACCOMMODATIONS",
},
],
},
}),
"Crosshatch-Provider": "gpt-4o",
Authorization: `Bearer ${token}`,
},
body: JSON.stringify({
messages: [
{
role: "system",
content: "Consider the users past hotel stays {hotel_history}",
},
{
role: "user",
content:
"Summarize the top 5 hotel vibes or moods the user likes and return as json key vibes and value list of vibes. Each vibe should be no more than 5 words",
},
],
response_format: { type: "json_object" },
}),
},
);
const data = await response.json();
console.log("API response:", data);
return data;
} catch (error) {
console.error("API request error:", error);
throw error;
}
};
```
## Token management
The token provided by the Link SDK is automatically refreshed in the background. Here's how to work with it:
```javascript
// The getToken function always returns a valid token
const { getToken } = await link({ clientId: "YOUR_CLIENT_ID" });
// Get the token value when you need to make an API call
const token = getToken();
// You can also listen for token updates
const { onToken } = await link({ clientId: "YOUR_CLIENT_ID" });
// This callback will be called whenever a new token is generated
const unsubscribe = onToken((token) => {
console.log("New token available:", token.substring(0, 10) + "...");
// Update your token storage or make a new API call
});
// When you no longer need to listen for token updates
unsubscribe();
```
## Configuration options
The `link()` function accepts these configuration options:
```javascript
const { getToken } = await link({
// Required
clientId: "YOUR_CLIENT_ID",
// Optional (defaults shown)
type: "inline", // 'inline', 'popup', or 'redirect'
linkDomain: "link.crosshatch.io", // Typically you won't need to change this
});
```
## Security considerations
- Tokens are short-lived (expiring in 5 minutes) to enhance security
- The SDK handles token refresh automatically in the background
- Always call `getToken()` right before making an API request to get the latest token from the cache
- Never manually store tokens in localStorage or other persistent storage
## Complete example
Here's a complete example using React:
```jsx
import React, { useState } from "react";
import { link } from "@crosshatch/link";
function CrosshatchIntegration() {
const [isConnected, setIsConnected] = useState(false);
const [response, setResponse] = useState(null);
const [loading, setLoading] = useState(false);
const [error, setError] = useState(null);
// Store the link controller reference
const [linkController, setLinkController] = useState(null);
const connectWithCrosshatch = async () => {
setLoading(true);
setError(null);
try {
// Launch the Crosshatch Link UI
const controller = await link({
clientId: "YOUR_CLIENT_ID",
});
setLinkController(controller);
setIsConnected(true);
} catch (err) {
setError(err.message || "Failed to connect with Crosshatch");
} finally {
setLoading(false);
}
};
const makeApiRequest = async () => {
if (!linkController) return;
setLoading(true);
setError(null);
try {
// Get a fresh token
const token = linkController.getToken();
const response = await fetch(
"https://ai.crosshatch.io/chat/completions?api-version=2024-06-01",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "application/json",
"Crosshatch-Replace": JSON.stringify({
recent_activities: {
select: [
"originalTimestamp",
"object_name",
"object_category",
"event",
],
from: "personalTimeline",
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
}),
"Crosshatch-Provider": "gpt-4o",
Authorization: `Bearer ${token}`,
},
body: JSON.stringify({
messages: [
{
role: "system",
content:
"Consider the user's recent activities {recent_activities}",
},
{
role: "user",
content:
"Summarize my recent activities and suggest something I might enjoy doing next.",
},
],
response_format: { type: "json_object" },
}),
},
);
const data = await response.json();
setResponse(data);
} catch (err) {
setError(err.message || "API request failed");
} finally {
setLoading(false);
}
};
  return (
    <div>
      {!isConnected ? (
        <button onClick={connectWithCrosshatch} disabled={loading}>
          {loading ? "Connecting..." : "Connect with Crosshatch"}
        </button>
      ) : (
        <div>
          <p>Connected to Crosshatch!</p>
          <button onClick={makeApiRequest} disabled={loading}>
            {loading ? "Loading..." : "Get Personalized Recommendation"}
          </button>
        </div>
      )}
      {error && <p>Error: {error}</p>}
      {response && <pre>{JSON.stringify(response, null, 2)}</pre>}
    </div>
  );
}
export default CrosshatchIntegration;
```
## Next steps
- [Explore available data sources](/context-layer/data-sources)
- [Learn how to query user data](/context-layer/querying)
- [Use the AI API](/apis/ai) to create personalized experiences
---
title: Overview
order: 1
---
## Crosshatch Link
Crosshatch Link enables users to securely connect their personal data to your application. It handles the entire authentication and data connection process through a user-friendly interface, allowing you to focus on building great experiences without managing complex data integrations.
## What is Crosshatch Link?
Crosshatch Link is a drop-in component that:
- Presents a sleek, secure UI for users to connect their data sources
- Handles authentication with third-party services
- Manages ongoing data synchronization
- Provides your application with secure access tokens
- Ensures user privacy through explicit consent
## How it works
1. **User initiates the process**: The user clicks "Connect with Crosshatch" in your application
2. **Link UI appears**: A secure overlay or popup presents authentication and data source options
3. **User connects sources**: The user authenticates and selects which data sources to share
4. **Link generates tokens**: Your application receives secure tokens to access permitted data
5. **Your app delivers personalized experiences**: Use the Crosshatch AI API to create personalized experiences
## Integration options
Crosshatch Link offers flexible integration options to suit different application needs:
| Integration Method | Best For | Description |
| ------------------------------ | --------------------------------- | ----------------------------------------------- |
| [Web SDK](/link/link) | Web applications | JavaScript SDK for easy integration in web apps |
| [OAuth Flow](/link/link-oauth) | Mobile apps, backend-driven flows | Standard OAuth 2.0 authorization flow |
## Key features
- **Simple integration**: Implement with just a few lines of code
- **Pre-built UI**: Professional, responsive interface with no UI work required
- **Multi-source support**: Connect to Google, Plaid, fitness trackers, and more from a single interface
- **Secure by design**: User data remains private and protected
- **Fine-grained access control**: Users can share specific data types (e.g., only recent purchases from Gmail)
- **Dynamic permissions**: Request only the data your application needs
- **Configurable data sources**: Control which data sources users can connect to your application via the Developer Portal
- **Token management**: Automatic handling of authentication tokens
## Example implementation
Here's how simple it is to implement Crosshatch Link:
```javascript
import { link } from "@crosshatch/link";
// When a user clicks "Connect with Crosshatch"
const startLink = async () => {
try {
// Launch the Crosshatch Link UI
const { getToken } = await link({
clientId: "YOUR_CLIENT_ID",
});
// Get the authentication token to use with Crosshatch API
const token = getToken();
// Use the token to make personalized API requests
// See the Web SDK documentation for complete examples
} catch (error) {
console.error("Link error:", error);
}
};
```
## Configuring data sources
In the [Developer Portal](https://platform.crosshatch.io), you can customize which data sources your users will be able to connect to your application:
1. Navigate to **App Management**
2. Select your app from the list and click **Details**
3. Click **App Settings**
4. In the **Data Access** section, toggle on/off the specific data sources you want to enable
5. Click **Update** to save your changes
This configuration allows you to request only the relevant data for your application's personalization needs, enhancing both user privacy and the user experience.
## Next steps
- [Implement the Web SDK](/link/link) in your application
- [Use OAuth](/link/link-oauth) for mobile or backend applications
- [Explore available data sources](/context-layer/data-sources)
- [Learn about the AI API](/apis/ai) for using connected data
---
title: Data Sources
order: 5
---
Crosshatch integrates with popular platforms to provide a unified view of user activities across different applications. This page outlines the available data sources, the types of events they provide, and how frequently they're updated.
## Core data sources
These platforms are directly integrated with Crosshatch and provide a range of user activity data.
| Source | Key Events | Data Type | Update Frequency | Description |
| ----------------- | ------------------------ | --------- | ---------------- | ------------------------------------- |
| `plaid` | `purchased` | `track` | 15 minutes | Financial transactions and merchants |
| `google-mail` | `purchased`, `confirmed` | `derived` | 5 minutes | Activities detected in email content |
| `google-calendar` | `confirmed` | `track` | 3 hours | Scheduled events and appointments |
| `fitbit` | `exercised`, `slept` | `track` | 12 hours | Fitness activities and sleep tracking |
| `strava` | `exercised` | `track` | 12 hours | Detailed exercise and route data |
## Email-derived activities
The `google-mail` source discovers activities from email content, providing context from many services without requiring direct integration. Here are common sources found in emails:
| Source Type | Events | Example Data |
| ------------- | ----------- | ------------------------------------------------- |
| Retail | `purchased` | Amazon, Walmart, Instacart, Target, Best Buy |
| Restaurants | `confirmed` | OpenTable, Resy, Tock, restaurant confirmations |
| Travel | `confirmed` | Hotel bookings, flight confirmations, car rentals |
| Entertainment | `confirmed` | Event tickets, reservations, bookings |
| Services | `confirmed` | Appointments, subscriptions, memberships |
All email-derived activities are updated every 5 minutes and marked with `type: "derived"` to indicate they were extracted from unstructured content.
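For example, email-derived purchases can be queried with the same query shape used throughout these docs (a sketch; adjust fields and limits to your use case):
```javascript
const contextQueries = {
  email_purchases: {
    select: ["originalTimestamp", "object_name", "object_category", "order_summary"],
    from: "personalTimeline",
    where: [
      { field: "channel", op: "=", value: "google-mail" },
      { field: "event", op: "=", value: "purchased" },
    ],
    orderBy: [{ field: "originalTimestamp", dir: "desc" }],
    limit: 10,
  },
};
```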
## Event types
Each activity in the Context Layer is represented by an event type that describes the user action.
### `purchased`
Represents a completed transaction or purchase made by the user, typically from financial transactions.
Available in: Plaid, Google Mail (derived)
Key data points:
- Merchant name and location
- Transaction amount and currency
- Purchase date and time
- Categories and item details (when available)
### `confirmed`
Represents a confirmation of an upcoming event, reservation, booking, or purchase.
Available in: Google Calendar, Google Mail (derived)
Key data points:
- Venue or service name
- Reservation/event date and time
- Location details
- Confirmation details and reference numbers
Note: Purchases can appear in both `purchased` events (from financial
transactions) and `confirmed` events (from purchase confirmation emails). The
key difference is that `purchased` events represent completed transactions with
verified amounts, while purchase `confirmed` events represent order
confirmations that may occur before the financial transaction.
### `exercised`
Represents a physical activity or workout.
Available in: Fitbit, Strava
Key data points:
- Activity type (run, cycle, swim, etc.)
- Duration and distance
- Start and end time
- Location (for outdoor activities)
- Performance metrics (pace, calories, etc.)
### `slept`
Represents sleep activity and patterns.
Available in: Fitbit
Key data points:
- Sleep start and end time
- Duration and quality measures
- Sleep stages when available
## Data freshness
Different data sources update at different frequencies:
- High frequency (5-15 min): Email-derived activities (5 min) and
financial transactions (15 min)
- Medium frequency (3 hours): Calendar events
- Lower frequency (12 hours): Fitness and activity data
These update cadences balance data freshness with system performance.
## Upcoming integrations
Crosshatch is continuously expanding its data source integrations:
- Q4 2024: Whoop, Oura Ring, Reddit, YouTube
- Q1 2025: Google Maps*, Google Chrome*, Apple Music
- Backlog: Spotify, Pinterest, Eight Sleep, Apple Health
\*EU users only
## Special considerations
- Receipt details: Itemized receipts may be available in the
`order_summary` field for some purchases detected in emails
- Data variability: The level of detail available varies by source and
specific content
- User permissions: All data access requires explicit user consent
through Crosshatch Link
- Historical data: When users first connect, Crosshatch syncs
approximately 6 months of historical data where available
## Next steps
- See [data mappings](/context-layer/mappings) to understand how sources are transformed
- Learn how to [query](/context-layer/querying) this data effectively
- Explore the [schema](/context-layer/schema) for field definitions
---
title: Mappings
order: 4
---
Crosshatch transforms data from diverse sources into a unified `personalTimeline` schema. This page shows how different data sources are represented in the Context Layer through practical examples.
## How data is transformed
When user data enters the Context Layer, it undergoes a multi-step transformation process:
1. Extract: Raw data is retrieved from the source application
2. Derive: Structured events are generated from the raw data
3. Enrich: Additional metadata is added to provide deeper context
4. Normalize: The enriched events are transformed to fit the
personalTimeline schema
This process creates a consistent representation of user activities regardless of the original data source.
## Data sources and examples
### Google Mail (`channel = 'google-mail'`)
Google Mail data is processed to extract information about purchases, bookings, and other activities. The system parses email content to identify key details like merchant names, prices, dates, and locations.
Key events: `purchased`, `confirmed`
Primary categories: Travel, Food/Drink, Shopping
#### Hotel confirmation example
```json
{
"userid": "usr_1234567890",
"channel": "google-mail",
"type": "derived",
"event": "confirmed",
"originalTimestamp": "2023-05-14T14:50:00Z",
"message_id": "msg_abcdef123456",
"object_name": "Four Seasons Hotel Miami",
"object_category": "TRAVEL_TRANSPORTATION",
"object_subcategory": "HOTELS_ACCOMMODATIONS",
"object_style": "Luxury, Beachfront, Resort",
"object_address": "1435 Brickell Ave",
"object_city": "Miami",
"object_region": "FL",
"object_country": "US",
"object_start_time": "2023-06-10T15:00:00Z",
"object_end_time": "2023-06-15T11:00:00Z",
"order_total": "2750.00",
"order_currency": "USD",
"object_unstructured": "Your reservation for 5 nights has been confirmed..."
}
```
#### Online purchase example
```json
{
"userid": "usr_1234567890",
"channel": "google-mail",
"type": "derived",
"event": "purchased",
"originalTimestamp": "2023-04-22T08:15:00Z",
"message_id": "msg_ghijkl789012",
"object_name": "Amazon",
"object_category": "SHOPPING",
"object_subcategory": "APPAREL",
"order_total": "89.97",
"order_currency": "USD",
"order_summary": "1x Wireless Earbuds ($49.99), 1x Phone Case ($19.99), 1x USB Cable ($19.99)",
"object_unstructured": "Thank you for your order. Your package will arrive on Thursday, April 25..."
}
```
Special features:
- Itemized receipt extraction for selected merchants (Amazon, Walmart, Target, Instacart) in `order_summary`
- Location extraction from unstructured text
- Order confirmation details including dates, times, and prices
- `object_unstructured` contains the raw text of the email body, enabling you to extract or transform additional information using the Crosshatch API
### Google Calendar (`channel = 'google-calendar'`)
Calendar events show how users plan to spend their time. The system preserves event details, locations, and scheduling information.
Key events: `confirmed`
Primary categories: Business, Travel, Social
#### Meeting example
```json
{
"userid": "usr_1234567890",
"channel": "google-calendar",
"type": "track",
"event": "confirmed",
"originalTimestamp": "2023-05-01T09:00:00Z",
"message_id": "msg_calendar123456",
"object_name": "Team planning meeting",
"object_category": "BUSINESS_INDUSTRIAL",
"object_subcategory": "BUSINESS_SERVICES",
"object_address": "123 Tech Blvd, Conference Room A",
"object_city": "San Francisco",
"object_region": "CA",
"object_country": "US",
"object_start_time": "2023-05-05T09:00:00Z",
"object_end_time": "2023-05-05T10:00:00Z",
"object_duration": 60,
"object_unstructured": "Quarterly planning meeting with Design and Engineering teams. Please bring your Q3 roadmap ideas."
}
```
#### Travel event example
```json
{
"userid": "usr_1234567890",
"channel": "google-calendar",
"type": "track",
"event": "confirmed",
"originalTimestamp": "2023-06-02T14:30:00Z",
"message_id": "msg_calendar789012",
"object_name": "Flight to Miami",
"object_category": "TRAVEL_TRANSPORTATION",
"object_subcategory": "TRAVEL_DESTINATIONS",
"object_start_time": "2023-06-10T08:30:00Z",
"object_end_time": "2023-06-10T11:45:00Z",
"object_duration": 195,
"object_unstructured": "Delta Airlines DL1234, Seat 12A, Confirmation #ABC123"
}
```
Special features:
- Event duration calculation
- Location parsing from event details
- Event categorization based on title and description
### Plaid (`channel = 'plaid'`)
Financial transactions show spending patterns and merchant interactions. Plaid data includes rich details about purchase locations, merchants, and amounts.
Key events: `purchased`
Primary categories: Food/Drink, Shopping, Travel, Services
#### Restaurant transaction example
```json
{
"userid": "usr_1234567890",
"channel": "plaid",
"type": "track",
"event": "purchased",
"originalTimestamp": "2023-05-10T12:30:00Z",
"message_id": "msg_plaid123456",
"object_name": "Pura Vida 1940 Alton Rd Miami Beach FL",
"object_category": "FOOD_DRINK",
"object_subcategory": "RESTAURANTS",
"object_style": "Casual, Healthy, Beachside",
"object_address": "1940 Alton Rd",
"object_city": "Miami Beach",
"object_region": "FL",
"object_zip": "33139",
"object_country": "US",
"order_total": "24.50",
"order_currency": "USD"
}
```
#### Online purchase example
```json
{
"userid": "usr_1234567890",
"channel": "plaid",
"type": "track",
"event": "purchased",
"originalTimestamp": "2023-05-15T09:15:00Z",
"message_id": "msg_plaid789012",
"object_name": "Netflix",
"object_category": "ARTS_ENTERTAINMENT",
"object_subcategory": "TV_VIDEO",
"order_total": "15.99",
"order_currency": "USD"
}
```
Special features:
- Location details for in-person transactions
- Merchant categorization
- Style descriptions to help understand the types of places people spend money
### Fitness Services
Fitness apps provide detailed activity tracking, showing exercise patterns and physical activities.
#### Fitbit (`channel = 'fitbit'`)
Key events: `exercised`
Primary categories: Sports, Fitness
```json
{
"userid": "usr_1234567890",
"channel": "fitbit",
"type": "track",
"event": "exercised",
"originalTimestamp": "2023-05-20T17:30:00Z",
"message_id": "msg_fitbit123456",
"object_name": "Evening run",
"object_type": "Run",
"object_category": "SPORTS",
"object_subcategory": "OUTDOOR_SPORTS",
"object_start_time": "2023-05-20T17:30:00Z",
"object_end_time": "2023-05-20T18:15:00Z",
"object_duration": 45,
"object_unstructured": "Distance: 4.8 km, Calories: 320"
}
```
#### Strava (`channel = 'strava'`)
Key events: `exercised`
Primary categories: Sports, Fitness
```json
{
"userid": "usr_1234567890",
"channel": "strava",
"type": "track",
"event": "exercised",
"originalTimestamp": "2023-05-15T06:30:00Z",
"message_id": "msg_strava123456",
"object_name": "Morning run",
"object_type": "Run",
"object_category": "SPORTS",
"object_subcategory": "OUTDOOR_SPORTS",
"object_city": "Miami",
"object_region": "FL",
"object_country": "US",
"object_start_time": "2023-05-15T06:30:00Z",
"object_end_time": "2023-05-15T07:15:00Z",
"object_duration": 45,
"object_unstructured": "Distance: 5.2 km, Elevation gain: 35m, Average speed: 11.6 km/h"
}
```
Special features:
- Activity metrics (distance, speed, elevation)
- Duration calculations
- Location tracking for outdoor activities
## Understanding data enrichment
Many fields in the examples above are enhanced beyond what's directly available in the source data:
- Categorization: Objects are assigned categories and subcategories from
our standard taxonomy
- Style descriptions: AI-generated descriptions of places, businesses,
and objects to better understand the types of places people interact with
- Location parsing: Structured location data extracted from text
- Receipt itemization: Items and prices extracted from receipts where
available
- Contextual metadata: Additional context extracted from unstructured
content
## Data provenance
The `type` field in each event indicates how the data was sourced:
- track: Directly observed in a source system (e.g., a Plaid transaction
or Strava activity)
- derived: Extracted from unstructured data (e.g., parsing an email to
detect a hotel booking)
This distinction helps maintain transparency about where data originated and how it was processed.
## Next steps
- Learn how to [query this schema](/context-layer/querying) with examples
- See what [data sources](/context-layer/data) are available
- Explore the [API documentation](/apis/ai) for using this data in AI calls
---
title: Overview
order: 1
---
The Context Layer is the foundation of how Crosshatch connects, organizes, and enables retrieval of user data from various applications. It provides a unified way to access permissioned user context through a consistent schema and query language.
## Core concepts
### Timeline events
The Context Layer represents user activities as a timeline of events. Each event follows a simple structure:
- **Actor**: The user who performed the action
- **Action**: What the user did (e.g., "purchased", "liked", "confirmed")
- **Object**: What the action was performed on (e.g., a product, song, or reservation)
- **Metadata**: Additional context about the action and object
```json
// Example event representing a purchase
{
"userid": "usr_123456789",
"event": "purchased",
"object_name": "Four Seasons Hotel Miami FL",
"object_category": "TRAVEL_TRANSPORTATION",
"object_subcategory": "HOTELS_ACCOMMODATIONS",
"object_style": "Luxury, Modern, Beachfront",
"order_total": "1053.06",
"originalTimestamp": "2023-02-05T16:10:00Z"
}
```
### Unified schema
All data sources in Crosshatch are mapped to a common schema called `personalTimeline`. This allows you to:
- Query across different data sources with a single syntax
- Retrieve consistent field names regardless of the original source
- Apply filters that work the same way across all data types
### Privacy-first access
The Context Layer ensures that:
- Users explicitly consent to data access
- Developers only receive the minimum data needed for their use case
- Sensitive information is automatically filtered out
- All data access is logged and auditable
## Key features
- Unified Schema: All data sources follow a standardized schema, so you
don't have to handle different formats—querying is always consistent and
straightforward.
- Event-Based Model: User interactions are represented as structured
events, capturing detailed actions and context from various sources.
- Flexible Querying: A SQL-style JSON interface allows for powerful and
expressive queries, making it easy to retrieve exactly the context you need.
- Privacy-Focused: Sensitive data is automatically filtered, ensuring
that users' privacy is always respected.
- Contextually Enriched: Objects are preloaded with relevant metadata,
the foundation for rich context.
## Quick example
Here's a simple query to get a user's recent food and drink purchases:
```json
{
"select": ["originalTimestamp", "object_name", "order_total"],
"from": "personalTimeline",
"where": [
{ "field": "object_category", "op": "=", "value": "FOOD_DRINK" },
{ "field": "event", "op": "=", "value": "purchased" }
],
"orderBy": [{ "field": "originalTimestamp", "dir": "desc" }],
"limit": 5
}
```
This query would return the 5 most recent food and drink purchases made by the user, including when they occurred, the establishment name, and how much was spent.
## Next steps
- Learn about the [Schema](/context-layer/schema) structure and available fields
- See how to write effective [Queries](/context-layer/querying) to get the exact context you need
- Explore available [Data Sources](/context-layer/data-sources) that can be connected
- Understand how data is [Mapped](/context-layer/mappings) into the timeline format
---
title: Querying
order: 3
---
The Context Layer uses a SQL-inspired JSON query language to retrieve relevant user data. This approach provides a flexible and powerful way to filter, order, and limit the context used in your AI calls. It also creates an efficient interface for AI to automatically define Crosshatch calls.
## Query structure
Every query has this basic structure:
```json
{
"select": ["field1", "field2", "field3"], // Fields to retrieve
"from": "personalTimeline", // Data source (always personalTimeline)
"where": [
// Optional filtering conditions
{ "field": "fieldName", "op": "=", "value": "fieldValue" }
],
"orderBy": [
// Optional sorting
{ "field": "fieldName", "dir": "desc" }
],
"limit": 10 // Optional result count limit
}
```
## Core components
| Component | Description | Required | Example |
| --------- | --------------------------------------- | -------- | ------------------------------------------------------- |
| `select` | Array of field names to retrieve | Yes | `["originalTimestamp", "object_name", "order_total"]` |
| `from` | Data source (always "personalTimeline") | Yes | `"personalTimeline"` |
| `where` | Array of filter conditions | No | `[{"field": "event", "op": "=", "value": "purchased"}]` |
| `orderBy` | Sorting instructions | No | `[{"field": "originalTimestamp", "dir": "desc"}]` |
| `limit` | Maximum number of results to return | No | `10` |
### Where conditions
Filter events with conditions:
```json
"where": [
{"field": "event", "op": "=", "value": "purchased"},
{"field": "object_category", "op": "=", "value": "FOOD_DRINK"}
]
```
Supported operators:
- `=`: Equal to
- `!=` or `<>`: Not equal to
- `>`: Greater than
- `>=`: Greater than or equal to
- `<`: Less than
- `<=`: Less than or equal to
- `contains`: String contains
- `in`: Value is in list
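As a sketch, several operators can be combined in one `where` array. The list-valued form shown for `in` is an assumption based on the operator table above:

```typescript
// Illustrative filter: purchases in either of two categories whose
// name mentions "Coffee". The array value for `in` is an assumption.
const where = [
  { field: "event", op: "=", value: "purchased" },
  { field: "object_category", op: "in", value: ["FOOD_DRINK", "SHOPPING"] },
  { field: "object_name", op: "contains", value: "Coffee" },
];
```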
### Sorting results
Order by one or more fields:
```json
"orderBy": [
{"field": "originalTimestamp", "dir": "desc"},
{"field": "order_total", "dir": "asc"}
]
```
Direction options:
- `asc`: Ascending (A-Z, 0-9)
- `desc`: Descending (Z-A, 9-0)
## Common query patterns
### Recent hotel stays
Retrieve the user's recent hotel experiences:
```json
{
"select": [
"originalTimestamp",
"object_name",
"object_style",
"object_city",
"order_total"
],
"from": "personalTimeline",
"where": [
{
"field": "object_subcategory",
"op": "=",
"value": "HOTELS_ACCOMMODATIONS"
},
{ "field": "event", "op": "=", "value": "purchased" }
],
"orderBy": [{ "field": "originalTimestamp", "dir": "desc" }],
"limit": 10
}
```
This returns the last 10 hotel stays with dates, names, styles, locations, and amounts spent.
### Food preferences
Understand the user's dining patterns:
```json
{
"select": [
"originalTimestamp",
"object_name",
"object_style",
"object_city",
"order_total",
"order_summary"
],
"from": "personalTimeline",
"where": [
{ "field": "object_category", "op": "=", "value": "FOOD_DRINK" },
{ "field": "event", "op": "=", "value": "purchased" }
],
"orderBy": [{ "field": "originalTimestamp", "dir": "desc" }],
"limit": 20
}
```
This retrieves the 20 most recent food and drink purchases with details about the establishments and what was ordered.
### Upcoming calendar events
Get the user's next calendar appointments:
```json
{
"select": [
"originalTimestamp",
"object_name",
"object_start_time",
"object_end_time",
"object_city"
],
"from": "personalTimeline",
"where": [
{ "field": "channel", "op": "=", "value": "google-calendar" },
{ "field": "event", "op": "=", "value": "confirmed" },
{ "field": "object_start_time", "op": ">", "value": "2024-02-24T00:00:00Z" } // Use current date in ISO format
],
"orderBy": [{ "field": "object_start_time", "dir": "asc" }],
"limit": 10
}
```
This finds the next 10 calendar events the user has confirmed, starting from today. In production, you would dynamically set the date to the current date.
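A minimal sketch of setting that date filter at request time, assuming you build the query object in TypeScript before serializing it into the `Crosshatch-Replace` header:

```typescript
// Build the upcoming-events query with the current date rather than a
// hard-coded timestamp.
const upcomingEvents = {
  select: ["object_name", "object_start_time", "object_end_time", "object_city"],
  from: "personalTimeline",
  where: [
    { field: "channel", op: "=", value: "google-calendar" },
    { field: "event", op: "=", value: "confirmed" },
    { field: "object_start_time", op: ">", value: new Date().toISOString() },
  ],
  orderBy: [{ field: "object_start_time", dir: "asc" }],
  limit: 10,
};
```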
## Best practices
- **Limit results**: Always include a reasonable `limit` to reduce latency and token usage in AI calls.
- **Filter by event type**: Use the `event` field (e.g., `purchased`, `confirmed`) to focus on specific user actions.
- **Focus on data-rich fields**: Select fields that provide valuable context without overwhelming the AI model.
- **Sort by time**: Order events chronologically to provide the most recent or relevant context first.
- **Channel independence**: When possible, filter by `object_category` or `event` rather than `channel` to make queries work regardless of which data sources a user has connected.
- **Test queries first**: Use the [Query API](/apis/query) to test and refine your queries before incorporating them into AI calls.
## Common pitfalls
- **Missing data**: Some fields may be empty depending on the data source or event type.
- **Overly specific queries**: If a query is too restrictive, it may return no results.
- **Case sensitivity**: Field values are case-sensitive (e.g., "FOOD_DRINK" vs "Food_Drink").
- **Field naming inconsistency**: Some fields have a prefix like `object_` while others don't.
## Next steps
- Learn more about the available [schema fields](/context-layer/schema)
- Explore [data sources](/context-layer/data) that can be connected
- Use the [Query API](/apis/query) to test your queries
---
title: Schema
order: 2
---
The `personalTimeline` schema provides a unified way to represent user actions and interactions across different data sources. This standardized format makes it easy to query diverse data types with a consistent structure.
## Event structure
Each event in the personalTimeline represents a user action on an object, such as purchasing a product, booking a hotel, or listening to a song. Events consist of:
- The **action** the user performed (e.g., purchased, liked, confirmed)
- The **object** they interacted with (e.g., product, song, reservation)
- **Metadata** about the action and object
## Core fields
Every event includes these essential fields:
| Field | Type | Description | Example |
| ------------------- | -------- | --------------------------------------------------- | ------------------------------------------- |
| `userid` | string | Internal identifier for the user | "usr_1234567890" |
| `channel` | string | Data source where the event was observed | `plaid`, `google-mail`, `spotify` |
| `type` | string | Source type: directly observed or derived from text | `track`, `derived` |
| `event` | string | The action verb | `purchased`, `confirmed`, `played`, `liked` |
| `originalTimestamp` | datetime | When the event actually occurred | "2022-02-01T19:14:18.381Z" |
| `sent_at` | datetime | When the event was last updated | "2022-02-01T19:14:18.381Z" |
| `message_id` | string | Unique ID for the event | "msg_abc123def456" |
| `object_name` | string | Name of the object interacted with | "Pura Vida 1940 Alton Rd Miami Beach FL" |
| `object_key` | string | Entity ID hash of object_name | "ent_789xyz" |
## Object metadata
These fields provide rich context about the object involved in the event:
| Field | Type | Description | Example |
| --------------------- | -------- | --------------------------------------------------- | ---------------------------- |
| `object_category` | string | General category (level 1) | "TRAVEL_TRANSPORTATION" |
| `object_subcategory` | string | Specific category (level 2) | "HOTELS_ACCOMMODATIONS" |
| `object_style` | string | Aesthetic description of the object | "Modern, Luxury, Beachfront" |
| `object_tags` | array | List of tags associated with the object | ["Beachfront", "Pool"] |
| `object_address` | string | Street address | "1940 Alton Rd" |
| `object_city` | string | City | "Miami Beach" |
| `object_region` | string | State/region | "FL" |
| `object_zip` | string | Postal code | "33139" |
| `object_country` | string | Country | "US" |
| `order_total` | string | Purchase amount | "24.50" |
| `order_currency` | string | Currency code | "USD" |
| `order_summary` | string | Additional order details (e.g., items purchased) | "1x Salad, 1x Coffee" |
| `object_start_time` | datetime | When the activity began | "2022-02-01T19:00:00Z" |
| `object_end_time` | datetime | When the activity ended | "2022-02-01T20:00:00Z" |
| `object_duration` | number | Duration in seconds | 3600 |
| `object_unstructured` | string | Additional context (e.g., raw email for AI parsing) | "Your reservation is..." |
## Category taxonomy
Crosshatch uses a two-level category system based on Google's [Topics API taxonomy V2](https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v2.md):
### Top-level categories (object_category)
- ARTS_ENTERTAINMENT
- AUTOS_VEHICLES
- BEAUTY_FITNESS
- BUSINESS_INDUSTRIAL
- COMPUTERS_ELECTRONICS
- FINANCE
- FOOD_DRINK
- HOBBIES_LEISURE
- HOME_GARDEN
- JOBS_EDUCATION
- NEWS
- PEOPLE_SOCIETY
- PETS_ANIMALS
- REAL_ESTATE
- SHOPPING
- SPORTS
- TRAVEL_TRANSPORTATION
### Subcategories (object_subcategory)
- MOVIES
- MUSIC_AUDIO
- TV_VIDEO
- BOOKS_LITERATURE
- VISUAL_ART_DESIGN
- MOTOR_VEHICLES_BY_TYPE
- VEHICLE_PARTS_ACCESSORIES
- VEHICLE_REPAIR_MAINTENANCE
- FITNESS
- FACE_BODY_CARE
- BUSINESS_SERVICES
- INDUSTRIAL_MATERIALS_EQUIPMENT
- CONSUMER_ELECTRONICS
- SOFTWARE
- BANKING
- INSURANCE
- INVESTING
- COOKING_RECIPES
- RESTAURANTS
- FOOD
- GROCERY_STORES
- ART_CRAFT_SUPPLIES
- OUTDOORS
- HOME_APPLIANCES
- HOME_FURNISHINGS
- HOME_IMPROVEMENT
- PATIO_LAWN_GARDEN
- EDUCATION
- JOBS
- BUSINESS_NEWS
- POLITICS
- WORLD_NEWS
- FAMILY_RELATIONSHIPS
- SOCIAL_ISSUES
- PET_FOOD_PET_CARE_SUPPLIES
- PETS
- REAL_ESTATE_LISTINGS
- REAL_ESTATE_SERVICES
- APPAREL
- GIFTS_SPECIAL_EVENT_ITEMS
- TOYS
- TEAM_SPORTS
- OUTDOOR_SPORTS
- SPORTING_GOODS
- HOTELS_ACCOMMODATIONS
- TRAVEL_DESTINATIONS
- CAR_RENTALS
## Data provenance
Crosshatch distinguishes between two types of events:
- **track**: Directly observed in a data source (e.g., a Spotify listen event)
- **derived**: Extracted from unstructured data (e.g., parsing an email to detect a hotel booking)
This distinction helps maintain transparency about where data originated and how it was processed.
## Example event
```json
{
"userid": "usr_1234567890",
"channel": "plaid",
"type": "track",
"event": "purchased",
"originalTimestamp": "2023-05-14T14:50:00Z",
"sent_at": "2023-05-14T14:52:11Z",
"message_id": "msg_abcdef123456",
"object_name": "Acqualina Resort & Residences",
"object_key": "ent_acqualina_resort",
"object_category": "TRAVEL_TRANSPORTATION",
"object_subcategory": "HOTELS_ACCOMMODATIONS",
"object_style": "Luxury, Resort, Beachfront, Family-friendly",
"object_address": "17875 Collins Ave",
"object_city": "Sunny Isles Beach",
"object_region": "FL",
"object_zip": "33160",
"object_country": "US",
"order_total": "120.14",
"order_currency": "USD"
}
```
## Next steps
- Learn how to [query this schema](/context-layer/querying) with examples
- See what [data sources](/context-layer/data) are available
- Understand how we [map data](/context-layer/mappings) to this schema
---
title: Crosshatch API
order: 3
---
```bash
POST https://ai.crosshatch.io/
```
Creates a model response for the given context query and templated chat conversation.
## Request endpoints
Crosshatch uses different API endpoints depending on the model provider you are using.
- Anthropic or Gemini: `POST https://ai.crosshatch.io/`
- OpenAI: `POST https://ai.crosshatch.io/chat/completions?api-version=2024-08-01-preview`
## Headers
- `Crosshatch-Replace`: object. A JSON object that defines the context queries to be used in the messages array.
- `Crosshatch-Provider`: string. The model that will be used to generate the response.
- `Content-Type`: string. Must be set to `application/json`.
- `Authorization`: string. The user authentication token. This short-lived token is required for all requests to the API, to authenticate your application and the user whose context you intend to reference. Get your token in Testing Sources or from the Link SDK.
## Body
The requirements for the request body depend on the model provider you are using. All models accept templated messages such as
```text
'Here is some context {context_query} and some other context {other_context_query}'
```
Crosshatch substitutes each placeholder with the results of the corresponding query defined in the `Crosshatch-Replace` header before forwarding the request to the model.
### Anthropic
- `max_tokens`: integer. The maximum number of tokens to generate before stopping.
- `messages`: array. An array of templated messages to be sent to the model.
- `stream`: boolean. If set to true, the response will be streamed as it is generated.
- `anthropic_version`: string. Always set to `"vertex-2023-10-16"`
```json
// Request body
{
"anthropic_version": "vertex-2023-10-16",
"messages": [
{
"role": "user",
"content": "Consider context {context_query}. What should I do next?"
}
],
"max_tokens": 50,
"stream": false
}
// Response body
{
"id": "msg_vrtx_019vSbjYDYwZNjxqk5AXYLzF",
"type": "message",
"role": "assistant",
"model": "claude-3-haiku-20240307",
"content": [
{
"type": "text",
"text": "You should lock in."
}
],
"stop_reason": "max_tokens",
"stop_sequence": null,
"usage": {
"input_tokens": 2523,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"output_tokens": 5
}
}
```
You can see other available parameters in the [Anthropic documentation](https://docs.anthropic.com/en/api/messages).
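For orientation, here is a minimal TypeScript sketch of sending such a request with `fetch`. The token value, query contents, and prompt are placeholders; the headers follow the list above:

```typescript
// Sketch: an Anthropic-style request through the Crosshatch API.
// The token and query below are illustrative placeholders.
const userAuthToken = "USER_AUTH_TOKEN"; // from Testing Sources or the Link SDK

const recentBuys = {
  select: ["originalTimestamp", "object_name", "order_total"],
  from: "personalTimeline",
  where: [{ field: "event", op: "=", value: "purchased" }],
  orderBy: [{ field: "originalTimestamp", dir: "desc" }],
  limit: 5,
};

const response = await fetch("https://ai.crosshatch.io/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${userAuthToken}`,
    "Crosshatch-Provider": "claude-3-haiku",
    "Crosshatch-Replace": JSON.stringify({ recent_buys: recentBuys }),
  },
  body: JSON.stringify({
    anthropic_version: "vertex-2023-10-16",
    max_tokens: 200,
    stream: false,
    messages: [
      { role: "user", content: "Consider context {recent_buys}. What should I do next?" },
    ],
  }),
});
const completion = await response.json();
console.log(completion);
```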
### OpenAI
```json
// Request body
{
"messages": [
{
"role": "system",
"content": "Consider the user's past hotel stays {hotel_history}"
},
{
"role": "user",
"content": "Summarize the top 5 hotel vibes the user likes"
}
],
"response_format": { "type": "json_object" }
}
// Response body
{
"id": "chatcmpl-1234",
"object": "chat.completion",
"created": 1740450285,
"model": "gpt-4o-2024-11-20",
"system_fingerprint": "fp_b705f0c291",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The users top 5 hotel vibes are: Bathrobe Connoiseur, Really Really Late Checkout, Minibar Raider Platinum, Spa Creatures, Do Not Disturb Ever",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop",
"content_filter_results": {
}
}
],
"usage": {
"prompt_tokens": 1686,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens": 1254,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0,
"reasoning_tokens": 0,
"audio_tokens": 0
},
"total_tokens": 2940
},
"prompt_filter_results": [
]
}
```
You can see the available parameters in the [OpenAI documentation](https://platform.openai.com/docs/api-reference/making-requests) including its capability to create [structured outputs](https://platform.openai.com/docs/guides/structured-outputs).
### Gemini
```json
// Request body
{
"contents": [
{
"role": "user",
"parts": [{ "text": "Consider the users recent spending {recent_travel}. Where should they travel next?" }]
}
]
}
// Streaming Response (simplified)
// First chunk
{
"candidates": [
{
"content": {
"role": "model",
"parts": [
{
"text": "They"
}
]
},
"safetyRatings": [
// Safety ratings truncated for brevity
]
}
],
"modelVersion": "gemini-1.5-flash-001",
"responseId": "hi69Z5CiC9HSptQPptOHSQ"
}
// Final chunk
{
"candidates": [
{
"content": {
"role": "model",
"parts": [
{
"text": "should definitely travel to Miami."
}
]
},
"finishReason": "STOP"
// Safety ratings omitted for brevity
}
],
"usageMetadata": {
"promptTokenCount": 2105,
"candidatesTokenCount": 25,
"totalTokenCount": 2130
// Token details omitted for brevity
},
"modelVersion": "gemini-1.5-flash-001",
"responseId": "hi69Z5CiC9HSptQPptOHSQ"
}
// Note: Each chunk contains a portion of the full response.
// Client applications should concatenate the text from each chunk.
// The complete concatenated text would form a complete response.
```
## Supported model providers
Crosshatch supports the following models, hosted in our virtual private cloud. You can reference them by the names indicated below.
| Provider | Model Name | Model Docs |
| --------- | ---------------------- | ----------------------------------------------------------------------------------- |
| Anthropic | `claude-3-haiku` | [Link](https://www.anthropic.com/news/claude-3-haiku) |
| Anthropic | `claude-3-5-sonnet` | [Link](https://www.anthropic.com/news/claude-3-5-sonnet) |
| OpenAI | `gpt-4o` | [Link](https://openai.com/index/hello-gpt-4o/) |
| OpenAI | `gpt-4o-mini` | [Link](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) |
| Google | `gemini-1.5-flash-001` | [Link](https://deepmind.google/technologies/gemini/flash/) |
| Google | `gemini-1.5-pro-001` | [Link](https://deepmind.google/technologies/gemini/pro/) |
### Model Pricing
You can see our model pricing on our [pricing page](https://www.crosshatch.io/for-businesses#pricing).
### Token lifetime
User authentication tokens (for Production and Testing Users) are short-lived:
- 2 hours for Testing Users
- 5 minutes for Production Users
### Error handling
Error handling is currently limited.
## Next steps
- Learn more about Crosshatch [Security and Privacy](https://crosshatch.io/security).
- See how to use Crosshatch with [live first-party context](/apis/ai).
---
title: SDK
order: 6
---
We provide a Custom Provider for the Vercel AI SDK to make it easier to integrate Crosshatch into your web application.
[Typescript SDK](https://www.npmjs.com/package/@crosshatch/ai-provider)
```typescript
import { crosshatch } from "@crosshatch/ai-provider";
import { link } from "@crosshatch/link";
import { generateText } from "ai";
const { getToken } = await link({ clientId: "YOUR-CLIENT-ID" });
const { text } = await generateText({
model: crosshatch.languageModel("gpt-4o", {
token: getToken(),
replace: {
last_5_buys: {
select: ["originalTimestamp", "object_style"],
from: "personalTimeline",
where: [{ field: "event", op: "=", value: "purchased" }],
orderBy: [{ field: "originalTimestamp", dir: "desc" }],
limit: 5,
},
},
}),
prompt:
"What should the user buy next? They last purchased these 5 things {last_5_buys}",
});
```
See the Vercel AI SDK [documentation](https://sdk.vercel.ai/docs/foundations/providers-and-models) for more information on how to use the SDK.
---
title: Overview
order: 1
---
Crosshatch offers three distinct APIs that enable you to build personalized AI applications using your users' permissioned data.
## Available APIs
### Crosshatch API
The primary API that enables AI models to access user context securely. This API accepts standard LLM requests with template placeholders that Crosshatch privately hydrates with user data before forwarding to the specified model.
```bash
POST https://ai.crosshatch.io/
```
### Query API
A development tool that helps you test and refine your context queries before implementation. This API lets you see exactly what data your queries return, ensuring they retrieve the relevant information for your use case.
```bash
POST https://query.crosshatch.io/
```
### Webhooks API
A notification system that generates AI responses automatically when user context changes. This API allows your application to receive updated insights without polling.
```bash
POST https://ai.crosshatch.io/webhook
```
## How to use these APIs
A typical development workflow involves:
1. **Testing your queries**: Use the Query API to experiment with different queries and ensure they return the exact data you need
2. **Implementing personalized AI**: Integrate the Crosshatch API to generate context-aware AI responses when users interact with your application
3. **Setting up automated updates** (optional): Configure webhooks if you need automatic notifications when user context changes
## Core concepts
### Context queries
All Crosshatch APIs use a SQL-like syntax to retrieve specific subsets of user data:
```json
{
"select": ["originalTimestamp", "object_name", "order_total"],
"from": "personalTimeline",
"where": [{ "field": "object_category", "op": "=", "value": "FOOD_DRINK" }],
"limit": 5
}
```
### Template placeholders
When using the Crosshatch API or Webhooks API, you can embed query references directly in your prompts:
```
"Tell me about my recent travels {travel_history}"
```
Crosshatch privately replaces these placeholders with the results of your context queries before forwarding to the AI model.
### Authentication
All APIs use short-lived user authentication tokens to ensure secure access:
- 5 minutes for production users
- 2 hours for testing users
These tokens can be obtained through the Link SDK in production or from the Developer Portal for testing.
## What to use when
- **Query API**: When you need to test and debug your context queries during development
- **Crosshatch API**: When you want to generate personalized responses based on user interactions
- **Webhooks API**: When you need automated updates whenever relevant user data changes
## API availability
- The **Query API** is available only for development and testing.
- The **Crosshatch API** and **Webhooks API** are available for both development and production use.
For production applications, the Crosshatch API enables personalization without exposing raw user data, as only the AI-generated responses are returned to your application.
## Next steps
- Learn how to generate personalized responses with the [Crosshatch API](/apis/ai)
- Test your context queries with the [Query API](/apis/query)
- Set up automatic updates with [Webhooks](/apis/webhooks)
- Integrate easily using our [SDK](/apis/sdk)
---
title: Query API
order: 4
---
```bash
POST https://query.crosshatch.io/
```
Resolves context queries and returns the data that would be substituted into your prompts. This API is used for testing and debugging your queries before using them with the AI API.
## Headers
- `Crosshatch-Replace`: object. A JSON object that defines the context queries to test.
- `Content-Type`: string. Must be set to `application/json`.
- `Authorization`: string. The user authentication token. This short-lived token is required to authenticate your application and access the user's context.
## Body
The Query API doesn't require a request body. Unlike the AI API which needs request parameters for the language model, the Query API is strictly focused on testing how your context queries resolve.
All query parameters are defined in the `Crosshatch-Replace` header, and the API simply returns how those queries would be resolved when used in AI API calls.
## Example
```bash
curl --location 'https://query.crosshatch.io' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer $USER_AUTH_TOKEN' \
--header 'Crosshatch-Replace: {
"hotel_history": {
"select": [
"originalTimestamp",
"object_name",
"order_total"
],
"from": "personalTimeline",
"where": [
{
"field": "object_subcategory",
"op": "=",
"value": "HOTELS_ACCOMMODATIONS"
},
{
"field": "channel",
"op": "=",
"value": "plaid"
}
]
}
}'
// Response:
{
"hotel_history": [
{
"originalTimestamp": "2023-02-05T16:10:00Z",
"object_name": "Four Seasons Hotel Miami FL",
"order_total": "1053.06"
},
{
"originalTimestamp": "2024-02-12T17:35:00Z",
"object_name": "Nobu Miami Beach FL",
"order_total": "146.7"
},
{
"originalTimestamp": "2024-02-24T19:30:00Z",
"object_name": "The Gabriel Miami Downtown FL",
"order_total": "161.06"
},
{
"originalTimestamp": "2023-03-05T15:30:00Z",
"object_name": "The Ritz-Carlton Bal Harbour Miami Beach FL",
"order_total": "65.67"
},
{
"originalTimestamp": "2024-05-14T14:50:00Z",
"object_name": "Acqualina Resort & Residences Sunny Isles Beach FL",
"order_total": "120.14"
}
]
}
```
## Purpose and use cases
- Test Context Queries: Experiment with different queries and see the
actual data subsets they return.
- Refine Your Prompts: Understand the query output to ensure it aligns
with your AI prompts.
- Debugging: If a context query isn't returning expected results in the
AI API, use this to troubleshoot.
- Optimize Performance: Ensure your query retrieves the most relevant and
concise data.
- Understand Data Structure: Learn how different types of user data are
formatted.
## Limitations
- Testing Only: The Query API is for development and debugging purposes
and cannot be used in production environments.
- No AI Processing: This API does not process prompts or generate AI
responses—it only resolves and returns data from your queries.
- Authentication: Only available for users who have linked their data
through the Crosshatch Link.
## Next steps
- Learn more about creating valid query objects in [Context Layer](/context-layer/querying) documentation.
- See how to use these queries with the [AI API](/apis/ai).
---
title: Getting started
order: 2
---
The Crosshatch API is a privacy-preserving personalized inference gateway that accepts standard LLM requests with context template placeholders in messages.
The `Crosshatch-Replace` header defines SQL-like queries that retrieve private user data. Crosshatch privately replaces these placeholders with query results before forwarding to the specified model, allowing personalized AI responses without exposing raw user data.
## Accessing the API
The API is accessible via our web [Developer Portal](https://platform.crosshatch.io). You can use the [Playground](https://playground.crosshatch.io) to try out the API and experiment with how it performs for different users, whether your own or synthetic.
## Authentication
All requests to the API are authenticated via a short-lived, user-specific authentication token that identifies your application and the user whose context you intend to reference. If you are using the Playground, this token is managed on your behalf. If you are using the Developer Portal to test our API, you can generate a user authentication token in Testing Sources. In production, you can generate an authentication token using our Link SDK.
## Content types
The Crosshatch API always accepts JSON in request bodies and returns JSON in response bodies. You will need to send `'Content-Type: application/json'` in your request headers. If you're using our AI SDK, this will be handled for you.
## Example
```bash
curl -L 'https://ai.crosshatch.io/chat/completions?api-version=2024-06-01' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Crosshatch-Replace: {
"last_10_things_you_did": {
"select": ["event", "object_name", "object_style"],
"from": "personalTimeline",
"orderBy": [{"field":"originalTimestamp", "dir": "desc"}],
"limit": 10}}' \
-H 'Crosshatch-Provider: gpt-4o' \
-H 'Authorization: Bearer $USER_AUTH_TOKEN' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a story about the last 10 things the user did {last_10_things_you_did}"
}
],
"max_tokens": 250
}'
```
## Errors
Error handling is currently limited and follows the errors of the underlying LLM resources:
- [Gemini + Anthropic](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/api-errors)
- [OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference) (hosted on Azure; error code documentation is limited)
---
title: Webhooks API
order: 5
---
```bash
POST https://ai.crosshatch.io/webhook
```
Register a webhook to receive AI completions whenever context data changes. This allows your application to respond to user context updates in real-time without polling.
## Headers
- `Crosshatch-Replace`: object. A JSON object that defines the context queries to monitor for changes.
- `Crosshatch-Provider`: string. The model that will be used to generate the response.
- `Crosshatch-Target`: string. The URL where webhook callbacks should be sent.
- `Content-Type`: string. Must be set to `application/json`.
- `Authorization`: string. The user authentication token.
## Body
The request body follows the same format as the AI API, containing the messages and parameters for the language model. When context changes are detected, this request will be executed and the results sent to your callback URL.
### Identifying webhook sources
You can add query parameters to your callback URL in the `Crosshatch-Target` header to help identify which user or context the webhook is responding to:
```
--header 'Crosshatch-Target: https://your-api.com/webhooks?user_id=123'
```
This allows your webhook handler to associate incoming webhook data with the correct user or context in your application.
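A minimal handler sketch, assuming a Node/Express server; the route, query parameter, and handling logic are illustrative only:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical callback route matching a Crosshatch-Target of
// https://your-api.com/webhooks?user_id=123
app.post("/webhooks", (req, res) => {
  const userId = req.query.user_id; // which user this update belongs to
  const completion = req.body; // the AI response generated from updated context
  console.log(`Context update for user ${userId}`, completion);
  // ...update your application state here...
  res.sendStatus(200);
});

app.listen(3000);
```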
## Example
```bash
curl --location --request POST 'https://ai.crosshatch.io/webhook' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Crosshatch-Replace: {"last_confirmation": {"select": ["sent_at", "event", "object_name", "object_unstructured"], "from": "personalTimeline", "where": [{"field": "event", "op": "=", "value": "confirmed"}, {"field": "channel", "op": "=", "value": "google-mail"}], "orderBy": [{"field": "sent_at", "dir": "desc"}], "limit": 1}}' \
--header 'Crosshatch-Provider: claude-3-haiku' \
--header 'Crosshatch-Target: $YOUR_CALLBACK_URL' \
--header 'Authorization: Bearer $USER_AUTH_TOKEN' \
--data '{
"messages": [
{
"role": "user",
"content": "Structure this email message {last_confirmation} in a json format including order name, number, and items."
}
],
"max_tokens": 25,
"anthropic_version": "vertex-2023-10-16"
}'
// Response:
{
"webhook_id": "whk_01H9XYZABC123",
"status": "active"
}
// When context changes, the AI completion is sent directly to your callback URL.
// The response format will match the output of the AI model you selected.
// For this example with Claude, the response would look like:
{
"completion": "{\n \"order_name\": \"Amazon Prime\",\n \"order_number\": \"112-3456789-0123456\",\n \"items\": [\"Wireless Earbuds\", \"Phone Charger\"]\n}"
}
```
## Listing webhooks
```bash
GET https://ai.crosshatch.io/webhook
```
Lists all registered webhooks for a user.
```bash
curl --location --request GET 'https://ai.crosshatch.io/webhook' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer $USER_AUTH_TOKEN'
// Response:
{
"webhooks": [
{
"id": "whk_01H9XYZABC123",
"created_at": "2024-02-15T12:34:56Z",
"callback_url": "https://example.com/webhooks",
"query_keys": ["last_confirmation"],
"model": "claude-3-haiku"
}
]
}
```
## Deleting webhooks
```bash
DELETE https://ai.crosshatch.io/webhook/{webhook_id}
```
Deletes a registered webhook.
```bash
curl --location --request DELETE 'https://ai.crosshatch.io/webhook/whk_01H9XYZABC123' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer $USER_AUTH_TOKEN'
// Response:
{
"deleted": true
}
```
## Key purpose
- Real-Time Updates: Get notified as soon as there are changes to a user's linked context or specific events occur, such as a new purchase observed in email, a new calendar event, a professional contact, or a logged run.
- Automated Workflows: Use these notifications to trigger actions in your
application, such as updating a UI or sending push notifications.
- Efficient Communication: Reduce the need for frequent API calls by
receiving updates only when necessary.
- Compliant access: Get real-time access as consented by the user.
## Limitations
- Webhook Lifetime: Webhooks remain active as long as the user's
authorization is valid.
- Request Size: Large webhook payloads may be truncated.
- Retry Policy: Failed webhook deliveries will be retried with
exponential backoff for 24 hours.
---
title: Models
order: 2
---
Crosshatch supports private personalized inference for a variety of top models hosted in our virtual private cloud. This guide highlights the models available and how to use them.
## Supported Models
Crosshatch currently supports multiple language models, each with different strengths suited to various use cases. Below is a high-level comparison of the models available through the Crosshatch API.
| Provider | Model Name | Strengths | Explicit Structured Outputs | Context Window | Latency | Cost ($/1M tokens) |
| --------- | ---------------------- | --------------------------------- | --------------------------- | -------------- | -------- | ----------------------- |
| Anthropic | `claude-3-haiku`       | Fast, cost-effective              | ❌                          | 200K           | Low      | $0.5 input / $1 output  |
| Anthropic | `claude-3-5-sonnet`    | Balanced speed and capability     | ❌                          | 200K           | Medium   | $5 input / $25 output   |
| OpenAI    | `gpt-4o`               | Strong general-purpose LLM        | ✅                          | 128K           | Low      | $7.5 input / $25 output |
| OpenAI    | `gpt-4o-mini`          | Budget-friendly with good quality | ✅                          | 128K           | Low      | $0.25 input / $1 output |
| Google    | `gemini-1.5-flash-001` | High-speed, cost-efficient        | ❌                          | 1M             | Very Low | $0.25 input / $1 output |
| Google    | `gemini-1.5-pro-001`   | Advanced reasoning and depth      | ❌                          | 1M             | Medium   | $7.5 input / $25 output |
Only OpenAI models guarantee explicit structured output, though you can use our personalized inference [provider](https://sdk.vercel.ai/providers/community-providers/crosshatch) for Vercel's AI SDK to get structured output support.
## Crosshatch as Proxy
Crosshatch acts as a proxy or personalization gateway, not a router. It does not otherwise modify the original request or response; it hydrates the templated request with context, forwards it to the specified model resource in our VPC, and returns the personalized response. Additionally, this means:
- Our API passes calls directly to the model resource without modifying them
- Requests inherit the idiosyncrasies of the source model resource
- Each model follows its native API structure and you must conform to that model's expected request format
That is, you can call each model via Crosshatch with a request structured according to the respective model provider's documentation. You can see more on how to call these models in the [API Reference](/apis/ai).
---
title: Security & Compliance
order: 3
---
We built Crosshatch on a simple principle: personalization should not require sacrificing privacy.
Users should control their data—where it goes, how it’s used, and when to revoke access. Developers should be able to integrate this data **securely and responsibly**, without unnecessary exposure. AI should provide **meaningful, context-aware insights** while respecting privacy boundaries.
These beliefs guide how we design our infrastructure, set our policies, and approach compliance.
## Privacy
### User-Controlled Data Sharing
- Users own their data. Crosshatch does not train models on user data.
- Fine-grained permissions. Users decide which apps can access specific
data, for how long, and for what purpose.
- Scoped access with OAuth 2.0. Applications authenticate via OAuth and
receive short-lived, scoped access tokens.
- Verifiable transparency. All API interactions are logged for
auditability and compliance.
#### OAuth Authentication Flow
1. Developers obtain an OAuth CLIENT_ID from Crosshatch.
2. Users explicitly grant permissions for each Use Case via OAuth.
3. Applications receive short-lived, scoped access tokens to interact with the API.
### Data Minimization
Some of the most useful context for personalization—like restaurant reservations or recent purchases—is found in data sources that contain other sensitive information. To ensure privacy, Crosshatch follows a minimization-first approach:
- Filter data sources by relevance. Crosshatch selects only emails
related to purchases, reservations, and confirmations—other emails do not
enter the user's context layer.
- Access data through the AI API. Applications do not receive raw data
directly. Instead, they activate relevant context via LLMs, enabling
expressive, context-aware personalization while minimizing unnecessary data
exposure.
- Time-bound access. Users decide how long their data remains
available—choosing either session-based access ('while using') or ongoing
access ('always').
## Security & Compliance
Crosshatch maintains multiple layers of security to protect user data and ensure compliance.
- Data is encrypted at rest and in transit. TLS 1.2+ for data in motion,
AES-256 for storage.
- SOC 2 Type 1 audited. SOC 2 Type 2 + ongoing penetration testing in
progress.
- Audit logs for all API requests. Every data access request is logged
and can be reviewed for security compliance.
### End-User Protections
- Right to be forgotten. Users can permanently delete their data.
- Full transparency. Users can view and manage their data-sharing
preferences at any time.
### Developer Responsibilities
- Use data only for the declared purpose. Applications must specify their
intended Use Case in the Link configuration and must not use the API for any
other purpose.
- Enforced compliance. Crosshatch actively audits API logs to ensure
responsible data use.
## Our Business Model
We believe that building trust means aligning incentives.
Crosshatch does not monetize user data or AI interactions. Instead, we charge platform and usage fees for enabling secure, consented data connections. This model ensures that the value we create is directly tied to privacy, security, and utility—not data extraction.
## Learn More
- [Consumer Privacy Policy](https://www.crosshatch.io/policies/end-user-privacy)
- [Developer Terms](https://www.crosshatch.io/policies/developer-terms)
If you have questions, concerns, or need guidance on implementing security best practices, reach out to us at **support@crosshatch.io**.
---
title: Use Cases
order: 1
---
Crosshatch is designed to enable a variety of personalization use cases. Explore the guides below to learn how to build common use cases with Crosshatch.
## Personalized search
Here are some indicators that you should use personalized inference like that from Crosshatch to improve search results:
- Your users shop across multiple services and you want to leverage their
broader shopping patterns without friction or privacy tradeoffs.
- You want to reduce "search friction" by automatically applying relevant
user context instead of requiring manual filter selection.
- Your user profiles are sparse or incomplete, making it challenging or ineffective to improve search relevance with first-party data.
- You have a large inventory of products and improving product discovery
drives key business outcomes.
- You're already experimenting with natural language search and want to
further drive relevance with personalized inference.
- Your users aren't providing enough context in search queries to make
natural language search effective.
- Your analytics show high search abandonment rates despite having
relevant inventory, suggesting users aren't finding what they want.
### Build your hybrid search experience
Hybrid search is a two stage search process that combines
1. Initial Retrieval: Your existing search system (keyword, vector, or
both) generates initial candidates
2. Re-ranking: A language model evaluates and re-orders these candidates
Crosshatch enhances the re-ranking step by leveraging your users' context to improve the relevance of the results. Here's how it works:
1. User provides a query
2. You generate candidate results from your existing search system
3. Write a prompt to re-rank the results using a user's context
4. Call the Crosshatch API to generate personalized results
### Writing a strong prompt
The prompt is the key to personalizing your search results. It's a natural language instruction that guides the language model to generate personalized results. The context you might provide the model is dependent on your use case, so here we'll focus on the prompt.
````text
You are part of a hybrid search system that re-ranks search results based on user preferences. Your task is to analyze and re-rank a list of candidate results according to their relevance to both the search query and the user's profile.
Below are the candidate results to be re-ranked:
{{CANDIDATE_RESULTS}}
Here's the search query:
{{SEARCH_QUERY}}
This is the user's profile, containing relevant information about their preferences:
{{USER_CONTEXT}}
Process:
1. For each candidate result:
a. Analyze how well it aligns with the user's profile.
b. Evaluate its relevance to the search query.
c. Assign two separate scores:
- Query relevance score (1-5)
- User profile alignment score (1-5)
d. Calculate the final relevance score by adding the two scores (2-10).
e. Briefly justify the scores.
2. Re-rank all results based on their final relevance scores.
3. Format the output as JSON.
Wrap your analysis in tags. For each result, consider:
- Key words or concepts that match the search query
- Aspects of the result that align with user preferences or interests
The final output should be a JSON array of objects, each containing:
- rank: The new position in the re-ranked list
- original_position: The item's original position in the candidate results
- item: A brief description or title of the result
- query_relevance: The assigned query relevance score (1-5)
- user_alignment: The assigned user profile alignment score (1-5)
- final_score: The sum of query_relevance and user_alignment (2-10)
- justification: A concise explanation for the scores
Example JSON structure:
```json
{
"analysis_of_results": "Concise analysis of results",
"results": [
{
"rank": 1,
"original_position": 3,
"item": "Example Item Title",
"sku": "Example SKU",
"query_relevance": 4,
"user_alignment": 5,
"final_score": 9,
"justification": "Concise reason for scores"
},
// ... more results ...
]
}
```
````
Let's break down the key components of the prompt:
- We give the model a clear role as part of a hybrid search system that re-ranks search results based on user preferences.
- We structure the input so it knows the candidate results, search query, and user context (Crosshatch has a specific way of passing this information to the model that we'll cover later)
- We provide a process for the model to analyze and re-rank the results based on their relevance to both the search query and the user's profile.
- We provide a desired output format for the model to generate the re-ranked results to make it easier to integrate the results into your front-end. Note that some model providers and frameworks can guarantee these specific output formats.
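Putting the pieces together, here is a hedged sketch of the re-ranking call. The endpoint, headers, and query structure follow the API reference; the candidate results, token, and prompt assembly are illustrative placeholders:

```typescript
// Sketch: re-rank candidates with the Crosshatch API using the user's
// recent purchases as {user_context}. Values below are placeholders.
const userAuthToken = "USER_AUTH_TOKEN"; // from the Link SDK in production
const searchQuery = "boutique hotels in miami";
const candidates = [
  { sku: "HTL-001", title: "Beachfront boutique hotel" },
  { sku: "HTL-002", title: "Downtown business hotel" },
];

const userContextQuery = {
  select: ["originalTimestamp", "object_name", "object_style"],
  from: "personalTimeline",
  where: [{ field: "event", op: "=", value: "purchased" }],
  orderBy: [{ field: "originalTimestamp", dir: "desc" }],
  limit: 20,
};

const response = await fetch(
  "https://ai.crosshatch.io/chat/completions?api-version=2024-08-01-preview",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${userAuthToken}`,
      "Crosshatch-Provider": "gpt-4o",
      "Crosshatch-Replace": JSON.stringify({ user_context: userContextQuery }),
    },
    body: JSON.stringify({
      response_format: { type: "json_object" },
      messages: [
        {
          role: "user",
          content:
            `Re-rank these candidate results for the query "${searchQuery}": ` +
            `${JSON.stringify(candidates)}. User profile: {user_context}. ` +
            `Return a JSON array ordered by relevance.`,
        },
      ],
    }),
  },
);
const reranked = await response.json();
console.log(reranked);
```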
## Adaptive notifications
This section explains how to get AI responses automatically when your users' context changes, enabling timely and relevant engagement.
### Before building with Crosshatch
Here are some key indicators that you should employ a user's complete context to build real-time adaptivity into your application:
- You want to take action on behalf of your users based on their behavior and need a privacy-preserving, low-cost way to respond to cross-app signals
- You provide a replenishment service that needs to adapt to a user's needs as given by behaviors across multiple services
- Your notifications have low engagement rates despite having relevant content to share
### Determine the context you want to listen to
Crosshatch enables users to bring their complete context to your app in a few taps, and delivers AI responses whenever selections or aggregations of that context change.
For instance, you can receive notifications whenever the user
- buys something new from across retailers
- makes a new restaurant reservation
- updates their calendar
- confirms a grocery pickup order
- likes a new song
- logs a new run
### Define the notification you want to receive
Crosshatch Webhooks lets you receive AI responses automatically when user context changes. Think of it as the AI API in push form - instead of polling for changes and making API calls, you get insights from the user's updated context directly.
For instance, suppose you want to offer to send a courier to pick up groceries whenever the user confirms a pickup order. You can define a webhook that listens for grocery order confirmations and uses AI to transform the raw associated context into a form actionable by your application.
You might write a prompt like:
```text
You will be given the content of an email representing an order confirmation. Your task is to transform this email into a structured JSON format containing key information about the order.
Here is the email content:
{recent_order_email}
Extract the following information from the email (if available):
1. Order ID
2. Pickup window (date and time range)
3. Pickup location (store name and address)
4. List of items in the order (including quantity, name, and price for each item)
5. Subtotal
6. Tax
7. Total price
Structure the extracted information into a JSON format with the following keys:
- order_id
- pickup_window
- pickup_location
- items (an array of objects, each containing "quantity", "name", and "price")
- subtotal
- tax
- total
If any information is not available in the email, omit that key from the JSON structure rather than including it with a null or empty value.
Ensure that the JSON is properly formatted and valid.
Here's an example of how the structure should look:
{
"order_id": "123456",
"pickup_window": "2023-05-15 14:00-15:00",
"pickup_location": {
"store_name": "Grocery Store Name",
"address": "123 Main St, City, State, ZIP"
},
"items": [
{
"quantity": 2,
"name": "Apples",
"price": 3.99
},
{
"quantity": 1,
"name": "Bread",
"price": 2.49
}
],
"subtotal": 10.47,
"tax": 0.84,
"total": 11.31
}
Remember to adjust the structure based on the actual information available in the email. If you're unsure about any information, it's better to omit it than to include potentially incorrect data.
```
This prompt transforms information representing a user's order confirmation into a structured JSON format that can be easily consumed by your application. This could trigger an additional API call to Crosshatch to check the user's calendar to schedule an optimized delivery time.
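As a rough illustration, a receiver for this webhook might look like the sketch below, written with Node's built-in `http` module. The payload shape mirrors the JSON the prompt above produces, and the endpoint path, payload fields, and `dispatchCourier` helper are assumptions for illustration rather than documented Crosshatch behavior.
```typescript
import { createServer } from "node:http";

// Assumed payload: the structured order-confirmation JSON produced by the
// prompt above. Field names are illustrative.
interface PickupOrder {
  order_id?: string;
  pickup_window?: string;
  pickup_location?: { store_name: string; address: string };
  items?: { quantity: number; name: string; price: number }[];
  total?: number;
}

// Hypothetical downstream action your application takes once the order is parsed.
async function dispatchCourier(order: PickupOrder): Promise<void> {
  console.log(`Scheduling courier for order ${order.order_id ?? "unknown"}`);
}

// Minimal webhook receiver. In production you would also authenticate the
// request (for example, with a shared secret) before trusting the body.
createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/webhooks/crosshatch") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", async () => {
    try {
      const order = JSON.parse(body) as PickupOrder;
      await dispatchCourier(order);
      res.writeHead(200).end("ok");
    } catch {
      res.writeHead(400).end("invalid payload");
    }
  });
}).listen(3000);
```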
You can learn more about registering Crosshatch Webhooks [here](/apis/webhooks).
## Adaptive UI
This section explains how to use Crosshatch to personalize UI elements based on a user's complete, cross-app context, enabling dynamic, contextually relevant experiences that adapt to user behavior in real time.
### Before building with Crosshatch
Here are some key indicators that you should personalize UI elements using Crosshatch’s cross-app context:
- You want to reduce decision fatigue and simplify discovery by
dynamically surfacing the most relevant content without requiring users to
search or filter manually.
- You want to increase engagement, retention, and trust by ensuring UI
elements feel intuitive and highly relevant to individual users.
- You want to enhance convenience and optimize for action by anticipating
user needs based on their real-world behaviors across multiple services.
- You want to personalize experiences at scale by leveraging AI-driven
context-aware adaptation instead of relying on explicit user input.
### Define the UI Elements to Personalize
Crosshatch enables dynamic UI adaptation based on a user's real-world behaviors and cross-app activity. Examples include:
- Category Carousels. Surface the most relevant categories based on past
interactions across services (e.g., prioritize "Fitness Equipment" if a
user recently purchased running shoes and booked a gym class).
- Recommended Sections. Highlight
personalized content, products, or services based on the user's
past transactions, subscriptions, or browsing behavior.
- Home Screen Layouts. Adjust featured sections dynamically, such as
emphasizing "Tech Deals" for a user who
recently purchased a laptop or streamed AI-related content.
- Call-to-Action Elements. Adapt CTAs based on user context (e.g.,
"Reorder your favorite items" for a user who regularly purchases coffee
pods and just ran out).
- In-App Messaging & Nudges. Provide predictive nudges like "Upcoming
flight? Get noise-canceling headphones" if a user recently booked a
flight.
### Write a prompt to personalize UI elements
To personalize UI elements, write a prompt that describes the adaptation you want the model to make.
Here's an example prompt that guides the AI model to personalize a category carousel.
```text
You are a personalization system for an ecommerce and content platform. Your task is to reorder the categories carousel based on the user's **cross-app purchase, browsing, and engagement history**.
Below is the user's **context** across multiple services:
{{USER_CONTEXT}}
The platform has the following available categories:
{{AVAILABLE_CATEGORIES}}
Instructions:
1. Analyze **cross-app behaviors** (e.g., purchases, reservations, subscriptions, and engagement data) to identify relevant preferences.
2. Prioritize categories that align with the user’s **recent transactions, streaming history, or planned activities**.
3. If no strong preferences exist, rank based on **seasonal trends or recent interactions**.
4. Provide a **JSON-formatted response** with the reordered list.
```
This prompt reorders the categories carousel around the user's purchase, browsing, and engagement history. The model analyzes behaviors across multiple services, prioritizes categories that align with recent transactions, streaming history, or planned activities, falls back to seasonal trends or recent interactions when no strong preference exists, and returns the reordered list as JSON.
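The prompt only asks for "a JSON-formatted response with the reordered list," so pin down the exact schema before wiring it into your front end. Here's one plausible shape, sketched in TypeScript; the field names are illustrative assumptions, not a format Crosshatch enforces.
```typescript
// One plausible response shape for the reordered carousel. Since the prompt
// only asks for "a JSON-formatted response with the reordered list", spell
// out a structure like this in the prompt (or enforce it with a structured-
// output feature) before relying on it in your front end.
interface CarouselRanking {
  categories: {
    name: string;   // should be one of AVAILABLE_CATEGORIES
    reason: string; // short justification tied to the user's context
  }[];
}

const example: CarouselRanking = {
  categories: [
    { name: "Fitness Equipment", reason: "Recent running-shoe purchase and gym booking" },
    { name: "Tech Deals", reason: "Recently streamed AI-related content" },
    { name: "Coffee & Tea", reason: "Regular coffee-pod reorders" },
  ],
};
```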
---
title: Quickstart
order: 3
---
Let's learn how to use the Crosshatch API to personalize AI outputs with linked user context.
In this example, we'll have `gpt-4o` draft a short travel itinerary based on a user's recent travel and dining activity, with context sourced from Gmail and Plaid.
## Prerequisites
You will need:
- A Crosshatch [platform account](https://platform.crosshatch.io)
- A user auth token
- A place to run curl commands
Crosshatch provides a [Typescript SDK](https://www.npmjs.com/package/@crosshatch/ai-provider) though we recommend starting with direct HTTP requests to the API.
### Start with the Playground
The Crosshatch AI API is a personalized inference gateway, proxying top LLMs. It differs from traditional AI APIs in that:
- it's authenticated by a user-specific authentication token (not an API key)
- it enables application-defined queries into the user's context layer
- its prompts are templated and privately hydrated with queried user context
As you're learning to make the most of Crosshatch, we recommend that you start the development process in the Playground, a web-based interface that allows you to experiment with API calls, refine your queries, and see how different prompts interact with user data.
Start with [Playground](https://playground.crosshatch.io). Select a user you'd like to test personalization with, either yourself or a synthetic user.
Choose a personalization context or generate your own.
```text
A full itinerary for your next trip abroad.
```
On the right-hand side you'll see a prompt:
```text
Using the user's previous travel history and dining preferences {recent_travel}, create a 2-sentence itinerary for their next trip abroad that highlights activities and dining experiences they'll likely enjoy. Explain your recommendation.
```
The prompt references a variable `recent_travel`, defined by the following query:
```json
{
"recent_travel": {
"select": [
"event",
"object_name",
"object_category",
"object_style",
"object_tags",
"object_address",
"object_city",
"object_country",
"object_start_time",
"object_end_time",
"object_duration"
],
"where": [
{
"field": "object_category",
"op": "in",
"value": ["TRAVEL_TRANSPORTATION", "HOBBIES_LEISURE", "FOOD_DRINK"]
},
{
"field": "event",
"op": "in",
"value": ["confirmed", "purchased"]
}
],
"orderBy": [
{
"field": "originalTimestamp",
"direction": "desc"
}
],
"limit": 2
}
}
```
This variable is a query into the user's context layer; it returns the user's 2 most recent purchases and confirmations in the travel/transportation, hobbies/leisure, and food/drink categories. Click on the code block to see or modify the query. At the bottom you can click `result` to see how the query will hydrate into the prompt. Here's what the result could look like for one user:
```json
{
"recent_travel": [
{
"event": "purchased",
"object_name": "Panther Coffee Wynwood Miami FL",
"object_category": "FOOD_DRINK",
"object_style": "Panther Coffee blends specialty coffee culture with Cuban-style vibrancy and modern roasting techniques. Founded by Leticia and Joel Pollock, it fosters community and craftsmanship, offering an inviting space for coffee lovers to enjoy meticulously crafted cups.",
"object_tags": [
"Specialty Coffee",
"Wholesale Coffee",
"Retail Coffee",
"Organic Growth",
"Single Origin",
"Coffee Blends",
"Roasting Facilities"
],
"object_address": "2390 NW 2nd Ave",
"object_city": "Miami",
"object_country": "USA",
"object_start_time": "2023-01-05T08:30:00Z",
"object_end_time": null,
"object_duration": null
},
{
"event": "confirmed",
"object_name": "Yoga Retreat at The Standard Spa Miami Beach FL",
"object_category": "TRAVEL_TRANSPORTATION",
"object_style": "The Standard Spa, Miami Beach is an adults-only retreat blending relaxation with social spaces. Nestled on Belle Isle, it offers communal wellness experiences, hydrotherapy, and holistic treatments in a serene yet lively environment.",
"object_tags": [
"Adults-Only",
"Spa and Wellness",
"Yoga and Fitness Classes",
"Outdoor Soaking Tubs",
"Hydrotherapy Playground",
"Communal Bathing",
"Holistic Treatments"
],
"object_address": "40 Island Ave",
"object_city": "Miami Beach",
"object_country": "USA",
"object_start_time": "2023-01-15T10:00:00Z",
"object_end_time": "2023-01-15T11:00:00Z",
"object_duration": "60"
}
]
}
```
When you call the API, Crosshatch dynamically fetches the user's context and hydrates the prompt with the queried data. In this case, Crosshatch fetches the result of the query `recent_travel` and replaces the variable `{recent_travel}` with the result. Then it sends the prompt to your specified LLM hosted in the secure Crosshatch perimeter. The model processes this information and returns a personalized response tailored to the user and your instructions.
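In code, that call could look roughly like the sketch below, using `fetch` in TypeScript. The endpoint URL, headers, and body field names are assumptions for illustration only; check the [API Reference](/apis/ai) for the actual request format.
```typescript
// Illustrative sketch only: the endpoint, headers, and body fields are
// assumptions, not the documented Crosshatch request format.
const userAuthToken = process.env.CROSSHATCH_USER_TOKEN!; // user-specific token, not an API key

const response = await fetch("https://api.crosshatch.io/ai/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${userAuthToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    // Templated prompt: {recent_travel} is hydrated on Crosshatch's side with
    // the result of the query defined above before the model sees it.
    prompt:
      "Using the user's previous travel history and dining preferences {recent_travel}, " +
      "create a 2-sentence itinerary for their next trip abroad that highlights activities " +
      "and dining experiences they'll likely enjoy. Explain your recommendation.",
    variables: {
      // Abbreviated version of the recent_travel query shown above.
      recent_travel: {
        select: ["event", "object_name", "object_category", "object_city"],
        where: [{ field: "event", op: "in", value: ["confirmed", "purchased"] }],
        orderBy: [{ field: "originalTimestamp", direction: "desc" }],
        limit: 2,
      },
    },
  }),
});

console.log(await response.json());
```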
For instance, sending this call to `gpt-4o` might return
```text
For your next journey abroad, consider exploring the vibrant coffee culture of Melbourne, Australia. Begin your days at local favorites like St. Ali Coffee Roasters, which mirrors the artisanal and community-focused coffee experience you adored in Miami, and unwind with a visit to the Peninsula Hot Springs for a rejuvenating spa session reminiscent of your serene retreat at The Standard Spa. These activities align with your appreciation for specialty coffee and holistic wellness experiences, ensuring a delightful and rejuvenating trip.
```
This answer is good, but suppose you want the recommendation in a different style. You can control the format, tone, and personality of the response by modifying the prompt, AI model, or context. For instance, try changing the model to `claude-3-5-sonnet`. Delete the response and click Run again.
```text
Based on their visit to Panther Coffee and The Standard Spa, I'd recommend this itinerary for Copenhagen, Denmark:
"Start your morning with a carefully crafted coffee and house-made pastry at Coffee Collective in Nørrebro, where their approach to specialty coffee and community mirrors Panther Coffee's dedication to craft and sourcing. Later, unwind at CopenHot, a Nordic wellness experience featuring harbor-view saunas and hot tubs that offers a uniquely Scandinavian take on the communal wellness atmosphere you enjoyed at The Standard Spa."
I made these recommendations because the user has shown interest in both high-quality coffee culture and communal wellness experiences. Coffee Collective shares Panther's philosophy of ethical sourcing and expert roasting, while CopenHot provides the same focus on communal wellness as The Standard Spa but with a distinctive Nordic twist that would offer a novel yet familiar experience.
```
See how the API response changed? Claude 3.5 Sonnet has a different personality from `gpt-4o`. We recommend testing to see which combination of prompts, models, and contexts works best for your application.
## Next steps
Now that you've made your first Crosshatch request, it's time to explore what's possible:
- [Use Case Guides](/guides/personalization-library): End-to-end guides for common personalization use cases.
- [Personalization library](/guides/personalization-library): Explore example prompts for inspiration.
- [Authenticating users](/link/overview): Learn how to add the Crosshatch Link to your site.
---
title: Intro to Crosshatch
order: 5
---
Crosshatch is a secure personalized inference provider, allowing users to "bring their personal AI" to your application.
While Crosshatch enables users to bring their complete context to your app in a few taps, it's also a privacy-preserving layer: users can easily control how, when, and with whom their data is shared through fine-grained permissions. Crosshatch is model-agnostic, allowing you to choose the hosted AI model that best fits your application's needs.
This guide introduces Crosshatch's core capabilities and walks you through the process of creating your first adaptive experience powered by complete context.
## What you can do with Crosshatch
Crosshatch is designed to empower applications to deliver effortless personalized experiences that save users time. Here's a quick overview of what you can do with Crosshatch:
- Hyper-personalized search: Instead of requiring users to fully
specify their search intent, Crosshatch enables you to use their context to
adapt search results to their preferences and past behaviors.
- Personalized recommendations: Deliver personalized recommendations to
users based on their context, improving the overall user experience.
- Conversational AI: Build conversational AI experiences that feel
natural and intuitive, tailored to each user's preferences.
- Generative UI: Generate dynamic UI elements based on user context,
enhancing the user experience.
- Personalized rankings: Deliver personalized rankings to users based on
their context, improving the overall user experience.
- Adaptive notifications: Send notifications adapted to user's changing
context, creating experiences that actually anticipate their needs.
All of these activations of a user's context can be accomplished with a single API call, using private AI to transform selected context into a personalized response tailored to the user's needs and expectations.
## Model options
Personalized inference needs vary. That's why Crosshatch separates inference from the context layer, allowing you to request personalized inference from a variety of models hosted in our virtual private cloud. Applications access the context layer through the model layer, using AI to transform selected context into a personalized response tailored to the user's and application's needs.
You can reference each model by the identifier listed below.
- OpenAI `gpt-4o`: fast, intelligent, flexible
- OpenAI `gpt-4o-mini`: fast and cost-effective
- Anthropic `claude-3-5-sonnet`: most intelligent with large context
window
- Anthropic `claude-3-haiku`: near instant responses with large context
window
- Google `gemini-1.5-flash-001`: affordable, fast, with largest context
- Google `gemini-1.5-pro-001`: most intelligent with largest context
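For convenience, here are those identifiers as constants you can switch between per use case; how an identifier is actually passed in a request (for example, a `model` field) is an assumption to confirm against the [API Reference](/apis/ai).
```typescript
// Model identifiers as listed above. How an identifier is passed in a request
// (for example, a "model" field on the body) is an assumption to verify in
// the API Reference.
const CROSSHATCH_MODELS = [
  "gpt-4o",
  "gpt-4o-mini",
  "claude-3-5-sonnet",
  "claude-3-haiku",
  "gemini-1.5-flash-001",
  "gemini-1.5-pro-001",
] as const;

type CrosshatchModel = (typeof CROSSHATCH_MODELS)[number];

// Example: prefer a cheaper, faster model for high-volume flows.
const notificationModel: CrosshatchModel = "gpt-4o-mini";
```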
We continue to add new models to our platform. If you'd like to see a model added to Crosshatch, please [reach out to us](mailto:support@crosshatch.io).
## Implementing Crosshatch
1. Scope your use case: Identify personalization opportunities where user
context could delight or save users time.
2. Try the API: Use our free [developer
platform](https://platform.crosshatch.io) or
[Playground](https://playground.crosshatch.io) to see how our API can be used
to power your use case.
3. Explore the Context Layer: Use the Query API to see how retrieved context
appears in your prompts.
4. Develop your prompts: Iterate on your prompts using your favorite
framework (e.g., Anthropic Workbench) to create polished personalization
prompts.
5. Add the Link: Add the Crosshatch Link to your site to enable users to
link their data to your app.
6. Test your app: See how your application responds to different users
and their contexts.
7. Deploy to production: Once your application runs smoothly, deploy it
to production.
8. Monitor and improve: Monitor performance, particularly how
personalization with complete context impacts metrics like engagement,
conversion, and retention.
## Start building with Crosshatch
When you're ready, start building with Crosshatch:
- Follow the [Quickstart](/overview/api-quickstart) to make your first API call.
- Check out our [API Reference](/apis/ai).
- Explore the [Personalization library](/guides/personalization-library) for example personalization calls.
- Experiment and start building with the [Playground](https://playground.crosshatch.io).
---
title: Overview
order: 2
---
Crosshatch is the missing context layer for personalized AI applications. It provides the internet's fastest onboarding for complete user context, enabling applications to deliver experiences adapted to each user without friction or privacy tradeoffs.
Looking to try our API? Get started with our
[playground](https://playground.crosshatch.io).
🚧 Crosshatch is currently in public beta.
## Get started
If you're new to Crosshatch, start here to learn the essentials and make your first API call:
- [Intro to Crosshatch](/overview/introduction): Explore Crosshatch's capabilities.
- [Quickstart](/overview/api-quickstart): Learn how to make your first API call.
- [Context layer](/context-layer/overview): Learn more about the context users can share.
## Develop with Crosshatch
Crosshatch provides developer tools for building applications privately personalized with complete context and your favorite AI.
- [Developer Console](https://platform.crosshatch.io): Manage your app and test our API with live or synthetic data.
- [API Reference](/apis/ai): Explore how Crosshatch enables private personalized inference.
- [Personalization library](/guides/personalization-library): Explore example prompts for inspiration.
---
title: Why Crosshatch
order: 10
---
# Why Crosshatch
Crosshatch is a personalized inference gateway, with the internet's fastest onboarding for complete user context.
Effective personalization depends on access to comprehensive user context. This document explains common challenges with existing context and personalization solutions and how Crosshatch addresses these limitations.
## The Challenge of Personalization
Today, applications seeking to personalize their experiences face four options:
### Customer Data Platforms (CDPs)
CDPs centralize first-party data, but they often suffer from:
- Poor or limited data quality. Incomplete or outdated information.
- High setup costs. Expensive and time-consuming implementation
- Activation challenges. Difficult and expensive to operationalize data
for real-time personalization.
- Incomplete customer view. Context limited to activities directly
observed.
### Data Management Platforms (DMPs)
DMPs enable access to consumer data from third party aggregators, but they often suffer from:
- Poor data quality. Incomplete or outdated information updated
infrequently
- Privacy risks. Data access via terms of service grey areas
- Data unreliability. DMPs stitch together data on inconsistent identity
data
- High acquisition costs. Annual contracts are expensive
### Onboarding Flows
Traditional onboarding questionnaires try to gather user preferences, but they:
- Suffer from low conversion. Users abandon lengthy forms
- Capture limited signals. Only gather what users explicitly share
- Quickly become outdated. Static data that doesn't evolve with user
behavior
- Create repetitive work. Users must complete similar forms across many
apps
### Custom Integrations
Applications may build their own direct integrations to data sources, but this approach can be problematic:
- Unique APIs require approval. Each integration requires separate
implementation, approval processes, and ongoing maintenance.
- Coarse permission granularity. Users have limited control over what
specific data is shared.
- Direct data access. Applications directly access sensitive user
information, creating potential privacy and security concerns.

These approaches lead to:
- Fragmented user experiences across applications
- Generic AI responses due to limited context
- Incomplete context for personalization
- High development costs
- Privacy and security concerns
## Crosshatch Architecture
Crosshatch consists of three core components:
1. A **link system** that enables users to easily connect and manage their data.
2. A **context layer** that unifies user data from multiple sources with a streamlined query interface optimized for AI reference.
3. A **personalized inference API** that processes user context through popular AI models hosted in a secure environment.
These components work together to enable privacy-preserving, context-aware AI applications:
- Standardized integrations. Connect to multiple data sources without
lengthy approval processes or integrations.
- Privacy-preserving sampling. Get AI completions of linked context that
adapt to users while respecting privacy boundaries.
- Granular permissions. Enable users to control exactly what data is
shared with each application in a few taps.
- AI-optimized interface. Access personalized inference through a
consistent API and streamlined data representation.

This approach enables:
- One-click user onboarding with complete context
- Comprehensive user profiles
- Simple integration for developers
- User-controlled data sharing
- Truly personalized AI experiences
## Core Design Principles
Crosshatch was built with these principles in mind:
- Frictionless onboarding - Users can bring their complete context to
your app in seconds, eliminating repetitive setup processes and allowing for
immediate personalization
- Comprehensive context - Access the rich user context from multiple
sources, allowing AI to generate truly personalized responses based on
complete context
- Privacy by design - Built with user control at its core, Crosshatch
enables fine-grained data sharing permissions that build trust with your users
- Developer simplicity - Integrate once to access multiple data sources,
with a unified API that works with leading AI models, reducing implementation
complexity
- Future-proof architecture - We add new data sources and AI models to
our platform, expanding the contextual capabilities available to your
application without additional integration work
## Next Steps
To begin implementing Crosshatch in your application:
- [Explore the overview](/overview/overview) for additional architecture details
- Follow the [quickstart guide](/overview/api-quickstart) to make your first API call
- Use the [playground](https://playground.crosshatch.io) to experiment with the API
# Site
Crosshatch
The internet's first personalized AI gateway
Website: crosshatch.io
Documentation: docs.crosshatch.io
Overview
Crosshatch lets users share their preferences and history in a tap—enabling businesses to deliver personalized experiences with complete context. Go beyond clicks from a CDP and ship adaptive experiences that tap into your users' complete context.
Key Features
Instant Onboarding
Offer complete customization without a lengthy onboarding flow.
Sitewide Customization
Turn every page into a personalized experience. Show products and content that match the user's context.
Real-time Experiences
Continuously evolve and create meaningful personalization for users by listening for changes in user context.
Permission Settings and Security
Users can manage what content and information they wish to share, whether it's their recent purchases through Gmail or their recently liked videos on YouTube, enabling intuitive consent and enterprise-grade data security.
AI with Context
The Crosshatch API enables businesses to privately integrate data shared by the user. By dynamically hydrating prompts with customizable context at runtime using top AI models, businesses can build adaptive, privacy-compliant AI experiences.
Webhooks
Listen for changes in a user's context across apps, so that you can adapt your product experience to your users in real time. This allows you to take proactive AI-powered actions at exactly the right time, such as offering a new restaurant recommendation if a user's schedule opens up.
For Businesses
The Complete Toolkit for Hyper-Personalization
Crosshatch provides the fastest way to onboard users and their data from across apps, enabling personalized experiences with complete user context.
Leverage user data without the infrastructure:
Embeddable Consumer Link — Enable data sharing with 4 lines of code.
Privacy-Compliant Pipelines — Protect user privacy right from the start.
One API, Many Integrations — Tap into a range of data sources without engineering lift.
What Your Business Gets
Conversion-optimized user onboarding flow
Data integrations for spending, travel and more
Managed privacy and access control
For Consumers
The Button That Turns Everything More You
Crosshatch lets you share (and un-share) your data with the apps you choose, turning on hyper-personalized experiences.
Above and Beyond Personalization
Unlock experiences beyond your wildest internet dreams. Think personal travel agents, products filtered in your size and budget, and more. When you safely let brands know who you are, they can serve you on a whole new level.
Privacy-First Personalization
Crosshatch safeguards all private personal information, ensuring apps only access what you choose, and never share it. You can un-share access any time.
Use Cases
Travel — Get one-of-a-kind trip plans based on your preferences
Real Estate — Find the perfect home with personalized recommendations
Shopping — Enjoy crazy-customized shopping experiences
Pricing
Free Tier
Start building with Crosshatch
$15 credit towards AI model usage fees
Access to all data sources: Plaid, Fitbit, Gmail, Calendar, Strava
Growth
Personalize your user experience
AI usage starting at $0.25/1M tokens
Access to most data sources: Fitbit, Gmail, Calendar, Strava
Enterprise
Unlimited access to context
AI usage starting at $0.25/1M tokens
Access to all data sources: Fitbit, Gmail, Calendar, Strava, Plaid
AI Model Usage Fees
| Model | Input Tokens ($/1M) | Output Tokens ($/1M) |
| --- | --- | --- |
| GPT-4o | $7.50 | $25 |
| GPT-4o Mini | $0.25 | $1 |
| Claude 3.5 Sonnet | $5 | $25 |
| Claude 3 Haiku | $0.50 | $2 |
| Gemini 1.5 Flash | $0.25 | $1 |
| Gemini 1.5 Pro | $7.50 | $20 |
| Gemini 2.5 Flash (soon) | $0.50 | $2 |
About Crosshatch
The Problem
Apps don't have the personal context needed to fill in the true picture of who we are, and consumers deserve to know how their data is being used before, and if, they share.
The Solution
Crosshatch created a secure, radically transparent way for consumers to share (and un-share) their context with their favorite apps, enabling brands to build experiences that really answer their needs. The personal data wallet represents an enlightenment of data sharing, removing the mystery and replacing it with a new kind of empowerment and possibility.
Security & Privacy
Data is encrypted at rest and in transit
Crosshatch is in the process of SOC2 Type 2 certification
Crosshatch logs how your apps use your connected data
You decide what to share and un-share
Leadership Team
| Name | Role | Background |
| --- | --- | --- |
| Soren | Co-founder | Harvard-trained computational scientist with over a decade in applied ML; has shaped data innovations in ad tech, finance, and e-commerce |
| Jesse | Co-founder | U Michigan engineer and data scientist; previously built enterprise software at Thomson Reuters and OpenStore |
| Ben | Co-founder | Seasoned entrepreneur and Fulbright Scholar; founded Neyborly, an omnichannel marketplace for emerging retail entrepreneurs |
| Neil | Co-founder | UCLA-trained biochemist who bridges design and tech; co-founded a full-service brand agency and shaped strategy in early-stage life sciences ventures at Tachyon |
Contact
Careers: careers@crosshatch.io
Demo: Book a demo