A cybersecurity incident at analytics provider Mixpanel, announced hours before the US Thanksgiving weekend, could set a new standard for how not to announce a data breach.
To recap: In a bare-bones blog post last Wednesday, Mixpanel CEO Jen Taylor said the company detected an unspecified security incident on November 8th that affected some of its customers. She did not say how customers were affected or how many, only that Mixpanel had taken various security measures to “root out unauthorized access.”
Taylor did not respond to multiple emails from TechCrunch containing more than a dozen questions about the company's data breach. Among them, we asked whether the company received any communications from the hackers, such as demands for money, and whether Mixpanel employee accounts are protected with multi-factor authentication.
One of the affected customers, OpenAI, published its own blog post two days later confirming that customer data was obtained from Mixpanel's systems, which Mixpanel did not explicitly say in its own post.
OpenAI said it was affected by the breach because it relied on software provided by Mixpanel to understand how OpenAI users interact with certain parts of its website, such as its developer documentation.
OpenAI users affected by the Mixpanel breach are likely developers whose own apps and websites rely on OpenAI products to work. OpenAI said the stolen data included user-provided names, email addresses, approximate location based on IP address (such as city or state), and identifying device data such as operating system and browser version. Some of this information is the same type of data that Mixpanel collects from people's devices as they use apps and browse websites.
OpenAI spokesperson Nico Felix told TechCrunch that the compromised data obtained from Mixpanel “did not include any identifiers such as Android Advertising ID or Apple IDFA,” which could have made it easier to personally identify specific OpenAI users or to combine OpenAI activity with usage from other apps or websites.
OpenAI said in its blog post that the incident does not directly affect ChatGPT users and that it has terminated its use of Mixpanel as a result of the breach.
Although details of the breach are still limited, the incident has drawn new attention to the data analytics industry, which profits from collecting vast amounts of information about how people use websites and apps.
How Mixpanel tracks taps and clicks, and watches your screen
Mixpanel is one of the largest web and mobile analytics companies that you may have never heard of unless you work in app development or marketing. According to its website, Mixpanel has 8,000 enterprise customers, now down by one due to OpenAI's early exit.
Each Mixpanel customer potentially has millions of users of its own, so the number of ordinary people whose data was exposed in the breach could be huge. The type of data compromised may vary by Mixpanel customer, depending on how each customer configured its data collection and how much user data it collected.
Companies like Mixpanel are part of a burgeoning industry that provides tracking technology that allows businesses to understand how their customers and users interact with their apps and websites. As a result, analytics companies can collect and store vast amounts of information, including billions of data points, about average consumers.
For example, app makers and website developers can embed code from analytics companies like Mixpanel within their apps and websites to gain that visibility. For app users and website visitors, it's like having someone watching over your shoulder as you browse a website or use an app, while every click, tap, swipe, and link press is continually shared with the company that developed the app or website.
In Mixpanel's case, it is easy to see what kind of data the company collects from apps and websites that embed its code. TechCrunch used network analysis tools like Burp Suite to inspect the traffic flowing in and out of several apps that incorporate Mixpanel code, including Imgur, Lingvano, Neon, and Park Mobile. In various tests, we found that Mixpanel uploads varying degrees of information about a user's device and in-app activity while the app is in use.
This data can include a user's actions, such as opening the app, tapping a link, swiping through pages, or signing in with a username and password. This event log is then attached to information about the user and their device, including the device type (such as iPhone or Android), the screen width and height, whether the user is on a cellular network or Wi-Fi, the user's mobile network carrier, the logged-in user's unique identifier for that service (which can be linked to the app's user), and the exact timestamp of the event.
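To make that description concrete, here is a minimal sketch of what one such event record might look like. It assumes nothing about Mixpanel's actual format: the `DeviceInfo` class, the `build_event` function, and every field name and value are hypothetical, chosen only to mirror the categories of data listed above.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class DeviceInfo:
    """Hypothetical device details attached to every event."""
    device_type: str      # e.g. "iPhone" or "Android"
    screen_width: int
    screen_height: int
    on_wifi: bool         # False would mean a cellular network
    carrier: str


def build_event(name, user_id, device, timestamp=None):
    """Bundle one user action with the user and device details sent alongside it."""
    return {
        "event": name,
        "user_id": user_id,
        "timestamp": timestamp if timestamp is not None else time.time(),
        "device": asdict(device),
    }


# Illustrative values only; none of these come from real traffic.
device = DeviceInfo("iPhone", 390, 844, True, "ExampleCell")
event = build_event("link_tapped", "user-8f3a", device, timestamp=1700000000.0)
print(json.dumps(event, indent=2))
```

Even in this toy form, a single tap carries enough context to say who did what, on which device, over which network, and exactly when.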
The data collected may contain information that should be off-limits. Mixpanel admitted in 2018 that its analytics code inadvertently collected users' passwords.
The data collected by analytics companies is meant to be pseudonymized, which means it is essentially scrambled so that it does not include personally identifying details such as an individual's name. Instead, the information collected is associated with a seemingly random unique identifier that is used in place of an individual's name, ostensibly a more private way of storing data. However, pseudonymization can be reversed, and the data used to identify people in the real world. Data collected about a person's device can also be used to uniquely identify that device, a technique known as “fingerprinting,” and to track that user's activities across different apps and the internet.
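The two weaknesses described above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how Mixpanel or any other company generates its identifiers: it assumes a pseudonym derived from an unsalted hash of an email address, and a fingerprint built by hashing a handful of device attributes. The function names and sample values are hypothetical.

```python
import hashlib


def pseudonymize(email: str) -> str:
    """Replace an email with a stable, random-looking ID (assumed: unsalted SHA-256)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()[:16]


def reidentify(pseudonym, candidate_emails):
    """Reverse a pseudonym by hashing known emails and looking for a match."""
    for email in candidate_emails:
        if pseudonymize(email) == pseudonym:
            return email
    return None


def fingerprint(device: dict) -> str:
    """Combine device attributes into one identifier for that device."""
    canonical = "|".join(f"{key}={device[key]}" for key in sorted(device))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Anyone holding a list of real emails can test them against a stored pseudonym.
stored_id = pseudonymize("alice@example.com")
match = reidentify(stored_id, ["bob@example.com", "alice@example.com"])

# The same device reports the same attributes to two unrelated apps,
# so both apps compute the same fingerprint.
seen_by_app_a = {"os": "iOS 17.1", "screen": "390x844", "carrier": "ExampleCell"}
seen_by_app_b = {"carrier": "ExampleCell", "os": "iOS 17.1", "screen": "390x844"}
same_device = fingerprint(seen_by_app_a) == fingerprint(seen_by_app_b)
```

The identifier looks random, but it is stable and derived from real details, which is what makes matching it back to a person or device possible.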
By tracking users' on-device behavior across different apps, analytics companies make it easy for customers to build profiles of users and their activities.
Mixpanel also allows customers to collect “session replays,” which visually reconstruct how a customer's users interact with its apps and websites so that developers can identify bugs and issues. Session replays are meant to exclude personally identifiable and sensitive information, such as passwords and credit card numbers, from collected user sessions, but this process is not perfect either.
As Mixpanel itself admits, session replays may contain sensitive information that should not have been logged but was collected in error. Apple cracked down on apps that used screen recording code after TechCrunch exposed the practice in 2019.
To say that Mixpanel has questions to answer about its breach is probably an understatement. Without knowing the specific types of data involved, it is not clear how large the breach is or how many people are affected. Mixpanel itself may not yet know.
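One way to see why this kind of masking can miss things is a toy example. The sketch below assumes a replay tool that hides values based only on a field's declared type, which is an illustrative assumption and not a description of Mixpanel's implementation. The `mask_fields` function and the field names are hypothetical.

```python
SENSITIVE_TYPES = {"password", "credit_card"}


def mask_fields(fields):
    """Return a copy of captured fields with values of sensitive types blanked out."""
    masked = []
    for field in fields:
        value = "****" if field["type"] in SENSITIVE_TYPES else field["value"]
        masked.append({"name": field["name"], "type": field["type"], "value": value})
    return masked


captured = [
    {"name": "password", "type": "password", "value": "hunter2"},
    # The same password typed into a field the developer declared as plain text.
    {"name": "confirm", "type": "text", "value": "hunter2"},
]
replay = mask_fields(captured)
for field in replay:
    print(field["name"], field["value"])
```

The first field is masked, but the second slips through because the rule depends on the developer labeling every sensitive field correctly.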
What is clear is that companies like Mixpanel store vast amounts of information about people and how they use their apps, making them an obvious focus for malicious hackers.
Do you know more about the Mixpanel data breach? Do you work for Mixpanel or a company affected by the breach? We'd love to hear from you. To contact this reporter securely, reach out on Signal at username zackwhittaker.1337.

