Glean vs. the universe
In which developer Chris Moore persuasively argues that a recent Glean glitch was caused by… the sun?
2 min read Published: 4 Feb 2022
The incident
One day, while checking logs, I noticed that a user was having trouble uploading some changes they made to an event.
Each audio snippet in an event has some corresponding metadata and one of the fields in this metadata is the “duration” of the audio. However, this user had one bit of metadata with a “deration” field instead, so the server was rejecting it as invalid event JSON.
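To make the rejection concrete, here is a minimal sketch of the kind of check a server might run on snippet metadata. It is not Glean's actual validation code: the function name and the idea that "duration" is the only required field are assumptions made purely for illustration.

```python
# Hypothetical sketch: the real event schema is not shown in this post,
# so "duration" being the only required field is an assumption.
REQUIRED_FIELDS = {"duration"}


def validate_snippet_metadata(metadata):
    """Return a list of problems; an empty list means the metadata is valid."""
    present = set(metadata)
    problems = []
    for field in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing field: {field}")
    for field in sorted(present - REQUIRED_FIELDS):
        problems.append(f"unexpected field: {field}")
    return problems


print(validate_snippet_metadata({"duration": 12.5}))  # []
print(validate_snippet_metadata({"deration": 12.5}))
# ['missing field: duration', 'unexpected field: deration']
```

A strict check like this is exactly why a single misspelt key is enough to block an upload.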
Combing through the history of the codebase, “deration” appeared precisely zero times. So what happened here?
Bit flips
At the lowest level, all data is stored in binary. Each bit represents some power of 2, so a bit changing unexpectedly can result in a discrepancy of that power of 2. This happened in the 2003 election in Schaerbeek, Belgium, where a candidate received 4096 (or 2^12) more votes than was considered possible, and so a bit flip is suspected to have been the culprit.
Another example is when a Super Mario 64 speedrunner was inexplicably teleported upwards during a run. The effect was later replicated by flipping a single bit in the value that stores Mario’s height.
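The "discrepancy of a power of 2" idea can be sketched in a few lines. The helper name flip_bit and the starting tally of 514 are made up for illustration and are not the real Schaerbeek figures.

```python
def flip_bit(value, position):
    """Return value with the bit at position (0 = least significant) inverted."""
    return value ^ (1 << position)


votes = 514  # illustrative tally only, not the real count
corrupted = flip_bit(votes, 12)

print(corrupted - votes)  # 4096, i.e. 2**12
```

Flipping the same bit a second time restores the original value, which is why a single flipped bit always shows up as a difference of exactly one power of 2.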
In our case, a particularly eagle-eyed developer noticed that in Unicode, “e” and “u” are represented in binary as “1100101” and “1110101” respectively. So all it would take is one bit in the user’s local storage to flip from a 1 to a 0 to go from “u” to “e”.
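You can check the eagle-eyed developer's observation yourself. The sketch below compares the two code points and then flips the relevant bit in the word "duration"; the helper name flip_char_bit is just an illustrative choice.

```python
def flip_char_bit(text, index, bit):
    """Return text with one bit inverted in the code point at the given index."""
    chars = list(text)
    chars[index] = chr(ord(chars[index]) ^ (1 << bit))
    return "".join(chars)


u, e = ord("u"), ord("e")
print(f"{u:07b}")      # 1110101
print(f"{e:07b}")      # 1100101
print(f"{u ^ e:07b}")  # 0010000 -> the two letters differ in exactly one bit

print(flip_char_bit("duration", 1, 4))  # deration
```

The XOR of the two code points has a single 1 in it, so one flipped bit really is all that separates "duration" from "deration".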
This is about as far as we can get with cold, hard facts. I cannot prove that a bit definitely flipped in this user’s storage. They could’ve manually gone and messed with their IndexedDB instance for all we know. But let’s assume the bit did flip for the sake of this tech blog!
Any number of software or hardware issues could cause a bit to flip: integer overflow errors, power surges, the list goes on. Considering that at the time of writing this is the only time we’ve seen this happen, we’ll rule out issues in our codebase (how convenient!). As for potential hardware issues, we can’t really say for sure. However, there is one potential cause that is significantly more interesting (to me at least, and hopefully to you too)!
Cosmic rays
Space is full of particles, all whizzing around from stars and galaxies. Cosmic rays in particular are high-energy particles (mostly protons), which come from the sun as well as from sources beyond our solar system. These particles very rarely make it to the surface of the earth. Instead, they collide with particles in the earth’s atmosphere, producing showers of secondary particles, and it’s these particles that are more likely to get up to mischief.
The particles in these cosmic ray showers are still quite energetic, enough to potentially ionise atoms they come into contact with. If this were to happen in a memory chip or a flash storage cell, the small resultant charge could be enough to flip a bit.
So it’s entirely possible (although impossible to prove) that the sun is directly responsible for this Glean user having been unable to upload their event.
Case closed
We fixed the issue by shipping some custom code to clean up the affected user’s IndexedDB instance, and removed it again once it had run in production. Therefore, by ironclad deductive reasoning with only a few assumptions, it can be concluded that we are, in fact, more powerful than the sun.
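For the curious, a cleanup along these lines could look something like the sketch below. This is not the code we shipped: it works on a plain dictionary rather than IndexedDB, compares keys as UTF-8 bytes, and the names single_bit_apart and repair_keys are all assumptions for illustration.

```python
def single_bit_apart(a, b):
    """True if the UTF-8 encodings of a and b differ by exactly one bit."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    if len(x) != len(y):
        return False
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y)) == 1


def repair_keys(record, expected_keys):
    """Rename unknown keys that are one bit flip away from a single expected key."""
    repaired = {}
    for key, value in record.items():
        if key not in expected_keys:
            candidates = [k for k in expected_keys if single_bit_apart(key, k)]
            # Only rename when the fix is unambiguous and would not clobber data.
            if len(candidates) == 1 and candidates[0] not in record:
                key = candidates[0]
        repaired[key] = value
    return repaired


print(repair_keys({"deration": 12.5}, {"duration"}))  # {'duration': 12.5}
```

Keys that are not a single bit flip from anything expected are left alone, so the repair only touches records that look like the cosmic kind of typo.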