Everything you need to know about canvas fingerprinting
APRIL 11, 2023
Websites can track and identify you through various methods, including canvas fingerprinting. This technique identifies users by examining how their computers draw images. Some people suggest creating a list of canvas fingerprinting hashes, but this is not feasible due to the nature of the process.
To understand canvas fingerprinting, you need to consider different factors. In this article, we explain how it works and why the options to combat this tracking technique are not ideal.
Understanding hashing functions
Before going into canvas fingerprinting, you must understand the concept of hashing functions. Hashing functions take a chunk of data, such as a piece of text, image, or audio, and reduce it to a fixed-size value that remains, for practical purposes, unique to that input. These values are known as hashes.
There are numerous hashing functions available, but they all have one thing in common: they reduce the amount of data for fast and easy comparison, although they have other uses as well.
One of the reasons why hashing functions are used in canvas fingerprinting is that they will always produce the same result if the input remains exactly the same. For instance, if you run the word “bizarre” through the SHA-256 hashing function, the resulting hash will always be the same 64-character string.
Now, to the interesting part. If we run the term “bizarre ”, which consists of the same word with a space after it, through the same function, the resulting hash is completely different.
In other words, if you run two elements identical to the human eye through a hashing function, they may still produce different results. This is due to small variations and differences in the input that are not noticeable to the human eye. Remember this, as it plays a big role in canvas fingerprinting!
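The comparison above can be reproduced with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper name `sha256_hex` is ours, and the text is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

plain = sha256_hex("bizarre")
spaced = sha256_hex("bizarre ")  # the same word plus one trailing space

print(plain)
print(spaced)
print("same input gives same hash:", plain == sha256_hex("bizarre"))
print("trailing space changes hash:", plain != spaced)
```

Both hashes have the same length, yet the single extra space is enough to make them look unrelated.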
Another characteristic you need to understand is that hashing functions are non-reversible. In other words, you can turn any data into a hash, but you will not be able to reverse it back into the original input. For example, using the MD5 hashing function, you can turn the word “dog” into 06d80eb0c50b49a509b49f2424e8c805, but you will not be able to perform the same procedure backward.
In addition, if you apply a hash function to different chunks of information, it’s impossible to tell how similar or how different the original inputs were. Unfortunately, this characteristic does not affect the efficacy of canvas fingerprinting.
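To see why hashes say nothing about how similar the inputs were, one option is to count how many bits differ between two digests. The sketch below uses MD5 as in the “dog” example; the helper names and the sample strings are our own assumptions for illustration.

```python
import hashlib

def md5_bits(text: str) -> int:
    """MD5 digest of a UTF-8 string, as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions (out of 128) where the two digests differ."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")

# The hex digest of "dog", to compare against the value quoted in the text.
print(hashlib.md5("dog".encode("utf-8")).hexdigest())

near = differing_bits("dog", "dog ")  # inputs differ by a single space
far = differing_bits("dog", "a completely unrelated sentence")

print("near-identical inputs:", near, "of 128 bits differ")
print("unrelated inputs:     ", far, "of 128 bits differ")
```

With a well-behaved hash function, both counts tend to land in the same broad range, so the digest alone does not reveal whether the inputs were almost identical or entirely different.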
How do websites read canvas fingerprints?
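To read a canvas fingerprint, a website instructs your browser to draw an image on a canvas object. The images can be complex, with different elements, colors, and backgrounds, and may vary slightly in appearance on different computers. Because computers generate slightly different images when given the same instructions, and even tiny differences result in drastically different hashes, the hash of the image can tell one computer from another.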
Drawing images with mathematical formulas
When programmers draw images inside a canvas object, the process is not the same as drawing an image in MS Paint. The image drawn is the result of a script that follows a mathematical formula. Take a minute to remember your high school days and think about how you could draw a circle using formulas.
First, you need two coordinates (X and Y) to establish the circle's center point. Then, you need a radius (R) expressed in pixels. Once you have these values, your computer draws the circle on your screen by filling in all the individual pixels located at distance R from the center of the circle. Easy, right?
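The circle recipe above can be sketched in Python as follows. This is a simplified illustration rather than how a browser actually implements canvas drawing: the function name `draw_circle`, the text grid, and the half-pixel tolerance are all our own assumptions.

```python
def draw_circle(cx: int, cy: int, r: int, size: int) -> list[str]:
    """Rasterize a circle outline onto a size x size grid of characters."""
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Distance from this pixel to the center of the circle.
            dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            # Fill the pixel if it lies within half a pixel of distance R.
            row += "#" if abs(dist - r) < 0.5 else "."
        rows.append(row)
    return rows

for line in draw_circle(cx=8, cy=8, r=6, size=17):
    print(line)
```

Every pixel is either filled or empty here, which is why the printed outline looks jagged.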
What makes a canvas fingerprint unique?
In the "Stone Age" of the computer era, all machines would draw the exact same image when given the same instructions. But with the development of high-resolution screens, hardware and software developers came up with filters that improved the final appearance of these images.
These filters are applied when the formulas are transformed into pixel images, resulting in images that are sharper, crisper, and, simply put, better looking. The most notable filter is anti-aliasing, but there are other, more specific ones, like hinting, which is used when drawing fonts.
Likewise, all fonts contain glyphs, which can be described as a set of paths or closed curves specified using a particular mathematical formula. For instance, the glyph for a lowercase “i” is made of two paths, one for the dot and one for the body. These paths, also known as outlines, are then filled with pixels to create the final letter form.
Glyphs can also behave differently because they sometimes depend on other components that surround and influence them. A glyph may contain references to other paths that combine to make a compound glyph, for example, an “é.” In this compound glyph, both the “e” and the accent mark have placement and optional transformation data associated with them.
Besides the basic mathematical data that defines the outline of each particular glyph, fonts can also store additional “hints.” Hints are basically instructions that are executed when the glyphs are drawn on your screen. These instructions move some of the points which define the shape of the letters to make sure they are positioned correctly on the grid where the glyph is displayed. This way, the font will look the same regardless of what screen it’s displayed on.
Anti-aliasing may be the most common filter used today, and it consists of using gray pixels to smudge the edges of each glyph. If you zoom into a page, you’ll notice that the edges of curved letters are not perfect but are rather jagged. The anti-aliasing filter smoothens out these jagged edges because our eyes average out the difference in tonality.
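One common way to approximate anti-aliasing is supersampling: each pixel is tested at several sub-pixel points, and its shade reflects how much of it the shape covers. The sketch below illustrates that idea only and is not a description of what any particular browser or GPU does; the function names, the sample counts, and the character shades are our own assumptions.

```python
def coverage(px: int, py: int, cx: float, cy: float, r: float, n: int) -> float:
    """Fraction of pixel (px, py) covered by a filled disc, using n*n samples."""
    inside = 0
    for i in range(n):
        for j in range(n):
            # Sample point at the center of each sub-pixel cell.
            sx = px + (i + 0.5) / n
            sy = py + (j + 0.5) / n
            if (sx - cx) ** 2 + (sy - cy) ** 2 <= r * r:
                inside += 1
    return inside / (n * n)

def render_disc(size: int, cx: float, cy: float, r: float, n: int = 4) -> list[list[int]]:
    """Render a filled disc; each value is ink darkness from 0 (none) to 255 (full)."""
    return [
        [round(255 * coverage(x, y, cx, cy, r, n)) for x in range(size)]
        for y in range(size)
    ]

SHADES = " .:*#"  # lightest to darkest

def show(image: list[list[int]]) -> None:
    """Print the image using one character per pixel."""
    for row in image:
        print("".join(SHADES[value * (len(SHADES) - 1) // 255] for value in row))

show(render_disc(12, 6.0, 6.0, 4.0, n=1))  # one sample per pixel: hard, jagged edge
show(render_disc(12, 6.0, 6.0, 4.0, n=4))  # sixteen samples per pixel: gray edge pixels
```

Changing the number of samples, or the rule for where they are placed, changes the gray values along the edge, which is exactly the kind of difference that is invisible to the eye but visible in the pixel data.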
What makes each canvas fingerprint unique is not the final image we see but how each computer renders hinting and anti-aliasing. Different computers carry out each process differently, allowing for effective fingerprinting.
When two computers are given the same drawing task, the results differ in small ways, such as slightly different tones in the pixels along the edges. These minuscule discrepancies in the process result in an image that looks the same to us but not to websites. Note that scientific studies indicate that computer hardware, drivers, and browser versions can all affect the resulting glyphs. Also, in our research, we’ve noticed that computers with the same graphics processing units (GPUs) will likely produce the same results.
Comparing canvas fingerprints through hashes
To perform canvas fingerprinting, websites provide instructions for drawing images, but it's not practical to send the rendered image back. Instead, a hashing function is used to reduce data size while maintaining its uniqueness.
Minor differences in the images can result in different hashes. This process enables websites to identify and track visitors through unique strings of information based on the machine they use.
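As a rough illustration of this step, the sketch below hashes two tiny grayscale images that differ in a single edge pixel by one shade. The pixel values are hypothetical, and SHA-256 is our assumption; a website is free to use any hashing function.

```python
import hashlib

def canvas_hash(pixels: list[list[int]]) -> str:
    """Hash a grayscale pixel grid (values 0-255) into a fixed-size string."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()

# Hypothetical output of the same drawing instructions on machine A.
machine_a = [
    [0,   0, 128, 255],
    [0, 128, 255, 255],
    [0, 128, 255, 255],
    [0,   0, 128, 255],
]

# Machine B renders one borderline pixel a single shade lighter.
machine_b = [row[:] for row in machine_a]
machine_b[1][1] = 127

print(canvas_hash(machine_a))
print(canvas_hash(machine_b))
print("hashes match:", canvas_hash(machine_a) == canvas_hash(machine_b))
```

The website only needs to receive and store the short hash, not the image, and the two machines end up with different identifiers even though the images look the same.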
Creating a database of canvas hashes is nearly impossible due to the variability in instructions and hashing functions. Websites can change these at any time to fingerprint visitors in different ways.
Mitigation mechanisms and why they don’t work well
Canvas fingerprinting is hard to combat. Disabling the canvas function was suggested as a solution, but it is not widely adopted. Only 20,000 users worldwide use add-ons that block canvas fingerprints, so the very fact that canvas is blocked could be used as a fingerprint on its own.
Computers with the same GPU will likely have the same canvas output, so adding arbitrary noise to the canvas output was proposed. Here at Multilogin, we developed a solution called Canvas Defender, but major web platforms started filtering out users with completely unique canvas fingerprints.
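The sketch below illustrates why noise makes a fingerprint stand out. It is not how Canvas Defender works, and the pixel data, the noise rule, and the `known_hashes` set are all hypothetical: random noise turns an output shared by many machines into a hash nobody else has.

```python
import hashlib
import random

def canvas_hash(pixels: list[int]) -> str:
    """Hash a flat list of grayscale pixel values (0-255)."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def add_noise(pixels: list[int], rng: random.Random) -> list[int]:
    """Shift each pixel by -1, 0, or +1, clamped to the 0-255 range."""
    return [min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in pixels]

# Hypothetical canvas output shared by every machine with the same GPU.
shared_output = [0, 64, 128, 192, 255] * 20

# Hypothetical set of hashes a website has already seen from many visitors.
known_hashes = {canvas_hash(shared_output)}

rng = random.Random()
for session in range(3):
    noisy_hash = canvas_hash(add_noise(shared_output, rng))
    print(session, noisy_hash[:16], "seen before:", noisy_hash in known_hashes)
```

Each noisy session produces a hash that matches no other visitor, which is the kind of completely unique fingerprint that platforms can choose to filter out.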
Canvas fingerprinting is a big challenge for online privacy, and blocking canvas objects is only effective if a large group of users adopts this solution. Masking canvas fingerprint data with fake parameters requires additional steps, and web platforms can easily detect masked parameters.
However, there is hope: we at Multilogin are developing a solution, but we are not discussing the details yet because we are still working on the implementation.