Everything you need to know about canvas fingerprinting
June 13, 2017 | FINGERPRINTS
There are many techniques out there that can be used to track and identify you through your browser. One of the most creative and effective ways websites can do this is through canvas fingerprinting. This tracking technique allows websites to identify users by examining how their computers perform the task of drawing an image.
We frequently receive messages from privacy-oriented individuals that suggest we create a list of canvas fingerprinting hashes. This misconception stems from the fact that online tests represent an individual canvas fingerprint as a hash. However, creating a full database of these hashes is not possible due to the very nature of the fingerprinting process itself.
You need to consider several different factors to truly understand how canvas fingerprinting works. In this article, we will provide an in-depth explanation about canvas fingerprinting, how it works, and why the options available to combat this tracking technique are not ideal.
Understanding hashing functions
Before going into canvas fingerprinting, you have to understand the concept of hashing functions. Hashing functions take a chunk of data, such as a piece of text, an image, or an audio clip, and reduce it to a short, fixed-size value that is, for practical purposes, unique to that input. These values are known as hashes.
There are numerous hashing functions available, but they all have one thing in common: they reduce the amount of data so that it can be compared quickly and easily, although they have other uses as well.
One of the reasons why hashing functions are used in canvas fingerprinting is that they always produce the same result if the input remains exactly the same. For instance, if you run the word “bizarre” through the SHA-256 hashing function, the resulting hash will always be the same 64-character hexadecimal string.
Now, to the interesting part. If we run the term “bizarre ”, which consists of the same word with a space after it, through the same function, the resulting hash is completely different.
In other words, if you run two elements that are identical to the human eye through a hashing function, they may still produce different results. This is due to small variations and differences in the input that are not noticeable to the human eye. Remember this, as it plays a big role in canvas fingerprinting!
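The behavior described above can be checked with a few lines of Python. This is a minimal sketch that uses the standard hashlib module; the helper name sha256_hex is ours and is only there for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` (UTF-8 encoded) as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

word = sha256_hex("bizarre")
word_with_space = sha256_hex("bizarre ")

print(word)
print(word_with_space)
print(word == sha256_hex("bizarre"))  # same input, same hash
print(word == word_with_space)         # one extra space, different hash
```

Running it prints two 64-character strings that have nothing visibly in common, even though the inputs differ by a single space.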
Another characteristic you need to understand is that hashing functions are non-reversible. In other words, you can turn any piece of data into a hash, but you will not be able to reverse it back into the original input. For example, using the MD5 hashing function you can turn the word “dog” into 06d80eb0c50b49a509b49f2424e8c805 but you will not be able to perform the same procedure backward.
In addition, if you apply a hash function to different chunks of information, it’s impossible to tell how similar or how different the original inputs were. Unfortunately, this characteristic does not affect the efficacy of canvas fingerprinting.
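To illustrate the last two points, here is a small Python sketch. It hashes “dog” with MD5 and then counts how many bits differ between the hashes of a few words. The words “dogs” and “cat” and the helper names are our own choices for the example.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` (UTF-8 encoded) as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

dog = md5_hex("dog")
dogs = md5_hex("dogs")
cat = md5_hex("cat")

print(dog)
print(differing_bits(dog, dogs), "of 128 bits differ between dog and dogs")
print(differing_bits(dog, cat), "of 128 bits differ between dog and cat")
```

A near-identical input and an unrelated one should both land far from the original hash, so the bit counts tell you nothing about how similar the inputs were.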
How do websites read canvas fingerprints?
Canvas fingerprinting begins when a website gives your browser the task of drawing a canvas object. Keep in mind that the canvas object is not your canvas fingerprint; it is simply a tool that sites use to create simple and complex graphics alike.
The main thing you have to always keep in mind is that different computers will draw the image in a slightly different way. Even if the images produced look the same to the human eye, there are slight variations that allow them to be differentiated.
So, remember the nature of hash functions? Two pieces of information that look the same to the human eye, but have slight variations, will result in completely different hashes. Even the smallest, most minuscule differences will be enough to produce drastically different results.
But, why do different computers generate different images when they are given the same instructions?
Drawing images with mathematical formulas
When programmers draw images inside a canvas object the process is not the same as drawing an image in MS Paint. The image drawn is a result of a script that follows a mathematical formula. Take a minute to remember your high school days and think how you could draw a circle using formulas.
First off, you would need two coordinates (X and Y) to establish the center point of the circle. Then, you would need a radius (R) expressed in pixels. Once you have both, your computer draws the circle on your screen by filling every pixel located at distance R from the center. Easy, right?
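As a rough illustration, the circle recipe above could be written like this in Python. It is a simplified sketch and not how a browser draws on a canvas: the grid is made of text characters, and the half-pixel tolerance is our own assumption for deciding which pixels count as being “at distance R”.

```python
import math

def draw_circle(size: int, cx: float, cy: float, r: float) -> list[str]:
    """Rasterize a circle outline onto a size x size grid of characters."""
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            distance = math.hypot(x - cx, y - cy)
            # Fill the pixel when it sits within half a pixel of the radius R.
            row += "#" if abs(distance - r) < 0.5 else "."
        rows.append(row)
    return rows

circle = draw_circle(size=15, cx=7, cy=7, r=5)
print("\n".join(circle))
```

Every pixel here is either fully on or fully off, which is why the printed circle looks jagged.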
What makes a canvas fingerprint unique?
Image: what system fonts looked like on old computer systems.
In the stone age of the computer era, all machines would draw the exact same image when given the same instructions. But, with the development of high-resolution screens, hardware and software developers came up with filters that improve the final appearance of these images.
These filters are applied when the formulas are transformed into pixel images, and they result in images that are sharper, crisper, and simply put, look better. The most notable filter is anti-aliasing, but there are also other specific ones, like hints, which are utilized when drawing fonts.
Likewise, all fonts contain glyphs, each of which can be described as a set of paths or closed curves specified by mathematical formulas. For instance, the glyph for a lowercase “i” has two such paths, one for the dot and one for the body. These paths, which are also known as outlines, are then filled with pixels to create the final letter form.
Glyphs can also behave differently because they sometimes depend on other components that surround and influence them. A glyph may contain references to other paths that combine to make a compound glyph, for example, an “é.” In this compound glyph, both the “e” and the accent mark have placement and optional transformation data that are associated with them.
Image: font hinting principle scheme
Besides the basic mathematical data that defines the outline of each particular glyph, fonts can also store additional “hints.” Hints are basically instructions that are executed when the glyphs are drawn on your screen. These instructions move some of the points which define the shape of the letters, in order to make sure they are positioned correctly in relation to the grid where the glyph is displayed. This way, the font will look the same regardless of what screen it’s displayed on.
Image: how anti-aliasing works
Anti-aliasing may be the most common filter used today, and it consists of using gray pixels to soften the edges of each glyph. If you zoom into a page, you’ll notice that the edges of curved letters are not perfect, but rather jagged. The anti-aliasing filter smooths out these jagged edges, which works because our eyes average out the difference in tonality.
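One common way to produce those gray edge pixels is supersampling: each pixel is tested at several points, and its tone reflects how much of it the shape covers. The Python sketch below applies this to a black disc on a white background. It is a simplified model under our own assumptions (function names, grid size, and sample count are illustrative), and real renderers use their own, often more sophisticated, methods.

```python
import math

def coverage(px: int, py: int, cx: float, cy: float, r: float, samples: int) -> float:
    """Estimate the fraction of one pixel covered by a filled disc."""
    inside = 0
    for i in range(samples):
        for j in range(samples):
            # Sample point inside the pixel square [px, px+1) x [py, py+1).
            sx = px + (i + 0.5) / samples
            sy = py + (j + 0.5) / samples
            if math.hypot(sx - cx, sy - cy) <= r:
                inside += 1
    return inside / (samples * samples)

def render_disc(size: int, cx: float, cy: float, r: float, samples: int) -> list[list[int]]:
    """Render a black disc on white as gray values (0 = black, 255 = white)."""
    return [
        [round(255 * (1 - coverage(x, y, cx, cy, r, samples))) for x in range(size)]
        for y in range(size)
    ]

image = render_disc(size=12, cx=6.0, cy=6.0, r=4.3, samples=4)
for row in image:
    print(" ".join(f"{value:3d}" for value in row))
```

The printed grid shows 0 inside the disc, 255 outside, and intermediate gray values along the edge.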
What makes each canvas fingerprint unique is not the final image that we see, but how each computer renders hinting and anti-aliasing. Different computers carry out each process differently, and this fact allows for effective fingerprinting.
When two computers are given the same drawing task, the borderline pixels come out in slightly different tones, among other distinctions. These minuscule discrepancies result in images that look the same to us, but not to websites. Note that scientific studies indicate that computer hardware, drivers, and browser versions can all affect the resulting glyphs. Also, in our own research, we’ve noticed that computers with the same graphics processing units (GPUs) will likely produce the same results.
Comparing canvas fingerprints through hashes
Now that we’ve covered hashing functions and what makes canvas fingerprints unique, it’s time to tie them together. Websites that use canvas fingerprinting give all visitors instructions on drawing specific images. But, sending back the rendered image to a website would be impractical. Instead, websites use a hashing function to reduce the size of the data without losing its uniqueness.
As we mentioned before, two pieces of data that look the same to the human eye can feature differences that result in different hashes, like “bizarre” and “bizarre ”. The principle is the same for canvas fingerprints. To the human eye, the images displayed all look the same, but minor differences and discrepancies that result from using different machines can be identified, hashed, and sent over to websites through simple yet unique strings of information.
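The whole chain can be sketched in Python. In the example below, two hypothetical machines receive the same drawing instructions but anti-alias the edges slightly differently, which we model with a different number of samples per pixel. That modeling choice, the function names, and the use of SHA-256 are all assumptions made for illustration; real sites choose their own drawing instructions and hashing functions.

```python
import hashlib
import math

def render_disc(size, cx, cy, r, samples):
    """Anti-aliased disc as a flat list of 0-255 gray values (supersampled coverage)."""
    pixels = []
    for y in range(size):
        for x in range(size):
            inside = sum(
                math.hypot(x + (i + 0.5) / samples - cx, y + (j + 0.5) / samples - cy) <= r
                for i in range(samples)
                for j in range(samples)
            )
            pixels.append(round(255 * (1 - inside / samples**2)))
    return pixels

def fingerprint(pixels):
    """Hash the raw pixel bytes so only a short string has to be sent back."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

# Same drawing instructions, two hypothetical machines with different anti-aliasing.
machine_a = render_disc(size=12, cx=6.0, cy=6.0, r=4.3, samples=4)
machine_b = render_disc(size=12, cx=6.0, cy=6.0, r=4.3, samples=5)

changed = sum(a != b for a, b in zip(machine_a, machine_b))
print(f"{changed} of {len(machine_a)} pixels differ")
print(fingerprint(machine_a))
print(fingerprint(machine_b))
```

Only some edge pixels change tone between the two renders, yet the two hashes are entirely different, while rendering again on the same machine reproduces the same hash.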
In a nutshell, canvas fingerprinting happens when a website determines how your computer processes graphical instructions. This is why coming up with a database of canvas hashes is virtually impossible. Websites can send a specific set of instructions to your computer, and can use one of several different hashing functions to simplify the data sent back to them. Moreover, sites can change the instructions and/or hashing functions at any point in time to fingerprint visitors in a completely different way.
Mitigation mechanisms and why they don’t work well
Because of the nature of canvas fingerprinting techniques, it’s really difficult to come up with an effective way to combat it. The first solution developed by the online privacy community was to disable the canvas function. This might have been a feasible solution if it had been adopted by a large enough group. However, at the time of writing this article, we estimate that only 20,000 users around the world use add-ons that block canvas fingerprints, and the mere fact of using these add-ons can be used as a fingerprint on its own.
As we mentioned before, computers that have the same GPU will likely have the same canvas output. Video adapters that have identical GPUs can be sold by the thousands or millions even. Putting yourself in a group that consists of millions, rather than the 20,000 using canvas blockers, would deliver more effective results.
Here at Multilogin, we developed a more robust solution known as Canvas Defender. The idea behind it was to add arbitrary noise to the canvas output by randomly changing the tones of some of the pixels in the image. Admittedly, this solution allowed sites to track users more precisely, but it offered two major benefits that worked in specific scenarios.
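The general idea of noise injection can be sketched as follows. This is not the Canvas Defender implementation; it is a hypothetical Python illustration in which the seed, the fraction of pixels changed, and the synthetic pixel data are all assumptions.

```python
import hashlib
import random

def fingerprint(pixels: list[int]) -> str:
    """Hash the pixel values the way a fingerprinting script might."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def add_noise(pixels: list[int], seed: int, fraction: float = 0.05) -> list[int]:
    """Shift the tone of a random subset of pixels by one step, staying within 0-255."""
    rng = random.Random(seed)  # a fixed seed keeps the noise stable for one profile
    noisy = list(pixels)
    count = max(1, int(len(noisy) * fraction))
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] += 1 if noisy[index] < 255 else -1
    return noisy

# Hypothetical canvas output: a flat list of gray values standing in for real pixel data.
canvas = [(x * 7 + y * 13) % 256 for y in range(16) for x in range(16)]

profile_1 = add_noise(canvas, seed=1)
profile_2 = add_noise(canvas, seed=2)

print(fingerprint(canvas))
print(fingerprint(profile_1))
print(fingerprint(profile_2))
```

Each seed yields a stable fingerprint of its own, and changing the seed is what lets a user switch to a new fingerprint on demand. It also shows the drawback: a noised output is unlikely to match the output of any real device.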
The first was the ability to “drop the tail” because you could decide when you wanted to change your canvas fingerprint. The second benefit only applied to users who employed different browser profiles to carry out different tasks. Although every profile could be fingerprinted, tracking a user that only visited one website and carried out one specific task was a pointless exercise.
Unfortunately, Canvas Defender was quickly noticed by major web platforms, and as much as we love protecting online privacy, they love collecting information from us. To prevent users from using Canvas Defender, major websites started filtering out users who had a completely unique canvas fingerprint. This makes sense because there are no users with completely unique GPUs, except for the engineers who are testing new graphics card models in their laboratories.
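We do not know exactly how individual platforms implement this filtering, but the principle can be illustrated with a simple frequency count. In the Python sketch below, the visit log and the short labels standing in for real canvas hashes are invented for the example.

```python
from collections import Counter

def flag_unique(observed_hashes: list[str]) -> set[str]:
    """Return the hashes that were seen exactly once across all observed visits."""
    counts = Counter(observed_hashes)
    return {value for value, seen in counts.items() if seen == 1}

# Hypothetical visit log: short labels stand in for real canvas hashes.
visits = ["gpu-a", "gpu-a", "gpu-b", "gpu-a", "noise-x", "gpu-b", "noise-y"]

print(sorted(flag_unique(visits)))
```

Hashes shared by many visitors blend into the crowd, while a hash that appears only once stands out as a candidate for extra checks.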
Canvas fingerprinting is efficient and poses a huge challenge for online privacy enthusiasts. Blocking canvas objects is only effective if a large group of users adopts this solution, which is unlikely due to the complexity of this subject.
Masking canvas fingerprint data with fake parameters requires additional steps, like setting up different browser profiles. In addition, web platforms can usually detect masked parameters easily. Even if they became easy to implement, web platforms would disrupt users who utilize masked parameters by incorporating additional verification steps and security checks.
That being said, it’s not all gloom and doom when it comes to canvas fingerprinting. Here at Multilogin, we have already developed a theoretical solution that is likely to become the effective answer we have all been waiting for.
For now, we won’t discuss this solution in detail as we are still working on how to implement it and apply it to all scenarios; not to mention we don’t want to give a competitive advantage to those who develop fingerprinting mechanisms.
But, rest assured that we will be rolling out this solution sooner or later as big web platforms have had their turn. Now the ball is in our court, and we are planning on having a big impact.