
Workwise • supercookie

📚 Please have a look at this elaboration from University of Illinois: www.cs.uic.edu

Introduction

Data is the new gold!

Browsers are the most widespread access medium and make it incredibly easy for us humans to connect to the World Wide Web.
Thanks to the constant development of the Internet, including the continuous elaboration of new standards and features and the introduction of powerful browser-side APIs and interfaces, the possibilities for collecting and analyzing data have expanded significantly over the last few decades!

First and foremost, there is nothing wrong with collecting data at all. All of us collect data, whether unconsciously in private everyday life or completely consciously in school or at work - collecting data, interpreting it and drawing conclusions is actually incredibly important!

With the launch of the WWW for the public and the development of the first online services, data collection also became interesting for website providers, according to the motto "if I own a website, I also want to know who is surfing it".
However, in most cases we as consumers want to disclose as little as possible, and only the data necessary for the intended service - in fact, my private data is no one else's business.

The further development of the WWW's capabilities mentioned above has allowed data to be assigned to individual profiles, making it possible to recognize unique users and trace their browsing activities even across different pages - so-called device fingerprinting.
Some known methods for assigning a unique fingerprint to a browser are hardware benchmarking, fingerprinting via Canvas and WebGL, and analysis of active browser extensions.
This article is about a lesser-known way to achieve something similar!

Background

Modern browsers offer a wide range of features to improve and simplify the user experience.
One of these features is the so-called favicon: a small (usually 16×16 or 32×32 pixels) logo used by web browsers to brand a website in a recognizable way. Most browsers show favicons in the address bar and next to the page's name in a list of bookmarks.

To serve a favicon on their website, a developer has to include a <link> tag with a rel attribute in the webpage's header. If this tag exists, the browser requests the icon from the specified source, and if the server response contains a valid icon file that can be properly rendered, the browser displays this icon. In any other case, a blank favicon is shown.

<link rel="icon" href="/favicon.ico" type="image/x-icon">

Favicons must be very easily accessible to the browser. Therefore, they are cached in a separate local database on the system, called the favicon cache (F-Cache). An F-Cache entry includes the visited URL (subdomain, domain, route, URL parameter), the favicon ID and the time to live (TTL).
While this provides web developers the ability to delineate parts of their website using a wide variety of icons for individual routes and subdomains, it also leads to a possible tracking scenario.

When a user visits a website, the browser checks if a favicon is needed by looking up the source of the shortcut icon link reference of the requested webpage.
The browser initially checks the local F-Cache for an entry containing the URL of the active website. If a favicon entry exists, the icon is loaded from the cache and displayed. However, if there is no entry, for example because no favicon has ever been loaded under this particular domain, or the data in the cache is out of date, the browser makes a GET request to the server to load the site's favicon.
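The lookup described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class FaviconCache, the function needsFaviconRequest, the example URLs and the millisecond TTL are all hypothetical and do not reflect how any particular browser stores its favicon database.

```javascript
// Toy model of a favicon cache (F-Cache). All names are illustrative.
class FaviconCache {
  constructor() {
    // page URL -> { faviconId, expiresAt }
    this.entries = new Map();
  }

  // Remember a successfully delivered favicon for ttlMs milliseconds.
  store(url, faviconId, ttlMs, now = Date.now()) {
    this.entries.set(url, { faviconId, expiresAt: now + ttlMs });
  }

  // Return the cached favicon ID, or null if the entry is missing or expired.
  lookup(url, now = Date.now()) {
    const entry = this.entries.get(url);
    if (!entry || entry.expiresAt <= now) return null;
    return entry.faviconId;
  }
}

// A GET request for the favicon is only needed when the lookup fails.
function needsFaviconRequest(cache, url, now = Date.now()) {
  return cache.lookup(url, now) === null;
}

// Usage: one entry stored at time 0 with a TTL of 1000 ms.
const cache = new FaviconCache();
cache.store("https://example.test/a", "icon-a", 1000, 0);
console.log(needsFaviconRequest(cache, "https://example.test/a", 500));
console.log(needsFaviconRequest(cache, "https://example.test/b", 500));
```

The key point for the rest of the article is the asymmetry: a cached entry produces no network request, while a missing or expired entry does, and the server can observe that difference.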

Threat Model

The article explains a possible threat model that allows a unique identifier to be assigned to each browser in order to draw conclusions about the user and to identify this user even when anti-fingerprinting measures are applied, such as using a VPN, deleting cookies, deleting the browser cache or manipulating the client header information.

A web server can draw conclusions about whether a browser has already loaded a favicon or not:
When the browser requests a web page and the favicon is not in the local F-Cache, it makes another request for the favicon. If the icon already exists in the F-Cache, no further request is sent.
By combining the state of delivered and undelivered favicons for specific URL paths, a unique pattern (identification number) can be assigned to each browser.
When the website is loaded again, the web server can reconstruct the identification number from the network requests the client sends for the missing favicons and thus identify the browser.


  1. Write identification

    The goal of the write operation is to generate a unique identifier and store it on the client side.
    The first step is to create a new N-bit ID on the server and translate it to a path vector as shown below.

    Example:

    const N = 4;
    const ROUTES = ["/a", "/b", "/c", "/d"];
    const ID = generateNewID(); // -> 10 (an unassigned decimal number, here ten: 1010b in binary)
    const vector = generateVectorFromID(ID); // -> ["/a", "/c"] (bits [1, 0, 1, 0] map to [a, b, c, d]; only a and c are 1)

    The second step is to store the actual data inside the browser:
    The user will be redirected along all of the website paths, starting at /a, navigating to /b, to /c and finally to /d.
    • /a
    • /b
    • /c
    • /d

    While the user is being redirected, the browser requests a favicon for the respective route on every load, following the same order: /a/favicon.ico, then /b/favicon.ico, then /c/favicon.ico and finally /d/favicon.ico.
    • /a/favicon.ico
    • /b/favicon.ico
    • /c/favicon.ico
    • /d/favicon.ico

    The webserver will now only process those favicon requests whose path is present in the previously created path vector. If the route is present the webserver answers with the favicon file and Status 200 OK.
    If the requested route is not in the path vector, the webserver aborts the request with an Error 404 Not Found, or sends no response.
    Since the browser - as described earlier - only stores the delivered favicons in the F-Cache, we have now stored our unique identification number and the writing process is complete.

    In the above example, the webserver only responds to requests for the favicons under paths /a/favicon.ico and /c/favicon.ico. The F-Cache therefore only has favicon entries for these two paths.
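The write step can be sketched as follows. This is a minimal sketch, not the project's actual implementation: generateVectorFromID is given a hypothetical body that reproduces the example (10 -> 1010b -> ["/a", "/c"]), and faviconStatusInWriteMode and nextRoute are illustrative helper names that do not appear in the original text.

```javascript
// Route list taken from the example; helper bodies are assumptions.
const ROUTES = ["/a", "/b", "/c", "/d"];

// Map an ID to the routes whose bit is 1, reading the N-bit binary string
// left to right in route order (10 -> "1010" -> ["/a", "/c"]).
function generateVectorFromID(id, routes = ROUTES) {
  const bits = id.toString(2).padStart(routes.length, "0");
  if (bits.length > routes.length) {
    throw new RangeError("ID does not fit into the available routes");
  }
  return routes.filter((_, i) => bits[i] === "1");
}

// Write mode: deliver the icon (200) only for routes in the path vector,
// answer every other favicon request with 404.
function faviconStatusInWriteMode(faviconPath, vector) {
  const route = faviconPath.replace(/\/favicon\.ico$/, "");
  return vector.includes(route) ? 200 : 404;
}

// Redirect chain: each page forwards to the next route; null after the last.
function nextRoute(route, routes = ROUTES) {
  const i = routes.indexOf(route);
  return i >= 0 && i + 1 < routes.length ? routes[i + 1] : null;
}

const vector = generateVectorFromID(10);
const statuses = ROUTES.map((r) =>
  faviconStatusInWriteMode(`${r}/favicon.ico`, vector)
);
console.log(vector, statuses);
```

For ID 10 the server answers 200 for /a/favicon.ico and /c/favicon.ico and 404 for the other two, so only those two icons end up in the F-Cache.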



  2. Read identification

    Here the goal is to re-identify a returning user based on their existing F-Cache entries.

    In read mode the server always responds to favicon requests with an Error 404 Not Found status, but responds normally to all other requests. This preserves the integrity of the cached favicons during the read operation, since no new F-cache entry is created by the browser.
    To reconstruct a visitor's identifier, the browser must be routed through all available routes. The server records which favicons are requested by the browser (those that are not present in the browser's F-Cache) and which are not.

    Example:

    const visitedRoutes = [];
    Webserver.onvisit = (route) => visitedRoutes.push(route); // -> ["/b", "/d"]
    Webserver.ondone = () => { const ID = getIDFromVector(visitedRoutes) }; // -> 10 (favicon requests for "/a" and "/c" are missing -> 1010b)

    The server can thus reconstruct the identification from the missing favicon requests and the reading process is complete.
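The read step can be sketched in the same spirit. The names onFaviconRequest and getIDFromRequestedRoutes are hypothetical and only illustrate the idea; the sketch assumes the same route order and left-to-right bit reading as the 4-bit example.

```javascript
// Route list taken from the example; everything else is an assumption.
const ROUTES = ["/a", "/b", "/c", "/d"];

const requestedRoutes = [];

// Read mode: record the route and always answer 404, so the browser
// creates no new F-Cache entry and the stored pattern stays intact.
function onFaviconRequest(route) {
  requestedRoutes.push(route);
  return 404;
}

// A route whose favicon was NOT requested is already cached, so its bit is 1.
function getIDFromRequestedRoutes(requested, routes = ROUTES) {
  const seen = new Set(requested);
  const bits = routes.map((r) => (seen.has(r) ? "0" : "1")).join("");
  return parseInt(bits, 2);
}

// The returning browser from the example only asks for /b and /d.
onFaviconRequest("/b");
onFaviconRequest("/d");
const id = getIDFromRequestedRoutes(requestedRoutes);
console.log(id);
```

Because /a and /c never show up in the request log, their bits are set, which yields 1010b and therefore the decimal ID 10 written earlier.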


Target

It looks like all top browsers are vulnerable to this attack scenario.
Mobile browsers are also affected.

Browser | Windows | MacOS | Linux | iOS | Android | Info
Chrome (v 87.0) -
Safari (v 14.0) - - - -
Edge (v 87.0) -
Firefox (v 85.0) Fingerprint different in incognito mode
Brave (v 1.19.92) -

Browser | Windows | MacOS | Linux | iOS | Android | Info
Brave (v 1.14.0) -
Firefox (< v 84.0) -


The demonstration also shows that anti-tracking software, ad blockers, a VPN or surfing in incognito mode do not offer any significant improvement: the browser remains vulnerable to the tracking even with these measures.

Browser | Incognito / Private mode | Clear Website Data | VPN | Adblock / Anti-Tracking
Chrome
Safari
Edge
Firefox

Scalability & Performance

By varying the number of bits that corresponds to the number of redirects to subpaths, this attack can be scaled almost arbitrarily.
It can distinguish 2^N unique users, where N is the number of redirects on the client side.

Since each subpath redirection increases the duration of the identification, the webserver can keep the attack fast by using only as many redirects as it currently needs and increasing their number dynamically. This is done trivially by appending a new subpath to the sequence of subpaths.
The number of redirects (N) needed for a given identifier is calculated as "floor(log2(id))+1", where id is the decimal identification number.
For example, if the server changes from 3-bit identifiers to 4-bit identifiers, the subpath vector changes from ["/a", "/b", "/c"] to ["/a", "/b", "/c", "/d"] and the identifier of a client (here dec. 6) changes from "011" to "0110" (bits listed in route order, lowest-order bit first) without changing the actual value of already written F-Cache identifiers.

As a result, only the minimum number of redirects is necessary for the attack.
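The redirect calculation and the 3-bit to 4-bit extension can be checked with a short sketch. The function names requiredRedirects and bitsInRouteOrder are hypothetical. The sketch assumes that the first route carries the lowest-order bit, which is what the "011" to "0110" example for decimal 6 suggests; this differs from the left-to-right reading used in the 4-bit write example.

```javascript
// Number of redirects needed for an ID, i.e. floor(log2(id)) + 1 for id >= 1.
// The length of the binary string gives the same value without any
// floating-point rounding.
function requiredRedirects(id) {
  if (!Number.isSafeInteger(id) || id < 1) {
    throw new RangeError("id must be a positive integer");
  }
  return id.toString(2).length;
}

// Bits in route order (/a first), assuming /a is the lowest-order bit.
function bitsInRouteOrder(id, n) {
  let out = "";
  for (let i = 0; i < n; i++) {
    out += Math.floor(id / 2 ** i) % 2;
  }
  return out;
}

console.log(requiredRedirects(6));    // 3
console.log(bitsInRouteOrder(6, 3));  // "011"
console.log(bitsInRouteOrder(6, 4));  // "0110"
```

Appending /d only adds a trailing 0 for every identifier that was written with three routes, so existing F-Cache patterns keep their value when the server grows the route list.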

The time taken for the read and write operations increases with the number of distinguishable clients and therefore with the number of redirects.

The following measurements represent the minimum time required for this attack to work. The actual time required in practice depends on many more factors, such as Internet speed, location, hardware setup and browser type.

Redirects (N bit) | Distinguishable clients | Write time | Read time | Scale information
2 | 4 | < 300ms | < 300ms | One user with four browsers
3 | 8 | < 300ms | ~ 300ms | About the amount of Kardashians
4 | 16 | < 1s | ~ 1s | Bunch of your neighbors
8 | 256 | < 1s | ~ 1s | All your facebook-friends
10 | 1024 | < 1.2s | ~ 1s | Really small village
20 | 1,048,576 | < 1.8s | < 1.5s | Small city (San Jose, California)
24 | 16,777,216 | < 2.4s | < 2s | Whole Netherlands
32 | 4,294,967,296 | ~ 3s | < 3s | All people with internet access
34 | 17,179,869,184 | ~ 4s | ~ 4s | All people with internet access each using 4 different browsers

Related work