1. Introduction
User agents generally make decisions about whether or not to load resources based on judgements about those resources' impact on user safety. Some of these decisions are widely agreed-upon, and have been codified as normative requirements in Fetch ("bad port" and Mixed Content restrictions, for example), while other decisions diverge between agents in a reflection of their unique and proprietary heuristics and judgements. User agents which rely upon Google’s Safe Browsing; Microsoft’s SmartScreen; or tracking protection lists from Disconnect, DuckDuckGo, etc. will all make different decisions about the specific set of resources they’ll refuse to load. It would be ideal, however, for those decisions to have a consistent impact when made. How are those decisions exposed to the web? How are they ordered vis-à-vis the standardized decisions discussed above? Are there properties we can harmonize and test?
This document aims to answer those questions in the smallest way possible, monkey-patching Fetch to provide an implementation-defined hook for blocking decisions, and sketching out a process by which widely agreed-upon categories of resource-blocking decisions could be tested at a high level of abstraction.
2. Infrastructure
For many of the blocking behaviors described above, user agents seem to have aligned on a pattern of applying well-defined blocking mechanisms ([CSP], [MIX], etc) first, only consulting a proprietary set of heuristics if the request would generally be allowed. Likewise, agents generally align on treating blockage as a network error, though some browsers will instead generate a synthetic response ("shim") for well-known resources to ensure compatibility.
We can support these behaviors with additions to [FETCH] that define an implementation-defined algorithm that we can call from Fetch § 4.1 Main fetch.
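Concretely, the intended ordering might be sketched as follows (a TypeScript illustration only; isBadPort, isBlockedByCSP, isBlockedAsMixedContent, and overrideResponse are illustrative stand-ins for spec machinery, not real APIs):

    // Illustrative ordering: normative checks codified in Fetch, CSP, and
    // Mixed Content run first; the implementation-defined hook is consulted
    // only if the request would otherwise be allowed.
    type Verdict = Response | "network error" | null;

    // Trivial stand-ins for the normative checks; the real logic lives in
    // the relevant specifications.
    const isBadPort = (_request: Request): boolean => false;
    const isBlockedByCSP = (_request: Request): boolean => false;
    const isBlockedAsMixedContent = (_request: Request): boolean => false;

    // Stand-in for the implementation-defined hook defined in § 2.1 below.
    const overrideResponse = (_request: Request): Response | null => null;

    function preNetworkChecks(request: Request): Verdict {
      if (isBadPort(request) ||
          isBlockedByCSP(request) ||
          isBlockedAsMixedContent(request)) {
        return "network error"; // Standardized blocks happen first ...
      }
      return overrideResponse(request); // ... then the proprietary oracle.
    }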
2.1. Overriding responses
The Override response for a request algorithm takes a request (request), and returns either a response or null.
This allows user agents to intervene on a given request by returning a response (either a network error or a synthetic response), or to allow the request to proceed by returning null.
By default, this operation has the following trivial implementation:
- Return null.
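Expressed as a (purely illustrative) TypeScript signature, the hook and its trivial default might look like this:

    // The hook's shape: given a request, return a response to intervene
    // (a network error to block, or a synthetic response to shim), or
    // null to let the request proceed untouched.
    type OverrideResponseHook = (request: Request) => Response | null;

    // The default, trivial implementation never intervenes.
    const defaultOverride: OverrideResponseHook = (_request) => null;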
A user agent might instead wish to block all requests to https://mikewest.org/, while shimming the widely-used resource https://mikewest.org/widget.js to avoid breakage. That implementation might look like the following:
- If request’s current url’s host’s registrable domain is "mikewest.org":
  - If request’s current url’s path is « "widget.js" », then return a synthetic response that shims the resource.
  - Return a network error.
- Return null.
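A rough TypeScript sketch of that implementation (using a simplified suffix match where the spec would consult the host's registrable domain, and an empty shim body standing in for whatever the real widget's API requires):

    // Illustrative only: block mikewest.org, but shim /widget.js.
    function exampleOverride(request: Request): Response | null {
      const url = new URL(request.url);
      // Simplified stand-in for "host's registrable domain is mikewest.org".
      const onDomain = url.hostname === "mikewest.org" ||
                       url.hostname.endsWith(".mikewest.org");
      if (onDomain) {
        if (url.pathname === "/widget.js") {
          // Shim the widely-used resource with a synthetic response.
          return new Response("/* shimmed */", {
            headers: { "Content-Type": "text/javascript" },
          });
        }
        return Response.error(); // Block everything else: a network error.
      }
      return null; // No opinion; the request proceeds as usual.
    }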
2.2. Monkey-Patching Fetch
The first steps of Fetch § 4.2 Scheme fetch would change as follows:
- If fetchParams is canceled, then return the appropriate network error for fetchParams.
- Let request be fetchParams’s request.
- Let override response be the result of executing Override response for a request on request.
- If override response is not null, return override response.
- Switch on request’s current URL’s scheme and run the associated steps:
  - …
Likewise, the first steps of Fetch § 4.3 HTTP fetch would change, ensuring that requests which never pass through Fetch § 4.2 Scheme fetch (e.g. cors requests) can be handled:
- Let request be fetchParams’s request.
- Let response and internalResponse be null.
- Set response to the result of executing Override response for a request on request.
- If response is null and request’s service-workers mode is "all", then:
  - …
Note: Putting this check in both Fetch § 4.2 Scheme fetch and Fetch § 4.3 HTTP fetch seems redundant, but ensures that it has a chance to act upon requests to both HTTP(S) schemes and non-HTTP(S) schemes (like blob:, data:, and file:) for both no-cors and cors requests. An alternative to this approach would place this new check just before step 12 of Fetch § 4.1 Main fetch, and extract the network errors which could be produced in that step to ensure they happen consistently prior to potential shimming. This approach seems simpler, but the alternative might be clearer, especially given the apparent bypass through the preloaded response candidate check. That bypass isn’t a real one, assuming (reasonably) that the user agent’s intervention would have blocked the initial request which preloaded the response.
3. Testing Considerations
It would be ideal to verify the ordering of various restrictions that come into play via the patch to Fetch described above. Content Security Policy, Mixed Content (both blockage and upgrades), and port restrictions are all evaluated prior to checking in with any implementation-defined blockage oracle, and this behavior should be verifiable and consistent across user agents.
There’s likely no consistent way to do this for any and all blocking mechanisms, but specific categories of blocking behavior that have widespread agreement seem possible to test in a consistent way. As a potential path to explore, consider that Google’s Safe Browsing defines a small set of known-bad URLs (see https://testsafebrowsing.appspot.com/) that allow manual verification of browser behavior. Perhaps we could extend this notion to some set of high-level blockage categories that user agents seem to generally agree upon ("phishing", "malware", "unwanted software", "fingerprinting", etc), and define well-known test URLs for each within the WPT framework.
That is, phishing.web-platform.test could be added to user agents' lists of phishing sites, and represented within WPT via substitutions against {{domains[phishing]}}. We’d likely need some WebDriver API to make this possible, but it seems like a plausible approach that would allow us to verify ordering and replacement behaviors in a repeatable way.
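A WPT test along those lines might look like the following (TypeScript-flavored; it assumes the hypothetical {{domains[phishing]}} substitution proposed above, plus the testharness.js globals declared at the top):

    // Ambient declarations for testharness.js globals.
    declare function promise_test(fn: () => Promise<void>, name: string): void;
    declare function assert_true(condition: boolean, message?: string): void;

    promise_test(async () => {
      let blocked = false;
      try {
        // {{domains[phishing]}} is the hypothetical substitution proposed
        // above; a blocked fetch surfaces as a rejected promise (TypeError).
        await fetch('https://{{domains[phishing]}}/resource.js');
      } catch {
        blocked = true;
      }
      assert_true(blocked, 'Requests to known-phishing hosts are blocked.');
    }, 'Phishing subresources fail with a network error.');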
Note: Some blocking behaviors (blocking all top-level navigation to an origin, for example) might be difficult to test based only upon web-visible behavior, as network errors and cross-origin documents ought to be indistinguishable. We could rely upon leaks like frame counting, but ideally we’d treat that as a bug, not a feature we can rely upon.
4. Security Considerations
Blocking or shimming subresource requests can put pages into unexpected states that developers are unlikely to have tested or reasoned about. This can happen in any event, as pages might be unable to load specific resources for a variety of reasons (outages, timeouts, etc). Ideally developers would handle these situations gracefully, but user agents implementing resource blocking would be well-advised to take the risk seriously, and carefully evaluate resources' usage before taking action against them.
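For example, a page embedding a third-party script might feature-detect rather than assume success (a generic defensive pattern; window.Widget is a hypothetical API):

    // Don't assume a third-party script loaded: it may have been blocked,
    // shimmed, or simply failed. Degrade gracefully either way.
    const script = document.createElement("script");
    script.src = "https://third-party.example/widget.js";
    script.onerror = () => {
      console.warn("widget blocked or unavailable; continuing without it");
    };
    script.onload = () => {
      // Even a successful load may be a shim: verify the expected API.
      if (!("Widget" in window)) {
        console.warn("widget script loaded, but its API is missing");
      }
    };
    document.head.append(script);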
5. Privacy Considerations
Blocking resources has web-visible implications. If the set of resources blocked for one user differs from the set of resources blocked by another user (based, perhaps, on heuristics that take individual users' browsing behavior into account), that visible delta could be used as part of a global identifier (see e.g. "Information Leaks via Safari’s Intelligent Tracking Prevention" for a variant of this attack [information-leaks]). User agents implementing resource blocking can avoid this risk by ensuring that the set of blocked resources is as uniform as possible across their userbase.