Script Blocking

A Collection of Interesting Ideas,

This version:
https://explainers-by-googlers.github.io/script-blocking/
Issue Tracking:
GitHub
Editor:
(Google Inc.)

Abstract

User agents may block resource requests in order to protect users. This document suggests a hook in Fetch which would explain these behaviors and allow alignment on timing and impact.

1. Introduction

User agents generally make decisions about whether or not to load resources based on judgements about those resources' impact on user safety. Some of these decisions are widely agreed-upon, and have been codified as normative requirements in Fetch ("bad port" and Mixed Content restrictions, for example), while other decisions diverge between agents in a reflection of their unique and proprietary heuristics and judgements. User agents which rely upon Google’s Safe Browsing; Microsoft’s SmartScreen; or tracking protection lists from Disconnect, DuckDuckGo, etc. will all make different decisions about the specific set of resources they’ll refuse to load. It would be ideal, however, for those decisions to have a consistent impact when made. How are those decisions exposed to the web? How are they ordered vis-à-vis the standardized decisions discussed above? Are there properties we can harmonize and test?

This document aims to answer those questions in the smallest way possible, monkey-patching Fetch to provide an implementation-defined hook for blocking decisions, and sketching out a process by which widely agreed-upon categories of resource-blocking decisions could be tested at a high level of abstraction.

2. Infrastructure

For many of the blocking behaviors described above, user agents seem to have aligned on a pattern of applying well-defined blocking mechanisms ([CSP], [MIX], etc) first, only consulting a proprietary set of heuristics if the request would generally be allowed. Likewise, agents generally align on treating blockage as a network error, though some browsers will instead generate a synthetic response ("shim") for well-known resources to ensure compatibility.

We can support these behaviors with additions to [FETCH] that define an implementation-defined algorithm that we can call from Fetch § 4.1 Main fetch.

2.1. Overriding responses

The Override response for a request algorithm takes a request (request), and returns either a response or null.

This allows user agents to intervene on a given request by returning a response (either a network error or a synthetic response), or to allow the request to proceed by returning null.

By default, this operation has the following trivial implementation:

  1. Return null.

Note: This default implementation is expected to be overridden by a somewhat more complex implementation-defined algorithm. For example, a user agent might decide that its users' safety is best preserved by generally blocking requests to https://mikewest.org/, while shimming the widely-used resource https://mikewest.org/widget.js to avoid breakage. That implementation might look like the following:
  1. If request’s current url’s host’s registrable domain is "mikewest.org":

    1. If request’s current url’s path is « "widget.js" », then:

      1. Let body be [insert a byte sequence representing the shimmed content here].

      2. Return a new response with the following properties:

        type
          cors

        status
          200

        header list
          « Insert content-type, etc as appropriate here »

        ...
          ...

        body
          The result of getting body as a body.

    2. Return a network error.

  2. Return null.
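
As a rough illustration, the example steps above can be sketched in Python. This is only a sketch under stated assumptions, not an implementation of Fetch: the `Response` class, `is_blocked_site`, `SHIM_BODY`, and the `text/javascript` content type are hypothetical stand-ins, and the suffix match only approximates a registrable-domain comparison (which would need the Public Suffix List).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass
class Response:
    """Simplified, hypothetical stand-in for a Fetch response."""
    type: str
    status: int = 0
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def network_error() -> Response:
    # Fetch models a network error as a response with type "error" and status 0.
    return Response(type="error")


# Hypothetical placeholder for the shimmed content.
SHIM_BODY = b"/* shimmed widget.js */"


def is_blocked_site(host: str) -> bool:
    # Assumption: a suffix match approximates the registrable-domain check.
    return host == "mikewest.org" or host.endswith(".mikewest.org")


def override_response_for_request(url: str) -> Optional[Response]:
    """Return a response to intervene, or None to let the request proceed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if is_blocked_site(host):
        if parts.path == "/widget.js":
            # Shim the widely-used resource to avoid breakage.
            return Response(
                type="cors",
                status=200,
                headers={"content-type": "text/javascript"},  # assumed value
                body=SHIM_BODY,
            )
        # Everything else on the blocked site becomes a network error.
        return network_error()
    return None
```

Returning `None` for unrelated hosts mirrors the default "Return null" behavior, so the hook stays invisible for requests the user agent has no opinion about.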

2.2. Monkey-Patching Fetch

Fetch will call into this algorithm near the top of Fetch § 4.2 Scheme fetch, ensuring that any potential shimming happens only after Fetch § 4.1 Main fetch (which performs CSP, MIX, bad port checks, etc):
  1. If fetchParams is canceled, then return the appropriate network error for fetchParams.

  2. Let request be fetchParams’s request.

  3. Let override response be the result of executing Override response for a request on request.

  4. If override response is not null, return override response.

  5. Switch on request’s current URL’s scheme and run the associated steps:

Fetch will also call into the algorithm near the top of Fetch § 4.3 HTTP fetch, ensuring that requests which bypass Fetch § 4.2 Scheme fetch (today, at least cors requests) can be handled:
  1. Let request be fetchParams’s request.

  2. Let response and internalResponse be null.

  3. Set response to the result of executing Override response for a request on request.

  4. If response is null and request’s service-workers mode is "all", then:

Note: Putting this check in both Fetch § 4.2 Scheme fetch and Fetch § 4.3 HTTP fetch seems redundant, but it ensures that the check has a chance to act upon requests to both HTTP(S) schemes and non-HTTP(S) schemes (like blob:, data:, and file:), for both no-cors and cors requests. An alternative would place the new check just before step 12 of Fetch § 4.1 Main fetch, and extract the network errors which could be produced in that step so that they happen consistently prior to any potential shimming. The approach taken here seems simpler, but the alternative might be clearer, especially given the appearance of a bypass through the preloaded response candidate check. That check isn’t really a bypass, assuming (as seems reasonable) that the user agent’s intervention blocks the initial request which would have preloaded the response.
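
The intended ordering can be sketched as a small pipeline: standardized checks run first, and the implementation-defined hook is consulted only if they let the request through. The following Python sketch is illustrative only; `main_fetch_checks`, `override_response`, `BLOCKED_HOSTS`, and the string results are hypothetical simplifications, and `BAD_PORTS` holds just one entry from Fetch's bad-port list.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

BAD_PORTS = {25}  # Tiny illustrative subset of Fetch's bad-port list.
BLOCKED_HOSTS = {"blocked.example"}  # Hypothetical implementation-defined list.


def main_fetch_checks(url: str) -> Optional[str]:
    """Stand-in for the standardized checks performed in Main fetch."""
    if urlsplit(url).port in BAD_PORTS:
        return "network error (bad port)"
    return None


def override_response(url: str) -> Optional[str]:
    """Stand-in for the implementation-defined override hook."""
    if urlsplit(url).hostname in BLOCKED_HOSTS:
        return "synthetic response (shim)"
    return None


def fetch(url: str, network: Callable[[str], str]) -> str:
    # Standardized checks win; the hook only sees requests they allow.
    for step in (main_fetch_checks, override_response):
        result = step(url)
        if result is not None:
            return result
    # Neither layer intervened, so the request goes to the network.
    return network(url)
```

With this shape, a request that is both on a bad port and on the proprietary blocklist fails with the standardized error rather than receiving a shim.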

3. Testing Considerations

It would be ideal to verify the ordering of various restrictions that come into play via the patch to Fetch described above. Content Security Policy, Mixed Content (both blockage and upgrades), and port restrictions are all evaluated prior to checking in with any implementation-defined blockage oracle, and this behavior should be verifiable and consistent across user agents.

There’s likely no consistent way to do this for any and all blocking mechanisms, but specific categories of blocking behavior that have widespread agreement seem possible to test in a consistent way. As a potential path to explore, consider that Google’s Safe Browsing defines a small set of known-bad URLs (see https://testsafebrowsing.appspot.com/) that allow manual verification of browser behavior. Perhaps we could extend this notion to some set of high-level blockage categories that user agents seem to generally agree upon ("phishing", "malware", "unwanted software", "fingerprinting", etc), and define well-known test URLs for each within the WPT framework.

That is, phishing.web-platform.test could be added to user agents' lists of phishing sites, and represented within WPT via substitutions against {{domains[phishing]}}. We’d likely need some WebDriver API to make this possible, but it seems like a plausible approach that would allow us to verify ordering and replacement behaviors in a repeatable way.
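
To illustrate the substitution idea, here is a toy Python sketch that expands `{{domains[...]}}` placeholders in a test URL. It is not WPT's actual substitution code; the `DOMAINS` table and the `substitute` helper are hypothetical, and the "phishing" label exists only as the proposal made above.

```python
import re

# Hypothetical label-to-host table; "phishing" is the proposed new entry.
DOMAINS = {
    "": "web-platform.test",
    "www": "www.web-platform.test",
    "phishing": "phishing.web-platform.test",
}

# Matches placeholders of the form {{domains[label]}}.
_PLACEHOLDER = re.compile(r"\{\{domains\[([^\]]*)\]\}\}")


def substitute(template: str) -> str:
    """Replace each {{domains[label]}} with the host registered for label."""
    return _PLACEHOLDER.sub(lambda m: DOMAINS[m.group(1)], template)
```

A test could then request `substitute("https://{{domains[phishing]}}/script.js")` and assert on whether the load is blocked, shimmed, or rejected earlier by a standardized check.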

Note: Some blocking behaviors (blocking all top-level navigation to an origin, for example) might be difficult to test based only upon web-visible behavior, as network errors and cross-origin documents ought to be indistinguishable. We could rely upon leaks like frame counting, but ideally we’d treat that as a bug, not a feature we can rely upon.

4. Security Considerations

Blocking or shimming subresource requests can put pages into unexpected states that developers are unlikely to have tested or reasoned about. This can happen in any event, as pages might be unable to load specific resources for a variety of reasons (outages, timeouts, etc). Ideally developers would handle these situations gracefully, but user agents implementing resource blocking would be well-advised to take the risk seriously, and carefully evaluate resources' usage before taking action against them.

5. Privacy Considerations

Blocking resources has web-visible implications. If the set of resources blocked for one user differs from the set of resources blocked for another user (based, perhaps, on heuristics that take individual users' browsing behavior into account), that visible delta could be used as part of a global identifier (see e.g. "Information Leaks via Safari’s Intelligent Tracking Prevention" for a variant of this attack [INFORMATION-LEAKS]). User agents implementing resource blocking can avoid this risk by ensuring that the set of blocked resources is as uniform as possible across their userbase.
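
A toy Python sketch may help show why per-user differences matter: a page that probes a handful of resources can pack the blocked/not-blocked outcomes into a bit vector. The function name and the probe URLs are hypothetical, and this is a simplification of the attacks described in the cited paper, not a reproduction of them.

```python
from typing import Sequence, Set


def blocking_fingerprint(probes: Sequence[str], blocked: Set[str]) -> int:
    """Encode which probe URLs are blocked as an integer bit vector."""
    bits = 0
    for index, url in enumerate(probes):
        if url in blocked:
            # Each probe contributes one bit of potentially identifying state.
            bits |= 1 << index
    return bits
```

If every user has the same blocked set, every user yields the same value and the probe reveals nothing about the individual; with n independently varying entries, up to 2^n distinct values become observable.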

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[MIX]
Emily Stark; Mike West; Carlos IbarraLopez. Mixed Content. URL: https://w3c.github.io/webappsec-mixed-content/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

Informative References

[CSP]
Mike West; Antonio Sartori. Content Security Policy Level 3. URL: https://w3c.github.io/webappsec-csp/
[INFORMATION-LEAKS]
Artur Janc; et al. Information Leaks via Safari's Intelligent Tracking Prevention. URL: https://arxiv.org/pdf/2001.07421