TLS typically works via per-hostname certificates. If you have only a single host on an end-point then no problem. But what if you want to host two or more domains with different certificates on a single end-point? It's a problem.
So Server Name Indication (SNI) was invented. In effect you send, in plain text, the hostname you wish to contact to the end-point, which is then able to use that information to route the request. The correctly routed request can then use the right certificate.
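To make the "in plain text" part concrete, here is a minimal sketch using Python's standard ssl module. It builds the first message a client would send (the ClientHello) in memory, without touching the network, and checks whether the hostname is readable in the raw bytes. The helper name client_hello_for and the hostname example.com are placeholders of mine, not anything from the thread.

```python
import ssl


def client_hello_for(hostname):
    """Return the raw bytes a TLS client would send first for `hostname`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()   # bytes "from the server" (never filled here)
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that never arrives in this sketch.
        pass
    return outgoing.read()


hello = client_hello_for("example.com")
# The hostname sits in the unencrypted ClientHello as plain ASCII.
print(b"example.com" in hello)
```

With a stock Python build (no encrypted SNI/ECH support in the ssl module) this should report that the name is present, which is exactly what anyone on the path gets to see.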
At the time this didn't have large security implications because unencrypted DNS was already leaking requests like crazy. But with DNS slowly getting more secure and TLS gaining a huge market share (<3 Let's Encrypt), SNI is starting to stand out as an "easy" way to spy on what someone is requesting.
While IP Addresses themselves can be used for spying, encrypted DNS, protected SNI, and TLS all increase the cost and complexity of that spying, and make the results harder to trust (particularly on shared hosts or sites using a shared load balancing system). If encrypted SNI becomes a reality, it is a huge win against ISPs spying on us and selling our browsing histories.
With non-SNI (traditional) TLS, the server name is only sent once the connection is encrypted. You can still have several servers by using Subject Alternative Names (SAN), with a de facto maximum of about 100. All the names have to be in a single cert though, because at the time of establishing the encrypted connection there is no way to know which cert to use.
With SNI, the hostname is sent before the encryption starts, so you can have more than one cert on the same server. This makes life easier for the admins, as they no longer have to coordinate a bunch of hostnames within a single cert, but it means that the target hostname travels through your ISP etc. in the clear.
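For the server side, here is a rough sketch of how "more than one cert on the same server" can look with Python's ssl module, which exposes the SNI value through SSLContext.sni_callback. The hostnames a.example and b.example are made up, and the contexts are left without certificates so the snippet runs anywhere; a real server would call load_cert_chain() on each one with its own cert and key.

```python
import ssl
from types import SimpleNamespace


def make_server_context():
    # Assumption: a real deployment would also call
    # ctx.load_cert_chain(certfile, keyfile) with that host's certificate.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)


# One context (and therefore one certificate) per hostname.
contexts = {
    "a.example": make_server_context(),
    "b.example": make_server_context(),
}
default_context = make_server_context()


def pick_context(ssl_object, server_name, initial_context):
    """Called during the handshake with the name from the client's SNI."""
    chosen = contexts.get(server_name)
    if chosen is not None:
        ssl_object.context = chosen   # switch to that host's certificate
    return None                       # None lets the handshake continue


default_context.sni_callback = pick_context

# Exercise the callback directly with a stand-in for the TLS socket.
fake_socket = SimpleNamespace(context=default_context)
pick_context(fake_socket, "b.example", default_context)
print(fake_socket.context is contexts["b.example"])
```

Clients that send no SNI (server_name is None) or an unknown name simply stay on the default context, which is the pre-SNI behaviour described above.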
100 is the exact maximum by policy from Let's Encrypt, but if you're interested in more details about what CAs can do, there is a great thread on the Let's Encrypt Community Forum about the nature of the limitations:
> tl;dr the only reason to care about SNI in TLS 1.3 is if you are a privacy freak. It's not a security concern.
You also need to care about SNI if you're trying to evade a firewall in a repressive country that blocks encrypted messaging services and VPNs that are used to bypass censorship and surveillance. Some of those firewalls were looking at the SNI to figure out if the traffic was something they wanted to block or not.
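As a rough illustration of how little work that inspection takes, the sketch below parses a raw ClientHello record and pulls out the server name. It is my own simplified parser, not what any particular firewall runs: it assumes the whole ClientHello arrives in a single TLS record and it gives up (returns None) on anything unexpected. The test input is generated locally with Python's ssl module, so nothing goes over the network.

```python
import ssl
import struct


def extract_sni(record):
    """Return the SNI hostname from a raw ClientHello record, or None."""
    try:
        # Record type 0x16 = handshake, handshake type 0x01 = ClientHello.
        if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                    # record header + handshake header
        pos += 2 + 32                  # client version + random
        pos += 1 + record[pos]         # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]         # compression methods
        ext_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= min(ext_end, len(record)):
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # extension 0 = server_name
                # list length (2 bytes), name type (1), name length (2)
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type == 0:     # 0 = host_name
                    name = record[pos + 5:pos + 5 + name_len]
                    return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


# Build a ClientHello in memory to have something to inspect.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")
try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    pass                               # expected: no server to answer
client_hello = outgoing.read()

print(extract_sni(client_hello))
```

No keys and no decryption are involved, which is why SNI filtering is cheap compared with anything that has to look inside the encrypted stream.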
They can still check the SNI string, since initial encryption is negotiated with the proxy.
They can't subsequently masquerade as the actual SNI host since they don't have a valid cert for it, but by that point they've probably learned enough to terminate the connection and log the attempt. Well unless they have their root cert in your browser...
SNI encryption is useful against ISPs and casual snooping, not state actors.
Trying to circumvent a repressive regime's laws is not a great design spec for the internet. Why not make Tor mandatory for all TLS 1.4 connections?
Either they'll use traffic analysis to find out where you're going anyway or they'll just ban all HTTPS connections. Not that they need to as Russia is fine with just blocking whole IP ranges, making SNI irrelevant.
One of the good things in TLS 1.3 was that cryptographers were brought in early. The TLS 1.3 core design is a product of cryptographers plus engineers, and the work to file off sharp corners and make it work in the real world is the engineering problem. In previous versions the engineers built something that seemed to work, then (sometimes years later) the cryptographers were shown the results; usually they were not entirely comfortable, but by then it was too late.
Maybe SNI needs (if it hasn't had them already) a few rounds of smart crypto postgrads and postdocs trying to figure out what is possible here. If the answer is that nothing we can possibly do fulfils all the criteria, well, that's sad, but we must find a compromise knowing this limitation. If there's a way to satisfy all requirements in this document, maybe they can find it. As it is today, it feels a bit hopeless, or at least I'm worried that it does.
Effectively, there is only a single usable Internet protocol, called TCP. There are only two usable ports, 80 and 443. There is only a single usable transport-layer protocol, called SSL/TLS. And nothing more.
Luckily, Tor relay operators have recognized this problem from the beginning: Tor uses standard TLS for the transport-layer encryption, and most of the relays run on port 443. This lets everyone use them to bootstrap to the complete version of the Internet. And recently the IETF has clearly become well aware of this, hence DNS-over-HTTPS.
Looking at this from an information exposure viewpoint, Tor is actually the only type of technology that I think can provide actual anonymity (by making it computationally expensive to track an end user with diligent opsec, which is actually an extremely high bar).
If one node is speaking to another node (packets are routed) then it is known that the user may be speaking to any publicly listed, previously expected to exist there (pub/priv), or plausibly secret services at that location.
If the attacker is able to impersonate the identity (crypto) of the target node or otherwise transparently observe the node's contents then the same can also be said for any data routed through the node.
Given the above I do not see a compelling reason to reserve information about a desired target of contact from that node.
Thus it seems logical to have any name resolution / identity certification system allow delegates for 'middle men' (other crypto IDs) that are authorized to provide termination routing.
With that included in the name resolution / certificate, connecting to a specified node and then asking for the 'named service' over that secured connection should not expose any information that could not already be observed via other systemic weaknesses. If desirable a tunneled session to the end service seems the most likely to be secure, but some method of switching to a still encrypted direct channel to that other service (without any further encryption between the source and middle node) might be useful in the case of load balancing systems.
Title: Default, unchangeable[1] browser settings and modern, unencrypted SNI
[1] Ignoring the possibility of editing and recompiling the source code, if the browser is open source.
Background
SNI is only required for websites using TLS that are also using shared IP address space (e.g., CDNs) where a single IP address may be used by multiple websites, perhaps having no common owner or no relationship to each other.
However websites that have rented or purchased a dedicated IP address do not need to use SNI in order to take advantage of TLS. Traditionally, all websites using SSL/TLS fell into this category.
It is possible many websites using TLS still use dedicated IP addresses and hence do not need SNI.
For the curious, some data on websites posted to HN is provided below.
By default popular browsers send unencrypted SNI regardless of whether the website is using a shared IP address and regardless of whether the website actually requires SNI.
This default, unchangeable setting makes sense if one assumes that, by and large, most websites using TLS are also sharing an IP address with other websites.
The question is whether that assumption is true.
Hence we ask:
What percentage of the websites that HN readers visit are both (a) using TLS and (b) using a shared IP address? In other words, what percentage of the websites accessed by HN readers actually require SNI to be sent by the browser?
Sample data: A survey of websites currently appearing on HN
Number of unique urls: 462
Number of http urls: 72 (unique domains: 67)
Number of https urls: 390 (unique domains: 229)
Number of https urls requiring SNI: 36
Number of https urls requiring correct SNI: 22
What is meant by "correct SNI"?
It is possible with some CDNs to send incorrect SNI and still retrieve the correct page.
This is because these CDNs do not use the SNI in order to retrieve the correct page. Like all other sites since the dawn of the web (up until the appearance of SNI) they only need a correct Host header.
(This raises the question of what, if anything, the SNI might be used for within these CDNs. Users concerned about privacy, tracking, ads, etc. might wonder if the SNI is being used for something.)
Sending an incorrect (fake) SNI has been nicknamed "domain fronting". The user simply picks any domain using the CDN and sends it as the "fake" SNI.
14 of the 36 are using a CDN that does not require a correct SNI in order to retrieve the correct page.
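For anyone who wants the sample expressed as percentages, here is the arithmetic as a short script. It only uses the counts quoted above; note that the SNI counts are per URL, so the shares are per https URL rather than per unique domain.

```python
# Counts taken from the sample above.
https_urls = 390
requiring_sni = 36
requiring_correct_sni = 22

# URLs that need some SNI but accept any name served by the same CDN.
accepting_any_sni = requiring_sni - requiring_correct_sni

share_requiring_sni = 100 * requiring_sni / https_urls
share_requiring_correct_sni = 100 * requiring_correct_sni / https_urls
share_not_requiring_sni = 100 - share_requiring_sni

print(f"accept any SNI:        {accepting_any_sni}")
print(f"require SNI:           {share_requiring_sni:.1f}% of https URLs")
print(f"require correct SNI:   {share_requiring_correct_sni:.1f}% of https URLs")
print(f"work without any SNI:  {share_not_requiring_sni:.1f}% of https URLs")
```

So in this sample roughly 9% of the https URLs needed SNI at all, and under 6% needed the correct name.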
I have automated this "SNI test" and can run it daily, weekly or, for a larger sample, I can run it on historical HN data, e.g., all domains appearing on HN in the year XXXX.
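For readers who want to try something similar, below is one way such a test could be sketched in Python. This is not the script referred to above, just a guess at the general shape: connect to an IP address, send a chosen SNI (or none at all), send the real name in the Host header, and compare results. The function names, the result categories and the idea of judging success from the HTTP status line are all my assumptions, and certificate verification is switched off on purpose because a fake SNI will usually get a certificate for the wrong name.

```python
import socket
import ssl


def build_request(host, path="/"):
    """HTTP/1.1 request whose Host header carries the real site name."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch_status(ip, sni, host, timeout=10.0):
    """Return the HTTP status line, or None if the connection failed.

    `sni` may be None (send no SNI), the real hostname, or a fake one.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # a test probe, not a secure client
    try:
        with socket.create_connection((ip, 443), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                tls.sendall(build_request(host))
                line = tls.makefile("rb").readline()
    except OSError:                  # includes ssl.SSLError and timeouts
        return None
    return line.decode("latin-1").strip()


def classify(ok_without_sni, ok_with_wrong_sni):
    """Map two probe outcomes onto the categories used in the survey."""
    if ok_without_sni:
        return "no SNI needed"
    if ok_with_wrong_sni:
        return "SNI needed, any name accepted"
    return "correct SNI needed"


print(classify(ok_without_sni=False, ok_with_wrong_sni=True))
```

A status line alone is a crude signal (some front ends answer 200 with the wrong site's page), so a real run would probably want to compare response bodies against a fetch made with the correct SNI.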
Is anyone aware of formal studies on the percentage of sites that actually require (correct) SNI?
Is there any technical (cf. practical) reason the client must obtain the certificate from the server it names, and contemporaneous with the client's HTTP command?
Is there any technical reason the client cannot obtain the certificate from other sources and/or not contemporaneously with the HTTP command?
For example, CurveCP explains how the public key can be inserted into (DNSCurve-encrypted) DNS RRs. Clients can obtain the public keys from DNS servers instead of from www servers. They might even obtain public keys in bulk in zone files rather than by piecemeal requests. Further, CurveCP provides for a "server extension", an identifier that allows multiple sites to use the same IP address and port, obviating the problem which SNI aims to solve.
For example, third party sources such as crt.sh provide repositories of certificates that users can download at a time they choose, not necessarily contemporaneous with when they send HTTP commands to the servers named in the certificates. There are a variety of sources of bulk certificates in addition to crt.sh. (Incidentally, crt.sh does not require SNI.)
The practice of third parties providing certificates versus letting clients obtain them from the named servers/issuing parties is seen among the few companies/organizations that write popular clients when they distribute collections of "trusted" certificates with the client software.
Another example is when a www server sends "intermediate certificates" along with the certificate that names the www server. This too demonstrates third party distribution of public keys to clients instead of letting clients get the keys from the entity to which they were issued.
No matter how the client obtains the server certificate, it still has to somehow tell the server which certificate's public key it used to encrypt the premaster secret.