Tl;dr: The vast majority of adults will never have to interact with our age assurance systems, and their experience won't change. We know Discord and how people use it, so we're designing to respect privacy and deliver a safer experience while minimizing friction for adults.
Hey folks –
I’ve been on Discord since very early 2016 and actually joined the company in 2017. Safety is one of my areas, so today’s announcement on our blog is something I’ve been pretty involved with. I’ve always cared about Discord's approach to privacy (E2EE for A/V was another of my projects here), so I figured I’d add some more context to today's news.
I can say confidently that the vast majority of people will never see age verification. I say this because we launched age assurance in the UK and Australia in 2025, and we have some pretty good data on this now. The idea here is that we can pre-identify most adults based on what we already know (not including your messages!), and that gets us pretty far. No face scans, no IDs, for the vast majority of adults.
And if you're in the smaller subset of folks that we can't definitively pre-identify, you only have to verify if you're accessing age-restricted servers or channels, or changing certain settings. That's really not most users. (Altho... might be more Redditors, tbh.)
Last, I know that there is concern about privacy and data leaks. That's a real concern. The selfie system runs purely client-side, so the image never leaves your device; we did that intentionally. That'll work for a bunch of users who aren't pre-identified as adults. But if you do end up in the ID bucket, then yeah, you're right that that has some risk. We're doing what we can to minimize it by working with our range of partners (who are different partners than the ones from the data leak you read about), and if it's any help, we learned a lot internally from the last issue. But I get it if that doesn't necessarily inspire more confidence.
Anyway, we’ll be sharing more about the system, including the technology behind it, next month (March) as we get closer to the global rollout. I honestly wouldn't be happy if we didn't build something good, and I am excited about what we’re launching, but please let us know what you think when we share more details.
And I really appreciate everybody's feedback here today. We’re definitely reading it!
Kids can create accounts only at age 13+, adulthood is at age 18 (at least in my country) which means any account older than 5 years should automatically be marked as an adult's account. Please tell me that's the case.
If you still require an ID for those accounts, that means you don't really care about age verification, you just want to tie people to a government ID.
Yeeeep. I usually end up creating isolated grids with circuit networks and banks of capacitors to make it so the power production (and fuel production to feed it) can never shut down...
Dyson Sphere Program (an amazing factory builder game, if you haven't tried it) has similar problems -- but no circuit networks. I haven't yet figured out how to make a robust power generation system that doesn't rely on just alerting the operator that something is going wrong...
Yeah, I've recently started another run of Space Ex.
Currently have an isolated grid with some solar/batteries for enough boilers to kick start everything.
As I scale, I'll be using a circuit network to set up a steam battery that'll be able to kick start everything and take the hit on surges of power requirements (looking at you Coronal Mass Ejections).
We use data services to do "data related things" that make sense to do at a central proxy layer. This may include caching/coalescing/other logic but it doesn't always, it really depends on the particular use case of that data.
For messages, we don't really cache. Coalescing gives us what we want and the hot channel buckets will end up in memory on the database, which is NVMe backed for reads anyway so the performance of a cache wouldn't add much here for this use case.
In other places, where a single user query turns into many database queries and we have to aggregate data, caching is more helpful.
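For a concrete picture of what coalescing buys at a proxy layer, here's a minimal sketch (my own illustration, not Discord's actual data-services code; all names are made up): concurrent requests for the same key share a single in-flight backend call instead of each hitting the database.

```python
import asyncio

class Coalescer:
    """Collapse concurrent identical requests into one backend call.

    Hypothetical sketch -- the structure is illustrative only.
    """

    def __init__(self, fetch):
        self._fetch = fetch   # the backend call, e.g. a database read
        self._inflight = {}   # key -> Future shared by all waiters

    async def get(self, key):
        if key in self._inflight:
            # A request for this key is already running;
            # await its result instead of hitting the backend again.
            return await self._inflight[key]
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await self._fetch(key)
            fut.set_result(result)
            return result
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            del self._inflight[key]

async def main():
    calls = 0

    async def slow_fetch(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate database latency
        return f"row-for-{key}"

    c = Coalescer(slow_fetch)
    # Ten concurrent reads of the same hot key -> one backend call.
    results = await asyncio.gather(*(c.get("channel:42") for _ in range(10)))
    print(calls, results[0])  # -> 1 row-for-channel:42

asyncio.run(main())
```

The same shape generalizes: for hot channels, many simultaneous readers collapse into one database round trip, which is a big part of why an extra cache adds little on top.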
Our messaging stack is not currently multi-regional, unfortunately. It's in the works, though it's a fairly significant architectural evolution from where we are today.
Data storage is going to be multi-regional soon, but that's just from a redundancy/"data is safe in case of us-east1 failure" scenario -- we're not yet going to be actively serving live user traffic from outside of us-east1.
It's not even the most interesting metric about our systems anyway. If we're really going to look at the tech, the inflation of those metrics to deliver the service is where the work generally is in the system --
* 50k+ QPS (average) for new message inserts
* 500k+ QPS when you factor in deletes, updates, etc
* 3M+ QPS looking at db reads
* 30M+ QPS looking at the gateway websockets (fanout of things happening to online users)
But I hear you, we're conflating some marketing metrics with technical metrics, we'll take that feedback for next time.
Ideally I'd like to hear about messages per second at the 99.99th percentile or something similar. That number says far more about how hard it is to service the load than a per-day value ever will.
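To illustrate why a high percentile tells you more than a daily total, here's a quick nearest-rank percentile sketch over per-second message counts (the numbers are made up, not Discord's): a day of mostly steady traffic with brief spikes has a p99.99 that reflects the spikes, which is what capacity planning actually has to survive.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of per-second message counts."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# A day of per-second insert counts (hypothetical): mostly steady, brief spikes.
per_second = [48_000] * 86_000 + [120_000] * 400

print(percentile(per_second, 50))     # -> 48000 (the median looks calm)
print(percentile(per_second, 99.99))  # -> 120000 (the tail you must provision for)
```

Divide a per-day figure by 86,400 and you get the calm median; the tail value is what determines how hard the load actually is to serve.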
(I work at Discord and manage our Infrastructure, Security, and Safety engineering organizations.)
We currently don't intentionally block or disable third party clients or action the accounts of people who use them.
We do monitor the traffic of spammers and we build heuristics around how to identify them -- and sometimes third party clients get caught up in that. Cold comfort, I know, but it's not us trying to block/come after well-behaved third party clients.
Anyway, to OP, good luck with discordo! For one of our internal hack weeks a few years ago I tried to build an RFC1459 compliant Discord gateway... it was a fun POC, but definitely lots of rough edges because the paradigms don't exactly match up. :)
Is it possible those heuristics could accidentally trigger for browsers other than Chrome? I had an old account where I normally used the android app, then one day I logged in with Firefox on desktop (with adblocker) and my account was banned about a minute later.
At a business level, can you share why the ToS forbids third party clients at all? We all know that "trusting the client" is not a viable security plan, so why does it matter what client people use?
> At a business level, can you share why the ToS forbids third party clients at all? We all know that "trusting the client" is not a viable security plan, so why does it matter what client people use?
Because if something breaks for a user and they complain, the company cannot diagnose it or fix it. Simply dealing with the complaints would be an extra cost on the company.
And when they decide to change part of the API, you have an unknown number of users that would be broken.
Eh, this reads weird to me. So third party clients are "ignored," but things like Better Discord which modify the first party client are explicitly not kosher? I'd love for better clarification around this at some point honestly.
Clearly Discord as a corporation is not ok with third party clients or modifications to the client.
But the engineers who would be in charge of enforcing those rules do not spend time explicitly seeking out third party clients or modifications. They instead look for "non-standard behavior", which may incidentally catch either.
PS: This is why you don't speak about your employer's business unless asked to by your employer.
Which brings me back to my initial post: despite that (mind you, high-level) engineer's opinion, you should probably steer well clear. Support will just not help you in certain situations, and it's not worth the risk. I was surprised to even see a reply from him; Discord the organization has typically been _very_ clear it's not kosher.
I spent a good amount of time in the late 90s writing little bots for a similar game, AT-Robots, where you write an assembly type language to program a little laser-shooting robot that could drive around, scan for enemies, etc.
Anyway, I loved this kind of thing as a teenager. I felt it really helped me to fall in love with programming and the ability to make things happen by writing code.
There was a fun PC game called Omega [1] where you earned resources to build a tank and then programmed it to win combat, which earned you more resources. It was a great game. It had its own programming language, iirc.
We (Discord) moved off of MongoDB for various reasons and are quite happy about that decision but managing Cassandra/Scylla clusters is not exactly a walk in the park either.
I didn't make the original decision but if I were starting something and I had no idea whether or not it'd be successful, I'd do whatever was the absolute fastest way to get to MVP. That'd probably be a cloud database, honestly -- but a modern MongoDB would be technically fine too (licensing stuff notwithstanding.)
Most startups fail not because they picked a suboptimal database for their usage but because they didn't build something that was good or it didn't achieve product market fit. I wouldn't worry about your database over-much in the beginning (unless it's critical to what you're doing and in that case, worry like hell, but you will probably know if that's the case.)
Many of Discord's issues with Mongo were exacerbated by the fact that we were using TokuMX, which was abandoned shortly after we started using it. A few years into Discord we found ourselves with a rapidly scaling dataset and userbase that was built on top of an abandoned and not super popular third party version of MongoDB. (Funny story: at one point towards the end we realized that all of the packages had been pulled from every mirror we could find and literally the only place we could find the package files was off of some gov.uk mirror... that was a bad day. Thankfully we had the hashes and were able to validate the packages...)
FWIW, we did honestly debate moving our core user model (which was what was left in TokuMX by the end there) into a modern version of MongoDB -- some of the things we did (reverse indexes, secondary indexes, locking, etc) are much more complicated in a database like Scylla. It was tempting to just migrate the data from one "Mongo" to another and call it a day.
We didn't for a variety of reasons, not least of which was keeping things simple by reducing the number of technologies in production (like when we chose to embrace Rust and went back and migrated nearly all of our Go systems).
Anyway, I'm pretty happy with not running MongoDB anymore, but not because MongoDB is inherently bad. It's popular for a reason!
Really appreciate this great, detailed answer! 100% agree that getting to an MVP with PMF as quickly as possible should be the top priority for a startup.
I run the infrastructure department at Discord which includes our anti-spam engineering team --
Just want to +1 what you're saying and confirm that we are never trying to ban third party clients (that aren't self-bots). Honestly, it would be a waste of our time and basically do nothing good for Discord. But as you correctly point out, they do sometimes trip the ever-evolving heuristics we build that try to identify and mitigate spam on the platform.
If this happens to your account you can write in to our TNS team at https://dis.gd/request and they will usually take care of unbanning any accounts that get accidentally caught up in a spam heuristic. It sometimes takes a bit to investigate and respond to these kinds of requests but they generally come out right in the end.
I got banned a couple months ago, seconds after joining the Discord server for a game (https://www.reddit.com/r/LoopHero/comments/lwx8m8/loopers_jo...), and lost all my account information and the servers I had joined. I tried reaching out through that form but got a mail telling me that I had indeed abused the service, without saying anything about what the abuse was. I barely use Discord (as in, I was connecting to chat maybe once a week) and am confident I did not violate guidelines, yet this is the mail I got: "Your account was disabled for violating our Terms of Service or Community Guidelines. We’ve reviewed and have confirmed this violation, and we will not be reinstating your account."
Support n° was 12080748, any chance I could get it back at some point?
> You agree not to (and not to attempt to) (i) use the Service for any use or purpose other than as expressly permitted by these Terms;(ii) copy, adapt, modify, prepare derivative works based upon, distribute, license, sell, transfer, publicly display, publicly perform, transmit, stream, broadcast, attempt to discover any source code, reverse engineer, decompile, disassemble, or otherwise exploit the Service or any portion of the Service, except as expressly permitted in these Terms;
It's a catch-all but the best restriction that prevents third-party clients is probably 'prepare derivative works based on' and 'reverse engineer' (which you would need if you want your third party client to use any regular client API calls, or if you want to support signing in with the user-facing login page/qr login).
Pretty cool to see my project hit the front page of HN, but definitely a bit of a /shrug moment on the subject itself. "Facebook gonna Facebook" I think is approximately how we feel about this.
I know here on HN we're used to hearing stories about scrappy startups trying to carve a piece of the pie big enough to exit on, but that is pretty much the exact opposite of what Dreamwidth is. Our motivations are very different, so this FB block is mostly a curiosity to us.
Dreamwidth is a small, neighborhood corner store kind of site. We're run by a couple of dedicated part-time staff (who have other jobs/responsibilities in life -- I personally work for Discord!) and a cadre of amazing volunteers who donate their time and energy to make a nice little corner of the Internet that isn't driven by the cycle of VC and growth and user monetization.
We do not have any goals around growth, we don't advertise, and we ultimately don't care that much what the other platforms do. Our goal is to give people a stable home where they don't have to worry about their data being sold or their writing being monetized. Users choose to pay us for a few more advanced features (like full text search), and we support ourselves entirely off of that.
We are home to a large group of online roleplayers, Hugo Award winning fiction writers, Linux kernel developers, parents, security researchers, artists, activists, recipe bloggers, educators, and everything in between and around the edges who would rather work with a service owned and run by people who are motivated by something other than get-big-and-exit. Large communities of online roleplayers get together and build whole worlds on Dreamwidth, telling stories together. I'm constantly impressed by the creativity of our community.
Anyway, it's super cool to see Dreamwidth on the home page here. It's been my side project for over a decade now, and I'm quite proud of it. Even if modernizing a 20+ year old Perl project is a hellish undertaking at the best of times... but we keep going. :)
Great to hear all around. I like the focus on doing your own thing and ignoring the other platforms (and likely naysayers). We could all use a healthy dose of this in our side projects! Also - I would take it as a positive you are blocked on FB.
My wife and I tried to set up a simple business page for the local store we opened less than a year ago; they flagged us as a fake/fraudulent account multiple times when we tried to create one. Neither of us has a personal/active FB account, so I guess that's the reason (and this behavior, yeah, makes me double down on NEVER getting a FB account now). I even tried emailing them 'proof' as they requested, because my wife was worried it would really hurt us; nothing ever came of it. We finally decided it wasn't worth our effort, forgot about them, and our store has thrived since. I'm happy to grow our business without having to deal with them. We've been using local and other ad platforms such as NextDoor.com, which I'd never heard of but one of our older customers brought to our attention. People talk about getting rid of Facebook; to me it starts with actions like the ones my wife and I are taking.
Don't support Facebook at all, they don't deserve it.
This might come off as a little rude, but it's sincere advice from someone who used to work in anti-spam (not at Facebook):
I had a quick look through Dreamwidth's "latest" page (https://www.dreamwidth.org/latest) earlier today, and a major portion of the posts on there were blatant spam for things like credit card scams, "Work from home and make $1000/day!", and so on.
You seem to be hosting a lot of spam, and those spam posts are also far more likely to be getting linked externally on sites like Facebook, since that's the reason they're being created.
Because Dreamwidth is effectively free website hosting along with a free new subdomain for each account, blocking individual subdomains is futile, and it's difficult for external sites to distinguish between spam and legitimate blogs.
I'm sure Facebook will unblock you fairly soon, but unless you get the spam on Dreamwidth under control, this will probably happen fairly often with different sites blocking it. It would be easy to end up with an impression of Dreamwidth being a spam-hosting site, and decide to block it (either manually or automatically).
Blogspot has always been in a similar situation and would get blocked from a lot of sites due to the sheer amount of spam it hosts.
You're definitely right -- this is an issue. I could very well believe that we tripped some FB spam measures.
We have a very manual anti-spam process right now that relies on humans to detect it and action it. We have a couple of very dedicated folks who end up looking every few hours, but it's not automated, and we don't have full timezone coverage.
It's definitely something I'd like to see us improve, but we've been focused on other projects (like switching from mid-90s HTML to a responsive design, which is a slow rewrite of the entire site). That said, if you have any advice on reasonably scalable ways of doing this in-house that don't involve sending our user content to a third party, I'd love to take any recommendations!
Feel free to email me, mark@dreamwidth.org, if you would rather do that. And if not, don't worry about it, I appreciate the comment anyway :)
The simplest spam filtering algorithm would be a naive Bayes filter. It essentially keeps a count of words that appear in all posts, words that appear in spam posts, and words in non-spam posts. Those counts plus Bayes' rule will let you figure out the probability of spam given a word. It's called naive Bayes because you assume each word in a post is independent of the others, so the probability that the whole post is spam is just the product of the per-word probabilities.
The nice thing about this is it's pretty computationally light and straightforward to implement for any language. I have no clue as to your stack, but if you have Python on your backend then sklearn is a good library that has a naive Bayes classifier (plus a lot of other better options). Any post with a high probability of being spam I'd automatically flag and by default just remove, with the option for a user to ask for manual review. The main thing you'd need for this or any fancier approach is some dataset of spam/non-spam posts. If you have an easy way of retrieving past posts that were labelled spam, that should allow you to make a fine dataset. If you don't want to train on your own users' posts (although the only information kept here is word counts), you can look online for spam datasets and use one of those to train your classifier.
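A toy version of that word-count scheme, self-contained in the standard library (the training posts are made up, and a real filter would want smoothing choices, tokenization, and priors tuned to your data):

```python
from collections import Counter
import math

def train(posts):
    """posts: list of (text, is_spam). Returns per-class word counts and totals."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in posts:
        for word in text.lower().split():
            counts[is_spam][word] += 1
            totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Naive Bayes with add-one smoothing, assuming a 50/50 prior.

    Sums log-odds per word (the 'naive' independence assumption),
    then squashes back to a probability.
    """
    vocab = set(counts[True]) | set(counts[False])
    log_odds = 0.0
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_ham = (counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_spam) - math.log(p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Tiny invented training set:
posts = [
    ("make money fast work from home", True),
    ("earn cash instantly click here", True),
    ("photos from our hiking trip", False),
    ("recipe for sourdough bread", False),
]
counts, totals = train(posts)
print(spam_probability("work from home and earn cash", counts, totals))  # well above 0.5
```

Working in log space avoids underflow when multiplying many small per-word probabilities, which matters once posts get longer than a few words.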
The nice part is that SpamBayes gives you two numbers, the spam "probability" and the ham "probability". When one of them is very close to 1 (like > .99) and the other is very close to 0 (like < .01), there is a good chance that the message really is spam or ham. And this classifies almost all of the messages. But from time to time you get a message where the numbers are not so clear, or both are big or both are small, and this means the classifier is confused and you really must take a look at the message.
Wow, when this came out (I think this was the ‘original’) it felt quite groundbreaking. Perhaps it was the early 2000s?
Then Google started doing that, or something similar, at scale and has effectively eliminated spam in my mailbox ever since. (With the curious recent exception of some highly similar bitcoin spams.)
Controlling spam used to be about stopping unwanted messages sent to users. Now it has morphed into this idea that every site has the responsibility of content-policing their own users, lest what they publish be used to facilitate spam. Your advice may be pragmatic, but it shows how far we've slid down the slippery slope.
> Now it has morphed into this idea that every site has the responsibility of content-policing their own users, lest what they publish be linked from spam.
Not sure what you mean here. The problem Deimorz was bringing up wasn't just about users writing something, and spammers linking to it. It was that this site was being used to host the spam payloads. By spammers, not by actual users.
And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists. All mail traffic from those IPs, even if legit, would then be rejected by a large proportion of mail servers that subscribed to these blocklists.
Compromised accounts trying to sell bogus Ray-Bans and tagging some friends seems to be a pretty common scam on Facebook. I see it in my feed a couple of times a year.
> this site was being used to host the spam payloads
Calling these "spam payloads" is incorrect. The spam payloads are on Faceboot's servers. These are sites that are linked to by the spam, ostensibly for the purpose of funneling to whatever the spam is trying to market. Trying to police generic web pages, rather than the spam itself, seems like an exercise in futility given the basic philosophy of the Internet.
> And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists
The situation has a similar shape, but there is a distinction as Dreamwidth is not actively sending spam but rather responding to requests from viewers. Still, we can look at the outcome of what happened to the email ecosystem - increased centralization of providers - for a warning of what's to come.
In this hypothetical, the message that is posted on Facebook would just be a link + something innocent that makes people click through. Why? Because the easiest form of spam filtering works by looking at the content. Spamming via a link rather than directly gives this kind of content filtering little to work on.
A typical way to deal with this is to consider domain reputation somehow, if the content contains a link. E.g. trust links to old domains more than young ones. Or trust sites with lots of backlinks more than ones with none.
So an old domain with user-created content, a good reputation, but little moderation or abuse protection turns into a great place to host this data. Eventually links to the domain get flagged one too many times, and it gets blocked.
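A toy scoring function shows the shape of that heuristic (every threshold and weight here is invented for illustration; real reputation systems are far more involved):

```python
def domain_trust(age_days, backlinks, abuse_reports):
    """Toy domain reputation score in [0, 1]; all weights are made up.

    Old, well-linked domains score high; each abuse report drags
    the score down until links to the domain get blocked.
    """
    score = 0.0
    score += min(age_days / 3650, 1.0) * 0.5   # up to 0.5 for ~10 years of age
    score += min(backlinks / 1000, 1.0) * 0.5  # up to 0.5 for a healthy link graph
    score -= abuse_reports * 0.1               # flags erode the accumulated trust
    return max(score, 0.0)
```

This is exactly the failure mode described above: a decade-old domain with lots of backlinks starts near the top of the scale, so spammers hosted on its free subdomains ride that trust until the abuse reports pile up and the whole domain's score collapses.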
I agree that they are not sending spam in this scenario. But neither were the open smtp relays of old. They just passed it through, while allowing the spammers to leech off of the relay’s reputation.
(Just to be clear, I have no knowledge of what happened here in reality. So I don’t know that DW is hosting spam, nor that it was linked to from Facebook. This is just an example of why a domain blocklist might be a totally reasonable option.)
Malware and childporn reduction efforts also often go after the hosts of that content. I'm not sure why calling the folks hosting this stuff what they are is incorrect. Sure, childporn hosts don't necessarily "send" child porn, they just respond to requests from viewers. But they host it.
These scam sites are like that - do you really think you can make $30,000 a week working 30 minutes a day from your home computer if you just send these idiots $25?
You're just listing the earlier stops on the slippery slope. Make hosts responsible for policing information when it's viscerally-revolting child porn. Then make hosts police content when it's directly harmful to people's computers. Then make hosts police content when it's an attempt to scam.
There's already a call to control political information when it has harmful effects on society. Next up is "your website was blacklisted because you allowed a user to link to Plandemic". I agree Plandemic has no redeeming purpose, but censorship is not the answer.
I'm explaining why sites that HOST but do not necessarily send content are blocked.
I've got no problem with their operation, but YOU are going down a VERY dangerous and slippery slope by saying I can't block domains that clearly host trash because they might host something else.
On my network I can block child porn, malware sites, scam sites and even entertainment sites like youtube. If you are running a service that mixes the content together, then you may be blocked by folks (like me) who don't have time to chase down every (free) subdomain you allow scammers to create.
That is my right. Period. Full stop. That is not censorship.
Folks here get censorship confused. The govt does virtually nothing to stop these scam sites - so they are certainly not being censored. I'm fine if govt does nothing, as long as communities of people can block these places.
And yes, if you run a site on the internet and don't make it slightly difficult for scammers to use your site to host crap, then other folks in the neighborhood will move the heck away from you.
> Folks here get censorship confused. The govt does virtually nothing to stop these scam sites - so they are certainly not being censored
It seems like you're getting confused on what censorship is. https://en.wikipedia.org/wiki/Censorship . Censorship can be done by the government, and it also can be done by sufficiently powerful private entities.
Also, nowhere have I argued that anyone shouldn't block whatever they'd like on their personal infrastructure. Although if you do it to your kids, then you are indeed censoring.
Want to personally thank you as a user of DW for a few years that migrated over from tumblr after the NSFW ban - I can't thank you and the team enough for what y'all provide. It's a haven and a remnant of the "old web" that is honestly the one thing aside from my personal webring-esque site that I can /trust/ not to change with trends (whether it be payment trends or idiotic aesthetic changes that end up making more of a mess than not). Having a place for fandom analysis and journal posts and just to exist with some level of privacy is a rare treasure. Glad to see your thoughts on this FB ban.
Super cool to have people with a mission (other than optimizing KPIs for profit). It wasn’t obvious to me how to discover quality content, though; any words on how to use this site or how to think about it? Do we need to know beforehand the people who publish things that we want to follow?
I really like the product you've made, and use it regularly. It might be helpful for your user base and the project as a whole if you responded with a little more than a shrug. Facebook may be more willing to listen to you than to random users in getting this resolved.
As a Dreamwidth user, I would like to use this occasion to express my thanks for this decision. I've seen too many blogging platforms being swallowed by large companies, shut down, "improved" into oblivion by marketers and made completely unusable in the chase of "growth". You're doing it right. Thank you.