That's not a shallow dismissal. Fundamentally, what I read was a cohort size of 11. One died. No control group. No double-blinding (since there was no control group).
There's no aspect of this that would be statistically distinguishable from noise.
Then again, the original study suffered from this as well.
n=11 may be succinct, but it is not shallow. There is a world of statistical design of experiments (DoE), analytics, etc. behind it.
My concern is the rush to headlines about studies without any real resolving power. The bad part is that we don't have time to do proper studies (those take months and many people to make work). The n=11 comment actually implies this issue.
Sort of like -1 = exp(sqrt(-1)*pi), you don't need many letters to convey a tremendous amount of meaning.
Somehow the hivemind decided a few years ago (it would be interesting to trace how and when) that every study with a small sample size is worthless and merely mentioning sample size is enough to cancel it, and whoever is first to bring this up in a thread wins. It's a gotcha response.
Obviously there are statistical concerns with small samples. Equally obviously, not every such study is worthless, and the authors are not all idiots who know less than any internet commenter. So to get substantive discussion, we need to engage with the specifics of a particular study, not post a generic dismissal. "n = 11?" looks specific but it isn't. It's a template instantiation, as stock as "Correlation is not causation".
I don't mean to pick on the GP commenter. Maybe they had something else in mind. The issue is how such a comment plugs into a known shallow-discussion dynamic. The problem is at the group level, not the individual, and we all suffer from this reflexiveness, so none of us gets to feel smug.
That would be just as shallow. The issue isn't brevity, it's genericness. Are we really making the assumption that every study with a small sample is worthless, and that our reflexes know better than the people who worked on it? Obviously we need to do better than that. Generic discussion is shallow discussion (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).
Here's a comparable comment that at least begins to engage with the specifics: https://news.ycombinator.com/item?id=22798031. I don't know if it's a good point, but it's at least actually about this particular study. We wouldn't moderate such a comment.
I'm not sure I agree with this. Would you feel the same way if the sample size were 1? Or 5?
11 is very small, and useful only for detecting very, very prominent effect sizes, regardless of domain. A comment saying n=11 is not a novel insight, but I believe it is useful, similar to the people who add "in mice" to articles whose headlines might imply the test was on humans. I appreciate those comments.
In this case I think it is fair to conclude that the headline "Severe Covid-19 Cases Don't Respond to..." is not an accurate takeaway, purely because the sample size is so small. With this sample size you can, at best, conclude that these combinations of drugs don't reliably cure the disease in severe cases, which is a very different claim.
Why not? It depends on the study. I agree with you about misleading headlines and the mice thing. It's an interesting point to compare.
While I have you: could you please stop creating accounts for every few comments you post? We ban accounts that do that, and in fact this account was banned. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.
You needn't use your real name, of course, but for HN to be a community, users need some identity for others to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...
Imagine a treatment that would bring people back to life. Surely if you saw that 11 times, you would believe it worked. In fact, after n = 1 you would be so amazed you'd be thinking there was a trick. The point is that, while N is important, the N needed to uncover a signal differs wildly with the outcome you're trying to affect, specifically with the outcome's variance. If one of these drug cocktails 'works' (which has many different definitions in clinical research), the signal may be obvious enough that n = 11 is enough to home in on a promising treatment for further study. It's complicated.
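To put rough numbers on that idea, here's a minimal power-analysis sketch in Python with statsmodels. The 50% baseline mortality and the two treatment effects are made-up illustrations, not figures from this study:

    # Sketch: sample size needed per arm depends enormously on effect size.
    # Hypothetical numbers: 50% control-arm mortality, two candidate treatments.
    import math
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    solver = NormalIndPower()
    for p_treated in (0.45, 0.05):  # mortality under treatment
        h = proportion_effectsize(0.50, p_treated)  # Cohen's h for two proportions
        n = solver.solve_power(effect_size=h, alpha=0.05, power=0.8)
        print(f"50% -> {p_treated:.0%} mortality: ~{math.ceil(n)} patients per arm")
    # ~1565 per arm for the modest effect, ~13 per arm for the dramatic one.
    # A near-miraculous effect is almost within reach of n = 11; a modest one is hopeless.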
I can only think in the context of A/B testing, but I really don't get it. Can anyone explain how you can have any confidence with these small sample sizes?
Imagine you have a purchase form that converts at 0.1%. You test a new change, and with the first 22 users (11 in each group), if the control group randomly gets a single conversion, it will show AT LEAST a ~9% conversion rate (CVR). Okay, maybe you ignore that data, since you can just compare to historical data. Whatever.
Even if the new form is dramatically better and has a true 30% CVR, there's a ~2% chance it shows zero conversions out of 11, and a ~9% chance it shows exactly one -- the same ~9% CVR the lucky control just posted. The single most likely outcome, 3 conversions out of 11 (~27%), only happens about a quarter of the time, so any one run is a noisy estimate of the "true" CVR.
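To sanity-check those numbers, here's a quick sketch with scipy (the 30% "true" CVR is just the hypothetical above):

    # Sketch: what 11 users can show when the true conversion rate is 30%.
    from scipy.stats import binom

    n, p = 11, 0.30
    print(binom.pmf(0, n, p))   # ~0.02: shows a 0% CVR
    print(binom.pmf(1, n, p))   # ~0.09: shows ~9%, tying the lucky control group
    print(binom.pmf(3, n, p))   # ~0.26: the single most likely outcome (3/11 ~ 27%)
    print(binom.pmf(10, n, p) + binom.pmf(11, n, p))  # ~5e-5: a 91%+ CVR is vanishingly rare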
Unless hydroxychloroquine is close to 100% effective (and that doesn't seem to be what any of these studies say), I just don't understand how there's any confidence with small sample sizes.
Does anyone have a link to how this works? I've asked a few friends in medicine, and they haven't given a good answer. Haven't found anything on Wikipedia. Probably don't know what to search for.
You're thinking about things correctly. The statistical and scientific methodology of clinical trials is by and large the same as for A/B tests. I don't think anyone is drawing much from this sample of n = 11 except to say it's definitely not a miracle cure. No approval would normally be given to a new treatment on the basis of a study this size.
If an A/B test took a form from almost 0% conversion to almost 100% conversion, you might only need 11 users to show that. But even in that case, the devil is in the details. Were your users really a completely random sample from your population, etc?
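For instance, a minimal sketch of that extreme case (a hypothetical 0/11 vs 11/11 split) with Fisher's exact test:

    # Sketch: an extreme effect is detectable even with 11 users per group.
    from scipy.stats import fisher_exact

    table = [[0, 11],   # control: 0 conversions, 11 non-conversions
             [11, 0]]   # variant: 11 conversions, 0 non-conversions
    _, p_value = fisher_exact(table)
    print(p_value)  # ~2.8e-6: far below any conventional significance threshold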
The other side of the coin is, can we say these are uninteresting treatments with only 11 subjects? Or are the studies too underpowered to say much at all?
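One way to quantify "underpowered", as a sketch: the smallest effect a two-arm trial with 11 patients per arm could detect at 80% power. (This study had no control arm at all, so even this is generous.)

    # Sketch: minimum detectable effect size for 11 patients per arm (alpha = 0.05).
    from statsmodels.stats.power import NormalIndPower

    h = NormalIndPower().solve_power(nobs1=11, alpha=0.05, power=0.8)
    print(h)  # ~1.19 in Cohen's h units -- a huge effect, e.g. mortality
              # falling from 50% to roughly 4%. Anything subtler is invisible.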
I don't think your scenario is as different as you think.
Depending on what we mean by "back to life", a trick is in fact much more likely than something thought to be biologically impossible.
If we eliminated the possibility of a trick -- that is, proved beyond a doubt that 11 people were dead, and then alive -- I would absolutely not be ready to attribute it to some new treatment. Natural law is being violated here! I would instead start checking for other possibilities, like:
- Are other people spontaneously reanimating around the world, or just in this study?
- Did these 11 people have anything else in common? Extraterrestrial origin, perhaps? Unusual religious beliefs?
- Hang on, let's double (triple, quadruple, etc.) check that they were really dead. And then go back and check again.
- And by all means, start a proper clinical trial of the new treatment while we're trying to understand how resurrection is even possible.
That wouldn't really be n=11 because there is a huge implicit control group of people who haven't come back to life, and if you do the math it would very strongly reject the null hypothesis.
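Roughly, assuming for illustration a baseline spontaneous-resurrection rate of one in a billion:

    # Sketch: 11 resurrections out of 11 against a (hypothetical) 1e-9 base rate.
    from scipy.stats import binomtest

    result = binomtest(k=11, n=11, p=1e-9, alternative="greater")
    print(result.pvalue)  # = (1e-9)**11 = 1e-99: the null is rejected overwhelmingly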
Is it too morbid to joke about it actually being a great time to get a lot of samples for a study?
I mean, that plus Zoom and you're like halfway to a world-wide sample size, more-or-less
On a serious note:
I'm extremely grateful for everyone who's working to solve this big and complicated problem, and (like the other poster said) it's entirely reasonable that it's really difficult to get larger studies going.
I don’t believe it. Studying how to effectively treat coronavirus patients is literally the most important problem in the world right now, and I’m sure a lot more than 11 severe cases are being treated with hydroxychloroquine.
While your point is a good one, meaningful clinical research is hard enough under normal conditions. Forms, enrollment, protocols, data collection, statistical analysis plans, reporting, etc. It's an entire industry used to operating on timescales that often span years for a single meaningful study.
And I agree with that as far as it goes, but I think it reflects the problem - clinical research contains a bunch of hurdles that have nothing to do with uncovering the truth. So in a sudden emergency, where clinical research starts to diverge from the treatment decisions of doctors and the drug supply actions of governments, I'm not going to put much weight on clinical research.
Generally speaking, a criticism of a study that focuses only on something like this is uninteresting -- while it _may_ be a valid criticism, it is not always one (many valid studies have very small samples!), and so stating it unsupported is actually more like evidence that you don't know how to critically evaluate a study. As Zeynep Tufekci said: 'Anybody who immediately responds to a correlation with “but correlation does not imply causation” probably doesn’t know what they’re talking about. Don’t have much to say, throw around smart-sounding cocktail phrase.'