Learn Genetics

svara · on May 25, 2020

Commenters here are noting many ways in which CS and biology are different, and how computer analogies can break down.

I've got a PhD in biology, and have been into computers all my life. I write code as a biology researcher every day.

To me, there's a much more practical level to this than that of philosophical questions on how far analogies take you. Biologists and computer scientists learn, in their studies and through lots of experience, a different mindset about how things work.

As a computer scientist, finding a solution to a problem, or predicting how a system will behave, is ultimately just a question of having a deep knowledge of the system, plus being a little bit smart about using that knowledge.

As a biologist, having a deep knowledge of what you are dealing with plus being a little bit smart is just a starting point for formulating hypotheses that you will then still need to test. Every biologist knows in their gut that a plausible story is ultimately just that. It's not a proof of anything, just a starting point.

This runs really deep and can make communication between people who aren't aware of this difficult. I see this in comments here all the time, where someone has read up on a little biology, and then goes on to explain that, therefore, clearly this or that has to be true. Usually that makes me go: "Yeah, maybe. But what about all these other things you didn't consider? And what about all those things that literally no one in the world knows you would need to consider in this particular case?"

Anyhow, I think it's still productive to try to find simple physicsy explanations in biology. Sometimes it does work, and then you get things like PCR or gene editing... ;)

valarauko · on May 25, 2020

I too have a PhD in Biology, and also write code for biological problems.

I would suggest that the distinction approaches to thinking you're seeing is rather between theorists and application driven people, regardless of domain. The same difference would apply between, say, an astrophysicist, and an engineer who develops a space telescope. People at the cutting edge of research appreciate that they're at the 'edge', and that their deep domain knowledge can only ever serve as a starting point for answers that may not even exist. Scientists working in applications aren't interested, nor need to worry about the ontological limitations of deep knowledge. If, for example, my concern is development of ML based high throughput cancer screening, I already know this is a doable problem (with varying levels of success), and just requires my deep knowledge + smarts. If instead I'm a researcher working on a unified theory for all cancers, I would (ideally) be self aware enough that this is something that might not even exist. I imagine the same dichotomy exists in CS between researchers (i.e., cryptography, AI, quantum computing) and developers.

lowdose · on May 25, 2020

Are you willing to consider the opinion that CS people can make a similar mistake with out of field problems as people with a PhD in Biology make about CS problems?

sn9 · on May 27, 2020

(DISCLAIMER: The following is based on my own experiences and may not agree with your own. These are just things I've personally observed to be true. And of course these are just tendencies rather than absolutes.)

Do you often see biologists without CS backgrounds making those kinds of mistakes about CS topics that people with CS backgrounds make about biology topics?

I also have a background in CS and biology, with some math and physics, too. I've also observed the tendency of people mainly trained in formal sciences (e.g., CS, math, logic) who's schooling involves a large amount of deriving things from first principles, to make this mistake (this is also true of more mathematically inclined physics majors and certain flavors of economics majors). They often think that what they know is enough to derive a novel insight from first principles, when the further you get from physics, the less true that becomes as the nonlinearity and sheer complexity of the world starts to interfere.

Different areas provide different ways of thinking with different strengths and weaknesses. They aren't mutually exclusive in the sense that learning one makes it harder to learn the other, but they require a non-trivial depth of study to pick up so most people tend to get mentally siloed unless they either study one of those other fields or somehow pick it up through a more non-traditional route (which absolutely happens, but is less reliable).

If you want an example of the type of things biologists tend to be weak at, I'd say quantitative thinking (at least relative to other sciences). Biologists tend to be the most math-phobic of the natural sciences, so most have a mental ugh field around most math. You'd be shocked how many grad students can't do some fairly basic stuff. Many undergraduate programs barely require calculus, though that's slowly changing, and you'll often get some exposure to some elementary probability and combinatorics in your introductory genetics class. And it's shocking how many people with whom I studied evolution and comparative physiology and anatomy didn't come away with some degree of probabilistic intuition.

Other fields I've studied to variously minor degrees that train particular mental habits or develop particular skills and perspectives that I've found valuable are psychology and the study of cognitive biases, cognitive science, computer programming and software engineering, chemistry and biochemistry, literature, economics (both the traditional kind and the more modern behavioral kind), probability, game theory, history, anthropology and a few others.

Fields I suspect train other mental habits/skills/etc. that I lack and haven't yet studied include poetry, martial arts, dance, jazz and/or improv, music theory and music in general, drawing, deeper dives into the topics I've already encountered, and probably a ton more that I'm can't remember.

Really the more wide ranging your curiosity is, the more well-rounded you become. And since most people don't bother leaving their silos (maybe a handful of others at most), you can after many years start to put together all sorts of insights that others find non-obvious (though they will still rarely be novel).

(I think nowadays people might call these "mental models", but learning about mental models directly through a description in a listicle has always seemed less useful than studying the fields themselves and indirectly building the mental model yourself.)

valarauko · on May 27, 2020

As a bioinformatician, while I largely agree with you, it's worth nothing that it's not that biologists are math-phobic or bigoted about the value of numbers - it's that biology has undergone the most radical changes in recent years of any of the primary sciences. A helpful metric (perhaps apocryphal) that I heard at a talk is that the total amount of knowledge in the sciences doubles about every 10 years, in biology every five, and in genetics every two years. Most biologists working today were trained before the genomics revolution, and have not developed the mental heuristics around it. In comparison to today, the genetics of 25 years ago feels practically paleolithic, and it's a comparison this stark is hard to find in any of the other sciences. Genomics in the early years was also an unregulated Wild West, with lots of speculative studies and predictions that never panned out. The field matured rapidly, but as a consequence experimentalists are vary of "predictions". For example, despite intense scrutiny, about a third of E. coli genes remain uncharacterised, and we don't even know if they're really genes, or an artefact of the gene prediction process. As a bioinformatician, the barrier to entry for predictions is quite low, and experimentalists are understandably cautious. It's also a case of moving goalposts - they get used to and begin to accept computational predictions from one domain, and meanwhile whole new fields of computational biology have opened up. It'll take a while for them to accept them.

lowdose · on May 28, 2020

Sounds like a great time to be alive!

valarauko · on May 28, 2020

It is! If you look at some of the brand new labs being established by freshly minted assistant professors, many of these fields didn't even exist 5-10 years ago. Machine Learning has just begun to percolate into Biology, and we are on the verge of major shifts in the field.

It's also not just computational biology that's booming. Another of the fields I follow, population genetics, has been practically rewritten in the past decade, thanks to improved techniques extracting Ancient DNA. When I started my PhD, the idea of extracting and assembling the genome of a Neanderthal was a distant pipe dream. By the time I finished my PhD, we have multiple Neanderthal genomes from across Eurasia, and the discovery of a previously unknown human ancestor, sequenced entirely from a single finger bone in a Siberian cave.

vikramkr · on May 25, 2020

This looks like a pretty awesome resource. I think it's well worth it to spend some of your quarantine time on learning the basics of the biology you need to know to understand what's actually going on with the virus that's the reason you're in quarantine in the first place. It's not super productive to try and understand this stuff through analogy to computers and algorithms. Assuming DNA works just like code amd that cells are biological computers turned out to be a very poor assumption indeed, as evidenced by how much less we understand after sequencing the human genome compared to what we thought we would have known. It's also why the secrets of the coronavirus didnt reveal themselves immediately after sequencing it's genome. See the epigenetics section for a bit of an understanding of the layer of complexity that lies on top of DNA, and let's take a moment to appreciate the biochemists and the protein biologists unraveling what makes SARS-COV-2 tick.

If you want to really grok genetics and be able to understand and interpret news and discussion about the field, especially considering how important the field is in our day to day lives, both with the virus and with biotech/medicine in general.

x3blah · on May 25, 2020

"It's not super productive to try and understand this stuff through analogy to computers and algorithms."

You mean like this: https://ds9a.nl/amazing-dna

Having worked in both industries I prefer working with wet science people. For some reason they generally have a much healthier perspective on life. Their work is humbling because it is, and will forever be, full of unsolved mysteries, not simply because it is challenging. The other folks, whether they call themselves "scientists" or "engineers" or "developers" or "coders" or whatever, are working with something that as far as I can see has no inherent connection to the natural world, other than being a production of the human mind. Perhap that affects the perspective many of them have on life. For example, how common among them is this belief that all things, not simply computers, can be thoroughly understood and mastered. Note this is pure opinion, not fact, and I am generalising; there are exceptions to every generalisation.

rtkaratekid · on May 25, 2020

I moved into programming from Neuroscience. The first thing programmers do when they learn this is talk to me about how their neural network does xyz.

I don’t know if it’s ignorance, naïveté, or hubris, but it’s amazing to me that these programmers think the world/universe/reality is a complex problem that could be easily understood eventually. When working with “wet” scientists I found that attitude was almost non existent. The complexity is just so high and there are so many unknowns that many of them are very comfortable saying “I don’t know” or “we may never know.”

One of my favorite examples to give is when I was still in undergrad, endocannabinoid research was getting hot in the Neuroscience field because it challenged the mental model that neurons communicate in a “linear” or “feed forward” fashion. Are neural networks going to implement that? Probably not, and it’s probably not worth it because at this point it introduces unneeded complexity. Try replicating the biochem of an entire cell for each cell in a NN and you _might_ be half way to achieving the complexity of the human brain.

I’m not saying this is impossible, all I’m saying that I find it remarkable how quickly programmers seem to think of themselves as “expert” on outside fields as if they’re the smartest people in the world. I will say, crusty old systems programmers tend to have more of the familiar characteristics of when I was in life sciences (Neuroscience and Genetics).

eythian · on May 25, 2020

As someone who has/does dabble in genetic algorithms and neural networks, it's always wise to keep in the back of ones mind that these systems are inspired by biology, and not generally an attempt to replicate it[0]. It's also often useful to go back to biology with an eye for ideas to steal, but rarely are they are useful model to inform biology or biological understanding.

As an anecdote, I once had a summer project between the neuroscience and computer science departments at my university. They had data from rat brains that they'd potentiated parts of (basically, zapped some neurons so they connection weights (in NN terms) got messed up and were sending too-strong signals to their neighbours), and how that potentiation decayed over time. They got me to attempt to reproduce it in a neural network. So, I build an NN system with the ability to have neurons zapped and managed to reproduce their results. But NNs being a very abstract model of a set of neurons, there are a lot of parameters that can be twiddled. By making fairly small changes to those parameters, I managed to get the exact inverse of their results also.

[0] this applies both to computer scientists building them, and also to biologists looking at them and going "that's a really poor attempt to be a brain, look at all the things it's missing."

vikramkr · on May 25, 2020

As usual theres a relevant xkcd: https://xkcd.com/1831/.

I think the level of control programmers have over their domain naturally gives rise to that sort of overconfidence. You need to remember that computer systems are built on human made abstractions to human standards and follow human defined logic. DNA is not code, it's just a molecule that reacts with stuff, as are all the other molecules. They exist as they are and are their own system that needs to be understood, we did not create that system. Chemistry and probability and time did.

x3blah · on May 26, 2020

"DNA is not code, it's just a molecule that reacts with stuff, as are all other molecules."

This sentence exemplifies the perspective to which I referred.

sankha93 · on May 25, 2020

I am from a computer science background. I understand basic biology and genetics. I have been trying to understand what are reasons why code and that cells are biological computers is a poor assumption. Anecdotal evidence like the SARS-COV2 you mentioned or things I hear from biologist friends mostly along "it is not so simple". Are there good studies that shed light on what are the missing pieces and how can we simulate/model biological processes better?

pugio · on May 25, 2020

Okay, imagine Mel, of hacker lore [0] had several billion years to write the HumanOS program...

The point is that biology is ridiculously, ludicrously, compressed. Reading a basic biology book introduces you to all of these wonderful and seemingly complete abstractions: DNA blueprints, RNA messengers, information transfer into assembly units constructing little protein machines... at least that's how we wish it would look, and how we abstraction-craving mortals would like it to go.

But Melvolution is parsimonious - it sees a region of DNA and says "well sure that section encodes one gene, but if I bump the read head up by one and start halfway through I can magically read a whole other sequence for this entirely different task. Oh and that RNA you thought was for message transfer, well turns out that the right message can cause the thing to fold up and act sort of like a protein, so let's use that too. And sure this repeated section looks like uninitialized memory "junk" DNA, but it's too much work to take out, so let's arbitrarily read from addresses 12, 42, and 107, and stitch that information into a contiguous unit. Except that every once in a million times the read head can slip and start reading from location 14 instead of 12... and that possibility is __important__ because if you take it out the whole system crashes.

Every possible quirk of chemistry and physics is ruthlessly exploited again and again and again in a million simultaneous ways. Talk about leaky abstractions.

(Not to mention that we still can't reproduce the algorithm reality uses to compute this stuff. It takes a super computer hundreds of hours to simulate a reasonably okay protein fold (which happens in a cell in a fraction of a fraction of a second) - and even then we get it wrong most of the time. )

[0] http://www.catb.org/~esr/jargon/html/story-of-mel.html

acqq · on May 25, 2020

And the biggest magic that allows for all that complexity to arise is: everything happens in parallel and at the scales that are beyond any intuition of humans. We already know that even extremely simple rules can produce extremely complex-appearing artifacts, like linear congruential generators, fractals, automata, Conway's game of life.

Now, everything is being generated all the time in all the places, in immense amounts of cases, and during billions of years. The results are extremely complex, but those results that we are aware of are only those that survived all the competition and we also see them only as the aggregates.

Humans have problems even just to imagine the exponential growth, because even that is beyond our intuition. In nature, a lot of stuff grows exponentially as long as the resources aren't depleted. That's how the latest pandemics also started to grow, before we limited the possibilities of spreading by the physical separation of human carriers.

tragomaskhalos · on May 25, 2020

This is very nicely put. Another problem is that the simple genetics we learn at school is about things like alleles for eye colour which, un-nuanced, leave the mistaken impression that the whole genome implements a similarly simple mapping to phenotype.

AareyBaba · on May 25, 2020

"It is not so simple"

These are the biochemical pathways that we know within a cell which give a picture of cellular complexity. Zoom out to see the high-level block diagrams.

http://biochemical-pathways.com/#/map/1

http://biochemical-pathways.com/#/map/2

acqq · on May 25, 2020

The background about these posters:

https://www.roche.com/sustainability/philanthropy/science_ed...

The book (2012):

https://www.amazon.com/Biochemical-Pathways-Biochemistry-Mol...

From the the accompanying text of the paper posters (all emphasis mine), it can be seen that the paper posters showed just a convenient selection of all the relevant knowledge:

"In the wall chart “Biochemical Pathways” the following principles were applied:"

"The selection of reactions has to be made arbitrarily. Of course, no discussion is necessary about e. g. glycolysis, protein biosynthesis and other central reactions. Peripheral reaction pathways are preferably selected if they are of high interest in biochemical, medical or biological research (receptors, vitamins, antibiotics, compounds of importance in regulation etc.), if they are of interest in medicine (e.g. blood coagulation, complement system), if they lead to important end products (e.g. microbiological fermentations) or if they enable comparison of phylogenetic development (anaerobic/aerobic respiration or photosynthesis in various species).

Some indication on the degree of selection can be taken from the fact that in the present “Pathways” about 1000 enzymes are shown, while the 1984 “Enzyme Nomenclature” with its 2 supplements names 2859 enzymes. Estimations of the number of proteins (with and without enzymatic activity) in a single mammalian cell are in the order of magnitude of 30000."

"e) In general, we desisted from showing detailed reaction mechanisms. Only in cases where discrete steps (e. g. in multi-enzyme systems) are involved or well-characterized intermediates exist, single steps are given. The same holds true for receptor-activation steps etc.

f) The interrelationships of metabolic pathways cause the biggest technical problem of graphical representation. Since many compounds take part in various pathways, one would obtain a “spider web” of lines criss-crossing the whole chart. In order to avoid this, one has to “cut” connections. The respective compounds, which reoccur in other places of the chart, are written here in sharp-edged boxes."

smallnamespace · on May 25, 2020

Computer code is written by humans and based on abstractions that are designed to be comprehensible to the human mind. You can say that computer programs are memes that undergo two types of selection pressure: they need to be understandable to both humans and computers.

Genetic code only needs to work, therefore it’d be surprising if neatly grokable abstractions fall out of the system.

akiselev · on May 25, 2020

I disagree with the other commenters: code and computers are a great analogy for biology. The first poor sod we'd call an "engineer" started writing the code of life four-ish billion years ago and a trillion trillion trillion engineers followed, giving new meaning to "reverting to the mean." Lacking any kind of version control or even a method of communication with adequate error correction, they kept copying the software over and over again, each one modifying a few bits at random, until their were trillions of tiny variations all competing for attention. The only commonality between all the engineers was nucleotides and amino acids - hardly a universal language capable of supporting comments - and a few billions years later, these nerds discovered sex and a new level of technical debt was born.

Here we are, a few mass extinction events and genetic bottlenecks later, trying to decrypt code with no history because it has a half life of a few hundred years.

Oh and the worst part? The computer architecture can only be programmed using a bootstrapped compiler - and we've lost billions of years of releases. That's why every program basically looks like a chicken before linking.

acqq · on May 25, 2020

Even if that narration is something easily relatable by software developers, the simplification is too big to allow the reasonable comprehension of the subject.

So I still suggest anybody interested to really try to learn more about the actual science topics, instead of comforting themselves falsely believing they "understood" anything.

For the start, I would suggest the nicely produced courses:

https://www.thegreatcourses.com/courses/biology-the-science-...

https://www.thegreatcourses.com/courses/understanding-geneti...

Just to illustrate how non-intuitive our "common sense" is, the current estimate of the number of human cells in human body is 30,000,000,000,000 (30e12). The current estimate of the number of bacteria in human body (in the mouth and guts) is 3 times more.

The size of human genome (present in each human cell) is around 3 billion (3e9) base pairs (encoding information units of DNA https://en.wikipedia.org/wiki/Base_pair ). But not all the information is in the DNA alone.

The biochemical reactions happen in parallel even when a single cell is observed.

The reconstruction of gene transcription, also played in the speed it actually happens:

https://www.youtube.com/watch?v=7Hk9jct2ozY&t=248

rumcajz · on May 25, 2020

I agree that the analogy is fruitful, but you have to look at a big ancient codebase full of spaghetti code, dead code, code that nobody knows about and so on. Parts are in COBOL and assembly, parts are in JavaScript. There are shims on top of shims and a lot of mutually interacting half-assed attempts to rewrite the codebase. With that kind of system you get a glimpse at how a biological system looks like.

sn9 · on May 27, 2020

It's more like zillions of lines of uncommented assembly written by exponentially more programmers who each try to execute the code against a particularly brutal test harness.

TeMPOraL · on May 25, 2020

As a programmer with a side interest in molecular biology, my take is this:

Yes, protein synthesis via RNA translation, as explained in high-school biology, does look like reading Assembly opcodes with 3-pairs-long window, with some opcodes being redundant (translating to the same protein). But then you learn (probably not in school) that they're not redundant at all; some organisms (like bacteria) actually translate RNA with different offsets of the read window, so the same RNA string will code for different proteins simultaneously. Sure, we did things like these back in the heyday of the industry, but that's just the tip of the iceberg.

Then you discover things like epigenetics, or that protein functions are determined by how they fold (something we can't simulate just yet), or that horizontal gene transfer (direct exchange of genetic material between cells, instead of through reproduction) is not only a thing, it's a very important (if not the primary) way by which bacteria evolve. You zoom into how electron transport chains work in chloroplasts (i.e. how cells power themselves by light), and you see a series of complexes that are tuned in dimensions to enable quantum tunneling of the electrons. And now think of how all of these is like a bag of sand - everything is there next to each other, bumping against each other all the time, and not only it works this way, a lot of things in the cells depend on that random walk to work.

In the end, I think our programming experience is useful for viewing some of the biology in terms of high-level systems and their interplay. On the mathy side, perhaps the fundamentals of information theory can inform some aspects of biological research. But beyond that, these things are just different. It's like you took a piece of code, run it through an optimizing compiler, and then through a magical demoscene compressor that makes the code self-modifying, re-encodes opcodes on top of each other by exploiting misaligned reads, makes the binary use PC register as arithmetic input everywhere, and ensure lock-free parallelism by abusing delays from cache misses as synchronization mechanism. And then you give that to someone else to reverse-engineer. That's like 1% as difficult as the stuff biologists have to deal with.

Jabbles · on May 25, 2020

A better analogy would perhaps be studying how a machine-learning model works, once it has been fully trained.

Code is written by humans for humans. The way models are trained has some similarities with the way evolution has created our biology.

yrb · on May 25, 2020

The mismatch is expectations about what would be known from sequencing DNA doesn't speak to the validity of the analogy but to the poor mental models about the reality of code, computation and systems we have.

Though I agree that the analogy doesn't really buy you much useful leverage when seeking useful understanding.

lend000 · on May 25, 2020

Indeed. Turing arguably had a better intuition for how developmental biology works than the status quo of most "experts" I read 80 years later [0].

[0] https://en.wikipedia.org/wiki/The_Chemical_Basis_of_Morphoge...

Qision · on May 25, 2020

Thanks a lot for this! Could someone recommend me a good book to start learning cell biology and genetics? I study physics and I'd like to learn about biophysics.

AareyBaba · on May 25, 2020

The Molecular Biology of the Cell is the standard text. https://www.amazon.com/Molecular-Biology-Sixth-Bruce-Alberts...

But you'll probably benefit from taking an Edx course https://www.edx.org/course/introduction-to-biology-the-secre...

villedepommes · on May 25, 2020

Do I have to brush up on inorganic and orgranic chemistry in order to get the most out of this book?

AareyBaba · on May 25, 2020

You should be able to get through with basic high-school chemisty. You need to know there are elements C, H, N, O, P, Ca, K, S, know what an ionic bond and covalent bonds is and be able to look at the structure of a molecule.

villedepommes · on May 26, 2020

thanks!

mjg59 · on May 25, 2020

Molecular Biology of the Cell is a wonderful book (my first PhD advisor was one of the authors of the 3rd edition), but if you're more interested in the purely genetics side of things then I'd recommend Genes by Lewin - it's definitely the book I relied on most during my undergrad degree, and it's written in a way that largely lets you bootstrap from not having a strong biology background.

ramraj07 · on May 25, 2020

Others recommend MBoC, I personally liked Lodish better. More importantly, cell biology is one of those subjects that can actually be fairly light reading. It doesn't burden itself too much with arbitrarily named cytokines, or the astonishingly complicated biochemical pathways and enzyme names. The processes that govern each aspect of the cell are surprisingly very unique so each chapter is a veritable cornucopia of amazing methods our cells use to solve their problems.

The pretty pictures don't hurt either!

sudokill · on May 25, 2020

Is there are other good resources like this? I am definitely going to read this.

Rochus · on May 25, 2020

Great material, thanks for the hint.

Unfortunately some of the animations require Flash or other plugins which are no longer common.

cameronbrown · on May 25, 2020

I like this. On a more meta note, does anyone have any generalist advice on picking up the basics of a new field? For instance, I'm looking into hobbyist electronics, but it's hard to know where to start.

chrisco255 · on May 25, 2020

https://hackaday.com/ is a great community for learning hobby electronics.

hemdawgz · on May 28, 2020

For me it involves scrounging around for anything related as a jumping off point, and sniffing for important keywords and concepts to look up and branch out into learning. Usually it takes me a few days of exploration before I stumble upon a really good or authoritative resource of some sort, and I end up mapping out the internet locations of related communities and resources in the process.

Of course, it's much faster when having experienced friends or acquaintances to help you navigate.

yters · on May 25, 2020

People say DNA is not like computer code. However, it seems that people are really saying it is not like human written computer code. Otherwise, it is still a digital code that is turing complete.