
> The example code is very simplistic, so of course that linear code is more readable, but the idea doesn’t scale.

One of the best reviewed functions I wrote at work is a 2000 line monster with 9 separate variable scopes (stages) written in a linear style. It had one purpose and one purpose only. It was supposed to convert some individual HTML pages used in one corner of our app on one platform into a carousel that faked the native feel of another platform. We only needed that in one place and the whole process was incredibly specific to that platform and that corner of the app.

You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet, each step had subtle assumptions about what happened before. The moment we would have spent effort to make them distinct functions we would have had to recheck our assumptions, generalize, verify that methods work on their own... For code that's barely ever needed elsewhere. We even had some code that was similar to some of the middle parts of the process... But it just slightly didn't fit here. Changing that code caused other aspects of our software to fail.

The method was not any less debuggable, it still had end to end tests, none of the intermediate steps leaked state outside of the function. In fact 2 other devs contributed fixes over time. It worked really well. Not to mention that it was fast to write.

Linear code scales well and solves problems. You don't always want that but it sure as hell makes life easier in more contexts than you'd expect.

Note. Initial reactions to the 2000 line monster were not positive. But, spend 5 minutes with the function, and yeah... You couldn't really find practical flaws, just fears that didn't really manifest once you had a couple tests for it.



I don't know if it is still like this, but the code for dpkg used to be like this, and it was amazing: if you ever needed to know exactly what order the various side effects of installing a package happened in, you could just scroll through the one function and it was obvious.

To this end, I'd say it is important to be working in a language that avoids messing up the logic with boilerplate, or building some kind of mechanism (as dpkg did) to ease error handling and shove it out of the main flow; this is where the happy path shines: when it reads like a specification.
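As a rough illustration of shoving error handling out of the main flow: this is not dpkg's actual mechanism (dpkg is written in C), just a Python sketch using the standard library's `contextlib.ExitStack`. The function `install_package`, its `fail_at` parameter, and the step names are all made up for the example.

```python
from contextlib import ExitStack


def install_package(log, fail_at=None):
    """Hypothetical install flow: the happy path reads top to bottom."""

    def step(name):
        # Stand-in for real work; fails on request so rollback can be observed.
        if name == fail_at:
            raise RuntimeError(f"{name} failed")
        log.append(name)

    with ExitStack() as undo:
        step("unpack")
        undo.callback(log.append, "remove unpacked files")

        step("configure")
        undo.callback(log.append, "restore old config")

        step("run postinst")

        # Success: discard the registered undo actions so nothing is rolled back.
        undo.pop_all()
    return log
```

If any step raises, the registered undo actions run in reverse order and the exception still propagates, so the body of the function stays a plain list of steps.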


I don't think the fact that a function works well is a good enough reason to write a 2000 line function. Sometimes there are long pieces of code that implement complex algorithms that are difficult to break into smaller pieces of code, but those cases are limited to the few you mentioned.


Computers execute code in a linear fashion, why on earth would you "need a reason" to NOT abstract something? Just because abstraction is often the right thing to do doesn't make it the base case.

It's like saying you need a reason not to add 4000 random jumps in your assembly code just to make it more difficult to read...


Source code isn't written to be executed by computers, it's written to be read by other humans.

Source code tends to be very far removed from how computers execute anything, so I wouldn't use that as a justification for any sort of code style.


> Source code isn't written to be executed by computers, it's written to be read by other humans.

It is pronounced "documentation".


> that implement complex algorithms that are difficult to break into smaller pieces of code

My longest code is always image processing. It's usually too hard to break up for the sake of breaking up. There's nothing to reuse between the calls to filters/whatever.


The default should be reversed, don't break into smaller pieces unless there's a really good reason.


>I don't think the fact that a function works well is a good enough reason to write a 2000 line function.

The fact that it works well and reads well (when it does, as in the parent's case), is.

Aside from those factors what else would be against it? Dogma?


I guess all we know is there were 2K lines of code and the commenter thinks that was the right way to do it. It would be necessary to see the code to appropriately critique it.


Not just the commenter, but his team as well. It passed code review with flying colors, apparently. The moral of the story is that there are always exceptions and developers should not be ideologically committed to one approach above all else.


we know more than that: You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet, each step had subtle assumptions about what happened before.

what we don't know is if it would have been possible to abstract those assumptions away so that functions could have been defined without them.


We do know that if we trust the poster, they said very clearly it could have been done but they didn't consider the value to outweigh the downsides.


yes, i meant we don't know if it would have been possible to extract functions in such a way that they are actually safely reusable.


Even the contrived example in the post can be factored differently (and better imo). How do we know those 9 scopes are appropriate?


>The moment we would have spent effort to make them distinct functions we would have had to recheck our assumptions, generalize, verify that methods work on their own

Why? Why can't the functions say "to be used by <this other function>, makes assumptions based on that function, do not use externally"? Breaking out code into a function so that the place it came from is easier to maintain... does not mandate that the code broken out needs to be "general purpose".
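One way to express "do not use externally" in code rather than in a comment, sketched in Python: define the helpers inside the single function that uses them, so nothing else can import or call them. The names `build_carousel`, `split_pages`, and `wrap_slides` are hypothetical and have nothing to do with the original 2000 line function.

```python
def build_carousel(html_pages):
    """Hypothetical linear process whose helpers are unreachable from outside."""

    def split_pages(pages):
        # Assumes the caller already stripped whitespace and dropped empty pages.
        return [page.split("\n") for page in pages]

    def wrap_slides(slides):
        # Assumes the output of split_pages: every slide is a list of lines.
        return [{"index": i, "lines": lines} for i, lines in enumerate(slides)]

    pages = [p.strip() for p in html_pages if p.strip()]
    return wrap_slides(split_pages(pages))
```

The helpers keep their assumptions about earlier steps, but they exist only while `build_carousel` runs, so nobody can be tempted to reuse them elsewhere.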


Specifically, in that place, there was no need. And prematurely splitting it would have caused us to overthink and over generalize. Having a long, linear and tested function was a better choice.


I understand your point, but perhaps that would have simply been an opportunity to refine your approach to code design. If such a situation leads to excessive deliberation and overgeneralisation, your code base must be riddled with unnecessary overthinking and overgeneralisation.


Or maybe it was just a long, sequential algorithm where breaking it up wouldn't have been an improvement.


I have been programming for more than 30 years. Except for code generated explicitly to be only consumed by machine, I've never come across a function consisting of 2000 lines of code that should not have been broken up. Something is wrong there, and if you show me the code, I'll tell you what's wrong with it.


Glad you can see that without even looking at the code.


Some things you don't have to see to know what's going on. Function with 2000 lines of code? Have fun rationalising this.


I worked with an engineer that wrote the most clear and elegant linear code. It was remarkable, never seen anything like it since. I can't reproduce it but I do have an idea of what a well designed linear function looks like.. a story.


I was just thinking that if I _needed_ to refactor this I might structure the stages as chapters in a book. One might be able to write an inner class or some such that had a “table of contents” function that called each stage in sequence as a void function with data managed out of line, maybe via cleverly designed singleton structs. Then the code itself can be written in order with minimal boilerplate between stage boundaries.

I think I’ve worked with some Python that looked and worked this way. I can’t place the details but probably in a processor pipeline running over a particularly hairy data format. Consider ancient specifications written by engineers talking on the phone encapsulated in relatively “modern” but still vintage specifications, sometimes involving screen-scraping a green screen mainframe terminal, wrapped in XML and sent over the internet. Anyway, point is I couldn’t agree more about stories.
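A minimal sketch of that "table of contents" idea in Python, assuming a small shared state object in place of the singleton structs. The `Story` class, the stage functions, and the comma-separated input format are all invented for illustration.

```python
class Story:
    """Shared state for all stages, managed out of line (hypothetical fields)."""

    def __init__(self, raw):
        self.raw = raw
        self.records = []
        self.total = 0


def parse(story):
    story.records = [int(x) for x in story.raw.split(",")]


def validate(story):
    if any(r < 0 for r in story.records):
        raise ValueError("negative record")


def summarize(story):
    story.total = sum(story.records)


# The "table of contents": stages listed in the order they run.
CHAPTERS = [parse, validate, summarize]


def run(raw):
    story = Story(raw)
    for chapter in CHAPTERS:
        chapter(story)
    return story
```

Each stage returns nothing and reads or writes the shared `Story`, so the stage bodies can be written in order with almost no plumbing between them.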


I will agree that it takes some skill, not that I am great at it. It's a different kind of skill than abstraction. Reading error handling in C code offered good insights for me to learn linearity better (C code that uses goto to jump to the end of a function for cleanup when an error occurs, for example).

However, if you screw up linear code, you screw up locally. If you write poor small functions, the rest of the team screws up because they barely ever read the contents of your functions that call other functions that call other functions. I've had way more problems with stuff being called slightly out of order, than with large functions.


That is true of well designed nonlinear code as well. The code needs to tell a story or it will be a mess.


You don't have to write tests to prove that private methods work on their own. Just test the public behaviour.


At first I thought how horrible, but basically you have sort of 9 functions within the same scope, each having a docstring. So I guess not too different from splitting them up.

I read you have "end to end" tests.

One question though: Wouldn't each part benefit from having its own unit tests?


Maybe, maybe not. For our particular case it would have been mostly wasted effort.

I found that I like to write tests at the level of abstraction I want to keep an implementation stable. I'd be totally fine if someone went in and changed the implementation details of that long process if needed. We cared that stuff got cleaned up at the end of the process, that the output matched certain criteria, that certain user interaction was triggered and so on... In that case it made more sense to test all our expectations for a larger scope of code, rather than "fix" the implementation details.

Tests usually "fix" expectations so they don't change from build to build. Tests don't ensure correctness, they ensure stuff doesn't alter unexpectedly.
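A small sketch of testing at that level, in Python. The function `convert` is a hypothetical stand-in for the long linear function, and its stages are deliberately trivial; the point is only which things the test pins down.

```python
def convert(pages):
    """Hypothetical stand-in for a long linear function with internal stages."""
    # Stage 1: scratch state that must not outlive the call.
    temp_files = [f"tmp-{i}" for i, _ in enumerate(pages)]

    # Stage 2: an implementation detail that is free to change.
    slides = [page.upper() for page in pages]

    # Stage 3: cleanup.
    temp_files.clear()

    return {"slides": slides, "leftover": list(temp_files)}


def test_convert_end_to_end():
    out = convert(["a", "b"])
    # Pin the outcome: one slide per page, and nothing left behind.
    assert len(out["slides"]) == 2
    assert out["leftover"] == []
```

The test says nothing about how many stages exist or what they are called, so someone could merge or reorder them without touching it.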


Tests effectively freeze requirements; you should test those things which should be preserved throughout any changes, and not test those things which should be open to change. In this case, it seems there are no real requirements for any of these 9 steps - perhaps the implementer could figure out how to achieve the same outcome by skipping a step or merging two steps, and the existence of unit tests for these 9 functions somehow encodes the idea that these 9 functions are each inherently needed, which is not necessarily true.


>One question though: Wouldn't each part benefit from having its own unit tests?

Not necessarily better, especially since this allows for the case where individual unit tests pass fine, but the combined logic fails.
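A contrived Python sketch of that failure mode, with made-up stages `rank_slides` and `pick_cover`: each one passes its own unit test, but the second was written with a different assumption about ordering than the first provides.

```python
def rank_slides(slides):
    """Stage A (hypothetical): best slide first, i.e. descending score."""
    return sorted(slides, key=lambda s: s["score"], reverse=True)


def pick_cover(ranked):
    """Stage B (hypothetical): written assuming ascending order, best slide last."""
    return ranked[-1]


def test_rank_slides():
    ranked = rank_slides([{"score": 1}, {"score": 3}, {"score": 2}])
    assert [s["score"] for s in ranked] == [3, 2, 1]


def test_pick_cover():
    # The fixture encodes stage B's own assumption (ascending), so this passes.
    assert pick_cover([{"score": 1}, {"score": 2}, {"score": 3}])["score"] == 3


def combined(slides):
    return pick_cover(rank_slides(slides))
```

Both unit tests pass, yet `combined` returns the lowest scoring slide. Only a test over the whole flow would notice.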


If the sub-functions could be reused and people would be tempted to change them, then that’s what your tests are for. In fact, it’s often tricky to test the sub-function logic without pulling them out because to write the test you have to figure out how to trick the outer function to get into certain states. Follow the Beyoncé rule: if you like it, put a test on it. Otherwise it’s on you if someone breaks it.


> You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them.

Good thinking. Now they’ll just add 50 flags and ten levels of nested ifs instead which is much simpler.


2000 lines is like a small project. I can't imagine putting that all in one function.


>”but then devs would be tempted to reuse them”

Isn’t that the fucking point? Having a 2000 line function is a code smell so bad, I don’t care how well the function works. It’s an automatic review fail in my book. Abstractions, closures, scope, and most importantly - docs to make sure others use your functions the way you intended them. Jesus.


Some devs did find it a code smell... But each scope had a clear short high level comment describing what it did, there were end to end tests for the method, and very little state flowed from scope to scope (some did) - because that's what scopes do... Prevent variables from leaking.

My point is the code smell isn't always accurate, and there are times, even for 2000 line monsters, when other devs agree that it is the best way to hide complexity away from the rest of the codebase. If we ever needed to factor things out (we never did), we could spend some effort and do it.


Have you tried reading code instead of smelling it?


A code smell means you should look into it, not that it's wrong.

Some things are genuinely 2kloc-complex. Maybe not that many. Do check! But some are.


Definitely not that many. Even for me this was an outlier, but it made me more comfortable with functions most people would consider long.

I'd like to clarify this was not necessarily 2kloc-complex, this was just 2kloc-long-and-not-really-meant-to-be-reused. It was a fairly long but linear process that was out of the ordinary for the rest of the codebase. It could easily have been split (hell, I had 9 fairly separate stages), but calling any of the intermediate stages out of order or without the context of the rest of the execution flow... would have been a foot gun for someone else. And, as time showed, we never needed those stages for anything else.


Agreed. I’ve written plenty of software of all kinds and have never had to write a 2000 line long method (although I have had the joy of refactoring such messes a time or two).

Just don’t do that. Your code doesn’t have to have abstractions out the wazoo, but if your class (or method) is getting bigger than 1000 lines that’s a great sign that it’s doing too much and abstractions can be teased out. Your future self will thank you, as well as your team.


I like this from Sandi Metz:

> You can't create the right abstraction until you fully understand the code, but the existence of the wrong abstraction may prevent you from ever doing so. This suggests that you should not reach for abstractions, but instead, you should resist them until they absolutely insist upon being created.


At least in the mobile world, I find that this “no abstraction” approach is the default one, and it usually leads to huge objects which do everything from drawing views to making network requests to interacting with the file system. These kinds of classes are quite hard to work in, hard to test, and also keep snowballing to get bigger and bigger. Things usually end with unmaintainable code and a full rewrite.

I am not saying you need to create complex abstract hierarchies right off the bat. But usually, it’s pretty easy to tease out a couple significant abstractions that are very obvious, and break down your classes by a factor of two or three. Just getting such low hanging fruit will prevent you from ever having a 2000 line long method.

And for the folks who are saying that they make sure to not add abstractions too early - are you disciplined enough to go back and add them later? I feel like if you’re the kind of engineer that busts out 2000 line methods, you’re also not going to refactor it as this method grows to 2500 or 3000 lines or beyond.

Probably most robust software you depend on is full of solid, quality abstractions. Learning to write code like this takes practice. The wrong abstraction might be wrong, but it’s one step closer on your journey to growing as an engineer. You won’t grow if you never try.



