
I totally agree with you. The irony is that Python and languages like it were built in part to reduce complexity, not only in the language itself but also in building and running the code… I feel machine learning is a low-level enough thing that it shouldn't be tied to a high-level language like Python, so I could use Node, Ruby, PHP or whatever by adding a C binding etc. That, to me, is why this is most interesting.


The problem is that python is designed assuming people want to use system-wide packages. In hindsight, that has turned out to be a mistake. Conda / venv try to bridge that gap but they’re kludgy, complex hacks compared to something like cargo or even npm.

Worse, because Python is a dynamic language, you also have to deal with all of that complexity at deployment time. (Vs C/C++/Zig/Rust where you can just ship the compiled binary).


> The problem is that python is designed assuming people want to use system-wide packages.

This hasn't been true for decades: `virtualenv` was the de-facto standard isolation solution (now baked in as `python -m venv`, still the de-facto standard), and `pip` is the package manager (we don't talk about setuptools/distutils, ssh!). If someone still used system-wide packages, it was either because a) they were building a container or some single-purpose system; or b) they were sloppy or had no idea what they were doing (most likely, following some crappy tutorial). Or it was distro people creating packages to satisfy dependencies for Python programs - but that's a whole different story (and one's virtualenv shouldn't inherit system packages unless it is really, really necessary and only if it makes sense to do so).

The problems started when one needed some external non-Python dependencies. Python invented binary wheels, and they've been around for a while (completely solving issues with e.g. PostgreSQL drivers - no one needs to worry about libpq), but I suppose depending on specific versions of kernel drivers and CUDA libraries is a more complex and nuanced subject.
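To make that concrete, a rough sketch of what a binary wheel buys you - assuming the driver was installed from the `psycopg2-binary` wheel (used here only as the PostgreSQL example alluded to above; checking the bundled libpq version is just one way to see it):

    # A manylinux binary wheel ships its own native libraries, so importing
    # the Postgres driver needs no system libpq at all.
    # Assumes: pip install psycopg2-binary
    import psycopg2

    # libpq version the wheel was built against (integer form),
    # not whatever the host system happens to have installed.
    print(psycopg2.__libpq_version__)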

> Vs C/C++/Zig/Rust where you can just ship the compiled binary

Only assuming that you can either statically link, or that all libraries' ABIs are stable (or that you're targeting a very specific ABI - but I've had my share of "version `GLIBC_2.xx' not found"s and I'm not fond of those).

In a similar spirit, any Python project can be distributed as one binary (Python interpreter and a ZIP archive, bundled together) plus a set of zero or more .so files.
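As a minimal sketch of the ZIP-archive half of that, using the stdlib zipapp module (`myproject/` and its `__main__.py` are hypothetical, and bundling the interpreter itself would still take extra tooling such as PyInstaller):

    # Build a single runnable .pyz from a package directory containing __main__.py.
    import zipapp

    zipapp.create_archive(
        "myproject",                         # hypothetical source directory
        target="myproject.pyz",              # one zip-based executable archive
        interpreter="/usr/bin/env python3",  # shebang written into the archive
    )
    # Any compiled extension modules (.so files) still have to be shipped
    # alongside the archive, as noted above.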


> This hasn't been true for decades: `virtualenv` was the de-facto standard isolation solution (now baked in as `python -m venv`, still the de-facto standard)

Right; but python itself doesn't check your local virtual environment unless you "activate" it (ugh, what?). And it can't handle transitive dependency conflicts the way node and cargo can. Both of those problems stem from python assuming that a simple, flat set of dependencies is passed in from its environment variables.


Virtual envs are actually quite simple -- they contain a bin/ directory with a linked python binary. When that python binary runs, it checks its sibling directories (it knows it was executed as e.g. /home/user/.venv/bin/python) to decide what to load. You don't need the activate shell scripts or anything; just running that binary within your venv is enough. The shell script is just a convenience for inserting the bin directory into $PATH so that a bare "python" or "pip" runs the right thing.
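A minimal sketch of that, assuming a throwaway path like /tmp/demo-venv:

    # Create a venv and run its interpreter directly, with no activate script.
    import subprocess
    import venv

    venv.create("/tmp/demo-venv", with_pip=True)  # same as: python -m venv /tmp/demo-venv

    # The binary finds pyvenv.cfg next to its bin/ directory and sets sys.prefix itself.
    out = subprocess.run(
        ["/tmp/demo-venv/bin/python", "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True,
    )
    print(out.stdout)  # -> /tmp/demo-venv, without ever "activating" anything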


> The shell script is just a convenience for inserting the bin directory into $PATH so that a bare "python" or "pip" runs the right thing.

Or so that anything in the program you run that launches another binary or loads a DLL by relying on the environment gets the right one, etc. Some binaries you can run without activating a venv with no problem, others will crash hard, and others will just subtly do the wrong thing if the conditions are "right" in your normal system environment.
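A small sketch of that failure mode (the child commands are illustrative; the point is which interpreter a bare name resolves to):

    # A program that launches "python" by bare name gets whatever $PATH says,
    # which is the system interpreter unless the venv's bin/ was prepended
    # (i.e. the venv was "activated").
    import subprocess
    import sys

    subprocess.run(["python", "-c", "import sys; print(sys.executable)"])

    # Using the current process's own interpreter avoids the ambiguity.
    subprocess.run([sys.executable, "-c", "import sys; print(sys.executable)"])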


Another implication of this is that it's impossible for two mutually incompatible copies of the same package to exist in the same environment. If packageA needs numpy 1.20 and packageB needs numpy 1.21, you're stuck.
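A quick sketch of why (packageA/packageB above are placeholders; the observable fact is that one environment holds exactly one numpy):

    # Imports resolve by module name against a flat site-packages, so only
    # one "numpy" can win per environment - there is no per-package nesting
    # like node_modules.
    import importlib.metadata
    import numpy

    print(numpy.__file__)                        # .../site-packages/numpy/__init__.py
    print(importlib.metadata.version("numpy"))   # exactly one version in this environment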


> Virtual envs are actually quite simple

You have never trashed your system with virtualenv?

Also, there is a problem when wheels assume they can bundle everything, like TensorFlow from years ago -- I don't know about now, but since tf used to be tied to specific CUDA versions, you could get into trouble installing certain tf versions even with venv, conda, etc.


> You have never trashed your system with virtualenv?

Unless one has done something they shouldn't have done (in particular, using sudo while working with a virtualenv), this shouldn't be possible.

Due to limitations of the most commonplace system-wide package managers (like dpkg, rpm or ebuild - not modern stuff like nix), system packages exist to support other system packages. One installs some program, it needs libraries, dependencies get pulled in. And then it's the distro package maintainers' job to ensure compatibility and deal with multiple-version conflicts (not fun).

But if you start or check out some project, common knowledge was that you shouldn't be relying on system packages, even if they're available and could work. With some obligatory exceptions, like when you're working on distribution packaging, or developing something meant to be tightly integrated with a particular distro (like corporate standard stuff).

That is, unless we're talking about some system libraries/drivers needed for CUDA in particular (which is system stuff) rather than virtualenv itself.


> That is, unless we're talking about some system libraries/drivers needed for CUDA in particular (which is system stuff) rather than virtualenv itself.

Sir, this is an ML thread.

Venv interacts with that poorly, though to be fair it could be Google's fault. Still, it shouldn't even be possible.


I mean, virtualenv is not supposed to interact with that at all. System libraries are the system package manager's responsibility. Doubly so since, as I understand it, all this stuff is directly tied to the kernel driver.

What Python's package manager (pip in a virtualenv) should do is build the relevant bindings (or download prebuilt binaries), and that's the extent of it. If others say it works this way with C (see the comment about cmake and pkgconfig), then it must work this way with Python too.


> If someone still used system-wide packages, it was either because a) [...] b) [...]

Or simply because they are packagers for some distro and their users want a simple way to pull in some software by its name, while the upstream devs imagine people cloning their public repo and running the software from the checkout in their own home directory, with regular pulls, regularly rebuilding the needed surroundings...

Not to mention modern systems/distros with a not-really-POSIX vision, like NixOS or Guix System...

> In a similar spirit, any Python project can be distributed as one binary

A single 10+ GB binary :-D


Ridiculous; a 10 GB binary happens only if machine learning models are involved. I have distributed full-stack binaries of 70 MB or less.



