
> You can't normally hard link directories.

That's only to avoid loops, as far as I understand. Symlinks do allow loops, but require application programmers to handle them. So maybe we just need better APIs/API contracts around loops, rather than two types of links?
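A minimal sketch of the loop problem in Python: two symlinks pointing at each other form a cycle, and the kernel refuses to resolve the path, handing the application an ELOOP error it has to deal with:

```python
import errno
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
os.symlink(a, b)   # b -> a
os.symlink(b, a)   # a -> b: the chain is now circular
try:
    open(a)
    err = None
except OSError as e:
    err = e.errno  # the kernel bails out with ELOOP instead of spinning forever
```

So the "contract" already exists in one narrow form: resolution fails with a well-defined errno, and it's up to each program to handle it.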

> If a file has multiple links, finding them all normally requires scanning the entire file system

Couldn't this pretty easily be solved at the file system level? Just store a back pointer from a file to each of its names.

The fact that it's possible to break symlinks very easily by deleting the pointed-to file (name) is a problem as well: wouldn't application developers usually, or at least sometimes, want to know that they are about to break a link (or, conversely, that they are deleting the final copy of a file and not just a reference to it)?

> So there are more reasons for symlinks than just "hard links are restricted to linking within the same filesystem"

I think this might be the only real (technical/historical) limitation. The rest could probably be worked around, but maybe we ended up with two distinct types of links, with the other binary decisions (allowing loops, making deletion explicit vs. a matter of reference counting) more or less arbitrarily bucketed into those two types based on what was easier to implement.



I tried to construct my argument to make it clear that I'm aware there are ways to solve the issues with hard links, but they have their own sets of trade-offs.

For hard links, it's not only that they can cause loops. There are the other issues I outlined (linking across file systems, no single canonical representation of the file in the file system, finding all the links to the file, etc).

There's no "just store a back pointer." That will obviously introduce its own set of complexities and trade-offs. Where do you store the pointers? What's the API for viewing them? What's the CLI for viewing them? Is it a new switch to `ls`? A new CLI entirely? How do you keep the pointers up to date? What sort of locking is needed when updating the pointers? What about `fsck`? How do you get this implemented across the multitude of Unix and Unix-like OS's and file systems?

(As an aside, I've been really trying to stop using the word "just" lately as I've learned that things are rarely so simple to justify the word.)

Again, I'm not saying there isn't a better solution, but I don't think it's patching up hard links. I think it's something outside the box of both hard links and symbolic links.


Re: Symlink analysis: Well said.

> (As an aside, I've been really trying to stop using the word "just" lately as I've learned that things are rarely so simple to justify the word.)

Me too! I realized how it immediately frustrated me to hear it used about my domains. I’m constantly having to work to not seem as short/blunt/know-it-all as I feel. I think this word is a connotation trap: when I use it, it feels inoffensive, but when I hear it, it seems blunt and dismissive, and I’m quick to assume the person doesn’t understand or empathize with the complexities of the situation. That’s a long way of saying I really enjoyed your aside.


Totally agree with “just”.

I’ve also tried to eliminate “but” since it usually comes across as “throw out whatever I just said and focus on this instead”.

The language we use is important and worth optimizing.


I noticed recently that I often preface statements with "just wanted to say" or "just chiming in here" and similar. I cringed hard when I realized and am working on eliminating that use of "just". Seems like the same general thing: it's never "just" X.


Right? And that’s a double whammy. You’re minimizing the statement you want to make, and you’re minimizing your place in that conversation.


> [...] I think it's something outside the box of both hard links and symbolic links.

Absolutely agreed – given your examples and all the other challenges around backwards compatibility with decades of application code, I'd also assume it would be something new entirely.

But my guess is that it would be able to meet the existing use cases of both.


Hard links don't have a canonical name though - they're all equally the same file, and this is really a problem: opening and editing a file in one location edits it in all of them, without you knowing what those locations might be.
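To make that concrete, here's a small Python sketch: after `os.link`, both names point at the same inode, so a write through either name is visible through the other:

```python
import os
import tempfile

d = tempfile.mkdtemp()
p1 = os.path.join(d, "original")
p2 = os.path.join(d, "alias")
with open(p1, "w") as f:
    f.write("v1")
os.link(p1, p2)           # second directory entry for the same inode
with open(p2, "w") as f:  # edit through the alias...
    f.write("v2")
with open(p1) as f:
    content = f.read()    # ...and the "original" name sees the change
```

Neither name is more "real" than the other: `st_ino` is identical and `st_nlink` is 2 for both.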

Symlinks at least explicitly declare the dependency and how it should mutate.

A classic example is the /etc/resolv.conf symlink: if I untar an archive and restore a symlink for it, I'm saying the file should take its content from somewhere else on the system, not that the file is specific content.


> Hard links don't have a canonical name though - they're all equally the same file, and this is really a problem: opening and editing a file in one location, edits it in all of them without you knowing what those locations might be.

That is something the filesystem could store though: in the same way it stores the number of links to a file, it could be a bit more capable and store the links themselves (possibly in an xattr).
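A hypothetical sketch of that idea in Python (no real filesystem does this; the back-pointer table, `tracked_link`, and `tracked_unlink` are all made up for illustration). It keeps a map from inode number to the set of names referencing it, updated on every link/unlink; a real implementation would also need locking, crash consistency, and fsck integration:

```python
import os
import tempfile

# Hypothetical back-pointer table: inode number -> set of absolute names.
backlinks = {}

def tracked_link(src, dst):
    os.link(src, dst)
    ino = os.stat(src).st_ino
    backlinks.setdefault(ino, {os.path.abspath(src)}).add(os.path.abspath(dst))

def tracked_unlink(path):
    ino = os.stat(path).st_ino
    backlinks.get(ino, set()).discard(os.path.abspath(path))
    os.unlink(path)

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
open(a, "w").close()
tracked_link(a, b)
ino = os.stat(a).st_ino
names = set(backlinks[ino])  # both names are now enumerable from the inode
tracked_unlink(b)            # and the table shrinks when a link goes away
```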

> Symlinks at least explicitly declare the dependency and how it should mutate.

They only declare one dependency, one way; a symlink doesn't give you all the other symlinks pointing at the same terminal location that a change will affect.


Symlinks do that too, and inevitably: no matter how you change the file, it changes at all links and you can't prevent it. systemd uses this feature when it creates dependency references (the linked dependency must never differ from the source, which hard links don't ensure).


The difference is a symlink at least declares this explicitly. A hard link on the other hand looks and works like an independent file...but isn't one.


> Couldn't this pretty easily be solved at the file system level?

It will not solve a problem that does not even exist in the first place, but will rather badly break the semantics of the UNIX file system precisely at the file system level.

> Just store a back pointer from a file to each of its names.

UNIX file systems do not have files in the conventional sense. They have disk block allocations referenced by an inode, and one or more directory entries pointing back to a specific block allocation via the associated inode. This makes hard links easily possible and very cheap. It is a one-to-many relationship (one block allocation to many directory entries), and turning it into a many-to-many relationship, with each directory entry pointing to every single possible permutation of other directory entries across the entire file system, would be a nightmare in every imaginable way.
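The one-to-many relationship is observable from user space. In this Python sketch, unlinking one directory entry leaves the inode alive for the other, and even after the last entry is gone an open file descriptor keeps the block allocation around until it's closed:

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
with open(a, "w") as f:
    f.write("data")
os.link(a, b)                    # two directory entries, one inode
fd = os.open(a, os.O_RDONLY)
os.unlink(a)                     # drop one entry; the inode lives on
nlink_after_first = os.stat(b).st_nlink
os.unlink(b)                     # last directory entry gone...
still_readable = os.read(fd, 4)  # ...but the open fd keeps the blocks alive
os.close(fd)                     # only now is the allocation released
```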

It is even possible to zero directory entries pointing to an inode (if you poke around with the file system debugger, you can manually delete the last remaining directory entry without releasing allocated blocks into the disk block pool but the next fsck run will reclaim them anyway).


> It is even possible to zero directory entries pointing to an inode.

Historically, fsck would link such anonymous inodes into lost+found using their inode number as their name in the lost+found directory, but I admit having no idea whether this still applies to modern journaled file systems.


File system journals have reduced the likelihood of unlinked inodes ending up in /lost+found, but have not eliminated it completely. There is still a non-zero chance of journal corruption during an unexpected shutdown or complete power loss mid-update, and of something turning up after a full fsck run later.


>> You can't normally hard link directories.

> That's only to avoid loops, as far as I understand

Later versions of HFS+ do support directory hard links, a feature introduced for Time Machine IIRC, but it's generally unavailable to the user.


Symlink loops are handled in the pathname resolution code in the kernel. Too many symlink indirections (40 on Linux) cause resolution to bail out with an ELOOP errno.
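Notably, the limit applies to the total number of indirections, not just cycles: a perfectly acyclic chain of symlinks still fails once it's too deep. A Python sketch (the count of 100 is arbitrary, chosen to exceed the typical limit of 40 on Linux, 32 on macOS):

```python
import errno
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "real")
open(target, "w").close()

# An acyclic chain of 100 symlinks: link99 -> ... -> link0 -> real.
prev = target
for i in range(100):
    link = os.path.join(d, f"link{i}")
    os.symlink(prev, link)
    prev = link

try:
    open(prev)
    err = None        # would succeed if the chain were under the limit
except OSError as e:
    err = e.errno     # past the limit, the kernel gives up with ELOOP
```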


I at first read "errno" as "emo" and was trying to picture what that would look like.


> Couldn't this pretty easily be solved at the file system level? Just store a back pointer from a file to each of its names.

In theory yes, but no filesystem does this as far as I know.


In Windows both NTFS and ReFS keep backpointers to all their names. They store the file ID of the directory, and the name of the file in the directory. In NTFS these are stored as a special attribute, and in ReFS they reside as rows in the file table.

It's required for a few reasons. Historically NTFS has had an API to query all of a file's names, and this needs to be done efficiently. And when a file is opened by ID, the file system needs to construct a canonical path for it in the namespace.

Source: I am the Microsoft developer that added hardlink support to ReFS. All opinions are my own.





