Hacker News | fatih-erikli-cg's comments

They want people who aren't scared of getting their hands dirty. There is something like a perfectionism trap, and it is very difficult to manage.

I think recruiters would like to see what candidates do when they get some free time. It is not really a lifestyle. It is a short amount of time. Maybe a couple of weekends, before they get some other work or get laid off.

E.g., nobody wants to continue working with someone who creates sound effects, a movie player, an operating system, etc.


> E.g., nobody wants to continue working with someone who creates sound effects, a movie player, an operating system, etc.

What do you mean by this?


I think it is the `atan` function. Sin is almost a lookup query.

On modern machines, looking things up can be slower than recomputing them when the computation is simple. This is because memory is much slower than the CPU, which means you can often compute something many times over before the answer from memory arrives.

Not just modern machines: the Nintendo 64 was memory-bound under most circumstances, and as such many traditional optimizations (lookup tables, loop unrolling) can be slower on the N64. The loop-unrolling case is interesting: because the CPU has to fetch more instructions, it puts more strain on the memory bus.

If you're curious: on the N64 the graphics chip is also the memory controller, so everything the CPU does to stay off the memory bus has an additive effect, freeing the graphics hardware to do more work. This is also why the N64 has weird 9-bit RAM: it let them use an 18-bit pixel format while still taking only two (9-bit) bytes per pixel. For CPU requests, the memory controller ignored the 9th bit, presenting a normal 8-bit byte.

They were hoping that by having high-speed memory (250 MHz, while the CPU ran at 90 MHz) it could provide for everyone, and it did OK; there are some very impressive games on the N64. But on most of them the CPU runs fairly light: gotta stay off that memory bus.

https://www.youtube.com/watch?v=xFKFoGiGlXQ (Kaze Emanuar: Finding the BEST sine function for Nintendo 64)


The N64 was a particularly unbalanced design for its era so nobody was used to writing code like that yet. Memory bandwidth wasn't a limitation on previous consoles so it's like nobody thought of it.

> This is also why the N64 has weird 9-bit RAM: it let them use an 18-bit pixel format while still taking only two (9-bit) bytes per pixel. For CPU requests, the memory controller ignored the 9th bit, presenting a normal 8-bit byte.

The Ensoniq EPS sampler (the first version) used 13-bit RAM for sample memory. Why 13 and not 12? Who knows? Possibly because they wanted it "one louder", possibly because the Big Rival in the E-Mu Emulator series used μ-law codecs which have the same effective dynamic range as 13-bit linear.

Anyway you read a normal 16-bit word using the 68000's normal 16-bit instructions but only the upper 13 were actually valid data for the RAM, the rest were tied low. Haha, no code space for you!


Unless your lookup table is small enough to only use a portion of your L1 cache and you're calling it so much that the lookup table is never evicted :)

Even that is not necessarily needed, I have gotten major speedups from LUTs even as large as 1MB because the lookup distribution was not uniform. Modern CPUs have high cache associativity and faster transfers between L1 and L2.

L1D caches have also gotten bigger -- as big as 128KB. A Deflate/zlib implementation, for instance, can use a brute force full 32K entry LUT for the 15-bit Huffman decoding on some chips, no longer needing the fast small table.


It's still less space for other things in the L1 cache, isn't it?

Interesting. About 20 years ago, it must have been the other way around because I remember this paper [1] where the authors were able to speed up the log function by making use of a lookup table in the CPU cache.

[1] https://www.researchgate.net/profile/Nikki-Mirghafori/public...


It........depends.

I make things faster all the time by leveraging various CPU caches, sometimes even disk or networked disks. As a general principle though, memory lookups are substantially slower than CPU (and that has indeed changed over time; a decade or three ago they were close to equal), and even cache lookups are fairly comparatively slow, especially when you consider whole-program optimization.

That isn't to say that you can't speed things up with caches, but that you have to be replacing a lot of computations for even very small caches to be practically better (and even very small caches aren't helpful if the whole-program workload is such that you'll have to pull those caches from main RAM each time you use them).

To your paper in particular, their technique still assumes reasonably small caches which you constantly access (so that you never have to reach out to main RAM), even when it was written, and part of what makes it faster is that it's nowhere near as accurate as 1ULP.

Logarithms are interesting because especially across their entire domain they can take 40-120 cycles to compute, more if you're not very careful with the implementation. Modern computers have fairly fast floating-point division and fused multiply-add, so something I often do nowadays is represent them as a ratio of two quadratics (usually rescaling the other math around the problem to avoid the leading coefficient on one of those quadratics) to achieve bounded error in my domain of interest. It's much faster than a LUT (especially when embedded in a larger computation and not easily otherwise parallelizable) and much faster than full-precision solutions. It's also pretty trivially vectorizable in case your problem is amenable to small batches. Other characteristics of your problem might cause you to favor other solutions.
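To make that concrete (a toy sketch of the idea, not anything tuned for a real workload): a [2/2] Padé approximant of ln(1 + t) is one such ratio of quadratics, and it already gets about three decimal digits on [1, 2] for one division and a handful of multiplies.

```python
import math

def pade_log(x):
    """Approximate ln(x) for x in [1, 2] via the [2/2] Pade approximant
    of ln(1 + t) about t = 0: ln(1 + t) ~= (6t + 3t^2) / (6 + 6t + t^2)."""
    t = x - 1.0
    return (6.0 * t + 3.0 * t * t) / (6.0 + 6.0 * t + t * t)

# Worst-case error on [1, 2] stays under 1e-3 -- nowhere near 1 ULP,
# but often enough, and much cheaper than a full-precision log.
err = max(abs(pade_log(1 + i / 1000) - math.log(1 + i / 1000))
          for i in range(1001))
```

Combine this with the usual exponent-extraction range reduction and you cover the whole positive domain; tighter coefficients (minimax rather than Padé) buy another digit or two for the same cost.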


Logarithms are interesting because there's hardware to approximate them built into every modern processor as part of floating point. If you can accept the error, you can abuse it to compute logs with a single FMA.

An example of an exp and a log respectively from my personal library of bit hacks:

    bit_cast<float>((int32_t)(fma(12102203.2f, x, 0x3f800000)));

    bit_cast<float>((uint32_t)(-0x3f800000 - 36707.375f*x)) + 7;
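For intuition, the exp one-liner is the classic Schraudolph-style trick; here is a Python sketch of (roughly) the same bit manipulation, where 12102203.2 ≈ 2^23 / ln 2 and 0x3f800000 is the bit pattern of 1.0f:

```python
import struct

def fast_exp(x):
    # Scale x so that adding it to the integer representation of 1.0f
    # shifts the exponent field by x / ln 2, i.e. computes ~2^(x/ln 2) = e^x.
    # The mantissa bits give a free piecewise-linear interpolation
    # between powers of two (up to ~6% relative error, uncorrected).
    i = int(12102203.2 * x) + 0x3f800000
    return struct.unpack('<f', struct.pack('<i', i))[0]
```

It is exact at x = 0 and at multiples of ln 2, and always overestimates in between; Schraudolph's bias constant roughly halves the worst-case error.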

It’s a delicate balance and really hard to benchmark. You can write a micro-benchmark that keeps the lookup table in cache, but what if your function isn’t the only thing being done in a loop? Then even if it’s in the hot path, there’s insufficient cache to keep the table loaded all the way through the loop, and lookup is slower.

TLDR: it depends on the usage, and we really should have multiple functions specialized for the properties of the caller’s needs, where the caller can choose a cache or a compute approach.


It may be, especially when it comes to unnecessary cache. But I think `atan` is almost brute force. Lookup is nothing compared to that.

Sin/cos must be the borders of sqrt(x² + y²). It is also cached, indeed.


What do you mean brute force?

We can compute these things using iteration or polynomial approximations (sufficient for 64 bit).


There is a loop of "is it close enough or not", something like that. It is brute force. Atan2 purely looks like that to me.

> Sin/cos must be borders of sqrt(x²+y²). It is also cached indeed

This doesn't make a ton of sense.


In what way do you think a sin function is computed? It is something that is computed and cached, in my opinion.

I think it is stored like sintable[deg]. The degree is index.


> In what way do you think a sin function is computed?

In some way vaguely like this: https://github.com/jeremybarnes/cephes/blob/master/cmath/sin...

> I think it is stored like sintable[deg]. The degree is index.

I can think of a few reasons why this is a bad idea.

1. Why would you use degrees? Pretty much everybody uses and wants radians.

2. What are you going to do about fractional degrees? Some sort of interpolation, right?

3. There's only so much cache available, are you willing to spend multiple kilobytes of it every time you want to calculate a sine? If you're imagining doing this in hardware, there are only so many transistors available, are you willing to spend that many thousands of them?

4. If you're keeping a sine table, why not keep one half the size, and then add a cosine table of equal size. That way you can use double and sum angle formulae to get the original range back and pick up cosine along the way. Reflection formulae let you cut it down even further.

There's a certain train of thought that leads from (2).

a. I'm going to be interpolating values anyway

b. How few support points can I get away with?

c. Are there better choices than evenly spaced points?

d. Wait, do I want to limit myself to polynomials?

Following it you get answers "b: just a handful" and "c: oh yeah!" and "d: you can if you want but you don't have to". Then if you do a bunch of thinking you end up with something very much like what everybody else in these two threads has been talking about.
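To make that train of thought concrete, here's a toy sketch of the standard recipe: range reduction plus a short odd polynomial. (Real libm implementations use minimax coefficients and much more careful reduction; these are plain Taylor terms.)

```python
import math

def poly_sin(x):
    # Step 1: range-reduce to [-pi, pi].
    x = math.fmod(x, 2.0 * math.pi)
    if x > math.pi:
        x -= 2.0 * math.pi
    elif x < -math.pi:
        x += 2.0 * math.pi
    # Step 2: fold into [-pi/2, pi/2] using sin(pi - x) = sin(x).
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    # Step 3: a short odd polynomial (Taylor terms through x^7)
    # in Horner form -- no table, just a few multiplies.
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))
```

Four terms already give better than 1e-3 absolute error everywhere; swapping in minimax coefficients on the same skeleton gets you near full single precision.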


It isn't a good idea to store such values in code. I think it is something that is computed when a programming environment boots up, e.g. when you run "python", or install "python".

I am trying to understand how Math.sin works. There is Math.cos; it is sin shifted by +90 degrees. So not all of them are pieces that complete a big puzzle.


There's no nice way of saying this, and I mean no malice here, but I think you're exceptionally confused or ignorant, and I don't think it would be rewarding for either of us to continue this conversation.

It's OK if you don't reply if you don't have something to add to what I said. I think it is generated like:

    for x in range(0, 90):
        for y in range(0, 90):
            if x*x + y*y < 90*90:  # it is in the circle

So, for each x, the one that has the greatest y will be the sin.

Something like floatrange(0,1,0.001) may work too.


> I think it is generated in like

You can think whatever you want, that's no substitute for being correct.


This is not about being correct. The posted article looks like clickbait. I am digging into what it really is about. I would like to dig more, imho. I am looking for work here.

atan(x) = asin(x / sqrt(1+x*x))

So the asin is brute force. I think it is the `atan` function. The article as written explains nothing. Sqrt is also like that.

It looks like there is a reasonable explanation when it is written in math form, but there is not.
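For what it's worth, the quoted identity itself does check out numerically:

```python
import math

# atan(x) == asin(x / sqrt(1 + x^2)) holds for all real x,
# since x / sqrt(1 + x^2) is the sine of the angle whose tangent is x.
for x in [-10.0, -1.0, -0.1, 0.0, 0.5, 3.0, 100.0]:
    assert abs(math.atan(x) - math.asin(x / math.sqrt(1.0 + x * x))) < 1e-9
```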


https://fatih-erikli-potato.github.io/ I am writing notes on computer graphics.


In my experience, real work is what a junior dev does, without exception. The senior developer creates an over-engineered version of what the junior dev made.

Senior devs jabber about database systems, SQL, NoSQL, clustering, etc. The real work is storing the input in a text file and querying it with a simple loop.

Plus there is an additional huge misconception in the industry: if something is realtime, it is a senior-dev thing. It is something like a plague.


I have 14 YOE and it's the opposite. Really brilliant juniors don't have the experience to know the consequences of their designs; they build fast under pressure. They reinvent things that already exist (they don't know better, because they are juniors).

Seniors have PTSD and push back.


TCO and YOE?


I don't understand these acronyms.


Sorry: total compensation (salary + equity) and years of experience.


`:=` for assigning a variable sounds and looks weird to me.


It's the walrus operator. Pascal and Python use it as well. You get used to it pretty quickly.

Pascal: https://www.freepascal.org/docs-html/ref/refse104.html#x224-... Python: https://docs.python.org/3/whatsnew/3.8.html#assignment-expre...
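In Python's case it's limited to assignment expressions (3.8 and later), which let you bind and test a value in one place, e.g.:

```python
# The walrus operator binds n and uses it in the same expression;
# the name stays visible in the enclosing scope afterwards.
text = "hello world"
if (n := len(text)) > 5:
    print(f"{n} characters")
```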


It was something introduced like an April 1st joke. A clean implementation of any programming language won't have that.


I actually think it’s more readable because it makes the distinction between assignment and equivalence very clear.

Bugs originating from the similarity of == vs = have probably cost the industry millions over the last three decades.


That ship has long sailed. ALGOL 58 used := and Pascal popularized it.

https://en.m.wikipedia.org/wiki/Assignment_(computer_science...


I think they scan the code two characters at a time, because one is not enough for <= and =>, so that is why assignment is := or =:. Probably + is ++ too.


I think it's good. 2 characters for declaration + assign.


It's just shorthand: `var foo int = 5` vs `foo := 5`, where the type is derived from the assigned value.


I think 10000 is plenty for a queryable dataset. More than that is computer-generated stuff, like logs etc.


Storing data in a text file costs less. A TCP connection to get some blog posts from another process is not necessary.


On the other hand, it's a mere blog post; you should not be bothered by TCP cost. But the cost of reliable data storage and backup/restore is a concern.


It is mostly a blog post. A use case for a database that holds tables and rows is very rare in the real world. I know no one who uses the contacts app on their mobile phone. Access has been there, with checkboxes, select inputs and everything, for years. No one uses it.


Where and how do you handle operator precedence? I couldn't find it in the codebase.


I wrote about it at length here:

https://www.nhatcher.com/post/a-rustic-invitation-to-parsing...

The implementation in IronCalc follows that.


I'm cognizant it's just documentation, because the code is its own thing, but it seems to be a recursive descent parser https://github.com/ironcalc/IronCalc/blob/2c2228c2c26386b019...


[flagged]


The function for asterisks and slashes is called from the functions for pluses and minuses. The result is that asterisks and slashes are evaluated before pluses and minuses. It's actually very easy to understand once you get the idea.

https://en.wikipedia.org/wiki/Recursive_descent_parser
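A stripped-down sketch (a toy, not any real spreadsheet's parser) showing exactly that structure: the plus/minus level calls the times/divide level, which is what makes `*` bind tighter than `+`.

```python
import re

# One function per precedence level; lower-precedence functions call
# higher-precedence ones, so * and / are grouped before + and -.

def parse_expr(tokens):          # level 1: + and -
    value, rest = parse_term(tokens)
    while rest and rest[0] in '+-':
        op = rest[0]
        rhs, rest = parse_term(rest[1:])
        value = value + rhs if op == '+' else value - rhs
    return value, rest

def parse_term(tokens):          # level 2: * and /
    value, rest = parse_atom(tokens)
    while rest and rest[0] in '*/':
        op = rest[0]
        rhs, rest = parse_atom(rest[1:])
        value = value * rhs if op == '*' else value / rhs
    return value, rest

def parse_atom(tokens):          # level 3: numbers and parentheses
    if tokens[0] == '(':
        value, rest = parse_expr(tokens[1:])
        return value, rest[1:]   # drop the closing ')'
    return float(tokens[0]), tokens[1:]

def evaluate(source):
    # Crude tokenizer: numbers, parentheses, single-character operators.
    return parse_expr(re.findall(r'\d+(?:\.\d+)?|[()+*/-]', source))[0]
```

So `evaluate("2+2*2")` gives 6, not 8, with no explicit precedence table anywhere: the precedence lives in which function calls which.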


I don't think recursion should be involved in the parser. Shift-reduce may be an answer, but it does not fix the problem either if you try to implement it. I would like to see real code rather than written homework pieces.


You're conflating parsing of expressions with operator precedence parsers (aka Pratt parsers). IronCalc uses a recursive descent parser, and Pratt parsers are an optimization of recursive descent parsers. Pratt parsers have one function that implements all precedence levels, while recursive descent parsers have one function per precedence level, but they contain basically the same logic.

Source: 20 years ago I converted GCC's C++ parser from recursive descent to an operator precedence parser; obviously GCC respected precedence even before my work, which was purely an optimization. In fact the code I replaced had the note "The expression parser recurses through the various levels of precedence as specified in the grammar, rather than using an operator-precedence technique".

https://github.com/gcc-mirror/gcc/commit/b8b94c5ba81744d325f...


Operator precedence is an addition to the parsing of expressions. Both are needed if you accept expressions like "2 + 1". "2 + 2 * 2" must be evaluated like "2 + (2 * 2)"; expression parsing alone, left to right, would evaluate it as 8.


You can choose: learn recursive descent parsing, or keep on being confidently incorrect.


Better not use popular toolchains then:

https://gcc.gnu.org/wiki/New_C_Parser

https://clang.llvm.org/features.html

I think this is a quite popular approach for multiple reasons. That being said, tree sitter uses GLR.


There is no recursion involved in parsing. You keep a stack for the tree.


Recursive descent uses the processor stack as the parsing stack.


Recursion is mostly involved in the interpreter, and most of the time it is bad practice. No recursion is needed in the parser.


Dunning-Kruger effect at its best. I am done arguing.


Nothing, if my favorite programming language is installed on my PC. I make the computer fun for myself. The internet doesn't bring anything.

