Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

From the abstract: "A pseudo-random number generator (PRNG) is a deterministic algorithm that produces numbers whose distribution is indistinguishable from uniform."

I think this is incorrect, since it would imply that a program that produces the sequence 1,2,1,2,1,2,1.... is a PRNG.



Clearly, a PRNG is a subtle concept and it's common for abstracts to not be completely precise. In context, it's clear to me that by "uniform" they mean "a random process which generates numbers independently from a uniform distribution." Your process is generating numbers uniformly, but successive outputs are definitely not independent.

One reason why I feel your interpretation is strained is that the phrasing makes "uniform" sound like a single process. You seem to be reading it as "numbers which are distributed uniformly," but the word indistinguishable really suggests that it's more than just the distribution that matters.


Using ent, and 0's and 1's (instead of 1's and 2's).

(By this I mean a pattern of 0,1,0,1 bits etc. - 4248759 bytes worth in this case)

Below is the output from ent:

----------------------------

Entropy = 0.000000 bits per byte.

Optimum compression would reduce the size of this 4248759 byte file by 100 percent.

Chi square distribution for 4248759 samples is 1083433545.00, and randomly would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 170.0000 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is undefined (all values equal!).


No, your sequence IS distinguishable from a uniform sequence. At least it is if I understand what you mean by "...".

If you mean that it generates 1,2,1,2,1,2,1 and then after that generates a mix of different values that include all possible values in approximately equal frequency, then you may have a fairly good PRNG. If you mean that it produces 1,2,1,2,1,2,1 and then after that it continues to alternate between 2 and 1, then this is easily distinguished from uniform.


>"No, your sequence IS distinguishable from a uniform sequence." No, it is not distinguishable. The definition according to wikipedia: The discrete uniform distribution is a symmetric probability distribution whereby a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n.

So the distribution is indeed uniform but it fails in being random.


The point is that, in the context of the paper, you are supposed to group consecutive sequence elements and find their distribution.

Consider the sequence 1,2,1,2,1,2,1.... When you take elements one at a time, you have a uniform distribution, but the sequence is clearly not random. However, when you take elements two at a time, you have; 12,12,12,12... In this case, the distribution is not uniform, which indicates non-randomness.

The paper's definition implies that you're grouping the sequence elements like this and, for all groupings, you find uniform distributions. This is of course common knowledge to experts, and is made clear in the body of the paper.


Actually, it is more general than that. Any polynomial time algorithm that takes as input a sequence and which has access to a unbounded source of uniform random numbers (i.e. a "true" random number generator) should output a 1 when given a uniform random sequence with "nearly" the same probability as when given the output of a PRNG ("nearly" here means within a negligible amount e.g. the algorithm might just try to guess the PRNG seed). In other words,

  Pr[A(prng(seed, n))] - Pr[A(uniform(n))] < 1/negl(n)
Where for all polynomial functions p(n), there exists an N such that for all n > N, 1/p(n) > 1/negl(n); uniform(n) is an n element long uniform random sequence, and prng(seed, n) is the first n elements of the output of the PRNG for a (uniform) random seed. This is a common definition in cryptography and a variant is mentioned further down in the paper in Section 2.


All that says is that 1,2,1,2,1,2... is a possible uniform random sequence. It is not the only such sequence, and it is a very unlikely sequence.

The notion of "distinguishible" here refers to an experiment where an algorithm can receive one of two possible inputs, with equal probability: a sequence of uniformly sampled numbers, or a sequence generated by a PRNG. If the PRNG is secure, the algorithm should output "1" when it receives the uniform sequence with (very nearly i.e. different by a negligible amount) equal probability as when it receives the PRNG's sequence. So, for example, if you test if even indexed elements are 1 and odd indexed elements are 2, and output 1 if both tests pass, then you will almost never output 1 for a uniform sequence; on the other hand, if the PRNG always and very frequently outputs 1,2,1,2,1,2... then you will always or very frequently output 1 when you get the PRNG's sequence as input, and thus your probability of outputting a 1 is greater in that case.


What is "...", here? If it's infinite then we're talking about something abstract and you need to be more precise. If we're talking about something finite then you need the specify the finite-ness.

Note that (1,2,1,2,...,1,2) is not a "distribution," as you seem to be saying. It's a sequence of values drawn from a distribution.

For a fixed length, is it possible for the sequence (1,2,1,2,1,2,...,1,2) to be drawn from a uniformly random distribution that contains 1 and 2 as possible values? Yes. How likely is it? It depends on the number of possible values in the distribution and the length of the sequence. The more values besides 1 and 2 and the longer the sequence, the less likely that sequence will have been drawn from a uniformly random distribution.

In fact, as the length of the sequence increases, the probability of any particular sequence approaches 0 but is always greater than 0.

The crux of your confusion appears to be that you think "distinguishable" means "a sequence that couldn't possibly have been drawn from a particular distribution," which is not the case.


Please note that I am not the OC. It seems like you wrote in the wrong thread (for instance, I didn't use any ellipsis on my comment)


Ah, so your actual complaint was that they used the term "uniform sequence" as shorthand for the more accurate "uniformly random sequence".


But I never said it was indistinguishable. What I'm saying is that the definition in the first sentence of the paper is wrong or, at least, incomplete.


"whose distribution is indistinguishable from uniform"

Emphasis mine. Distribution here is not its English meaning, it is referring to the statistical term. Thus, it is talking about numbers "drawn from the uniform distribution", not "uniform numbers".

A PRNG drawing from the uniform distribution may produce that sequence (for any set that includes 1 and 2), but a PRNG that only produced that sequence would certainly not be drawing from the uniform distribution.


Perhaps you don't understand the terms in context. If you looked at the distribution of your example, it would have two big spikes in it, at numbers one and two.


The sequence 1,2,1,2,1,2,1... is not indistinguishable from a uniform sequence, at least in the context of cryptography. With high probability, an algorithm that checks if all the even indexed inputs are 1 and all the odd indexed inputs are 2 will distinguish that sequence from a uniform random sequence (this is easy to see, as the probability of a sequence of that form being generated by repeatedly tossing a fair coin is very small).


you can also check that (1,1), (1,2), (2,1), and (2,2) all happen with expected frequencies (i.e. roughly 25%), where the two numbers in the pair are just two numbers in a row in the sequence. You would have gotten 100% (1,2) and 0% for the rest, so that's suspicious.

same for 3-tuples, 4-tuples, ..., and n-tuples.

There's probably other, more sophisticated tests as well, but I'm not an expert.


I think you missed my point, which is this: the first sentence in the paper's abstract is wrong.


My examples with pairs and n-tuples are also distributions.


That's a good point, and that may well be what the authors meant.

Since all PRNGs are periodic (at least all I now of), the definition eventually breaks down. I can see how it can be made to work with a bit of extra formalism, though.

Thanks.


Any finite state PRNG has to be periodic. Good luck implementing an infinite state one.


you're quoting the first sentence from the abstract in isolation; there's a lot more detail in the paper. in particular the define the notion of epsilon-closeness (section 2) for any distinguisher (which could include one that detects repeated patterns).


My intention is not to criticize the paper, which I don't even claim to understand. I just thought the authors definition of a PNRG is wrong (or at least incomplete) and that caught my attention. If I'm wrong, I hope somebody will set me right.


i just did.

(1) an abstract is just a sketch of the entire paper contents.

(2) in section 2 of the paper they give a more precise meaning to indistinguishable which allows you to use any function you can think of to detect non-uniformity (roughly - there are details like it's a limit as you go to large sequences, so that any finite random pattern becomes progressively less likely, etc etc). so you can cover the case you are worried about by using a test that looks for repeated patterns of values.

[more generally, even with hn as it is these days, if you're asking for clarification rather than raising an objection, then you should probably prefix your comment with something like "i don't understand the paper, can someone please understand why...".]


>"(1) an abstract is just a sketch of the entire paper contents."

That is not an excuse for containing a bogus definition without a warning, especially in the abstract which is the first thing that you read to tell if the paper is interesting enough to look it in depth.


i agree it's kind-of a weird sentence for the start of the abstract - seems too introductory for the rest. i suspect it was added because a referee wanted the abstract to be more general, or just because they didn't know how else to start referring to PRNGs. writing papers is hard.

but it's not bogus, it's just incomplete. what i was trying to explain is that if something in an abstract doesn't make sense, you should look to the paper for a more detailed explanation.

and finally, in a "deep" way it's actually a very good definition of a PRNG. it's exactly what section 2 expands on. there's nothing you can fall back to that's more fundamental, that i can see, than it being indistinguishable from random.


It's a fine definition for an abstract. If it contained a full, precise definition, it wouldn't be an abstract. It's a single paragraph that summarizes a dense, academic 30 page paper. Everyone who reads them understands that the abstract is about the general idea, not formal correctness.


>"It's a fine definition for an abstract"

It is not. An abstract is one of the most important parts in a paper and that's why authors tend to avoid writing definitions on them but stating something like "A definition for X is given in the paper". It is meant to "describe" the paper rather than a summary or introduction of the subject in question.

In any case, writing a good abstract is hard, and it doesn't diminish the quality of this paper.


Abstracts should have an introductory sentence, and they quite often are a one paragraph summary of the work.


Thanks.


I don't think people understand what your point is. What do you think is missing from the sentence? An explicit mention of independence? Why do you think 1,2,1,2,1,2,... is indistinguishable from uniform? In another response you said

> But I never said it was indistinguishable

which is confusing in context. What properties do yo claim your sequence has?


I'm sorry for being unclear.

The authors define a a PRNG as a sequence such that, if you find its distribution (histogram), it resembles the uniform distribution (all values appear the same number of times).

In 1,2,1,2,1,2,..., all values appear the same number of times and its histogram is uniform. However, it is not random. It is clearly not random -- so it is distinguishable from a uniform "true random" sequence, which was my point.

However, as was explained to me by another commenter, what I missed is that you may group a number of consecutive numbers together and find the histogram of that, and, in that case, my sequence fails to have a uniform histogram.

So, at this point I'm willing to accept that the author's definition is correct and I just had a hole in my understanding.


I guess I could see your interpretation, but it requires that you ignore all of the context of the paper. For instance, you take distribution to equal histogram. This may make sense in some contexts, but here it's absolutely not true; the values are supposed to be taken independently from some probability distribution.

Pro tip: if you think you've spotted a major problem in the first line of an abstract, maybe give the author the benefit of the doubt. Especially if you're not familiar with the field.


I take your point, I should have phrased my thoughts differently. I did start with "I think", but I could have made it clearer that I didn't mean to criticize the paper, but rather to invite explanations from people who know the subject better than I do.


Yes, it's always interesting how different presentations of the same basic idea are received. "Is X false?" gets a much different response than "why is X true?". It's good to keep in mind that the second one tends to elicit more helpful replies.


Are you missing the point of a limit?

As t->inf, if we counted all of the 1's and all of the 2's and all of the 3's, etc. and there were no 3's, it's not random according to this definition or any definition I'm aware of.


I guess he is restricting the domain to [1,2] in which case 1,2,1,2,1,2... is as "random" as 1,2,2,2,1,2,1,1 ...

I guess what is bothering him is the lack of entropy in the sequence.


I am not terribly good with statistical terminology, but I don't think that the definition you're assuming for "uniform distribution" is the one in general use. Once you know a[n], a[n+1] will follow a delta distribution, not a uniform distribution.


This sequence is trivially distinguishable from something uniform.


Not all deterministic algorithms with uniform distributions are PRNGs.


I know that, but that is what the first sentence in the paper's abstract is claiming. As I said, I think it's wrong.


Breaking Bad is a TV show. American Idol is also a TV show.

However, Breaking Bad is not American Idol.


A dog is a mammal, but not all mammals are dogs.

The paper states "[a] pseudo-random number generator (PRNG) is a deterministic algorithm that produces numbers whose distribution is indistinguishable from uniform."

You are asserting the converse, that all deterministic algorithms with uniform distributions are PRNGs.


Nope, because you don't have the same number of 9s and 8s as 1s and 2s, so it's not uniform.


I don't think you're correct. A discrete distribution can be defined for any number of values. Linux, which is the subject of the paper, uses a discrete distribution too.


Ok, that's fine, you just want to use 1s and 2s (though it would have been more obvious if you'd just used 0s and 1s :) ).

In that case, you don't have as many 11s and 22s as 12s and 21s (I'm ignoring the commas since they're irrelevant).


why not 4, 4, 4, 4, 4,... ;)

obligatory: http://xkcd.com/221/





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: