## Friday, December 09, 2005

### Inference

Consider the numeric sequence:

1, 2, 3, 5, 7, 11, ?, ?, ?,...

There are infinitely many formulas that could generate this sequence.
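One way to see this is polynomial interpolation: for any seventh value you pick, there is a polynomial that passes through the six known terms and then lands on your pick. The sketch below is only an illustration; the names `OBSERVED`, `lagrange`, and `formula_with_next` are made up for this example, and it assumes the terms sit at positions 1 through 6.

```python
from fractions import Fraction

# The six known terms, assumed to sit at positions 1..6.
OBSERVED = [1, 2, 3, 5, 7, 11]

def lagrange(points, x):
    """Evaluate the lowest-degree polynomial through `points` at x (exact arithmetic)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def formula_with_next(next_value):
    """Return a formula f(n) that reproduces OBSERVED and gives f(7) == next_value."""
    points = list(enumerate(OBSERVED, start=1)) + [(7, next_value)]
    return lambda n: lagrange(points, n)

# Three arbitrary picks for the seventh term; each gets its own "formula".
for guess in (13, 6, -1000):
    f = formula_with_next(guess)
    print(guess, [int(f(n)) for n in range(1, 8)])
```

Every choice of `guess` yields a different formula, and all of them agree with the six terms we actually have.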

Consider these conjectures:

1) The sequence is generated by an unknown formula that produces {1, 2, 3, 5, 7, 11} and follows it with indeterminate numbers.

2) The sequence is generated by a formula that produces {1, 2, 3, 5, 7, 11} and follows it with the sequence {6, 3, 2, 6,...}.

3) The sequence is an ascending list of integers evenly divisible only by themselves and 1.

The list of such consistent conjectures is infinite. Which ones do you infer?
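To make conjectures #2 and #3 concrete, here is a minimal Python sketch. The function names are hypothetical, and conjecture #2 is written out only as far as the terms listed above, since nothing is said about what comes after them.

```python
from itertools import count, islice

OBSERVED = [1, 2, 3, 5, 7, 11]

def conjecture_2():
    """The observed terms followed by the listed continuation 6, 3, 2, 6."""
    # Whatever follows the final 6 is left unspecified, so it is not modeled here.
    return OBSERVED + [6, 3, 2, 6]

def divisible_only_by_self_and_one(n):
    """True if no integer strictly between 1 and n divides n (so 1 qualifies)."""
    return all(n % d != 0 for d in range(2, n))

def conjecture_3(how_many):
    """The first `how_many` integers, ascending, divisible only by themselves and 1."""
    matching = (n for n in count(1) if divisible_only_by_self_and_one(n))
    return list(islice(matching, how_many))

print(conjecture_2())    # [1, 2, 3, 5, 7, 11, 6, 3, 2, 6]
print(conjecture_3(10))  # [1, 2, 3, 5, 7, 11, 13, 17, 19, 23]
```

Both reproduce the six known terms, so both are consistent with the data; they only part ways at the seventh term.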

I can see three criteria that we might apply to determine whether or not an inference is valid:

(A) A conjecture is inferred if it merely replicates the data, i.e., it is consistent with the data.

(B) A conjecture is inferred if it replicates some subset of the data, and makes a prediction about the future.

(C) A conjecture is inferred if it replicates some subset of the existing data from another subset of the existing data, i.e., it internally predicts the data.

If you go with criterion (A), then every data set should lead you to infer a solution like #1. But #1 isn't explanatory. You don't know anything more about the data (or the future) if you accept solution #1.

Solution #2 has the added virtue that it is predictive. However, it has more free parameters than the data set can pin down. You have no more reason to infer this theory than a different theory that predicts any other number as the next number in the series. This solution could be inferred by criterion (B).

Solution #3 meets criterion (C). You have reason to choose solution #3 based only on the data in the existing set: given the rule, any number in the sequence predicts the next and the previous number in the sequence.
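Here is a rough sketch of what "internally predicts" could look like in code for solution #3. The helper names (`fits_rule`, `next_term`, `prev_term`, `internally_predicts`) are invented for this illustration; the check simply asks whether each known term predicts its neighbors under the rule.

```python
OBSERVED = [1, 2, 3, 5, 7, 11]

def fits_rule(n):
    """Solution #3's rule: n is divisible only by itself and 1 (1 is allowed)."""
    return n >= 1 and all(n % d != 0 for d in range(2, n))

def next_term(n):
    """Smallest integer above n that fits the rule."""
    candidate = n + 1
    while not fits_rule(candidate):
        candidate += 1
    return candidate

def prev_term(n):
    """Largest integer below n that fits the rule, or None if there is none."""
    candidate = n - 1
    while candidate >= 1 and not fits_rule(candidate):
        candidate -= 1
    return candidate if candidate >= 1 else None

def internally_predicts(data):
    """True if every term predicts its right neighbor and every neighbor predicts it back."""
    pairs = list(zip(data, data[1:]))
    forward = all(next_term(a) == b for a, b in pairs)
    backward = all(prev_term(b) == a for a, b in pairs)
    return forward and backward

print(internally_predicts(OBSERVED))        # True
print(internally_predicts(OBSERVED + [6]))  # False: the rule predicts 13 after 11
```

Solution #1 has no rule to plug into `next_term`, which is the sense in which it fails criterion (C): it can only repeat the data back.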

To summarize the criteria met by these solutions:

| Solution | Criterion A | Criterion B | Criterion C |
|---|---|---|---|
| #1 | Yes | No | No |
| #2 | Yes | Yes | No |
| #3 | Yes | Yes | Yes |

I claim:

i) that all three solutions are consistent, but not all are inferred, unless inference means the same thing as consistency. I reject criterion A.

ii) that a solution must be future-predictive to be scientific, but that future prediction alone is not adequate for inference. I reject criterion B.

iii) that solutions which are internally predictive are inferred. Your inference should enable you to predict some subset of your data from another subset of your data. I accept criterion C.

This issue of inference came up in a recent discussion about Intelligent Design. My claim is that generic ID doesn't make any predictions, either internally or externally. Therefore, it's not even inferred, let alone scientific. However, if your ID theory is specific enough to allow you to predict, say, one aspect of the fossil record from another aspect, then you can make an inference. Your inference may be less than scientific, but it is, at least, an inference. To do this, you have to know enough about the physical limitations of the designer to say why the data is the way it is.

The next question is, are there any explanations that meet criterion B, but not C?

I'm guessing there aren't because any such solution wouldn't explain the existing data set. It would just be a wild guess about the future.

michael said...

So here's a question: I'm confused as to the difference between (A) and (C). Surely if (A) is met, then one is assumed to be using the totality of the data available. But for (C) one is using part of the data available, and then checking against the rest of the available data. But this was already done for (A), because one had to make sure that all the available data fit. So the difference between (A) and (C) lies only in the method, while any outcome produced by either will be equally attainable through the other.

I agree that (B) is useless because it does not require that all the data be considered.

One is not a prime.

Doctor Logic said...

Your question is a reasonable one because I used the term "replicate" inconsistently. In (A) I meant it as "copy", and in (C) I meant it as "predict".

Consider this solution:

4a) Roses are red.

This can always be restated as

4b) The sequence is generated by an unknown formula that produces {1, 2, 3, 5, 7, 11} and follows it with indeterminate numbers, and roses are red.

This is because "Roses are red" is consistent with the data, but not dependent on it. I can always "fit" any irrelevant proposition to the data just by copying the data and adding it to my irrelevant conclusion.

Is 4b a valid inference from the data? It may well be true, but that doesn't make it an inference.

So the question is, what is the minimum requirement for an inference that will exclude irrelevant statements or statements that don't add any new information?

My claim is that an inference is the recognition of a pattern in the data. A pattern can be expressed as the seeming dependency of one part of the data on another part. If the pattern is global, then you might be able to start with any subset of the data, and predict the remainder in such a way that no element of the data is more privileged than another.

So, the three criteria I am evaluating are:

A) Consistency with the data.

B) Future predictions.

C) Predictions (patterns) within the data.

You are correct about prime numbers officially starting from 2, but, since the sieve algorithm works from 1, I figured, why not throw it in there!

michael said...

fair enough