\begin{verbatim}

Ruth and I quickly went over the proof.
The counterexample that I sent last night arose because the Gaussian
Error function has varying slopes.
Ruth suggested turning it into a triangle.
    /\
  /    \
/        \
Writing this, I see that it can be generalized.

Recall that p=Error(u) means pr[Error = u] = p.
Let's require Error(u) to be strictly concave, i.e. its derivative is
strictly decreasing in u,
i.e. it has a nice dome-shaped curve.
E.g. it could be a Gaussian with the tails cut off past the inflection
points.

Let t be drawn from ANY fixed distribution
t' = t + Error
Let Grade(t',i) be ANY function strictly increasing in both t' and
i.
Let g = Grade(t',i).

Prove
  Exp[t | g, i=0 ]  >  Exp[t | g, i=1 ]
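
A quick numeric sanity check of the claim above, as a sketch only.
Assumptions that are not in the note: Error is a triangle on [-3,3],
Grade(t',i) = t' + i, t is uniform on [2,3], and g = 5. The names
triangle_error and posterior_mean are made up for illustration.

```python
def triangle_error(u, half_width=3.0):
    """Triangle density on [-half_width, half_width] (assumed shape)."""
    if abs(u) >= half_width:
        return 0.0
    return (1.0 - abs(u) / half_width) / half_width


def posterior_mean(g, i, lo, hi, error=triangle_error, steps=20000):
    """Exp[t | g, i] for t uniform on [lo, hi] and g = t + i + Error.

    Midpoint rule; the constant uniform prior cancels in the ratio.
    """
    dt = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        w = error(g - t - i)  # weight of this t given g and i
        num += t * w
        den += w
    return num / den


m0 = posterior_mean(g=5.0, i=0, lo=2.0, hi=3.0)
m1 = posterior_mean(g=5.0, i=1, lo=2.0, hi=3.0)
print(m0, m1, m0 > m1)
```

Under these assumptions the i=0 mean comes out larger (about 2.67
versus about 2.56). This checks one instance; it is not a proof.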


==============

Hi,
It is hard to understand. I keep getting muddled myself.
I still claim that t in [4,4.5] works.
But now I see that neither t in [2,3] nor t in [6,7] works.

g=t+i+Error
pr[t|g=5,i] = pr[t & g=5|i]/pr[g=5|i].
The pr[g=5|i] is just to make the area under the curve be 1.
 pr[t & g=5|i] = pr[t]*Error(g-t-i)
Let's assume t is uniform in some range, say t in [4,4.5].
Error(g-t-i) = prob of the error being E=g-t-i.
Note this prob is maximized when E=g-t-i=0,
  ie when t=g-i, which is t=5 for i=0 and t=4 when i=1.
So the curve you are drawing is a plot of Error(g-t-i) vs t.
Then the plot of  pr[t & g=5|i] = pr[t]*Error(g-t-i)
is the same except truncated to be only in the range t in [4,4.5].
Then the plot of pr[t|g=5,i] = pr[t & g=5|i]/pr[g=5|i] is the same
except it is scaled so that the area under the curve is one.
Because of this, the relative heights of Error(g-t-i) for i=0 vs i=1
do not matter.
What really matters is the slope of the curve.

i=0
        /|
     /   |
    |    |
    |    |   t in [4,4.5].

i=1
    |\ 
    |  \
    |    |
    |    |   t in [4,4.5].

How do you sample a value of t?
  - throw a dart so it lands anywhere in the area under the curve with
  equal probability.
    Each 1mmx1mm square under the curve is equally likely to be hit by
    the dart.
    Then read the value of t for the location of the dart.
Note that for i=0, because the slope of the probability curve is upward,
t=4.5 is more likely than t=4.
So the expected value of t is closer to 4.5 than it is to 4.
Note that for i=1, because the slope of the probability curve is
downward, t=4 is more likely than t=4.5.
So the expected value of t is closer to 4 than it is to 4.5.
Hence Exp[t|g=5,i=0] > Exp[t|g=5,i=1].
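
The dart-throwing picture for t in [4,4.5] can be tried directly; it
is just rejection sampling. A rough sketch, assuming Error is a
Gaussian with sigma = 1 (the note does not fix sigma). The names
gaussian_shape and dart_mean are hypothetical.

```python
import math
import random


def gaussian_shape(u):
    """Unnormalized standard-normal curve; its peak value is 1 at u=0."""
    return math.exp(-0.5 * u * u)


def dart_mean(g, i, lo, hi, darts=200000, seed=0):
    """Estimate Exp[t | g, i] by throwing darts under Error(g-t-i).

    t is uniform on [lo, hi]; a dart (t, y) counts only if it lands
    under the curve, and then we read off its t.
    """
    rng = random.Random(seed)
    total = 0.0
    hits = 0
    for _ in range(darts):
        t = rng.uniform(lo, hi)
        y = rng.uniform(0.0, 1.0)  # the curve height never exceeds 1
        if y < gaussian_shape(g - t - i):
            total += t
            hits += 1
    return total / hits


m0 = dart_mean(g=5.0, i=0, lo=4.0, hi=4.5)
m1 = dart_mean(g=5.0, i=1, lo=4.0, hi=4.5)
print(m0, m1)
```

With these assumptions the i=0 estimate should land a bit above the
midpoint 4.25 and the i=1 estimate a bit below it. Only the shape of
the curve inside [4,4.5] matters, since the overall height cancels.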

Now consider t uniform in [2,3]
Note that for i=0, because the slope of the probability curve is close
to 0, t is about uniform in [2,3].
So the expected value of t is close to 2.5.
Note that for i=1, because the slope of the probability curve is upward,
t=3 is more likely than t=2.
So the expected value of t is closer to 3 than it is to 2.
Hence Exp[t|g=5,i=0] < Exp[t|g=5,i=1].

Now consider t uniform in [6,7]
Note that for i=0, because the slope of the probability curve is
downward, t=6 is more likely than t=7.
So the expected value of t is closer to 6 than it is to 7.
Note that for i=1, because the slope of the probability curve is close
to 0, t is about uniform in [6,7].
So the expected value of t is close to 6.5.
Hence Exp[t|g=5,i=0] < Exp[t|g=5,i=1].

What restriction on pr[t] could we add to make it work?
Certainly it works if pr[t] is uniform across a large range including
[2,6].
What if pr[t] is arbitrary but within the range [2,6] it is
approximately linear (increasing or decreasing)?

I still think that if Error is uniform in [-1,1], then
Exp[t|g=5,i=0] >= Exp[t|g=5,i=1].
  Note the > was replaced by >=.
The reason is that the slope will always be 0.


==================================

Oops. The following statement is NOT true!!!
> Then knowing t' = t+error, the expected value of t is t'.

This bug applies whether we add noise to the Type or to the Grade
function.
Because of Bayes' rule, the distribution on the type t matters.
-- My pictures had completely forgotten this.
If Error takes values only in {-1,1}, then really bad things can happen.
So let Error be uniform in [-1,1].
Even here, some surprising and slightly bad things can happen.
When i=0, let g = t+0+Error.
When i=1, let g = t+1+Error.
Let's condition on g=5.
When i=0, t must be in the range [4,6].
When i=1, t must be in the range [3,5].
But suppose the distribution on t is such that t can only be in the
range [4,5].
Other values of t simply don't happen.
This means for both i=0 and i=1, t must be in [4,5].
Within this range, both i=0 and i=1 have the same distribution on t.
Hence, Exp[t|g=5,i=0] = Exp[t|g=5,i=1].
Being equal is not a disaster, but I expected the first to be strictly
bigger.
If Error were Gaussian, or at least put a little more weight on Error
close to zero,
then this would be enough to make Exp[t|g=5,i=0] > Exp[t|g=5,i=1].
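
To see the tie and how it breaks, here is a small numeric sketch for
t uniform on [4,5] and g=5. Assumptions not in the note: g = t + i +
Error exactly, and the Gaussian has sigma = 1. The helper names
(uniform_error, gaussian_error, exp_t) are made up for illustration.

```python
import math


def uniform_error(u):
    """Error uniform on [-1, 1]."""
    return 0.5 if -1.0 <= u <= 1.0 else 0.0


def gaussian_error(u):
    """Standard normal density (sigma = 1 is an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def exp_t(error, g, i, lo=4.0, hi=5.0, steps=20000):
    """Exp[t | g, i] when t is uniform on [lo, hi], g = t + i + Error."""
    dt = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt  # midpoint of the k-th slice
        w = error(g - t - i)
        num += t * w
        den += w
    return num / den


u0, u1 = exp_t(uniform_error, 5.0, 0), exp_t(uniform_error, 5.0, 1)
g0, g1 = exp_t(gaussian_error, 5.0, 0), exp_t(gaussian_error, 5.0, 1)
print(u0, u1)  # uniform Error: the two means coincide
print(g0, g1)  # Gaussian Error: the i=0 mean is the larger one
```

With uniform Error both means sit at the midpoint 4.5; with the
assumed Gaussian the i=0 mean moves above 4.5 and the i=1 mean below.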

All the best, Jeff
Karan - talk to me if you don't understand this.
Any idea how other papers deal with this kind of problem?


\end{verbatim}