
Meaning of Fisher's information

$\begingroup$

If I am correct, Fisher's information at parameter $\theta$ is defined to be the variance of the score function at $\theta$. The score function is defined as the derivative of the log-likelihood function with respect to $\theta$, and therefore measures the sensitivity of the log-likelihood function to changes in $\theta$.

I was wondering how to understand the meaning of Fisher's information?

Especially, why does Wikipedia say:

The Fisher information is a way of measuring the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$ upon which the probability of $X$ depends.

What kind of information is meant by "the amount of information"? Is it Shannon information?

Why is the "information" carried by $X$ about $\theta$?

Thanks and regards!

$\endgroup$

2 Answers

$\begingroup$

"Information" is an abstract concept that may be quantified in a number of different ways. Shannon's approach was to compress the data as much as possible and then to count the number of bits needed in the most compressed form. Fisher's approach is radically different and is closer to what laymen intuitively think. If I give you data on the death rate of rats in China and ask you to estimate the population of Cuba based on that, you'll surely say that the data contains no information about the quantity to be estimated. Generalizing this, information may be quantified as follows: try your "best" to estimate the quantity of interest based on the data, and see how "well" you have performed. A natural choice for "best" is maximum likelihood estimation (MLE). A natural choice for "well" is the variance of the MLE. The smaller the variance, the more "information", so consider 1/variance. If the sample size is large, the limiting behavior of this quantity gives you the Fisher information.
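The 1/variance intuition above can be checked with a small simulation. This is a minimal sketch, assuming a Bernoulli($\theta$) model where the MLE is the sample mean and the Fisher information per observation is $1/(\theta(1-\theta))$; the parameter values and trial counts are arbitrary choices for illustration:

```python
import random

random.seed(0)
theta, n, trials = 0.3, 1000, 2000

# MLE of theta for n Bernoulli(theta) draws is the sample mean.
# Repeat the experiment many times to estimate the variance of the MLE.
mles = [sum(random.random() < theta for _ in range(n)) / n
        for _ in range(trials)]
mean = sum(mles) / trials
var = sum((m - mean) ** 2 for m in mles) / trials

# 1/variance of the MLE should be close to n times the per-observation
# Fisher information, n / (theta * (1 - theta)).
print(1 / var)
print(n / (theta * (1 - theta)))
```

The two printed numbers should agree to within a few percent, illustrating that the inverse variance of the MLE grows like $n \cdot \mathcal I(\theta)$.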

$\endgroup$ $\begingroup$

Just as an example, let's say we have a uniformly distributed random variable $X$. Then $X$ depends on two parameters; max and min, or average and span, are the usual choices. If you have observed $X$, you can say something about those parameters just from the values of $X$ you have observed.

Say we have observed the following values of the uniformly distributed integer random variable $X$: $$ 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0 $$ Wouldn't you agree that we can with some certainty conclude that $\max(X)=1$ and $\min(X) = 0$?

To show an actual Fisher information example, let's instead say that the random variable $X$ is either $0$ with some probability $\theta$ or $1$ with probability $(1-\theta)$. Thus $f_X(0;\theta) = \theta$ and $f_X(1;\theta) = 1-\theta$. The Fisher information of $\theta$ is the value $$ \mathcal I(\theta) =E\left[\left(\frac{\partial}{\partial \theta}\ln f_X(X;\theta)\right)^2\Bigg |\theta\right] = \left(\frac{\partial}{\partial \theta}\ln f_X(0;\theta)\right)^2f_X(0;\theta) + \left(\frac{\partial}{\partial \theta}\ln f_X(1;\theta)\right)^2f_X(1;\theta) \\\\ = \frac{1}{\theta^2}\cdot \theta + \frac{1}{(1-\theta)^2}(1-\theta) = \frac{1}{\theta(1-\theta)} $$ and this function measures how much information observations of $X$ give about $\theta$. According to Wikipedia, large values of this function mean that observations give much information. With the maximum likelihood estimate from the observations above, $\hat\theta = 0.5$, we get $\mathcal I(0.5) = 4$. I do not have enough experience with Fisher information to tell you if this specific value is "large".
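The closed-form derivation above is easy to check in code. This is a minimal sketch that evaluates $\mathcal I(\theta) = \sum_x \big(\tfrac{\partial}{\partial\theta}\ln f_X(x;\theta)\big)^2 f_X(x;\theta)$ term by term for the Bernoulli example; the function name is just for illustration:

```python
def fisher_info(theta):
    """Fisher information for X ~ Bernoulli, with P(X=0) = theta."""
    # score at X=0:  d/dtheta ln(theta)     =  1/theta
    # score at X=1:  d/dtheta ln(1 - theta) = -1/(1 - theta)
    return (1 / theta) ** 2 * theta + (1 / (1 - theta)) ** 2 * (1 - theta)

print(fisher_info(0.5))  # 4.0, matching 1 / (0.5 * 0.5)
```

For any $\theta$ in $(0,1)$ this agrees with the closed form $1/(\theta(1-\theta))$, and it is smallest at $\theta = 0.5$: a fair coin is the hardest case to pin down from data.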

$\endgroup$
