Numerical characteristics of a system of two random variables. Correlation moment. Correlation coefficient

We introduced into consideration the numerical characteristics of one random variable X: the initial and central moments of various orders. Of these characteristics, two are the most important: the mathematical expectation m_x and the variance D_x.

Similar numerical characteristics - initial and central moments of various orders - can be introduced for a system of two random variables. The initial moment of order k, s of the system (X, Y) is the mathematical expectation of the product of X^k by Y^s:

M[X^k Y^s].

The central moment of order k, s of the system (X, Y) is the mathematical expectation of the product of the k-th and s-th powers of the corresponding centered quantities:

μ_{k,s} = M[(X − m_x)^k (Y − m_y)^s].

In practice, only the first and second moments are usually applied.

The first initial moments represent the mathematical expectations of the values X and Y included in the system, which are already known to us:

m_x and m_y.

The set of mathematical expectations m_x, m_y is a characteristic of the position of the system. Geometrically, these are the coordinates of the mean point on the plane around which the random point (X, Y) is scattered.

In addition to the first initial moments, the second central moments of the system are also widely used in practice. Two of them represent the dispersions of the values X and Y, already known to us:

D[X] and D[Y], characterizing the scattering of the random point in the direction of the Ox and Oy axes.

A special role as a characteristic of the system is played by the second mixed central moment:

μ_{1,1} = M[X̊ Y̊],

i.e., the mathematical expectation of the product of the centered quantities. Due to the fact that this moment plays an important role in the theory of systems of random variables, a special notation has been introduced for it:

K_xy = M[X̊ Y̊] = M[(X − m_x)(Y − m_y)].

The characteristic K_xy is called the correlation moment (otherwise the "moment of connection") of the random variables X, Y.

For discrete random variables the correlation moment is expressed by the formula

K_xy = Σ_i Σ_j (x_i − m_x)(y_j − m_y) p_ij,

where p_ij = P(X = x_i, Y = y_j).
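
As an illustration, this double sum can be evaluated directly; below is a minimal Python sketch for a hypothetical 2×3 joint probability table (the values and probabilities are invented for illustration and are not taken from the text):

```python
# Hypothetical joint distribution of (X, Y); p[i][j] = P(X = x_i, Y = y_j).
x_vals = [1.0, 2.0]
y_vals = [-1.0, 0.0, 1.0]
p = [[0.10, 0.25, 0.05],
     [0.05, 0.25, 0.30]]

# Marginal expectations m_x and m_y.
m_x = sum(x_vals[i] * sum(p[i]) for i in range(len(x_vals)))
m_y = sum(y_vals[j] * sum(p[i][j] for i in range(len(x_vals)))
          for j in range(len(y_vals)))

# Correlation moment K_xy = sum_i sum_j (x_i - m_x)(y_j - m_y) p_ij.
k_xy = sum((x_vals[i] - m_x) * (y_vals[j] - m_y) * p[i][j]
           for i in range(len(x_vals))
           for j in range(len(y_vals)))
print(m_x, m_y, k_xy)
```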

Let us find out the meaning and purpose of this characteristic. The correlation moment is a characteristic of the system of random variables that describes, in addition to the scattering of the values X and Y, also the connection between them. For independent random variables the correlation moment is equal to zero.

Thus, if the correlation moment of two random variables is different from zero, this is a sign of the presence of a dependence between them.

It is clear from the formula that the correlation moment characterizes not only the dependence of the quantities but also their scattering. Indeed, if, for example, one of the quantities (X, Y) deviates very little from its mathematical expectation (is almost not random), then the correlation moment will be small, no matter how closely the quantities (X, Y) are related. Therefore, to characterize the relationship between the quantities (X, Y) in its pure form, we pass from the moment to the dimensionless characteristic

r_xy = K_xy / (σ_x σ_y),

where σ_x, σ_y are the standard deviations of the quantities X, Y. This characteristic is called the correlation coefficient of the quantities X and Y.
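
In practice r_xy is usually estimated from observed pairs of values. A minimal NumPy sketch (the data-generating model below is an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(scale=0.5, size=10_000)   # assumed linear-plus-noise model

k_xy = np.mean((x - x.mean()) * (y - y.mean()))    # sample correlation moment
r_xy = k_xy / (x.std() * y.std())                  # r_xy = K_xy / (sigma_x * sigma_y)
print(r_xy, np.corrcoef(x, y)[0, 1])               # the two estimates agree
```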

Obviously, the correlation coefficient goes to zero simultaneously with the correlation moment; therefore, for independent random variables the correlation coefficient is zero.

Random variables for which the correlation moment (and therefore the correlation coefficient) is equal to zero are called uncorrelated (sometimes “unrelated”).

Is the concept of uncorrelated random variables equivalent to the concept of independence? It is known that independent random variables are always uncorrelated. It remains to be seen whether the converse is true: does independence follow from the uncorrelatedness of the quantities? It turns out that it does not. There are random variables that are uncorrelated but dependent. The equality of the correlation coefficient to zero is a necessary but not a sufficient condition for the independence of random variables. The independence of random variables implies that they are uncorrelated; conversely, their independence does not follow from their being uncorrelated. The condition of independence of random variables is more stringent than the condition of uncorrelatedness.

The correlation coefficient does not characterize any dependence, but only the so-called linear dependence. The linear probabilistic dependence of random variables is that when one random variable increases, the other tends to increase (or decrease) according to a linear law. This trend towards linear dependence may be more or less pronounced, more or less close to functional, i.e., the closest linear dependence. The correlation coefficient characterizes the degree of closeness of the linear relationship between random variables. If random variables X and Y are related by an exact linear functional relationship:

Y = aX + b, then r_xy = ±1, and the "plus" or "minus" sign is taken depending on whether the coefficient a is positive or negative. In the general case, when the values of X and Y are related by an arbitrary probabilistic dependence, the correlation coefficient can have a value within the limits

−1 ≤ r_xy ≤ 1.

In the case r > 0 one speaks of a positive correlation between the values of X and Y; in the case r < 0, of a negative correlation. A positive correlation between random variables means that as one of them increases, the other tends on average to increase as well; a negative correlation means that as one of the random variables increases, the other tends on average to decrease.
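
A quick numerical check of the statement about the linear case (the values of a and b are chosen arbitrarily for illustration):

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 1_000)
for a, b in [(2.5, 1.0), (-0.7, 4.0)]:
    y = a * x + b                       # exact linear functional relationship
    r = np.corrcoef(x, y)[0, 1]
    print(a, round(r, 6))               # +1 when a > 0, -1 when a < 0
```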

Let us give several examples of random variables with positive and negative correlation.

1. A person's weight and height are positively correlated.

2. The time spent preparing for classes and the grade received are positively correlated (if, of course, the time is spent wisely). On the contrary, the time spent on preparation and the number of bad marks received are negatively correlated.

3. Two shots are fired at the target; the point of impact of the first shot is recorded, and a correction is introduced into the sight, proportional to the error of the first shot with the opposite sign. The coordinates of the impact points of the first and second shots will be negatively correlated.

If we have at our disposal the results of a number of experiments on a system of two random variables (X, Y), then the presence or absence of a significant correlation between them can, to a first approximation, be easily judged from a graph on which all pairs of values of the random variables obtained in the experiments are plotted as points; such scatter diagrams are discussed below.



In Chapter 5, we introduced into consideration the numerical characteristics of one random variable - the initial and central moments of various orders. Of these characteristics, two are the most important: mathematical expectation and dispersion.

Similar numerical characteristics - initial and central moments of various orders - can be introduced for a system of two random variables.

The initial moment of order k, s of the system (X, Y) is the mathematical expectation of the product of X^k by Y^s:

α_{k,s} = M[X^k Y^s]. (8.6.1)

The central moment of order k, s of the system is the mathematical expectation of the product of the k-th and s-th powers of the corresponding centered quantities:

μ_{k,s} = M[X̊^k Y̊^s] = M[(X − m_x)^k (Y − m_y)^s], (8.6.2)

where X̊ = X − m_x, Y̊ = Y − m_y.

Let us write down the formulas used to calculate the moments directly. For discontinuous (discrete) random variables

α_{k,s} = Σ_i Σ_j x_i^k y_j^s p_ij, (8.6.3)

μ_{k,s} = Σ_i Σ_j (x_i − m_x)^k (y_j − m_y)^s p_ij, (8.6.4)

where p_ij = P(X = x_i, Y = y_j) is the probability that the system takes the values (x_i, y_j), and the summation extends over all possible values of the random variables X, Y.

For continuous random variables:

α_{k,s} = ∫∫ x^k y^s f(x, y) dx dy, (8.6.5)

μ_{k,s} = ∫∫ (x − m_x)^k (y − m_y)^s f(x, y) dx dy, (8.6.6)

where f(x, y) is the distribution density of the system.

In addition to k and s, which characterize the order of the moment with respect to the individual quantities, the total order of the moment k + s, equal to the sum of the exponents of X and Y, is also considered. According to the total order, the moments are classified into first, second, and so on. In practice, only the first and second moments are usually used.
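
For a continuous system the double integrals (8.6.5)-(8.6.6) can be evaluated numerically. A sketch using SciPy, assuming for illustration the density f(x, y) = x + y on the unit square (this density is not taken from the text):

```python
from scipy.integrate import dblquad

def f(x, y):
    return x + y                        # assumed joint density on [0, 1] x [0, 1]

def alpha(k, s):
    # dblquad integrates func(y, x): inner variable y, outer variable x
    val, _ = dblquad(lambda y, x: x**k * y**s * f(x, y), 0.0, 1.0, 0.0, 1.0)
    return val

m_x, m_y = alpha(1, 0), alpha(0, 1)     # first initial moments

def mu(k, s):
    val, _ = dblquad(lambda y, x: (x - m_x)**k * (y - m_y)**s * f(x, y),
                     0.0, 1.0, 0.0, 1.0)
    return val

# dispersions D_x, D_y and the mixed moment mu_{1,1} = K_xy
print(m_x, m_y, mu(2, 0), mu(0, 2), mu(1, 1))
```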

The first initial moments represent the already known mathematical expectations of the quantities X and Y included in the system:

α_{1,0} = m_x, α_{0,1} = m_y.

The set of mathematical expectations m_x, m_y represents a characteristic of the position of the system. Geometrically, these are the coordinates of the mean point on the plane around which the point (X, Y) is scattered.

In addition to the first initial moments, the second central moments of the system are also widely used in practice. Two of them represent the dispersions of the quantities X and Y, already known to us:

μ_{2,0} = D_x, μ_{0,2} = D_y,

characterizing the scattering of the random point in the direction of the Ox and Oy axes.

The second mixed central moment plays a special role as a characteristic of the system:

μ_{1,1} = M[X̊ Y̊],

i.e. the mathematical expectation of the product of the centered quantities.

Due to the fact that this moment plays an important role in the theory, we introduce a special notation for it:

K_xy = M[X̊ Y̊] = M[(X − m_x)(Y − m_y)]. (8.6.7)

The characteristic K_xy is called the correlation moment (otherwise the "moment of connection") of the random variables X, Y.

For discontinuous random variables the correlation moment is expressed by the formula

K_xy = Σ_i Σ_j (x_i − m_x)(y_j − m_y) p_ij, (8.6.8)

and for continuous ones by the formula

K_xy = ∫∫ (x − m_x)(y − m_y) f(x, y) dx dy. (8.6.9)

Let us find out the meaning and purpose of this characteristic.

The correlation moment is a characteristic of the system of random variables that describes, in addition to the scattering of the quantities X and Y, also the connection between them. To verify this, let us prove that for independent random variables the correlation moment is equal to zero.

We will carry out the proof for continuous random variables. Let X, Y be independent continuous quantities with distribution density f(x, y). In 8.5 we proved that for independent quantities

f(x, y) = f_1(x) f_2(y), (8.6.10)

where f_1(x), f_2(y) are the distribution densities of the quantities X and Y, respectively.

Substituting expression (8.6.10) into formula (8.6.9), we see that integral (8.6.9) turns into the product of two integrals:

K_xy = ∫ (x − m_x) f_1(x) dx · ∫ (y − m_y) f_2(y) dy.

The integral

∫ (x − m_x) f_1(x) dx

represents nothing more than the first central moment of the quantity X and is therefore equal to zero; for the same reason the second factor is also equal to zero; therefore, for independent random variables K_xy = 0.
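
This can be verified numerically for any concrete pair of independent quantities; the sketch below assumes, purely for illustration, an Exp(1) component and a Uniform(0, 2) component:

```python
import math
from scipy.integrate import dblquad

f1 = lambda x: math.exp(-x)      # Exp(1) density on [0, +inf)
f2 = lambda y: 0.5               # Uniform(0, 2) density
m_x, m_y = 1.0, 1.0              # means of the two assumed distributions

# K_xy = integral of (x - m_x)(y - m_y) f1(x) f2(y) over the whole support
k_xy, _ = dblquad(lambda y, x: (x - m_x) * (y - m_y) * f1(x) * f2(y),
                  0.0, math.inf, 0.0, 2.0)
print(k_xy)                      # ~0 up to quadrature error
```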

Thus, if the correlation moment of two random variables is different from zero, this is a sign of the presence of a dependence between them.

From formula (8.6.7) it is clear that the correlation moment characterizes not only the dependence of the quantities but also their scattering. Indeed, if, for example, one of the quantities deviates very little from its mathematical expectation (is almost not random), then the correlation moment will be small, no matter how closely the quantities are related. Therefore, to characterize the relationship between the quantities in its pure form, we pass from the moment to the dimensionless characteristic

r_xy = K_xy / (σ_x σ_y),

where σ_x, σ_y are the standard deviations of the quantities X, Y. This characteristic is called the correlation coefficient of the quantities X and Y. Obviously, the correlation coefficient goes to zero simultaneously with the correlation moment; therefore, for independent random variables the correlation coefficient is zero.

Random variables for which the correlation moment (and therefore the correlation coefficient) is equal to zero are called uncorrelated (sometimes “unrelated”).

Let us find out whether the concept of uncorrelated random variables is equivalent to the concept of independence. We proved above that two independent random variables are always uncorrelated. It remains to be seen whether the converse is true: does the independence of the quantities follow from their uncorrelatedness? It turns out that it does not. It is possible to construct examples of random variables that are uncorrelated but dependent. The equality of the correlation coefficient to zero is a necessary but not a sufficient condition for the independence of random variables. The independence of random variables implies that they are uncorrelated; conversely, the fact that quantities are uncorrelated does not necessarily mean that they are independent. The condition of independence of random variables is more stringent than the condition of uncorrelatedness.

Let us see this with an example. Consider a system of random variables (X, Y) distributed with uniform density inside a circle of radius r with its center at the origin (Fig. 8.6.1).

The distribution density of the quantities (X, Y) is expressed by the formula

f(x, y) = c for x² + y² ≤ r²; f(x, y) = 0 for x² + y² > r².

From the condition ∫∫ f(x, y) dx dy = 1 we find c = 1/(π r²).

It is easy to see that in this example the quantities X and Y are dependent. Indeed, it is immediately clear that if the quantity X takes, for example, the value 0, then the quantity Y can with equal probability take all values from −r to r; if the quantity X has taken the value r, then the quantity Y can take only one single value, exactly equal to zero; in general, the range of possible values of Y depends on what value X has taken.

Let us see whether these quantities are correlated. Let us calculate the correlation moment. Keeping in mind that for reasons of symmetry m_x = m_y = 0, we get

K_xy = ∫∫_{x²+y²≤r²} x y c dx dy. (8.6.12)

To calculate the integral, we divide the area of integration (the circle) into four sectors corresponding to the four coordinate angles. In sectors I and III the integrand is positive, in sectors II and IV it is negative; in absolute value the integrals over these sectors are equal; therefore the integral (8.6.12) is equal to zero, and the quantities X and Y are uncorrelated.

Thus, we see that the uncorrelatedness of random variables does not always imply their independence.
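
The circle example also lends itself to a Monte-Carlo illustration. The sketch below takes the radius equal to 1 for convenience and shows that the sample correlation is near zero while the spread of Y clearly depends on the value of X:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(200_000, 2))
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]     # rejection sampling: uniform in the unit disk
x, y = pts[:, 0], pts[:, 1]

print(np.corrcoef(x, y)[0, 1])               # ~0: X and Y are uncorrelated
print(y[np.abs(x) < 0.1].std(),              # wide spread of Y near x = 0 ...
      y[np.abs(x) > 0.9].std())              # ... narrow spread near |x| = 1: dependence
```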

The correlation coefficient does not characterize every dependence, but only the so-called linear dependence. The linear probabilistic dependence of random variables consists in the fact that when one random variable increases, the other tends to increase (or decrease) according to a linear law. This tendency towards linear dependence can be more or less pronounced, more or less close to functional, i.e. to the closest, linear, dependence. The correlation coefficient characterizes the degree of closeness of the linear relationship between random variables. If random variables X and Y are related by an exact linear functional relationship

Y = aX + b,

then r_xy = ±1, and the "plus" or "minus" sign is taken depending on whether the coefficient a is positive or negative. In the general case, when the quantities X and Y are related by an arbitrary probabilistic dependence, the correlation coefficient can have a value within the limits −1 ≤ r_xy ≤ 1. (In the circle example considered above, as |x| increases only the range of variation of Y changes, while its mean value does not change; naturally, such quantities turn out to be uncorrelated.)

Fig. 8.6.2        Fig. 8.6.3

Let us give several examples of random variables with positive and negative correlation.

1. A person’s weight and height are positively correlated.

2. The time spent adjusting the device in preparation for operation and the time of its trouble-free operation are associated with a positive correlation (if, of course, the time is spent wisely). On the contrary, the time spent on preparation and the number of faults detected during operation of the device are negatively correlated.

3. When firing in a salvo, the coordinates of the impact points of individual projectiles are connected by a positive correlation (since there are aiming errors common to all shots, which equally deflect each of them from the target).

4. Two shots are fired at the target; the point of impact of the first shot is recorded, and a correction is introduced into the sight, proportional to the error of the first shot with the opposite sign. The coordinates of the impact points of the first and second shots will be negatively correlated.

If we have at our disposal the results of a number of experiments on a system of random variables (X, Y), then the presence or absence of a significant correlation between them can easily be judged, to a first approximation, from a graph on which all pairs of random variable values obtained in the experiments are plotted as points. For example, if the observed pairs of values are arranged as shown in Fig. 8.6.2, this indicates the presence of a clearly expressed positive correlation between the quantities. An even more pronounced positive correlation, close to a linear functional dependence, is observed in Fig. 8.6.3. Fig. 8.6.4 shows the case of a relatively weak negative correlation. Finally, Fig. 8.6.5 illustrates the case of practically uncorrelated random variables. In practice, before examining the correlation of random variables, it is always useful to first plot the observed pairs of values on a graph in order to make a first qualitative judgment about the type of correlation.
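
A sketch of such diagnostic scatter plots for simulated data (the three generating models are assumptions chosen only to produce the three typical pictures):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.normal(size=500)
cases = {
    "strong positive": x + 0.3 * rng.normal(size=500),
    "weak negative": -0.4 * x + rng.normal(size=500),
    "practically uncorrelated": rng.normal(size=500),
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (title, y) in zip(axes, cases.items()):
    ax.scatter(x, y, s=5)
    ax.set_title(f"{title}, r = {np.corrcoef(x, y)[0, 1]:.2f}")
plt.tight_layout()
plt.show()
```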

To describe a system of two random variables, in addition to the mathematical expectations and variances of the components, other characteristics are used, which include the correlation moment and the correlation coefficient (briefly mentioned at the end of T.8, clause 8.6).

The correlation moment (or covariance, or "moment of connection") of two random variables X and Y is the mathematical expectation of the product of the deviations of these quantities (see equality (5), clause 8.6):

K_xy = M[(X − M(X))(Y − M(Y))].

Corollary 1. For the correlation moment of the r.v. X and Y the following equality is also valid:

K_xy = M(X̊ Y̊),

where X̊ = X − M(X), Y̊ = Y − M(Y) are the corresponding centered r.v. X and Y (see clause 8.6).

In this case: if (X, Y) is a two-dimensional d.s.v. (discrete), then the covariance is calculated by the formula

K_xy = Σ_i Σ_j (x_i − M(X))(y_j − M(Y)) p_ij; (8)

if (X, Y) is a two-dimensional n.s.v. (continuous), then the covariance is calculated by the formula

K_xy = ∫∫ (x − M(X))(y − M(Y)) f(x, y) dx dy. (9)

Formulas (8) and (9) were obtained on the basis of formulas (6) in clause 12.1. There is also a computational formula

K_xy = M(XY) − M(X)·M(Y), (10)

which is derived from definition (9) using the properties of the mathematical expectation; indeed,

K_xy = M[(X − M(X))(Y − M(Y))] = M(XY) − M(X)M(Y) − M(Y)M(X) + M(X)M(Y) = M(XY) − M(X)M(Y).

Consequently, formulas (8) and (9) can be rewritten in the form

K_xy = Σ_i Σ_j x_i y_j p_ij − M(X)M(Y) and K_xy = ∫∫ x y f(x, y) dx dy − M(X)M(Y). (11)
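
A small numerical check that the computational form (10) agrees with the defining form, on an invented joint table (the numbers are illustrative only):

```python
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 1.0])
p = np.array([[0.10, 0.15],          # p[i, j] = P(X = x_i, Y = y_j)
              [0.20, 0.25],
              [0.05, 0.25]])

m_x = (x_vals * p.sum(axis=1)).sum()
m_y = (y_vals * p.sum(axis=0)).sum()

k_def = ((x_vals[:, None] - m_x) * (y_vals[None, :] - m_y) * p).sum()   # definition
k_comp = (x_vals[:, None] * y_vals[None, :] * p).sum() - m_x * m_y      # M(XY) - M(X)M(Y)
print(k_def, k_comp)                 # identical up to floating-point error
```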

The correlation moment serves to characterize the relationship between the quantities X and Y.

As will be shown below, the correlation moment is equal to zero if X and Y are independent.

Therefore, if the correlation moment is not equal to zero, then X and Y are dependent random variables.

Theorem 12.1. The correlation moment of two independent random variables X and Y is equal to zero, i.e. for independent r.v. X and Y, K_xy = 0.

Proof. Since X and Y are independent random variables, their deviations

X − M(X) and Y − M(Y)

are also independent. Using the properties of mathematical expectation (the mathematical expectation of the product of independent r.v. is equal to the product of the mathematical expectations of the factors) and the fact that the mathematical expectation of a deviation is zero, we obtain

K_xy = M[(X − M(X))(Y − M(Y))] = M[X − M(X)] · M[Y − M(Y)] = 0.

Comment. From this theorem it follows that if K_xy ≠ 0, then the r.v. X and Y are dependent; in such cases the r.v. X and Y are called correlated. However, from the fact that K_xy = 0 the independence of the r.v. X and Y does not follow.

In this case (K_xy = 0) the r.v. X and Y are called uncorrelated. Thus, independence implies uncorrelatedness; the converse statement is, generally speaking, false (see Example 2 below).

Let us consider the main properties of the correlation moment.

Covariance properties:

1. The covariance is symmetric, i.e. K_xy = K_yx.

This follows directly from the definition of the covariance.

2. The following equalities hold: K_xx = D(X), K_yy = D(Y), i.e. the dispersion of an r.v. is its covariance with itself.

These equalities follow directly from the definitions of dispersion and covariance, taken for Y = X and X = Y respectively.

3. The following equalities are valid:

D(X ± Y) = D(X) + D(Y) ± 2K_xy.

These equalities are derived from the definitions of variance and covariance of the r.v. X and Y and from property 2. Indeed, by the definition of dispersion (taking into account that (X ± Y) − M(X ± Y) = X̊ ± Y̊ is a centered r.v.) we have

D(X ± Y) = M[(X̊ ± Y̊)²] = M(X̊²) ± 2M(X̊ Y̊) + M(Y̊²) = D(X) + D(Y) ± 2K_xy;

the case with the plus sign and the case with the minus sign are obtained from the expansions of (X̊ + Y̊)² and (X̊ − Y̊)², respectively.

4. Let a and b be constant numbers; then the following equalities are valid:

K_{aX, bY} = ab·K_xy,  K_{X+a, Y+b} = K_xy.

These properties are usually called first-order homogeneity and invariance under shifts of the arguments.

Let us prove the first equality, using the properties of the mathematical expectation:

K_{aX, bY} = M[(aX − M(aX))(bY − M(bY))] = M[a(X − M(X)) · b(Y − M(Y))] = ab·M[(X − M(X))(Y − M(Y))] = ab·K_xy.
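
A numerical check of property 4 on simulated data (the generating model and the constants a, b are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))
a, b = 3.0, -2.0
print(cov(a * x, b * y), a * b * cov(x, y))   # homogeneity: K(aX, bY) = ab K(X, Y)
print(cov(x + a, y + b), cov(x, y))           # shift invariance: K(X+a, Y+b) = K(X, Y)
```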

Theorem 12.2. The absolute value of the correlation moment of two arbitrary random variables X and Y does not exceed the geometric mean of their variances, i.e.

|K_xy| ≤ √(D_x D_y) = σ_x σ_y.

Proof. Note that for independent r.v. the inequality holds trivially (see Theorem 12.1). So let the r.v. X and Y be dependent. Consider the standardized r.v.

X' = (X − M(X))/σ_x and Y' = (Y − M(Y))/σ_y

and calculate the dispersion of the r.v. X' ± Y'. Taking into account property 3, we have: on the one hand, D(X' ± Y') ≥ 0; on the other hand,

D(X' ± Y') = D(X') + D(Y') ± 2K_{X'Y'}.

Since X' and Y' are normalized (standardized) r.v., their mathematical expectation is equal to zero and their variance is equal to 1; moreover, using the properties of the mathematical expectation,

K_{X'Y'} = M(X'Y') = M[(X − M(X))(Y − M(Y))]/(σ_x σ_y) = K_xy/(σ_x σ_y).

Therefore D(X' ± Y') = 2 ± 2K_xy/(σ_x σ_y) ≥ 0, whence

−σ_x σ_y ≤ K_xy ≤ σ_x σ_y.

It follows that

|K_xy| ≤ σ_x σ_y = √(D_x D_y).

The statement has been proven.
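
The bound of Theorem 12.2 can also be checked empirically on simulated pairs (the generating models below are assumptions; the printed flag should always be True):

```python
import numpy as np

rng = np.random.default_rng(4)
for slope in (-2.0, 0.0, 1.5):
    x = rng.normal(size=50_000)
    y = slope * x + rng.uniform(-1.0, 1.0, size=50_000)
    k = np.mean((x - x.mean()) * (y - y.mean()))
    print(abs(k) <= x.std() * y.std() + 1e-12)   # |K_xy| <= sigma_x * sigma_y
```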

From the definition and properties of the covariance it follows that it characterizes both the degree of dependence of the r.v. and their scattering around the point (M(X), M(Y)). The dimension of the covariance is equal to the product of the dimensions of the random variables X and Y. In other words, the magnitude of the correlation moment depends on the units of measurement of the random variables. For this reason, for the same two quantities X and Y the magnitude of the correlation moment will have different values depending on the units in which the quantities were measured.

Let, for example, X and Y be measured in centimeters and K_xy = 2 cm²; if X and Y are measured in millimeters, then K_xy = 200 mm². This feature of the correlation moment is a disadvantage of this numerical characteristic, since comparing the correlation moments of different systems of random variables becomes difficult.

In order to eliminate this drawback, a new numerical characteristic is introduced: the "correlation coefficient".

The correlation coefficient r_xy of random variables X and Y is the ratio of the correlation moment to the product of the standard deviations of these quantities:

r_xy = K_xy / (σ_x σ_y). (13)

Since the dimension of K_xy is equal to the product of the dimensions of the quantities X and Y, while σ_x has the dimension of the quantity X and σ_y has the dimension of the quantity Y, r_xy is simply a number (i.e. a "dimensionless quantity"). Thus, the value of the correlation coefficient does not depend on the choice of units of measurement of the r.v.; this is the advantage of the correlation coefficient over the correlation moment.

In T.8, clause 8.3 we introduced the concept of the normalized r.v. X' = (X − M(X))/σ_x (formula (18)), and the theorem was proven that M(X') = 0 and D(X') = 1 (see also Theorem 8.2). Here we prove the following statement.

Theorem 12.3. For any two random variables X and Y the equality r_xy = K_{X'Y'} is true. In other words, the correlation coefficient of any two r.v. X and Y is equal to the correlation moment of the corresponding normalized r.v. X' and Y'.

Proof. By definition of the normalized random variables,

X' = (X − M(X))/σ_x and Y' = (Y − M(Y))/σ_y.

Taking into account the properties of mathematical expectation (constant factors can be taken outside the expectation sign), we obtain

K_{X'Y'} = M(X'Y') = M[(X − M(X))(Y − M(Y))]/(σ_x σ_y) = K_xy/(σ_x σ_y) = r_xy.

The statement has been proven.

Let us look at some commonly encountered properties of the correlation coefficient.

Properties of the correlation coefficient:

1. The correlation coefficient in absolute value does not exceed 1, i.e. |r_xy| ≤ 1.

This property follows directly from formula (13) - the definition of the correlation coefficient - and Theorem 12.2.

2. If the random variables X and Y are independent, then the correlation coefficient is zero, i.e. r_xy = 0.

This property is a direct consequence of the definition of the correlation coefficient and Theorem 12.1.

Let us formulate the following property as a separate theorem.

Theorem 12.4.

If the r.v. X and Y are connected by a linear functional dependence, i.e. Y = aX + b, a ≠ 0, then

|r_xy| = 1, with r_xy = +1 for a > 0 and r_xy = −1 for a < 0.

Conversely, if |r_xy| = 1, then the r.v. X and Y are connected by a linear functional dependence, i.e. there exist constants a and b such that Y = aX + b.

Proof. Let Y = aX + b. Then, based on property 4 of the covariance, we have

K_xy = K_{X, aX+b} = a·K_{X,X} = a·D(X) = a·σ_x²,

and since σ_y = |a|·σ_x, therefore

r_xy = K_xy/(σ_x σ_y) = a·σ_x²/(σ_x·|a|·σ_x) = a/|a| = ±1.

Hence r_xy = +1 if a > 0 and r_xy = −1 if a < 0. The equality in one direction is obtained.

Let now |r_xy| = 1. Two cases should be considered: 1) r_xy = 1 and 2) r_xy = −1. Consider the first case. By Theorem 12.3, r_xy = K_{X'Y'} = M(X'Y'), where X', Y' are the normalized r.v.; in our case M(X'Y') = 1, and therefore (see the proof of Theorem 12.2)

D(X' − Y') = D(X') + D(Y') − 2K_{X'Y'} = 1 + 1 − 2 = 0.

We get that X' − Y' has zero variance, which means that X' − Y' is constant. Because M(X' − Y') = 0, this constant is zero, so X' = Y'; indeed,

(X − M(X))/σ_x = (Y − M(Y))/σ_y.

Hence

Y = (σ_y/σ_x)·X + (M(Y) − (σ_y/σ_x)·M(X)) = aX + b, where a = σ_y/σ_x > 0.

Similarly, it is shown that for r_xy = −1 there holds (check it yourself!)

Y = aX + b, where a = −σ_y/σ_x < 0.

Some conclusions:

1. If X and Y are independent r.v., then r_xy = 0.

2. If the r.v. X and Y are linearly related to each other, then |r_xy| = 1.

3. In other cases −1 < r_xy < 1.

In this case one says that the r.v. X and Y are connected by a positive correlation if r_xy > 0, and by a negative correlation if r_xy < 0. The closer |r_xy| is to one, the more reason there is to believe that the r.v. X and Y are connected by a linear relationship.

Note that the correlation moments and dispersions of a system of r.v. are usually given in the form of a correlation matrix:

K = ( D_x   K_xy
      K_xy  D_y ).

Obviously, by Theorem 12.2 the determinant of the correlation matrix satisfies det K = D_x·D_y − K_xy² ≥ 0.
As already noted, if two random variables are dependent, then they can be either correlated or uncorrelated. In other words, the correlation moment of two dependent quantities can be different from zero, but it can also be equal to zero.

Example 1. The distribution law of a discrete two-dimensional r.v. (X, Y) is given by the table.

Find the correlation coefficient r_xy.

Solution. We find the distribution laws of the components X and Y.

Now let us calculate the mathematical expectations of the components; these values could also be found directly from the distribution table of the r.v.

M(Y) is found likewise (find it yourself).

Let us calculate the variances of the components, using the computational formula D(X) = M(X²) − [M(X)]².

Let us construct the distribution law of the product XY, and then find M(XY).

When compiling the table of this distribution law, you should perform the following steps:

1) keep only the distinct values among all possible products x_i·y_j;

2) to determine the probability of a given value of XY, add up all the corresponding probabilities located at the intersections of the main table that favour the occurrence of this value.

In our example the r.v. XY takes only three different values. The first of them corresponds to the product of the value of X from the second row and the value of Y from the first column, so at their intersection stands the corresponding probability; the second is obtained as the sum of the probabilities located at the intersections of the first row with the columns (0.15; 0.40; 0.05, respectively) and of the value standing at the intersection of the second row and the second column; and finally, the third is the value standing at the intersection of the second row and the third column.

From our table we find M(XY).

We find the correlation moment using the computational formula K_xy = M(XY) − M(X)·M(Y), and the correlation coefficient using formula (13).

Thus, the correlation between X and Y turns out to be negative.
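
The procedure of Example 1 can be followed step by step in code on an invented 2×3 table (the numbers below are illustrative and are not the table of Example 1):

```python
from collections import defaultdict
import math

x_vals, y_vals = [-1.0, 1.0], [0.0, 1.0, 2.0]
p = [[0.10, 0.30, 0.20],            # p[i][j] = P(X = x_i, Y = y_j)
     [0.15, 0.05, 0.20]]

law_xy = defaultdict(float)          # distribution law of the product XY
for i, xi in enumerate(x_vals):
    for j, yj in enumerate(y_vals):
        law_xy[xi * yj] += p[i][j]   # merge equal products, adding their probabilities

m_xy = sum(v * pr for v, pr in law_xy.items())
m_x = sum(xi * sum(p[i]) for i, xi in enumerate(x_vals))
m_y = sum(yj * sum(p[i][j] for i in range(len(x_vals))) for j, yj in enumerate(y_vals))
d_x = sum(xi ** 2 * sum(p[i]) for i, xi in enumerate(x_vals)) - m_x ** 2
d_y = sum(yj ** 2 * sum(p[i][j] for i in range(len(x_vals))) for j, yj in enumerate(y_vals)) - m_y ** 2

k_xy = m_xy - m_x * m_y              # K_xy = M(XY) - M(X)M(Y)
r_xy = k_xy / math.sqrt(d_x * d_y)
print(dict(law_xy), m_xy, k_xy, round(r_xy, 3))
```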

Exercise. The distribution law of a discrete r.v. (X, Y) is given by a table.

Find the correlation coefficient r_xy.

Let us now look at an example showing that two dependent random variables may be uncorrelated.

Example 2. A two-dimensional random variable (X, Y) is given by a density function f(x, y).

Let us prove that X and Y are dependent but uncorrelated random variables.

Solution. Let us use the previously calculated distribution densities of the components X and Y, f_1(x) and f_2(y).

Since f(x, y) ≠ f_1(x)·f_2(y), X and Y are dependent quantities. To prove that X and Y are uncorrelated, it is enough to make sure that K_xy = 0.

Let us find the correlation moment using the formula

K_xy = ∫∫ (x − m_x)(y − m_y) f(x, y) dx dy.

Since the density function f(x, y) is symmetric about the axis Oy, m_x = 0; similarly, m_y = 0 due to the symmetry of f(x, y) about the axis Ox. Therefore, taking the constant factors out,

K_xy = ∫∫ x y f(x, y) dx dy.

The inner integral is equal to zero (the integrand is odd, the limits of integration are symmetric with respect to the origin); therefore K_xy = 0, i.e. the dependent random variables X and Y are not correlated with each other.

So, from the correlatedness of two random variables their dependence follows, but from their uncorrelatedness one still cannot conclude that these variables are independent.

However, for normally distributed r.v. this conclusion holds as an exception: from the uncorrelatedness of normally distributed r.v. their independence does follow.

The next paragraph is devoted to this issue.


A random variable is described by two numerical characteristics: mathematical expectation and variance. To describe a system of two random variables, in addition to the “main” characteristics, the correlation moment and the correlation coefficient are also used.
The correlation moment µ_xy of random variables X and Y is the mathematical expectation of the product of the deviations of these values:

µ_xy = M([X − M(X)] [Y − M(Y)]).

To find the correlation moment of discrete quantities, the formula

µ_xy = Σ_i Σ_j [x_i − M(X)] [y_j − M(Y)] p_ij

is used, and for continuous quantities the formula

µ_xy = ∫∫ [x − M(X)] [y − M(Y)] f(x, y) dx dy.

The correlation moment characterizes the presence (or absence) of a connection between the quantities X and Y. It will be proven below that the correlation moment is equal to zero if X and Y are independent; if the correlation moment of random variables X and Y is not equal to zero, then there is a dependence between them.

Note 1. Taking into account that the deviations are centered random variables, we can define the correlation moment as the mathematical expectation of the product of two centered random variables:

µ_xy = M(X̊ Y̊).

Note 2. It is not difficult to prove that the correlation moment can be written in the form

µ_xy = M(XY) − M(X) M(Y).

Theorem 1. The correlation moment of two independent random variables X and Y is equal to zero.

Proof. Since X and Y are independent random variables, their deviations X − M(X) and Y − M(Y) are also independent. Using the properties of mathematical expectation (the mathematical expectation of the product of independent random variables is equal to the product of the mathematical expectations of the factors) and of the deviation (the mathematical expectation of a deviation is zero), we obtain

µ_xy = M([X − M(X)][Y − M(Y)]) = M[X − M(X)] · M[Y − M(Y)] = 0.

From the definition of the correlation moment it follows that it has a dimension equal to the product of the dimensions of the quantities X and Y. In other words, the magnitude of the correlation moment depends on the units of measurement of the random variables. For this reason, for the same two quantities, the magnitude of the correlation moment has different values depending on the units in which the quantities were measured. Let, for example, X and Y be measured in centimeters and µ_xy = 2 cm²; if X and Y are measured in millimeters, then µ_xy = 200 mm². This feature of the correlation moment is a disadvantage of this numerical characteristic, since the comparison of correlation moments of different systems of random variables becomes difficult. In order to eliminate this drawback, a new numerical characteristic is introduced: the correlation coefficient.
The correlation coefficient r_xy of random variables X and Y is the ratio of the correlation moment to the product of the standard deviations of these quantities:

r_xy = µ_xy / (σ_x σ_y).

Since the dimension µxy is equal to the product of the dimensions of the quantities X and Y, σ x has the dimension of the quantity X, σ y has the dimension of the quantity Y, then r xy is a dimensionless quantity. Thus, the value of the correlation coefficient does not depend on the choice of units of measurement of random variables. This is the advantage of the correlation coefficient over the correlation moment.
Obviously, the correlation coefficient of independent random variables is zero (since µ xy = 0).

Note 3. In many questions of probability theory it is advisable to consider, instead of the random variable X, the normalized random variable X', which is defined as the ratio of the deviation to the standard deviation:

X' = (X − M(X))/σ_x.

The normalized quantity has a mathematical expectation equal to zero and a variance equal to one. Indeed, using the properties of mathematical expectation and dispersion, we have:

M(X') = [M(X) − M(X)]/σ_x = 0,  D(X') = D[X − M(X)]/σ_x² = D(X)/σ_x² = 1.

It is easy to verify that the correlation coefficient r_xy is equal to the correlation moment of the normalized values X' and Y':

r_xy = µ_xy/(σ_x σ_y) = M([X − M(X)][Y − M(Y)])/(σ_x σ_y) = M(X'Y') = µ_{x'y'}.
Theorem 2. The absolute value of the correlation moment of two random variables X and Y does not exceed the geometric mean of their variances: |µ_xy| ≤ √(D(X) D(Y)) = σ_x σ_y.

Proof. Let us introduce into consideration the random variable Z_1 = σ_y X − σ_x Y and find its variance D(Z_1) = M[(Z_1 − M(Z_1))²]. Having carried out the calculations, we get

D(Z_1) = 2σ_x² σ_y² − 2σ_x σ_y µ_xy.

Any variance is non-negative, so

2σ_x² σ_y² − 2σ_x σ_y µ_xy ≥ 0,

whence

µ_xy ≤ σ_x σ_y.

By introducing the random variable Z_2 = σ_y X + σ_x Y, we similarly find

µ_xy ≥ −σ_x σ_y.

Let us combine these two inequalities:

−σ_x σ_y ≤ µ_xy ≤ σ_x σ_y, or |µ_xy| ≤ σ_x σ_y.

Theorem 3. The absolute value of the correlation coefficient does not exceed one: |r_xy| ≤ 1.

Proof. Divide both sides of the double inequality obtained above by the product of the positive numbers σ_x σ_y:

−1 ≤ r_xy ≤ 1.

STATE COMMITTEE FOR SCIENCE AND TECHNOLOGY OF THE REPUBLIC OF AZERBAIJAN

BAKU RESEARCH AND TRAINING CENTER

GRADUATE STUDENT OF THE DEPARTMENT OF PEDIATRIC SURGERY

AMU named after N. NARIMANOV

MUKHTAROVA EMIL GASAN ogly

CORRELATION MOMENTS. CORRELATION COEFFICIENT

INTRODUCTION

Probability theory is a mathematical science that studies patterns in random phenomena.

What is meant by random phenomena?

In the scientific study of physical and technical problems, one often encounters phenomena of a special type, which are usually called random. A random phenomenon is a phenomenon that, when the same experiment is repeated many times, proceeds somewhat differently each time.

Let's give an example of a random phenomenon.

The same body is weighed several times on an analytical balance: the results of repeated weighings are somewhat different from each other. These differences are due to the influence of various secondary factors accompanying the weighing operation, such as random vibrations of the equipment, errors in reading the instrument, etc.

It is obvious that there is not a single physical phenomenon in nature in which elements of randomness are not present to one degree or another. No matter how accurately and in detail the experimental conditions are fixed, it is impossible to ensure that when the experiment is repeated, the results coincide completely and exactly.

Randomness inevitably accompanies any natural phenomenon. However, in a number of practical problems these random elements can be neglected, and instead of the real phenomenon its simplified scheme, i.e. a model, is considered, assuming that under the given experimental conditions the phenomenon proceeds in a quite definite way. At the same time, from the countless factors influencing the phenomenon, the most important, fundamental, decisive ones are singled out; the influence of the other, minor factors is simply neglected. When studying patterns within the framework of a certain theory, the main factors influencing a particular phenomenon are included in the concepts or definitions with which the theory in question operates.

Like any science that develops a general theory of any range of phenomena, probability theory also contains a number of basic concepts on which it is based. Naturally, not all basic concepts can be strictly defined, since to define a concept means to reduce it to other, more well-known ones. This process must be finite and end with primary concepts that are only explained.

One of the first concepts in probability theory is the concept of an event.

An event is any fact that may or may not occur as a result of an experiment.

Let's give examples of events.

A - the birth of a boy or girl;

B - selection of one or another opening in a chess game;

C - belonging to one or another zodiac sign.

Considering the above events, we see that each of them has some degree of possibility: some greater, others less. In order to quantitatively compare events with each other according to the degree of their possibility, obviously, it is necessary to associate a certain number with each event, which is greater, the more possible the event is. This number is called the probability of an event. Thus, the probability of an event is a numerical characteristic of the degree of objective possibility of an event.

The probability of a certain (sure) event is taken as the unit of probability, equal to 1, and the probabilities of any events lie in the range from 0 to 1.

Probability is usually denoted by the letter P.

Let's look at the example of the eternal problem of Shakespeare's Hamlet “to be or not to be?” How can you determine the probability of an event?

It is quite obvious that a person, an object and any other phenomenon can be in one of two and no more states: presence (“to be”) and absence (“not to be”). That is, there are two possible events, but only one can happen. This means that the probability of, for example, existence is 1/2.

In addition to the concept of event and probability, one of the main concepts of probability theory is the concept of a random variable.

A random variable is a quantity that, as a result of an experiment, can take one or another value, and it is not known in advance which one.

Random variables that take only values that are separate from each other and that can be listed in advance are called discontinuous, or discrete, random variables.

For example:

1. Number of surviving and deceased patients.

2. The total number of children from patients admitted to the hospital overnight.

Random variables whose possible values ​​continuously fill a certain interval are called continuous random variables.

For example, weighing error on an analytical balance.

Note that modern probability theory primarily operates with random variables, rather than with events, which the “classical” probability theory was mainly based on.

CORRELATION MOMENTS. COEFFICIENT OF CORRELATION.

Correlation moments, correlation coefficient - these are numerical characteristics that are closely related to the concept of a random variable introduced above, or more precisely with a system of random variables. Therefore, to introduce and define their meaning and role, it is necessary to explain the concept of a system of random variables and some properties inherent in them.

Two or more random variables that describe some phenomenon are called a system, or complex, of random variables.

A system of several random variables X, Y, Z, …, W is usually denoted by (X, Y, Z, …, W).

For example, a point on a plane is described not by one coordinate, but by two, and in space - even by three.

The properties of a system of several random variables are not limited to the properties of the individual random variables included in the system; they also include the mutual connections (dependencies) between the random variables. Therefore, when studying a system of random variables, one should pay attention to the nature and degree of this dependence. It may be more or less pronounced, more or less close; in some cases the random variables turn out to be practically independent.

The random variable Y is called independent of the random variable X if the distribution law of the random variable Y does not depend on what value the variable X has taken.

It should be noted that the dependence and independence of random variables is always a mutual phenomenon: if Y does not depend on X, then the value X does not depend on Y. Taking this into account, we can give the following definition of the independence of random variables.

Random variables X and Y are called independent if the distribution law of each of them does not depend on what value the other takes. Otherwise, the values ​​of X and Y are called dependent.

The law of distribution of a random variable is any relation that establishes a connection between the possible values of the random variable and the corresponding probabilities.

The concept of “dependence” of random variables, which is used in probability theory, is somewhat different from the usual concept of “dependence” of variables, which is used in mathematics. Thus, a mathematician by “dependence” means only one type of dependence - complete, rigid, so-called functional dependence. Two quantities X and Y are called functionally dependent if, knowing the value of one of them, you can accurately determine the value of the other.

In probability theory, there is a slightly different type of dependence - probabilistic dependence. If the value Y is related to the value X by a probabilistic dependence, then, knowing the value of X, it is impossible to accurately indicate the value of Y, but you can indicate its distribution law, depending on what value the value X has taken.

The probabilistic relationship may be more or less close; As the tightness of the probabilistic dependence increases, it becomes closer and closer to the functional one. Thus, functional dependence can be considered as an extreme, limiting case of the closest probabilistic dependence. Another extreme case is the complete independence of random variables. Between these two extreme cases lie all gradations of probabilistic dependence - from the strongest to the weakest.

Probabilistic dependence between random variables is often encountered in practice. If the random variables X and Y are in a probabilistic relationship, this does not mean that with a change in the value of X the value of Y changes in a completely definite way; it only means that with a change in the value of X the value of Y tends to change as well (to increase or decrease as X increases). This tendency is observed only in general terms, and in each individual case deviations from it are possible.

Examples of probabilistic dependence.

Let us select at random one patient with peritonitis. The random variable T is the time from the onset of the disease; the random variable O is the level of homeostatic disturbances. There is a clear relationship between these values, since the value of T is one of the most important causes determining the value of O.

At the same time, there is a weaker probabilistic relationship between the random variable T and the random variable M, which reflects mortality in the given pathology, since the random variable T, although it influences the random variable O, is not the main determining factor.

Moreover, if we consider the T value and the B value (the age of the surgeon), then these values ​​are practically independent.

So far we have discussed the properties of systems of random variables giving only verbal explanations. However, there exist numerical characteristics by means of which the properties of both individual random variables and systems of random variables are studied.
