Correlation coefficient

Definition

Correlation is a non-deterministic relationship, and the correlation coefficient measures the degree of linear correlation between the variables under study. Depending on the objects of study, the correlation coefficient can be defined in the following ways.

Simple correlation coefficient: also called the correlation coefficient or linear correlation coefficient, generally denoted by the letter r. It measures the linear relationship between two variables.

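Defining formula

$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}}$$

Here Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, and Var[Y] is the variance of Y.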
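As an illustration of the defining formula, the following minimal Python sketch computes the coefficient for a finite sample. It assumes covariance and variance are taken with a divide-by-n convention; the factor cancels in the ratio, so dividing by n − 1 would give the same r. The function name pearson_r and the sample data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Cov(X, Y) / sqrt(Var[X] * Var[Y])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Divide-by-n covariance and variances (the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y is exactly 2x + 1, so the linear relationship is perfect.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
r_example = pearson_r(xs, ys)
print(r_example)  # 1.0
```

The guard against zero variance reflects the fact that the formula has no value when one of the variables is constant.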

Multiple correlation coefficient: multiple correlation refers to the correlation between a dependent variable and several independent variables. For example, there is a multiple correlation between the seasonal demand for a certain commodity and its price level, employee income level, and other factors.

Canonical correlation coefficient: first conduct principal component analysis on each original set of variables to obtain composite indices that are new linear combinations, and then use the linear correlation coefficient between the composite indices to study the correlation between the original sets of variables.

Properties

Here, the correlation coefficient of X and Y, written $\rho_{XY}$, is a quantity that characterizes how close the linear relationship between X and Y is. It has two properties:

(1) $|\rho_{XY}| \le 1$.

(2) $|\rho_{XY}| = 1$ if and only if there exist constants a and b such that $P\{Y = a + bX\} = 1$.

Two consequences follow from these properties:

a. The correlation coefficient quantitatively describes the degree of correlation between X and Y: the larger $|\rho_{XY}|$ is, the greater the degree of correlation, and $\rho_{XY} = 0$ corresponds to the lowest degree of correlation.

b. X and Y being completely correlated means that a linear relationship holds with probability 1, so $\rho_{XY}$ is a quantity that characterizes the closeness of the linear relationship between X and Y. When $|\rho_{XY}|$ is large, X and Y are usually said to be strongly correlated; when $|\rho_{XY}|$ is small, they are said to be weakly correlated. When X and Y are uncorrelated, it is generally taken to mean that there is no linear relationship between X and Y, but other relationships between them cannot be ruled out.
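The two properties can be motivated by a short derivation, writing $\rho_{XY}$ for the correlation coefficient of X and Y. The following is only a sketch, assuming X and Y have finite, nonzero variances; the appeal to the Cauchy-Schwarz inequality is a standard argument and is not taken from the text above.

```latex
% Property (1): Cauchy-Schwarz applied to the centered variables gives
\[
\operatorname{Cov}(X, Y)^2 \le \operatorname{Var}[X]\,\operatorname{Var}[Y]
\quad\Longrightarrow\quad
|\rho_{XY}| = \frac{|\operatorname{Cov}(X, Y)|}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}} \le 1 .
\]
% One direction of property (2): if Y = a + bX with b \neq 0, then
\[
\operatorname{Cov}(X, Y) = b\,\operatorname{Var}[X], \qquad
\operatorname{Var}[Y] = b^2\,\operatorname{Var}[X],
\]
\[
\rho_{XY} = \frac{b\,\operatorname{Var}[X]}{\sqrt{\operatorname{Var}[X]\cdot b^2\,\operatorname{Var}[X]}}
          = \frac{b}{|b|} = \pm 1 .
\]
```

The sign of $\rho_{XY}$ in the exactly linear case is therefore the sign of the slope b.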

Uncorrelated and independent

If $\rho_{XY} = 0$, X and Y are said to be uncorrelated. In that case it is generally considered that there is no linear relationship between X and Y, but other relationships between them cannot be ruled out.

If X and Y are independent, then necessarily $\rho_{XY} = 0$, so X and Y are uncorrelated. Conversely, if X and Y are uncorrelated, there is merely no linear relationship between them; other relationships may exist, in which case X and Y are not independent.

Therefore, "uncorrelated" is a weaker concept than "independent".
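To make the distinction concrete, here is a small Python sketch of two variables that are uncorrelated yet clearly dependent. The choice Y = X² with X uniform on five symmetric points is a hypothetical illustration, not an example taken from the text.

```python
# Hypothetical example: X takes the values -2, -1, 0, 1, 2 with equal probability,
# and Y = X**2 is completely determined by X (so X and Y are not independent).
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n  # 0.0
mean_y = sum(ys) / n  # 2.0

# Covariance and variances under the uniform distribution on the five points.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

# Both variances are nonzero, so the correlation coefficient is defined.
rho = cov_xy / (var_x * var_y) ** 0.5

# Dependence check: P(X = 2, Y = 4) = 1/5, but P(X = 2) * P(Y = 4) = (1/5) * (2/5).
p_joint = sum(1 for x, y in zip(xs, ys) if x == 2 and y == 4) / n
p_product = (xs.count(2) / n) * (ys.count(4) / n)

print(rho)                 # 0.0
print(p_joint, p_product)  # the joint probability differs from the product
```

The covariance vanishes because the positive and negative values of X cancel, even though knowing X fixes Y exactly.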

Real-life example

A software company has many agents throughout the country. To study the relationship between advertising investment and sales of its financial software products, statisticians randomly selected 10 agents, collected their annual advertising expenditure and monthly average sales, and compiled the data into Table 1:

Table 1. Advertising expenditure and monthly average sales (unit: ten thousand yuan)

Annual advertising expenditure   12.5   15.3   23.2   26.4   33.5   34.4   39.4   45.2   55.4   60.9
Monthly average sales            21.2   23.9   32.9   34.1   42.5   43.2   49.0   52.8   59.4   63.5

From Table 1, the quantities needed to calculate the correlation coefficient are tabulated in Table 2:

Serial number   Advertising investment x   Average monthly sales y   x²          y²          xy
                (ten thousand yuan)        (ten thousand yuan)
1               12.5                       21.2                      156.25      449.44      265.00
2               15.3                       23.9                      234.09      571.21      365.67
3               23.2                       32.9                      538.24      1082.41     763.28
4               26.4                       34.1                      696.96      1162.81     900.24
5               33.5                       42.5                      1122.25     1806.25     1423.75
6               34.4                       43.2                      1183.36     1866.24     1486.08
7               39.4                       49.0                      1552.36     2401.00     1930.60
8               45.2                       52.8                      2043.04     2787.84     2386.56
9               55.4                       59.4                      3069.16     3528.36     3290.76
10              60.9                       63.5                      3708.81     4032.25     3867.15
Total           346.2                      422.5                     14304.52    19687.81    16679.09

The correlation coefficient is 0.9942, indicating a high degree of positive linear correlation between advertising expenditure and monthly average sales.
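The value 0.9942 can be reproduced from the column totals. The sketch below assumes the usual computational form r = (nΣxy − ΣxΣy) / sqrt[(nΣx² − (Σx)²)(nΣy² − (Σy)²)], which is algebraically equivalent to Cov(X, Y) / sqrt(Var[X] Var[Y]). The text does not state which form was actually used, and the variable names are illustrative.

```python
import math

# Data from Table 1 (unit: ten thousand yuan).
x = [12.5, 15.3, 23.2, 26.4, 33.5, 34.4, 39.4, 45.2, 55.4, 60.9]  # advertising
y = [21.2, 23.9, 32.9, 34.1, 42.5, 43.2, 49.0, 52.8, 59.4, 63.5]  # monthly sales

n = len(x)

# Column totals, matching the "Total" row of the calculation table.
sum_x = sum(x)                             # 346.2
sum_y = sum(y)                             # 422.5
sum_x2 = sum(v * v for v in x)             # 14304.52
sum_y2 = sum(v * v for v in y)             # 19687.81
sum_xy = sum(a * b for a, b in zip(x, y))  # 16679.09

# Computational form of the correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")  # r = 0.9942
```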

Application

Probability theory

[Example] A coin is tossed n times. Let X be the number of heads and Y the number of tails in the n tosses. Calculate $\rho_{XY}$.

Solution: Since X + Y = n, we have Y = −X + n, an exact linear relationship with negative slope. By the properties of the correlation coefficient, $\rho_{XY} = -1$.
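The result can also be checked numerically. The following Python sketch simulates repeated runs of the experiment and estimates the coefficient from the simulated counts. The helper pearson_r, the seed, and the numbers of tosses and repetitions are arbitrary choices made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)        # arbitrary seed, only for reproducibility
n_tosses = 20         # n in the example (illustrative value)
n_experiments = 1000  # number of simulated repetitions (illustrative value)

# X = number of heads in n tosses of a fair coin; Y = n - X = number of tails.
heads = [sum(random.random() < 0.5 for _ in range(n_tosses))
         for _ in range(n_experiments)]
tails = [n_tosses - h for h in heads]

rho_estimate = pearson_r(heads, tails)
print(rho_estimate)  # equals -1 up to floating-point rounding
```

Because Y is an exact linear function of X in every repetition, the estimate does not depend on the seed or on the fairness of the coin, as long as the number of heads varies between repetitions.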

Enterprise logistics

[Example] A new product is about to be launched. Before the launch, the company's logistics department needs to allocate the new product to 10 warehouses across the country. One month after the launch, the company needs to evaluate whether the distribution plan actually used was better than the other plans considered beforehand, or whether one of the unused plans would have been better. With this evaluation, a more accurate distribution plan can be used for the next product launch, avoiding backlogs and stock-outs caused by poor distribution. Table 1 is a table of quantities based on actual data.

The calculations readily show that, among the three allocation plans, plan B has the largest correlation coefficient. This suggests that allocation plan B is better than plan A, the one actually used. For the next product launch, allocation method B can therefore be considered when working out the actual distribution plan.

Cluster analysis

[Example] Given several samples, each with n features, the correlation coefficient can indicate the degree of similarity between two samples. In this way, the samples can be clustered by distance according to how close they are to one another. For example, data on 6 traits of 9 wheat varieties (denoted A1, A2, ..., A9) are shown in Table 2, and the correlation coefficients are calculated and tested.

The correlation coefficients among the 6 traits can be calculated from the correlation coefficient formula. The analysis and test results are shown in Table 3. Table 3 shows a negative correlation between winter tillering and the number of grains per spike (ρ = −0.8982): the more tillers in winter, the fewer grains per spike. The relationships between the other traits are not significant.

Disadvantages

It should be pointed out that the correlation coefficient has an obvious disadvantage: how close it is to 1 depends on the number of data points n, which can easily be misleading. When n is small, the correlation coefficient fluctuates greatly, and for some samples its absolute value easily comes close to 1; when n is large, its absolute value tends to be smaller. In particular, when n = 2, the absolute value of the correlation coefficient is always 1. Therefore, when the sample size n is small, a large correlation coefficient alone is not sufficient grounds for concluding that there is a close linear relationship between the variables x and y.
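The effect of the sample size can be seen in a small simulation. The Python sketch below first checks the n = 2 case on random pairs of points, then compares the average absolute coefficient of independent samples for a small and a large n. The helper names, the seed, the sample sizes, and the use of independent standard normal data are assumptions made only for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)  # arbitrary seed, only for reproducibility

# n = 2: two points with distinct x and distinct y always lie on a line,
# so the absolute value of r is 1 whatever the data are.
abs_r_two_points = []
for _ in range(1000):
    xs = random.sample(range(100), 2)  # two distinct integers
    ys = random.sample(range(100), 2)
    abs_r_two_points.append(abs(pearson_r(xs, ys)))


def mean_abs_r(n, repeats=2000):
    """Average |r| over independent standard normal samples of size n."""
    total = 0.0
    for _ in range(repeats):
        xs = [random.gauss(0, 1) for _ in range(n)]
        ys = [random.gauss(0, 1) for _ in range(n)]
        total += abs(pearson_r(xs, ys))
    return total / repeats


# x and y are generated independently, so any large |r| is due to chance alone.
small_n = mean_abs_r(3)
large_n = mean_abs_r(100)
print(min(abs_r_two_points), max(abs_r_two_points))
print(small_n, large_n)
```

In this setup the variables are unrelated by construction, so a noticeably larger average |r| for the small sample illustrates why a high coefficient from few observations should be treated with caution.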

See also

Rank correlation coefficient

Kendall rank correlation coefficient

Spearman correlation coefficient
