Definition
Correlation is a non-deterministic relationship, and the correlation coefficient measures the degree of linear correlation between the variables under study. Depending on the object of study, the correlation coefficient can be defined in the following ways.
Simple correlation coefficient: also called the correlation coefficient or linear correlation coefficient, generally denoted by the letter r, and used to measure the linear relationship between two variables.
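Defining formula
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}}$$
where $\operatorname{Cov}(X, Y)$ is the covariance of X and Y, $\operatorname{Var}[X]$ is the variance of X, and $\operatorname{Var}[Y]$ is the variance of Y.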
Multiple correlation coefficient: also called the coefficient of multiple correlation. Multiple correlation refers to the correlation between a dependent variable and several independent variables. For example, there is a multiple correlation between the seasonal demand for a certain commodity and its price level, employee income level and other factors.
Canonical correlation coefficient: first perform principal component analysis on each original set of variables to obtain new composite indices that are linear combinations of the variables, then use the linear correlation coefficient between the composite indices to study the correlation between the original sets of variables.
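The defining formula of the simple correlation coefficient can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `covariance` and `correlation` and the sample data are made up for the example, and population (divide-by-n) moments are assumed, which does not affect the ratio.

```python
import math

def covariance(xs, ys):
    """Population covariance: the mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Cov(X, Y) / sqrt(Var[X] * Var[Y]); Var[X] is Cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data that lie close to a straight line with positive slope.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(correlation(xs, ys))
```

Because the same divisor n appears in the numerator and in the denominator, using sample (divide-by-(n-1)) moments instead would give the same value.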
Properties
Here $\rho_{XY}$ denotes the correlation coefficient of X and Y; it is a quantity that characterizes the closeness of the linear relationship between X and Y. It has two properties:
(1) $|\rho_{XY}| \le 1$
(2) The necessary and sufficient condition for $|\rho_{XY}| = 1$ is that there exist constants a and b such that $P\{Y = a + bX\} = 1$
The following can be derived from these properties:
a. The correlation coefficient quantitatively describes the degree of correlation between X and Y: the larger $|\rho_{XY}|$ is, the greater the degree of correlation; $|\rho_{XY}| = 0$ corresponds to the lowest degree of correlation;
b. X and Y being completely correlated means that a linear relationship holds between them with probability 1, so $|\rho_{XY}|$ is a quantity that characterizes the closeness of the linear relationship between X and Y. When $|\rho_{XY}|$ is large, X and Y are usually said to be strongly linearly related; when $|\rho_{XY}|$ is small, they are said to be weakly linearly related; when X and Y are uncorrelated, it is generally taken that there is no linear relationship between X and Y, but other relationships between them cannot be ruled out.
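One direction of property (2) can be checked directly from the defining formula. As a brief sketch, assuming $\operatorname{Var}[X] > 0$ and $b \ne 0$, suppose $Y = a + bX$. Then

$$
\operatorname{Cov}(X, Y) = b\,\operatorname{Var}[X], \qquad \operatorname{Var}[Y] = b^{2}\,\operatorname{Var}[X],
$$

$$
\rho_{XY} = \frac{b\,\operatorname{Var}[X]}{\sqrt{\operatorname{Var}[X]\cdot b^{2}\,\operatorname{Var}[X]}} = \frac{b}{|b|} = \pm 1 .
$$

The sign of the coefficient is therefore the sign of the slope b: an exact increasing linear relationship gives +1 and an exact decreasing one gives -1.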
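Uncorrelated and independent
If X and Y are uncorrelated, that is, $\rho_{XY} = 0$, it is generally considered that there is no linear relationship between X and Y, although other relationships between them cannot be ruled out; conversely, if $\rho_{XY} = 0$, X and Y are uncorrelated.
If X and Y are independent, then necessarily $\rho_{XY} = 0$, so X and Y are uncorrelated; if X and Y are uncorrelated, there is merely no linear relationship between them, and other relationships may exist, such as a nonlinear functional dependence, in which case X and Y are not independent.
Therefore, "uncorrelated" is a weaker condition than "independent".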
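A small exact calculation illustrates the gap between the two concepts. The distribution below is chosen purely for illustration and is not taken from the text above: X takes each of the values -2, -1, 0, 1, 2 with probability 1/5, and Y = X². The covariance is exactly zero, yet Y is completely determined by X.

```python
from fractions import Fraction

# Illustrative assumption: X is uniform on {-2, -1, 0, 1, 2} and Y = X**2.
support = [-2, -1, 0, 1, 2]
p = Fraction(1, len(support))

mean_x = sum(p * x for x in support)
mean_y = sum(p * x * x for x in support)
cov_xy = sum(p * (x - mean_x) * (x * x - mean_y) for x in support)

# Independence would require P(X=2, Y=4) == P(X=2) * P(Y=4).
p_joint = p                                    # X = 2 forces Y = 4
p_x2 = p                                       # P(X = 2)
p_y4 = sum(p for x in support if x * x == 4)   # X = -2 or X = 2

print("Cov(X, Y) =", cov_xy)
print("P(X=2, Y=4) =", p_joint, "but P(X=2) * P(Y=4) =", p_x2 * p_y4)
```

Since the covariance is zero, the correlation coefficient is zero as well, while the joint probability differs from the product of the marginals, so X and Y are uncorrelated but not independent.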
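Real-life example
A software company has many agents throughout the country. To study the relationship between advertising investment and sales of its financial software products, statisticians randomly selected 10 agents for observation, collected their annual advertising expenditure and average monthly sales, and compiled the data into a table; see Table 1:
Agent | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Annual advertising investment (10,000 yuan) | 12.5 | 15.3 | 23.2 | 26.4 | 33.5 | 34.4 | 39.4 | 45.2 | 55.4 | 60.9 |
Average monthly sales (10,000 yuan) | 21.2 | 23.9 | 32.9 | 34.1 | 42.5 | 43.2 | 49.0 | 52.8 | 59.4 | 63.5 |
Based on Table 1, the quantities needed to calculate the correlation coefficient are tabulated in Table 2: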
Serial number | Advertising investment (10,000 yuan) $x$ | Average monthly sales (10,000 yuan) $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 12.5 | 21.2 | 156.25 | 449.44 | 265.00 |
2 | 15.3 | 23.9 | 234.09 | 571.21 | 365.67 |
3 | 23.2 | 32.9 | 538.24 | 1082.41 | 763.28 |
4 | 26.4 | 34.1 | 696.96 | 1162.81 | 900.24 |
5 | 33.5 | 42.5 | 1122.25 | 1806.25 | 1423.75 |
6 | 34.4 | 43.2 | 1183.36 | 1866.24 | 1486.08 |
7 | 39.4 | 49.0 | 1552.36 | 2401.00 | 1930.60 |
8 | 45.2 | 52.8 | 2043.04 | 2787.84 | 2386.56 |
9 | 55.4 | 59.4 | 3069.16 | 3528.36 | 3290.76 |
10 | 60.9 | 63.5 | 3708.81 | 4032.25 | 3867.15 |
Total | 346.2 | 422.5 | 14304.52 | 19687.81 | 16679.09 |
The correlation coefficient is 0.9942, indicating a high degree of positive linear correlation between advertising expenditure and average monthly sales.
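The value 0.9942 can be reproduced from the Total row of Table 2 alone. The sketch below assumes the usual computational form of the sample correlation coefficient, r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)), which is algebraically equivalent to the defining formula; the variable names are chosen for the example.

```python
import math

# Totals taken from the last row of Table 2 (n = 10 agents).
n = 10
sum_x, sum_y = 346.2, 422.5
sum_x2, sum_y2, sum_xy = 14304.52, 19687.81, 16679.09

# Computational form of the correlation coefficient, built from sums only.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # 0.9942
```

Working from the column totals is the reason Table 2 carries the x², y² and xy columns: no per-agent deviations from the mean are needed.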
Application
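Probability theory
[Example] A coin is tossed n times. X represents the number of heads in the n trials, and Y represents the number of tails in the n trials. Calculate $\rho_{XY}$.
Solution: Since X + Y = n, we have Y = -X + n, so Y is an exact linear function of X with negative slope. By the properties of the correlation coefficient, $\rho_{XY} = -1$.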
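The result can also be checked numerically. The following is a small simulation sketch, assuming a fair coin, n = 20 tosses per experiment and 1000 repeated experiments (all three choices are arbitrary); `pearson_r` is a helper written for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
n_tosses = 20            # tosses per experiment (assumed)
experiments = 1000       # number of repeated experiments (assumed)

heads = [sum(rng.random() < 0.5 for _ in range(n_tosses)) for _ in range(experiments)]
tails = [n_tosses - h for h in heads]   # Y = n - X in every experiment

r = pearson_r(heads, tails)
print(r)   # equals -1 up to floating-point rounding
```

The value does not depend on the coin being fair: any relationship of the form Y = n - X gives a coefficient of -1, as long as X is not constant.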
Enterprise logistics
[Example] A new product is to be launched. Before the launch, the company's logistics department needs to allocate the new product to 10 warehouses across the country. One month after the launch, it is necessary to evaluate whether the actual distribution plan was better than the other distribution plans considered beforehand, or whether one of the unused plans would have been better. With this evaluation, a more accurate distribution plan can be used for the next new product launch, avoiding backlogs and stock-outs caused by poor distribution. Table 1 lists the figures based on actual data.
Through calculation, it is easy to find that among the three allocation plans, the correlation coefficient of plan B is the largest. On this basis, allocation plan B is estimated to be better than the actual allocation plan A. For the distribution plan of the next new product launch, allocation method B can be considered when calculating the actual distribution plan.
Cluster analysis
[Example] If there are several samples and each sample has n features, the correlation coefficient can indicate the degree of similarity between two samples. In this way, distance-based clustering can be performed according to how close the samples are to one another. For example, data on 6 traits of 9 wheat varieties (denoted A1, A2, ..., A9) are shown in Table 2, and the correlation coefficients are calculated and tested.
The correlation coefficients among the 6 traits can be calculated from the correlation coefficient formula. The analysis and test results are shown in Table 3. It can be seen from Table 3 that there is a negative correlation between tillering in winter and the number of grains per spike ($\rho = -0.8982$), that is, the more tillers in winter, the fewer grains per spike, while the relationships between the other traits are not significant.
Disadvantages
It should be pointed out that the correlation coefficient has an obvious disadvantage: how close its absolute value comes to 1 depends on the number of data points n, which can easily give a false impression. When n is small, the sample correlation coefficient fluctuates greatly, and for some samples its absolute value can easily come close to 1; when n is large, it fluctuates much less, so a large absolute value is unlikely to arise by chance alone. In particular, when n = 2, the absolute value of the correlation coefficient is always 1. Therefore, when the sample size n is small, a large correlation coefficient alone is not sufficient grounds for concluding that there is a close linear relationship between the variables x and y.
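The effect of the sample size can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not a formal test: x and y are drawn as independent standard normal values (so there is no real relationship between them), the threshold 0.8, the sample sizes 2, 5 and 50, and the 2000 repetitions are arbitrary choices, and `pearson_r` and `share_large` are helpers written for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

def share_large(n, threshold=0.8, trials=2000):
    """Fraction of independent samples of size n with |r| above the threshold."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials

shares = {n: share_large(n) for n in (2, 5, 50)}
print(shares)
```

With n = 2 every sample exceeds the threshold, because two points always lie on a straight line; with n = 5 a noticeable share of samples still does; with n = 50 such values are extremely rare. This is why a large coefficient from a very small sample carries little evidence of a linear relationship.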
See also
Rank correlation coefficient
Kendall rank correlation coefficient
Spearman correlation coefficient