History
SVM is a classifier developed from the generalized portrait algorithm in pattern recognition. Its earliest work came from research published in 1963 by the former Soviet scholars Vladimir N. Vapnik and Alexander Y. Lerner. In 1964, Vapnik and Alexey Y. Chervonenkis discussed the generalized portrait algorithm further and established the linear SVM with hard margins. From the 1970s and 1980s onward, with theoretical research on maximum-margin decision boundaries in pattern recognition, the emergence of slack-variable-based techniques for solving the associated programming problems, and the proposal of the VC dimension (Vapnik-Chervonenkis dimension), SVM was gradually put on a theoretical footing and became part of statistical learning theory. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vapnik obtained the nonlinear SVM through the kernel method. In 1995, Corinna Cortes and Vapnik proposed the soft-margin nonlinear SVM and applied it to handwritten character recognition. That work attracted wide attention and citation after publication and provided a reference for the application of SVM in many fields.
Theory
Linear classification
Linear separability
Consider a classification problem with input data X = {X_1, …, X_N} and learning targets y = {y_1, …, y_N}. Each sample of the input data contains multiple features and therefore lies in a feature space, X_i ∈ 𝒳, while each learning target is a binary variable y_i ∈ {−1, +1}, representing the negative class and the positive class respectively.
If there exists a hyperplane w^T X + b = 0 in the feature space that serves as a decision boundary separating the learning targets into the positive and negative classes, such that every sample satisfies y_i(w^T X_i + b) ≥ 1 (the point-to-plane distance condition), the classification problem is said to be linearly separable; the parameters w and b are the normal vector and the intercept of the hyperplane respectively.
A decision boundary satisfying this condition in fact defines two parallel hyperplanes as margin boundaries that separate the samples: samples with w^T X_i + b ≥ +1 are assigned y_i = +1, and samples with w^T X_i + b ≤ −1 are assigned y_i = −1.
All samples on or above the upper margin boundary belong to the positive class, and all samples on or below the lower margin boundary belong to the negative class. The distance between the two margin boundaries, d = 2/‖w‖, is defined as the margin, and the positive and negative samples lying exactly on the margin boundaries are the support vectors.
Loss function
When a classification problem is not linearly separable, using a hyperplane as the decision boundary incurs classification loss: some of the support vectors no longer lie on the margin boundary but fall inside the margin, or even on the wrong side of the decision boundary. A loss function quantifies this classification loss. From its mathematical meaning, the 0-1 loss function is obtained: the loss is 1 when a sample is misclassified, i.e. y(w^T X + b) < 0, and 0 otherwise.
The 0-1 loss function is not continuous, which makes the optimization problem hard to solve, so the usual choice is to construct a surrogate loss. Available options include the hinge loss, the logistic loss, and the exponential loss; SVM uses the hinge loss: L(y, f(X)) = max(0, 1 − y f(X)), where f(X) = w^T X + b.
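As a small numerical illustration (a sketch with made-up decision values, not part of the original text), the hinge loss can be computed directly from the product y·f(X) and compared with the 0-1 loss, confirming that it upper-bounds the 0-1 loss:

import numpy as np

# Hypothetical decision values y * f(X) for three samples:
# > 1 outside the margin, between 0 and 1 inside the margin, < 0 misclassified.
margins = np.array([2.0, 0.5, -0.3])

hinge_loss = np.maximum(0.0, 1.0 - margins)   # max(0, 1 - y*f(X))
zero_one_loss = (margins < 0).astype(float)   # 1 only for misclassified samples

print(hinge_loss)     # [0.  0.5 1.3]
print(zero_one_loss)  # [0. 0. 1.]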
Research on the consistency of surrogate losses shows that if the surrogate loss is a continuous convex function that upper-bounds the 0-1 loss at every value, then a solution minimizing the surrogate loss is also a solution minimizing the 0-1 loss. The hinge loss function satisfies these conditions.
Empirical risk and structural risk
See: Statistical learning theory
According to statistical learning theory, a classifier incurs risk when it is learned and then applied to new data. This risk can be divided into empirical risk and structural risk.
Here f denotes the classifier. The empirical risk is defined through the loss function and describes the accuracy of the classification results produced by the classifier; the structural risk is defined through a norm of the classifier's parameters and describes the complexity and stability of the classifier itself, since a complex classifier is prone to overfitting and is therefore unstable. If a classifier determines its model parameters by minimizing a linear combination of empirical risk and structural risk,

min_w  ‖w‖^p + C Σ_{i=1}^N L(f(X_i), y_i),

then solving for the classifier is a regularization problem, where the constant C is the regularization coefficient. When p = 2, this is called L2 regularization or Tikhonov regularization. The structural risk of SVM is represented by the squared norm ½‖w‖². In linearly separable problems, the empirical risk of the hard-margin SVM can be reduced to 0, so it is a classifier that purely minimizes the structural risk; in linearly inseparable problems, the empirical risk of the soft-margin SVM cannot be reduced to 0, so it is an L2-regularized classifier that minimizes a linear combination of structural risk and empirical risk.
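To make this trade-off concrete, the sketch below evaluates both risk terms for a fitted linear SVM; it relies on the fact that scikit-learn's SVC with a linear kernel minimizes ½‖w‖² + C·(total hinge loss), and the data set is purely illustrative:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy binary problem with labels relabeled to {-1, +1}.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
y = 2 * y - 1

C = 1.0  # regularization coefficient weighting the empirical-risk term
clf = SVC(kernel='linear', C=C).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

structural_risk = 0.5 * np.dot(w, w)                       # (1/2) * ||w||^2
empirical_risk = np.maximum(0, 1 - y * (X @ w + b)).sum()  # total hinge loss
print(structural_risk, empirical_risk, structural_risk + C * empirical_risk)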
Kernel method
Main article: Kernel method
Some linearly inseparable problems may be nonlinearly separable, that is, there exists a hypersurface in the feature space that separates the positive class from the negative class. Using a nonlinear function, a nonlinearly separable problem can be mapped from the original feature space into a higher-dimensional Hilbert space, where it becomes linearly separable; the hyperplane that serves as the decision boundary is then expressed as w^T φ(X) + b = 0,
where φ is the mapping function. Since the mapping function is generally complicated and its inner product is hard to compute directly, the kernel method can be used: the inner product of the mapped samples is defined as a kernel function, κ(X_1, X_2) = φ(X_1)^T φ(X_2), which avoids computing the inner product in the mapped space explicitly.
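As a quick check of this identity (an illustrative sketch, not from the original text), the homogeneous degree-2 polynomial kernel κ(X_1, X_2) = (X_1^T X_2)² can be matched against an explicit quadratic feature map:

import numpy as np

def phi(x):
    # Explicit feature map in 2D for which phi(a).phi(b) = (a.b)^2
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

print(phi(a) @ phi(b))   # inner product in the mapped space: 16.0
print((a @ b) ** 2)      # kernel evaluated in the original space: 16.0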
Mercer's theorem
A kernel function must satisfy certain conditions. A necessary and sufficient condition for a function κ to be a kernel function is that, for any set of vectors in the input space, X = {X_1, …, X_N}, its kernel matrix, i.e. the Gram matrix with entries [K]_{ij} = κ(X_i, X_j), is a positive semi-definite matrix. This conclusion is known as Mercer's theorem. The full proof is omitted; in outline, for sufficiency, the inner product of two functions in the feature space is a bivariate function, and when its kernel matrix is positive semi-definite this bivariate function has the reproducing property, so the corresponding inner-product space is a normed vector space that can be completed into a Hilbert space, namely a reproducing kernel Hilbert space (RKHS). For necessity, constructing the kernel matrix of a valid kernel function and checking it directly shows that the matrix is positive semi-definite.
Common kernel functions
After constructing a candidate kernel function, it is difficult to verify that its Gram matrix is positive semi-definite for every possible set of inputs, so the usual choice is to use a ready-made kernel function, such as the polynomial kernel or the RBF kernel; any parameters not explained here are hyperparameters of the kernel function.
When the order of the polynomial kernel is 1, it is called the linear kernel, and the corresponding nonlinear classifier degenerates into a linear classifier. The RBF kernel is also called the Gaussian kernel, and its corresponding mapping function maps the sample space into an infinite-dimensional space. Linear combinations and direct (Cartesian) products of kernel functions are also kernel functions, and for any function g defined on the feature space, g(X_1) κ(X_1, X_2) g(X_2) is a kernel function as well.
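For reference, ready-made kernels such as the polynomial and RBF kernels can be evaluated as Gram matrices with scikit-learn's pairwise utilities; the data and hyperparameter values below are illustrative:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

X = np.random.RandomState(0).randn(5, 3)  # 5 samples with 3 features

K_poly = polynomial_kernel(X, degree=3, gamma=1.0, coef0=1.0)  # (gamma*<x,z> + coef0)^3
K_rbf = rbf_kernel(X, gamma=0.5)                               # exp(-gamma * ||x - z||^2)

# A valid Gram matrix is symmetric positive semi-definite (Mercer's theorem).
print(np.allclose(K_rbf, K_rbf.T), np.linalg.eigvalsh(K_rbf).min() >= -1e-10)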
Algorithm
Standard algorithm
Linear SVM
1. Hard margin
Given input data and learning targets X = {X_1, …, X_N}, y = {y_1, …, y_N}, the hard-margin SVM is an algorithm for solving the maximum-margin hyperplane of a linearly separable problem, with the constraint that the distance from every sample point to the decision boundary is at least 1. The hard-margin SVM can be transformed into the following equivalent quadratic convex optimization problem: min_{w,b} ½‖w‖²  subject to  y_i(w^T X_i + b) ≥ 1, i = 1, …, N.
The decision boundary obtained from this problem classifies any sample by sign(w^T X + b). Note that although the normal vector w of the hyperplane is the only variable appearing in the objective, the learning data and the intercept b of the hyperplane affect the solution through the constraints. The hard-margin SVM can be regarded as the limiting case of the soft-margin SVM in which no margin violations are allowed; its dual problem and solution follow those of the soft-margin SVM and are not repeated here.
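On linearly separable data the hard-margin solution can be approximated by a soft-margin SVM with a very large slack penalty; the sketch below (illustrative data and parameter values) fits such a model and reports the geometric margin 2/‖w‖:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Well-separated blobs, so the problem is linearly separable.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=1)

# A very large C leaves essentially no room for margin violations.
clf = SVC(kernel='linear', C=1e6).fit(X, y)
w = clf.coef_.ravel()

print("margin width:", 2.0 / np.linalg.norm(w))
print("number of support vectors:", clf.support_vectors_.shape[0])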
2. Soft margin
Using a hard-margin SVM on a linearly inseparable problem produces classification errors, so a loss function can be introduced on top of margin maximization to construct a new optimization problem. SVM uses the hinge loss function and, following the form of the hard-margin problem, the optimization problem of the soft-margin SVM is expressed as: min_{w,b} ½‖w‖² + C Σ_{i=1}^N max(0, 1 − y_i(w^T X_i + b)).
This formulation shows that the soft-margin SVM is an L2-regularized classifier in which the hinge loss plays the role of the empirical risk. Introducing slack variables ξ_i ≥ 0 to represent the piecewise value of the hinge loss, the problem can be rewritten as: min_{w,b,ξ} ½‖w‖² + C Σ_i ξ_i  subject to  y_i(w^T X_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
The soft-margin SVM is usually solved through the duality of the optimization problem; a derivation follows.
Taking the optimization problem of the soft-margin SVM as the primal problem and introducing Lagrange multipliers α_i ≥ 0 and μ_i ≥ 0, the Lagrangian function is obtained: L(w, b, ξ, α, μ) = ½‖w‖² + C Σ_i ξ_i + Σ_i α_i [1 − ξ_i − y_i(w^T X_i + b)] − Σ_i μ_i ξ_i.
Setting the partial derivatives of the Lagrangian with respect to the optimization variables to 0 yields a set of expressions involving the Lagrange multipliers: w = Σ_i α_i y_i X_i, Σ_i α_i y_i = 0, and C = α_i + μ_i.
Substituting these back into the Lagrangian yields the dual problem of the primal problem: max_α Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j X_i^T X_j  subject to  Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.
The constraints of the dual problem include inequalities, so a local optimum exists only if the Lagrange multipliers satisfy the Karush-Kuhn-Tucker (KKT) conditions: α_i ≥ 0, μ_i ≥ 0, ξ_i ≥ 0, μ_i ξ_i = 0, y_i(w^T X_i + b) − 1 + ξ_i ≥ 0, and α_i [y_i(w^T X_i + b) − 1 + ξ_i] = 0.
From the KKT conditions it follows that, for every sample (X_i, y_i), either α_i = 0 or y_i(w^T X_i + b) = 1 − ξ_i. In the former case the sample does not affect the decision boundary; in the latter case the sample lies on the margin boundary (ξ_i = 0), inside the margin (0 < ξ_i ≤ 1), or is misclassified (ξ_i > 1), that is, the sample is a support vector. The decision boundary of the soft-margin SVM is therefore determined only by the support vectors, and the use of the hinge loss function is what makes the SVM sparse.
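The sparsity described above can be observed directly in a fitted model: scikit-learn exposes the support vectors and the products α_i·y_i (as dual_coef_), from which the decision function can be rebuilt (a minimal sketch on illustrative data):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=2)
clf = SVC(kernel='linear', C=1.0).fit(X, y)

sv = clf.support_vectors_         # only these points determine the boundary
alpha_y = clf.dual_coef_.ravel()  # alpha_i * y_i for each support vector
b = clf.intercept_[0]

# Decision function rebuilt from the dual expansion: f(x) = sum_i alpha_i y_i <X_i, x> + b
f_manual = (X @ sv.T) @ alpha_y + b
print(np.allclose(f_manual, clf.decision_function(X)))
print("support vectors:", len(sv), "of", len(X), "samples")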
Nonlinear SVM
Mapping the input data into a high-dimensional space with a nonlinear function and applying a linear SVM there yields the nonlinear SVM, whose optimization problem is: min_{w,b,ξ} ½‖w‖² + C Σ_i ξ_i  subject to  y_i(w^T φ(X_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
By analogy with the soft-margin SVM, the nonlinear SVM has the dual problem: max_α Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j φ(X_i)^T φ(X_j)  subject to  Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.
Note that the dual involves the inner product of the mapping function, so the kernel method can be applied by choosing a kernel function directly: κ(X_i, X_j) = φ(X_i)^T φ(X_j). The KKT conditions of the dual problem of the nonlinear SVM are analogous to those of the soft-margin linear SVM.
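In scikit-learn the kernel can also be supplied explicitly as a precomputed Gram matrix, which makes the role of κ in the dual problem visible; the RBF kernel and data below are illustrative:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)  # not linearly separable

gamma = 2.0
K_train = rbf_kernel(X, X, gamma=gamma)           # Gram matrix kappa(X_i, X_j)
clf = SVC(kernel='precomputed', C=1.0).fit(K_train, y)

# Prediction needs the kernel between test points and all training points.
X_test = np.array([[0.0, 0.5], [1.5, -0.5]])
K_test = rbf_kernel(X_test, X, gamma=gamma)
print(clf.predict(K_test))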
Numerical solution
SVM can be solved with numerical methods for quadratic convex optimization, such as the interior point method and the sequential minimal optimization algorithm; stochastic gradient descent can also be used when learning samples are plentiful. The application of these three numerical methods to SVM is introduced below.
1. Interior point method (IPM)
Taking the soft-margin SVM as an example, the IPM uses a logarithmic barrier function to convert the dual problem of the SVM from a maximization problem into a minimization problem, approximating its optimization objective and constraints by a single unconstrained objective. The logarithmic barrier function essentially uses a continuous function to approximate the inequality constraints. For any value of the barrier hyperparameter, the Newton-Raphson method can be used to solve the resulting problem, and the numerical solution obtained is an approximate solution of the original dual problem.
The IPM needs to invert an N-by-N matrix, and when using Newton's iteration it also needs to compute the inverse of the Hessian matrix. The large memory overhead and high computational complexity mean the algorithm is only suitable when the number of learning samples is small. Some studies have proposed IPM variants better suited to big data through low-rank approximation and parallel computing, and have applied and compared them on actual SVM learning problems.
2. Sequential minimal optimization (SMO)
SMO is a coordinate descent method that solves the dual problem of the SVM iteratively. Its design is to select two of the Lagrange multipliers at each iteration step and fix all other parameters, thereby reducing the original optimization problem to a one-dimensional feasible subspace; the equality constraint then takes the equivalent form α_i y_i + α_j y_j = −Σ_{k≠i,j} α_k y_k = const.
Substituting the right-hand side of this equality into the dual problem of the SVM and eliminating α_i from the summation terms yields a quadratic programming problem in α_j alone, which has a closed-form solution that can be computed quickly. On this basis, SMO uses the following computational framework:
Initialize all Lagrange multipliers;
Identify a multiplier that does not satisfy the KKT conditions and solve the corresponding quadratic programming problem;
Repeat the above steps until all multipliers satisfy the KKT conditions or the parameter updates fall below a set threshold.
It can be shown that, for this quadratic convex optimization problem, each SMO iteration strictly improves the dual objective of the SVM, and the iterations converge to the global maximum. The iteration speed of SMO depends on how far the selected multipliers deviate from the KKT conditions, so SMO usually uses heuristics to select the Lagrange multipliers.
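To make the closed-form two-variable step concrete, the sketch below performs a single SMO update of a multiplier pair (α_i, α_j) given a kernel matrix; it follows the standard clipped-update formulas, while the multiplier-selection heuristics and the intercept update of a full solver are omitted:

import numpy as np

def smo_pair_update(alpha, y, K, b, C, i, j):
    # One SMO step: optimize alpha[i] and alpha[j] with all other multipliers fixed.
    f = (alpha * y) @ K + b             # current decision values f(X_k)
    E_i, E_j = f[i] - y[i], f[j] - y[j]

    # Feasible segment [L, H] implied by 0 <= alpha <= C and sum_k alpha_k y_k = 0.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L == H:
        return alpha

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]   # curvature along the feasible segment
    if eta <= 0:
        return alpha

    alpha = alpha.copy()
    new_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)  # closed-form update, clipped
    new_i = alpha[i] + y[i] * y[j] * (alpha[j] - new_j)         # keeps sum_k alpha_k y_k unchanged
    alpha[i], alpha[j] = new_i, new_j
    return alpha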
3. Stochastic gradient descent (SGD)
SGD is a common optimization algorithm for machine learning problems and is suitable for learning problems with plentiful samples. Each SGD iteration randomly selects learning samples to update the model parameters, which reduces the memory overhead of processing all samples at once; the update rule is w ← w − η ∇_w J(w), where the coefficient η in front of the gradient is the learning rate and J is the cost function. Since the optimization objective of the SVM is convex, it can be rewritten directly as an unconstrained minimization problem and used as the cost function for SGD. At each iteration SGD first checks the margin constraint for the sampled point: if the point satisfies the constraint, the update only shrinks the parameters according to the learning rate, i.e. it minimizes the structural risk; if the point violates the constraint, and would therefore be a support vector of the current solution, the update combines the loss term with the regularization term, balancing empirical risk and structural risk according to the regularization coefficient. In this sense the SGD iterations preserve the sparsity of the SVM.
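This behaviour can be written as a Pegasos-style subgradient update for the linear soft-margin objective; the sketch below uses illustrative data, a common 1/(λt) step-size schedule, and a simple bias update, all of which are one possible choice rather than the method described in the original text:

import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
y = 2 * y - 1                        # labels in {-1, +1}
lam = 0.01                           # regularization coefficient
w, b = np.zeros(X.shape[1]), 0.0
rng = np.random.RandomState(0)

for t in range(1, 10001):
    i = rng.randint(len(X))
    eta = 1.0 / (lam * t)            # decreasing learning rate
    if y[i] * (X[i] @ w + b) < 1:    # margin violated: the point acts as a support vector
        w = (1 - eta * lam) * w + eta * y[i] * X[i]   # regularization + hinge subgradient
        b = b + eta * y[i]
    else:                            # margin satisfied: only shrink (structural risk)
        w = (1 - eta * lam) * w

print(np.mean(np.sign(X @ w + b) == y))   # training accuracy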
Programming implementation
The following is an SVM implementation using the scikit-learn module in a Python 3 environment:
# Import modules
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
%matplotlib inline

# Iris data
iris = datasets.load_iris()
X = iris.data[:, :2]  # only 2 features are selected to make plotting easier
y = iris.target

# Test samples (for plotting the classification regions)
xlist1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 200)
xlist2 = np.linspace(X[:, 1].min(), X[:, 1].max(), 200)
XGrid1, XGrid2 = np.meshgrid(xlist1, xlist2)

# Nonlinear SVM: RBF kernel, kernel hyperparameter 0.5, regularization coefficient 1,
# SMO iteration tolerance 1e-5, cache size 1000 MB
svc = svm.SVC(kernel='rbf', C=1, gamma=0.5, tol=1e-5, cache_size=1000).fit(X, y)

# Predict and plot the result
Z = svc.predict(np.vstack([XGrid1.ravel(), XGrid2.ravel()]).T)
Z = Z.reshape(XGrid1.shape)
plt.contourf(XGrid1, XGrid2, Z, cmap=plt.cm.hsv)
plt.contour(XGrid1, XGrid2, Z, colors=('k',))
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', linewidth=1.5, cmap=plt.cm.hsv)
Improved algorithms
Improved algorithm for skewed data
Soft-margin linear and nonlinear SVMs can handle skewed (class-imbalanced) data by weighting their regularization coefficients. Specifically, if the number of positive examples in the learning sample is much greater than the number of negative examples, the regularization coefficients can be set according to the class ratio, C_+ : C_- = N_- : N_+, where the subscripts + and − indicate the positive and negative examples. That is, when positive examples are abundant, a smaller regularization coefficient is used for the positive class, so that the SVM tends to reduce structural risk through the positive examples, while a larger regularization coefficient is used for the negative class, so that the SVM tends to reduce empirical risk through the negative examples.
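In scikit-learn this per-class weighting of the regularization coefficient is exposed through the class_weight parameter, which multiplies C for each class; the imbalanced data below is illustrative:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Skewed problem: roughly 95% of the samples belong to class 1.
X, y = make_classification(n_samples=1000, weights=[0.05, 0.95], random_state=0)

# class_weight='balanced' sets each class's effective C inversely proportional to
# its frequency; an explicit dict such as {0: 10, 1: 1} can be used instead.
clf = SVC(kernel='rbf', C=1.0, class_weight='balanced').fit(X, y)
print(clf.score(X, y))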
Platt's probabilistic outputs
A probabilistic SVM can be regarded as a combination of logistic regression and SVM. The standard SVM outputs a sample's class directly from the decision boundary, whereas a probabilistic SVM uses a sigmoid function to compute the probability that the sample belongs to its class. Specifically, after the standard SVM has been solved to obtain the decision boundary for the learning samples, the probabilistic SVM applies a linear transformation with a scaling parameter A and a translation parameter B to the decision values, obtains A and B by maximum likelihood estimation (MLE), and feeds the transformed signed distance from the sample to the hyperplane into the sigmoid function to obtain a probability, P(y = 1 | X) = 1 / (1 + exp(A·f(X) + B)).
The MLE step is in effect a logistic regression in the scaling and translation parameters and is solved with a gradient-based algorithm, which means a probabilistic SVM runs more slowly than a standard SVM. After the MLE of the scaling and translation parameters has been obtained from the learning samples, applying them to test samples yields the probability output of the SVM.
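In scikit-learn, setting probability=True on SVC enables this Platt-style calibration on top of the fitted decision boundary (a minimal sketch on illustrative data); the extra maximum-likelihood fit is why it is slower than a plain SVC:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y)

print(clf.decision_function(X[:3]))   # signed distances to the decision boundary
print(clf.predict_proba(X[:3]))       # sigmoid-calibrated class probabilities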
Multiclass SVM
The standard SVM is an algorithm designed for binary classification problems and cannot directly handle multiclass classification. Multiple decision boundaries can be constructed with the standard SVM procedure in order to classify samples into several classes; the usual implementations are "one-against-all" and "one-against-one". A one-against-all SVM builds m decision boundaries for m classes, each boundary deciding the membership of one class against all the others; a one-against-one SVM is a voting scheme whose computation builds a decision boundary for every pair of the m classes, i.e. m(m−1)/2 boundaries in total, and assigns a sample to the class that receives the most votes across all boundaries. A one-against-all SVM can also compute all decision boundaries in a single pass by modifying the optimization problem of the standard SVM.
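A minimal sketch of the two strategies with scikit-learn (illustrative data): SVC itself uses one-against-one voting internally, while OneVsRestClassifier wraps it into the one-against-all scheme:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                          # 3 classes

ovo = SVC(kernel='rbf').fit(X, y)                          # one-against-one: 3*(3-1)/2 = 3 boundaries
ovr = OneVsRestClassifier(SVC(kernel='rbf')).fit(X, y)     # one-against-all: 3 boundaries

print(ovo.score(X, y), ovr.score(X, y))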
Least squares SVM (LS-SVM)
LS-SVM is a variant of the standard SVM. The difference between the two is that LS-SVM does not use the hinge loss function but instead rewrites the optimization problem in a form similar to ridge regression: by analogy with the soft-margin SVM, the optimization problem of LS-SVM is min_{w,b,e} ½‖w‖² + ½γ Σ_i e_i²  subject to  y_i(w^T X_i + b) = 1 − e_i.
By analogy with the standard SVM, the dual problem of LS-SVM is obtained through Lagrange multipliers, and it is a system of linear equations that, in block form, reads [0, yᵀ; y, Ω + I/γ] [b; α] = [0; 1], where Ω_{ij} = y_i y_j κ(X_i, X_j).
Using the kernel method in this formulation gives the nonlinear LS-SVM. The linear system of LS-SVM can be solved by the conjugate gradient method or by SMO, and the solution efficiency is usually higher than solving the quadratic convex optimization problem of the standard SVM. Studies have shown that, for a feature space of any dimension, LS-SVM and SVM give the same result when the samples are linearly independent; when this condition is not met, their outputs differ. One example used to compare the two is two-spiral classification.
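Because the LS-SVM dual is just a linear system, it can be solved with a few lines of dense linear algebra; the sketch below assumes the block system stated above and uses illustrative data and hyperparameters:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_blobs(n_samples=80, centers=2, random_state=0)
y = (2 * y - 1).astype(float)                # labels in {-1, +1}
gamma_reg, gamma_rbf = 10.0, 0.5             # regularization and kernel hyperparameters

K = rbf_kernel(X, X, gamma=gamma_rbf)
Omega = np.outer(y, y) * K                   # Omega_ij = y_i y_j kappa(X_i, X_j)

n = len(y)
A = np.zeros((n + 1, n + 1))
A[0, 1:], A[1:, 0] = y, y                    # first row/column hold the labels
A[1:, 1:] = Omega + np.eye(n) / gamma_reg    # Omega + I/gamma
rhs = np.concatenate(([0.0], np.ones(n)))

sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

f = (alpha * y) @ K + b                      # f(x) = sum_i alpha_i y_i kappa(X_i, x) + b
print(np.mean(np.sign(f) == y))              # training accuracy of the LS-SVM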
Structured SVM
The structured SVM is a generalization of the standard SVM to structured prediction problems. Given structured data in the sample space and the label space, together with a distance (loss) function defined on the label space, the structured SVM solves a margin-based optimization problem whose constraints require the score of the correct structured label to exceed the score of every other label by at least their label distance, up to a slack term.
Structured SVMs are applied to natural language processing (NLP) problems, for example predicting the parse structure of given corpus data, and are also used in bioinformatics for protein structure prediction.
Multiple kernel SVM
The multiple kernel SVM is the realization of multiple kernel learning in supervised learning: it is an improved algorithm in which the single kernel function of the SVM is replaced by a family of kernel functions. The construction methods of multiple kernel SVMs can be grouped into the following five categories:
Fixed rules: use the closure properties of kernel functions, for example linear additivity, to construct the kernel family without introducing any hyperparameters (a minimal example is sketched after this list). A multiple kernel SVM constructed by fixed rules can be solved directly with standard SVM methods.
Heuristic approach: use a combination function containing parameters to construct the kernel family, with the parameters determined from the kernel matrices or the classification performance of the individual kernel functions participating in the construction.
Optimization approach: use a parameterized combination to construct the kernel family, with the parameters obtained by solving an optimization problem based on the similarity between kernel functions or on minimizing the structural risk.
Bayesian approach: use a parameterized combination to construct the kernel family, treating the parameters as random variables and estimating them by Bayesian inference.
Boosting approach: iteratively add kernel functions to the kernel family until the classification performance of the multiple kernel SVM no longer improves.
Research shows that, in terms of classification accuracy, multiple kernel SVMs are more flexible and generally better than standard SVMs computed with a single kernel, but nonlinear and sample-dependent constructions of the kernel family are not always superior; the construction of the kernel family usually depends on the specific problem.
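A minimal fixed-rule example (illustrative): since a weighted sum of kernels is itself a kernel, the combined Gram matrix can be passed to a standard SVM as a precomputed kernel without any new solver:

from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Fixed rule: equal-weight sum of an RBF kernel and a polynomial kernel.
def combined_kernel(A, B):
    return 0.5 * rbf_kernel(A, B, gamma=1.0) + 0.5 * polynomial_kernel(A, B, degree=2)

K = combined_kernel(X, X)
clf = SVC(kernel='precomputed', C=1.0).fit(K, y)
print(clf.score(K, y))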
Extended algorithms
Support vector regression (SVR)
Extending the SVM from classification problems to regression problems yields support vector regression (SVR); in this context the standard SVM algorithm is also called support vector classification (SVC). The hyperplane that serves as the decision boundary in SVC becomes the regression model in SVR: f(X) = w^T X + b. SVR is also sparse: if a sample point is close enough to the regression model, i.e. falls within the interval (tube) around the regression model, the sample incurs no loss; the corresponding loss function is called the ε-insensitive loss, L(y, f(X)) = max(0, |y − f(X)| − ε), where ε is a hyperparameter determining the width of the tube. The ε-insensitive loss is similar to the hinge loss used by SVC, with its value fixed at 0 in a region around the origin. By analogy with the soft-margin SVM, SVR is a quadratic convex optimization problem of the following form: min_{w,b} ½‖w‖² + C Σ_i L(y_i, f(X_i)).
Introducing slack variables ξ_i and ξ_i* to represent the piecewise value of the ε-insensitive loss, the problem becomes: min ½‖w‖² + C Σ_i (ξ_i + ξ_i*)  subject to  f(X_i) − y_i ≤ ε + ξ_i,  y_i − f(X_i) ≤ ε + ξ_i*,  and ξ_i, ξ_i* ≥ 0.
Similarly to the soft-margin SVM, introducing Lagrange multipliers α_i and α_i* yields the Lagrangian function and the dual problem of SVR.
The dual problem has KKT conditions analogous to those of the soft-margin SVM.
Solving the dual problem gives the form of the SVR solution: f(X) = Σ_i (α_i − α_i*) X_i^T X + b.
SVR can obtain nonlinear regression results through the kernel method. In addition, LS-SVM can solve regression problems in a similar way to SVR.
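A minimal SVR usage sketch with scikit-learn on illustrative one-dimensional data; epsilon is the half-width of the ε-insensitive tube and C the regularization coefficient:

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)    # noisy sine curve

reg = SVR(kernel='rbf', C=10.0, epsilon=0.1, gamma=0.5).fit(X, y)

# Samples inside the epsilon-tube incur no loss, so only part of the data
# becomes support vectors of the regression model.
print("support vectors:", len(reg.support_), "of", len(X))
print(reg.predict([[2.0]]))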
Support vector clustering
Support vector clustering is a non-parametric clustering algorithm; it is an extension of the SVM to clustering problems. Specifically, support vector clustering first uses a kernel function, usually the radial basis function kernel, to map the samples into a high-dimensional space, and then uses the SVDD (support vector domain description) algorithm to obtain a closed hypersurface describing the region in which the sample points are concentrated in that space. Finally, support vector clustering maps the surface back to the original feature space to obtain a set of closed contours, and the samples enclosed by each contour are assigned to one cluster.
Support vector clustering does not require the number of clusters to be specified in advance. Research has shown that it performs stably when clustering low-dimensional learning samples; high-dimensional samples can also be clustered by support vector clustering after preprocessing with a dimensionality-reduction method.
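scikit-learn does not ship a support vector clustering implementation, but its first stage, describing the region occupied by the data with an RBF-kernel boundary, can be illustrated with the closely related one-class SVM; the contour-based cluster-assignment stage of the full algorithm is omitted here:

from sklearn.datasets import make_moons
from sklearn.svm import OneClassSVM

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# An RBF one-class SVM learns a closed boundary around the high-density region,
# analogous to the SVDD step of support vector clustering.
boundary = OneClassSVM(kernel='rbf', gamma=5.0, nu=0.05).fit(X)

inside = boundary.predict(X) == 1    # +1 inside the learned contour, -1 outside
print(inside.mean())                 # fraction of samples enclosed by the boundary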
Semi-supervised SVM (S3VM)
S3VM is the application of SVM to semi-supervised learning; it can be applied to learning samples composed of a small amount of labeled data and a large amount of unlabeled data. When the unlabeled samples are ignored, the SVM solves for the maximum-margin hyperplane; once the unlabeled data is taken into account, S3VM, based on the low-density-separation assumption, solves for a hyperplane that separates the two classes of labeled samples and passes through the low-density regions of the unlabeled data.
The general form of S3VM is to solve the decision boundary from the labeled data according to the standard SVM method and then adjust the decision boundary by exploring the unlabeled data. Building on the soft-margin SVM, the optimization problem of S3VM introduces two additional slack variables for each unlabeled sample; these slack variables represent the empirical risk of S3VM when it assigns the unlabeled data to either of the two classes, and the sums in the objective run separately over the labeled and unlabeled samples.
S3VM has many variants, including the transductive SVM (TSVM), the Laplacian SVM, and mean S3VM.
Properties
Robustness and sparsity: the optimization problem of SVM considers both empirical risk and structural risk minimization, so it is stable. Geometrically, the stability of SVM is reflected in the fact that it requires the largest margin when constructing the hyperplane decision boundary, so there is ample space between the separating boundaries to accommodate test samples. SVM uses the hinge loss as its surrogate loss, and the piecewise nature of the hinge loss makes the SVM sparse: its decision boundary is determined only by the support vectors, and the remaining sample points do not participate in empirical risk minimization. In nonlinear learning with the kernel method, the robustness and sparsity of SVM not only ensure reliable solutions but also reduce the computation and memory overhead of the kernel matrix.
Relationship with other linear classifiers: SVM is a generalized linear classifier, and other types of linear classifiers can be obtained by modifying the loss function and the optimization problem within the SVM framework; for example, replacing the hinge loss of SVM with the logistic loss gives an optimization problem close to logistic regression. SVM and logistic regression are classifiers with similar purposes. The differences are that the output of logistic regression has a probabilistic meaning and extends easily to multiclass problems, whereas the sparsity and stability of SVM give it good generalization ability and a lower computational cost when the kernel method is used.
As a kernel method: SVM is not the only machine learning algorithm that can use the kernel trick. Logistic regression, ridge regression and linear discriminant analysis (LDA) can likewise be kernelized, yielding kernel logistic regression, kernel ridge regression and kernelized linear discriminant analysis (KLDA). SVM is therefore one realization of kernel learning in the broad sense.
Application
SVM has applications in pattern recognition problems in various fields, including portrait (face) recognition, text classification, handwritten character recognition, bioinformatics, and so on.
Programming modules including SVM
Measured by citation counts, LIBSVM, developed by the Institute of Information Engineering at National Taiwan University, is the most widely used SVM tool. LIBSVM includes the standard SVM algorithm, probabilistic outputs, support vector regression, multiclass SVM and other functionality. Its source code is written in C, and it provides calling interfaces for languages such as Java, Python, R and MATLAB, CUDA-based GPU acceleration, and other components such as multi-core parallel computing and model cross-validation.
The Python-based machine learning module scikit-learn provides pre-packaged SVM tools whose design refers to LIBSVM. Other Python modules that include SVM are MDP, MLPy, PyMVPA, and others. TensorFlow's high-level API component Estimators provides a packaged SVM model.