History
SVM is a classifier developed from the generalized portrait algorithm in pattern recognition. Its earliest work came from research published in 1963 by the former Soviet scholars Vladimir N. Vapnik and Alexander Y. Lerner. In 1964, Vapnik and Alexey Y. Chervonenkis discussed the generalized portrait algorithm further and established the linear SVM with hard margins. From the 1970s and 1980s onward, with theoretical research on maximum-margin decision boundaries in pattern recognition, the emergence of slack-variable-based techniques for solving the associated programming problems, and the proposal of the VC dimension (Vapnik-Chervonenkis dimension), SVM was gradually put on a theoretical footing and became part of statistical learning theory. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vapnik obtained the nonlinear SVM through the kernel method. In 1995, Corinna Cortes and Vapnik proposed the soft-margin nonlinear SVM and applied it to handwritten character recognition. That work attracted wide attention and citation after publication and provided a reference for the application of SVM in many fields.
Theory
Linear classification
Linear separability
Consider a classification problem with input data X = {X_1, …, X_N} and learning targets y = {y_1, …, y_N}. Each sample of the input data contains multiple features and therefore lies in a feature space, X_i ∈ 𝒳, while each learning target is a binary variable y_i ∈ {−1, +1}, representing the negative class and the positive class respectively.
If there exists a hyperplane w^T X + b = 0 in the feature space that serves as a decision boundary separating the learning targets into the positive and negative classes, such that every sample satisfies y_i(w^T X_i + b) ≥ 1 (the point-to-plane distance condition), the classification problem is said to be linearly separable; the parameters w and b are the normal vector and the intercept of the hyperplane respectively.
A decision boundary satisfying this condition in fact defines two parallel hyperplanes as margin boundaries that separate the samples: samples with w^T X_i + b ≥ +1 are assigned y_i = +1, and samples with w^T X_i + b ≤ −1 are assigned y_i = −1.
All samples on or above the upper margin boundary belong to the positive class, and all samples on or below the lower margin boundary belong to the negative class. The distance between the two margin boundaries, d = 2/‖w‖, is defined as the margin, and the positive and negative samples lying exactly on the margin boundaries are the support vectors.
Loss function
When a classification problem is not linearly separable, using a hyperplane as the decision boundary incurs classification loss: some of the support vectors no longer lie on the margin boundary but fall inside the margin, or even on the wrong side of the decision boundary. A loss function quantifies this classification loss. From its mathematical meaning, the 0-1 loss function is obtained: the loss is 1 when a sample is misclassified, i.e. y(w^T X + b) < 0, and 0 otherwise.
The 0-1 loss function is not continuous, which makes the optimization problem hard to solve, so the usual choice is to construct a surrogate loss. Available options include the hinge loss, the logistic loss, and the exponential loss; SVM uses the hinge loss: L(y, f(X)) = max(0, 1 − y f(X)), where f(X) = w^T X + b.
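As a small numerical illustration (a sketch with made-up decision values, not part of the original text), the hinge loss can be computed directly from the product y·f(X) and compared with the 0-1 loss, confirming that it upper-bounds the 0-1 loss:

import numpy as np

# Hypothetical decision values y * f(X) for three samples:
# > 1 outside the margin, between 0 and 1 inside the margin, < 0 misclassified.
margins = np.array([2.0, 0.5, -0.3])

hinge_loss = np.maximum(0.0, 1.0 - margins)   # max(0, 1 - y*f(X))
zero_one_loss = (margins < 0).astype(float)   # 1 only for misclassified samples

print(hinge_loss)     # [0.  0.5 1.3]
print(zero_one_loss)  # [0. 0. 1.]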
Research on the consistency of surrogate losses shows that if the surrogate loss is a continuous convex function that upper-bounds the 0-1 loss at every value, then a solution minimizing the surrogate loss is also a solution minimizing the 0-1 loss. The hinge loss function satisfies these conditions.
Empirical risk and structural risk
See: Statistical learning theory
According to statistical learning theory, a classifier incurs risk when it is learned and then applied to new data. This risk can be divided into empirical risk and structural risk.
Here f denotes the classifier. The empirical risk is defined through the loss function and describes the accuracy of the classification results produced by the classifier; the structural risk is defined through a norm of the classifier's parameters and describes the complexity and stability of the classifier itself, since a complex classifier is prone to overfitting and is therefore unstable. If a classifier determines its model parameters by minimizing a linear combination of empirical risk and structural risk,

min_w  ‖w‖^p + C Σ_{i=1}^N L(f(X_i), y_i),

then solving for the classifier is a regularization problem, where the constant C is the regularization coefficient. When p = 2, this is called L2 regularization or Tikhonov regularization. The structural risk of SVM is represented by the squared norm ½‖w‖². In linearly separable problems, the empirical risk of the hard-margin SVM can be reduced to 0, so it is a classifier that purely minimizes the structural risk; in linearly inseparable problems, the empirical risk of the soft-margin SVM cannot be reduced to 0, so it is an L2-regularized classifier that minimizes a linear combination of structural risk and empirical risk.
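To make this trade-off concrete, the sketch below evaluates both risk terms for a fitted linear SVM; it relies on the fact that scikit-learn's SVC with a linear kernel minimizes ½‖w‖² + C·(total hinge loss), and the data set is purely illustrative:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy binary problem with labels relabeled to {-1, +1}.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
y = 2 * y - 1

C = 1.0  # regularization coefficient weighting the empirical-risk term
clf = SVC(kernel='linear', C=C).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

structural_risk = 0.5 * np.dot(w, w)                       # (1/2) * ||w||^2
empirical_risk = np.maximum(0, 1 - y * (X @ w + b)).sum()  # total hinge loss
print(structural_risk, empirical_risk, structural_risk + C * empirical_risk)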
Kernel method
Main article: Kernel method
Some linearly inseparable problems may be nonlinearly separable, that is, there exists a hypersurface in the feature space that separates the positive class from the negative class. Using a nonlinear function, a nonlinearly separable problem can be mapped from the original feature space into a higher-dimensional Hilbert space, where it becomes linearly separable; the hyperplane that serves as the decision boundary is then expressed as w^T φ(X) + b = 0,
where φ is the mapping function. Since the mapping function is generally complicated and its inner product is hard to compute directly, the kernel method can be used: the inner product of the mapped samples is defined as a kernel function, κ(X_1, X_2) = φ(X_1)^T φ(X_2), which avoids computing the inner product in the mapped space explicitly.
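As a quick check of this identity (an illustrative sketch, not from the original text), the homogeneous degree-2 polynomial kernel κ(X_1, X_2) = (X_1^T X_2)² can be matched against an explicit quadratic feature map:

import numpy as np

def phi(x):
    # Explicit feature map in 2D for which phi(a).phi(b) = (a.b)^2
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

print(phi(a) @ phi(b))   # inner product in the mapped space: 16.0
print((a @ b) ** 2)      # kernel evaluated in the original space: 16.0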
Mercer's theorem
A kernel function must satisfy certain conditions. A necessary and sufficient condition for a function κ to be a kernel function is that, for any set of vectors in the input space, X = {X_1, …, X_N}, its kernel matrix, i.e. the Gram matrix with entries [K]_{ij} = κ(X_i, X_j), is a positive semi-definite matrix. This conclusion is known as Mercer's theorem. The full proof is omitted; in outline, for sufficiency, the inner product of two functions in the feature space is a bivariate function, and when its kernel matrix is positive semi-definite this bivariate function has the reproducing property, so the corresponding inner-product space is a normed vector space that can be completed into a Hilbert space, namely a reproducing kernel Hilbert space (RKHS). For necessity, constructing the kernel matrix of a valid kernel function and checking it directly shows that the matrix is positive semi-definite.
Common kernel functions
After constructing a candidate kernel function, it is difficult to verify that its Gram matrix is positive semi-definite for every possible set of inputs, so the usual choice is to use a ready-made kernel function, such as the polynomial kernel or the RBF kernel; any parameters not explained here are hyperparameters of the kernel function.
When the order of the polynomial kernel is 1, it is called the linear kernel, and the corresponding nonlinear classifier degenerates into a linear classifier. The RBF kernel is also called the Gaussian kernel, and its corresponding mapping function maps the sample space into an infinite-dimensional space. Linear combinations and direct (Cartesian) products of kernel functions are also kernel functions, and for any function g defined on the feature space, g(X_1) κ(X_1, X_2) g(X_2) is a kernel function as well.
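For reference, ready-made kernels such as the polynomial and RBF kernels can be evaluated as Gram matrices with scikit-learn's pairwise utilities; the data and hyperparameter values below are illustrative:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

X = np.random.RandomState(0).randn(5, 3)  # 5 samples with 3 features

K_poly = polynomial_kernel(X, degree=3, gamma=1.0, coef0=1.0)  # (gamma*<x,z> + coef0)^3
K_rbf = rbf_kernel(X, gamma=0.5)                               # exp(-gamma * ||x - z||^2)

# A valid Gram matrix is symmetric positive semi-definite (Mercer's theorem).
print(np.allclose(K_rbf, K_rbf.T), np.linalg.eigvalsh(K_rbf).min() >= -1e-10)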
Algorithm
Standard algorithm
Linear SVM
1. Hard margin
Given input data and learning targets X = {X_1, …, X_N}, y = {y_1, …, y_N}, the hard-margin SVM is an algorithm for solving the maximum-margin hyperplane of a linearly separable problem, with the constraint that the distance from every sample point to the decision boundary is at least 1. The hard-margin SVM can be transformed into the following equivalent quadratic convex optimization problem: min_{w,b} ½‖w‖²  subject to  y_i(w^T X_i + b) ≥ 1, i = 1, …, N.
The decision boundary obtained from this problem classifies any sample by sign(w^T X + b). Note that although the normal vector w of the hyperplane is the only variable appearing in the objective, the learning data and the intercept b of the hyperplane affect the solution through the constraints. The hard-margin SVM can be regarded as the limiting case of the soft-margin SVM in which no margin violations are allowed; its dual problem and solution follow those of the soft-margin SVM and are not repeated here.
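On linearly separable data the hard-margin solution can be approximated by a soft-margin SVM with a very large slack penalty; the sketch below (illustrative data and parameter values) fits such a model and reports the geometric margin 2/‖w‖:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Well-separated blobs, so the problem is linearly separable.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=1)

# A very large C leaves essentially no room for margin violations.
clf = SVC(kernel='linear', C=1e6).fit(X, y)
w = clf.coef_.ravel()

print("margin width:", 2.0 / np.linalg.norm(w))
print("number of support vectors:", clf.support_vectors_.shape[0])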
2. Soft margin
Using a hard-margin SVM on a linearly inseparable problem produces classification errors, so a loss function can be introduced on top of margin maximization to construct a new optimization problem. SVM uses the hinge loss function and, following the form of the hard-margin problem, the optimization problem of the soft-margin SVM is expressed as: min_{w,b} ½‖w‖² + C Σ_{i=1}^N max(0, 1 − y_i(w^T X_i + b)).
This formulation shows that the soft-margin SVM is an L2-regularized classifier in which the hinge loss plays the role of the empirical risk. Introducing slack variables ξ_i ≥ 0 to represent the piecewise value of the hinge loss, the problem can be rewritten as: min_{w,b,ξ} ½‖w‖² + C Σ_i ξ_i  subject to  y_i(w^T X_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
The soft-margin SVM is usually solved through the duality of the optimization problem; a derivation follows.
Taking the optimization problem of the soft-margin SVM as the primal problem and introducing Lagrange multipliers α_i ≥ 0 and μ_i ≥ 0, the Lagrangian function is obtained: L(w, b, ξ, α, μ) = ½‖w‖² + C Σ_i ξ_i + Σ_i α_i [1 − ξ_i − y_i(w^T X_i + b)] − Σ_i μ_i ξ_i.
Setting the partial derivatives of the Lagrangian with respect to the optimization variables to 0 yields a set of expressions involving the Lagrange multipliers: w = Σ_i α_i y_i X_i, Σ_i α_i y_i = 0, and C = α_i + μ_i.
Substituting these back into the Lagrangian yields the dual problem of the primal problem: max_α Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j X_i^T X_j  subject to  Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.
The constraints of the dual problem include inequalities, so a local optimum exists only if the Lagrange multipliers satisfy the Karush-Kuhn-Tucker (KKT) conditions: α_i ≥ 0, μ_i ≥ 0, ξ_i ≥ 0, μ_i ξ_i = 0, y_i(w^T X_i + b) − 1 + ξ_i ≥ 0, and α_i [y_i(w^T X_i + b) − 1 + ξ_i] = 0.
From the KKT conditions it follows that, for every sample (X_i, y_i), either α_i = 0 or y_i(w^T X_i + b) = 1 − ξ_i. In the former case the sample does not affect the decision boundary; in the latter case the sample lies on the margin boundary (ξ_i = 0), inside the margin (0 < ξ_i ≤ 1), or is misclassified (ξ_i > 1), that is, the sample is a support vector. The decision boundary of the soft-margin SVM is therefore determined only by the support vectors, and the use of the hinge loss function is what makes the SVM sparse.
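The sparsity described above can be observed directly in a fitted model: scikit-learn exposes the support vectors and the products α_i·y_i (as dual_coef_), from which the decision function can be rebuilt (a minimal sketch on illustrative data):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=2)
clf = SVC(kernel='linear', C=1.0).fit(X, y)

sv = clf.support_vectors_         # only these points determine the boundary
alpha_y = clf.dual_coef_.ravel()  # alpha_i * y_i for each support vector
b = clf.intercept_[0]

# Decision function rebuilt from the dual expansion: f(x) = sum_i alpha_i y_i <X_i, x> + b
f_manual = (X @ sv.T) @ alpha_y + b
print(np.allclose(f_manual, clf.decision_function(X)))
print("support vectors:", len(sv), "of", len(X), "samples")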
Nonlinear SVM
Mapping the input data into a high-dimensional space with a nonlinear function and applying a linear SVM there yields the nonlinear SVM, whose optimization problem is: min_{w,b,ξ} ½‖w‖² + C Σ_i ξ_i  subject to  y_i(w^T φ(X_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
By analogy with the soft-margin SVM, the nonlinear SVM has the dual problem: max_α Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j φ(X_i)^T φ(X_j)  subject to  Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.
Note that the dual involves the inner product of the mapping function, so the kernel method can be applied by choosing a kernel function directly: κ(X_i, X_j) = φ(X_i)^T φ(X_j). The KKT conditions of the dual problem of the nonlinear SVM are analogous to those of the soft-margin linear SVM.
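In scikit-learn the kernel can also be supplied explicitly as a precomputed Gram matrix, which makes the role of κ in the dual problem visible; the RBF kernel and data below are illustrative:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)  # not linearly separable

gamma = 2.0
K_train = rbf_kernel(X, X, gamma=gamma)           # Gram matrix kappa(X_i, X_j)
clf = SVC(kernel='precomputed', C=1.0).fit(K_train, y)

# Prediction needs the kernel between test points and all training points.
X_test = np.array([[0.0, 0.5], [1.5, -0.5]])
K_test = rbf_kernel(X_test, X, gamma=gamma)
print(clf.predict(K_test))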
Numerical solution
SVM can be solved with numerical methods for quadratic convex optimization, such as the interior point method and the sequential minimal optimization algorithm; stochastic gradient descent can also be used when learning samples are plentiful. The application of these three numerical methods to SVM is introduced below.
1. Interior point method (IPM)
Taking the soft-margin SVM as an example, the IPM uses a logarithmic barrier function to convert the dual problem of the SVM from a maximization problem into a minimization problem, approximating its optimization objective and constraints by a single unconstrained objective. The logarithmic barrier function essentially uses a continuous function to approximate the inequality constraints. For any value of the barrier hyperparameter, the Newton-Raphson method can be used to solve the resulting problem, and the numerical solution obtained is an approximate solution of the original dual problem.
The IPM needs to invert an N-by-N matrix, and when using Newton's iteration it also needs to compute the inverse of the Hessian matrix. The large memory overhead and high computational complexity mean the algorithm is only suitable when the number of learning samples is small. Some studies have proposed IPM variants better suited to big data through low-rank approximation and parallel computing, and have applied and compared them on actual SVM learning problems.
2. Sequential minimal optimization (SMO)
SMO is a coordinate descent method that solves the dual problem of the SVM iteratively. Its design is to select two of the Lagrange multipliers at each iteration step and fix all other parameters, thereby reducing the original optimization problem to a one-dimensional feasible subspace; the equality constraint then takes the equivalent form α_i y_i + α_j y_j = −Σ_{k≠i,j} α_k y_k = const.
Substituting the right-hand side of this equality into the dual problem of the SVM and eliminating α_i from the summation terms yields a quadratic programming problem in α_j alone, which has a closed-form solution that can be computed quickly. On this basis, SMO uses the following computational framework:
Initialize all Lagrange multipliers;
Identify a multiplier that does not satisfy the KKT conditions and solve the corresponding quadratic programming problem;
Repeat the above steps until all multipliers satisfy the KKT conditions or the parameter updates fall below a set threshold.
It can be shown that, for this quadratic convex optimization problem, each SMO iteration strictly improves the dual objective of the SVM, and the iterations converge to the global maximum. The iteration speed of SMO depends on how far the selected multipliers deviate from the KKT conditions, so SMO usually uses heuristics to select the Lagrange multipliers.
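To make the closed-form two-variable step concrete, the sketch below performs a single SMO update of a multiplier pair (α_i, α_j) given a kernel matrix; it follows the standard clipped-update formulas, while the multiplier-selection heuristics and the intercept update of a full solver are omitted:

import numpy as np

def smo_pair_update(alpha, y, K, b, C, i, j):
    # One SMO step: optimize alpha[i] and alpha[j] with all other multipliers fixed.
    f = (alpha * y) @ K + b             # current decision values f(X_k)
    E_i, E_j = f[i] - y[i], f[j] - y[j]

    # Feasible segment [L, H] implied by 0 <= alpha <= C and sum_k alpha_k y_k = 0.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L == H:
        return alpha

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]   # curvature along the feasible segment
    if eta <= 0:
        return alpha

    alpha = alpha.copy()
    new_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)  # closed-form update, clipped
    new_i = alpha[i] + y[i] * y[j] * (alpha[j] - new_j)         # keeps sum_k alpha_k y_k unchanged
    alpha[i], alpha[j] = new_i, new_j
    return alpha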
3. Stochastic gradient descent (SGD)
SGD is a common optimization algorithm for machine learning problems and is suitable for learning problems with plentiful samples. Each SGD iteration randomly selects learning samples to update the model parameters, which reduces the memory overhead of processing all samples at once; the update rule is w ← w − η ∇_w J(w), where the coefficient η in front of the gradient is the learning rate and J is the cost function. Since the optimization objective of the SVM is convex, it can be rewritten directly as an unconstrained minimization problem and used as the cost function for SGD. At each iteration SGD first checks the margin constraint for the sampled point: if the point satisfies the constraint, the update only shrinks the parameters according to the learning rate, i.e. it minimizes the structural risk; if the point violates the constraint, and would therefore be a support vector of the current solution, the update combines the loss term with the regularization term, balancing empirical risk and structural risk according to the regularization coefficient. In this sense the SGD iterations preserve the sparsity of the SVM.
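This behaviour can be written as a Pegasos-style subgradient update for the linear soft-margin objective; the sketch below uses illustrative data, a common 1/(λt) step-size schedule, and a simple bias update, all of which are one possible choice rather than the method described in the original text:

import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
y = 2 * y - 1                        # labels in {-1, +1}
lam = 0.01                           # regularization coefficient
w, b = np.zeros(X.shape[1]), 0.0
rng = np.random.RandomState(0)

for t in range(1, 10001):
    i = rng.randint(len(X))
    eta = 1.0 / (lam * t)            # decreasing learning rate
    if y[i] * (X[i] @ w + b) < 1:    # margin violated: the point acts as a support vector
        w = (1 - eta * lam) * w + eta * y[i] * X[i]   # regularization + hinge subgradient
        b = b + eta * y[i]
    else:                            # margin satisfied: only shrink (structural risk)
        w = (1 - eta * lam) * w

print(np.mean(np.sign(X @ w + b) == y))   # training accuracy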
Programming implementation
The following is an SVM implementation using the scikit-learn module in a Python 3 environment:
# Import modules
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
%matplotlib inline

# Iris data
iris = datasets.load_iris()
X = iris.data[:, :2]  # only 2 features are selected to make plotting easier
y = iris.target

# Test samples (for plotting the classification regions)
xlist1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 200)
xlist2 = np.linspace(X[:, 1].min(), X[:, 1].max(), 200)
XGrid1, XGrid2 = np.meshgrid(xlist1, xlist2)

# Nonlinear SVM: RBF kernel, kernel hyperparameter 0.5, regularization coefficient 1,
# SMO iteration tolerance 1e-5, cache size 1000 MB
svc = svm.SVC(kernel='rbf', C=1, gamma=0.5, tol=1e-5, cache_size=1000).fit(X, y)

# Predict and plot the result
Z = svc.predict(np.vstack([XGrid1.ravel(), XGrid2.ravel()]).T)
Z = Z.reshape(XGrid1.shape)
plt.contourf(XGrid1, XGrid2, Z, cmap=plt.cm.hsv)
plt.contour(XGrid1, XGrid2, Z, colors=('k',))
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', linewidth=1.5, cmap=plt.cm.hsv)
Improved algorithms
Improved algorithm for skewed data
Soft-margin linear and nonlinear SVMs can handle skewed (class-imbalanced) data by weighting their regularization coefficients. Specifically, if the number of positive examples in the learning sample is much greater than the number of negative examples, the regularization coefficients can be set according to the class ratio, C_+ : C_- = N_- : N_+, where the subscripts + and − indicate the positive and negative examples. That is, when positive examples are abundant, a smaller regularization coefficient is used for the positive class, so that the SVM tends to reduce structural risk through the positive examples, while a larger regularization coefficient is used for the negative class, so that the SVM tends to reduce empirical risk through the negative examples.
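In scikit-learn this per-class weighting of the regularization coefficient is exposed through the class_weight parameter, which multiplies C for each class; the imbalanced data below is illustrative:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Skewed problem: roughly 95% of the samples belong to class 1.
X, y = make_classification(n_samples=1000, weights=[0.05, 0.95], random_state=0)

# class_weight='balanced' sets each class's effective C inversely proportional to
# its frequency; an explicit dict such as {0: 10, 1: 1} can be used instead.
clf = SVC(kernel='rbf', C=1.0, class_weight='balanced').fit(X, y)
print(clf.score(X, y))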
Platt's probabilistic outputs
A probabilistic SVM can be regarded as a combination of logistic regression and SVM. The standard SVM outputs a sample's class directly from the decision boundary, whereas a probabilistic SVM uses a sigmoid function to compute the probability that the sample belongs to its class. Specifically, after the standard SVM has been solved to obtain the decision boundary for the learning samples, the probabilistic SVM applies a linear transformation with a scaling parameter A and a translation parameter B to the decision values, obtains A and B by maximum likelihood estimation (MLE), and feeds the transformed signed distance from the sample to the hyperplane into the sigmoid function to obtain a probability, P(y = 1 | X) = 1 / (1 + exp(A·f(X) + B)).
The MLE step is in effect a logistic regression in the scaling and translation parameters and is solved with a gradient-based algorithm, which means a probabilistic SVM runs more slowly than a standard SVM. After the MLE of the scaling and translation parameters has been obtained from the learning samples, applying them to test samples yields the probability output of the SVM.
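In scikit-learn, setting probability=True on SVC enables this Platt-style calibration on top of the fitted decision boundary (a minimal sketch on illustrative data); the extra maximum-likelihood fit is why it is slower than a plain SVC:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y)

print(clf.decision_function(X[:3]))   # signed distances to the decision boundary
print(clf.predict_proba(X[:3]))       # sigmoid-calibrated class probabilities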
Multiclass SVM
The standard SVM is an algorithm designed for binary classification problems and cannot directly handle multiclass classification. Multiple decision boundaries can be constructed with the standard SVM procedure in order to classify samples into several classes; the usual implementations are "one-against-all" and "one-against-one". A one-against-all SVM builds m decision boundaries for m classes, each boundary deciding the membership of one class against all the others; a one-against-one SVM is a voting scheme whose computation builds a decision boundary for every pair of the m classes, i.e. m(m−1)/2 boundaries in total, and assigns a sample to the class that receives the most votes across all boundaries. A one-against-all SVM can also compute all decision boundaries in a single pass by modifying the optimization problem of the standard SVM.
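A minimal sketch of the two strategies with scikit-learn (illustrative data): SVC itself uses one-against-one voting internally, while OneVsRestClassifier wraps it into the one-against-all scheme:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                          # 3 classes

ovo = SVC(kernel='rbf').fit(X, y)                          # one-against-one: 3*(3-1)/2 = 3 boundaries
ovr = OneVsRestClassifier(SVC(kernel='rbf')).fit(X, y)     # one-against-all: 3 boundaries

print(ovo.score(X, y), ovr.score(X, y))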
Least squares SVM (LS-SVM)
LS-SVM is a variant of the standard SVM. The difference between the two is that LS-SVM does not use the hinge loss function but instead rewrites the optimization problem in a form similar to ridge regression: by analogy with the soft-margin SVM, the optimization problem of LS-SVM is min_{w,b,e} ½‖w‖² + ½γ Σ_i e_i²  subject to  y_i(w^T X_i + b) = 1 − e_i.
By analogy with the standard SVM, the dual problem of LS-SVM is obtained through Lagrange multipliers, and it is a system of linear equations that, in block form, reads [0, yᵀ; y, Ω + I/γ] [b; α] = [0; 1], where Ω_{ij} = y_i y_j κ(X_i, X_j).
Using the kernel method in this formulation gives the nonlinear LS-SVM. The linear system of LS-SVM can be solved by the conjugate gradient method or by SMO, and the solution efficiency is usually higher than solving the quadratic convex optimization problem of the standard SVM. Studies have shown that, for a feature space of any dimension, LS-SVM and SVM give the same result when the samples are linearly independent; when this condition is not met, their outputs differ. One example used to compare the two is two-spiral classification.
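Because the LS-SVM dual is just a linear system, it can be solved with a few lines of dense linear algebra; the sketch below assumes the block system stated above and uses illustrative data and hyperparameters:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_blobs(n_samples=80, centers=2, random_state=0)
y = (2 * y - 1).astype(float)                # labels in {-1, +1}
gamma_reg, gamma_rbf = 10.0, 0.5             # regularization and kernel hyperparameters

K = rbf_kernel(X, X, gamma=gamma_rbf)
Omega = np.outer(y, y) * K                   # Omega_ij = y_i y_j kappa(X_i, X_j)

n = len(y)
A = np.zeros((n + 1, n + 1))
A[0, 1:], A[1:, 0] = y, y                    # first row/column hold the labels
A[1:, 1:] = Omega + np.eye(n) / gamma_reg    # Omega + I/gamma
rhs = np.concatenate(([0.0], np.ones(n)))

sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

f = (alpha * y) @ K + b                      # f(x) = sum_i alpha_i y_i kappa(X_i, x) + b
print(np.mean(np.sign(f) == y))              # training accuracy of the LS-SVM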
Structured SVM
The structured SVM is a generalization of the standard SVM to structured prediction problems. Given structured data in the sample space and the label space, together with a distance (loss) function defined on the label space, the structured SVM solves a margin-based optimization problem whose constraints require the score of the correct structured label to exceed the score of every other label by at least their label distance, up to a slack term.
Structured SVMs are applied to natural language processing (NLP) problems, for example predicting the parse structure of given corpus data, and are also used in bioinformatics for protein structure prediction.
Multiple kernel SVM
The multiple kernel SVM is the realization of multiple kernel learning in supervised learning: it is an improved algorithm in which the single kernel function of the SVM is replaced by a family of kernel functions. The construction methods of multiple kernel SVMs can be grouped into the following five categories:
Fixed rules: use the closure properties of kernel functions, for example linear additivity, to construct the kernel family without introducing any hyperparameters (a minimal example is sketched after this list). A multiple kernel SVM constructed by fixed rules can be solved directly with standard SVM methods.
Heuristic approach: use a combination function containing parameters to construct the kernel family, with the parameters determined from the kernel matrices or the classification performance of the individual kernel functions participating in the construction.
Optimization approach: use a parameterized combination to construct the kernel family, with the parameters obtained by solving an optimization problem based on the similarity between kernel functions or on minimizing the structural risk.
Bayesian approach: use a parameterized combination to construct the kernel family, treating the parameters as random variables and estimating them by Bayesian inference.
Boosting approach: iteratively add kernel functions to the kernel family until the classification performance of the multiple kernel SVM no longer improves.
Research shows that, in terms of classification accuracy, multiple kernel SVMs are more flexible and generally better than standard SVMs computed with a single kernel, but nonlinear and sample-dependent constructions of the kernel family are not always superior; the construction of the kernel family usually depends on the specific problem.
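A minimal fixed-rule example (illustrative): since a weighted sum of kernels is itself a kernel, the combined Gram matrix can be passed to a standard SVM as a precomputed kernel without any new solver:

from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Fixed rule: equal-weight sum of an RBF kernel and a polynomial kernel.
def combined_kernel(A, B):
    return 0.5 * rbf_kernel(A, B, gamma=1.0) + 0.5 * polynomial_kernel(A, B, degree=2)

K = combined_kernel(X, X)
clf = SVC(kernel='precomputed', C=1.0).fit(K, y)
print(clf.score(K, y))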
Extended algorithms
Support vector regression (SVR)
Extending the SVM from classification problems to regression problems yields support vector regression (SVR); in this context the standard SVM algorithm is also called support vector classification (SVC). The hyperplane that serves as the decision boundary in SVC becomes the regression model in SVR: f(X) = w^T X + b. SVR is also sparse: if a sample point is close enough to the regression model, i.e. falls within the interval (tube) around the regression model, the sample incurs no loss; the corresponding loss function is called the ε-insensitive loss, L(y, f(X)) = max(0, |y − f(X)| − ε), where ε is a hyperparameter determining the width of the tube. The ε-insensitive loss is similar to the hinge loss used by SVC, with its value fixed at 0 in a region around the origin. By analogy with the soft-margin SVM, SVR is a quadratic convex optimization problem of the following form: min_{w,b} ½‖w‖² + C Σ_i L(y_i, f(X_i)).
Introducing slack variables ξ_i and ξ_i* to represent the piecewise value of the ε-insensitive loss, the problem becomes: min ½‖w‖² + C Σ_i (ξ_i + ξ_i*)  subject to  f(X_i) − y_i ≤ ε + ξ_i,  y_i − f(X_i) ≤ ε + ξ_i*,  and ξ_i, ξ_i* ≥ 0.
Similarly to the soft-margin SVM, introducing Lagrange multipliers α_i and α_i* yields the Lagrangian function and the dual problem of SVR.
The dual problem has KKT conditions analogous to those of the soft-margin SVM.
Solving the dual problem gives the form of the SVR solution: f(X) = Σ_i (α_i − α_i*) X_i^T X + b.
SVR can obtain nonlinear regression results through the kernel method. In addition, LS-SVM can solve regression problems in a similar way to SVR.
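A minimal SVR usage sketch with scikit-learn on illustrative one-dimensional data; epsilon is the half-width of the ε-insensitive tube and C the regularization coefficient:

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)    # noisy sine curve

reg = SVR(kernel='rbf', C=10.0, epsilon=0.1, gamma=0.5).fit(X, y)

# Samples inside the epsilon-tube incur no loss, so only part of the data
# becomes support vectors of the regression model.
print("support vectors:", len(reg.support_), "of", len(X))
print(reg.predict([[2.0]]))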
Support vector clustering
Support vector clustering is a non-parametric clustering algorithm; it is an extension of the SVM to clustering problems. Specifically, support vector clustering first uses a kernel function, usually the radial basis function kernel, to map the samples into a high-dimensional space, and then uses the SVDD (support vector domain description) algorithm to obtain a closed hypersurface describing the region in which the sample points are concentrated in that space. Finally, support vector clustering maps the surface back to the original feature space to obtain a set of closed contours, and the samples enclosed by each contour are assigned to one cluster.
Support vector clustering does not require the number of clusters to be specified in advance. Research has shown that it performs stably when clustering low-dimensional learning samples; high-dimensional samples can also be clustered by support vector clustering after preprocessing with a dimensionality-reduction method.
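scikit-learn does not ship a support vector clustering implementation, but its first stage, describing the region occupied by the data with an RBF-kernel boundary, can be illustrated with the closely related one-class SVM; the contour-based cluster-assignment stage of the full algorithm is omitted here:

from sklearn.datasets import make_moons
from sklearn.svm import OneClassSVM

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# An RBF one-class SVM learns a closed boundary around the high-density region,
# analogous to the SVDD step of support vector clustering.
boundary = OneClassSVM(kernel='rbf', gamma=5.0, nu=0.05).fit(X)

inside = boundary.predict(X) == 1    # +1 inside the learned contour, -1 outside
print(inside.mean())                 # fraction of samples enclosed by the boundary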
Semi-supervised SVM (S3VM)
S3VM is the application of SVM to semi-supervised learning; it can be applied to learning samples composed of a small amount of labeled data and a large amount of unlabeled data. When the unlabeled samples are ignored, the SVM solves for the maximum-margin hyperplane; once the unlabeled data is taken into account, S3VM, based on the low-density-separation assumption, solves for a hyperplane that separates the two classes of labeled samples and passes through the low-density regions of the unlabeled data.
The general form of S3VM is to solve the decision boundary from the labeled data according to the standard SVM method and then adjust the decision boundary by exploring the unlabeled data. Building on the soft-margin SVM, the optimization problem of S3VM introduces two additional slack variables for each unlabeled sample; these slack variables represent the empirical risk of S3VM when it assigns the unlabeled data to either of the two classes, and the sums in the objective run separately over the labeled and unlabeled samples.
S3VM has many variants, including the transductive SVM (TSVM), the Laplacian SVM, and mean S3VM.
Properties
Robustness and sparsity: the optimization problem of SVM considers both empirical risk and structural risk minimization, so it is stable. Geometrically, the stability of SVM is reflected in the fact that it requires the largest margin when constructing the hyperplane decision boundary, so there is ample space between the separating boundaries to accommodate test samples. SVM uses the hinge loss as its surrogate loss, and the piecewise nature of the hinge loss makes the SVM sparse: its decision boundary is determined only by the support vectors, and the remaining sample points do not participate in empirical risk minimization. In nonlinear learning with the kernel method, the robustness and sparsity of SVM not only ensure reliable solutions but also reduce the computation and memory overhead of the kernel matrix.
Relationship with other linear classifiers: SVM is a generalized linear classifier, and other types of linear classifiers can be obtained by modifying the loss function and the optimization problem within the SVM framework; for example, replacing the hinge loss of SVM with the logistic loss gives an optimization problem close to logistic regression. SVM and logistic regression are classifiers with similar purposes. The differences are that the output of logistic regression has a probabilistic meaning and extends easily to multiclass problems, whereas the sparsity and stability of SVM give it good generalization ability and a lower computational cost when the kernel method is used.
As a kernel method: SVM is not the only machine learning algorithm that can use the kernel trick. Logistic regression, ridge regression and linear discriminant analysis (LDA) can likewise be kernelized, yielding kernel logistic regression, kernel ridge regression and kernelized linear discriminant analysis (KLDA). SVM is therefore one realization of kernel learning in the broad sense.
Application
SVM has applications in pattern recognition problems in various fields, including portrait (face) recognition, text classification, handwritten character recognition, bioinformatics, and so on.
Programming modules including SVM
Measured by citation counts, LIBSVM, developed by the Institute of Information Engineering at National Taiwan University, is the most widely used SVM tool. LIBSVM includes the standard SVM algorithm, probabilistic outputs, support vector regression, multiclass SVM and other functionality. Its source code is written in C, and it provides calling interfaces for languages such as Java, Python, R and MATLAB, CUDA-based GPU acceleration, and other components such as multi-core parallel computing and model cross-validation.
The Python-based machine learning module scikit-learn provides pre-packaged SVM tools whose design refers to LIBSVM. Other Python modules that include SVM are MDP, MLPy, PyMVPA, and others. TensorFlow's high-level API component Estimators provides a packaged SVM model.