Origins
Information retrieval originated in the reference and abstract-indexing work of libraries. It began to develop in the second half of the 19th century, and by the 1940s indexing and retrieval had become independent tools and user services of libraries. With the advent of the world's first general-purpose electronic computer in 1946, computer technology gradually entered the field and became closely integrated with information retrieval theory; offline batch retrieval systems and then online real-time retrieval systems were successively developed and commercialized. From the 1960s to the 1980s, driven by advances in information processing, communications, computers and databases, information retrieval developed rapidly and came into wide use in education, the military, commerce and other fields. The Dialog international online information retrieval system is representative of this period and remains one of the best-known systems in the world.
Definition
Information retrieval can be understood in a broad and a narrow sense. In the broad sense it is called "information storage and retrieval": the process of organizing and storing information in a certain way and then finding the relevant information according to users' needs. In the narrow sense it is only the second half of "information storage and retrieval", usually called "information search" or "information seeking": the process of finding, within an information collection, the relevant information that users need. Information retrieval in the narrow sense covers three things: understanding users' information needs, the techniques or methods of retrieval, and satisfying those needs.
According to the principle of information retrieval, information storage is the basis of retrieval. The information to be stored includes not only original document data but also images, video and audio. The original information must first be converted into a machine-readable form and stored in a database; otherwise the machine cannot process it. After the user enters a query that expresses an information need, the retrieval system searches the database for information related to the query, computes the similarity between each item and the query through some matching mechanism, and outputs the results in descending order of similarity.
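The store, match and rank cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch: the source does not name a matching mechanism, so cosine similarity over raw term counts is assumed here, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Return (doc_id, score) pairs with score > 0, highest score first."""
    # Storage step: convert every document into a machine-comparable form.
    index = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    # Retrieval step: score each stored record against the query.
    query_vec = Counter(tokenize(query))
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Output step: descending order of similarity.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample collection, for illustration only.
documents = {
    "d1": "library catalog and abstract indexing",
    "d2": "online retrieval system for library users",
    "d3": "stress corrosion of titanium",
}
results = search("library retrieval system", documents)
```

Here "d2" shares three terms with the query and ranks first, "d1" shares one term and ranks second, and "d3" shares none and is dropped. A production system would normally build the index once and weight terms (for example by rarity) rather than use raw counts.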
Types
(1) By the object stored and retrieved, information retrieval can be divided into:
Document retrieval
Data retrieval
Fact retrieval
The main difference among the three is that data retrieval and fact retrieval return the information itself that is contained in the literature, whereas document retrieval returns the documents that contain the required information.
(2) By storage medium and the technical means used to carry out the search:
Manual retrieval
Mechanical retrieval
Computer retrieval
A relatively fast-growing form of computer retrieval is "network information retrieval", that is, searching for information on the network: Internet users at a network terminal find and obtain information through specific web search tools or by browsing.
(3) By search method:
Direct retrieval
Indirect retrieval
Main stages
Analysis and encoding of information content: generating information records and retrieval identifiers.
Organization and storage: arranging all records into an ordered information collection in the form of files, databases and so on.
Processing of user queries and output of results. The key step is matching the user's question against the information collection, that is, comparing the similarity between the given question and the records in the collection and selecting relevant information according to certain matching criteria.
By object, retrieval is divided into document retrieval, data retrieval and fact retrieval; by equipment, it is divided into manual retrieval, mechanical retrieval and computer retrieval. A service facility made up of a certain set of equipment and information is called an information retrieval system, such as a punch card system, an online retrieval system, a CD-ROM retrieval system or a multimedia retrieval system. Information retrieval was first used in libraries and scientific and technical information institutions, and then gradually spread to other fields and was combined with various management information systems. The theories, technologies and services related to information retrieval form a relatively independent field of knowledge, which is an important branch of information science and overlaps with computer application technology.
Hotspot
Intelligent search or knowledge search
Traditional full-text search relies on keyword matching and often suffers from incomplete results, inaccurate results and low overall search quality; in the Internet age especially, keyword matching alone struggles to meet people's search needs. Intelligent search uses word-segmentation dictionaries, synonym dictionaries and homophone dictionaries to improve results. For example, a search for "computer" can also retrieve information that uses a synonymous term for the same concept. It can go further and assist the search at the knowledge or concept level: subject dictionaries, broader-term and narrower-term dictionaries, and related-term and equivalent-term dictionaries form a knowledge system or concept network that gives users intelligent prompts and ultimately helps them obtain the best results. For example, a user can narrow the search to "computers" or "servers", broaden the query to "information technology", or query related categories such as "electronic technology", "software" and "computer applications". Intelligent retrieval also covers the handling of ambiguous information and queries, such as whether "Apple" refers to a fruit or a computer brand, or how to distinguish "Chinese" from "People's Republic of China". Such cases are resolved by combining an ambiguity knowledge base, the full-text index, analysis of the user's search context, user relevance feedback and other techniques, so that the information the user needs is delivered efficiently and accurately.
Knowledge mining
Knowledge mining mainly refers to the development of text mining technology. Its purpose is to help people better discover, organize and represent information, extract knowledge, and meet the higher-level needs of information retrieval. Knowledge mining includes summarization, classification (clustering) and similarity retrieval.
Automatic summarization uses a computer to extract summaries from original documents automatically. In information retrieval, automatic summaries help users quickly judge the relevance of search results. In information services, they support various forms of content distribution, such as delivery to PDAs and mobile phones. Similarity retrieval finds similar or related documents based on the content features of a document; it is the basis for personalized user feedback and can also be used for deduplication. Automatic classification can be statistical or rule-based: a model for a predefined classification tree is formed through machine learning, and documents are then assigned to categories according to their content features. Automatic clustering groups and merges documents according to how closely their content is related. Automatic classification and clustering are very useful for organizing and navigating information.
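As an illustration of similarity retrieval used for deduplication, the sketch below flags near-duplicate documents. The source does not specify a similarity measure; Jaccard similarity over two-word shingles is assumed here as one common choice, and the function names, the 0.5 threshold and the sample texts are hypothetical.

```python
import re


def shingles(text, size=2):
    """Return the set of consecutive word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Too short to form a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(documents, threshold=0.5):
    """Return pairs of document ids whose shingle similarity is >= threshold."""
    ids = sorted(documents)
    sets = {doc_id: shingles(documents[doc_id]) for doc_id in ids}
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical sample documents, for illustration only.
documents = {
    "a": "automatic summaries help users evaluate search results",
    "b": "automatic summaries help users evaluate the search results",
    "c": "clustering groups documents by content",
}
duplicates = find_near_duplicates(documents)
```

Documents "a" and "b" share five of eight distinct shingles (similarity 0.625) and are reported as a pair, while "c" matches neither. Comparing every pair is quadratic in the number of documents, so this form suits only small collections.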
Heterogeneous information integrated retrieval and holographic retrieval
As information retrieval becomes distributed and networked, the demands on the openness and integration of retrieval systems keep rising: a system must be able to retrieve and integrate information from different sources and with different structures. This is the basis for the development of heterogeneous information retrieval technology, which includes processing and retrieving files in many formats, such as TEXT, HTML, XML, RTF, MS Office, PDF, PS2/PS, MARC and ISO 2709; retrieving multilingual information; handling structured, semi-structured and unstructured data in a unified way; and integrating seamlessly with relational database retrieval and other open retrieval interfaces. The concept of "holographic retrieval" is to support retrieval in all formats and by all methods. In practice, development has reached the level of integrated retrieval of heterogeneous information; human-computer interaction based on natural language understanding and the integration of multimedia information retrieval still await further breakthroughs.
In addition, from the perspective of engineering practice, multi-level caching that combines main memory with external storage, distributed clustering and load balancing are also important aspects of the development of information retrieval technology.
With the spread of the Internet and the growth of e-commerce, the amount of information that companies and individuals can obtain and need to process has exploded, and most of it is unstructured or semi-structured data. Content management has become increasingly important, and information retrieval, as its core supporting technology, will be applied in more and more fields as content management develops and spreads, becoming a close companion in people's daily work and life.
Reasons for retrieval
1. Information retrieval is a shortcut to knowledge
John Phillips, a young undergraduate in the Department of Physics at Princeton University, borrowed publicly available materials from the library and in only four months drew up a design for an atomic bomb. The bomb he designed was small (the size of a baseball), light (7.5 kg), powerful (equivalent to three quarters of the yield of the Hiroshima bomb) and cheap (only two thousand US dollars at the time), which led some countries (France, Pakistan and others) to write to the US embassy, eager to buy a copy of his design.
In the 1970s, the American nuclear expert Taylor received a report entitled "Methods of Making Nuclear Bombs". Impressed by its superb technical design, he said in awe: "Among the reports I have seen, it is the most detailed and comprehensive." What surprised him even more was that the report had been written by a young economics student at Harvard University, and that all of its sources were ordinary, completely open books and materials from the library.
2. Information retrieval is a guide for scientific research
During a pressure test carried out under the Apollo Moon landing program, the United States found that methanol causes stress corrosion in titanium. Millions of dollars were spent studying and solving the problem. It later turned out that someone had already solved it more than ten years earlier, and the remedy was very simple: add 2% water to the methanol. Finding the document that described it would have taken little more than 10 minutes. Duplicated effort exists to varying degrees in research and development in every country. According to statistics, the losses caused by duplicated research in the United States each year amount to about 38% of annual research funding, a total of some US$2 billion. In Japan, the share of research topics in chemistry and chemical engineering that duplicate work done abroad is 40% at universities, 47% in the private sector and 40% at national research institutions, an average repetition rate of more than 40%; China's repetition rate is even higher.
3. Information retrieval is the foundation of lifelong education
The goal of schooling is to develop students' abilities: the ability to study independently, to do research, to think, to express themselves, and to organize and manage.
UNESCO has proposed that education extends across a person's entire life. It holds that only comprehensive lifelong education can cultivate well-rounded people, keep knowledge from becoming outdated, renew knowledge continually, and meet the needs of a developing information society.
Four elements
1. The premise of information retrieval ---- information awareness
Information awareness is the internal motivation that leads people to use information systems to obtain the information they need. It shows in sensitivity to information, the ability to select it, and the ability to digest and absorb it, and it involves the series of thinking processes by which a person judges whether a piece of information can be used by oneself or by a group and whether it can solve a specific problem in real life. Information awareness has three levels: information cognition, information emotion and information behavior tendency.
The term "information literacy" was first proposed by Paul Zurkowski, president of the American Information Industry Association, in a report to the US government in 1974. He held that information literacy is people's ability to use information, learn information technology, and apply information to solve problems in their work.
2. The basis of information retrieval ---- information sources
Definition of an information source: in "Documentary Terminology", published by UNESCO, an information source is defined as the source from which individuals obtain information to meet their information needs.
Types of information source:
By mode of expression: oral, body-language, physical-object and documentary information sources.
By form of digital record: bibliographic, general book, reference book, newspaper and periodical, special literature, digital library, and search engine information sources.
By document carrier: print, microform, machine-readable and audio-visual.
By content and level of processing: primary, secondary and tertiary information.
By form of publication: books, newspapers, research reports, conference information, patent information, statistical data, government publications, archives, dissertations and standards information. (These are regarded as the ten major information sources, the last eight of which are called special documents. Educational information resources are mainly distributed across publication types such as education books, professional journals and dissertations.)
3. The core of information retrieval ---- information acquisition ability
1. Know the various sources of information
2. Master the search language
3. Be proficient in using search tools
4. Be able to judge and evaluate the search effect
Two indicators for judging the search effect:
Recall rate = amount of relevant information retrieved / total amount of relevant information in the collection × 100%
Precision rate = amount of relevant information retrieved / total amount of information retrieved × 100%
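The two indicators can be computed directly from the set of retrieved items and the set of relevant items. The following is a minimal sketch; the function name and the document identifiers are made up for illustration, and the result is returned as fractions between 0 and 1 rather than percentages.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) as fractions between 0 and 1."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    # Relevant items that the search actually returned.
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 4 documents returned, 5 relevant documents exist,
# and 2 of the returned documents ("d1", "d3") are relevant.
recall, precision = recall_and_precision(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
```

In this example recall is 2/5 = 40% and precision is 2/4 = 50%. The two usually pull against each other: returning more documents tends to raise recall and lower precision.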
4. The key to information retrieval ---- information utilization
Social progress is a process of continuous production, circulation and reproduction of knowledge.
To make full and effective use of existing knowledge and information, the share of time devoted to information retrieval in study and scientific research is steadily increasing.
The ultimate goal of obtaining academic information is to organize, analyze and summarize what has been found and, guided by the thinking and ideas formed in the course of study and research, to recombine it into new knowledge and information, thereby activating the information and adding to its value.
Retrieval methods
Information retrieval methods include the common method, the retrospective method and the segmented method.
1. The common method searches for documents using retrieval tools such as bibliographies, abstracts and indexes. The key to this method is to be familiar with the nature, characteristics and search procedure of the various retrieval tools and to search from different angles. The common method can be further divided into the chronological method and the reverse-chronological method. The chronological method searches forward in time, from the past to the present, which is costly and inefficient; the reverse-chronological method searches backward from the most recent material to older material. It emphasizes recent material and current information, is more proactive and gives good results.
2. The retrospective method traces documents step by step through the references attached to literature already in hand. It is useful when no retrieval tool is available or the tools are incomplete, and it yields highly targeted information: precision is relatively high, but recall is poor.
3. The segmented method combines the retrospective method and the common method, using the two in stages and in alternation until the required information is found.
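The retrospective method amounts to walking a graph of reference lists outward from a known document. The sketch below is one hedged way to express that idea, assuming the reference lists are available as a dictionary; the function name, the depth limit and the paper identifiers are hypothetical and not part of the original description.

```python
from collections import deque


def trace_references(start, references, max_depth=2):
    """Follow reference lists breadth-first from `start`.

    Returns the documents discovered, in discovery order, excluding `start`.
    `references` maps a document id to the ids it cites.
    """
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not follow references beyond the depth limit.
        for cited in references.get(doc, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append((cited, depth + 1))
    return found


# Hypothetical reference lists, for illustration only.
references = {
    "paper_2020": ["paper_2015", "paper_2012"],
    "paper_2015": ["paper_2012", "paper_2001"],
    "paper_2012": ["paper_1998"],
}
found = trace_references("paper_2020", references, max_depth=2)
```

Starting from "paper_2020", the walk finds the two papers it cites and then the papers those cite. It can only reach documents that something already in hand has cited, which is consistent with the poor recall noted for this method.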
General retrieval procedures
(1) Analyze the problem
(2) Select the retrieval tool
Indicative retrieval tools that provide clues (secondary literature): bibliographies, collection catalogs, indexes, abstracts, guides to reference books;
Reference tools that provide specific information (tertiary literature): dictionaries, quotation reference books, encyclopedias, classified compendia, political compendia, biographical materials, manuals, directories of institutions, geographical materials, statistical materials, yearbooks, catalogues, government documents.
(3) Use the retrieval tool
(4) Obtain the original text
(5) Analyze the search results
(6) Adjust the search strategy