Information retrieval

Origins

Information retrieval originated from the reference and abstract indexing work of libraries. It first began to develop in the second half of the 19th century. By the 1940s, indexing and retrieval had become independent tools and user service items of libraries. With the advent of the world's first electronic computer in 1946, computer technology gradually entered the field of information retrieval and became closely integrated with information retrieval theory; offline batch information retrieval systems and online real-time information retrieval systems were successively developed and commercialized. From the 1960s to the 1980s, driven by information processing technology, communication technology, and computer and database technology, information retrieval developed rapidly and came into wide use in fields such as education, the military and commerce. The Dialog international online information retrieval system is representative of the field in this period, and it remains one of the best-known systems in the world.

Definition

Information retrieval can be understood in a broad sense and a narrow sense. Information retrieval in the broad sense is called "information storage and retrieval", which refers to the process of organizing and storing information in a certain way and finding relevant information according to the needs of users. Information retrieval in the narrow sense is the second half of "information storage and retrieval", usually called "information search", which refers to the process of finding the relevant information that users need in an information collection. Information retrieval in the narrow sense involves three things: understanding users' information needs, applying information retrieval techniques or methods, and satisfying the needs of information users.

According to the principle of information retrieval, the storage of information is the basis for retrieval. The information to be stored includes not only the original document data but also pictures, video and audio. First, the original information must be converted into a machine-readable form and stored in a database; otherwise the machine cannot recognize it. After the user enters a query that expresses their intention, the retrieval system searches the database for information related to the query, calculates the similarity between the query and each item through a matching mechanism, and outputs the results in descending order of similarity.
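The store, match and rank steps described above can be illustrated with a small sketch. It assumes a plain term-frequency representation and cosine similarity as the matching mechanism; the article does not name a specific mechanism, so this is only one possible choice. The function names and the sample collection are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Storage step: convert each document into a term-frequency vector."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def search(index, query):
    """Retrieval step: score every stored document, keep the ones that
    share at least one term with the query, and sort from high to low."""
    q = Counter(tokenize(query))
    scored = [(doc_id, cosine_similarity(q, vec)) for doc_id, vec in index.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample collection, for illustration only.
DOCUMENTS = {
    "d1": "library indexing and abstract services",
    "d2": "online information retrieval system",
    "d3": "retrieval of information from a library database",
}
INDEX = build_index(DOCUMENTS)
RESULTS = search(INDEX, "information retrieval")
for doc_id, score in RESULTS:
    print(doc_id, round(score, 3))
```

In this sample, d2 ranks above d3 because the query terms make up a larger share of the shorter document, and d1 is not returned because it shares no term with the query.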

Type

(1) According to the objects stored and retrieved, information retrieval can be divided into:

Document retrieval

Data retrieval

Fact retrieval

The main difference between these three types is that data retrieval and fact retrieval retrieve the information itself that is contained in the literature, whereas document retrieval retrieves the documents that contain the required information.

(2) According to the storage carrier and the technical means used to carry out the search:

Manual search

Mechanical search

Computer retrieval

The relatively fast-growing form of computer retrieval is "network information retrieval", that is, network information search, which refers to the act of Internet users at a network terminal finding and obtaining information through specific web search tools or through browsing.

(3) According to the search method:

Direct search

Indirect retrieval

Main links

Analysis and coding of the information content, producing information records and retrieval identifiers.

Organized storage: arranging all records into an orderly collection of information in the form of files, databases, etc.

User question processing and retrieval output. The key part is matching the user's information question against the information collection and selecting from it, that is, comparing the similarity between the given question and the records in the collection and selecting relevant information according to certain matching criteria.

Information retrieval is divided into document retrieval, data retrieval and fact retrieval by object, and into manual retrieval, mechanical retrieval and computer retrieval by equipment. A service facility composed of a certain set of equipment and information is called an information retrieval system, such as a punch card system, an online retrieval system, a CD-ROM retrieval system, or a multimedia retrieval system. Information retrieval was first used in libraries and in scientific and technological information institutions, and then gradually expanded to other fields and was combined with various management information systems. The theories, technologies and services related to information retrieval constitute a relatively independent field of knowledge, which is an important branch of information science and intersects with computer application technology.

Hot topics

Intelligent search or knowledge search

Traditional full-text search technology is based on keyword matching, which often leads to incomplete results, inaccurate results and low search quality. In the Internet age in particular, keyword matching alone can hardly meet people's search requirements. Intelligent search uses word segmentation dictionaries, synonym dictionaries and homophone dictionaries to improve search results. For example, when a user searches for "computer", documents that use a synonymous term for "computer" can also be retrieved. Furthermore, it can assist the search at the knowledge or conceptual level: subject dictionaries, broader-term and narrower-term dictionaries, and related-term and equivalent-term dictionaries form a knowledge system or conceptual network that gives users intelligent knowledge prompts and ultimately helps them obtain the best search results. For example, users can narrow the search scope to "computers" and "servers", expand the query to "information technology", or query related categories such as "electronic technology", "software" and "computer applications". In addition, intelligent retrieval includes the handling of ambiguous information and ambiguous queries, such as whether "Apple" refers to a fruit or a computer brand, or the distinction between "Chinese" and "People's Republic of China". It combines an ambiguity knowledge base, full-text indexing, analysis of the user's search context, user relevance feedback and other technologies to deliver the information users need efficiently and accurately.

Knowledge mining

Knowledge mining mainly refers to the development of text mining technology. Its purpose is to help people better discover, organize and represent information, extract knowledge, and meet the higher-level needs of information retrieval. Knowledge mining includes summarization, classification (clustering) and similarity retrieval.

Automatic summarization is the use of a computer to extract abstracts automatically from original documents. In information retrieval, automatic summaries help users quickly evaluate the relevance of search results. In information services, automatic summaries support various forms of content distribution, such as delivery to PDAs and mobile phones. Similarity retrieval technology retrieves similar or related documents based on the characteristics of document content; it is the basis for personalized user feedback and can also be used for deduplication analysis. Automatic classification can be based on statistics or rules: a predefined classification tree is formed through machine learning, and documents are then categorized according to their content characteristics. Automatic clustering groups and merges documents according to how closely their content is related. Automatic classification (clustering) is very useful in information organization and navigation.
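As an illustration of similarity retrieval used for deduplication, the following minimal sketch compares documents by the overlap of their two-word shingles (Jaccard similarity). The choice of shingles, the Jaccard measure, the 0.5 threshold and the sample texts are all assumptions made for this example; the article does not prescribe a particular similarity measure.

```python
import re


def shingles(text, k=2):
    """Return the set of k-word shingles (overlapping word sequences)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(documents, threshold=0.5):
    """Compare every pair of documents and report pairs whose shingle
    overlap reaches the threshold (an assumed, tunable value)."""
    ids = sorted(documents)
    sets = {doc_id: shingles(documents[doc_id]) for doc_id in ids}
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = jaccard(sets[left], sets[right])
            if score >= threshold:
                pairs.append((left, right, score))
    return pairs


# Hypothetical sample texts, for illustration only.
DOCS = {
    "a": "automatic summaries help users evaluate search results",
    "b": "automatic summaries help users quickly evaluate search results",
    "c": "clustering groups documents by content relevance",
}
DUPLICATES = find_near_duplicates(DOCS)
print(DUPLICATES)
```

Comparing every pair is quadratic in the number of documents, so a sketch like this only suits small collections; larger collections would need an index or a hashing scheme.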

Heterogeneous information integrated retrieval and holographic retrieval

Under the trend toward distributed and networked information retrieval, the requirements for openness and integration of information retrieval systems are growing ever higher, and systems must be able to retrieve and integrate information from different sources and with different structures. This is the basis for the development of heterogeneous information retrieval technology, which includes: processing and retrieval of files in various formats, such as TEXT, HTML, XML, RTF, MS Office, PDF, PS2/PS, MARC and ISO 2709; retrieval of multilingual information; unified processing of structured, semi-structured and unstructured data; seamless integration with relational database retrieval; and integration with other open retrieval interfaces. The so-called "holographic retrieval" concept is to support retrieval in all formats and by all methods. In practice, it has developed to the level of integrated retrieval of heterogeneous information; human-computer interaction based on natural language understanding and the integration of multimedia information retrieval still await further breakthroughs.

In addition, from the perspective of engineering practice, multi-level caching that combines memory and external storage, distributed clustering, and load balancing are also important aspects of the development of information retrieval technology.

With the popularity of the Internet and the development of e-commerce, the amount of information that companies and individuals can obtain and need to process has exploded, and most of it is unstructured or semi-structured data. The importance of content management has become increasingly prominent, and information retrieval, as the core supporting technology of content management, will be applied in various fields as content management develops and spreads, becoming a close companion in people's daily work and life.

Reasons for retrieval

1. Information retrieval is a shortcut to obtaining knowledge

A young college student named John Phillips, from the Department of Physics at Princeton University, borrowed relevant public materials from the library and, in only four months, drew up a blueprint for the construction of an atomic bomb. The atomic bomb he designed was small (the size of a baseball), light (7.5 kg), powerful (equivalent to 3/4 of the power of the Hiroshima atomic bomb), and low in cost (only two thousand US dollars at the time), which led some countries (France, Pakistan, etc.) to write to the US Embassy, rushing to buy a copy of his design.

In the 1970s, the American nuclear expert Taylor received a report entitled "Methods of Making Nuclear Bombs". He was struck by the report's superb technical design and said in awe: "Of all the reports I have seen, it is the most detailed and comprehensive." What surprised him even more was that this report had been written by a young student majoring in economics at Harvard University. All the information sources for the technical report were very common, completely open books and materials from the library.

2. Information retrieval is the guide of scientific research

During the "Apollo Moon Landing Program", the United States found in a pressure test that methanol causes stress corrosion in titanium. Millions of dollars were spent to study and solve this problem. It was later discovered that someone had already worked out the answer more than ten years earlier. The method is very simple: just add 2% water to the methanol. Finding this document takes only a little more than 10 minutes. In the field of scientific research and development, duplicated work exists to varying degrees in all countries of the world. According to statistics, the losses caused by duplicated research in the United States each year account for about 38% of annual research funding, reaching the huge sum of US$2 billion. Of Japan's research topics related to chemistry and chemical engineering, those that duplicate work done abroad account for 40% at universities, 47% in the private sector, and 40% at national research institutions, with an average repetition rate of more than 40%; China's repetition rate is even higher.

3. Information retrieval is the foundation of lifelong education

The goal of schools in training students is to develop their intellectual abilities, including the ability to study independently, the ability to do research, the ability to think, the ability to express themselves, and the ability to organize and manage.

UNESCO proposes that education be extended across a person's entire life. It holds that only comprehensive lifelong education can cultivate well-rounded people, prevent knowledge from becoming outdated, keep knowledge constantly updated, and adapt people to the development needs of the contemporary information society.

Four elements

1. The premise of information retrieval: information awareness

Information awareness is people's internal motivation to use information systems to obtain the information they need. It manifests specifically as sensitivity to information, the ability to select it, and the ability to digest and absorb it, so as to judge whether the information can be used by oneself or by a certain group and whether it can solve a specific problem in real-life practice, together with the series of thinking processes involved. Information awareness comprises three levels: information cognition, information emotion and information behavior tendency.

The term "information literacy" was first proposed by Paul Zurkowski, chairman of the American Information Industry Association, in a report to the US government in 1974. He held that information literacy is people's ability to use information, to learn information technology, and to use information to solve problems in their work.

2. The basis of information retrieval: information source

Definition of information source: in the "Documentary Terminology" published by UNESCO, an information source is defined as the source from which individuals obtain information to meet their information needs.

Types of information source:

By mode of expression: oral information sources, body language information sources, physical information sources and document information sources.

By form of digital record: bibliographic information sources, general book information sources, reference book information sources, newspaper and periodical information sources, special literature information sources, digital library information sources and search engine information sources.

By document carrier: printed, microform, machine-readable and audio-visual.

By content and level of processing of the document: primary information, secondary information and tertiary information.

By form of publication: books, newspapers and periodicals, research reports, conference information, patent information, statistical data, government publications, archives, dissertations and standards information. These are regarded as the ten major information sources, of which the last eight are called special documents. Educational information resources are mainly distributed across different types of publications such as educational books, professional journals and dissertations.

3. The core of information retrieval: information acquisition ability

1. Learn about the various sources of information

2. Master the search language

3. Become proficient in using search tools

4. Be able to judge and evaluate the search effect

Two indicators for judging the search effect:

Recall rate = amount of relevant information retrieved / total amount of relevant information in the collection × 100%

Precision (accuracy rate) = amount of relevant information retrieved / total amount of information retrieved × 100%
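The two indicators can be computed directly from the set of retrieved items and the set of items known to be relevant. The sketch below is a minimal illustration: the document identifiers are made up, and the functions return fractions between 0 and 1 rather than percentages.

```python
def recall(retrieved, relevant):
    """Relevant items retrieved divided by all relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant items retrieved divided by all items retrieved
    (the accuracy rate above)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical search result and relevance judgments.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5", "d6", "d7"]

print(recall(RETRIEVED, RELEVANT))     # 2 of the 5 relevant items were found
print(precision(RETRIEVED, RELEVANT))  # 2 of the 4 retrieved items are relevant
```

The two measures usually pull against each other: returning more items tends to raise recall and lower precision.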

4. The key to information retrieval: information utilization

The process of social progress is a process of continuous production, circulation and reproduction of knowledge.

In order to make full and effective use of existing knowledge and information, the proportion of time that students spend on information retrieval in the course of learning and scientific research is gradually increasing.

The ultimate goal of obtaining academic information is to organize, analyze, synthesize and summarize the information obtained and, following the thinking and ideas that arise in the course of learning and research, to reorganize all kinds of information so as to create new knowledge and information, thereby activating the information and adding value to it.

Retrieval methods

Information retrieval methods include the common method, the retrospective method and the segmented method.

1. The common method searches for documents using search tools such as bibliographies, abstracts and indexes. The key to using this method is to be familiar with the nature, characteristics and search process of the various search tools and to search from different angles. The common method can be further divided into forward search and reverse search. Forward search proceeds in chronological order from the past to the present, which is costly and inefficient. Reverse search proceeds in reverse chronological order from the most recent material back to the older material; it emphasizes recent data and current information, is more proactive, and gives good results.

2. The retrospective method tracks documents continuously by following the references attached to existing literature. When there is no retrieval tool, or the retrieval tools are incomplete, this method can obtain highly targeted information. Its precision is relatively high, but its recall is poor.

3. The segmented method is a combination of the retrospective method and the common method. It uses the two methods in stages and in alternation until the required information is found.
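The retrospective method can be pictured as a breadth-first walk over reference lists. The sketch below is only an illustration under assumed data: the `references` mapping, the document identifiers and the `max_depth` limit are hypothetical and stand in for reading the reference section of each document by hand.

```python
from collections import deque


def retrospective_search(start, references, max_depth=2):
    """Follow reference lists outward from a starting document.

    `references` maps a document id to the ids it cites (hypothetical data).
    Returns the documents discovered, in the order found, excluding `start`.
    """
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not read the reference list of documents this deep
        for cited in references.get(doc, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append((cited, depth + 1))
    return found


# Hypothetical citation data, for illustration only.
REFERENCES = {
    "paper2020": ["paper2015", "paper2012"],
    "paper2015": ["paper2012", "paper2008"],
    "paper2012": ["paper2001"],
    "paper2008": ["paper1999"],
}
FOUND = retrospective_search("paper2020", REFERENCES, max_depth=2)
print(FOUND)
```

Because a reference list can only point to earlier work that the authors chose to cite, a walk like this never reaches newer or uncited documents, which is one way to see why the method's recall is poor.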

General retrieval procedures

(1) Analyze the problem

(2) Select the retrieval tool

Indicative retrieval tools that provide clues (secondary literature): bibliographies, collection catalogs, indexes, abstracts, guides to reference books;

Reference tools that provide specific information (tertiary literature): dictionaries, quotation reference books, encyclopedias, class books, political books, biographical materials, manuals, directories of institutions, geographical materials, statistical materials, yearbooks, catalogues, government documents.

(3) Use the search tools

(4) Obtain the original text

(5) Analyze the search results

(6) Change the search strategy
