This project is read-only.

Getting Started

Download the source code and open the solution with Visual Studio 2010.

Index

LingDong doesn't include a crawler, so you need to supply your own data. The source code ships with a sample data set named SogouT.mini, provided by Sogou Lab.

In the VS2010 solution, the configuration files are in the Config/LingDong folder. Set the destination folder for the generated index before running:

  • Set ConfigCommon.settings->LingDongDataRoot to a folder on your local disk.

Then set Examples/IndexExample as the startup project and run it.

Query

Set Web/Web as the startup project and Index.aspx as the start page, then run. Visual Studio will start a web server automatically.

The SogouT.mini data contains only about 500 pages, most of them in Chinese, so it is normal for most queries to return no results. Try the Chinese word "北京" or the single letter "a".

Index your own pages

Create a DataReader class

Pages (or documents) to index are provided by DataProvider. Pages in different formats are read by different classes, each of which implements the IDataReader interface:

public interface IDataReader : IDisposable
{
    DataEntity GetNextData();
    bool IsAllDataRead();
    bool IsDataContainsOnlyText();
}

Create your own DataReader class and add it to the DataProvider assembly. LingDong.DataProviderSogouT2DataReader is a simple example.
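
As a rough illustration of what such a class might look like, the sketch below implements IDataReader for a folder of plain-text files. It is not taken from the LingDong source. The DataEntity stub with its Url and Content properties, the class name PlainTextFolderDataReader, the default folder name "Pages", and the convention of returning null once all data has been read are all assumptions made so the sketch compiles on its own. In the real project, use the project's own DataEntity and IDataReader types instead of the stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Stand-in for LingDong's real DataEntity, whose members are not shown on
// this page. The Url and Content properties are assumptions.
public class DataEntity
{
    public string Url { get; set; }
    public string Content { get; set; }
}

// Repeated from the interface listing above so the sketch is self-contained.
public interface IDataReader : IDisposable
{
    DataEntity GetNextData();
    bool IsAllDataRead();
    bool IsDataContainsOnlyText();
}

// Hypothetical reader that treats every *.txt file in a folder as one page.
public class PlainTextFolderDataReader : IDataReader
{
    private readonly Queue<string> pendingFiles;

    // Parameterless constructor, so the class can be created by name
    // through reflection. "Pages" is a hypothetical default folder.
    public PlainTextFolderDataReader() : this("Pages")
    {
    }

    public PlainTextFolderDataReader(string folder)
    {
        // A missing folder is treated as "no pages" rather than an error.
        string[] files = Directory.Exists(folder)
            ? Directory.GetFiles(folder, "*.txt")
            : new string[0];
        pendingFiles = new Queue<string>(files);
    }

    public DataEntity GetNextData()
    {
        if (pendingFiles.Count == 0)
        {
            return null; // assumed convention once everything has been read
        }
        string path = pendingFiles.Dequeue();
        return new DataEntity { Url = path, Content = File.ReadAllText(path) };
    }

    public bool IsAllDataRead()
    {
        return pendingFiles.Count == 0;
    }

    public bool IsDataContainsOnlyText()
    {
        return true; // the files are plain text, not HTML
    }

    public void Dispose()
    {
        // Nothing unmanaged to release; just drop the pending file list.
        pendingFiles.Clear();
    }
}
```

The parameterless constructor matters because DataReaderFactory creates the reader from a class name alone, which leaves no place to pass constructor arguments.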

Change configuration file

The program uses the Abstract Factory design pattern to create the DataReader instance. DataReaderFactory reads the configuration file to determine which DataReader to use.

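// Class name of the DataReader to create, read from the configuration file.
string className = ConfigProvider.ConfigDataProvider.Default.DataReaderInstanceClassName;
// Name of the executing assembly (not used by the call below).
string assemblyName = Assembly.GetExecutingAssembly().GetName().Name;
// Create an instance of that class from the executing assembly.
return (IDataReader)Assembly.GetExecutingAssembly().CreateInstance(className);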

Set ConfigDataProvider.settings->DataReaderInstanceClassName to the full name (including the namespace) of your own DataReader class.
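
The factory code above relies on Assembly.CreateInstance, which looks the type up by its full name, uses the parameterless constructor, and returns null when the type is not found. The sketch below shows that lookup in isolation with an explicit check for the null case. It is only an illustration, not LingDong's actual factory: the FactorySketch namespace, the MyDataReader class, and the DataReaderFactorySketch helper are hypothetical names, the interface is cut down to IDisposable, and the class name is passed in as a parameter instead of being read from the settings file.

```csharp
using System;
using System.Reflection;

namespace FactorySketch
{
    // Cut-down stand-in for the real interface; only creation matters here.
    public interface IDataReader : IDisposable
    {
    }

    // Hypothetical reader living in the same assembly as the factory.
    public class MyDataReader : IDataReader
    {
        public void Dispose()
        {
        }
    }

    public static class DataReaderFactorySketch
    {
        // className must be the full type name, for example
        // "FactorySketch.MyDataReader", and the type needs a public
        // parameterless constructor.
        public static IDataReader Create(string className)
        {
            object instance =
                Assembly.GetExecutingAssembly().CreateInstance(className);
            if (instance == null)
            {
                // CreateInstance returns null when the type is not found.
                throw new InvalidOperationException(
                    "DataReader class not found: " + className);
            }
            return (IDataReader)instance;
        }
    }
}
```

With these names, the configured value would be "FactorySketch.MyDataReader". A bare "MyDataReader" would not be found, because the lookup uses the namespace-qualified name.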

Index and Run

The remaining steps are the same as those in Getting Started.

Architecture

[Figure: Architecture]

Directory

Build: Visual Studio build target.

Config: Configuration files, copied to Build/ automatically when the project is built.

Dependencies: Third-party DLLs.

Docs: Documentation.

Examples: Example source code.

Index: Index module source code.

Log: Default log output directory.

Query: Query module source code.

SupportPrograms: Source code for auxiliary functions.

Tests: Unit test source code.

Web: User interface.

Reference

LingDong builds on several existing open-source projects:

HtmlAgilityPack: An agile HTML parser.

PanGu: A library that segments Chinese and English words from sentences.

ProtoBuf: A serialization solution, better than the one built into .NET.

NLog: A logging platform for .NET with rich log routing and management capabilities.

Last edited Jan 19, 2011 at 7:42 AM by air8712, version 8
