Sitecore 7 Arabic Search, Spell Check and Suggestions
Recently I was asked to create a proof of concept for Sitecore Arabic search with spell checking and suggestions. While investigating, I got good results, and to live up to our TANASUK value "Sharer's", I am sharing them here.
Let's start with the good news: Sitecore 7 supports Arabic search, and it is as simple as searching for English words. As the next code snippet shows, you just query the Lucene index for your Arabic phrase:
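using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

// get the index that will be used for the search
var index = ContentSearchManager.GetIndex(new SitecoreIndexableItem(Sitecore.Context.Item));
// use the search context of the above index
using (var context = index.CreateSearchContext())
{
    // pass the Arabic text (ArabicText holds the phrase to search for) and get the results
    IQueryable<SearchResultItem> query = context.GetQueryable<SearchResultItem>()
        .Where(p => p.Content.Contains(ArabicText));
    // the query runs lazily, so enumerate it before the search context is disposed
    List<SearchResultItem> result = query.ToList();
}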
https://gist.github.com/9469ef732a6fab46be36
Now let's move on to spell checking and suggestions using the Lucene.Net library, assuming we want the suggestions to come from the content of our Sitecore site items.
The first step is to create a new index of the items that will serve as the source for these operations; in the example below we index only the [Title] field of our items:
/// <summary>
/// Creates a Lucene index containing the Title field of every item under the Home item
/// </summary>
/// <param name="indexPath">Represents the directory path for the newly created index</param>
private static void CreateIndex(string indexPath)
{
    // open the index directory
    Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(indexPath);
    // create the index writer with a standard analyzer (true = overwrite any existing index)
    var indexWriter = new Lucene.Net.Index.IndexWriter(directory,
        new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT),
        true, new Lucene.Net.Index.IndexWriter.MaxFieldLength(Int32.MaxValue));
    try
    {
        // get the Sitecore master database
        Sitecore.Data.Database masterDb = Sitecore.Configuration.Factory.GetDatabase("master");
        // get the Home item; its descendants are the items to index
        Sitecore.Data.Items.Item homeItem = masterDb.GetItem("/sitecore/content/Home");
        if (homeItem == null)
        {
            return;
        }
        // loop over the items and index the Title field of each one
        foreach (Sitecore.Data.Items.Item _item in homeItem.Axes.GetDescendants())
        {
            if (_item.Fields["Title"] != null)
            {
                // create a Lucene document for this item
                var _document = new Lucene.Net.Documents.Document();
                _document.Add(new Lucene.Net.Documents.Field("Title", _item.Fields["Title"].Value,
                    Lucene.Net.Documents.Field.Store.YES,
                    Lucene.Net.Documents.Field.Index.ANALYZED,
                    Lucene.Net.Documents.Field.TermVector.NO));
                indexWriter.AddDocument(_document);
            }
        }
        // optimize once, after all documents are added, to speed up later searches
        indexWriter.Optimize();
    }
    finally
    {
        // close the index writer; exceptions propagate to the caller, which logs them
        indexWriter.Close();
    }
}
https://gist.github.com//bfa93cffbf8ba14877aa
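For readers new to Lucene, the core of what `IndexWriter` builds from an ANALYZED field is an inverted index: a map from each term to the documents that contain it. The sketch below is a deliberately simplified, in-memory illustration of that idea, not Lucene's implementation. The `InvertedIndexSketch` class and its `Tokenize` and `Build` methods are hypothetical, and the tokenizer (split on anything that is not a letter or digit, then lower-case) is only a rough stand-in for `StandardAnalyzer`, which applies more rules, such as stop-word removal.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of an inverted index; not part of Lucene.Net or Sitecore.
public static class InvertedIndexSketch
{
    // Rough stand-in for an analyzer: split on characters that are not letters
    // or digits and lower-case each token. Arabic letters count as letters,
    // so Arabic words come out as whole tokens.
    public static IEnumerable<string> Tokenize(string text)
    {
        var token = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                token.Append(char.ToLowerInvariant(c));
            }
            else if (token.Length > 0)
            {
                yield return token.ToString();
                token.Clear();
            }
        }
        if (token.Length > 0)
        {
            yield return token.ToString();
        }
    }

    // Maps each term to the sorted set of document ids whose title contains it.
    public static SortedDictionary<string, SortedSet<int>> Build(IDictionary<int, string> titlesById)
    {
        var index = new SortedDictionary<string, SortedSet<int>>(StringComparer.Ordinal);
        foreach (var entry in titlesById)
        {
            foreach (string term in Tokenize(entry.Value))
            {
                if (!index.TryGetValue(term, out var postings))
                {
                    postings = new SortedSet<int>();
                    index[term] = postings;
                }
                postings.Add(entry.Key);
            }
        }
        return index;
    }
}
```

With titles such as "Sitecore Search" (id 1) and "Lucene Search" (id 2), the term "search" maps to both ids, which is why a single term lookup can find every matching item without scanning the titles.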
The next step is to prepare the Lucene spell checker dictionary, using the index created above as its source:
/// <summary>
/// Builds the spell checker dictionary from the Title field of the index provided
/// </summary>
/// <param name="indexPath">Represents the targeted index directory</param>
/// <param name="spellPath">Represents the spell checker dictionary directory</param>
private static void IndexWords(string indexPath, string spellPath)
{
    // open a read-only reader on the index created above
    Lucene.Net.Index.IndexReader indexReader = Lucene.Net.Index.IndexReader.Open(
        Lucene.Net.Store.FSDirectory.Open(indexPath), true);
    try
    {
        // create the spell checker on its own dictionary directory
        var spell = new SpellChecker.Net.Search.Spell.SpellChecker(Lucene.Net.Store.FSDirectory.Open(spellPath));
        // add all the words of the Title field to the spell checker
        spell.IndexDictionary(new SpellChecker.Net.Search.Spell.LuceneDictionary(indexReader, "Title"));
    }
    finally
    {
        // release the index reader
        indexReader.Close();
    }
}
https://gist.github.com/f6ae59411483fabf7907
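Under the hood, the Lucene spell checker indexes each dictionary word by its character n-grams, which is what lets it find candidate corrections that share fragments with a misspelled word. The following in-memory sketch shows that idea in simplified form. `NGramSketch`, `Grams`, `BuildGramIndex` and `Candidates` are hypothetical names, the single fixed gram size `n` is an assumption (the real spell checker varies the gram sizes with word length and also weights word starts and ends), and nothing here calls the SpellChecker.Net API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified illustration of n-gram based candidate lookup.
public static class NGramSketch
{
    // Returns the distinct character n-grams of a word, in order of first appearance.
    public static List<string> Grams(string word, int n)
    {
        var grams = new List<string>();
        for (int i = 0; i + n <= word.Length; i++)
        {
            string gram = word.Substring(i, n);
            if (!grams.Contains(gram))
            {
                grams.Add(gram);
            }
        }
        return grams;
    }

    // Builds a map from each n-gram to the dictionary words containing it.
    public static Dictionary<string, HashSet<string>> BuildGramIndex(IEnumerable<string> words, int n)
    {
        var index = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
        foreach (string word in words)
        {
            foreach (string gram in Grams(word, n))
            {
                if (!index.TryGetValue(gram, out var bucket))
                {
                    bucket = new HashSet<string>(StringComparer.Ordinal);
                    index[gram] = bucket;
                }
                bucket.Add(word);
            }
        }
        return index;
    }

    // Returns the dictionary words sharing at least one n-gram with the term,
    // ordered by the number of shared n-grams (most first), then alphabetically.
    public static List<string> Candidates(Dictionary<string, HashSet<string>> index, string term, int n)
    {
        var shared = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string gram in Grams(term, n))
        {
            if (index.TryGetValue(gram, out var bucket))
            {
                foreach (string word in bucket)
                {
                    shared[word] = shared.TryGetValue(word, out int count) ? count + 1 : 1;
                }
            }
        }
        return shared
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

For example, with bigrams (`n = 2`) the misspelled Arabic word "كتب" shares two bigrams with "مكتبة" and one with "كتاب", so both come back as candidates, while "قلم" shares none and is never considered.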
Now we can get similar-word suggestions from the Lucene.Net spell checker with code as simple as the following:
/// <summary>
/// Suggests words similar to a specific word
/// </summary>
/// <param name="spellPath">Represents the spell dictionary directory path</param>
/// <param name="term">Represents the term to find similar words for</param>
/// <returns>An array of strings holding up to five similar-word suggestions</returns>
private static string[] SuggestSimilar(string spellPath, string term)
{
    // open the spell checker on the dictionary built by IndexWords
    var spell = new SpellChecker.Net.Search.Spell.SpellChecker(
        Lucene.Net.Store.FSDirectory.Open(spellPath));
    // get up to 5 similar suggestions
    string[] similarWords = spell.SuggestSimilar(term, 5);
    return similarWords;
}
https://gist.github.com/446b2a59671636f6efd1
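After the spell checker fetches candidates through its n-gram index, it ranks them with a string distance (by default a Levenshtein-based one), drops those below an accuracy threshold, and never suggests the word itself. As a self-contained approximation of that ranking step, not the SpellChecker.Net code, the sketch below scores dictionary words with the similarity 1 - editDistance / max(length) and keeps those scoring at least 0.5. The `LevenshteinSketch` class, its method names, the exact similarity formula and the 0.5 default threshold are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical approximation of edit-distance ranking; not the SpellChecker.Net API.
public static class LevenshteinSketch
{
    // Classic dynamic-programming Levenshtein distance
    // (insertion, deletion and substitution each cost 1), using two rows.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }
        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(Math.Min(current[j - 1] + 1, previous[j] + 1), previous[j - 1] + cost);
            }
            // the row just computed becomes the previous row for the next iteration
            var tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.Length];
    }

    // Similarity in [0, 1]; 1 means the strings are identical.
    public static float Similarity(string a, string b)
    {
        int maxLength = Math.Max(a.Length, b.Length);
        return maxLength == 0 ? 1f : 1f - (float)Distance(a, b) / maxLength;
    }

    // Returns up to maxSuggestions words, best first, that differ from the term
    // and score at least minScore.
    public static string[] SuggestSimilar(IEnumerable<string> dictionary, string term, int maxSuggestions, float minScore = 0.5f)
    {
        return dictionary
            .Where(word => !string.Equals(word, term, StringComparison.Ordinal))
            .Select(word => new { Word = word, Score = Similarity(term, word) })
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Word, StringComparer.Ordinal)
            .Take(maxSuggestions)
            .Select(x => x.Word)
            .ToArray();
    }
}
```

For the misspelled "كتب", "كتاب" is one insertion away (similarity 0.75) and "مكتبة" two (similarity 0.6), so both are suggested in that order. This sketch scores every dictionary word, which is linear in the dictionary size; fetching candidates through an n-gram index first is what keeps the real lookup fast on large dictionaries.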
Finally, to complete the sample, here is the calling code:
/// <summary>
/// Main method: builds the index and the spell dictionary, then suggests corrections
/// </summary>
/// <param name="WordToCheck">Represents the term to spell check and suggest corrections for</param>
private void Main(string WordToCheck)
{
    try
    {
        // create a directory to store the index in
        string indexPath = @"c:\NewSitecoreLucene\";
        if (!System.IO.Directory.Exists(indexPath))
            System.IO.Directory.CreateDirectory(indexPath);
        // create a directory to store the spell dictionary in
        string spellPath = @"c:\NewSitecoreLuceneDictionary\";
        if (!System.IO.Directory.Exists(spellPath))
            System.IO.Directory.CreateDirectory(spellPath);
        // create the index
        CreateIndex(indexPath);
        // index the words into the spell dictionary
        IndexWords(indexPath, spellPath);
        // suggest words similar to the word passed in (e.g. to display to the user)
        string[] _suggestions = SuggestSimilar(spellPath, WordToCheck.Trim());
    }
    catch (Exception ex)
    {
        Sitecore.Diagnostics.Log.Error(ex.Message, ex, this);
    }
}
https://gist.github.com/1bdbd6f9f3f0ab34b30f
As we can see, with the few simple steps above, Sitecore 7 and Lucene.Net let us search for Arabic phrases, spell check words, and suggest corrections for misspelled words.
Keywords: Sitecore 7 search, Arabic, spell check, Lucene search, index, suggestions, create index