Thursday 1 January 2015

Sitecore 7 Arabic search, spell check and suggestions




Recently I was asked to create a proof of concept for Arabic search in Sitecore, with spell checking and suggestions. The investigation gave good results, and to make our TANASUK value "Sharer's" come true, I am sharing those results here.



Let's start with the good news: Sitecore 7 supports Arabic search, and it is as simple as searching for English words. As the next code snippet shows, you just use the Lucene index to search for your Arabic phrase:

// get the index that will be used for the search
var index = ContentSearchManager.GetIndex(new SitecoreIndexableItem(Sitecore.Context.Item));

// using the search context of the above index
using (var context = index.CreateSearchContext())
{
        // pass the Arabic text (ArabicText is a string holding the search phrase) and get the results
        IQueryable<SearchResultItem> result = context.GetQueryable<SearchResultItem>()
                .Where(p => p.Content.Contains(ArabicText));
}
https://gist.github.com/9469ef732a6fab46be36


Now let's move on to spell checking and suggestions using the Lucene.Net library. Let's suppose that the suggestions should come from the content of our Sitecore site's items.
The first step is to create a new index over the items that will serve as the source for these operations. In the example below we index only the [Title] field of our items:
/// <summary>
/// Method that creates the Lucene index
/// </summary>
/// <param name="indexPath">Represents the directory path for the newly created index</param>
private void CreateIndex(string indexPath)
{
        // open the index directory
        Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(indexPath);
        // create the indexer with a standard analyzer
        var indexWriter = new Lucene.Net.Index.IndexWriter(
                directory,
                new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT),
                true,
                new Lucene.Net.Index.IndexWriter.MaxFieldLength(Int32.MaxValue));

        try
        {
                // get sitecore master database
                Sitecore.Data.Database masterDb = Sitecore.Configuration.Factory.GetDatabase("master");
                // get the Home item; its descendants are the items to index
                Sitecore.Data.Items.Item _homeItem = masterDb.GetItem("/sitecore/content/Home");
                // loop over the items and index the Title field of each one
                foreach (Sitecore.Data.Items.Item _item in _homeItem.Axes.GetDescendants())
                {
                        if (_item.Fields["Title"] != null)
                        {
                                // create a Lucene document for this item
                                var _document = new Lucene.Net.Documents.Document();
                                _document.Add(new Lucene.Net.Documents.Field(
                                        "Title",
                                        _item.Fields["Title"].Value,
                                        Lucene.Net.Documents.Field.Store.YES,
                                        Lucene.Net.Documents.Field.Index.ANALYZED,
                                        Lucene.Net.Documents.Field.TermVector.NO));
                                indexWriter.AddDocument(_document);
                        }
                }
                // make lucene fast: optimize once, after all documents have been added
                indexWriter.Optimize();
        }
        catch (Exception ex)
        {
                // "this" is the log owner, so the method is an instance method rather than static
                Sitecore.Diagnostics.Log.Error(ex.Message, ex, this);
        }
        finally
        {
                // close the index writer
                indexWriter.Close();
        }
}
https://gist.github.com/bfa93cffbf8ba14877aa




The next step is to prepare the Lucene spell checker dictionary by setting its source to the index created above:

/// <summary>
/// Method that creates a dictionary for the index provided
/// </summary>
/// <param name="indexPath">Represents the targeted Index directory</param>
/// <param name="spellPath">Represents the spell checker dictionary directory</param>
private static void IndexWords(string indexPath, string spellPath)
{
        // open the index reader (read-only, since the source index is only read here)
        Lucene.Net.Index.IndexReader indexReader =
                Lucene.Net.Index.IndexReader.Open(Lucene.Net.Store.FSDirectory.Open(indexPath), true);

        // create the spell checker
        var spell = new SpellChecker.Net.Search.Spell.SpellChecker(
                Lucene.Net.Store.FSDirectory.Open(spellPath));

        // add all the words in the Title field to the spell checker
        spell.IndexDictionary(new SpellChecker.Net.Search.Spell.LuceneDictionary(indexReader, "Title"));
}
https://gist.github.com/f6ae59411483fabf7907
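
To give a feel for what the dictionary step prepares: as far as I understand it, the Lucene spell checker stores every word broken into short runs of characters (n-grams), so that a misspelled word can still match on the pieces it shares with correct words. The following is only a rough, self-contained sketch of that idea and not the library's code; the class and method names (NGramSketch, CharacterNGrams) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of Lucene.Net or Sitecore).
public static class NGramSketch
{
    // Returns every run of `size` consecutive characters found in `word`.
    public static List<string> CharacterNGrams(string word, int size)
    {
        if (word == null) throw new ArgumentNullException("word");
        if (size < 1) throw new ArgumentOutOfRangeException("size");

        var grams = new List<string>();
        for (int start = 0; start + size <= word.Length; start++)
        {
            grams.Add(word.Substring(start, size));
        }
        return grams;
    }
}
```

For example, NGramSketch.CharacterNGrams("مكتبة", 2) returns four two-letter pieces, because the word has five letters. A misspelling that changes only the last letter still shares the first three pieces with the correct word.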



Now we can get suggestions for similar words from the Lucene.Net spell checker with code as simple as the following:

/// <summary>
/// Method that suggests words similar to a specific word
/// </summary>
/// <param name="spellPath">Represents the spell dictionary directory path</param>
/// <param name="term">Represents the term to suggest similar words for</param>
/// <returns>Returns an array of strings which represent the similar suggestions</returns>
private static string[] SuggestSimilar(string spellPath, string term)
{
        // create the spell checker
        var spell = new SpellChecker.Net.Search.Spell.SpellChecker(
                Lucene.Net.Store.FSDirectory.Open(spellPath));

        // get up to 5 similar suggestions
        string[] similarWords = spell.SuggestSimilar(term, 5);

        return similarWords;
}
https://gist.github.com/446b2a59671636f6efd1
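
Suggestions like these are ranked by how close each candidate is to the typed word. One common closeness measure is the edit distance (the number of single-letter insertions, deletions and substitutions needed to turn one word into the other). The sketch below is a minimal, self-contained illustration of that measure on Arabic words; it is not the spell checker's own implementation, and the names EditDistanceSketch, Distance and Rank are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only (not part of Lucene.Net or Sitecore).
public static class EditDistanceSketch
{
    // Levenshtein distance computed with two rolling rows.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Turning an empty prefix of `a` into the first j letters of `b` takes j insertions.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // i deletions turn the first i letters of `a` into an empty string
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // The row just filled becomes the "previous" row for the next letter.
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }

    // Orders the candidates from closest to farthest and keeps at most `max` of them.
    public static string[] Rank(string term, IEnumerable<string> candidates, int max)
    {
        return candidates.OrderBy(c => Distance(term, c)).Take(max).ToArray();
    }
}
```

For example, the misspelling "مكتبه" is one substitution away from "مكتبة", so Rank("مكتبه", new[] { "كتاب", "مكتبة", "مدرسة" }, 1) puts "مكتبة" first.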

Finally, to complete the code sample, here is the calling code:

/// <summary>
/// Main method
/// </summary>
/// <param name="WordToCheck">Represents the term to spell check and suggest</param>
private void Main(string WordToCheck)
{
        try
        {
            // create a directory to store the index in
            string indexPath = @"c:\NewSitecoreLucene\";
            if (!System.IO.Directory.Exists(indexPath))
                System.IO.Directory.CreateDirectory(indexPath);

            // create a directory to store the spell dictionary in
            string spellPath = @"c:\NewSitecoreLuceneDictionary\";
            if (!System.IO.Directory.Exists(spellPath))
                System.IO.Directory.CreateDirectory(spellPath);

           // Create index
            CreateIndex(indexPath);

            // index the words
            IndexWords(indexPath, spellPath);

            // Suggest similar words for the term passed to this method
            string[] _suggestions = SuggestSimilar(spellPath, WordToCheck.Trim());
        }
        catch (Exception ex)
        {
            Sitecore.Diagnostics.Log.Error(ex.Message, ex, this);
        }
}
https://gist.github.com/1bdbd6f9f3f0ab34b30f
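
The caller above checks a single term, while a search box usually receives a whole phrase. One possible way to handle that, sketched here under my own assumptions, is to split the phrase into words, keep the words the dictionary already knows, and replace each unknown word with its first suggestion. The two delegates are stand-ins: `suggest` could be backed by the SuggestSimilar method above, and `exists` is a hypothetical "is this word in the dictionary" check. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only (not part of Lucene.Net or Sitecore).
public static class PhraseSuggestionSketch
{
    // exists:  returns true when the word is already known to the dictionary (assumed check).
    // suggest: returns similar words for one word, best match first (assumed ordering).
    public static string CorrectPhrase(
        string phrase,
        Func<string, bool> exists,
        Func<string, string[]> suggest)
    {
        var words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var corrected = words.Select(word =>
        {
            // Known words are kept untouched.
            if (exists(word)) return word;

            // Unknown words are replaced by the first suggestion, if there is one.
            string[] suggestions = suggest(word);
            return suggestions != null && suggestions.Length > 0 ? suggestions[0] : word;
        });

        return string.Join(" ", corrected);
    }
}
```

With a dictionary that knows "بحث" and suggests "مكتبة" for "مكتبه", the phrase "بحث مكتبه" comes back as "بحث مكتبة", which can then be shown to the visitor as a "did you mean" hint.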

As the few simple steps above show, with Sitecore 7 and Lucene.Net we can search for Arabic phrases, spell check words and suggest corrections for misspelled ones.

Keywords: sitecore 7 search, Arabic, spell check, Lucene search, Index, suggestions, create index
