Google Search Appliance and .NET applications
Google Search Appliance is an excellent search engine in most ways. It’s fast, relatively simple, feature-rich, and it supports most (if not all) infrastructural scenarios.
A few weeks ago I was asked to review and simplify a GSA installation for a client. They were consuming the Google Search Appliance feed with code similar to this:
var webClient = new WebClient();
var host = string.Format(".../search?q={0}&client={1}&site={2}", query, client, col);
var result = webClient.DownloadString(new Uri(host));
//Manual parsing of the XML feed that GSA returns
While this of course works just fine, it’s quite inflexible, and in my customer’s case several different applications needed to consume different feeds with different levels of security. It would therefore be better to separate this code into its own autonomous project that each application could use. I figured there must already be some kind of ready-made library like that out there (and perhaps there is), so I started looking around. I found a few and peeked at them, but nothing really caught my eye, so in the end I decided to make my own.
It’s nothing too advanced, but it supports most of the standard functionality of the GSA, and I hope to find time for further development soon. Right now the solution consists of two projects and two test projects. GSA.Search is the main library that handles the actual interaction with the GSA and the parsing of results. The GSA.Search.WebApi project is a simple ASP.NET Web API project that leverages the classes in GSA.Search for some simple operations.
For a simple, normal search without any secure content or other funny stuff, you could simply do something like this:
//Create a server
var server = new SearchServer();
//Create a query and populate it
var query = new Query();
query.SearchTerm = "The search query as inputted by the user";
query.Collections = "The collection that you have configured in your GSA installation";
query.Client = "The client frontend that you have configured in your GSA installation";
query.MaxSearchHits = 10;
query.GsaHostAddress = "The host address of your GSA installation";
var result = server.Search(query);
//The result will be an ISearchResult object that contains all the properties of the normal XML result, but strongly typed.
var numHits = result.NumberOfHits;
//You could for example easily use the result to populate a repeater or as part of an MVC model:
repeater.DataSource = result.SearchHits;
repeater.DataBind();
//or
return View(result.SearchHits);
The project also supports cookie cracking and secure content. Just make sure to pass through the correct cookie to the call to SearchServer.Search, like so:
var server = new SearchServer();
//Pass your HttpCookie that contains your authentication and authorization information.
//This cookie will be passed on to GSA and if a secure search is initiated it will be passed on to any configured cookie cracker
server.Search(query, yourHttpCookie);
To initiate a secure search, simply set the Access property of your query to All or Secure.
var query = new Query();
query.Access = SearchAccess.Secure;
If you wish to get autocomplete suggestions for a query, you can simply pass an ISuggestionQuery to SearchServer.Search instead of a normal query, and you will get a List of strings back with suggestions for the specified term.
var query = new SuggestionQuery();
query.SearchTerm = "some query";
query.Collections = "somecollection";
query.Client = "someClientFrontend";
query.MaxSuggestions = 5;
query.GsaHostAddress = "SomeGsaHost.domain.com";
var server = new SearchServer();
var suggestions = server.Search(query);
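You could then, for example, return the suggestions from an autocomplete endpoint or simply enumerate them. A minimal sketch, using the list of strings described above:
//Enumerate the returned suggestions, e.g. to feed an autocomplete box
foreach (var suggestion in suggestions)
{
    Console.WriteLine(suggestion);
}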
Also note that the SearchResult object returned from a normal query contains any spelling suggestions or synonyms that the GSA returns.
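A quick sketch of how that might look, using the result from the normal search above. Note that the property names here (SpellingSuggestions, Synonyms) are assumptions on my part; check the ISearchResult members for the exact names.
//Hypothetical property names - verify against ISearchResult before using
foreach (var suggestion in result.SpellingSuggestions)
{
    Console.WriteLine("Did you mean: " + suggestion);
}
foreach (var synonym in result.Synonyms)
{
    Console.WriteLine("Related term: " + synonym);
}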
The Web API project that the solution contains shows the simplest usage of the classes in GSA.Search. It’s a nice and simple way to consume the GSA search results if you wish to handle most (or all) of the search code on the client side.
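As an illustration, a controller wrapping the search could look roughly like this. This is a hypothetical sketch and not the actual controller in GSA.Search.WebApi; the collection, client frontend and host values are placeholders.
public class SearchController : ApiController
{
    //Hypothetical endpoint: builds a Query from the request and returns the strongly typed result
    public ISearchResult Get(string q)
    {
        var query = new Query
        {
            SearchTerm = q,
            Collections = "somecollection",
            Client = "someClientFrontend",
            MaxSearchHits = 10,
            GsaHostAddress = "SomeGsaHost.domain.com"
        };
        var server = new SearchServer();
        return server.Search(query);
    }
}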
There’s also built-in support for pushing simple feeds to the GSA. At the moment only web feeds are supported, and only in their most basic form (as that was all we needed for the current applications), but I will be looking into adding support for the more advanced forms of feeds.
Feeds are handled by the FeedManager class. This simple example pushes a URL to the GSA for immediate re-indexing:
//Create a new feed to push to the GSA
var feed = new Feed();
//A feed consists of one or more FeedRecords, each of which is basically a container for a specific url
//and describes how that url should be handled by the feed
var feedRecord = new FeedRecord();
//The url this feedrecord handles (placeholder address)
feedRecord.Url = "http://www.domain.com/page-to-reindex";
//Whether we want to force the GSA to recrawl this url at once
feedRecord.CrawlImmediately = true;
feed.Records.Add(feedRecord);
var manager = new FeedManager();
manager.GsaHostAddress = "GSA.domain.com";
var response = manager.PushFeed(feed);
//The response should be an HTTP 200 if everything went OK
You can also use feeds to specify that a URL should be deleted from the index. This is excellent in the case of a CMS-powered site (EPiServer springs to mind…) where content can be added and removed by editors. You could then subscribe to the publish and delete events of the CMS and make sure that your GSA index is updated accordingly, as sketched below.
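A rough sketch of how a delete could look. The way the record is marked for deletion here is an assumption on my part (in the GSA feed XML it corresponds to action="delete"); check the FeedRecord class for the exact member.
//Hypothetical sketch: could be called from e.g. a CMS delete/unpublish event handler
var feed = new Feed();
var feedRecord = new FeedRecord();
//Placeholder address for the page that was removed
feedRecord.Url = "http://www.domain.com/removed-page";
//Assumed member - the record should end up with action="delete" in the feed XML
feedRecord.Action = FeedAction.Delete;
feed.Records.Add(feedRecord);
var manager = new FeedManager();
manager.GsaHostAddress = "GSA.domain.com";
var response = manager.PushFeed(feed);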
Check out the full source code on GitHub.