Elastica
Include Elastica in your project as svn:externals PDF Print E-mail
Written by Nicolas Ruflin   
Wednesday, 21 December 2011 15:10

As most of you know, Elastica is hosted on github, which means it uses git as its revision control system. I have several projects which include Elastica but use subversion as its version control system. Until now, I included Elastica as an external svn source by hosting my own Elastica svn repository. But yesterday I discovered that the code from github can also be checked out through svn. I immediately asked google to get more details about this feature and discovered several blog entries on the github blog which I had somehow missed.

It is not only possible to check out repositories, but also to check out some specific subfolders or tags and you can even commit to the repository (which I didn't test). As in my projects I only use the Elastica library folder and don't need all the tests and additional data, I check out only the lib folder. If you want to check out the Elastica lib folder from version v0.18.6.0, use the following line of code:

svn co https://github.com/ruflin/Elastica/tags/v0.18.6.0/lib/ .

If you have a lib folder in your project with all your frameworks and libraries and you want to add Elastica as an external source (which is quite useful), you can set the svn:externals property on your library folder to the following.

https://github.com/ruflin/Elastica/tags/v0.18.6.0/lib/Elastica Elastica

If you already have other sources added as externals to your repository (for example ZF), just add this line below your existing lines. The next time you will update your repository, the Elastica folder with all its files will be checked out. To update to one of the next versions of Elastica, update the version number in the url in your svn:externals properties.

 
Using Elastica with multiple Elasticsearch Nodes PDF Print E-mail
Written by Nicolas Ruflin   
Monday, 21 November 2011 17:30

Elasticsearch was built with the cloud / multiple distributed servers in mind. It is quite easy to start a elasticsearch cluster simply by starting multiple instances of elasticsearch on one server or on multiple servers. Every elasticsearch instance is called a node. To start multiple instances of elasticsearch on your local machine, just run the following command in the elasticsearch folder twice:

./bin/elasticsearch -f
./bin/elasticsearch -f

As you will see, the first node will be started on port 9200, the second instance on port 9201. Elasticsearch automatically discovers the other node and creates a cluster. Elastica can be used to retrieve all node and cluster information. In the following example first the cluster object is retrieved (Elastica_Cluster) from the client and then the cluster state is read out. Then all cluster nodes (Elastica_Node) are retrieved and the name of every node is printed out. Every cluster has at least one node and every node has a specific name.

$client = new Elastica_Client();

// Retrieve a Elastica_Cluster object
$cluster = $client->getCluster();

// Returns the cluster state
$state = $cluster->getState();

// Gets all cluster notes
$nodes = $cluster->getNodes();

foreach ($nodes as $node) {
    echo $node->getName();
}

Client to multiple servers

As elasticsearch is a distributed search engine that can be run on multiple servers, it is possible that some servers fail and still, the search works as expected as the data is stored redundantly (replicas). The number of shards and replicas can be chosen for every single index during creation. Of course, this can also be set with Elastica through the mapping as can be seen in the Elastica_Index test. More details on this perhaps in a later blog post.

One of the goals of the distributed search index is availability. If one server goes down, search results should still be served. But if the client connects to only the server that just went down, no results are returned anymore. Because of this, Elastica_Client supports multiple servers which are accessed in a round robin algorithm. This is the only and also most basic option at the moment. So if we start a node on port 9200 and port 9201 above, we pass the following arguments to Elastica_Client to access both servers.

$client = new Elastica_Client(array(
	'servers' => array(
		array('host' => 'localhost', 'port' => 9200)
		array('host' => 'localhost', 'port' => 9201)
	)
));

From now on, every request is sent to one of these servers in a round robin type. Instead of localhost, an external server could be used in addition. I'm aware that this is still a quite basic implementation. As probably some of you already realized, this is no safe failover method, as every second request still goes onto the server that is down. One idea here is to give a specific threshold for every server in which the respond time should be and otherwise the query goes to the next server. In addition, it would be useful to store this information on unavailable servers somewhere in order to use it for the next request. Thus, only one client has to wait for the unavailable server. Storing this information is somehow an issue, since Elastica does not have any storage backend.

Load Distribution

This client implementation also allows to distribute the load on multiple nodes. As far as I know, Elasticsearch already does this quite well on its own. But it helps if more than one node can answer http requests. Therefore, the method above is really useful if you use more than one elasticsearch node in a cluster to send your request to all servers.

It is planned to enhance this multiple server implementation in the future with additional parameters such as priority for a server and some other ideas. Please feel free to write down your ideas in the comment section or directly create a pull request on github.

 
How to Log Requests in Elastica PDF Print E-mail
Written by Nicolas Ruflin   
Sunday, 20 November 2011 20:50

In the Elastica Release v0.18.4.1, the capability to log requests was added. There is a general Elastica_Log object that can later also be extended to log other things such as responses, exceptions and more. The Elastica_Log constructor takes an Elastica_Client as param. To enable logging, the config variable log for the client has to be set to true, or a specific path the log should be written to. This means that every client instance decides on its own whether logging is enabled or not.

The example below will log the message "hello world" to the general PHP log.

$client = new Elastica_Client(array('log' => true));
$log = new Elastica_Log($client);
$log->log('hello world');

If a file path is set as the log config param, the error log will write the "hello world" message to the /tmp/php.log file.

$client = new Elastica_Client(array('log' => '/tmp/php.log'));
$log = new Elastica_Log($client);
$log->log('hello world');

If logging is enabled, all request are at the moment automatically logged. There is a special conversion of request to log messages. The log message is converted to the shell format, so every log line can directly be pasted into the shell to test out. This is quite nice to debug and to create a gist if others ask what the query looks like. Furthermore, this makes it simpler to figure out whether the problem relates to Elastica or not.

For example the output for updating the number of replicas setting request for the index test would look like below.

curl -XPUT http://localhost:9200/test/_settings -d '{"index":{"number_of_replicas":0}}'
 


 
JOOMLA TEMPLATES Joomla Templates By JoomlaBear