Wednesday, July 8, 2015

Elasticsearch (Lucene)

Elasticsearch Basic Concepts Link:

Elasticsearch excellent tutorial:

Good to know:
  • The first way to talk to Elasticsearch is over port 9300, using the native Elasticsearch transport protocol.
  • The second way to talk to Elasticsearch is over port 9200, using a RESTful API.
  • Because the REST API is plain HTTP, you can even talk to Elasticsearch from the command line with the curl command.
  • A document belongs to a type, and types live inside an index. We can draw some (rough) parallels to a traditional relational database:
  • Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
  • Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
  • An Elasticsearch cluster can contain multiple indices (databases), which in turn contain multiple types (tables). These types hold multiple documents (rows), and each document has multiple fields (columns). (https://www.elastic.co/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html)
  • To identify a single document unambiguously, we need three things: its index, its type, and its ID.
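
As a rough illustration of the index/type/ID addressing described above, this Python sketch joins the three parts into the REST URL of one document. The helper name document_url and the default host and port (localhost:9200, borrowed from the curl examples in this post) are assumptions made for illustration; they are not part of any Elasticsearch client.

```python
from urllib.parse import quote


def document_url(index, doc_type, doc_id, host="localhost", port=9200):
    """Build the REST URL that addresses exactly one document.

    Hypothetical helper: it only joins index, type and ID into a path
    and never contacts Elasticsearch.
    """
    # Percent-encode each part so characters such as spaces or slashes
    # inside an ID cannot change the shape of the path.
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return "http://{}:{}/{}".format(host, port, "/".join(parts))


print(document_url("megacorp", "employee", 1))
# prints: http://localhost:9200/megacorp/employee/1
```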

A request to Elasticsearch consists of the same parts as any HTTP request:
  • curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>' -d '<BODY>'
  • Note:
VERB => The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
QUERY_STRING => Any optional query-string parameters (for example, ?pretty pretty-prints the JSON response to make it easier to read).
BODY => A JSON-encoded request body (if the request needs one).
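
The same parts can be assembled programmatically. The sketch below uses only the Python standard library to build, but not send, a request object from VERB, PATH, QUERY_STRING and BODY. The function name build_request and its defaults (http, localhost:9200) are assumptions for illustration.

```python
import json
import urllib.request


def build_request(verb, path, query_string="", body=None,
                  protocol="http", host="localhost:9200"):
    """Assemble (but do not send) an HTTP request from its parts."""
    url = "{}://{}/{}".format(protocol, host, path.lstrip("/"))
    if query_string:
        url += "?" + query_string
    # Encode a JSON body only when the request actually needs one.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


req = build_request("GET", "_count", query_string="pretty")
print(req.get_method(), req.full_url)
# prints: GET http://localhost:9200/_count?pretty
```

Passing the object to urllib.request.urlopen would actually send it, assuming a node is listening on that host and port.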

CURL Example:
  • curl -i -X GET 'localhost:9200/_count?pretty'
  • Note:
-i includes the HTTP response headers in the output. The URL is quoted so that the shell does not treat the ? as a wildcard.

Document creation example using REST API:

PUT
  • curl -i -X PUT localhost:9200/megacorp/employee/1 -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}'
  • Note:
The path /megacorp/employee/1 contains three pieces of information: 
megacorp => The index (database) name
employee => The type name (table)
1 => The ID of this particular employee (row)
-d (....)  => The JSON document (row)
  • curl -i -X PUT localhost:9200/megacorp/employee/1/_create -d '{}'
HTTP/1.1 409 Conflict
Content-Type: application/json; charset=UTF-8
Content-Length: 110
{"error":"DocumentAlreadyExistsException[[megacorp][3] [employee][40]: document already exists]","status":409}
  • Note:
Appending _create to the path makes the request create-only: if a document with the same index, type, and ID already exists, Elasticsearch answers 409 Conflict instead of overwriting it.
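
A small Python sketch of the create-only convention: one helper appends _create to the document path, the other turns the two status codes seen in this post (201 Created, 409 Conflict) into a readable outcome. Both helper names are hypothetical, and any other status is simply reported as unexpected.

```python
def create_only_path(index, doc_type, doc_id):
    """Path of a create-only request (hypothetical helper)."""
    return "/{}/{}/{}/_create".format(index, doc_type, doc_id)


def create_outcome(status_code):
    """Map the HTTP status of a create-only request to an outcome."""
    if status_code == 201:
        return "created"
    if status_code == 409:
        return "already exists"
    # Anything else is outside what this post demonstrates.
    return "unexpected status {}".format(status_code)


print(create_only_path("megacorp", "employee", 1))  # /megacorp/employee/1/_create
print(create_outcome(409))                          # already exists
```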
POST
If our data doesn’t have a natural ID, we can let Elasticsearch autogenerate one for us. The first request below uses PUT with an explicit ID (4); the second uses POST without an ID, so Elasticsearch generates one.
  • curl -i -X PUT localhost:9200/megacorp/employee/4 -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}'
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 78
{"_index":"megacorp","_type":"employee","_id":"4","_version":1,"created":true}
  • curl -i -X POST localhost:9200/megacorp/employee/ -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}'
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 97
{"_index":"megacorp","_type":"employee","_id":"AU5uz7mdzFHxYLzn_JYw","_version":1,"created":true}
Note that the response is similar to what we saw before, except that the _id field has been generated for us.
Remember that the combination of _index, _type, and _id uniquely identifies a document. So the easiest way to ensure that our document is new is to let Elasticsearch autogenerate a new unique _id, using the POST version of the index request.
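
The PUT-versus-POST choice can be sketched as a single decision: with an ID, PUT to the full document path; without one, POST to the type path and let Elasticsearch pick the ID. The function name index_request and its return shape (method, path, body) are assumptions for illustration, and nothing is sent over the network.

```python
import json


def index_request(index, doc_type, document, doc_id=None):
    """Return (method, path, body) for indexing one document."""
    body = json.dumps(document)
    if doc_id is None:
        # No natural ID: POST to the type so the server generates one.
        return "POST", "/{}/{}/".format(index, doc_type), body
    # Known ID: PUT to the exact document address.
    return "PUT", "/{}/{}/{}".format(index, doc_type, doc_id), body


employee = {"first_name": "John", "last_name": "Smith", "age": 25}
print(index_request("megacorp", "employee", employee, doc_id=4)[:2])
print(index_request("megacorp", "employee", employee)[:2])
```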

Retrieve document example using REST API (simple mode)

GET
  • $ curl -i -X GET 'localhost:9200/megacorp/employee/3?pretty'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 290

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "3",
  "_version" : 1,
  "found" : true,
  "_source":{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
}
  • $ curl -i -X GET 'localhost:9200/megacorp/employee/12?pretty'
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 92

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "12",
  "found" : false
}
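
On the client side, the two replies above differ in the found flag and in whether _source is present. A minimal Python sketch, assuming the reply body is available as text (the function name extract_source is hypothetical):

```python
import json


def extract_source(reply_text):
    """Return the stored document, or None when the reply says found=false."""
    reply = json.loads(reply_text)
    if not reply.get("found", False):
        return None
    return reply["_source"]


hit = ('{"_index": "megacorp", "_type": "employee", "_id": "3", '
       '"_version": 1, "found": true, '
       '"_source": {"first_name": "Douglas", "last_name": "Fir", "age": 35}}')
miss = '{"_index": "megacorp", "_type": "employee", "_id": "12", "found": false}'

print(extract_source(hit)["first_name"])  # Douglas
print(extract_source(miss))               # None
```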

Retrieve part of a document example using REST API 

GET
  • $ curl -i -X GET 'localhost:9200/megacorp/employee/4?_source=first_name,age'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 117

{"_index":"megacorp",
 "_type":"employee",
 "_id":"4",
 "_version":1,
 "found":true,
 "_source":{"first_name":"John","age":25}
}
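
To make the effect of the _source parameter concrete, the sketch below builds the filtered path and then mimics the filtering locally on a plain dictionary. Both helper names are hypothetical, and the local filter is only a stand-in that handles top-level field names; it does not reproduce everything the server-side parameter may accept.

```python
def source_filter_path(index, doc_type, doc_id, fields):
    """Path that asks for only the listed fields of one document."""
    return "/{}/{}/{}?_source={}".format(
        index, doc_type, doc_id, ",".join(fields))


def keep_fields(source, fields):
    """Local stand-in: keep only the named top-level fields."""
    return {name: source[name] for name in fields if name in source}


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

print(source_filter_path("megacorp", "employee", 4, ["first_name", "age"]))
print(keep_fields(employee, ["first_name", "age"]))
```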

Retrieve just the document, without any metadata, using REST API

GET
  • $ curl -i -X GET localhost:9200/megacorp/employee/4/_source
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 168
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}

Retrieve document example using REST API (DSL mode)

GET
  • curl -i -X GET localhost:9200/megacorp/employee/_search -d '{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}'
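
The request body above is ordinary JSON, so it can be generated rather than typed by hand. A minimal Python sketch (the helper name match_query is an assumption) that produces the same structure for any field and search text:

```python
import json


def match_query(field, text):
    """Build the body of a match query on a single field."""
    return {"query": {"match": {field: text}}}


# Serialize the body exactly as it would be passed to curl with -d.
print(json.dumps(match_query("last_name", "Smith"), indent=4))
```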


Check whether a document exists:

HEAD (a HEAD response carries no body, only the status line and HTTP headers)
  • $ curl -i -X HEAD localhost:9200/megacorp/employee/4
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
  • $ curl -i -X HEAD localhost:9200/megacorp/employee/121
HTTP/1.1 404 NOT FOUND
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
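
The existence check boils down to a HEAD request plus a look at the status code. The Python sketch below builds such a request with the standard library without sending it, and interprets the two statuses shown above. The names exists_request and document_exists, and the default host, are assumptions for illustration.

```python
import urllib.request


def exists_request(index, doc_type, doc_id, host="localhost:9200"):
    """Build (but do not send) a HEAD request for one document."""
    url = "http://{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    return urllib.request.Request(url, method="HEAD")


def document_exists(status_code):
    """200 means the document exists, 404 means it does not."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    # Other statuses say nothing about existence, so refuse to guess.
    raise ValueError("unexpected status {}".format(status_code))


req = exists_request("megacorp", "employee", 4)
print(req.get_method(), req.full_url)
print(document_exists(200), document_exists(404))
```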

Update a document

PUT (twice: the first request creates the document, the second replaces it and increments _version)
  • $ curl -i -X PUT localhost:9200/megacorp/employee/40 -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}'

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 79
{"_index":"megacorp","_type":"employee","_id":"40","_version":1,"created":true}

  • $ curl -i -X PUT localhost:9200/megacorp/employee/40 -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 80
{"_index":"megacorp","_type":"employee","_id":"40","_version":2,"created":false}
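
The two replies above can be told apart by their created flag and _version number. A short Python sketch that reads those fields from the reply text (the function name describe_index_reply is hypothetical; the sample strings are the two replies shown above):

```python
import json


def describe_index_reply(reply_text):
    """Summarize whether an index request created or replaced a document."""
    reply = json.loads(reply_text)
    action = "created" if reply["created"] else "replaced"
    return "{} {}/{}/{} (version {})".format(
        action, reply["_index"], reply["_type"], reply["_id"], reply["_version"])


first = '{"_index":"megacorp","_type":"employee","_id":"40","_version":1,"created":true}'
second = '{"_index":"megacorp","_type":"employee","_id":"40","_version":2,"created":false}'

print(describe_index_reply(first))   # created megacorp/employee/40 (version 1)
print(describe_index_reply(second))  # replaced megacorp/employee/40 (version 2)
```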