Tuesday, January 27, 2015

Pig, Kibana and BetterMap - getting started guide

A year ago I had the opportunity to test a very nice Kibana feature: visualizing events on a BetterMap that were indexed using Pig. I was also able to verify that geo queries work.

I will try to explain step by step how to achieve that.


Where is the challenge

My data has to be inserted into Elasticsearch in GeoJSON format.
A GeoJSON point is a two-element array [longitude, latitude]. The order is important, and it is different from the other Elasticsearch geo formats.
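To make the ordering concrete, here is a minimal sketch in plain Python of building a document with a GeoJSON-style location (the field values are taken from the dataset below; the document shape is just illustrative):

```python
# GeoJSON order is [longitude, latitude] -- the reverse of the
# "lat, lon" convention used by other Elasticsearch geo formats.
latitude = 40.5833333
longitude = -4.1166667

doc = {
    "event": "Travis",
    "location": [longitude, latitude],  # [lon, lat] -- order matters
}

print(doc["location"])  # [-4.1166667, 40.5833333]
```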

Elasticsearch can guess types and create/extend a mapping at runtime, but geo_point isn't a type it can discover on its own.


Creating Index and Defining Type

First I have to create an empty index with default settings:

POST /zagwozdka/
{}
Then I can create a mapping for my new type, which I call events. Each row of my dataset consists of a timestamp, an event string and a location, and I want to index all three properties.
 POST /zagwozdka/_mapping/events  
  {   
    "events": {   
      "properties": {   
           "event_datetime": {   
              "type": "date"   
           },   
           "event": {   
              "type": "string"   
           },   
           "location": {   
              "type": "geo_point"   
           }   
        }   
     }   
  }   


Dataset

My dataset is a csv file with columns:
  • datetime (yyyy-MM-dd HH:mm:ss)
  • event (chararray)
  • latitude (double)
  • longitude (double)
Below I have put a few rows.

 Datetime                 Event    Latitude     Longitude    
 2009-01-24 12:00:00.000  Travis   40.5833333   -4.1166667   
 2009-01-28 11:19:00.000  Diamond  37.65361     -101.19056   
 2009-01-07 17:48:00.000  Stefan   32.51722     -80.07583    
 2009-01-23 12:42:00.000  Watson   -31.6333333  150.3333333  
 2009-01-07 19:48:00.000  Andrew   -32.8833333  152.2166667  
 2009-01-26 11:19:00.000  John     61.9666667   24.6666667   
 2009-01-05 13:23:00.000  Greg     44.2166667   15.3666667   


Loading a File

First I need to load the file using the standard PigStorage loader.

 events = LOAD 'demo.csv' using PigStorage(';')   
 AS (event_datetime:chararray, event:chararray, latitude:double, longitude:double);  


Transforming into proper types

With Pig I can transform each record into a datetime object, the event, and a location tuple (longitude, latitude). The tuple will later be interpreted as a geo point in GeoJSON format.
 transformed_events = FOREACH events GENERATE  
    ToDate(event_datetime,'yyyy-MM-dd HH:mm:ss') as event_datetime,  
    event,  
    TOTUPLE(longitude,latitude) as location;  

Dumping transformed_events to stdout should give output similar to this:

(2009-01-24T12:00:00.000Z,Travis,(-4.1166667,40.5833333))
(2009-01-28T11:19:00.000Z,Diamond,(-101.19056,37.65361))
(2009-01-07T17:48:00.000Z,Stefan,(-80.07583,32.51722))
(2009-01-23T12:42:00.000Z,Watson,(150.3333333,-31.6333333))
(2009-01-07T19:48:00.000Z,Andrew,(152.2166667,-32.8833333))
(2009-01-26T11:19:00.000Z,John,(24.6666667,61.9666667))
(2009-01-05T13:23:00.000Z,Greg,(15.3666667,44.2166667))
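For readers without a Pig installation, the load-and-transform steps can be sketched in plain Python (the field names mirror the Pig script; the semicolon-separated layout is assumed from the LOAD statement above):

```python
import csv
import io
from datetime import datetime

# Same layout as demo.csv: datetime;event;latitude;longitude
raw = "2009-01-24 12:00:00.000;Travis;40.5833333;-4.1166667\n"

transformed = []
for row in csv.reader(io.StringIO(raw), delimiter=";"):
    event_datetime, event, lat, lon = row
    transformed.append((
        datetime.strptime(event_datetime, "%Y-%m-%d %H:%M:%S.%f"),
        event,
        (float(lon), float(lat)),  # (longitude, latitude), GeoJSON order
    ))

print(transformed[0][2])  # (-4.1166667, 40.5833333)
```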


Indexing

First I register the elasticsearch-hadoop user-defined functions for Pig, which enable storing data in Elasticsearch; then I store the relation using the EsStorage output format.
 REGISTER elasticsearch-hadoop-2.0.2.jar  
 STORE transformed_events INTO 'zagwozdka/events' USING org.elasticsearch.hadoop.pig.EsStorage('es.mapping.names=event_datetime:event_datetime,event:event,location:location');  

I could omit the es.mapping.names parameter here, because the Pig field names are the same as in the Elasticsearch type. When they differ, this parameter is helpful.
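The es.mapping.names value is a comma-separated list of pig-name:es-name pairs. A rough sketch of how such a rename map applies to a record (plain Python; the event_dt field name is purely illustrative, not from the dataset above):

```python
# es.mapping.names-style rename list: "pig_name:es_name,..."
mapping = "event_dt:event_datetime,event:event"
renames = dict(pair.split(":") for pair in mapping.split(","))

record = {"event_dt": "2009-01-24 12:00:00.000", "event": "Travis"}
# Rename each field that has an entry; leave the rest unchanged.
indexed = {renames.get(k, k): v for k, v in record.items()}

print(indexed)  # {'event_datetime': '2009-01-24 12:00:00.000', 'event': 'Travis'}
```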


Configuring Panel

Finally, I can run Kibana and configure a dashboard for the new source of data. I just need to add a BetterMap panel and point it at the proper location field and tooltip source.

Result

If configured properly, Kibana nicely presents the events on the BetterMap.


Q&A

- Is datetime mapping required in this scenario?

No, but other panels in Kibana (like the Histogram) have useful filtering features based on the timestamp, so it is worth indexing properties with their correct types.



- Is geo_point mapping required in this case?

No, Kibana's BetterMap can display any field that is an array of longitude and latitude, as long as the values are in the proper order.

What you lose is geo querying in Elasticsearch: it won't work without this mapping. During indexing Elasticsearch will guess the type, and the mapping may end up looking like this:

 GET /zagwozdka/events/_mapping  
 {  
   "zagwozdka": {  
    "mappings": {  
      "events": {  
       "properties": {  
         "event": {  
          "type": "string"  
         },  
         "event_datetime": {  
          "type": "date",  
          "format": "dateOptionalTime"  
         },  
         "location": {  
          "type": "double"  
         }  
       }  
      }  
    }  
   }  
 }  

A sample geo query:
 POST /zagwozdka/events/_search  
 {  
   "query": {  
   "filtered" : {  
     "filter" : {  
       "geo_distance" : {  
         "distance" : "50km",  
         "location" : [-111.89028,40.76083]  
       }  
     },  
      "query" : {  
       "match_all" : {}  
     }  
   }  
  }  
 }  

will throw an exception:
org.elasticsearch.index.query.QueryParsingException: [zagwozdka] field [location] is not a geo_point field
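For intuition, the geo_distance filter keeps documents whose great-circle distance from the given point is within the radius. A rough haversine sketch of that check (plain Python; not Elasticsearch's exact implementation):

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    # Great-circle distance between two (lon, lat) points, in km.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Center from the query above: [-111.89028, 40.76083].
# A document at the center itself is within 50 km; the Diamond
# event from the dataset is far outside and would be filtered out.
center = (-111.89028, 40.76083)
print(haversine_km(*center, *center) <= 50.0)        # True
print(haversine_km(*center, -101.19056, 37.65361) <= 50.0)  # False
```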

Update for Kibana 4:

Kibana 4 uses a Tile Map panel instead of BetterMap, but it still requires a field of the geo_point type.

BetterMap required the value to be a geo_point in GeoJSON format; the Tile Map supports more formats, for example "lat,lon":

 "location" : "41.12,-71.34"
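Converting a GeoJSON-ordered location into the "lat,lon" string format is just a swap of the two values; a minimal sketch:

```python
# GeoJSON stores [lon, lat]; the "lat,lon" string format reverses it.
location_geojson = [-71.34, 41.12]   # [longitude, latitude]
lon, lat = location_geojson
location_str = f"{lat},{lon}"

print(location_str)  # 41.12,-71.34
```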

Some more mapping has to be added; more information can be found in the links below.

