Wednesday, 9 April 2014

Custom Trace in WebSphere Message Broker 7

Sometimes the standard console, syslog or Event Viewer output is not enough to understand an error in WebSphere Message Broker. It happened to me recently while implementing WS-Security in WMB 7.0.0.1.
Long story short, I faced a CWWSS6521E error with a huge stack trace and a rather unhelpful exception (the Polish message reads "Login failed because of an exception"):

com.ibm.wsspi.wssecurity.core.SoapSecurityException: CWWSS6521E: Logowanie nie powiodło się z powodu wyjątku.: javax.security.auth.login.LoginException: caught exception from broker

There was no place in the Broker where I could easily look for more details. Even a service trace in the official debug mode didn't show anything more detailed than this stack trace.
A coworker sent me a link describing one possible cause, but the stack trace there was different, so I wanted to be sure before installing additional fixes.
Here is the solution:

  1. Add an additional flag (a JVM system property) to the execution group, pointing to a settings file: -DtraceSettingsFile=MyTraceSettings.properties
  2. Create the file MyTraceSettings.properties with this body: com.ibm.ws.wssecurity.*=all=enabled
With this flag the trace stores tons of debug information, and the debug output finally shows a more detailed exception:

Found an extranious X509 token - more than configured for PolicySet

To understand this debug output better: the jars responsible for WS-Security are ws-security-impl-1.0-SNAPSHOT.jar or com.ibm.jaxws.thinclient_7.0.0, depending on the version of WMB.

I upgraded my broker to version 7.0.0.6 and that fixed my errors.

Tuesday, 18 March 2014

Udacity Data Wrangling with MongoDB: Las Vegas exercise

For the final project I chose the Las Vegas region, which was one of the stops on my last holiday trip. I remember how we chose the hotel: we opened one of the booking sites and searched for good prices and reviews.

I wanted to try a different approach: choose a hotel & casino based on its neighborhood, that is, how many other casinos and hotels are within a 10-minute walk (a 500 m radius).

Some information about the data loaded into MongoDB. Ideally, a document should look like this:
{
"id": "2406124091",
"type": "node",
"visible":"true",
"created": {
          "version":"2",
          "changeset":"17206049",
          "timestamp":"2013-08-03T16:43:42Z",
          "user":"linuxUser16",
          "uid":"1219059"
        },
"pos": [41.9757030, -87.6921867],
"address": {
          "housenumber": "5157",
          "postcode": "60625",
          "street": "North Lincoln Ave"
        },
"amenity": "restaurant",
"cuisine": "mexican",
"name": "La Cabana De Don Luis",
"phone": "1 (773)-271-5176"
}

What I really had:

{
  "building": "yes", 
  "website": "http://www.caesarspalace.com", 
  "amenity": "casino", 
  "node_refs": [
    "389482445", 
    "1483478762", 
    "1483478753", 
[...]
    "389482448", 
    "389482445"
  ], 
  "gnis:county_name": "Clark", 
  "created": {
    "uid": "336460", 
    "changeset": "9675966", 
    "version": "8", 
    "user": "robgeb", 
    "timestamp": "2011-10-28T12:11:39Z"
  }, 
  "tourism": "hotel", 
  "wheelchair": "yes", 
  "wikipedia": "en:Caesars Palace", 
  "ele": "644", 
  "visible": null, 
  "address": {
    "city": "Las Vegas", 
    "county": "Clark", 
    "state": "NV", 
    "street": "Las Vegas Boulevard", 
    "postcode": "89109", 
    "housenumber": "3570"
  }, 
  "gnis:feature_id": "2472987", 
  "type": "way", 
  "id": "115672893", 
  "name": "Caesars Hotel and Casino"
}

More complex elements (buildings, ways) don't have a position of their own; they reference other nodes whose main job is to carry a position, for example 4 nodes, one for each corner of a building.

MongoDB doesn't support joins, so in order to query by the location of a hotel I first need to collect the locations of the nodes it references.

I created a script: for each document with node_refs I iterate over the array and build an array of locations. I don't want to store one specific location because that is generally wrong for roads and long buildings. The $geoNear stage of MongoDB's aggregation pipeline can match documents that hold an array of locations, but the center point passed as 'near' has to be a single location. Here is the script:

# db is a pymongo Database handle; lv is the collection with the imported data
p = db.lv.find({'node_refs': {'$exists': 1}})
for el in p:
    # skip documents that were already processed
    if 'pos_many' in el:
        continue

    # collect the positions of all referenced nodes
    points = []
    for ref in el['node_refs']:
        one = db.lv.find_one({'id': ref})
        if one is None:
            continue
        if 'pos' in one:
            points.append(one['pos'])
            # flag the node as referenced (only once)
            if 'is_referenced' not in one:
                one['is_referenced'] = 1
                db.lv.save(one)
    el['pos_many'] = points
    db.lv.save(el)
Apart from updating documents with locations, my script flags the nodes which were referenced. I would like to see what kind of nodes are referenced: are they only locations, or can I find, for example, a reference to a bus stop or tram station? I could filter or delete based on this flag.

The script updated more than 70 000 objects that have references and flagged almost 678 000 referenced nodes. Only 72 of those were named nodes such as a tram station, so I can't simply delete the referenced nodes, but I won't lose too much information if I filter them out.
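The resolution step can also be sketched without a database, which makes the logic easy to try out. This is only an illustration under assumptions: the function name resolve_positions and the in-memory list of documents are hypothetical, and a dictionary lookup stands in for the find_one call per reference.

```python
def resolve_positions(docs):
    """Fill 'pos_many' for every document that has 'node_refs'.

    docs is a plain list of dicts shaped like the documents in this post.
    Referenced nodes that carry a 'pos' are flagged with 'is_referenced'.
    """
    # Index documents by id once instead of querying per reference.
    by_id = {d['id']: d for d in docs}
    for doc in docs:
        if 'node_refs' not in doc or 'pos_many' in doc:
            continue
        points = []
        for ref in doc['node_refs']:
            node = by_id.get(ref)
            if node is not None and 'pos' in node:
                points.append(node['pos'])
                node['is_referenced'] = 1
        doc['pos_many'] = points
    return docs


if __name__ == "__main__":
    sample = [
        {'id': '1', 'pos': [36.1, -115.1]},
        {'id': '2', 'pos': [36.2, -115.2]},
        {'id': 'w', 'node_refs': ['1', '2', '1']},
    ]
    print(resolve_positions(sample)[2]['pos_many'])
```

References to nodes that are missing from the data, or that have no position, are skipped, just as in the script.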

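Now I can run a query to filter all the casinos in the Las Vegas region (the name has to be matched with $regex; a quoted '/Casino/' would only match that literal string):

db.lv.find({'$or':[{'name': {'$regex': 'Casino'}},{'amenity':'casino'}]})

It gives me more than 50 casinos, some of which I know.
For each casino I can now run this query: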
db.lv.aggregate([
                      {
                        '$geoNear': {
                                    'near': pos,
                                    'distanceField': "dist.calculated",
                                    'maxDistance': 0.5/111.12,
                                    'query': {'id':{'$ne':el['id']},'$or':[{'name': {'$regex': 'Casino'}},{'amenity':'casino'}]},
                                    'includeLocs': "dist.location",
                                    'uniqueDocs': 1
                                    
                                  }
                      }
                   ]);
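
The value 0.5/111.12 in the query appears to convert the 500 m radius into degrees, using roughly 111.12 km per degree. A minimal sketch of that conversion, plus one common way (the haversine formula) to turn two [lat, lon] pairs back into meters, is below. The function names are hypothetical, and the post does not say how its distances in meters were actually computed.

```python
import math

KM_PER_DEGREE = 111.12  # approximation used in the query above


def km_to_degrees(km):
    """Convert a distance in kilometers to degrees of arc (approximate)."""
    return km / KM_PER_DEGREE


def haversine_m(a, b, radius_m=6371000.0):
    """Great-circle distance in meters between two [lat, lon] pairs."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


if __name__ == "__main__":
    print(km_to_degrees(0.5))
    print(haversine_m([36.114907, -115.1725608], [36.1154373, -115.1723441]))
```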

The result of these queries is this table of the top casinos:
Casino                              # of casinos nearby
Bill's Gamblin' Hall & Saloon       8
Bellagio Hotel and Casino           8
Imperial Palace Hotel and Casino    7
Flamingo Hotel and Casino           7
Harrah's Hotel and Casino           7
Tropicana Hotel and Casino          6
Paris Hotel and Casino              6
Caesars Hotel and Casino            6
Excalibur Hotel and Casino          5


List of nearby casinos for the top 2 [name, distance in meters, location (lat, lon)]:

Bill's Gamblin' Hall & Saloon:
  1. Flamingo Hotel and Casino 63.6570179247 [36.1154373, -115.1723441]
  2. Bellagio Hotel and Casino 105.992429283 [36.1143679, -115.1733477]
  3. Caesars Hotel and Casino 206.917388636 [36.1154496, -115.1743421]
  4. Paris Hotel and Casino 209.970162638 [36.1130181, -115.1725101]
  5. Bally's Hotel and Casino 229.545641554 [36.1143607, -115.1705686]
  6. Imperial Palace Hotel and Casino 334.493011664 [36.1179157, -115.1726557]
  7. Harrah's Hotel and Casino 397.143685884 [36.118481, -115.172568]
  8. Planet Hollywood Hotel and Casino 466.059119841 [36.1109562, -115.1711528]

Bellagio Hotel and Casino:
  1. Bill's Gamblin' Hall & Saloon 109.676718942 [36.114907, -115.1725608]
  2. Caesars Hotel and Casino 144.503466085 [36.1150138, -115.1744618]
  3. Flamingo Hotel and Casino 165.401843786 [36.1155678, -115.1725383]
  4. Paris Hotel and Casino 173.189321322 [36.1130181, -115.1725101]
  5. Bally's Hotel and Casino 280.117818406 [36.1136115, -115.1709408]
  6. Imperial Palace Hotel and Casino 406.50188833 [36.1179157, -115.1726557]
  7. Planet Hollywood Hotel and Casino 447.491259701 [36.1109562, -115.1711528]
  8. Harrah's Hotel and Casino 470.026846719 [36.118481, -115.172568]
Both hotels are located in the very center of the city, at the corner of Las Vegas Boulevard and Flamingo Road. What I couldn't verify in this dataset is that Bill's Gamblin' Hall & Saloon is currently closed.



Known issues:
When picking a single location for a hotel I used the first location in the array (one of the corners); you can see this in the picture above. It should be the center of the building.
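
One possible way to address this issue is to average the corner positions stored in pos_many. This is only a rough sketch: the function name center_of is hypothetical, and a plain vertex average is an approximation rather than a true polygon centroid.

```python
def center_of(points):
    """Approximate center of a way as the mean of its [lat, lon] vertices.

    A closed way repeats its first node at the end, so that duplicate is
    dropped to avoid counting one corner twice. Returns None for no points.
    """
    pts = list(points)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]
    if not pts:
        return None
    lat = sum(p[0] for p in pts) / len(pts)
    lon = sum(p[1] for p in pts) / len(pts)
    return [lat, lon]


if __name__ == "__main__":
    square = [[0.0, 0.0], [0.0, 2.0], [2.0, 2.0], [2.0, 0.0], [0.0, 0.0]]
    print(center_of(square))
```

The result could then be passed as 'near' instead of the first corner.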


Thursday, 13 February 2014

Content-Security-Policy issues with iOS Chrome

A few days ago I had an opportunity to trace an issue with Chrome on iOS not loading a page. The page and all of its resources were downloaded properly, but Chrome kept showing that it was still loading the page. As a result, no 'on load' events were fired. The problem only occurred when reloading the site. Copying the whole content to a local static web server didn't reproduce the issue, so the content was not the problem. I was able to strip the whole page down to a simple 'hello world' page and the problem still existed on the original web server, so I had to look deeper, at the HTTP headers. I created a sample web server to show the problem I found:

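import BaseHTTPServer

class HTTPHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(s):
        # Receives the violation reports sent to report-uri and prints them.
        length = int(s.headers['Content-Length'])
        print length
        data = s.rfile.read(length).decode('utf-8')
        print data
        # Acknowledge the report so the client is not left waiting.
        s.send_response(200)
        s.end_headers()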
    def do_GET(s):
        s.send_response(200)
        s.send_header("Content-Security-Policy", "script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; object-src 'self'; img-src 'self' ; media-src 'self'; frame-src 'self'; font-src 'self' ;connect-src 'self'; report-uri '192.168.43.17/report'")
        s.send_header("Content-Type", "text/html;charset=UTF-8")
        s.end_headers()
        s.wfile.write("<html><head><title>hello</title></head><body><p>hello world %s</p></body></html>"% s.path)
if __name__ == '__main__':
    server_class = BaseHTTPServer.HTTPServer
    httpd = server_class(('192.168.43.17', 80), HTTPHandler)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass

    httpd.server_close()

There is only one unusual element here: the CSP header, which protects the site from cross-site scripting and provides a mechanism for reporting security violations. It looks like Chrome itself reports problems; it violates these directives:
1) frame-src with uri: chromeinvoke://cd931b8a0ca6aaed193d25b429ee4019
"csp-report":{
   "document-uri": "http://192.168.43.17/",
   "referrer": "",
   "violated-directive": "frame-src 'self'",
   "original-policy": "script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; object-src 'self'; img-src 'self' ; media-src 'self'; frame-src 'self'; font-src 'self' ;connect-src 'self'; report-uri '192.168.43.17/report'",
   "blocked-uri": "chromeinvoke://cd931b8a0ca6aaed193d25b429ee4019",
   "source-file": "http://192.168.43.17/",
   "line-number": 1
}
2) connect-src with uri: https://localhost
"csp-report":{
   "document-uri": "http://192.168.43.17/",
   "referrer": "",
   "violated-directive": "connect-src 'self'",
   "original-policy": "script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; object-src 'self'; img-src 'self' ; media-src 'self'; frame-src 'self'; font-src 'self' ;connect-src 'self'; report-uri '192.168.43.17/report'",
   "blocked-uri": "https://localhost",
   "source-file": "http://192.168.43.17/",
   "line-number": 1
}
3) frame-src with uri: chromenull://
"csp-report":{
   "document-uri": "http://192.168.43.17/",
   "referrer": "",
   "violated-directive": "frame-src 'self'",
   "original-policy": "script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; object-src 'self'; img-src 'self' ; media-src 'self'; frame-src 'self'; font-src 'self' ;connect-src 'self'; report-uri '192.168.43.17/report'",
   "blocked-uri": "chromenull://",
   "source-file": "http://192.168.43.17/",
   "line-number": 21
}
4) frame-src with uri: chromeinvokeimmediate://3726692da42473af155b530fe0e48c61
"csp-report":{
   "document-uri": "http://192.168.43.17/",
   "referrer": "",
   "violated-directive": "frame-src 'self'",
   "original-policy": "script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; object-src 'self'; img-src 'self' ; media-src 'self'; frame-src 'self'; font-src 'self' ;connect-src 'self'; report-uri '192.168.43.17/report'",
   "blocked-uri": "chromeinvokeimmediate://3726692da42473af155b530fe0e48c61",
   "source-file": "http://192.168.43.17/",
   "line-number": 2
}
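
Reports like these pile up quickly, so it can help to group them by directive and by the scheme of the blocked URI. The sketch below assumes each POST body is a JSON object with a top-level "csp-report" key; the function name summarize_reports is hypothetical.

```python
import json
from collections import Counter


def summarize_reports(bodies):
    """Count CSP reports by (violated directive name, blocked URI scheme)."""
    counts = Counter()
    for body in bodies:
        report = json.loads(body)["csp-report"]
        # "frame-src 'self'" -> "frame-src"
        directive = report["violated-directive"].split()[0]
        # "chromeinvoke://cd93..." -> "chromeinvoke"
        scheme = report["blocked-uri"].split(":", 1)[0]
        counts[(directive, scheme)] += 1
    return counts


if __name__ == "__main__":
    body = json.dumps({"csp-report": {
        "violated-directive": "frame-src 'self'",
        "blocked-uri": "chromenull://",
    }})
    for key, n in summarize_reports([body]).items():
        print(key, n)
```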

Further investigation showed that:
The issue with reporting internal/plugin URLs is known and has already been reported.
Changing frame-src from 'self' to * solves the page loading issue but lowers security.
Interestingly, when switching from incognito mode to normal mode I can briefly see an iframe.

Tuesday, 4 February 2014

Clustering Udacity forum users

One of the questions I wanted to answer is whether I can cluster users into groups. For clustering I wanted to use k-means.
First I had to prepare a simple export.
The mapper takes the forum and user files and selects the relevant data from them:

import sys
import csv

def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

    for line in reader:
        # Skip the header row of either input file.
        if line[0] == "id" or line[0] == "user_ptr_id":
            continue
        if len(line) == 5:
            # User record: user id followed by reputation and badge counts.
            writer.writerow((line[0], 'A', line[1], line[2], line[3], line[4]))
        else:
            # Forum record: emit the user id (fourth column) once per post.
            writer.writerow((line[3], 'B'))

if __name__ == "__main__":
    mapper()

The reducer outputs each user id along with the user's reputation, badges and post count:

#!/usr/bin/python
import sys
import csv

def reducer():
    oldKey = None
    rep = gold = silver = bronze = '0'
    count = 0
    reader = csv.reader(sys.stdin, delimiter='\t')
    for line in reader:
        if line[1] == 'A':
            # A new user record starts: flush the previous user first.
            if oldKey:
                print('\t'.join([oldKey, rep, gold, silver, bronze, str(count)]))
            oldKey, rep, gold, silver, bronze = line[0], line[2], line[3], line[4], line[5]
            count = 0
        else:
            # 'B' record: one forum post by the current user.
            count += 1
    # Flush the last user.
    if oldKey:
        print('\t'.join([oldKey, rep, gold, silver, bronze, str(count)]))

if __name__ == "__main__":
    reducer()
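
To see what the reducer produces, here is a small in-memory sketch of the same counting logic. It assumes, as the reducer does, that the records arrive sorted so that each user's 'A' record comes before that user's 'B' records. The function name count_posts and the sample rows are made up for illustration.

```python
def count_posts(rows):
    """Yield [user_id, rep, gold, silver, bronze, post_count] per user.

    'A' rows look like [user_id, 'A', rep, gold, silver, bronze];
    'B' rows look like [user_id, 'B'] and stand for one post each.
    """
    current = None
    count = 0
    for row in rows:
        if row[1] == 'A':
            if current is not None:
                yield current + [str(count)]
            current = [row[0]] + list(row[2:6])
            count = 0
        else:
            count += 1
    if current is not None:
        yield current + [str(count)]


if __name__ == "__main__":
    rows = [
        ['1', 'A', '10', '0', '1', '2'],
        ['1', 'B'],
        ['1', 'B'],
        ['2', 'A', '5', '0', '0', '0'],
    ]
    for record in count_posts(rows):
        print('\t'.join(record))
```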

I used Java Modelling Tools (http://jmt.sourceforge.net/) to visualize the k-means clustering, and it looks like we can split our users into 3 clusters:
Cluster 1, 17432 users (99%), red:
Info        Center      Std. Dev.   Kurt.       Skew.
Reputation  111.457E0   273.317E0   330.457E-1  518.127E-2
Gold        270.996E-3  930.424E-3  868.928E-1  717.735E-2
Silver      878.499E-3  244.692E-2  874.737E-1  683.432E-2
Bronze      421.489E-2  613.437E-2  843.112E-1  584.215E-2
Count       823.078E-2  199.274E-1  820.611E-1  723.129E-2

Cluster 2, 157 users, blue:
Info        Center      Std. Dev.   Kurt.       Skew.
Reputation  555.198E1   267.607E1   284.950E-2  166.334E-2
Gold        712.739E-2  919.289E-2  112.543E-1  272.010E-2
Silver      211.210E-1  204.831E-1  512.309E-2  191.835E-2
Bronze      511.529E-1  324.686E-1  260.647E-2  133.968E-2
Count       302.185E0   238.706E0   122.783E-1  252.269E-2

Cluster 3, 18 users, pink:
Info        Center      Std. Dev.   Kurt.        Skew.
Reputation  267.654E2   105.582E2   123.350E-2   143.433E-2
Gold        242.222E-1  307.142E-1  145.243E-2   154.269E-2
Silver      846.111E-1  768.483E-1  -537.560E-3  863.259E-3
Bronze      134.889E0   103.588E0   -104.684E-2  678.600E-3
Count       760.833E0   622.366E0   -159.373E-2  331.745E-3

Plotting those 3 clusters against the two main variables gives this image:
y-axis - number of posts
x-axis - reputation


I can see that most of the users are not active and there is a very small group which helps a lot.
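
The clustering itself was done in JMT. As a rough illustration of what k-means does with the two plotted variables, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the JMT implementation; the sample points and starting centroids are made up, and a real run would probably scale the features first.

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means on 2-D points, starting from the given centroids.

    Returns (centroids, assignment), where assignment[i] is the index of
    the cluster that points[i] ended up in.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared distance).
        assignment = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                              + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Move every centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


if __name__ == "__main__":
    # Hypothetical (reputation, post count) pairs, not taken from the dataset.
    users = [(100, 5), (120, 8), (5000, 300), (5500, 280), (27000, 760), (26000, 800)]
    print(kmeans(users, [(0, 0), (5000, 300), (25000, 700)]))
```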