A little less conversation and a little more action in this post. I wanted a flexible way to define simple charts for small HBase tables. Using HBase for small data might sound crazy, but by doing so we can take advantage of its flexible NoSQL schema. So let's exploit HBase's REST service (see also here) for this. First of all, we have to launch the service with the following command:
[cloudera@localhost ~]$ hbase rest start -ro -p 9998
14/08/06 14:49:33 INFO util.VersionInfo: HBase 0.94.6-cdh4.4.0
....

That starts the HBase REST server in read-only mode, serving at port 9998. This service is started by default in some distributions like HDP2. Once the service is started, the idea is to define a mapping from service responses to some charts. For that we might use matplotlib combined with Flask like this:
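import io

from flask import Flask, make_response
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

app = Flask(__name__)
values = [2, 1, 0, 2, 5, 7]  # sample data

@app.route('/barchart.png')
def plotOne():
    fig = Figure()
    axis = fig.add_subplot(1, 1, 1)
    axis.bar(range(len(values)), values)
    canvas = FigureCanvas(fig)
    # print_png writes bytes, so render into a binary buffer
    output = io.BytesIO()
    canvas.print_png(output)
    response = make_response(output.getvalue())
    response.mimetype = 'image/png'
    return response

But I wanted something simpler, and I found Pygal: it offers nice SVG charts with a high-level interface, some fancy animations, and Flask integration. The first chart is easy as pie(chart):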
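import pygal
from flask import Flask
from pygal.style import DarkSolarizedStyle

app = Flask(__name__)
values = [2, 1, 0, 2, 5, 7]

@app.route('/barchart.svg')
def graph_something():
    bar_chart = pygal.Bar(style=DarkSolarizedStyle)
    bar_chart.add('Values', values)
    # render_response() wraps the SVG in a Flask response
    return bar_chart.render_response()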
Now, with some creative URL routing in Flask, we can define moderately complex graphs just in the URL, thanks to the suffix globbing of HBase's REST service and a simple HTML table in the Jinja2 template. Auto-refresh is obtained simply with a <meta http-equiv="refresh" content="{{refresh_rate}}"> element in the template. So we get a page of charts for the URL http://localhost:9999/hbase/charts/localhost:9998/test_hbase_py_client/width/1500/cols/2/refresh/500/bar/Sites%20Visited/visits/bar/Info/info/keys/* , assuming a table created in the hbase shell as follows:
create 'test_hbase_py_client', 'info', 'visits'
put '${TABLE_NAME}', 'john', 'info:age', 42
put '${TABLE_NAME}', 'mary', 'info:age', 26
put '${TABLE_NAME}', 'john', 'visits:amazon.com', 5
put '${TABLE_NAME}', 'john', 'visits:google.es', 2
put '${TABLE_NAME}', 'mary', 'visits:amazon.com', 4
put '${TABLE_NAME}', 'mary', 'visits:facebook.com', 2
list
scan '${TABLE_NAME}'
exit
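Going back to the chart URL above, here is a minimal sketch of how such a path could be split into page settings, chart definitions and a row-key pattern. It is not the routing code of the actual service: parse_chart_path and rest_url are hypothetical helpers that only mirror the layout of the example URL, and the /<table>/<row pattern>/<column family> form of the REST request is my assumption about how the globbing is used.

```python
from urllib.parse import unquote


def parse_chart_path(path):
    """Split a chart URL path into settings, chart specs and a row-key pattern.

    Hypothetical helper: it mirrors the layout of the example URL only.
    """
    # Split first and unquote afterwards, so an escaped '/' in a title survives.
    parts = [unquote(p) for p in path.strip("/").split("/")]
    if len(parts) < 4 or parts[:2] != ["hbase", "charts"]:
        raise ValueError("not a chart path: %r" % path)
    spec = {"server": parts[2], "table": parts[3], "charts": []}
    i = 4
    # Page settings come as name/value pairs in a fixed order.
    for name in ("width", "cols", "refresh"):
        if parts[i:i + 1] != [name] or i + 1 >= len(parts):
            raise ValueError("expected %s/<number> in %r" % (name, path))
        spec[name] = int(parts[i + 1])
        i += 2
    # Each chart is a <kind>/<title>/<column family> triple, up to 'keys'.
    while i < len(parts) and parts[i] != "keys":
        kind, title, family = parts[i:i + 3]
        spec["charts"].append({"kind": kind, "title": title, "family": family})
        i += 3
    if i + 1 >= len(parts):
        raise ValueError("missing keys/<pattern> in %r" % path)
    spec["keys"] = "/".join(parts[i + 1:])
    return spec


def rest_url(spec, family):
    """Build the HBase REST URL for one column family (assumed URL layout)."""
    return "http://%s/%s/%s/%s" % (
        spec["server"], spec["table"], spec["keys"], family)


if __name__ == "__main__":
    example = ("/hbase/charts/localhost:9998/test_hbase_py_client"
               "/width/1500/cols/2/refresh/500"
               "/bar/Sites%20Visited/visits/bar/Info/info/keys/*")
    parsed = parse_chart_path(example)
    for chart in parsed["charts"]:
        print(chart["title"], "<-", rest_url(parsed, chart["family"]))
```

Keeping the parsing in a plain function, separate from the Flask route, makes it easy to try out on URLs without starting either service.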
The main idea for the mapping into a bar chart is that each HBase row corresponds to a group of bars (one color in the chart). Given a column family, the column qualifiers in that family become the labels on the x-axis, while the cell values become the heights on the y-axis. If several rows are specified, all the bar groups are displayed together, with a different color per row key.
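The mapping just described can be sketched roughly as follows. This is an illustration, not the code from the repo: decode_rows, to_bar_series and render_bar are hypothetical names, and the sketch assumes the REST service was asked for JSON and answers with the usual shape, a "Row" list whose "key", "column" and "$" fields are base64-encoded. It also assumes the cell values are numbers stored as text, as in the sample table.

```python
import base64
import json


def _b64(text):
    """Decode one base64 field of an HBase REST JSON response."""
    return base64.b64decode(text).decode("utf-8")


def decode_rows(payload):
    """Turn a REST JSON payload into {row_key: {qualifier: value}}.

    Assumes all cells belong to the single column family that was requested.
    """
    rows = {}
    for row in json.loads(payload).get("Row", []):
        cells = {}
        for cell in row.get("Cell", []):
            # Column names look like 'family:qualifier'; keep the qualifier.
            _family, _, qualifier = _b64(cell["column"]).partition(":")
            cells[qualifier] = float(_b64(cell["$"]))
        rows[_b64(row["key"])] = cells
    return rows


def to_bar_series(rows):
    """Return (x_labels, series): qualifiers on the x-axis, one series per row."""
    x_labels = sorted({q for cells in rows.values() for q in cells})
    # A row without a given qualifier gets None, i.e. no bar at that position.
    series = {key: [cells.get(q) for q in x_labels]
              for key, cells in rows.items()}
    return x_labels, series


def render_bar(title, rows):
    """Render the rows as an SVG bar chart, one color per row key."""
    import pygal  # imported lazily so the helpers above work without pygal
    x_labels, series = to_bar_series(rows)
    chart = pygal.Bar(title=title)
    chart.x_labels = x_labels
    for key in sorted(series):
        chart.add(key, series[key])
    return chart.render()
```

For the sample table and the visits family, to_bar_series would yield the labels amazon.com, facebook.com and google.es, with one series for john and one for mary.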
For a more elaborate example, take a look at this simple Spark Streaming program (so simple it would be called a script if it were written in Python...), which populates an HBase table with a one-minute sliding window containing the Twitter mention count for some musicians.
As usual, you can find all the code for the post in my GitHub repo, where you can see that the chart service is a single Python script. Now all that is left is extending the Python service to cover all the different types of Pygal charts, and calling a web designer so the chart page stops looking like a web page from the dotcom era.
We are hiring!
If you have enjoyed this post, you are interested in Big Data technologies, and you have solid experience as a Java developer, take a look at this open position at my company.