2015

Continuous integration with Docker on Amazon Elastic Beanstalk

[This post also appears on Medium] After spending long hours getting a fully working Circle CI / Elastic Beanstalk / Docker integration, I thought it would be useful to put my notes down, so that next time I need a similar setup I can save some time and avoid long nights cursing the gods of the (internet) clouds.

We are making the following assumptions:

  • deploy to a single-container Elastic Beanstalk (EB) environment,
  • use Circle as CI engine; I am pretty sure other services (e.g. Travis, Codeship, etc.) offer equivalent functionality,
  • use a private registry to store compiled Docker images: I use Quay.io because it is cheaper than Docker Hub and works just fine for what I need,
  • deploy a Python Django application; however, this is not strictly necessary, as most of the steps apply to any kind of web application.

There is plenty of documentation related to EB and a few samples that can be used as good starting points to understand what EB needs to deploy a Docker image to a single-container environment (e.g. here and here), but they did not seem to cover all the aspects of real-world implementations.

The aim of what follows is to provide a quick step-by-step guide to creating and deploying an app to EB, and to describe some details hidden across different documents on Circle, EB and Docker.

Step 1. Elastic Beanstalk on AWS

Using the AWS Web Console, go to IAM and create a new user, give it full access to EB and take note of its key and secret. Then edit ~/.aws/config on your machine to create a new profile; it should look something like:

[profile eb-admin]
aws_access_key_id = ACCESSKEY
aws_secret_access_key = SECRETKEY

Still in the AWS Web Console, go to Elastic Beanstalk, create a new WEB application and select the following settings:

  • Docker environment,
  • set it to auto-scale,
  • do NOT use the RDS that comes with EB: it is more flexible to set up your own RDS and hook it to EB later,
  • say yes to VPC,
  • set the other options to something reasonable, assuming you have enough basic knowledge of AWS to handle security groups, key-pairs, etc.

Now, back to your machine, create a python virtual environment (using virtualenvwrapper here) and install AWS command line tools:

mkvirtualenv aws
# probably not strictly necessary, but will come in useful later…
pip install awscli
pip install awsebcli

Time to set up EB on the local machine and hook it to the application created in the AWS Web Console! Go to where your web application lives (in my case, where Django's manage.py sits), then go up one folder and run (assuming we run EB on Amazon's EU-West data centre):

eb init --region eu-west-1 --profile eb-admin

the init process will prompt you with a few questions:

  • it should show the EB applications you have created in your AWS account: select the one you want to use from the list, it should match the one just created with the AWS Web Console,
  • all other options should be automagically detected from the online settings.

I guess the same settings can be defined by passing options to the eb init command manually, but I would not recommend it at this stage.

If the wizard does not work (e.g. because you have weird files that trigger the auto detection), go through the init questions choosing the following:

  • skip platform auto detection and choose Docker (if a Dockerfile is found, it probably will not ask for platform details)
  • choose an existing keypair for the ssh configuration

After this step you should have a fresh .elasticbeanstalk directory containing a single config.yml file reflecting the app settings you have just created. Add a line to your .gitignore file, as eb init gets a bit too enthusiastic about ignoring files that you might need to share with your team:

!.elasticbeanstalk/config.yml
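For reference, the generated config.yml is roughly shaped like the sketch below (the values are placeholders matching the setup above; your file may differ slightly depending on the CLI version):

branch-defaults:
  master:
    environment: eb-stg-env
global:
  application_name: MyEBApp
  default_platform: Docker
  default_region: eu-west-1
  profile: eb-admin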

Now you need to tweak AWS a bit to allow EB to deploy by reading a configuration file from S3. Go back to IAM in the AWS Web Console: you should find two newly created roles (have a look at this article for further information):

  • instance role (or EC2 role) is used by EB to deploy and have access to other resources
  • service role is used by EB to monitor running instances and send notifications

Add READ permissions on S3 to the instance role, so EB knows how to fetch the relevant files when deploying. Finally, go to S3: there should be a new bucket with a few EB-related files in it; take note of its name, you will use it later.
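As a sketch, the extra permission attached to the instance role can be as small as the policy below (the bucket name is a placeholder; attaching AWS's managed read-only S3 policy works too):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::bucket-where-EB-deploys-its-stuff",
      "arn:aws:s3:::bucket-where-EB-deploys-its-stuff/*"
    ]
  }]
}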

Step 2. Docker

Create a private repo on Quay.io: I am assuming we are going to run some super secret code that we do not want to host on the public Docker registry, so I am adding the information needed to let Elastic Beanstalk authenticate against the private registry.

  • on Quay.io, create a robot account and assign it write credentials to the private repo
  • download the .dockercfg file associated with the robot account (see the sample below) and put it BOTH in ~ on your local machine (so you can authenticate on Quay.io from the command line) and in your repository’s root (it will be used later by Circle)
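For reference, a .dockercfg is just a small JSON map keyed by registry, roughly like this (the auth value is a base64-encoded robot-account login, shown here as a placeholder):

{
  "quay.io": {
    "auth": "BASE64_ENCODED_ROBOT_CREDENTIALS",
    "email": ""
  }
}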

Now create your Dockerfile (there are plenty of good examples around; I quite like one from Rehab Studio that runs a simple Flask test app with Gunicorn and NGINX, you can find it here) and be sure it contains the following two instructions:

EXPOSE 80
CMD ["your-startup-file.sh"]

your-startup-file.sh is basically something that starts the web server, or web application, or any other service that listens on port 80 and produces some sort of output. In my case it is Supervisor running NGINX and Gunicorn.
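For completeness, a minimal Dockerfile along these lines could look like the sketch below (the base image, requirements file and script path are assumptions to adapt to your project):

FROM python:2.7

# install the application and its dependencies
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt

# EB single-container apps are expected to listen on port 80
EXPOSE 80
CMD ["./your-startup-file.sh"]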

When you are happy with your docker image, push it to Quay.io with

docker push quay.io/my_account/my_image:latest

Now it is time to integrate an automatic test and build on Circle.

Step 3. Circle CI

Open an account on Circle by authenticating with Github and allow it to connect to the Github repo you want to use for continuous integration.

Create a custom circle.yml file in your repository’s root: this file will tell Circle to run tests and deploy your app to EB when committing to release branches. It should look more or less like the following:

machine:
  python:
    version: 2.7.3
  services:
    - docker
dependencies:
  cache_directories:
    - "~/docker"
  pre:
    # used to run deploy script
    - pip install awscli
    # used to authenticate on quay.io
    - cp .dockercfg ~
  override:
    # cache your TESTING docker image on circle CI
    - if [[ -e ~/docker/image.tar ]]; then docker load -i ~/docker/image.tar; fi
    - docker build -f Dockerfile.test -t test ./project_root/
    - mkdir -p ~/docker; docker save test > ~/docker/image.tar
test:
  post:
    # run tests
    - docker run test python runtests.py
deployment:
  staging:
    branch: release/stg
    commands:
      # this script might be more complicated
      # with a few more arguments maybe
      - ./deploy.sh $CIRCLE_SHA1 eb-stg-env

This yml file instructs Circle to run tests on the docker image and, if tests are successful, deploy when committing to the release/stg branch. The interesting aspect of this approach is that deploy.sh can be run (and, more importantly, tested!) locally.
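For example, provided your local machine has a .dockercfg and working AWS credentials, a staging deploy can be simulated from the repository root with something like:

./deploy.sh $(git rev-parse HEAD) eb-stg-env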

Note! You might need private ssh keys to access your repo and build the docker deploy image; this is totally doable by adding keys in the Circle project environment (check this out).

Step 4. Deploy

OK, almost there! Now we need to create the deploy.sh file mentioned in the previous step: it automates building our deploy docker image and putting it somewhere EB can fetch it from, using the AWS command line interface. The steps are fairly straightforward:

  1. build docker image and push to Quay.io
  2. create a Dockerrun file for EB (read details below) and push it to S3
  3. tell EB to create a new application version and deploy it

#!/bin/bash
# deploy.sh

# deploy tag
SHA1=$1

# elastic beanstalk environment to deploy
EB_ENV=$2
APP_NAME="MyEBApp"

# remember the name of the bucket I asked to take note
# above in this post? here is where you use it!
EB_BUCKET="bucket-where-EB-deploys-its-stuff"

# where to put things in the S3 bucket
PREFIX="deploy/$SHA1"

# name of the deploy image on quay.io, tagged with the commit SHA
IMAGE="quay.io/my_account/my_image"
BUILD="$IMAGE:$SHA1"

# store quay.io authentication to S3
aws s3 cp .dockercfg s3://$EB_BUCKET/dockercfg

# build main image
docker build -f Dockerfile.build -t $BUILD .
docker push $BUILD

# replace vars in the DOCKERRUN_FILE
# so that EB knows where to pick things from
DOCKERRUN_FILE="Dockerrun.aws.json"
cat "$DOCKERRUN_FILE.template" \
  | sed 's|<BUCKET>|'$EB_BUCKET'|g' \
  | sed 's|<IMAGE>|'$IMAGE'|g' \
  | sed 's|<TAG>|'$SHA1'|g' \
  > $DOCKERRUN_FILE

aws s3 cp $DOCKERRUN_FILE s3://$EB_BUCKET/$PREFIX/$DOCKERRUN_FILE
rm $DOCKERRUN_FILE

# Create application version from Dockerrun file
echo "creating new Elastic Beanstalk version"
aws elasticbeanstalk create-application-version \
  --application-name $APP_NAME \
  --version-label $SHA1 \
  --source-bundle S3Bucket=$EB_BUCKET,S3Key=$PREFIX/$DOCKERRUN_FILE

# Update Elastic Beanstalk environment to new version
echo "updating Elastic Beanstalk environment"
aws elasticbeanstalk update-environment \
  --environment-name $EB_ENV \
  --version-label $SHA1

So what is this mysterious Dockerrun.aws.json? It is a simple descriptor that tells AWS where to pull the Docker image from, which version to use and which credentials to authenticate with. Below is the template file, where <BUCKET>, <IMAGE> and <TAG> are replaced with live values by the deploy.sh script, and dockercfg tells EB where to find credentials for private docker images.

{
  "AWSEBDockerrunVersion": "1",
  "Authentication": {
    "Bucket": "<BUCKET>",
    "Key": "dockercfg"
  },
  "Image": {
    "Name": "<IMAGE>:<TAG>",
    "Update": "true"
  },
  "Ports": [{
    "ContainerPort": "80"
  }],
  "Logging": "/var/eb_log"
}

Step 5. Tweaks, env variables, etc.

Docker on EB needs environment variables to run properly! You can either set them up directly in EB, or run a script that automates the process using the standard format accepted by the update-environment option of the AWS command line interface. Here is an example of the options format (say it is stored with the name EB-options.txt.template):

[
  {
    "Namespace": "aws:elasticbeanstalk:application:environment",
    "OptionName": "DJANGO_SETTINGS_MODULE",
    "Value": "$DJANGO_SETTINGS_MODULE"
  },
  {
    "Namespace": "aws:elasticbeanstalk:application:environment",
    "OptionName": "SECRET_KEY",
    "Value": "$SECRET_KEY"
  }
]

which can be processed by substituting local environment variables and sending the result to EB:

#!/bin/bash
# update-env.sh
OPTIONS_FILE="EB-options-$(date +%Y-%m-%d:%H:%M:%S).txt"
cat EB-options.txt.template | envsubst > $OPTIONS_FILE
aws elasticbeanstalk update-environment \
  --environment-name $EB_ENV \
  --option-settings file://$OPTIONS_FILE
rm $OPTIONS_FILE
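The script reads $EB_ENV and the substituted variables from the calling shell (envsubst only sees exported variables), so a hypothetical invocation looks like:

export DJANGO_SETTINGS_MODULE="myproject.settings.staging"
export SECRET_KEY="something-secret"
EB_ENV=eb-stg-env ./update-env.sh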

The end!

Quite a bit of a headache, but woohoo! When you see it running on EB, you feel like the god of AWS!

Convert Longitude-Latitude to a flat map using Python and PostGIS

I’ve recently had to develop a web app that shows tweet locations on a map. It’s simple enough to extract a tweet’s location (when present): just check the API docs for a Tweet object and you’ll find a coordinates field that reportedly:

Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude).

The next step is to visualize it on a flat map with the widely accepted Mercator projection. There are a few useful references on StackOverflow and Wolfram that gave me the hints to write these simple python functions:

import math

def get_x(width, lng):
    return int(round(math.fmod((width * (180.0 + lng) / 360.0), (1.5 * width))))

def get_y(width, height, lat):
    lat_rad = lat * math.pi / 180.0
    merc = 0.5 * math.log( (1 + math.sin(lat_rad)) / (1 - math.sin(lat_rad)) )
    return int(round((height / 2) - (width * merc / (2 * math.pi))))

where width and height are the size in pixels of the flat projection. The formulas work fine, and translating the reference from Wolfram into the get_y function was simple enough, but the reasons behind some details of the function found on StackOverflow (e.g. multiplying width by 1.5) seemed a bit arbitrary to me and I was too lazy to find the answers.
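A quick sanity check with London's coordinates on a hypothetical 1024x512 canvas: x should land near the horizontal centre of the map and y well into the upper half, since London sits just west of Greenwich and well north of the equator.

print(get_x(1024, -0.1278))       # longitude of London -> ~512
print(get_y(1024, 512, 51.5074))  # latitude of London  -> ~85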

Turns out my Postgresql database also has the PostGIS extensions installed, so I decided to put them to use. I found that what we usually simply call lng-lat has a formal definition in the WGS84 standard, which maps to PostGIS spatial reference id 4326. On the other hand, the Mercator projection is also a standard transformation, known as EPSG:3785, which maps to PostGIS id 3785 (same id, thank god).

It’s then possible to transform a WGS84 reference to EPSG:3785 by calling PostGIS functions directly in the SQL query:

select
    t.latitude,
    t.longitude,
    ST_X(ST_Transform(ST_SetSRID(ST_Point(t.longitude, t.latitude),4326),3785)) as x,
    ST_Y(ST_Transform(ST_SetSRID(ST_Point(t.longitude, t.latitude),4326),3785)) as y,
    created
from tweet as t

Nice! Just be aware that transforming lng-lat to EPSG:3785 returns points where the axis origin is at the centre of the map, and the boundaries are defined by the standard as -20037508.3428, -19971868.8804, 20037508.3428, 19971868.8804. It’s simple to translate the origin to the top left corner and normalize the size in pixels to obtain the same results as the Python functions above.
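A hypothetical helper doing that translation, using the bounds quoted above:

# EPSG:3785 bounds as defined by the standard
X_MIN, X_MAX = -20037508.3428, 20037508.3428
Y_MIN, Y_MAX = -19971868.8804, 19971868.8804

def to_pixels(x, y, width, height):
    # shift the origin to the top left corner and scale to pixels;
    # the y axis is flipped because screen coordinates grow downwards
    px = (x - X_MIN) / (X_MAX - X_MIN) * width
    py = (Y_MAX - y) / (Y_MAX - Y_MIN) * height
    return int(round(px)), int(round(py))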

uh, one last thing I never managed to permanently store in my brain: LONGITUDE is the X on the map, while LATITUDE is the Y. For me it’s easier to remember by visualizing the equivalence X-Y -> LNG-LAT.


2014

How to fix Django, Apache VirtualHost, mod_wsgi and DJANGO_SETTINGS_MODULE

I've spent countless hours trying to fix Django installations running on legacy Apache servers, usually recurring every few months, a time span long enough to forget how the last fix was done. And for some reason the docs do not mention this crucial feature AT ALL! In the official Django + mod_wsgi documentation page, they don't mention something as irrelevant as THE MAIN DJANGO SETTINGS FILE?

For future memory, here's a solution that makes me happy; hopefully next time it will only be a few minutes' headache (although I already know it won't be the case...)

- copy the default wsgi.py to a new file called apache_wsgi.py

- modify the new apache_wsgi.py so that it reads:

import os
# mod_wsgi is importable only when running under mod_wsgi itself:
# process_group holds the name set by WSGIProcessGroup in the vhost
import mod_wsgi
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project_name.settings.%s" % mod_wsgi.process_group)

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

- now open your virtualhost configuration file and check it contains the following

WSGIScriptAlias / /path/to/your/apache_wsgi.py
WSGIDaemonProcess pick_settings_file_name processes=2 threads=15 python-path=/path/to/virtualenv/lib/python2.7/site-packages:/path/to/django/project
WSGIProcessGroup pick_settings_file_name

- almost there! create a /settings folder as a sibling of settings.py, put an empty __init__.py in it, then move settings.py inside that folder, rename it to common.py and create a new pick_settings_file_name.py with a single line in it

from .common import *

VOILAAAAAAAAAAAAAAA!!!!

just restart your apache and everything will work. In your pick_settings_file_name.py you will have the configuration specific to your environment (i.e. multiple copies of the file for dev and staging: project_dev.py, project_staging.py)
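For instance, a hypothetical project_staging.py would just override whatever differs from common.py:

from .common import *

# staging-specific overrides (values are placeholders)
DEBUG = False
ALLOWED_HOSTS = ['staging.example.com']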

Time for my own AngularJS + Gulp + Browserify boilerplate

tl;dr: just visit my Github repo for a simple boilerplate with basic documentation.

I have found 36,749 examples of boilerplates using a miscellanea of AngularJS, Gulp, Browserify, Grunt, RequireJS and all the usual suspects.

But none of them actually did what I needed, although I reckon it is quite simple: define Angular modules in separate files, so the code is nice and clean and all the various services, directives and controllers are pluggable without massive headaches.

At the same time, having completely ignored the existence of Grunt so far (thanks to the frontend guys who took care of this), I thought it was a good time for a fresh start with the latest and more fashionable Gulp and Browserify.

I ended up creating my own boilerplate (or seed, if you like). It works by defining angular modules:

require('../../vendors/angular/angular');

module.exports = angular.module('boilerplate.controllers', []);

// Just define the list of controllers here while developing the whole app (in this boilerplate, just one):
require('./welcome.js');

using them:

require('../../vendors/angular/angular');

var module = require('./_module_init.js');

module.controller('WelcomeCtrl', ['$scope', function($scope) {
    console.log('welcome controller');
    $scope.greetings = 'Hey!';
}]);

and eventually compiling everything into a single js file:

// gulp and gulp-browserify are required at the top of the gulpfile;
// isProduction is a flag defined elsewhere in it
var gulp = require('gulp');
var browserify = require('gulp-browserify');

gulp.task('scripts', function() {
  return gulp.src('scripts/main/app.js')
    .pipe(browserify({
      insertGlobals: true,
      debug: !isProduction
    }))
    .pipe(gulp.dest('build'));
});
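As a small convenience (not part of the original snippet), a gulp 3 style watch task can rebuild the bundle on every change:

gulp.task('watch', ['scripts'], function() {
  gulp.watch('scripts/**/*.js', ['scripts']);
});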

if anyone is interested, feel free to clone and submit pull requests!

2012

Setting up Django with MAMP on Mac OS X Lion (in ? steps)

I started playing with a Django-Python version of my current project and I needed a testing environment on my local Mac-based machine. The thing is I didn't want to use a dedicated MySQL server just for the Django deployment, as it seems really silly when I already have my quiet MAMP running in parallel with all the other CodeIgniter-PHP based projects.

It's harder than you might expect. I had to go through a painful series of steps to eventually get the whole thing working, and I am taking some notes here so I'll have them handy when I need to do it again; hopefully they will help someone else too. It still needs a "dedicated" copy of MySQL installed just to compile, but it won't be used to actually run the server.

step1. download and install the latest mysql community server from the DMG file, here. Then, from the command line, add the fresh installation to the local path; this will be used to pick up the sources needed to compile the python module. Also add a variable for dynamic library linkage; this will be used when your python scripts are executed, to find the correct library:

export PATH=/usr/local/mysql/bin/:$PATH
export DYLD_LIBRARY_PATH=/usr/local/mysql/lib/

step2. assuming we already have a python virtualenv script in place, create a new virtual environment and activate it:

python virtualenv.py django-mysql
. django-mysql/bin/activate

step3. now we're ready to install django and mysql-python. The latter should compile with a few minor warnings if and only if we have correctly executed step1 (just double check that the paths in the export statements are correct)

pip install django
pip install mysql-python
python
>>> import MySQLdb
>>> print MySQLdb

if everything works fine, the last command above should show the full path of the MySQL module compiled during the installation process

step4. create a django project and configure settings.py to use the MySQL server shipped with MAMP

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'django_test', # Or path to database file if using sqlite3.
        'USER': 'root', # Not used with sqlite3.
        'PASSWORD': 'root', # Not used with sqlite3.
        'HOST': '/Applications/MAMP/tmp/mysql/mysql.sock', # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '', # Set to empty string for default. Not used with sqlite3.
    }
}

take EXTRA CARE to double check that the HOST variable is set to MAMP's socket, hence /Applications/MAMP/tmp/mysql/mysql.sock
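A quick way to double check, assuming MAMP is running with its default root/root credentials:

# the socket only exists while MAMP's MySQL server is running
ls -l /Applications/MAMP/tmp/mysql/mysql.sock
mysql --socket=/Applications/MAMP/tmp/mysql/mysql.sock -u root -p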

step5. in my case (it's really not a frequent occurrence) I also had problems with my locale, as some of the programs I installed had messed around with the environment, so I got the error "unknown locale: UTF-8" as soon as I tried to sync the django db. This is solved by exporting the correct variables, THEN syncing the db

export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LANG=en_US.UTF-8
python manage.py syncdb

and this should be all! Just a final note: this post was assembled by trying things and using different sources to solve each and every problem that appeared along the road, covering the dedicated mysql install, the dynamic library linkage, MAMP's socket and the unknown locale error.

jquery+php regex checker and reverse geo-coding service

It has happened a million times: I had to check a regex in PHP and did not have a handy tool to see the result. I usually linger on Rubular, which is a great service but based on Ruby, so sometimes the results are a bit different [update 2015: just use regex101, best tool around]. I decided to implement a simple regex checker similar to Rubular, but based on PHP's preg_match.

Another service I need to use daily is reverse geo-coding: an easy way to find the latitude and longitude of a specific point on the map. Again, there are tons of services around, but none was really what I needed. So I decided to develop my own tools and share them (you might break them easily: they're just simple tools for manual use, hence there are API limits; security wasn't my focus either, so please don't hit them too hard). I might add a few things from time to time, depending on my needs, but feel free to drop me a line if you either want to contribute or feel something is really missing.

these applications are no longer updated and will be discontinued soon, keeping the old URLs here just for the records

  • http://dev.londondroids.com/tools/index.php/main/preg_match
  • http://dev.londondroids.com/tools/index.php/main/geocode

Visualizing Cluster of Tweets in London during the Royal Wedding

So the little storm has arrived! I am now the proud (and tired) father of little Benjamin, and for this reason stuck at home with little or no time for doing anything but changing nappies and cooking super-proteic food for Val. But somehow I found some time to amuse myself with a little piece of Processing and wrote some simple code to visualize tweets on a map of London during the Royal Wedding. Easy enough to foresee, tweets during the day created nice clusters around Buckingham Palace, Westminster Abbey and the super posh hotel where the Middletons were staying.

Here is the result:

To display this data I reused the information stored during the first phase of my Flux of MEME project, fetched from twitter with the Streaming API implementation in its Java flavour, twitter4j. Processing reads the information in XML directly from the database, hence a little PHP backend provides the XML descriptor for all the post locations.

2011

Configuring NGINX to serve multiple webapps from different directories

A few days ago I had to add a wordpress installation to the same environment where a CodeIgniter app was already running, happily and undisturbed. It took me a while to figure out how to keep separate folders on the filesystem and serve the blog from a subfolder of the main domain: it turned out that the solution is super simple, but apparently I am not the only one who has had similar problems. Symptoms of a bad installation usually include "no input file specified" messages or, even worse, the php source code being served with all your precious database passwords shown in clear text.

So the premise being:

  • the webapps need to live in sibling folders to keep our github repo tidy; in the example below they are named /home/ubuntu/repo/webapp (codeigniter) and /home/ubuntu/repo/blog (wordpress)
  • the main webapp needs to respond to all the requests, while wordpress needs to catch only requests starting with /blog

there might be better and more elegant solutions, but this is working for me, including pretty permalinks on wordpress:

server {
    server_name your.domain.com;

    access_log /home/ubuntu/repo/logs/access.log;
    error_log /home/ubuntu/repo/logs/error.log;

    # main root, used for codeigniter
    root /home/ubuntu/repo/webapp;
    index index.php index.html;

    # links to static files in the main app, mainly for dev purposes as this is
    # unlikely to be triggered when using a CDN with absolute URLs to assets
    location ~* ^/(css|img|js|flv|swf)/(.+)$ {
        root /home/ubuntu/repo/webapp/application/public;
    }

    # most generic (smaller) request
    # most of the times will redirect to named block @ci
    location / {
        try_files $uri $uri/ @ci;
    }

    # create the code igniter path and perform
    # internal redirect to php location block
    location @ci {
        if (!-e $request_filename)
        {
            rewrite ^/(.*)$ /index.php/$1 last;
            break;
        }
    }

    # now the meaty part, execute php scripts
    location ~ \.php {
        include /etc/nginx/fastcgi_params;

        # default path of our php script is the main webapp
        set $php_root /home/ubuntu/repo/webapp;

        # but we might have received a request for a blog address
        if ($request_uri ~ /blog/) {
            # ok, this line is a bit confusing, be aware
            # that path to /blog/ is already in the request
            # so adding a trailing /blog here will
            # give a "no input file" message
            set $php_root /home/ubuntu/repo;
        }

        # all the lines below are pretty standard
        # notice only the use of $php_root instead of $document_root
        fastcgi_split_path_info ^(.+\.php)(/.+)$;

        fastcgi_param PATH_INFO $fastcgi_path_info;
        fastcgi_param PATH_TRANSLATED $document_root$fastcgi_path_info;

        fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param SCRIPT_FILENAME $php_root$fastcgi_script_name;

        fastcgi_pass unix:/var/run/php-fastcgi/php-fastcgi.socket;
        fastcgi_index index.php;
    }

    # now the blog, remember this lives in a sibling directory of the main app
    location ~ /blog/ {
        # again, this might look a bit weird,
        # but remember that root directive doesn't drop
        # the request prefix, so /blog is appended at the end
        root /home/ubuntu/repo;
        if (!-e $request_filename)
        {
            rewrite ^/(.*)$ /index.php/$1 last;
            break;
        }
    }
}

please feel free to add comments and suggestions, hope this helps.

Twitter geo-located clustering and topic analysis, now opensource!

A year has passed since the beginning of the trial of Flux of MEME, the project I presented during the Working Capital tour, and it is now time to analyze what has been learned and show what has been developed, to conclude this R&D phase and deliver the results to Telecom Italia.

the initial idea

It’s worthwhile giving a quick description of the context: Twitter is a company founded in 2006 which has received several rounds of venture capital funding over the past few years, leading to today's valuation of $1.2B; still, during the summer of 2009 the service was not yet as mature and widespread as it may look now. At that time the development of the Twitter API had just started, this probably being one of the few sources, if not the only one, of geo-referenced data. The whole concept of communication in the form of public gossip, mediated by a channel that accepts 140 characters per message, was appearing in the world of social networks for the first time.
This led to the base idea of crunching this data stream, which most importantly includes the geographical source, then summarizing the content so as to analyze the space-time evolution of the concepts described and, ultimately, predict how they could migrate in space and time.

A practical use

It could allow you to monitor and curb the trend of potentially risky situations (in the same way social network analysis has been useful during the recent riots in London), or even define marketing strategies targeted to the local context.

The implementation

A consistent initial phase of research provided an overview of the different aspects involved: the ability to capture the information from Twitter, the structure of the captured data, the ability to obtain geo-located information, the classification of the languages of the tweets, the enrichment of content through discovery of related information, the possible functions for spatial clustering, the algorithms for topic extraction, the definition of views useful for an operator and, finally, the ability to perform trend analysis on the extracted information. All of this resulted in a substantial amount of programming code, its outcome being a demonstrator for the validity of the initial theory.

[Figure: space-time evolution of the concept "earthquake" in a limited subset of data captured during May 2011]

[Figure: distribution of groups of tweet source languages over Switzerland and northern Italy]

The future of the project

The development done so far has had two important results: firstly, it demonstrated the validity of the initial idea; secondly, it revealed the requirements the system needs to be fully functional. The main problem lies in the architecture implemented for the demonstrator, which at the moment relies on a limited amount of data (for obvious reasons of availability of resources): this immediately proved the necessity of scaling up the application environment into a more complex architecture for distributed computing. The market and/or Telecom Italia will eventually decide whether this second phase of development can be faced.


Configuring NGINX and CodeIgniter on Ubuntu Oneiric Ocelot on Amazon EC2

A few days ago I started the server setup for a web project at @therumpusroom_ and, after receiving the traffic estimates, I thought a single Apache server was not enough to handle the expected load of visitors. For several reasons I wanted to avoid using a load balancer and multiple Apache instances, hence the decision to implement Nginx with MySQL running on a separate dedicated server.

The whole infrastructure lives on Amazon Web Services and the web application (still under development) will rely on CodeIgniter. I have read quite a lot of articles online and stolen bits and pieces of configuration files, but none of them entirely reflected what I needed. It seems quite a common configuration, hence I am writing down the required steps and some code snippets, both for my personal records and in the hope it can be helpful for someone else with similar issues.

The premise: implement a CodeIgniter installation on Amazon EC2 with a dedicated DB server and content delivery network for rich media distribution.

Pre-requisites / specs: Ubuntu 11.10 Oneiric Ocelot 64bit with Nginx web server running on a large instance on Amazon EC2, dedicated MySQL server on Amazon RDS and Cloudfront CDN.

The steps:

1. choose your Ubuntu installation

I ended up choosing Oneiric Ocelot 64bit (I am always too tempted to try the latest); anyhow, you can always find your own Ubuntu AMI using the super helpful AMI locator for EC2

2. start a basic NGINX installation

I used this guide on Linode to configure Nginx and PHP-FastCGI on Ubuntu 11.04 (Natty) as a starting point, just be aware of the following:

  • ignore the hostname configuration: it did not work for me and it is not relevant to make the web server work properly
  • start with the suggested config for nginx, but keep in mind you will need to finalize it later

also, the init.d/php-fastcgi script in the Linode guide gave errors and was not working properly for me, so I created a simpler version (you may need to manually create the pid/socket folders before running the script for the first time):

#!/bin/bash

PHP_SCRIPT=/usr/bin/php-fastcgi
PID_DIR=/var/run/php-fastcgi
PID_FILE=/var/run/php-fastcgi/php-fastcgi.pid
SOCKET_FILE=/var/run/php-fastcgi/php-fastcgi.socket
RET_VAL=0

case "$1" in
    start)
      $PHP_SCRIPT
      RET_VAL=$?
  ;;
    stop)
      rm $PID_FILE
      rm $SOCKET_FILE
      killall -9 php5-cgi
      RET_VAL=$?
  ;;
    restart)
      rm $PID_FILE
      rm $SOCKET_FILE
      killall -9 php5-cgi
      $PHP_SCRIPT
      RET_VAL=$?
  ;;
    status)
      echo "php-fastcgi running with PID `cat $PID_FILE`"
  ;;
    *)
      echo "Usage: php-fastcgi {start|stop|restart|status}"
      RET_VAL=1
  ;;
esac
exit $RET_VAL
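Assuming the script is saved as /etc/init.d/php-fastcgi, it can be wired up like this (the mkdir covers the pid/socket folders mentioned above):

sudo mkdir -p /var/run/php-fastcgi
sudo chmod +x /etc/init.d/php-fastcgi
sudo /etc/init.d/php-fastcgi start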

by this time you should be able to execute some test php code, using the default site already enabled, to check that your FastCGI script is working properly and receiving parameters from the web server.

3. setup CodeIgniter

now the interesting part: setting up CodeIgniter with the correct locations is not straightforward. There is an interesting thread on the official CodeIgniter forum pointing the right way, but unfortunately it does not entirely solve the problem.

After downloading CodeIgniter and extracting the archive into the document root, the first important step required to see at least the welcome screen is to set up the configuration file to receive parameters from the web server, under /application/config/config.php

$config['uri_protocol'] = 'REQUEST_URI';

and finally set up the Nginx "virtual host" to serve the correct directories and the path info used by CodeIgniter controllers to receive parameters: in my setup the CodeIgniter application folder also serves the main static contents (under /application/public, with subfolders: css, img, js). I started from a config file found on gist, then tweaked it to reflect my specific needs. Here is the code:

server {
    server_name project.staging.example.com;
    access_log /home/ubuntu/repo/staging/logs/access.log;
    error_log /home/ubuntu/repo/staging/logs/error.log;

    root /home/ubuntu/repo/staging/webdev;
    index index.php index.html;

    location ~* ^/(css|img|js)/(.+)$ {
        root /home/ubuntu/repo/staging/webdev/application/public;
    }

    location / {
        try_files $uri $uri/ @rewrites;
    }

    location @rewrites {
        if (!-e $request_filename)
        {
            rewrite ^/(.*)$ /index.php/$1 last;
            break;
        }
    }

    # deny direct access to CodeIgniter internals
    location ~ ^/(application|system) {
        internal;
    }

    location ~ \.php {
        include /etc/nginx/fastcgi_params;

        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        fastcgi_param PATH_TRANSLATED $document_root$fastcgi_path_info;
        fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fastcgi/php-fastcgi.socket;
        fastcgi_index index.php;
    }
}

this should be all, hope this helps and feel free to drop a line below for questions.

wordpress authentication - redirect after incorrect login

 

[Image: The logo of the blogging software WordPress, via Wikipedia]

A few days ago I had to deploy a simple wordpress installation for a project at @therumpusroom_ with my buddy @twentyrogersc. Super simple, yet it presented a little issue: having a personalized authentication form displayed on a wordpress page, it redirected to the default login screen after incorrect credentials were submitted. It turned out we needed to implement the whole authentication mechanism on a page, tweaking the simple post form to send data to itself instead of submitting to wp-login.php (which, btw, works fine if users provide the correct credentials, and also accepts a redirect parameter to make users land on a pre-determined page).

Just for my records, I leave the code here, hoping to save some time for people who run into the same problem. Here's my solution:

  1. from the dashboard create a page and give it a meaningful permalink, such as log-in
  2. create a php file and give it a name according to the wordpress convention: page-<permalink>.php (in my case: page-log-in.php)
  3. write the php code to perform the whole authentication (hence including password check and signon), this is my code:
    <?php
    	// initialize the flag used by the error message at the bottom
    	$incorrect_login = FALSE;

    	// if user is logged in, redirect wherever you want
    	if (is_user_logged_in()) {
    		header('Location: '.get_option('siteurl').'/how-to');
    		exit;
    	}
    
    	// if this page is receiving post data
    	// means that someone has submitted the login form
    	if( isset( $_POST['log'] ) ) {
    		$incorrect_login = TRUE;
    		$log = trim( $_POST['log'] );
    		$pwd = trim( $_POST['pwd'] );
    
    		// check if username exists
    		if ( username_exists( $log ) ) {
    			// read user data
    			$user_data = get_userdatabylogin( $log );
    
    			// create the wp hasher to add some salt to the md5 hash
    			require_once( ABSPATH.'/wp-includes/class-phpass.php');
    			$wp_hasher = new PasswordHash( 8, TRUE );
    			// check that provided password is correct
    			$check_pwd = $wp_hasher->CheckPassword($pwd, $user_data->user_pass);
    
    			// if password is username + password are correct
    			// signon with wordpress function and redirect wherever you want
    			if( $check_pwd ) {
    				$credentials = array();
    				$credentials['user_login'] = $log;
    				$credentials['user_password'] = $pwd;
    				$credentials['remember'] = isset($_POST['rememberme']) ? TRUE : FALSE;
    
    				$user_data = wp_signon( $credentials, false );
    				header('Location: '.site_url('how-to'));
    				exit;
    			}
    			else {
    				// don't need to do anything here, just print some error message
    				// in the form below after checking the variable $incorrect_login
    			}
    		}
    	}
    
    	// and finally print the form, just be aware the action needs to go to "self",
    	// hence we're using echo site_url('log-in'); for it
    ?>
    <?php get_header(); ?>
    
    	<h2>log in</h2>
    	<form action="<?php echo site_url('log-in'); ?>" method="post" id="login-form">
    		<label for="log">User</label>
    		<input type="text" name="log" id="log" class="text" value="<?php echo wp_specialchars(stripslashes($user_login), 1) ?>" size="20" />
    
    		<label for="pwd">Password</label>
    		<input type="password" name="pwd" id="pwd" class="text" size="20" />
    
    		<label for="rememberme"><input name="rememberme" id="rememberme" type="checkbox" checked="checked" value="forever" /> Remember me</label>
    
    		<input type="hidden" name="redirect_to" value="<?php echo get_option('siteurl'); ?>/how-to" />
    		<input type="submit" name="submit" value="log in" class="button" />
    	</form>
    <?php
    	// incorrect credentials, print an error message
    	if( TRUE == $incorrect_login ) {
    ?>
    		<div class="incorrect_login">Incorrect login details. Please confirm the fields and submit it again,
    		or <a href="<?php echo site_url('contact-us'); ?>">contact us</a> to obtain a set of credentials.</div>
    <?php
    	}
    ?>	
    
    <?php get_footer(); ?>
  4. and finally place this php file where your wordpress theme lives, it's done!

Anonymous functions with PHP / Eclipse

Eclipse is a great IDE, I don't know how I could live without it, but when it comes to PHP development the PDT plugin unfortunately shows some flaws. It may happen that you declare an anonymous function and get a "compiler" error, as shown below:

[Screenshot: fake PHP errors in Eclipse PDT]

just ignore it: the code works perfectly well, so declaring an anonymous function will not cause any problem, you just need to cope with the little red error message until the end of the project (or declare the callback as a named function).

Below is the code declaring the inner function for array_filter:

foreach( $sem_clusters as $sem_cluster ) {
	$terms = explode(';', preg_replace('/\"/', '', $sem_cluster->terms_meta));
	$terms = array_filter( $terms, function($value) { return strlen($value) > 2; });
	$tf_idf = array_merge( $tf_idf, $terms );
}

Crunchbase visualization tool of Venture Firms

Over the past few days a couple of friends asked me more or less the same thing: retrieve a collection of venture firms and visualize a simple subset of important data, such as geo location, average investment and a short description. I hence decided to spend a little time with some PHP/Javascript and implemented a simple application.

The application crawls pages from Crunchbase to retrieve a set of financial organizations, then updates their locations using the Crunchbase API. Details on each organization are fetched through an AJAX request to Crunchbase's API when a marker on the map is clicked. A vector graph is then displayed, showing the list of the firm's investments.

The application uses Raphael JS to create the SVG graph, and therefore worked on Firefox and Safari only [EDIT: it was later updated to the google chart api to maximize browser compatibility and is now fully compliant].

2010

AIS & ONAV Wine tasting application for Android - ALPHA

I started a personal project months ago, aimed at building a web and mobile infrastructure for wine tasting notes. During the same period I quit my old job, moved from my old flat in Florence to my wife's in London, found a new flat and a new job, moved again to the new flat, changed computer (the old one went back to the lab where I used to work) and started a couple of new projects, mentioned in my previous posts.

I think that's enough to justify why my original project (for which I found what I think is a cute name: "Cellarium") is currently on stand-by. Still, I think it's a real shame, mainly for two reasons:

  1. there are no official AIS and/or ONAV wine tasting applications for mobiles in the cloud (or at least I didn't find them)
  2. being now in London, I must say that Italian wines are underestimated and really hard to find (no, I'm not talking about the crap you usually pick up around)

I have thus decided to put everything on github; if a brave volunteer fancies a contribution, the source code can easily be downloaded here: http://github.com/grudelsud/com.londondroids.cellarium

If you are brave enough to read through the rest of the post, you will discover the following:

  • the code is completely undocumented, shame on me, but I didn't have time to do anything else; it's just there and you can use it at your own risk;
  • wines and wine tasting notes are stored in a sqlite database on the mobile device, and I don't even remember its structure; the only way to see how things are done is to take a look at the two inner classes defined inside CellariumProvider.java
  • saving and updating wines works fine, while wine notes still need a lot of coding, both from a data persistence perspective and in the interface implementation.

The application is intended to help while writing down wine tasting notes for both AIS and ONAV standards, as depicted in the following.

I'd be happy to help/contribute and finish the application, volunteers appreciated. If no one interested in this, I'll probably finish it when I'll retire, using Android v.714.23 on a nuclear-powered-artificial-intelligence-smartphone produced in Mars by Cyberdyne systems.

Particle Animations with Flint

In the last few weeks I have been working on a project that involves the development of a particle animation system. The project is still under development and we are working under NDA: for this reason, unfortunately, I cannot post any details about it yet, even though I will as soon as it goes live.

What I can definitely do is show the tests I developed using Flint, the particle animation system created by @Richard_Lord, and give a brief explanation of the code, since I saw that a lot of people are having trouble with its documentation, which is, to say the least, a little sparse.

Let's show the results first: it is a tweak of the logo tweener example found on the website.

[swf src="http://tom.londondroids.com/wp-content/uploads/2010/05/ParticleTest.swf" width=400 height=400 version=10]

It was not rocket science at all, since it was only a matter of tweaking a ready-made example; still, I noticed from the forum that loads of people got stuck when introducing changes, and I had to dig deeper into some aspects related to EmitterEvents, ParticleEvents and easing.

I wanted to make extensive use of the TweenToZone action for a series of bitmaps which had to be displayed in the .swf, so I decided it was better to create a class and reuse it when needed. The emitter can also reuse existing particles instead of blasting a complete new set of them; the code is the following.

public class Tweener extends Emitter2D
{
	public function Tweener( init:Boolean, bitmapFrom:Bitmap, bitmapTo:Bitmap, particles:uint, lifetime:uint, color1:uint, color2:uint )
	{
		if( init ) {
			counter = new Blast( particles );
			addInitializer( new ColorInit( color1, color2 ) );
			addInitializer( new Position( new BitmapDataZone( bitmapFrom.bitmapData, 0, 0 ) ) );
		}

		addInitializer( new Lifetime( lifetime ) );
		addAction( new Age( Elastic.easeInOut ) );
		addAction( new TweenToZone( new BitmapDataZone( bitmapTo.bitmapData, 0, 0 ) ) );
	}
}

Be aware of the fact that the Age action uses the easing function Elastic.easeInOut, and it works properly if, and only if, the imported package is org.flintparticles.common.energyEasing.*

Another tricky problem was creating a loop of tweens without necessarily creating a listener function for each emitter, like the example shown on the website. Moreover, I did not figure out an elegant way to retrieve the instance of the emitter that fired the ParticleDead event. For this reason I decided to use the EmitterEvent EMITTER_EMPTY and tweak the listener to restart the correct emitter as soon as its execution ends. The code is thus easily extendable, using arrays, to an indefinite number of emitters and related events.

public function ParticleTest()
{
	stage.scaleMode = StageScaleMode.NO_SCALE;
	stage.align = StageAlign.TOP_LEFT;

	super();

	_renderer = new PixelRenderer( new Rectangle( 0, 0, 400, 400 ) );
	_renderer.addFilter( new ColorMatrixFilter( [ 1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0.92,0 ] ) );
	addChild( _renderer );

	runParticleDemo();
}

private function runParticleDemo() : void {

	var textBmp:Bitmap = new text();
	var imageBmp:Bitmap = new image();

	_loop = 0;

	_emitterLoop0 = new Tweener(true, textBmp, imageBmp, 8000, 9, 0xffff0098, 0xffffffff);
	_emitterLoop1 = new Tweener(true, imageBmp, textBmp, 8000, 9, 0xffff0098, 0xffffffff);

	_renderer.addEmitter( _emitterLoop0 );
	_renderer.addEmitter( _emitterLoop1 );

	_emitterLoop0.addEventListener( EmitterEvent.EMITTER_EMPTY, loopParticleEv );
	_emitterLoop0.start( );
}

public function loopParticleEv( ev:EmitterEvent ): void
{
	switch( _loop % 2 ) {
	case 0:
		_emitterLoop0.removeEventListener( EmitterEvent.EMITTER_EMPTY, loopParticleEv );
		_emitterLoop0.stop();
		_emitterLoop1.addEventListener( EmitterEvent.EMITTER_EMPTY, loopParticleEv );
		_emitterLoop1.start();
		break;
	case 1:
		_emitterLoop1.removeEventListener( EmitterEvent.EMITTER_EMPTY, loopParticleEv );
		_emitterLoop1.stop();
		_emitterLoop0.addEventListener( EmitterEvent.EMITTER_EMPTY, loopParticleEv );
		_emitterLoop0.start();
		break;
	}
	_loop++;
}

As should be clear from the code, listeners are removed from the emitter that has ended its execution and subsequently added to the emitter that is about to start. A global variable named _loop takes care of switching to the proper emitter initialization function.