Creating an application for checking lottery numbers – Design, Modelling and Code creation

In Brazil we have a lottery that pays millions of reais / dollars twice a week.  This value is paid for the winner that got all 6 numbers (out of 60) correctly.  We can buy a ticket with the numbers we choose (or random numbers) for up to 8 draws, what means that we can win at any draw within the next 30 days or so.

But they also pay money if you got 4 or 5 numbers drawn.  It can be a tedious process to check if you got any number right, since there are 15 combinations for 4 numbers, 6 combinations for 5 numbers and a single combination for all the 6 numbers, with a total of 22 paying combinations.

In the first version of my code, I am checking all those 22 combinations automatically for a user that plays only 6 numbers (yes, we can play more numbers, up to 15, with increasing costs).  But how to be sure that this is done in an efficient manner?

I started with the process that I usually start any application:

  • What I want to be done?
  • What interfaces will I have to work with the user?  (e.g. command line, graphical interface, web interface, etc.)
  • What interfaces will I have to work with the data? (e.g. one single app accessing the data, more than one app, web app, external access, just internal access, etc.)
  • Where do I want things done? (e.g. and the frontend code, at backend code, at database, in the server, etc.)
  • etc.

This makes me think and define some simple rules that will be the foundations of my API and that will guide my code.

Most of the time, I do not have answers to all those questions and need to play on the safe side.  The usual answers — and something I have to be very careful with since it is already a bias — are:

  • <something>
  • Web Interface
  • A web app that might have external parties consuming the informaiton from it
  • Database, browser, backend — in that order.

With that in mind, I started the development of the database based on the data that the lottery provides and that I thought would be useful for some statistical analysis.

This would be analyzing the customer’s requirements (myself, in this case) and modelling the database to be able to represent that information in a manner that will be able to answer the basic questions from the customer.

The second step would be analyzing indices and other database specific questions that will allow me to deal with the volume of data, the type of queries that will be performed, the number of concurrent accesses, the type of concurrent queries, etc.  It is the work of a DBA to plan that.

The third step is creating the code.  The first question is: where?

As I wanted to try it out fast, and the situation is not all that complicated, I could code it inside the database itself (PostgreSQL allows me to code very complex things inside of it, in many different programming languages, so this is one of the reasons why I must be careful to not have everything inside the database itself).  The next question becomes: in which language?

The fastest language is plain SQL, then plpgsql, and then plpython (that is Python embedded on the database). My first thought, out of convenience was Python.  Then, I do not needed a full set of it to justify requiring plpython to run a simple query, so I moved to plpgsql.  But, this meant that I was not generating the combinations dynamically, so why not trying to go to plain SQL?  And that is what I ended up doing.

The following code helped me to get all possible combinations for 4 and 5 numbers out of 6 possible numbers:

>>> a=itertools.combinations([1, 2, 3, 4, 5, 6], 4)
>>> for x in a:
...    print x
...

and

>>> a=itertools.combinations([1, 2, 3, 4, 5, 6], 5)
>>> for x in a:
...    print x
...

This I copied onto my SQL code, applied some magic text replacement commands and ended up with my SQL function in PostgreSQL:

CREATE OR REPLACE FUNCTION f_v_is_possible_result(
       i_dezena1 INTEGER,
       i_dezena2 INTEGER,
       i_dezena3 INTEGER,
       i_dezena4 INTEGER,
       i_dezena5 INTEGER,
       i_dezena6 INTEGER)
       RETURNS SETOF INTEGER
AS $_$
SELECT contest_number
       FROM megasena
       WHERE
-- Checking four numbers
        ARRAY[$1, $2, $3, $4] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $3, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $3, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$2, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$2, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$2, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$2, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
-- Checking five numbers
        ARRAY[$1, $2, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $2, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$1, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
        ARRAY[$2, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
-- Checking six numbers
        ARRAY[$1, $2, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6]
       ORDER BY contest_number;
$_$ LANGUAGE SQL;

In this particular case, I exploited some advantages from PostgreSQL and worked with array comparisons to check any of the possible combinations, since the number of conditions was small.  If I had been working with all 15 numbers that one can play, then I would have to dynamically generate that in the code.

The “<@” operator working with arrays means “is contained in”, so I check that the combinations from the left side are contained in the array of drawn numbers.  “dezena1” to “dezena6” are the drawn numbers — their column names, actually –, already ordered in ascending order (even though with arrays on PostgreSQL this ordering does not matter, the only requirement is that all elements from the left side are available on the right side to be a successful match).

This approach, as I mentioned before, would not work with all the 15 numbers since there are 1365 combinations for 4 numbers out of 15, 3003 combinations for 5 numbers out of 15 and 5005 combinations for 6 numbers out of 15, adding up to 9373 possible combinations.  Even with magic text replacement, this would mean a lot of work and a huge code for a SQL function.  Not the best idea at all…  In this case, I would go with a Python function.

With that being done, now I can move on to the next step: the web interface building.

As with the latest applications, I will be using ExtJS and grids.  The numbers from my bet will be the filters and this function above will help me limiting the number of lines I will be showing on the grid.

Around version 2 or 3, I will be changing it to allow up to 15 numbers to be played, with a minimum of 6 numbers.  But, then, it is another project, with another design phase, decision making, etc.  I will try remembering to post about it here when that time comes (and if I get there, since I never play more than 6 numbers anyway…).

Why this post?  Because the process is the same for every application and because this is a somewhat common problem.  Using arrays and their special operators shows up as an elegant solution to a problem.

Happy hacking and wish me luck with my bets. 🙂

 

UPDATE1:. I won a lesser prize.  Found out with the above code. About US$ 300, last week.

UPDATE2:. After a series of interactions with a colleague, I changed the above code to the one below:

CREATE OR REPLACE FUNCTION f_v_is_possible_result(
       i_dezena1 INTEGER,
       i_dezena2 INTEGER,
       i_dezena3 INTEGER,
       i_dezena4 INTEGER,
       i_dezena5 INTEGER,
       i_dezena6 INTEGER,
       i_matches INTEGER = 4,
       OUT contest_number INTEGER,
       OUT numbers INTEGER[],
       OUT quantity INTEGER)
       RETURNS SETOF RECORD
AS $_$
WITH numbers AS (SELECT contest_number, ARRAY_INTERSECT (ARRAY[$1, $2, $3, $4, $5, $6],
                                                   ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6]) AS intersect
                 FROM megasena 
                 ORDER BY contest_number)
     SELECT numbers.contest_number,
            numbers.intersect AS numbers,
            ARRAY_LENGTH(numbers.intersect, 1) AS quantity
     FROM numbers
     WHERE ARRAY_LENGTH(numbers.intersect, 1) >= $7;
$_$ STABLE LANGUAGE SQL;

The “WITH” clause will allow me to process the information more than once while preventing the query from being repeated at every time I use it.

The “i_matches” parameter allow me to show cases where there was no prize won or filter out specific types of prizes (4, 5 or 6 numbers matching).

The new code also shows me which numbers matched.  Here’s the new output:

megasena=# select * FROM f_v_is_possible_result(03, 11, 20, 49, 51, 59);
 contest_number |   numbers    | quantity 
----------------+--------------+------------
           1335 | {3,20,51,49} |          4
(1 row)

megasena=# select * FROM f_v_is_possible_result(03, 11, 20, 49, 51, 59, 3);
 contest_number |   numbers    | quantity 
----------------+--------------+------------
             56 | {20,51,59}   |          3
            217 | {3,51,49}    |          3
            296 | {3,20,51}    |          3
            548 | {51,49,59}   |          3
            658 | {11,3,20}    |          3
            857 | {3,49,59}    |          3
           1034 | {3,20,49}    |          3
           1035 | {11,51,59}   |          3
           1057 | {3,49,59}    |          3
           1192 | {20,49,59}   |          3
           1247 | {11,3,51}    |          3
           1335 | {3,20,51,49} |          4
(12 rows)

megasena=#

My prize was on contest number 1335.

Using matplotlib / pylab without a DISPLAY

I’ve been having issues with a web system I coded for a while now and couldn’t find a solution that worked to my satisfaction to eliminate the need of a Xvnc running to provide a virtual DISPLAY to run some commands to plot data using pylab.

Finally, today, I found the solution here and I want to share it.

import matplotlib
matplotlib.use('Agg') # Must be before importing matplotlib.pyplot or pylab!
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot(range(10))
fig.savefig('temp.png')

The solution, here, is the Agg option as the “device” to be used to plot files.

Solution in place, and I can now eliminate the requirement for Xvnc from my code and all the issues that came with it are also automatically gone.

So, here’s the code for Pylab:

import matplotlib
matplotlib.use('Agg') # Must be before importing matplotlib.pyplot or pylab!
import pylab

pylab.plot(range(10))
pylab.savefig('temp.png')

Enjoy!

Implementing and Monitoring

Most of us, people from IT, forget about the post-implementation process.

It isn’t just supporting the application, but providing means to monitor the system health.

Lets say you deploy a web system and for that you have to install the webserver, the database server and your own code. You guarantee that your system will work based on several tests you have coded and perharps some techniques to restart the application in case it fails or reach some limits. But what about when your application stops because of the web server? Or the database server?

If the customer has a team to support them, it is understandable that it is this team’s responsibility to install, monitor and keep the infrastructure needed by your software. If they don’t, a failure on any of those components is a failure on the functionality you provided them.

You can make agreements, draw the component structure, etc. but when things fail, it is your software that has become unusable to them.

Other than that, as time passes, disks fill up, hardware starts showing issues (some of them are SMART – hint! – enough to provide you with alerts) and that dedicated server to a single task is not “so dedicated” anymore (again, if you manage the infrastructure you can have some control over it, otherwise their IT will have to take care of what goes where for them).

One great thing that can be done – and should be done – is documenting certain thresholds on your system and providing alerting and monitoring tools to the customer. Put that in written form and say that when certain threshold is reached they should take some corrective or preventative actions (automate it if you can!) and for other components you can show them that new hardware is needed.

We are so focused and worried with delivering things that we forget about keeping it running.

Putting things on the cloud helps reducing some of your concerns, but shouldn’t be an excuse for not monitoring and reporting things.

A last reason to monitor: know your system requirements. Do you know how much memory your system requires with 10 simultaneous users? With 50? What about 1000? What is the required hardware to recommend your customer to upgrade to?

Monitor. Report. Follow on.

Serendipity

Just a quick note on something I found out by accident (see Serendipity at Wikipedia):

  • if I don’t change the date of the draft that I started writing with org2blog, then the publishing date becomes that of the draft

It is interesting that I can change the date of the publication by simply changing something on my own copy of the post. And it is more interesting that this change reflects on the archives and publishing dates that are visible on the main page of the website.

After some time, updates!

I’ve been doing a lot of things lately.

After my annual leave was over, I got back to work and had to deal with the huge backlog. It doesn’t matter if another person works on keeping your things up to date, you still have to go there and review items that need some follow up, take notes of processes that changed, etc.

I have also been working on developing some global strategies for my projects. This takes a huge effort and a lot of time.

At home, I have been playing with Emacs, Python, ExtJS and of course reading email and news.

I have found a very interesting article on Emacs keybinds. It is worth reading it. If not for the knowledge, at least for the templates.

I still have to “play” with Protovis. It looks like a great library for charts on the web.

I also had issues – again – with some magazine subscription that I wanted cancelled after they billed my credit card on an automatic renewal. I got reimbursed and they lost a customer that is now telling everybody how they didn’t stick to our agreement of not having the subscription automatically renewed.

I had my performance evaluation last week. Lots of things to think about, other things to keep tracking. Overall, a good thing.

Looking for new challenges as well…

As you can see, there are too many things going on.

And I still want to find some time to go back playing with electronics and microcontrollers, after all, I’m an engineer. 🙂

Ext.extend and Conquer

Today it was a very productive day.

I’ve redesigned several problematic screens of an application into more functional versions, working much faster, providing more information to the end user and – the best thing – with a very consistent appearance on their layout.

ExtJS provides some nice features to standardize its components by extending its standard classes.

The simplest way to do that is like what I show on the code snippet below. It simply provides some default values – that can be easily overriden when the custom component is instantiated – that help with the implementation of sets of components.

var myCustomCombo = new Ext.extend(
    Ext.form.ComboBox,
    {
        width: 180,
        mode: "local", // I load the data manually from the store.
        triggerAction: "all",
        emptyText: 'Choose an option...',
        autoSelect: false,
      editable: false
    }
);
Ext.reg('mycustomcombo', myCustomCombo);

I did custom implementations of panels, forms, combo boxes and grids. With that, I make only slightly variations from one case to the other – imagine addresses, where you have several different components from the country up to the zip code for a street, avenue, square, etc.

The new layout also made it possible to eliminate different pages and consolidate all the information on a single page where all records could be browsed, edited, removed or new records could be added to the database.

On the backend – my TurboGears application – my code also shrunk considerably as I didn’t have to make some convoluted queries to get the information. All modules were better decoupled and obtaining the information using AJAX – or better AJAJ since I don’t use XML but use JSON – much easier through a cleaner interface.

Summing up: it was a complete redesign of a whole module in a day. Some hundreds of lines of code eliminated and functionality added to the system.

I hope that tomorrow the day becomes as productive as today.