In Brazil we have a lottery that pays millions of reais / dollars twice a week. This value is paid for the winner that got all 6 numbers (out of 60) correctly. We can buy a ticket with the numbers we choose (or random numbers) for up to 8 draws, what means that we can win at any draw within the next 30 days or so.

But they also pay money if you got 4 or 5 numbers drawn. It can be a tedious process to check if you got any number right, since there are 15 combinations for 4 numbers, 6 combinations for 5 numbers and a single combination for all the 6 numbers, with a total of 22 paying combinations.

In the first version of my code, I am checking all those 22 combinations automatically for a user that plays only 6 numbers (yes, we can play more numbers, up to 15, with increasing costs). But how to be sure that this is done in an efficient manner?

I started with the process that I usually start any application:

- What I want to be done?
- What interfaces will I have to work with the user? (e.g. command line, graphical interface, web interface, etc.)
- What interfaces will I have to work with the data? (e.g. one single app accessing the data, more than one app, web app, external access, just internal access, etc.)
- Where do I want things done? (e.g. and the frontend code, at backend code, at database, in the server, etc.)
- etc.

This makes me think and define some simple rules that will be the foundations of my API and that will guide my code.

Most of the time, I do not have answers to all those questions and need to play on the safe side. The usual answers — and something I have to be very careful with since it is already a bias — are:

- <something>
- Web Interface
- A web app that might have external parties consuming the informaiton from it
- Database, browser, backend — in that order.

With that in mind, I started the development of the database based on the data that the lottery provides and that I thought would be useful for some statistical analysis.

This would be analyzing the customer’s requirements (myself, in this case) and modelling the database to be able to represent that information in a manner that will be able to answer the basic questions from the customer.

The second step would be analyzing indices and other database specific questions that will allow me to deal with the volume of data, the type of queries that will be performed, the number of concurrent accesses, the type of concurrent queries, etc. It is the work of a DBA to plan that.

The third step is creating the code. The first question is: where?

As I wanted to try it out fast, and the situation is not all that complicated, I could code it inside the database itself (PostgreSQL allows me to code very complex things inside of it, in many different programming languages, so this is one of the reasons why I must be careful to not have everything inside the database itself). The next question becomes: in which language?

The fastest language is plain SQL, then plpgsql, and then plpython (that is Python embedded on the database). My first thought, out of convenience was Python. Then, I do not needed a full set of it to justify requiring plpython to run a simple query, so I moved to plpgsql. But, this meant that I was not generating the combinations dynamically, so why not trying to go to plain SQL? And that is what I ended up doing.

The following code helped me to get all possible combinations for 4 and 5 numbers out of 6 possible numbers:

>>> a=itertools.combinations([1, 2, 3, 4, 5, 6], 4)
>>> for x in a:
... print x
...

and

>>> a=itertools.combinations([1, 2, 3, 4, 5, 6], 5)
>>> for x in a:
... print x
...

This I copied onto my SQL code, applied some magic text replacement commands and ended up with my SQL function in PostgreSQL:

CREATE OR REPLACE FUNCTION f_v_is_possible_result(
i_dezena1 INTEGER,
i_dezena2 INTEGER,
i_dezena3 INTEGER,
i_dezena4 INTEGER,
i_dezena5 INTEGER,
i_dezena6 INTEGER)
RETURNS SETOF INTEGER
AS $_$
SELECT contest_number
FROM megasena
WHERE
-- Checking four numbers
ARRAY[$1, $2, $3, $4] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $3, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $3, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$2, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$2, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$2, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$2, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
-- Checking five numbers
ARRAY[$1, $2, $3, $4, $5] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $3, $4, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $3, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $2, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$1, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
ARRAY[$2, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6] OR
-- Checking six numbers
ARRAY[$1, $2, $3, $4, $5, $6] <@ ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6]
ORDER BY contest_number;
$_$ LANGUAGE SQL;

In this particular case, I exploited some advantages from PostgreSQL and worked with array comparisons to check any of the possible combinations, since the number of conditions was small. If I had been working with all 15 numbers that one can play, then I would have to dynamically generate that in the code.

The “<@” operator working with arrays means “is contained in”, so I check that the combinations from the left side are contained in the array of drawn numbers. “dezena1″ to “dezena6″ are the drawn numbers — their column names, actually –, already ordered in ascending order (even though with arrays on PostgreSQL this ordering does not matter, the only requirement is that all elements from the left side are available on the right side to be a successful match).

This approach, as I mentioned before, would not work with all the 15 numbers since there are 1365 combinations for 4 numbers out of 15, 3003 combinations for 5 numbers out of 15 and 5005 combinations for 6 numbers out of 15, adding up to 9373 possible combinations. Even with magic text replacement, this would mean a lot of work and a huge code for a SQL function. Not the best idea at all… In this case, I would go with a Python function.

With that being done, now I can move on to the next step: the web interface building.

As with the latest applications, I will be using ExtJS and grids. The numbers from my bet will be the filters and this function above will help me limiting the number of lines I will be showing on the grid.

Around version 2 or 3, I will be changing it to allow up to 15 numbers to be played, with a minimum of 6 numbers. But, then, it is another project, with another design phase, decision making, etc. I will try remembering to post about it here when that time comes (and if I get there, since I never play more than 6 numbers anyway…).

Why this post? Because the process is the same for every application and because this is a somewhat common problem. Using arrays and their special operators shows up as an elegant solution to a problem.

Happy hacking and wish me luck with my bets. :-)

UPDATE1:. I won a lesser prize. Found out with the above code. About US$ 300, last week.

UPDATE2:. After a series of interactions with a colleague, I changed the above code to the one below:

CREATE OR REPLACE FUNCTION f_v_is_possible_result(
i_dezena1 INTEGER,
i_dezena2 INTEGER,
i_dezena3 INTEGER,
i_dezena4 INTEGER,
i_dezena5 INTEGER,
i_dezena6 INTEGER,
i_matches INTEGER = 4,
OUT contest_number INTEGER,
OUT numbers INTEGER[],
OUT quantity INTEGER)
RETURNS SETOF RECORD
AS $_$
WITH numbers AS (SELECT contest_number, ARRAY_INTERSECT (ARRAY[$1, $2, $3, $4, $5, $6],
ARRAY[dezena1, dezena2, dezena3, dezena4, dezena5, dezena6]) AS intersect
FROM megasena
ORDER BY contest_number)
SELECT numbers.contest_number,
numbers.intersect AS numbers,
ARRAY_LENGTH(numbers.intersect, 1) AS quantity
FROM numbers
WHERE ARRAY_LENGTH(numbers.intersect, 1) >= $7;
$_$ STABLE LANGUAGE SQL;

The “WITH” clause will allow me to process the information more than once while preventing the query from being repeated at every time I use it.

The “i_matches” parameter allow me to show cases where there was no prize won or filter out specific types of prizes (4, 5 or 6 numbers matching).

The new code also shows me which numbers matched. Here’s the new output:

megasena=# select * FROM f_v_is_possible_result(03, 11, 20, 49, 51, 59);
contest_number | numbers | quantity
----------------+--------------+------------
1335 | {3,20,51,49} | 4
(1 row)
megasena=# select * FROM f_v_is_possible_result(03, 11, 20, 49, 51, 59, 3);
contest_number | numbers | quantity
----------------+--------------+------------
56 | {20,51,59} | 3
217 | {3,51,49} | 3
296 | {3,20,51} | 3
548 | {51,49,59} | 3
658 | {11,3,20} | 3
857 | {3,49,59} | 3
1034 | {3,20,49} | 3
1035 | {11,51,59} | 3
1057 | {3,49,59} | 3
1192 | {20,49,59} | 3
1247 | {11,3,51} | 3
1335 | {3,20,51,49} | 4
(12 rows)
megasena=#

My prize was on contest number 1335.