Organizations of all kinds need as much information as possible about their target populations. Public datasets are an important source of this information. However, these databases are very difficult to use because they lack cross-references.
In Spain, two main public databases are available: the Population and Housing Censuses (PHC) and the Family Expenditure Surveys (FES). Both are published by the Spanish Statistical Institute. The two databases cannot be joined because they use different aggregation levels: the FES contains information about families, while the PHC contains the same information in aggregated form. In addition, national laws protect this information, which makes the datasets difficult to use.
This work defines a new methodology, based on Genetic Algorithms, for joining the two datasets. The proposed approach could be used in any case where data with different aggregation levels need to be joined.
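
To make the idea concrete, here is a minimal sketch of how a genetic algorithm could match family-level records to an aggregated total. It is an illustration under stated assumptions, not the methodology defined in this work: the `SURVEY` records, the two attributes (household size and cars), the census targets, the absolute-error fitness, and all function names and parameters are hypothetical. Each chromosome is a list of survey record indices assigned to one census area, and the fitness measures how far the aggregated attributes of those records fall from the published totals.

```python
import random

# Hypothetical family-level records: (household_size, cars). Not real FES data.
SURVEY = [(1, 0), (2, 1), (3, 1), (4, 2), (2, 0), (5, 2), (1, 1), (3, 2)]


def aggregate(chromosome, survey):
    """Sum each attribute over the survey records selected by the chromosome."""
    n_attrs = len(survey[0])
    return tuple(sum(survey[i][k] for i in chromosome) for k in range(n_attrs))


def error(chromosome, survey, targets):
    """Total absolute gap between the aggregated selection and the area totals."""
    return sum(abs(a - t) for a, t in zip(aggregate(chromosome, survey), targets))


def evolve(survey, targets, n_households, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Search for n_households survey records whose totals approach the targets.

    Assumes n_households >= 2 so that one-point crossover has a valid cut.
    Returns the best chromosome and the best error seen in each generation.
    """
    rng = random.Random(seed)

    def fitness(c):
        return error(c, survey, targets)

    # Initial population: random assignments of survey records to the area.
    population = [[rng.randrange(len(survey)) for _ in range(n_households)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the current best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three randomly drawn candidates.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by per-gene mutation.
            cut = rng.randrange(1, n_households)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(len(survey)) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    # Hypothetical aggregated totals for one area: 15 people and 6 cars in 6 households.
    best, history = evolve(SURVEY, targets=(15, 6), n_households=6)
    print("selected records:", best)
    print("aggregated totals:", aggregate(best, SURVEY))
    print("remaining error:", history[-1])
```

Because the best chromosome is carried over unchanged each generation, the recorded error never increases. A realistic application would need many more attributes, one chromosome per census area, and a fitness function suited to the statistical properties of both sources.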