9. DATA COLLECTION TOOLS

TOOL NO 4: CHALLENGES IN COLLECTING QUANTITATIVE ETHNIC DATA

This tool was developed by UNDP’s Regional Centre for Europe and CIS. It draws from the experiences of data collection on minority groups, including innovative surveys conducted in support of the UNDP Regional Human Development Report on the Roma, ‘Avoiding the Dependency Trap’ (2003) and ‘At Risk: Roma and the Displaced in Southeast Europe’ (2006).

Using this tool:

This tool provides a detailed introduction to the approaches and challenges of collecting disaggregated data by ethnicity, religion and/or language. It provides UNDP COs with some guiding principles to observe when commissioning new data collection on minorities.

Data on household incomes and expenditures, and labour force survey data, disaggregated by ethnicity or religion are scarce. For many reasons, statistical institutes do not tend to monitor household budgets along these lines. In the case of the Roma, for example, this reflects both the political sensitivity of relations between Roma and the rest of society and resistance from Romani organizations. The latter have (not wholly unreasonable) concerns that ethnically disaggregated data could be used for discriminatory purposes and thereby increase tensions and intolerance between the minority and majority.

Current data collection instruments fail to capture accurate information about minorities for the following reasons:

  • In some countries, legal constraints prevent collection of ethnic data in censuses or other surveys.
  • Governments and minorities fear the consequences of data collection.
  • Household surveys and censuses often significantly underestimate the size of ethnic minorities.

In a national census, members of minorities may opt not to identify themselves as such, often out of fear of discriminatory practices. With fluid definitions of identity, the very populations in question are unclear and any estimate is susceptible to speculation. Nationally representative survey samples are usually based on census data and therefore inherit the census’s under-representation of minority groups. As a consequence, minorities who did not self-identify in the census are likely to be under-sampled.

Here both researchers and policy-makers face a peculiar vicious circle: data is necessary but not available. When available, it is not reliable (different estimations of minorities can be equally acceptable and justified using different sets of arguments). As a result, the opportunity for data misinterpretation is disturbingly broad. Depending on whether higher or lower estimates “work” better in the particular political context, different actors can argue for or against some current political issue usually unrelated to the goal of improving the socio-economic status of minorities.

Obstacles to effective monitoring

The principle of self-identification

There is widespread agreement that data on ethnicity or religion are necessary for the design and implementation of effective policies to combat discrimination. At the same time, under international law, no one can be compelled to reveal certain kinds of sensitive information, including data on ethnic origin and religion. This standard is sometimes misinterpreted as prohibiting any collection of data on ethnicity. In fact, international law supports the principle of self-identification, leaving the individual to choose with which ethnic, religious or linguistic group(s), if any, to identify. Further, the Committee on the Elimination of Racial Discrimination (CERD) stated in General Recommendation VIII that the way in which individuals are identified as belonging to ethnic groups will ‘if no justification exists to the contrary, be based upon self-identification by the individual concerned’.

Although the principle of self-identification is useful for resolving legal-ethical dilemmas concerning the collection of data on ethnicity in general, this principle alone is not sufficient to ensure meaningful data on minority groups. In the case of many minorities, a deep-seated resistance to declaring their ethnic or religious identity is rooted in lived experiences of abuse of personal data. On the other hand, where programmes are established for particular groups, such as a programme to assist members of minorities to obtain jobs, individuals who do not meet any of the objective criteria for membership of a particular ethnic group (culture, ethnicity, religion, language) may attempt to self-identify with that group in order to benefit from the programme. However, there is no right to choose arbitrarily to belong to a particular minority. ‘The individual’s subjective choice is inseparably linked to objective criteria relevant to the person’s identity’.47

Equally problematic is external identification. The State may not impose an identity on individuals, so it is not acceptable to use the perception of the interviewer as the sole means of identifying an individual’s membership of a group. In practice, this method would also be subject to the prejudices of the interviewer and would therefore be likely to be inaccurate. Resolving these ambiguities about self-identification requires confidence- and trust-building efforts by the government and minority NGOs.

Fear, stigmatization and confounded identities

Fear of the consequences of ethnic data collection is pervasive, but the fears of minorities and governments differ. Government fears include concerns that data showing large inequalities between groups will cause conflict or exacerbate historical conflicts between groups. Minorities may distrust government claims that the data are intended for beneficial use, and fear that the data will instead create more discrimination and stigmatization. Another aspect of the under-reporting of minorities is related to the multiple identities minorities might have. Experience shows that ethnicity is often confounded with civic, confessional, and linguistic identities.

Underestimation and overestimation

Taken together, the various pitfalls associated with measuring the size of ethnic populations yield considerable discrepancies between official and unofficial numbers, with the official figures often considerably lower than the number of persons who identify themselves as belonging to an ethnic minority in daily life. One of the problems associated with underestimating the size of a given country’s minority population is the overestimation of socially sensitive indicators such as birth rate, unemployment, and criminality.
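The link between an underestimated population and overestimated indicators can be shown with simple arithmetic. The sketch below uses purely hypothetical figures (they are not drawn from any census or survey) and assumes that the numerator of an indicator is counted independently of the census, while the denominator is the census population of the minority.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Return an indicator expressed per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population

# Hypothetical figures, for illustration only.
events = 12_000              # e.g. registered unemployed attributed to the minority
census_population = 100_000  # persons who self-identified in the census
actual_population = 250_000  # assumed size of the group in daily life

official_rate = rate_per_1000(events, census_population)
adjusted_rate = rate_per_1000(events, actual_population)
inflation = official_rate / adjusted_rate

print(f"rate on census denominator: {official_rate:.0f} per 1,000")
print(f"rate on assumed actual denominator: {adjusted_rate:.0f} per 1,000")
print(f"inflation factor: {inflation:.1f}")
```

Under these assumptions the indicator is inflated by exactly the ratio of the actual to the census population. The effect disappears only where the same under-reporting affects both numerator and denominator.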

Protecting sensitive data

Principles of data protection

Data protection laws are often cited as prohibiting the collection of ethnic data. However, data protection laws can distinguish between the collection of individually identifiable data and that of anonymous data, permitting the latter. European Union law, for example, applies to personal data and exempts anonymous data.48 The Council of Europe notes that statistical results are not personal data because they are not linked to an identifiable person and highlights the need for balance between the need for research and the protection of privacy of individuals.49

In an attempt to balance the need for data on ethnicity with considerations of personal privacy, the European Commission against Racism and Intolerance (ECRI) has recommended that ethnic data be collected in ways that ensure confidentiality, informed consent, and voluntary self-identification. Furthermore, ECRI has urged against publication of personal data in such a way as to divulge individual identity. Taking this line of thinking a step further, one data protection expert has suggested that abuse of personal data be prevented through a method that would “count the members of a community without numbering them, i.e., without recording them individually in files, registries or computer databases” (Székely 2001, p. 279).

In addition to containing a general prohibition on the processing of sensitive data (including but not necessarily limited to personal data on racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and health or sex life), the EU Data Protection Directive enumerates conditions under which the processing of sensitive data can be legitimated. For example, Article 8 (2) states that sensitive data may be processed on the basis of the data subject’s consent, unless the laws of the member States provide otherwise.

Further exemptions to the prohibition on processing sensitive data under the Data Protection Directive may be laid down by national laws or by decision of the national supervisory authority, provided that suitable safeguards are in place (i.e. the necessary technical and organizational measures are taken to maintain data security). The reason for this class of exemptions is to facilitate scientific research and government statistics, enabling the processing and storage of sensitive data in central population registers, tax registers, census registers and the like.

Article 6 of the Data Protection Directive sets out five qualitative principles that must be respected when personal data is processed. These principles require that personal data must be:

  • Processed fairly and lawfully;
  • Collected for specified, explicit and legitimate purposes;
  • Adequate, relevant and non-excessive;
  • Accurate and, where necessary, kept up to date; and
  • Kept in a form that permits the identification of data subjects for no longer than necessary.

By virtue of the above principles, data collection operations could, wherever possible, favour:

  • Secondary rather than primary data collection;
  • Anonymous rather than non-anonymous surveys;
  • Sampling rather than full-scale surveys;
  • Voluntary rather than compulsory surveys.

Solutions

There are six major options for producing disaggregated data. The approaches are mutually reinforcing and may be seen as integral pillars of a comprehensive system of ethnically sensitive data collection and monitoring. However, in some cases, additional legislation may need to be enacted to ensure full respect for the right to privacy and individual data integrity.

1. Disaggregating hard statistics using personal identification numbers as ethnic markers

2. Disaggregating hard statistics using territorial tags as ethnic markers

3. Extending the samples of regular sample surveys with Roma boosters

4. Custom “on the spot” surveys among recipients of social services

5. Community-based collection of data conducted by data collectors from the communities monitored

6. Census improvement

1. Personal Identification Number (PIN) based tagging

PIN-tagging is based on the fact that in many countries the census records both ethnic affiliation (e.g. mother tongue) and the individual respondent’s unique personal identification number (PIN). Matching the census identity with the PIN registered in administrative databases makes it possible to identify the members of the respective ethnic group within the total universe of each database. This approach can be used to extract national-level ethnically disaggregated data from administrative (including population) statistics and from records of registered unemployment, health treatment (both hospitalization and visits to personal doctors) and social insurance coverage (including labour contracts). Indicators such as registered unemployment rates, morbidity rates, mortality rates, social assistance coverage and formal/informal employment rates may be computed with a high level of accuracy.50 For this purpose, however, explicit procedures for data anonymization and the relevant administrative structure need to be in place.
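The matching step can be sketched as follows. This is a minimal illustration, not a description of any national system: the record layouts, the group labels and the key are hypothetical, and the keyed hash is an assumed safeguard that pseudonymizes PINs rather than fully anonymizing them. Only aggregate counts leave the matching step.

```python
import hashlib
import hmac
from collections import Counter

def pseudonymize(pin: str, key: bytes) -> str:
    """Replace a PIN with a keyed hash so raw PINs are not carried into the analysis."""
    return hmac.new(key, pin.encode("utf-8"), hashlib.sha256).hexdigest()

def build_tags(census_rows, key):
    """Map pseudonymized PIN -> group declared in the census."""
    return {pseudonymize(row["pin"], key): row["group"] for row in census_rows}

def disaggregate(admin_rows, tags, key):
    """Count administrative records per group; the result holds no identifiers."""
    counts = Counter()
    for row in admin_rows:
        counts[tags.get(pseudonymize(row["pin"], key), "unmatched")] += 1
    return dict(counts)

# Hypothetical toy records, for illustration only.
census = [
    {"pin": "1001", "group": "minority"},
    {"pin": "1002", "group": "majority"},
    {"pin": "1003", "group": "minority"},
    {"pin": "1004", "group": "majority"},
]
unemployment_register = [{"pin": "1001"}, {"pin": "1004"}, {"pin": "1003"}, {"pin": "9999"}]

key = b"hypothetical-secret-held-by-the-statistical-office"
tags = build_tags(census, key)
registered = disaggregate(unemployment_register, tags, key)

# Both numerator and denominator use the census tag, so the rate refers to
# persons who self-identified in the census.
population = Counter(tags.values())
rates = {group: registered.get(group, 0) / n for group, n in population.items()}

print(registered)
print(rates)
```

Records that cannot be matched to a census entry are reported separately as "unmatched" rather than silently dropped, so that the coverage of the match can be judged.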

2. Territorial markers tagging

This approach is based on the fact that minority groups are also excluded territorially, in separate (often segregated) communities. Thus territorial mapping of those communities is possible. Once a detailed map of minority-dominated communities is available, ethnic tags based on an individual’s address can be applied with the assumption that an individual living in an area identified as “predominantly one ethnic group” is from this ethnic group. These tags can be used, for instance, in line ministry registries (particularly Ministry of Education) and personal doctor databases.

Territorial marker tagging is thus complementary to PIN-tagging, but it has some benefits that the latter lacks. To a certain extent it can be more reliable because it avoids the problem of ethnic identity being understated in censuses. It is also less susceptible to fluctuations caused by changes in the political environment, since declared ethnic identity is heavily influenced by the political climate and by the rise and influence of extremist parties. However, those benefits come at a cost: the approach captures the marginalised, visibly excluded segment of the ethnic population, whilst there is a high probability that the integrated share of that population will fall outside the scope of the data collection exercise.

In any case, territorial marker tagging is an important approach (and, to a certain extent, the only reliable one) for obtaining an acceptable estimate of the absolute size of the population in question, and not just shares such as poverty rates and unemployment rates. The absolute number is crucial for needs assessment and hence for defining numeric targets. If targets (and resources) are determined on the basis of census data, the real needs will inevitably be underestimated.
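A minimal sketch of territorial tagging is given below. The settlement codes, the mapping table and the school register are hypothetical; in practice the mapping would come from a field mapping exercise. The tag describes the area of residence, not the individual, so integrated members of the minority living elsewhere are counted under "other_area".

```python
from collections import Counter

# Hypothetical result of a mapping exercise:
# settlement code -> is the area predominantly inhabited by the minority?
MAPPED_AREAS = {"S-001": True, "S-002": False, "S-003": True}

def territorial_tag(settlement_code: str) -> str:
    """Tag a record by area of residence, not by any personal declaration."""
    if settlement_code not in MAPPED_AREAS:
        return "unmapped"
    return "minority_area" if MAPPED_AREAS[settlement_code] else "other_area"

def enrolment_rates(rows):
    """Share of enrolled children per territorial tag."""
    totals, enrolled = Counter(), Counter()
    for row in rows:
        tag = territorial_tag(row["settlement"])
        totals[tag] += 1
        enrolled[tag] += int(row["enrolled"])
    return {tag: enrolled[tag] / totals[tag] for tag in totals}

# Hypothetical extract from a line ministry registry.
school_register = [
    {"settlement": "S-001", "enrolled": True},
    {"settlement": "S-001", "enrolled": False},
    {"settlement": "S-003", "enrolled": True},
    {"settlement": "S-003", "enrolled": False},
    {"settlement": "S-002", "enrolled": True},
    {"settlement": "S-002", "enrolled": True},
    {"settlement": "S-404", "enrolled": True},  # address outside the mapped areas
]

rates = enrolment_rates(school_register)
print(rates)
```

Keeping an explicit "unmapped" category makes visible how much of the registry falls outside the mapping, which is one way to judge whether the map is detailed enough for the purpose.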

3. Ethnic minority boosters in sample surveys

Household budget surveys (HBS) and labour force surveys (LFS) are the most important surveys for examining poverty, unemployment and social exclusion and the policies that address them. Unfortunately, in most countries these surveys fail to include a representative sample of ethnic minorities, especially when a minority is small or lives segregated from the majority, or when the sample is based solely on census data. To overcome this sampling problem and use HBS/LFS as a regular and precise data collection mechanism for minorities, sampling boosters for the respective minorities (i.e. increasing the sample size of minorities) or separate minority samples would be necessary. However, this is very costly, and it becomes unworkable when several minorities exist in one country.

Constructing random sample boosters may be a problem, mostly because the size of the ethnic population is unclear. One possible compromise is to accept the self-identification principle (as applied in the census) and construct a random sample from the population that self-identified or declared the respective mother tongue (ideally both). In this case a minority booster would bear the “genetic” features (and problems) of the PIN-based methods for statistical data disaggregation and would share both their benefits and their drawbacks. An alternative could be to construct a sample on the basis of territorial mapping of the ethnic population, assuming that such mapping is in place. A similar option is GIS (Geographic Information System)-based sampling, which is to a large extent a variety of territorial tagging.
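When a booster is used, the minority is deliberately over-represented in the sample, so national estimates have to be weighted back. The sketch below uses standard stratified design weights (stratum population divided by stratum sample size). All figures and stratum names are hypothetical, and the stratum population sizes are themselves an assumption, which is exactly the quantity that is uncertain for ethnic populations.

```python
def design_weights(population, sample):
    """Design weight per stratum: population size divided by sample size."""
    return {h: population[h] / sample[h] for h in population}

def weighted_rate(cases, population, sample):
    """National rate from stratum-level sample counts, undoing the oversampling."""
    weights = design_weights(population, sample)
    weighted_cases = sum(weights[h] * cases[h] for h in population)
    return weighted_cases / sum(population.values())

# Hypothetical figures, for illustration only.
population = {"minority_booster": 50_000, "rest": 950_000}  # assumed frame sizes
sample = {"minority_booster": 1_000, "rest": 4_000}         # 20% of sample for 5% of population
unemployed = {"minority_booster": 400, "rest": 400}         # unemployed persons found in each sample

weights = design_weights(population, sample)
national = weighted_rate(unemployed, population, sample)
naive = sum(unemployed.values()) / sum(sample.values())     # ignores the oversampling

print(weights)
print(f"weighted national rate: {national:.3f}")
print(f"unweighted rate: {naive:.3f}")
```

With these figures the unweighted rate overstates national unemployment because the boosted stratum, which has the higher rate, makes up a fifth of the sample but only a twentieth of the assumed population. The booster itself still allows the minority rate (400 of 1,000) to be estimated from a sample large enough to be useful.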

4. Custom surveys among social services recipients

This approach entails anonymous questionnaires (usually brief, consisting of just a few questions) filled in by recipients of social services on a voluntary basis. For example, an unemployed person registering at the labour office is invited to fill in a questionnaire in addition to the regular forms. The questionnaire may include an “ethnicity” field and is dropped into a sealed box so that it cannot be linked to the standard application.

Such an approach can be a good source of information, both on the ethnic profile of the recipients of social services and on the way in which the service providers work (for example, are there any ethnic-based prejudices?). Even in the best-case scenario (assuming there is no duplication of questionnaires and their number is close to that of the recipients of social services), such a survey would be representative only of the recipients, not of the whole ethnic group.

5. Community-based monitoring

Community-level data is particularly important for monitoring social exclusion and poverty. Such a system could provide basic information on the communities in question, based on standard questionnaires completed on a regular basis by a designated member of the community who has been trained in basic data collection and reporting techniques. The system would provide:

  • Quantitative information on the status of the community (number of households, their housing conditions, number of children attending school, their age and grade, number of drop-outs, number of newborns, number of vaccinated children etc.).
  • Quantitative information on the occurrence of certain events relevant from a monitoring perspective (power cuts and their duration, accidents, conflicts with the majority or other ethnic groups, NGO activities etc.).

Data collected within the system of community monitoring will provide information on the status of the minority communities, their internal dynamics and life in ethnic neighbourhoods, particularly in closed ghettoes. In this regard, such data will be complementary to other sources. To ensure complementarity, the structure of the data (and the design of the instruments used) should be as close as possible to that of other instruments for similar data collection. Necessary preconditions are the training of local data collectors in basic data collection techniques and standards, and the establishment of a system of incentives for responsible and reliable work as well as a control system.

6. Census improvement

The census remains the most effective instrument for collecting comprehensive data on the population of a country. The major difficulty lies in capturing the multiple identities of minorities. As outlined above, using the ethnicity question, even where it is not prohibited, will not necessarily produce accurate statistics on the situation of minorities, given the issues of fear and self-identification. Therefore, the census needs to be improved in various ways to accommodate the multiple identities minorities might have, and to increase their willingness to state their ethnicity and their trust in the value and benefit of the data.51

Regarding the ethnicity question, there are various suggestions on how to address this issue. One is to introduce a multiple-choice question on ethnicity. Another is to differentiate clearly between ethnicity and citizenship or nationality, so that respondents who feel they have several identities are not forced to choose only one option. A further option is to add questions on language, religion, partner’s ethnicity, or country of birth or origin as objective identification criteria.52
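Allowing more than one answer changes how results are tabulated, because the sum of group counts can exceed the number of respondents. The sketch below shows one possible tabulation, assumed here for illustration and not prescribed by any census recommendation: each group is counted both "alone" and "alone or in combination", and non-response is kept as its own category. The group labels and responses are hypothetical.

```python
from collections import Counter

def tabulate(responses):
    """Tabulate multi-answer ethnicity responses.

    Each response is a set of declared affiliations; an empty set means the
    respondent chose not to answer.
    """
    alone, any_mention = Counter(), Counter()
    declined = 0
    for answer in responses:
        if not answer:
            declined += 1
            continue
        for group in answer:
            any_mention[group] += 1
        if len(answer) == 1:
            alone[next(iter(answer))] += 1
    return dict(alone), dict(any_mention), declined

# Hypothetical responses, for illustration only.
responses = [
    {"Group A"},
    {"Group A", "Group B"},
    {"Group B"},
    set(),                    # respondent declined to state an affiliation
    {"Group A", "Group C"},
]

alone, any_mention, declined = tabulate(responses)
print("alone:", alone)
print("alone or in combination:", any_mention)
print("declined:", declined)
```

In this toy example Group A is declared by three of five respondents but by only one as a sole affiliation, which illustrates how a single-choice question could understate a group whose members hold several identities.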

Minority involvement

Collection of data on ethnic and cultural background can be successful only if the national statistical system creates trust with regard to the confidentiality of individual data, and more generally a positive environment for population sub-groups. Therefore, one of the major prerequisites for relevant data collection is the participation and involvement of the communities surveyed in the process of data collection at all stages. Fieldwork has an important role to play within the data collection process. Simple factors become relevant, such as the sex or ethnicity of the interviewer, the way a question will be asked, or how the interviewer would be accepted by the respondent.

Minority representatives, including women, could be trained as interviewers and in the basics of sociological data collection, interviewing techniques, the contents and context of individual questions. Fieldwork could then be carried out by the trained interviewers, or regular interviewers could be accompanied by an “assistant interviewer” from the surveyed minority.

The role envisaged for the “assistant interviewers” is much broader than community penetration. Such interviewers could constitute the core of future data collectors who could actively cooperate with the national statistical institutes and other bodies interested in collecting adequate data on the socio-economic status of marginalised groups. This is a long-term investment that goes far beyond the validity of the results of surveys and censuses. These kinds of partnerships with local communities and NGOs are required to improve the data collection process and respective results.


47 Council of Europe, Framework Convention for the Protection of National Minorities (FCNM), Article 3.1 and Explanatory Report, H(1995)010, paragraph 35.
48 EU Directive 95/46/EC of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.
49 Council of Europe Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data (1981) and Recommendation No. R(97) 18 of the Committee of Ministers Concerning the Protection of Personal Data Collected and Processed for Statistical Purposes (1997).
50 The fact that census data underestimate the size of the Roma population is not a problem here, because a similar degree of under-reporting will appear in both the numerator and the denominator. In addition, indicators computed on the basis of PIN-tagging can be correlated with other data to improve their robustness.
51 The 2006 UNECE/EUROSTAT Conference of European Statisticians Recommendations for the 2010 Censuses of Population and Housing explicitly state: “It is recommended that representatives of ethnic, language and religious groups be consulted in the drafting of census questions, the definition of classification procedures and the conduct of censuses among minority populations to assure transparency, the correct understanding of the questions and the full participation of the population”.
52 The 2006 UNECE/EUROSTAT Conference of European Statisticians Recommendations for the 2010 Censuses of Population and Housing explicitly state: “Ethnicity has necessarily a subjective dimension and some ethnic groups are very small. Information on ethnicity should therefore always be based on the free self-declaration of a person, questionnaires should include an open question and interviewers should refrain from suggesting answers to the respondents. Respondents should be free to indicate more than one ethnic affiliation or a combination of ethnic affiliations if they wish so”.
