ESS Standard for Quality Reports Structure (ESQRS)
National Statistical Institute
|Contact organisation unit|
Statistics on Living Conditions Department,
Demographic and Social Statistics Directorate
Desislava Dimitrova, PhD
|Contact person function|
head of department
|Contact mail address|
2 P.Volov street, 1038 Sofia
|Contact email address|
|Contact phone number|
+359 2 9857 183
|Contact fax number|
The Survey on Income and Living Conditions (SILC) is a tool for providing timely and comparable data on income distribution and on the level and structure of poverty and social exclusion. The survey is carried out according to a common European methodology and provides information on the current state (cross-sectional data) and on changes over time (longitudinal data) in income level and in the structure of poverty and social exclusion.
EU-SILC provides four basic files containing target variables based on common concepts and definitions.
Annual data for the countries contain the following components:
• Household register (D-file);
• Personal register (R-file);
• Household data (H-file);
• Personal data of people aged 16 and more (P-file).
Each year, additional data on specific topics are collected for the household and its members in the so-called ad-hoc modules.
The indicators on poverty and social inclusion are calculated on the basis of the survey "Statistics on income and living conditions", using a common methodology, approved by Eurostat, for data collection, derivation of target variables and calculation of common indicators. The poverty rate is the share of households below the poverty line, which is defined as 60% of the median equivalised disposable income.
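The poverty line and poverty rate described above can be sketched as follows. This is only an illustration: the function names, the simple weighted-median rule and the sample figures are assumptions made here, not the Eurostat-approved algorithm.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def poverty_line_and_rate(incomes, weights, fraction=0.60):
    """Return the poverty line (60% of the median) and the weighted share below it."""
    line = fraction * weighted_median(incomes, weights)
    below = sum(w for y, w in zip(incomes, weights) if y < line)
    return line, below / sum(weights)


# Hypothetical equivalised disposable incomes with equal weights.
incomes = [4000.0, 9000.0, 10000.0, 12000.0, 30000.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
threshold, rate = poverty_line_and_rate(incomes, weights)
```

In this made-up example the median is 10000, so only the unit with an income of 4000 falls below the line.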
The following social fields are included in the survey methodology:
|Statistical concepts and definitions|
Total household income:
Two main concepts for total household income are applied:
Total household gross income (HY010) is computed as the sum for all household members of gross personal income components:
Total disposable household income (HY020) is computed as total household gross income (HY010) minus:
A household is two or more persons living in one dwelling or part of a dwelling, sharing a common budget and eating together.
A household is also a single person living in a dwelling, a room or part of a dwelling who has a separate budget for meals and for expenses to satisfy other needs.
To calculate the indicators of poverty and social inclusion, the total disposable household income is "equivalised". Because households differ in composition and number of members, an equivalence scale is applied. The modified OECD scale is used, which gives a weight of 1.0 to the first person aged 14 or more, a weight of 0.5 to each other person aged 14 or more and a weight of 0.3 to each person aged 0-13. The weights of all household members are summed to obtain the equivalent household size. The total disposable net income of each household is divided by its equivalent size to give the total disposable net income per equivalent unit.
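A minimal sketch of the equivalisation step, using the modified OECD weights quoted above. The function names and the example household are illustrative assumptions; the sketch also assumes every household has at least one member aged 14 or more.

```python
def equivalent_household_size(ages):
    """Modified OECD scale: 1.0 first person 14+, 0.5 other persons 14+, 0.3 persons 0-13."""
    adults = sum(1 for age in ages if age >= 14)
    children = len(ages) - adults
    if adults == 0:
        # Assumption of this sketch: the scale needs a first person aged 14 or more.
        raise ValueError("expected at least one member aged 14 or more")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def equivalised_income(disposable_income, ages):
    """Total disposable household income per equivalent unit."""
    return disposable_income / equivalent_household_size(ages)


# Hypothetical household: two adults and two children aged 12 and 7.
ages = [41, 39, 12, 7]
size = equivalent_household_size(ages)
income_per_unit = equivalised_income(21000.0, ages)
```

For this household the equivalent size is 1.0 + 0.5 + 2 × 0.3 = 2.1, so a disposable income of 21000 corresponds to about 10000 per equivalent unit.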
Units of observation are households and household members.
The EU-SILC target population consists of all private households and their current members residing in the country. Persons living in collective households and in institutions are generally excluded from the target population.
Entire territory of Republic of Bulgaria
2006 - 2020
The sample for EU-SILC 2020 is selected from a sampling frame based on the Population Census 2011. The frame includes all private households and their current members residing in the country. Persons living in collective households and in institutions are excluded from the target population. Students' and workers' hostels are excluded at the first stage of selection of PSUs, because student and worker households rarely stay at the same address and are difficult to trace.
The frame is regularly updated according to the administrative changes made.
Household data within the selected PSUs are updated according to the Information System “Demography” data (ISD).
The longitudinal component consists of the sub-samples R1, R2, R3, R5 and R6.
All personal/household income variables were collected by interview.
In some cases, where information on an income component is unavailable, a register is used to obtain the missing value. The National Social Security Institute keeps a register of all persons for whom employers pay social insurance contributions and of all self-insured persons. This register contains some data on personal income, but only income generated by the person's labour activity and, moreover, only the income on which the person was insured. Data on income from social benefits are obtained from the Social Assistance Agency.
Two-stage sampling on a territorial principle is implemented as follows:
- at the first stage, the census enumeration units (PSUs) are selected;
- at the second stage, the households are identified.
Sampling rate and sampling size
Concerning the SILC instrument, three different sample size definitions can be applied:
- the actual sample size which is the number of sampling units selected in the sample
- the achieved sample size which is the number of observed sampling units (household or individual) with an accepted interview
- the effective sample size which is defined as the achieved sample size divided by the design effect with regard to the at-risk-of-poverty rate indicator
Given that the effective sample size has been already treated in the section dealing with sampling errors, in this section the attention focuses mainly on the achieved sample size.
The necessary sample size for Bulgaria is determined in Annex II of the Framework Regulation (1177/2003) so as to guarantee an effective sample size of 4500 households with regard to the at-risk-of-poverty indicator. The longitudinal sample for two successive waves should comprise at least 3500 households.
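The relation between achieved and effective sample size can be illustrated with a short sketch. The achieved size and design effect below are hypothetical values chosen only to show the arithmetic; they are not the figures of this survey.

```python
def effective_sample_size(achieved, design_effect):
    """Effective sample size = achieved sample size / design effect."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return achieved / design_effect


REQUIRED_EFFECTIVE_HOUSEHOLDS = 4500  # requirement quoted in the text above

# Hypothetical inputs: 7200 accepted household interviews, design effect 1.6.
effective = effective_sample_size(7200, 1.6)
meets_requirement = effective >= REQUIRED_EFFECTIVE_HOUSEHOLDS - 1e-9
```

With these made-up inputs the effective size is about 4500 households, so the requirement would just be met.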
The total gross sample size (number of households) was determined by analysing the non-response rates and design effects of the previous EU-SILC surveys.
The total sample size in 2020 is 9052 households:
- 6372 “old” (2015, 2016, 2017, 2018 and 2019),
- 2680 “new” households (drawn in 2020).
Number of households for which an interview is accepted for the database.
Rotational group breakdown and total
The sample size for longitudinal component was 32374 households and 53938 persons aged 16 and over.
Number of households in longitudinal component
Number of persons 16 years and older
|Frequency of data collection|
SILC2018 data are collected with questionnaires (CAPI and PAPI) through personal interviews with the households included in the sample and with all household members aged 16 and more.
The mean interview duration per household is calculated as the sum of the duration of all household interviews plus the sum of the duration of all personal interviews, divided by the number of household questionnaires completed. Only households accepted for the database have to be considered.
The average household interview duration was about 23 minutes, while the average individual interview duration was about 21 minutes.
Average interview duration = 66
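The mean interview duration per household defined above can be sketched as follows. The durations are hypothetical minutes for two accepted households, used only to show how the formula combines household and personal interviews.

```python
def mean_interview_duration(household_minutes, personal_minutes):
    """(sum of household interviews + sum of personal interviews) / number of household questionnaires."""
    completed = len(household_minutes)
    if completed == 0:
        raise ValueError("no completed household questionnaires")
    return (sum(household_minutes) + sum(personal_minutes)) / completed


# Hypothetical data: two households, each with two personal interviews.
household_minutes = [20, 26]
personal_minutes = [20, 22, 21, 21]
mean_duration = mean_interview_duration(household_minutes, personal_minutes)
```

Because the denominator counts household questionnaires only, the result is a per-household figure that grows with the number of persons interviewed in each household.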
During data entry, logical checks are carried out for extreme values, completeness of the information on all questions, data comparability and links between the individual questionnaires and the registers. After the primary data are processed and the target variables derived, the data are verified and validated with the SAS program provided by Eurostat. Additional consistency checks are performed before the information is published.
The database of each country contains different types of weights:
Weighting factors were calculated as required to take into account the units’ probability of selection, non-response and to adjust the sample to external data relating to the distribution of households and persons in the target population, such as sex and age, residence or administrative-territorial districts (NUTS 3).
For the first year of the panel each household from the new rotation group got a sampling weight inversely proportional to the probability of selection of the household. These were the household’s design weights DB080.
To adjust for non-responding households the procedure “weighting classes” was used. The households were divided into classes where the probability to respond was assumed to be homogenous within the classes. Due to lack of information (demographic characteristics) for the non-responding households these classes were the sampling strata. The ratio of the weights of the responding households to the weights of all households in the given class was calculated.
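The design weight and the weighting-class adjustment can be sketched as below. The record layout, the identifiers and the selection probabilities are hypothetical, and the sketch assumes that the adjustment divides each responding household's design weight by the weighted response ratio of its class (here the sampling stratum).

```python
from collections import defaultdict


def nonresponse_adjusted_weights(households):
    """Design weight = 1 / selection probability, inflated by the class response ratio."""
    total = defaultdict(float)      # sum of design weights per class
    responding = defaultdict(float)  # sum of design weights of respondents per class
    for h in households:
        design_weight = 1.0 / h["prob"]
        total[h["stratum"]] += design_weight
        if h["responded"]:
            responding[h["stratum"]] += design_weight

    adjusted = {}
    for h in households:
        if h["responded"]:
            ratio = responding[h["stratum"]] / total[h["stratum"]]
            adjusted[h["id"]] = (1.0 / h["prob"]) / ratio
    return adjusted


# Hypothetical sample: stratum A has one non-respondent, stratum B has none.
sample = [
    {"id": 1, "stratum": "A", "prob": 0.01, "responded": True},
    {"id": 2, "stratum": "A", "prob": 0.01, "responded": True},
    {"id": 3, "stratum": "A", "prob": 0.01, "responded": True},
    {"id": 4, "stratum": "A", "prob": 0.01, "responded": False},
    {"id": 5, "stratum": "B", "prob": 0.02, "responded": True},
    {"id": 6, "stratum": "B", "prob": 0.02, "responded": True},
]
adjusted_weights = nonresponse_adjusted_weights(sample)
```

After the adjustment the respondents in each class carry the full weight of the class, so the total of the adjusted weights equals the total of the original design weights.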
After the adjustment for non-responding households, the base weights for the new rotation group were calibrated to the population as of 31.12.2019. For the calibration the following variables at individual and at household level were used:
The information on individuals as of 31.12.2019 was available from the ISD. The information on the households was an estimation made on the basis of the updated file on Census 2011 and data on the split-off households from the SILC survey. Persons born in 2020 were not included in the calibration as they were not part of the population as of the end of 2019. For the calibration of weights the SAS Macro Calmar 2 was used.
The logit method (M=3 in Calmar) was used for the calibration, with upper and lower limits set on the g-weights. The g-weights are the ratio of the final calibrated weights to the assigned weights. In 2020 the upper limit was 2.5 and the lower limit was 0.25.
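The bounds on the g-weights can be checked with a short sketch. It does not reproduce the Calmar calibration itself; it only assumes the usual convention that a g-weight is the calibrated weight divided by the weight before calibration, and the weights shown are hypothetical.

```python
def g_weights(initial, calibrated):
    """Ratio of calibrated weight to pre-calibration weight (assumed convention)."""
    return [c / d for d, c in zip(initial, calibrated)]


def out_of_bounds(gs, lower=0.25, upper=2.5):
    """Indices of g-weights outside the limits quoted for 2020."""
    return [i for i, g in enumerate(gs) if not lower <= g <= upper]


# Hypothetical weights before and after calibration.
initial = [100.0, 100.0, 200.0]
calibrated = [120.0, 30.0, 450.0]
gs = g_weights(initial, calibrated)
violations = out_of_bounds(gs)
```

A bounded method such as the logit method keeps every g-weight inside the limits, so a non-empty list of violations would point to a problem in the inputs or the settings.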
The calibrated weights, adjusted for non-responding households, were the base weights (RB060) for the new rotation group and will be used in the weighting procedure in the following years. These weights were also the longitudinal weights (DB095) of the households from the new rotation group.
Weighting procedure for rotation groups (13, 14, 15, 16, 17) from previous survey waves.
To get the base weights for the current year, the base weights (RB060) for each rotation group from the previous year were adjusted taking into account the non-response. The adjustment procedure was made on an individual and not on household level.
To adjust for non-response, all persons from the 2019 register (DB135 = 1 & RB110 in (1,2,3,4)) who were followed up in 2020 were first marked as responding (current members of the household). Persons who left the household between the two survey waves (2019 and 2020) were marked as non-responding. A logistic regression was used to estimate, for each individual, the probability of being followed up from 2019 to 2020. The weights of the enumerated persons were adjusted by this follow-up probability (the result of the logistic regression), and thus the base weights (RB060) for 2020 were obtained.
The model was applied for each rotation group separately. The independent variables used in the model were: poverty indicators, education, economic activity, age, sex, household size, household type, income, dwelling type. The dependent variable was the one showing if the individual was enumerated or not.
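The adjustment of the base weights by the follow-up probability can be sketched as follows. The sketch assumes that "adjusted with the probability" means dividing the previous base weight by the estimated probability, which is the usual propensity adjustment. The linear predictors stand in for the output of a fitted logistic model and are hypothetical.

```python
import math


def logistic(linear_predictor):
    """Inverse logit: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def adjust_base_weights(previous_weights, follow_up_probabilities):
    """Divide each previous base weight by the estimated follow-up probability."""
    adjusted = []
    for weight, probability in zip(previous_weights, follow_up_probabilities):
        if not 0.0 < probability <= 1.0:
            raise ValueError("probability must lie in (0, 1]")
        adjusted.append(weight / probability)
    return adjusted


# Hypothetical linear predictors giving probabilities of 0.5 and 0.75.
probabilities = [logistic(0.0), logistic(math.log(3.0))]
new_weights = adjust_base_weights([100.0, 150.0], probabilities)
```

Persons who are harder to follow up (lower probability) receive a larger weight, so that they also represent similar persons who were lost between waves.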
New members of the household after first year who were not part of the sample got base weights for the current year as follows:
· Children born to a sample mother got the weight of the mother;
· Persons who came into the sample household from outside the target population got a base weight equal to the average base weight of the household members;
· Persons who came into the sample household from another non-sample household within the target population got a base weight equal to 0.
Each person in the household should receive an equal weight within the household (RB050 cross-sectional weight). For this reason each household member, whether with a zero or a non-zero base weight, received the average base weight within the household.
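The weight-sharing step can be sketched as below, assuming that the average is taken over all current household members, including those with a base weight of 0. The household shown is hypothetical.

```python
def share_base_weights(base_weights):
    """Give every household member the average base weight of the household."""
    if not base_weights:
        raise ValueError("household has no members")
    average = sum(base_weights) / len(base_weights)
    return [average] * len(base_weights)


# Hypothetical household: two sample persons and one co-resident who
# moved in from a non-sample household (base weight 0).
household = [120.0, 120.0, 0.0]
shared = share_base_weights(household)
```

The household total (240 in this example) is unchanged; it is only spread evenly over the three members.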
After the non-response adjustment procedures each of the 5 rotation groups was calibrated separately to the population as of 31.12.2019 according to the method described above.
The same variables and levels as for the new rotation group were used for calibration.
Combining all (6) sub-samples
After applying all procedures for non-response adjustment and calibration, all sub-samples (rotation groups) were combined. Each sub-sample separately represented the whole population of the country. To combine the sub-samples, all weights were multiplied by an appropriate scaling factor. The scaling factor used for 2020 was 1/6, as there were 6 rotation groups in the panel.
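A small sketch of the combination step: since each rotation group on its own represents the whole population, scaling every weight by one over the number of groups keeps the combined total at the population size. The group weights below are hypothetical.

```python
def combine_rotation_groups(groups):
    """Pool the groups, scaling each weight by 1 / number of groups."""
    factor = 1.0 / len(groups)
    return [weight * factor for group in groups for weight in group]


# Hypothetical weights: six groups, each summing to a population of 1200.
groups = [[500.0, 700.0] for _ in range(6)]
combined = combine_rotation_groups(groups)
combined_total = sum(combined)
```

Without the scaling factor the pooled weights would sum to six times the population.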
Final cross-sectional weights
Calibration of all rotation groups to current population.
After successfully applying all the procedures the weights were calibrated to the population as of 31.12.2019. The following variables on individual and household level were used for calibration:
(0-15) (16-19) (20-24) (25-29) (30-34) (35-39) (40-44) (45-49) (50-54) (55-59) (60-64) (65-69) (70-74) (75+)
In 2016 the number of pensioners was used as a calibration variable for the first time.
This variable had 3 levels:
1 - old-age pensions
2 - social pensions
3 - all others (rest of population)
To allocate each person to the correct sub-population, data from the NSSI on the number of personal pensions as of 31.12 were used. There were two reasons for using this variable as a calibration variable: first, to get a better estimate of the number of pensioners and, second, to reduce the standard error of the AROPE indicator.
After calibration the final cross-sectional weight DB090 of the household was obtained. The individual cross-section weight RB050 was equal to the corresponding household weight DB090 (RB050=DB090).
The newborn in 2020 were not included in the calibration. They received the corresponding household weight after calibration.
The personal cross-section weight for all individuals aged 16 and more (PB040) was calculated after the age group (0-15) was removed. Only the individuals who have responded (or were imputed) to the individual questionnaire (RB250 in (11,14)) were used. After one more calibration the weight PB040 (personal cross-sectional weight for all household members aged 16 and more) was obtained.
The Survey on Income and Living Conditions (SILC) is an annual survey implemented in the framework of Regulation (EC) No 1177/2003, which defines Scope, Definitions, Time coverage, Characteristics of the data, Sample size, Publication and Access to data.
Data are accompanied with quality reports analysing the accuracy, coherence and comparability of the data.
The main users of BG-SILC are:
SILC covers only people living in private households (all persons aged 16 and over within the household are eligible for the operation), i.e. persons living in collective households and in institutions are generally excluded from the target population.
|Data completeness - rate|
|Accuracy and reliability|
As with any other statistical survey, SILC may be affected by sampling errors and by other errors arising from the inability to interview some of the sampled units, as well as by errors occurring at the stages of data recording, data processing, etc.
Regulation 1177/2003 defines the minimum effective sample sizes to be achieved to compensate for all kinds of non-response. The allocation of the effective sample size is done according to the size of the country and ensuring minimum precision criteria for the key indicator at national level (absolute precision of the at-risk-of-poverty rate of 1%).
Computations of standard errors were carried out using SAS programs for the SILC Quality Reports and Complex Sample analysis in SPSS ver.20.
|Sampling errors - indicators|
Sampling error - indicators
Estimation for main indicators by ethnic groups in 2020
Estimation for indicator ‘at-risk-of-poverty’ by districts in 2020
Non-sampling errors are basically of 4 types:
Coverage errors include over-coverage, under-coverage and misclassification:
|Over-coverage - rate|
Percentage of non-contacted addresses by reasons:
|Common units - proportion|
Not requested by Reg. 28/2004
As with any other statistical survey, EU-SILC may be burdened with non-sampling errors which occur at various stages of the survey and which cannot be eliminated completely. This mainly applies to interviewers’ errors at the stage of collecting the information, errors due to the respondents’ misunderstanding of questions and inaccurate or sometimes even false answers as well as the errors taking place at the stage of data recording.
EU-SILC is a non-obligatory, representative survey of individual households, carried out by face-to-face interviews using the CAPI method. Two types of questionnaires were applied: an individual and a household questionnaire. In finalising the questionnaires, observations made on the questionnaires of previous years were taken into account. The data collected in the survey were compared with the data obtained from the registers. Some of the persons who, according to the register, receive a minimum income defined themselves in the survey as unemployed or non-active, because they regard their current activity as temporary, and did not report their income. Income from interest and dividends in unincorporated businesses is generally not reported by households.
|Non response error|
|Unit non-response - rate|
* All the formulas are defined in the Commission Regulation 28/2004, Annex II
A* = Total sample; B = * New sub-sample
A* = Total sample; C = * Longitudinal 1 wave 2015 year
|Item non-response - rate|
The computation of item non-response is essential to fulfil the precision requirements concerning publication as stated in the Commission Regulation No 1982/2003. Item non-response rate is provided for the main income variables both at household and personal level.
EU-SILC data were collected with two kinds of questionnaires: a household and an individual questionnaire. Households and individuals are interviewed using electronic devices (CAPI). The data entry program was developed in Visual Basic.NET (MS Visual Studio 2017). The program currently runs on Windows 10 based tablet PCs.
We used the following components when installing the program:
A large number of edit checks (hard and soft) between questions in both questionnaires were implemented to ensure data correctness and consistency. For example, two external files (at household and personal level) were used to verify the correctness of identifiers and to check against previously collected information, such as household composition and day, month and year of birth, sex, etc., for individuals who are not observed for the first time. All gross income values were checked to be equal to or greater than the net values (hard error), and net values were checked to be greater than or equal to the gross values divided by two (soft error). To check the consistency of data on child allowances, an additional check was implemented: the program checks whether the number and age of children in the household correspond to the child allowances received in the household (hard error). Another added check compares the salary of an individual with his/her profession and the minimum insurance income (soft error). According to national legislation, the minimum insurance income is set at a certain level according to the profession type. For checking purposes, lower and upper boundaries, narrower than the absolute ones, were set for most of the questions on income (e.g. social benefits, pensions) on the basis of national legislation. Internal files (implemented in the database) that hold valid ISCO-08 and NACE codes and descriptions were included.
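The gross/net edit check described above can be sketched as follows. This is an illustration in Python, not the Visual Basic.NET data entry program itself, and the function name and return labels are assumptions.

```python
def check_gross_net(gross, net):
    """Classify a gross/net pair using the two rules described in the text."""
    if gross < net:
        return "hard"  # gross must be equal to or greater than net
    if net < gross / 2:
        return "soft"  # net is expected to be at least half of gross
    return "ok"


# Hypothetical income pairs (gross, net).
results = [
    check_gross_net(1000.0, 1200.0),
    check_gross_net(1000.0, 400.0),
    check_gross_net(1000.0, 800.0),
]
```

A hard error blocks the record until it is corrected, while a soft error is a warning that the operator can confirm or suppress.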
During the data-entry phase, data entry operators could generate progress reports using SQL queries. The report contained form IDs, form status, number of errors and number of suppressed signals. A report on the number of individuals and households interviewed or not interviewed, grouped by interviewer, was also added.
Data processing phase
After data-entry phase, further data checking and editing was performed by SILC unit, using SPSS scripts.
Initially, the data were checked to confirm that all questionnaires had been entered and completed. Special attention was paid to split-off households. Next, all suppressed signals and remarks made by data entry operators were reviewed and the relevant corrections were made. After that, the data were converted to SPSS data sets. Extreme income values were compared, where possible, with data provided by the National Social Security Institute or other administrative data sources and with data from previous waves, and were corrected if necessary. All SILC target variables were computed after checking the original variable(s). Finally, four transmission files were converted to .csv format and verified by Eurostat's SAS checking programs.
The main errors detected in the post-data-collection process were related to double registration of child allowances and of personal income from agriculture, property or land. Both were recorded in the household and the individual questionnaires. In addition, there were values that exceeded the maximum possible amounts of unemployment, old-age, survivors', sickness and disability benefits.
|Imputation - rate|
Data processing is performed with statistical software SPSS.
Total gross income and disposable household income were calculated according to Document 065 (2020 operation). All personal/household income variables were collected by interview. For persons interviewed with electronic devices and where the information is available, the data from the administrative source is directly used. The National Revenue Agency provides data from the register of insured persons. The National Social Security Institute provides data on income from pensions and other social security payments. The Social Assistance Agency provides data on income from social benefits.
The interviewers and the respondents have the option of reporting income gross and/or net at component level. Since 2012, employee cash or near cash income (PY010) has been collected net only. The net amounts recorded in the database are net of tax on income at source and of social contributions.
The gross income was obtained by summing up net value, income tax payments and compulsory social insurance contributions. If the information on tax and insurance contributions was missing, the amounts were imputed in accordance with the labour and social insurance legislations. If either the net or the gross value was missing for PY010 or PY050, the missing value was calculated on the basis of a net-gross conversion and vice versa.
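The net-to-gross step can be illustrated with a short sketch. The first function follows the sum described above. The two conversion functions are only an assumed structure (contributions deducted from gross first, then a flat tax on the remainder), and the 10% rates are hypothetical placeholders, not the rates in the labour and social insurance legislation.

```python
def gross_from_components(net, income_tax, social_contributions):
    """Gross income = net value + income tax + compulsory social contributions."""
    return net + income_tax + social_contributions


def net_from_gross(gross, contribution_rate, tax_rate):
    """Assumed structure: contributions first, then a flat tax on the remainder."""
    contributions = gross * contribution_rate
    tax = (gross - contributions) * tax_rate
    return gross - contributions - tax


def gross_from_net(net, contribution_rate, tax_rate):
    """Inverse of net_from_gross under the same assumed structure."""
    return net / ((1.0 - contribution_rate) * (1.0 - tax_rate))


# Hypothetical rates of 10% each, chosen only for round numbers.
net_value = net_from_gross(1000.0, 0.10, 0.10)
gross_value = gross_from_net(net_value, 0.10, 0.10)
gross_sum = gross_from_components(810.0, 90.0, 100.0)
```

Under these placeholder rates a gross amount of 1000 corresponds to a net amount of about 810, and the conversion back recovers the gross amount.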
In case of missing information on income components, the data of the National Revenue Agency, the National Social Security Institute and Social Assistance Agency are used.
When data from administrative registers are not available, deterministic regression imputation is applied.
For the imputation of income variables in the personal data file the following groups were created:
• Region (NUTS 2)
• Status in employment
The gross income was obtained by summing up net value, income tax payments and compulsory social insurance contributions. If the information on tax and insurance contributions was missing, the amounts were imputed according to labour and social insurance legislations. In some cases where only net income amounts were available these had to be converted to gross values using all necessary information.
Extreme income values and missing values were compared with data provided by National Social Security Institute or administrative data sources and data from previous waves, where possible and corrected if necessary.
Imputed rents are estimated for dwellings used as main residence by the households. The imputation is applied for those households that did not report paying rent:
The market rent is the rent due for the right to use an unfurnished dwelling on the private market, excluding charges for heating, water, electricity, etc.
Stratification method based on actual rents is used (the same used by National Accounts – the same stratification variables and the same market rents). The method is in line with ESA’95 and requirements of Commission Decision 95/309 and Commission Regulation 1722/2005 on the principle of estimating dwelling services.
- location (district centre with university, other district centre, smaller town, rural area)
- size of the dwelling
- number of rooms (1, 2, 3, 4+)
- amenities – availability of central heating
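A simplified sketch of a stratification approach to imputed rent is given below. It is not the National Accounts procedure: it assumes strata formed from location, number of rooms (grouped as 1, 2, 3, 4+) and central heating only, uses a plain mean of actual rents per stratum, and all records are hypothetical.

```python
from collections import defaultdict


def stratum_key(dwelling):
    """Stratum from location, rooms capped at 4 (the 4+ group) and central heating."""
    return (dwelling["location"], min(dwelling["rooms"], 4), dwelling["central_heating"])


def stratum_mean_rents(rented_dwellings):
    """Mean actual rent per stratum, computed from rented dwellings."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dwelling in rented_dwellings:
        key = stratum_key(dwelling)
        sums[key] += dwelling["rent"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def impute_rent(dwelling, mean_rents):
    """Imputed rent for a dwelling without reported rent; None if the stratum is empty."""
    return mean_rents.get(stratum_key(dwelling))


# Hypothetical rented dwellings and one owner-occupied dwelling.
rented = [
    {"location": "smaller town", "rooms": 2, "central_heating": False, "rent": 200.0},
    {"location": "smaller town", "rooms": 2, "central_heating": False, "rent": 240.0},
    {"location": "rural area", "rooms": 5, "central_heating": False, "rent": 150.0},
]
means = stratum_mean_rents(rented)
owner = {"location": "smaller town", "rooms": 2, "central_heating": False}
imputed = impute_rent(owner, means)
```

A fuller version would also use the size of the dwelling, for example by working with rent per square metre within each stratum.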
Actual market rents – main data sources:
- current price statistics
- household budget survey
- real estate agencies
The information on the private use of a company car is collected in the individual questionnaire. To evaluate the benefit from private use of a company car, we used the number of kilometres driven, the number of months in which the car was used, the cost of fuel under statutory spending limits and the average price of fuel for the year. The fuel cost limit provided by the employer is also taken into account. Missing values are imputed using hot-deck imputation and regression imputation with simulated residuals.
|Model assumption error|
Not requested by Reg.28/2004
|Data revision - policy|
|Data revision - practice|
|Data revision - average size|
|Timeliness and punctuality|
SILC cross-sectional and longitudinal data are available in the form of tables 12 months after the end of the data collection period.
|Time lag - first results|
First data are available 6 months after data collection
|Time lag - final results|
Final results are available 12 months after data collection.
|Punctuality - delivery and publication|
|Coherence and comparability|
According to the Regulation (EC) No 1177/2003 of the European Parliament and of the Council concerning EU-SILC: "Comparability of data between Member States shall be a fundamental objective and shall be pursued through the development of methodological studies from the outset of EU-SILC data collection, carried out in close collaboration between the Member States and Eurostat".
Although the best way to maintain the comparability of data is to apply the same methods and definitions of variables, small departures from the definitions given by Eurostat are allowed in EU-SILC. Article 16 of the Regulation states: "Small departures from common definitions, such as those relating to private household definition and income reference period, shall be allowed, provided they affect comparability only marginally. The impact of comparability shall be reported in the quality reports."
The coherence of two or more statistical outputs refers to the degree to which the statistical processes, by which they were generated, used the same concepts and harmonised methods. A comparison with external sources for all income target variables and the number of persons who receive income from each ‘income component’ will be provided, where the Member States concerned consider such external data to be sufficiently reliable.
|Comparability - geographical|
Comparability across EU Member States is considered high due to use of harmonised concepts, variables, definitions and classifications.
|Asymmetry for mirror flows statistics - coefficient|
Not requested by Reg. 28/2004.
|Comparability - over time|
|Length of comparable time series|
|Coherence - cross domain|
The cross-sectional data for the EU-SILC2020 were compared to the Labor force survey 2020 and HBS 2020.
When comparing SILC and HBS, the discrepancies must be taken into account. The differences are to a great extent caused by methodological differences between the two surveys. The main methodological differences are:
|Coherence - sub annual and annual statistics|
Highest ISCED level attained
Self-defined current economic status
Status in employment weighted
|Coherence - National Accounts|
|Coherence - internal|
|Accessibility and clarity|
Poverty and Social Inclusion Indicators.
Detailed results are available to all users of the NSI website under the heading Social Inclusion and Living Conditions - Poverty and Social Inclusion Indicators: https://www.nsi.bg/en/node/8292 and INFOSTAT
|Data tables - consultations|
Anonymised individual data can be made available for scientific research purposes upon individual request, according to the Rules for the provision of anonymised individual data for scientific and research purposes.
Information service on request, according to the Rules for the dissemination of statistical products and services of the NSI.
|Metadata - consultations|
|Documentation on methodology|
Detailed information about the list of social inclusion indicators, their definitions and the algorithm for their calculation at European level can be found on the following site:
|Metadata completeness – rate|
National Quality Report according to Regulation (EC) 28/2004.
|Cost and burden|
The total interview length per household is 65 minutes on average.
|Confidentiality - policy|
|Confidentiality – data treatment|
According to Art. 25 of the Statistics Act, individual data are not published (they are suppressed). Dissemination of individual data is possible only according to Art. 26 of the Statistics Act.