Detailed presentation of the data

1 General presentation of the data

The data prepared for this project cover all housing transactions in France between 2014 and 2021, together with details about each property sold. Because the characteristics of the properties sold are not open data, the data you’ll be working with are synthetic. They have been generated to mirror the original data.

If you want a closer look at the data, CEREMA publishes on its website extensive documentation of the different datasets, their variables, the modalities of those variables and the overall quality of each variable.

2 List of variables

Label of the variable Full name of the variable Explanation and remarks Original label of the variable (French)
1 dist_tosea Distance of the property to the nearest seashore - capped at 10km This variable has been calculated distance_ltm (calculated)
2 farea Reported floor area of the property dsupdc
3 has_cheating If the property has access to central heating The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gchclc
4 has_elec If the property has access to electricity The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gelelc
5 has_elevator If the building of the flat has an elevator (for flats only) The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gasclc
6 has_gas If the building is connected to the gas mains The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
ggazlc
7 has_mdrainage If the property is connected to the mains drainage system The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gteglc
8 has_rchute If the building of the flat has refuse chutes (for flats only) The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gvorlc
9 has_water If the property has access to water The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
geaulc
10 n_ancrooms Number of ancillary rooms reported in the property Ancillary rooms include hallways, attics. They differ from n_otherannex. dnbann
11 n_attic Number of attics reported in the whole property nb_greniers (calculated)
12 n_basmt Number of basements reported in the whole property nb_caves (calculated)
13 n_bath Number of bathtubs reported in the property dnbbai
14 n_eatr Number of eating rooms reported in the property dnbsam
15 n_floors Number of floors in the property (building or house) This variable is more reliable for houses than for buildings. The encoding of underground floors is not fully harmonized: it is often 81 for minus 1, 82 for minus 2 … It can also be encoded as 99, 98. A flat on the 2nd floor of a seven-floor building should be encoded with nth_floor=1 and n_floors=8 (ground floor and seven floors above ground level) dnbniv
16 n_garage Number of garages reported in the whole property nb_garages (calculated)
17 n_kit8 Number of kitchens reported in the property with an area of less than 8 square meters dnbcu8
18 n_kit9 Number of kitchens reported in the property with an area larger than 9 square meters dnbcu9
19 n_mrooms Number of main rooms reported in the property n_mrooms = n_eatr + n_slr + n_kit8 + n_kit9 + n_washr dnbppr
20 n_otherannex Number of other annexes reported in the whole property nb_autresdep (calculated)
21 n_pool Number of pools reported in the whole property nb_piscines (calculated)
22 n_rooms Number of rooms reported in the property n_rooms = n_mrooms + n_annex dnbpdc
23 n_show Number of showers reported in the property dnbdou
24 n_sink Number of sinks reported in the property dnblav
25 n_slr Number of sleeping rooms reported in the property dnbcha
26 n_terrace Number of terraces reported in the whole property nb_terrasses (calculated)
27 n_washr Number of washing rooms reported in the property dnbsea
28 n_wc Number of toilets reported in the property dnbwc
29 nth_floor Reported floor of the property It represents the floor of the flat (in France, the second floor is the first floor above ground level). This variable is set to 00 for houses. The encoding of underground floors is not fully harmonized: it is often 81 for minus 1, 82 for minus 2 … It can also be encoded as 99, 98. A flat on the 2nd floor of a seven-floor building should be encoded with nth_floor=1 and n_floors=8 (ground floor and seven floors above ground level). dniv
30 price Price of the transaction Price of the transaction is in EUR valeurfonc
31 price_sqm Price per square meter of the transaction price_sqm (calculated)
32 prop_loc_citycode Official city’s code where the property is located see remarks above depcom
33 prop_loc_dep Department code where the property is located The data do not cover the whole French territory: overseas territories are included, but Alsace-Moselle (eastern part of France) is not ccodep
34 prop_loc_x Longitude where the property is located see remarks above x
35 prop_loc_y Latitude where the property is located see remarks above y
36 prop_type Type of property 1 represents a flat and 2 a house dteloc
37 prop_year_harm Year of construction of the property This variable has been harmonized to correct for typing mistakes. More details are available in the introduction. jannath
38 s_land_agri Agricultural land area (square meters) Agricultural land is used for farming. It includes fields, meadows, orchards and vineyards. dcntagri
39 s_land_artif Artificial land area (square meters) Artificial land includes recreational areas, land, building plots and gardens. Artificial land refers to land that has been altered by humans. dcntsol
40 s_land_nat Natural land area (square meters) Natural land is land that has not been altered. This includes, for example, forests. dcntnat
41 stair If the building of the flat has stairs (for flats only) The modalities are coded in the following way :
- 0 : No
- 1 : Missing value
- 2 : Yes
gesclc
42 trans_date Date of the official certified transaction datemut
43 trans_id Unique identifier code of the transaction idmutation
44 trans_month Month of the official certified transaction moismut
45 trans_type_code Type of transaction There are several types of transaction in the original data (sale, off-plan sale, sale of building land, tender, compulsory purchase). Original data have been filtered to keep only sale. idnatmut
46 trans_type_label Type of transaction libnatmut
47 trans_year Year of the official certified transaction anneemut
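
The remarks for n_mrooms give a formula linking it to other room counts. A minimal sketch of how you might check that relationship on a single record is shown here. It assumes a record is available as a plain Python dict keyed by the variable labels above; the function names are hypothetical, and since the data are synthetic the identity may not hold for every row.

```python
# Components of n_mrooms, as listed in the remarks for that variable.
MAIN_ROOM_PARTS = ("n_eatr", "n_slr", "n_kit8", "n_kit9", "n_washr")


def main_rooms_from_parts(row: dict) -> int:
    """Sum the room counts that make up n_mrooms according to the variable list."""
    return sum(int(row[name]) for name in MAIN_ROOM_PARTS)


def main_rooms_consistent(row: dict) -> bool:
    """Return True when the reported n_mrooms equals the sum of its parts."""
    return int(row["n_mrooms"]) == main_rooms_from_parts(row)


# Illustrative record (made-up values, not taken from the dataset).
example = {"n_eatr": 1, "n_slr": 2, "n_kit8": 0, "n_kit9": 1, "n_washr": 0, "n_mrooms": 4}
print(main_rooms_consistent(example))
```

Rows that fail the check are not necessarily wrong; they are simply candidates for a closer look.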

See CEREMA’s documentation (in French) for a more detailed description.
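
Several variables in the list (the has_* variables and stair) share the same coding: 0 for No, 1 for Missing value, 2 for Yes. Because the missing value sits between the two real answers, treating these columns as numbers can be misleading. The sketch below shows one possible way to decode them. It assumes a record is a plain Python dict keyed by the variable labels; the helper names are hypothetical.

```python
from typing import Optional

# Coding taken from the variable list: 0 = No, 1 = Missing value, 2 = Yes.
FLAG_CODES = {0: False, 1: None, 2: True}

# Variables documented with the 0/1/2 coding.
FLAG_COLUMNS = (
    "has_cheating", "has_elec", "has_elevator", "has_gas",
    "has_mdrainage", "has_rchute", "has_water", "stair",
)


def decode_flag(code: int) -> Optional[bool]:
    """Map a 0/1/2 code to False, None (missing) or True."""
    if code not in FLAG_CODES:
        raise ValueError(f"unexpected flag code: {code!r}")
    return FLAG_CODES[code]


def decode_row(row: dict) -> dict:
    """Return a copy of the record with every documented flag column decoded."""
    decoded = dict(row)
    for column in FLAG_COLUMNS:
        if column in decoded:
            decoded[column] = decode_flag(decoded[column])
    return decoded


# Illustrative record (made-up values, not taken from the dataset).
print(decode_row({"has_gas": 2, "has_elec": 1, "stair": 0, "price": 100}))
```

Columns that are not in the flag list, such as price, are left untouched.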

3 Detailed note

Note that the year-of-construction variable (prop_year_harm) has been harmonized to correct for typing mistakes. The following transformations were applied to the original data:

Original modality Correction to original modality Example of transformation
1 ≤ prop_year ≤ 22 Add 2000 as the first digits were not entered 8 → 2008
23 ≤ prop_year ≤ 99 Add 1900 as the first digits were not entered 83 → 1983
100 ≤ prop_year ≤ 119 Set to 0 as it is unknown 105 → 0
120 ≤ prop_year ≤ 200 Add a 0 at the end as the last digit has not been entered 187 → 1870
201 ≤ prop_year ≤ 299 Set to 0 as it is unknown 250 → 0
300 ≤ prop_year ≤ 999 Add 1000 as the first digits have not been entered 980 → 1980
1000 ≤ prop_year ≤ 1120 Set to 0 as it is unknown 1005 → 0
1120 ≤ prop_year ≤ 1199 Replace the second 1 with 9 1155 → 1955
1200 ≤ prop_year ≤ 2022 No change 1200 → 1200

More information about this transformation is available online (in French).
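
The harmonization rules in the table can be written as a small function. This is only a sketch of the rules as listed, not the code actually used to build the dataset. The function name is hypothetical. Two points are assumptions: the value 1120 appears in two rows of the table and is handled here by the "replace the second 1 with 9" rule, and values the table does not cover (0, negative values, or anything above 2022) raise an error.

```python
def harmonize_year(prop_year: int) -> int:
    """Apply the harmonization table to a raw year of construction."""
    y = prop_year
    if 1 <= y <= 22:
        return 2000 + y      # first digits missing: 8 -> 2008
    if 23 <= y <= 99:
        return 1900 + y      # first digits missing: 83 -> 1983
    if 100 <= y <= 119:
        return 0             # unknown
    if 120 <= y <= 200:
        return y * 10        # last digit missing: 187 -> 1870
    if 201 <= y <= 299:
        return 0             # unknown
    if 300 <= y <= 999:
        return 1000 + y      # first digit missing: 980 -> 1980
    if 1000 <= y <= 1119:
        return 0             # unknown (1120 excluded here: assumption)
    if 1120 <= y <= 1199:
        return y + 800       # second 1 replaced with 9: 1155 -> 1955
    if 1200 <= y <= 2022:
        return y             # no change
    # Not covered by the table; raising is a choice made for this sketch.
    raise ValueError(f"year outside the documented ranges: {y}")


# The examples from the table, in order.
for raw in (8, 83, 105, 187, 250, 980, 1005, 1155, 1200):
    print(raw, "->", harmonize_year(raw))
```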