Data exploration: theory
This page covers both the theory and the first practice part of the data exploration section. It goes hand in hand with the second practice session (data-munging with pandas).
Before class
Theory
- read Gabor & Békés 1.1, 1.5, 1.7
<!--
- prepare for class discussion:
- What are the advantages and disadvantages of relying on the different data sources (surveys, administrative data, own data collection) for the analysis?
- What types of variables do you know?
-->
Practice
In class
Theory
Link to the slides: html, pdf
Practice
The lecture introduces pandas, a Python library for working with data sets.
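As a first taste of pandas, here is a minimal sketch of its two core objects, the Series and the DataFrame, assuming pandas is installed and imported under the usual `pd` alias. The fruit names, prices and stock counts are hypothetical toy values chosen only for illustration. The sketch builds the same kind of table in three ways: from a dictionary of Series, from a dictionary of lists, and from a list of lists.

```python
import pandas as pd

# A Series: a one-dimensional array of values with an index of labels
# (hypothetical toy values)
prices = pd.Series([1.2, 0.5, 3.0], index=["apple", "banana", "cherry"], name="price")
stock = pd.Series([10, 0, 25], index=["apple", "banana", "cherry"], name="stock")

# DataFrame from a dictionary of Series: keys become column names,
# and rows are aligned on the shared index labels
df_from_series = pd.DataFrame({"price": prices, "stock": stock})

# DataFrame from a dictionary of lists: one list per column
df_from_dict = pd.DataFrame({"fruit": ["apple", "banana", "cherry"],
                             "price": [1.2, 0.5, 3.0]})

# DataFrame from a list of lists: one inner list per row,
# with the column names passed separately
df_from_lists = pd.DataFrame([["apple", 1.2], ["banana", 0.5], ["cherry", 3.0]],
                             columns=["fruit", "price"])

print(prices["banana"])      # label-based lookup in a Series
print(df_from_series.shape)  # (number of rows, number of columns)
```

The dictionary-of-lists form is usually the most readable when the data is typed in by hand, while the list-of-lists form mirrors row-by-row records.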
After class
Learning objectives
Theory
- Data should never be interpreted out of its context
- Important questions to ask when starting to collect or work with data
- The importance of sampling
- Understand types of variables and observations
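The theory objectives on variable types, observations and sampling can be made concrete with a small pandas sketch. It assumes a made-up table of people (every value is hypothetical) in which each row is one observation and each column one variable. The labels in the comments follow the common quantitative / binary / nominal / ordinal classification. An ordered `pd.Categorical` records that an ordinal variable has a meaningful order, and `DataFrame.sample` draws a simple random sample of observations without replacement.

```python
import pandas as pd

# Hypothetical toy data: each row is one observation (a person),
# each column is one variable
people = pd.DataFrame({
    "age": [23, 35, 41, 29, 52, 38],                              # quantitative
    "employed": [True, True, False, True, False, True],           # binary
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],   # nominal
    "education": ["high school", "master", "bachelor",
                  "bachelor", "high school", "master"],           # ordinal
})

# Nominal variable: categories without any order
people["city"] = people["city"].astype("category")

# Ordinal variable: categories with a meaningful order
# (but no meaningful distance between them)
people["education"] = pd.Categorical(
    people["education"],
    categories=["high school", "bachelor", "master"],
    ordered=True,
)

print(people.dtypes)

# Ordered categories support comparisons: count people with at least a bachelor
print((people["education"] >= "bachelor").sum())

# Simple random sample of 3 observations without replacement;
# random_state fixes the seed so the draw is reproducible
sample = people.sample(n=3, random_state=42)
print(sample)
```

Storing the ordinal variable as an ordered categorical, rather than as plain strings, prevents comparisons that would otherwise silently follow alphabetical order.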
Practice
- create pandas Series
- create pandas DataFrames from Series, dictionaries, lists
- access data in a DataFrame with loc and iloc
- reset index
- rename columns
- access metadata of DataFrames
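The remaining practice objectives (selecting with `loc` and `iloc`, resetting the index, renaming columns, inspecting metadata) can be sketched on a tiny DataFrame. The fruit table is hypothetical toy data, and the new column name `unit_price` is an arbitrary choice for illustration. `loc` selects by label, `iloc` by integer position.

```python
import pandas as pd

# Hypothetical toy data, with the fruit names used as row labels
df = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "price": [1.2, 0.5, 3.0],
    "stock": [10, 0, 25],
}).set_index("fruit")

# loc selects by label: row "banana", column "price"
banana_price = df.loc["banana", "price"]
# iloc selects by integer position: first row, second column ("stock")
first_row_stock = df.iloc[0, 1]
print(banana_price, first_row_stock)

# reset_index moves the index back into a regular column
# and restores a default 0, 1, 2, ... integer index
df = df.reset_index()

# rename takes a mapping from old to new column names
df = df.rename(columns={"price": "unit_price"})

# Metadata: dimensions, column names, data types and index
print(df.shape)
print(list(df.columns))
print(df.dtypes)
print(df.index)
df.info()  # summary: index range, columns, non-null counts, dtypes, memory
```

Note that `reset_index` and `rename` return a new DataFrame by default, which is why the result is assigned back to `df`.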
Links:
- pandas basics
- Data-munging with pandas
<!--
- Case study on XXX (cf. lino)
-->
Reference