Data exploration: theory

This page covers both the theory and the first practice part of the data exploration section. It goes hand in hand with the second practice session (data munging with pandas).

Before class

Theory

  • read Békés & Kézdi, sections 1.1, 1.5 and 1.7
  • prepare for the class discussions:
    • What are the advantages and disadvantages of relying on the different sources of data (surveys, administrative data, data collection) for the analysis?
    • What types of variables do you know?

Practice

In class

Theory

Link to the slides: html, pdf

Practice

The lecture introduces pandas, a Python library used for working with data sets.
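As a first taste of what "working with data sets" looks like in pandas, here is a minimal sketch that loads a small table and inspects it. The CSV content and the variable names (`csv_text`, `people`) are hypothetical; `io.StringIO` only stands in for a real file so the example is self-contained.

```python
import io

import pandas as pd

# A tiny, made-up CSV held in memory (hypothetical data for illustration only).
csv_text = """name,age,city
Alice,34,Paris
Bob,51,Berlin
Chloe,27,Madrid
"""

# read_csv accepts any file-like object; io.StringIO stands in for a file on disk.
people = pd.read_csv(io.StringIO(csv_text))

print(people.head())          # first rows of the table
print(people.shape)           # (rows, columns): here (3, 3)
print(people["age"].mean())   # a first summary statistic
```

With a real data set you would pass a file path or URL to `pd.read_csv` instead of the in-memory string.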

After class

Learning objectives

Theory

  • Data should never be interpreted out of its context
  • Important questions to ask when starting to collect or work with data
  • The importance of sampling
  • Understand types of variables and observations
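To connect the theory objectives to the practice part, the sketch below shows one possible way of encoding common variable types in pandas, where each row is an observation and each column a variable. The data is hypothetical, and the mapping of variable types to pandas dtypes (nominal to an unordered `category`, ordinal to an ordered `Categorical`, quantitative to numeric, binary to `bool`) is an illustrative convention, not something taken from the readings. The last lines draw a simple random sample, which is only one of several sampling designs.

```python
import pandas as pd

# Hypothetical survey-style data: each row is one observation (one respondent),
# each column is one variable.
df = pd.DataFrame(
    {
        "country": ["France", "Germany", "France", "Spain"],            # nominal
        "education": ["secondary", "tertiary", "primary", "tertiary"],  # ordinal
        "age": [34, 51, 27, 45],                                        # quantitative, discrete
        "income": [2500.0, 4100.5, 1800.0, 3900.0],                     # quantitative, continuous
        "employed": [True, True, False, True],                          # binary
    }
)

# Nominal variable: categories with no natural order.
df["country"] = df["country"].astype("category")

# Ordinal variable: categories with a meaningful order (illustrative convention).
df["education"] = pd.Categorical(
    df["education"],
    categories=["primary", "secondary", "tertiary"],
    ordered=True,
)

print(df.dtypes)

# Because the categories are ordered, comparisons are meaningful for "education".
above_primary = df["education"] > "primary"
print(above_primary)

# Simple random sample of 2 observations; the fixed seed makes it reproducible.
sample = df.sample(n=2, random_state=0)
print(sample)
```

Comparing `df["country"] > "France"` would raise an error, since an unordered categorical has no ranking; that is the practical difference between a nominal and an ordinal variable in this encoding.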

Practice

  • create pandas Series
  • create pandas DataFrames from Series, dictionaries, lists
  • access data in a DataFrame with loc and iloc
  • reset index
  • rename columns
  • access metadata of DataFrames
  1. pandas basics
  2. Data-munging with pandas
  3. Case study on XXX (cf. lino)
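The practice objectives listed above can be sketched in one short script. The data (names, ages, cities) and variable names are hypothetical; the pandas calls used (`pd.Series`, `pd.DataFrame`, `.loc`, `.iloc`, `reset_index`, `rename`, `.shape`, `.dtypes`, `.info()`) are standard, but this is a sketch rather than the course notebook.

```python
import pandas as pd

# Hypothetical toy data used throughout this sketch.
names = ["alice", "bob", "chloe"]

# --- pandas Series: a one-dimensional array with an index of labels ---
ages = pd.Series([34, 51, 27], index=names, name="age")
cities = pd.Series(["Paris", "Berlin", "Madrid"], index=names, name="city")

# --- DataFrames from Series, a dictionary, and a list of lists ---
df_from_series = pd.DataFrame({"age": ages, "city": cities})  # rows aligned on the index
df_from_dict = pd.DataFrame(
    {"age": [34, 51, 27], "city": ["Paris", "Berlin", "Madrid"]}, index=names
)
df_from_lists = pd.DataFrame(
    [[34, "Paris"], [51, "Berlin"], [27, "Madrid"]], columns=["age", "city"], index=names
)

# --- loc selects by label, iloc by integer position ---
bob_city = df_from_dict.loc["bob", "city"]   # row label "bob", column label "city"
same_cell = df_from_dict.iloc[1, 1]          # second row, second column
over_30 = df_from_dict.loc[df_from_dict["age"] > 30, ["city"]]  # boolean row filter

# --- reset_index turns the row labels into an ordinary column named "index" ---
df = df_from_dict.reset_index()

# --- rename columns with an {old: new} mapping ---
df = df.rename(columns={"index": "name", "age": "age_years"})

# --- metadata: size, column names, types, index ---
print(df.shape)              # (3, 3)
print(df.columns.tolist())   # ['name', 'age_years', 'city']
print(df.dtypes)
print(df.index)              # a RangeIndex after reset_index
df.info()                    # compact summary: columns, non-null counts, dtypes
```

One detail worth noticing: slices with `.loc` are label-based and include both endpoints (`df_from_dict.loc["alice":"bob"]` returns two rows), whereas `.iloc` slices are position-based and exclude the end, as ordinary Python slices do.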

Reference