Working with data

Class resources

Pandas basic

Introduction to pandas, a Python library used for working with data sets.

Pandas advanced

Data munging (wrangling) is the process of transforming raw data to a set of data tables that can be used for a variety of downstream purposes such as analytics.

References

Learning objectives

Theory

  • Data should not be considered outside of its context
  • Important questions to ask when starting to collect or work with data
  • Importance of sampling
  • Understand types of variables and observations

Practice basic

  • Create pandas Series
  • Create pandas DataFrames from Series, dictionaries, lists
  • Access data in a DataFrame with loc and iloc
  • Reset index
  • Rename columns
  • Access metadata of DataFrames
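
The basic practice points above can be sketched in a few lines. This is only an illustrative example: the data, the variable names (ages, cities, df_from_series, and so on) and the column names are made up and are not part of the class material.

```python
import pandas as pd

# Hypothetical example data, invented for illustration.
people = ["ann", "bob", "cyd"]
ages = pd.Series([34, 27, 45], index=people, name="age")
cities = pd.Series(["Paris", "Lyon", "Nice"], index=people, name="city")

# DataFrame from Series: each Series becomes a column, aligned on the index.
df_from_series = pd.DataFrame({"age": ages, "city": cities})

# DataFrame from a dictionary of lists (keys become column names).
df_from_dict = pd.DataFrame({"age": [34, 27, 45], "city": ["Paris", "Lyon", "Nice"]})

# DataFrame from a list of rows, with explicit column names.
df_from_lists = pd.DataFrame(
    [[34, "Paris"], [27, "Lyon"], [45, "Nice"]],
    columns=["age", "city"],
)

# loc selects by label, iloc selects by integer position.
bob_city = df_from_series.loc["bob", "city"]
first_age = df_from_series.iloc[0, 0]

# reset_index moves the index into a regular column (named "index" here,
# because the original index has no name) and restores a 0..n-1 index.
df_reset = df_from_series.reset_index()

# rename returns a new DataFrame with the given columns renamed.
df_renamed = df_reset.rename(columns={"index": "name", "age": "age_years"})

# Metadata: shape, column names and column types.
n_rows, n_cols = df_renamed.shape
column_names = list(df_renamed.columns)
column_types = df_renamed.dtypes
```

Calling df_renamed.info() additionally prints a summary of the index, the columns, their types and the number of non-missing values.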

Practice advanced

  • Add new derived variables
  • Split character (string) variables into new variables
  • Convert variables to numeric or categorical (factor)
  • Some string manipulations
  • Rename variables
  • Filter out different observations
      • Conditional selection
      • Tabulate frequency of a variable
      • Missing values
      • Replace values
      • Duplicates
  • (Using pipes)
  • Sorting data
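
As a rough illustration of the advanced practice points, the sketch below chains several of these steps on a small table. Everything in it is an assumption made for the example: the data are invented, the column names are hypothetical, and split_name is a hypothetical helper, not a pandas function. "Pipes" is interpreted here as pandas method chaining together with DataFrame.pipe.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration only.
raw = pd.DataFrame({
    "full_name": ["ann lee", "Bob Roy", "Bob Roy", "cyd fox"],
    "height_cm": ["170", "182", "182", None],
    "group": ["a", "b", "b", "a"],
})


def split_name(df):
    """Hypothetical helper: split full_name into first and last name columns."""
    parts = df["full_name"].str.split(" ", n=1, expand=True)
    return df.assign(first_name=parts[0], last_name=parts[1])


clean = (
    raw
    .drop_duplicates()                          # drop exact duplicate rows
    .rename(columns={"group": "team"})          # rename a variable
    .assign(
        # String manipulation: capitalise each word of the name.
        full_name=lambda d: d["full_name"].str.title(),
        # Convert text to numbers; the missing value becomes NaN.
        height_cm=lambda d: pd.to_numeric(d["height_cm"]),
        # New derived variable, computed from the converted column.
        height_m=lambda d: d["height_cm"] / 100,
        # Replace values, then convert to a categorical (factor-like) type.
        team=lambda d: d["team"].replace({"a": "A", "b": "B"}).astype("category"),
    )
    .pipe(split_name)                           # plug a custom step into the chain
    .sort_values("height_cm", ascending=False)  # NaN values are placed last
    .reset_index(drop=True)
)

# Conditional selection: keep rows that satisfy a condition.
tall = clean[clean["height_cm"] > 175]

# Tabulate the frequency of a variable.
team_counts = clean["team"].value_counts()

# Count missing values, then fill them with the column mean.
n_missing = int(clean["height_cm"].isna().sum())
filled = clean["height_cm"].fillna(clean["height_cm"].mean())
```

Filling missing values with the mean is only one possible choice; whether it is appropriate depends on the context of the data and how it was sampled.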