Working with data
Class resources
Pandas basic
Introduction to pandas, a Python library used for working with data sets.
Pandas advanced
Data munging (wrangling) is the process of transforming raw data into a set of data tables that can be used for a variety of downstream purposes, such as analytics.
Learning objectives
Theory
- Data should not be interpreted out of its context
- Important questions to ask when starting to collect or work with data
- Importance of sampling
- Understand types of variables and observations
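To make "types of variables and observations" and "sampling" concrete, here is a minimal pandas sketch. The data set, its column names, and its values are hypothetical and invented for illustration: each row is one observation, each column one variable, and the comments label each variable's type.

```python
import pandas as pd

# Hypothetical toy data set: each row is one observation (a person),
# each column is one variable measured on that observation.
people = pd.DataFrame({
    "age": [23, 35, 47, 52, 31, 29],                           # quantitative, discrete
    "height_cm": [170.2, 182.5, 165.0, 175.3, 160.8, 178.1],   # quantitative, continuous
    "sex": ["F", "M", "F", "M", "F", "M"],                     # qualitative, nominal
    "education": ["BSc", "MSc", "PhD", "BSc", "MSc", "BSc"],   # qualitative, ordinal
})

# Declare the ordinal variable explicitly so pandas keeps the order of its levels.
people["education"] = pd.Categorical(
    people["education"], categories=["BSc", "MSc", "PhD"], ordered=True
)

print(people.shape)   # (number of observations, number of variables)
print(people.dtypes)  # storage type of each variable

# Separate quantitative (numeric) from qualitative (non-numeric) variables.
numeric_vars = people.select_dtypes(include="number").columns.tolist()
categorical_vars = people.select_dtypes(exclude="number").columns.tolist()

# Sampling: draw a random subset of observations without replacement;
# random_state fixes the seed so the draw is reproducible.
sample = people.sample(n=3, random_state=0)
print(sample)
```

The dtype that pandas stores is only a hint: a numeric column can still encode a qualitative variable (for example, postal codes), so deciding a variable's type remains a question about the data's context.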
Practice basic
- Create pandas Series
- Create pandas DataFrames from Series, dictionaries, and lists
- Access data in a DataFrame with loc and iloc
- Reset the index
- Rename columns
- Access DataFrame metadata
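The basic practice items above can be sketched in one short script. This is an illustrative example under simple assumptions (a hypothetical three-person data set), not the course's reference solution; it uses only standard pandas calls (`pd.Series`, `pd.DataFrame`, `pd.concat`, `loc`, `iloc`, `reset_index`, `rename`, `info`).

```python
import pandas as pd

# 1. A Series is a one-dimensional labelled array.
ages = pd.Series([23, 35, 47], index=["ana", "ben", "cleo"], name="age")

# 2. DataFrames from Series, dictionaries, and lists (hypothetical data).
heights = pd.Series([170.2, 182.5, 165.0], index=["ana", "ben", "cleo"], name="height_cm")
df_from_series = pd.concat([ages, heights], axis=1)  # Series names become column names

df_from_dict = pd.DataFrame({"name": ["ana", "ben", "cleo"], "age": [23, 35, 47]})

df_from_lists = pd.DataFrame(
    [["ana", 23], ["ben", 35], ["cleo", 47]],  # one inner list per row
    columns=["name", "age"],
)

# 3. loc selects by label, iloc by integer position.
ben_age = df_from_series.loc["ben", "age"]        # row label "ben", column "age"
first_row = df_from_series.iloc[0]                # first row, as a Series
first_two_heights = df_from_series.iloc[:2, 1]    # first two rows, second column

# 4. reset_index moves the current index into a regular column
#    (named "index" here, since the index has no name) and replaces it with 0..n-1.
df_reset = df_from_series.reset_index()

# 5. rename takes a mapping from old to new column names.
df_renamed = df_reset.rename(columns={"index": "name", "height_cm": "height"})

# 6. Metadata: dimensions, column labels, index, and column dtypes.
print(df_renamed.shape)
print(df_renamed.columns.tolist())
print(df_renamed.index)
print(df_renamed.dtypes)
df_renamed.info()  # prints a summary: dtypes, non-null counts, memory usage
```

Building the same table from a dictionary (column-wise) or from a list of lists (row-wise) gives identical DataFrames; which form is more convenient depends on how the raw data arrives.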
Practice advanced
- Add new variables derived from existing ones
- Split character variables into new variables
- Convert variables to numeric or factor (categorical)
- Perform basic string manipulations
- Rename variables
- Filter and clean observations:
  - Conditional selection
  - Tabulate the frequency of a variable
  - Missing values
  - Replace values
  - Duplicates
- (Using pipes to chain operations)
- Sort data
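The advanced practice items can be sketched on a small, deliberately messy table. The raw data, column names, and the helper `add_passed` are hypothetical, chosen only to exercise each step; "factor" is mapped to pandas' `category` dtype (its closest counterpart), and "pipes" is read here as method chaining with `DataFrame.pipe`, which is an interpretation rather than something the outline specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: everything is stored as text, one column packs two
# variables, and there is a placeholder, a missing value, and a duplicated row.
raw = pd.DataFrame({
    "id": ["1", "2", "3", "3", "4", "5"],
    "full_name": ["Ana Silva", "ben KIM", "Cleo Diaz", "Cleo Diaz", "dan Roy", "Eva Li"],
    "city": ["Paris", "paris ", "Lyon", "Lyon", None, "Lyon"],
    "score": ["12.5", "n/a", "15", "15", "9", "18"],
})

# Duplicates: drop rows that exactly repeat an earlier row, then renumber.
df = raw.drop_duplicates().reset_index(drop=True)

# Replace values: turn the "n/a" placeholder into a proper missing value.
df["score"] = df["score"].replace("n/a", np.nan)

# Convert to numeric; errors="coerce" also turns any other unparseable text into NaN.
df["id"] = pd.to_numeric(df["id"])
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# String manipulations: trim whitespace and normalise capitalisation.
df["city"] = df["city"].str.strip().str.title()
df["full_name"] = df["full_name"].str.title()

# Split one character variable into two new variables (at the first space only).
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

# Convert to a categorical variable (the pandas counterpart of an R factor).
df["city"] = df["city"].astype("category")

# Add a new variable derived from an existing one (NaN >= 10 evaluates to False).
df["passed"] = df["score"] >= 10

# Rename variables.
df = df.rename(columns={"score": "exam_score"})

# Missing values: count them per column, then drop rows without a score.
missing_per_column = df.isna().sum()
scored = df.dropna(subset=["exam_score"])

# Conditional selection: combine boolean masks with & (each wrapped in parentheses).
lyon_high = scored[(scored["city"] == "Lyon") & (scored["exam_score"] > 12)]

# Tabulate the frequency of a variable (missing values are excluded by default).
city_counts = df["city"].value_counts()

# Sorting: highest score first.
ranked = scored.sort_values("exam_score", ascending=False)

# Pipes: chain the steps, passing the frame through a hypothetical helper with .pipe().
def add_passed(frame, threshold):
    """Hypothetical helper: flag scores at or above `threshold`."""
    return frame.assign(passed=frame["exam_score"] >= threshold)

pipeline = (
    df.dropna(subset=["exam_score"])
      .pipe(add_passed, threshold=12)
      .sort_values("exam_score", ascending=False)
)
print(pipeline)
```

The order of the cleaning steps matters: duplicates are removed before the strings are normalised here, so rows that differ only in spacing or capitalisation (like "Paris" and "paris ") would survive `drop_duplicates`; normalising first would catch them.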