ECON2206-Data-Management-2022

ECON2306-Data-Management-2021-22

Course materials for Spring 2021 HEC Liège Course, “Data Management”

Outline of the class

The following table presents the structure of the class through the program and materials of the lecture and practice sessions. The Lecture column lists the content covered in theoretical presentations.
The Practice column lists the content covered in practical sessions. The last column lists the homework, which consists of either project milestones (ML) or problem sets (PS).

| Week | Lecture | Practice | Problem sets, project milestones |
|---|---|---|---|
| 10-Feb | Introduction (html, pdf) | Using git (html, pdf) | Installing Python using the guide; getting ready with git |
| 17-Feb | Introduction to Python (slides) | Python practice (notebook) | |
| 24-Feb | The importance of visualization principles (1h) (slides) | Pandas 1: introduction (2h) (slides) | PS1: simple pandas + graphs (due: 17/03) |
| 03 & 10-Mar | Webscraping theory (notebook) | Webscraping practice (notebook) | Start thinking about a project idea |
| 17-Mar | Statistical learning (html, pdf) | Kickstarting of course project | ML1: Having decided on the project idea |
| 24-Mar | Supervised learning (slides, pdf) | Practice ML (notebook) | PS2 on ML (due: 28/04) |
| 31-Mar | Unsupervised learning (html, pdf) | ‘Open house’ on webscraping of project | |
| 28-Apr | Natural Language Processing (slides) | NLP (notebook) | ML2: Webscraping done; starting with visualization |
| 05-May | ‘Open house’ on webscraping | Presentation of project ideas & scraping methodology | |
| 12-May | Visualization using dash (notebook) | ‘Open house’ on dash visualization | |
| 19-May | ML3: Students’ presentation of the applications | | |

A notebook with some guidelines for using Selenium can be found here

Course project

I would like you to choose a standard [research or policy or business] question that you can answer using data (e.g. Where are COVID-19 deaths located? Where are the cheapest beers in Liège? At what time do people go to work? When and where should you go sailing?).

You will develop a project abiding by the good practices taught in class. The project should incorporate:

Submission format: Invite @malkaguillot and @michel to collaborate on your GitHub repository by the due date (see here).

- This means we expect well version-controlled work.
- A GitHub repository with at least 5 commits (bonus if you have several development branches!).
- Tag your final submission using the following git command: `git tag -a 1.0 -m "submitted version"`.
- You must have a `README.md` in the main directory with:
  - instructions on how we can build the assignment & what it does;
  - a link to the deployed application.
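
Before inviting the instructors, it can help to check the submission requirements listed above from a script. The following is only a minimal sketch, not an official grading tool: the function names `git` and `check_submission` and the `git_runner` parameter are hypothetical, and it assumes that `git` is installed and that the script is pointed at a local clone of your repository. It does not check the contents of the README or the deployed application.

```python
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def check_submission(repo, min_commits=5, tag="1.0", git_runner=git):
    """Return a dict of pass/fail flags for the submission checklist.

    `git_runner` is a hypothetical hook: it defaults to the real `git`
    helper above and can be swapped out for experimentation.
    """
    repo = Path(repo)
    # Number of commits reachable from the current HEAD.
    commits = int(git_runner(repo, "rev-list", "--count", "HEAD"))
    # Tags matching the expected submission tag name.
    tags = git_runner(repo, "tag", "--list", tag).splitlines()
    return {
        "readme": (repo / "README.md").is_file(),
        "commits": commits >= min_commits,
        "tag": tag in tags,
    }
```

Calling `check_submission(".")` from the root of the repository returns one boolean per requirement, so a `False` entry points to what is still missing.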

Projects should be carried out in groups of 1 or 2 (one group of 3 is allowed if there is an odd number of remaining students).

Project milestones

We will discuss your proposed project several times so that we can evaluate whether it is feasible within the time frame. The project will be organized around the following milestones, each materialized by a presentation in class:

Problem sets

The problem sets are simple exercises designed to help you “get your hands in the data”. They will be available on Lola and should be handed in there. These problem sets will take the form of Jupyter notebooks, which will have to be converted to PDF before being handed in.
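
One possible way to do the notebook-to-PDF conversion is the `jupyter nbconvert` command-line tool, called here from Python. This is a sketch under assumptions: the helper names `nbconvert_command` and `convert_to_pdf` are hypothetical, `ps1.ipynb` is a placeholder file name, and the PDF export requires Jupyter, nbconvert, and a working LaTeX installation on your machine. Any other export route that produces a PDF is equally fine.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the nbconvert command line that exports `notebook` to PDF."""
    return ["jupyter", "nbconvert", "--to", "pdf", str(Path(notebook))]


def convert_to_pdf(notebook):
    """Run the export; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook), check=True)


# Example (placeholder file name, uncomment to run):
# convert_to_pdf("ps1.ipynb")
```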

Evaluation

| Detail | Grading decomposition |
|---|---|
| Problem sets (individual) | 15% (5% each) |
| Course project | 85% |
| Project management (reproducibility, github, readme) | 15% |
| Relevance of the project: Does the project respond to an interesting/important question? | 10% |
| Quality of the visualization: Choice of the graphical representations & colors | 20% |
| Technical dimension: Is the project using advanced tools/techniques? | 15% |
| Quality of the oral presentation | 25% |
| Final presentation | 15% |
| Project idea & scraping methodology | 5% |
| Visualization plan | 5% |
| Bonus (class participation) | 5% |
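
To see how the grading decomposition above adds up, here is a small worked sketch. The weights are taken from the table; everything else is an assumption made for illustration: the component names, the `final_grade` function, scores expressed as fractions between 0 and 1, and the choice to simply add the bonus on top without capping the total. The actual computation used by the instructors may differ.

```python
# Weights in percentage points, copied from the grading decomposition.
WEIGHTS = {
    "problem_sets": 15,
    "project_management": 15,
    "relevance": 10,
    "visualization": 20,
    "technical": 15,
    # The 25 points for the oral presentation split into three parts.
    "final_presentation": 15,
    "idea_and_scraping_presentation": 5,
    "visualization_plan_presentation": 5,
}
BONUS_WEIGHT = 5  # class participation

# The non-bonus components account for the full 100 points.
assert sum(WEIGHTS.values()) == 100


def final_grade(scores, participation=0.0):
    """Combine component scores (fractions in [0, 1]) into percentage points.

    Assumption: the bonus is added on top of the weighted sum and is not capped.
    """
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + BONUS_WEIGHT * participation
```

For instance, passing a score of `0.5` for every component and no participation bonus gives half of the 100 available points.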