DETAILS FOR THE 2021 EDITION WILL BE POSTED SHORTLY.
The Barcelona GSE Data Science Summer School introduces participants to some of the tools and methods of Data Science.
Program director
Fees and discounts
Fees vary by course based on the number of lecture hours and practical hours.
You may be eligible for one or more available Summer School discounts. Our staff can provide a personalized quote for you.
Computing for Data Science
Course Outline
This is an intensive, hands-on, 1-day course that provides participants with the computing skills necessary for information retrieval, data management, data analysis and Machine Learning. It is an 8-hour course with a number of small projects carried out during the sessions, and it revolves around the following sub-themes:
- Programming with Python
Keywords: data types, functions, iterators, objects, classes
- Data analysis with Python
Keywords: pandas, database management, groupby, merge
- Scraping data
Keywords: scraping, APIs, cloud storage
- Data visualization in Python
Keywords: matplotlib, seaborn
About the Instructors
The course is coordinated by Omiros Papaspiliopoulos and delivered jointly with Data Scientists from the Barcelona GSE Data Science Center.
Omiros Papaspiliopoulos
ICREA-UPF and Barcelona GSE
Foundations of Data Science
Course Outline
This is an intensive 20-hour course based on a hands-on approach using Jupyter notebooks. All material is motivated by specific information retrieval and data analysis questions, and each thematic unit concludes with a small project.
The course is organized into the following thematic units:
- Supervised learning (regression and classification)
Keywords: sklearn, linear models, cross validation, regularisation, lasso, trees, ensembles, boosting, nearest neighbour methods, class imbalance, multiclass predictive models, ordinal data
- Unsupervised learning
Keywords: factors, latent variables, independent component analysis, matrix factorization, embeddings, connections to neural networks (e.g. autoencoders), multidimensional scaling, clustering, K-means, spectral clustering and graph-based methods, latent semantic analysis and topic models
The course includes project sessions in which the methods and algorithms developed in the hands-on sessions are applied to concrete machine learning problems. Students solve these problems end to end, with the guidance of the instructor and the help of junior data scientists.
Prerequisites:
Computing for Data Science (or a proven record in the CV that the candidate has these skills)
About the Instructors
The course will be delivered by Omiros Papaspiliopoulos, ICREA Research Professor and Director of the Barcelona GSE Data Science Center; Joan Verdú, Head of Consulting and Knowledge Transfer at the Data Science Center; and Nandan Rao, Senior Data Scientist at the Data Science Center and PhD student at UAB. Junior Data Scientists affiliated with the Barcelona GSE Data Science Center will assist in the practical sessions.
Omiros Papaspiliopoulos
ICREA-UPF and Barcelona GSE
Nandan Rao
PhD Student, UAB
Joan Verdú
Barcelona GSE Data Science Center
Using Text as Data for Public Policy: Machine Learning meets Economics
Course overview
In this course students will learn how to use large corpuses of text to support better decision making. Students will learn how to:
- Treat text as data. At the end of the course students will be able to convert vast archives of text into a format that can be used for data analysis.
- Use the data generated from text to tackle policy problems in the specific environment of their own organization.
To achieve both of these goals the course will work with a combination of hands-on case studies and examples from economics. The case studies will serve to motivate and explain the toolkit. Students will, for example:
- Use Twitter data to analyze (political) trends like polarization in social networks and predict the stock market.
- Use newspaper text to understand how news stories emerge, track uncertainty, forecast economic activity and spot and predict instability around the world.
- Use UN Security Council resolutions to keep track of global risks and of the demands the Security Council makes of countries.
- Use text from patents to construct a measure of the novelty and long-term impact of inventions.
These case studies will be complemented by examples from established research projects in economics to illustrate how data can be used to answer policy challenges and move from data analysis to hypothesis testing and causal inference.
At the end of the course students will be able to read in large archives of text in all possible text formats, scrape text from the internet, and put this text into the right format for analysis. They will also know how to use standard econometric tools together with dictionaries, similarity measures like tf-idf, unsupervised machine learning like topic models, supervised machine learning, and word embeddings to develop and answer policy questions.
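As a rough illustration of the tf-idf similarity measures mentioned above, the following minimal sketch weights each term by its frequency within a document times the log of its inverse document frequency, then compares documents by cosine similarity. It is not course material: the whitespace tokenizer, the function names and the three-sentence corpus are all assumptions made for this example, and real projects would typically rely on an established library instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "central bank raises interest rates",
    "central bank cuts interest rates",
    "football team wins the final match",
]
vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this toy corpus the two monetary policy sentences share most of their terms and therefore score higher than the unrelated pair, which shares none.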
About the instructors
Hannes Mueller is a tenured researcher at the Institute for Economic Analysis (IAE-CSIC) and an Associate Research Professor at the Barcelona GSE. His fields of interest are Political Economy, Development Economics and Conflict Studies, with a particular focus on the effect of violent conflict on the economy. Most recently, Prof. Mueller has been working to adopt supervised and unsupervised machine learning techniques for economics and political science research. He has published in leading journals in Economics and Political Science such as the American Economic Review (AER), the American Political Science Review (APSR), the Journal of the European Economic Association (JEEA) and the American Economic Journal: Macroeconomics (AEJ: Macro). He has contributed reports for the International Growth Centre (UK government) and the World Bank on the economic effects of conflict, a joint UN/World Bank study on conflict prevention and the UN Economic Commission for Africa on structural change in Northern Africa. He is currently involved in projects with the Banco de España developing techniques for nowcasting and forecasting economic conditions with text.
Ruben Durante is ICREA Research Professor at UPF and Affiliated Professor of the Barcelona GSE. He works in the field of political economics, with a focus on the functioning and impact of traditional and new media in democratic societies. His work has been published in a number of top economic journals, including the Journal of Political Economy, the American Economic Journal: Applied Economics, and the Journal of the European Economic Association, and has been featured extensively in the popular press.
Nandan Rao is a product of the Barcelona GSE Master's Degree in Data Science (Class of 2017) and is currently pursuing his long-time dream of becoming an economist, working towards a PhD at Universitat Autònoma de Barcelona. Previously, he was Head of Engineering at machine-learning startup Relink Labs in Copenhagen, taught Fullstack Javascript programming at Codeworks, and put data science in production with research projects at the IOM, the OECD, and The World Bank.
Hannes Mueller
IAE-CSIC and Barcelona GSE
Ruben Durante
ICREA-UPF and Barcelona GSE
Nandan Rao
PhD Student, UAB
Deep Learning and Applications
Data Science Summer School Week 1 and 2 participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language.
However, Deep Learning and Applications in Week 3 is taught at a higher level. Participants must know the fundamentals of programming and data analysis with Python and fundamental concepts in statistical learning, that is, the type of material covered in Week 1 of the Data Science Summer School (Foundations of Data Science).
Candidates for this course who are not also taking the Week 1 course must upload a statement of purpose explaining their qualifications for the course when applying. The statement should include any relevant information about academic knowledge, personal skills and interests, and professional experience in the sector, if any. Candidates who have attended the Data Science Summer School in the past should mention this in the statement.
Course Outline
In this course we will introduce several aspects of modern machine learning, deep learning and its applications:
- An overall introduction to Deep Learning covering convolutional networks, recurrent neural networks, autoencoders and multilayer deep networks. The course includes a 2-hour tutorial on how to code these types of networks using the popular ‘keras’ library for Python.
- Deep Learning for Recommender Systems, where we deal with the application of advanced Machine Learning and Deep Learning methods in recommender systems. Here we address Tensor Factorization, Factorization Machines, 2vec-type embeddings, Deep Collaborative Filtering techniques such as Autoencoders for Collaborative Filtering, RNNs for session-based recommendations and convolutional networks for feature extraction. This session includes a hands-on part where these techniques are applied to real recommendation data sets using the keras Python library.
- Natural Language Processing with Deep Learning. This module focuses on several aspects of modern NLP, such as language modeling, word and document embeddings, conversational models and visualization, and on the use of Deep Learning models to perform these tasks. It includes a hands-on session where several of these tasks are coded with keras.
- The course will also include two data hackathons where the aim will be to use the knowledge acquired in the previous days of the summer school.
About the Instructors
Alexandros Karatzoglou is a Senior Researcher at Google Research. Before this he was the Scientific Director at Telefonica Research. His research focuses on Machine Learning. Alexandros received his PhD in Machine Learning from the Vienna University of Technology (TU Wien). During his PhD he was a frequent visitor to the Statistical Machine Learning group at the ANU/NICTA in Canberra, Australia. He has over 50 papers in the field and has won 3 best paper awards at the ACM RecSys and ECML PKDD conferences. He is also the author of the core machine learning R package kernlab, and enjoys giving lectures on Machine Learning, Recommender Systems and Computational Statistics.
Ilias Leontiadis is a Senior Researcher at Samsung AI. Before this he was a researcher at Telefonica Research and University of Cambridge. Ilias received his PhD degree from University College London (UCL). His research interests include mobile systems, pervasive computing, wireless networks, sensor networks, mobile phone privacy and mobility modeling.
Carlos Segura is a Research Scientist at Telefonica Research in Barcelona. He works on machine learning and artificial intelligence applied to multimedia signal processing, natural language processing and dialogue systems. He obtained his MSc and PhD (2011) in computer science at the Center for Language and Speech Technologies and Applications of the Universitat Politècnica de Catalunya. During his MSc, he developed his Master's thesis at TU-Berlin, where he continued working as a research assistant. In 2010 he joined the company Herta Security as the Director of Innovation under the Torres Quevedo program, where he worked on the research and development of biometric algorithms for speaker and face recognition. He has collaborated in several national and EU projects and technology evaluations, and has published many scientific papers in peer-reviewed international journals and international conferences.
Alexandros Karatzoglou
Senior Researcher, Google Research
Ilias Leontiadis
Senior Researcher, Samsung AI
Carlos Segura
Researcher, Telefónica
Laptop required for practical courses
Practical courses will be held in a lecture room, not in a computer lab. Participants must bring a laptop in order to follow these sessions.
Entry requirements
Applicants to all Summer School programs should meet the basic entry requirements. In addition, Data Science participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language (not necessarily R or Python).
Certificate of attendance
Participants will receive a Certificate of Attendance stating the courses and number of hours completed.
Fees
The price of each course includes all lecture hours and practical hours. Multiple course discounts are available. Fees for courses in other Summer School programs may vary.
Course fees will be posted shortly.
* Reduced Fee applies to PhD/Masters students, including Barcelona GSE students and alumni. See more information about available discounts or request a personalized discount quote by email.
Course schedule
The schedule is designed to allow students to participate in all courses in the Data Science program. Courses can also be taken individually or in combination with courses in other Barcelona GSE Summer School programs, schedule permitting.
Course schedule will be posted shortly.
Mix and match your summer courses!
Remember that you can combine Data Science courses with courses in other programs happening during Week 1, Week 2, and Week 3 (schedule permitting).