Data Analysis in Python

Academic year: 2019/2020
Language of instruction: English
Credits: 4
Status: Elective course
Delivered: Year 3, module 4

Course Syllabus

Abstract

The course introduces learners to data science through the Python programming language. This skills-based specialization is intended for learners who have a basic Python or programming background and want to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular Python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx to gain insight into their data.
Learning Objectives

  • Develop basic skills in data analysis with Python
  • Develop basic skills in visualisation with Python
Expected Learning Outcomes

  • Query DataFrame structures for cleaning and processing
  • Explain distributions, sampling, and t-tests
  • Describe common Python functionality and features used for data science
  • Understand techniques such as lambdas and manipulating csv files
  • Create a visualization using matplotlib
  • Identify the functions that are best for particular problems
  • Understand best practices for creating basic charts
  • Describe what makes a good or bad visualization
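The outcome on lambdas and CSV files can be illustrated with a minimal sketch that uses only the Python standard library. The sample data, the column names (`city`, `year`, `temp_c`), and the helper `load_rows` are hypothetical and do not come from the course materials.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """city,year,temp_c
Moscow,2019,5.8
Moscow,2020,7.1
Perm,2019,2.9
Perm,2020,4.2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"city": r["city"], "year": int(r["year"]), "temp_c": float(r["temp_c"])}
        for r in reader
    ]


rows = load_rows(SAMPLE_CSV)

# A lambda as a sort key: order rows from warmest to coldest.
warmest_first = sorted(rows, key=lambda r: r["temp_c"], reverse=True)

# A lambda as a filter predicate: keep only the 2020 rows.
rows_2020 = list(filter(lambda r: r["year"] == 2020, rows))

print(warmest_first[0]["city"], warmest_first[0]["temp_c"])
print([r["city"] for r in rows_2020])
```

With a real file, `io.StringIO(text)` would be replaced by a file object opened with `open(path, newline="")`.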
Course Contents

  • Common Python functionality and features
    An introduction to the field of data science and a review of the common Python functionality and features that data scientists use. You will also be introduced to the Coursera Jupyter Notebook used in the lectures. All course information on grading, prerequisites, and expectations is in the course syllabus, and more information about the Jupyter Notebooks is available on the Course Resources page.
  • Fundamentals of Pandas
    The fundamentals of pandas, one of the most important toolkits Python has for data cleaning and processing. You will learn how to read data into DataFrame structures, how to query these structures, and how such structures are indexed. The module ends with a programming assignment and a discussion question.
  • Dataframes in Pandas
    A deeper look at the pandas library: how to merge DataFrames, generate summary tables, group data into logical pieces, and manipulate dates. The module also refreshes your understanding of scales of data and discusses issues with creating metrics for analysis. The week ends with a more significant programming assignment.
  • Statistics in Python
    An introduction to a variety of statistical techniques such as distributions, sampling, and t-tests. The majority of the week is dedicated to your course project, where you will engage in a real-world data cleaning activity and provide evidence for (or against!) a given hypothesis. This project is suitable for a data science portfolio and will test your knowledge of cleaning, merging, manipulating, and testing for significance in data. The week ends with two discussions of science and the rise of the fourth paradigm: data-driven discovery.
  • Principles of Information Visualization
    An introduction to the principles of information visualization. You will be introduced to tools for thinking about design and to graphical heuristics for creating effective visualizations. All course information on grading, prerequisites, and expectations is in the course syllabus, which is included in this module.
  • Basic Charting
    Basic charting. For this week’s assignment, you will work with real-world CSV weather data. You will manipulate the data to display the minimum and maximum temperature for a range of dates and demonstrate that you know how to create a line graph using matplotlib. Additionally, you will demonstrate how to build composite charts by overlaying a scatter plot of record-breaking data for a given year.
  • Charting Fundamentals
    Further exploration of charting fundamentals. For this week’s assignment you will implement a new visualization technique based on academic research. The assignment is flexible and can be addressed at a variety of difficulty levels, from an easy static image to an interactive chart where users can set the ranges of values to be used.
  • Applied Visualizations
    Final assignment. In this module, everything starts to come together. Your final assignment is entitled “Becoming a Data Scientist.” It requires that you identify at least two publicly accessible datasets from the same region that are consistent across a meaningful dimension. You will state a research question that can be answered using these datasets and then create a visual using matplotlib that addresses your stated research question. You will then be asked to justify how your visual addresses your research question.
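The composite chart described under Basic Charting (line graphs of minimum and maximum temperature with a scatter overlay of record-breaking values) can be sketched roughly as follows. This is an illustration only, not the assignment solution: the inline data, the column names (`date`, `temp_c`), the choice of 2014 as baseline and 2015 as the comparison year, and the helper `daily_extremes` are all assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; the real assignment uses a CSV file provided by the course.
CSV_TEXT = """date,temp_c
2014-01-01,-3.0
2014-01-01,4.0
2014-01-02,-5.0
2014-01-02,2.0
2015-01-01,-6.5
2015-01-01,3.0
2015-01-02,-4.0
2015-01-02,5.5
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
df["year"] = df["date"].dt.year
df["day"] = df["date"].dt.strftime("%m-%d")


def daily_extremes(frame):
    """Return the per-day minimum and maximum temperature."""
    return frame.groupby("day")["temp_c"].agg(["min", "max"])


baseline = daily_extremes(df[df["year"] == 2014])  # assumed baseline year
target = daily_extremes(df[df["year"] == 2015])    # assumed comparison year

# Days on which the comparison year broke the baseline record.
record_low = target["min"][target["min"] < baseline["min"]]
record_high = target["max"][target["max"] > baseline["max"]]

fig, ax = plt.subplots()
ax.plot(baseline.index, baseline["min"], label="Baseline min")
ax.plot(baseline.index, baseline["max"], label="Baseline max")
ax.scatter(record_low.index, record_low.values, label="2015 record low")
ax.scatter(record_high.index, record_high.values, label="2015 record high")
ax.set_xlabel("Day (MM-DD)")
ax.set_ylabel("Temperature (°C)")
ax.legend()

# Write the figure to memory; pass a file name instead to save it to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

The comparison between `target` and `baseline` relies on both tables covering the same set of days. With real data that has gaps, the two tables would need to be aligned first, for example by reindexing one on the other.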
Assessment Elements

  • non-blocking Online tests on "Introduction to Data Science in Python"
    Average grade across the completed online tests (6 tests)
  • blocking Written exam
    The course Data Analysis in Python is delivered as a MOOC on the Coursera platform and consists of two courses: (1) "Introduction to Data Science in Python" https://www.coursera.org/learn/python-data-analysis/home/welcome; (2) "Applied Plotting, Charting & Data Representation in Python" https://www.coursera.org/learn/python-plotting/home/welcome. The deadline for completing the course is 16 June 2020. The exam for the course is the individual final project of "Applied Plotting, Charting & Data Representation in Python" (https://www.coursera.org/learn/python-plotting/home/welcome), which assesses mastery of the topics of both "Introduction to Data Science in Python" and "Applied Plotting, Charting & Data Representation in Python". Students may use the course materials and additional materials when working on the exam assignment and must complete the final project independently.
  • non-blocking Online tests on "Applied Plotting, Charting & Data Representation in Python"
    Average grade across the completed online tests (6 tests)
Interim Assessment

  • Interim assessment (module 4)
    0.5 * Online tests on "Applied Plotting, Charting & Data Representation in Python" + 0.5 * Online tests on "Introduction to Data Science in Python"
Bibliography

Recommended Core Bibliography

  • Mastitsky, S. E. (2017). Vizualizatsiya dannykh s pomoshch'yu ggplot2 [Data Visualization with ggplot2]. DMK Press. ISBN 978-5-97060-470-0. Electronic text, Lan electronic library system. Retrieved from https://e.lanbook.com/book/107895
  • Nelli, F. (2015). Python Data Analytics : Data Analysis and Science Using Pandas, Matplotlib and the Python Programming Language. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1056488
  • Nelli, F. (2018). Python Data Analytics : With Pandas, NumPy, and Matplotlib (Vol. Second edition). New York, NY: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1905344
  • Demidova, O. A., & Malakhov, D. I. (2019). Ekonometrika. Uchebnik i praktikum dlya prikladnogo bakalavriata [Econometrics: Textbook and Workbook for Applied Bachelor's Programs]. Moscow: Yurait. 334 pp. Series: Bachelor. Applied Course. ISBN 978-5-534-00625-4. Electronic text, Yurait electronic library system. Retrieved from https://biblio-online.ru/book/ekonometrika-432950

Recommended Additional Bibliography

  • Mirkin, B. Core concepts in data analysis: summarization, correlation and visualization. – Springer Science & Business Media, 2011. – 388 pp.