Statistical Data Science

PSTAT 134/234: Statistical Data Science

Instructor: Sang-Yun Oh


Catalog description: Overview and use of data science tools in Python for data retrieval, analysis, visualization, reproducible research, and automated report generation. This new course will focus on concepts that are relevant for data science, using some of the popular software tools in this area. Doing data science is more than using isolated methods: the course emphasizes creatively combining a collection of concepts and domain knowledge to clean, transform, analyze, and present data. Concepts in data ethics and privacy will also be discussed. Case studies will illustrate practical use of these tools in real usage scenarios.

Programming experience: This course is designed for students who have a solid conceptual understanding of programming primitives (e.g., flow control, functions, arrays, data types) and are comfortable in at least one programming or scripting language (C/C++, R, Python, etc.).

Software tools: Many software tools are used for data science. Tools we will use for this course include (but are not limited to)

Learning by doing will require searching/reading software documentation, experimenting by trial-and-error, and debugging.


INT 15: Data Science Principles & Techniques (Spring 2019)

Students who do not need to take PSTAT 134 or 234 to fulfill degree requirements are encouraged to consider a similar course offered as INT 15. The course will be co-taught by CS and PSTAT faculty members. More information can be found here: https://ucsb-int15.github.io/