Why Pace 2.O for Data Science Masters Program





Data Science Masters Program

1. Introduction to Data Science

  • Introduction to Big Data, State of the practice in analytics
  • Current Analytical Architecture
  • Drivers of Big Data, Emerging Big Data Ecosystem
  • Big Data Analytics Project Life Cycle: Overview, Phase 1- Discovery, Phase 2- Data preparation, Phase 3-Model Planning, Phase 4- Model Building, Phase 5- Communicate Results, Phase 6- Operationalize.
  • Introduction to Machine Learning


2. Python Programming Language Concepts

  • Programming Language: Python
  • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), PyCharm, Tableau, Sublime Text
  • Library/Package Usage: Datetime, Statsmodels, NumPy, Pandas, Seaborn, Matplotlib

  • Module 1: Introduction to Python, What is Python and history of Python?, Unique features of Python, Python-2 and Python-3 differences, Install Python and Environment Setup, First Python Program, Python Identifiers, Keywords and Indentation, Comments and document interlude in Python, Command line arguments, Getting User Input, Python Data Types, What are variables?, Python Core objects and Functions, Number and Maths.
  • Module 2: Lists, Ranges & Tuples in Python, Introduction, Lists in Python, More About Lists, Understanding Iterators, Generators, Comprehensions and Lambda Expressions, Introduction, Generators and Yield, Next and Ranges, Understanding and using Ranges, More About Ranges, Ordered Sets with tuples
  • Module 3: Python Dictionaries and Sets, Introduction to the section, Dictionaries, More on Dictionaries, Python Sets, Python Sets Examples
  • Module 4: Input and Output in Python, Reading and Writing Text Files, Reading and Writing Challenge, Writing Binary Files Manually, Appending to Files and Using Pickle to Write Binary Files
  • Module 5: Python built-in functions, Python user-defined functions, Python package functions, Defining and calling Functions, Anonymous Functions, Loops and Statements in Python, Python Modules & Packages
  • Module 6: Python Regular Expressions: What are regular expressions?, The match Function, The search Function, Matching vs searching, Search and Replace, Extended Regular Expressions, Wildcard
  • Module 7: Python For Data Analysis NumPy: Introduction to NumPy, Creating arrays, Using arrays and Scalars, Indexing Arrays, Array Transposition, Universal Array Function, Array Processing, Array Input and Output
  • Module 8: Python For Data Analysis Pandas: What is pandas?, Where is it used?, Series in pandas, Index objects, Reindex, Drop Entry, Selecting Entries, Data Alignment, Rank and Sort, Summary Statistics, Missing Data, Index Hierarchy, Matplotlib: Python For Data Visualization.
  • Module 9: Using Databases in Python, Python MySQL Database Access, Install MySQLdb and other Packages, Create Database Connection, CREATE, INSERT, READ, UPDATE and DELETE Operation, DML and DDL Operation with Databases, Handling Database Errors, Web Scraping in Python.
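
As an illustration of the "Matching vs searching" and "Search and Replace" topics in Module 6, here is a minimal sketch using Python's standard `re` module. The sample string is made up for the example and is not course material.

```python
import re

text = "Order 66 shipped on 2024-01-15"  # hypothetical sample string

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"\d+", text)

# re.search scans the whole string and returns the first match anywhere.
anywhere = re.search(r"\d+", text)

# re.sub performs search-and-replace on every match.
masked = re.sub(r"\d", "#", text)

print(at_start)          # None, because the text starts with "Order"
print(anywhere.group())  # 66
print(masked)            # Order ## shipped on ####-##-##
```

The difference matters in practice: `re.match` is the right tool for validating a prefix, while `re.search` is the one for finding a pattern anywhere in a line.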

3. Database Programming with SQL

Database Technology – Oracle SQL, MySQL
  • Introduction
    • Data vs. Information
    • History of the Database
    • Major Transformations in Computing

  • Entities and Attributes
    • Conceptual and Physical Models
    • Entities, Instances, Attributes, and Identifiers
    • Entity Relationship Modeling and ERDs

  • Relationship Fundamentals
    • Relationship Transferability
    • Relationship Types
    • Resolving Many-to-Many Relationships
    • Understanding CRUD Requirements
    • Anatomy of a SQL Statement

  • SELECT and WHERE
    • Columns, Characters, and Rows
    • Limit Rows Selected
    • Comparison Operators

  • WHERE, ORDER BY, GROUP BY, HAVING and Intro to Functions
    • Logical Comparisons and Precedence Rules
    • Sorting Rows
    • Introduction to Functions

  • Single Row Functions
    • Character Functions
    • Number Functions
    • Date Functions
    • Conversion Functions
    • General Functions

  • Joins
    • Cross Joins and Natural Joins
    • Join Clauses
    • Inner versus Outer Joins

  • Data Manipulation Language (DML)
    • INSERT Statements
    • Updating Column Values and Deleting Rows

  • Data Definition Language (DDL)
    • Creating Tables
    • Using Data Types
    • Modifying a Table

  • Constraints
    • Intro to Constraints; NOT NULL and UNIQUE Constraints
    • PRIMARY KEY, FOREIGN KEY, and CHECK Constraints
    • Managing Constraints

  • Views
    • Creating Views
    • DML Operations and Views
    • Managing Views
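
The DDL, DML, constraint and join topics above can be tried out without installing a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Oracle SQL or MySQL (an assumption made for portability; SQL dialects differ in details). The table names and rows are hypothetical.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: two tables with PRIMARY KEY, NOT NULL, UNIQUE, CHECK and a foreign key.
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
cur.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        salary REAL CHECK (salary > 0),
        dept_id INTEGER REFERENCES departments(id)
    )
""")

# DML: insert rows with parameterized statements, then update one value.
cur.executemany(
    "INSERT INTO departments (id, name) VALUES (?, ?)",
    [(1, "Analytics"), (2, "Engineering")],
)
cur.executemany(
    "INSERT INTO employees (name, salary, dept_id) VALUES (?, ?, ?)",
    [("Asha", 90000, 1), ("Ben", 70000, 1), ("Chen", 80000, 2), ("Dana", 60000, None)],
)
cur.execute("UPDATE employees SET salary = 75000 WHERE name = 'Ben'")

# Inner join with GROUP BY, HAVING and ORDER BY.
avg_by_dept = cur.execute("""
    SELECT d.name, AVG(e.salary)
    FROM employees e JOIN departments d ON e.dept_id = d.id
    GROUP BY d.name
    HAVING COUNT(*) >= 1
    ORDER BY d.name
""").fetchall()

# Left outer join keeps employees that have no department.
outer_rows = cur.execute("""
    SELECT e.name, d.name
    FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.name
""").fetchall()

conn.commit()
conn.close()

print(avg_by_dept)  # [('Analytics', 82500.0), ('Engineering', 80000.0)]
print(outer_rows)   # the last row is ('Dana', None)
```

The inner join drops Dana because she has no matching department, while the left outer join keeps her with a NULL department, which is the core of the "Inner versus Outer Joins" topic.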

4. Statistics for Data Science

  • Programming Language: Python, R
  • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
  • Library/Package Usage: SciPy.stats, Statistics, NumPy, Statsmodels, Seaborn
    • Datatypes and its measures
    • Random Variables and its applications
    • Introduction to Probability with examples
    • Sampling Techniques – Why and How
    • Measures of Central Tendency- Mean, Median, Mode
    • Measures of Dispersion- Variance, Standard Deviation, Range
    • Measures of Skewness & Kurtosis
    • Normality tests for dataset
    • Basic Graph Representations- Bar Chart, Histogram, Box Plot, Scatterplot
    • Probability Distributions
      • Continuous Probability Distribution
        • Normal Distribution
        • Standard Normal Distribution (Z Distribution)
        • F-Distribution
        • Chi-Square Distribution
      • Discrete Probability Distribution
        • Binomial Distribution
        • Poisson Distribution
    • Building Normal Q-Q Plot and its Interpretation
    • Central Limit Theorem for sampling variations
    • Confidence Interval – Computation and analysis
    • Data Cleansing (Dealing with Missing Data, Outlier Detection)
    • Feature Engineering (Label Encoding, One-Hot Encoding)
    • Data Transformation, including merging, ordering, aggregation
    • Sampling (Balanced, Stratified, ...)
    • Data Partitioning (Create Training + Validation + Test Data Set)
    • Transformations (Normalization, Standardization, Scaling, Pivoting)
    • Binning (Count-Based, Handling Of Missing Values as its own Group)
    • Data Replacement (Cutting, Splitting, Merging, ...)
    • Weighting And Selection (Attribute Weighting, Automatic Optimization)
    • Imputation (Replacement of Missing Observations with Statistical Algorithms)
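
To make the measures of central tendency and dispersion listed above concrete, here is a small sketch using only Python's standard `statistics` module. The data values are invented for the example, and the two rescaling steps are one common way to do normalization and standardization, not the only one.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Measures of central tendency.
mean = statistics.mean(data)      # 5
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4

# Measures of dispersion.
data_range = max(data) - min(data)      # 7
pop_var = statistics.pvariance(data)    # 4 (population variance)
pop_sd = statistics.pstdev(data)        # 2.0 (population standard deviation)
sample_var = statistics.variance(data)  # sample variance, divides by n - 1

# Min-max normalization rescales values to the interval [0, 1].
min_max = [(x - min(data)) / data_range for x in data]

# Standardization (z-scores) rescales to mean 0 and standard deviation 1.
z_scores = [(x - mean) / pop_sd for x in data]

print(mean, median, mode, data_range, pop_var, pop_sd)
print(z_scores[0])  # -1.5
```

Note the distinction between `pvariance` (divide by n) and `variance` (divide by n - 1): the second is the usual choice when the data is a sample rather than the whole population.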

5. Research Methodology for Data Science

  • Programming Language: Python, R
  • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
  • Library/Package Usage: SciPy.stats, Statistics, NumPy
    • Formulating a hypothesis statement (NULL and ALTERNATE)
    • Type-I and Type-II Errors, P-Value, Level of Significance
    • Parametric Tests:
      • One Sample/Two Samples T Test
      • One Sample Z Test
      • Paired T Test
      • One-Way ANOVA
      • Chi-Squared Test
    • Non-Parametric Tests:
      • One Sample Sign Test
      • Mann-Whitney Test
      • Kruskal-Wallis Test
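
As a worked illustration of the hypothesis-testing vocabulary above (null hypothesis, p-value, level of significance), the sketch below runs a two-sided one-sample Z test using only the standard library. The function name `one_sample_z_test`, the sample values, and the assumed known population standard deviation are all hypothetical choices for the example.

```python
import math
import statistics

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z test, assuming a known population std dev sigma."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value

# H0: population mean is 50. H1: population mean is not 50.
sample = [52, 48, 55, 51, 53, 49, 54, 50, 56]  # hypothetical measurements
alpha = 0.05                                   # level of significance

z, p = one_sample_z_test(sample, mu0=50, sigma=3)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
print("reject H0" if p < alpha else "fail to reject H0")
```

Rejecting H0 when it is actually true is a Type-I error, and `alpha` is the probability of that error the analyst is willing to accept. In practice the T tests, ANOVA and non-parametric tests listed above are usually run with SciPy's `scipy.stats` functions rather than written by hand.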
6. Linear Algebra for Data Science

    • Programming Language: Python, R
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
    • Library/Package Usage: SciPy.stats, Statistics, NumPy
      • Motivation – Why to learn Linear Algebra?
      • Representation of problems in Linear Algebra
        • Visualizing the problem: Line
        • System of linear equations
        • Planes
      • Matrix
        • Terms related to Matrix
        • Basic operations on Matrix
        • Representing in Matrix form
      • Solving the problem
        • Row Echelon form
        • Inverse of a Matrix
      • Eigenvalues and Eigenvectors
        • Finding Eigenvectors
        • Use of Eigenvectors in Data Science: PCA algorithm
      • Singular Value Decomposition of a Matrix
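
The "System of linear equations" and "Row Echelon form" topics can be sketched in plain Python. The function below reduces the augmented matrix to reduced row echelon form (Gauss-Jordan elimination) with exact fractions. The name `solve_linear_system` and the two-equation example are assumptions made for illustration; NumPy's `numpy.linalg.solve` would be the usual tool on real data.

```python
from fractions import Fraction

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination (square, non-singular a)."""
    n = len(a)
    # Build the augmented matrix [a | b] with exact rational arithmetic.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    # The last column now holds the solution.
    return [row[n] for row in m]

# 2x + y = 5 and x + 3y = 10, which has the solution x = 1, y = 3.
solution = solve_linear_system([[2, 1], [1, 3]], [5, 10])
print([int(v) for v in solution])  # [1, 3]
```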

7. Supervised Machine Learning – Part I [Regression Analysis]

    • Programming Language: Python, R
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
    • Library/Package Usage: scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, SciPy
      • Correlation Analysis, Correlation Coefficient
      • Introduction of Regression, Principles of regression
      • Simple Linear Regression Analysis
      • Splitting of Dataset into Train, Validation and Test data
      • Understanding Overfitting (Variance) vs Underfitting (Bias)
      • Generalization Error and Regularization Techniques
      • Multiple Linear Regression Model
      • Model Adequacy Checking
      • Transformation and Weighting to Correct Model Inadequacies
      • Diagnostic for Leverage and Influence
      • Generalized and Weighted Least Squares Estimation
      • Indicator Variables
      • Multicollinearity, Heteroskedasticity, Autocorrelation
      • Polynomial Regression Models
      • Poisson Regression Models
      • Variable Selection and Model Building
      • Case Studies I, II & III
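
For the "Simple Linear Regression Analysis" topic, the ordinary least squares estimates can be computed directly from their textbook formulas. The sketch below is a plain-Python illustration with made-up data and hypothetical function names; the course libraries (scikit-learn, Statsmodels) provide the same fit with more diagnostics.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of variance explained by the line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]                # hypothetical predictor values
ys = [2.1, 4.1, 6.2, 7.9, 10.2]     # hypothetical responses

slope, intercept = fit_simple_linear_regression(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r2:.4f}")
```

With this data the fitted line is approximately y = 0.10 + 2.00 x. An R² computed on the training data alone says nothing about overfitting, which is why the syllabus pairs regression with train, validation and test splits.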

8. Supervised Machine Learning – Part II [Classification Analysis]

    • Programming Language: Python, R
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
    • Library/Package Usage: scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, SciPy
      • Two-Class Classification
        • Logistic Regression
        • Neural Network
        • Decision Tree
        • Random Forest
        • Naïve-Bayes
        • Support Vector Machine (SVM)
      • Multiclass Classification (MC)
        • MC-Logistic Regression
        • MC-Neural Network
        • MC-Decision Forest
        • K-Nearest Neighbor (KNN)
      • Anomaly Detection
      • Case Studies I, II, III, IV & V

9. Unsupervised Machine Learning – [Clustering Analysis & Association Rules]

    • Programming Language: Python, R
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), RStudio, PyCharm, Tableau, Sublime Text
    • Library/Package Usage: scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, SciPy
      • Partitioning Clustering
      • Hierarchical Clustering
      • Clustering Validation and Evaluation with K-Means Clustering
        • Assessing clustering tendency
        • Determining the optimal number of clusters
        • Clustering validation statistics
      • DBSCAN: Density-based Clustering
      • Dimensionality Reduction with Principal Component Analysis (PCA)
      • Association Rule Learning and Recommendation
        • Apriori Algorithm
        • Frequent Pattern Growth
      • Case Studies I, II & III

10. Model Performance Assessment in Machine Learning

    • Programming Language: Python
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), PyCharm, Tableau, Sublime Text
    • Library/Package Usage: scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, SciPy
    • Model Performance Assessment
      • Confusion Matrix
      • Precision
      • Recall
      • F1-Measure
      • Accuracy
      • Error Measures
      • Mean Squared Error (MSE)
      • Root Mean Squared Error (RMSE)
    • Hyper Parameter Optimization/Tuning
      • Grid Search
      • Randomized Search
    • Cross-Fold Validation Techniques
      • Leave One Out
      • K-fold
      • Stratified K-fold
      • Stratified Shuffle Split
    • Ensemble Methods in Machine Learning
      • Bagging and Random Forests
      • Gradient Boosting
      • Optimized Distributed Gradient Boosting (XGBoost)
      • Adaptive Boost (AdaBoost)
      • Voting Classifier
    • Case Studies I, II, & III
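
The classification and error measures listed under "Model Performance Assessment" all follow from a few counts. The sketch below computes them in plain Python on invented labels and predictions; the helper name `confusion_counts` is hypothetical, and scikit-learn's `sklearn.metrics` module offers equivalent functions.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

# Hypothetical true labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)                  # of predicted positives, how many are right
recall = tp / (tp + fn)                     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true)

# Error measures for a regression model (hypothetical targets and predictions).
targets = [3.0, 5.0, 7.0]
predictions = [2.0, 5.0, 9.0]
mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
rmse = math.sqrt(mse)

print(tp, fp, fn, tn)  # 4 1 1 4
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
print(f"mse={mse:.3f} rmse={rmse:.3f}")
```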

11. Data Visualization Tools & Techniques

    • Programming Language: Python
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), PyCharm, Tableau, Sublime Text
    • Library/Package Usage: Matplotlib, Seaborn
    • Data Visualization with Matplotlib, Seaborn, Tableau
    • Relational Plots
      • Relplot
      • Scatterplot
      • Lineplot
    • Categorical Plots
      • Catplot
      • Stripplot
      • Swarmplot
      • Boxplot
      • Violinplot
      • Barplot
      • Countplot
    • Distribution Plots
      • Jointplot
      • Pairplot
      • Distplot
    • Regression Plots
      • Lmplot
      • Regplot
      • Residplot
    • Matrix Plots
      • Heatmap
      • Clustermap
    • Tableau Products and Usage
      • Basic Charts on Tableau
      • Connecting Tableau with Multiple Sheets and Data Sources
      • Tableau Filters and Visualization Interactivity
      • Interaction and Grouping Data
      • Time Series Chart
      • Maps and Images in Tableau
      • Advanced Charts in Tableau and Analytical Techniques
      • Calculations on Tableau
      • Tableau Integration with Other Tools

12. Natural Language Processing with Machine Learning

    • Programming Language: Python
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), PyCharm, Tableau, Sublime Text
    • Library/Package Usage: NLTK, Word2Vec, NumPy, Seaborn, SciPy
    • Natural Language Processing - What is it used for?
    • NLTK Exploration
      • Word Tokenization, Different Types of Tokenizers
      • Bigrams, Trigrams & N-grams
      • Stemming & Lemmatization
      • Stopwords Removal
      • Part of Speech (POS) Tagging
      • Named Entity Recognition
    • Bag of Words
    • TF-IDF Vectorizer
    • Co-occurrence matrix
    • Text Similarity/Clustering
    • Latent Semantic Analysis (LSA)
    • Topic Modeling
    • Latent Dirichlet Allocation (LDA)
    • Text Classification - Sentiment Analysis
    • Recommender Systems - Collaborative Filtering
    • Case Studies I & II
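
To show what "Bag of Words" and "TF-IDF" mean in practice, here is a plain-Python sketch on three invented sentences. It uses whitespace tokenization and the unsmoothed weighting tf × log(N / df), which is one common textbook variant; library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

docs = [  # hypothetical mini-corpus
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Tokenize by lowercasing and splitting on whitespace.
tokenized = [doc.lower().split() for doc in docs]

# Bag of words: term counts per document, ignoring word order.
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def tf_idf(bag, n_docs):
    """Weight each term by term frequency times inverse document frequency."""
    total = sum(bag.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in bag.items()
    }

weights = [tf_idf(bag, len(docs)) for bag in bags]

print(bags[0]["the"])  # 2
print(weights[0]["cat"] > weights[0]["the"])  # True
```

Although "the" occurs twice in the first sentence and "cat" only once, "cat" gets the higher TF-IDF weight because it appears in fewer documents.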

13. Time Series Data Analysis and Forecasting

    • Programming Language: Python
    • Tools Usage: REPL Online, Anaconda (Jupyter Notebook / Spyder), PyCharm, Tableau, Sublime Text
    • Library/Package Usage: Datetime, Statsmodels, NumPy, Pandas, Seaborn, Matplotlib
    • Introduction to Time Series Data
    • Correlation And Autocorrelation
    • Components of Time Series
    • Visualization Principles - Scatter Plot, Time Plot and Lag Plot
    • Auto-Correlation Function (ACF)/ Correlogram
    • Naive Forecast Methods
    • Errors in Forecast and Its Metrics
    • Model Based Approaches
      • Linear Model, Exponential Model, Quadratic Model
    • Auto Regression (AR), Moving Average (MA)
    • Autoregressive Moving Average (ARMA)
    • Autoregressive Integrated Moving Average (ARIMA)
    • Additive Seasonality
    • Multiplicative Seasonality
    • Random Walk
    • Smoothing Techniques
      • Moving Average, Exponential Smoothing
    • De-Seasoning and De-Trending
    • Case Studies I & II
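
The smoothing techniques and the naive forecast named above are short enough to write out by hand. The sketch below is a plain-Python illustration on an invented series; the function names and the choice of window and smoothing factor are assumptions for the example, and Pandas or Statsmodels would normally be used on real data.

```python
def moving_average(series, window):
    """Simple moving average over a trailing window of fixed size."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 14, 16, 18, 20]  # hypothetical observations with a linear trend

ma = moving_average(series, window=3)
es = exponential_smoothing(series, alpha=0.5)
naive_forecast = series[-1]  # naive method: next value equals the last observed value

print(ma)              # [12.0, 14.0, 16.0, 18.0]
print(es)              # [10, 11.0, 12.5, 14.25, 16.125, 18.0625]
print(naive_forecast)  # 20
```

Both smoothers lag behind the upward trend in this series, which is one reason the syllabus also covers de-trending and trend-aware models such as ARIMA.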