SequelGate Training Institute

Data Science Training

 

SequelGate Technologies offers live online Data Science training led by experienced professionals. Our instructors are working professionals in Big Data and Data Analytics with proven expertise. All training sessions are live and hands-on. The syllabus is designed to match real-world requirements, from beginner to advanced level. Register Today

 

This course includes: Hadoop, RStudio, Python, Scala, Machine Learning, Tableau, Excel, and more.

 

REGULAR SCHEDULES: Classroom Training

Schedule | Free Demo | Start Date
6:00 AM | June 8th | June 8th
11:30 AM | June 8th | June 12th
 

WEEKEND SCHEDULES: Classroom Training

Schedule | Free Demo | Start Date
7 AM | June 8th | June 8th

Trainer: Mr. Srinivas (18+ years of experience)

Duration: 16 Weeks (Mon - Fri)

Total Course Fee: INR 55,000/-

 

Highlights

  • ✔ Completely Practical and Realtime
  • ✔ Theory Material provided in Advance
  • ✔ Highly Interactive and Interesting
  • ✔ Certification Guidance and FAQs
  • ✔ 80% Hands-on Training, 20% Theory Explanation

Pre-requisites for this course: NONE. Anyone can join!

 

Data Science Training Course Content


 

 

Module -1

HADOOP - Framework for Big Data

HDFS

  • What is Big Data?
  • Challenges in processing big data
  • What technologies support big data?
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Use Cases of Hadoop
  • Hadoop Ecosystem

Understanding the Cluster

  • Typical Workflow
  • Writing files to HDFS
  • Reading files from HDFS
  • Rack Awareness, 5 daemons

MapReduce:

  • Before MapReduce
  • MapReduce Overview
  • Job Tracker
  • Task Tracker, Job Scheduling
  • Mapper and Reducer code
  • Configuring development environment – Eclipse

How MapReduce Works:

  • Anatomy of a MapReduce Job Run
  • Job Submission, Job Initialization
  • Task Assignment
  • Job Completion, Job Scheduling
  • Job Failures, Shuffle and sort
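
As a flavour of the map, shuffle-and-sort, and reduce phases listed above, here is a minimal single-machine sketch in plain Python. It is an illustration only, not Hadoop code and not the course's own material: the function names (mapper, shuffle_and_sort, reducer, word_count) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle_and_sort(mapped))


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data big cluster", "data node"])
print(counts)
```

On a real cluster the same three roles are spread across many machines, which is where job scheduling and failure handling come in.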

PIG

  • Pig Basics
  • Install and Configure Pig on a Cluster
  • Pig vs. MapReduce and SQL
  • Pig vs. Hive
  • Pig Latin
  • Primitive Data Types and Complex Data Types
  • Types of Modes
    • Interactive mode
    • Script mode
    • Embedded mode
  • Modes of Running Pig
  • Running in Grunt Shell
  • Programming in Eclipse
  • Loading and Storing Datasets, Filters, Groups, Co-Groups, Foreach, Nested Foreach, Parallel, Distinct, Limit, Sample, Different Types of Joins
  • Debugging Commands (ILLUSTRATE and EXPLAIN)
  • Processing XML Files using Piggybank
  • Processing Log Files using Regex
  • Pig Date Functions
  • Pig UDFs, UDAFs
  • Working with Predefined Functions and User-Defined Functions
  • Pig Macros
  • Loading and Writing JSON Data using Pig
  • Accessing HBase using Pig

HIVE

  • Hive Introduction
  • Hive Architecture
  • Different Modes to Access Hive
    • Command Line Interface
    • Web Interface (HWI)
    • Thrift Interface
  • Hive Metastore
  • HiveQL
  • Primitive Data Types and Complex Data Types
  • Working with Partitions
  • Hive Bucketed Tables and Sampling
  • External Tables
  • Nested Queries
  • Multiple Inserts
  • Dynamic Partitions
  • Different Types of Joins
  • ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY
  • INDEXES, VIEWS
  • Compression on Hive Tables and Migrating Hive Tables
  • Hive SerDes
  • Processing XML Files using Regex
  • Processing Log Files using Regex
  • Accessing HBase Tables using Hive
  • Hive UDF
  • Hive UDAF
  • Hive UDTF

HBase

  • HBase Introduction
  • HBase Data Model and Comparison between RDBMS and NoSQL
  • HBase Architecture
  • HMaster, HRegionServer, ZooKeeper, HRegion
  • File Storage Architecture
  • HFiles Compaction, Decompaction, Region Splits
  • HBase Operations (DDL and DML) through the Shell
  • HBase Installation
  • Internal ZooKeeper, External ZooKeeper
  • HBase Counters
  • HBase Filters
  • HBase Use Cases
  • Install and Configure HBase on a Multi-Node Cluster
  • Create Database, Develop and Run Sample Applications
  • Access Data Stored in HBase using Clients like Java, Python
  • MapReduce Client to Access the HBase Data
  • HBase and Hive Integration
  • HBase Admin Tasks

Cassandra

  • Introduction
  • Installation
  • Creation of Database

 

Module -2

Scala - Language for Data Science & Big Data

Scala Introduction & Environment Setup:

  • Scala is object-oriented, Scala is functional, Scala runs on the JVM
  • Installing Scala

Scala Basic Syntax

  • First Scala Program
  • Interactive Mode Programming
  • Script Mode Programming

Scala Data Types:

  • Literals
  • Strings
  • Escape Sequences

Scala Variables:

  • Declaration
  • Data Types
  • Type Inference
  • Multiple assignments
  • Variable Types

Scala Operators:

  • Arithmetic
  • Relational
  • Logical
  • Operator Precedence in Scala

Scala Conditions

 

Scala Loops

 

Scala Strings:

 

Scala Regular Expressions:

  • Forming regular expressions
  • Matching Literals and Constants
  • Matching Tuples and Lists
  • Matching with Types and Guards
  • Pattern Variables and Constants in case Expressions
  • Regular-expression Examples
  • Pattern matching with Extractors

Scala Functions:

  • Declarations
  • Definitions
  • Calling 
  • Function Literals
  • Anonymous
  • Currying

Scala Arrays

  • Declaring
  • Processing
  • Multi-Dimensional
  • Create Array with Range
  • Scala Arrays Methods

Scala Collections

  • Basic Operations on List
  • Concatenating Lists
  • Creating Uniform Lists
  • Tabulating a Function
  • Scala List Methods
  • Concatenating Sets, Find max, min elements in Set
  • Find common values in Sets
  • Scala Set Methods
  • Basic Operations on Map
  • Check for a Key in Map

Scala Classes & Objects:

  • OOP Basics
  • Defining Fields, Methods, Constructors

Module -3

Spark - Framework for Data Science & Big Data Analytics

  • Introduction to Apache Spark:
  • What is Spark?
  • Spark Ecosystem & Modes of Spark
  • Overview of Spark on a Cluster
  • Spark Standalone Cluster
  • Spark Web UI & Spark Common Operations

Spark Core

  • Performing Basic Operations on Files in Spark Shell and Overview of SBT
  • Building a Spark Project with SBT
  • Running a Spark Project with SBT
  • Playing with RDDs:
  • RDDs, Transformations in RDD, Actions in RDD
  • Loading Data in RDD
  • Saving Data through RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration - YARN

Spark SQL

  • Spark SQL and Performance Tuning in Spark:
  • Analyzing Hive and Spark SQL Architecture, SQLContext in Spark SQL
  • Working with DataFrames
  • Implementing an Example for Spark SQL
  • Integrating Hive and Spark SQL
  • Support for JSON and Parquet File Formats
  • Implementing Data Visualization in Spark
  • Loading of Data
  • Hive Queries through Spark
  • Performance Tuning Tips in Spark

Spark Streaming

  • A Simple Example
  • Architecture and Abstraction
  • Transformations
  • Stateless Transformations
  • Stateful Transformations
  • Output Operations
  • Input Sources
  • Additional Sources
  • Multiple Sources and Cluster Sizing
  • Worker Fault Tolerance
  • Receiver Fault Tolerance
  • Processing Guarantees
  • Streaming UI
  • Batch and Window Sizes
  • Level of Parallelism

Spark GraphX

  • Edges
  • Vertices
  • Types of Graphs
  • Usages
  • Simple Program

Spark MLlib

  • Vectors
  • LabeledPoints
  • Labels
  • Features
  • RDD with Vectors
  • Matrices, Stats, Maths
  • Algorithms with Spark MLlib

Module -4

STATISTICS

STATISTICS: Descriptive & Inferential Statistics

DESCRIPTIVE STATISTICS

  • Introduction to Advanced Data Analytics
  • Descriptive and inferential statistics for various business problems
  • Types of Variables
  • Measures of central tendency
  • Dispersion
  • Variable Distributions
  • Probability Distributions
  • Normal Distribution and Properties
  • Skewness and Kurtosis
  • Five-Number Summary Analysis

INFERENTIAL STATISTICS

  • Null/Alternative Hypothesis formulation
  • Type I and Type II errors
  • One Sample T-TEST
  • Independent Sample T-TEST
  • Analysis of Variance (ANOVA)
  • MANOVA
  • Chi-Square Test (Non-Parametric Tests)

Data quality and outlier treatment

  • Outlier treatment with robust measurements
  • Outlier treatment with central tendency (mean)
  • Outlier treatment with min-max methods
  • Imputation with series mean or median values
  • Z score Calculation
  • Sampling and estimation
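
To make the Z score and median-imputation topics above concrete, here is a minimal sketch using only the Python standard library. It is illustrative rather than the course's own exercise: the helper names, the sample data, and the cut-off of 2.0 are all assumptions chosen for this example (a cut-off of 3 is also common on larger samples).

```python
import statistics


def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return [0.0 for _ in values]  # all values identical: no spread
    return [(v - mean) / stdev for v in values]


def replace_outliers_with_median(values, threshold=2.0):
    """Replace any value whose |z| exceeds the threshold with the median.

    The threshold is an assumption for this sketch, not a fixed rule.
    """
    median = statistics.median(values)
    scores = z_scores(values)
    return [median if abs(z) > threshold else v for v, z in zip(values, scores)]


# Hypothetical sample: six ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 95]
cleaned = replace_outliers_with_median(data)
print(cleaned)
```

The median is used for the replacement because, unlike the mean, it is barely affected by the extreme value being treated.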

Data Visualization

  • Stem and leaf
  • Dot Plot
  • Histogram
  • Density Plot
  • Frequency Plot and Cumulative Frequency Plot
  • Box and Whisker Plot
  • Scatter Plot
  • Line Graph
  • Bar Graph
  • Pie Chart
  • Tree Map
  • Cross Tabulation
  • Case Study for Visualization

Data Quality checking

  • Z score Calculation
  • Measure of position (Percentiles and Quartiles)
  • Measure of asymmetry: Skewness
  • Measure of peakedness: Kurtosis
  • Q-Q probability plots
  • Kolmogorov-Smirnov test
  • Shapiro-Wilk test
  • Data Normalization
  • Handling missing Values
  • Case Studies for Data Quality Checking

Module -5

R Language/Python for Data Analytics

Getting Started with R

  • R Basics
  • Variables and Class
  • Vectors, List, Factors, Matrix
  • Data Frames
  • Missing Values
  • Reading and Writing Data
  • Data Visualization using ggplot2
  • If-Else Conditions
  • Functions
  • Loops
  • Data manipulation

Python

  • Python Basics
  • Python Lists
  • Functions and Packages
  • NumPy
  • Control Flow and pandas

Probability

  • Counting Combinations, Generating Combinations
  • Generating Random Numbers
  • Generating Reproducible Random Numbers
  • Generating a Random Sample
  • Generating Random Sequences
  • Randomly Permuting a Vector
  • Probabilities for Discrete Distributions
  • Probabilities for Continuous Distributions
  • Converting Probabilities to Quantiles
  • Plotting a Density Function
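
Several of the probability topics above can be tried with nothing but the Python standard library. The sketch below is illustrative only: the seed value 42, the variable names, and the sample sizes are assumptions for this example, and `math.comb` and `statistics.NormalDist` need Python 3.8 or later.

```python
import itertools
import math
import random
from statistics import NormalDist

# Counting and generating combinations.
n_pairs = math.comb(5, 2)                        # ways to choose 2 of 5
pairs = list(itertools.combinations("ABC", 2))   # every 2-letter combination

# Reproducible random numbers: the same seed yields the same sequence.
rng = random.Random(42)
first = [rng.random() for _ in range(3)]
rng = random.Random(42)
second = [rng.random() for _ in range(3)]

# A random sample without replacement, and a random permutation.
sample = rng.sample(range(100), 5)
deck = list(range(10))
rng.shuffle(deck)

# Continuous distribution: probability for a value, and the reverse
# step of converting a probability back to a quantile.
std_normal = NormalDist(mu=0, sigma=1)
p = std_normal.cdf(1.96)       # P(Z <= 1.96)
q = std_normal.inv_cdf(0.975)  # the 97.5th percentile

print(n_pairs, pairs, first == second, sample, deck, p, q)
```

Seeding a dedicated `random.Random` instance, rather than the module-level generator, keeps the reproducibility local to one piece of analysis.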

Graphics

  • Edges
  • Vertices
  • Graphs
  • Programs

Machine Learning

  • Introduction to Machine Learning
  • Types Of Machine Learning
  • Real time use cases in Machine Learning
  • Types of Algorithms / Types of Problems –
    • Regression
    • Classification
    • Clustering
    • Collaborative Filtering
    • Optimization
    • Prediction
  • Regression –
    • Linear Regression
    • Logistic Regression
  • Classification –
    • Logistic Regression
    • Decision Tree, Random Forest
    • KNN, SVM
    • Naive Bayes
  • Clustering –
    • K-means Clustering
