HDP Data Science












    Overview

    This course provides instruction in the theory and practice of data science, including machine learning and natural language processing. It introduces the core concepts behind many of today’s most commonly used algorithms and shows how to apply them in practice. We’ll discuss concepts and key algorithms in all of the major areas (Classification, Regression, Clustering, and Dimensionality Reduction), including a primer on Neural Networks. We’ll focus on both single-server tools and frameworks (Python, NumPy, pandas, SciPy, Scikit-learn, NLTK, TensorFlow, Jupyter) and large-scale tools and frameworks (Spark MLlib, Stanford CoreNLP, TensorFlowOnSpark/Horovod/MLeap, Apache Zeppelin). Download the data sheet to view the full list of objectives and labs.

    • Prerequisites
      Students must have experience with Python, Scala, and Spark, prior exposure to statistics and probability, and a basic understanding of big data and Hadoop principles. Brief reviews of these topics are offered, but students new to Hadoop are encouraged to attend the Apache Hadoop Essentials (HDP-123) and HDP Spark Developer (DEV-343) courses, as well as the language-specific introductory courses.
    • Target Audience
      Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Spark/Hadoop

    DAY 1 – An Introduction to Data Science, SciKit-Learn, HDFS, Reviewing Spark Apps, DataFrames and NoSQL

    OBJECTIVES
    • Discuss aspects of Data Science, the team members, and the team roles
    • Discuss use cases for Data Science
    • Discuss the current State of the Art and its future direction
    • Review HDFS, Spark, Jupyter, and Zeppelin
    • Work with SciKit-Learn, Pandas, NumPy, Matplotlib, and Seaborn
    LABS
    • Hello, ML w/ SciKit-Learn 
    • Spark REPLs, Spark Submit, & Zeppelin Review 
    • HDFS Review 
    • Spark DataFrames and Files 
    • NiFi Review

    DAY 2 – Algorithms in Spark ML and SciKit-Learn: Linear Regression, Logistic Regression, Support Vectors, Decision Trees

    OBJECTIVES
    • Discuss categories and use cases of the various ML Algorithms
    • Understand Linear Regression, Logistic Regression, and Support Vectors
    • Understand Decision Trees and their limitations
    • Understand Nearest-Neighbors
    • Discuss and demonstrate a Spam Classifier
    LABS
    • Linear Regression as a Projection 
    • Logistic Regression 
    • Support Vectors 
    • Decision Trees 
    • Linear Regression as a Classifier
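The "Linear Regression as a Projection" lab title points to the standard geometric view of ordinary least squares: the fitted values are the orthogonal projection of the target vector onto the space spanned by the feature columns, so the residual vector is orthogonal to every column. The following is a minimal sketch of that property for one feature plus an intercept, written in plain Python. It is not taken from the course materials; the helper names fit_simple_ols and residuals are hypothetical, and the lab itself presumably works with scikit-learn or Spark ML rather than hand-written formulas.

```python
# Minimal sketch (hypothetical helper names, not from the course materials):
# ordinary least squares for y ~ a + b*x, showing that the residuals are
# orthogonal to both design-matrix columns, i.e. [1, ..., 1] and [x_1, ..., x_n].


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values, i.e. y minus its projection."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.1, 5.9, 8.2]
    a, b = fit_simple_ols(xs, ys)
    r = residuals(xs, ys, a, b)
    # Projection property: both dot products are zero up to rounding error.
    print(a, b, sum(r), sum(x * e for x, e in zip(xs, r)))
```

Both printed dot products are zero only up to floating-point rounding; with more features the same property holds for every column of the design matrix.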

    DAY 3 – K-Means & GMM Clustering, Essential TensorFlow, NLP with NLTK, NLP with Stanford CoreNLP

    OBJECTIVES
    • Discuss and understand Clustering Algorithms
    • Work with TensorFlow to create a basic neural network
    • Discuss Natural Language Processing
    • Discuss Dimensionality Reduction Algorithms
    LABS
    • K-Means Clustering 
    • GMM Clustering 
    • Essential TensorFlow 
    • Sentiment Analysis
    • Dimensionality Reduction with PCA
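For the K-Means Clustering lab, the core of the algorithm (Lloyd's iteration) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below shows only those two steps in plain Python, under stated simplifications: initial centroids are passed in explicitly (library implementations such as scikit-learn's KMeans or Spark MLlib's KMeans choose them for you, for example with k-means++), and the names assign and kmeans are hypothetical.

```python
# Minimal sketch of Lloyd's K-Means iteration (hypothetical helper names).
# Simplification: the caller supplies the initial centroids explicitly.
import math


def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, labels) after convergence or max_iter iterations."""
    if not initial_centroids:
        raise ValueError("need at least one initial centroid")
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(c)  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    # Final labels always correspond to the returned centroids.
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Because K-Means only finds a local optimum, the result depends on the starting centroids, which is why practical implementations use careful initialization and often several restarts.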

    DAY 4 – Hyperparameter Tuning, K-Fold Validation, Ensemble Methods, ML Pipelines in SparkML

    OBJECTIVES
    • Discuss Hyperparameter Tuning and K-Fold Validation
    • Understand Ensemble Models
    • Discuss ML Pipelines in Spark MLlib
    • Discuss ML in production and real-world issues
    • Demonstrate TensorFlowOnSpark
    LABS
    • Hyperparameter Tuning
    • K-Fold Validation 
    • Ensemble Methods 
    • ML Pipelines in SparkML 
    • Demo: TensorFlowOnSpark
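K-Fold Validation splits the data into k folds, trains on k-1 of them, scores on the held-out fold, and averages the k scores; hyperparameter tuning then compares that average across candidate settings. The sketch below illustrates the mechanics in plain Python. The fold layout is intended to mirror the unshuffled behavior of scikit-learn's KFold (contiguous folds, with the first n % k folds one element larger), and the "model" is a deliberately trivial baseline that predicts the training-fold mean; the names k_fold_splits and cross_val_mse are hypothetical and not part of any library.

```python
# Minimal sketch of k-fold cross-validation (hypothetical helper names).
# Folds are contiguous and unshuffled; the first n % k folds get one extra item.


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_val_mse(ys, k):
    """Mean test-fold MSE of a baseline that predicts the training-fold mean."""
    scores = []
    for train, test in k_fold_splits(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)
        scores.append(sum((ys[i] - prediction) ** 2 for i in test) / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 3):
        print("train:", train, "test:", test)
    print("baseline CV MSE:", cross_val_mse([0.0, 0.0, 1.0, 1.0], 2))
```

In a tuning loop, the same splits would be reused for every candidate hyperparameter value so that the averaged scores are directly comparable; Spark ML packages this pattern as CrossValidator with a parameter grid.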