Naveen Kannan’s Data Analytics Skills

Python

pandas, Keras, NumPy, scikit-learn, PySpark, and TensorFlow

  • Proficient in the Python programming language and related packages such as pandas, NumPy, and SciPy for data manipulation, exploration, and analysis.

  • Experienced in statistical analysis and modeling using Python, including regression analysis, time series analysis, clustering, and classification using packages such as statsmodels and scikit-learn.

  • Skilled in machine learning techniques such as decision trees, random forests, gradient boosting, and neural networks using packages such as TensorFlow, Keras, and PyTorch.

  • Experienced in using visualization tools such as Matplotlib, Seaborn, and Plotly for creating static and interactive visualizations.

  • Proficient in data preparation and cleaning using pandas and related packages, including handling missing values and outliers and applying data transformations.

  • Skilled in feature engineering using techniques such as one-hot encoding, scaling, and feature selection.

  • Experienced in using scikit-learn for model selection, hyperparameter tuning, and evaluation using techniques such as cross-validation and ensemble methods (a brief sketch follows this list).

  • Comfortable using Jupyter notebooks for interactive data analysis and reporting.
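
A minimal sketch of the feature-engineering and cross-validation workflow described above, using scikit-learn; the DataFrame, its column names, and the model settings are hypothetical placeholders rather than material from any specific project.

```python
# Minimal sketch: one-hot encoding, scaling, and cross-validated model
# evaluation with scikit-learn. The DataFrame `df` and its columns
# ("age", "income", "segment", "churned") are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, 36],
    "income": [40_000, 52_000, 81_000, 90_000, 120_000, 38_000, 75_000, 61_000],
    "segment": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "churned": [0, 0, 1, 1, 0, 0, 1, 0],
})

# Preprocess numeric and categorical columns separately.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])

# Chain preprocessing and an ensemble classifier into a single estimator.
model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Evaluate with k-fold cross-validation on the toy data.
scores = cross_val_score(model, df.drop(columns="churned"), df["churned"], cv=3)
print(f"Mean CV accuracy: {scores.mean():.2f}")
```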

R

Advanced Statistical Analysis, Regression Analysis

  • Proficient in the R programming language and related packages such as tidyverse, ggplot2, dplyr, and data.table for data manipulation, exploration, and visualization.

  • Experienced in statistical analysis and modeling using R, including regression analysis, time series analysis, clustering, and classification.

  • Experienced in using visualization tools such as ggplot2 for static graphics and Shiny for interactive dashboards and reports to communicate insights to stakeholders.

Data Visualization

ggplot2, gt, and Matplotlib

  • Familiar with ggplot2, a popular data visualization package in R, and able to create a wide range of visualizations such as scatter plots, box plots, and density plots using the Grammar of Graphics framework.

  • Experienced in creating publication-quality plots and charts using Matplotlib and related libraries such as Seaborn and Plotly, as well as ggplot2 in R (a brief sketch follows this list).

  • Experienced in creating publication-quality tables for scientific journals using packages such as gt.
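
A minimal sketch of the kind of labelled, export-ready Matplotlib/Seaborn figure referred to above; it assumes the "tips" example dataset bundled with Seaborn (fetched by load_dataset), and the styling choices are illustrative only.

```python
# Minimal sketch: a labelled, publication-style scatter plot with
# Matplotlib and Seaborn. The bundled "tips" dataset is used purely as
# illustrative data (load_dataset fetches it from the Seaborn data repo).
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid", context="paper")
tips = sns.load_dataset("tips")

fig, ax = plt.subplots(figsize=(4, 3))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", ax=ax)
ax.set_xlabel("Total bill (USD)")
ax.set_ylabel("Tip (USD)")
ax.set_title("Tip amount vs. total bill")
fig.tight_layout()
fig.savefig("tips_scatter.png", dpi=300)  # high-resolution export for print
```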

Big Data Tools

Hadoop, PySpark, and SQL

  • Proficient in PySpark, the Python API for Apache Spark, for large-scale data processing and machine learning tasks, including data transformation, feature engineering, and model training (a brief sketch follows this list).

  • Experienced in working with the Hadoop distributed computing framework and related components such as HDFS.

  • Skilled in writing and optimizing SQL queries for data extraction and analysis, using various database management systems.
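
A condensed sketch of the PySpark and SQL workflow described above; the input path, view name, and column names are hypothetical placeholders.

```python
# Minimal sketch: loading, transforming, and querying data with PySpark.
# The input path "sales.parquet" and its columns ("region", "amount",
# "order_date") are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-summary").getOrCreate()

# Read a Parquet dataset and derive a year column from the order date.
sales = (
    spark.read.parquet("sales.parquet")
         .withColumn("order_year", F.year("order_date"))
)

# Aggregate with the DataFrame API...
by_region = (
    sales.groupBy("region", "order_year")
         .agg(F.sum("amount").alias("total_amount"))
)

# ...or express the same summary in SQL against a temporary view.
sales.createOrReplaceTempView("sales")
by_region_sql = spark.sql("""
    SELECT region, order_year, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region, order_year
""")

by_region.show()
spark.stop()
```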

Linux

CLI, Docker, Bash scripting, and Apache Web Server

  • Proficient in the Linux operating system and command-line interface, including shell scripting, file system management, package installation, and system administration.

  • Experienced in managing Linux-based servers for hosting data science applications, using technologies such as Docker and the Apache web server.

  • Comfortable with programming and scripting languages commonly used in a Linux environment, such as Python, R, and Bash.