My Blog Posts.
Introduction to PXE boot servers.
A basic introduction to Preboot Execution Environment (PXE) servers.
Integrating HDFS and PostgreSQL through Apache Spark.
A guide to leveraging Spark’s inferSchema option in conjunction with HDFS to streamline PostgreSQL database schema/table creation.
Mamba implementation in Scientific Pipelines.
A guide to using Mamba (over Conda) for pipeline building.
Installing and configuring the Hive Metastore with a MySQL backend.
A guide to configuring Hive Metastore to use a MySQL server as the backend RDBMS for metadata storage, while enabling Spark to connect to the Metastore.
Using Ansible to install Hive on a Spark cluster.
A detailed look at Ansible playbooks, using a Hive installation on a Spark cluster as the worked example.
Installing and configuring Hadoop and Spark on a 4-node cluster.
A guide to installing Hadoop and Spark on a 4-node cluster, including the configuration and setup of HDFS, YARN, and MapReduce.
Using Ansible to remotely configure a cluster.
Using a containerized instance of Ansible to connect remotely to a cluster and run a simple ping task to confirm connectivity.
Docker, Singularity, and HPC.
A brief rundown of Docker and Singularity, and their relevance in HPC environments.
SLURM and HPC.
An introduction to SLURM in the context of HPC clusters.
p-values, Statistical Significance, and the magic number.
A brief exploration of p-values.