August 9, 2021

Camilo Valdes

Visualization

Microbiome Maps

Microbiome Maps are visualizations of microbial community profiles, and they can be created with the Jasper software. Jasper is a tool for creating rich, interactive microbiome maps that lets you explore your metagenomic samples like never before. Jasper uses a Hilbert Curve to place genomes on an interactive canvas that can display thousands of genomes at once.
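The post doesn't show how Jasper lays out its canvas, so what follows is only a sketch of the general idea. It uses the textbook Hilbert-curve mapping from a 1-D index d to a cell (x, y) on an n-by-n grid, where n is a power of two. Consecutive indices always land in adjacent cells, so genomes that sit next to each other in some ordering stay close on the map. The helper names `d2xy` and `layout`, and the assumption that genomes are placed in list order, are hypothetical and not taken from Jasper.

```python
def _rotate(n, x, y, rx, ry):
    """Flip/rotate a quadrant so the sub-curve inside it has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n-by-n Hilbert curve (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout(genome_ids):
    """Hypothetical helper: give the i-th genome the i-th cell along the curve."""
    n = 1
    while n * n < len(genome_ids):
        n *= 2  # smallest power-of-two grid with enough cells
    return {g: d2xy(n, i) for i, g in enumerate(genome_ids)}
```

For example, `layout(["g1", "g2", "g3", "g4", "g5"])` needs 5 cells, so it uses a 4-by-4 grid and walks the curve from (0, 0). The locality property is the reason to prefer a Hilbert curve over a simple row-by-row fill, where the last genome in one row and the first in the next end up on opposite sides of the canvas.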

July 8, 2019

Camilo Valdes

Paper: Large Scale Microbiome Profiling in the Cloud

The paper for Flint just got published! You can view the publication in Bioinformatics (Oxford University Press). Flint is a metagenomics profiling pipeline built on top of the Apache Spark framework and designed for fast, real-time profiling of metagenomic samples against a large collection of reference genomes.

May 15, 2019

Camilo Valdes

Paper Accepted at ISMB 2019

Our paper, Large Scale Microbiome Profiling in the Cloud, got accepted for a Proceedings Presentation at ISMB/ECCB 2019, the joint Intelligent Systems for Molecular Biology and European Conference on Computational Biology meeting in Basel, Switzerland!

March 25, 2019

Camilo Valdes

PhD Proposal Defense

I recently defended my PhD proposal at the CS department at Florida International University (FIU). I’m currently working on the presentation and the talk is scheduled for April.

January 16, 2018

Camilo Valdes

Properly Shutting Down a VirtualBox Virtual Machine

We’ve been testing some Spark code that will eventually be moved to AWS. For now, to save costs, we’ve created an 8-node Spark cluster that runs on a set of Virtual Machines running Ubuntu on VirtualBox. We’ve developed some bash scripts to make starting (and shutting down) the VMs easy.
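The scripts themselves aren't part of this excerpt. As a hedged illustration, written in Python rather than bash and not taken from the post, the sketch below relies on two real `VBoxManage` subcommands. It uses `list runningvms` to find running VMs and `controlvm <vm> acpipowerbutton` to ask each guest OS to shut down cleanly, then polls until the VMs have stopped. The VM names, timeout values, and the `run`/`sleep` parameters (which exist only so the logic can be tested without VirtualBox) are assumptions.

```python
import re
import subprocess
import time

VBOXMANAGE = "VBoxManage"  # assumes VBoxManage is on PATH


def parse_running_vms(output):
    """Pull VM names out of `VBoxManage list runningvms` lines of the form: "name" {uuid}."""
    return re.findall(r'^"([^"]+)"', output, flags=re.MULTILINE)


def running_vms(run=subprocess.run):
    result = run([VBOXMANAGE, "list", "runningvms"],
                 capture_output=True, text=True, check=True)
    return parse_running_vms(result.stdout)


def shutdown_vms(names, run=subprocess.run, sleep=time.sleep, timeout=120, poll=5):
    """Send an ACPI power-button event to each VM, then wait for them to stop.

    Returns the names of VMs still running when the timeout expires.
    """
    for name in names:
        # acpipowerbutton lets the guest OS shut down cleanly, unlike
        # `controlvm <vm> poweroff`, which is the equivalent of pulling the plug.
        run([VBOXMANAGE, "controlvm", name, "acpipowerbutton"], check=True)
    remaining = timeout
    while True:
        still_running = set(running_vms(run)) & set(names)
        if not still_running or remaining <= 0:
            return sorted(still_running)
        sleep(poll)
        remaining -= poll


if __name__ == "__main__":
    # Hypothetical VM names for an 8-node cluster.
    cluster = [f"spark-node-{i}" for i in range(1, 9)]
    stuck = shutdown_vms([vm for vm in running_vms() if vm in cluster])
    if stuck:
        print("Still running after timeout:", ", ".join(stuck))
```

The ACPI route only works if the guest responds to power-button events. For a VM that never stops, `VBoxManage controlvm <vm> poweroff` remains available as a last resort, at the cost of an unclean shutdown.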

April 14, 2017

Camilo Valdes

Computers and Intractability Book

Got a copy of a great book, Computers and Intractability: A Guide to the Theory of NP-Completeness, from Bell Labs.

October 25, 2016

Camilo Valdes

Bioinformatics

Genome Building

The Bioinformatics repository on my GitHub account contains a script I use to "build" the human genome: it creates the genomic data structures I need to run a DNA sequencing analysis. These data structures are Burrows-Wheeler indices, which genomic aligners such as Bowtie2 need to do their job.
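To give a feel for the underlying data structure, here is a naive Burrows-Wheeler transform (BWT) in Python. The procedure appends a sentinel, sorts all rotations of the text, and keeps the last column. This is only a teaching sketch and not what the build script does. Bowtie2's `bowtie2-build` constructs an FM index (which is built on the BWT) with far more memory-efficient algorithms, because sorting every rotation of a roughly 3-billion-base genome would be hopeless. The function names and the `$` sentinel are illustrative choices.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort every rotation of text+sentinel and keep the last column."""
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the unique rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

For example, `bwt("banana")` returns `"annb$aa"`. Equal characters tend to cluster in the output, which makes the transform compressible, and because it is reversible, an aligner can search the compact index instead of the raw genome.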

September 20, 2016

Camilo Valdes

YouTube, Machine Learning

Deep Learning Videos

I found this great channel by professor Nando de Freitas at the University of Oxford. Most of the videos are good, but the series on Neural Networks and Deep Learning is great:

August 18, 2016

Camilo Valdes

Programming

Upgrading R

Recently I had to upgrade my R installation because I needed a library that required a newer version of R than the one I had installed. I used to live life on the edge and upgrade R as soon as a new version was available, but as my collection of third-party libraries grew, I upgraded R less and less often.

July 24, 2016

Camilo Valdes

Visualization, Data Mining, Machine Learning

Visualization & Diagnostic Plots

I needed to create a series of diagnostic plots for a recent Data Mining project. I created the plots "by hand" in R, meaning that I wrote a script to generate them rather than using a tool such as Tableau. I went that route because the data for the plots came from the UCI Machine Learning Repository, and those particular datasets happen to come bundled with the standard R distribution. :)

June 18, 2016

Camilo Valdes

Machine Learning

K-NN Solver in R

A recent assignment in a machine learning class called for drawing the k-nearest-neighbor decision boundary for a few given values of k, starting with k=1. The task involved using standard Euclidean distance to the given training points to find the nearest neighbors of each location, and hence its class, and then drawing the resulting figure (by hand).
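A hand-drawn boundary can be checked by computer. The minimal sketch below is written in Python rather than the R solver from the post, which isn't shown here, and its function names are hypothetical. It classifies a query point by majority vote among its k nearest training points under Euclidean distance, then labels every point on a grid. The decision boundary runs between neighboring cells whose labels differ.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of ((x, y), label) pairs; distance is plain Euclidean.
    """
    # sorted() is stable, so points at equal distance keep their training order.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common() lists equal counts in first-seen order, so a tied vote
    # goes to whichever tied label has the closest neighbor.
    return votes.most_common(1)[0][0]


def decision_grid(train, k, xs, ys):
    """Classify every (x, y) on a grid; the boundary lies between differing labels."""
    return [[knn_predict(train, (x, y), k) for x in xs] for y in ys]


def render(grid):
    """Draw the grid as text with the largest y on top, like a plot."""
    return "\n".join("".join(str(label) for label in row) for row in reversed(grid))


if __name__ == "__main__":
    # Hypothetical toy data, not the points from the assignment.
    train = [((0, 0), "A"), ((1, 3), "A"), ((4, 1), "B"), ((5, 4), "B")]
    for k in (1, 3):
        print(f"k={k}")
        print(render(decision_grid(train, k, xs=range(6), ys=range(5))))
```

With k=1 the boundary consists of pieces of the perpendicular bisectors between differently labeled points, which is what the hand drawing should show. Larger k smooths it out, and the tie-breaking rule matters for even k.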

May 15, 2016

Camilo Valdes

Programming

Spark INFO logging

Spark is great, and the more I work with it on my PhD thesis, the more changes I make to my local installation on my rMBP. One of the modifications I came across the other day is how to dial down the logging messages in one of the Spark shells; specifically, how to dial down the messages in PySpark when programming in Python.
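One common way to quiet the shells in Spark releases that use log4j 1.x (before 3.3) is to copy `conf/log4j.properties.template` to `conf/log4j.properties` and lower `log4j.rootCategory=INFO, console` to `WARN`. The Python sketch below automates that edit. The function name and the `spark_home` argument are assumptions, and the post may have used a different approach. Newer Spark releases switched to log4j2 (`log4j2.properties`, `rootLogger.level`), which this sketch does not handle.

```python
import re
import shutil
from pathlib import Path


def quiet_spark_logging(spark_home, level="WARN"):
    """Lower the root log level in $SPARK_HOME/conf/log4j.properties (log4j 1.x only)."""
    conf = Path(spark_home) / "conf"
    target = conf / "log4j.properties"
    if not target.exists():
        # Spark ships only the template; a log4j.properties in conf/ takes
        # precedence over Spark's built-in logging defaults.
        shutil.copyfile(conf / "log4j.properties.template", target)
    text = target.read_text()
    # Replace only the level, keeping the appender list (e.g. ", console").
    new_text, count = re.subn(r"^log4j\.rootCategory=\w+",
                              f"log4j.rootCategory={level}",
                              text, flags=re.MULTILINE)
    if count == 0:
        raise ValueError(f"no log4j.rootCategory line in {target}")
    target.write_text(new_text)
    return target
```

For example, `quiet_spark_logging("/usr/local/spark")` (a hypothetical install path) makes every new `pyspark` session start at WARN. For a one-off change inside a shell that is already running, `sc.setLogLevel("WARN")` adjusts the level for that session only.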