R & Weka

Weka is a great resource for data mining and machine learning.  You can get a lot done with its standalone GUI workbench, but sometimes you need Weka as one step of a custom R analysis pipeline.  Yes, you could write a shell script around the Weka command-line tools and invoke it from R with a 'system' call, but that gets out of hand really quickly.

Luckily, the RWeka package provides an R interface to Weka's collection of machine learning algorithms.  I recently had to implement an ID3 decision tree classifier in R, and I used RWeka to verify my code.  A nice advantage of RWeka is that it takes data frames as input, so you don't need your data in ARFF format:

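library(RWeka)
# Id3 lives in an optional Weka package: install it once, then load it each session
WPM("install-package", "simpleEducationalLearningSchemes")
WPM("load-package", "simpleEducationalLearningSchemes")
# Build an R interface to Weka's Id3 class, then fit, inspect, and predict
id3_classifier <- make_Weka_classifier("weka/classifiers/trees/Id3")
id3_model <- id3_classifier(label ~ ., data = trainingDataFrame)
summary(id3_model)
plot(as.party.Weka_tree(id3_model))
id3ModelPredictions <- predict(id3_model, testingDataFrame)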

The above code is straightforward: we use WPM, RWeka's interface to the Weka package manager, to load the Weka package that provides the ID3 decision tree classifier.  We then create an R interface to Weka's Id3 classifier with "make_Weka_classifier", passing it the Java class name ("weka/classifiers/trees/Id3"), not a path to a ".jar" file.  We fit a model by calling the resulting "id3_classifier" function with a model formula: "label ~ ." says that the "label" response variable is modeled by all of the remaining predictor variables in "trainingDataFrame".  After getting some summary statistics (evaluated on the training data) and a plot of the tree, we get predictions by calling "predict" with a data frame that contains our testing data.
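One thing to watch for: Weka's Id3 only handles nominal attributes and does not allow missing values, so numeric predictors must become factors before they reach "id3_classifier".  Below is a minimal sketch of that preparation, using R's built-in iris data as a stand-in for "trainingDataFrame" and "testingDataFrame".  The helper names (discretize_df, split_train_test, accuracy), the three equal-width bins, and the 70/30 split are hypothetical choices for illustration, not necessarily what you want for your own data.

```r
# Sketch: prepare nominal train/test data frames for Weka's Id3.
# All helper names and parameter values here are hypothetical.

# Turn every numeric column into a factor with `bins` equal-width levels,
# since Weka's Id3 accepts nominal attributes only.
discretize_df <- function(df, bins = 3) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]] <- cut(df[[col]], breaks = bins,
                       labels = paste0("bin", seq_len(bins)))
    }
  }
  df
}

# Randomly split rows into training and testing sets (reproducible via `seed`).
split_train_test <- function(df, train_fraction = 0.7, seed = 42) {
  set.seed(seed)
  n_train <- round(train_fraction * nrow(df))
  idx <- sample(seq_len(nrow(df)), size = n_train)
  list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE])
}

# Fraction of positions where two label vectors agree; NA predictions
# (e.g. instances Id3 leaves unclassified) count as disagreements.
accuracy <- function(predicted, actual) {
  matches <- as.character(predicted) == as.character(actual)
  matches[is.na(matches)] <- FALSE
  mean(matches)
}

# Example with the built-in iris data: bin first, then split,
# so both halves share identical factor levels.
iris_nominal <- discretize_df(iris, bins = 3)
names(iris_nominal)[names(iris_nominal) == "Species"] <- "label"
parts <- split_train_test(iris_nominal)
trainingDataFrame <- parts$train
testingDataFrame <- parts$test
str(trainingDataFrame)
```

Binning the full data set before splitting is deliberate: discretizing the two halves separately would give them different cut points, and the test data would no longer line up with the levels the model was trained on.  After running the RWeka code, accuracy(id3ModelPredictions, testingDataFrame$label) gives the test-set accuracy, and the same helper can report how often your own implementation's predictions agree with id3ModelPredictions.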

Good stuff... if you can get Weka, R, and RWeka to talk to each other.  RWeka is a simple install from CRAN, and installing Weka is straightforward; the real headache is the Java plumbing needed to connect everything.  The key is getting Java configured correctly, especially on OS X.

I found the post linked below, which has the necessary steps to get Java configured on OS X El Capitan (10.11).  Note that if you try to edit the Plist with a text editor like TextMate or BBEdit, your changes will not be saved, even after you authenticate.  If you run into this problem, head over to Terminal.app and edit the Plist with "vi" instead.  Once your Plist is ready, just follow the remaining commands in the post (mkdir and ln), and restart your R session.

Post: https://oliverdowling.com.au/2014/03/28/java-se-8-on-mac-os-x/
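After restarting R, a quick way to confirm that the Java plumbing works, before blaming RWeka, is to ask rJava (the package RWeka uses to talk to the JVM) to start a JVM and report its version.  This is a minimal sketch, assuming rJava is installed from CRAN; java_ready is a hypothetical helper name, and it returns FALSE instead of throwing an error when something is missing.

```r
# Hypothetical helper: check whether this R session can start a JVM via rJava.
# Returns TRUE on success, FALSE otherwise (it never throws).
java_ready <- function(verbose = TRUE) {
  # requireNamespace() returns FALSE if rJava is not installed or fails to load,
  # for example when R cannot find a usable Java runtime library
  if (!requireNamespace("rJava", quietly = TRUE)) {
    if (verbose) message("rJava is not installed or could not be loaded")
    return(FALSE)
  }
  tryCatch({
    rJava::.jinit()
    version <- rJava::.jcall("java/lang/System", "S", "getProperty",
                             "java.version")
    if (verbose) message("JVM is running, java.version = ", version)
    TRUE
  }, error = function(e) {
    if (verbose) message("Could not start the JVM: ", conditionMessage(e))
    FALSE
  })
}

java_ready()
```

If this returns FALSE, re-running R's own Java configuration step from Terminal.app (R CMD javareconf, often with sudo) and then restarting R is a common next thing to try.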