
Machine Learning Workshop with Kaggle

 

When: Saturday February 17th at 1:00pm

Where: CoMotion on King at 115 King Street East (3rd floor), Hamilton, ON

Organizer: Hamilton Machine Learning and Computing Research

Register: eventbrite.ca/e/machine-learning-workshop-with-kaggle-tickets-41883700275

Details:


At this workshop, we will work through a Kaggle problem as a group to learn about machine learning and data science!

The workshop leaders will introduce the problem and then work through a solution with the group on the projector screen. You can ask questions, participate and help, or just follow along, whatever your comfort level.

You’re also welcome to solve the problem on your own or in a small group; this is a casual event for sharing and learning.

We recommend the following background knowledge for attending the workshop; you may find the links helpful for preparing in advance:

We recommend bringing your laptop!

Mahsa Rahimi and Nick Miladinovic will lead the workshop.

 

Machine learning using scikit-learn

Originally posted on kamillus.github.io

 

Scikit-learn is a fantastic library for solving problems with machine learning and other, more traditional statistical methods in data science. In this post I outline why machine learning is important and demonstrate a simple machine learning problem and how to solve it.

Why should you care? Data science is becoming more and more relevant with the growth of big data and autonomous systems (e.g. recommender systems, pattern recognition). Machine learning, specifically, is applicable to many fields including finance (e.g. detecting credit card fraud), medicine (e.g. classifying patient cancer), and entertainment (e.g. chess-playing bots). The number of careers involving machine learning will steadily increase (there is evidence it’s already happening) as the supporting technologies become more widespread (Hadoop, scikit-learn, Mahout, etc.).

One of the problems I was working on not too long ago was classifying which user is sitting in front of the computer. I developed a small user classification game built around an SVM. The game asks a user to type a bunch of words to create a profile of that user – the machine is “learning”. In the next part of the game, the user types more words and the computer tries to recognize who is typing at the keyboard using what it learned.

How does the computer learn? Feature generation happens when the user is asked for their name, presented with a series of words from a dictionary, and finally asked to type each word as it appears. The features recorded are the typing speed, the number of errors, and the corrections made to typed words.
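As a rough illustration of what building one such feature vector might look like (the helper name and the exact feature definitions below are my own assumptions, not the game’s actual code):

```python
def capture_features(expected_word, typed_word, start, end):
    """Build one feature vector from a single typed word (hypothetical helper).

    Mirrors the features described above: typing speed, error count,
    and a crude edit-distance proxy for corrections.
    """
    elapsed = end - start  # typing speed (seconds spent on the word)
    # count character positions where the typed word differs from the prompt
    errors = sum(1 for a, b in zip(expected_word, typed_word) if a != b)
    # crude distance: length mismatch plus character mismatches
    distance = abs(len(expected_word) - len(typed_word)) + errors
    return [elapsed, errors, distance]

# e.g. the user took 1.4 s and transposed two letters
features = capture_features("keyboard", "keybaord", 0.0, 1.4)
```

In the real game the start and end timestamps would come from a clock such as `time.monotonic()`, captured as the user types.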

The next part of the program runs the data through the classifier (in our case an SVM). The tricky part is getting the right value for gamma. You can experiment with this using a held-out test data set; do not tune it on your training set. Once you have this data, the actual classifying is trivial with scikit-learn:

from sklearn import svm

# create the classifier; gamma controls the width of the RBF kernel
classifier = svm.SVC(gamma=1)

# get existing features and their expected labels
(features, targets) = profiles.get_classifier_data()
classifier.fit(features, targets)

# feed a new feature vector into the classifier and predict which user it belongs to
predicted = classifier.predict([[data_point.time, data_point.error_count, data_point.distance]])
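Rather than hand-picking gamma, one common approach is to search over candidate values with cross-validation on the training set and then check the winner on held-out data. A sketch of that idea (using the iris dataset as a stand-in, since the typing profiles aren’t included here):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

# stand-in data; in the game this would be profiles.get_classifier_data()
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# try several gamma values; 5-fold cross-validation picks the best one
search = GridSearchCV(svm.SVC(), {"gamma": [0.001, 0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

best_gamma = search.best_params_["gamma"]   # the winning gamma
test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data
```

The key point is that the test set is only touched once, at the end; all the gamma comparisons happen inside the cross-validation folds.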

How could this be improved? I think the first opportunity for improvement is to recognize data clusters automatically using k-means, possibly combined with principal component analysis. That way, each cluster of data would be assigned automatically without first creating user profiles.
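A minimal sketch of that unsupervised idea, using simulated typing features since real data isn’t included: PCA reduces the feature vectors, then k-means assigns each sample to a cluster without any labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# simulated (time, error_count, distance) vectors for two distinct "users"
fast_typist = rng.normal([0.8, 1.0, 1.0], 0.1, size=(20, 3))
slow_typist = rng.normal([2.0, 4.0, 5.0], 0.3, size=(20, 3))
X = np.vstack([fast_typist, slow_typist])

# project the features onto their two main axes of variation
reduced = PCA(n_components=2).fit_transform(X)

# group the projected points into two clusters, with no user labels involved
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
```

With well-separated users like these, the two simulated groups end up in different clusters; in practice the cluster count would itself need to be chosen or estimated.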

I hope this post elucidates the high-level machine learning process for anyone who is interested. The technologies and ideas used here are just some tools that can be added to your toolbelt. If you’d like to find out more about machine learning, I recommend Andrew Ng’s set of lectures.

Full Listing