I am now most of the way through the machine learning course that I’ve been taking this summer: I have completed 8 of the 11 weeks of content. I have learned a lot so far, since before taking this course I had no prior experience with or knowledge of machine learning. The course, “Machine Learning” by Andrew Ng on Coursera, is well designed, and I can see why it is the most popular online machine learning class. It goes into enough depth on each algorithm for a new learner like myself to understand how it works and implement it from scratch, rather than using existing tools to do a black-box implementation. It also has a practical focus and spends a lot of time on how to evaluate the performance of learning algorithms, and on how to decide how many features and how much regularization to use so that the algorithm is neither underfitting nor overfitting. I am glad that I chose to take this course and not the other one I was considering.
The class begins with linear and logistic regression, and discusses how to implement them starting with a few features, then with more features if necessary, and finally with regularization. Gradient descent is the technique taught for minimizing the cost function. Before taking this course, I had what is probably a common misconception that linear regression means fitting a straight line. That is not true: polynomial fitting (e.g. ax^2 + bx + c) is also linear regression, just with polynomial features, because the model is still linear in the parameters a, b, and c. The fact that one can use any number of features, and whatever features one wants, in linear and logistic regression makes them pretty versatile algorithms that work for a lot of different problems.
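To make the polynomial-features point concrete, here is a minimal sketch in Python with NumPy (not the course's MATLAB code). The data, the learning rate `alpha`, the iteration count, and the helper names `polynomial_features` and `gradient_descent` are all assumptions made up for illustration. It fits a quadratic using ordinary linear regression trained by batch gradient descent.

```python
import numpy as np

def polynomial_features(x, degree):
    """Build the design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Minimize the mean squared error cost with batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        # Gradient of J(theta) = (1 / 2m) * sum((X @ theta - y)^2)
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta

# Synthetic, noise-free data from y = 2x^2 - x + 0.5 (invented for this example)
x = np.linspace(-1, 1, 50)
y = 2 * x**2 - x + 0.5

X = polynomial_features(x, degree=2)
theta = gradient_descent(X, y)  # ordered as [c, b, a] for ax^2 + bx + c
```

The model is a plain dot product `X @ theta`. The curve comes entirely from the extra columns in `X`, which is why this still counts as linear regression.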
The class is taught in MATLAB, which is also used for the homework assignments and is a new language for me. MATLAB is designed to do linear algebra computations quickly and easily, which makes it a good language for machine learning. I know that it also has a lot of built-in tools for machine learning, optimization, modeling, etc., that I haven’t used, because the course implements every algorithm from scratch without using existing libraries. The only exception I can think of is that the instructor suggested using functions like fmincon and fminunc for minimizing the cost function, since they might run faster than gradient descent. I have been getting more used to MATLAB as the course goes on, and would be comfortable using it for future projects. For long programs, however, I would still prefer Python. Array slicing and indexing in NumPy are very similar to MATLAB.
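As a small illustration of that similarity, the sketch below shows a few NumPy slicing expressions with the roughly equivalent MATLAB expression in a comment. The matrix `A` is invented for the example. The main differences to keep in mind are that NumPy indices start at 0 and that the end of a slice is exclusive, while MATLAB indices start at 1 and ranges are inclusive.

```python
import numpy as np

# A 3x4 matrix with rows [1 2 3 4], [5 6 7 8], [9 10 11 12]
A = np.arange(1, 13).reshape(3, 4)

first_row = A[0, :]     # MATLAB: A(1, :)
last_col = A[:, -1]     # MATLAB: A(:, end)
block = A[0:2, 1:3]     # MATLAB: A(1:2, 2:3), rows 1-2 and columns 2-3
transposed = A.T        # MATLAB: A'
```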
The more advanced supervised learning techniques the course covers are neural networks and support vector machines. For both of these algorithms, I understand how they work and what they are doing, but I would need to rewatch some of the videos to be able to implement them on my own. In particular, for neural networks, the back-propagation algorithm used to learn the weights is quite complicated, and it is not easy to build an intuition for what it is doing. It is also my understanding that many different algorithms are in use today to optimize neural networks, each doing its own variation on back-propagation. The support vector machine algorithm is also a bit tricky: its effect, large-margin classification, is easy to visualize, but it is complicated to implement, especially with non-linear kernels.
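One way to get a feel for back-propagation is to write it for a very small network and compare the result against a numerical estimate of the gradient. The sketch below is a simplified illustration, not the course's assignment code: it assumes a single hidden layer, sigmoid activations, a cross-entropy cost, no bias units, and no regularization. The random data and every function name are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradients(W1, W2, X, y):
    """Cross-entropy cost and its gradients for a one-hidden-layer network."""
    m = X.shape[0]
    # Forward pass
    a1 = sigmoid(X @ W1)                       # hidden activations, shape (m, h)
    a2 = sigmoid(a1 @ W2)                      # predictions, shape (m, 1)
    cost = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    # Backward pass: push the output error back through each layer
    delta2 = (a2 - y) / m                      # d(cost) / d(output pre-activation)
    grad_W2 = a1.T @ delta2
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # chain rule through the sigmoid
    grad_W1 = X.T @ delta1
    return cost, grad_W1, grad_W2

def numerical_gradient(cost_fn, W, eps=1e-5):
    """Estimate d(cost)/dW by central differences, one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        cost_plus = cost_fn()
        W[idx] = original - eps
        cost_minus = cost_fn()
        W[idx] = original                      # restore the weight
        grad[idx] = (cost_plus - cost_minus) / (2 * eps)
    return grad

# Tiny random problem (synthetic, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
W1 = rng.normal(scale=0.5, size=(3, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))

cost, grad_W1, grad_W2 = cost_and_gradients(W1, W2, X, y)
cost_only = lambda: cost_and_gradients(W1, W2, X, y)[0]
num_W1 = numerical_gradient(cost_only, W1)
num_W2 = numerical_gradient(cost_only, W2)
max_diff = max(np.abs(grad_W1 - num_W1).max(), np.abs(grad_W2 - num_W2).max())
```

If the backward pass is right, `max_diff` should be tiny, because both methods compute the same derivatives. Back-propagation just gets them far more cheaply than perturbing one weight at a time.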
Luckily, the course spends a lot of time on how to evaluate the performance of a learning algorithm: in particular, which features to use, how to add regularization, and how to use cross-validation to evaluate the model. I feel that for most learning algorithms, even if I can’t remember all of the math behind them, I would be able to implement them using some of the many existing tools (for example, any of the Python machine learning libraries), and then evaluate and improve their performance.
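The evaluation workflow can be sketched in a few lines. The example below is an illustration under assumptions, not code from the course: it uses synthetic data, degree-8 polynomial features, regularized linear regression solved with the normal equation, and a single held-out validation set to pick the regularization strength. The candidate values in `lambdas` and the helper names are invented for the example.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Regularized least squares via the normal equation.
    The first column (the intercept) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

# Synthetic noisy data (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)
X = np.vander(x, 9, increasing=True)       # columns 1, x, ..., x^8

# Hold out the last 20 examples as a validation set
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
train_errors, val_errors = {}, {}
for lam in lambdas:
    theta = fit_ridge(X_train, y_train, lam)
    train_errors[lam] = mse(X_train, y_train, theta)
    val_errors[lam] = mse(X_val, y_val, theta)

# Choose the regularization strength with the lowest validation error
best_lambda = min(val_errors, key=val_errors.get)
```

Comparing `train_errors` with `val_errors` is the diagnostic: a low training error with a high validation error suggests overfitting, while high values for both suggest underfitting.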
I have been thinking about ideas for my final project, and one possible direction is to continue the project I did for the other class I took this summer, scientific computing. My final project for that class was about trying to predict how fast an optimization algorithm will finish. I spent a lot of time deciding what type of optimization problem to try to solve, and experimented with generating different convex and curvy shapes, which did not work very well. I settled on large linear programs, and measured how many iterations the simplex, interior point, and revised simplex methods took to finish on 10,000 training examples, then modeled this data with different regression neural networks. However, none of the models I experimented with performed very well, and I didn’t have much time to do further analysis to figure out why. If I decide to continue working on this project, there is definitely more that I could do to improve the performance of my learning algorithm, as well as test different algorithms or try different optimization problems with different training data.