ECE285 and SIO209 Machine learning for physical applications, Spring 2017

Professor Peter Gerstoft, Gerstoft@ucsd.edu
TA Mark Wagner, m2wagner@eng.ucsd.edu
Spiess Hall 330
Time: Monday and Wednesday 5-6:20pm

Final projects

Many thanks for the fun projects! Below are the final projects from the class. Only the reports are posted, although the corresponding code is just as important.

  1. Source localization in an ocean waveguide using supervised machine learning, Group3, Group6, Group8, Group10, Group11, Group15
  2. Indoor positioning framework for most Wi-Fi-enabled devices, Group1
  3. MyShake Seismic Data Classification, Group2
  4. Multi Label Image Classification, Group4
  5. Face Recognition using Machine Learning, Group7
  6. Deep Learning for Star-Galaxy Classification, Group9
  7. Modeling Neural Dynamics using Hidden Markov Models, Group12
  8. Star Prediction Based on Yelp Business Data And Application in Physics, Group13
  9. Si K edge X-ray spectrum absorption interpretation using Neural Network, Group14
  10. Plankton Classification Using VGG16 Network, Group16
  11. A Survey of Convolutional Neural Networks: Motivation, Modern Architectures, and Current Applications in the Earth and Ocean Sciences, Group17
  12. Use satellite data to track the human footprint in the amazon rainforest, Group18
  13. Automatic speaker diarization using machine learning techniques, Group19
  14. Predicting Coral Colony Fate with Random Forest, Group20

Syllabus

Machine learning has received enormous interest recently. However, there is reluctance to use it for physical problems. Machine learning cannot replace existing physical models, but it can improve certain aspects of them. To learn from data, we use probability theory, which has been the mainstay of statistics and engineering for centuries. Probability theory can be applied to any problem involving uncertainty. The class will focus on implementations.

It is not a computer science class, so we go slowly through the fundamentals to appreciate the methods and then implement them. We will discuss their use in the physical sciences. While I have done some research on this, I will also have a steep learning curve.

In the first part of the class we focus on theory and implementations. We will then transition to examples and implementations in the middle section. The last part will focus on final projects.

I am working on finding interesting examples. Please suggest examples. While I mostly work in Matlab, these might require Python.

  1. Tracking ships using acoustics. Based on my paper: Niu et al, 2017 on arXiv. This can be solved in TensorFlow or SciKit-learn (in Python) or Matlab. Data and SVM example
  2. Graph signal processing for localizing small events. Based on my paper Riahi 2017.
  3. Classifying plankton. This would be based on Jules Jaffe's underwater microscope. This might require convolutional networks, but random forests and support vector machines would also work.
  4. Identifying earthquakes from data on a mobile phone (example is in IPython). Extensive example with 3 GB of data. See background paper in Science
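
The classifier-based approach mentioned in examples 1 and 3 can be illustrated with a toy linear support vector machine. The sketch below is not the code from the linked SVM example. It trains on made-up 2D points using stochastic subgradient descent on the regularized hinge loss, and the names `train_linear_svm` and `predict`, the learning rate, and the regularization strength are all assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=1000):
    """Fit weights w and bias b by stochastic subgradient descent
    on the L2-regularized hinge loss (a simple linear SVM)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1.0:
                # Point is misclassified or inside the margin:
                # the hinge term contributes to the subgradient.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return the predicted class label, +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2D features (not real acoustic or image data).
points = [(2, 2), (3, 2), (2, 3), (3, 3), (0, 0), (1, 0), (0, 1), (-1, 0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

For real data, a library implementation such as SciKit-learn's `SVC` (or a Matlab equivalent) would be the more usual choice; writing the update rule by hand mainly shows what the optimizer is doing.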

Homework: I'm a strong believer in learning by doing. Thus we will have computer-based homework each week. You can use any language. Some of the examples are in Python. I mostly work in Matlab.

Books: Main book: Chris M Bishop, Pattern Recognition and Machine Learning. A third-party Matlab implementation of many of the algorithms in the book.
Other good books:
Hastie and Tibshirani, The Elements of Statistical Learning (2nd edition)
Kevin P. Murphy: Machine Learning: A Probabilistic Perspective. UCSD license. Matlab codes used in Murphy's book.

Online resources: While not required, I recommend taking these. Both online classes are excellent.
Statistical Learning by Hastie and Tibshirani. My favorite class.
Andrew Ng's Coursera class, Machine learning. This was the first class offered by Coursera.

Grading: Full scale of the letter grade. The grade consists of about 30% homework, 20% seminar summary, and 50% final project. Your purpose and mine is to learn, so a good effort is sufficient. 10% reduction per day for delayed homework.
Seminar summary: Based on one talk at the 3-day workshop Big Data and The Earth Sciences: Grand Challenges Workshop, write a two-page summary. Due in class on 7 June.
Final project: Propose a topic before May 1. Otherwise it will be based on my paper: Niu et al, 2017 on arXiv. We will make teams on April 24 and 26. Report due ABOUT June 16.
Homework: Cody homework will be graded.

Class Plan

  1. April 3, Introduction to course, Statistical foundations. Bishop ch. 1.2, 1.5 Slides
  2. April 5, Multivariate densities. Correlation. 2D normal distribution. Principal components. Math foundations: linear algebra and matrices. Bishop Ch. 2.3, App C Slides
  3. April 10, Linear models. Bishop Ch. 3.1, 3.2, 4. Slides
  4. April 12, Bishop Ch. 3.3, 4.0-4.3 linear classifiers Slides
  5. April 17, Bishop Ch. 5-5.5, Backpropagation Slides
  6. April 19, Santosh Nannuru, Sparse processing. Read Chapter 2 for an introduction to sparse problems. Slides
  7. April 24, Santosh Nannuru, Sparse processing Slides. More Chapter 5. At the end of class on Monday we will help with Python and TensorFlow installations as well as organizing groups. I suggest making groups of size 2-4, with 3 preferred.
  8. April 26, Bishop Ch 2.5, 6. Slides
  9. May 1, Error Backpropagation, Bishop 5.3, and Support Vector Machines (SVM) Bishop Ch 7. Slides
  10. May 3, SVM, PCA, and K-means, Bishop Ch 12.1, 9.1 Slides
  11. May 8, Machine Learning for finding oil, focusing on 1) robust seismic denoising/interpolation using structured matrix approximation, and 2) seismic image clustering and classification using t-SNE (t-distributed stochastic neighbor embedding) and CNN. Weichang Li, Group Leader Aramco, Houston.
  12. May 10, Ocean acoustic source tracking. A main goal in the last month is the Final project. Data and SVM example (1 GB of data). Slides summarizing the processing. Mixtures of Gaussians and Expectation Maximization, Bishop Ch 9. Slides
  13. May 15, Seismology and Machine Learning, Daniel Trugman (half class). Example of workshop summary report. Graphical models, Bishop Ch 8. Slides
  14. May 17, Graphical models Bishop Ch 8
  15. May 22, Mike Bianco Dictionary learning (half class), random forest, Slides
  16. May 24, Sequential methods Ch 13 Slides
  17. May 31, No Class, Big Data and The Earth Sciences: Grand Challenges Workshop
  18. June 5, Spiess Hall open for project discussion 11am-. Discuss workshop.
  19. June 12, Spiess Hall open for project discussion 2-7pm, I will be happy to discuss.
  20. June 16, 5 pm Final project is due.

Homework

April 5: This will be for discussion in class; download this file. Read ex1.pdf, run the Matlab scripts, and develop your answers.

April 10: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

April 12: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

April 17: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

April 19: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

April 26: This will be for discussion in class; develop a solution for this XOR problem. It may be most fun in TensorFlow, but Matlab is fine too.
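
As a starting point for the XOR problem, it may help to see why a hidden layer is enough. The sketch below is only an illustration, not the expected homework solution: the weights are hand-picked rather than learned, the function names `relu` and `xor_net` are hypothetical, and a TensorFlow or Matlab solution would instead learn comparable weights by backpropagation.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def xor_net(x1, x2):
    """Two-layer network with hand-picked weights that computes XOR
    for binary inputs (0 or 1)."""
    h1 = relu(x1 + x2)        # active when at least one input is 1
    h2 = relu(x1 + x2 - 1.0)  # active only when both inputs are 1
    return h1 - 2.0 * h2      # linear output layer


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_net(x1, x2) for x1, x2 in inputs]
for (x1, x2), out in zip(inputs, outputs):
    print(x1, x2, "->", out)
```

No single linear unit can reproduce this truth table, because the two classes are not linearly separable in the input plane; the second hidden unit is what cancels the output when both inputs are on.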

May 1: Homework will be graded in Matlab's Cody. If you have not received an invitation, please email us.

May 3: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

May 8: Homework will be graded in Matlab's Cody.

May 10: This will be for discussion in class; download this file. This is focused on SVM and is short, so there is room for developing your own ideas.

May 15: Homework will be graded in Matlab's Cody.

May 17: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.

May 30: Final homework will be graded in Matlab's Cody.