TA Mark Wagner, m2wagner@eng.ucsd.edu

Spiess Hall 330

Machine learning has received enormous interest recently. However, for physical problems there is reluctance to use machine learning. Machine learning cannot replace existing physical models, but it can improve certain aspects of them. To learn from data, we use probability theory, which has been the mainstay of statistics and engineering for centuries. Probability theory can be applied to any problem involving uncertainty. The class will focus on implementations.

This is not a computer science class, so we will go slowly through the fundamentals to appreciate the methods and implement them. We will discuss their use in the physical sciences. While I have done some research in this area, I will also have a steep learning curve.

In the first part of the class we focus on theory and implementations. The middle section transitions to examples and their implementations. The last part will focus on final projects.

I am working on finding interesting examples; please suggest some. While I mostly work in Matlab, some examples might require Python.

- Tracking ships using acoustics. Based on my paper Niu et al., 2017 on arXiv. This can be solved in TensorFlow or scikit-learn (in Python) or Matlab. Data and an SVM example are available.
- Graph signal processing for localizing small events. Based on my paper Riahi, 2017.
- Classifying plankton. This would be based on Jules Jaffe's underwater microscope. It might require convolutional networks, but random forests and support vector machines would also work.
- Identifying earthquakes from data on a mobile phone (example is in IPython). Extensive example with 3 GB of data. See the background paper in Science.
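To give a feel for how small an SVM classifier can be in scikit-learn, here is a minimal sketch. It uses synthetic 2-D features, not the actual ship-tracking or plankton data, so the feature construction and class labels are placeholders:

```python
# Minimal SVM classification sketch with scikit-learn.
# Synthetic data stands in for real acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two synthetic classes as Gaussian blobs in a 2-D feature space.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Hold out a test set, fit an RBF-kernel SVM, and score it.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

For the class examples, the same `fit`/`score` pattern applies; only the feature extraction from the raw data changes.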

**Homework:** I'm a strong believer in learning by doing. Thus we will have computer-based homework each week.
You can use any language. Some of the examples are in Python; I mostly work in Matlab.

**Books:**
Main book:
Chris M Bishop
Pattern Recognition and Machine Learning
A third-party Matlab implementation of many of the algorithms in the book is available.

Other good books:

Hastie and Tibshirani
The Elements of Statistical Learning (2nd edition)

Kevin P. Murphy: Machine Learning : A Probabilistic Perspective.
Available under UCSD license.
Matlab code used in Murphy's book.

**Online resources:**
While not required, I recommend taking these; both online classes are excellent.

Statistical Learning by Hastie and Tibshirani.
My favorite class.

Andrew Ng's Coursera class, Machine learning.
This was the first class offered by Coursera.

**Grading:** Full scale of the letter grade. The grade consists of about 25% homework, 25% seminar summary, and 50% final project (four-person teams). Our purpose is to learn, so a good effort is sufficient. There is a 10% reduction per day for a delayed homework.

**Seminar summary:** Based on one talk at the 3-day workshop Big Data and The Earth Sciences: Grand Challenges Workshop, write a one- or two-page summary. Due in class on 7 June.

**Final project:** Either propose a topic before May 1, or it will be based on my paper Niu et al., 2017 on arXiv. We will form teams on April 24 and 26.
Report due about June 16.

**Homework:** Three homeworks will be graded.

**Schedule:**

- April 3, Introduction to the course; statistical foundations. Bishop Ch. 1.2, 1.5. Slides
- April 5, Multivariate densities, correlation, 2D normal distribution, principal components. Math foundations: linear algebra and matrices. Bishop Ch. 2.3, App. C. Slides
- April 10, Linear models. Bishop Ch. 3.1, 3.2, 4. Slides
- April 12, Linear classifiers. Bishop Ch. 3.3, 4.0-4.3. Slides
- April 17, Backpropagation. Bishop Ch. 5-5.5. Slides
- April 19, Santosh Nanunuru, Sparse processing. Read Chapter 2 for an introduction to sparse problems. Slides
- April 24, Santosh Nanunuru, Sparse processing; more of Chapter 5. Slides. At the end of class on Monday we will help with Python and TensorFlow installations as well as organizing groups. I suggest making groups of size 2-4, with 3 preferred.
- April 26, Bishop Ch. 2.5, 6. Slides
- May 1, Error backpropagation, Bishop Ch. 5.3, and support vector machines (SVM), Bishop Ch. 7. Slides
- May 3, SVM, PCA, and K-means. Bishop Ch. 12.1, 9.1. Slides
- May 8, Machine learning for finding oil, focusing on 1) robust seismic denoising/interpolation using structured matrix approximation and 2) seismic image clustering and classification using t-SNE (t-distributed stochastic neighbor embedding) and CNNs. Weichang Li, Group Leader, Aramco, Houston.
- May 10, Ocean acoustic source tracking. A main goal in the last month is the final project. Data and SVM example (1 GB of data). Slides summarizing the processing. Mixtures of Gaussians and expectation maximization, Bishop Ch. 9. Slides
- May 15, Seismology and machine learning, Daniel Trugman (half class). Example of a workshop summary report. Graphical models, Bishop Ch. 8. Slides
- May 17, Graphical models. Bishop Ch. 8.
- May 22, Mike Bianco, Dictionary learning (half class); random forests.
- May 24, Sequential methods. Bishop Ch. 13.
- May 31, No class; Big Data and The Earth Sciences: Grand Challenges Workshop.
- June 5, Spiess Hall open for project discussion from 11am. Discuss workshop.
- June 12, Spiess Hall open for project discussion 9-11:30am and 2-7pm.
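Two of the unsupervised topics above, PCA (Bishop Ch. 12.1) and K-means (Bishop Ch. 9.1), can be sketched in a few lines of scikit-learn. The data here is synthetic (two Gaussian blobs in five dimensions), just to show the dimensionality-reduction-then-clustering pattern:

```python
# Minimal PCA + K-means sketch with scikit-learn on synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two well-separated clusters embedded in a 5-D space.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])

# Project onto the two leading principal components, then cluster.
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))  # samples per cluster
```

The same two-step recipe (reduce, then cluster) is a common baseline before moving to the heavier methods later in the schedule.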