with help from Santosh Nannuru, snannuru@ucsd.edu

TA Mark Wagner, m2wagner@eng.ucsd.edu

Spiess Hall 330

Machine learning has received enormous interest recently. However, for physical problems there is reluctance to use machine learning. Machine learning cannot replace existing physical models, but it can improve certain aspects of them. To learn from data, we use probability theory, which has been the mainstay of statistics and engineering for centuries. Probability theory can be applied to any problem involving uncertainty. The class will focus on implementations.

This is not a computer science class, so we will go slowly through the fundamentals in order to appreciate the methods and implement them. We will discuss their use in the physical sciences. While I have done some research in this area, I will also have a steep learning curve.

In the first part of the class we focus on theory and implementations. We will then transition to examples and implementations in the middle section. The last part will focus on final projects.

I am working on finding interesting examples; please suggest some. While I mostly work in Matlab, these examples also require Python.

- Google TensorFlow (in Python) for tracking ships using acoustics. Based on my paper Niu et al., 2017, on arXiv.
- Graph signal processing for localizing small events. Based on my paper Riahi 2017.
- Gaussian classifiers.
- Classifying plankton. This would be based on Jules Jaffe's underwater microscope. This might require convolutional networks, but random forests and support vector machines would also work.
- Identifying earthquakes from data on a mobile phone (example is in IPython). Extensive example with 3 GB of data. See the background paper in Science.
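As a flavor of the Gaussian-classifier example above, here is a minimal sketch in Python/NumPy of a per-class Gaussian classifier. All function names and data here are purely illustrative, not course material; Bishop covers the underlying theory properly.

```python
import numpy as np

def fit_gaussian_classifier(X, y):
    """Fit a mean, covariance, and prior for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return params

def predict(params, X):
    """Assign each sample to the class with the highest log posterior."""
    labels = sorted(params)
    scores = []
    for c in labels:
        mu, cov, prior = params[c]
        d = X - mu
        inv = np.linalg.inv(cov)
        # log N(x | mu, cov) + log prior, dropping constants shared by all classes
        ll = (-0.5 * np.einsum('ij,jk,ik->i', d, inv, d)
              - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))
        scores.append(ll)
    return np.array(labels)[np.argmax(scores, axis=0)]

# Two well-separated 2-D classes of synthetic data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = fit_gaussian_classifier(X, y)
acc = (predict(model, X) == y).mean()
```

Because each class gets its own covariance, this is a quadratic decision boundary; with a shared covariance it would reduce to a linear classifier.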

**Homework:** I'm a strong believer in learning by doing. Thus we will have computer-based homework each week.
You can use any language. Some of the examples are in Python; I mostly work in Matlab.

**Books:**
Main book:
Chris M. Bishop
Pattern Recognition and Machine Learning
A third-party Matlab implementation of many of the algorithms in the book is available.

Other good books:

Hastie, Tibshirani, and Friedman
The Elements of Statistical Learning (2nd edition)

Kevin P. Murphy: Machine Learning: A Probabilistic Perspective.
Available through the UCSD license.
Matlab codes used in Murphy's book.

**Online resources:**
While not required, I recommend taking these; both online classes are excellent.

Statistical Learning by Hastie and Tibshirani.
My favorite class.

Andrew Ng's Coursera class, Machine learning.
This was the first class offered by Coursera.

**Grading:** The full letter-grade scale will be used. The grade consists of about 25% homework, 25% seminar summary, and 50% final project (done in four-person teams). Your purpose and mine is to learn, so a good effort is sufficient. Late homework loses 10% per day.

**Seminar summary:** Based on one talk at the 3-day Big Data and The Earth Sciences: Grand Challenges Workshop, write a one- or two-page summary. Due in class on 7 June.

**Final project:** Either propose a topic before May 1, or the project will be based on my paper Niu et al., 2017, on arXiv. We will form teams on April 24 and 26.
Report due about June 16.

**Homework:** One or two of the homework assignments will be graded.

- April 3, Introduction to course, statistical foundations. Bishop Ch. 1.2, 1.5. Slides
- April 5, Multivariate densities, correlation, 2D normal distribution, principal components. Math foundations: linear algebra and matrices. Bishop Ch. 2.3, App. C. Slides
- April 10, Linear models. Bishop Ch. 3.1, 3.2, 4. Slides
- April 12, Linear classifiers. Bishop Ch. 3.3, 4.0-4.3. Slides
- April 17, Backpropagation. Bishop Ch. 5-5.5. Slides
- April 19, Santosh Nannuru, Sparse processing. Read Chapter 2 for an introduction to sparse problems. Slides
- April 24, Santosh Nannuru, Sparse processing. Slides. More from Chapter 5. At the end of class on Monday we will help with Python and TensorFlow installations as well as with organizing groups. I suggest making groups of size 2-4, with 3 preferred.
- April 26, Bishop Ch 2.5, 6
- May 1, Bishop Ch 7
- May 3, Bishop Ch 6-7
- May 8, Machine learning for finding oil, Weichang Li, Group Leader, Aramco, Houston
- May 10, K-means, dictionary learning, Mike Bianco (half class). Bishop Ch. 9
- May 15, Seismology and Machine Learning, Daniel Trugman (half class)
- May 17, Importance of feature extraction, Aaron Thode (half class)
- May 22, Ocean acoustic source tracking. Final projects. The main goal in the last 3 weeks is the Final project.
- May 24, Graphical models Bishop Ch 8
- May 31, No Class, Big Data and The Earth Sciences: Grand Challenges Workshop
- June 5,
- June 7,
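As a small taste of the K-means topic scheduled above (May 10), here is a minimal sketch in Python/NumPy. It is purely illustrative, with made-up data; Bishop Ch. 9 treats the algorithm and its connection to mixture models properly.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated synthetic 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.5, (40, 2)), rng.normal(3, 0.5, (40, 2))])
centers, labels = kmeans(X, 2)
```

K-means only finds a local optimum, so in practice one restarts from several random initializations and keeps the clustering with the lowest within-cluster distance.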