Professor Peter Gerstoft,
Gerstoft@ucsd.edu

TA Mark Wagner, m2wagner@eng.ucsd.edu

Spiess Hall 330

**Time:** Monday and Wednesday 5-6:20pm

Many thanks for the fun projects! Below are the final projects from the class. Only the reports are posted, although the corresponding code is just as important.

- Source localization in an ocean waveguide using supervised machine learning, Group3, Group6, Group8, Group10, Group11, Group15
- Indoor positioning framework for most Wi-Fi-enabled devices, Group1
- MyShake Seismic Data Classification, Group2
- Multi Label Image Classification, Group4
- Face Recognition using Machine Learning, Group7
- Deep Learning for Star-Galaxy Classification, Group9
- Modeling Neural Dynamics using Hidden Markov Models, Group12
- Star Prediction Based on Yelp Business Data And Application in Physics, Group13
- Si K edge X-ray spectrum absorption interpretation using Neural Network, Group14
- Plankton Classification Using VGG16 Network, Group16
- A Survey of Convolutional Neural Networks: Motivation, Modern Architectures, and Current Applications in the Earth and Ocean Sciences, Group17
- Use satellite data to track the human footprint in the amazon rainforest, Group18
- Automatic speaker diarization using machine learning techniques, Group19
- Predicting Coral Colony Fate with Random Forest, Group20

Machine learning has received enormous interest recently. However, for physical problems there is reluctance to use machine learning. Machine learning cannot replace existing physical models, but it can improve certain aspects of them. To learn from data, we use probability theory, which has been the mainstay of statistics and engineering for centuries. Probability theory can be applied to any problem involving uncertainty. The class will focus on implementations.

This is not a computer science class, so we go slowly through the fundamentals to appreciate the methods and implement them. We will discuss their use in the physical sciences. While I have done some research on this, I will also have a steep learning curve.

In the first part of the class we focus on theory and implementations. We will then transition to examples and implementations in the middle section. The last part will focus on final projects.

I am working on finding interesting examples. Please suggest examples. While I mostly work in Matlab, these might require Python.

- Tracking ships using acoustics. Based on my paper: Niu et al, 2017 on arXiv. This can be solved in TensorFlow or scikit-learn (in Python) or Matlab. Data and SVM example.
- Graph signal processing for localizing small events. Based on my paper Riahi 2017.
- Classifying plankton. This would be based on Jules Jaffe's underwater microscope. This might require convolutional networks, but random forests and support vector machines would also work.
- Identifying earthquakes from data on a mobile phone (example is in IPython). Extensive example with 3 Gb of data. See background paper in Science.

**Homework:** I'm a strong believer in learning by doing. Thus we will
have computer-based homework each week. You can use any language. Some of the
examples are in Python. I mostly work in Matlab.

**Books:** Main book: Chris M Bishop, Pattern Recognition and Machine Learning. A third-party Matlab implementation of many of the
algorithms in the book is also available.

Other good books:

Hastie and Tibshirani: The Elements of Statistical Learning (2nd edition)

Kevin P. Murphy: Machine Learning: A Probabilistic Perspective. UCSD license. Matlab codes used in Murphy's book.

**Online resources:** While not required, I recommend taking these. Both
online classes are excellent.

Statistical Learning by Hastie and Tibshirani. My favorite class.

Andrew Ng's Coursera class, Machine learning. This was the first class offered
by Coursera.

**Grading:** Full scale of the letter grade. The grade consists of about
30% homework, 20% seminar summary, and 50% final project. Your purpose and mine
is to learn, so a good effort is sufficient. 10% reduction per day for delayed
homework.

**Seminar summary:** Write a two-page summary of one talk at the 3-day workshop Big
Data and The Earth Sciences: Grand Challenges Workshop.
Due in class on June 7.

**Final project:** Propose a topic before May 1. Otherwise it will be based
on my paper: Niu et al, 2017 on arXiv.
We will make teams on April 24 and 26. Report due ABOUT June 16.

**Homework:** Cody homework will be graded.

- April 3, Introduction to course, Statistical foundations. Bishop ch. 1.2, 1.5 Slides
- April 5, Multivariate densities. Correlation. 2D normal distribution. Principal components. Math foundations: linear algebra and matrices. Bishop Ch. 2.3, App C Slides
- April 10, Linear models. Bishop Ch. 3.1, 3.2, 4. Slides
- April 12, Linear classifiers. Bishop Ch. 3.3, 4.0-4.3 Slides
- April 17, Bishop Ch. 5-5.5, Backpropagation Slides
- April 19, Santosh Nannuru, Sparse processing. Read Chapter 2 for an introduction to sparse problems and Slides
- April 24, Santosh Nannuru, Sparse processing Slides. More Chapter 5. At the end of class on Monday we will help with Python and TensorFlow installations as well as organizing groups. I suggest making groups of size 2-4, with 3 preferred.
- April 26, Bishop Ch 2.5, 6. Slides
- May 1, Error Backpropagation, Bishop 5.3, and Support Vector Machines (SVM) Bishop Ch 7. Slides
- May 3, SVM, PCA, and K-means, Bishop Ch 12.1, 9.1 Slides
- May 8, Machine Learning for finding oil, focusing on 1) robust seismic denoising/interpolation using structured matrix approximation and 2) seismic image clustering and classification, using t-SNE (t-distributed stochastic neighbor embedding) and CNN. Weichang Li, Group Leader, Aramco, Houston.
- May 10, Ocean acoustic source tracking. A main goal in the last month is the Final project. Data and SVM example (1 GB of data). Slides summarizing the processing. Mixtures of Gaussians and Expectation Maximization, Bishop Ch 9. Slides
- May 15, Seismology and Machine Learning, Daniel Trugman (half class). Example of workshop summary report. Graphical models, Bishop Ch 8. Slides
- May 17, Graphical models Bishop Ch 8
- May 22, Mike Bianco Dictionary learning (half class), random forest, Slides
- May 24, Sequential methods Ch 13 Slides
- May 31, No Class, Big Data and The Earth Sciences: Grand Challenges Workshop
- June 5, Spiess Hall open for project discussion 11am-. Discuss workshop.
- June 12, Spiess Hall open for project discussion 2-7pm, I will be happy to discuss.
- June 16, 5 pm Final project is due.

April 5: This will be for discussion in class; Download this file. Read the ex1.pdf, run the Matlab scripts, and develop your answers.

April 10: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

April 12: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

April 17: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

April 19: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

April 26: This will be for discussion in class; develop a solution for this XOR problem. Maybe most fun in TensorFlow, but Matlab is fine too.
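
As a starting point for the discussion, here is a minimal sketch in Python with NumPy (not the posted exercise or its official solution). It shows why XOR needs a hidden layer: the weights below are hand-picked for illustration, not learned, and the names `step`, `W1`, `b1`, `w2`, `b2`, and `xor_net` are assumptions made for this example. One hidden unit computes OR, the other computes AND, and the output fires when OR is on and AND is off.

```python
import numpy as np

def step(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# All four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])

# Hidden layer with hand-picked weights (an assumption for illustration).
# Hidden unit 0 fires for OR(x1, x2); hidden unit 1 fires for AND(x1, x2).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output unit: fires when OR is on and AND is off, which is XOR.
w2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_net(X):
    """Forward pass of the two-layer threshold network."""
    h = step(X @ W1 + b1)      # hidden activations, shape (n, 2)
    return step(h @ w2 + b2)   # network output, shape (n,)

y = xor_net(X)
print(y)  # [0 1 1 0]
```

A trained solution, for example in TensorFlow, would replace the step function with a differentiable activation and learn the weights by backpropagation; the hand-set version only shows that a single hidden layer with two units is enough to represent XOR.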

May 1: Homework will be graded in Matlab's Cody. If you have not received an invitation please email us.

May 3: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

May 8: Homework will be graded in Matlab's Cody.

May 10: This will be for discussion in class; Download this file. This is focused on SVM and is short, so there is room for developing your own ideas.

May 15: Homework will be graded in Matlab's Cody.

May 17: This will be for discussion in class; Download this file. Run the Matlab scripts, and develop your answers.

May 30: Final Homework will be graded in Matlab's Cody.