Professor Peter Gerstoft,
Gerstoft@ucsd.edu

TA Mark Wagner, m2wagner@eng.ucsd.edu

Spiess Hall 330

**Time:** Monday and Wednesday 5-6:20pm

Many thanks for the fun projects! Below are the final projects from the class. Only the reports are posted, but the corresponding code is just as important.

- Source localization in an ocean waveguide using supervised machine learning, Group3, Group6, Group8, Group10, Group11, Group15
- Indoor positioning framework for most Wi-Fi-enabled devices, Group1
- MyShake Seismic Data Classification, Group2
- Multi Label Image Classification, Group4
- Face Recognition using Machine Learning, Group7
- Deep Learning for Star-Galaxy Classification, Group9
- Modeling Neural Dynamics using Hidden Markov Models,
- Star Prediction Based on Yelp Business Data And Application in Physics, Group13
- Si K edge X-ray spectrum absorption interpretation using Neural Network, Group14
- Plankton Classification Using VGG16 Network, Group16
- A Survey of Convolutional Neural Networks: Motivation, Modern Architectures, and Current Applications in the Earth and Ocean Sciences, Group17
- Use satellite data to track the human footprint in the Amazon rainforest, Group18
- Automatic speaker diarization using machine learning techniques, Group19
- Predicting Coral Colony Fate with Random Forest, Group20

Machine learning has received enormous interest recently. However, for physical problems there is reluctance to use machine learning. Machine learning cannot replace existing physical models, but it can improve certain aspects of them. To learn from data, we use probability theory, which has been the mainstay of statistics and engineering for centuries. Probability theory can be applied to any problem involving uncertainty. The class will focus on implementations.

It is not a computer science class, so we go slowly through the fundamentals to appreciate the methods and implement them. We will discuss their use in the physical sciences. While I have done some research on this, I will also have a steep learning curve.

In the first part of the class we focus on theory and implementations. We will then transition to examples and implementations in the middle section. The last part will focus on final projects.

I am working on finding interesting examples. Please suggest examples. While I mostly work in Matlab, these might require Python.

- Tracking ships using acoustics. Based on my paper: Niu et al, 2017 on arXiv. This can be solved in TensorFlow or scikit-learn (in Python) or Matlab. Data and SVM example.
- Graph signal processing for localizing small events. Based on my paper Riahi 2017.
- Classifying plankton. This would be based on Jules Jaffe's underwater microscope. This might require convolutional networks, but random forests and support vector machines would also work.
- Identifying earthquakes from data on a mobile phone (example is in IPython). Extensive example with 3 GB of data. See background paper in Science.

**Homework:** I'm a strong believer in learning by doing. Thus we will have computer-based homework each week. You can use any language. Some of the examples are in Python. I mostly work in Matlab.

**Books:** Main book: Chris M. Bishop, Pattern Recognition and Machine Learning. There is a third-party Matlab implementation of many of the algorithms in the book.

Other good books:

Hastie, Tibshirani, and Friedman: The Elements of Statistical Learning (2nd edition)

Kevin P. Murphy: Machine Learning: A Probabilistic Perspective. UCSD license. Matlab codes used in Murphy's book.

**Online resources:** While not required, I recommend taking these. Both online classes are excellent.

Statistical
Learning by Hastie and Tibshirani. My favorite class.

Andrew Ng's Coursera class, Machine learning. This was the first class offered
by Coursera.

**Grading:** Full scale of the letter grade. The grade consists of about 30% homework, 20% seminar summary, and 50% final project. Your purpose and mine is to learn, so a good effort is sufficient. There is a 10% reduction per day for late homework.
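
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes the approximate weights are taken as exact, that scores are on a 0-100 scale, and that the late penalty is applied linearly per day to a homework score and floored at zero. The page does not spell out any of these details, and the function names and example scores are hypothetical.

```python
# Weights from the grading paragraph; the page says "about", so treat them as approximate.
WEIGHTS = {"homework": 0.30, "seminar": 0.20, "project": 0.50}
LATE_PENALTY_PER_DAY = 0.10  # assumption: fraction of the homework score lost per day


def late_adjusted(score, days_late):
    """Homework score after an assumed linear late penalty, floored at zero."""
    factor = max(0.0, 1.0 - LATE_PENALTY_PER_DAY * days_late)
    return score * factor


def course_score(homework, seminar, project):
    """Weighted course score on the same 0-100 scale as the inputs."""
    return (WEIGHTS["homework"] * homework
            + WEIGHTS["seminar"] * seminar
            + WEIGHTS["project"] * project)


if __name__ == "__main__":
    # Hypothetical scores: homework handed in two days late.
    hw = late_adjusted(90.0, days_late=2)
    print(round(course_score(hw, 80.0, 85.0), 1))
```

Because the final project carries half the weight, a late homework moves the total far less than the same change in the project score.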

**Seminar summary:** Write a two-page summary based on one talk at the 3-day workshop Big Data and The Earth Sciences: Grand Challenges Workshop. Due in class on 7 June.

**Final project:** Propose a topic before May 1. Otherwise it will be based
on my paper: Niu et al, 2017 on arXiv.
We will make teams on April 24 and 26. Report due ABOUT June 16.

**Homework:** Cody homework will be graded.

- April 3, Introduction to course, Statistical foundations. Bishop Ch. 1.2, 1.5. Slides
- April 5, Multivariate densities. Correlation. 2D normal distribution. Principal components. Math foundations: linear algebra and matrices. Bishop Ch. 2.3, App. C. Slides
- April 10, Linear models. Bishop Ch. 3.1, 3.2, 4. Slides
- April 12, Linear classifiers. Bishop Ch. 3.3, 4.0-4.3. Slides
- April 17, Backpropagation. Bishop Ch. 5-5.5. Slides
- April 19, Santosh Nannuru, Sparse processing. Read Chapter 2 for an introduction to sparse problems. Slides
- April 24, Santosh Nannuru, Sparse processing. Slides. More Chapter 5. At the end of class on Monday we will help with Python and TensorFlow installations as well as organizing groups. I suggest making groups of size 2-4, with 3 preferred.
- April 26, Bishop Ch. 2.5, 6. Slides
- May 1, Error backpropagation, Bishop Ch. 5.3, and Support Vector Machines (SVM), Bishop Ch. 7. Slides
- May 3, SVM, PCA, and K-means, Bishop Ch. 12.1, 9.1. Slides
- May 8, Machine Learning for finding oil, focusing on 1) robust seismic denoising/interpolation using structured matrix approximation and 2) seismic image clustering and classification using t-SNE (t-distributed stochastic neighbor embedding) and CNN. Weichang Li, Group Leader, Aramco, Houston.
- May 10, Ocean acoustic source tracking. A main goal in the last month is the final project. Data and SVM example (1 GB of data). Slides summarizing the processing. Mixtures of Gaussians and Expectation Maximization, Bishop Ch. 9. Slides
- May 15, Seismology and Machine Learning, Daniel Trugman (half class). Example of workshop summary report. Graphical models, Bishop Ch. 8. Slides
- May 17, Graphical models, Bishop Ch. 8
- May 22, Mike Bianco, Dictionary learning (half class), random forest
- __May 24, Sequential methods, Ch. 13__
- __May 31, No class, Big Data and The Earth Sciences: Grand Challenges Workshop__
- June 5, Spiess Hall open for project discussion 11am-. Discuss workshop.
- June 12, Spiess Hall open for project discussion 2-7pm. I will be happy to discuss.

**June 16, 5 pm: Final project is due.**

Homework is currently discussed in class.

- April 5: This will be for discussion in class; download this file. Read ex1.pdf, run the Matlab scripts, and develop your answers.
- April 10: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- April 12: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- April 17: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- April 19: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- April 26: This will be for discussion in class; develop a solution for this XOR problem. Maybe most fun in TensorFlow, but Matlab is fine too.
- May 1: Homework will be graded in Matlab's Cody. If you have not received an invitation, please email us.
- May 3: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- May 8: Homework will be graded in Matlab's Cody.
- May 10: This will be for discussion in class; download this file. This is focused on SVM and is short, so there is room for developing your own ideas.
- May 15: Homework will be graded in Matlab's Cody.
- May 17: This will be for discussion in class; download this file. Run the Matlab scripts, and develop your answers.
- May 30: Final homework will be graded in Matlab's Cody.
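
For the April 26 XOR exercise, it can help to write out a two-layer network by hand before training one, since no single linear classifier separates the XOR classes. The sketch below assumes the standard two-input XOR truth table; the linked problem statement may differ. The weights are hand-picked and the activation is a step function. Both are illustrative choices, not taken from the course material. A TensorFlow or Matlab solution would instead learn its weights from data.

```python
def step(z):
    """Heaviside step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    """Two-layer network for XOR with hand-chosen (not learned) weights."""
    h_or = step(x1 + x2 - 0.5)    # hidden unit 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output: OR but not AND


if __name__ == "__main__":
    # Print the truth table produced by the network.
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_net(x1, x2))
```

The two hidden units compute OR and AND of the inputs, and the output unit combines them linearly, which is the role a hidden layer plays in a trained network as well.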