3–5 Nov 2021
Asia/Tehran timezone

Data Science in Relativistic Astrophysics (Hands-on Workshop)

 

  • Preparation before the meeting

We use the Python programming language, the PyTorch and scikit-learn packages, and the Google Colab environment. Participants only need a Google Drive account and basic experience working with Google Colab notebooks. We will also use the AstroML package, which can be installed easily during the workshop.

Useful links for this workshop:

Google Colab: https://colab.research.google.com/

Scikit-learn: https://scikit-learn.org/stable/

AstroML: https://www.astroml.org/index.html

PyTorch: https://pytorch.org

SDSS: https://www.sdss.org


  • The tutorial broadly follows this outline:

1. Basics of Machine Learning (ML)

2. Popular ML algorithms

3. ML in Astrophysics

4. SDSS Data: Download and Preparation

5. Applying Some of the ML Methods to SDSS Data

6. Introduction to Neural Networks

7. Basics of PyTorch/Scikit-learn

8. Neural Network Construction 

9. Network Training 

10. Evaluation


  • This workshop consists of three parts:

 

1.  Classification of stars using photometric optical data from SDSS

M. H. Zhollideh Haghighi (IPM and KNTU, Iran)

RR Lyrae variables are periodic variable stars commonly found in globular clusters. They are used as standard candles to measure (extra)galactic distances and thus help build the cosmic distance ladder. They are pulsating horizontal-branch stars of spectral class A or F with a mass of around half that of the Sun. They are thought to have shed mass during the red-giant-branch phase and to have once been stars of similar or slightly lower mass than the Sun, around 0.8 solar masses. Their period-luminosity relation makes them good standard candles for relatively nearby targets, especially within the Milky Way and the Local Group. They are also frequent subjects in studies of globular clusters.

We use the set of photometric observations of RR Lyrae stars in the SDSS as our data. The data set comes from SDSS Stripe 82 and combines two samples: the Stripe 82 standard stars, which represent observations of non-variable stars, and the RR Lyrae variables, which are drawn from the same observations as the standard stars and selected based on their variability using supplemental data. The sample is further restricted to a smaller region of the overall color–color space: 0.7 < u − g < 1.35, −0.15 < g − r < 0.4, −0.15 < r − i < 0.22, and −0.21 < i − z < 0.25. These selection criteria lead to a sample of 92,658 non-variable stars and 483 RR Lyrae stars. Two features of this combined data set make it a good candidate for testing classification algorithms:

1- The RR Lyrae stars and main sequence stars occupy a very similar region in u, g, r, i, z color space. 

2- The extreme imbalance between the number of sources and the number of background objects is typical of real-world astronomical studies, where it is often desirable to select rare events out of a large background. Such unbalanced data aptly illustrates the strengths and weaknesses of various classification methods. 
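The color cuts and the class imbalance described above can be sketched in a few lines of plain Python. The function name `in_color_box` and the magnitudes in the example call are made up for illustration; only the cut limits and the two sample sizes come from the text. The second half shows why accuracy alone is a poor score here: a classifier that labels every star "non-variable" is right about 99.5% of the time yet recovers no RR Lyrae stars.

```python
# Color-cut limits quoted in the text, as (lower, upper) bounds per color.
COLOR_CUTS = {
    "u-g": (0.7, 1.35),
    "g-r": (-0.15, 0.4),
    "r-i": (-0.15, 0.22),
    "i-z": (-0.21, 0.25),
}


def in_color_box(u, g, r, i, z):
    """Return True if the magnitudes satisfy all four color cuts."""
    colors = {"u-g": u - g, "g-r": g - r, "r-i": r - i, "i-z": i - z}
    return all(lo < colors[name] < hi for name, (lo, hi) in COLOR_CUTS.items())


# Hypothetical magnitudes, used only to exercise the function.
print(in_color_box(18.0, 17.0, 16.9, 16.85, 16.8))  # inside all four cuts
print(in_color_box(20.0, 17.5, 16.9, 16.85, 16.8))  # u-g = 2.5 fails the cut

# Sample sizes quoted in the text.
N_BACKGROUND = 92_658  # non-variable standard stars
N_RRLYRAE = 483        # RR Lyrae variables

# A trivial classifier that labels every star "non-variable".
total = N_BACKGROUND + N_RRLYRAE
baseline_accuracy = N_BACKGROUND / total   # fraction of stars labeled correctly
baseline_completeness = 0 / N_RRLYRAE      # fraction of RR Lyrae recovered

print(f"RR Lyrae fraction:     {N_RRLYRAE / total:.4f}")
print(f"baseline accuracy:     {baseline_accuracy:.4f}")
print(f"baseline completeness: {baseline_completeness:.4f}")
```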

Our goal is to characterize the relation between the features in the data and their classes, and to apply the resulting classifiers to a larger set of unlabeled data. In this hands-on session, participants will learn how to use machine learning algorithms in practice and classify observed stars from optical data. The session has two parts. In the first part, we classify objects with well-known conventional machine learning algorithms such as logistic regression. In the second part, we use a neural network for the same classification task.
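As a rough illustration of the first part, the sketch below fits a class-weighted logistic regression to a synthetic, strongly imbalanced two-class sample and scores it by completeness (fraction of rare objects recovered) and contamination (fraction of selected objects that are background). It is written in plain NumPy so the mechanics are visible; in practice scikit-learn's `LogisticRegression` with `class_weight="balanced"` would be the usual choice. The data, the helper names `fit_weighted_logistic` and `completeness_contamination`, and all hyperparameters are assumptions made for this example, not the workshop's actual code.

```python
import numpy as np


def fit_weighted_logistic(X, y, n_iter=5000, lr=0.1):
    """Fit logistic regression by gradient descent with balanced class weights."""
    n_samples, n_features = X.shape
    # Weight each class inversely to its size so both classes contribute equally.
    n_pos = y.sum()
    n_neg = n_samples - n_pos
    sample_weight = np.where(y == 1, 0.5 / n_pos, 0.5 / n_neg)

    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        residual = sample_weight * (p - y)       # weighted log-loss gradient terms
        w -= lr * (X.T @ residual)
        b -= lr * residual.sum()
    return w, b


def completeness_contamination(y_true, y_pred):
    """Completeness = TP / (TP + FN); contamination = FP / (TP + FP)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    completeness = tp / (tp + fn)
    contamination = fp / (tp + fp) if (tp + fp) > 0 else 0.0
    return completeness, contamination


# Synthetic stand-in for the real sample: 2000 background points, 40 rare ones.
rng = np.random.default_rng(0)
X_bg = rng.normal(0.0, 1.0, size=(2000, 2))
X_rare = rng.normal(3.5, 0.5, size=(40, 2))
X = np.vstack([X_bg, X_rare])
y = np.concatenate([np.zeros(2000), np.ones(40)])

w, b = fit_weighted_logistic(X, y)
y_pred = (X @ w + b > 0).astype(int)  # probability above 0.5 means "rare"
completeness, contamination = completeness_contamination(y, y_pred)
print(f"completeness = {completeness:.2f}, contamination = {contamination:.2f}")
```

Without the class weights, the 2000 background points dominate the loss and the fitted boundary tends to sacrifice rare objects, which is the weakness the imbalanced sample is meant to expose.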

 

2.  Classification of astronomical objects and determination of their redshift using spectroscopic optical data from SDSS

Rahim Moradi (ICRANet-Italy)

Quasi-stellar radio sources (quasars), or quasi-stellar objects (QSOs), are high-luminosity active galactic nuclei (AGN) believed to be powered by accretion disks around supermassive black holes (SMBHs) with masses in the range of 1 million to 1 billion solar masses. Thanks to their high luminosity, quasars have been found from redshift z~0 all the way back to z~7, when the universe was forming its first structures, namely the epoch of reionization. The study of high-redshift quasars is therefore a powerful tool for investigating cosmic history and structure formation in the early universe. Because they exist at redshifts ranging from z~0 to z~7, quasars may also serve as a new standard candle, like type Ia supernovae, and provide new cosmological constraints on the evolution of the universe.

In this part, after introducing methods to process and prepare the spectroscopic optical data of SDSS, we present the architecture of a 1-dimensional convolutional neural network (CNN) that estimates the redshift of quasars in the DR16 quasar-only catalog (DR16Q) of eBOSS from the Sloan Digital Sky Survey IV (SDSS-IV). We show how this CNN can be easily extended to classify stars, galaxies, and quasars as well as predict their redshifts. The CNN takes the flux of each quasar as a 1-dimensional array and its redshift as the label. It thus extracts the spectroscopic features of the SDSS data and predicts the redshift of quasars. Finally, we present a similar but less efficient CNN, which is already used by the SDSS website to classify quasars, stars, and galaxies and to predict their redshifts.
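To make the idea concrete, here is a minimal PyTorch sketch of a 1-dimensional CNN that maps a flux array to a redshift estimate, with an optional second output for star/galaxy/quasar classification. It is not the architecture presented in the workshop: the class name `RedshiftCNN`, the layer sizes, the input length of 512, and the random stand-in data are all assumptions chosen to keep the example small.

```python
import torch
from torch import nn


class RedshiftCNN(nn.Module):
    """Toy 1D CNN: flux array in, redshift estimate out (hypothetical layer sizes)."""

    def __init__(self, n_classes=0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),  # collapse the wavelength axis to one value
            nn.Flatten(),             # (batch, 32, 1) -> (batch, 32)
        )
        self.redshift_head = nn.Linear(32, 1)
        # Optional second head, e.g. 3 classes for star / galaxy / quasar.
        self.class_head = nn.Linear(32, n_classes) if n_classes > 0 else None

    def forward(self, flux):
        h = self.features(flux)                # (batch, 32)
        z = self.redshift_head(h).squeeze(-1)  # (batch,)
        if self.class_head is None:
            return z
        return z, self.class_head(h)


torch.manual_seed(0)
model = RedshiftCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

flux = torch.randn(8, 1, 512)  # random stand-in for 8 spectra, 1 channel each
z_true = torch.rand(8) * 7.0   # random stand-in redshift labels in [0, 7)

# One training step: forward pass, loss, backward pass, parameter update.
optimizer.zero_grad()
loss = loss_fn(model(flux), z_true)
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.3f}")
```

Passing `n_classes=3` adds a classification output on top of the same convolutional features, which is one simple way to extend a redshift regressor to joint classification.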

In this session, participants will learn how to process the SDSS spectral data so that it can be fed into a 1-dimensional CNN, and will examine the preliminary results.
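One possible preprocessing step, sketched here under stated assumptions, is to resample every spectrum onto a common wavelength grid and rescale it so that all inputs have the same length and comparable amplitude. The grid limits (3800 to 9200 Angstrom), the grid length of 4096 pixels, the median normalization, and the function name `prepare_spectrum` are illustrative choices, not the workshop's actual pipeline. The toy spectrum is synthetic.

```python
import numpy as np

# Hypothetical common grid, uniform in log10(wavelength / Angstrom).
LOG_WAVE_MIN = np.log10(3800.0)
LOG_WAVE_MAX = np.log10(9200.0)
N_PIXELS = 4096


def prepare_spectrum(wavelength, flux, n_pixels=N_PIXELS):
    """Resample one spectrum onto a fixed log-wavelength grid and normalize it."""
    grid = np.linspace(LOG_WAVE_MIN, LOG_WAVE_MAX, n_pixels)
    # Replace non-finite flux values so the interpolation stays well defined.
    flux = np.where(np.isfinite(flux), flux, 0.0)
    # Linear interpolation; grid points outside the observed range become zero.
    resampled = np.interp(grid, np.log10(wavelength), flux, left=0.0, right=0.0)
    # Divide by the median absolute flux so bright and faint spectra are comparable.
    scale = np.median(np.abs(resampled))
    if scale > 0:
        resampled = resampled / scale
    return resampled.astype(np.float32)


# Toy spectrum: flat continuum plus one Gaussian emission line (made-up values).
wavelength = np.linspace(3600.0, 10000.0, 5000)
flux = 2.0 + 10.0 * np.exp(-0.5 * ((wavelength - 6000.0) / 20.0) ** 2)

x = prepare_spectrum(wavelength, flux)
batch = x[None, None, :]  # (batch, channels, length), the layout 1D CNNs expect
print(batch.shape, batch.dtype)
```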

 

3. More networks and more areas

Wang Yu (ICRANet-Italy)

Based on the first two tutorials, we introduce more types of neural networks applied to more kinds of astronomical data.

In the above example of inferring redshift from SDSS data, we build simple but efficient 1D CNNs and obtain accurate results. We then make the CNN more sophisticated by introducing advanced structures such as residual connections and attention, and apply the latest networks from industry to the same data to infer redshift and test whether the accuracy improves.
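As an example of one such structure, the following PyTorch sketch shows a 1-dimensional residual block: two convolutions whose output is added back to the block's input before the final activation. The class name `ResidualBlock1d`, the kernel size, and the channel count are assumptions for illustration and are not taken from the workshop notebooks.

```python
import torch
from torch import nn


class ResidualBlock1d(nn.Module):
    """Two 1D convolutions with a skip connection; shape is preserved."""

    def __init__(self, channels, kernel_size=7):
        super().__init__()
        padding = kernel_size // 2  # keeps the length unchanged for odd kernels
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)  # skip connection: add the input back


block = ResidualBlock1d(channels=16)
x = torch.randn(4, 16, 256)  # random stand-in feature maps
y = block(x)
print(y.shape)
```

Because the block preserves the tensor shape, several of them can be stacked between the convolutional layers of an existing 1D CNN without changing the rest of the network.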

Secondly, we give a brief introduction to gravitational-wave and gamma-ray-burst data and transfer the above networks to machine learning problems in these fields. Astronomical data are essentially temporal and spatial data; we hope this short tutorial broadens participants' horizons and helps them build networks flexibly.


  • Workshop Materials:

https://github.com/YWangScience/Isfahan-workshop-2021#isfahan-workshop-2021

  • Workshop Task: 

https://github.com/YWangScience/Isfahan-workshop-2021/blob/main/code/redshift-of-galaxies.ipynb

(The workshop task should be sent to Aidin Momtaz <a.momtaz98@gmail.com> for evaluation, after which your certificate will be issued.)