Cancer Diagnosis using Medical Records
Python for Data Science Introduction
- Python, Anaconda and relevant packages installations
- Why learn Python?
- Keywords and identifiers
- Comments, indentation and statements
- Variables and data types in Python
- Standard Input and Output
- Control flow: if else
- Control flow: while loop
- Control flow: for loop
- Control flow: break and continue
Python for Data Science: Data Structures
Python for Data Science: Functions
Python for Data Science: Numpy
Python for Data Science: Matplotlib
Python for Data Science: Pandas
Python for Data Science: Computational Complexity
Plotting for exploratory data analysis (EDA)
Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model may or may not be used, but EDA is primarily for seeing what the data can tell us beyond formal modeling or hypothesis testing.
- Introduction to IRIS dataset and 2D scatter plot
- 3D scatter plot
- Pair plots
- Limitations of pair plots
- Histogram and Introduction to PDF (Probability Density Function)
- Univariate Analysis using PDF
- CDF (Cumulative Distribution Function)
- Mean, Variance and Standard Deviation
- Percentiles and Quantiles
- IQR (Interquartile Range) and MAD (Median Absolute Deviation)
- Box-plot with Whiskers
- Violin Plots
- Summarizing Plots, Univariate, Bivariate and Multivariate analysis
- Multivariate Probability Density, Contour Plot
- Exercise: Perform EDA on Haberman dataset
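As a rough illustration of the summary statistics listed above, the sketch below computes the mean, variance, standard deviation, quartiles, IQR, MAD and an empirical CDF with NumPy. The array `x` is a made-up sample, not taken from the IRIS or Haberman datasets, and all variable names are assumptions chosen for this example.

```python
import numpy as np

# Hypothetical one-dimensional sample (made-up values, for illustration only).
x = np.array([1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 1.6])

# Central tendency and spread.
mean = x.mean()
variance = x.var()   # population variance (divides by n)
std = x.std()        # square root of the variance above

# Percentile-based summaries, which are less sensitive to outliers.
median = np.median(x)
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

# Median absolute deviation: the median distance of points from the median.
mad = np.median(np.abs(x - median))

# Empirical CDF: fraction of points less than or equal to each sorted value.
xs = np.sort(x)
cdf = np.arange(1, len(xs) + 1) / len(xs)

# Histogram normalised so that the bar areas sum to 1 (a crude PDF estimate).
density, bin_edges = np.histogram(x, bins=5, density=True)

print("mean:", mean, "std:", std)
print("median:", median, "IQR:", iqr, "MAD:", mad)
```

Plotting `xs` against `cdf`, or the histogram bars against `bin_edges`, would give the CDF and PDF views that the topics above refer to.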
Linear algebra will give you the tools to help you with the other areas of mathematics required to understand machine learning algorithms and to build better intuitions for them.
- Why learn it?
- Introduction to Vectors (2-D, 3-D, n-D), Row vector and column vector
- Dot Product and Angle between 2 Vectors
- Projection and Unit Vector
- Equation of a line (2-D), Plane (3-D) and Hyperplane (n-D), Plane passing through origin, Normal to a Plane
- Equation of a Circle (2-D), Sphere (3-D) and Hypersphere (n-D)
- Equation of an Ellipse (2-D), Ellipsoid (3-D) and Hyperellipsoid (n-D)
Probability and Statistics
Dimensionality reduction and Visualization:
In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, via obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
PCA (Principal Component Analysis)
t-SNE (t-distributed Stochastic Neighbor Embedding)
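To make the idea of feature extraction concrete, here is a minimal sketch of PCA written directly with NumPy's SVD. It is one common way to compute principal components, not necessarily the approach taught in the course. The synthetic data and the names `X`, `Xc`, `k` and `X_reduced` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points in 3-D that lie close to a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3]])
X = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

# Step 1: centre each feature at zero.
Xc = X - X.mean(axis=0)

# Step 2: SVD of the centred data. The rows of Vt are the principal
# directions, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Step 3: keep the top k directions and project the data onto them.
k = 2
X_reduced = Xc @ Vt[:k].T

# Fraction of the total variance captured by each component.
explained = S ** 2 / np.sum(S ** 2)
print(X_reduced.shape, explained[:k].sum())
```

Because the synthetic points were built from two underlying variables plus a little noise, two components retain almost all of the variance. On real data the number of components to keep is a judgment call.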
Real world problem: Predict rating given product reviews on Amazon
Classification And Regression Models: K-Nearest Neighbors
Classification algorithms in various situations
Performance measurement of models
Solving optimization problems: Stochastic Gradient Descent
Support Vector Machines (SVM)
Case Study: Personalized Cancer Diagnosis.
A lot has been said during the past several years about how precision medicine and, more concretely, genetic testing are going to disrupt the way diseases like cancer are treated. But this is only partially happening because of the huge amount of manual work still required.
Once sequenced, a cancer tumor can have thousands of genetic mutations. The challenge is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral mutations (passengers).
Currently this interpretation of genetic mutations is done manually. It is a very time-consuming task in which a clinical pathologist has to review and classify every single genetic mutation based on evidence from text-based clinical literature.
Objective: To classify every single genetic mutation based on evidence from text-based clinical literature.
- Data Type:
- Pipe-delimited (||) and CSV files.
- Training_variants.csv (Id, Gene, Variations, Class)
- Training_text (ID, Text)
- Test_variants.csv (Id, Gene, Variations)
- Test_text (ID, Text)
- Data Size: 159MB
- This course is valid for 240 days, starting from the date of your registration.
- Expert guidance: we will try to answer your queries within 24 hours.
- 10+ machine learning algorithms will be taught in this course.
- No prerequisites: we will teach everything from the basics (we just expect you to know basic programming).
- Python for Data Science is part of the course curriculum.
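As a small taste of the Python involved, the sketch below shows one way the case-study files described earlier (a comma-separated variants file and a "||"-delimited text file) might be loaded and joined with pandas. The two in-memory samples are made up for illustration, the column names are normalised to `ID`, `Gene`, `Variation`, `Class` and `Text`, and the presence of a header line in the text file is an assumption.

```python
import io
import pandas as pd

# Hypothetical two-row samples standing in for the real files; the gene,
# variation and text values below are invented for this example.
variants_csv = io.StringIO(
    "ID,Gene,Variation,Class\n"
    "0,GENE_A,Truncating Mutations,1\n"
    "1,GENE_B,W802*,2\n"
)
text_file = io.StringIO(
    "ID,Text\n"
    "0||First made-up abstract about a mutation.\n"
    "1||Second made-up abstract, with a comma in it.\n"
)

# The variants file is an ordinary comma-separated file.
variants = pd.read_csv(variants_csv)

# The text file uses "||" as its delimiter. A multi-character separator is
# treated as a regular expression, so the pipes are escaped and the Python
# parsing engine is requested explicitly.
text = pd.read_csv(
    text_file,
    sep=r"\|\|",
    engine="python",
    names=["ID", "Text"],
    skiprows=1,  # assumption: the first line is a header, not data
)

# Join the two tables on the shared ID column: one row per mutation.
data = variants.merge(text, on="ID", how="left")
print(data.shape)
```

With the real files, the `io.StringIO` objects would be replaced by file paths. The merged table then holds the gene, the variation and the supporting text for each mutation, alongside the class label to be predicted.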
We are building our course content and teaching methodology to cater to the needs of students at various levels of expertise and with varying background skills. This course can be taken by anyone with a working knowledge of a modern programming language such as C, C++, Java or Python. We expect the average student to spend at least 5 hours a week over a 6-month period, amounting to 145+ hours of effort. The more the effort, the better the results. Here is a list of customers who would benefit from our course:
- Undergrad (BS/BTech/BE) students in engineering and science.
- Grad (MS/MTech/ME/MCA) students in engineering and science.
- Working professionals: Software engineers, Business analysts, Product managers, Program managers, Managers, Startup teams building ML products/services.
- Lectures: 304
- Quizzes: 0
- Duration: 100+ hours
- Skill level: All levels
- Language: English
- Students: 8
- Assessments: Yes