The Fundamentals Of Data Science

Data science is fast becoming one of the hottest professions. It acts as a backbone for businesses and organizations, so the need to hire a data scientist becomes inevitable.

There are many fundamentals to learn if you want to pursue a career in data science. Getting a job can be easier than you might expect, because many people treat the field as a no-go area of study and stay away from it. It is not a no-go area if you are determined and committed to developing your understanding.

As a data scientist, you have the option to choose how you want to utilize your skills. If you want to work as a freelancer, there are many businesses that will need your services. You may also decide to become an employee in an organization, and help them move forward through relevant data analysis. There are lots of benefits to enjoy as a data scientist, but you need a solid background in probability and statistics to do well in this field.

Having talked about the prospects of data science, let's move a step further by considering the fundamentals of this area of specialization.

Data Science: The fundamentals

The two biggest buzzwords in this industry are “data science” and “big data.” While the latter is gaining interest all over the world, the former is turning out to be a very hot subject.

You should make sure you fully understand the background of data science - what are the basics required to truly make data science a science? Our quest can begin from here.

There are some critical questions we need to ask when it comes to the basics of data science: what does the word “data” really mean, what are our intentions with it, and what scientific approaches do we need to apply to achieve our set goals with data?

  • What is data?

  • What is the purpose of data science?

  • The scientific approach

Probability & Statistics

The world we live in is probabilistic, so the data we work with is probabilistic too. Under a given set of preconditions, data will tend to behave in a particular way for a specific period of time, but never with complete certainty. For this reason, you need to be comfortable with probability and statistics to apply data science properly.

  • The two characteristics of data

  • Introduction to probability

  • Examples of statistical data

  • Statistical properties (Median, Mean, Mode, Standard Deviation, Moments, etc.)

  • Probability distribution

  • Joint & conditional probabilities

  • Common probability distributions (Binomial, Discrete, Normal)

  • Other probability distributions (Poisson, Chi-square)

  • Connections with statistical distribution

  • Bayesian inference

  • Bayes' rule
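
To make Bayes' rule from the list above concrete, here is a minimal Python sketch. The function name `posterior` and the numbers in the example (1% prevalence, 95% detection rate, 5% false alarm rate) are made up for illustration and do not come from any real test.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' rule.

    prior:               P(H), probability of the hypothesis before seeing evidence
    likelihood:          P(E | H), probability of the evidence if H is true
    false_positive_rate: P(E | not H), probability of the evidence if H is false
    """
    # Law of total probability gives the overall chance of the evidence, P(E).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

With these assumed inputs the posterior comes out at about 0.161, which shows why a low prior keeps the posterior low even when the test itself looks accurate.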

Decision Theory

This is certainly one of the major fundamentals. Whether applied in engineering, business, or science, our sole aim is to use data to make decisions. Data on its own is insignificant unless it reveals something we can base a decision on. How do these decisions come about? What factors do we consider during the decision-making process? Which approach is best for deciding with data? Decision theory addresses these questions:

  • Bayes risk

  • Hypothesis testing

  • Likelihood ratio & log likelihood ratio

  • Binary hypothesis test

  • Optimal decision making

  • Neyman-Pearson criterion

  • M-ary hypothesis test

  • Receiver operating characteristic curve
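
As a rough illustration of the binary hypothesis test and the log likelihood ratio listed above, the sketch below decides between two Gaussian hypotheses that share a standard deviation. The means, the standard deviation, and the function names are assumptions chosen for the example, not something prescribed by the article.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the Gaussian density, computed directly to avoid underflow."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def log_likelihood_ratio(x, mean0, mean1, std):
    """log[ p(x | H1) / p(x | H0) ] for two Gaussians with the same std."""
    return gaussian_log_pdf(x, mean1, std) - gaussian_log_pdf(x, mean0, std)


def decide(x, mean0=0.0, mean1=2.0, std=1.0, log_threshold=0.0):
    """Return 1 to choose H1 and 0 to choose H0.

    A log threshold of 0 corresponds to equal priors and equal error costs.
    """
    return 1 if log_likelihood_ratio(x, mean0, mean1, std) > log_threshold else 0


for sample in (0.5, 1.5):
    print(sample, "->", "H1" if decide(sample) else "H0")
```

Raising `log_threshold` makes the test choose H1 less often, trading missed detections for fewer false alarms. Sweeping the threshold and recording both rates is one way to trace out a receiver operating characteristic curve.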

Estimation Theory

There are times when we need to characterize data with parameter estimates, averages, and so on. Estimation is an extension of decision theory: instead of choosing between a few hypotheses, we choose a value from a continuous range. It follows naturally after decision making.

  • Unbiased estimation

  • Estimation as an extension of the M-ary hypothesis test

  • Kalman filter

  • Minimum mean square error (MMSE)

  • Maximum a posteriori estimation (MAP)

  • Maximum likelihood estimation (MLE)
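
To show how maximum likelihood and maximum a posteriori estimation differ, here is a small sketch for estimating a success probability from yes/no trials. The Beta(2, 2) prior and the data (7 successes in 10 trials) are assumptions picked for illustration.

```python
def mle_bernoulli(successes, trials):
    """Maximum likelihood estimate of a success probability."""
    return successes / trials


def map_bernoulli(successes, trials, alpha=2.0, beta=2.0):
    """MAP estimate under an assumed Beta(alpha, beta) prior.

    This is the mode of the Beta posterior, which is valid when both
    posterior parameters are greater than 1.
    """
    return (successes + alpha - 1.0) / (trials + alpha + beta - 2.0)


# Hypothetical data, chosen only for illustration.
print(f"MLE: {mle_bernoulli(7, 10):.2f}")
print(f"MAP: {map_bernoulli(7, 10):.2f}")
```

The MAP estimate is pulled toward 0.5 by the prior. With a flat Beta(1, 1) prior, the two estimates coincide.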

Coordinate Systems

This is another crucial section that plays a significant role in the outcome of data interpretation. To group different data elements into a single decision-making structure, we need to understand how to align the data correctly. At this point, it becomes imperative to have adequate knowledge of coordinate systems, and how to utilize them in bringing together disparate data.

  • Introduction to coordinate systems

  • Orthogonal coordinate system

  • Properties of orthogonal coordinate system (dot product, angle, coordinate transformation, etc.)

  • Transformation between coordinate systems

  • Polar coordinate system

  • Euclidean spaces

  • Cylindrical coordinate system

  • Cartesian coordinate system

  • Spherical coordinate system
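
The transformation between polar and Cartesian coordinates, and the use of the dot product to find an angle, can be sketched in a few lines of Python. The function names are chosen for this example only.

```python
import math


def polar_to_cartesian(r, theta):
    """Convert polar (r, theta in radians) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


def cartesian_to_polar(x, y):
    """Convert Cartesian (x, y) to polar (r, theta in radians)."""
    return math.hypot(x, y), math.atan2(y, x)


def angle_between(u, v):
    """Angle in radians between two non-zero vectors, via the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding that lands slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


x, y = polar_to_cartesian(2.0, math.pi / 2)
print(cartesian_to_polar(x, y))
print(math.degrees(angle_between((1.0, 0.0), (0.0, 1.0))))
```

Converting a point to Cartesian coordinates and back should return the original radius and angle, up to floating-point rounding.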

Linear Transformation

After gaining mastery over coordinate systems, the next step is to learn how to transform the data to produce the underlying information. Linear transformation talks about turning our data into useful information through various transformation types, including the well-known Fourier transform.

  • Introduction to linear transformation

  • Matrix multiplication

  • Properties of linear transformation

  • Fourier transform

  • Uncertainty principle & aliasing

  • Properties of Fourier transform (shift variance, time-frequency relationship, convolution theorem, Parseval’s theorem, spectral properties, etc.)

  • Discrete & continuous Fourier transform

  • Wavelet & other transforms
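
The discrete Fourier transform is itself a linear transformation: each output value is a weighted sum of the inputs, which is a matrix-vector product. The sketch below is a deliberately simple O(n²) version using only the standard library, and it checks Parseval's theorem on a short made-up signal. In practice you would use a fast Fourier transform from a numerical library instead.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform written as an explicit matrix-vector product."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, using the 1/n normalization on the way back."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical signal, chosen only for illustration.
signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)

# Parseval's theorem: energy is the same in the time and frequency domains
# (with the 1/n factor that matches this normalization).
time_energy = sum(abs(x) ** 2 for x in signal)
freq_energy = sum(abs(X) ** 2 for X in spectrum) / len(signal)
print(time_energy, round(freq_energy, 9))
```

Applying `inverse_dft` to the spectrum recovers the original signal up to rounding, which is what makes the transform a reversible change of representation.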

Computation, and its Effect on Data

One aspect of data science that doesn't get much attention is the impact algorithms have on the information we are trying to extract. Merely applying computations and algorithms to create data products changes the data, and that has a huge impact on effective, data-driven decision making. This section leads us toward the more advanced areas of data science.

  • Irreversible computation

  • Mathematical representation of computation

  • Impulse response function

  • Impacts on decision making

  • Reversible computation (Bijective mapping)

  • Transformation of probability distribution (due to subtraction, addition, division, multiplication, arbitrary computation, etc.)
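
To see how computation changes a probability distribution, the sketch below pushes a fair six-sided die through three operations: a bijective (reversible) mapping, a many-to-one (irreversible) mapping, and the addition of two independent dice. The helper names are made up for this example, and distributions are stored as plain dictionaries from value to probability.

```python
from collections import defaultdict
from fractions import Fraction


def transform(dist, func):
    """Distribution of func(X), given the distribution of X."""
    out = defaultdict(Fraction)
    for value, prob in dist.items():
        # Values that map to the same output have their probabilities merged.
        out[func(value)] += prob
    return dict(out)


def add_independent(dist_x, dist_y):
    """Distribution of X + Y for independent X and Y (a discrete convolution)."""
    out = defaultdict(Fraction)
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            out[x + y] += px * py
    return dict(out)


die = {face: Fraction(1, 6) for face in range(1, 7)}

relabelled = transform(die, lambda x: 2 * x + 1)  # bijective: nothing is lost
parity = transform(die, lambda x: x % 2)          # many-to-one: faces are merged
two_dice = add_independent(die, die)              # addition reshapes the distribution

print(len(relabelled), parity, two_dice[7])
```

The bijective mapping keeps six equally likely outcomes, so the original face can always be recovered. The parity mapping collapses them into two outcomes, and the sum of two dice is no longer uniform, so any decision made downstream sees a different distribution from the one the raw data had.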

Prototype coding/programming

One of the main features of data scientists is the willingness to get their hands dirty with data. They should be able to write programs that access, process, and visualize data in the languages that are essential in science and industry. This segment takes us through these crucial elements.

  • Introduction to programming

  • Functions

  • Data structures

  • Data types, functions, and variables

  • Loops, if-then-else, comparisons

  • Compiled languages vs. scripting languages

  • SAS

  • SQL

  • Python

  • C++

  • R
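
As a small taste of prototype coding, the Python sketch below touches most of the basics in the list: variables, a function, a loop, and a couple of data structures. The inline table of cities and temperatures is invented for the example and stands in for a real data file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inline data standing in for a real CSV file.
RAW = """city,temperature
Sydney,22.5
Sydney,24.0
Oslo,5.5
Oslo,7.0
"""


def mean_by_key(rows, key, value):
    """Group rows by one column and average another column within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


rows = list(csv.DictReader(io.StringIO(RAW)))
averages = mean_by_key(rows, "city", "temperature")

for city, avg in sorted(averages.items()):
    print(f"{city}: {avg:.2f}")
```

The same grouping could be written as a single `GROUP BY` query in SQL, which is one reason it helps to be familiar with more than one of the languages listed above.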

Graph Theory

Graphs are used to illustrate connections between various data elements. They are also crucial in the current interconnected world.

  • Introduction to graph theory

  • Directed graphs

  • Undirected graphs

  • Route & network problems

  • Various graph data frameworks
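
One simple way to represent directed and undirected graphs is an adjacency structure, and a basic route problem can then be solved with breadth-first search. The sketch below is a minimal version with made-up node names. It finds a path with the fewest edges and does not handle edge weights.

```python
from collections import deque


def add_edge(graph, u, v, directed=False):
    """Add an edge u -> v, and also v -> u when the graph is undirected."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set())
    if not directed:
        graph[v].add(u)


def shortest_path(graph, start, goal):
    """Return a fewest-edges path from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for repeatable results
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


undirected = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]:
    add_edge(undirected, u, v)

directed = {}
add_edge(directed, "A", "B", directed=True)

print(shortest_path(undirected, "A", "D"))
print(shortest_path(directed, "B", "A"))
```

In the directed graph the edge only runs from A to B, so there is no route from B back to A and the function returns `None`.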

Algorithms

Having an understanding of how to use algorithms to compute essential data-derived metrics is the key to data science.

  • Introduction to algorithms

  • Gradient search

  • Recursive algorithms

  • Parallel, serial, & distributed algorithms

  • Randomized algorithms

  • Exhaustive search

  • Divide-and-conquer (binary search)

  • Linear programming

  • Sorting algorithms

  • Shortest path algorithm for graphs

  • Heuristic algorithms

  • Greedy algorithms
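
Binary search is a compact example that combines two items from the list above: it is recursive, and it is divide-and-conquer because each step discards half of the remaining items. This sketch assumes the input list is already sorted in ascending order.

```python
def binary_search(items, target, low=0, high=None):
    """Return an index of target in the sorted list items, or -1 if absent."""
    if high is None:
        high = len(items) - 1
    if low > high:
        return -1  # the search range is empty, so the target is not present
    mid = (low + high) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        # The target can only be in the upper half.
        return binary_search(items, target, mid + 1, high)
    # The target can only be in the lower half.
    return binary_search(items, target, low, mid - 1)


data = [1, 3, 5, 7, 9, 11]
print(binary_search(data, 7))
print(binary_search(data, 4))
```

Because the range halves on every call, the number of comparisons grows with the logarithm of the list length, compared with a linear scan that may have to look at every item.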

Machine Learning

Any look at the fundamentals of data science would be incomplete without machine learning. These techniques become much easier to pick up once you have mastered the fundamentals described in the sections above. This section gives practitioners an understanding of essential, well-known machine learning techniques and why they matter.

  • Introduction to machine learning

  • Decision trees

  • Linear classifiers (Naïve Bayes Classifier, Logistic Regression, Support Vector Machines)

  • Expectation Maximization

  • Bayesian networks

  • Vector quantization

  • Hidden Markov Models

  • K-means clustering

  • Artificial neural networks & deep learning
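
K-means clustering is one of the easier techniques in the list to write from scratch. The sketch below is a bare-bones version that alternates between an assignment step and an update step. The points and starting centroids are made up for illustration, and the code assumes Python 3.8 or later for `math.dist`. A real project would more likely use a library implementation with better initialization.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps from the given starting centroids.

    Returns the final centroids and the cluster index assigned to each point
    in the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the algorithm has converged
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(assignments)
print(centroids)
```

The result depends on the starting centroids, so it is common to run the algorithm several times from different starting points and keep the best clustering.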

Conclusion

The importance of data science in all fields of life cannot be refuted. There is a lot of work available for data scientists, and the rate at which businesses need them suggests more people should venture into the profession. The fundamentals given above will guide you in starting a career in data science. There are more advanced topics to go through in this field, so you need to be extremely good at statistics and probability to succeed as a data scientist.

How was the list mentioned above? If you have other useful tips or questions to ask, you can drop them in the comment box below.

Posted September 7, 2017

Lucy Karinsky

Software Developer

Lucy is the Development & Programming Correspondent for Freelancer.com. She is currently based in Sydney.
