A step-by-step process of Data Analysis and its Techniques.

When you set out to collect data for analysis, the sheer amount of information can be overwhelming and make it hard to reach a clear decision. With so much data to handle, you need to identify the data that is relevant to your analysis so that you can draw accurate conclusions and make informed decisions. The following simple steps will help you identify and sort your data for analysis.

So, without further ado, let’s get started!

1. Define your scope

  • Define short, straightforward questions whose answers will help you make a decision.
  • Define measurement parameters.
  • Define which parameters you want to take into account and which ones you are willing to negotiate.
  • Define your unit of measurement, for example time, currency, or salary.

2. Data Collection

  • Gather your data based on your parameters.
  • Collect data from databases, websites, and many other sources. This data may not be structured or uniform, which takes us to the next step.
  • If you want some free datasets, check the article below.

Places where you can get datasets: Top 5 (medium.com)

3. Data Processing

  • Organize your data and add side notes, if any.
  • Cross-check data with credible sources.
  • Convert the data as per the scale of measurement you have defined earlier.
  • Discard irrelevant data.
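
As a rough illustration of this step, here is a minimal Python sketch that converts records to one unit of measurement and drops rows that cannot be used. The records, the field names, and the choice of seconds as the target unit are all made up for the example.

```python
# Hypothetical raw records: durations recorded in mixed units.
RAW_RECORDS = [
    {"task": "import", "duration": 90, "unit": "s"},
    {"task": "clean", "duration": 2.5, "unit": "min"},
    {"task": "report", "duration": None, "unit": "min"},  # missing value
    {"task": "model", "duration": 0.5, "unit": "h"},
]

# Conversion factors to the unit chosen while defining the scope (seconds here).
TO_SECONDS = {"s": 1, "min": 60, "h": 3600}


def process(records):
    """Convert every duration to seconds and drop records that cannot be used."""
    cleaned = []
    for record in records:
        duration, unit = record["duration"], record["unit"]
        if duration is None or unit not in TO_SECONDS:
            continue  # discard rows with a missing value or an unknown unit
        cleaned.append(
            {"task": record["task"], "seconds": duration * TO_SECONDS[unit]}
        )
    return cleaned


processed = process(RAW_RECORDS)
for row in processed:
    print(row)
```

Keeping the conversion factors in one dictionary means a new unit only needs one new entry.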

By now you should have collected and organized your data.

4. Data Analysis

  • Sort and plot the data, and identify correlations.
  • As you manipulate and organize your data, you may at times need to retrace your steps from the beginning, modifying your question and reorganizing your data.
  • Use the different tools available for data analysis.
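
To make sorting and correlation concrete, here is a small Python sketch using only the standard library. The advertising and sales figures are invented for the example, and `pearson` is a helper written here, not a function from any particular tool.

```python
from math import sqrt

# Hypothetical sample: advertising spend and resulting sales for six months.
ad_spend = [10, 20, 30, 40, 50, 60]
sales = [25, 41, 62, 79, 103, 118]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Sort the months by sales, highest first.
ranked = sorted(zip(ad_spend, sales), key=lambda pair: pair[1], reverse=True)
r = pearson(ad_spend, sales)

print("Best month (spend, sales):", ranked[0])
print("Correlation between spend and sales:", round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Correlation alone does not show that one variable causes the other.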

5. Infer and Interpret Results

  • Analyze the results against your initial questions.
  • Analyze all parameters for making the decision.
  • Analyze the hindering factors for implementing the decision.
  • Choose data visualization techniques to communicate the message better.

Once you have an inference, always remember that it is only a theory. Real-life scenarios at times interfere with your results.

In the process of data analysis, a few related terms are identified with different phases of the process.

1. Data Mining

This process involves methods for finding patterns in a data sample.

2. Data Modelling

This refers to how an organization organizes and manages its data.

There are different techniques for data analysis. Which technique you use depends on the type of data and the amount of data gathered. Each focuses on taking in new data, mining insights, and drilling down into the information to transform facts and figures into decision-making parameters. Accordingly, the different techniques of data analysis can be categorized as follows:

1. Techniques based on Mathematics and Statistics

  • Descriptive Analysis: Descriptive analysis takes historical data and Key Performance Indicators and describes performance against a chosen benchmark.
  • Dispersion Analysis: This technique lets data analysts measure the variability (spread) of the factors, for example with the range, variance, or standard deviation.
  • Regression Analysis: This technique works by modeling the relationship between a dependent variable and one or more independent variables.
  • Factor Analysis: This technique helps to determine whether a relationship exists among a set of variables, typically by explaining them through a smaller number of underlying factors.
  • Discriminant Analysis: It is a classification technique in data mining that assigns observations to known groups based on their measured features.
  • Time Series Analysis: Here, measurements are recorded at successive points in time, which gives a collection of ordered data known as a time series.
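
Three of the techniques above (descriptive, dispersion, and regression analysis) can be sketched in a few lines of Python with the standard `statistics` module. The training hours and task counts are invented, and `fit_line` is a helper written for this example that does a plain least-squares fit with one independent variable.

```python
import statistics

# Hypothetical data: hours of training (x) and tasks completed per day (y).
hours = [1, 2, 3, 4, 5]
tasks = [3, 5, 7, 9, 11]

# Descriptive analysis: summarize the central tendency.
mean_tasks = statistics.mean(tasks)
median_tasks = statistics.median(tasks)

# Dispersion analysis: measure how spread out the values are.
value_range = max(tasks) - min(tasks)
std_dev = statistics.pstdev(tasks)  # population standard deviation


def fit_line(xs, ys):
    """Regression analysis: least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(hours, tasks)

print("mean:", mean_tasks, "median:", median_tasks)
print("range:", value_range, "std dev:", round(std_dev, 3))
print("fitted line: y =", slope, "* x +", intercept)
```

The slope estimates how much the dependent variable changes for each extra unit of the independent variable, which is the relationship regression analysis tries to model.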

2. Techniques based on Artificial Intelligence and Machine Learning

  • Artificial Neural Networks: A neural network is a biologically inspired programming paradigm that uses a brain metaphor for processing information.
  • Decision Trees: As the name suggests, it is a tree-shaped model that represents a classification or regression model.
  • Evolutionary Programming: This technique combines the different types of data analysis using evolutionary algorithms.
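  • Fuzzy Logic: It is a technique based on degrees of truth, where a statement can be partially true (anywhere between 0 and 1) rather than strictly true or false. It models vagueness, which is different from probability.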
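
As one small illustration from this list, fuzzy logic works with membership functions that return a degree between 0 and 1. The sketch below uses a triangular membership function, which is one common shape among several. The temperature sets "warm" and "hot" and their boundaries are made up for the example.

```python
def triangular(x, low, peak, high):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def warm(temp_c):
    # Hypothetical set "warm": starts at 15 C, fully warm at 25 C, ends at 35 C.
    return triangular(temp_c, 15, 25, 35)


def hot(temp_c):
    # Hypothetical set "hot": starts at 25 C, fully hot at 35 C, ends at 45 C.
    return triangular(temp_c, 25, 35, 45)


temperature = 30
degrees = {"warm": warm(temperature), "hot": hot(temperature)}

# A common convention is to use min for fuzzy AND and max for fuzzy OR.
warm_and_hot = min(degrees.values())
warm_or_hot = max(degrees.values())

print(degrees)
```

Here a temperature of 30 degrees belongs partly to both sets at once, something a strict true/false rule cannot express.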

3. Techniques based on Visualization and Graphs

  • Column Chart, Bar Chart: These charts are used to present numerical differences between categories.
  • Line Chart: This chart is used to represent the change of data over a continuous interval of time.
  • Pie Chart: It is used to represent the proportion of different classifications. It is best for a single series of data.
  • Funnel Chart: This chart represents the proportion of each stage and indicates the size of each module.
  • Word Cloud Chart: It is a visual representation of text data.
  • Gantt Chart: It shows the actual timing and the progress of activity in comparison to the requirements.
  • Scatter Plot: It shows the distribution of variables in the form of points over a rectangular coordinate system.
  • Bubble Chart: It is a variation of the scatter plot. Here, in addition to the x and y coordinates, the area of the bubble represents the 3rd value.
  • Gauge: It is a kind of materialized chart. It is a suitable technique to represent interval observations.
  • Frame Diagram: It is a visual representation of a hierarchy in the form of an inverted tree structure.
  • Rectangular Tree Diagram: This technique is used to represent hierarchical relationships but at the same level.
  • Heat Map: This represents the weight of each point in a geographic area. The color here represents the density.
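
In practice you would draw these charts with a plotting tool, but the idea behind a pie chart and a bar chart can be sketched in plain Python. The ticket counts are invented, and `proportions` and `text_bar_chart` are helpers written for this example.

```python
# Hypothetical category counts, for example support tickets per product.
counts = {"Product A": 12, "Product B": 30, "Product C": 18}


def proportions(data):
    """Share of the total for each category (what a pie chart displays)."""
    total = sum(data.values())
    return {label: value / total for label, value in data.items()}


def text_bar_chart(data, width=30):
    """Render a horizontal bar chart as text; the largest bar is `width` wide."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


shares = proportions(counts)
chart = text_bar_chart(counts)

print(chart)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

The bar chart compares absolute values between categories, while the proportions show each category's share of the whole.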

Originally published at https://fittechie.in on March 1, 2021.

Adith - The Data Guy
