Hi folks, after a long break I’m starting a new series of articles and tutorials covering Data Science, Data Analytics, Machine Learning, Deep Learning, and AI, and I’ll try to cover other areas too.
This article explains in detail what Data Analysis (Exploratory Analysis) is. All the definitions are written in my own words to make them easy to understand, so the wording may differ from Wikipedia, other websites, or official sources, but the meaning is the same.
We analyze data every day, every hour, every minute. Our brain reads input through our five senses, processes it, and draws conclusions.
For example, you are driving and suddenly get stuck in a long traffic jam. Your brain receives data that something is happening around you and works out a new path to your destination, so you take either the subway or another highway. That decision is your output.
Analysis Process
The data analysis process takes four steps to complete:
- Define the Problem
- Collect the data
- Process the data
- Create report/Output
Before explaining what each of these steps is and why we need it in Data Analysis, consider the question almost every newbie asks: “How can I become a Data Analyst?” So, ta-da, here is your first step: define the problem.
You know there is something you don’t know, which means your problem is defined. Next, you collect information about data analysis.
After that, you learn it process by process, step by step, and acquire knowledge. After all this, you finally reach your destination and become a data analyst.
That answer is written in simple, easy-to-understand language. Now I’ll explain each step in more detail, so here we go.
Define the Problem
Before any solution, we need a problem.
If your business is not growing, you have to look back, acknowledge your mistakes, and make a new plan that doesn’t repeat them.
And if your business is growing, you have to look forward and plan how to make it grow more. That’s what we call defining the problem.
Example
Say we work at a B2B software company that sells product subscriptions; let’s call the company techjunkgigs. The company’s business is built on customers subscribing to our massive online inventory and logistics portal. Users are billed monthly. We have data about users who have canceled their subscriptions and those who have continued to renew month after month. Our management team wants us to analyze our customer data.
What is the problem we are trying to solve?
Well, as a company, techjunkgigs wants to predict whether or not customers will cancel their subscriptions.
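To make the problem concrete, it can help to write down what a single customer record and the target to predict might look like. The snippet below is only a minimal sketch: the field names (`months_subscribed`, `canceled`) and all values are made up for illustration and are not real techjunkgigs data. It computes the overall cancellation (churn) rate, which is a simple baseline to know before attempting any prediction.

```python
# Hypothetical customer records; field names and values are invented for illustration.
customers = [
    {"customer_id": 1, "months_subscribed": 14, "canceled": False},
    {"customer_id": 2, "months_subscribed": 2, "canceled": True},
    {"customer_id": 3, "months_subscribed": 7, "canceled": False},
    {"customer_id": 4, "months_subscribed": 1, "canceled": True},
    {"customer_id": 5, "months_subscribed": 22, "canceled": False},
]


def churn_rate(records):
    """Return the fraction of customers who canceled (0.0 for an empty list)."""
    if not records:
        return 0.0
    canceled = sum(1 for record in records if record["canceled"])
    return canceled / len(records)


rate = churn_rate(customers)
print(f"Churn rate: {rate:.0%}")
```

Here `canceled` plays the role of the answer we want to predict, and every other field is a possible input.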
Collect the data
With our question clearly defined, it’s time to collect the data. As you collect and organize your data, keep these important points in mind:
- Before you collect new data, determine what information could be collected from existing databases or sources on hand. Collect this data first.
- Determine a file storage and naming system ahead of time. This saves time and prevents you and your team members from collecting the same information twice.
- If you need to gather data via observation or interviews, then develop an interview template ahead of time to ensure consistency and save time.
- Keep your collected data organized in a log with collection dates, and add source notes as you go (including any data normalization performed). This practice helps validate your conclusions down the road.
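A collection log like the one described in the last point can be as simple as a CSV file. The following is a rough sketch using only the Python standard library; the column names, source names, and dates are assumptions chosen for the example, not a required format.

```python
import csv
import io
from datetime import date

# Column names are an assumption for this sketch; adapt them to your project.
LOG_FIELDS = ["collected_on", "source", "notes"]


def add_log_entry(log, source, notes="", collected_on=None):
    """Append one collection record; the date defaults to today."""
    entry = {
        "collected_on": (collected_on or date.today()).isoformat(),
        "source": source,
        "notes": notes,
    }
    log.append(entry)
    return entry


def write_log(log, fileobj):
    """Write the log as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=LOG_FIELDS)
    writer.writeheader()
    writer.writerows(log)


# Hypothetical entries: existing sources are logged first, then new collection.
collection_log = []
add_log_entry(collection_log, "billing_db.subscriptions",
              "export from existing database", date(2018, 5, 1))
add_log_entry(collection_log, "customer interviews",
              "interview template used; plan names normalized", date(2018, 5, 3))

# An in-memory buffer stands in for a real file opened with open(..., "w", newline="").
buffer = io.StringIO()
write_log(collection_log, buffer)
print(buffer.getvalue())
```

Writing the normalization notes at collection time is what lets you trace a conclusion back to its source later.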
Process the data
After you’ve collected the right data to answer your question from Step 1, “Define the Problem”, it’s time for deeper data analysis. Begin by manipulating your data in a number of different ways, such as plotting it and looking for correlations, or creating a pivot table in Excel.
A pivot table lets you sort and filter data by different variables and calculate the mean, maximum, minimum, and standard deviation of your data.
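The same kind of summary a pivot table gives you can be sketched in a few lines of Python. The example below uses only the standard library and made-up rows (a plan name and a monthly login count, both hypothetical); in practice, pandas offers `pivot_table` for this, so treat this as an illustration of the idea rather than the recommended tool.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows of (plan, monthly_logins); values are invented for illustration.
rows = [
    ("basic", 4), ("basic", 6), ("basic", 8),
    ("premium", 10), ("premium", 14), ("premium", 18),
]


def summarize_by_group(rows):
    """Group values by key and compute mean, max, min, and sample std per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


for plan, stats in summarize_by_group(rows).items():
    print(plan, stats)
```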
During this step, data analysis tools and software are extremely helpful. Minitab and Stata are good software packages for advanced statistical data analysis, while Visio is useful for diagramming the process. Today, many data scientists use Python or R to analyze data.
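As a small taste of the Python route, here is a sketch of how a correlation between two variables can be computed by hand. The two lists (`months_subscribed`, `support_tickets`) are hypothetical numbers made up for this example; the function implements the standard Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: subscription length vs. number of support tickets.
months_subscribed = [1, 2, 7, 14, 22]
support_tickets = [9, 8, 5, 3, 1]

r = pearson(months_subscribed, support_tickets)
print(f"r = {r:.2f}")
```

A value near -1 or 1 suggests a strong linear relationship, and a value near 0 suggests little or none. Keep in mind that a correlation alone does not tell you why two variables move together.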
Create report/Output
After analyzing your data and possibly conducting further research, it’s finally time to interpret your results.
As you interpret your analysis, keep in mind that you can never prove a hypothesis true. Rather, you can only fail to reject it. This means that no matter how much data you collect, chance could always interfere with your results.
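One way to see the “fail to reject” idea in practice is a permutation test, sketched below with only the Python standard library. This is one possible technique, not the only one. The two groups, the 0.05 threshold, and the number of permutations are all assumptions chosen for the example. The test asks how often a difference in means at least as large as the observed one would appear if group labels were assigned purely by chance.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled) >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical monthly logins for customers who renewed vs. canceled.
renewed = [12, 15, 9, 14, 11, 13]
canceled = [8, 10, 7, 12, 9, 6]

p_value = permutation_p_value(renewed, canceled)
alpha = 0.05  # conventional threshold, chosen here as an assumption
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject the hypothesis of no difference")
else:
    print(f"p = {p_value:.3f}: fail to reject the hypothesis of no difference")
```

Either outcome is a statement about how surprising the data would be under chance, never a proof that a hypothesis is true.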
As you interpret the results of your data, ask yourself these key questions:
- Does the data answer your original question? How?
- Does the data help you defend against any objections? How?
- Are there any limitations on your conclusions, any angles you haven’t considered?
If your interpretation of the data holds up under all of these questions and considerations, then you likely have come to a productive conclusion. The only remaining step is to use the results of your data analysis process to decide your best course of action.
By following these four steps in your data analysis process, you can make better decisions for your business.
Check out our other articles on Python libraries for Data Science:
Top 4 Python Libraries for Data Science in 2018
Data Science – First Step with Python and Pandas (Read CSV File)
I hope this post helped you understand what Data Analysis is. To get the latest news and updates, follow us on Twitter and Facebook and subscribe to our YouTube channel. If you have any questions, please let us know using the comment form.