Before doing data mining, you have to generate data. What data? All of it. You need to capture as much data as possible about your business, your processes, your customers, and so on. Generating so much data comes at a cost, of course, but keep in mind that the data you don't have is costing you revenue and potential efficiency gains. To generate value, however, that data must be traceable.
You have to generate a lot of data, but not just any way. Data has value only if it provides additional information. A test result of 87% is a data point, but without context it is useless. To generate value, data must be 100% traceable. Traceability rests on two key concepts: context and relationship.
The context gives us information about the measurement: for example, 87% is a measure of charging efficiency. The relationship gives us the link between this measurement and the others: for example, 87% is a measurement taken on the battery with unique identifier 0x12_0x34_0x567. The more specific and detailed the context and relationships, the better the traceability.
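As a minimal sketch of what a traceable record could look like, the data point below pairs the raw value with its context (what, in which unit, when) and its relationship (which asset it belongs to). The field names are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Measurement:
    value: float        # the raw number, e.g. 0.87
    quantity: str       # context: what was measured
    unit: str           # context: how the value is expressed
    taken_at: datetime  # context: when it was measured
    subject_id: str     # relationship: which asset produced it

# The 87% example from the text, fully traceable (timestamp is hypothetical):
m = Measurement(
    value=0.87,
    quantity="charging efficiency",
    unit="ratio",
    taken_at=datetime(2023, 5, 1, 14, 30, tzinfo=timezone.utc),
    subject_id="0x12_0x34_0x567",
)
```

Stored this way, the value 0.87 can always be traced back to a specific quantity, moment, and battery.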
If your business still relies heavily on paper, an Industry 4.0 digitization plan may be necessary first. If your company already has digitized records scattered across a maze of Excel files, CSV files, and text files with names like "test_2005_12_25.txt", then improving traceability will take some time, and an Industry 4.0 plan may, once again, be necessary to avoid continuing to generate data in this form.
Ready for data mining
The goal of data mining is not to generate more data. The French translation of the term is even more precise: it really is about mining the data. The goal is to find correlations between the different data in order to predict trends, guide choices, make decisions, calculate a metric, and so on.
Data mining is often associated with artificial intelligence, and sometimes that is the only solution. However, it is not always necessary or beneficial to develop machine learning models for data mining. A data miner with only a hammer in their toolbox ends up treating every problem like a nail. At Innovation Codotek, we offer you the appropriate methods.
Knowledge of the field
Data is often analyzed in a specific context. Knowledge of the application domain and of the physics and science behind the data is then very useful.
Before extracting meaning from the data, it must be prepared. Even if data is 100% traceable, anomalies may be present. A power failure, a human error, a faulty device: all of these events corrupt the data. Certain records must then be corrected, repaired, or eliminated.
You may also want to preprocess the data to eliminate trivial trends, low-frequency variation, noise, etc.
Selecting data can be a task that requires different types of algorithms, statistical tools, or artificial intelligence. It can be easy, but it can also be the heart of the matter. We can think here of correlation algorithms, association rule learning, statistical selection, or signal processing on analytical data.
Aggregation of data
Once the right data has been selected, it is useful to group it to highlight dependencies. We can think here of an algorithm such as a PCA model. The relationships between those dependencies can then be used to classify new data or to derive useful values from it, whether the result is the optimal color of a lighting system or the correction to apply to an interferometric spectrum.
Once the data has been mined and the desired metrics have emerged, they must be put to use. We can then record these metrics, feed them into business intelligence (BI) software, or develop a tailored solution.
Completing the job correctly
Before finishing, make sure to validate the results. To do this, new data is needed. We also make sure that the company has full visibility to continue using its data in our absence. Do you have in-house programmers? No problem! Our code is written following best practices so that any programmer can understand it.