Anomaly detection


Mat

Anyone got any ideas/advice here?

I'll be working with 1,200,000+ samples and a lot of noise.

I'm looking at k-NN and LOF (local outlier factor) atm, with a view to maybe using both side by side and combining the results. Unfortunately I have no experience with data analysis other than a stats AS level (read as: no experience at all).

Thanks for any info

Matt
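
For anyone reading along, here is a minimal sketch of what "k-NN and LOF side by side" could look like on a single channel. It is not taken from any library: the function names `nearest_neighbours` and `knn_and_lof`, the default `k=5`, and the `1e-12` guard against duplicate values are all assumptions made for illustration. It treats a channel as a plain list of values (no time context), and it simplifies textbook LOF by always using exactly k neighbours, even when there are distance ties. Because the data is one-dimensional, sorting once lets each point find its neighbours by walking outwards, which avoids comparing every pair of points.

```python
from math import inf


def nearest_neighbours(vals, i, k):
    """Return k (distance, index) pairs nearest to vals[i]; vals must be sorted."""
    lo, hi = i - 1, i + 1
    found = []
    while len(found) < k:
        d_left = vals[i] - vals[lo] if lo >= 0 else inf
        d_right = vals[hi] - vals[i] if hi < len(vals) else inf
        if d_left <= d_right:
            found.append((d_left, lo))
            lo -= 1
        else:
            found.append((d_right, hi))
            hi += 1
    return found  # ascending distance, so the last entry is the k-distance


def knn_and_lof(values, k=5):
    """Return (knn_scores, lof_scores), both aligned with the input order."""
    n = len(values)
    if n <= k:
        raise ValueError("need more than k samples")
    order = sorted(range(n), key=values.__getitem__)
    vals = [values[j] for j in order]

    neigh = [nearest_neighbours(vals, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # k-NN score: distance to k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for nb in neigh:
        mean_reach = sum(max(k_dist[j], d) for d, j in nb) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard: duplicates give 0

    # LOF: mean density of the neighbours relative to the point's own density.
    lof_sorted = [
        sum(lrd[j] for _, j in nb) / (k * lrd[i]) for i, nb in enumerate(neigh)
    ]

    knn_scores, lof_scores = [0.0] * n, [0.0] * n
    for pos, original in enumerate(order):
        knn_scores[original] = k_dist[pos]
        lof_scores[original] = lof_sorted[pos]
    return knn_scores, lof_scores
```

Both scores are "bigger means more unusual": LOF values near 1 mean a point sits in a region about as dense as its neighbours' regions. One way to combine them would be to flag only the samples that rank near the top on both lists, but what counts as "near the top" would have to be tuned on the real data.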


Mat, what format is the data?

You don't want to know.

It's a directory of 20+ CSV files, each with 60,000 rows. Columns are a timestamp, then 4 channels (floats between ±10). It's a mess. The reason it's a mess is that I was originally asked to output it in a format that could be opened in Excel.

However, I don't think that's really that important. The timestamps can be ignored (intervals are ~constant), and the channels can be analysed separately, so it's just a question of where the anomalous results are in the sample.
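
As a rough sketch of that layout, the files could be read into one list per channel like this. The function name `load_channels`, the `*.csv` pattern, the assumption that sorting file names gives chronological order, and the `has_header` switch are all guesses, since the thread does not say how the files are named or whether they carry a header row.

```python
import csv
from pathlib import Path


def load_channels(directory, n_channels=4, has_header=False):
    """Read every CSV in `directory` and return one list of floats per channel.

    Assumes column 0 is the timestamp (ignored) and columns 1..n_channels
    hold the channel values, as described in the post above.
    """
    channels = [[] for _ in range(n_channels)]
    # Assumption: file names sort into the order the data was recorded.
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            if has_header:
                next(reader, None)  # skip the header row
            for row in reader:
                if not row:
                    continue  # tolerate blank lines
                for c in range(n_channels):
                    channels[c].append(float(row[1 + c]))
    return channels
```

Each returned list can then be analysed on its own, with the sample index standing in for the (roughly constant-interval) timestamp.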


If the data is in CSV format, you can easily convert it to XLS.

You can determine the standard deviation, then see what values are far away from the mean.
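
A minimal sketch of that idea using only the standard library. The name `zscore_outliers` and the default threshold of 4 standard deviations are assumptions, not something anyone in the thread specified. For scale: if the noise were purely Gaussian, a threshold of 3 would already flag about 0.27% of ordinary samples, which is a few thousand out of 1.2 million, so a search for roughly one anomaly in 100,000 probably needs a stricter cut-off.

```python
from statistics import fmean, pstdev


def zscore_outliers(values, threshold=4.0):
    """Return indices of samples more than `threshold` std devs from the mean."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    if std == 0:
        return []  # constant signal: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]
```

Note that the anomalies themselves pull the mean and inflate the standard deviation, so on very noisy data this is best treated as a first pass.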


Thinking about this in my rule of thumb way of going about things: if there is a large amount of data and a large number of anomalies to watch out for, then I would think about taking smaller samples to look for a wider range of anomalies. Then if you find something that might be a candidate for deeper scrutiny, take larger samples and test them for similar anomalies. A kind of dead reckoning statistics. I don't know how relevant such an approach might be.

Edited by czardas

Again, it's a 1-sample anomaly in 100,000 that I'm looking for.

Grrr.... Guess there's no miracle cure then. One day someone's going to reply and say: "Oh yes, someone wrote exactly what you wanted >here<" and I won't need to do it myself :graduated: I'm still waiting for that day, like a monkey at a typewriter.


Find a way to represent the data graphically, then look for a change in the pattern. The human brain can spot the difference a lot faster than any written code.


"look" ... :graduated:

Think you're ignoring some of the trailing zeros in the numbers Mat posted.

Edited by iEvKI3gv9Wrkd41u

