Anomaly detection

Mat

Anyone got any ideas/advice here?

I'll be working with 1,200,000+ samples and a lot of noise.

I'm looking at kNN and LOF (local outlier factor) at the moment, with a view to maybe running both side by side and combining the results. Unfortunately I have no experience with data analysis other than a stats AS level (read as no experience at all).
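For reference, a kNN-style outlier score for a single channel can be sketched in plain Python. This is only a minimal illustration, not a tested solution for the real data: the function name `knn_distance_scores` and the default `k=5` are made up for the example, and LOF itself is not shown. It relies on the fact that, for one-dimensional data, the k nearest neighbours of a value sit within k positions of it once the values are sorted.

```python
def knn_distance_scores(values, k=5):
    """Score each value by the distance to its k-th nearest neighbour (1-D only).

    Larger scores mean the value is more isolated from the rest of the data.
    `values` is a list of floats; `k` is an illustrative default, not a tuned one.
    """
    # Sort positions by value so neighbours in value-space become adjacent.
    order = sorted(range(len(values)), key=values.__getitem__)
    sorted_vals = [values[i] for i in order]
    n = len(sorted_vals)
    scores = [0.0] * n
    for pos, v in enumerate(sorted_vals):
        lo = max(0, pos - k)
        hi = min(n, pos + k + 1)
        # The k nearest neighbours must lie inside this window of the sorted list.
        dists = sorted(abs(v - sorted_vals[j]) for j in range(lo, hi) if j != pos)
        # Map the score back to the original sample position.
        scores[order[pos]] = dists[k - 1] if len(dists) >= k else float("inf")
    return scores
```

Samples with the highest scores are the candidates worth inspecting. Note that this scores values only and ignores the order of the samples.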

Thanks for any info

Matt

Share this post


Link to post
Share on other sites
Mat

Mat, what format is the data?

You don't want to know.

It's a directory of 20+ CSV files, each with 60,000 rows. The columns are a timestamp followed by 4 channels (floats in the range ±10). It's a mess, because I was originally asked to output it in a format that could be opened in Excel.

However, I don't think that's really that important. The timestamps can be ignored (intervals are ~constant), and the channels can be analysed separately, so it's just a question of where the anomalous results are in the sample.
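Reading that layout into one list per channel might look like the sketch below. Several things here are assumptions rather than facts from the thread: the `*.csv` file pattern, that sorting the file names gives the right sample order, that there is no header row unless `has_header=True` is passed, and the function name `load_channels` itself.

```python
import csv
from pathlib import Path

def load_channels(directory, n_channels=4, has_header=False):
    """Read every CSV file in `directory` and return one list of floats per channel.

    Assumes column 0 is the timestamp (ignored) and columns 1..n_channels hold
    the channel values. Files are read in sorted file-name order (an assumption).
    """
    channels = [[] for _ in range(n_channels)]
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            if has_header:
                next(reader, None)  # skip the header row if there is one
            for row in reader:
                if len(row) < n_channels + 1:
                    continue  # skip blank or short rows
                for c in range(n_channels):
                    channels[c].append(float(row[c + 1]))
    return channels
```

Each returned list can then be analysed on its own, since the channels are independent.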

taietel

If the data is in csv format, you can easily transform it to xls.

You can determine the standard deviation, then see what values are far away from the mean.
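As a rough sketch of that idea, the snippet below flags samples that lie more than a chosen number of standard deviations from the mean of the whole channel. The function name `zscore_outliers` and the threshold of 3 are illustrative assumptions, not something stated in the thread.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of samples more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the whole channel
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]
```

This treats the whole channel as one population, so it only works well when the signal sits around a single level.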

Mat

The problem is that it's not a case of there being one level and a few anomalies. I should have added that at the beginning.

The level changes over time: it might start at 5 (±1), step up to 7 (±1), then drop to 2 (±1), and it's those changes that are important as well as the anomalies.
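One way to handle a signal like that is to compare each sample with the median of its neighbours (for single-sample anomalies) and to compare the median just before a point with the median just after it (for steps). The sketch below is only an illustration under assumed settings: the function names, window sizes and thresholds are all made up for the example and would need tuning against the real noise level. It is also unoptimised pure Python.

```python
from statistics import median

def find_spikes(values, half_window=5, threshold=3.0):
    """Indices of samples that differ from the median of their neighbours by more than `threshold`."""
    n = len(values)
    spikes = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]  # `values` must be a list
        if neighbours and abs(values[i] - median(neighbours)) > threshold:
            spikes.append(i)
    return spikes

def find_level_shifts(values, window=10, min_step=1.5):
    """Approximate indices where the local median changes by more than `min_step`."""
    shifts = []
    run_start = run_end = None
    for i in range(window, len(values) - window + 1):
        before = median(values[i - window:i])
        after = median(values[i:i + window])
        if abs(after - before) > min_step:
            if run_start is None:
                run_start = i
            run_end = i
        elif run_start is not None:
            # A step shows up as a run of consecutive hits; report its midpoint.
            shifts.append((run_start + run_end) // 2)
            run_start = None
    if run_start is not None:
        shifts.append((run_start + run_end) // 2)
    return shifts
```

Because medians are used, a single wild sample does not register as a step, and a step does not register as a spike as long as it is smaller than the spike threshold.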

czardas

Thinking about this in my rule of thumb way of going about things: if there is a large amount of data and a large number of anomalies to watch out for, then I would think about taking smaller samples to look for a wider range of anomalies. Then if you find something that might be a candidate for deeper scrutiny, take larger samples and test them for similar anomalies. A kind of dead reckoning statistics. I don't know how relevant such an approach might be.

Mat

Again, it's a single-sample anomaly in 100,000 that I'm looking for.

Grrr.... Guess there's no miracle cure then. One day someone's going to reply and say: Oh yes, someone wrote exactly what you wanted >here< and I won't need to do it myself :graduated: I'm still waiting for that day, like a monkey at a typewriter.

taietel

Find a way to graphically represent the data, then look for a change in the pattern. The human brain can spot the difference a lot faster than any written code.
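With over a million samples per plot, the data usually has to be thinned before it can be drawn. A possible way to do that without losing single-sample anomalies is to keep the minimum and maximum of each block of samples, as in this sketch. The function name `minmax_decimate` and the bucket size are assumptions for illustration, and the actual plotting is left to whatever charting tool is at hand.

```python
def minmax_decimate(values, bucket_size=1000):
    """Return one (min, max) pair per bucket so single-sample spikes survive downsampling."""
    envelope = []
    for start in range(0, len(values), bucket_size):
        bucket = values[start:start + bucket_size]
        envelope.append((min(bucket), max(bucket)))
    return envelope
```

Plotting the two envelope lines gives a picture a few thousand points wide in which a lone outlier still shows up as a visible excursion.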

MvGulik

"look" ... :graduated:

I think you're ignoring some of the trailing zeros in the numbers Mat posted.

