Network data stream simulation with time range LDA pattern mining

Closed. Posted 2 years ago. Paid on delivery.

This project involves the simulation of a SIEM system using Latent Dirichlet Allocation for IoT device streams. It can be implemented in R, Python, C++ or any relevant language that achieves the outcome.

Workflow

Input config > random & pattern generated content streams > stream chunks > LDA parser > output pattern frequency & topics per stream

Data Generation

Input config > random & pattern generated content streams

The generator should be configurable and able to create simulated network data streams. Each stream emits random filler content interleaved with pattern-generated content, as specified by the config file. The config provides:

1. stream information

2. string and regex patterns to include in the stream (generator fills the regex with matching values)

3. occurrence frequency (range 0 to 10): the number of times each generated string or regex pattern is included in the stream per minute. The scheduling does not have to be sophisticated; the frequencies only need to produce noticeably different rates.
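One way to approach item 2 without a general regex-inversion library is to pair each labelled field with a small hand-written sampler and validate every generated value against the configured regex. The sketch below does this for three fields from the example config. The names `FIELD_SAMPLERS`, `sample_field`, and `render_pattern` are hypothetical, and the samplers deliberately cover only a safe subset of what each regex allows.

```python
import random
import re
import string

# Field regexes copied from the example config (IP_EXT is unanchored there).
FIELD_REGEX = {
    "IP_EXT": r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
    "MSG": r'^#[^ !@#$%^&*(),.?":{}|<>]*$',
    "USER": r"^[a-z0-9_-]{3,15}$",
}


def _rand_text(chars, low, high, rng):
    """Random string of length low..high drawn from chars."""
    return "".join(rng.choice(chars) for _ in range(rng.randint(low, high)))


# Hypothetical hand-written samplers, one per field label.
FIELD_SAMPLERS = {
    "IP_EXT": lambda rng: ".".join(str(rng.randint(0, 255)) for _ in range(4)),
    "MSG": lambda rng: "#" + _rand_text(string.ascii_letters + string.digits, 1, 8, rng),
    "USER": lambda rng: _rand_text(string.ascii_lowercase + string.digits + "_-", 3, 15, rng),
}


def sample_field(name, rng):
    """Generate a value for one field and check it against the config regex."""
    value = FIELD_SAMPLERS[name](rng)
    if re.fullmatch(FIELD_REGEX[name], value) is None:
        raise ValueError(f"sampler for {name} produced a non-matching value: {value!r}")
    return value


def render_pattern(fields, rng):
    """Render 'LABEL: value LABEL: value ...' for the given field labels."""
    return " ".join(f"{name}: {sample_field(name, rng)}" for name in fields)
```

Calling `render_pattern(["IP_EXT", "MSG", "USER"], random.Random())` yields one line in the same `LABEL: value` layout as the sample stream chunk. A library that generates strings from arbitrary regexes could replace the samplers if broader coverage is needed.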

The generator can be started and stopped.
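A start/stop generator with per-minute frequencies could be sketched with a background thread and a `threading.Event`. The class name `StreamGenerator`, the `emit` callback, the `noise_per_minute` setting, and the `minute_seconds` knob (which lets a demo shorten the simulated minute) are all assumptions made for illustration; the brief does not prescribe any of them.

```python
import random
import threading
import time


class StreamGenerator:
    """Emits each pattern `frequency` times per simulated minute, plus noise lines."""

    def __init__(self, patterns, emit, minute_seconds=60.0, noise_per_minute=3, seed=None):
        self.patterns = patterns              # list of (text, frequency per minute)
        self.emit = emit                      # callback receiving one line of text
        self.minute_seconds = minute_seconds
        self.noise_per_minute = noise_per_minute
        self.rng = random.Random(seed)
        self._stop_event = threading.Event()
        self._thread = None

    def _plan_minute(self):
        """Return (offset_seconds, text) events for one minute, sorted by offset."""
        events = []
        for text, frequency in self.patterns:
            for _ in range(frequency):
                events.append((self.rng.uniform(0, self.minute_seconds), text))
        for _ in range(self.noise_per_minute):
            events.append((self.rng.uniform(0, self.minute_seconds), "lorem ipsum"))
        return sorted(events)

    def _run(self):
        while not self._stop_event.is_set():
            start = time.monotonic()
            for offset, text in self._plan_minute():
                delay = start + offset - time.monotonic()
                # wait() returns True as soon as stop() is called.
                if self._stop_event.wait(max(0.0, delay)):
                    return
                self.emit(text)
            remaining = start + self.minute_seconds - time.monotonic()
            self._stop_event.wait(max(0.0, remaining))

    def start(self):
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
```

For example, `StreamGenerator([("A", 2), ("B", 5)], print)` would print "A" twice and "B" five times per minute at random offsets until `stop()` is called. Spreading events uniformly within the minute is only one simple choice that keeps the rates visibly different.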

Example input configuration for 2 streams in JSON format (the pattern entries are grouped under a "patterns" key, an assumed name, so that the structure is valid JSON).

/ input/[login to view URL]

[
  {
    "name": "endpoint1",
    "ip": [login to view URL],
    "port": 345,
    "patterns": [
      {
        "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
        "frequency": 2
      },
      {
        "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
        "frequency": 5
      }
    ]
  },
  {
    "name": "syslog1",
    "ip": [login to view URL],
    "port": 534,
    "patterns": [
      {
        "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
        "frequency": 2
      },
      {
        "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
        "frequency": 5
      }
    ]
  }
]
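A config along these lines could be loaded and validated as in the sketch below. It assumes the file has been normalised into valid JSON with a top-level array and the pattern entries under a hypothetical "patterns" key. The IP address in the sample is a documentation-range placeholder because the real value is not shown in the posting, and the `PatternSpec`, `StreamSpec`, and `load_config` names are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass
class PatternSpec:
    pattern: str      # "LABEL: regex LABEL: regex ..." as written in the config
    frequency: int    # occurrences per minute, 0..10


@dataclass
class StreamSpec:
    name: str
    ip: str
    port: int
    patterns: list


def load_config(text):
    """Parse the JSON config text and reject frequencies outside 0..10."""
    streams = []
    for entry in json.loads(text):
        specs = []
        for item in entry["patterns"]:
            frequency = int(item["frequency"])
            if not 0 <= frequency <= 10:
                raise ValueError(f"{entry['name']}: frequency {frequency} is outside 0..10")
            specs.append(PatternSpec(item["pattern"], frequency))
        streams.append(StreamSpec(entry["name"], entry["ip"], int(entry["port"]), specs))
    return streams


# Assumed sample; "192.0.2.10" is a placeholder, not the address from the posting.
SAMPLE = r'''
[
  {"name": "endpoint1", "ip": "192.0.2.10", "port": 345,
   "patterns": [
     {"pattern": "MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$", "frequency": 2},
     {"pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$", "frequency": 5}
   ]}
]
'''
```

Note that a double quote inside a pattern has to be written as `\"` in JSON, and a regex backslash such as `\.` has to be doubled, otherwise the file will not parse.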

Sample stream chunk.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod eros a lectus porttitor, vitae aliquet magna ullamcorper. Praesent in enim non magna vehicula faucibus. Vestibulum lacinia velit ut dolor aliquet tincidunt. IP_EXT: [login to view URL] MSG: #abyx USER: das-dkjh Ut consectetur hendrerit massa vel tempus. Nulla sit amet libero id felis lacinia accumsan. PAYLOAD: ABC_aS57dasd USR: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum porta magna sed lectus venenatis. Aliquam accumsan molestie augue, sit lectus amet vulputate metus tristique et. Ut a lectus erat elit….
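The regexes in the config carry `^` and `$` anchors, which only match a whole string, so they cannot be applied directly to a running chunk like the one above. One possible approach, sketched here, is to split each config pattern into `LABEL: regex` pairs, drop the anchors, and search for the label followed by a value that ends at whitespace. The helper names `split_pattern` and `find_matches` are hypothetical, and the label-splitting rule (uppercase word followed by a colon and space) is an assumption that happens to fit the example config.

```python
import re
from collections import Counter

# Assumption: labels are uppercase words (underscores allowed) followed by ": ".
LABEL = re.compile(r"\b([A-Z][A-Z_]*): ")


def split_pattern(pattern):
    """Split 'LABEL: regex LABEL: regex' into [(label, unanchored_regex), ...]."""
    parts = LABEL.split(pattern)[1:]          # drop any text before the first label
    fields = []
    for label, body in zip(parts[0::2], parts[1::2]):
        body = body.strip().strip("'")        # the IP_EXT regex is quoted in the config
        if body.startswith("^"):
            body = body[1:]
        if body.endswith("$"):
            body = body[:-1]
        fields.append((label, body))
    return fields


def find_matches(chunk, patterns):
    """Return [(label, value), ...] for every field occurrence found in the chunk."""
    found = []
    for pattern in patterns:
        for label, body in split_pattern(pattern):
            # (?!\S) stands in for the removed '$': the value must end at whitespace.
            rx = re.compile(r"\b" + re.escape(label) + ": (" + body + r")(?!\S)")
            found.extend((label, m.group(1)) for m in rx.finditer(chunk))
    return found


def count_matches(found):
    """Count occurrences per label for the pattern-count log."""
    return Counter(label for label, _ in found)
```

Applied to a chunk containing `MSG: #abyx USER: das-dkjh`, `find_matches` would report the pairs `("MSG", "#abyx")` and `("USER", "das-dkjh")`, and `count_matches` gives the per-label totals for the time span.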

Regex specifications from

[login to view URL]

[login to view URL]

[login to view URL]

Stream Parser

stream chunks > LDA parser > output pattern frequency & topics per stream

The streams are read by a parser application, which reads each input stream for a configurable span of time (e.g. 30 seconds) and treats what it reads as one input chunk. You must use a Latent Dirichlet Allocation package or method to analyze the data and create or append to 3 log files per stream. Each run writes to a new output folder named with the timestamp of when the run began. The 3 log files are:
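The time-span chunking and the per-run output folder could look like the following sketch. The function names, the `(timestamp, text)` event shape, the folder timestamp format, and the `<stream>_<kind>.log` file naming are all assumptions chosen for illustration.

```python
from datetime import datetime
from pathlib import Path


def chunk_by_time(events, span_seconds=30.0):
    """Group (timestamp, text) events, sorted by timestamp, into fixed-width windows.

    Yields (window_start, [text, ...]); windows with no events are skipped.
    """
    window_start, bucket = None, []
    for ts, text in events:
        if window_start is None:
            window_start = ts
        # Advance the window until the event falls inside it.
        while ts >= window_start + span_seconds:
            if bucket:
                yield window_start, bucket
            window_start, bucket = window_start + span_seconds, []
        bucket.append(text)
    if bucket:
        yield window_start, bucket


def make_run_dir(base, started=None):
    """Create a new output folder named after the time the run began."""
    started = started or datetime.now()
    run_dir = Path(base) / started.strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def append_log(run_dir, stream, kind, line):
    """Append one line to <stream>_<kind>.log (kind: 'matches', 'counts' or 'topics')."""
    path = Path(run_dir) / f"{stream}_{kind}.log"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return path
```

With a 30-second span, events at 0 s and 10 s land in the first chunk and an event at 31 s opens the second. Opening the log files in append mode lets successive chunks of the same run add to the same three files per stream.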

1. a log of the matching patterns found (use the input config file to identify patterns),

2. a log of the count of each pattern within that time span, and

3. a log of up to 10 of the highest-frequency single-string terms (LDA topics, occurrence > 1 and not part of the regex patterns?)
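For item 3, an existing LDA package would normally do the work. The dependency-free sketch below instead implements a small collapsed Gibbs sampler so that the mechanics are visible. The function names, the hyperparameter defaults (`alpha`, `beta`, `iterations`), the simple tokenizer, and the choice to report up to 10 terms per topic are all assumptions; treating each chunk or line as one document is only one possible reading of the brief.

```python
import random
from collections import Counter


def tokenize(text, exclude=()):
    """Lowercase whitespace tokens, trimmed of punctuation, minus excluded strings."""
    skip = {item.lower() for item in exclude}
    tokens = (tok.strip(".,;:!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in skip]


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists.

    Returns one Counter per topic mapping word -> number of tokens assigned.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assign = []                                          # topic of every token

    def update(d, word, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][word] += delta
        topic_total[k] += delta

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for word, k in zip(doc, assign[d]):
            update(d, word, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                update(d, word, assign[d][i], -1)        # remove the token's own count
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                assign[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, word, assign[d][i], +1)
    return topic_word


def top_terms(topic_word, limit=10, min_count=2):
    """Per topic: up to `limit` most frequent terms that occur more than once."""
    return [
        [word for word, count in counts.most_common(limit) if count >= min_count]
        for counts in topic_word
    ]
```

Passing the matched labels and values as `exclude` to `tokenize` keeps pattern content out of the topic terms, which is one way to honor the "not in regex patterns" condition. The sampler is randomized, so the exact split of terms across topics depends on the seed.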

Attached is a research paper related to the field of study. My aim is to replicate the basic stream generation and pattern matching using LDA. This is just a proof of concept, not production code. Good use of comments is always welcome!

Skills: C++ Programming, R Programming Language, Python

Project ID: #30617901

About the project

3 proposals. Remote project. Active 2 years ago.

3 freelancers submitted an average bid of $92 for this job

StatisticandArt

Hi, I graduated Bachelor of Statistics. I have experience using R, IBM SPSS, IBM Amos, IBM Modeler, and Tableau because that application have been learned when i was college. I am also a specialist in Basic Statistica...

$100 USD in 5 days
(10 reviews)
3.1
ArtemStakheev

Hello! This is Artem from Russia who has been working as an Desktop App developer for the last 6 years. I have checked the project description and I think that I can help you to do this project. I am fully feeling co...

$100 USD in 7 days
(1 review)
0.4
dhruvradadiya111

the reason why something is done or used : the aim or intention of something. : the feeling of being determined to do or achieve something. : the aim or goal of a person : what a person is trying to do, become, etc.

$75 USD in 7 days
(0 reviews)
0.0