Authors: Kaishi Hirahara, Keiichi Tamura, Hajime Kitakami and Shingo Tamura
Online documents on the Internet can be represented as a document stream because they have a temporal order. This has resulted in numerous studies on extracting frequently occurring phenomena (involving, e.g., keywords, users, and locations), known as bursts. In this paper, we propose a novel method for parallelizing Kleinberg’s burst detection algorithm on a large-scale document stream. Concretely, we propose combining the inter-task and intra-task parallelization models. This combination achieves seamless dynamic load balancing and can detect bursts in a large-scale document stream in memory.
Keywords: document stream; burst detection; parallel processing; dynamic load balancing; text mining