Mining high utility itemsets with positive and negative utility values

Singh, Kuldeep

IDR Home
→
Thesis
→
Department of Computer Science and Engineering
→
View Item

Mining high utility itemsets with positive and negative utility values

Singh, Kuldeep

URI: http://localhost:8080/xmlui/handle/123456789/254

Date: 2018

Abstract:

High utility itemsets (HUIs) mining is one of the emerging topics in the data mining. Compared to the frequent itemsets mining, the utility mining provides more informative and actionable information. High utility itemset mining discovers itemsets whose utility is not less than user defined minimum utility threshold. Several studies have been carried out on this topic, which often discovers a very large number of itemsets and rules, which reduces not only the efficiency but also the effectiveness of HUI mining. In order to increase the efficiency and discover more interesting HUIs, constraint-based mining plays an important role. To address this issue, we propose an algorithm to discover HUIs with length constraints named EHIL (Efficient High utility Itemsets with Length constraints) to decrease the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bound for pruning and modify them by incorporating length constraints. To reduce the dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. To speed up the utility counting, an array-based technique is utilized. The execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. The memory usage is up to twenty eight times less than state-of-the-art algorithm FHM+. The experimental results show that the proposed algorithm achieves high performance. The existing high utility itemsets mining algorithms can discover all the itemsets satisfying a given minimum utility, it is often difficult for users to set a proper minimum utility threshold. A smaller minimum utility threshold value may produce a huge number of itemsets, whereas a higher one may produce a few itemsets. Specification of minimum utility threshold is difficult and time-consuming. To address these issues, top-k high utility itemsets mining has been defined where k is the number of high utility itemsets to be found. In this paper, we present an efficient algorithm (named TKEH) for finding top-k high utility itemsets. TKEH utilizes transaction merging and dataset projection techniques to reduce the dataset scanning cost. These techniques reduce the dataset when larger items are explored. TKEH employs three minimum utility threshold raising strategies. We adopt EUCP and sup strategies to prune search space efficiently. We carried out some extensive experiments on real datasets. The results show that TKEH outperforms the state-of-the-art algorithm. Moreover, TKEH always performs better for dense datasets. Most of the high utility itemsets mining algorithms work only for itemsets with positive utility values. However, in the real-world, items are found with both the positive and the negative utility values. To address this issue, we propose an algorithm named EHIN (Efficient High-utility Itemsets Mining with Negative Utility) to find all HUIs with negative utility. EHIN adopts two new upper bound named Revised sub-tree and Revised local utility for pruning. To reduce the dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. EHIN utilizes various properties and pruning strategies to mine the HUIs with the negative utility. The experimental results show that the proposed algorithm is 28 times faster and it consumes up-to 10 times less memory than the state-of-the-art algorithm FHN. Moreover, a key advantage is that it always performs better for dense datasets. High utility itemsets mining algorithms incur with the problem of generating a large number of candidate itemsets and most of the generated itemsets are tiny in size which degrade mining performance and action-ability. Apart from these problems, most of the algorithms work only with positive utility value. In order to overcome these issues, we propose an algorithm named EHNL(Efficient High utility itemsets mining algorithm with Negative utility and Length constraints). Although negative utility and constraint-based mining are commonly seen in real-world applications, mining HUIs with negative utility and length constraints have not yet been proposed in literature. Most of the algorithms suffer from the problem of repeatedly dataset scanning. To reduce the scanning cost, dataset projection and transaction merging techniques are utilized. Toenhance the performance of proposed algorithm, we utilize sub-tree based pruning technique. In order to check the efficiency of utilized techniques, the variation of the proposed algorithms named EHNL(RSUP) and EHNL(TM) is introduced. The experimental results show that variants of algorithm mine the high utility itemsets efficiently. Moreover, the proposed variant algorithms also reduce memory requirement. Traditional high utility itemsets mining algorithms mine a large number of itemsets, but most of the mined itemsets are redundant. In order to overcome this issue, closed high utility itemsets mining algorithms have been proposed which avoids duplicate itemsets. However, closed high utility itemsets mining algorithms work only with positive utility value. Mining closed high utility itemsets mining algorithm with negative utility has not yet been proposed, although negative utility mining is commonly seen in the real-world applications. To address this issue, we propose an efficient algorithm named CHN (Closed High utility itemsets mining with Negative utility). To our best knowledge, this is first work addressing the issue of closed high utility itemsets with negative utility value. It relies on pattern-growth approach. It utilizes transaction merging and dataset projection techniques to reduce the dataset scanning cost. In order to enhance the performance of proposed algorithm, we edit and utilize sub-tree based pruning technique. Moreover, an array-based utility counting technique is also utilized to speed up the utility counting process. A bi-directional extension technique is also adopted to closure the closure and prune the search space. In order to check the efficiency of utilized techniques, variation of the proposed algorithms named CHN(RSU-Prune) and CHN(TM) are introduced. The experimental results on dense and sparse datasets show that the variants of algorithms mine the closed high utility itemsets efficiently.

Show full item record

Files in this item

Name: Title page.pdf

Size: 32.16Kb

Format: PDF

Description: Title page

Mining high utility itemsets with positive and negative utility values

Mining high utility itemsets with positive and negative utility values

Abstract:

Files in this item

This item appears in the following Collection(s)

Search in IDR

Browse

All of IDR

This Collection

My Account