Abstract:
High utility itemsets (HUIs) mining is one of the emerging topics in the data mining.
Compared to the frequent itemsets mining, the utility mining provides more informative
and actionable information. High utility itemset mining discovers itemsets whose utility
is not less than user defined minimum utility threshold. Several studies have been carried
out on this topic, which often discovers a very large number of itemsets and rules, which
reduces not only the efficiency but also the effectiveness of HUI mining. In order to
increase the efficiency and discover more interesting HUIs, constraint-based mining
plays an important role. To address this issue, we propose an algorithm to discover HUIs
with length constraints named EHIL (Efficient High utility Itemsets with Length
constraints) to decrease the number of HUIs by removing tiny itemsets. EHIL adopts two
new upper bound for pruning and modify them by incorporating length constraints. To
reduce the dataset scans, the proposed algorithm uses transaction merging and dataset
projection techniques. To speed up the utility counting, an array-based technique is
utilized. The execution time improvements ranged from a modest five percent to two
orders of magnitude across benchmark datasets. The memory usage is up to twenty eight
times less than state-of-the-art algorithm FHM+. The experimental results show that the
proposed algorithm achieves high performance.
The existing high utility itemsets mining algorithms can discover all the itemsets
satisfying a given minimum utility, it is often difficult for users to set a proper minimum
utility threshold. A smaller minimum utility threshold value may produce a huge number
of itemsets, whereas a higher one may produce a few itemsets. Specification of minimum
utility threshold is difficult and time-consuming. To address these issues, top-k high
utility itemsets mining has been defined where k is the number of high utility itemsets to
be found. In this paper, we present an efficient algorithm (named TKEH) for finding
top-k high utility itemsets. TKEH utilizes transaction merging and dataset projection techniques to reduce the dataset scanning cost. These techniques reduce the dataset when
larger items are explored. TKEH employs three minimum utility threshold raising
strategies. We adopt EUCP and sup strategies to prune search space efficiently. We
carried out some extensive experiments on real datasets. The results show that TKEH
outperforms the state-of-the-art algorithm. Moreover, TKEH always performs better for
dense datasets.
Most of the high utility itemsets mining algorithms work only for itemsets with positive
utility values. However, in the real-world, items are found with both the positive and the
negative utility values. To address this issue, we propose an algorithm named EHIN
(Efficient High-utility Itemsets Mining with Negative Utility) to find all HUIs with
negative utility. EHIN adopts two new upper bound named Revised sub-tree and Revised
local utility for pruning. To reduce the dataset scans, the proposed algorithm uses
transaction merging and dataset projection techniques. EHIN utilizes various properties
and pruning strategies to mine the HUIs with the negative utility. The experimental
results show that the proposed algorithm is 28 times faster and it consumes up-to 10
times less memory than the state-of-the-art algorithm FHN. Moreover, a key advantage is
that it always performs better for dense datasets.
High utility itemsets mining algorithms incur with the problem of generating a large
number of candidate itemsets and most of the generated itemsets are tiny in size which
degrade mining performance and action-ability. Apart from these problems, most of the
algorithms work only with positive utility value. In order to overcome these issues, we
propose an algorithm named EHNL(Efficient High utility itemsets mining algorithm
with Negative utility and Length constraints). Although negative utility and
constraint-based mining are commonly seen in real-world applications, mining HUIs
with negative utility and length constraints have not yet been proposed in literature. Most
of the algorithms suffer from the problem of repeatedly dataset scanning. To reduce the
scanning cost, dataset projection and transaction merging techniques are utilized. Toenhance the performance of proposed algorithm, we utilize sub-tree based pruning
technique. In order to check the efficiency of utilized techniques, the variation of the
proposed algorithms named EHNL(RSUP) and EHNL(TM) is introduced. The
experimental results show that variants of algorithm mine the high utility itemsets
efficiently. Moreover, the proposed variant algorithms also reduce memory requirement.
Traditional high utility itemsets mining algorithms mine a large number of itemsets, but
most of the mined itemsets are redundant. In order to overcome this issue, closed high
utility itemsets mining algorithms have been proposed which avoids duplicate itemsets.
However, closed high utility itemsets mining algorithms work only with positive utility
value. Mining closed high utility itemsets mining algorithm with negative utility has not
yet been proposed, although negative utility mining is commonly seen in the real-world
applications. To address this issue, we propose an efficient algorithm named CHN (Closed
High utility itemsets mining with Negative utility). To our best knowledge, this is first
work addressing the issue of closed high utility itemsets with negative utility value. It
relies on pattern-growth approach. It utilizes transaction merging and dataset projection
techniques to reduce the dataset scanning cost. In order to enhance the performance of
proposed algorithm, we edit and utilize sub-tree based pruning technique. Moreover, an
array-based utility counting technique is also utilized to speed up the utility counting
process. A bi-directional extension technique is also adopted to closure the closure and
prune the search space. In order to check the efficiency of utilized techniques, variation
of the proposed algorithms named CHN(RSU-Prune) and CHN(TM) are introduced. The
experimental results on dense and sparse datasets show that the variants of algorithms
mine the closed high utility itemsets efficiently.