Abstract. Machine learning-based malware detection is a promising, scalable method for identifying suspicious applications. In particular, in today's mobile computing realm, where thousands of applications are released into markets every day, such a technique could be valuable for ensuring that malicious apps are reliably filtered out. The success of machine-learning approaches, however, is highly dependent on (1) the quality of the datasets used for training and (2) the appropriateness of the test datasets with regard to the built classifiers. Unfortunately, these aspects are scarcely mentioned in the evaluations of existing state-of-the-art approaches in the literature. In this paper, we consider the relevance of history in the construction of datasets and highlight its impact on the performance of the malware detection scheme. Typically, we show that simply picking a random set of known …