mldr.datasets and the Ultimate Multilabel Dataset Repository
Until now, multilabel datasets have been provided in different file formats for different pieces of software. mldr was created with compatibility in mind and can read two widely known formats: those used by the Mulan and MEKA repositories, both based on ARFF.
With the creation of the Ultimate Multilabel Dataset Repository (RUMDR) and a new R package, mldr.datasets, a huge set of multilabel datasets is now available in a common format, with the possibility of converting them into many others.
Note: mldr.datasets does not depend on mldr, but it’s useful to have both of them installed to access all functionality.
After installing and loading the package, some pre-loaded datasets will be available directly in the environment.
These are accessible via their names and the usual members of "mldr" objects ($datasets…). Additionally, a toBibtex() method is provided for fast access to the citation information for each dataset.
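As a minimal sketch of the points above, the following R snippet loads the package and inspects one dataset. It assumes that emotions is among the pre-loaded datasets and that its measures member exposes num.labels, as "mldr" objects usually do.

```r
library(mldr.datasets)

# Assumption: `emotions` is one of the datasets pre-loaded with the package
class(emotions)

# Usual members of an "mldr" object: summary measures and label information
emotions$measures$num.labels
head(emotions$labels)

# Citation information for the dataset, in BibTeX format
toBibtex(emotions)
```

Any other pre-loaded dataset can be inspected the same way by replacing the name.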
Larger datasets are available to download from the repository (you can consult the complete list of datasets or call mldrs()).
The package's k-fold functions, among them stratified.kfolds(), partition multilabel datasets following either a random strategy or a stratified one.
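The partitioning step might look like the sketch below. It is only an illustration: it assumes that emotions is pre-loaded, that the number of folds is passed as k, and that each fold holds a training and a test partition.

```r
library(mldr.datasets)

# Assumption: `emotions` is pre-loaded; `k` is the number of folds
folds <- stratified.kfolds(emotions, k = 5)

class(folds)
length(folds)   # one element per fold

# Assumption: each fold holds a training and a test partition
first <- folds[[1]]
names(first)
```

Each partition can then be passed to a classifier or exported like any other dataset.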
The write.mldr() function can export datasets, including "mldr.folds" objects, into several file formats: Mulan, MEKA, KEEL, LibSVM and CSV. This way, regular, partitioned and preprocessed datasets can be saved for later use in any well-known multilabel classification tool.
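An export could be sketched as follows. The argument names format and basename, the upper-case format name and the use of a path inside basename are assumptions to check against the package documentation. The output directory and file prefix are hypothetical.

```r
library(mldr.datasets)

# Write into a temporary directory so the working directory stays clean
out_dir <- file.path(tempdir(), "mldr_export")
dir.create(out_dir, showWarnings = FALSE)

# Assumption: `format` takes format names in upper case and `basename`
# sets the prefix (here including a path) of the generated files
write.mldr(emotions, format = "MULAN",
           basename = file.path(out_dir, "emotions_copy"))

# Inspect which files were generated
list.files(out_dir)
```

Passing several format names in a vector should produce one set of files per format.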
We’ve updated mldr to integrate functionality from mldr.datasets when it’s installed. Calling mldr() with just a dataset name now triggers a search among the datasets in the repository. If the dataset isn’t found there, the function attempts to read it locally (this behavior can also be forced).
Other changes in this update include exposing the read.arff() function, which reads ARFF files and differentiates input and output features without calculating any related measure, as suggested in issue 26; several fixes related to dataset reading and the calculation of measures; and slight changes in some calculations. For detailed information, visit our changelog or the commit history.