While the open source R programming language is popular for data analysis and statistical modeling, it has been difficult to use with large data sets: R runs as a single thread and holds its working data in memory on one machine, which limits the amount of data that can be analyzed. As a result, data scientists using R often analyze samples from a large data set rather than the entire data set, which may yield less accurate results.
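To make the sampling trade-off concrete, here is a minimal Python sketch that is not drawn from 1010data or R itself. It compares a mean estimated from a random sample with the mean of the full data set, and shows one generic way to process the full set in chunks so it never has to fit in memory at once. The function names, chunk size, and synthetic data are all assumptions made for illustration.

```python
import random
import statistics


def sample_vs_full(values, sample_size, seed=0):
    """Return (full_mean, sample_mean) for a list of numbers.

    The sample is drawn without replacement using a seeded generator,
    so the result is reproducible.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    return statistics.fmean(values), statistics.fmean(sample)


def streaming_mean(chunks):
    """Compute a mean over an iterable of chunks.

    Only one chunk is held at a time, which is the generic idea behind
    analyzing data that is too large for a single in-memory pass.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data in chunks")
    return total / count


def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


if __name__ == "__main__":
    # Synthetic, skewed data (hypothetical; stands in for a large table).
    rng = random.Random(42)
    data = [rng.expovariate(1.0) for _ in range(100_000)]

    full_mean, sample_mean = sample_vs_full(data, sample_size=500)
    print("full mean:     ", full_mean)
    print("sample mean:   ", sample_mean)
    print("streaming mean:", streaming_mean(chunked(data, 10_000)))
```

The sample mean will typically land near the full mean but not on it, and the gap tends to grow as the sample shrinks or the data becomes more skewed. The chunked pass reproduces the full-data answer without loading everything at once.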
1010data, a provider of data discovery software, hopes to address that shortcoming with an R package called R1010 that integrates directly with its Big Data discovery platform. According to the company, the R1010 package provides an interface to use the data and advanced analytics within 1010data directly via the R console.
Modelers who are comfortable in the R environment can apply their models to Big Data by executing queries against data frames that access 1010data tables of any size. The R1010 package also includes functions to establish and manage 1010data sessions and to browse 1010data folders from within the R interactive console. Users can also bring in their favorite CRAN packages and use the full R feature set to perform complex statistical analysis. In addition, the R1010 package includes R Query Interface (RQI) functions, which provide a native R experience for query development.
"Combining 1010data's ability to analyze unlimited volumes of data with the broad set of statistical functions familiar to the R community, enables data scientists to apply sophisticated models to big data with ease," said Sandy Steier, co-founder and CEO of 1010data, in a statement.
Major software and services vendors are signing on with R. Microsoft in January announced plans to acquire Revolution Analytics, which offers analytics software based on R. HP last month introduced an analytics solution powered by a distribution of R that works with its Vertica database.