Getting data in and out of R (Part 4)

Myth 3b: You can’t get data from R into Excel.

You guessed it – getting data from R into Excel works just as easy the other way round via the clipboard (see the previous post). It’s not (yet) possible to select and copy all data from an object in the spreadsheet view, but that’s no problem because there are other convenient solutions.

As we’ve used read.table() to read the clipboard’s contents, we can use the function write.table() and copy data into it. For the sake of compactness I will just give an example for matrices, because it works the same for vectors.

Let’s generate some random data first

> set.seed(1)
> x = data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10),d=letters[1:10])
> x
             a          b          c d
1   1.35867955 -0.1645236  0.3981059 a
2  -0.10278773 -0.2533617 -0.6120264 b
3   0.38767161  0.6969634  0.3411197 c
4  -0.05380504  0.5566632 -1.1293631 d
5  -1.37705956 -0.6887557  1.4330237 e
6  -0.41499456 -0.7074952  1.9803999 f
7  -0.39428995  0.3645820 -0.3672215 g
8  -0.05931340  0.7685329 -1.0441346 h
9   1.10002537 -0.1123462  0.5697196 i
10  0.76317575  0.8811077 -0.1350546 j

When we look at the help for write.table() we see a lot of parameters to tweak:

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"))

The parameter file is set to an empty string ("") by default what tells R to print the output in the console. This is great to try new parameters because you don’t have to paste something to Excel to see the results, so I will not write to "clipboard" for the moment.

> write.table(x,file="")
"a" "b" "c" "d"
"1" 1.35867955152904 -0.164523596253587 0.398105880367068 "a"
[ . . . output omitted . . . ]
"10" 0.763175748457544 0.881107726454215 -0.135054603880824 "j"

Looks good, but wait … the header contains four variables while there are five columns of data! The parameter row.names = TRUE is responsible for this, chances are that you don’t need this vector because there’s some other ID-variable in your data, so let’s drop it.

> write.table(x,file="",row.names=FALSE)
"a" "b" "c" "d"
1.35867955152904 -0.164523596253587 0.398105880367068 "a"
[ . . . output omitted . . . ]
0.763175748457544 0.881107726454215 -0.135054603880824 "j"

Now the number of columns in the header and data match, but Excel will still not be able to read it, because the delimiter is set to " ". We need to change this parameter to tab using sep="\t".

> write.table(x,file="",row.names=FALSE,sep = "\t")
"a"     "b"     "c"     "d"
1.35867955152904        -0.164523596253587      0.398105880367068       "a"
[ . . . output omitted . . . ]
0.763175748457544       0.881107726454215       -0.135054603880824      "j"

At this point we could already send the data to the clipboard (file="clipboard") and paste it into Excel without problems, but for the sake of completeness I will show some other parameters that you might need as well.

If you have missing values these will be copied as "NA" and Excel can’t interpret it, so set na="" to leave these cells empty. If you don’t want to copy variable names you can use col.names=FALSE. Usually you also don’t need strings to be quoted – quote=FALSE. And last but certainly not least some of you will need to change the decimal sign from point to comma dec=",".

Writing your own I/O functions

One seeming restriction of this approach might be the rather long command you need to enter in order to set all parameters the right way, e.g. write.table(x, file="clipboard", row.names=FALSE, sep = "\t", na="", quote=FALSE).
To remedy this just write your own function:

toExcel = function(x,varnames=TRUE){
   write.table(x, file="clipboard", row.names=varnames,
               sep = "\t", na="", quote=FALSE)
}

Now you have a nice little function and eliminated the problem of forgetting or messing up parameters when entering this looooong command.

It’s also a good idea to do something similar for the import function from my previous post:

fromExcel = function(varnames=TRUE){
   read.table("clipboard", header=varnames, sep = "\t")
}

Note that both functions have an argument (varnames=TRUE) to switch reading/writing of variable names on and off.

The only problem that remains is that you’d have to enter the definition of toExcel() and fromExcel() in every R session manually. To make both available in every started R session, open the file Rprofile.site in the /etc/ folder of your R distribution and append the functions’ definitions at the end. Now restart R and have fun with it.

Tags: , ,

Monday, March 2nd, 2009 R

Leave a Reply

*