

    > form <- y ~ x + log(x)
    > form
    y ~ x + log(x)
    > term <- terms(form, data = df)
    > term
    y ~ x + log(x)
    attr(,"variables")
    list(y, x, log(x))

The variables attribute of the terms object is our old friend, a call

    > attr(term, "variables")
    list(y, x, log(x))
    > class(attr(term, "variables"))
    [1] "call"
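Because the variables attribute is an unevaluated call to list, evaluating it with a data frame as the environment yields one vector per variable. The following is a minimal sketch of that idea, not the actual model.frame source; the data frame df and its values are made up for illustration.

```r
# Hypothetical data: `df` and its values are illustrative only.
df <- data.frame(x = c(1, 2, 4), y = c(3, 5, 9))

form <- y ~ x + log(x)
term <- terms(form, data = df)

# The "variables" attribute is an unevaluated call: list(y, x, log(x)).
vars <- attr(term, "variables")
class(vars)

# Evaluating the call inside df looks up y and x as columns and
# computes log(x), returning an unnamed list of three vectors.
cols <- eval(vars, df)

# Label each vector with the expression that produced it
# (dropping the first element of the call, which is the symbol `list`).
names(cols) <- vapply(as.list(vars)[-1], deparse, character(1))
str(cols)
```

The result is a plain named list; turning it into a data frame with the right attributes is a separate step that this sketch leaves out.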

This essay is inspired by a question of Antoni Parellada’s on CrossValidated. It is essentially a much expanded version of my answer there. We will make heavy use of the R source code, which you can find here.

Our point of origin is lm, the interface exposed to the R programmer. It offers a friendly way to specify models using the core R formula and data.frame datatypes. A prototypical call to lm looks something like this

    m <- lm(…)

To see what a call object is, consider a toy function that captures its own call with match.call and returns it

    > f <- function(x, y) {
    +   cl <- match.call()
    +   cl
    + }
    > f(1, 2)
    f(x = 1, y = 2)
    > class(f(1, 2))
    [1] "call"

A call object is a small wonder: it can be indexed into and manipulated much like other R objects. The first index always gives the name of the function that was called (not as a string, but as a symbol object)

    > cl <- f(1, 2)
    > cl[[1]]
    f
    > class(f(1, 2)[[1]])
    [1] "name"

The other indices return the arguments passed into the call

    > cl[[2]]
    [1] 1
    > cl[[3]]
    [1] 2

The way R gets to this new data frame is quite interesting. First, the model formula is turned into a terms object, which contains metadata needed to identify which column represents the response, and which the predictors (the encoding of this information is pretty obscure, so I won't bother to unwind it here).
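Since a call object can be manipulated like other R objects, its pieces can also be replaced before it is evaluated. lm uses this kind of rewriting on its own captured call when it builds the model frame. The sketch below shows the general technique on a toy function f; the swap to sum is only an illustrative choice and is not what lm does.

```r
# Toy function that returns its own matched call.
f <- function(x, y) {
  match.call()
}

cl <- f(1, 2)

# Viewed as a list: the function name first, then the named arguments.
as.list(cl)

# Replace the function being called, leaving the arguments untouched.
cl[[1]] <- quote(sum)
cl

# The rewritten call is still unevaluated until eval() is applied.
result <- eval(cl)
result
```

The point of the design is that the arguments a user supplied are reused verbatim, while the function that receives them is chosen by the code doing the rewriting.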

One of my most used R functions is the humble lm, which fits a linear regression model. The mathematics behind fitting a linear regression is relatively simple: some standard linear algebra with a touch of calculus. It may therefore surprise the reader to learn that behind even the simplest of calls to R’s lm function lies a journey through three different programming languages, which in the end arrives at some of the oldest open source software still in common use. So, in the spirit of the famous thought experiment “what happens when you type a URL into your address bar and press enter”, I’d like to discuss what happens when you call lm in R.

R is a high-level language for statistical computations.
