Jamie Gaskins

Ruby/Rails developer, coffee addict

Perpetuity 1.0.0.beta Released

After what feels like way too long, I’ve finally released a 1.0 beta of Perpetuity. For those unfamiliar, Perpetuity is an implementation of the Data Mapper pattern in Ruby (and, from what I can tell, it was the first one in Ruby). If you’re used to ActiveRecord, it may feel a little awkward at first because suddenly your objects stand on their own, but this actually gives you a significant amount of freedom in how you structure your objects.

What makes Perpetuity awesome?

Because I love lists…

  • Your objects are whatever you want rather than forced subclasses of a library base class.
  • The query syntax is very similar to Ruby’s Enumerable module
  • Persisting entire object graphs is a one-liner for new objects (great for seed/test data)

Objects can be whatever you want

With most ORMs, your persisted objects are required to be subclasses of some library base class. Some ORMs do this the least evil way and let you include the persistence behavior as a mixin, but that’s still imposing.

Perpetuity allows your objects to be POROs (plain-old Ruby objects) or you can use gems like Virtus to give them a bit of a friendlier feel. As long as they save state in instance variables, Perpetuity can stick them into your database in a queryable form.
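To make "state in instance variables" concrete, here is a minimal sketch of a PORO and of how a mapper could read its state. The `Article` class and the `state_of` helper are made up for illustration; this is not how Perpetuity itself is implemented.

```ruby
# A plain Ruby object: no base class, no mixin.
class Article
  attr_reader :title, :body

  def initialize(title, body)
    @title = title
    @body = body
  end
end

# Hypothetical helper (not part of Perpetuity): reads an object's state
# straight from its instance variables into a hash.
def state_of(object)
  object.instance_variables.each_with_object({}) do |ivar, state|
    key = ivar.to_s.sub(/\A@/, '') # "@title" becomes "title"
    state[key] = object.instance_variable_get(ivar)
  end
end

article_state = state_of(Article.new('Hello', 'World'))
```

Because the object only holds plain instance variables, nothing about persistence leaks into the class itself.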

Query syntax

I get tired of writing Rubified SQL. I like to think of database tables/collections as arrays on disk, and we query arrays in Ruby using the select method and passing a block:

array.select { |object| object.name == 'foo' }

With Perpetuity, we query a database with the exact same syntax.

Perpetuity[Foo].select { |foo| foo.name == 'foo' }

The database adapter transforms this into its own query format:

/* PostgreSQL */
SELECT * FROM "Foo" WHERE name = 'foo'

// MongoDB
db.Foo.find({"name":"foo"})
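One way to picture that transformation is a block argument that records comparisons instead of performing them. The sketch below is an assumption about the general technique, not Perpetuity's actual code; `AttributeProxy`, `QueryRecorder`, `to_sql` and `to_mongo` are hypothetical names, and it only handles a single equality check.

```ruby
# Hypothetical sketch, NOT Perpetuity's implementation.
# Stands in for an attribute and records what it is compared against.
class AttributeProxy
  def initialize(name)
    @name = name
  end

  # Instead of returning true/false, describe the comparison.
  def ==(value)
    { attribute: @name, value: value }
  end
end

# The object yielded to the block. Any message it receives is treated
# as an attribute name.
class QueryRecorder < BasicObject
  def method_missing(name, *_args)
    ::AttributeProxy.new(name)
  end
end

# Render the recorded comparison as SQL (single quotes are doubled).
def to_sql(table, &block)
  condition = block.call(QueryRecorder.new)
  value = condition[:value].to_s.gsub("'", "''")
  %(SELECT * FROM "#{table}" WHERE #{condition[:attribute]} = '#{value}')
end

# Render the same comparison as a MongoDB-style selector hash.
def to_mongo(&block)
  condition = block.call(QueryRecorder.new)
  { condition[:attribute].to_s => condition[:value] }
end

sql      = to_sql('Foo') { |foo| foo.name == 'foo' }
selector = to_mongo { |foo| foo.name == 'foo' }
```

The block is never run against real objects; it is run once against the recorder, and the result is handed to whichever adapter is configured.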

You can find more information on queries in the project README.

Persisting entire object graphs

If you’re creating a new set of objects, such as seed data, test data, or just a complex graph that gets created when a new user registers (we’ve all seen Rails apps with a dozen after_create hooks on the User model), you can persist them all by inserting the top-level parent object. Perpetuity will automatically persist the objects held in its attributes if they haven’t been persisted yet.

Install Perpetuity

If you’d like to try out Perpetuity in an application, simply add one of the database adapters to your Gemfile:

gem 'perpetuity-postgres'
gem 'perpetuity-mongodb', '1.0.0.beta'

Configuration can also be a one-liner:

Perpetuity.data_source :postgres, 'my_pg_db'

For a more robust configuration:

Perpetuity.data_source :postgres, 'my_pg_db', host: 'localhost',
                                              port: 5432,
                                              username: 'spiderman',
                                              password: 'nobodyknowsimpeterparker',
                                              pool_size: 20

This would go in a Rails initializer or a file required by your application on startup.

As of this writing, the Postgres adapter, the one most people have been waiting for, does implement most of Perpetuity’s CRUD features but is missing indexing and a few of the niceties. The MongoDB driver fully implements all of Perpetuity’s current features, though. To configure it, put :mongodb in place of :postgres in the config line above.

You can find a lot more information on usage in the Perpetuity project readme. If you find any problems with Perpetuity or either of the database adapters, please let me know via the issue tracker or a tweet (preferably with a gist showing how to reproduce).

Perpetuity PostgreSQL Adapter Coming Soon

For those unfamiliar, Perpetuity is an object-relational mapper that follows Martin Fowler’s Data Mapper pattern. It is the first implementation of this pattern in Ruby.

Now that Perpetuity’s API has been stabilized somewhat, I’ve been working on a PostgreSQL adapter. This has been the #1 feature request since I began working on it. I don’t agree with the usual reasons behind this request, but I don’t think it’s an unreasonable one. You can’t always control what DB you get to use and at least you’ll be able to use Perpetuity if you can’t use MongoDB in production.

The first thing this made me think of was that there’s absolutely no point in keeping the dependency on the moped gem if you’re not going to use it, but adding that dependency manually in your app’s Gemfile seems unnecessary. Ideally, you shouldn’t have to think about your dependencies’ dependencies. Obviously, sometimes you do have to think about them, but I try to keep things as close to ideal as I reasonably can.

While discussing this with Kevin Sjöberg in a few of our pairing sessions (he’s paired with me multiple times on Perpetuity and has given some very valuable feedback on a lot of it), we discussed separating the adapters into perpetuity-mongodb and perpetuity-postgresql gems. This seems like the best idea so far.

These gems will have perpetuity as a dependency so that you can simply put the adapter gem into your Gemfile and get both the adapter and the main gem, similar to how rspec-rails depends on rspec. This also allows for plugin-style database adapters, allowing Perpetuity to talk to other databases without itself knowing about every available one.
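As a rough sketch of what that could look like in an adapter gem's gemspec: the version, summary and author values below are placeholders, and only the two dependency names come from the discussion above.

```ruby
require 'rubygems'

# Sketch of an adapter gemspec. Everything except the dependency names
# is a placeholder for illustration.
adapter_spec = Gem::Specification.new do |s|
  s.name    = 'perpetuity-mongodb'
  s.version = '0.0.1'                        # placeholder version
  s.summary = 'MongoDB adapter for Perpetuity'
  s.authors = ['Example Author']             # placeholder author
  s.files   = []

  # The adapter pulls in the core gem and its own driver, so the app's
  # Gemfile only needs to list the adapter.
  s.add_runtime_dependency 'perpetuity'
  s.add_runtime_dependency 'moped'
end

dependency_names = adapter_spec.dependencies.map(&:name)
```

With this arrangement the core gem no longer needs to depend on any database driver at all.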

What Even Is a Fake Geek Girl?

I’ve never understood the whole “fake geek girl” thing. Every time a woman does something geeky, claims to be a geek, or otherwise displays some form of geekiness, there’s at least one man around that cannot wrap his mind around the fact that you can be a geek without a Y chromosome. This man will generally quiz this woman on her knowledge of geek things such as stereotypically geeky books like Lord of the Rings, some comic book or even a TV show or movie like Star Trek or Star Wars. When she falters on one of these questions (which is extremely likely because the man is intentionally trying to trip her up by asking obscure things), he labels her as a “fake geek girl”.

The fallacy here is that this hypothetical (and all too often very real) man believes that the pop quiz he’s giving this hypothetical (and, again, often very real) woman is about topics and ideas common to all geeks. Let me be perfectly clear about this: there is nothing common to all of us other than our unwavering enthusiasm for something.

Here’s a list of things that would get me labeled as a fake geek if my gender were different:

  • I’ve never read LOTR. I read hardly any fiction at all, actually.
  • I read a few X-Men comic books on the rare occasion I could get one but I didn’t read any others. No Avengers, Incredible Hulk, Spider-Man, Batman, Superman, etc. I learned about most of these from the after-school cartoons instead.
  • I never got into D&D. I played a few games of it but I couldn’t tell you what edition I like best because I don’t know the difference.
  • I’ve never played Settlers of Catan or any game like it. Ever. I played Monopoly.
  • I didn’t get into Star Wars until I was in my 20s. I even liked Episode II because I thought the idea of Yoda jumping around with a lightsaber was awesome because I never expected it and only recently learned that it pissed off a lot of people who are really into the lore.
  • I love Star Trek: The Next Generation, Voyager and Deep Space Nine but I never really got into The Original Series.

These would all cause a lot of geek heart attacks, but nobody calls me on them or any of a dozen other “geek” things they grill women on, simply because I’m a guy. Honestly, I think most geek guys would fail quite a few of these and I think a lot of them wear the geek label because it’s the cool thing these days.

The reason I was considered a geek growing up was because 20 years ago, to own and be able to operate a computer as a teenager was still considered super geeky. I was lucky enough to be introduced to computers at a very young age (we had one of the original 1984 Macs) and I fell in love with it. It did things when I told it to. If it screwed up, it was because I screwed up in telling it what to do. It was pure logic and it made so much sense to me.

When I was 12, I found out I could write my own software for it. I was no longer limited to software written by other people. And if knowing how to operate a computer was geeky, being able to write my own software was super geeky. But that’s really the only thing that made me a geek. I would be labeled an outsider in a heartbeat if they quizzed me like they do to women.

Sure, 20 years ago “geek culture” was pretty much nothing but men. I don’t mean to say that there weren’t geek women, but gender roles were much more mainstream back then. I think that’s where this whole stigma comes from.

Today, I ran across this image on Twitter, which I felt was entirely justified:

[Image: “Who's the fake geek now?”]

There are plenty of “geeky things” enjoyed by a lot of women that most self-proclaimed geek guys will never know about, as evidenced in the image above. My girlfriend is probably a lot geekier than I am. She’s read all the things, she plays all the latest video games, and she is more in touch with “geek culture” than I am. I just follow a bunch of programmers on Twitter and get my geek news from retweets. Hell, that’s how I found out that the whole “fake geek girl” crap was even a thing. I had no idea that people were accusing women of this simply because they’re women.

Geek culture is much cooler now than it was back then and this will obviously attract all kinds of people. This isn’t a bad thing. Geek culture isn’t an exclusive club. We were all labeled geeks growing up because we didn’t fit into someone else’s elitist clique. Let’s not be those fuckers. So, you want to be a geek? By all means …

Geek out with your beak out!

Don’t call yourself a geek because it’s the cool thing, but do it because you love something nearly to the point of obsession. Do it because you’re amazingly talented at something.

The reason sci-fi and fantasy worlds are considered geeky is because they were not generally accepted by the mainstream (whatever that means). By that definition, a lot of other quirks and communities are considered geeky, as well (this is a very incomplete list):

  • LGBTQ — the LGB part of it is becoming more accepted as time goes on, but there’s still a long way to go there and transgender people still confuse the shit out of most of the population.
  • Cosplay — they love dressing up as their favorite characters and aren’t afraid to show it to everyone.
  • BDSM — it’s like sci-fi used to be, nearly everyone’s interested in it in some way but few will admit it publicly.

Don’t ever let anyone tell you you’re not good enough to fit in here. Anyone who tells you that doesn’t realize that a lot of us are geeks because it’s the only place everyone fits in.

Why I Like Developing With MongoDB

MongoDB is a document-oriented database. When I say “document”, I don’t mean the Microsoft Office variety. Specifically, it stores BSON documents. BSON is a form of JSON, but rather than JSON’s text representation, BSON is stored in binary form for efficiency. For the purposes here, BSON and JSON may be used interchangeably. Keep in mind that they are mostly equivalent, just stored in different form.

Some people hate MongoDB

There is a lot of MongoDB hate. A lot. I’m not going to go into examples here, but a lot of people have historically lost data with MongoDB due to not really knowing how to configure it properly. This is probably also a fault of the authors for not making it painfully obvious how to configure the database server/cluster for their purposes.

The problem comes from the database defaults being tuned for performance. This gives it excellent benchmarks and in a single-server installation this is fine, but makes durability across a cluster an issue. However, a cluster can be tuned for durability. I won’t go into that here, though, because this article isn’t about configuring MongoDB.

The only reasonable complaint I’ve personally seen is from people losing data after upgrading their MongoDB installation. This is bad, but as with any upgrade, you should back up your data first. Importing afterward is pretty straightforward.

Why I love MongoDB

When it comes to programming, there are a lot of reasons to choose one language or framework over another. I choose Ruby because, even with all of its drawbacks, it still conforms to my tastes better than any other language I’ve used. One of the core philosophies of Rails (besides “do what DHH feels like”) is that minor details get out of the way and let you focus on building web apps. This is why we don’t have to think about things like CSRF protection and HTTP headers/requests/responses except in special cases. I love that I don’t have to defend against CSRF in every POST request or even think about HTTP at all in the vast majority of my controllers.

MongoDB shows similar qualities to both Ruby and Rails and that’s what I love about developing applications with it.

Flexibility of data types

For the most part, JSON values are straightforward. A value:

  • surrounded by quotes is a string
  • surrounded by square brackets is an array/list
    • Each value in the array can also be of any type
  • surrounded by curly braces is a JSON object/hash/map/dictionary with keys and values (Note: BSON has a couple restrictions on keys)
  • without any decoration is numeric (or a variable reference, if supported by whatever you’re doing with JSON)

There is no type declaration for your data. You don’t tell the DB that all “email” attributes have to be a string. They don’t even all have to be the same type. If you want your values to be numeric in some, strings in others, and objects in others, you can do that.

SQL databases, on the other hand, are pretty inflexible, which is annoying in development. Every field in the same column of every row has to have the same type — I realize that this is fine for most cases, but there are times when that’s infeasible. One reason I develop in Ruby is so that I’m not constrained by types. Every object can be any type of object.

Databases are used primarily to store the state of an object, so if an object can hold different types of data in the same attribute, I should be able to store that as such in the database. I might have a legitimate reason to store strings and numeric values in the same field — and storing every single value as a string, then converting back to integers/floats may not be what I want to do.
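As a small illustration, here are plain Ruby hashes standing in for documents in one collection, with the same key holding a different type in each. The field names and values are made up.

```ruby
# Plain hashes standing in for documents in a single collection.
# The "contact" key holds a string, a number, and a nested object.
documents = [
  { 'name' => 'Alice', 'contact' => 'alice@example.com' },
  { 'name' => 'Bob',   'contact' => 5_551_234 },
  { 'name' => 'Carol', 'contact' => { 'email' => 'carol@example.com' } }
]

contact_types = documents.map { |doc| doc['contact'].class }
```

A SQL column would force all three into one type; here each document keeps whatever the object actually held.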

HERE BE DRAGONS

The flexibility of data types can cause trouble with existing data if you decide to change things down the line. If, for example, one of your classes decides to assume that one of its objects’ attributes will be stored as strings when it’s been nothing but numerics so far, you’ll need to ensure that this is true.

# Using Perpetuity
my_objects = Perpetuity[MyClass].all
my_objects.each do |object|
  object.my_attribute = object.my_attribute.to_s
  Perpetuity[MyClass].save object
end

The only way I could find to do this was to update each document individually. I was hoping I could pass a JS function to the update, which would let me run the update in a single query, but I couldn’t figure out a way to do that. If anyone knows if this is possible, tweet at me.

Flexibility of structure

In a SQL DB, every time you add a new data attribute to an object that needs to be persisted, you need to add a column to the DB. In apps with large amounts of data, this can cause downtime, which can cost you money. In development, this stops the developer’s momentum while she runs a migration. If that data changes for any reason, that’s another migration.

The single best thing from a developer’s point of view is that adding an attribute to a MongoDB collection is as simple as adding the key to the document. There is no ALTER TABLE. You just pretend it was there all along. You can treat documents without that key as having nil as that attribute’s value (including in queries). This is the default state of any instance variable or hash lookup in Ruby anyway.
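The Ruby side of that comparison is easy to see with plain hashes and objects. This sketch uses made-up document contents and a made-up `User` class; it talks to no database.

```ruby
# An older document written before the "nickname" key existed,
# and a newer one that has it.
old_doc = { 'name' => 'Foo' }
new_doc = { 'name' => 'Bar', 'nickname' => 'Barry' }

# A missing hash key reads as nil, just like a missing document key.
missing_nickname = old_doc['nickname']

# An instance variable that was never assigned also reads as nil.
class User
  def nickname
    @nickname
  end
end
unset_nickname = User.new.nickname

# "Query" for documents whose nickname is nil: the old one matches.
without_nickname = [old_doc, new_doc].select { |doc| doc['nickname'].nil? }
```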

Some people claim this is actually a weakness, that it can hide bugs in your code, that a rigid structure will raise exceptions when you try to give it an invalid attribute. That last part is true, but I have my doubts about it hiding bugs in your code. I guess it depends on how these documents are generated. For example, in Perpetuity, all BSON documents are generated from object state. The only way you can put the wrong data into your database is if your objects are storing things in the wrong instance variables or your mappers are serializing the wrong attributes, which means your testing could use some improvement.

Some also claim that it’s a weakness because it bloats your data — every document has to explicitly specify which attributes hold which values (whereas in a SQL database, this is determined by the value’s position in the row). This is true, but that’s the cost of flexibility. SQL databases aren’t exempt from this type of overhead, though. Every NULL field in a SQL row carries extra cost, as well (though arguably not as much, depending on the column type), whereas document databases can simply leave that attribute out. It’s definitely a trade-off, but I can’t imagine it’d make or break most applications. If keys are a significant portion of your documents’ size and data size is an issue in your application, maybe a document database isn’t the best use case for you.

The last justification is completely outside the scope of this article because I’m aiming for a developer-happiness perspective and data size means sweet frak-all in that light, but I figure someone that reads this would probably mention it.

It plays along with whatever I do

When you start developing on an existing Ruby on Rails application backed by a SQL database, you have to:

  1. create the database
  2. ensure the DBMS you’re using for development has the right user account on it (for example, “root” with no password in MySQL) and configure your app to use that
  3. load your schema
  4. check Twitter
  5. write code that talks to the database

When you start working with an existing app backed by MongoDB, you:

  1. write code that talks to the database
  2. there is no step 2

It creates the DB on the fly. It defaults to no authentication. If you write to a collection that doesn’t exist, it creates that, too. You get to stop worrying about the details and focus on the stuff that matters.

If you’re logged into a PostgreSQL server as a user that has permission to create databases and you try to access a database that doesn’t exist, why is the response “it doesn’t exist”? I can’t imagine a situation where I’m trying to talk to a database that isn’t there and an error is the best result (unless the DB can’t be created). Why do I have to make my intent explicit? When I say “talk to this database”, it pretty much implies that I want to talk to it unless there is no way you can possibly let me talk to it, such as a disk error, network error or insufficient permissions.

I’m not saying there aren’t plenty of times you want things to be explicit in programming. There are a lot of cases where being explicit is superior. This is not one of those times.

Conclusion

Maybe MongoDB isn’t right for your particular use case because your app has requirements that are more important than developer happiness. Maybe your ops person/team doesn’t have enough experience with MongoDB to keep it in their toolbelt. Maybe you need a graph database or table joins or transactions. But for most apps, I use MongoDB because I find it more fun to work with; this keeps me motivated and helps me work quickly.

Get Rid of ‘New’ and ‘Edit’

Rails, as well as several other “RESTful” web frameworks in various languages, provides 7 combinations of URLs and HTTP verbs for accessing resources inside the app. In Rails, they are:

  • GET resources
  • POST resources
  • GET resources/:id
  • PUT resources/:id or PATCH resources/:id
  • DELETE resources/:id
  • GET resources/new
  • GET resources/:id/edit

The first two deal with collections of resources. GET retrieves the collection and POST adds to it. The next three deal with retrieving, modifying and removing an individual resource. The last two serve HTML forms for the user to generate or modify a resource.

Let me repeat that: the last two serve HTML forms. They do not actually interact with the resource at all. We put an HTML concern at the HTTP level.

My problem isn’t necessarily that this happens at all. Surely, there are some things that make sense to have a page dedicated to them. But why do we do this by default? The default for Rails is to generate the new and edit form pages. But they’re not really interacting with the resource. They’re not “views”. They’re pages.

Let’s take it out of the browser for a minute and into the realm of the native app. If you were going to create a new item for a list, even if this list were stored on remote hardware, would you ask that remote machine how to let the user enter the information for that? Hell no! You’d display a form you already had prepared. If you wanted to edit an item, would you ask the server for the item’s details? Why would you do that? If you can see an item to tell the application to edit it, you already have its information. The only reason you’d open a connection to the server at all would be to ensure you had the most up-to-date version of that resource.

We can do the same thing in the browser. In the simplest case, we can provide the “new” HTML form inside the index view and the “edit” form inside the show view. With web clients getting thicker by the day like the dude from Super-Size Me, we don’t even have to render separate “new” and “edit” forms. Render the same form for both, but pull the data from the show view into the form. Sure, Rails makes it so we don’t even have to care about populating the form, but this is an example.
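If you go this route in Rails, the two form-only routes can be left out when declaring the resource. A minimal sketch, assuming a hypothetical `articles` resource:

```ruby
# config/routes.rb
# :articles is a hypothetical resource name used for illustration.
Rails.application.routes.draw do
  # Generates index, create, show, update and destroy,
  # but skips the form-only new and edit routes.
  resources :articles, except: [:new, :edit]
end
```

The index and show views are then responsible for rendering the forms themselves.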

To be clear, I’m not ranting. The fact that we add two additional pages to each resource by default does bother me, but only about as much as, say, the effects of continental drift on the field of cartography. It’s more the fact that we’re leaning on the wrong thing for no other reason than “that’s the way we’ve always done it” and I think we can come up with better ways to do it.

Ruby Warts

I love Ruby. I’ve been developing in it as a hobby for 8 years now and professionally for 3. It is still my favorite language to do absolutely anything. But even with that in mind, it has some things about it I’m not sure I like.

Ordered parameters

Note: I use the words “argument” and “parameter” interchangeably here.

This is an implementation detail that frustrates me to no end. Every time you add a new parameter to a method, you either have to add it onto the end or you break its interface. This is an easier thing to do when you’re first creating that method, but if it’s being used in the wild, changing its interface will piss off a lot of people.

Additionally, default values for parameters have to come at the end. You can’t have an optional param with a required one after it. IO#readlines is an example of how Ruby works around that: its first parameter can be either the separator or the limit, based entirely on its class. This is a horrible idea. There are very few good reasons to ask for an object’s class in Ruby, and branching behavior on it isn’t one of them.

Ordered parameters are a relic of systems programming languages like C, where arguments must be in a certain order because they are then pushed onto the stack for the function being called. We have no reason to keep ordered params other than “that’s the way we’ve always done it”.

Granted, for single-arg methods, it’s very, very convenient not to have to write the parameter’s name. For example, array.find(value) is awesome, and if there’s no ambiguity, it’s perfectly reasonable to go without naming.

Rails solves this by using a hash of arguments for most of its methods, and this has become a common Ruby idiom because of it. This is a band-aid solution because we end up having to get the values of these arguments manually from the hash. It’s an improvement for methods that take many arguments, but it could be improved further by adding support directly into the language.
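For comparison, here is a sketch of the options-hash idiom next to the keyword arguments that Ruby 2.0 and later support natively. The method names and parameters are made up for illustration.

```ruby
# Options-hash idiom: defaults have to be dug out of the hash by hand.
def search_with_options(query, options = {})
  limit = options.fetch(:limit, 10)
  order = options.fetch(:order, :asc)
  [query, limit, order]
end

# Keyword arguments (Ruby 2.0+): the language handles names and
# defaults, and callers can pass them in any order.
def search_with_keywords(query, limit: 10, order: :asc)
  [query, limit, order]
end

from_options  = search_with_options('ruby', order: :desc)
from_keywords = search_with_keywords('ruby', order: :desc)
```

Both calls look identical at the call site; the difference is that the second method never touches a hash, and an unknown keyword raises an ArgumentError instead of being silently ignored.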

Cannot override shortcut operators

In Ruby, you can override damn near anything. Operators like && and ||, however, are stubborn. Their functionality is hard-coded. Implementing them as methods on the object would be trivial:

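class BasicObject
  # Hypothetical: Ruby's parser does not accept && or || as method
  # names, so this is a sketch of the idea rather than runnable code.
  def && other
    self ? other : self
  end

  def || other
    self ? self : other
  end
end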

That would reproduce the return values of those operators and allow for overriding, although as ordinary methods they would evaluate the right-hand operand eagerly, so true short-circuiting would still need help from the language. This isn’t a major issue and there are very few reasons you’d ever want to override them. I only came across it while trying to create an Array-like interface for Perpetuity’s mapper-query syntax (something like ArticleMapper.select { |article| article.published == true && article.author_name == 'Jamie' } to feel more like Ruby and less like ActiveRecord).
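The single-character operators & and | can be defined as methods, so one workaround for a query DSL is to combine conditions with those instead. A minimal sketch with a made-up `Criterion` class (not Perpetuity's API):

```ruby
# Hypothetical query-condition object. Unlike && and ||, the & and |
# operators are ordinary methods and can be defined on any class.
class Criterion
  attr_reader :expression

  def initialize(expression)
    @expression = expression
  end

  def &(other)
    Criterion.new("(#{expression} AND #{other.expression})")
  end

  def |(other)
    Criterion.new("(#{expression} OR #{other.expression})")
  end
end

published = Criterion.new('published = true')
by_jamie  = Criterion.new("author_name = 'Jamie'")

both   = (published & by_jamie).expression
either = (published | by_jamie).expression
```

The trade-off is precedence: & and | bind more tightly than ==, so comparisons inside a block generally need their own parentheses.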

Methods with questionable return values

I mentioned above the IO#readlines method. Its return value also frustrates me; it’s an array of lines with each line containing its separator. Surely there are times when you would want to keep the separator, but I have yet to come across one in 8 years of Ruby (and even in Perl, gets calls were almost universally chomped). The general case is that you just want the line’s content. You could even provide an additional argument to keep the line separator if you like.

When I first saw the method, I figured that it would be shorthand for io.read.split(separator) (IO#read slurps in the entire file to a string), but in reality it’s implemented as a loop of IO#gets which appends to an array. After benchmarking, I found that io.read.split(separator) was faster, but it’s impossible to get it to return the current implementation’s return value including the separators. String#split doesn’t have functionality for including the separator. These methods should do the same thing, even if they do it on different types of objects. Least astonishment was violated here.
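The difference is easy to see side by side. This sketch uses StringIO as a stand-in for a file, with made-up contents. Newer Rubies (2.4 and later) also accept `chomp: true` on IO#readlines, which addresses the complaint directly; the version below chomps by hand so it runs on older versions too.

```ruby
require 'stringio'

text = "alpha\nbeta\ngamma\n"

# readlines keeps the separator on every line.
with_separators = StringIO.new(text).readlines

# read + split drops the separator.
split_lines = StringIO.new(text).read.split("\n")

# Chomping each line by hand gives the "just the content" result.
chomped_lines = StringIO.new(text).readlines.map(&:chomp)
```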

Syntax-related stuff

The syntax is mostly awesome, but there are a few things I think could be improved.

Parentheses

Consider the following:

names.split(", ")

Why do we surround params with parentheses? I know we can omit them the majority of the time Seattle-style, and this is my preference, but why do they even go there? Personally, in my Ruby code, the only reason I don’t write (names.split ", ") is that it would raise the hackles of most Ruby devs. To me, that makes more sense. You’re using names.split ", " as a single value, so why not enclose the whole thing if you need parens?

Dot as a message indicator

This actually doesn’t bother me at all, but I wonder about other possibilities. When sending messages to Ruby objects (e.g. calling methods on them), we use the dot to indicate that the token following the object is the name of that message. Smalltalk, Objective-C and Fancy (the latter two being derivatives of the former) all use a space to denote a message.

This would make something like RSpec fun:

names = UserRepository all map &:name
names should include: 'Foo Bar'

I haven’t really thought this part all the way through yet, but clearly we’d need something to signify that &:name and include: 'Foo Bar' aren’t messages but parameters. Maybe the dot was chosen to make this easy. Matz did list Smalltalk as one of his inspirations for Ruby, so I’ve been curious about his choice of the dot as a message token.

What do I plan on doing about it?

I don’t have all the answers and some changes I’d like to make to Ruby definitely require changes to other pieces of the language. I’m also not confident that my pet peeves or desired features of Ruby would happen any time soon even if I posted them to Ruby’s issue tracker. Matz explained that he does not want to make any backwards-incompatible changes in Ruby 2.0. I don’t personally agree with that, since top-level version changes are the perfect time to make revolutionary changes (not to mention, there were a few incompatible changes from 1.9.2 –> 1.9.3). I think Matz is great and I’m ridiculously thankful that he brought this awesome language to life, but I disagree with quite a few of his decisions about Ruby. And that’s okay; not everyone has to agree on everything.

However, I’ve been toying with the idea of building my own language on the Rubinius VM, the way that Christopher Bertels did with Fancy. I’m not sure I’m ready to do that yet, but I think it’d be a great academic exercise as well as allowing me to scratch my own itches. One nice thing about using Rubinius’s VM would be the ability to call out to Ruby code if I need to (similar to how you can call Ruby from Fancy or Java from JRuby), which would help give me a push start in the implementation.

I still love Ruby

In spite of all its warts and identity crises, Ruby is still my favorite programming language of all time out of the 20ish that I’ve used. I’ve been using it since before I knew what Rails was (possibly before it existed) and I’m sure I’ll be using it for a long time yet. Between the language, the culture and the community, I can’t imagine anything replacing it in my heart or my work.

Separate Unnecessarily Monolithic Apps

In my previous article, I mentioned that if your Rails app takes more than 5-10 seconds to load, you should consider separating it into smaller apps. Obviously, that duration is more of a rule of thumb (because “suggestion of thumb” doesn’t have the same ring to it) than a rigid metric but if your main app contains features that can stand on their own, then by all means, they should.

Examples of features that shouldn’t be in your main app

Blog

A collection of articles that potentially contain links has nothing to do with the rest of your data.

Forums

Sure, you want a place on your site for users to exchange information with other users. This is a great idea for some sites, but it’s probably not the primary reason people type your domain name into their address bar.

Image Gallery

If you have a store app and want collections of images for your products, it’s arguable that the gallery belongs in the app. For small galleries that are there simply to show off a few photos for each product, I’d agree with this. However, if you’re providing a huge gallery (for real estate, for example), I’d argue that the gallery should stand on its own.

Why does it matter?

Having a large, monolithic app isn’t the end of the world. We’ve all built them. However, there are several advantages to separating your apps.

Traffic in one app doesn’t bog down the rest

This can be very important. Let’s say an article on your blog hits Reddit’s front page and suddenly the entire internet comes stampeding to your site. In a monolithic app, users of your main app are seeing massive spikes in response times. These users may or may not have any idea what’s going on with the blog, but either way, this is likely to be unacceptable in their eyes and could cause them to stop doing whatever it is that they’re doing that keeps money in your wallet.

If you’re hosting each app on the same physical machine, this won’t apply, but all cloud providers will offer this benefit.

Testing

To be honest, this is probably my biggest reason. First of all, we’re all human. There is likely to be coupling somewhere in our code that we haven’t gotten around to extracting or refactoring yet. Making a change in a single app of a collection of apps is less likely to break something in this still-crappy code. It’s still possible to break another app if you modify an API somehow, but you just need to be mindful not to change public APIs unless absolutely necessary.

Second, running the tests in a smaller app takes a fraction of the time. If you modified something in the blog, you’re still running the tests for everything else in a monolithic app. If you gain nothing else from splitting your apps, you’ll still reduce your test runtime.

Free Heroku dynos!

Heroku is an awesome cloud-hosting platform for various reasons I won’t go into. For every app, you get one free dyno (compute unit). If you split your app up across 3 Heroku apps, you get additional free CPU time.

Separate databases

This goes along with the first explanation here, but having each app using separate databases (especially if each DB is on a different physical machine) can lead to increased throughput for each DB when it’s under load, especially if you are using something like MongoDB which uses a process-wide lock on write — when it’s writing to the DB, nothing else that talks to that server can read from or write to it.

But what if these satellite apps need data from the main app?

Obviously, there will be times when you’ll need, for example, your forum to share user information with the main app so it knows who is posting. For this particular example, there should be an API implemented on the forum so that the main app can trigger it whenever a user account is created or updated in the main app. The first idea that springs to my mind in this case would be to create a background job and trigger an HTTPS call to the forum to create the user. There are probably several other ways to implement such an API.
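As a rough sketch of that idea, the main app could build a request like the one below from inside a background job. The forum URL, the `/api/users` path and the `build_user_sync_request` and `sync_user_to_forum` helpers are all made up for illustration; the real endpoint would be whatever your forum exposes.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical endpoint on the forum app; not a real URL.
FORUM_USERS_URI = URI('https://forum.example.com/api/users')

# Builds (but does not send) the request that tells the forum about a user.
# Only the fields the forum needs are copied into the payload.
def build_user_sync_request(user, uri = FORUM_USERS_URI)
  request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(id: user[:id], name: user[:name], email: user[:email])
  request
end

# Sends the request over HTTPS. A background job would call this
# whenever a user is created or updated in the main app.
def sync_user_to_forum(user, uri = FORUM_USERS_URI)
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(build_user_sync_request(user, uri))
  end
end
```

Splitting "build" from "send" keeps the payload easy to test without touching the network.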

Doesn’t it mean that I’ll have to recreate things like User models?

This is a troubling thought I had, as well, when I first thought about this. And definitely, some of the data will need to be the same across your apps, but you don’t need to copy and paste and you don’t need to modify each of the user models across the various apps with every change.

Most of the data in each app will be specific to that app. For example, your forum-post data won’t need to be tracked in your store app and your purchase data won’t need to be available in your forum. Secondly, the behavior in each model won’t need to be the same, so you can rip out any methods that aren’t necessary for the specific app and just keep things like user IDs, names and e-mail addresses the same. I wouldn’t recommend transferring passwords or other authentication credentials between them unless you really feel you need to allow users to login through one of these satellite apps. Transmitting passwords over the network comes with its own security concerns.

There’s also the chance that you may not even need user data in the database. If you can provide the full experience without sharing data between your apps, you should.

Conclusion

Hopefully, you can understand the benefits of splitting monolithic apps. I’ll warn you, it’s not easy the first time you do it. But once you understand how things need to work for your specific case, you’ll begin wondering why you ever wrote those epic apps to begin with.

Data Mapper vs Active Record

Martin Fowler described two patterns of object persistence. Here they are with very simplistic descriptions:

  1. Active Record — Objects manage their own persistence
  2. Data Mapper — Object persistence is managed by a separate mapper class
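To make the difference concrete, here is a minimal sketch of both patterns against a pretend in-memory database. Every name in it (`DATABASE`, `ActiveRecordStylePerson`, `Person`, `PersonMapper`) is hypothetical and exists only for this illustration; neither half is how any real gem implements its pattern.

```ruby
# Hypothetical in-memory "database" shared by both sketches.
DATABASE = Hash.new { |tables, name| tables[name] = [] }

# Active Record style: the object saves itself.
class ActiveRecordStylePerson
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def save
    DATABASE[:people] << { name: name }
  end
end

# Data Mapper style: the object knows nothing about storage...
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# ...and a separate mapper moves it in and out of the database.
class PersonMapper
  def insert(person)
    DATABASE[:mapped_people] << { name: person.name }
  end

  def all
    DATABASE[:mapped_people].map { |row| Person.new(row[:name]) }
  end
end
```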

There are Ruby gems with these names, but both actually implement the Active Record pattern. The DataMapper team, however, is currently working on version 2.0, which will implement the Data Mapper pattern. Also, as an academic exercise, off and on for the past few months I’ve been working on a gem called Perpetuity, an implementation of the Data Mapper pattern. I actually began it about two days before Piotr Solnica posted the above article. The name was chosen because “perpetuity” is the quality of lasting indefinitely.

Active Record vs Single Responsibility Principle

The entire reason I’m writing that gem is the ActiveRecord gem’s violation of the Single Responsibility Principle. All objects and methods in a system should follow the Unix principle of “do one thing well”. Active Record (both the pattern and the gem) combines business logic and persistence logic in the same class.

An object based on the Active Record pattern represents not only the singular object but the representation in the database, as well. Additionally, methods on ActiveRecord classes are intended to operate over the entire collection of objects. This is entirely too much functionality and it should be separated.

The only semi-proper use case for an ActiveRecord-style class is when the class exists solely to represent data (a glorified struct, really) and has no behavior.

Why do we care about SRP?

Most programmers could probably skip this section, but feel free to read through it.

The Single Responsibility Principle is important in computer science because it allows us to make modifications to code without changing every single thing that that particular piece of code works with. It’s like modifying the engine of a car. Let’s say you want to give your engine some more power by adding a performance carburetor. But in order to let the carburetor perform at its peak capacity, you need an intake manifold that’s designed to handle the increased volume of fuel/air mixture. Then you need to install a larger camshaft because a stock cam will still only draw in the same amount of oomph from the carburetor.

But then you realize that your cylinder heads aren’t designed to handle that much juice coming in all at once, so you need to remachine them for that. But then you realize that that only handles the intake. You can pull some serious power into the combustion chamber, but after ignition, your exhaust system has to give it somewhere to go efficiently, so you have to modify that, too!

Before you know it, you’ve shaved the skin right off that yak all because you wanted to change the carburetor.

Applying this to programming, we can write code that allows us to figuratively change the carburetor without having to change the rest of the intake and exhaust systems. We’d be able to change just the carb.

How does ActiveRecord make it harder?

The hardest part about ActiveRecord (the gem) is testing. In order to run a single model spec, I need the ActiveRecord gem because the model class is a derivative of ActiveRecord::Base. Then, in order to instantiate an ActiveRecord class, I need to connect to a database server, just to test a single method that has nothing to do with persistence. When you’re specifying the domain logic of your application and haven’t written a single piece of Rails-specific code, there is absolutely no need for persistence.

This is definitely made ridiculously easy by leaning entirely on Rails generators for all of the boilerplate and then loading your Rails environment in tests, but all that does is move the pain from configuring ActiveRecord specifically for model specs to loading your entire Rails app to execute a single spec. For small apps, loading the Rails environment takes several seconds on a reasonably fast machine. For large apps, loading the Rails environment could take over 30 seconds. If you’re doing TDD properly, this means that a simple red/green/refactor cycle could potentially take several minutes instead of a single minute or so.

Side note: If your Rails app takes more than 5-10 seconds to load, consider moving significant portions of it into another app. I’ll write another post about this soon.

ActiveRecord isn’t all bad

I’m really talking a lot of shit about ActiveRecord here. I don’t hate it. I just disagree with it. The magic of it is what drew me to Rails back in 2005 and now, oddly, we’re learning that that magic is bad.

Reinstate SRP with Data Mapper

Before we step into the Data Mapper pattern, let’s have a look at how Corey Haines and Gary Bernhardt separate concerns in order to achieve their renowned fast tests.

The “Fast Rails Tests” gurus

The way I’ve seen Corey and Gary discuss their tests is that they extract the behavior of their models into a separate module or class and call that behavior from the model.

class CalculatesTotalPrice
  def self.for(products)
    products.map(&:price).reduce(0, &:+)
  end
end

describe CalculatesTotalPrice do
  it 'returns 0 for an empty product list' do
    no_products = []
    CalculatesTotalPrice.for(no_products).should == 0
  end

  it 'returns the sum of all product prices for a list' do
    products = [stub(price: 10), stub(price: 15)]
    CalculatesTotalPrice.for(products).should == 25
  end
end

This is an outstanding way to separate behavior from data, but I’m not sure I agree with it. This is not meant as an insult to them — I think they’re both very talented people — I may just have a different view of OOP than they do.

My own views

It is my belief that data and behavior should not be separate. The two are organic to each other and they exist solely because the other exists. They’re like bread and butter, love and marriage, or Jenny and Forrest. They’re soulmates. Don’t split them up.
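For comparison with the extracted-class version above, here is a minimal sketch of how the same calculation might look when it lives next to the data it works on. `Cart` is a hypothetical class I made up for this example.

```ruby
# Hypothetical Cart class: it holds the products (data) and
# knows how to total them (behavior).
class Cart
  attr_reader :products

  def initialize(products = [])
    @products = products
  end

  # Sums the price of every product; an empty cart totals 0.
  def total_price
    products.map(&:price).reduce(0, :+)
  end
end
```

It’s still a plain Ruby object, so it’s just as fast to test as the extracted version.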

Let’s see some code

So, since using ActiveRecord means the objects are subclassed from ActiveRecord::Base, that means that the Data Mapper objects are the subclass of some DataMapper base class, right?

Nah, not even close. The idea behind the Data Mapper pattern is that the objects don’t know anything about persistence or even the classes/objects that map them to the database. We just use plain-old Ruby objects!

For example, the Article class can look like this:

class Article
  attr_reader :comments
  def initialize(args = {})
    @comments = args.fetch(:comments) { Array.new }
  end

  def << comment
    comments << comment
  end
end

describe Article do
  describe :comments do
    it 'has an empty collection of comments upon init' do
      subject.comments.should be_empty
    end

    it 'can be given a list of comments' do
      article = Article.new(comments: [:first, :second])
      article.comments.should include :first, :second
    end

    it 'returns a collection of comments' do
      comment = Object.new
      subject << comment
      subject.should have(1).comments
    end
  end
end

All that matters is that we provide some sort of interface to the data so we can persist it. In this case, we just use an attr_reader. Ideally, we’d want to be able to write to it, too, so attr_accessor would be better, but you can use custom getter/setter methods if that works better for that particular piece of data (such as encrypted text).
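Here’s a small sketch of what custom getter/setter methods could look like. `SecretNote` is a hypothetical class, and ROT13 is only a stand-in so the example stays self-contained; it is not encryption, and real code would call an actual encryption library in its place.

```ruby
class SecretNote
  def initialize(text = '')
    self.text = text
  end

  # Custom setter: only the scrambled form is kept in the instance variable,
  # so that is what a mapper reading instance variables would see.
  def text=(plain_text)
    @scrambled_text = rot13(plain_text)
  end

  # Custom getter: unscramble on the way out.
  def text
    rot13(@scrambled_text)
  end

  private

  # Stand-in for real encryption. ROT13 is its own inverse.
  def rot13(string)
    string.tr('A-Za-z', 'N-ZA-Mn-za-m')
  end
end
```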

With ActiveRecord, we wouldn’t be able to add any object to Article#comments that isn’t an instance of the Comment class due to the has_many macro. In the above spec, we don’t care that we’re putting something that isn’t a comment into the comments collection. We’re only testing that we can put things into it. Additionally, testing a plain Ruby object is fast. This example runs in 174ms on my machine, which includes loading the Ruby VM and the RSpec gem. At that speed, the feedback loop is limited only by how fast your fingers hit the keys.

The same example using an ActiveRecord class would take anywhere up to 30 seconds in larger apps and requires configuration — all because we inherited from ActiveRecord::Base.

The tight feedback loop isn’t mandatory for developing quality software, but the tighter it is, the more you’ll run the tests and the more likely you’ll actually do TDD properly, which is more likely to result in better code.

Unordered List Helper for Rails

The Rails helper idea of “one HTML element per helper method” is a silly abstraction. I’m not sure it’s the best idea for the general case. Here’s an example:

<%= form_for @article do |f| %>
  <div class="field">
    <%= f.label :title %>
    <%= f.text_field :title %>
  </div>
  <div class="field">
    <%= f.label :body %>
    <%= f.text_area :body %>
  </div>
  <%= f.submit %>
<% end %>

That includes a lot of boilerplate. All we care about is rendering a form that specifies 2 fields.

Reduce that boilerplate code

With the SimpleForm gem, we can reduce the code down to something like this:

<%= simple_form_for @article do |f| %>
  <%= f.input :title %>
  <%= f.input :body %>
  <%= f.submit %>
<% end %>

That’s perfect! All of the labels are inserted automatically and the wrappers for the form inputs are handled through SimpleForm configuration with sensible defaults. In this form, we’ve reduced the boilerplate down to 2 lines (submit and form end tag) from 8.

Forms are definitely an area where there has always been a lot of unnecessary code, especially with the power of Rails helpers. I’d actually like to see this merged into Rails at some point, but SimpleForm has a lot of functionality and customization and merging every piece of it would be a bit much. However, we could easily optimize for the general case in Rails (wrapping the inputs in a div and inserting labels) very, very simply.

Now, where else can we do something like this?

Abstracting away any extra code that we don’t need is always a huge win, but where else in Rails can we do such a thing?

Lists

Ordered and unordered lists are one of the biggest areas where we still write code the same way.

<ul>
  <% @items.each do |item| %>
    <li><%= item.name %></li>
  <% end %>
</ul>

Sure, that’s not a lot of code, but how many times over the course of a project do you see this? If it’s a project of any decent size, it’ll be at least a dozen. Looping over each item within an array (or an ActiveRecord::Relation for people that care so much about precision ;–)) within the ul has always felt odd to me. I know it’s required, but that doesn’t make it sit well. It’s just one of those things.

Awkward feelings aside, how much better would it feel if you could, instead, write this:

<%= unordered_list(@items) { |item| item.name } %>

Regardless of how you feel about the loop within the containing element, this presents more cleanly. We could also use other view helpers, say linking to each item:

<%= unordered_list(@items) { |item| link_to item.name, item } %>
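A helper along these lines could be sketched in plain Ruby as follows. This is not the code from any pull request, just a minimal stand-alone version so it runs without Rails. It escapes whatever the block returns, which a real Rails helper would probably not do to already-safe output like that of link_to; there you would more likely build the tags with content_tag and safe_join.

```ruby
require 'erb'

# Plain-Ruby sketch of the helper: wraps each block result in an <li>
# and the whole collection in a <ul>.
def unordered_list(items)
  list_items = items.map do |item|
    # yield hands each item to the block given to unordered_list
    "<li>#{ERB::Util.html_escape(yield(item))}</li>"
  end
  "<ul>#{list_items.join}</ul>"
end
```

Note the parentheses around the collection when calling it with a brace block: without them, Ruby tries to attach the block to the argument rather than to the helper.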

There are other pieces of HTML that go together all the time. I’ll update this article with more as I think of them.

UPDATE: I submitted this as a pull request to Rails, but it was rejected. I’d forgotten that you can already use content_tag_for to iterate over a collection, but I still wanted this abstraction at the list level. I still believe that the way we do one HTML tag per helper method is silly; it feels more like translating HTML into Ruby instead of abstracting away the HTML.

Every ul tag has as its children nothing but li tags. This means that writing li at all is solely to delimit the list items themselves and hence it becomes unnecessary boilerplate that can be abstracted away.

I very much disagree with them for rejecting this and I don’t think the reason of “you can already do this by writing more code that’s harder to read” is a good reason to reject it. :–) However, they are the core team and I’m sure they deal with indignant pull requests all the time, so I won’t add to it.

Perpetuity Object Declarations

So far, in Perpetuity, this is what I’ve got set up for Mapper declarations:

class ArticleMapper < Perpetuity::Mapper
  attribute :title, String
  attribute :body, String
  attribute :author, User

  id { title.gsub(/\W+/, '-').downcase } # SEO-friendly URLs
end

What ends up happening is that it sees that it can serialise the title and body attributes of the Article model because they’re String objects. But it knows it can’t serialise the author attribute because it’s a User instance, so it saves the id of the author object (it must already be persisted or Perpetuity throws an error … for now) in its place. When loading the association from the database, it calls UserMapper.find(model.author) and places the result into the Article instance.
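The serialisation rule described above can be sketched roughly like this. It is not Perpetuity’s actual implementation; `SERIALIZABLE_TYPES`, the `serialize` method and the Struct-based `User` and `Article` are all assumptions made for the example.

```ruby
# Types assumed to be storable as-is (hypothetical list for this sketch).
SERIALIZABLE_TYPES = [String, Numeric, TrueClass, FalseClass, NilClass].freeze

User = Struct.new(:id, :name)
Article = Struct.new(:title, :body, :author)

# Turns an object into a hash of storable values. Anything that is not a
# simple value is replaced by its id, and must already have one.
def serialize(object, attribute_names)
  attribute_names.each_with_object({}) do |name, document|
    value = object.public_send(name)
    document[name] =
      if SERIALIZABLE_TYPES.any? { |type| value.is_a?(type) }
        value
      else
        value.id || raise(ArgumentError, "#{name} must be persisted first")
      end
  end
end
```

For an article whose author has an id of 42, the `author` key in the resulting hash holds 42 rather than the User object.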

This is not the cleanest idea, since we would have an actual value there before the association is loaded … and it would be wrong. I’ve thrown around several ideas for this:

Load all associations when the object is retrieved from the database

This has the obvious pro of never having to worry about associations in user code, but we end up retrieving extra data from the database which we may never use. Clearly, a more efficient way of handling it would be to do lazy loading, but then we end up moving away from the Data Mapper pattern and start implementing the Active Record pattern by telling the object how to work with the database.

Granted, this is done through metaprogramming (we would inject the code into the object at load time) so the written code is pure Data Mapper, but it still feels wrong. We do currently apply a little magic to the object to assign it an id when it’s been persisted into/loaded from the database, but I’m trying to limit it as much as humanly possible.

Force a user to load all associations manually

This is the way it is currently implemented and seems to be the most true to the Data Mapper pattern. In order to get associated objects, a user would have to call something like so:

article = ArticleMapper.find(params[:id])
ArticleMapper.load_association!(article, :author)

This code executes a database query and assigns the proper User object to the author attribute of the article. The disadvantage of this is that we must be mindful of every single database query, but the advantage is that … we end up being mindful of every single database query. :–)

This is an advantage because we don’t allow queries to get away from us. ActiveRecord-style associations mean that we can link up objects without caring about the consequences. I’ve caught myself doing this, only to look at the logs and notice that I’ve executed dozens of queries when I only meant to execute 3 or 4, simply because I was treating associations as data instead of separate database rows.

The verdict

If you haven’t already figured this out, I’m strongly leaning toward manual loading of associations, but I want to do something a little cleaner. Calling two class methods in two lines is ugly (I understand that everything being a class method on the mapper class is a code smell in itself, as well, but I’m working on that), so I think something more like this is in order:

article = ArticleMapper.find(params[:id], associations: [:author])

That should help keep things clean while still forcing you to think about each query you’re doing. We could also expand this to the Mapper#retrieve method so that we don’t end up doing N+1 queries. Both MongoDB (currently the only supported DB) and SQL (would like to get this one in, but I think I’ll work on that later) support selecting on inclusion of values in lists, so we could optimise from N+1 down to 2 queries.

For example, if we’re displaying a list of blog articles that will be shown with comment information within the list, we could do something like this:

articles = ArticleMapper.retrieve(published: true, associations: [:comments])

Instead of iterating over each article retrieved and loading their respective associations (N queries for comments + 1 original article query), it gets all the articles for the specified criteria and then retrieves all comments whose article ids are in that list of articles. That’s two queries, just as with retrieving a single article and its comments. Just as importantly, it only requires a single line of clean code.
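To show the two-query idea without a real database, here is a sketch using an in-memory store that counts the queries it receives. `FakeStore` and `articles_with_comments` are hypothetical names; the `comments_for` lookup plays the role of SQL’s IN or MongoDB’s $in.

```ruby
# Hypothetical in-memory store that counts how many queries it receives.
class FakeStore
  attr_reader :query_count

  def initialize(articles, comments)
    @articles = articles
    @comments = comments
    @query_count = 0
  end

  def published_articles
    @query_count += 1
    @articles.select { |article| article[:published] }
  end

  # One query that matches any of the given article ids.
  def comments_for(article_ids)
    @query_count += 1
    @comments.select { |comment| article_ids.include?(comment[:article_id]) }
  end
end

# Loads the articles, then all of their comments in a single second query,
# and attaches each comment list to its article in memory.
def articles_with_comments(store)
  articles = store.published_articles
  ids = articles.map { |article| article[:id] }
  comments = store.comments_for(ids).group_by { |comment| comment[:article_id] }
  articles.map { |article| article.merge(comments: comments.fetch(article[:id], [])) }
end
```

However many articles come back, the store only ever sees two queries.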