Road to Web 2.0: The Database Design
This is the first of a continuing series of articles that explore making a Web 2.0 application. The concepts are easy to put to use, and aren't that much different (in my opinion) of normal web architecture. In this series of articles I will be exploring my design and development a Web 2.0 application. In this first article I will be discussing the initial database design of com.pendio.us for insight on the choices that I have made for a new style of discussion board. While I am still actively developing com.pendio.us, (it is still rather buggy, but that is to be expected for a week old project) I thought it might be fun to discuss some of my design choices as a longer tutorial style entry. I will only be discussing the design of the actual application, please note that I am using several tools to speed my development:
Xaraya is used as my rapid application development platform. It is written in PHP and uses multiple Databases for it's back end. I believe most of the concepts that I will be exploring could easily be ported to other platforms, but Xaraya is the one that I am comfortable with. Any platform that gives you direct access to the output layer should work just fine. I needed something that would also send back my AJAX request in XML without a ton of work, and Xaraya fits the bill. For those that have never used Xaraya, there will be a learning curve for development, but the end time savings are considerable.
Just a note about Xaraya and this application that I have written, Xaraya offers many “hooks” that I could have used to save me even more time. I decided to write a completely stand alone application and thus I did not take as much advantage as I could have of the platform.
Prototype is being used as my Javascript library. There are many out there, but Prototype seems to be the best documented. In addition, there are many other libraries (as found on their main page) which build on top of Prototype for some amazing features.
MySql is being used as my back end. I considered using PostGreSQL, as Xaraya supports both, but I am more familiar (right or wrong) with MySQL.
The first thing to consider when designing a database is what do you want to do with your application? Thinking Web 2.0 doesn't add anything to that basic concept. Here is what I wanted to do with com.pendio.us:
A forum that free style tags instead of the traditional static category system.
A configurable tag cloud tag that would take more into account than just hits to a tag.
Posts that could be assigned to any or multiple tags.
User Karma that would also play into the tag cloud consideration.
Standard forum functions such as post count, subscriptions, bbcode, signatures, etc.
Better SEO than traditional forums
Expansion of future features with minimum database changes
Post Voting
Mailing list options.
User configuration
Relatively simple, and pretty easy to picture. Basically, I want to combine tags, posts, and users together to determine the popularity of discussions. Looking at my initial specifications, I really only have to store a few text type items and everything else will be integers for look-ups.
For tags, I need to store the tag name, but what else needs to be stored when thinking of a top level of my forum? I probably want to store the time the tag was created to add to the cloud bonus for fresh posts. I probably also need to know when the last post was created for comparisons against other users fields for the “new” icons. If I want standard forum features such as locked, sticky, etc type of tags, then I am going to need a status field. I also might want to save the user that created the original tag for easier look ups. Finally, I want to save the cloud count for the tag itself. My database picture comes relatively clear with these things in mind:
I have a single varchar field to store the tag name. I can't imagine every wanting a tag name to be bigger than 255 characters, therefore I am using var characters. I think as tags as titles not as actual text. The only thing that I did not mention in my thought process was the options.
You will see in all three of my major tables, I host an 'options' field. I am using this field for future expansion. It is awfully easy to store a serialized array in a table for later processing. It does affect performance just a bit, but for those brilliant, AHA! Moments that you have at two o'clock in the morning, it is a easy compromise to make. The one change that I should have made in this table structure was to use tagid instead of id for the primary key name. While I am using tagid throughout my code, this would have simplified my comments just a bit.
For my posts, the thought process was very much the same. I need to be able to capture the text that a user writes first and foremost. In addition, I want to collect information such as the time of the post and who made the post. In addition, like any other forum application, I want to be able to give it a status such as a sticky post or maybe even promote that post to the top of the tag. Finally, I do want to keep post count information to calculate karma for my users. Pretty simple when you think about this design:
My last major sub-system that I am going to have to write for my application is the user data. The majority of my user information is covered by Xaraya, but there is some information that I want to be specific to my application. I want to capture Karma based on my forum. So, I need to know how many tags and posts a user has created. In addition, I want to know when the last time a user created something. Finally, what is the karma information in the end after another user “votes” for you?
This was the most simple of all the tables:
So, now I have my major sub-systems defined. I also need some glue to hold them all together. The easiest way to do this is to create a series of look up tables that can cover how the tables will interact. Once you have the major systems and the look-ups, it is a piece of cake to do a left-join on these tables to extract the meaningful information that my application needs.
With that in mind, what glue do I need? Well, I need to know what post belongs to what tag. Whenever a post is created, I just need to update my look-up table to say, 'hey, this post is associated to these tags'. My look-up table is incredibly simple with that in mind:
I am saving the tagid and the postid. Since both of these are indexed, I should be able to quickly look-up on my joins.
Next, I need to figure out who is subscribed to what tag so I can separate these subscriptions from the cattle. Well, if I am looking for subscriptions then I am basically assigning a user to my tag and thus my look-up would be something like this:
Once again, incredibly simple look up table, but I have associated a user with my tag. Now, I can say, 'hey, I like this conversation, let me save it for quicker access'.
Finally, no forum would be complete without letting you know something has been updated on your next visit. Thinking about this, I need to know who and what tag. I also need to have some times to compare. In my tag table, I have my last post time stamp. I also have various timestamps on each of my various subsystems, but nothing that compared when a visitor visited an individual tag. So, I need to make a field for quick access to that information.
Now, I have something that I can see that user “x” visited tag “x” at this time. I have something that I can compare the timestamp in my tag field against on a user by user basis. I thought about doing this as a session var or even a cookie, as this table will grow substantially with more usage. I may still go back to that idea, but I felt that I also wanted the information for later use as well. With that in mind, I will probably purge this table of visits over “x” weeks old, thus keeping the size to sane levels.
That, my friends, is my design for this database. Thinking about this a bit, I have managed to get myself into a Web 2.0 mindset without really deviating from normal DB design. I have tags, karma, tag clouds all with a traditional system of updating and monitoring.
Next series up is how we access these fields from our standard API with extremely flexible SQL queries.