GSoC 2013: UUID Abstraction

Link to mail on the mailing list - http://lists.automattic.com/pipermail/wp-hackers/2013-April/045894.html

Description of the idea:

The UUID Abstraction idea comes from the list of suggestions published by WordPress. After reading up on UUIDs, I found that several versions are available. I feel that v5, which is based on a SHA-1 hash, is a suitable choice, including from a security point of view. A v5 UUID is generated from a namespace and a name; given the same namespace and name, the generated UUID will always be the same.
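The determinism described above can be sketched in a few lines. This uses Python's standard `uuid` module purely as an illustration (the project itself would be written in PHP), and the namespace and name values are assumptions picked for the example, not values from the proposal.

```python
import uuid

# Hypothetical inputs, chosen only for illustration.
namespace = uuid.NAMESPACE_DNS  # one of the namespaces predefined by RFC 4122
name = "example.com"

first = uuid.uuid5(namespace, name)
second = uuid.uuid5(namespace, name)

# The same namespace and name always produce the same v5 UUID.
assert first == second
assert first.version == 5

# A different name in the same namespace produces a different UUID.
other = uuid.uuid5(namespace, "example.org")
assert other != first

print(first)
```

Because the result depends only on the two inputs, a UUID can be regenerated at any time without storing extra state, provided the namespace is known.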

How can UUIDs be generated?

I searched around the net and found this – https://gist.github.com/dahnielson/508447. The file contains functions for generating v3, v4, and v5 UUIDs.

What namespace can be used?

The namespace can be any of the following -

  • A unique string provided by the user during install/setup
  • The website name (two websites could share the same name, though; I am not sure whether that would cause any problems)
  • A fixed/common UUID for all installations (not sure if this is a really good idea)
  • A random UUID created via a v4 UUID generator

In every case except the last, the namespace will be a string (in case 4 it is already a randomly generated UUID). These strings can be converted to UUIDs, since a v5 UUID requires its namespace to be another UUID.

On further thought, the user and posts tables can have different namespaces. Each namespace could follow a format like this: (fixed namespace + user namespace), converted to a UUID.
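One possible reading of this scheme, sketched in Python for illustration only: a string is first turned into a namespace UUID, and separate per-table namespaces are then derived from it. The root namespace, the site string, the table labels, and the helper names `user_uuid` and `post_uuid` are all hypothetical.

```python
import uuid

# Hypothetical values: a fixed root shared by every installation and a
# per-site string supplied at install time.
FIXED_ROOT = uuid.NAMESPACE_URL
site_string = "http://example.com/"

# Convert the site string into a UUID so it can serve as a v5 namespace.
site_namespace = uuid.uuid5(FIXED_ROOT, site_string)

# Derive one namespace per table from the site namespace.
users_namespace = uuid.uuid5(site_namespace, "users")
posts_namespace = uuid.uuid5(site_namespace, "posts")


def user_uuid(username: str) -> uuid.UUID:
    """Return the v5 UUID for a username within the users namespace."""
    return uuid.uuid5(users_namespace, username)


def post_uuid(title: str) -> uuid.UUID:
    """Return the v5 UUID for a post title within the posts namespace."""
    return uuid.uuid5(posts_namespace, title)


# The same name gives different UUIDs in different namespaces.
assert user_uuid("admin") != post_uuid("admin")
```

With separate namespaces, a username and a post title that happen to be the same string do not produce the same UUID.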

Change in the user table:

A new column for UUID will be introduced where the username will be used as the name for generating the UUID. Since the username is a mandatory field, it will always be present.

namespace + username = User UUID

Change in the posts table:

This table already contains a GUID in the form of a URL. It is unique, but for consistency it makes sense to use a UUID here as well. As with users, the post title can be supplied as the name for generating the UUID.

Removing the GUID would not make sense, because it is used in many places across the software for generating URLs. Instead, the field can be modified and updated to hold a UUID in its place. The DB schema will be modified to use a binary(16) instead of the current varchar(255).

namespace + post title = Post UUID
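To show why a binary(16) column is enough, here is a small Python sketch comparing the text form of a UUID with its raw 16-byte form. The input URL is a made-up example, and the actual storage code in the plugin would be PHP and SQL rather than this.

```python
import uuid

# Hypothetical post UUID, derived only for illustration.
post_id = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/?p=1")

text_form = str(post_id)     # canonical hyphenated hexadecimal form
binary_form = post_id.bytes  # raw bytes, as a binary(16) column would hold

assert len(text_form) == 36
assert len(binary_form) == 16

# Converting back from the raw bytes recovers the same UUID.
restored = uuid.UUID(bytes=binary_form)
assert restored == post_id
```

The raw form is less than half the size of the 36-character text form, and the conversion is lossless in both directions.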

What I have done so far with this idea:

I posted my understanding of the idea on the mailing list and also tried to discuss it on IRC, but have not received any response so far.

Plugin, theme, or core:

On completion, the project needs to be integrated into core. During development it will be built as a plugin, and all testing will be carried out on that plugin.

Anticipated challenges:

  1. The project deals exclusively with core, and a number of existing files in WordPress core will require modification. When the plugin created for the project is activated for the first time, it should back up the entire core along with the wp_users and wp_posts tables. This matters because, when the plugin is deactivated, it should revert to the old implementation so as to avoid any issues.
  2. The GUID field currently stores the permalink, which the files use as is. When any import/export operation is run, it uses the GUID from the original source. It is of utmost importance for the UUID to use a unique namespace, so as to reduce the possibility of conflicts at any point in time.

Potential mentors:

I haven’t been able to speak to any of the mentors regarding this idea, though I have floated it to Andrew Nacin. I also spoke to Marko Heijnen regarding the Import/Export idea at the start of the program. As for my preference regarding the mentor for this project, I would like to work under Andrew Nacin or Sergey Biryukov, as they both have a great understanding of the entire codebase and would understand the impact of the change on it. Moreover, Andrew is a big inspiration for me because of the amount he has contributed to WordPress, and I am sure there is a lot that I can learn from someone who worked on a GSoC project himself.

Schedule of Deliverables

Milestones and deliverables schedule: The entire project can be divided into the following parts -

  • Identify files that will be affected after making changes.
  • User module:
    • Altering the wp_users table
  • Posts module:
    • Altering the wp_posts table
    • Changing the code in the files affected by the change
  • Testing
  • Documentation

Community Bonding Period:

  • Studying the WordPress codebase
  • Interacting with other developers

June 17 – June 23:

  • Deciding on the namespace that will be used as this is the starting point of the entire project
  • Understanding how any of the choices will impact users and developers
  • Making list of files/features that will be affected

June 24 – June 30:

  • Commence writing the plugin
  • Setup basic plugin structure
  • Integrating a UUID generation script which will be included in the plugin’s library

July 1 – July 7:

  • Altering the user table to include the UUID field
  • Writing the code that will generate UUIDs for all the entries under the user table

July 8 – July 14:

  • Modify files that were using ID from the user table
  • Improve existing code and fix issues

July 15 – July 21:

  • Looking into how the generated UUIDs can be utilized – something along the lines of improved search
  • Work on some of these features

July 22 – July 28:

  • Testing for any possible bugs in the plugin

July 29 – August 2:

  • Mid semester submission.
  • Self-evaluation of the work completed so far.

August 3 – August 9:

  • Altering the posts table to include the UUID field
  • Making a decision on change to GUID field
  • Writing the code that will generate UUIDs for all the entries under the posts table

August 17 – August 23:

  • Understanding how import/export operations will be affected by the GUID change
  • Making changes to all the affected files which use post’s GUID

August 31 – September 6:

  • Making changes to all the affected files which use post’s GUID

September 7 – September 16:

  • Testing the final code for any bugs.
  • Checking all the operations and attempting to integrate with core.

September 16 – September 22:

  • Self-evaluation of the entire project and attempting to fix any bugs after integration with core.
  • Documentation of the entire project

4 thoughts on “GSoC 2013: UUID Abstraction”

  1. Regarding the post table, “Instead the field can be modified and updated to use UUID in its place” I assume you’re talking about updating the DB schema to specify the field is a Binary(16) rather than a VarChar(255), or did you have something else in mind? (ref: http://forums.mysql.com/read.php?98,49626,49626#msg-49626).

    Also, you don’t talk about how UUIDs will be used with the Users table. The original idea on the Codex GSoC page suggested replacing the auto-incrementing user ID with a UUID; your proposal suggests adding a new column instead. What’s the benefit of a new column? Will there be any substantive changes to user lookups (i.e. http://site.com/?author=5 => http://site.com/?author=d662f68d-45d3-4974-875f-69ce68cb49f5)?

    1. Yes, I am talking only about updating the DB schema. Sorry if I was not clear there. I will edit the final document and add more details to it.

      As for the UUIDs in the users table, I am a little unclear about my decision on the removal of the ID field. I would like to keep it as it is but change the functions to use the UUID. That is why I wrote one of the goals as modifying the files that will be affected. Thank you for pointing out the glaring mistake there. I will modify it and specifically mention what I wish to achieve and how I will do it. There is no firm reason for removing the ID field, and I would like it to stay the way it is. Actually, it would not make much of a difference if we changed the ID field to a UUID: only the auto-increment would be removed, and a UUID would be generated every time a new user is added. I would like to discuss this with more people and see whether changing the ID would actually be an improvement.

      Thank you for the review.

  2. I’m glad to see someone so interested in taking on this challenge. It’s a big step toward migrating content between staging & live environments.

    That UUID library should be good to get started (I’ve used it before — it’s reliable), but its original source is a PHP.net comment and those are automatically licensed CC BY 3.0 (http://www.php.net/license/index.php#doc-lic), which isn’t GPL compatible. If this plugin is going to be released on wordpress.org or — even better — merged into core, you’ll need to find or make a library with compatible licensing terms. You & your mentor will need to figure out when in the GSoC process that actually becomes a concern, though, and hopefully it doesn’t slow down the rest of the work too much.

    1. Thank you very much Dave.

      I also went through that comment regarding the UUID generation script.
      For development of the plugin, we can use the script, but once it is actually going into a release cycle, we can look into developing a new library to do the same. It shouldn’t be a very difficult task. Thank you for bringing it to my notice.
