GSoC 2013: UUID Abstraction

Link to mail on the mailing list - http://lists.automattic.com/pipermail/wp-hackers/2013-April/045894.html

Description of the idea:

The UUID Abstraction idea comes from the list of suggestions given by WordPress. On reading through the data available on UUIDs, I found that a number of versions are available. I feel that v5, which uses a SHA-1 hash, is a suitable choice even from the security point of view. A v5 UUID is generated by providing a namespace and a name. Given the same namespace and name, the generated output will always be the same.
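
As a minimal sketch of this property, here is how a v5 UUID could be generated with Python's standard uuid module. Python is used only for illustration (the project itself would be written in PHP), and the namespace and name below are placeholders, not choices made in this proposal.

```python
import uuid

# Placeholder namespace and name, chosen only for illustration.
namespace = uuid.NAMESPACE_DNS
name = "example.com"

first = uuid.uuid5(namespace, name)
second = uuid.uuid5(namespace, name)

# The same namespace and name always give the same UUID.
print(first == second)

# The version field of a name-based SHA-1 UUID is 5.
print(first.version)
```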

How can UUIDs be generated?

I searched around the net and found this – https://gist.github.com/dahnielson/508447. The mentioned file contains functions for generating v3, v4, v5 UUIDs.

What namespace can be used?

The namespace can be any of the following -

  • A unique string provided by the user during install/setup
  • The website name (but two websites can have the same name, and I am not sure whether that would cause any problems)
  • A fixed/common UUID for all installations (not sure if this is a really good idea)
  • A random UUID created via a v4 UUID generator

The namespace will be a string in every case except the randomly generated UUID (case 4). These strings can be converted to a UUID, as v5 requires the namespace itself to be a UUID.

On giving the idea further thought, the user and post tables can have different namespaces. This can be done by following a format like this: (fixed namespace + user namespace), converted to a UUID.
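
One possible reading of this scheme, sketched in Python for illustration: a string is turned into a namespace UUID by hashing it under a fixed namespace, and per-table namespaces are then derived from the result. The fixed namespace, the install string and the table labels are all assumptions made for the sketch.

```python
import uuid

# Hypothetical fixed namespace shared by all installations (an assumption).
FIXED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "wordpress.org")


def site_namespace(site_string: str) -> uuid.UUID:
    """Convert a user-provided string into a UUID usable as a v5 namespace."""
    return uuid.uuid5(FIXED_NAMESPACE, site_string)


def table_namespace(site_ns: uuid.UUID, table: str) -> uuid.UUID:
    """Derive a separate namespace per table, e.g. "users" or "posts"."""
    return uuid.uuid5(site_ns, table)


site_ns = site_namespace("my unique install string")
users_ns = table_namespace(site_ns, "users")
posts_ns = table_namespace(site_ns, "posts")

# The two tables end up with different namespaces.
print(users_ns != posts_ns)
```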

Change in the user table:

A new column for UUID will be introduced where the username will be used as the name for generating the UUID. Since the username is a mandatory field, it will always be present.

namespace + username = User UUID

Change in the posts table:

This table already contains a GUID which is in the form of a URL. It is unique, but for consistency it is logical to implement a UUID in this scenario. As with users, the post title can be used as the name for generating the UUID.

Removing the GUID would not make sense because it is used in many places throughout the software for generating URLs. Instead, the field can be modified to hold a UUID in its place. The DB schema will be modified to use binary(16) instead of the current varchar(255).

namespace + post title = Post UUID
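
To show why binary(16) is enough, the sketch below (Python, for illustration only) generates a post UUID and compares its raw form with its text form. The posts namespace and the post title are hypothetical.

```python
import uuid

# Hypothetical posts namespace and post title (assumptions for the sketch).
POSTS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/posts")
post_uuid = uuid.uuid5(POSTS_NAMESPACE, "Hello world!")

raw = post_uuid.bytes   # raw form, the size a binary(16) column holds
text = str(post_uuid)   # canonical text form with hyphens

print(len(raw), len(text))  # prints: 16 36

# The raw bytes convert back to the same UUID without loss.
restored = uuid.UUID(bytes=raw)
print(restored == post_uuid)
```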

What have I done so far with this idea:

I posted my understanding of the idea on the mailing list and also tried to discuss it on IRC, but have not received any response so far.

Plugin, theme, or core:

The project, on completion, needs to be integrated into core. During development, a plugin will be created and all testing will be carried out on the plugin only.

Anticipated challenges:

  1. The project deals exclusively with core, and a number of existing core files will require modification. When the plugin is activated for the first time, it should back up the entire core and the wp_users and wp_posts tables. This matters because, when the plugin is deactivated, it should revert to the old implementation so as to avoid any issues.
  2. The GUID field currently stores the permalink, which is used as-is in the files. When any import/export operation is run, it uses the GUID from the original source. It is of utmost importance for the UUID to use a unique namespace so as to reduce the possibility of conflicts.
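
The second point can be illustrated with a small Python sketch: the same post title yields different UUIDs on two sites as long as each site has its own namespace. The two site namespaces here are hypothetical.

```python
import uuid


def post_uuid(site_namespace: uuid.UUID, title: str) -> uuid.UUID:
    """Generate a v5 UUID for a post title within one site's namespace."""
    return uuid.uuid5(site_namespace, title)


# Hypothetical per-site namespaces (assumptions for the sketch).
site_a = uuid.uuid5(uuid.NAMESPACE_DNS, "site-a.example")
site_b = uuid.uuid5(uuid.NAMESPACE_DNS, "site-b.example")

title = "Hello world!"

# Different namespaces: no conflict when a post is imported from another site.
print(post_uuid(site_a, title) != post_uuid(site_b, title))

# Same namespace and title: the UUID is stable across runs.
print(post_uuid(site_a, title) == post_uuid(site_a, title))
```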

Potential mentors:

I haven’t been able to speak to any of the mentors regarding this idea, though I have floated it to Andrew Nacin. I also spoke to Marko Heijnen regarding the Import/Export idea at the start of the program. As for my preference regarding the mentor for this project, I would like to work under Andrew Nacin or Sergey Biryukov, as they both have a great understanding of the code base and would understand the impact of the change on the entire WordPress codebase. Moreover, Andrew is a big inspiration for me because of the number of contributions he has made to WordPress, and I am sure there is a lot that I can learn from someone who worked on a GSoC project himself.

Schedule of Deliverables

Milestones and deliverables schedule: The entire project can be divided into the following parts -

  • Identify files that will be affected after making changes.
  • User module:
    • Altering the wp_users table
  • Posts module:
    • Altering the wp_posts table
    • Changing the code in the files affected by the change
  • Testing
  • Documentation

Community Bonding Period:

  • Studying the WordPress codebase
  • Interacting with other developers

June 17 – June 23:

  • Deciding on the namespace that will be used as this is the starting point of the entire project
  • Understanding how any of the choices will impact users and developers
  • Making list of files/features that will be affected

June 24 – June 30:

  • Commence writing the plugin
  • Setup basic plugin structure
  • Integrating a UUID generation script which will be included in the plugin’s library

July 1 – July 7:

  • Altering the user table to include the UUID field
  • Writing the code that will generate UUIDs for all the entries under the user table
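
A rough sketch of what these two steps could look like, written in Python against an in-memory SQLite table as a stand-in for the real MySQL wp_users table. The column name uuid, the users namespace and the sample logins are assumptions; only ID and user_login mirror the existing table.

```python
import sqlite3
import uuid

# Hypothetical users namespace (an assumption for the sketch).
USERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "users.example.com")


def backfill_user_uuids(conn: sqlite3.Connection) -> int:
    """Add a uuid column and fill it for every existing user; return the row count."""
    conn.execute("ALTER TABLE wp_users ADD COLUMN uuid BLOB")
    rows = conn.execute("SELECT ID, user_login FROM wp_users").fetchall()
    for user_id, login in rows:
        value = uuid.uuid5(USERS_NAMESPACE, login).bytes  # 16 raw bytes
        conn.execute("UPDATE wp_users SET uuid = ? WHERE ID = ?", (value, user_id))
    conn.commit()
    return len(rows)


# Stand-in table with two sample users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_users (ID INTEGER PRIMARY KEY, user_login TEXT NOT NULL)")
conn.executemany("INSERT INTO wp_users (user_login) VALUES (?)", [("admin",), ("editor",)])

updated = backfill_user_uuids(conn)
print(updated)
```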

July 8 – July 14:

  • Modify files that were using ID from the user table
  • Improve existing code and fix issues

July 15 – July 21:

  • Looking into how the generated UUIDs can be utilized – something along the lines of improved search
  • Work on some of these features

July 22 – July 28:

  • Testing for any possible bugs in the plugin

July 29 – August 2:

  • Mid semester submission.
  • Self-evaluation of the work completed so far.

August 3 – August 9:

  • Altering the posts table to include the UUID field
  • Making a decision on change to GUID field
  • Writing the code that will generate UUIDs for all the entries under the posts table

August 17 – August 23:

  • Understanding how import/export operations will be affected by the GUID change
  • Making changes to all the affected files which use post’s GUID

August 31 – September 6:

  • Making changes to all the affected files which use post’s GUID

September 7 – September 16:

  • Testing the final code for any bugs.
  • Checking all the operations and attempting to integrate with core.

September 16 – September 22:

  • Self-evaluation of the entire project and attempting to fix any bugs after integration with core.
  • Documentation of the entire project

The first two weeks

The starting weeks with WordPress contributions cannot be called anything but interesting. What I have learned in a little more than ten days is far more than anything I could have learned by fiddling with repositories on GitHub.

The first thing that I got to know is that WordPress uses SVN for its internal development. Moreover, they have their own tracker called Trac, which is good. It does take a little time to get the hang of it, but then it is really great.

Another thing that I noticed is that the contributors take time to look into most of the new tickets. To date I have opened three tickets, and one of them was closed a few hours after I opened it because it was a duplicate. The other two tickets are "Help text visibility in title input box is not intuitive" and "Support for HTML5 roles".

I have also submitted two patches, one of which was rejected. I did learn one thing from the rejection: CSS is preferred wherever possible when working with WordPress. If something can be done with CSS, avoid JS.

That’s all for now. Now, I will look into some new tickets and see if I can work on something.

Contributions of a developer