If you’re a WordPress user who is even remotely concerned about security — and, honestly, as a citizen of the internet, security should always be on your mind — the past few months have likely been a little crazy for you. Over the course of 16 days, the WordPress core team put out both a major release (4.2) and three critical security releases.
And, as Andrew Nacin revealed at LoopConf in early May, WordPress 4.2 included a “secret” security fix. Months in the making, this clandestine fix was cleverly disguised as emoji support.
So, if you really want to get pedantic, that was technically four security releases in a little over two weeks’ time. However, it’s likely that all of these releases — with the exception of 4.2, of course — were applied to your site without you even noticing. Automatic updates FTW, right?
As I’ve said in the past, WordPress core is secure. That statement is as true today as it was two years ago. And since the release of WordPress 3.7 in October 2013 — which included the aforementioned automatic updater — the core team has done an extremely good job of keeping things that way.
For anyone who tends to be OCD about the little numbered dots in their dashboard, however, things have probably gotten a little overwhelming as of late.
According to the phenomenal WPScan Vulnerability Database, April and May 2015 saw the release of 139 non-core (read: plugin and theme) vulnerabilities. And while the numbers were skewed by the large number of `add_query_arg()` and `remove_query_arg()` XSS fixes pushed out in April, that’s still an upward trend from previous months.
But that’s actually okay. Every single vulnerability that’s found, and every single fix that’s released — be that to WordPress core, a plugin, or a theme — only continues to make things stronger.
As long as you keep things up to date, that is.
Staying on top of updates might be an easy task for someone who has a handful of sites. But what about people who maintain dozens of sites? Fortunately, tools like InfiniteWP, WP Remote, and ManageWP exist for this exact purpose. Some also possess additional features like backups and centralized single sign-on. If you’re one of the harried folks suffering from update fatigue, they’re all worth a look.
There’s also a great solution available for the ridiculously busy or incredibly lazy. As of a few months ago, the popular Jetpack plugin includes functionality that lets you put your site’s plugin updates on autopilot. I use it on a few of my personal installs and love knowing that they’re taken care of.
If you consider yourself the do-it-yourself type and want to have your hands on all parts of the upgrade process, the least you can do is know what’s happening on your sites. I suggest following the @WPVuln Twitter account to get updates on vulnerable plugins, or installing Plugin Security Scanner. PSS leverages the aforementioned WPScan Vulnerability Database and sends nightly emails to site admins if any of their plugins are found to be exploitable.
No matter how you choose to deal with potential security threats against your site, there’s literally something for folks of every skill and comfort level.
Do you have any favorite tricks that I might’ve missed? Disagree with my assessment of the state of WordPress security? I’d love to hear what you think in the comments!
Andrew Nacin: Thank you very much. This has been a really great conference so far, right? Yeah. You guys have to wake up a little bit and then you can get lunch after this. I promise. Like, honestly it’ll happen. It’ll be there, but you’ve got to be awake. So my name is Andy Nacin. I’m one of the lead developers for WordPress. I’ve been contributing to the project for about five years. I’ve also led a number of releases. I’ve led WordPress 3.5, 3.7, 3.9. Very different releases. 3.5 was like obviously, very media heavy. 3.7 added automatic updates, thank God. And 3.9 was more focused on the visual editor making that a lot better. That said, if you followed along closely with core development, you might’ve noticed a drop in activity for me probably right around the end of 3.9.
Andrew Nacin: I have somewhere around like 6 or 7,000 commits to WordPress over the years. Only, I think, Ryan Boren, one of the original lead developers, has more than that. But what happened is towards the end of WordPress 3.9, last March or so, Jetpack had this really crazy vulnerability and we suddenly had to coordinate and manage automatic updates for plugins for the very first time. Now, if you follow along with the community, you’ve seen that, hey, we’ve actually done this a few more times in the last few months. So we’ve written up a lot more about it. But nonetheless, this issue existed. And so Mike Schroder, who was my co-lead for 3.9, ended up taking over the release, and I like locked myself in a closet for three weeks working on this other thing. After 3.9, I ended up pivoting away from core a little bit, out of necessity, to focus on WordPress.org and security.
Andrew Nacin: I’ve been working with the security team for some time, but nonetheless we had a lot of things going on that we needed to be very careful about. So my time, despite essentially leading a lot of the day to day of a major, very popular open source project, actually took place in private. There was not a whole lot of public activity. So what I want to do today is share with you some of that activity from the previous few years. My time now, of course, is very limited. I’m currently not full time on the project. Most of you know that in January I took a leave from working for Matt Mullenweg, the WordPress founder, to work full time at the U.S. Digital Service at the White House. We have a very broad mandate from the very top to really just transform how government works and fix a lot of really bad problems, whether it’s healthcare.gov or Veterans Affairs or the very broken immigration system.
Andrew Nacin: I think what we do is pretty amazing and it’s a lot of fun. However, and I love saying this too, this talk is completely in my personal capacity as lead developer of WordPress. It is not representative of the views of the administration, the White House, the Executive Office of the President, et cetera, et cetera. And I want to say that in particular because while the title of the talk is Anatomy of a Critical Software Bug, we’re actually going to be talking about the most critical kind, which would be the anatomy of a critical security bug. I was a little afraid as to whether I would actually use this talk title. It was on the website for like an hour last week and then it changed over to “software,” and the reason was that the thing I was going to be talking about was released on Wednesday, and I needed to make sure it was released before coming here.
Andrew Nacin: So, I will say that I was actually hesitant about this being live streamed, because of the topic. Of course it’s being recorded. I’m sure many of you are on Twitter. You’re going to have to pay attention to this, because if you miss a slide, you’re going to have no idea what I’m talking about. Just to warn you. So I’m going to tell you a little bit of a story, and I want to apologize to and thank three people. Nikolay is here; you heard from him earlier. Also Mike Adams and Gary Pendergast. These are three core developers for WordPress, and they have been suffering through these problems with me for the previous two years. So a lot of the words that you’ll see here were not necessarily written by me, but in many cases by them.
Andrew Nacin: Now, there are a lot of vulnerabilities that you’ve heard of. Of course, like cross-site scripting, CSRF, SQL injection. I find these to be really, really boring. It’s easy to fix them. It’s easy to work around them. There are some cooler ones that are actually really interesting: XML entity injection, SSRF, server-side request forgery. Mike Adams actually gave a talk at WordCamp San Francisco a few years ago about some of these. Really great talk. I find these interesting, but nah, I’m not going to talk about those either. I both like and of course hate the vulnerabilities that are incredibly complex, the complicated ones. The ones that make you think and make you wonder if you can actually fix it. Of course, that is never a good feeling to have: is this bug even fixable? It’s especially not good when it’s a security issue. Now, a lot of times what ends up happening when you’re building software is this kind of polar-opposite spectrum of security and usability.
Andrew Nacin: The most secure piece of software is probably one that doesn’t exist. The most usable piece of software is probably one that does not have passwords or users or really anything and just anyone can log in. How do you kind of find that balance? And so with the WordPress project as an example, we are backwards compatible and we treat that as a piece of our usability. We don’t let it hold us back on features. We do break things when we need to, but it’s really important. And then how do you actually make that balance when you’re going between a normal bug or an edge case or whatever it is, and then go into a security fix, because now you might have to break something and you might have to break something in a very, very bad way. So this particular vulnerability actually dates back quite a long ways in terms of the backwards compatibility thing.
Andrew Nacin: In fact, it dates to 2001. Before WordPress existed, when it was b2, they used MySQL like a lot of other things out there. And the one thing that they never did is turn on MySQL strict mode. If this line of code had existed in 2001, or even 2005, or even, hey, just before I started in 2009, I wouldn’t need to talk about any of this, and we probably wouldn’t have had the worst and most complex vulnerability and fix that WordPress has ever had. Strict mode, if you’re not familiar, rejects the query when something about it is invalid. Maybe it’s a string it can’t parse because it contains a character that the database doesn’t recognize. Maybe it’s an invalid date. Maybe you’re trying to pass a string into an integer field, whatever it might be.
Andrew Nacin: So if this line was there, that would be good. It’s gotten to the point where I’ve kind of thought that if I ever go for a job interview and they say, do you have any questions, I would say, did you enable strict mode? Like, that is my thing. And if you actually go and look at MediaWiki and Drupal and Joomla and all of these, phpBB, they all turned on strict mode. Oops. So we kind of forgot about this one. Now, when strict mode is off, MySQL will just kind of guess. It’ll say, oh, that string, we’ll just go and convert it to a zero. Or that date, we’re going to convert it to, I don’t know; sometimes they even just let the invalid dates sit in there. If the value is too long for the field, it just chops it off.
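For anyone who hasn’t watched non-strict MySQL at work, here is a minimal Python model of the behavior just described. It is a simplification, not MySQL’s code: `insert_value()` and `StrictModeError` are hypothetical names, only an integer and a length-limited string column are modeled, and real MySQL coercion rules are more nuanced. On an actual server, strict mode is switched on through the `sql_mode` system variable (for example with `STRICT_ALL_TABLES`).

```python
class StrictModeError(ValueError):
    """Raised when the simulated server runs in strict mode and rejects a value."""


def insert_value(value, column_type, max_len=None, strict=False):
    """Hypothetical model of how one value lands in a column.

    Only "int" and "varchar" columns are modeled.
    """
    if column_type == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            if strict:
                raise StrictModeError(f"Incorrect integer value: {value!r}")
            return 0  # non-strict: silently coerce to zero
    if column_type == "varchar":
        if max_len is not None and len(value) > max_len:
            if strict:
                raise StrictModeError(f"Data too long for column: {value!r}")
            return value[:max_len]  # non-strict: silently chop off the excess
        return value
    raise ValueError(f"column type not modeled: {column_type}")


print(insert_value("abc", "int"))                      # prints 0
print(insert_value("nacin-the-great", "varchar", 10))  # prints nacin-the-
```

The non-strict branches never raise; they quietly change the data, which is exactly the property the rest of this story depends on.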
Andrew Nacin: So if you have, say, a username field with a fixed number of characters and you try to insert something too big, it’ll just chop it. So, pivoting away from strict mode, we can then talk about how the lack of strict mode can cause a problem. I’m going to introduce a class of vulnerability called object injection. It has a number of other names as well, but as many of you know, WordPress uses PHP serialization for options and for post, comment, and user metadata. There are some benefits to this. For example, it is not lossy like JSON is, or at least it’s much less lossy. JSON can’t store a real object; it can store an associative array as an object, and that’s about as far as it can go. But even then, once you convert it back to an associative array, you don’t know whether it was originally an object or an array.
Andrew Nacin: So here’s the way PHP serializes things down. On the left you have the value as you’d normally see it; on the right is the string representation of that value. I deliberately left the floating point decimal in there as a joke, but that is what PHP converts it to. You can see in particular, in the fifth one here, how it takes an object and stores its class along with it. It will come back as an object. This is a serialized WP_Post object, and you can see in particular that it stores the fact that it is a WP_Post. So when it gets unserialized out of the database, you get back an object. I’m not saying you should do this. In fact, you probably should not. If you’re going to use non-scalar values like arrays or objects that you’re storing inside metadata, you should really consider what you’re trying to do and try to simplify the data structure as much as possible.
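To make the format concrete, here is a minimal Python sketch that produces PHP’s serialize() notation for a few common types. It is not PHP’s implementation: `php_serialize()` and the `PhpObject` stand-in are hypothetical, only public object properties are handled, and floats use Python’s shortest repr (PHP’s own float output depends on its `serialize_precision` setting).

```python
from dataclasses import dataclass, field


@dataclass
class PhpObject:
    """Hypothetical stand-in for a PHP object with public properties."""
    class_name: str
    props: dict = field(default_factory=dict)


def php_serialize(value):
    """Sketch of PHP's serialize() notation for common types."""
    if value is None:
        return "N;"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return f"b:{int(value)};"
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, float):
        return f"d:{value!r};"
    if isinstance(value, str):
        # The length prefix counts bytes, not characters.
        return f's:{len(value.encode("utf-8"))}:"{value}";'
    if isinstance(value, (list, dict)):
        pairs = value.items() if isinstance(value, dict) else enumerate(value)
        body = "".join(php_serialize(k) + php_serialize(v) for k, v in pairs)
        return f"a:{len(value)}:{{{body}}}"
    if isinstance(value, PhpObject):
        body = "".join(php_serialize(k) + php_serialize(v) for k, v in value.props.items())
        name = value.class_name
        return f'O:{len(name)}:"{name}":{len(value.props)}:{{{body}}}'
    raise TypeError(f"type not handled by this sketch: {type(value).__name__}")


print(php_serialize(PhpObject("WP_Post", {"ID": 1, "post_title": "Hello"})))
# prints O:7:"WP_Post":2:{s:2:"ID";i:1;s:10:"post_title";s:5:"Hello";}
```

Note that the class name travels inside the string itself; whoever controls the string controls which class comes back out.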
Andrew Nacin: The problem here is when we get to arbitrary unserialization. Serialized data has complex stuff. It has numbers and strings, sure, but it also has, you know, really bad things like objects. So imagine we have this class right here, and this is actually fairly standard, right? We have a file somewhere in a property, and then something gets set, and whatever else. And then when the object is simply destructed, it just deletes the file. Well, now imagine if the serialized data looks like this, and maybe imagine if it wasn’t deleting the file but doing something else: printing it out, sending it in an email, whatever it might be. On destruct, something is happening. So it’s very important for users not to be able to control that data, because otherwise what would happen is that this thing gets unserialized and it becomes this.
Andrew Nacin: So that’s not good. Now, as I mentioned, add_option(), update_option(), all of these functions serialize automatically, and they do this in a few different ways. First it says, if it needs to be serialized, if it’s a non-scalar value, if it’s an array or an object, then we serialize it. And if the data already looks serialized, we serialize it again. We almost broke this at one point. There’s been a comment here for years; the first line says double serialization is required for backwards compatibility, because it broke a lot of things when we tried to change it. This morning, when finalizing my talk, I committed the last line here: “Also the world will end.”
Andrew Nacin: In reality, this is not just a backwards compatibility fix. This is actually like, the world will end. Everything will be insecure. And I want to explain why. Say we have a string that a user supplies; let’s say the user passes this first string here as their blog name. It’s a string, and update_option() is like, oh cool, it’s an object, it’s already serialized, I’ll just send it in there. Well, when it comes back out, what’s WordPress going to do? It’s going to look at it and say, oh, it’s already serialized, I should unserialize it. Now I have an object for my blog name. So you might be familiar with HTML encoding, where you’re escaping an ampersand to &amp; or whatever it might be. This is kind of similar: multiple layers of escaping.
Andrew Nacin: So instead what we do is reserialize it, and now, as you see, that was just a string. You can see the s; the s means string. The 19 is the length of that whole thing in characters (strictly, bytes). And then of course there’s the closing quote and the semicolon, the end of your serialized string. And we can repeat this for an object or for a string; it works either way. Like we see above, we have a string here, same deal. The problem here is: how do we actually determine whether something is serialized? Well, pretty simple actually. We say, if it’s serialized, serialize it again, and if it’s serialized, unserialize it when we’re getting that data back out of the database. We can’t unserialize too many layers, because then we have a big problem. So we can look at this particular function, and this is what it does. Also, in PHP, all of the little characters that flag the types spell out “badiONs.” It’s kind of cool.
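The escaping-within-escaping idea can be modeled in a few lines of Python. This is a sketch, not WordPress’s code: `maybe_serialize()` and `maybe_unserialize()` borrow WordPress’s function names but are simplified, string-only stand-ins, `is_serialized()` is approximated with a regular expression anchored at both ends (“starts with, ends with”), and `PhpObject` is a hypothetical placeholder for whatever PHP’s unserialize() would instantiate.

```python
import re
from dataclasses import dataclass


@dataclass
class PhpObject:
    """Hypothetical placeholder for an object PHP would instantiate."""
    class_name: str


# Anchored at both ends: a type flag at the start, ';' or '}' at the very end.
_SERIALIZED = re.compile(r'N;|[bid]:[^;]*;|s:\d+:".*";|[aO]:\d+:.*\}', re.S)


def is_serialized(data: str) -> bool:
    return _SERIALIZED.fullmatch(data) is not None


def maybe_serialize(data: str) -> str:
    # A string that merely *looks* serialized is wrapped as a string again,
    # so on the way out it can only ever come back as a string.
    if is_serialized(data):
        return f's:{len(data.encode("utf-8"))}:"{data}";'
    return data


def maybe_unserialize(data: str):
    if not is_serialized(data):
        return data
    m = re.fullmatch(r's:(\d+):"(.*)";', data, re.S)
    if m and int(m.group(1)) == len(m.group(2).encode("utf-8")):
        return m.group(2)
    m = re.match(r'O:\d+:"(\w+)"', data)
    if m:
        return PhpObject(m.group(1))
    raise ValueError("type not handled by this sketch")


blog_name = 'O:8:"stdClass":0:{}'        # user-supplied text, not an object
stored = maybe_serialize(blog_name)
print(stored)                            # prints s:19:"O:8:"stdClass":0:{}";
print(maybe_unserialize(stored) == blog_name)  # prints True
```

Without the extra layer, `maybe_unserialize(blog_name)` would hand back a `PhpObject`, which is the object-injection risk in miniature.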
Andrew Nacin: So this is what it does. If you’re familiar with regular expressions, you’ll be able to see this. If one of the “badiONs” characters starts the string, then a colon, then it’s going to be looking for some set of numbers, then another colon. It’s looking for a quote, then all sorts of stuff. And at the very end of the string, it wants either a brace for some of these or a semicolon for others. In practice, this is not actually implemented as a regular expression in WordPress; it uses only string functions because that’s faster. But basically we can break it down to this, right? This is what it is: starts with, ends with. So why does this actually matter? Well, you might notice on the left the Trigram for Heaven, otherwise known at this point as the hamburger icon, and on the right, I actually don’t remember what this one is called. But it’s not just that there are three lines and four lines here. The one on the left is actually a three-byte multibyte sequence, which you can see here at the bottom.
Andrew Nacin: The one on the right is a four-byte multibyte sequence. MySQL’s default utf8 character set, which I’m sure you’ve seen in wp-config.php many, many times, only stores three bytes. It will not store the fourth byte. MySQL made up its own custom character set called utf8mb4 that lets you store all four bytes. So let’s say you don’t have utf8mb4, which also means you don’t get Emoji, by the way. There are some Emoji sprinkled throughout the talk; Emoji are four-byte characters. Let’s say we send in this value here with a four-byte character at the end. It goes into the database and MySQL says, oh, I’ll just chop off everything after here. We didn’t serialize it, because it didn’t end with a brace. It ended with random other text. So it’s fine. It’s just a normal string. It’s my blog name. This is an awesome blog name, by the way.
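The byte counts, and what truncation does to a value that only looked like a plain string, can be reproduced in Python. This is a simulation under assumptions: U+1D306 is an assumed stand-in for the four-line glyph on the slide, `store_in_mysql_utf8()` is a hypothetical model of a non-strict server writing to a three-byte `utf8` column, `is_serialized()` is an anchored regular-expression approximation, `maybe_serialize()` is a simplified string-only version, and `stdClass` is just a harmless placeholder class name.

```python
import re

TRIGRAM_FOR_HEAVEN = "\u2630"    # three bytes in UTF-8
FOUR_LINE_SYMBOL = "\U0001D306"  # assumed stand-in for the slide's glyph: four bytes

print(TRIGRAM_FOR_HEAVEN.encode("utf-8"))  # prints b'\xe2\x98\xb0'
print(FOUR_LINE_SYMBOL.encode("utf-8"))    # prints b'\xf0\x9d\x8c\x86'

# Anchored check: a type flag at the start and ';' or '}' at the very end.
_SERIALIZED = re.compile(r'N;|[bid]:[^;]*;|s:\d+:".*";|[aO]:\d+:.*\}', re.S)


def is_serialized(data: str) -> bool:
    return _SERIALIZED.fullmatch(data) is not None


def maybe_serialize(data: str) -> str:
    if is_serialized(data):
        return f's:{len(data.encode("utf-8"))}:"{data}";'
    return data


def store_in_mysql_utf8(value: str) -> str:
    """Model a non-strict write to a 3-byte utf8 column: everything from the
    first character that needs four bytes onward is silently dropped."""
    for i, ch in enumerate(value):
        if len(ch.encode("utf-8")) == 4:
            return value[:i]
    return value


blog_name = 'O:8:"stdClass":0:{}' + FOUR_LINE_SYMBOL + " is my blog name"
written = maybe_serialize(blog_name)   # ends in plain text, so it is not wrapped
stored = store_in_mysql_utf8(written)  # the tail silently disappears

print(written == blog_name)   # prints True
print(stored)                 # prints O:8:"stdClass":0:{}
print(is_serialized(stored))  # prints True: it would be unserialized on read
```

The check on the way in and the check on the way out see two different strings, and only the database’s silent truncation connects them.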
Andrew Nacin: Now here’s the problem, of course. It comes back out and gets unserialized. Oh no, we have an object. Right? And so now we basically have this. That is a bad, bad thing. The nice thing about this vulnerability is that it requires a vulnerable object to actually exploit. But of course, otherwise we’re dealing with arbitrary code execution based on really whatever exists on the site. So we looked at a few different options: how do we solve this problem? Well, we thought, oh, we can preflight these. We can take this value and, based on the table and the field we were trying to insert it into, query the table and figure out what its character set is.
Andrew Nacin: And then once we know that information, we can run another query that looks like this, that says: convert it for me and let me know if it changes. If it doesn’t change, then we know that it can be stored. If it does change, we have a problem; we have to reject it. So we’re basically implementing our own little preflight checks directly in core. There are a lot of complications here. In fact, when we were looking at all this stuff, we would look at how other database drop-ins react, whether it’s the one in W3 Total Cache, or a PDO library, or even a SQLite implementation, if you wanted that sort of thing. But then also something like HyperDB, which has to send queries to different data centers. So we were like, how could we trick HyperDB into knowing that there’s a table here to send it to?
Andrew Nacin: So that’s a comment, and then it says, oh, wp_posts, that’s in this data center over here, I’ll go run this query over here. It’s really just to trick stupid software, right? The problem, of course, is that this right here is what, 15, 20 lines? It was just our very first proof of concept, when we were like, hmm, how could this work? This evolved into a few hundred lines of code, because then we started to catch all these edge cases, and we started thinking about all this, and we’re like, oh man, how are we going to really solve this problem? And of course there’s this overhead here, right? Every single time we wanted to insert something serialized into the database, we would be doing one query to get character set information and then another query to convert it.
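The preflight idea (look up the target column’s character set, convert the value, and reject it if the conversion changes anything) can be sketched in Python. The `SCHEMA` mapping and `convert_using()` are hypothetical stand-ins for querying the table and running a `SELECT CONVERT(... USING ...)` on a real server; only MySQL’s three-byte `utf8` is modeled as lossy, and every other charset is assumed lossless for this sketch.

```python
# Hypothetical schema: (table, column) -> that column's character set.
SCHEMA = {
    ("wp_options", "option_value"): "utf8",   # MySQL's 3-byte utf8
    ("wp_posts", "post_content"): "utf8mb4",
}


def convert_using(value: str, charset: str) -> str:
    """Rough model of SELECT CONVERT(value USING charset) on a non-strict server."""
    if charset == "utf8":
        for i, ch in enumerate(value):
            if ord(ch) > 0xFFFF:  # needs four bytes in UTF-8
                return value[:i]
    return value  # this sketch treats every other charset as lossless


def preflight(table: str, column: str, value: str) -> bool:
    """Return True if the value would be stored unchanged."""
    charset = SCHEMA[(table, column)]
    return convert_using(value, charset) == value


payload = 'O:8:"stdClass":0:{}\U0001D306 is my blog name'
print(preflight("wp_options", "option_value", payload))  # prints False: reject
print(preflight("wp_posts", "post_content", payload))    # prints True
```

In the design described here, each of those two steps is a round trip to the database, which is where the extra queries per write come from.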
Andrew Nacin: So every query adds two more. That is not good. That is more overhead for a system that, without caching in place, is already running a decent number of queries on a page. So we sat down and tried to figure out how else we could possibly solve this. We actually sat down at WordCamp San Francisco and said, well, we could try MySQL strict mode, which would solve all of our problems with one line of code. It would also break everything. So we can’t really do that. Two, we were wondering: what if we loosened up is_serialized() a little bit? What if, when it’s trying to determine whether it needs to serialize something, we just don’t anchor it to the end of the string? As you can see here, all that is removed from these sample regular expressions is the anchor. Just at some point in the string, look for this semicolon or this brace. And yay. No preflighting. Problem solved. Talk over. It was not nearly that crazy.
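Here is a Python sketch of that loosened check and why it closes the hole. It is modeled on the description above rather than copied from WordPress: `is_serialized_loose()` is a hypothetical name that only requires a type flag at the start plus a `;` or `}` anywhere, the strict check is an anchored regular-expression approximation, and `mysql_utf8_truncate()` is a hypothetical model of a non-strict three-byte `utf8` column.

```python
import re

_STRICT = re.compile(r'N;|[bid]:[^;]*;|s:\d+:".*";|[aO]:\d+:.*\}', re.S)


def is_serialized(data: str) -> bool:
    """Strict: anchored at both ends."""
    return _STRICT.fullmatch(data) is not None


def is_serialized_loose(data: str) -> bool:
    """Loose: anchored at the start only; the terminator may appear anywhere."""
    starts_right = re.match(r"N;|[bidsaO]:", data) is not None
    return starts_right and (";" in data or "}" in data)


def maybe_serialize(data: str) -> str:
    # On the way in, err on the side of double-serializing.
    if is_serialized_loose(data):
        return f's:{len(data.encode("utf-8"))}:"{data}";'
    return data


def mysql_utf8_truncate(value: str) -> str:
    for i, ch in enumerate(value):
        if ord(ch) > 0xFFFF:
            return value[:i]
    return value


blog_name = 'O:8:"stdClass":0:{}\U0001D306 is my blog name'
stored = mysql_utf8_truncate(maybe_serialize(blog_name))
print(stored)                 # prints s:39:"O:8:"stdClass":0:{}
print(is_serialized(stored))  # prints False: it comes back out as a plain string
```

The asymmetry is deliberate: a false positive from the loose check only costs an extra, harmless layer of string serialization, while a false negative is what lets an object through.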
Andrew Nacin: It was like 15 lines of code, despite our having written 2,000 of them between the tests and all the preflight stuff we had done previously when we were working on this. This was released in WordPress 3.6.1, two years ago. Then we got another email later on that said: multibyte comment injection. I was like, uh oh, that doesn’t sound good. A really great guy, Cedric Van Bockhaven, was the reporter. We worked with him for a while on this, and he said three things. He said this attack relies on truncating everything after a four-byte character, at which point all of us had deja vu immediately and we’re like, oh, this is going to be fun.
Andrew Nacin: And then it said, oh, but don’t worry, when utf8mb4 is used, this attack won’t work. We had been talking about adding utf8mb4 to core for a while to get Emoji support. So Emoji and security were related. There you go. And then it said it depends on the theme, and I was like, why does it depend on the theme? So we started looking at this, and it gives us this actual proof of concept, which is this. You can kind of see where this is going, maybe. This gets inserted into the database. This is a comment. The quotation tag is allowed as an element in comments. It goes in, MySQL cuts it off, goodbye. That’s now the entire comment. Now we have an open HTML tag, and then another comment goes in and says that. Now KSES looks at both of these and says, oh, that’s fine. The first one isn’t even HTML. The second one is just some text. Combined, though, these single-quoted attributes can span multiple elements and close later, even in a future comment.
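The two-comment trick can be reproduced as a string-level simulation. Everything here is hypothetical: the payloads are made up (we assume the “quotation tag” is `<q>` with a single-quoted `cite` attribute), `render()` stands in for a theme whose markup contains no single quotes between comments, and truncation models a non-strict MySQL `utf8` column. No browser or KSES logic is involved; the regex only shows where the attribute value ends up closing.

```python
import re


def mysql_utf8_truncate(value: str) -> str:
    for i, ch in enumerate(value):
        if ord(ch) > 0xFFFF:
            return value[:i]
    return value


def render(comments):
    # Hypothetical theme markup: double quotes only, nothing single-quoted.
    return "".join(f'<li class="comment"><p>{c}</p></li>\n' for c in comments)


comment_1 = "<q cite='\U0001D306'>nice post</q>"  # well-formed when submitted
comment_2 = "' onmouseover='alert(1)' x='"        # just text on its own
stored = [mysql_utf8_truncate(c) for c in (comment_1, comment_2)]
page = render(stored)

# The cite value opened in comment 1 now closes inside comment 2, so the rest
# of comment 2 reads as attributes of the <q> tag.
m = re.search(r"<q cite='([^']*)'\s*(\w+)=", page)
print(stored[0])   # prints <q cite='
print(m.group(2))  # prints onmouseover
```

Neither stored field contains an event handler on a tag by itself; the problem only exists in the rendered page, which is why per-field filtering could not see it.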
Andrew Nacin: And you can see where this could be a problem, where someone could, let’s say, post a comment that is actually a really good comment, and it gets approved, and then, depending on your discussion settings, all future comments by that person are automatically approved. So if you’re a good enough spammer, we have a problem. And because this involved two different comments, it wasn’t even easy to detect with Akismet. A lot of times, we actually talk to the Akismet folks and say, hey, we have this thing; is there a way we can possibly mitigate this while we’re still working on a fix?
Andrew Nacin: We looked at this and we’re like, maybe not. So we’re realizing, hmm, why is this working? And I cut out some of the HTML here, but the reason why this worked is because there weren’t other elements in the way that had single quotes. And so we’re like, oh, it’s on the theme. Great. So I downloaded all 200 something WordPress.com themes. They actually … They exist in a public SVN repo which is kind of cool. And then wrote like these few lines of code. This is actually not the code I wrote. I was just playing around with this yesterday. I wrote something else then that was probably longer and more complicated. And like on every page refresh, it would switch to the next theme in the list. And so I sat there, refreshing the page and seeing if it like sent up an alert box, refreshing the page, seeing if it’s sent up an alert box.
Andrew Nacin: And I came up with, oh, only like 10 out of 250 themes were affected. That’s not that bad. And this is, mind you, the first night that we received this. We’d just received this report and I’m just like, oh, I’m just going to play with this, and this is, you know, probably 2:00 AM. I’m not really thinking about it. And then I realize that the themes that aren’t working, that don’t have this vulnerability, are protected because in core the comment reply link uses a single quote, thank God, and that prevents this from working. But there are other ways around that as well. Gravatar: get_avatar(), that function in WordPress, returns everything with single quotes. It was preventing a vulnerability for us. But the problem is that this was not dependent on the theme; it was dependent on just generally what the settings of the site were, the positioning of the reply links, how the comment walker was actually designed for that theme. Everything.
Andrew Nacin: And then we realized, okay, this is not just the comment and the following comment; this is any two fields in the database. Everything is affected. The post title and the post content. I don’t know, the user description and some other link of theirs, right? Like the link to their website or whatever it might be. WordPress has 11 database tables plus the multisite stuff, something like 150 fields. Maybe more than that. Every single one of them is affected. Strict mode would have been really nice in 2001. So we can’t do it, because the sky will fall and everything will break, and not just plugins. Core extensively relies on non-strict mode functionality. But we must do it, because the sky will fall. Yay.
Andrew Nacin: So, you know, we’re kind of crazy. So let’s just implement strict mode in PHP. We started playing around with this idea: what if, in wpdb::prepare(), which is where most arguments go through, we just strip out all of the four-byte characters? And yes, those are all real character sets. Big5 is an actual character set. Mostly, though, it’s UTF8: almost every WordPress site out there uses UTF8. Some of the older ones use Latin1. WordPress.org uses a mix of those two. WordPress.com actually uses exclusively Latin1. Latin1 can store everything. With UTF8, obviously, we have a bit of a problem, and with all of these other ones we have a bit of a problem. So we’re like, oh good, we can just remove these characters. So this is the byte sequence we’re looking for, in this case a four-byte character.
Andrew Nacin: So this is the one byte that starts that four-byte character, then three more bytes after it, and then we remove them. The problem is that some of these are valid characters in other character sets. And we started thinking about this, and we’re like, this is not good. We have no good solution for this. We will never be able to fix this. And that despair kept coming back throughout this process. I think we went through something like 150 different iterations of this. We wrote easily 10,000 lines of tests, a lot of code. Also, this patch was like, oh, we’ll just remove all the four-byte characters and no one gets Emoji. But it didn’t catch other things, like an invalid UTF8 sequence. This is a three-byte sequence that looks like a normal character, but it’s invalid. And MySQL is like, I don’t know what those three bytes mean, I’m going to just truncate your field.
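The byte-level filter described above, and its two blind spots, can be sketched in Python. `FOUR_BYTE`, `strip_four_byte()`, and `mysql_utf8_accepts()` are hypothetical names, and the last one is only a model of what a three-byte `utf8` column will take. The pattern matches a UTF-8 lead byte from F0 to F4 followed by three continuation bytes (0x80 to 0xBF), which is looser than full UTF-8 validation.

```python
import re

# Lead byte F0-F4, then three continuation bytes 80-BF.
FOUR_BYTE = re.compile(rb"[\xf0-\xf4][\x80-\xbf]{3}")


def strip_four_byte(raw: bytes) -> bytes:
    return FOUR_BYTE.sub(b"", raw)


def mysql_utf8_accepts(raw: bytes) -> bool:
    """Model: valid UTF-8 with no character above U+FFFF."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ord(ch) <= 0xFFFF for ch in text)


emoji = "hi \U0001F600".encode("utf-8")
print(strip_four_byte(emoji))  # prints b'hi '

# Blind spot 1: an invalid three-byte sequence slips through untouched.
invalid = b"hi \xe2\x28\xa1"
print(strip_four_byte(invalid) == invalid, mysql_utf8_accepts(invalid))  # prints True False

# Blind spot 2: in latin1, the same byte pattern is four ordinary characters.
latin1 = "\u00f0\u00bf\u00bf\u00bf".encode("latin-1")
print(strip_four_byte(latin1))  # prints b''
```

The filter removes what it was written for and nothing else; the database rejects a wider class of input than the filter recognizes, and in other encodings it strips legitimate text.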
Andrew Nacin: So when we looked at this, compared to the original report: it doesn’t depend on the theme, it doesn’t get fixed if you’re using utf8mb4, and it’s actually not specifically limited to four-byte multibyte characters. So essentially we realized that everything is bad, everything is broken, the world is on fire. Okay. So you can understand: as we’re discovering all these pieces, we’re deeper and deeper and deeper in this hole and we have no idea how to get out, until, all right, we wrote this preflight stuff a while ago and didn’t need to use it.
Andrew Nacin: So we went back and said, okay, we’ll preflight not just the queries that are inserting serialized data, which is a fairly limited set and occurs only on certain writes, but all of them. So this is an actual comment from our private bug tracker: why don’t we just reach into prepare() and do this, bake it into a low-level piece of WordPress that shouldn’t even be involved in this process? And then we’re like, oh, well, if we do this in prepare(), prepare() doesn’t always know what’s happening. prepare() just knows it’s a string or an integer or a float. It doesn’t actually know what column it corresponds to, and columns and tables all have their own character set. In fact, it’s even worse than that.
Andrew Nacin: And then we also had this problem where not all write queries will actually go through update() or insert(). Delete, of course, is technically a write query, and that one works. But then we have, for example, add_option(), which, in order to avoid a race condition, actually calls INSERT INTO ... ON DUPLICATE KEY UPDATE, and it calls the wpdb query() method directly. And so in that case, what we have is this entire query string. So at this point we’ve gone beyond the idea of writing MySQL strict mode in PHP, and we actually start thinking about writing a SQL parser in PHP. [inaudible 00:25:57]
Andrew Nacin: We checked out the libraries; they were all appropriately insane. And we’re like, yeah, no, this will work fine. Sure. No, no, very bad idea. We have some level of tolerance, and that was way beyond it. So we’re like, okay, what if we preflight the query string? What we can do is run the entire query string against the database and ask, does it get truncated? Not the individual values, but the whole thing. And if some of it gets truncated, we know the query is wrong, so we can reject it. So we actually just do a SELECT CONVERT(). This is an actual query, by the way; CONVERT is just a function, so you don’t need a FROM on this or anything like that. [inaudible 00:26:35] posts, using UTF8, see how it comes back.
Andrew Nacin: The problem, of course, is that there are a lot of different character sets being used, and we don’t know what is going on in a particular query. So we reached back into HyperDB. HyperDB, again, is this multi-data-center, multi-database, multi-environment, whatever you want it to be, for sharding WordPress databases, particularly multisite. And we’re like, oh yeah, sure. So we won’t write an entire SQL parser; instead, we’ll just extract the table name, query that table, and figure out what the character sets are for the fields. This is really easy. It’s just a few regular expressions. It’s not that bad. This is from HyperDB, and now WordPress has it too. Yay. get_table_from_query().
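Here is a heavily simplified Python take on pulling the table name out of a query with a few regular expressions. `table_from_query()` is a hypothetical function, not the HyperDB or WordPress implementation, and it deliberately ignores joins, subqueries, and many other forms that real code has to handle.

```python
import re

# Hypothetical, simplified patterns; each captures the first table name.
_PATTERNS = [
    r"^\s*(?:INSERT|REPLACE)(?:\s+IGNORE)?(?:\s+INTO)?\s+`?(\w+)`?",
    r"^\s*UPDATE\s+`?(\w+)`?",
    r"^\s*(?:SELECT|DELETE)\b.*?\bFROM\s+`?(\w+)`?",
]


def table_from_query(query: str):
    """Return the first table name the query touches, or None."""
    for pattern in _PATTERNS:
        m = re.match(pattern, query, re.IGNORECASE | re.DOTALL)
        if m:
            return m.group(1)
    return None


print(table_from_query("DELETE FROM wp_postmeta WHERE meta_id = 5"))  # prints wp_postmeta
```

Once the table is known, its column character sets can be looked up, which sidesteps a full SQL parser at the cost of only handling common query shapes.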
Andrew Nacin: It’s a method on wpdb. Look it up. Not now; pay attention first. So the problem is a table’s character set. Now, Gary Pendergast worked at Oracle and MySQL for like five years. We had a number of other people here with a decent amount of experience in this. When I started this whole process a few years ago, I knew nothing about character sets, because they’re very confusing. Now I know everything about character sets, which is also really confusing and kind of sucks. But a table’s character set in MySQL is only the default for a new field. So if you do ALTER TABLE … ADD COLUMN, that’s when the table character set gets used. Otherwise it doesn’t matter. So we went back to realizing, oh crap, we need all of the individual fields, because in a single table you might have two different fields with two different character sets: one might be utf8, one might be latin1, something along those lines.
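The table-versus-column distinction is easy to miss, so here is a toy model of it. The `Table` class is hypothetical and encodes only the MySQL rule described above. When a column is created without an explicit charset, it copies the table’s charset at that moment. Changing the table default afterward, as `ALTER TABLE … DEFAULT CHARACTER SET` does, leaves existing columns alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Table:
    """Toy model of MySQL character-set inheritance (hypothetical class)."""

    default_charset: str
    columns: Dict[str, str] = field(default_factory=dict)

    def add_column(self, name: str, charset: Optional[str] = None) -> None:
        # A new column without an explicit charset copies the table default now.
        self.columns[name] = charset or self.default_charset

    def set_default_charset(self, charset: str) -> None:
        # Like ALTER TABLE ... DEFAULT CHARACTER SET: existing columns are untouched.
        self.default_charset = charset


comments = Table("latin1")
comments.add_column("comment_author")
comments.set_default_charset("utf8")
comments.add_column("comment_content")
print(comments.columns)
# {'comment_author': 'latin1', 'comment_content': 'utf8'}
```

The mixed table that results is exactly the case the talk goes on to describe. Looking up the table’s charset alone would report `utf8` and miss the `latin1` column.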
Andrew Nacin: And we’re like, oh, this’ll never happen, of course. And then we checked WordPress.org, and it had happened. WordPress.org isn’t exactly a shining example of a WordPress installation, but nonetheless we had a table with this mixed bag of character sets, and we’re like, oh crap. So each field can have its own character set. And that is not all. There are a lot of different character set and collation properties in MySQL. The client, PHP, sets a character set. The connection has its own character set. The server, the database, the table, the column: everything has its own character set and also its own collation. Lots of stuff going on here, and a lot of ways this can go wrong. We were actually reading PHP’s C code and MySQL’s code to understand which one we need to convert at which time, and, when there’s a discrepancy, how to resolve it.
Andrew Nacin: Now, for the most part, this affects a fairly small number of sites. Most of them are utf8, and that one we can handle, hypothetically. latin1 takes absolutely everything, so that’s fine. But then we start realizing we need to go back to per-column checks. So we’re like, oh well, for insert and update we can go back to per-column checking, and then in query() we can check the entire thing, just in case, as the fallback. At which point the reply was: why? But we had to. Basically it was the difference between a scalpel and a sledgehammer. So all the code we had previously written, we ripped out and re-abstracted, because now we have to have tables and columns and character sets and fields passed to the different levels of this API.
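The "scalpel" and the "sledgehammer" can be sketched side by side. The charset rules below are simplified assumptions, not MySQL’s full behavior:

- `utf8mb4` accepts any Unicode text.
- MySQL’s 3-byte `utf8` rejects code points above U+FFFF.
- `latin1` is treated as accepting everything, following the remark above.

All function names are hypothetical.

```python
from typing import Dict


def fits_charset(value: str, charset: str) -> bool:
    """Simplified rules (assumptions, not MySQL's full behavior)."""
    if charset in ("utf8mb4", "latin1"):
        return True  # utf8mb4 holds any Unicode; latin1 treated as taking everything
    if charset == "utf8":
        return all(ord(ch) <= 0xFFFF for ch in value)  # MySQL's 3-byte utf8
    return False  # unknown charset: fail closed


def check_row(row: Dict[str, str], column_charsets: Dict[str, str]) -> bool:
    """Scalpel, for insert()/update(): each value is checked against its own column."""
    return all(fits_charset(value, column_charsets[col]) for col, value in row.items())


def check_raw_query(query: str, column_charsets: Dict[str, str]) -> bool:
    """Sledgehammer, for raw query(): we can't tell which value lands in which
    column, so the whole string must fit every charset the table uses."""
    return all(fits_charset(query, cs) for cs in set(column_charsets.values()))


columns = {"comment_author": "latin1", "comment_content": "utf8"}
print(check_row({"comment_author": "Ann \U0001F600"}, columns))  # True: latin1 column
print(check_row({"comment_content": "Hi \U0001F600"}, columns))  # False: utf8 column
print(check_raw_query("UPDATE wp_comments SET comment_author = 'Ann \U0001F600'", columns))  # False
```

The last line shows the trade-off. The sledgehammer rejects a query that the scalpel would have allowed, because it cannot see that the emoji was headed for the `latin1` column.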
Andrew Nacin: This is all, mind you, inside the WordPress database layer. The WordPress database layer was about 2,100 lines of code. It is now more than 3,000 lines of code. That is roughly a 50 percent increase, and it was in a security release. Are we crazy? Absolutely. We’re incredibly crazy. Now, the nice thing is that while we were doing this, we told everyone it was for emoji. Surprise. Because we told everyone it was for emoji, the code for the security release was actually in trunk since January, I believe, before we released it a few weeks ago. No one had any idea what it did, because it was something like a thousand lines in the database abstraction layer just to remove invalid characters. No one understood it. It would have been really tough for anyone to pick apart, because it’s so opaque: the code is over here, and the cross-site scripting vulnerability is over there, all because someone didn’t turn on strict mode in 2001. Yay.
Andrew Nacin: So we snuck this in under the guise of emoji. Emoji was one of the big features of 4.2. All a ruse. It exists, it’s in WordPress 4.2, but it was entirely for security. We spent about a year on this for security, or two years really, counting all of the history that went into it. So I will say we have a lot of really smart and capable people on the WordPress security team. Maybe you’ve been nodding along the entire time, you knew all this, and you’re like, heck yeah, I love complex stuff that seems impossible to solve but has to be solved, otherwise the internet goes up in flames. If so, let me know and join us.
Andrew Nacin: I only warn you once, and otherwise you get to fix all the things. But the security team does a lot of other things. In fact, we don’t just focus on the random esoteric stuff that only affects WordPress because of a non-decision made in 2001. There are a lot of other things we’re working on. For example, we have to deal with forced updates for plugins. We’ve done this a number of times now, over the last few months. We’ve written pretty extensively about how it works and why we do it. What we end up doing is coordinating the entire process with researchers and across plugins, and in some cases even across themes. On Wednesday, a whopping two days ago, there was an XSS vulnerability in Genericons.
Andrew Nacin: Now, as I’ve said previously, XSS is boring. However, Genericons, bundled with Twenty Fifteen, would be on basically every WordPress site out there, minus the ten that deleted it. So we had this cross-site scripting vulnerability everywhere. What we did was coordinate with all these theme and plugin authors. We said, hey, just to warn you, we’re going to update your stuff. And so we updated all of those things. Then, when WordPress 4.2.2 came out, it crawled the entire wp-content directory and deleted all of the vulnerable files it could find. It didn’t delete the stuff that’s supposed to be there. This way we could actively guard these sites. This was not a bug in core; core was not affected. We could’ve just pushed forced updates for these plugins.
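Here is a minimal sketch of that cleanup pass. It assumes the vulnerable file was the `example.html` demo page shipped inside Genericons directories. The function names and the demo tree are hypothetical. The actual 4.2.2 upgrade routine is PHP and differs in its details.

```python
import os
import tempfile
from typing import List


def remove_genericons_examples(root: str) -> List[str]:
    """Delete every example.html found directly inside a 'genericons' folder
    under root, and return the paths that were removed."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath).lower() == "genericons" and "example.html" in filenames:
            path = os.path.join(dirpath, "example.html")
            os.remove(path)  # only the demo page; the rest of the icon font stays
            removed.append(path)
    return removed


def demo() -> List[str]:
    """Build a throwaway wp-content-like tree, clean it, and list what remains."""
    with tempfile.TemporaryDirectory() as root:
        icons = os.path.join(root, "themes", "twentyfifteen", "genericons")
        os.makedirs(icons)
        for name in ("example.html", "genericons.css"):
            open(os.path.join(icons, name), "w").close()
        removed = remove_genericons_examples(root)
        print(len(removed), "file(s) removed")
        return sorted(os.listdir(icons))


print(demo())  # ['genericons.css']
```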
Andrew Nacin: But that’s not what we wanted to do. We wanted to solve all the problems across the board, as much as possible. Sometimes that is really tough, to the point of being impossible. This one was a low-level issue at the database layer, but there are a lot of other pieces we’ve worked on. For example, over the past few years we’ve added server-side request forgery protection. We now have wp_safe_remote_get(), which you should use instead of wp_remote_get() because it is safe. It doesn’t let you attack your own router and things like that. So we do a lot on the security team. It’s not just crazy things like this, and not just things that affect WordPress, but also things that impact the web generally. We’re at a particular scale. According to W3Techs, as of last night 23.9% of the internet is running WordPress. That’s sixty percent plus of all sites that run a CMS. That is a lot of sites, and it’s a lot of responsibility.
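The core idea behind wp_safe_remote_get() is to refuse requests whose destination resolves to an internal address. This Python sketch shows only that address check, using the standard library’s `ipaddress` module. The function name is hypothetical, and WordPress’s own validation is PHP and checks more than this.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_remote_url(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    for *_, sockaddr in infos:
        # sockaddr[0] is the IP string; drop any IPv6 zone suffix such as "%eth0".
        address = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if not address.is_global:
            return False  # loopback, private, link-local, reserved, ...
    return True


# 127.0.0.1 and 192.168.1.1 are rejected; the public 8.8.8.8 is allowed.
for url in ("http://127.0.0.1/", "http://192.168.1.1/admin", "https://8.8.8.8/"):
    print(url, is_safe_remote_url(url))
```

A production client should also connect to the exact address it validated. Otherwise DNS can change between the check and the request.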
Andrew Nacin: So, to come back to this: MySQL strict mode. The primary issue is zero dates. That’s the main thing. You’ve probably seen this before: when a WordPress post is a draft, it stores the zero date, 0000-00-00 00:00:00. That isn’t allowed in MySQL strict mode. Strict mode allows a NULL, but it doesn’t allow a zero date. So we thought we might write a SQL parser to change the queries on the fly to use NULL instead. But it’s not as simple as a find-and-replace. It would be too easy if we could just take this string and replace it with NULL. No, because in a WHERE clause it has to be IS NULL rather than = NULL, because why not? Probably some decision made by ANSI in the ’80s.
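The zero-date trap is easy to reproduce. Here Python’s sqlite3 stands in for MySQL. The `= NULL` versus `IS NULL` behavior is standard SQL three-valued logic: comparing anything to NULL with `=` yields unknown, never true. `naive_rewrite` is a hypothetical version of the "too easy" find-and-replace.

```python
import sqlite3

ZERO_DATE = "'0000-00-00 00:00:00'"


def naive_rewrite(sql: str) -> str:
    # The "too easy" approach: swap every zero-date literal for NULL.
    return sql.replace(ZERO_DATE, "NULL")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_date_gmt TEXT)")
# Fine in an INSERT: the draft now stores NULL instead of a zero date.
conn.execute(naive_rewrite(f"INSERT INTO posts (post_date_gmt) VALUES ({ZERO_DATE})"))
conn.execute("INSERT INTO posts (post_date_gmt) VALUES ('2015-05-22 10:00:00')")

# Broken in a WHERE clause: "= NULL" is never true, so the draft is not found.
drafts_naive = conn.execute(
    naive_rewrite(f"SELECT COUNT(*) FROM posts WHERE post_date_gmt = {ZERO_DATE}")
).fetchone()[0]
drafts_correct = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE post_date_gmt IS NULL"
).fetchone()[0]
print(drafts_naive, drafts_correct)  # 0 1
```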
Andrew Nacin: So we have to figure out another way. We would still like to find a way to turn on strict mode, because it’s very important. In fact, MySQL 5.7 turns it on by default. Ideally, they’ll eventually eliminate non-strict mode entirely. Why? Because, as I’ve demonstrated, it’s just a little bit unsafe. No one should run an application without it turned on. If you’re interested in that, I’m sure there will be a lot of conversations about this in the WordPress core community, especially now that I’ve totally blown the doors open with this talk. Hi, livestream. How’s it going? Everything is safe. It’s okay. And finally, I tweeted this out a few weeks ago, when people were like, emoji, why would you want emoji? That makes no sense. Well, because it also kept everything secure. Thank you very much. I really appreciate it.
Jason Cosper works as the Developer Advocate for WP Engine. He loves going full OCD over interesting problems and learning new things. In his spare time, Cosper enjoys hanging out with his wife and two very tiny dogs, grilling meats, sampling assorted whiskeys, writing cranky tweets about the Lakers and brewing coffee.