Thu 9 Sep 2010
Wed 1 Sep 2010
There's been a meme going around recently that SQL and relational databases are somehow "too complicated", antiquated and "old hat" and should be replaced with something simpler and therefore more efficient.
This opinion is misguided (and perhaps slightly juvenile). Nevertheless, a kind of "NoSQL" movement formed which has created some very useful things in the Distributed Hash Table (DHT) space. (In a video on Cassandra, Eric Evans claims to have invented the term NoSQL and wishes he hadn't!)
I hope to show that SQL and DHT (NoSQL) systems are complementary to each other and not in competition.
Useful data storage systems have "ACID" characteristics (Atomicity, Consistency, Isolation, Durability). SQL systems are very strong on Atomicity, Consistency and Isolation, and can also achieve "five nines" or better reliability in terms of Durability. But, even with highly partitioned data stores, the Consistency requirements often prove to be a bottleneck in terms of performance. This can be seen as an impact on Durability – i.e. database performance under sufficient write load can drop to the point where the database is effectively unavailable.
Sharding – completely splitting the database into isolated parts – can be used to increase performance very effectively, but Consistency, and queries that require access to the whole database, can become costly and complicated. In the latter case a proxy is usually required to submit the same query to all shards and then combine the results before returning them to the client. This can be very inefficient when making range queries.
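To make the fan-out cost concrete, here is a minimal sketch of that shard-proxy pattern. The `Shard` class is just an in-memory stand-in for one database partition; the class and function names are illustrative assumptions, not any real product's API. Note how a single-key lookup routes to exactly one shard, while a range query must hit all of them and re-sort.

```python
class Shard:
    """Minimal in-memory stand-in for one database shard."""
    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value

    def select_range(self, lo, hi):
        return [(k, v) for k, v in self.rows.items() if lo <= k <= hi]


def shard_for(key, n_shards):
    """A single-key operation routes to exactly one shard."""
    return hash(key) % n_shards


def insert(shards, key, value):
    shards[shard_for(key, len(shards))].insert(key, value)


def range_query(shards, lo, hi):
    """The proxy cannot route a range query: it must ask every shard,
    then merge and re-sort the partial results before replying."""
    merged = [row for s in shards for row in s.select_range(lo, hi)]
    return sorted(merged)
```

The point of the sketch is the asymmetry: `insert` touches one node, `range_query` touches N nodes and does extra work at the proxy, which is exactly where sharded systems get expensive.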
DHT systems trade Atomicity and Consistency even further for more Durability under load (i.e. performance scaling). Strictly speaking NoSQL can be implemented by a simple hash table on a single host – e.g. Berkeley DB – but these implementations have no scaling capability so are not included in this discussion.
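The "distributed" part is what separates a DHT from that single-host hash table, and it is typically achieved with something like consistent hashing: keys hash onto a ring of nodes, so adding a node remaps only a fraction of the keyspace. A toy sketch (the class and its API are assumptions for illustration, not any particular product's):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring. Each node is hashed onto the ring at
    many points ("virtual nodes"); a key belongs to the first node
    point at or after its own hash, wrapping around."""

    def __init__(self, nodes, replicas=100):
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node, replicas)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node, replicas=100):
        for i in range(replicas):
            bisect.insort(self.ring, (self._hash(f"{node}:{i}"), node))

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Because only the keys between a new node's ring points and their predecessors move, capacity can be added incrementally – the scaling property the single-host hash table lacks.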
SQL implementations include: MySQL, Oracle, PostgreSQL, SQL Server, etc. DHT implementations include: Cassandra, HBase, Membase, Voldemort, etc. MapReduce implementations (e.g. Hadoop) are a form of DHT, but one that can trade key uniqueness for the speed of "stream/tail processing".
| SQL | DHT |
| --- | --- |
| Immediate (or blocking) consistency. | Eventual consistency: reads don't wait for a write to completely propagate. Last write wins, conflict resolution on read, etc. |
| Transactional. | Multiple-operation transactions implemented in the application. |
| Scale write performance by partitioning (utilise multiple disk spindles): writes go to a privileged master or master cluster (which may also service reads). Scale read performance by "fan out": multiple read slaves replicating from the master. | All nodes are functionally equal; no privileged "name" or meta nodes. |
| Relational. Indexes available on multiple columns (one column optionally a "primary" unique key). | Non-relational, single-index, key-value stores ("column family" DHT systems are just an extension of the single key). |
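The "last write wins, conflict resolution on read" strategy mentioned above is worth spelling out, since it is where a DHT gives up Consistency. A minimal sketch (the replica model and names are assumptions for illustration): each replica keeps a timestamp per key, and a read compares all replicas' versions, returns the newest, and repairs the stale ones.

```python
class Replica:
    """One replica storing (timestamp, value) per key."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Last write wins: keep only the newest version of each key.
        if key not in self.data or ts > self.data[key][0]:
            self.data[key] = (ts, value)


def read_repair(replicas, key):
    """Conflict resolution on read: gather every replica's version,
    pick the newest, and push it back to any stale replicas."""
    versions = [r.data[key] for r in replicas if key in r.data]
    if not versions:
        return None
    ts, value = max(versions)
    for r in replicas:
        r.write(key, value, ts)  # repair stale replicas
    return value
```

Between the conflicting write and the next read, different replicas genuinely disagree – that window is the "eventual" in eventual consistency.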
The metric is then quite simple: if high capacity (data volume or operations per second) is required, data is only ever accessed by primary key, and eventual consistency is good enough, then you have an excellent candidate for storage in a DHT.
Other relational storage can be replaced with DHT systems, but only at the cost of denormalising the data – structuring the data for reads rather than writes – and this should probably be avoided! You can, however, use a DHT to speed up an RDBMS with regard to the storage of blobs. Some RDBMSs have a separate disk space for blobs; some include them in the normal storage space along with the rest of the data. If you have a DHT to hand then another technique is to split any update into two halves: the first uses the RDBMS to store the simple, relational data and returns a primary key; the second then stores the blobs in the DHT against that primary key instead of in the RDBMS. This shortens the write path, and any associated locking, in the RDBMS as much as possible.
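That two-halves update can be sketched in a few lines. Here SQLite stands in for the RDBMS and a plain dict stands in for the DHT client; the table and function names are assumptions for illustration only.

```python
import sqlite3

def store_document(db, dht, title, blob):
    """First half: store the relational part in the RDBMS and obtain
    the primary key. Second half: store the blob in the DHT against
    that key, outside the RDBMS's write path and locks."""
    cur = db.execute("INSERT INTO docs (title) VALUES (?)", (title,))
    db.commit()                 # the RDBMS transaction stays short
    pk = cur.lastrowid
    dht[pk] = blob              # blob write happens after the commit
    return pk

def load_document(db, dht, pk):
    (title,) = db.execute(
        "SELECT title FROM docs WHERE id = ?", (pk,)).fetchone()
    return title, dht.get(pk)
```

One design consequence worth noting: because the blob is written after the RDBMS commit, a crash between the two halves leaves a row with no blob, so readers need to tolerate a missing value – the eventual-consistency trade-off again, applied deliberately.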
Wed 20 Feb 2008
[Andrew Orlowski doesn't support feedback through the normal El Reg comments system, only by private email (I wonder why), so I'll reproduce my response to him here]
“So who’ll pay for Internet 3.0, then?”
All server hosting companies – and, therefore, the websites run on them – have to pay a network operator for their fat connection to the Internet. The BBC is no exception: though it may have its own data centre it will have to pay for its pipe to LINX or wherever.
How that upload fee gets distributed to the last-mile, end-user providers is the real question.
Fri 25 Jan 2008
According to The Register, Nokia is in talks to acquire a stake in Facebook with a view to “porting the social network on to Nokia handsets in a major way”. The key point of surprise is that while Nokia has close to a billion paying customers and Facebook has only 50 million (who hardly pay a bean), Nokia is likely to pay Facebook for the privilege!
So why is Facebook worth so much?
The answer, of course, is that it’s not – the large valuations on Facebook are complete nonsense.
Microsoft paid $240M for a 1.6% share of Facebook to keep Google out and nothing more!
But that’s quite a business model for Facebook – keep finding major players in other markets willing to sign up “exclusive” deals. 240 mil here, another 150 mil there would be enough to keep the Z boy in business cards for life.
So Nokia is either being very clever or very stupid, it’s a shame it’s not clear which.
But I wonder if Zuckerberg understands the irony of running a social website funded by companies who just want to exclude each other.
Mon 21 Aug 2006
This is a re-write of a post I’d originally produced for the internal blog where I work. I wanted to bring it out into the public, so to speak, as I may have a sequence of general thoughts that start from here.
The 80:infinity rule – and a plea for the future
One of the problems with the “everything should be open/readable unless specified otherwise” premise favoured by the more vocal in the blogosphere is that security is virtually impossible to strap on as an afterthought module. The security functions needed to implement Chinese walls, Sarbanes-Oxley and other contractual constraints – i.e. the “triple A” of Authentication, Authorisation and Auditing – often (always?) need to be in the core design of a tool or environment to be successful, even if they are usually turned off for collaboration.
Which brings me to the 80:infinity rule.
The joke goes: “the last 20% of a project takes 80% of the time, unfortunately so does the first 80%…”
But with modern RAD/Agile/nom-de-jour tools the first 80% can be done very quickly: within days, hours or even minutes (depending on how well the demonstration is rehearsed). But in my experience the last 20% is where the interesting stuff happens, and the more bling is devoted to the first 80% (to impress a gullible management) the more likely the last 20% will tend towards infinity.
With vendor products that means being locked into “rolling beta-release”, bleeding edge, and missed deadlines for promised functions.
Does that sound familiar? Is there at least one environment in your workplace evaluated only on its first 80%… And as support engineers and developers who’ve had a system dumped on them know, it’s the last 20% that causes the most pain.
In the enterprise where I work I’d guess the last 20% includes things like: AAA, proper LDAP / enterprise directory integration (no, not just Active Directory), speed/scalability, redundancy/resilience, reporting, ownership/traceability (relates to AAA), integration rather than synchronisation, usability etc.
Getting that last 20% correct, right from the beginning, can have a far greater impact on a project’s bottom-line budget than the first 80% ever can.
So, my plea for the future: if you’re in a position to make tool choices, ignore the first 80% as any fool vendor or contractor can implement that. For successful purchases and environments, evaluate for the last 20%… *
* as they say in South Park, “Won’t somebody pleeeese think of the children”
“Every moment in planning saves three or four in execution” – Crawford Greenwalt