CouchDB 1.0 Released

Apache CouchDB is a free/open source RESTful JSON document (NoSQL) database with map/reduce views and peer-based replication. Version 1.0 was released today. It is 300% faster than the previous version and includes Microsoft Windows support, an authentication system, and flexible replicator options. The New York Times, ReadWriteEnterprise, and InfoWorld covered the release, and Couchio has a clever release announcement as well. I’ve been reading up on CouchDB since Matthew Weier O’Phinney’s presentation on Document Databases at the last Burlington, VT PHP Users Group meeting, and I may have some projects where CouchDB will be a good fit.

Document Databases

At tomorrow’s Burlington, VT PHP Users Group meeting Matthew Weier O’Phinney will be giving a presentation on Document Databases. From the meeting description:

NoSQL has become a new buzzword in web development—but what is it, exactly? We’ll look at the big picture to identify what types of NoSQL solutions exist, what sorts of problems they solve, and go into some specifics on CouchDB and MongoDB usage so that you can see how you might use these new tools within your PHP development.

The meeting will be at Office Squared in downtown Burlington tomorrow evening (June 24th, 2010) from 6:00 PM to 8:00 PM. Registration is free.

Update (6/24): The meeting has been postponed until next Tuesday, June 29th, 2010.

Magento Roundtable

Next week’s Burlington, VT PHP Users Group meeting will be a Magento Roundtable discussion. From the meeting description:

Over the last year, Magento has increased in popularity as a viable open source eCommerce platform for the mid- to large-scale online retailer. This rapid growth has led to many questions on what has been fixed, what continues to be a problem, and how complex a system an eCommerce engine needs to be.

Have you used Magento? Are you considering using it for an upcoming project? Do you have a specific problem that Magento solves for you and would like to share your story? Join the roundtable and compare notes with others who have experience with Magento.

Meetings are open to the public via RSVP. You will have the chance to network and connect with fellow PHP developers.

To Participate:

  1. Register on Eventbrite.
  2. Sign up for the Burlington, VT PHP User Group list on Google Groups.
  3. Forward this link to anyone else you feel would be interested in this meeting’s topics.

TEK·X Day Three

Sadly, today was the final day of TEK·X, but plenty of information and networking was packed into the last few days. Marco Tabini and the rest of the team put on a top-notch conference. Since it was the last day, there were only three sessions. I started with Jason Austin’s Lean Mean PHP Machine session, where he talked about implementing software development best practices in a small team. Lorna Jane Mitchell then gave an excellent talk, without the aid of slides, called Open Source Your Career. Her talk provided a nice transition to the Community Roundtable session with Michelangelo van Dam, Lorna Jane Mitchell, Rafael Dohms, Ben Ramsey, and Keith Casey moderating. User groups were a big focus of the roundtable, and I hope more people were inspired to start their own local PHP user groups.

TEK·X Day Two

It’s hard to believe tomorrow is the last day of TEK·X. Where did the time go? Today started with Matthew Schmidt’s 10 Developer Trends in 2010. He talked about agile development, browser standards, AJAX, security vulnerabilities, RIAs, touch interfaces, key/value stores, version control, cloud computing, and dynamic languages. It wasn’t a bad keynote, but the topics seemed fairly basic and obvious given the audience.

Next up for me was Derick Rethans’ Xdebug talk. Xdebug is an extremely useful tool for PHP developers. I’ve used its stack trace feature as well as its code coverage analysis via PHPUnit. I’ve also dabbled with its profiling capabilities. The session introduced me to several other Xdebug features with which I’d like to experiment.

After that I had the pleasure of seeing Matthew Turland’s talk on New SPL Features in PHP 5.3. New SPL data structures in PHP 5.3 include stacks, queues, heaps, priority queues, and sets. Matthew provided test code that compared the performance and memory usage of each of these new data structures with that of PHP’s native arrays.
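The PHP 5.3 classes themselves (SplStack, SplQueue, SplPriorityQueue, and friends) aren’t shown here. As a rough analogy only, the sketch below shows the same access patterns with Python’s standard library: a list as a last-in, first-out stack, collections.deque as a first-in, first-out queue, and heapq as a priority queue. One difference worth noting: heapq is a min-heap, while SplPriorityQueue extracts the highest priority first, so the sketch negates priorities to get that same order.

```python
from collections import deque
import heapq

# Stack: last in, first out (the access pattern SplStack offers in PHP).
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()

# Queue: first in, first out (the access pattern SplQueue offers).
queue = deque()
queue.append("first")
queue.append("second")
front = queue.popleft()

# Priority queue: heapq pops the smallest item, so priorities are negated
# to extract the highest priority first, as SplPriorityQueue does.
pq = []
heapq.heappush(pq, (-1, "low"))
heapq.heappush(pq, (-10, "high"))
heapq.heappush(pq, (-5, "medium"))
order = [heapq.heappop(pq)[1] for _ in range(len(pq))]

print(top)    # b
print(front)  # first
print(order)  # ['high', 'medium', 'low']
```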

I skipped the first afternoon session to take part in the Hack Track which happened to coincide with Zend Framework’s May Bug Hunt Days. I was granted commit access and directly committed my first bug fix, a small change to make HTTP headers case-insensitive.
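The actual patch isn’t reproduced here. The underlying idea is that HTTP header field names are case-insensitive (RFC 2616, section 4.2), so "Content-Type" and "content-type" must refer to the same header. Here is a minimal sketch of that idea as a hypothetical Python class; it is not Zend Framework’s code or API.

```python
class CaseInsensitiveHeaders:
    """Hypothetical header container: field names are normalized to
    lowercase for storage and lookup, since HTTP treats them case-insensitively."""

    def __init__(self):
        # Lowercased name -> (name as originally given, value)
        self._fields = {}

    def set(self, name, value):
        self._fields[name.lower()] = (name, value)

    def get(self, name, default=None):
        entry = self._fields.get(name.lower())
        return default if entry is None else entry[1]

    def has(self, name):
        return name.lower() in self._fields


headers = CaseInsensitiveHeaders()
headers.set("Content-Type", "text/html")
print(headers.get("content-type"))  # text/html
print(headers.has("CONTENT-TYPE"))  # True
```

Keeping the original spelling alongside the value means the header can still be emitted as the caller wrote it, while lookups ignore case.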

Others stuck around to fix more bugs while I went to check out Bill Karwin’s Models for Hierarchical Data with SQL and PHP. Examples of hierarchical data include categories/subcategories, bills of materials, and threaded discussions. Bill talked about four main approaches to storing hierarchical data in SQL databases: adjacency lists, path enumeration, nested sets, and closure tables.

The adjacency list is a naive approach that almost everyone tries first: each entry stores a reference to its immediate parent. The problem with this approach is that querying deep trees can be very inefficient, requiring many joins (one per level of depth). Path enumeration involves storing the chain of ancestors in each entry (e.g. 1/4/7/). This can be very efficient and can take advantage of indexing. However, the database cannot enforce referential integrity on those stored paths. The nested set approach seemed a bit complicated. I don’t feel I can explain it properly here, so you’ll have to check out Bill’s slides if you’re interested in how it works. The closure table approach made the most sense to me and didn’t seem overly complicated. Besides storing each entry, you also store a row for every ancestor/descendant path in the tree: one from each node to each of its descendants (not only its immediate children), plus a reflexive row from each node to itself.

My final session of the day was Travis Swicegood’s Building Real-Time Applications with XMPP, the Extensible Messaging and Presence Protocol. If you’ve used Google Talk, then you’ve used XMPP. As a web developer, I have HTTP’s request and response pattern ingrained in my thinking. XMPP, however, is a very different creature: it keeps a socket open during what can be a lengthy exchange of messages. While I don’t think HTTP is going away anytime soon, real-time applications involving potentially large numbers of publishers and subscribers (e.g. Twitter) are becoming more prevalent, and XMPP is well suited for this environment.

TEK·X Day One

Day one of TEK·X here in Chicago got off to a great start with Josh Holmes’ Lost Art of Simplicity keynote. I agreed with pretty much everything that Josh had to say. As software developers, we’re often all too eager to start building a complex solution to what may be a simple problem.

Following are some short notes on the other sessions I attended today:

  • I’m always amazed at the Apache web server’s capabilities and Rich Bowen didn’t disappoint with his Apache Cookbook talk.
  • Joël Perras’ talk on Graphs, Edges & Nodes was a useful introduction to an important concept in today’s world of social networking and linked data.
  • David Strauss gave the audience a ton of helpful information about creating a scalable LAMP infrastructure.
  • Scott MacVicar talked about some upcoming features in the PHP language itself.
  • Bill Karwin gave a very clear presentation on SQL Injection Myths and Fallacies. This is a topic any web developer must have a good handle on since SQL injection is one of the most common security vulnerabilities.
  • Ben Ramsey talked about creating Desktop Apps with PHP and Titanium. I found this to be a very intriguing, albeit bizarre, technology.
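The SQL injection item deserves a concrete illustration. The standard defense is to pass user input as a bound query parameter instead of splicing it into the SQL string. Here’s a minimal sketch of the difference using Python’s built-in sqlite3 module; the users table and the input string are made up for the example, and the code isn’t taken from Bill’s slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, is_admin INTEGER NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Attacker-controlled input that closes the string literal and adds a condition.
user_input = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so it is parsed as SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the input is bound as a value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe_rows))  # 2 (the injected OR matched every user)
print(safe_rows)         # [] (no user is literally named "nobody' OR '1'='1")
```

Parameters cover values only; table and column names can’t be bound this way, so those need to be checked against a whitelist instead.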

The evening wrapped up with a social event and open bar sponsored by Zend and Echolibre. This was a great opportunity to catch up with others from the PHP community. I’m looking forward to another full day of sessions tomorrow.

TEK·X Arrival and Day Zero

Yesterday I arrived in Chicago for my first TEK·X PHP conference. After getting in, I had an interesting conversation with Bill Karwin over dinner. Bill is the author of SQL Antipatterns and is presenting on SQL Injection Myths and Fallacies as well as Models for Hierarchical Data with SQL and PHP here at TEK·X. I was able to pick his brain on a wide range of topics including his thoughts on NoSQL and Object-Relational Mappers (ORMs). I tend to be skeptical of both (although there are certainly uses for both) and I got the sense from our conversation that my skepticism is well founded.

Today was tutorial day, or day zero. This morning I attended Arne Blankerts’ Bad Guy For a Day – A Websecurity hands-on tutorial. I liked that he took a step back and looked at the different types of security such as the transport layer, infrastructure layer, data warehouse, user interface design, user level security, and application level security. He talked about the usual suspects such as cross-site scripting (XSS), session fixation, cross-site request forgery (XSRF), and SQL injection. Filtering input and escaping output were also addressed, of course. He demonstrated attacks on several security holes in an (intentionally) badly written application. Take a look at the slides for more details.
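As a quick reminder of what "filter input, escape output" means in practice, here is a minimal Python sketch. The quantity field and both function names are hypothetical, and the code is not from Arne’s tutorial. Input is checked against exactly what is expected, and output is escaped for the HTML context it is written into, using the standard library’s html.escape.

```python
import re
from html import escape


def filter_quantity(raw):
    """Filter input: accept only a 1 to 3 digit number, reject everything else."""
    if re.fullmatch(r"[0-9]{1,3}", raw) is None:
        raise ValueError("quantity must be a number from 0 to 999")
    return int(raw)


def render_comment(comment):
    """Escape output for HTML so markup in the comment is displayed, not executed."""
    return "<p>" + escape(comment, quote=True) + "</p>"


print(filter_quantity("42"))  # 42
print(render_comment('<script>alert("xss")</script>'))
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

Escaping is context-specific: html.escape is right for HTML text and quoted attributes, but values placed in JavaScript, URLs, or SQL each need their own encoding.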

This afternoon I attended Ed Finkler’s JavaScript for PHP Developers talk. Like most PHP developers, I often find myself working with other web technologies such as JavaScript. The object model in JavaScript is very different from PHP’s, and I found Ed’s explanations of these differences very helpful. The JavaScript core language is often confused with the Document Object Model (DOM) API, so his clear explanation of the line between the two was also useful. His slides don’t appear to be posted yet, but I’ll link to them once they are.

One last note: if you happen to be here at TEK·X then be sure to rate the sessions on joind.in. The speakers really appreciate the feedback!

Searching A Field With Digits In Zend Framework’s Lucene Component

Recently I ran into a bug in one of our applications using Zend_Search_Lucene where the same document was showing up multiple times in search results. In fact, many different documents were showing up more than once. I tracked it down to the routine that updated indexed documents. Zend_Search_Lucene can’t update an indexed document in place, but you can delete it and then insert a new, updated document. To delete a document, you first search for it by a previously indexed field and then, once it’s found, delete it using its internal document identifier. The problem seemed to be that documents were not being found and deleted when updated, so duplicates of the same document were accumulating on each update.
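Here’s a toy Python model of that update cycle (find by field, delete by internal id, re-add) and of how it fails. Everything in it is hypothetical: ToyIndex is not Zend_Search_Lucene’s API, and the MissesDigitKeys subclass only simulates the symptom I ran into (lookups silently failing for keys that start with a digit), not its cause.

```python
class ToyIndex:
    """Hypothetical in-memory index with no in-place update."""

    def __init__(self):
        self._docs = {}   # internal document id -> stored fields
        self._next_id = 0

    def add(self, fields):
        self._docs[self._next_id] = dict(fields)
        self._next_id += 1

    def find(self, field, value):
        # Internal ids of documents whose field matches exactly.
        return [doc_id for doc_id, doc in self._docs.items() if doc.get(field) == value]

    def delete(self, doc_id):
        del self._docs[doc_id]

    def count(self):
        return len(self._docs)


class MissesDigitKeys(ToyIndex):
    """Simulates the symptom only: lookups fail for values starting with a digit."""

    def find(self, field, value):
        return [] if value[:1].isdigit() else super().find(field, value)


def update(index, key_field, key, fields):
    # Delete every indexed copy found by key, then insert the new version.
    for doc_id in index.find(key_field, key):
        index.delete(doc_id)
    index.add(fields)


key = "3" + "f" * 39  # made-up 40-character hex key that starts with a digit
good, bad = ToyIndex(), MissesDigitKeys()
for revision in range(3):
    doc = {"hash": key, "title": f"revision {revision}"}
    update(good, "hash", key, doc)
    update(bad, "hash", key, doc)

print(good.count())  # 1
print(bad.count())   # 3: each update added a copy because find() missed the old one
```

Because update() silently treats "not found" as "nothing to delete", a lookup that misses never raises an error; the only visible symptom is duplicates piling up.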

The field I was indexing, and subsequently using to find and delete documents, was a 40-character SHA-1 hash. While tracking down the bug, I discovered that only documents whose SHA-1 hash began with a digit were getting duplicated (in other words, were not being found when I tried to delete them), while documents whose hash began with a letter were not getting duplicated (in other words, were being found and deleted).

A Stack Overflow post on searching numbers with Zend_Search_Lucene had the information I needed to fix the bug. First, I changed the hash field from a text field to a keyword field which prevented it from being tokenized (this, of course, required me to delete the existing index and re-index every document). Second, when searching on the hash field I replaced the default Text analyzer with the TextNum analyzer. These two changes seemed to do the trick, as I haven’t seen any duplicate search results after having run several index updates.
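The following simplified Python model shows why tokenization matters for a value like a SHA-1 hash. My understanding from the Zend Framework documentation is that the default Text analyzer only treats alphabetic characters as part of a token, while TextNum also keeps digits. The regular expressions below are an approximation of that behavior, not Zend’s code, and they don’t try to explain exactly why only digit-leading hashes were missed. A keyword field sidesteps tokenization entirely and indexes the whole hash as one term.

```python
import re


def letters_only(text):
    # Approximation of a letters-only analyzer: runs of letters become tokens;
    # digits and everything else act as separators.
    return re.findall(r"[a-z]+", text.lower())


def letters_and_digits(text):
    # Approximation of a letters-and-digits analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword(text):
    # A keyword field is not tokenized: the whole value is a single term.
    return [text]


sha1 = "4ab0c9d8e7f61234567890abcdef1234567890ab"  # made-up 40-character hex value

print(letters_only(sha1))        # ['ab', 'c', 'd', 'e', 'f', 'abcdef', 'ab']
print(letters_and_digits(sha1))  # ['4ab0c9d8e7f61234567890abcdef1234567890ab']
print(keyword(sha1) == [sha1])   # True
```

Under the letters-only model the hash is shredded into fragments that many other hashes would share, which is why an exact lookup on it is unreliable.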

Layouts in Zend Framework

In my previous post in this series we looked at Zend Framework’s routing and Model-View-Controller (MVC) components. In this post we’ll take a slight detour and explore Zend_Layout, a component that lets you have a consistent template for the layout of pages throughout your website.

In the past you may have implemented consistent layouts in PHP using “includes” but, as you’ll see, layouts are much easier to manage than “includes”. The primary improvement over “includes” is that you can change the entire layout of your website without editing every single PHP script. This is because the layout decides where the content gets placed, instead of each PHP script including the needed partials (e.g. header and footer). This is accomplished through an implementation of the Two Step View pattern.

If you’re using Zend_Tool then you can enable layouts using the command zf enable layout. This will create a layout view script, application/layouts/scripts/layout.phtml, and add the following line to the production section of your application/configs/application.ini file:

resources.layout.layoutPath = APPLICATION_PATH "/layouts/scripts/"

Zend_Layout has a front controller plugin that has two main jobs. First, it takes care of rendering the layout for us. Second, it retrieves what are called “named segments” from Zend Framework’s response object and assigns them as variables in your layout. A segment named “default” is assigned to the variable named content. Typically (unless you do something to override this) the “default” segment will contain the output of your individual controller actions. This means that you can render the content of your controller actions wherever you’d like in your layout view script by outputting the value of $this->layout()->content.
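To illustrate the Two Step View idea and the named-segment convention, here’s a minimal Python sketch. It is not Zend_Layout’s implementation; the function names, the sidebar segment, and the header/footer markup are hypothetical. Step one renders an action’s content on its own; step two hands the named segments to the layout, exposing the "default" segment as content, and the layout alone decides where everything goes.

```python
HEADER = "<header>Postr</header>"  # hypothetical stand-in for header.phtml
FOOTER = "<footer>&copy; Postr</footer>"  # hypothetical stand-in for footer.phtml


def render_action(title):
    # Step 1: a controller action renders only its own content.
    return f"<h1>{title}</h1>"


def layout_variables(segments):
    # The "default" segment becomes `content`; other named segments keep their names.
    return {("content" if name == "default" else name): body
            for name, body in segments.items()}


def render_layout(layout):
    # Step 2: the layout places each piece; actions never include header/footer.
    return ("<html><body>" + HEADER + layout.get("content", "")
            + layout.get("sidebar", "") + FOOTER + "</body></html>")


segments = {"default": render_action("Entries"), "sidebar": "<aside>Tags</aside>"}
page = render_layout(layout_variables(segments))
print(page)
```

Moving the sidebar or swapping the footer means editing render_layout only; none of the action renderers change, which is the advantage over per-script includes.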

In the demo blogging application, Postr, I’ve also created a header layout view script, application/layouts/scripts/header.phtml, and a footer layout view script, application/layouts/scripts/footer.phtml. The header and footer layout view scripts are rendered by outputting the results of $this->render('header.phtml') and $this->render('footer.phtml'), respectively.

In my next post in this series I plan on taking a look at the Entry controller (application/controllers/EntryController.php) in the demo blogging application, Postr. We’ll explore its actions and corresponding view scripts. Zend_Form, Zend_Paginator, and modeling domain objects will be touched on briefly, but explored in more depth in later posts.

Job Opening: Functional Analyst & Quality Assurance Specialist

Found Line is hiring! We’re looking for a Functional Analyst & Quality Assurance Specialist to help us create useful web applications. Here is a description of the job:

As a Functional Analyst & Quality Assurance Specialist at Found Line, you will have three primary responsibilities. First, you will communicate with outside clients and subject matter experts to develop the functional specification for each iteration of various web applications. Second, you will translate functional specifications into tickets that a software developer will complete. Third, you will perform acceptance testing on each iteration of these web applications to assure that the functional specification you originally outlined has been met. Future responsibilities may include implementing and coordinating usability testing plans. Some client support and training may be necessary as well but this will not be a primary focus of your job.

You will be participating in a process based on both agile and more traditional software development methods. While you will write and test against functional specifications, these specifications will be for narrowly defined scopes of work (i.e. iterations). This is a fast-paced environment with each iteration usually being only a few days in length. We are small but very busy, and need someone who is a self-starter, an excellent communicator, and can work independently. Experience with DocBook, XMLmind XML Editor (XXE), Subversion (SVN), and Trac is a plus, but not required. This is an on-site position, full-time or part-time. No contractors or recruiters, please.

Send your resume and cover letter to jobs@foundline.com (no phone calls).