Want to drive engagement and retain users for your web application? Data is often the answer. Joining large web data sets with your own proprietary customer data can yield amazing visualizations and insights your customers can't get anywhere else. I'm happy to publish my first Node.js framework dealing with this specific task: Funneler.
MongoDB is an excellent data store for capturing and querying user activity feeds. Its schema-less design allows for grouping similar, but different, content -- often exactly what feeds are made of. I recently created an activity feed system on a large, high-traffic site and wanted to share some of the challenges and solutions I found.
Contain is a Zend Framework 2 library that allows passing structured data objects with strict types and internal validation throughout your application and persistence layer. It integrates nicely with Zend Framework 2 components like Zend\InputFilter and with its events system.
Working with PHP, you'll often find yourself passing around arrays to communicate data. Contain changes this methodology by passing structured data as self-validating entities.
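To illustrate the idea (this is a hypothetical sketch, not Contain's actual API), a self-validating entity might look like this: invalid data is rejected at the moment it is set, so bad values never travel through the application.

```php
<?php
// Hypothetical self-validating entity (illustrative names, not Contain's API).
// Each setter enforces its type and validation rules on write.
class UserEntity
{
    private string $email;
    private int $age;

    public function setEmail(string $email): self
    {
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new InvalidArgumentException("Invalid email: $email");
        }
        $this->email = $email;
        return $this;
    }

    public function setAge(int $age): self
    {
        if ($age < 0 || $age > 150) {
            throw new InvalidArgumentException('Age out of range');
        }
        $this->age = $age;
        return $this;
    }

    public function toArray(): array
    {
        return ['email' => $this->email, 'age' => $this->age];
    }
}
```

Instead of handing an untyped array to the persistence layer, you hand it an entity that is valid by construction.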
The only problem with tools like Sass or Less is that they're dependent on Ruby or Node.js. Although I enjoy them both, most of my projects are in PHP. Most PHP developers won't have Ruby handy, so this is a quick walk-through on getting things set up in projects written in another language.
I work mostly in PHP and I do nearly all of my development out of vim and the command line. These are a few commands I often use that you might not know about, as well as some handy settings to make the workflow even easier:
- De-dupe your Command History
- History Auto-Complete
- Last Argument Substitution
- Use Screen
- Watching Log Files for Changes
- View all Open Files by a Program
- View Trace Logs for System Calls
- Identify Bottlenecks
- Awk and Vim
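A few of the tips above in practice, as a shell/config fragment (paths and PIDs are placeholders):

```shell
# De-dupe your command history and skip space-prefixed commands (~/.bashrc):
export HISTCONTROL=ignoreboth:erasedups

# History auto-complete: press Ctrl-R to search backwards through history.

# Last argument substitution: !$ expands to the previous command's last argument.
mkdir -p /var/www/myapp
cd !$

# Watch a log file for changes as they happen:
tail -f /var/log/apache2/error.log

# View all files a running program has open (PID 1234 is a placeholder):
lsof -p 1234

# View a trace log of the system calls a command makes:
strace -f -o trace.log php script.php
```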
Understanding the tools and applying them to development tasks will save you considerable time and hassle, and it will allow you to approach problems with a whole new mindset.
One of the issues with working in the software-as-a-service industry is that credit card numbers often have to be stored locally in a database. Keeping them on file with your payment gateway alone has a few limitations. The business folks may request local storage for various reasons, such as:
- Recurring subscriptions (often with variable amounts).
- Customers wanting to keep their card on file for future purchases.
- Transaction-based software providers (such as cloud providers).
- Customer service wanting to verify the card on file with a customer.
- Managers wanting to authorize certain charges for employees.
Information as confidential as credit card data should be encrypted, even when it's behind multiple passwords and buried in a database. I'd prefer not to rewrite this logic with every implementation, so I decided to write a class called CreditCardFreezer.
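To give a sense of the underlying technique (a minimal sketch, not CreditCardFreezer's actual implementation), symmetric encryption with PHP's OpenSSL functions might look like this. The key must live outside the database, e.g. in a config file readable only by the application.

```php
<?php
// Encrypt a card number for storage at rest. A random IV is generated per
// value and prepended to the ciphertext so decryption can recover it;
// base64 keeps the result safe to store in a text column.
function encryptCard(string $number, string $key): string
{
    $iv     = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));
    $cipher = openssl_encrypt($number, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return base64_encode($iv . $cipher);
}

// Reverse the process: split off the IV, then decrypt the remainder.
function decryptCard(string $stored, string $key): string
{
    $raw   = base64_decode($stored);
    $ivLen = openssl_cipher_iv_length('aes-256-cbc');
    $iv    = substr($raw, 0, $ivLen);
    return openssl_decrypt(substr($raw, $ivLen), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}
```

A fresh IV per value means two identical card numbers produce different ciphertexts, so the stored column leaks nothing about duplicates.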
I am releasing PHPStateMapper 1.0.5, which includes a new class called PHPStateMapper_Election for reporting election results. I've added customizable maps, vector images to fix scaling issues and easier ways to load data.
PHPStateMapper is an open source PHP library for drawing a map with areas shaded by varying degrees of intensity based on data given as a simple list (e.g.: MN: 5, WI: 12, MI: 23). It exports a PNG image in a configurable size and color. Features include:
- Easy to deploy - Only an include and a few lines of code.
- Custom size - Anywhere between 100 and 2,000 pixels wide for web or printing.
- Custom color - Just give it a hex color when you instantiate the object.
- Extensible - Just about any map can be added.
- Clean code - Object-oriented, well-documented PHP 5+ source code.
- Easy to seed - Use the CSV importer object or just pass the data into the object.
I recently started a project where I needed to optimize email storage and retrieval in a very large database. The original system stored email addresses completely de-normalized, as large strings repeated across multiple tables. The idea was to move these into a lookup table and store each original address as efficiently as possible - with minimal duplication.
There are multiple ways to store email addresses, and which is most effective really depends on how they're used (mass mailing, logins, etc.). I generally prefer a shared lookup table: a single table stores each unique address with an auto-increment id, which is then referenced by every table that needs it. If multiple rows across many tables share the same address, it is stored just once in the lookup table and its id is referenced multiple times. This cuts down on storage, lookup times and index sizes, and it's also great for tracking things like opt-out status or bounces (only one row needs to be updated).
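A minimal sketch of the lookup-table approach (hypothetical schema and helper, shown with SQLite for brevity; the original project used MySQL):

```php
<?php
// Hypothetical schema: one row per unique address, referenced by id elsewhere.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE email_lookup (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    address   TEXT NOT NULL UNIQUE,
    opted_out INTEGER NOT NULL DEFAULT 0
)');
$db->exec('CREATE TABLE orders (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    email_id INTEGER NOT NULL REFERENCES email_lookup(id)
)');

// Return the id for an address, creating the lookup row only the first time.
function emailId(PDO $db, string $address): int
{
    $address = strtolower(trim($address));
    $stmt = $db->prepare('SELECT id FROM email_lookup WHERE address = ?');
    $stmt->execute([$address]);
    $id = $stmt->fetchColumn();
    if ($id !== false) {
        return (int) $id;
    }
    $db->prepare('INSERT INTO email_lookup (address) VALUES (?)')->execute([$address]);
    return (int) $db->lastInsertId();
}

// Two orders from the same customer share one lookup row.
$a = emailId($db, 'jane@example.com');
$b = emailId($db, 'Jane@Example.com ');
$db->prepare('INSERT INTO orders (email_id) VALUES (?)')->execute([$a]);
$db->prepare('INSERT INTO orders (email_id) VALUES (?)')->execute([$b]);
```

Normalizing case and whitespace before the lookup is what keeps near-duplicate spellings from creating extra rows.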
Many frameworks do this for you; but, if you’re writing a small application or updating an older one, you might need to take a more manual approach to dealing with localization and date/time values. I'll go over the common settings, how to store them, and lists of possible values.
Everyone in the world can read a date as either 12/31/2010 or 31/12/2010 (alternate separators and two-digit years can be substituted). You don't need to allow customizing the display beyond these two formats - it's just cosmetic after that. If you hard-coded column sizes on grids and forms, you can store this setting as a boolean called date_dmy or similar. Or, you can take the WordPress/phpBB approach and store it as a string of PHP's date() format characters.
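As a sketch, with date_dmy as the hypothetical per-user boolean setting:

```php
<?php
// Render the same date according to a stored per-user preference.
// date_dmy === true means day-first (31/12/2010), false means month-first.
function formatDate(DateTime $dt, bool $date_dmy): string
{
    return $dt->format($date_dmy ? 'd/m/Y' : 'm/d/Y');
}

$dt = new DateTime('2010-12-31', new DateTimeZone('UTC'));
echo formatDate($dt, true);  // 31/12/2010
echo formatDate($dt, false); // 12/31/2010
```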
I recently got a chance to do some professional development with the latest WordPress 3.0 Thelonious release and it blew me away. For a while now, I've been doing some work on the side in web design/development (I love to code, but I'm a designer at heart) and I use WordPress almost exclusively. I have to say, I'm very excited about the new features, menus in general, and their apparent shift toward content management rather than a sole focus on blogging.
This is just a quick one for an issue I ran into today when I was tuning some MySQL indexes for better performance. Some of our older PHP code was performing a SELECT using a BETWEEN to specify a date range. After adding an index to speed up this range query, I noticed that it wasn't taking and was still performing a table scan/file sort.
The code was using PHP's date function with the c format string, which produces an ISO 8601 date/time string like 2010-05-23T16:45:08-05:00. When inserting or comparing dates, MySQL will automatically convert this to its internal date/time format, but it will throw a warning.
Simply changing the PHP date function's format string to Y-m-d H:i:s (which is a native MySQL date/time format) did the trick. The EXPLAIN output remained the same, but the number of rows examined was drastically reduced from hundreds of thousands to a few hundred (the result of the range), which took a 20-second query down to about a tenth of a second.
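The difference between the two format strings, for illustration:

```php
<?php
// 'c' (ISO 8601) includes a "T" separator and a timezone offset, which MySQL
// must implicitly convert (with a warning); 'Y-m-d H:i:s' matches MySQL's
// native DATETIME format, so the value can be compared as-is.
$dt = new DateTime('2010-05-23 16:45:08', new DateTimeZone('-05:00'));
echo $dt->format('c');           // 2010-05-23T16:45:08-05:00
echo $dt->format('Y-m-d H:i:s'); // 2010-05-23 16:45:08
```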
Over the past several years, I've encountered a lot of growing pains while managing a SaaS infrastructure for my company. One of our big successes in transitioning to the Lighttpd web server was almost reverted because our hardware load balancer wasn't able to health-check our front-end web servers.
Working on the CATS project has taken me from developing primarily English software into a whole new realm of excitement - internationalization and localization (i18n / L10n). Suddenly I've got people from 120 countries (and not a handful: hundreds of paying customers!) wanting to see full support for their native tongues.
I could probably talk for hours about the enormous effort it took to bring CATS to the level of i18n support it has today; but, instead, I'm going to talk about the top 10 headaches I ran into.
Before I get started: if you're looking at adopting i18n / L10n, either pre-development or on an existing project, UTF-8 is the way to go. There are alternatives, but unless you have a very heavy non-Latin user base, stop looking. UTF-8 is backwards compatible with ASCII, it supports just about everything (and is supported by just about everything), and it's the best thing since sliced bread.
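A quick illustration of why byte-oriented string functions aren't enough once you adopt UTF-8 (assumes the mbstring extension is available):

```php
<?php
// UTF-8 is multi-byte: strlen() counts bytes, mb_strlen() counts characters.
// "café" is 4 characters but 5 bytes, because "é" encodes as two bytes.
$word = "café";
echo strlen($word);             // 5 (bytes)
echo mb_strlen($word, 'UTF-8'); // 4 (characters)
```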
If you're working on LAMP, you're most likely already taking advantage of journaling in your file system and your database. With journaling, any operation is first logged to disk and then replayed from that (typically circular) log to be performed.
On a modern file system such as ReiserFS or ext3, journaling means that a sudden loss of power between the removal of a directory entry and the marking of its inode as free in the free-space map won't lead to an orphaned inode (a storage leak).
In a database, journaling is often discussed in terms of ACID (atomicity, consistency, isolation, durability), or the bundling of a series of operations into a transaction. In banking, for example, you wouldn't want a process to fail after deducting money from your account but before depositing it as a payment into another account.
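The bank-transfer scenario sketched as a database transaction (SQLite here for a self-contained example; production code would use prepared statements for any user-supplied values):

```php
<?php
// Either both balance updates happen or neither does: a failure between the
// two UPDATEs rolls everything back, so money is never half-moved.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)');
$db->exec('INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 0)');

function transfer(PDO $db, int $from, int $to, int $amount): void
{
    $db->beginTransaction();
    try {
        $db->exec("UPDATE accounts SET balance = balance - $amount WHERE id = $from");
        $db->exec("UPDATE accounts SET balance = balance + $amount WHERE id = $to");
        $db->commit();
    } catch (Exception $e) {
        $db->rollBack(); // undo the partial work, then let the caller know
        throw $e;
    }
}

transfer($db, 1, 2, 40);
```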
So far, we're pretty comfortable implementing journaling everywhere we're storing data; but what about in the application layer? As the project triangle goes, we toss out simplicity (bye bye, Ruby on Rails) for better reliability and performance. I propose numerous situations where journaling can be beneficial:
A staple of most dynamic web applications is the data-grid, which is a tool used to display a handful of rows from a data source and provide the end user with the ability to sort and paginate through a data set. It's a vital component of the CRUD (create, read, update and delete) set of operations most web applications provide.
When it comes to data-grids and advanced data sources, it seems like almost everyone rolls their own. While frameworks like Rails provide the scaffolding out of the box for simple ORM designs, they hardly hold up for the tougher jobs with multi-table joins or other complications.
As an example, CATS, a project I've led development on since 2007, uses a data-grid library of approximately 40,000 lines of object-oriented PHP source code that I've written, re-written and optimized over time. The database for this project is several hundred gigabytes with billions of rows, so as far as optimizations and speed bumps go, I've seen quite a few.