Michael Angstadt's Homepage

Blog

How this page works

More blog entries >>

04/18/2019 6:33 pm
At my workplace, I administer a Windows Server 2008 R2 web server that makes daily backups to an external hard drive using Windows Server Backup. The backups have been consistently failing recently, and I've found out why.

Even though the backups were not finishing successfully, Windows Server Backup labeled the backups as “Completed with warnings” because it was able to successfully back up some of the data before failing. The backups would always fail at the same point: after backing up approximately 12 GB of data. The failures were intermittent at first, but became more consistent over time to the point where they were happening every night.

Error message

The error message Windows Server Backup was giving me was very vague:
"The operation failed due to a device error encountered with either the source or destination. Detailed error: The request could not be performed because of an I/O device error."

 The error message in the event logs was equally vague:
“The backup operation that started at ...”
(Contrary to the message’s suggestion, I could not find anything useful in the event details.)

Bad sector?

Many websites suggested that this error could be caused by a bad sector on the source or destination drive. I ran chkdsk against the 1 TB external hard drive that the backups were being stored on. The command took about 5 hours to run. It was alarming to see that the command consumed 4 GB of memory while running, but according to numerous websites, this is normal. No errors were found.

Unfortunately, I couldn’t run chkdsk on the server’s hard drive because it would require taking the machine out of service. Chkdsk cannot be run on the system drive while the server is operational; the machine must reboot and run chkdsk before booting into Windows. Since the server hosts our website and another library’s website, I couldn’t take it offline without advance notice. So, that wasn’t an option.

USB port?

I tried plugging the external drive into a different USB port, thinking maybe the USB port was bad. A long shot, but you never know. No effect.

Large file?

One website suggested it could be failing because a file was too large. I knew of one particularly large file on the system: “C:\MySQL Data files\ibdata1”. This file is where MySQL stores its databases. The file is over 1 GB, so I tried excluding that file from the backup.

(In order to exclude specific files from the backup, I had to disable the “Bare metal recovery” option from the backup settings.)

Reparse points

Before I could complete the configuration wizard, I got an error message:
“One of the paths specified for backup is under a reparse point. Back up of files under a reparse point is not supported. Specify a file path that contains the destination of the reparse point, and then retry the operation.”
What is a reparse point? A reparse point is kind of like a shortcut. For example, there is a reparse point for the old “C:\Documents and Settings” folder that just redirects to the “C:\Users” folder.

To see all the system’s reparse points, type the following command at the root of the drive:

dir /s /al

I found that the server had a LOT of reparse points. Most of them were in the user folders and were for the old Windows XP common folder names (for example, a reparse point named “My Pictures” existed, which redirected to the “Pictures” folder).

Due to the number of reparse points on the system, excluding all of them individually wasn’t a practical option. I decided I would just exclude the “Documents and Settings” reparse point to see if that would be enough.

This time, I could complete the configuration wizard and run the backup. The backup even completed successfully (though it took 8 hours to complete, much longer than the 1 hour it used to take).

The solution

The backup was successful, but it failed to back up one file: an old IIS log file. The error message was:
“Error in backup of during read: Error [0x8007045d] The request could not be performed because of an I/O device error.”
Another vague “I/O device error” message. I tried copying the file to the desktop to see if the file could be read. The copy operation failed with a similar error.

This made me think that there was something corrupted about the file, which was causing the backup to fail. Maybe a bad sector on the hard drive. I deleted the file, changed the backup settings back to “Bare metal recovery” and ran the backup again. The backup succeeded!




08/12/2018 6:32 pm
The public access computers at the library where I work use software called Deep Freeze, which prevents any changes to the computer from being persisted between boots. Rebooting the computer reverts everything back to the way it was at the last reboot. The software is essential for a public-access environment, as it prevents users from doing any long-term damage to the system and also helps with privacy.

Deep Freeze does its job wonderfully, but I recently started noticing some issues with installing Windows Updates when we switched to Windows 10. The reason I think these problems are caused by Deep Freeze is that our staff computers, which are nearly identical to our public ones, do not have Deep Freeze installed on them, and they have not experienced these problems.

Problem 1: "Undoing changes"

During the post-reboot phase of the update process, when the updates are actually installed, Windows reports that the updates could not be installed and that it's "undoing changes".



The solution that I discovered was to run the following commands BEFORE checking for updates. These commands must be run from an admin-level command prompt.

sfc /scannow
dism /Online /Cleanup-Image /RestoreHealth

If you've ever had to troubleshoot a Windows problem, odds are you have seen these two commands before, as they are floating all over the Internet in tech help forums. I like to think of them as general-purpose troubleshooting commands that are good to run if you're having any problem with the Windows operating system itself. There's also never any harm in running them.

The first command checks Windows' operating system files for corruption. In my case, it always reports that it found corrupted Windows files and that it fixed them. The second command, in my case, doesn't report that it found any problems, so it may not be necessary for this particular problem.

Since this problem has reoccurred so many times for me, I have now made it a part of my routine to run these commands before checking for updates.

Problem 2: Booting to the blue "Automatic Repair" screen


This problem only happens after installing Windows 10 feature updates (as opposed to "quality" updates, which are smaller and more frequent). When the computer is turned on, it sometimes (but not always) boots to a blue screen titled "Automatic Repair" (pictured below).


This screen will either (a) report that it repaired some problems and prompt you to reboot your computer, or (b) report that it couldn't repair the problems and prompt you to shut down your computer. In the latter case, clicking "Advanced options", then "Continue" will boot the computer normally. The screen appears roughly half the time the computer is turned on.

The solution to this problem is first to uninstall Deep Freeze. Then, run the two commands above. Finally, reinstall Deep Freeze.

To prevent this issue from happening in the first place, uninstall Deep Freeze before installing the Windows update.
07/03/2017 11:25 am
Dial-up

Telephone lines can be used to transmit computer data. This was how people connected to the internet in the early days. However, telephone wires are designed to transmit analog information, and computers can only consume digital information. A dial-up modem converts analog signals to digital and vice versa.

To connect to the internet with a dial-up modem, you enter a phone number for your modem to call, which is provided to you by your ISP. You also have to provide a username and password. The connection process is noisy and takes several seconds to complete. ISPs would often charge by the minute, so you never wanted to leave your connection open when you weren't using it (you also couldn't make phone calls while connected). Dial-up connections use a protocol called Point-to-Point Protocol (PPP), which is specifically designed for transmitting data over dial-up.

The speed of data across a telephone line is measured in baud: the number of signal changes (symbols) per second. The maximum a telephone line can achieve is 2,400 baud. As modems improved over time, they could pack more and more bits into each baud. For example, a 33.6 Kbps modem can pack 14 bits into each baud (2,400 × 14 = 33,600). The highest speed that can be achieved through dial-up is 56 Kbps. Dial-up connections also have fairly high latency compared to other internet connection approaches.
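The baud-to-bits arithmetic above can be sketched in a few lines of Python (the figures are the ones from this post; the function name is my own):

```python
SYMBOL_RATE = 2400  # max baud (symbols per second) on an analog phone line

def throughput_bps(bits_per_symbol):
    # Overall speed = symbol rate x bits packed into each symbol
    return SYMBOL_RATE * bits_per_symbol

assert throughput_bps(14) == 33600  # the "33.6K" modem from the example above
```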

To try to break the 56K barrier, some ISPs experimented with server-side compression. This involved compressing certain kinds of data before sending it over the wire to the client, resulting in higher download speeds. This approach was hugely successful for certain kinds of data that can be easily compressed, such as HTML pages and plain text. But many data formats are already compressed, such as ZIP files and streaming video, so no speed improvements could be gained from them. Image file formats like JPEG and PNG already use compression, but ISPs would compress them even more, resulting in faster speeds but, as a consequence, a loss of image quality.
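You can see this effect with any general-purpose compressor. Here's a small Python sketch (zlib is just a stand-in for whatever compression the ISPs actually used; random bytes stand in for already-compressed data):

```python
import os
import zlib

# Repetitive text like HTML compresses dramatically...
html = b"<p>Hello, world!</p>" * 500
assert len(zlib.compress(html)) < len(html) // 10

# ...but data that is already dense (ZIP, JPEG, video) barely shrinks at all.
already_dense = os.urandom(10_000)
assert len(zlib.compress(already_dense)) > len(already_dense) * 0.99
```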

ISDN

As dial-up modems began approaching the 56K limit, telephone companies began converting all their analog telephone lines to digital. The process of sending digital signals across digital telephone lines is called ISDN, and it allows speeds of up to 64 Kbps (wow!).

An ISDN line contains two types of channels. Bearer (B) channels are used for voice and digital signals and run at 64 Kbps. Delta (D) channels are used for setup and configuration data and run at 16 Kbps. A common setup would be to install two B channels and one D channel, giving you speeds of up to 128 Kbps. This setup was referred to as basic rate interface (BRI). A more powerful, but less common, setup involved twenty-three B channels (providing 1.544 Mbps) and one 64 Kbps D channel. This was called primary rate interface (PRI) or a T1 line. The main downside to ISDN connections was that you had to be within 18,000 feet of the central ISP building for it to work.
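The channel math works out as follows. (A sketch; the 8 Kbps of framing overhead is my addition, and it is what bridges the gap between the raw channel total and the 1.544 Mbps T1 rate.)

```python
B = 64      # Kbps per Bearer channel
D_BRI = 16  # Kbps, the BRI Delta channel
D_PRI = 64  # Kbps, the PRI Delta channel

# Basic Rate Interface: two B channels carry the data
bri_data = 2 * B
assert bri_data == 128  # Kbps

# Primary Rate Interface / T1: the 1.544 Mbps figure is the whole line:
# 23 B channels + the 64 Kbps D channel + 8 Kbps of T1 framing overhead
pri_total = 23 * B + D_PRI + 8
assert pri_total == 1544  # Kbps, i.e. 1.544 Mbps
```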

DSL

Digital subscriber line connections use your telephone line like dial-up, but the connection is always-on and is much faster. They also allow you to make phone calls while the connection is active. Speeds can vary anywhere from 3 Mbps to hundreds of Mbps. The most common type of DSL connection is asymmetric DSL. ADSL has upload speeds that are slower than download speeds. On the other hand, symmetric DSL (SDSL) gives you identical upload and download speeds, but is more expensive. Just like with ISDN, you must be within a certain distance of the main ISP office. The distance can vary from a few hundred feet to 18,000 feet.

Cable

A cable connection piggy-backs off of your cable television connection. It provides upload speeds of up to 20 Mbps and download speeds of over 100 Mbps.

Fiber

There are two kinds of fiber connections. In fiber-to-the-node (FTTN), the ISP installs a central box somewhere in your neighborhood, which is connected to the actual fiber line. Then, the individual houses connect to the box using standard Ethernet or coaxial cabling. In fiber-to-the-premises (FTTP) your house is directly connected with the central office via fiber. Fiber varies in speed, but can be as fast as 1 Gbps (which is what Google Fiber provides). In some cases the download speed matches the upload speed. I have an FTTP fiber connection that gives me 100 Mbps upload and download speeds.

Satellite

The main benefit to a satellite connection is that it works anywhere in the world. No infrastructure is required (telephone lines, cable lines, etc). A satellite dish must be professionally set up so that it has line-of-sight communication with the satellite up in space. The main downsides are higher-than-average latency and signal degradation in cloudy weather.


07/02/2017 10:07 pm
There are many different kinds of technologies that allow for the wireless transmission of digital information through the air. These include Wi-Fi, Bluetooth, infrared, and cellular.

Wi-Fi

The most well known is the 802.11 family of protocols, more commonly known as Wi-Fi. In a typical Wi-Fi setup, all computers connect to a central device called a WAP (wireless access point). WAP is the technical term for “wireless router”. Every wireless network has a service set identifier (SSID), which is a human-readable name for the network. The SSID is what appears when you search for available wireless networks in your device's Wi-Fi settings. The WAP broadcasts the SSID so new devices can find and connect to the network.

For small buildings, like a SOHO (small office/home office) environment, only one WAP is needed because its signal can reach all or most parts of the building. This is referred to as a Basic Service Set (BSS). However, larger buildings cannot make do with just a single WAP. In this situation, multiple WAPs are strategically placed throughout the building and are joined together into an Extended Basic Service Set (EBSS). In an EBSS, all the WAPs have the same SSID, so as you roam around the building, your device automatically switches WAPs based on whichever has the strongest signal.

Securing your WAP

Hiding the SSID: It's possible to configure a WAP to not broadcast its SSID, which helps prevent unauthorized people from accessing it.

Enabling MAC address filtering: Every computer device has something called a MAC address, which is a 48-bit, globally unique identifier. You can provide your WAP with the MAC addresses of all your devices so that no other devices are allowed to connect.

Changing the admin password: Many WAPs leave the factory with identical administrator passwords. Change it! The administrator password is used to access the configuration settings of the WAP (usually through a web interface), so it's important to have a strong and unique password.

Controlling physical access: Many WAPs have a handful of Ethernet ports on them. Connecting a computer to one of these ports bypasses all the wireless security that is in place, so you should either disable these ports or place your WAP in a location that only authorized personnel can access. Also, when you buy internet service for your home, the ISP often provides you with a WAP that has the Wi-Fi and administrator passwords stamped onto the case. So if you don't want to change them, make sure the WAP isn't in a place that can be seen by strangers (like your window sill!).

Ad Hoc Mode

Connecting to a wireless network through a WAP is referred to as “infrastructure mode”. But it's interesting to note that a WAP isn't required to network computers wirelessly. In “ad hoc mode” (also sometimes referred to as “peer-to-peer mode”), computers connect directly with each other to form an Independent Basic Service Set (IBSS). This is useful if a WAP isn't available and the number of computers you need to network is small.

Antennas

The antenna most commonly used by WAPs and computer devices is a dipole antenna, which is a type of omni-directional antenna. They look like a stick but actually have two antennas inside them. Some WAPs have detachable antennas, which gives you the option of installing larger, more powerful ones.

Signal strength (called “gain”) is measured in decibels (dB). Most WAPs broadcast at around 2 dB, and some let you adjust this. You might think that the higher the gain, the better, but not always. Lowering the gain to an amount that just barely covers your building will prevent your neighbors from being able to connect to your network. This also does your neighbors a favor because it lowers the amount of RFI (radio frequency interference) that their wireless networks will have to contend with.

The orientation of the antenna matters. This is called polarization. If an antenna is standing straight up, it has a vertical alignment. If it is laying flat, it has a horizontal alignment. Since the antenna in your laptop is located in the lid next to the screen, it generally has a vertical alignment when the lid is open. In order to communicate effectively, the antennas of the computer and the WAP must have similar polarities. It's good practice to tilt the WAP's antenna to a 45 degree angle to accommodate the largest variety of polarities.

Wi-Fi Security Protocols

Because all communication is traveling through the air, anyone with the right equipment and skills can intercept this communication and read it—just like tuning your car radio to a radio station. Unlike radio broadcasts, the information that travels through Wi-Fi networks can be very sensitive. To help protect your privacy, various security protocols have been released over the years.

WEP. Created in 1997, this protocol encrypts all communication with 40- or 104-bit encryption. And it was not very secure. For one, it uses the same encryption key to encrypt all communication with all client computers, which makes it possible for a single computer to listen in on everyone else's communication. And in 2001, a serious encryption flaw was discovered which allowed a WEP key to be cracked in minutes. WEP was officially retired in 2003 and replaced by WPA.

WPA. This protocol corrects WEP's weakness of using a single encryption key by changing the encryption key for every packet of data that is transmitted (called TKIP). The encryption key size was also increased to 64- or 128-bits. And it includes a feature which prevents malicious clients from altering and resending data packets. WPA was only intended for temporary use until the WPA2 standard was finalized.

WPA2. Finalized in 2006, WPA2 includes all of the improvements that WPA brought to the table, as well as an improved encryption algorithm called AES. AES is a very strong algorithm that no one has been able to find a significant flaw in (yet). In fact, the U.S. government approved it to be used for transmitting classified information in 2003. WPA2 is currently the most secure wireless security standard, and it's what all your devices should be using. WAPs that support “mixed-mode” allow devices to connect using either WPA or WPA2 (for older devices that do not support WPA2).

WPS. What if you want to connect a device like a printer or scanner to your Wi-Fi network? Because these devices often lack display screens, how are you supposed to give it the SSID and password of your Wi-Fi network? Enter WPS. It allows you to connect a device to a network with as little as two button presses. First, you press the WPS button on the device. Then, you press the WPS button on the WAP (your WAP must support WPS). And bingo, it's connected. However, it has a major security flaw. It also allows you to connect devices using an eight-digit code, which an attacker could use to brute-force his way into the network. Therefore, security experts recommend that you turn WPS off if your WAP supports it.
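The arithmetic behind that brute-force attack is worth seeing. The flaw (published by Stefan Viehböck in 2011) is that the WAP reports whether each half of the PIN is correct separately, and the last digit is just a checksum:

```python
# Naively, an 8-digit PIN would require up to 10^8 guesses:
naive_guesses = 10 ** 8
assert naive_guesses == 100_000_000

# But WPS validates the PIN in two halves, and the 8th digit is a checksum
# of the first seven. An attacker brute-forces 4 digits, then 3:
actual_guesses = 10 ** 4 + 10 ** 3
assert actual_guesses == 11_000  # feasible in a matter of hours
```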

Sidenote: HTTPS

You might be nervous about transmitting sensitive information over a wireless network, especially if it is a public Wi-Fi network, like the one at Starbucks or your favorite coffee shop—AND YOU SHOULD BE! Even if the network uses the best possible encryption standard (WPA2), not only could someone theoretically discover a flaw at any time and start intercepting your data, but the owners of the WAP could theoretically configure their WAP to intercept and log all information that travels through it! Or, attackers could set up their own WAP within range of the legitimate WAP and configure their WAP to broadcast an SSID which is identical to that of the legitimate WAP, causing your device to connect to the attacker's WAP instead of to the legitimate one (if I recall correctly, this was done at the 2016 Olympics in Rio).

However, you need not worry as long as you are browsing secure websites (using the HTTPS protocol) and using apps that use secure connections. The encryption standard that protects you is called SSL (its modern successor is called TLS). When using this standard, your computer encrypts the data before sending it over the air. What’s more, the data can't be decrypted until it reaches its intended recipient. So even if someone intercepted your communication, they wouldn't be able to make any sense of it because it is encrypted. God forbid if someone breaks SSL—the internet as we know it would grind to a halt, because this standard is what makes possible such things as online shopping and online banking!

The 802.11 family of protocols

A number of different Wi-Fi protocols have been released over the years, each of which have different characteristics. These are the low-level protocols that the security protocols discussed above run “on top of”. I'll refer you to my Computer Networks 101 blog post for a description of these protocols.

Bluetooth

For short-range, wireless communication, Bluetooth is often used. It is designed to do very specific things and is not intended to be general purpose, like Wi-Fi is. A Bluetooth network is called a PAN (personal area network). It is extremely resistant to RFI (radio frequency interference) because it hops frequencies about 1,600 times per second.

Every Bluetooth device is assigned a “class”, based on its range. Lower class devices use less power because they don't have to transmit as strong of a signal.

Class 1: 100 meters
Class 2: 10 meters
Class 3: 1 meter

Many different versions have been released over the years (summarized in the table below):

Version 1.1, 1.2: 1 Mbps.
Version 2.0, 2.1: 3 Mbps. A feature called Enhanced Data Rate (EDR) improves its max speed.
Version 3.0 + HS: 24 Mbps. The high speed (HS) feature is optional and uses a Wi-Fi network to achieve the full 24 Mbps bandwidth.
Versions 4.0, 4.1, 4.2 (“Bluetooth Smart”): 24 Mbps. Focuses on power consumption, security, and IP connectivity.
Version 5.0: 24 Mbps. Focused on the “Internet of Things”; aims to be low power.

Infrared

Infrared is most commonly used in remote controls, like the one for your television. But it can also be used to transmit digital information. The Infrared Data Association (IrDA) protocol uses infrared light as its communication medium. However, it is very limited. It only supports speeds of up to 4 Mbps and is half-duplex. And it only has a max range of 1 meter. Plus, it relies on line-of-sight communication (any physical object placed in its way will break the link). Because of these limitations, IrDA has no security features; why bother making any when the computers have to be so close to each other and it’s so easy to block the signal? Note that some computers have what looks like an infrared receiver, but these are usually used for remote controls, not for IrDA.

Cellular

Lastly, we have cellular. Cellular data connections are often referred to as 1G, 2G, 3G, or 4G. These do not refer to specific standards, but are loose terms that refer to how recent and fast the underlying technology is. At the present time, the fastest cellular technology is LTE. It is considered 4G and theoretically supports speeds of up to 300 Mbps download and 75 Mbps upload.

If you are not in range of a Wi-Fi network, you can tether your device to your cell phone. My understanding of this is that you can download apps that do this, but you need to jailbreak your device in order for them to work.


07/01/2017 1:13 pm
Every computer has a box called a power supply, which is responsible for supplying electricity to the internal components of the computer. Its main task is to convert the AC (alternating current) power from the electrical outlet to DC (direct current) power, and then dole out the DC power to the computer's internal components. Different parts of the world use different voltage standards for their electrical outlets, so a power supply has to be compatible with the voltage standards in your part of the world. For example, power outlets in North America run at around 115V, and those in Europe generally run at around 230V. Some power supplies have a physical switch on the outside that tells it what voltage to expect (called fixed-input). Others adjust automatically (called auto-switching).

Due to the nature of AC power, power supplies can take damage over time from something called harmonics. Harmonics are caused by the way in which electrical devices draw power from an AC connection, and are what causes electrical devices to make faint humming sounds. Most power supplies come with circuitry that protects against this, called active power factor correction (active PFC). You should never buy a power supply that does not have this.

I need more power, Captain!

Every power supply has a maximum amount of wattage it can supply. If the internal components of the computer try to draw more than that, the computer won't work right. For example, if you want to install a brand new, high-performance graphics card, you should make sure your power supply has enough available wattage. Note that power supplies are replaceable, so if your current power supply isn't good enough, you can always replace it.

A power supply does not deliver all of the AC power it consumes. Some power is lost due to inefficiencies and released in the form of heat. Most power supplies are at least 80% efficient, and they will advertise what their efficiency is on the packaging. A more efficient power supply will consume less power for the same load.

It's important to note that power supplies only draw the amount of energy that is actually being used by the computer—they do NOT draw the maximum amount they are capable of. For example, if you have a power supply that can provide a max of 500 W and your computer is only using 200 W, then the power supply will only draw enough power for 200 W. You won't be wasting electricity if you buy a power supply that can supply more power than your computer needs. In fact, it is good to have such a power supply for two reasons: (1) to allow room for future upgrades and (2) to account for the fact that power supplies produce less wattage over time due to wear and tear.
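Putting the efficiency and draw ideas together in a quick sketch (the 80% efficiency and 200 W load are the figures from above; the function name is my own):

```python
def wall_draw_watts(load_watts, efficiency):
    # AC watts pulled from the outlet to deliver `load_watts` of DC power;
    # the difference is released as heat.
    return load_watts / efficiency

# A 500 W supply powering a 200 W load only draws what the load needs:
draw = wall_draw_watts(200, 0.80)
assert abs(draw - 250.0) < 1e-9  # 200 W delivered, ~50 W lost as heat
```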

Rails

The DC power that the power supply generates is doled out through three voltage rails. Each rail supplies a different voltage: 12V, 5V, and 3.3V. The 12V rail is typically used to power devices that have motors of some sort, such as hard disk drives and optical drives, but there is no restriction regarding what each voltage rail can be used for (for example, a high-end graphics card might want to use the 12V rail).

Each rail has a maximum amount of amperage it supports, and this is monitored by circuitry called over-current protection (OCP). Single-rail systems have a single OCP that monitors all the rails. Multi-rail systems have one OCP per rail. If the amperage in any rail is exceeded, the power supply will shut itself off to prevent damage to itself. When multi-rail systems were first introduced, they were very unstable due to poorly written specifications, but they have gotten much better since then. For computers that use a lot of power, like servers and gaming PCs, multi-rail systems give your system extra protection against short circuits. For an ordinary, low-wattage desktop PC, it doesn't really make a difference whether you have a single-rail or multi-rail system.
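Rail capacity is just watts = volts × amps. A sketch with hypothetical ratings (the amperage figures here are made up for illustration, not taken from any real supply):

```python
def rail_watts(volts, amps):
    # A rail's capacity in watts is its voltage times its amperage limit
    return volts * amps

# A hypothetical multi-rail supply:
rails = {
    "12V":  rail_watts(12, 18),   # 216 W for drives, graphics cards, etc.
    "5V":   rail_watts(5, 20),    # 100 W
    "3.3V": rail_watts(3.3, 20),  # 66 W
}
assert rails["12V"] == 216
# OCP trips if any single rail's amperage limit is exceeded, even when the
# supply's total wattage rating is not.
```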

Power supply standards

Various power supply standards have been released over the years. ATX (also called ATX12V) introduced the idea of providing a constant supply of power (5V) to the motherboard, even when the computer is off. This is called soft power, and it allows the computer to implement various power saving features. This is why you should always unplug a computer before servicing it! This standard was later improved upon by subsequent standards (below).

ATX12V 1.3 added the P4 connector, which supplies extra power to the motherboard. It also added the AUX connector. The downside to this standard was that it was not specific enough, which resulted in power supply manufacturers producing wildly different power supplies.

EPS12V was created for servers that need more power than the average desktop machine. It added a 24-pin motherboard power connector. It also introduced the idea of “voltage rails” (explained above).

ATX12V 2.0 adopted many of the advancements that EPS12V brought to the table. Notably, it added a 24-pin P1 connector and voltage rails.

Connectors

Many of the different connectors you will see coming out of a power supply are listed in the table below. Yeah! Tables!

P1 power connector (3.3V, 5V, 12V; 20 or 24 pins): The older variant of this connector has 20 pins. The newer variant (which is backward compatible) has 24 pins and provides more current.

Molex (5V, 12V; 4 pins): Typically used to power storage devices, like hard drives.

Mini (5V, 12V; 4 pins): This connector used to be used for 3.5” floppy disk drives and isn't used much anymore. You have to be careful when plugging it in because it is easy to plug in upside down, which will ruin the device.

SATA power connector (3.3V, 5V, 12V; 15 pins): Only used for SATA hard drives. In practice, only the 5V and 12V voltages are used.

SATA slimline connector (5V; 6 pins): A smaller version of the SATA power connector.

SATA micro connector (3.3V, 5V; 9 pins): Even smaller! Can't reliably find a photo of this one.

P4 connector (12V; 4 pins): Used in conjunction with a 20-pin P1 connector to supply the motherboard with extra power.

AUX connector (3.3V, 5V; 6 pins): Also used to supply the motherboard with extra power.

EPS12V / EATX12V / ATX12V 2x4 (12V; 8 pins): This connector goes by many different names. One half is compatible with the P4 connector.

PCIe connector (12V; 6 or 8 pins): In some 8-pin connectors, two of the pins are detachable to make them compatible with the 6-pin version. It looks similar to the EPS12V connector, but is not compatible with it.



More blog entries >>

How this page works

Last Updated: 1/3/2012

My blog is actually hosted on blogger.com. The way I'm able to display my blog posts here is by parsing the blog's RSS feed. RSS feeds are used by blogs to help alert their avid readers whenever a new post is created. They are just XML files that contain data on the most recent blog posts. They include things like the title and publish date of each post, as well as the actual blog post text. I can use most of the data from my RSS feed without any trouble, but there are a few things I need to tweak in order to display everything properly.

View the source

Fixing the code samples

One tweak is fixing the code samples I often include in my posts. Blogger replaces all newlines in the blog post with <br /> tags. This is a problem because, due to the syntax highlighting library I use, the <br /> tags themselves show up in the code samples. So, I need to replace all of these tags with newline characters. However, I can't just replace all <br /> tags in the entire blog post because I only want to replace the tags that are within code samples. This means that I have to use something a little more complex than a simple search-and-replace operation:

$content = //the blog post
$contentFixed = preg_replace_callback('~(<pre\\s+class="brush:.*?">)(.*?)(</pre>)~', function($matches){
	$code = $matches[2];
	$code = str_replace('<br />', "\n", $code);
	return $matches[1] . $code . $matches[3];
}, $content);

Here, I'm using the preg_replace_callback PHP function, which will execute a function that I define every time the regular expression finds a match in the subject string. I know that each code sample is wrapped in a <pre> tag and that the tag has a class attribute whose value starts with "brush:", so I use that information to find the code samples. Then, for each match the regular expression finds, it calls my custom function, where I have it replace the <br /> tags with newlines.

Fixing the dates

Because the publish dates of each blog post in the RSS feed are relative to the UTC timezone, I also have to make sure to apply my local timezone to each date. Otherwise, the dates will not be displayed correctly (like saying that I made a post at 2 in the morning).

$dateFromRss = 'Tue, 20 Dec 2011 02:30:00 +0000';
$dateFixed = new DateTime($dateFromRss);
$dateFixed->setTimezone(new DateTimeZone('America/New_York'));

Adding Highslide support to images

One extra feature that I included is adding Highslide support to each image (Highslide is a "lightbox" library which lets you view images in special popup windows). To do this, I load the blog post into a DOM, use XPath to query for all links that have images inside of them, and then add the appropriate attributes to the link tag.

$content = //the blog post

//XML doesn't like "&nbsp;", so replace it with the proper XML equivalent
//see: http://techtrouts.com/webkit-entity-nbsp-not-defined-convert-html-entities-to-xml/
$content = str_replace("&nbsp;", "&#160;", $content);

//load the text into a DOM
//add a root tag in case there isn't one
$xml = simplexml_load_string('<div>' . $content . '</div>');

//if there's a problem loading the XML, skip the highslide stuff
if ($xml !== false){
	//get all links that contain an image
	$links = $xml->xpath('//a[img]');
	
	//add the highslide stuff to each link
	foreach ($links as $link){
		$link->addAttribute('class', 'highslide');
		$link->addAttribute('onclick', 'return hs.expand(this)');
	}

	//marshal XML to a string
	$content = $xml->asXML();
	
	//remove the XML declaration at the top
	$content = preg_replace('~^<\\?xml.*?\\?>~', '', $content);
	
	//trim whitespace
	$content = trim($content);
	
	//remove the root tag that we added
	$content = preg_replace('~(^<div>)|(</div>$)~', '', $content);
}

As you can see, the blog post text has to be awkwardly manipulated in order to be read into a DOM and written back out as a string. That's why I have a lot of comments here--when I have to revisit this code in 6 months, I won't be totally confused.

Caching the RSS file

One last thing to mention is that I cache the RSS file so that my website doesn't have to contact Blogger every time someone loads this page. When the cached file gets to be more than an hour old, a fresh copy of the file is downloaded from Blogger.
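The idea is simple enough to sketch out. Here it is in Python for illustration (the real site does this in PHP; the file name and feed URL below are placeholders, not the ones the site actually uses):

```python
import os
import time
import urllib.request

CACHE_FILE = "rss-cache.xml"  # placeholder name
FEED_URL = "https://www.blogger.com/feeds/..."  # placeholder URL
MAX_AGE_SECONDS = 60 * 60  # one hour

def load_feed():
    # Re-download only if the cached copy is missing or over an hour old
    stale = (not os.path.exists(CACHE_FILE)
             or time.time() - os.path.getmtime(CACHE_FILE) > MAX_AGE_SECONDS)
    if stale:
        with urllib.request.urlopen(FEED_URL) as resp:
            with open(CACHE_FILE, "wb") as f:
                f.write(resp.read())
    with open(CACHE_FILE, "rb") as f:
        return f.read()
```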

Back to top