You’ve crafted the perfect text message. The punch of a novel packed into a single SMS, worthy of the bard himself. Your campaign goes off without a hitch. Then you take a look at your costs and see they’re four times what you expected, leaving you to wonder: what the heck is a segment, and why am I being charged for so many of them?
When you send an SMS message containing more than 160 characters, the message is split into smaller messages for transmission. Large messages are split into 153-character ‘segments’ and sent individually, then re-assembled by the recipient’s device. For example, a 161-character message will be sent as two messages: one with 153 characters and a second with eight characters.
If you include non-GSM characters, such as emojis (😀), in an SMS message, the whole message has to be sent using the UCS-2 encoding. A message containing any such character is limited to 70 characters, and UCS-2 messages of more than 70 characters are split into 67-character segments.
Liveforce bills for every segment sent, so if you have a message with, say, 140 characters and only one or two of them are non-GSM, you can avoid the cost of the extra segments by removing those characters, if you can. Those 140 characters fit in a single GSM segment but need three UCS-2 segments.
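As a rough sketch of the rules above, the segment count can be worked out from a message’s length and its encoding. The function name `segments_needed` and its parameters are made up for illustration and are not part of any Liveforce API.

```python
import math

# Limits quoted above: (single-message limit, per-segment limit once split)
LIMITS = {
    "GSM": (160, 153),
    "UCS-2": (70, 67),
}

def segments_needed(length, encoding="GSM"):
    """Return how many segments a message of `length` characters uses."""
    single_limit, split_limit = LIMITS[encoding]
    if length <= single_limit:
        return 1  # fits in one message, no splitting needed
    return math.ceil(length / split_limit)

print(segments_needed(161))           # 2 (153 + 8 characters)
print(segments_needed(140, "GSM"))    # 1
print(segments_needed(140, "UCS-2"))  # 3
```

The last two lines show why a single emoji in an otherwise plain 140-character message can triple what you pay for it.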
Looking Back On The Nokia Brick Phone To Understand Message Segments
Think back to when you first started texting on your good ol’ indestructible Nokia brick. While hammering out messages on a T9 keyboard, you may have noticed a counter ticking down from 160 next to a 1. When that counter hit 0, you’d see that 1 that was sitting next to the 160 jump up to a 2.
That meant you’d end up with two messages on your bill. The first number counted how many characters you had left in the current segment, and the second counted how many segments you had used.
What’s Changed About Segments Since Back In The Day
SMS standards have barely changed since the days of the brick phone. Messages are still sent in 140-byte chunks known as message segments.
When Liveforce communicates with carriers to send out SMS messages, we send them one segment at a time. To figure out how many characters this affords you, we’re going to have to do a little math.
A Little Math, Much Clearer Insight Into Segments
Standard SMS encoding uses the GSM 03.38 character set, which takes 7 bits to encode a character. 140 bytes x 8 bits in a byte, divided by 7 bits per character, leaves us with the 160-character message segment.
Message segments are how the SMS industry as a whole counts messages. This means that in addition to your costs, you should also think in terms of segments when you’re analysing SMS.
How Does The Perfect Message Behave?
Going back to your perfect text message, you count up the characters, and something still seems off. You’ve only used 210 characters but it looks like each of these messages has more than two segments.
Part of the answer lies in the encoding. Notice that this message has UCS-2 listed as the encoding instead of GSM. To accommodate a message as lit as this one, Liveforce has to use a different character set: when you send messages with non-GSM characters such as emojis, we have to use an encoding known as UCS-2. UCS-2 takes 16 bits to encode each character, so going back to the math we did above, we now have a limit of 70 characters (140 bytes * 8 bits in a byte / 16 bits). Besides emojis, you should also be careful with accented characters. GSM 03.38 includes some accented characters such as ñ, à, and ö, but does not include others such as á, í, or ú.
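If you want to check a draft before sending it, one approach is to compare every character against the GSM 03.38 alphabet. The sketch below is an illustration only: the names `non_gsm_characters` and `detect_encoding` are invented here, and it says nothing about how Liveforce performs the check internally. It also lists the GSM extension table (characters such as € and [ ]), which stay within GSM but each take up two character slots.

```python
# GSM 03.38 basic alphabet (each character takes one 7-bit slot).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: still GSM, but each character takes two slots.
GSM_EXTENDED = set("^{}\\[~]|€\f")

def non_gsm_characters(text):
    """Return the characters that would force the message into UCS-2."""
    return [ch for ch in text if ch not in GSM_BASIC and ch not in GSM_EXTENDED]

def detect_encoding(text):
    """Return "GSM" if every character is in the GSM alphabet, else "UCS-2"."""
    return "UCS-2" if non_gsm_characters(text) else "GSM"

print(detect_encoding("Mañana, déjà vu"))  # GSM
print(non_gsm_characters("áíú"))           # ['á', 'í', 'ú']
print(detect_encoding("So lit 🔥"))        # UCS-2
```

Listing the offending characters, rather than only naming the encoding, makes it easier to decide which ones are worth keeping.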
What Exactly Does A Data Header Do?
Still, with this 70-character limit, it looks like this message should only be three segments, not four. The last piece of the puzzle lies in concatenation. When you send multi-segment messages, Liveforce uses User Data Headers to tell the destination how to reassemble them. The header takes up 6 bytes of every segment, leaving 134 bytes for content: only 67 characters for UCS-2 encoded messages or 153 for GSM encoded messages.
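The arithmetic behind all four limits can be reproduced in a few lines. This is a minimal sketch using the figures from the article (140-byte segments, a 6-byte header, 7 or 16 bits per character); the function name `chars_per_segment` is an assumption made for the example.

```python
import math

SEGMENT_BYTES = 140  # size of one SMS segment
HEADER_BYTES = 6     # User Data Header added to each part of a split message

def chars_per_segment(bits_per_char, concatenated=False):
    """Characters that fit in one segment for a given character width."""
    payload_bytes = SEGMENT_BYTES - (HEADER_BYTES if concatenated else 0)
    return payload_bytes * 8 // bits_per_char  # whole characters only

print(chars_per_segment(7))                      # 160 (GSM, single message)
print(chars_per_segment(16))                     # 70  (UCS-2, single message)
print(chars_per_segment(7, concatenated=True))   # 153 (GSM, split message)
print(chars_per_segment(16, concatenated=True))  # 67  (UCS-2, split message)

# The 210-character UCS-2 message from the example above:
print(math.ceil(210 / chars_per_segment(16, concatenated=True)))  # 4
```

Three segments of 67 characters only cover 201 characters, so the remaining nine spill into a fourth.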
Maybe it turns out the fire emojis aren’t worth it after all. However, when you trim the same message down and resend it, it still doesn’t seem to work out quite right:
This message contains two of the “gotchas” that commonly cause encoding issues: smart quotes and non-GSM spaces. Take a look at this message that appears almost identical:
There are only three characters that have been switched: the spaces between sentences were changed from en spaces to regular spaces (U+2002 to U+0020), and the “smart quote” after Shakespeare was replaced with a standard apostrophe, ' instead of ’ (U+2019 to U+0027). Smart quotes are usually a result of text editors being too darn helpful. Non-GSM spaces are usually a result of copying and pasting. Be extra careful with those, as they’re often converted to conventional spaces for display.
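If you paste message text from a word processor, it may be worth cleaning it up first. The sketch below swaps a handful of common look-alike characters for their GSM equivalents. The list is an assumption and far from exhaustive, and `to_gsm_friendly` is a name made up for this example.

```python
# Look-alike characters that are not in the GSM alphabet, mapped to
# plain equivalents that are. Extend the table to suit your own content.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (the "smart" apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u00a0": " ",  # no-break space
    "\u2002": " ",  # en space
    "\u2003": " ",  # em space
    "\u2009": " ",  # thin space
})

def to_gsm_friendly(text):
    """Replace smart quotes and unusual spaces with plain ones."""
    return text.translate(REPLACEMENTS)

draft = "Shakespeare\u2019s finest.\u2002Reply YES to confirm."
print(to_gsm_friendly(draft))  # Shakespeare's finest. Reply YES to confirm.
```

Because the replacements are one-for-one, the message keeps the same length and only the encoding changes.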
Why Is It Only An Estimate?
Be aware that any message that includes "Merge Tags" (e.g. a Crew member's name or Job details) could be sent as more segments than estimated, because the length of the merged content varies from one recipient to the next. For example, some Crew names may be long enough to add an extra segment to the message they receive.
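To get a feel for how much the estimate can move, you can fill the tags with each recipient’s values and count segments for the shortest and longest results. In this sketch the `{{name}}` tag syntax, the function names, and the sample Crew are all hypothetical, and it assumes the message and the merged values contain only GSM characters.

```python
import math
import re

def gsm_segments(length):
    """Segments for a GSM-only message of `length` characters."""
    return 1 if length <= 160 else math.ceil(length / 153)

def fill_tags(template, values):
    # Hypothetical tag syntax: {{tag_name}}
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)

def segment_range(template, recipients):
    """Return the (fewest, most) segments any recipient's message needs."""
    counts = [gsm_segments(len(fill_tags(template, r))) for r in recipients]
    return min(counts), max(counts)

template = "Hi {{name}}, " + "x" * 150  # stand-in for a 150-character body
crew = [{"name": "Al"}, {"name": "Alexandria"}]
print(segment_range(template, crew))    # (1, 2)
```

Here the same template costs one segment for Al (157 characters) and two for Alexandria (165 characters).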
Useful Tool
Here is a link to a tool where you can check how many segments (messages) are in your SMS message