Payload Header Suppression (PHS) is an optional feature in 802.16-2004. Since the bandwidth on the air interface is limited and shared by multiple users within a sector, it makes sense to optimize the use of such bandwidth. The idea of PHS is to remove redundant information in packet headers using known rules. These suppression rules help in reconstructing the header correctly at the receiving end. The rules are agreed in advance between SS and BS. In general, these rules are designed in such a way that fields of the header that do not change for the entire duration of the service flow are suppressed. Only changing fields are transmitted. Since the standard allows for multiple rules for every service flow,

The following important points are made about PHS:

  1. PHS rules are specific to service flows. Thus, a rule agreed for a service flow cannot be applied to another service flow.
  2. Every service flow can have multiple rules with one rule per classifier. Each rule is referenced using an index named PHSI. A classifier is associated with a rule using the PHSI. While the rule for a service flow can be changed, the new rule must be added to the service flow only after deleting the old rule. During this transition when no rule is defined, PHSI=0 will not be used.
  3. PHSI will be omitted if PHS is not enabled. “Not enabled” is taken to mean that no PHS rule is defined.
  4. The use of PHSI=0 preceding a higher layer PDU implies that suppression has not been applied for this PDU though PHS is enabled for this service flow.
  5. Some service flows can have PHS enabled while others may wish to disable it.
  6. When PHS is enabled, there is an option to enable verification at the transmitting end. Essentially, the CS layer verifies that higher layers have not changed the values of fields that have been identified for suppression.
  7. While rules can be defined at either end (BS or SS), only the BS can allocate a PHSI.
  8. In general, the responsibility to generate a rule lies with the higher layer (or the rule is configured manually through the NMS). It is not part of the Convergence Sublayer (CS). This makes good sense for a layered design. Nonetheless, intelligence can be built into the CS for reasons described later in this document.
  9. Rules can be created in a single message flow or in separate message flows. Typically, one could create a rule at provisioning based on certain known header fields. This could be altered later at service activation in order to achieve greater suppression.
  10. Rule creation can happen as part of the creation of service flow and its classifier. It can also happen in a separate message flow.
  11. Rule creation can use DSA or DSC messages.
  12. Rule deletion can use DSC or DSD messages.
  13. It is possible to delete a PHS rule, delete all rules, add a rule or set a rule. When deleting all rules, PHSI is ignored. Adding a rule that already exists is an error.
  14. PHS rule consists of the following set – {PHSI, PHSF, PHSM, PHSS, PHSV, Vendor-specific PHS parameters}.
  15. If there is any error in the rule definition carried in a DSA or DSC message, the receiver should ignore the message. The standard has no mechanism to inform the sender of the source of the error.
  16. Since PHS happens before encryption, the latter has no effect on PHS. In other words, PHSF is constructed based on unencrypted data.
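
The rule-management points above can be sketched as a small per-service-flow table. This is a minimal Python sketch under my own assumptions: the class and method names are hypothetical, a rule is just an opaque dictionary, and the DSA/DSC/DSD signalling that would carry these operations is not modelled.

```python
from dataclasses import dataclass, field


class PhsRuleError(Exception):
    """Raised when a rule operation is not allowed, e.g. adding an existing PHSI."""


@dataclass
class ServiceFlowPhsTable:
    """Hypothetical PHS rule table; one instance per service flow."""
    sfid: int
    rules: dict = field(default_factory=dict)  # PHSI -> rule parameters

    @property
    def phs_enabled(self) -> bool:
        # "Not enabled" is taken to mean that no rule is defined.
        return bool(self.rules)

    def add_rule(self, phsi: int, rule: dict) -> None:
        # PHSI=0 means "not suppressed", so a rule index must be 1-255.
        if not 1 <= phsi <= 255:
            raise PhsRuleError(f"PHSI {phsi} outside 1-255")
        if phsi in self.rules:
            raise PhsRuleError(f"PHSI {phsi} already exists on SF {self.sfid}")
        self.rules[phsi] = rule

    def delete_rule(self, phsi: int) -> None:
        # Deleting an unknown PHSI is silently ignored in this sketch.
        self.rules.pop(phsi, None)

    def delete_all_rules(self) -> None:
        # The PHSI is irrelevant when every rule is deleted.
        self.rules.clear()

    def change_rule(self, phsi: int, rule: dict) -> None:
        # A rule is changed by deleting the old one before adding the new one.
        self.delete_rule(phsi)
        self.add_rule(phsi, rule)


# Rules are specific to a service flow: each flow gets its own table.
sf1 = ServiceFlowPhsTable(sfid=1)
sf2 = ServiceFlowPhsTable(sfid=2)
sf1.add_rule(1, {"phss": 20})
```

Keeping one table per service flow makes it structurally impossible to apply a rule agreed for one flow to another.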

At a high level, enabling PHS is signaled as part of REG-REQ. This indicates the capability of the SS. PHS is possible only if both BS and SS support it and agree to use it after negotiation.

A brief summary of all parameters of a PHS rule follows.

PHSI

This is unique per service flow and is used to identify the PHS rule. It precedes the higher layer PDU when PHS is enabled and does not exist when PHS is disabled. If PHS is enabled but suppression is not applied to a PDU, PHSI=0 is used. It has the range 1-255.

PHSF

This is a specified number of bytes containing the header information to be suppressed. It is stored and used on both the sending and receiving sides. The number of bytes equals the value of PHSS.

PHSM

This is a mask that determines which bytes of the PHSF are suppressed. A bit value of 1 indicates that the corresponding byte is suppressed; otherwise, the byte is included in the transmission. The mask has a maximum length of 8 bytes to cover the range of PHSS. Bit 0 relates to the first byte of PHSF, and bit n relates to the (n+1)th byte. If PHSM is omitted, the default is to suppress all bytes of the PHSF.

PHSS

This indicates the size of the PHSF. Since this is just one byte, only a maximum of 255 bytes can be suppressed. During rule negotiation, if this is omitted or is of value zero, PHS is disabled.

PHSV

This controls verification which can be enabled or disabled as part of the rule definition. In general, it is desirable to have this enabled. If omitted, verification is enabled by default.
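
To make the interplay of PHSF, PHSM, PHSS and PHSV concrete, here is a minimal Python sketch of suppression at the sender and reconstruction at the receiver. It assumes a one-byte PHSI prefix (consistent with the 1-255 range) and an integer mask where bit n covers byte n of the PHSF. The fallback to PHSI=0 when verification fails is my assumption about sensible behaviour, not a quote from the standard, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PhsRule:
    phsi: int                   # rule index, 1-255
    phsf: bytes                 # header bytes stored at both ends; PHSS = len(phsf)
    phsm: Optional[int] = None  # bit n = 1 -> suppress byte n; None -> suppress all
    phsv: bool = True           # verification is enabled by default

    @property
    def phss(self) -> int:
        return len(self.phsf)

    def is_suppressed(self, n: int) -> bool:
        return self.phsm is None or bool((self.phsm >> n) & 1)


def suppress(rule: PhsRule, pdu: bytes) -> bytes:
    """Sender side: return the PDU prefixed with a PHSI, masked bytes removed."""
    header, payload = pdu[:rule.phss], pdu[rule.phss:]
    if len(header) < rule.phss:
        return bytes([0]) + pdu  # too short for this rule: send unsuppressed
    if rule.phsv:
        for n in range(rule.phss):
            # Verify that a byte we are about to drop really matches the PHSF.
            if rule.is_suppressed(n) and header[n] != rule.phsf[n]:
                return bytes([0]) + pdu  # assumption: fall back to PHSI=0
    kept = bytes(header[n] for n in range(rule.phss) if not rule.is_suppressed(n))
    return bytes([rule.phsi]) + kept + payload


def restore(rules: Dict[int, PhsRule], sdu: bytes) -> bytes:
    """Receiver side: rebuild the original PDU from the PHSI and the PHSF."""
    phsi, body = sdu[0], sdu[1:]
    if phsi == 0:
        return body  # suppression was not applied to this PDU
    rule = rules[phsi]
    out = bytearray()
    pos = 0
    for n in range(rule.phss):
        if rule.is_suppressed(n):
            out.append(rule.phsf[n])  # suppressed byte comes from the stored PHSF
        else:
            out.append(body[pos])     # unsuppressed byte was transmitted
            pos += 1
    return bytes(out) + body[pos:]


rule = PhsRule(phsi=1, phsf=bytes([0x45, 0x00, 0x12, 0x34]), phsm=0b1011)
pdu = bytes([0x45, 0x00, 0x99, 0x34]) + b"data"
sent = suppress(rule, pdu)  # b"\x01\x99data": three header bytes saved
assert restore({1: rule}, sent) == pdu
```

If PHSV were disabled and a supposedly static byte changed, the receiver would silently rebuild the stale value from its PHSF, which is why having verification enabled is desirable.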

ACM Bangalore Chapter was started in 2006 and today it is said to be one of the most active chapters in India. They conduct a regular monthly TechTalk and I was invited to be the speaker at this month’s event. I was given the liberty to choose the topic, and I decided to talk about various aspects of security in wireless cellular systems.

Although I had planned a 90-minute talk, it stretched on for another hour. The audience was more curious than I had expected and the questions were intelligent. The session was quite interactive, which suited the size of the group: about 50 people attended.

I do not intend to write about what I spoke on. The slides of my talk can be seen on ACM Bangalore’s website. Instead, I would like to touch upon some of the interesting questions that the audience posed.

Question#1 – How can a 3G phone with a GSM SIM work on a 3G network?

We must remember that ultimately everything hinges on the security context, which can be either GSM or UMTS. In either case, the same security context should be enabled on the AuC. So if a GSM SIM is used, the security context on the AuC ought to be GSM, say an R98- AuC. Triplets are generated and passed on to the VLR or SGSN. Since the VLR/SGSN are R99+ and use the UTRAN RAN, they will have standardized conversion functions (c4 and c5) to convert Kc to CK and IK respectively. CK and IK are then used within the UTRAN RAN to secure the air interface.
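
The conversion functions mentioned above are simple enough to write down. The Python sketch below follows my understanding of c4 and c5 as defined in 3GPP TS 33.102 (CK = Kc || Kc, and IK = (Kc1 xor Kc2) || Kc || (Kc1 xor Kc2), where Kc = Kc1 || Kc2 with 32-bit halves); please check it against the specification before relying on it.

```python
def _check_kc(kc: bytes) -> None:
    if len(kc) != 8:
        raise ValueError("Kc must be 64 bits (8 bytes)")


def c4(kc: bytes) -> bytes:
    """Derive a 128-bit CK from a 64-bit GSM Kc: CK = Kc || Kc."""
    _check_kc(kc)
    return kc + kc


def c5(kc: bytes) -> bytes:
    """Derive a 128-bit IK: (Kc1 xor Kc2) || Kc || (Kc1 xor Kc2)."""
    _check_kc(kc)
    kc1, kc2 = kc[:4], kc[4:]
    folded = bytes(a ^ b for a, b in zip(kc1, kc2))
    return folded + kc + folded


kc = bytes.fromhex("0011223344556677")  # example Kc, not a real key
ck, ik = c4(kc), c5(kc)
```

The 128-bit keys derived this way carry no more than the 64 bits of secret in Kc, so a GSM security context remains GSM-strength even when used on UTRAN.
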
Question#2 – Does number portability mean that data within an AuC is compromised?

Not really. Number portability does not mean that sensitive data from the old AuC is transferred to the new AuC. The new operator will issue a new USIM with a new IMSI. Number portability only means that the MSISDN stays the same so that others can call the mobile. The translation between MSISDN and IMSI is done at a national-level register. Such a translation identifies the Home PLMN and the HLR that needs to be contacted for an incoming call.

That’s the theory and that’s how it should be done. It will be interesting to know how operators in India do this.
Question#3 – If I am roaming, is the AuC of the visited PLMN involved in AKA?

We know that the algorithms in the SIM and AuC are proprietary and kept secret by the operator. So if I roam to another PLMN, will they be compromised? The answer is no. Even when roaming, the visited PLMN contacts the HLR of the Home PLMN. It is the HLR which then works with the AuC to perform AKA for the subscriber. The conclusion is that even in the case of roaming, AKA is performed only by the AuC of the Home PLMN. No other AuC is involved.

Question#4 – Why do we have Counter Check Procedure in RRC when we will anyway be unable to decrypt encrypted data if counters are not synchronized?

This procedure was introduced to prevent “man-in-the-middle” attacks. It is invoked to check that all counters are synchronized. It is true that if the receiver is unable to decrypt an encrypted message, we can probably say that the counters have gone out of synchronization. However, this can only be detected on radio bearers that are transmitting data. What about bearers that are idle? Also, RLC-UM and RLC-AM cannot tell whether data is corrupted or bogus; only the application can determine that. This procedure allows the counters on all radio bearers to be checked, which gives the network more information. Based on the result, the network may close the RRC connection or inform MM to start a new AKA.
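
The essence of the check can be sketched as a bearer-by-bearer comparison of the counter values the network believes in against those the UE holds, including bearers that are idle. This is a simplified Python sketch: I assume the network sends the most significant 25 bits of each 32-bit COUNT-C, and I ignore the separate UL/DL values and the exact message encoding of TS 25.331. Function names are hypothetical.

```python
from typing import Dict

COUNT_C_BITS = 32
MSB_BITS = 25  # assumption: only the MSB part of COUNT-C is compared


def count_c_msb(count_c: int) -> int:
    return count_c >> (COUNT_C_BITS - MSB_BITS)


def counter_check_response(network_msbs: Dict[int, int],
                           ue_counts: Dict[int, int]) -> Dict[int, int]:
    """Return {rb_id: UE COUNT-C} for every bearer the network should hear about.

    A bearer is reported if its MSBs differ from the network's value or if the
    network did not list it at all. Idle bearers are checked the same way as
    busy ones, since no user data is needed for the comparison.
    """
    report = {}
    for rb_id, count_c in ue_counts.items():
        expected = network_msbs.get(rb_id)
        if expected is None or expected != count_c_msb(count_c):
            report[rb_id] = count_c
    return report
```

A non-empty report tells the network that something is wrong without waiting for a decryption failure on a busy bearer.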

Question#5 – When changing ciphering key in UMTS, how is the transition from old to new keys managed?

There are activation times within the Security Mode procedure at RRC. The Security Mode Command contains the RLC SN (for RLC UM and AM) and the CFN (for RLC TM) at which the change will be activated on the DL. For the UL, the UE sends back in the Security Mode Complete the RLC SN at which the change will be made. In addition, RLC transmission is suspended on all bearers except the SRB on which the procedure is executed. This is a precaution against a slow response in receiving the Security Mode Complete. Even then, the RLC entities are commanded to suspend only after a certain number of PDUs.
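
The activation time essentially splits the sequence-number space into "before" and "after". Here is a minimal Python sketch of that decision. It assumes 7-bit SNs for RLC UM and 12-bit SNs for RLC AM and treats the half of the SN space at or after the activation SN as "after"; the real procedure works on the full COUNT-C (HFN plus SN), which this sketch ignores.

```python
def select_key(sn: int, activation_sn: int, sn_bits: int, old_key, new_key):
    """Pick the ciphering key for a PDU with sequence number `sn`.

    SNs wrap around, so the distance from the activation SN is taken modulo
    the SN space. PDUs at the activation SN or less than half the space
    beyond it use the new key; the rest still use the old key.
    """
    modulus = 1 << sn_bits
    distance = (sn - activation_sn) % modulus
    return new_key if distance < modulus // 2 else old_key


UM_SN_BITS = 7   # assumption: RLC UM sequence number width
AM_SN_BITS = 12  # assumption: RLC AM sequence number width
```

The modular comparison matters because an activation SN near the top of the range is followed by small, wrapped-around SNs that must also use the new key.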

Question#6 – What’s the use of FRESH as an input to f9 integrity algorithm in UMTS?

Changing FRESH gives additional protection without requiring a new AKA for key refreshment. This may happen for instance after SRNS Relocation. However, I have no insights into actual network implementations in this regard.

Question#7 – At which layer do ciphering and integrity happen?

GSM – ciphering happens at PHY in MS and BTS.

GPRS – ciphering happens at LLC in MS and SGSN.

UMTS – ciphering happens at RLC (for UM and AM) and MAC (for RLC TM) in UE and RNC. Integrity protection happens at RRC in UE and RNC.

Question#8 – When we enter a new location area and Location Updating Procedure is initiated, will it also involve AKA?

Not necessarily. If the CKSN/KSI sent in the Location Updating Request is a valid value and network decides that current keys can continue to be used, no new AKA will be started. For this to be possible, the new VLR must be able to contact the old VLR to retrieve the security context of the mobile.

MPLS on a WiMAX Base Station

Recently I was asked if WiMAX base stations come with MPLS support. This might sound like an innocuous question to someone from an IP background. If you are from a wireless background like I am, it might sound a little strange. A WiMAX base station provides wireless connectivity at the physical layer. In particular, WiMAX provides last mile connectivity to the end user. It can also be used to provide point-to-point (PTP) backhaul links.

From this perspective, anything above the physical layer can run transparently on WiMAX. While this is so, WiMAX has defined Convergence Sublayers (CS) at its interface which can then be mapped correctly to 802.16 MAC layer before the packets are sent on the wireless channel. The supported CS Specifications include ATM and Packet (IPv4, IPv6, Ethernet, 802.1Q-VLAN) CS. So where does MPLS fit in, if at all?

MPLS is a technology that sits between Layer 2 and Layer 3. It can be seen to be outside the scope of a WiMAX base station and certainly outside the scope of the WiMAX standards. This is where we, as engineers, have to look at the whole thing from a deployment and operational angle.

Firstly, operators want MPLS because of the many advantages it offers. It leverages both IP and Ethernet, technologies which are cheap and ubiquitous. It offers QoS. It provides multipoint connectivity but in a simpler way than IP. It is faster to switch at Layer 2 using labels than to make routing decisions at Layer 3. What makes MPLS attractive in the coming years is that it is set to enable the move towards all-IP transport networks. When IP replaces TDM and ATM architectures, MPLS is set to play a major role.
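
The point about switching on labels can be illustrated with the 32-bit MPLS label stack entry (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL, as in RFC 3032) and an exact-match label table. This Python sketch is for illustration only: the table contents and interface names are made up, and real label switching is done in the forwarding plane, not like this.

```python
import struct
from typing import Dict, Tuple


def encode_lse(label: int, tc: int, bottom: bool, ttl: int) -> bytes:
    """Pack one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label < 1 << 20 and 0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_lse(data: bytes) -> Tuple[int, int, bool, int]:
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


# Hypothetical label forwarding table: incoming label -> (out interface, out label).
LFIB: Dict[int, Tuple[str, int]] = {100: ("eth1", 200), 101: ("eth2", 300)}


def label_switch(packet: bytes, lfib: Dict[int, Tuple[str, int]]) -> Tuple[str, bytes]:
    """Swap the top label with a single exact-match lookup (no prefix matching)."""
    label, tc, bottom, ttl = decode_lse(packet)
    if ttl <= 1:
        raise ValueError("TTL expired")
    out_if, out_label = lfib[label]
    return out_if, encode_lse(out_label, tc, bottom, ttl - 1) + packet[4:]
```

The forwarding decision is a single lookup on a fixed-size label, whereas an IP router has to find the longest prefix matching the destination address. That difference is the intuition behind the speed argument.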

So operators are interested in MPLS. Before they install new devices into their network, they want to know if it supports MPLS. The problem is that there is a clear distinction between core networks and access networks. MPLS is usually limited to the core. However, there has been significant push towards bringing MPLS to the access networks. This enables end-to-end traffic engineering, right up to the WiMAX base station. 3ROAM offers such a base station with MPLS built-in. Likewise, New Edge Networks is another company that is taking MPLS beyond the core to edge networks.

What if the WiMAX base station did not support MPLS? In this case, an MPLS-enabled network would terminate at an MPLS edge router (ingress or egress). This router would then be co-located and connected to the WiMAX base station. The problem for the operator in this situation would be to have a router for every base station. This is simply not cost-effective.

In general, WiMAX base stations operate in bridge mode (Layer 2) or routing mode (Layer 3). If a base station has to be MPLS-enabled, it has to work in Layer 3 mode. In other words, the base station doubles as an ingress/egress router. It does more than simply provide wireless connectivity.

Sprint has this architecture in its long-term roadmap for backhauling its WiMAX network. Sprint’s WiMAX base stations would be MPLS-enabled and the backhaul between such a base station and its ASN Gateway would be IP over MPLS. One of Sprint’s providers for its WiMAX backhaul is Ciena, which uses PBB-TE. This may very well carry IP over MPLS right up to the base station. The backhaul itself is wireless, with equipment supplied by DragonWave.

To answer the question with which we started, WiMAX base stations may very well support MPLS, since this is cost-effective from the operator’s point of view. In practice, it is a difficult case to make, since the MPLS-enabled network is likely to be from a different provider than the base station itself, and integration becomes an issue. It is well known that managing an MPLS network is a challenge with a steep learning curve. Nonetheless, MPLS may be the small factor that tips the choice of one base station over another when an operator is otherwise unable to decide.

While HTML has been the traditional markup language for the wired world, it was too complex and not strictly structured enough to suit the wireless world. To start with, mobile devices of the early days were power-hungry and short on resources. Due to such limitations, devices would end up implementing HTML only partially, leading to a disconnect between content providers and mobile browser support. It is therefore no surprise that in the early days mobile devices did not support HTML. Rather, a few proprietary markup languages reared their independent heads (HDML from Openwave, ITTP from Ericsson, TTML from Nokia). From the point of view of adoption, however, they were never independent. Independence comes only by way of standardization.

Standardization is a beautiful thing. Just as we use English universally as a language of choice, standardization enables content to run on any number of browsers that conform to the standard. It builds competence by enabling developers to learn one language and use it across many projects and companies. It is good for the industry, for those who work in it and for consumers of devices. It offers the comfort and familiarity of knowing that, in a fast-changing and varied world, there are some things that help us (and our devices) connect.

The currently recommended standard for the wireless world comes from the Open Mobile Alliance (OMA). It is called XHTML-MP and was standardized way back in 2000. Almost a decade on, it is still going strong. Its predecessor, WML, was all the rage for many years, but all new websites meant for the wireless world use XHTML-MP. When WML came out, it was right for the mobile devices of the time. HTML was too complex for those devices with small screens, static displays, monochrome rendering and little by way of scrolling. WML had the concept of decks and cards that made the best use of precious air resources. It introduced compression in the form of WAP Binary XML (WBXML). It had programmable softkeys, which were a great idea for easing user interaction. It enabled client-side scripting through WMLScript. So WML was a success, at least for a while.

Figure 1: Timeline of Markup Languages

The problem with WML was that it was too different from HTML. When WML came out in 1998, HTML had been around for a good number of years. This, coupled with the steady growth of the Internet and the popularity of HTML, meant that HTML had it all: better browsers, a better and larger developer community, better tools. Websites written in HTML could not easily be viewed on mobile devices, so site developers had to rewrite them in WML.

In an effort to bring WML closer to HTML, a radical approach was adopted: dump WML and create something new out of XHTML (eXtensible HyperText Markup Language), a new standard. This may be appalling to many on account of its backward incompatibility, but it has been a sensible and logical decision on the part of the standardization community. During the time WML had evolved, a new language had taken shape that revolutionized every other language: XML. XML had become a popular language for data representation. It was being used in backend processing, data storage and transfer. It was being used on websites to represent data processed or rendered on the client side by JavaScript or Flash. XHTML is in fact a combination of XML and HTML – the former provides strictness of structure while the latter provides document semantics.

XHTML formed the basis of XHTML Basic, a stripped-down version that was suitable for adoption by the mobile community. That is exactly what the mobile community adopted, but it extended XHTML Basic into what we call XHTML-MP (XHTML Mobile Profile). This is the beauty of XHTML – it is extensible and modular without loss of strictness in its syntax. One of the extensions over XHTML Basic is the use of WAP CSS. XHTML-MP is also influenced by cHTML (Figure 2), which is a standard used by NTT DoCoMo in its iMode.

Figure 2: Evolution of XHTML-MP

XHTML-MP is the official markup language for WAP 2.0. All sites for WAP 2.0 must use only XHTML-MP. Although WML 2.0 exists, it is only a re-engineering of WML 1.x with XHTML syntax for backward compatibility. WML 2.0 is rarely used and is discouraged.
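
A practical consequence of building on XML is that an XHTML-MP page must be well-formed. The Python sketch below uses the standard library's `xml.etree.ElementTree` to show this: a minimal page with the XHTML-MP 1.0 DOCTYPE parses, while HTML-style markup with an unclosed `<br>` does not. It checks well-formedness only, not validity against the DTD, and the page content is made up.

```python
import xml.etree.ElementTree as ET

XHTML_MP_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
  "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body><p>Hello, mobile world.<br/></p></body>
</html>"""

# Acceptable to a lenient HTML browser, but not well-formed XML.
HTML_STYLE_PAGE = "<html><body><p>Hello<br></p></body></html>"


def is_well_formed(markup: str) -> bool:
    """True if the markup parses as XML (the DTD is not fetched or validated)."""
    try:
        ET.fromstring(markup.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


def root_tag(markup: str) -> str:
    # ElementTree reports namespaced tags as "{namespace}localname".
    return ET.fromstring(markup.encode("utf-8")).tag
```

This strictness is what lets a small browser use a simple, predictable parser instead of the error recovery that HTML browsers need.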

XHTML-MP has worked well so far. Its future, however, is not so certain. With devices getting better all the time, with mobile phones getting almost as good as laptops, and with better browsers such as Opera Mini that are able to display full websites in XHTML on devices with bigger colour screens, site developers may forget about XHTML-MP in the years to come.

There are so many different techniques to test software that it is sometimes difficult to decide which technique is most suitable. In part, the decision depends on the software development phase. A technique that applies at the unit test phase may not necessarily be suitable for acceptance testing.

Table 1 is a summary of the different techniques commonly used, mapped against testing phases. All applicable mappings are shaded in green. The table also gives the mapping of the roles of particular teams during the testing cycle. Definitions of different testing techniques can easily be found on many websites [1,2] and books [3,4].

Table 1: Testing Techniques and Roles Mapped against Testing Phases

This mapping is based on my personal experience with testing. Every system is different. In some cases it may make sense to use a particular technique in a certain phase even if such a mapping is not listed in Table 1. For example, the table indicates that “Load & Stress Testing” does not apply to the integration phase. In some projects, it may make sense to do this during integration if the designer knows that the performance bottleneck of the system is at the interfaces.

Knowing correctly what to test and when (which in turn dictates how to test) indicates a certain maturity of the product team and management. It involves an understanding of the system, inside and out. It involves anticipating things that could go wrong. It involves applying prior experience and building collective knowledge.

For example, it is easy to understand why “Usability Testing” should be performed during product acceptance, but why would anyone want to do it during “Unit Testing”? Performing such a test at an earlier stage gives scope for user feedback and avoids expensive rework at a later stage. Another example is “Intrusive Testing”, which is a dangerous activity during system integration and beyond, simply because the system could be delivered with the changes. If a similar test is needed in the system integration phase, “Negative Testing” is better suited.

It will be seen that some activities span the entire test cycle. Regression testing and test automation go hand in hand in many projects. The degree of automation and regression varies from project to project, based on how much of the available resources can be spared and how often the product undergoes changes.

Considering the test teams, Table 1 shows that either the development team or the integration team may perform integration. The former is typical of small organizations and the latter of bigger ones. In general, there is no separate team for performing system integration; it is generally done by the test team straight after system testing.

References

  1. http://www.aptest.com/testtypes.html
  2. http://www.softwaretestinghelp.com/types-of-software-testing/
  3. John Watkins. Testing IT: An Off-the-Shelf Software Testing Process. Cambridge University Press. 2001.
  4. Publications of the Software Testing Institute.

Testing for Robustness

A couple of years ago I was tasked with writing a MAC decoder for HSDPA packets. Writing the decoder wasn’t that difficult but one of the requirements was to make it robust. What does it mean to make a decoder robust? In the simplest sense,

  • It should not crash even when given an input of rubbish.
  • It should identify errors and inform the user.
  • It should do a best-effort job of decoding as much of the input as makes sense.
  • It should exit gracefully even when the HS MAC and PHY configurations are null or invalid.

In the adopted implementation, the decoder parses the input bit stream and decodes it on the fly. It flags errors as it encounters them, continuing to decode as much of the input stream as possible and flagging multiple errors where they occur. Naturally, to decode HSDPA packets, the HSDPA configuration at MAC is a necessary control input to the decoder. In addition, we wanted to make the output user-friendly, so we also wanted to map the data stream to the HS-SCCH, HS-DPCCH and HS-PDSCH configuration.

Once the decoder was coded, the next important task was to test it. Robustness of design is only as good as the tests on the design. It is customary to perform smoke tests for typical scenarios. Then there are boundary value tests. Then there are stress tests, which are applicable more at the system level than at the module level. There are also performance tests, which were not a major concern for our platform.

Because the decoder also parses configuration, it was important that it handle the entire vector space of configuration.

The following possible decoding errors were identified:

  • genericError
  • hsDpcchConfigurationNull
  • hsPdschConfigurationNull
  • hsScchChannelInvalid
  • hsScchConfigurationNull
  • macConfigurationNull
  • numberOfMacDPdusOutofRange
  • queueIdentifierInvalid
  • selectedTfriIndexInvalid
  • sidInvalid
  • subFrameNumberOutofRange
  • tooManySidFields
  • transportBlockSizeIndexInvalid
  • transportBlockSizeIndexOutofRange
  • unexpectedEndOfStream
  • zeroMacDPdus
  • zeroSizedMacDPdus
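
A few of these errors can be illustrated with a simplified, hypothetical Python decoder for the start of a MAC-hs header. It assumes the Release 5 layout as I understand TS 25.321 (VF 1 bit, Queue ID 3 bits, TSN 6 bits, then repeated SID 3 bits / N 7 bits / F 1 bit sets, with F=1 marking the last set). The configuration format and the limit on SID sets are made up. The point is the pattern: never raise, record named errors, and keep going where it makes sense.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_SID_SETS = 8  # hypothetical limit, for illustration only


class BitReader:
    """Reads big-endian bit fields; raises EOFError past the end of the data."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, nbits: int) -> int:
        if self.pos + nbits > len(self.data) * 8:
            raise EOFError
        value = 0
        for _ in range(nbits):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


@dataclass
class DecodeResult:
    queue_id: Optional[int] = None
    tsn: Optional[int] = None
    sid_sets: List[Tuple[int, int]] = field(default_factory=list)  # (SID, N)
    errors: List[str] = field(default_factory=list)


def decode_mac_hs_header(data: bytes, mac_config: Optional[dict]) -> DecodeResult:
    result = DecodeResult()
    if mac_config is None:
        result.errors.append("macConfigurationNull")
        return result
    queues = mac_config.get("queues", {})  # hypothetical: {queue_id: {valid SIDs}}
    reader = BitReader(data)
    try:
        reader.read(1)  # version flag, ignored here
        result.queue_id = reader.read(3)
        if result.queue_id not in queues:
            result.errors.append("queueIdentifierInvalid")
        result.tsn = reader.read(6)
        while True:
            sid, n, last = reader.read(3), reader.read(7), reader.read(1)
            if sid not in queues.get(result.queue_id, set()):
                result.errors.append("sidInvalid")
            if n == 0:
                result.errors.append("zeroMacDPdus")
            result.sid_sets.append((sid, n))
            if last:
                break
            if len(result.sid_sets) >= MAX_SID_SETS:
                result.errors.append("tooManySidFields")
                break
    except EOFError:
        result.errors.append("unexpectedEndOfStream")
    return result
```

Every exit path returns a result, so a caller always gets whatever was decoded plus the list of errors, even for truncated or nonsensical input.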

Arriving at these possibilities requires a little bit of analysis of the Mac-hs PDU structure. One must look at all the fields, the range of valid values and the dependencies from one field to another. One must also look at all these in relation to the constraints imposed by the configuration.

Unit tests were created using BOOST. In particular, BOOST_AUTO_UNIT_TEST was used. This was already the case with most of the modules in the system. It is so easy to use (like JUnit for Java) that it encourages developers to test their code as much as possible before releasing it for integration. If bugs still remain undiscovered or creep in due to undocumented changes to interfaces, these unit tests can easily be expanded to test the bug fix. For some critical modules, we also had the practice of running these unit tests as part of the build process. This works well as an automated means of module verification even when the system is built on a different platform.

Below is a short list of tests created for the decoder:

  • allZeroData
  • hsPdschBasic
  • macDMultiplexing
  • multipleSID
  • multiplexingWithCtField
  • nonZeroDeltaPowerRcvNackReTx
  • nonZeroQidScch
  • nonZeroSubFrameCqiTsnHarqNewData
  • nonZeroTfri16Qam
  • nullConfiguration
  • randomData

It will be apparent that tests are not named as “test1”, “test2” and so forth. Just as function names and variable names ought to be descriptive, test names should indicate the purpose they serve. Note that each of the above tests can have many variations both in the encoded data input stream and the configuration of MAC and PHY. A test matrix is called for in these situations to figure out exactly what needs to be tested. However, when testing for robustness it makes sense to test each point of the matrix. Where the inputs are valid, decoding should be correct. Where they are invalid, expected errors should be captured.

In particular, let us consider the test name “randomData”. This runs an input stream of randomly generated bits (the stream itself is of random length) through the decoder. It does this for each possible configuration of MAC and PHY. The test is to see that the decoder does not crash. Randomness does not guarantee that there will be an error but it does make a valid test to ensure the decoder does not crash.
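
The randomData test, run over a configuration matrix, can be sketched generically. In this Python sketch, `decode(data, config)` stands for any decoder under test; the configuration axes and the two toy decoders are hypothetical stand-ins, not the HSDPA configuration of the original project. A seeded generator keeps any failure reproducible.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence


def random_data_test(decode: Callable[[bytes, dict], object],
                     config_axes: Dict[str, Sequence],
                     runs_per_config: int = 100,
                     max_len: int = 64,
                     seed: int = 1) -> List[tuple]:
    """Run random-length random bytes through `decode` for every configuration.

    Every point of the matrix (the Cartesian product of the axes) is tested.
    Returns (config, data, exception) for each crash; an empty list is a pass.
    """
    rng = random.Random(seed)
    names = list(config_axes)
    crashes = []
    for values in itertools.product(*(config_axes[name] for name in names)):
        config = dict(zip(names, values))
        for _ in range(runs_per_config):
            data = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, max_len)))
            try:
                decode(data, config)
            except Exception as exc:  # any exception counts as a crash
                crashes.append((config, data, exc))
    return crashes


# Hypothetical configuration matrix and toy decoders.
AXES = {"header_bytes": [2, 4], "mac_config": [None, {"queues": 8}]}


def fragile_decode(data: bytes, config: dict) -> int:
    return data[config["header_bytes"]]  # IndexError on short input


def robust_decode(data: bytes, config: dict) -> list:
    errors = []
    if config["mac_config"] is None:
        errors.append("macConfigurationNull")
    if len(data) <= config["header_bytes"]:
        errors.append("unexpectedEndOfStream")
    return errors
```

Recording the offending input alongside each exception turns every crash into a ready-made regression test.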

While specific tests gave me a great deal of confidence that the decoder worked correctly, they did not give me the same confidence about its robustness. It was only after the random data test that I discovered a few more bugs; fixing them went a long way towards making the decoder robust.

Figure 1: Data flow for Mac-hs Decoder Testing

I will conclude with a brief insight into the data flow during testing. This is illustrated in Figure 1. Let us note that,

  • ASN.1 is used as the input for all unit tests. ASN.1 is widely used in protocol engineering and is the standard in 3GPP. It makes sense to use an input format that is easy to read, reuse and maintain. Available tools (such as the already tested Converter) can be reused with confidence.
  • Converters are used to represent ASN.1 content as C++ objects.
  • Comparison between decoded PDU and expected PDU is done using C++ objects. A comparison operator can do this elegantly.
  • A third-party ASN.1 encoder is used to generate the encoded PDUs. This gives independence from the unit under test. An in-house encoder would not do: a bug in it could also be present in the decoder, invalidating the test procedure.
  • It is important that every aspect of this test framework has already been tested in its own unit test framework. In this example, we should have prior confidence about the Converter and the Compare operator.
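
The essence of this data flow (an independent encoder produces the input, the unit under test decodes it, and the result is compared with the expected object using an equality operator) can be sketched in a few lines. In this Python sketch a dataclass's generated `__eq__` plays the role of the C++ comparison operator, and `reference_encode` is a hypothetical stand-in for the third-party encoder that packs only the VF, Queue ID and TSN fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacHsHeaderStart:
    """Expected PDU, standing in for an ASN.1 value converted to an object."""
    queue_id: int  # 3 bits
    tsn: int       # 6 bits


def reference_encode(pdu: MacHsHeaderStart) -> bytes:
    """Independent encoder: VF=0 (1 bit), QID (3), TSN (6), padded to 16 bits."""
    bits10 = (pdu.queue_id << 6) | pdu.tsn
    return (bits10 << 6).to_bytes(2, "big")


def decode_under_test(data: bytes) -> MacHsHeaderStart:
    """The unit under test: unpacks the same fields from the first two bytes."""
    bits10 = int.from_bytes(data[:2], "big") >> 6
    return MacHsHeaderStart(queue_id=(bits10 >> 6) & 0x7, tsn=bits10 & 0x3F)


def round_trip_ok(expected: MacHsHeaderStart) -> bool:
    # Compare whole objects rather than field by field.
    return decode_under_test(reference_encode(expected)) == expected
```

Comparing complete objects keeps each test case down to "expected value in, pass or fail out", regardless of how many fields the PDU has.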

This morning I attended a two-hour presentation by Ankita Garg of IBM, Bangalore. The event was organized by ACM Bangalore at the premises of Computer Society of India (CSI), Bangalore Chapter on Infantry Road. It was about making Linux into an enterprise real-time OS. Here I present a summary of the talk. If you are more curious, you can download the complete presentation.


How does one define a real-time system? It’s all about guarantees made on latency and response times. Guarantees are made about an upper bound on the response time. For this to be possible the run-time behaviour of the system must be optimized by design. This leads to deterministic response times.

We have all heard of soft and hard real-time systems. The difference between them is that the former tolerates occasional lapses in the guarantees while the latter does not. A hard real-time system that doesn’t meet its deadline is said to have failed. Only if a system can meet its deadlines 100% of the time can it formally be called a hard real-time system.

Problems with Linux

Linux was not designed for real-time behaviour. For example, when a system call is made, kernel code is executed. Response time cannot be guaranteed because during its execution an interrupt could occur. This introduces non-determinism to response times.

The scheduling in Linux tries to achieve fairness while considering the priority of tasks, as with priority-based Round Robin scheduling. Scheduling is unpredictable because priorities are not always strict: a lower priority process could be running at times until the scheduler gets a chance to reschedule a higher priority process. The kernel is non-preemptive, and when it is executing critical sections, interrupts are disabled. One more problem is the resolution of the timer, which is at best 4 ms and usually 10 ms.

Solutions in the Real-Time Branch

A branch of development has been forked from the main branch. This is called CONFIG_PREEMPT_RT. This contains enhancements to support real-time requirements. Some of these enhancements have been ported to the main branch as well.

One important change concerns spin locks, which become more like semaphores. Interrupts are not disabled, so these spin locks are preemptible. However, spin locks can still be used in the old way; it all depends on whether the APIs are called for spinlock_t or raw_spinlock_t.

sched_yield behaves differently too. A task that calls it is added back to the runnable queue, but not at the end of the queue. Instead, it is added at its own priority level. This means that a lower priority process can face starvation. If that happens, it is only because the design is faulty; users need to set correct priorities for their tasks. There is still the problem of priority inversion, which is usually overcome using priority inheritance.
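
The starvation effect is easy to see in a toy model of a fixed-priority run queue with one FIFO per priority level. This Python sketch is a simulation under simplifying assumptions (single CPU, no preemption modelled, hypothetical task names); it is not how the kernel implements its run queues.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ToyRtRunQueue:
    """One FIFO per priority; the highest non-empty priority always runs first."""

    def __init__(self) -> None:
        self.levels: Dict[int, Deque[str]] = {}

    def enqueue(self, task: str, prio: int) -> None:
        self.levels.setdefault(prio, deque()).append(task)

    def current(self) -> Tuple[str, int]:
        top = max(prio for prio, queue in self.levels.items() if queue)
        return self.levels[top][0], top

    def sched_yield(self) -> None:
        # The running task goes to the tail of its own priority level,
        # so it is still ahead of every lower-priority task.
        _, prio = self.current()
        self.levels[prio].rotate(-1)


rq = ToyRtRunQueue()
rq.enqueue("A", 50)
rq.enqueue("B", 50)
rq.enqueue("C", 10)  # lower priority: never runs while A or B are runnable
order = []
for _ in range(6):
    order.append(rq.current()[0])
    rq.sched_yield()
print(order)  # ['A', 'B', 'A', 'B', 'A', 'B']
```

Yielding only hands the CPU to peers at the same priority, so C is never scheduled while A or B remain runnable.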

There is also the concept of push and pull. In a multiprocessor system, decision has to be made about the CPU where a task will run. A task waiting in the runnable queue of a particular CPU can be pushed or pulled to another CPU depending on tasks just completed.

Another area of change is IRQ handling. The IRQ is kept simple while the processing is moved to an IRQ handler. There was some talk of soft IRQs, top halves and bottom halves, something I didn’t understand. I suppose these will be familiar to those who have worked on interrupt code on Linux.

In plain Linux, timers are based on the OS timer tick, which does not give high resolution. High resolution is achieved by using programmable interrupt timers, which require support from the hardware. Thus timers are decoupled from the OS timer ticks.
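
The effect of tick-based timers on resolution is a matter of simple arithmetic. The Python sketch below is a simplified model: it rounds a requested delay up to the next whole tick and ignores any extra tick a real kernel may add. The HZ values of 250 and 100 are my assumption, chosen because they correspond to the 4 ms and 10 ms figures quoted earlier.

```python
import math


def tick_based_delay_ms(requested_ms: float, hz: int) -> float:
    """Shortest delay a tick-driven timer can give for a requested delay."""
    tick_ms = 1000.0 / hz
    return math.ceil(requested_ms / tick_ms) * tick_ms


for hz in (250, 100):
    print(hz, tick_based_delay_ms(1.5, hz))  # a 1.5 ms request becomes a full tick
```

With a high-resolution timer, the same 1.5 ms request is no longer quantized to the tick period.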

A futex is a new type of mutex that is fast when uncontested. The fast path happens in user space; only if the mutex is busy does it go to kernel space and take the slower route.

The speaker mentioned the tools she had used at IBM to tune the system for real-time: SystemTap, ftrace, oprofile and tuna.

Proprietary Solutions

Other than what’s been discussed above, other solutions are available – RTLinux, L4Linux, Dual OS/Dual Core and using Virtual Logic, timesys… There was not a lot of discussion about these implementations.

Enterprise Real-Time

Why is real-time behaviour important for enterprises? This is because enterprises make guarantees through Service Level Agreements (SLA). They guarantee certain maximum delays which can only be achieved on an RTOS. The greater issue here is that such delays are not limited to the OS. Delays are as perceived by users. This means that delays at the application layer have to be considered too. It is easy to see that designers have to first address issues of real-time at the OS level before considering the same at the application layer.

The presenter gave application examples based on Java. Java, rather than C or C++, is more common for enterprise solutions these days than it was perhaps a decade ago. The problem with Java is that several factors introduce non-determinism:

  • Garbage collection: runs when system is low on free memory

  • Dynamic class loading: loads when the class is first accessed

  • Dynamic compilation: compiled once when required

Solutions exist for all of these. For example, the garbage collector can be made to run periodically when the application task is inactive, which makes response times more deterministic. Classes can be loaded statically in advance. Instead of just-in-time (JIT) compilation, ahead-of-time compilation can be done – this replaces the usual *.jar files with *.jxe files. Some of these are part of IBM’s solution named WebSphere Real-Time.

There is a wider community working on RTSJ – the Real-Time Specification for Java.


Real-time guarantee is not just about the software; it is about the system as a whole. Hardware may provide certain functionality to enable real-time, as we have seen in the case of higher timer resolution. Since real-time behaviour is about response times, performance may be compromised in some cases. Certain tasks may be slower, but this is necessarily so because there are far more important tasks that need a time guarantee. There is indeed a trade-off between performance and real-time requirements. On average, real-time systems may not have much better response times than normal systems. However, building a real-time system is not about averages. It is about an absolute guarantee that is far more difficult to meet on a non-real-time system.
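
The difference between judging a system by its average and by its worst case, and between soft and hard deadlines, can be shown with a tiny calculation in Python. The latency samples and the 10 ms deadline below are made up purely for illustration.

```python
from typing import Dict, Sequence


def latency_summary(latencies_ms: Sequence[float], deadline_ms: float) -> Dict[str, object]:
    misses = sum(1 for t in latencies_ms if t > deadline_ms)
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "worst": max(latencies_ms),
        "misses": misses,
        "meets_hard_deadline": misses == 0,  # a hard real-time system tolerates no misses
    }


general_purpose = [1, 1, 1, 1, 1, 1, 1, 1, 1, 31]  # fast on average, one long stall
real_time = [5, 5, 5, 5, 5, 6, 6, 6, 6, 6]          # slower on average, bounded

print(latency_summary(general_purpose, deadline_ms=10))
# {'mean': 4.0, 'worst': 31, 'misses': 1, 'meets_hard_deadline': False}
print(latency_summary(real_time, deadline_ms=10))
# {'mean': 5.5, 'worst': 6, 'misses': 0, 'meets_hard_deadline': True}
```

The first system wins on the average yet fails the hard deadline; the second is slower on average but gives the bounded worst case that a real-time guarantee is about.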