Notice: The content of this blog depends on specific software versions. Follow with caution.

Introduction

Let it spin!

ROS is ubiquitous, and more people use this powerful framework every day. Probably every program written with ROS contains a spin. No matter how simple or how terribly complicated the functions users define, they all have to make it spin. But what really happens inside that spin? ROS is certainly not powered by hamster wheels. This blog discusses the behind-the-scenes story of ROS’s internal mechanism.

Here the Python version of ROS (rclpy) is used for discussion, not because it is easy but because it is hard. The framework has to do some extra work to connect the lower-level rcl, written in C, to rclpy. So if you can follow the Python workflow, you will easily understand what happens in rclcpp. It is also recommended that you fire up a debugger and step through the process yourself. This blog is not a comprehensive explanation but rather an overview with a few more details.

The ROS program used is MinimalPublisher, arguably one of the simplest ROS examples. The ROS 2 version is Foxy, installed from APT.

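import rclpy

# MinimalPublisher is the rclpy.node.Node subclass from the ROS 2 Foxy
# minimal publisher example; its definition is omitted here.


def main(args=None):
    rclpy.init(args=args)

    minimal_publisher = MinimalPublisher()

    rclpy.spin(minimal_publisher)

    # Destroy the node explicitly
    # (optional - otherwise it will be done automatically
    # when the garbage collector destroys the node object)
    minimal_publisher.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()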

Before Spin

The introduction above presented basic information about the spin function. But spin is not where ROS starts: before it, an init function prepares the underlying structures that spin needs. This init function does a few very important things.

In This Context

The first thing it must do is get a context. The ROS 2 documentation says: “(Context) Encapsulates the lifecycle of init and shutdown.” Very easy, isn’t it? Well, the actual story is much more complicated. As you follow along the code, you will notice that (in most cases) rclpy creates a Context through the __init__ function in context.py. But that function does not create anything by itself. It imports an implementation and calls yet another function to initialize the Context. You will see this line of code:

self._handle = Handle(rclpy_implementation.rclpy_create_context())

“Oh, it’s just another pimpl.” No, it is not. It is actually the deep-rooted dark side of ROS. While people all agree that ROS has great modularity, little is known about its cost. When you have Python on top of C on top of C++ on top of C and C++ (not joking here; that is seriously the structure down to the rmw layer, not including DDS), you have to do something terribly terrible: the grimmest and darkest usage in computer systems in our entire history as a conscious species, dlopen. ROS uses a similar idea to connect things from different layers.
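Python exposes the same dlopen/dlsym pair through ctypes, which makes the idea easy to see in isolation. The minimal sketch below, assuming a POSIX system, opens a shared library chosen by name at runtime and resolves one symbol from it. The C library and strlen are arbitrary stand-ins, not something ROS loads, and load_c_function is a hypothetical helper.

```python
import ctypes
import ctypes.util


def load_c_function(lib_name, symbol):
    """Open a shared library by name at runtime and resolve one symbol (POSIX)."""
    path = ctypes.util.find_library(lib_name)  # e.g. "libc.so.6" on glibc Linux
    # CDLL(path) is a dlopen() call; CDLL(None) opens the running process itself.
    lib = ctypes.CDLL(path)
    # Attribute access performs the dlsym() lookup.
    return getattr(lib, symbol)


strlen = load_c_function("c", "strlen")
strlen.argtypes = (ctypes.c_char_p,)
strlen.restype = ctypes.c_size_t
print(strlen(b"rmw_init"))  # 8
```

The caller only knows a library name and a symbol name; which machine code actually runs is decided when the library is opened, which is exactly the property the layered ROS stack relies on.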

Back to the example: this line of code actually calls a function from an underlying C file called _rclpy. This is the joint connecting the rcl C files to the higher-level rclpy Python files. In rclpy_create_context, ROS calls the respective rcl functions to allocate and initialize memory, and finally wraps the pointer in an object created with PyCapsule_New.
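To see what “wrap the pointer in a PyCapsule” means without building a C extension, the sketch below calls the CPython C API functions PyCapsule_New and PyCapsule_GetPointer through ctypes. FakeContext is a hypothetical struct standing in for what rcl allocates (it is not the real rcl_context_t layout), and the capsule name is an arbitrary tag chosen for this example.

```python
import ctypes


class FakeContext(ctypes.Structure):
    """Hypothetical stand-in for the C struct rcl allocates (NOT rcl_context_t)."""
    _fields_ = [("instance_id", ctypes.c_uint64)]


CAPSULE_NAME = b"rcl_context_t"  # the capsule keeps this char*, so keep the bytes alive

PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p)

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = (ctypes.py_object, ctypes.c_char_p)

_owned = []  # keeps the "C" memory alive, since these capsules have no destructor


def create_context(instance_id):
    ctx = FakeContext(instance_id=instance_id)  # "allocate and initialize"
    _owned.append(ctx)
    # Wrap the raw address in an opaque Python object.
    return PyCapsule_New(ctypes.addressof(ctx), CAPSULE_NAME, None)


def unwrap_context(capsule):
    # GetPointer also checks the capsule name, guarding against mixed-up handles.
    address = PyCapsule_GetPointer(capsule, CAPSULE_NAME)
    return FakeContext.from_address(address)


capsule = create_context(7)
print(type(capsule).__name__)               # PyCapsule
print(unwrap_context(capsule).instance_id)  # 7
```

From Python’s side the capsule is opaque; only code that knows the name and the struct layout can get the pointer back out, which is why rclpy hands such objects straight back to the C layer.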

Context Sensitive

Getting a context is not enough; ROS also has to populate it. So summarizing the entirety of Context as just handling “init and shutdown” is an oversimplification. Continue stepping through and you will encounter the following line of code, also in context.py:

rclpy_implementation.rclpy_init(args if args is not None else sys.argv, capsule)

Like the call in the previous code block, this line also calls into the underlying C library. But there is a twist: this is not the end of the road. The body of rclpy_init contains calls such as rcl_init_options_init and rcl_init. These quite normal-looking function entries are where the magic hides. Take rcl_init as an example: inside it you will find several functions whose names start with rmw_. These are calls into the even lower-level rmw layer, and their actual implementations are vendor dependent. With a capable enough debugger, you can observe several functions in different files sharing the same name and signature, and their paths reveal that they come from different vendors. To decide which one to use, ROS relies on the dlopen idea again: the RMW_IMPLEMENTATION environment variable tells the framework which middleware to load.
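The sketch below models the selection step in plain Python: a single environment-variable lookup decides which vendor’s rmw_init body runs. The vendor classes, the registry, load_rmw and rcl_init here are hypothetical; in ROS 2 each vendor ships a separate shared library and the selection happens in C/C++, not Python. The default name assumes Foxy, whose default middleware is rmw_fastrtps_cpp.

```python
import os


# Hypothetical vendor "implementations". In ROS 2 these are separate shared
# libraries that export the same rmw_* symbols, not Python classes.
class FastRtpsRmw:
    identifier = "rmw_fastrtps_cpp"

    def rmw_init(self):
        return f"{self.identifier}: init via Fast DDS"


class CycloneRmw:
    identifier = "rmw_cyclonedds_cpp"

    def rmw_init(self):
        return f"{self.identifier}: init via Cyclone DDS"


_VENDORS = {cls.identifier: cls for cls in (FastRtpsRmw, CycloneRmw)}
_DEFAULT_RMW = "rmw_fastrtps_cpp"  # Foxy's default middleware


def load_rmw():
    """Pick the middleware from RMW_IMPLEMENTATION (sketch of the idea only)."""
    name = os.environ.get("RMW_IMPLEMENTATION", _DEFAULT_RMW)
    try:
        return _VENDORS[name]()
    except KeyError:
        raise RuntimeError(f"unknown RMW implementation: {name}") from None


def rcl_init():
    # The caller only knows the rmw_* function name; the body depends on the vendor.
    return load_rmw().rmw_init()


os.environ["RMW_IMPLEMENTATION"] = "rmw_cyclonedds_cpp"
print(rcl_init())  # rmw_cyclonedds_cpp: init via Cyclone DDS
```

From a shell, the real switch is just the variable itself, for example RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 run <package> <executable>, provided that RMW package is installed.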

At this point you are probably curious about what happens here: why does a function call from rclpy eventually get handled in the rmw layer? Remember the emphasis on modularity and layered abstraction in ROS’s design documents; they are the direct cause of this complicated call structure. Moreover, notice what this means: performance, resource usage, and various other aspects are NOT under rcl’s control. These critical elements depend on the rmw implementation used and can differ drastically from vendor to vendor. So if you forget everything else in this blog, remember to choose the rmw and its underlying DDS vendor wisely. They have a supreme impact on the system’s key metrics.

Beyond this point, there is not much left to say about the pre-spin init function. You can look around the function calls and appreciate the designers’ efforts to hide as much detail as possible. Be respectful: without them there would be no ROS, and the robotics industry would never be this prosperous.

Spinning

Now we can look at what the spin function actually does. Hopefully the previous chapter has given a good warm-up and some basic knowledge of ROS 2’s internal structures.

Executor!

We need better executors. – Every ROS user with performance concerns

Executors are critical in ROS: they are the actual “things” that, well, execute both internal framework tasks and user-defined callbacks. For more information about them, one can consult this article, which gives a more detailed discussion. The focus of this blog is not the executor classes, so details are not presented here; you only need the idea of an executor to continue reading.

As with the context, the executor needs to be created. The default choice is a SingleThreadedExecutor, which has some significant issues. Still, this type of executor is undoubtedly functional, and comparing executors is out of scope here. When the executor is created, there are several interesting things to note. The most important is the presence of GuardConditions. They are a key signaling element in ROS’s lower-level code; here one is used to indicate whether the WaitSet should be rebuilt. Speaking of the WaitSet, it is the core mechanism that drives ROS and embodies the framework’s “pull style” processing logic. More on it later. Also note that a second GuardCondition is registered to handle SIGINT, so don’t be surprised when you observe more than one GuardCondition.
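A minimal sketch of the guard-condition plumbing follows. ToyExecutor, GuardCondition and the string return values are hypothetical simplifications, not rclpy classes: one guard marks the wait set as stale when a node is added, a second one stands in for the SIGINT guard, and triggering either wakes a blocked wait.

```python
import threading


class GuardCondition:
    """Hypothetical minimal guard condition: a flag that also wakes the waiter."""

    def __init__(self, wake):
        self._wake = wake
        self.triggered = False

    def trigger(self):
        self.triggered = True
        self._wake()


class ToyExecutor:
    """Hypothetical executor skeleton showing only the guard-condition plumbing."""

    def __init__(self):
        self._cv = threading.Condition()
        self._nodes = []
        self.wait_set = []  # entities currently being waited on
        # One guard says "the node set changed", the other reacts to SIGINT.
        self.rebuild_guard = GuardCondition(self._wake)
        self.sigint_guard = GuardCondition(self._wake)

    def _wake(self):
        with self._cv:
            self._cv.notify_all()

    def add_node(self, node_entities):
        self._nodes.append(list(node_entities))
        self.rebuild_guard.trigger()  # the wait set is stale until rebuilt

    def wait(self, timeout):
        with self._cv:
            # Block until a guard fires or the timeout expires.
            self._cv.wait_for(
                lambda: self.rebuild_guard.triggered or self.sigint_guard.triggered,
                timeout,
            )
        if self.sigint_guard.triggered:
            return "shutdown"
        if self.rebuild_guard.triggered:
            self.rebuild_guard.triggered = False
            self.wait_set = [e for node in self._nodes for e in node]
            return "rebuilt"
        return "timeout"


executor = ToyExecutor()
# Another thread adds a node while the executor is blocked in wait().
threading.Timer(0.05, executor.add_node, args=(["timer"],)).start()
result = executor.wait(timeout=2.0)
print(result)  # rebuilt (woken by the guard, long before the timeout)
```

The guard carries no data; its only job is to interrupt the wait so the executor can react, which is why the same mechanism serves both wait-set rebuilds and SIGINT.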

Another note for people familiar with ROS 1: the API now checks ok() by itself, and you can see a few of these checks in the code flow. This better encapsulation saves users from spending those meaningless seconds writing the “while-ok” loop themselves.

Execute Order 66

After the executor is set up, the node that the user created in the program is added to it. The executor then calls spin_once in a while-ok loop. spin_once is really just a wrapper around a gigantic wait function, which uses the aforementioned WaitSet to track changes that happen in the context of this domain (not every type of change). At least, that is the behavior the higher-level design expects. The actual implementation, as you may expect by now, depends on the vendor’s rmw implementation: FastDDS uses built-in reader entities, while CycloneDDS uses a caching mechanism with an active polling thread.
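The spin / spin_once / wait layering can be sketched in plain Python as below. Context, Timer, wait, spin_once and spin are hypothetical stand-ins that only mirror the shape of the calls; the sketch polls with a short sleep for brevity, whereas a real rmw wait blocks until the middleware signals it.

```python
import time


class Context:
    """Hypothetical stand-in for rclpy's Context: only tracks init/shutdown."""

    def __init__(self):
        self._ok = True

    def ok(self):
        return self._ok

    def shutdown(self):
        self._ok = False


class Timer:
    """Hypothetical periodic entity; the clock itself is what 'triggers' it."""

    def __init__(self, period, callback):
        self.period = period
        self.callback = callback
        self.next_due = time.monotonic() + period

    def is_ready(self, now):
        return now >= self.next_due

    def execute(self, now):
        self.next_due = now + self.period
        self.callback()


def wait(entities, timeout):
    """The 'gigantic wait': return (ready_entities, now); empty list on timeout.

    Polls with a short sleep only to keep the sketch small.
    """
    deadline = time.monotonic() + timeout
    while True:
        now = time.monotonic()
        ready = [e for e in entities if e.is_ready(now)]
        if ready or now >= deadline:
            return ready, now
        time.sleep(min(0.001, deadline - now))


def spin_once(entities, timeout=0.1):
    ready, now = wait(entities, timeout)
    for entity in ready:
        entity.execute(now)


def spin(context, entities):
    # The while-ok loop lives inside the API, so user code never writes it.
    while context.ok():
        spin_once(entities)


ctx = Context()
ticks = []


def on_tick():
    ticks.append(time.monotonic())
    if len(ticks) == 3:
        ctx.shutdown()  # the next ok() check ends spin()


spin(ctx, [Timer(0.05, on_tick)])
print(len(ticks))  # 3
```

The user code only calls spin; the ok() check, the repeated spin_once calls and the blocking wait are all inside it.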

Back to the WaitSet itself: six types of entities can be waited on: Subscriptions, Timers, Clients, Services, Guard conditions, and Waitables. Anyone familiar with ROS’s communication mechanisms will recognize that these come from ROS’s communication building blocks, except for the curious Waitable, which does not belong to any communication category. Waitable is actually a higher-level concept than the other categories, since the other five kinds of entities could all be expressed as Waitables. So in theory, Waitable is the only type the WaitSet needs to handle. The other five types have probably not been replaced by Waitables yet because of the amount of work required.
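The argument that Waitable could subsume the other categories can be sketched as one common interface. The method names below loosely echo rclpy’s Waitable (is_ready, take_data, execute), but the signatures are simplified, and the Subscription, Guard and process_ready definitions are hypothetical toys, not rclpy’s.

```python
from abc import ABC, abstractmethod


class Waitable(ABC):
    """Hypothetical unified interface for anything a wait set can watch."""

    @abstractmethod
    def is_ready(self):
        """Return True if the last wait saw this entity triggered."""

    @abstractmethod
    def take_data(self):
        """Pull pending data (if any) out of the middleware."""

    @abstractmethod
    def execute(self, data):
        """Run the user-facing work for that data."""


class Subscription(Waitable):
    def __init__(self, callback):
        self.pending = []  # stands in for samples buffered by the DDS reader
        self.callback = callback

    def is_ready(self):
        return bool(self.pending)

    def take_data(self):
        return self.pending.pop(0)

    def execute(self, data):
        self.callback(data)


class Guard(Waitable):
    def __init__(self):
        self.triggered = False

    def is_ready(self):
        return self.triggered

    def take_data(self):
        self.triggered = False  # consuming the trigger resets it
        return None

    def execute(self, data):
        pass  # a guard only wakes the wait; there is no user callback


def process_ready(waitables):
    """With one interface, no per-category branching is needed."""
    handled = 0
    for w in waitables:
        if w.is_ready():
            w.execute(w.take_data())
            handled += 1
    return handled
```

For example, a Subscription whose pending list holds one sample and a triggered Guard are both handled by the same loop, with no code that knows which category each one is.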

An entity is normally triggered by a change in the rmw layer: when DDS receives a new message, it signals rcl that data is available. After a WaitSet entity is triggered, ROS does work based on the entity’s category. There is normally a take to get the data from DDS, followed by a series of user-defined callbacks. This part is rather straightforward, but be sure to realize that rmw and the underlying DDS implementation play a huge role here. Together they define the speed, the resource usage, and pretty much everything you would care about if you dig this deep.
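The take-then-callback flow for each category can be sketched as a plain dispatch. handle_ready, the dict-based entities, and the inbox/outbox deques (standing in for middleware queues) are hypothetical; rclpy organizes this code differently, and the point here is only the order of operations per category.

```python
from collections import deque


def handle_ready(category, entity):
    """Do the work for one triggered wait-set entry, chosen by its category."""
    if category == "timer":
        entity["callback"]()  # nothing to take: the clock is the trigger
    elif category == "subscription":
        msg = entity["inbox"].popleft()  # the 'take' from the middleware
        entity["callback"](msg)
    elif category == "service":
        request = entity["inbox"].popleft()  # take the request
        entity["outbox"].append(entity["callback"](request))  # send the response
    elif category == "client":
        response = entity["inbox"].popleft()  # take the response
        entity["callback"](response)  # e.g. complete a pending future
    elif category == "guard":
        pass  # only wakes the executor; no data, no user callback
    else:
        raise ValueError(f"unknown category: {category}")


log = []
subscription = {"inbox": deque(["hello"]), "callback": log.append}
service = {"inbox": deque([21]), "callback": lambda x: x * 2, "outbox": []}
handle_ready("subscription", subscription)
handle_ready("service", service)
print(log, service["outbox"])  # ['hello'] [42]
```

Everything before the callback (how fast the take returns, how samples are buffered) is the rmw and DDS vendor’s business, which is why the vendor choice shows up so directly in latency and resource usage.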

To summarize, the foundation of ROS is a while loop that actively checks for changes in the system and executes tasks based on the category of whatever was triggered.

Conclusion

This blog gave a gentle overview of ROS’s internal mechanism, covering the initialization of ROS and the actual implementation of the spin function. As mentioned previously, if you want to remember just one thing, remember to choose your DDS vendor wisely.

“Hail the Omnissiah! He is the God in the Machine, the Source of All Knowledge.”


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.