Hi Antal!

I'm asking because I looked into STM32 programming with Rust recently, and platform libraries are starting to show up, but nothing as high level as XPCC. I looked into the SVD files, but I don't think that alternate function pins are encoded in them for example (the thing that enables the type-safe connect() methods).

Have you seen @japaric's post on that topic? 
http://blog.japaric.io/brave-new-io

Unfortunately, SVD files don't contain information beyond the memory map, and even worse, they are often incorrect in some minor way.
Btw, so are the official CMSIS header files that ST ships.
https://github.com/modm-io/cmsis-header-stm32/commit/1ec448711db342deb4d4bf20b670166701a38f26#r27042841

Could someone give me a small description of how the device file generation for STM32 works (both the one currently used in XPCC, and/or the one in MODM)? What I mean is, what information comes from what sources, and how is it put together.

In short: The data comes out of CubeMX, which contains a bunch of XML files.
These are transformed, cross-referenced and filtered to generate these device files.
https://github.com/modm-io/modm-devices/tree/develop/devices/stm32
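
To illustrate the "transform, cross-reference, filter" idea, here is a minimal sketch. The XML snippets, tag names and the `build_device` helper are all made up for illustration: this is not the real CubeMX schema and not the actual generator. It only shows the general shape of joining a per-device pin list with a separate alternate function table.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified inputs. NOT the real CubeMX schema.
PINS_XML = """
<mcu name="demo">
  <pin name="PA2"><signal name="USART2_TX"/><signal name="TIM2_CH3"/></pin>
  <pin name="PA3"><signal name="USART2_RX"/></pin>
</mcu>
"""

AF_XML = """
<gpio>
  <map pin="PA2" signal="USART2_TX" af="7"/>
  <map pin="PA2" signal="TIM2_CH3" af="1"/>
  <map pin="PA3" signal="USART2_RX" af="7"/>
  <map pin="PB6" signal="I2C1_SCL" af="4"/>
</gpio>
"""


def build_device(pins_xml, af_xml):
    """Join pin signals with their AF numbers, keeping only pins of this device."""
    # Transform: turn the AF file into a lookup table keyed by (pin, signal).
    af_table = {
        (m.get("pin"), m.get("signal")): int(m.get("af"))
        for m in ET.fromstring(af_xml).iter("map")
    }
    device = {}
    for pin in ET.fromstring(pins_xml).iter("pin"):
        name = pin.get("name")
        # Cross-reference and filter: signals without an AF entry are dropped,
        # and AF entries for pins that this device lacks (PB6) never appear.
        device[name] = {
            s.get("name"): af_table[(name, s.get("name"))]
            for s in pin.iter("signal")
            if (name, s.get("name")) in af_table
        }
    return device


DEVICE = build_device(PINS_XML, AF_XML)
print(DEVICE)
```

The real data needs far more cleanup than this, which is exactly the manual effort described further down.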

I've held a talk about how we use these device files in modm at EmBO++17:
https://www.slideshare.net/emBO_Conference/datadriven-hal-generation

These device files are more complete and more correct than the ones in xpcc and have several important bug fixes (like correctly supporting STM32F1 group remap).
I rewrote the generator from scratch to fix these issues, but unfortunately I've made the decision not to publish it. Let me justify this.


1. The internal data in ST's CubeMX is good, but since it's internal data, the format changes and data sometimes gets removed, added, changed.
There is still a lot of manual cleanup that needs to happen. I've even sent ST patches for their own internal data, some of which have been applied, others not.
Unfortunately, the CubeMX data does not contain some of the most important data either. This I have to manually extract from dozens of reference manuals/datasheets, compare and add manually. This includes: some of STM32F1 GPIO remap, ALL groups of memory sizes (yes!), ALL peripheral types and all device groups.

This is an insane amount of work, and I'm not willing to put any more hours into this than is absolutely necessary.
I fear that if I were to publish this generator and it were to become popular (which the Rust community surrounding @japaric absolutely has the ability to do), I'd be overwhelmed with support requests for all sorts of additional data, regardless of how labor-intensive it is to procure.

I've spent the last 4 years looking at this data: if it's not in the above device file data already, it is _not_ trivial to add. An example is the clock tree data, which exists in the raw data but lacks critical information that I'd again have to add manually. Ultimately the CubeMX data is incomplete and sometimes just plain wrong.


2. There is a real risk of creating a pseudo-standard of assumptions from my work, since there is no other STM32 data available elsewhere at the moment.
I've been following the Linux Device Tree community for a while now, especially the discussion about the Device Tree Specification:
https://github.com/devicetree-org/devicetree-specification

To caricature the discussion so far: A few (truly very smart) engineers are extrapolating their personal, anecdotal experiences of how *they* used data for a few devices into a grand theory of how this data should be formatted and used for *every* other device out there. And they are just simply wrong about that.
You are wrong about what you think you know. Your experiences mean nothing; you cannot possibly know all the hardware details of thousands of devices.
ST has ~1200 devices, Atmel ~600 AVRs, and their configuration data is not as regular as you'd like to think.
Here is an (out-of-date and incomplete) overview of ~1000 STM32 devices: http://salkinium.com/stm32.html

A better way would be to formulate your personal experiences as executable assumption checkers, and run them over the raw data.
This is actually what I did in modm in the GPIO driver, because the assumptions I made in xpcc regarding GPIO AF were wrong.
In fact they were _so_ wrong that our assumption check failed on the majority of STM32 devices and we had to rewrite the API from scratch.
https://image.slidesharecdn.com/niklashauser-170228153235/95/datadriven-hal-generation-47-1024.jpg
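
As a minimal sketch of such an executable assumption checker: the device names, the data and the assumption itself ("a signal uses the same AF number on every pin") are made up for illustration and are not the actual check used in modm.

```python
# Hypothetical per-device data: (pin, signal) -> alternate function number.
DEVICES = {
    "device_a": {("PA9", "USART1_TX"): 7, ("PB6", "USART1_TX"): 7},
    "device_b": {("PA9", "USART1_TX"): 1, ("PB6", "USART1_TX"): 0},
}


def signal_has_single_af(gpio_af):
    """Assumption under test: a signal uses the same AF number on every pin.

    Returns a list of (pin, signal, af) entries that contradict it.
    """
    seen = {}
    violations = []
    for (pin, signal), af in sorted(gpio_af.items()):
        # The first AF number seen for a signal becomes the reference value.
        if seen.setdefault(signal, af) != af:
            violations.append((pin, signal, af))
    return violations


def check_all(devices, checker):
    """Run one checker over every device and report only the failing ones."""
    report = {}
    for name, data in devices.items():
        violations = checker(data)
        if violations:
            report[name] = violations
    return report


REPORT = check_all(DEVICES, signal_has_single_af)
print(REPORT)
```

Running a checker like this over all devices gives you a list of counterexamples before you bake the assumption into an API.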

Now imagine a fuckup like that at the device tree level: nobody is going to rewrite anything. I don't want to be yelled at by Linus.


3. Not a single vendor actually wants to give you this data, because it makes them so very deeply comparable to each other. Vendors are already unhappy with the rather detailed Digikey search options. The CubeMX data was a very lucky find, and I have not found data even remotely as extensive as this for other vendors.
So this is an ST-only solution (the AVR data is even more incomplete), and so I refer back to point 2.


4. I recently quit my job at ARM working on mbed OS and uVisor because they didn't allow any out-of-the-box thinking on embedded software designs, like we do in xpcc/modm. I mean, they don't even generate their linker-/startup scripts, which is absolutely TRIVIAL to do with even the most basic data. It's insane how much unnecessary manual labor goes into porting/maintaining mbed OS.
My takeaway was that with the entire IoT craze, nobody in this industry takes the time to fix the foundational issues, like HAL design, tool support, library support (don't get me started on Newlib). Most engineers I met can't even point out any flaws in these, they are just so used to it.
I'm still downbeat and angry about that and I need time to gain some distance from that experience.

I've been contributing to xpcc/modm now since 2011, that's a long time.
I'm not a student anymore, I need choose more carefully what I want to spend time on now.
So I'm taking a step back to reorient myself a little and build up motivation again. 
There are also interesting and more lucrative opportunities in other industries.


For now I recommend you use the device files as they are. You're not going to be able to extract any more data from CubeMX without unreasonable effort.
I'll keep them updated with every quarterly release of modm. I'll reevaluate my position if more projects start using that data and people are willing to take up some of the maintenance effort like what has happened in xpcc. That really is a joy to maintain, I feel like people care about this stuff :-)
Make sure to look at modm if you want to generate code/data/documentation from this data, since it was designed for exactly that purpose.

Or use the DeviceFile class directly: https://github.com/modm-io/modm-devices/blob/develop/tools/device/modm/device_file.py
Usage: https://github.com/modm-io/modm/blob/develop/repo.lb#L29-L39


Sorry for being in a bad mood today,
(sad) Niklas