looking for an example of using ReadVisdData to read CAN/CAN FD messages。 #110

windfenggg · 2024-11-29T09:38:09Z

windfenggg
Nov 29, 2024

Dear ihedvall,

Hello, I am currently able to use your ReadData example to read signals. However, when it comes to reading CAN(FD) messages, I'm not sure how to utilize ReadVisdData. Could you kindly provide an example for reading messages, similar to the ReadData example? Thank you.

Best regards.

#include <mdf/mdfreader.h>
#include <mdf/ichannelgroup.h>
#include <mdf/idatagroup.h>

using namespace mdf;

int main() {
    MdfReader reader("c:/mdf/example.mf4");  // Note the file is now open.
    reader.ReadEverythingButData();

    const auto* mdf_file = reader.GetFile(); 
    DataGroupList dg_list;                  
    mdf_file->DataGroups(dg_list);

    for (const auto* dg4 : dg_list) {
      ChannelObserverList subscriber_list;
      const auto cg_list = dg4->ChannelGroups();
      for (const auto* cg4 : cg_list ) {
        const auto cn_list = cg4->Channels();
        for (const auto* cn4 : cn_list) {
          auto sub = CreateChannelObserver(*dg4, *cg4, *cn4);
          subscriber_list.push_back(std::move(sub));
        }
      }
      reader.ReadVisdData(*dg4, , , );    
    }
    reader.Close(); 
}

Answered by ihedvall

Dec 2, 2024

@windfenggg
Added a unit test for the CAN FD file. Just printed out the first 5 samples.

TEST_F(TestRead, TestCanFd) {
  std::string test_file = GetMdfFile(kCanFdFile.data());
  if (test_file.empty()) {
    GTEST_SKIP_("CAN FD file not found");
  }

  MdfReader reader(test_file);
  const bool read_config = reader.ReadEverythingButData();
  ASSERT_TRUE(read_config);

  const MdfFile* mf4_file = reader.GetFile();
  ASSERT_TRUE(mf4_file != nullptr);

  std::vector<IDataGroup*> dg_list;
  mf4_file->DataGroups(dg_list);
  EXPECT_EQ(dg_list.size(), 3);

  for (IDataGroup* data_group : dg_list ) {
    ASSERT_TRUE(data_group != nullptr);

    // Subscribe on all channels
    ChannelObserverList o…

View full answer

ihedvall · 2024-11-29T16:33:10Z

ihedvall
Nov 29, 2024
Maintainer

@windfenggg
VLSD stands for Variable Length Signal Data and is typically used for string and byte array data. The CAN data message consist of timestamp, CAN ID, length and some data bytes. CAN is always 1-8 data bytes while CAN FD can store up to 64 bytes of data.

Just to confuse everything, there exist a so-called VLSD channel group (CG). This channel group is used to store the bytes while channel group with the VLSD channel only stores an offset into that VLSD channel group. The VLSD group cannot have any channels. By this trick, the samples can be appended to the MDF file.

The ReadData() function reads in all samples including all the VLSD data bytes, into the primary memory. CAN bus loggers should follow the MDF standard, so there should be a 'CAN_DataFrame' channel group in each measurement.

If you have a large file (> 10GB), the read may fail due to out-of-memory. Instead of read everything, you need to read the file with some partial read. There exist some ReadXXX() functions for doing partial reads.

Instead of ReadData() you may call ReadPartialData() and add a range of samples.

The ReadVLSDData() is little bit tricky to use. It was made to solve the problem when each sample contains a video stream. These files are large (~100GB). The trick is to set the ReadVLSDData property to false on the VLSD channel observer to false. The channel observer will now only read in the offset for each sample. This read is done with the ordinary ReadData() function and is fast. This read are followed by a ReadVLSDData() call, with the entire offset list or just a part of the offsets. You should add a callback function that is called for each sample.

As you notice the ReadVLSDData() function is not so user friendly and not very useful for small byte arrays as CAN (FD) byte arrays. Another method is to not use channel observers and instead attach an Sample Observer that has an OnSample callback. It's up to you to handle each sample but you can now optimize/filter the read as you want.

There are some (Google) unit tests that shows the usages (mdflib_test/src/testread.cpp):

TestRead, TestLargeFile. The ReadVLSDData() function (22GB file).
TestRead, TestPartialRead. This test use the ReadPartialRead() function.
TestRead, TestCssSampleObserver. This test use the sample observer.

Ingemar Hedvall

0 replies

ihedvall · 2024-12-01T10:23:44Z

ihedvall
Dec 1, 2024
Maintainer

@windfenggg
Some simpler answer. The ReadData() function should read in all signal data including the CAN (FD) messages. You can use the ReadPartialData(), if you only need to read a part of the samples.

After the ReadData(), the channel observer includes all samples and its data in primary memory. You should scan through the channels and get its data by calling the IChannelObserver::GetChannelValue<T>() function or the IChannelObserver::GetEngValue<T>() function. The channel value is a non-scaled value while the engineering value is a scaled value. An external application normally uses the GetEngValue() function.

You need to keep some track of what type of data the channel holds. For example GetEngValue<double>() will not return any valid value if the data type is a byte array (VLSD). Instead call the GetEngValue< std::vector<uint8_t> >() to get a byte array (CAN message byte).

You can start optimizing the read of the file in a second step. For example you may check if the channel group has any samples and if it doesn't have any samples, not subscribing of its channels.

You can also add a single sample observer instead of a bunch of channel observers. You can inherit or add a callback to the observer. The callback will be called for each channel group and sample. This method reduce the memory requirements and is in theory faster. This method require that you are responsible for the caching of samples so if the channel observers working OK for you, continue to use them.

Ingemar Hedvall

0 replies

windfenggg · 2024-12-01T11:54:13Z

windfenggg
Dec 1, 2024
Author

@ihedvall
Dear ihedvall,
Thank you for your detailed response and patience. I have implemented the ISampleObserver interface and overridden the OnSample function to successfully read the CAN messages.
Best regards.

0 replies

windfenggg · 2024-12-02T03:01:28Z

windfenggg
Dec 2, 2024
Author

@ihedvall
Dear ihedvall,
Hello, when I use the sample observer to read each CANFD message and print the bytes for debugging, the byte information I get is inconsistent with what CANoe generates. The data field information is all incorrect. Is it because CANFD messages cannot be processed in the same way as CAN messages? I am not sure if you have encountered any type-related issues when reading CANFD messages. Thank you."
Best regards.

0 replies

ihedvall · 2024-12-02T06:54:16Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg
Yes, it might be an error. I don't have your MDF file but typical CAN bus logger file have the below configuration and output. You can use the MDF File Viewer if you want to inspect your file.

If you look closer on the CG (CAN_DataFrame). It consist of 2 channel time and frame data. The frame data is packed byte array with CN ID, DLC and data bytes, packed into a byte array. Note that the frame data channel have sub-channel. These sub-channel describes how the frame data is packed. It's these sub-channels that are of interest for you.

It seems to be some error regarding of the CAN_Frame data. It is strange error as the sub-channel data is OK. I will check if this a presentation error or something else.

The below is snapshots from the MDF Viewer. You can fetch the Windows executable in the GitHub release area.

If you have a simple MDF example file that you can send over, it would simply the fault tracing.

0 replies

ihedvall · 2024-12-02T08:28:07Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg
Sorry. I have mislead both you and myself. The previous outputs are OK. The CAN_DataFrame byte array is of fixed size. It holds all sub-signal values except the CAN_DataFrame.DataBytes that holds an offset into to the VLSD CG block. Well it is little tricky.

For you in the sample observer OnSample() function, The record ID(1) reports all the CAN frames flags and the timestamp but not the CAN data bytes. The record(2) reports the CAN data bytes.

The channel observers seems to solve this cross-reference why they gives a simple solution. See also ChannelObserver::OnSample() (channelobserver.cpp).

Your log file may use some other configuration so we need to figure out how to parse the file.

Ingemar Hedvall

0 replies

windfenggg · 2024-12-02T08:36:59Z

windfenggg
Dec 2, 2024
Author

@ihedvall
You're too kind, it might be that I didn't express myself clearly. Thank you very much for your work; it can process CAN messages correctly. I'm wondering if CAN FD hasn't been fully supported yet, or if I can't read CAN FD in the same way as CAN messages.I'm very glad to receive your reply. Now, I have checked the CAN messages using the MDF Viewer, and there are no issues. I can also parse the CAN_DataFrame byte by byte. Below is the inspection of the CAN messages.

However, when I use the MDF Viewer to inspect CAN FD messages, the CAN_DataFrame and data field bytes appear very strange. It seems like some error has occurred, and the byte information is not being displayed correctly like it is for the CAN messages, as shown below. I have an ASC CAN FD message example file, but the tool to convert the example file to MDF (CANoe) is on an internal network. I'm not sure if you have access to any related tools that can convert ASC to MDF, which might help with troubleshooting the issue. Thank you.

0 replies

ihedvall · 2024-12-02T09:32:55Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg

If you look at the CAN file, it uses something called MLSD (Max Length Signal Data). The CAN_DataFrame normally all CAN message flags + an offset into signal data. The offset is a 8 byte (uint64_t) value into a signal data blob. This blob can be stored in a CG VLSD block or in an external SD block. The MLSD configuration stores the data bytes instead of the offset thus there is no need for an external CG-VLSD/SD block. Number of bytes is fetched from the length.

When the MDF file now shall store more than 8 bytes, it has to store the CAN data bytes in either a CG-VLSD block or a SD block. The CAN_DataFrame.DataBytes seems to have the offset data (uint64_t) value instead of the actual data value. My suspect is that a SD block is used to store the data bytes. If you have the possibility to open the MDF file in the MDF Viewer and check the CAN_DataFrame.DataBytes channel configuration. It is the flag and its reference index to Signal Data that is of concern.

0 replies

ihedvall · 2024-12-02T10:36:37Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg

I have several files with MLSD and CG-VLSD data storage but I have no file by some other tool that uses SD storage. I test the MDF SD storage against my own writer, which isn't optimal. It should be the configuration of the CAN_DataFrame.DataBytes channel that controls this.

I don't have any ASC to MDF converters. Just need a hint in the right direction. I append the CAN SD configuration that my unit test uses.

0 replies

ihedvall · 2024-12-02T10:37:31Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg
Forgot to append.

0 replies

windfenggg · 2024-12-02T10:52:10Z

windfenggg
Dec 2, 2024
Author

@ihedvall
Thank you very much for your reply. I have converted the file to MF4 using another tool. This is a CANFD MF4 file. Could you please check when you have time to see if the CAN_DataFrame can be retrieved properly from it?

0 replies

windfenggg · 2024-12-02T11:14:32Z

windfenggg
Dec 2, 2024
Author

@ihedvall
I suspect that the DataBytes of the CANFD message might have been compressed, which is causing the incorrect byte information to be retrieved. The SD view you have might be correct, because there is no DZ block under the DataBytes of a CAN message, but there is one under CANFD.

0 replies

ihedvall · 2024-12-02T11:28:28Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg
I tested the file in the MdfViewer and it reads the 1234.. data bytes as expected. It's mocking us. I go for lunch, it's something simpler.

0 replies

windfenggg · 2024-12-02T11:43:04Z

windfenggg
Dec 2, 2024
Author

@ihedvall
Thank you so much! It seems that your project is very versatile, and I also appreciate your open-source spirit. I hope you enjoy your lunch! When you have some free time, I would love to learn how you managed to correctly read CANFD messages.
Best regards！

0 replies

ihedvall · 2024-12-02T20:01:26Z

ihedvall
Dec 2, 2024
Maintainer

@windfenggg
Added a unit test for the CAN FD file. Just printed out the first 5 samples.

TEST_F(TestRead, TestCanFd) {
  std::string test_file = GetMdfFile(kCanFdFile.data());
  if (test_file.empty()) {
    GTEST_SKIP_("CAN FD file not found");
  }

  MdfReader reader(test_file);
  const bool read_config = reader.ReadEverythingButData();
  ASSERT_TRUE(read_config);

  const MdfFile* mf4_file = reader.GetFile();
  ASSERT_TRUE(mf4_file != nullptr);

  std::vector<IDataGroup*> dg_list;
  mf4_file->DataGroups(dg_list);
  EXPECT_EQ(dg_list.size(), 3);

  for (IDataGroup* data_group : dg_list ) {
    ASSERT_TRUE(data_group != nullptr);

    // Subscribe on all channels
    ChannelObserverList observer_list;
    CreateChannelObserverForDataGroup(*data_group, observer_list);
    EXPECT_FALSE(observer_list.empty());

    const bool read = reader.ReadData(*data_group);
    EXPECT_TRUE(read);

    for (uint64_t sample = 0; sample < 5; ++sample) {
      
      bool valid = false;
      uint16_t channel = 0;
      bool direction = false;
      uint32_t can_id = 0;
      std::vector<uint8_t> data_bytes;

      for (const auto &observer : observer_list) {
        if (observer->Name() == "CAN_DataFrame.BusChannel") {
          valid = observer->GetEngValue(sample, channel);
          EXPECT_TRUE(valid);
        } else if (observer->Name() == "CAN_DataFrame.Dir") {
          valid = observer->GetEngValue(sample, direction);
          EXPECT_TRUE(valid);
        } else if (observer->Name() == "CAN_DataFrame.ID") {
          valid = observer->GetEngValue(sample, can_id);
          EXPECT_TRUE(valid);
        } else if (observer->Name() == "CAN_DataFrame.DataBytes") {
          valid = observer->GetEngValue(sample, data_bytes);
          EXPECT_TRUE(valid);
        }
      }
      if (valid) {
        std::cout << "Sample: " << sample
                  << " Channel: " << channel
                  << " Direction: " << (direction ? "Tx" : "Rx")
                  << " Data: ";
        for (uint8_t data : data_bytes) {
          std::cout << std::hex << std::uppercase << std::setw(2)
                    << std::setfill('0') << static_cast<int>(data) << " ";
        }
        std::cout << std::endl;

      }
    }

Output:
Sample: 0 Channel: 1 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 1 Channel: 1 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 2 Channel: 1 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 3 Channel: 1 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 4 Channel: 1 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 0 Channel: 2 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 1 Channel: 2 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 2 Channel: 2 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 3 Channel: 2 Direction: Rx Data: 12 34 56 78 00 00 00 00
Sample: 4 Channel: 2 Direction: Rx Data: 12 34 56 78 00 00 00 00

0 replies

windfenggg · 2025-02-11T03:08:53Z

windfenggg
Feb 11, 2025
Author

@ihedvall
Currently, I am using a single thread implementation where each signal corresponds to one CG (Channel Group), but all are stored in a single DG (Data Group). As a result, the DG block keeps growing larger. Do HL/DL/DZ configurations have related API interfaces? Specifically, when writing a DG block, there is a HL (High-Level) under it, a DL (Data-Level) under the HL, and the DL can have multiple DZ (Data-Zone) blocks. Once a DZ block reaches a certain size, it automatically creates a new DZ block.

1 reply

ihedvall Feb 11, 2025
Maintainer

@windfenggg
The MDF Basic Writer add samples to the last DG block only. Compression is a new feature in MDF 4.1. You cannot add samples as they arrive. Instead they needs to be queue up (cache) until the queue have ~4MB of data. This 4MB (DZ) chunk is compressed and saved to the MDF file. To keep track of each chunk of data 2 index block are added (HL/DL). If you enable compression, the writer will now start using this storage method.

The HL/DL/DZ method can be used to solve the problem with storing samples in several DG block at the "same" time. This method is currently not supported but should be possible to implement as most of the mechanism already exist. There will be just minor changes in your code but you need to create an "MdfSortedWriter" instead of an "MdfBasicWriter".

Do you want me to implement this type of MDF writer?

windfenggg · 2025-02-11T11:27:01Z

windfenggg
Feb 11, 2025
Author

@ihedvall
Yes, currently, modifying the underlying block organization code is still challenging for me. I have submitted an issue and hope that you can implement this type of writer when you have time. It should be able to store samples in multiple DG blocks at the "same time." Thank you!

0 replies

sy950915 · 2025-04-02T10:41:07Z

sy950915
Apr 2, 2025

@ihedvall Hello, I see that the current system already supports storing CAN/CANFD/LIN/ETH bus messages into MF4 files. Have you considered adding support for FlexRay bus data recording into MF4?

2 replies

ihedvall Apr 2, 2025
Maintainer

@sy950915
Yes, Issue #120 include the FlexRay and MOST support. As the library now can save many parallel measurements (DG), it is now possible to support all types of bus configuration. It's about 2 weeks work.
I'm working with the meta-data issue #59 and I have 1-2 weeks left.

sy950915 Apr 5, 2025

Thank you very much for your reply, and I am very much looking forward to the addition of FlexRay.

ihedvall · 2025-05-20T11:32:44Z

ihedvall
May 20, 2025
Maintainer

@sy950915
I have add the previous issues in the latest check-in. It contains all MDF bus types CAN, LIN, MOST, FlexRay and ETH. This is a major change so test before using it. I still have to fix the documentation and create a new release.

0 replies

windfenggg · 2025-09-04T07:26:31Z

windfenggg
Sep 4, 2025
Author

@ihedvall
Dear Mr. Ihedvall,
I am currently implementing a multi-DG real-time writing approach for MF4 files. Each MF4 file contains approximately 25 signals, with one DG allocated per signal. I have experimented with various thread scheduling strategies, and when stopping the measurement, I invoke the library’s StopMeasurement and FinalizeMeasurement interfaces. However, the application calling mdflib.dll consistently becomes unresponsive at that stage.
As a workaround, I attempted to use a thread pool to schedule multiple DGs and perform parallel writes across signals, but this did not resolve the issue. In my current setup, the data producer pushes samples at a rate of 10 µs per signal, resulting in roughly 20 million samples per second in total. Under this load, the StopMeasurement operation incurs significant delays.
I also observed that when compression is enabled, after starting the measurement, buffered data is flushed to the MF4 file either every 4 MB or every 10 minutes, at which point the file size is updated.
Given this behavior, I would like to ask for your advice: when dealing with high-throughput, large-scale signal logging to MF4 files, what would be the recommended direction for optimization? Is this high latency during StopMeasurement and FinalizeMeasurement expected with such large data volumes, or might it be related to how I am using the library?
I would greatly appreciate your guidance. Thank you.

0 replies

ihedvall · 2025-09-04T08:52:53Z

ihedvall
Sep 4, 2025
Maintainer

@windfenggg
I had a similar problem that I had 4000 signals at a 100 Hz sample rate. Let us skip the MDF for a while. I first tested with Oracle DB and it inserted ~10 sample/s and with array insert ~1000 samples/s i.e. skip SQL database. Next I tested with dBase files (no index). This could store up 10 MS/s. This type of storage is used by many Historical Databases in the industry. Note that the above was tested 10-15 year ago so don't take the exact measurement to serious.

Your 20MS/s problem is very close what is possible to stream onto a mechanical disc which in turn is so-to-say single threaded. What I am trying to say that doing multi-threaded writing to a single disc is a bad idea. The multi-DG MDF storage have only one write thread but one in-memory queue.

I don't know your application but I suspect it's similar to what I did for a power plant. The requirement is to record all 4000 signals at 100 Hz 24/7.

Synchronize the sample int time slots (10ms). The purpose to get one timestamp for all 4000 values.
Next is to store this time slot onto a disc. You need a small in-memory cache as it is more efficient to save each 1-2 seconds that open and close for each sample.
The disc cache, creates a new data file each 10 minute and register its first and last timestamp in a local database (SQLite). The disc cache (kind of a database) is saved for 3 days.
In parallel you also save each time slot in a real-time database. This database is used to find start and stop time for a measurement conditions.
When an interesting conditions is detected, a measurement file is created by using the sample data from the 3 day cache. These measurement files are long-time stored and evaluated in some sort of Data Lake.

The MDF is perfect choice for the long term storage. In my problem, the I/O signal configuration is fairly static so using MDF for the 3 day disc cache is little bit overkill but in principle the MDF could work there as well.

There are a lot of other problem but I stop here as I don't know your requirements. 100kHz sample rate is very high if you want to store 24/7. If it is in burst mode, there might be a simpler solution.

Ingemar Hedvall

0 replies

windfenggg · 2025-09-05T02:22:26Z

windfenggg
Sep 5, 2025
Author

@ihedvall
Thank you for your reply.
In my use case, I need to record multiple MF4 files in real time, with each MF4 containing multiple signals. The trigger frequency of the signals is very high (10 μs), and it is critical that no signals are lost during recording. Currently, I call StopMeasurement and FinalizeMeasurement when stopping the recording. Based on the logs, the overhead of these calls is acceptable. However, I have observed that message accumulation in my queue appears to be the main bottleneck.
I would also like to clarify the usage of the SaveSample interface. From my understanding, it currently only supports saving a single signal point at a time. Are there any plans to provide an interface that supports batch-saving multiple signal points in one call, where the signal names remain the same but the timestamps and values differ, so that they can be stored together in a single DG block?
Best regards!

1 reply

ihedvall Sep 5, 2025
Maintainer

@windfenggg
I assume that 100kHz sampling per signal is just under some (shorter) period not 24/7 application. If you don't want to loose any samples, each SaveSample(DG, CG) call must be ready within 10us. If your source have some sort of internal buffer then you don't need a RTOS as long as the average time is below 10us/call.

As all signals have their own timestamp, you can do the SCADA trick of keeping the last reported value (LRV) and if the next value is identical with the previous, do not save this value. This may have a dramatic performance boost and file size. In SCADA system, they also use hysteresis i.e. ignoring small changes.

The current implementation uses a (linear) queue for each DG block. It is called per DG/CG block and it save all CG block signal values into a so-called record. This record is later saved onto the disc by the write thread. All signals in a CG block, use the same timestamp. If you assign a signal to CG block and use many CG blocks in one DG block instead of the current solution, then it is possible to save several signal values with timestamp. It requires some new function calls but this is minor problem.

Each DG block have its own so-called Sample Queue. The Basic MDF 4 Writer assumes that this internal queue is flushed onto the disc each 10s or so. In case of compression the flush should occur when the Sample Queue > 4MB. It is similar when saving many DG block in the same measurement. When flushing to disc, only samples from one Sample Queue (DG block) can be stored. Note that it only exist one Write Thread independent on how many sample queues your application uses.

If you add more sample queues, you may save more samples in parallel. The single write thread needs to flush one of these queue. Choosing the right queue can be complex so there can be some speed improvements here. The save samples and flush (write) sample threads, works on the same queue so they lock out each other during some period.

There exist several MDF writer types with different Sample Queue implementations. When working with high speed inputs, there you need to do some trade-offs, for example the file may be locked for reading when it measure. The Convert writer type is the closest to your requirements.

Regarding the batch interface. It is a possibility to build a batch interface but at the end the MDF standard defines how the values are sorted and stored. There are some new storage type defined in MDF 4.2 that I can look at. The risk is that no third party viewer/parsers can read that file. The MDF C++ library can read this format but I have not seen any tool producing a correct file so far. I suppose the Vector tool can do this but so far there are no reference files. The 4.2 storage type have some issue with the current sample queues as it flush out many samples for each values at the same time.

I can look for an interface that you can add value/timestamp to some internal structure sort the timestamp/values into MDF storage blocks and at some smart point flush to the disc. Restriction could be unique signal names within the file and variable length values as strings should be avoided (enumerates are OK). Only one DG block but many CG block would also be beneficial but not can be solved. Each signal must have a time master associated.

I will look into the 4.2 new type of storage. You can maybe look on the restriction for a batch mode

ihedvall · 2025-09-06T07:48:08Z

ihedvall
Sep 6, 2025
Maintainer

@windfenggg
I have checked the version 4.2 LD/DV storage. The MDF stores sample by sample also know as row-storage. This type of storage is typical used by logger application. The disadvantage comes later when reading. When fetching all samples for one channel, the reader have to read in all channel data bytes.

It is possible to save samples fast and also read them fast, if the following conditions are met.

Only one CG group on each DG group.
No Variable Length Storage (VLSD).

With the above requirements, the CG record have a fixed byte width. The data block is now actually a type of matrix where the reader know where each value is stored. The storage is still row storage but with a specialized reader, the reading will be faster.

Normally the writer saves the data block in one DT block. This can only be done if there is only one DG block. If there are multiple DG block storage, the data block is divided into smaller linked data blocks typical using the HL-DL-DT/DZ block technique. If the above requirements are met, the LD-DV storage technique can be used. The LD block is similar to the DL block but the LD block stores the time stamp of the first sample in the DV block.

It should be fairly simple to add the LD-DV storage to the existing readers by adding a new Storage Type enumeration. Currently the Fixed Length, VLSD or MLSD enumerations exist but adding "Column" storage and some new Sample Queue handling will solve that problem.

The big issue will be if the third-party reader will handle this. I suppose newer Vector tools will handle this new 4.2 storage. The MDF lib has only one generic reader which read the above data samples.

The LD block contains number of samples and also the first sample time for each data block may generate new "Read Data" interfaces as "Read (All) Samples for a CG block" or similar call range based. These calls will be much faster than existing ones.

The above is related to the Batch writer type but I leave that to another comment.

0 replies

ihedvall · 2025-09-06T08:59:32Z

ihedvall
Sep 6, 2025
Maintainer

@windfenggg
In your application you having very high sample rate. The sample rate is most likely higher than it is possible to store onto a disc why the samples are mainly stored in the internal sample queue (memory). At the stop measurement call, that queue must be flushed to disc.

The main problem is not that it take some time to write the values to the disc. The main problem if saving the samples to the internal sample queue. You want a Save Sample interface with 3 input argument, signal reference, signal value and timestamp. It would be more general to use a Save Sample interface with 2 input argument, the CG reference and a value array. The value array is added in the signal order in the channel group, typical with the time value first followed by the other signal values.

The Save Sample call must return fast so there is no time for any mutex or new calls. Instead I propose to use the DMA (Direct Memory Access) technique. This requires 2 or more fixed arrays. One array is the active and the other are the inactive ones. The active one is what the Save Sample function uses and when it has filled the active one it switch to the next fixed array and mark the previous one at full. Another thread is responsible to move the samples to another place and mark the array as empty.

Typical only 2 buffers are needed but if you chicken out more than 2 buffers can be used. The above techniques require that the buffers are allocated and of fixed size which is similar to the column storage requirements. In principle the Save Sample can be optimized away but require that you simply add the sample values directly to the active buffer as the CG reference the active buffer. The disadvantage is that clearing the full buffer before it becomes active again, then these samples are lost.

The full buffer needs to converted into MDF format and that is saved onto the disc similar to the current Sample Queue. Having a one-to-one relation between the buffer and a LD-DV or (HL)-DL-DT storage, optimize the write.

I think the end-user needs to set up the buffer size in number of samples. A 2-3 seconds buffer would be enough. The Save Sample call thread/task should within a RTOS task but this is the end-user's problem.

Please give some comments about this (DMA) solution proposal.

0 replies

windfenggg · 2025-09-06T12:57:17Z

windfenggg
Sep 6, 2025
Author

@ihedvall
Thank you very much for your patient and detailed responses and clarifications.

Previously, the main issue I encountered was that my self-maintained queue was being consumed too slowly. I have since modified the application layer to use multithreaded queue consumption. After several hours of testing, I observed that stopping the measurement is now much smoother, and the number of accumulated signals in the queue remains within an acceptable range. At this stage, I suspect that the remaining inefficiencies may be related to my thread scheduling strategy. I am currently running stress tests to verify whether any data points are being lost.

If the performance bottleneck ultimately lies in the SaveSample interface, I believe it would be valuable to consider adding a batch processing interface in the future. My idea for such an interface is as follows: prior to pushing data, all DG/CG/CN structures are created in advance (with each DG corresponding to one signal, each DG containing one CG, and each CG containing two CNs). I also establish a mapping between signal names and DG addresses. When a batch of signals arrives, they are inserted into my queue, processed by a background thread, matched to the corresponding DG, and then written into the MF4 file via SaveSample. However, even with multithreaded parallel writes, the current approach only allows one signal point per DG to be saved at a time.

A batch interface could potentially improve this situation, for example by allowing me to pass a DG address together with a timestamp array pointer and a signal value array pointer, where the timestamp and signal value arrays are guaranteed to be aligned. I am curious whether this would be feasible to implement within the library. Since the generated MF4 files must remain compatible with third-party tools, the priority for adopting MDF 4.2 LD/DV storage is relatively low at this stage, though I understand it may be more suitable for columnar signal storage in the future.

Regarding the double-buffer (DMA-like) queue technique you proposed, if it can be configured in such a way that no data is lost while still supporting high-frequency writes at the application level, I would be strongly inclined toward this approach.

To summarize: I am continuing performance tests to determine whether the actual bottleneck is in the library’s SaveSample interface. If you have the opportunity to further investigate the double-buffer technique, I would be glad to collaborate by running high-frequency write tests on my side.

Best regards!

1 reply

ihedvall Sep 7, 2025
Maintainer

@windfenggg
Just some comments

The reason for a double buffered interface is at 20 MS/s you only have ~17 CPU cycles (4GHZ CPU) per sample. Using STL or any C++ is a No. The buffer is a plain C array. This means that no VLSD signals can be stored i.e. no byte array or strings. It also limits the number of CG block/DG block to one. In reality, all signals data types must be numbers (int, uint or float (double).

A CG block have at minimum 2 signals the (time) master and a signal. Note that the InitMeasurement() call, actually saves the configuration DG/CG/CN to the disc. After the call, the CG record size is known which is used to setup the buffer size. You also get a unique index (file position) for each block.

The optimal is to have several signal values with the same timestamp. Adding more DG/CG groups, make it possible to add samples in parallel. There is one (double) buffer / DG block.

The pre-trig time is a requirement as this high frequency mode is commonly used to capturing a measurement around an event.

windfenggg · 2025-09-10T03:52:53Z

windfenggg
Sep 10, 2025
Author

@ihedvall
Thank you for your suggestions. Although I haven’t adopted the double-buffering approach yet, the real-time write performance is still acceptable for now. On the data push side, the strategy has been adjusted—the pushing rate isn’t as high as originally configured. The expectation was 100,000 points per second for each signal, but in practice I’m only receiving about half of that. Previously, the main issue was with my application-level queue and thread allocation, but that has been fixed. For now, I don’t think I need a batch processing interface.

Currently, I allocate one queue and one thread per signal. The same thread handles both storing the signal and writing to the MF4 file. The downside is that CPU cores are limited, but the number of threads can get quite large, since I may need to save multiple MF4 files simultaneously, and each file can contain dozens of signals that are continuously receiving data. My main goals are to ensure no data loss and to keep measurement shutdown fast.

At the moment, I have two questions:

1)Suppose I have 50 different signals that need to be written in real time, each with different timestamps. Should I store them as:

50 DGs, one DG per signal, each DG with one CG, and each CG with two CNs?

Or as 1 DG with 50 CGs, each CG with two CNs?

Or as 1 DG with 1 CG and 51 CNs, with timestamps forcibly synchronized?
I remember you previously mentioned the “single DG, multiple CGs, each CG with two CNs” approach, where storing different signals results in an unsorted MF4 file. Does this mainly affect read performance?

2)What’s the purpose of setting a pre-trigger time for event-triggered signals? In your test code, after setting a pre-trigger time, you were able to save a batch of samples between init and start. My current high-frequency writes are mainly DAQ event-triggered, with the fastest rate being 10 µs.

Best Regards!

1 reply

ihedvall Sep 10, 2025
Maintainer

@windfenggg

How To Configure the Write

How to store is mainly dependent on the timestamp and how fast you want to read out the values. The fast reading assumes that the end-user wants all samples for one channel. If you want to read all samples for all channels, the storage type doesn't matter.

In real life, the write of an MDF is always done on a local drive, today typical SSD drive. If you try to write a file on NAS, this can fail depending how its (NAS) write-through flag is set.

The MDF files are normally stored on a NAS and you read the file through a 1 GBit/s network. The SATA drive can transfer 6 GBit/s. Data Centers usually use a 10 GBit/s local optical fiber network due to the above limitations.

In general, the 1 CG/DG block is called a sorted MDF file. In general reading is much faster that having multiple CG/DG. 2 CG/DG block should take twice the time to read but some readers can do some trick by skipping the unwanted CG block.

In your case, there is problem how to setup the internal sample queues so they can handle the input sample rate. The time it takes so store the sample data onto the disc shouldn't be a big issue.

One CG/DG block will optimize the read. Each CG block will store a master signal (timestamp) and at least one signal value.

Pre-trig Measurements

The pre-trig time defines the the internal sample queue size before starting. When you call the InitMeasurement() function, the Sample Queue is created and now starts to add sample to the queue.. The queue size will grow until current time larger than the sample time plus pre-trig time.

When you call the StartMeasurement() function, all samples, including the pre-trig samples, are now stored onto the disc until you call the StopMeasurement() function.

The pre-trig measurement is typical used when the start measurement condition trigger a measurement and the samples of interest is before the trigger. A typical application is a so-called APM (Apre's post mortem) meaasurement. This measurement is triggered by a stop signal. This stop signal is triggered by some high temperature, a fire or your test object just disintegrated. The APM measurement is started and now store the interesting samples before the stop.

The other application is that you want to measure with high frequency, the sample values around an event. Typically, gear changes and valve changes measuring torque values.

Load-Step Measurements

Another type of measurement is to do a so-called load-step measurement. The idea, is to calculate one or more average values over a so called load-step part of a cycle. A cycle is typical a similar to a measurement. The load-step values have the same time-stamp but you don't store that time. Instead you skip the master channel and stores the absolute time plus the load-step values. The load-step is just one sample and the load-step number is the sample number.

This type measurement runs for a long time (month) and it's the trend that is of interest. You need to able to read and write the MDF file continuously. This type of measurement typical have some validation task running in parallel, that checks the sensor quality. If you measure a month with a broken sensor, you have to redo that measurement.

FFT Measurements

A test objects that have some rotation, you can do FFT analyze on. The disadvantage is to define a sample window to do the calculation on. Having the I/O sample periodically on time is a bad idea. Instead its better to trig the I/O by the rotation angle. The master channel is now an angle instead of a time.

If you still sample by time, you need to recalculate the samples by some spline function into angle samples and then do the FFT calculation. You can find ball bearing (flat) error as well as detecting trends that something will fail soon.

windfenggg · 2025-09-10T09:42:09Z

windfenggg
Sep 10, 2025
Author

@ihedvall
Thank you for your detailed reply, it has cleared up many of my confusions about measurement. I feel that my understanding of MDF files is just the tip of the iceberg, and through our discussion I’ve learned a lot of knowledge. I really appreciate it!
Best Regards!

0 replies

windfenggg · 2025-09-19T09:12:01Z

windfenggg
Sep 19, 2025
Author

@ihedvall

Hello, I’ve recently encountered a performance issue and would like your advice. I’m trying to merge two MF4 files (1.mf4 and 2.mf4) into a third file (3.mf4). I’ve tested two approaches:

Approach 1: Use a single thread to read 1.mf4 first and then 2.mf4. The data is stored in a map<string, vector<pair<double,double>>> all_signals, where the key is the signal name and the value is a list of timestamp–value pairs. After reading, I perform parallel sorting on all_signals, and finally use a single thread to write everything into 3.mf4.

Approach 2: Use two threads to read 1.mf4 and 2.mf4 in parallel, store the results into the same map, then perform parallel sorting, and finally use a thread pool to concurrently write each signal into separate DGs of 3.mf4.

In my tests, Approach 1 is faster. So my question is: when merging MF4 files, is single-threaded read/write actually more efficient, since (as you mentioned before) only one writing thread is effectively active?

Also, do you have suggestions for reducing the overall merge time? I noticed that when I choose MdfWriterType::MdfConverter, the runtime is lower compared to MdfWriterType::Mdf4Basic. Is MdfConverter specifically optimized for file conversion? And what are the main differences between MdfConverter and Mdf4Basic?

Best Regards!

1 reply

ihedvall Sep 19, 2025
Maintainer

@windfenggg
The MDF library is little bit weak regarding converter applications.

The Basic4 writer is designed an online logger that store values as they are received from an I/O or a bus. The pre-trig time and the timestamps have a big focus.

The Converter simply saves all samples that are added ignoring the pre-trig time. The converter also tries to minimize the primary memory size but I think there might be some code improvement here.

In general, you should not focus on performance as CPU power is a cheap solution. Instead focus on the problem that you suddenly have 2 large input files and your 8-64GB primary memory is not enough. Converter applications, can add samples faster than it can save them. The internal sample queue handle this, well until no more memory. The problem is to slow down the input flow to match the output flow.

My approach is similar to [2], would be to use the internal sample queue for each DG block instead of using an extra temporary sample queue. I would use the Sample Observer with callbacks, one thread per DG callback.

The internal DG sample queue uses the absolute time as timestamp while the output file store the relative time. The problem is that each input file have its absolute start time and you can only store one start time in the output file. You need to recalculate all timestamps before calling the SaveSample() function.

Note that you can read MDF files from a NAS but you must write MDF files to a local (block storage) disc and then copy the finalized file to the NAS. It's a NAS write through flag problem. If you are smart, the input and output files should be on different discs.

Reading the files with Channel Observers require a lot of memory that can be a problem with large input files. The solution is to use a Sample Observer with a sample callback instead. You now have many threaded callbacks running in parallel, returning samples. You can now with some creative timestamp sorting algorithm, block the callback returns depending on the timestamp. Note that this assumes that the input files are unsorted and only have one DG block/file.

If you are concerned about conversion times, you need to have at least 1000+ MDF files or converter calls, before the optimization pays off. I had a similar problem in the bus master project where I needed to "unsort" a sorted MDF file. I use the approach 1 and read in all samples and then sort them in time order, hoping that the end-user have enough memory.

The approach (2)3 with callbacks are tricky but you can do it instead of solving a sudoku puzzle

windfenggg · 2025-09-19T14:56:06Z

windfenggg
Sep 19, 2025
Author

@ihedvall

Thank you for your suggestion. I will try the callback-based application layer approach. The scenario involves merging around a hundred MF4 files into a single MF4 file. Each MF4 contains only one DG that stores a single signal. Ideally, each MF4 file should be just a few hundred MB in size.

0 replies

windfenggg · 2025-10-10T08:35:38Z

windfenggg
Oct 10, 2025
Author

@ihedvall

Hello, I would like to ask whether MdfReader is thread-safe. Can different DG blocks of an MF4 file be read in parallel? I created a thread for each DG block with callbacks, but I found that the data received in the multi-threaded OnSample callback was inconsistent with expectations. However, when traversing DG blocks in a single thread, the callback reading works correctly.

If MdfReader is not thread-safe, are there any techniques to speed up reading large MF4 files (around 10 GB, not video or audio data)?

Best Regards!

1 reply

ihedvall Oct 10, 2025
Maintainer

@windfenggg
The problem is the the file stream and its current file position. When reading a data block, the current file position is stepped forward. To make it thread safe, the ReadData() can be set to block out other threads but this will be similar to read block by block in a single thread.

It should be possible to open several file streams which is similar to use several MDF readers or call ReadData() with a file stream argument. I have not tested this if it generates a faster read as the file itself is a singleton.

You can try the principle by creating several MDF readers that reads one of the DG block only. Note that I have not tested this if it generates a much faster read. There will be several calls to the ReadEverythingButData() function but this can be solved by doing a new ReadData() function that accept a file stream argument to use during the read.

looking for an example of using ReadVisdData to read CAN/CAN FD messages。 #110

Uh oh!

Uh oh!

windfenggg Nov 29, 2024

Replies: 56 comments · 13 replies

Uh oh!

Uh oh!

ihedvall Nov 29, 2024 Maintainer

Uh oh!

ihedvall Dec 1, 2024 Maintainer

Uh oh!

windfenggg Dec 1, 2024 Author

Uh oh!

windfenggg Dec 2, 2024 Author

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

Uh oh!

windfenggg Dec 2, 2024 Author

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

Uh oh!

windfenggg Dec 2, 2024 Author

Uh oh!

windfenggg Dec 2, 2024 Author

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

windfenggg Dec 2, 2024 Author

Uh oh!

ihedvall Dec 2, 2024 Maintainer

Uh oh!

windfenggg Feb 11, 2025 Author

Uh oh!

ihedvall Feb 11, 2025 Maintainer

Uh oh!

windfenggg Feb 11, 2025 Author

Uh oh!

sy950915 Apr 2, 2025

Uh oh!

Uh oh!

ihedvall Apr 2, 2025 Maintainer

Uh oh!

sy950915 Apr 5, 2025

Uh oh!

ihedvall May 20, 2025 Maintainer

Uh oh!

windfenggg Sep 4, 2025 Author

Uh oh!

ihedvall Sep 4, 2025 Maintainer

Uh oh!

windfenggg Sep 5, 2025 Author

Uh oh!

ihedvall Sep 5, 2025 Maintainer

Uh oh!

ihedvall Sep 6, 2025 Maintainer

Uh oh!

ihedvall Sep 6, 2025 Maintainer

Uh oh!

windfenggg Sep 6, 2025 Author

Uh oh!

windfenggg
Nov 29, 2024

Replies: 56 comments 13 replies

ihedvall
Nov 29, 2024
Maintainer

ihedvall
Dec 1, 2024
Maintainer

windfenggg
Dec 1, 2024
Author

windfenggg
Dec 2, 2024
Author

ihedvall
Dec 2, 2024
Maintainer

ihedvall
Dec 2, 2024
Maintainer

windfenggg
Dec 2, 2024
Author

ihedvall
Dec 2, 2024
Maintainer

ihedvall
Dec 2, 2024
Maintainer

ihedvall
Dec 2, 2024
Maintainer

windfenggg
Dec 2, 2024
Author

windfenggg
Dec 2, 2024
Author

ihedvall
Dec 2, 2024
Maintainer

windfenggg
Dec 2, 2024
Author

ihedvall
Dec 2, 2024
Maintainer

windfenggg
Feb 11, 2025
Author

ihedvall Feb 11, 2025
Maintainer

windfenggg
Feb 11, 2025
Author

sy950915
Apr 2, 2025

ihedvall Apr 2, 2025
Maintainer

ihedvall
May 20, 2025
Maintainer

windfenggg
Sep 4, 2025
Author

ihedvall
Sep 4, 2025
Maintainer

windfenggg
Sep 5, 2025
Author

ihedvall Sep 5, 2025
Maintainer

ihedvall
Sep 6, 2025
Maintainer

ihedvall
Sep 6, 2025
Maintainer

windfenggg
Sep 6, 2025
Author