Read/write test error #2
Hi, thanks for your interest in my work :) The read data vector does change during write operations. This is due to the way the OSERDES and ISERDES are wired to the IOBUF; here is a sketch I used in my thesis. This is why you should only consider the read data output as useful when the rd_data_valid flag is high. It is high for only one clock period per read operation.

You do not mention toying around with the two read calibration parameters. This project's sister repository houses a working example for the Arty S7 board; its top Verilog module should still be useful as an example of a working instantiation of this interface, including how the read calibration module is used. The sister repository also includes a Python script that makes it very easy to test whether you've calibrated the two read parameters correctly: the written and read data are saved to disk as two binary files, which can be opened with a hex viewer/editor, and it is then easy to see by how much the read data must be delayed. (The parameters allow delays in multiples of 64 bits, plus a single switch for a 32-bit shift.) If you do use the Python script, you will see why I recommend unique strings instead of repeating 4-bit patterns.

Let me know how it goes, or if you need any help.
Wow, thanks for your patient reply! Based on the connection topology of OSERDES, ISERDES, and IOBUF that you described (the drawings are fantastic), combined with my previous tests, my guess is that the DRAM itself is not responding, so that during read operations the data captured by ISERDES is simply whatever OSERDES last drove onto the bus. I will try to determine the values of the above parameters from the top-level module of your demo combined with the Python script, and I will report back if there is any progress. Thanks again! By the way, I also have a question about the input DDR interface clock in the demo top module for your Arty S7 board.
I'm afraid I don't understand your first sentence. OSERDES is only active during writes; at all other times, the IOBUF output is disabled (Z). Even if I implemented back-to-back reads and writes (in the current implementation, the bank is precharged when switching from writes to reads), OSERDES would not interfere with read data. The read mechanism is really very simple: the ISERDES data output width is doubled (from 64 to 128 bits), and the read data valid flag is just a shift register. Take a look at the 12 lines and you'll see what I mean.
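To make the shift-register idea concrete, here is a minimal sketch, not the code from this repository: a read-command strobe is delayed by a configurable number of cycles so that the valid flag asserts for exactly one clock period, aligned with the captured ISERDES data. The names i_rd_cmd, o_rd_valid, and the RD_DELAY parameter are placeholders chosen for illustration.

```verilog
// Minimal sketch: read-valid flag as a shift register (names are illustrative).
// A one-cycle read-command pulse is shifted RD_DELAY cycles so that the valid
// flag is high for exactly one clock period per read. Assumes RD_DELAY >= 2.
module rd_valid_sr #(
    parameter RD_DELAY = 6            // cycles from read command to valid data
) (
    input  wire i_clk,                // PHY-side clock
    input  wire i_rd_cmd,             // one-cycle pulse when a read is issued
    output wire o_rd_valid            // high for one cycle when data is present
);
    reg [RD_DELAY-1:0] r_shift = {RD_DELAY{1'b0}};

    always @(posedge i_clk)
        r_shift <= {r_shift[RD_DELAY-2:0], i_rd_cmd};

    assign o_rd_valid = r_shift[RD_DELAY-1];
endmodule
```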
This is how I decided to control for CL and PCB delays etc. I guess the next step would be automating it, but it works well enough once you figure out the combination you need.

The arty_s7_playground repository is very, very messy. Locally, my MMCM is correct, but I've decided not to bother uploading every variation of the IP files. I used to do that in the beginning, when the commits were absolutely huge, but nowadays I just hand-select the files I edit. Vivado is anything but version-control friendly. I assume people will mostly be interested in the Verilog files, as the MMCM and FIFO configuration is well documented in the readme file in this repo.

The ddrFreq variable in Python is really only there to calculate memory throughput. What's important is that the UART baud rate is correct. The last two lines are really just for show. Here's what a successful run looks like:

Keep us updated!
Thank you for your quick reply. You're right that OSERDES does not interfere with the read data. Anyway, I've already started the verification process by changing the read calibration parameters. The UART baud rate setting of 3_000_000 doesn't work for me, so I replaced it with my own setting of 9600; please ignore my low-speed UART.

The first test: comparing TX data with RX data, the RX data is the TX data shifted 12 bytes to the left, as shown in the figure below.

The second test: the RX data is the TX data shifted 8 bytes to the left, as shown in the figure below.

The most recent tests: for a data size of 1024, it sometimes succeeds and sometimes fails; for larger data sizes, it almost always fails. One error recurs frequently: the faulty RX data is again the RX data shifted 12 bytes to the left, as shown in the figure below. Another type of error occurs with larger data sizes, and no pattern can be found in the RX data errors, as shown in the figure below.

Next, I will try to reduce the …
This is really great error reporting, thank you for the effort. Hopefully after testing you understand how the read calibration works.

I'll be honest and admit I've never seen any such errors appear in my testing. The recurring first error makes me believe that the read command for address 'hC8 (at txdata blob location 'h190) is simply ignored. The data present at 'h190 is simply whatever is left on the ISERDES parallel output from the read that was issued to address 'hC0. One guess would be that the timing constraints of your memory module differ from the one I've used -- have you checked that the timing parameters I've left in my code are fine for your memory chip? Also, though standardized, perhaps the mode register options could be different for your chip as well?

As for the second error, which seems sporadic, I genuinely have no insight to share. Either the write or the read cycle fails for the data at blob location 'h0F0, and then at 'h100 the recurring first error repeats again. If it turns out that it is the writes that are failing, then something is very wrong. The write part of the PHY is made as well as I could manage (I'd argue even: as well as is possible with SERDES in memory mode), and without high-speed probes placed onto the DQ lines there really is no way to even diagnose the error. The reads might fail due to bad read calibration, but then my instincts tell me you would see more than just sporadic 128-bit bursts corrupted. Still, you could observe the IDELAY tap values.

As an aside, I was told by somebody that the Zynq chips have memory routed to the PS side of the chip, where a static DDR3 controller resides. I was led to believe UG933 applies to Zynq (where only routing to the PS is discussed). It's interesting that your board has memory routed to the PL despite being a Zynq chip. Though the information forwarded to me might have been incorrect, or I might have misunderstood it.

Another side note: the Arty S7 board uses an FT2232H, which can work at up to 12 Mbaud (or 6M or 3M etc.). If your chip doesn't support that, it should still be fine for the purposes of this test. (I used the high baud rate to confirm that the entirety of my memory was accessible. Transferring 2 Gbit of data, both ways, over a 3 Mbaud connection is slow!)
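If it helps, checking the timing parameters against a different chip's datasheet is mostly a matter of converting nanoseconds into clock cycles at the frequency you actually run at. The sketch below is not taken from this repository; the names and the numeric values are generic examples you would replace with your own speed-bin figures.

```verilog
// Sketch: converting datasheet timings into clock cycles at the chosen memory
// clock, using integer picoseconds to avoid real-number rounding surprises.
// The numbers below are generic examples -- check your chip's speed-bin table.
module timing_to_cycles_example;
    localparam integer CLK_PERIOD_PS = 3333;     // ~300 MHz memory clock
    localparam integer tRCD_PS = 13750;          // example ACT-to-RD/WR delay
    localparam integer tRP_PS  = 13750;          // example PRE-to-ACT delay
    localparam integer tRFC_PS = 160000;         // example refresh cycle, 2 Gbit die

    // Integer ceiling division: wait at least the datasheet time, never less.
    localparam integer tRCD_CYC = (tRCD_PS + CLK_PERIOD_PS - 1) / CLK_PERIOD_PS;
    localparam integer tRP_CYC  = (tRP_PS  + CLK_PERIOD_PS - 1) / CLK_PERIOD_PS;
    localparam integer tRFC_CYC = (tRFC_PS + CLK_PERIOD_PS - 1) / CLK_PERIOD_PS;

    initial $display("tRCD=%0d tRP=%0d tRFC=%0d cycles", tRCD_CYC, tRP_CYC, tRFC_CYC);
endmodule
```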
Thank you very much for your help; I have benefited a lot. The model of my TTL-RS232 module is …

Yes, the PS side of Zynq does have the Arm DDR memory controller, but on my board the PS and the PL each have their own external DRAM. I did not enable the PS side; in fact, I use the Zynq as if it were a K7 FPGA, so none of this DRAM is routed to the PS side.

Your guess is right, the core timing parameters really are not suitable: in your design, … However, there was a problem when I tried 466 MHz. I used the DDR-1066 (7-7-7) core timing parameters, and I wonder whether the …

Looking forward to your reply and best regards!
Your explanation of Zynq's memory interface configuration is much appreciated. Ignoring the PS is one way to get a good FPGA haha

At the risk of stating the obvious, you can just default to always using the fastest speed-bin timings for your memory chip. I.e., if your chip is rated for operation at 1600, the timing values for that speed bin should be valid even when running at lower frequencies (for most timing parameters that should hold true even in DLL-off mode below 125 MHz). That is to say, there is no need to look at different speed-bin tables when running the memory at 300-400 MHz or 466 MHz; you can default to the fastest timings the chip supports.

@robinsonb5 reported similar issues with sporadic bit shifts at higher frequencies. If he has anything to add, his opinion is more than welcome in public as well (if not, I apologize for the tag).

It is true that the errors might be a consequence of my improper understanding of ODT. This is partly revealed in issue #1, which @TheAnimatrix and I discussed further in private. My understanding is explained in that issue, but to summarize, I interpreted the following table (sourced from the Micron MT41K128M16 2 Gbit DDR3L datasheet) to mean "if RTT,nom is enabled in MR1, then the RTT,nom value is in effect regardless of the ODT pin." The ODT chapter in that datasheet further claims that "[write] accesses use RTT,nom if dynamic ODT (RTT(WR)) is disabled," which is what this interface does by default via MR2. That is to say, I interpreted the ODT pin as controlling when the impedance switches from RTT,nom (set in MR1) to RTT(WR) (set in MR2). According to this interpretation, since RTT(WR) is disabled in MR2 and the ODT ball is kept low, the termination impedance should always be RTT,nom. I'm still not sure which interpretation is correct. To further the confusion, Micron's datasheet lists an additional mode where the ODT pin may be wired permanently high (supposedly via a current-limiting resistor) that JESD79-3 doesn't include at all! I also think (but am not 100% sure) that Xilinx's MIG keeps the ODT ball low.

Micron's TN-41-04 states: "When the module is being accessed during a WRITE operation, greater termination impedance is desired, for example, 60Ω or 120Ω." Perhaps raising RTT(WR) to 60 Ohm would be beneficial at these high frequencies. Sadly I cannot test this hypothesis because my Spartan FPGA can be clocked at 464 MHz at most (plus my testing top module fails timing at ~330 MHz; I never anticipated being able to test at such high frequencies, so none of it is really optimized). You could set M[9,6,2] in MR1 to {0,0,1} (which sets RTT,nom to 60 Ohm) and see if it helps (assuming the data bus is terminated to RTT,nom when the ODT ball is tied to 1'b0). Perhaps full ODT functionality would be needed -- if somebody reading this wants to try to implement that, you are more than welcome to contribute to this project.

Another problem could be the lack of control over the skew between the clock, DQ, and DQS signals. ODELAY elements are not available in HR banks, and the PHASER primitives are conveniently left undocumented by Xilinx. I've thought of changing the clock signal phase using the MMCM, but with a resolution of 45° that would be futile. One solution presented to me would be adding error correction outside of the memory controller.
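For reference, a sketch of where the RTT,nom field sits in the DDR3 MR1 word per JESD79-3 (bits A9, A6, A2), showing the {0,0,1} = RZQ/4 = 60 Ohm setting discussed above. The other field values below are only examples and are not claimed to match what this interface actually programs.

```verilog
// Sketch of the DDR3 MR1 word with RTT,nom = RZQ/4 (60 Ohm), per JESD79-3.
// RTT,nom lives in bits {A9, A6, A2}: 000 = off, 001 = RZQ/4, 010 = RZQ/2,
// 011 = RZQ/6. All remaining fields below are examples only.
module mr1_encoding_example;
    localparam [2:0] RTT_NOM  = 3'b001;   // RZQ/4 = 60 Ohm, as suggested above
    localparam [1:0] DRIVE    = 2'b00;    // output drive strength RZQ/6
    localparam       WR_LEVEL = 1'b0;     // write leveling disabled
    localparam       DLL_DIS  = 1'b0;     // 0 = DLL enabled

    localparam [13:0] MR1 = {
        1'b0,          // A13  reserved
        1'b0,          // A12  Qoff: outputs enabled
        1'b0,          // A11  TDQS disabled
        1'b0,          // A10  reserved
        RTT_NOM[2],    // A9   RTT,nom[2]
        1'b0,          // A8   reserved
        WR_LEVEL,      // A7   write leveling enable
        RTT_NOM[1],    // A6   RTT,nom[1]
        DRIVE[1],      // A5   drive strength[1]
        2'b00,         // A4:3 additive latency = 0
        RTT_NOM[0],    // A2   RTT,nom[0]
        DRIVE[0],      // A1   drive strength[0]
        DLL_DIS        // A0   0 = DLL enable
    };

    initial $display("MR1 = %b", MR1);
endmodule
```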
With my knowledge of DDR3 SDRAM and insight into this interface in its current form (including testing on an Arty S7-50), I can only conclude that such sporadic errors are unavoidable at high frequencies.

The CL value has no bearing on the quality of the read data; it only changes the delay with which the memory delivers the read data after a read command is issued. For the memory interface, the CL value changes nothing (no logic controls for or counts the CL clock cycles), but it could mean that the read delay calibration needs to be adjusted.
Hey, sorry for the late response. I could've sworn I replied already. Happy New Year, I guess!

I've never had anything like this happen, not with my Python script and not in fast sequential access. I wrote a module that does sequential writes across the entire memory, then sequentially reads back the entire memory and compares the written and read data patterns. In this scenario, I have never had a read fail. I have no clue why some data would get corrupted in your case. A first guess would be that maybe some cells leak charge too quickly (are you in a warm environment, or does the FPGA or memory chip warm up significantly?), so that by increasing the address range you prolong the time the offending cells have to corrupt data. (For this to be true, I think the corrupted address should always be the same one, but I'm not sure; this is all just speculation.) You could test whether the refresh period is too short, somehow, though in all the simulations I've done it has always been around 7.8 us. Maybe write the entire chip, wait a while, then read back from it? Or just decrease the refresh interval.

Other than this, I really have no clue what could be going on. In the meantime I've also lost access to the development board I was using, so I can't contribute any testing of my own.
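For experimenting with the refresh period, the arithmetic is straightforward: one REFRESH on average every tREFI, which JESD79-3 puts at 7.8 us below 85 °C (3.9 us above). Here is a minimal sketch of such a request counter; the module and signal names are illustrative and not taken from this repository.

```verilog
// Sketch: average periodic refresh request generator. tREFI = 7.8 us below
// 85 degC (3.9 us above, per JESD79-3). Names and defaults are illustrative.
module refresh_timer_sketch #(
    parameter integer CLK_PERIOD_PS = 3333,                  // ~300 MHz
    parameter integer tREFI_PS      = 7_800_000              // 7.8 us
) (
    input  wire i_clk,
    output reg  o_ref_req = 1'b0    // one-cycle pulse: issue a REFRESH now
);
    localparam integer REFI_CYC = tREFI_PS / CLK_PERIOD_PS;  // ~2340 cycles

    reg [$clog2(REFI_CYC)-1:0] r_cnt = 0;

    always @(posedge i_clk) begin
        o_ref_req <= 1'b0;
        if (r_cnt == REFI_CYC - 1) begin
            r_cnt     <= 0;
            o_ref_req <= 1'b1;      // pulse once per tREFI on average
        end else begin
            r_cnt <= r_cnt + 1'b1;
        end
    end
endmodule
```

Shortening tREFI_PS (e.g. halving it) is a quick way to test the leaky-cell hypothesis above, at the cost of a little bandwidth.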
Finally your reply arrived, thank you very much for your analysis, and happy New Year! In fact, I haven't really figured out why this is happening. In my setup, neither the FPGA nor the DRAM heats up at room temperature, and the DRAM temperature should be within the normal operating range (0~85 °C); refer to JESD79-3E (DDR3), Table 21, Temperature Range. If the temperature exceeds 85 °C, the refresh command frequency should be increased. To rule out interference from the hardware environment, I will use MIG to test under the same conditions and see whether the same situation occurs; I guess it should have nothing to do with the hardware environment. At the same time, I will also try to reduce the refresh interval.

During this period I also tried other implementation methods, using … When writing, I enabled …

Looking forward to your reply and best regards!
When implementing support for multiple DDR3 chips on the same CK/CMD/ADDR bus (like on a DIMM), you need to account for the fly-by topology effects. The physical datapath traces on the PCB might be of equal length between the DRAM chips, but because of the fly-by topology, the clock, address, and command signals arrive at the chips with different delays. If you do not account for this, data will obviously get corrupted. I'm just hypothesizing here that you didn't use DDR3's write leveling function, because these lower-end FPGA banks don't provide ODELAY functionality; corrupted write data is an inevitability in this scenario. This illustration of the effect of fly-by topology on datapath timing is taken from Micron TN-41-13:

That's just a guess, though, since you don't mention any delay primitives being used. As for my own interface, I'm afraid I don't have any more input to give with regard to your issue. Maybe a complete set of IO constraints might expose some as-yet-unexposed design flaw. Maybe it's a corner case I didn't account for.
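To illustrate what the write-leveling step looks like in principle (this interface does not implement it, and the chips/banks discussed here lack the ODELAY elements it would normally drive): after entering leveling mode via MR1, the controller pulses DQS with increasing launch delay and watches the DQ feedback; the 0-to-1 transition marks the point where DQS meets the rising CK edge at the chip. A heavily simplified per-lane sketch, with assumed signal names:

```verilog
// Heavily simplified sketch of the JEDEC DDR3 write-leveling search for one
// byte lane. Assumes MR1 leveling mode is already entered, the PHY keeps
// pulsing DQS while o_lane_leveled is low, and i_feedback_valid flags each
// prime-DQ sample the DRAM returns (CK sampled by the arriving DQS edge).
// All names here are placeholders; real code must also respect tWLMRD/tWLO.
module wrlvl_lane_sketch (
    input  wire       i_clk,
    input  wire       i_feedback_valid,
    input  wire       i_dq_feedback,
    output wire [4:0] o_dqs_delay_tap,   // would drive an ODELAY-like element
    output wire       o_lane_leveled
);
    localparam [4:0] MAX_TAP = 5'd31;

    reg [4:0] r_tap  = 5'd0;   // current DQS launch-delay tap
    reg       r_done = 1'b0;   // set once the 0 -> 1 transition is found

    always @(posedge i_clk) begin
        if (!r_done && i_feedback_valid) begin
            if (i_dq_feedback)
                r_done <= 1'b1;            // DQ came back 1: DQS now meets CK high
            else if (r_tap != MAX_TAP)
                r_tap <= r_tap + 1'b1;     // DQ still 0: delay DQS a little more
        end
    end

    assign o_dqs_delay_tap = r_tap;
    assign o_lane_leveled  = r_done;
endmodule
```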
Hello, I am very interested in this project, and I have run into some problems while studying it.
I have instantiated the ddr3_x16_phy_cust and ddr3_rdcal modules from your Arty S7-50 project, and written an app module that generates the rdcal_start signal and drives the data and address inputs of the ddr3_rdcal module. The parameters as a whole follow your configuration.
The DDR interface frequency is 300 MHz; ISERDES_16B, 32B, and 48B are all FALSE. The IDELAYCTRL frequency is 200 MHz. RD_DELAY is set to 6 or 10 (same phenomenon with both), and the design is deployed on a Zynq-7030.
The board level tests are as follows:
After the w_rdcal_done signal goes high, a single read/write test passes: the data written to and read from address 0x10 is 0xaaaa_aaaa_aaaa_aaaa in both cases. For details, see the following figure.
However, a problem occurs when data is read back after several consecutive writes. Full A is written to address 0x0, full B to address 0x8, and full C to address 0x10. The data then read at address 0x0 is full C, which differs from the full A that was written. For details, see the following figure.
According to the waveform, while a write operation is in progress the write data also gets latched into the read data path, so the data returned by the first subsequent read is simply the last data written, regardless of the address.
I haven't studied your code in depth yet; I wanted to get it running first and then dig further. Based on your experience, what could the problem be?
Looking forward to your reply and guidance, thank you!