Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Db2 z/OS: UnicodeDecodeError Exception thrown in conn_errormsg() #876

Open
g-haas opened this issue Aug 11, 2023 · 19 comments
Open

Db2 z/OS: UnicodeDecodeError Exception thrown in conn_errormsg() #876

g-haas opened this issue Aug 11, 2023 · 19 comments

Comments

@g-haas
Copy link

g-haas commented Aug 11, 2023

Problem description

Apparently, when trying to connect to a database server using ibm_db.connect(), ibm_db assumes error strings returned by the database server to be ASCII-encoded. This, however, may not be the case for Db2 on z/OS, as z/OS traditionally uses EBCDIC encoding. As a result, one cannot retrieve a detailed error description using ibm_db.conn_errormsg().

Here’s what we did. We deliberately specified an invalid plan name in the ODBC config file and called ibm_db.connect(). Obviously, the connection failed. We then wanted to get some information about the error. Since in our opinion, it was too early for calling ibm_db.stmt_errormsg() at this point, we tried ibm_db.conn_errormsg(). This call, however, resulted in an exception being thrown and the Python runtime complaining about some UnicodeDecodeError. This also happened if we did not even try to print() the error message – the call to the ibm_db.conn_errormsg() method by itself triggered the error.

Having a look at ibm_db.c, we noticed that ibm_db_conn_errormsg calls StringOBJ_FromASCII(). ibm_db_stmt_errormsg (backing ibm_db.stmt_errormsg() which works fine), in contrast, calls StringOBJ_FromStr(). Hence, we patched ibm_db_conn_errormsg, so that it calls StringOBJ_FromStr(), too, built our custom version of the ibm_db package and rerun our test. Of course, the connection still fails. However, now the call to ibm_db.conn_errormsg() succeeds and error details can be retrieved. After appropriate codepage translations, the text can even be printed.

So, why do the two _errormsg() methods treat the error messages they receive from the server differently? Is this on purpose?

Environment

  • Operating System Name: z/OS Unix System Services (USS)
  • db2level output from clidriver if in use: ?
  • Target Db2 Server Version: Db2 12 for z/OS, function level V12R1M510
  • Python Version: 3.11.4
  • ibm_db version: 3.1.4
  • For non-Windows, output of below commands:
    uname --> OS/390
    uname -m --> 8561

db2-connection-test.py

This is the small example we tested with:

import ibm_db

db_subsys = "xxxx"

conn = None
con_str = "DSN=DSN"+db_subsys

try:
    print( "Trying to connect...")
    conn = ibm_db.connect(con_str, None, None)

except:
    print("Connection failed with error %s" % ibm_db.conn_error())
    errmsg = ibm_db.conn_errormsg()
    print(str(errmsg.encode("IBM273").decode("IBM1047")))

    exit()
else:
    print ("Connection ok")

odbc.ini

[COMMON]
MVSDEFAULTSSID=xxxx
FLOAT=IEEE
CURRENTAPPENSCH=ASCII
[xxxx]
AUTOCOMMIT=1
MVSATTACHTYPE=CAF
PLANNAME=xxxx

Result (unpatched ibm_db)

Here's the output produced with an the stock ibm_db package:

Trying to connect...
Connection failed with error 51002
Traceback (most recent call last):
  File "db2-connection-test.py", line 11, in <module>
    conn = ibm_db.connect(con_str, None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "db2-connection-test.py", line 15, in <module>
    errmsg = ibm_db.conn_errormsg()
             ^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

Result (patched ibm_db)

After applying the modifications mentioned above, the following output is yielded:

Trying to connect...
Connection failed with error 51002
{DB2 FOR OS/390}{ODBC DRIVER} � DSNT408I SQLCODE = -805, ERROR:  PACKAGE NAME [...]
@imavo
Copy link
Contributor

imavo commented Aug 11, 2023

Does the same symptom appear in other functions, e.g ibm_db_conn_warn(), ibm_db_stmt_warn(), ibm_db_conn_error(), ibm_db_stmt_error(), and some other places?

@bimalkjha
Copy link
Member

@g-haas When you specified CURRENTAPPENSCH=ASCII in db2cli.ini file, doesn't {DB2 FOR OS/390}{ODBC DRIVER} should return the error message in ASCII and not in EBCDIC? Could you please check why CURRENTAPPENSCH=ASCII is not effective for ODBC DRIVER? Thanks.

@pjfarleyiii
Copy link

pjfarleyiii commented Aug 12, 2023

It is possible this is related to issue #852 where we are still trying to understand why the specification of CURRENTAPPENSCH=ASCII in ther odbc.ini file does not seem to be working as expected.

I can confirm that in a z/OS USS shell at least for function ibm_db.stmt_errormsg() with CURRENTAPPENSCH=ASCII in z/OS file odbc.ini the returned error message is PARTLY in ASCII - the last 13 bytes or so are in IBM1047. I had to code this in my test of the ibmdb functions to get the entire message printed to the terminal:

    try:
        stmt = ibm_db.exec_immediate(conn,sql_stmt)
    except:
        print("exec_immediate failed:")
        #print(ibm_db.stmt_errormsg())
        errmsg = str(ibm_db.stmt_errormsg().encode("ISO8859-1").decode("IBM1047"))
        errmsg = str(errmsg[:-13]) + str(errmsg[-13:].encode("IBM1047").decode("utf-8"))
        print(errmsg)

Sample output from that code:

{DB2 FOR OS/390}{ODBC DRIVER}{DSN12015}
 DSNT408I SQLCODE = -199, ERROR:  ILLEGAL USE OF KEYWORD SCHEME.  TOKEN ENCODING
          COMPATIBILITY WAS EXPECTED
 DSNT418I SQLSTATE   = 42601 SQLSTATE RETURN CODE
 DSNT415I SQLERRP    = DSNHPARS SQL PROCEDURE DETECTING ERROR
 DSNT416I SQLERRD    = 2  0  0  -1  28  506 SQL DIAGNOSTIC INFORMATION
 DSNT416I SQLERRD    = X'00000002'  X'00000000'  X'00000000'  X'FFFFFFFF'
          X'0000001C'  X'000001FA' SQL DIAGNOSTIC INFORMATION
  ERRLOC=5:10:2 SQLCODE=-199

The " SQLCODE=-199" part at the end of the value of ibm_db.stmt_errormsg()is what seems to still be encoded in IBM1047.

Here are the bytes printed for print(bytes(ibm_db.stmt_errormsg().encode("ISO8859-1"))), note the " SQLCODE=-199" at the end is NOT the same encoding:

b'\xc0\xc4\xc2\xf2@\xc6\xd6\xd9@\xd6\xe2a\xf3\xf9\xf0\xd0\xc0\xd6\xc4\xc2\xc3@\xc4\xd9\xc9\xe5\xc5\xd9\xd0\xc0\xc4\xe2\xd5\xf1\xf2\xf0\xf1\xf5\xd0@\x15@\xc4\xe2\xd5\xe3\xf4\xf0\xf8\xc9@\xe2\xd8\xd3\xc3\xd6\xc4\xc5@~@`\xf1\xf9\xf9k@\xc5\xd9\xd9\xd6\xd9z@@\xc9\xd3\xd3\xc5\xc7\xc1\xd3@\xe4\xe2\xc5@\xd6\xc6@\xd2\xc5\xe8\xe6\xd6\xd9\xc4@\xe2\xc3\xc8\xc5\xd4\xc5K@@\xe3\xd6\xd2\xc5\xd5@\xc5\xd5\xc3\xd6\xc4\xc9\xd5\xc7\x15@@@@@@@@@@\xc3\xd6\xd4\xd7\xc1\xe3\xc9\xc2\xc9\xd3\xc9\xe3\xe8@\xe6\xc1\xe2@\xc5\xe7\xd7\xc5\xc3\xe3\xc5\xc4@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\x15@\xc4\xe2\xd5\xe3\xf4\xf1\xf8\xc9@\xe2\xd8\xd3\xe2\xe3\xc1\xe3\xc5@@@~@\xf4\xf2\xf6\xf0\xf1@\xe2\xd8\xd3\xe2\xe3\xc1\xe3\xc5@\xd9\xc5\xe3\xe4\xd9\xd5@\xc3\xd6\xc4\xc5@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\x15@\xc4\xe2\xd5\xe3\xf4\xf1\xf5\xc9@\xe2\xd8\xd3\xc5\xd9\xd9\xd7@@@@~@\xc4\xe2\xd5\xc8\xd7\xc1\xd9\xe2@\xe2\xd8\xd3@\xd7\xd9\xd6\xc3\xc5\xc4\xe4\xd9\xc5@\xc4\xc5\xe3\xc5\xc3\xe3\xc9\xd5\xc7@\xc5\xd9\xd9\xd6\xd9@@@@@@@@@@@@@@@@@@@\x15@\xc4\xe2\xd5\xe3\xf4\xf1\xf6\xc9@\xe2\xd8\xd3\xc5\xd9\xd9\xc4@@@@~@\xf2@@\xf0@@\xf0@@`\xf1@@\xf2\xf8@@\xf5\xf0\xf6@\xe2\xd8\xd3@\xc4\xc9\xc1\xc7\xd5\xd6\xe2\xe3\xc9\xc3@\xc9\xd5\xc6\xd6\xd9\xd4\xc1\xe3\xc9\xd6\xd5@@@@@@@@@@\x15@\xc4\xe2\xd5\xe3\xf4\xf1\xf6\xc9@\xe2\xd8\xd3\xc5\xd9\xd9\xc4@@@@~@\xe7}\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf2}@@\xe7}\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0}@@\xe7}\xf0\xf0\xf0\xf0\xf0\xf0\xf0\xf0}@@\xe7}\xc6\xc6\xc6\xc6\xc6\xc6\xc6\xc6}@@@@@@@\x15@@@@@@@@@@\xe7}\xf0\xf0\xf0\xf0\xf0\xf0\xf1\xc3}@@\xe7}\xf0\xf0\xf0\xf0\xf0\xf1\xc6\xc1}@\xe2\xd8\xd3@\xc4\xc9\xc1\xc7\xd5\xd6\xe2\xe3\xc9\xc3@\xc9\xd5\xc6\xd6\xd9\xd4\xc1\xe3\xc9\xd6\xd5@@@@@@@@@@@@@@@@@@@\x15@@\xc5\xd9\xd9\xd3\xd6\xc3~\xf5z\xf1\xf0z\xf2 SQLCODE=-199'

And here is that same string decoded with IBM1047 with ibm_db.stmt_errormsg().encode("ISO8859-1").decode("IBM1047"), note the apparent "garbage" bytes at the end:

{DB2 FOR OS/390}{ODBC DRIVER}{DSN12015}
 DSNT408I SQLCODE = -199, ERROR:  ILLEGAL USE OF KEYWORD SCHEME.  TOKEN ENCODING
          COMPATIBILITY WAS EXPECTED
 DSNT418I SQLSTATE   = 42601 SQLSTATE RETURN CODE
 DSNT415I SQLERRP    = DSNHPARS SQL PROCEDURE DETECTING ERROR
 DSNT416I SQLERRD    = 2  0  0  -1  28  506 SQL DIAGNOSTIC INFORMATION
 DSNT416I SQLERRD    = X'00000002'  X'00000000'  X'00000000'  X'FFFFFFFF'
          X'0000001C'  X'000001FA' SQL DIAGNOSTIC INFORMATION
  ERRLOC=5:10:2ëé<ä|àá

@bimalkjha
Copy link
Member

bimalkjha commented Aug 14, 2023

@g-haas Could you please check tagging of your ODBC ini file?
ls -lTE $DSNAOINI

Please make sure tagging of your odbc.ini file is "binary" or "mixed":

chtag -b $DSNAOINI
or
chtag -m -c IBM-1047 $DSNAOINI

If file is tagged text (chtag -t -c IBM1047 $DSNAOINI) the S0C4 abend occurs.

Please update us about the test result after tag correction. Thanks.

@jthyssenrocket
Copy link

@g-haas

Python uses UTF-8 internally so ODBC must use ASCII or UNICODE, so there is no need to add encode/decode or patch ibm_db.c.

We've found several customers with incorrect encoding/tagging of the odbc.ini file, which prevents ODBC from properly reading the file thus failing to honor your CURRENTAPPENSC=ASCII setting.

Can you post the encoding of your odbc.ini file?
chtag -p $DSNAOINI

Also, the -805 suggests the ODBC plans and packages are not bound.
Please check if Db2 sampe job SDSNSAMP(DSNTIJCL) has been run. This jobs binds the plan DSNACLI and a number of packages (DSNCLIC1, DSNCLIC2, etc etc) in collection DSNAOCLI.

@denisgaebler
Copy link

denisgaebler commented Aug 17, 2023

@jthyssenrocket I have worked with the customer and the -805 is intended to reproduce the error. This issue is not about fixing the -805 error, its about the encoding issue. Actually I would expect that even if CURRENTAPPENSC is not honoured, the code should be able to print an error message without the hazzle of several encode().decode() call chains.
And why is a string concatinated based on different encodings into an unusable String?

@jthyssenrocket
Copy link

@denisgaebler

I am working with ODBC z/OS development, and the conclusion right now is that it is not possible for the python driver to check if current application encoding scheme is set to ASCII or UNICODE before starting to issue queries. There is no ODBC/CLI API to check this on z/OS.

Python is using unicode internally and does not expect to get error messages or data from ODBC back in EBCDIC.

There are no encoding issues if the ODBC ini file is tagged as required and has CURRENTAPPENSC=ASCII or UNICODE.

@denisgaebler
Copy link

@jthyssenrocket So you cannot do a read from python with unicode as codeset parameter to check if its correctly tagged or throws an error on reading odbc.ini and evaluate the CURRENTAPPENVSC parm in there?
Something like:

# Read a file in 'UTF-8' encoding
f = open('odbc.ini', encoding='UTF-8')
print(f.read())

And catch a possible exception and throw it to the user before doing an actual call to Db2?

@jthyssenrocket
Copy link

jthyssenrocket commented Aug 17, 2023

@denisgaebler

ibm_db is not reading any files. We issue ODBC/CLI calls to the C API implemented by ODBC for z/OS, so we need an C function we can call that returns the current application encoding scheme.

Edit: we're working on getting all relevant document updated to highlight the encoding requirement for the ODBC ini file (Db2 for z/OS manual + python ibm_db docs + node.js ibm_db docs).

@denisgaebler
Copy link

@jthyssenrocket Maybe ibm_db should read odbc.ini to avoid all sorts of obscure errors that are caused by incorrectly tagged odbc.ini?

@jthyssenrocket
Copy link

jthyssenrocket commented Aug 17, 2023

It is outside the scope of ibm_db to read configuration files for the underlying drivers. We don't read db2dsdriver.cfg files on WIndows either.

It is not a unique requirement for ibm_db. Any (unicode) ODBC client that relies on a odbc.ini file in USS have this requirement. It doesn't matter if it is python, node.js, COBOL program, PL/I program, etc.

@g-haas
Copy link
Author

g-haas commented Aug 17, 2023

Python uses UTF-8 internally so ODBC must use ASCII or UNICODE, so there is no need to add encode/decode or patch ibm_db.c.

We originally used CURRENTAPPENSCH=ASCII in our odbc.ini (see above).

We just observed that, when setting CURRENTAPPENSCH=UNICODE, at least the original Exception thrown on ibm_db.connect() contains some useful and print-able information (this is not the case for ASCII). However, the call to ibm_db.conn_errormsg() in the except block still triggers another UnicodeDecodeError:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

If you look closely, it's a different one than in the original description ([...] can't decode byte 0xc3 [...] vs. [...] can't decode byte 0xe4 [...]).

We've found several customers with incorrect encoding/tagging of the odbc.ini file, which prevents ODBC from properly reading the file thus failing to honor your CURRENTAPPENSC=ASCII setting.

Can you post the encoding of your odbc.ini file? chtag -p $DSNAOINI

Sure:

# chtag -p $DSNAOINI
b binary      T=off odbc.ini

Given the above observation regarding the slightly different UnicodeDecodeErrors as well as the varying behavior with regard to the Exceptions that are being thrown, I'd assume that our odbc.ini is being honored.

Also, the -805 suggests the ODBC plans and packages are not bound. Please check if Db2 sampe job SDSNSAMP(DSNTIJCL) has been run. This jobs binds the plan DSNACLI and a number of packages (DSNCLIC1, DSNCLIC2, etc etc) in collection DSNAOCLI.

Of course, you're right about the meaning of the -805. However, as indicated in the original description, we deliberately specified an invalid plan name for test purposes :)

@g-haas
Copy link
Author

g-haas commented Aug 17, 2023

Does the same symptom appear in other functions, e.g ibm_db_conn_warn(), ibm_db_stmt_warn(), ibm_db_conn_error(), ibm_db_stmt_error(), and some other places?

  • ibm_db_conn_warn(): ?
  • ibm_db_conn_error(): no
  • ibm_db_stmt_warn(): ?
  • ibm_db_stmt_error(): no (tested with an existing plan specified in odbc.ini and invalid SQL statement)
  • ibm_db_stmt_errormsg(): no (tested with an existing plan specified in odbc.ini and invalid SQL statement)

@jthyssenrocket
Copy link

We use the ODBC/CLI "W" (wide) APIs, e.g., SQLConnectW. The "W" APIs are supposed to return UTF-16 content always independent on CURRENTAPPENSC. See https://www.ibm.com/docs/en/db2-for-zos/13?topic=data-db2-odbc-unicode-support

Would it be possible to collect APPLTRACE=1 and DIAGTRACE=1, and upload here? There might be some invalid characters in the message returned by Db2.

@bimalkjha
Copy link
Member

We have documented steps to install ibm_db on z/OS here in details. Please check it. Thanks.

@jthyssenrocket
Copy link

There is a deeper issue: we're using the SQLGetDiagRec API to retrieve error messages (not SQLGetDiagRecW UTF-16 API), so messages are returned in varying codepages. This is likely not an issue on distributed platforms, but on z/OS it appears we're getting a mix of EBCDIC, ASCII, and UNICODE error messages back, but our code assumes it is ASCII

It seems it would be more robust to use the SQLGetDiagRecW which always returns UTF-16 error messages instead of assuming the message returned is in ASCII (and use StringOBJ_FromASCII() to convert it to python string).

The recommendation from ODBC for z/OS development is also to use SQLGetDiagRecW.

We are calling SQLGetDiagRec in lots of places, so this will be a larger change, though.

@bimalkjha
Copy link
Member

Opened Jira bug https://jira.rocketsoftware.com/browse/DBC-14843 to fix this issue in ibm_db driver. Thanks.

@Earammak
Copy link
Collaborator

@g-haas, Can you please try with connection string to connect to ibm_db[ by providing wrong details either from hostaname , username or password] and share the logs here.
I see that, in my machine i will be able to see the complete message.

@Earammak
Copy link
Collaborator

@g-haas Can you please share the update ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants