Home
JAQForum Ver 20.06
Log In or Join  
Active Topics
Local Time 00:27 10 May 2024 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : Error : [CYW43] do_ioctl(2, 263, 16): timeout

     Page 1 of 2    
Author Message
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 06:17am 27 Apr 2024
Copy link to clipboard 
Print this post

I've been chasing the above problem on a WebMite WiFi connection for a few months now, trying to figure out what's going on and the meaning behind it. It seems there's a few other people on the forum with the same problem.
My brain has gone mushy over it, and after about my 700th theory of what's going on (Well, maybe not 700, but it sure seems like it!)

I'll spare everyone the theories, guesses and stabs-in-the-dark I've done on this, but I've got an idea that might be plausible (Every other idea never seemed to 'feel' right, and they ended up being dead-ends... this one 'feels' right)

So, here's a screen grab of a packet I sometimes get from the WWW and is processed by the WebMite, answered with a standard 404 response.



...and a few days later...



Exactly the same packet, but something I hadn't seen before is the [NUL] characters in it... The top line is the PRINT statement to send the string to MMCC - it's not part of the packet.

(note that this packet didn't trip up the WebMite, but you can see that the Process Time is 8500mS and 5500mS seconds. The process time printed is how long the WebInterrupt code takes to process that particular packet, from first entry to sending the 404 reply. Most packets take around 500mS)

So, I did a search on [NUL] in HTTP code and came up with this text -

  Quote  Field values containing CR, LF, or NUL characters are invalid and dangerous, due to the varying ways that implementations might parse and interpret those characters; a recipient of CR, LF, or NUL within a field value MUST either reject the message or replace each of those characters with SP before further processing or forwarding of that message.

Other sites say that a [NUL] char can produce 'unpredictable results' (Exactly what I seem to be having)

Which started me on theory #701, and something I'm testing at the moment.
I'll need to wait a few days running the current code until the error happens and figure out what came through, then modify the code to try and trap any [NUL] chars to see if that fixes anything...
Probably a week or so for testing, so in the meanwhile, I'm looking for some clarity of thought from others here...

Is my theory #701 a valid idea?
Am I missing something obvious?
Any other suggestions? Ideas?

John
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8600
Posted: 07:34am 27 Apr 2024
Copy link to clipboard 
Print this post

John

This sounds like a VERY promising avenue. I've been running Geoff's watering controller now for many weeks and sometimes it runs for 2 or 3 days without interruption, other times it may reset twice or three times a day. I've added mm.info(uptime) to the main html page so I can easily see how long it has been running. It definitely feels like something external is triggering the resets and NUL characters in code that is doing string handling are clearly bad news for C.

Where in the firmware are you looking to correct the spurious NUL characters?
Edited 2024-04-27 17:36 by matherp
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 08:14am 27 Apr 2024
Copy link to clipboard 
Print this post

Thanks for the sanity check, it's been sending me bonkers!  

I've also run Geoff's controller at times (not actually controlling anything) and it gets the occasional hiccup that mostly triggers a watchdog reset.

If it sheds more light on the subject... For my testing as it is right now, I wrote some VB code to send a string (not html code, just a string "Hits " + an increasing number) that is answered with the standard 404. Right now, it has run 10 hours today with 11382 packets sent, at random times (0.6 to 6 seconds time span) with no problems. I'll leave it run overnight and check tomorrow.
From the public WWW, I've had 25 correctly formatted hits, printed to MMCC, also with no problems.
I won't know the true result until I check the saved log on the WebMite and decipher any spurious characters that might come up. If you want, I can post the resulting file if it helps...

  Quote  It definitely feels like something external is triggering the resets and NUL characters in code that is doing string handling are clearly bad news for C.

Agreed.
The ONLY time that there's a problem, it comes from the WWW (ie: external sources) and the NUL char is the only thing I can point my finger at.
I also read a few times that NUL chars are a problem for strings in C language. I don't know, I have NO idea how C works (My hat's off to you with what you can do in C)

I was just planning to trap the NUL chars in the MMbasic routine itself. Maybe replace the chars with a space or other printable char, but more likely just reject it all together.
I would have no idea where to start in modifying the firmware, so that's off the table for me  

As a side note, probably related... I tried using UDP after the frustrations of TCP and managed to send 65000+ VB messages perfectly. I don't know if it's related, but might be a clue.

John
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8600
Posted: 08:47am 27 Apr 2024
Copy link to clipboard 
Print this post

John: Please try this version - it should replace any NUL characters with space


PicoMiteWeb.zip

If you ever get the error message "Error: WTF" please let me know. It is there to trap something that should never happen but could be related
Edited 2024-04-27 18:49 by matherp
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 09:15pm 27 Apr 2024
Copy link to clipboard 
Print this post

Thanks Peter, I've flashed the WebMite just now and set everything exactly the same as previous.
I'll run it for as long as possible and see where it leads...
I can confirm that the 'test' NULs sent were replaced with SPACEs

This morning, 30 mins or so after I got up, the MM code received a packet the same as I posted above. The heartbeat was still winking (so the code was running), but the VB code was caught in a loop here
  Quote    While WinsockTCP.State = 7
   DoEvents
 Wend

...which in my code, tells me that the previous packet sent was never replied to, and so the winsock wasn't closed (7 means Socket Connected)
That would make sense if the WM was stuck in a bottle-neck trying to respond to the spurious packet prior.
(Edit: Probably not... It was because I didn't do a whole lot of error trapping in the VB code and ran into a conflict. My fault... Me bad...)

Anyway, I'll keep you posted - Thanks for the great work on the M.Mites  

Out of curiosity, what conditions would result in my most common expression lately - "Error: WTF"?
Edited 2024-04-28 16:19 by Malibu

John
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 01:08am 29 Apr 2024
Copy link to clipboard 
Print this post

Sadly, [NUL] turned out to be another dead-end  
(At least as far as the timeout error is concerned)

I was running as usual, doing some unrelated fiddling on the computer listening to the click of the relay going on and off (So I know there's a new http request come in)
I had this packet come in that caught my eye -



but the WebMite kept running without a worry  
Maybe... 3 minutes later, there was silence from the relay which usually spells trouble.
Sure enough, the heartbeat was off, no response to the VB packets and nothing new showing on the MMCC screen. (The timeout error wasn't showing either)

Out of curiosity, I clicked the "Run" button on the MMCC and the heartbeat LED came on, and the "[CYW43] do_ioctl(2, 263, 16): timeout" message appeared on the screen.
Click Run again, the led went off and the error message showed again...
Click again... led went on, error showed...
... and so on for the 70'ish times I clicked the run button...
Every time I would click run, the heartbeat led changes state and the error is printed.

I checked the saved logs and there was no indication of any problems.

I ended up re-powering the WM and now it's up and working again, so I'm back to square one.

{I did manage to work out how to fire the watchdog timer by sending http packets. I'll need to churn it around in my head for a bit and I'll post soon}

John
 
led-bloon

Senior Member

Joined: 21/12/2014
Location: Australia
Posts: 203
Posted: 06:38am 29 Apr 2024
Copy link to clipboard 
Print this post

Google mstshash - there are plenty of hits for it, like this:
mstshash
It would appear to be a common problem. Perhaps dispose of these type packets
early and see if that fixes it?
led
Edited 2024-04-29 16:48 by led-bloon
Miss you George
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 06:59am 29 Apr 2024
Copy link to clipboard 
Print this post

G'day led,
Interesting link - pretty much what I thought the packet would be; some sort of bot.
A plan I have is to add a function to check if there's any chars under Chr(32) to reject the packet.
I think the problem is deeper than that though, but it's another option I haven't tried yet  

  Quote  I would have thought ANY packet with binary (non-alphanumeric) data should be
dumped.


Yeah, I'm pretty much in the same school there. I would have thought with all the ISP's passing along the packet data over WWW, at least one of them would think 'This is rubbish... I'll delete it'

Right now, I've jumped down the rabbit hole and looking into the CYW43 driver C-Code.
(Just because I still have a few loose hairs left on my head, so I may as well lose them too   )

John
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8600
Posted: 07:17am 29 Apr 2024
Copy link to clipboard 
Print this post

  Quote  Out of curiosity, I clicked the "Run" button on the MMCC and the heartbeat LED came on, and the "[CYW43] do_ioctl(2, 263, 16): timeout" message appeared on the screen.
Click Run again, the led went off and the error message showed again...
Click again... led went on, error showed...
... and so on for the 70'ish times I clicked the run button...
Every time I would click run, the heartbeat led changes state and the error is printed.


This is very interesting. The timeout message is part of the Pico SDK and uses printf rather than the normal MMBasic text output routines and it looks like printf is blocking. I've changed the printf to a normal non-blocking message. Please test and let's see what happens


PicoMiteWeb.zip
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 07:43am 29 Apr 2024
Copy link to clipboard 
Print this post

Thanks, I've just nuked and flashed the WebMite  

Coincidentally, as I was reading your post, the WM froze and had the same symptoms as above. That was after 11610 hits from the VB program.
[Edit] Which tells me that it's probably not limited only to WWW packets. I'm running on TCP port 1111, so anything on port 80, doesn't get forwarded.

I'll check this new version in the AM and report back in the PM tomorrow (ugggh! it's a work day tomorrow   )
Edited 2024-04-29 17:59 by Malibu

John
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 09:15am 30 Apr 2024
Copy link to clipboard 
Print this post

So, the latest update...
I re-wrote the vb code because the error trapping in the previous version was a bit non-existent. It's much better now  

I ran the vb code and the WM code tonight when I got home from work. Here's the results:
12013 hits to the WM from VB code at 500mS intervals over approximately 2 1/2 hours until it trapped a fault.
The VB code process list as printed to a text box that shows each stage as it happens (Top line on the right is start time, bottom line is the fault time, and 'Number' is the WinSock error number...) -



This is the print to MMCC...



Which shows that the WM received the data, but couldn't (or wasn't able to) respond.
The VB connection is/was still open, so it was waiting for a reply, but for some reason the WM got stuck inside a brain-freeze and couldn't send the data out.

I'll triple check all my code, but I'm pretty sure it's pretty much bullet-proof  
There's no sign of the '[CYW43] do_ioctl(2, 263, 16): timeout' as yet, but this could be part of the symptoms... and, as it is right now, the output relay is ON (it gets turned on at interrupt entry and turned off at exit time) and the heartbeat LED is on solid. If I hit ^C then run, run, run... the heartbeat changes state on each 'run'.
I have also disabled the Watchdog, so when a fault occurs, it freezes up. Otherwise this condition would result in firing the WD timer.

I will try and refine the vb code tomorrow and see if I can squeeze another byte of information out of the data/info.

Any thoughts?

John
 
OA47

Guru

Joined: 11/04/2012
Location: Australia
Posts: 911
Posted: 03:05am 01 May 2024
Copy link to clipboard 
Print this post

I have a project that is crashing at least twice a day. The program locks up without error message until a key is pressed then I get the [CYW43] do_ioctl(2, 263, 16): timeout and a power reset is the only way I can get the program running again. I have a 5 second watchdog running in the main loop and ON ERROR SKIP prior to the WEB calls. Is there any other error trapping I can do to keep the program running?

0A47
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 5917
Posted: 03:41am 01 May 2024
Copy link to clipboard 
Print this post

I have been running Peters latest beta on a test system. It locks up and does a reboot most days.

My system that is doing real work is back to an early beta and is very reliable.

The method I find that gives the best service is twofold.
1 Run the hardware watchdog.
2 put a loop counter in the main program loop.
The counter is reset during a normal interrupt.
If the counter gets too high, the interrupts are not triggering so do a CPO restart.

 DO
   IF tickTime THEN dotick ' reset the loop counter and do the background work here
   k$ = INKEY$
   IF k$ <>"" THEN
     SELECT CASE k$
       CASE "L","l"
         ptt_ok = 1-ptt_ok
         PIN(ptt_inhibit) = ptt_ok
       CASE "X","x"
         OPTION AUTORUN OFF
         WATCHDOG HW OFF
         syslog "Quit"
         Quit = 1
     END SELECT
   ENDIF
   INC loopCount
   
   IF loopCount > 15000 THEN
     syslog " Loop failed - RESTARTING"
     CPU RESTART
   ENDIF
   IF checkNTP THEN updateNTP
 LOOP UNTIL Quit


The theory is
If the CPU timers are running, the watchdog will detect a stalled program
If the timers have stopped, the loop keeps going without any interrupts so the counter gets to the panic setting and reboots. This requires NO pause statements in the main loop.


Jim
Edited 2024-05-01 13:45 by TassyJim
VK7JH
MMedit   MMBasic Help
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 05:48am 01 May 2024
Copy link to clipboard 
Print this post

Jim, AO47 - thanks for your input  
I currently have the watchdog turned off because I want to freeze the MMCC screen (and WM) where the problem happens - I'm trying to avoid a reset and keep an 'as-is' state.
There's a few times where a problem happens where the heartbeat stops. If the WD fires, it can sometimes clear it and the HB starts again, so I miss the lock-up conditions.
Normally I'll have the HB at 20 seconds.

For MOST times, a full power reset is the only way to do a 'clean' recovery - By clean, I mean everything goes back to as it should be.
For example, it was locked up when I got home from work with the dreaded timeout message, so (long story short) I did ^C and clicked run and everything started working again. Regular heartbeat, relay clicking on and off. Hit ^C, the WM went to a 'normal' stopped state, but the heartbeat was not regular. on-off-on-off-on-on-off.
After a full repower, it's happy again and doing exactly as intended.

I'm stumped and I'm out of ideas. I've looked at the C-code source for the CYW43 driver and found where the message probably gets fired, but knowing nothing about C, I struggle with the workings of it.
This here -
CYW43_WARN("do_ioctl(%u, %u, %u): timeout\n", (unsigned int)kind, (unsigned int)cmd, (unsigned int)len);

There's a lot of discussion out there on the timeout problem and most people follow it up with a "do_ioctl: got unexpected packet" error as well - which is something I haven't seen. (I have seen a few "Internal error in tcp_server_recv - Attempting recovery", but they seem to be handled OK, I suspect by MMbasic and not the Pico SDK.

There's too many symptoms, at irregular times resulting in unpredictable behaviour for me to have a "GOTCHYA" moment...

I dunno, I think I'm beaten on this one  

Edit: And to shed some observations on the watchdog firing, I can fire it at my pleasure. If a http request gets sent to the WM from VB, the interrupt triggers. IF I close the vb down before the interrupt sub has finished (ie: connection is lost) the WM has nowhere to reply to, so sits in a kind of 'limbo' waiting for a connection that is listening. It will fire the watchdog, 100% sure of it. So the questions is, what happens when a legitimate WWW request comes in and drops out during the interrupt routine?
Edited 2024-05-01 16:06 by Malibu

John
 
OA47

Guru

Joined: 11/04/2012
Location: Australia
Posts: 911
Posted: 10:37pm 01 May 2024
Copy link to clipboard 
Print this post

Thanks Jim,
  Quote  My system that is doing real work is back to an early beta and is very reliable.


Out of curiosity which version is this?

0A47
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 5917
Posted: 10:52pm 01 May 2024
Copy link to clipboard 
Print this post

  OA47 said  Thanks Jim,
  Quote  My system that is doing real work is back to an early beta and is very reliable.


Out of curiosity which version is this?

0A47


I don't push it too hard and leave the important things to non-webmite devices.

WebMite MMBasic Version 5.07.08b10
OPTION SYSTEM SPI GP10,GP11,GP12
OPTION CPUSPEED (KHz) 250000
OPTION LCDPANEL VIRTUAL_C
OPTION WIFI xxxxxx, ***********, RTVmon
OPTION TCP SERVER PORT 80, 10000
OPTION TELNET CONSOLE ON
OPTION SDCARD GP13
>


and just in case you don't have a copy

WebMiteV5.07.08b10.zip

It is old so missing a lot but works for my uses.

Jim
VK7JH
MMedit   MMBasic Help
 
Malibu
Senior Member

Joined: 07/07/2018
Location: Australia
Posts: 158
Posted: 05:05am 02 May 2024
Copy link to clipboard 
Print this post

Sorry Jim, a silly question - what's the 10000 in this line?
  Quote  OPTION TCP SERVER PORT 80, 10000


The manual says...
  Quote  OPTION TCP SERVER PORT n
Launches a TCP server on port 'n' during every restart of the WebMite. Typically HTTP servers use port 80.
USE "OPTION TCP SERVER PORT 0" to disable
When the server is running it can respond to up to MM.INFO(MAX CONNECTIONS)


The only parameter have found is for the actual port number (80 for a web server) or did I miss something (As I usually do)?

John
 
Pluto
Guru

Joined: 09/06/2017
Location: Finland
Posts: 330
Posted: 07:07am 02 May 2024
Copy link to clipboard 
Print this post

Found this in my own notes, couldn't however find it on the thebackshed:
OPTION TCP SERVER PORT n, timeout

Pluto
 
JohnS
Guru

Joined: 18/11/2011
Location: United Kingdom
Posts: 3675
Posted: 07:32am 02 May 2024
Copy link to clipboard 
Print this post

hmm, I found a quote here

Other than that?

John
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 5917
Posted: 07:35am 02 May 2024
Copy link to clipboard 
Print this post

Pluto is correct. It is a timeout which defaults to 5 seconds (5000mS)
You can set it between 1 and 20 seconds.

If you are serving up large images a longer timeout is needed.

I can't find the reference to it but it will be here somewhere in the webmite beta threads.

Something that I missed when checking the manuals.

Jim
VK7JH
MMedit   MMBasic Help
 
     Page 1 of 2    
Print this page
© JAQ Software 2024